# Simulation with the Shyft API

## Introduction
At its core, Shyft provides functionality through an API (Application Programming Interface). All the functionality of Shyft is available through this API.

We begin the tutorials by introducing the API as it provides the building blocks for the framework. Once you have a good understan

In [Part I](run_nea_nidelva.ipynb) of the simulation tutorials, we covered conducting a very simple simulation of an example catchment using configuration files. This is a typical use case, but it assumes that you have a model that is well configured and ready for simulation. In practice, one usually wants to work with the model, test different configurations, and evaluate different data sources.

This is in fact a key idea of Shyft: to make it simple to evaluate how the choice of model routines affects the performance of the simulation. In this notebook we walk through a lower-level paradigm of working with the toolbox, using the Shyft API directly to conduct the simulations.

**This notebook is guiding through the simulation process of a catchment. The following steps are described:**
1. **Loading required Python modules and setting the path to the Shyft installation**
2. **Running of a Shyft simulation**
3. **Running a Shyft simulation with updated parameters**
4. **Activating the simulation only for selected catchments**
5. **Setting up different input datasets**
6. **Changing state collection settings**
7. **Post processing and extracting results**

## 1. Loading required Python modules and setting the path to the Shyft installation

Shyft requires a number of different modules to be loaded. Below, we describe the required steps for loading them; note that some steps are only required when working in a Jupyter notebook.

In [1]:
# Pure python modules and jupyter notebook functionality
# first you should import the third-party python modules which you'll use later on
# the first line enables that figures are shown inline, directly in the notebook
%matplotlib inline
import os
import datetime as dt
import pandas as pd
from os import path
import sys
from matplotlib import pyplot as plt
from netCDF4 import Dataset

### The Shyft Environment

This next step depends heavily on how and where you have installed Shyft. If you have followed the guidelines on GitHub and cloned the three Shyft repositories: i) shyft, ii) shyft-data, and iii) shyft-doc, then you may need to tell Jupyter where to find Shyft. Uncomment the relevant lines below.

If you have a 'system' Shyft, or used `conda install -c sigbjorn shyft` to install Shyft, then you will probably want to make sure that the SHYFTDATA environment variable points to the right directory; otherwise, Shyft will assume the above directory structure and fail. __This has to be done _before_ `import shyft`__. In that case, uncomment the relevant lines below.

**note**: it is most likely that you'll need to do one or the other.

In [2]:
# now we create the shyft specific environment
# set the path for your shyft build
# this should point to the directory that is created
# when you clone shyft, assuming you have built shyft
# there and not installed it to your system python
# if you followed the recommendations in the README, then
# you will have cloned three git repos in a parallel structure
# and can point to the shyft repository:
# Note: you could achieve the same by setting a PYTHONPATH

# sys.path.insert(0,os.environ['SHYFT_DEPENDENCIES_DIR'])
# shyft_path = os.path.abspath("../../../shyft")
# sys.path.insert(0, shyft_path)

# If you have set up a system shyft installation, or it has
# been set up for you somewhere, then you need to tell these
# notebooks where to find the data. This is relevant with respect
# to how the .yaml configuration files are set up. Set this to
# point to the shyft-data directory on your machine.
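# use .get() so that an unset variable does not raise a KeyError,
# and expanduser('~') so that the home directory also resolves on Windows
if not os.environ.get('SHYFTDATA'):
    os.environ['SHYFTDATA'] = os.path.join(os.path.expanduser('~'), 'workspace/shyft_workspace/shyft-data')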
    
print(os.environ['SHYFTDATA'])

D:\statkraft_data


In [3]:
# path to a locally built Shyft egg: adjust (or remove) this line for your own setup
sys.path.insert(0, r'D:\users\jfb\built_shyft\shyft-4.4.1462-py3.6.egg')
from shyft import api
import shyft

print(shyft.__path__)

['D:\\users\\jfb\\built_shyft\\shyft-4.4.1462-py3.6.egg\\shyft']


In [32]:
reg_repo = CFRegionModelRepository(SimDict, ModelDict)
rm = reg_repo.get_region_model('demo')
reg_repo


<shyft.repository.netcdf.cf_region_model_repository.CFRegionModelRepository at 0x4c587d4668>

## 2. A Shyft simulation

The purpose of this notebook is to demonstrate the functionality of the Shyft API. This is a **low level** approach. If you understand what is presented herein, you'll be well on your way to working with Shyft.

If you prefer to take a **high level** approach, you can start by looking at the [Run Nea Nidelva](simulation-yaml.ipynb) notebook. We recommend taking the time to understand the API, however, as it will be of value later if you want to use your own data and create your own repositories.

### Orchestration and Repositories
A core philosophy of Shyft is that "Data should live at the source". This means that we prefer datasets to either remain in their original format or even come directly from the data provider. To accomplish this, we use "repositories". You can read more about repositories at the [Shyft Documentation](https://shyft.readthedocs.io/en/latest/orchestration.html).

#### Interfaces
Because we hope that users will create their own repositories to meet the specifications of their own datasets, we provide 'interfaces'. This is a programming concept that you may not be familiar with: an interface is a template that specifies which methods a class must provide, without necessarily implementing them. Your own class can inherit from an interface and override its methods to meet your own specifications. We'll explore this as we move through this tutorial. A nice [explanation of interfaces with python is available here](http://masnun.rocks/2017/04/15/interfaces-in-python-protocols-and-abcs/).
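A minimal sketch of the interface idea using Python's standard `abc` module. The class names `ForcingRepositoryInterface`, `ConstantTemperatureRepository` and `IncompleteRepository`, and the `get_timeseries(input_source_types, period)` signature, are hypothetical and chosen only for illustration; they are not Shyft's own interface classes (those live in Shyft's `interfaces.py`).

```python
from abc import ABC, abstractmethod


class ForcingRepositoryInterface(ABC):
    """Hypothetical interface: every forcing repository must implement get_timeseries."""

    @abstractmethod
    def get_timeseries(self, input_source_types, period):
        """Return a dict mapping each requested type name to its data."""


class ConstantTemperatureRepository(ForcingRepositoryInterface):
    """Toy implementation that returns a constant temperature series."""

    def __init__(self, value, n_steps):
        self.value = value
        self.n_steps = n_steps

    def get_timeseries(self, input_source_types, period):
        # 'period' is ignored in this toy example
        return {name: [self.value] * self.n_steps
                for name in input_source_types if name == "temperature"}


class IncompleteRepository(ForcingRepositoryInterface):
    pass  # forgets to implement get_timeseries


repo = ConstantTemperatureRepository(value=2.5, n_steps=3)
print(repo.get_timeseries(["temperature"], period=None))

try:
    IncompleteRepository()  # abstract method missing -> TypeError
except TypeError as err:
    print("cannot instantiate:", err)
```

The benefit is that a repository that forgets a required method fails as soon as it is instantiated, rather than halfway through a simulation.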

### Simulation Configuration
What is required to set up a simulation? In the following we'll package some basic information into a dictionary that may be used to configure our simulation. We'll start by creating a couple of dictionaries. These dictionaries will be used to instantiate an existing repository class that was created for demonstration purposes, `CFRegionModelRepository`.

In [4]:
# we need to import the repository to use it in a dictionary:
from shyft.repository.netcdf.cf_region_model_repository import CFRegionModelRepository

#### Region and time specification

The first dictionary essentially establishes the domain of the simulation and the timing. We also specify a repository that is used to read the data that will provide Shyft a `region_model` (discussed below), based on geographic data.

In [None]:
# next, create the simulation dictionary
SimDict = {'start_datetime': "2013-09-01T00:00:00",
           'run_time_step': 86400,  # seconds, daily
           'number_of_steps': 365,  # one year
           'region_model_id': 'demo',  # a unique name identifier of the simulation
           'domain': {'EPSG': 32633,
                      'nx': 400,
                      'ny': 80,
                      'step_x': 1000,
                      'step_y': 1000,
                      'lower_left_x': 100000,
                      'lower_left_y': 6960000},
           'repository': {'class': shyft.repository.netcdf.cf_region_model_repository.CFRegionModelRepository,
                          'params': {'data_file': 'shyft-data/netcdf/orchestration-testdata/cell_data.nc'}},
           }

In [5]:
# next, create the region dictionary (the same domain as SimDict, without the timing keys)
RegionDict = {'region_model_id': 'demo',  # a unique name identifier of the simulation
              'domain': {'EPSG': 32633,
                         'nx': 400,
                         'ny': 80,
                         'step_x': 1000,
                         'step_y': 1000,
                         'lower_left_x': 100000,
                         'lower_left_y': 6960000},
              'repository': {'class': shyft.repository.netcdf.cf_region_model_repository.CFRegionModelRepository,
                             'params': {'data_file': 'shyft-data/netcdf/orchestration-testdata/cell_data.nc'}},
              }

The first keys are probably self-explanatory:

* `start_datetime`: a string in the format: "2013-09-01T00:00:00"
* `run_time_step`: an integer representing the time step of the simulation (in seconds), so for a daily step: 86400
* `number_of_steps`: an integer for how long the simulation should run: 365 (for a year-long simulation)
* `region_model_id`: a string to name the simulation, here 'demo'
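The three timing keys together describe a fixed-step time axis: the simulation covers `run_time_step * number_of_steps` seconds, starting at `start_datetime`. The sketch below only uses the standard library to show this arithmetic for the values in `SimDict`; it does not build Shyft's own time axis object.

```python
from datetime import datetime, timedelta

# values copied from SimDict above
start_datetime = "2013-09-01T00:00:00"
run_time_step = 86400      # seconds, i.e. one day
number_of_steps = 365

start = datetime.fromisoformat(start_datetime)  # requires Python 3.7+
end = start + timedelta(seconds=run_time_step * number_of_steps)

print(start, "->", end)  # 2013-09-01 00:00:00 -> 2014-09-01 00:00:00
```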

We also need to know *where* the simulation is taking place. This information is contained in the `domain`:

* `EPSG`: the EPSG code identifying the coordinate system (here the integer 32633, i.e. WGS 84 / UTM zone 33N)
* `nx`: number of 'cells' in the x direction
* `ny`: number of 'cells' in the y direction
* `step_x`: size of cell in x direction (m)
* `step_y`: size of cell in y direction (m)
* `lower_left_x`: where (x) in the EPSG system the cells begin
* `lower_left_y`: where (y) in the EPSG system the cells begin
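* `repository`: strictly a top-level key next to `domain` rather than part of it; it names a repository class that can read the file containing data for the cells (in this case a netcdf file), together with the `params` passed to it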
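The extent of the grid follows directly from the `domain` keys: x runs from `lower_left_x` to `lower_left_x + nx * step_x`, and y from `lower_left_y` to `lower_left_y + ny * step_y`. The helper `domain_corners` below is a hypothetical name used only for this sketch. For the values used here, the corners match the bounding box that `region_model.bounding_region.bounding_box(...)` prints further down in this notebook.

```python
# domain values copied from RegionDict above
domain = {'EPSG': 32633, 'nx': 400, 'ny': 80,
          'step_x': 1000, 'step_y': 1000,
          'lower_left_x': 100000, 'lower_left_y': 6960000}


def domain_corners(d):
    """Return (xs, ys) of the four grid corners, counter-clockwise from lower left."""
    x0, y0 = d['lower_left_x'], d['lower_left_y']
    x1 = x0 + d['nx'] * d['step_x']
    y1 = y0 + d['ny'] * d['step_y']
    return [x0, x1, x1, x0], [y0, y0, y1, y1]


xs, ys = domain_corners(domain)
print(xs)  # [100000, 500000, 500000, 100000]
print(ys)  # [6960000, 6960000, 7040000, 7040000]
```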

#### Model specification

The next dictionary provides information about the model that we would like to use in Shyft, or the 'Model Stack' as it is generally referred to. In this case, we are going to use the PTGSK model, and the rest of the dictionary provides the parameter values.

In [6]:
ModelDict = {'model_t': shyft.api.pt_gs_k.PTGSKModel,  # model to construct
            'model_parameters': {
                'actual_evapotranspiration':{
                    'ae_scale_factor': 1.5},
                'gamma_snow':{
                    'calculate_iso_pot_energy': False,
                    'fast_albedo_decay_rate': 6.752787747748934,
                    'glacier_albedo': 0.4,
                    'initial_bare_ground_fraction': 0.04,
                    'max_albedo': 0.9,
                    'max_water': 0.1,
                    'min_albedo': 0.6,
                    'slow_albedo_decay_rate': 37.17325702015658,
                    'snow_cv': 0.4,
                    'tx': -0.5752881492890207,
                    'snowfall_reset_depth': 5.0,
                    'surface_magnitude': 30.0,
                    'wind_const': 1.0,
                    'wind_scale': 1.8959672005350063,
                    'winter_end_day_of_year': 100},
                'kirchner':{ 
                    'c1': -3.336197322290274,
                    'c2': 0.33433661533385695,
                    'c3': -0.12503959620315988},
                'precipitation_correction': {
                    'scale_factor': 1.0},
                'priestley_taylor':{'albedo': 0.2,
                    'alpha': 1.26},
                    }
            }               

In this dictionary we define two keys:

* `model_t`: the Shyft 'model stack' class itself (here `shyft.api.pt_gs_k.PTGSKModel`)
* `model_parameters`: a dictionary containing specific parameter values for a particular model class

Specifics of the `model_parameters` dictionary will vary based on which class is used.
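Because `model_parameters` is a plain nested dictionary, trying a different parameter set (step 3 in the outline above) can start from a modified copy of it. The helper `with_parameter` below is hypothetical, written only for this sketch; it deep-copies the dictionary so the original stays unchanged, and rejects parameter names that are not already present. A trimmed-down dictionary is used so that the snippet runs without Shyft.

```python
import copy

# trimmed copy of ModelDict['model_parameters'] (the model class itself is omitted)
model_parameters = {
    'actual_evapotranspiration': {'ae_scale_factor': 1.5},
    'kirchner': {'c1': -3.336197322290274,
                 'c2': 0.33433661533385695,
                 'c3': -0.12503959620315988},
    'precipitation_correction': {'scale_factor': 1.0},
}


def with_parameter(params, group, name, value):
    """Hypothetical helper: return a deep copy of params with one value changed."""
    if name not in params.get(group, {}):
        raise KeyError(f"unknown parameter {group}.{name}")
    new_params = copy.deepcopy(params)
    new_params[group][name] = value
    return new_params


wetter = with_parameter(model_parameters, 'precipitation_correction', 'scale_factor', 1.1)
print(wetter['precipitation_correction'])            # {'scale_factor': 1.1}
print(model_parameters['precipitation_correction'])  # {'scale_factor': 1.0}
```

Checking the name first guards against silently adding a misspelled parameter that the model would never read.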

Okay, so far we have two dictionaries: one provides information regarding our simulation domain, and the other provides information on the model that we wish to run over the domain (i.e. in each of the cells). The next step is to map these together and create a `region_repo` object.

This is achieved by using a repository, in this case, the `CFRegionModelRepository` we imported above.

In [8]:
region_repo = CFRegionModelRepository(RegionDict, ModelDict)

### The `region_model`

<div class="alert alert-info">

**TODO:** a notebook documenting the CFRegionModelRepository

</div>

The first step in conducting a hydrologic simulation is to define the **domain of the simulation** and the **model type** that we would like to simulate. To do this we create a `region_model` object. The dictionaries above provide this information; in this next step we put it together so that we have a single object to work with "at our fingertips". You'll note that we pointed to a 'data_file' when we defined `SimDict` and `RegionDict`. This data file contains all the required elements to fill the cells of our domain. The information is contained in a single [netcdf file](../../../shyft-data/netcdf/orchestration-testdata/cell_data.nc).

Before we go further, let's look briefly at the contents of this file:

In [9]:
cell_data_file = os.path.join(os.environ['SHYFTDATA'], 'shyft-data/netcdf/orchestration-testdata/cell_data.nc')
cell_data = Dataset(cell_data_file)
print(cell_data)

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): cell(4650)
    variables(dimensions): float64 x(cell), float64 y(cell), float64 z(cell), int32 crs(), float64 area(cell), float64 forest-fraction(cell), float64 reservoir-fraction(cell), float64 lake-fraction(cell), float64 glacier-fraction(cell), int32 catchment_id(cell)
    groups: 



You might be surprised to see that the only dimension is `cell`, but recall that in Shyft everything is vectorized. Each 'cell' is an element within a domain, and each cell has associated variables:

* location: x, y, z
* characteristics: area, forest-fraction, reservoir-fraction, lake-fraction, glacier-fraction, catchment_id
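Since every variable in the file is indexed by the same `cell` dimension, the i-th entry of each array describes the same cell. The sketch below illustrates this layout with small made-up lists standing in for the netcdf variables (the numbers and catchment ids are illustrative, not read from `cell_data.nc`), and then sums the cell area per `catchment_id`. With the real file, the arrays would come from e.g. `cell_data.variables['area'][:]`.

```python
from collections import defaultdict

# illustrative stand-ins for three netcdf variables along the 'cell' dimension
area = [1.0e6, 1.0e6, 0.5e6, 2.0e6]    # m2
forest_fraction = [0.2, 0.0, 0.5, 0.1]
catchment_id = [1, 1, 2, 2]

# the i-th element of every array belongs to the same cell
cells = [{'area': a, 'forest-fraction': f, 'catchment_id': c}
         for a, f, c in zip(area, forest_fraction, catchment_id)]

area_per_catchment = defaultdict(float)
for cell in cells:
    area_per_catchment[cell['catchment_id']] += cell['area']

print(dict(area_per_catchment))  # {1: 2000000.0, 2: 2500000.0}
```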

We'll bring this data into our workspace via the `region_model`. Note that we have instantiated a `region_repo` class using one of the existing Shyft repositories, in this case one that was built for reading in the data as it is contained in the example [shyft-data](https://github.com/statkraft/shyft-data) netcdf files: `CFRegionModelRepository`.

Next, we'll use the `region_repo.get_region_model` method to get the `region_model`. Note that the name 'demo' is arbitrary in this case; however, depending on how you create your repository, you can use this string to specify which region model to return.
<div class="alert alert-info">


**note:** *you are strongly encouraged to learn how to create repositories. This particular repository is just for demonstration purposes. In practice, one may use a repository that connects directly to a GIS service, a database, or some other data sets that contain the data required for simulations.*

<div class="alert alert-warning">

**warning**: *also, please note that below we call the 'get_region_model' method as we instantiate the class. This behavior may change in the future.*

</div>
</div>

In [10]:
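region_model = region_repo.get_region_model('demo')
region_model.time_axis.start

# total period covered by the region model's time axis; an empty period,
# printed as [-oo,-oo>, suggests the simulation time axis has not been set yet
period = region_model.time_axis.total_period()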

In [11]:
print(period)

[-oo,-oo>


#### Exploring the `region_model`

So we have now created a `region_model`, but what is it actually? This is a very **fundamental class** in Shyft. Essentially, the `region_model` contains all the information regarding the simulation type and domain. There are many methods associated with the `region_model`, and it will take time to understand all of them. For now, let's just explore a few key ones:

* `bounding_region`: provides information regarding the domain of interest for the simulation
* `catchment_id_map`: indices of the various catchments within the domain
* `cells`: an instance of `PTGSKCellAllVector` that holds the individual cells for the simulation (*note that this is type-specific to the model type*)
* `ncore`: an integer that sets the numbers of cores to use during simulation (Shyft is very greedy if you let it!)
* `time_axis`: a `shyft.api.TimeAxisFixedDeltaT` object (it contains information regarding the timing of the simulation)

Keep in mind that many of these attributes are more 'C++'-like than 'Pythonic'. This means that in some cases you'll have to call them as methods. For example, `region_model.bounding_region.epsg()` returns a string. You can use tab completion to explore the `region_model` further:

In [39]:
region_model.bounding_region.epsg()

'32633'

You'll likely notice a number of intriguing functions, e.g. `initialize_cell_environment` or `interpolate`. But before we can go further, we need some more information. Perhaps you are wondering about forcing data. So far, we haven't said anything about **model input**; we've only set up a container that holds all the information about our simulation. Still, we have made *some* progress. Let's look, for instance, at the cells:

In [40]:
cell_0 = region_model.cells[0]
print(cell_0.geo)

GeoCellData(mid_point=GeoPoint(204843.7371537306,6994695.209048475,978.344970703125),catchment_id=2305,area=209211.92418108872,ltf=LandTypeFractions(glacier=0.0,lake=0.0,reservoir=0.0,forest=0.0,unspecified=1.0))


So you can see that so far, each of the cells in the region_model contain information regarding their LandTypeFractions, geolocation, catchment_id, and area. 

There is a particularly important attribute of the cells: `env_ts`. This is a container in each cell that holds the "environmental time series", or forcing data, for the simulation. It is customized to provide the time series required by the model type we selected, in this case `PTGSKModel` (see the `ModelDict`). So every cell in your simulation has a container prepared to accept the forcing data, as the next cell shows. We'll then continue and populate these containers with forcing data for our simulation.

In [41]:
print([d for d in dir(cell_0.env_ts) if not d.startswith('_')])  # hide 'private' attributes

['init', 'precipitation', 'radiation', 'rel_hum', 'temperature', 'wind_speed']


### Adding forcing data to the `region_model`

Clearly the next step is to add forcing data to our `region_model` object. This is where some of the 'magic' of Shyft starts to shine. We haven't yet focused on any particularly special functionality of Shyft. But hopefully you'll begin to see some of the power behind these configuration classes as we add data. Let's start by thinking about what kind of data we need. From above, it's clear that this particular model stack, `PTGSKModel`, requires:

* precipitation
* radiation
* relative humidity (rel_hum)
* temperature
* wind speed
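The type names used by the repositories do not all match the `env_ts` attribute names printed above: relative humidity is requested as `relative_humidity` but stored as `rel_hum` (the commented-out `_get_region_environment` code further down makes exactly this assignment). The sketch below writes that correspondence down as a dictionary and copies sources into an environment object. `SOURCE_TO_ENV` and `build_environment` are hypothetical names, `types.SimpleNamespace` stands in for Shyft's `api.ARegionEnvironment`, and plain strings stand in for real source vectors.

```python
from types import SimpleNamespace

# repository/source type name -> environment attribute name
SOURCE_TO_ENV = {
    'temperature': 'temperature',
    'precipitation': 'precipitation',
    'radiation': 'radiation',
    'wind_speed': 'wind_speed',
    'relative_humidity': 'rel_hum',
}


def build_environment(sources):
    """Copy each source into the attribute name the model stack expects.

    A missing source raises KeyError, so incomplete forcing data fails early.
    """
    env = SimpleNamespace()  # stand-in for api.ARegionEnvironment
    for source_name, attr_name in SOURCE_TO_ENV.items():
        setattr(env, attr_name, sources[source_name])
    return env


# dummy sources: strings instead of real Shyft source vectors
dummy = {name: f"<{name} series>" for name in SOURCE_TO_ENV}
env = build_environment(dummy)
print(env.rel_hum)  # <relative_humidity series>
```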

Each of these is stored in a separate netcdf file. Again, these files **do not represent the recommended practice**, but are *only for demonstration purposes*. The idea is just to demonstrate with an example repository; *you should create your own to match **your** data*.

#### A word on units
Units in Shyft follow ........... [TODO]

#### "Sources"

We use the term *sources* to describe a location that data come from. You may also come across *destinations*. In both cases, the term just means a file, database, service, or similar that is capable of providing data. Repositories are written to connect to *sources*. Following our earlier approach, we'll create another dictionary to define our data sources, but first we need to import another repository:

In [42]:
from shyft.repository.netcdf.cf_geo_ts_repository import CFDataRepository


In [43]:
# CFDataRepository was imported in the previous cell
Datasets = {'sources': [
    {'repository': CFDataRepository,
     'params': {'epsg': 32633,
                'selection_criteria': None,
                'stations_met': 'shyft-data/netcdf/orchestration-testdata/precipitation.nc'},
     'types': ['precipitation']},

    {'repository': CFDataRepository,
     'params': {'epsg': 32633,
                'selection_criteria': None,
                'stations_met': 'shyft-data/netcdf/orchestration-testdata/temperature.nc'},
     'types': ['temperature']},

    {'repository': CFDataRepository,
     'params': {'epsg': 32633,
                'selection_criteria': None,
                'stations_met': 'shyft-data/netcdf/orchestration-testdata/wind_speed.nc'},
     'types': ['wind_speed']},

    {'repository': CFDataRepository,
     'params': {'epsg': 32633,
                'selection_criteria': None,
                'stations_met': 'shyft-data/netcdf/orchestration-testdata/relative_humidity.nc'},
     'types': ['relative_humidity']},

    {'repository': CFDataRepository,
     'params': {'epsg': 32633,
                'selection_criteria': None,
                'stations_met': 'shyft-data/netcdf/orchestration-testdata/radiation.nc'},
     'types': ['radiation']}]
}


#### Data Repositories

In another notebook, further information will be provided regarding the repositories. For the time being, let's look at the configuration dictionary we just created. It essentially contains a list, keyed by the name `"sources"`. This key is recognized by some of the tools built into the Shyft orchestration, so we recommend using it.

Each item in the list is a dictionary describing one source type, with the keys `repository`, `params`, and `types`. The general idea is that, in orchestration, the object keyed by `repository` is a class that is instantiated by passing it the objects contained in `params`.

Let's repeat that. From our `Datasets` dictionary we get a list of `"sources"`. Each of these sources contains a class (a repository) that is capable of getting the source data into Shyft. Whatever parameters the class requires are included in that source's `params` dictionary. In our case the `params` are quite simple: an EPSG code, a selection criterion, and the path to a netcdf file. But suppose our repository required credentials or other information for a database? This information could also be included in the `params` stanza of the dictionary.
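The `repository`/`params` pattern works for any class whose constructor accepts those keyword arguments. In the sketch below, `DatabaseRepository` is a hypothetical class (not part of Shyft) whose constructor takes a host and user name instead of a file path, and `instantiate` is a hypothetical helper showing the generic `cls(**params)` call that is applied to every source.

```python
class DatabaseRepository:
    """Hypothetical repository reading forcing data from a database."""

    def __init__(self, host, user, epsg):
        self.host = host
        self.user = user
        self.epsg = epsg


def instantiate(source):
    """Create the repository described by one entry of a 'sources' list."""
    return source['repository'](**source['params'])


source = {'repository': DatabaseRepository,
          'params': {'host': 'db.example.org', 'user': 'hydro', 'epsg': 32633},
          'types': ['temperature']}

repo = instantiate(source)
print(type(repo).__name__, repo.host)  # DatabaseRepository db.example.org
```

The orchestration code never needs to know which arguments a repository takes; only the configuration and the class itself have to agree.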

You should explore the above referenced netcdf files that are available at the [shyft-data](https://github.com/statkraft/shyft-data) git repository. These files contain the forcing data that will be used in the example simulation. Each one contains observational data from some stations in our catchment. Depending on how you write your repository, this data may be provided to Shyft in many different formats.

Let's explore this concept further by getting the 'temperature' data:

In [53]:
# get the temperature sources:
tmp_sources = [source for source in Datasets['sources'] if 'temperature' in source['types']]

# in this example there is only one
t0 = tmp_sources[0]

# We will now instantiate the repository with the parameters that are provided
# in the dictionary. 
# Note the 'call' structure expects params to contain keyword arguments, and these
# can be anything you want depending on how you create your repository
tmp_repo = t0['repository'](**t0['params'])


`tmp_repo` is now an instance of the Shyft `CFDataRepository`, and this will provide Shyft with the data when it sets up a simulation.

Now that we have set up our `region_model` and we have a set of repositories available for reading our data, we can start to set up a simulation.

Note that our `tmp_repo` has a method called `get_timeseries`. This method takes the time series types to return, the period to read, and a geographic selection criterion such as a bounding box:

In [54]:
bbox = region_model.bounding_region.bounding_box(region_model.bounding_region.epsg())
print(bbox)
period = region_model.time_axis.total_period()
_geo_ts_names = ("temperature", "wind_speed", "precipitation",
                              "relative_humidity", "radiation")
# help(tmp_repo.get_timeseries)
source = tmp_repo.get_timeseries(_geo_ts_names, period, geo_location_criteria=bbox)


(array([ 100000.,  500000.,  500000.,  100000.]), array([ 6960000.,  6960000.,  7040000.,  7040000.]))


IndexError: index -1 is out of bounds for axis 0 with size 0

In [23]:
# we'll actually create a collection of repositories, as we have different input types.
from shyft.repository.geo_ts_repository_collection import GeoTsRepositoryCollection

def construct_geots_repo(datasets_config, epsg=None):
    """ iterates over the different sources that are provided 
    and prepares the repository to read the data for each type"""
    geo_ts_repos = []
    src_types_to_extract = []
    for source in datasets_config['sources']:
        if epsg is not None:
            source['params'].update({'epsg': epsg})
        geo_ts_repos.append(source['repository'](**source['params']))
        src_types_to_extract.append(source['types'])
    
    return GeoTsRepositoryCollection(geo_ts_repos, src_types_per_repo=src_types_to_extract)

# instantiate the repository
geots_repo = construct_geots_repo(Datasets)
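Conceptually, a collection such as `GeoTsRepositoryCollection` lets each wrapped repository answer only for its own types and merges the results. The sketch below is a simplified, hypothetical stand-in (`SimpleCollection` and `FakeRepo` are illustrative names) written to show that idea; it is not Shyft's implementation, and it passes the period and geographic criterion through without using them.

```python
class FakeRepo:
    """Hypothetical repository that returns a label for each requested type."""

    def __init__(self, label):
        self.label = label

    def get_timeseries(self, types, period, geo_location_criteria=None):
        return {t: f"{t} from {self.label}" for t in types}


class SimpleCollection:
    """Ask each repository only for the types it is responsible for."""

    def __init__(self, repos, src_types_per_repo):
        self.repos = repos
        self.src_types_per_repo = src_types_per_repo

    def get_timeseries(self, types, period, geo_location_criteria=None):
        result = {}
        for repo, repo_types in zip(self.repos, self.src_types_per_repo):
            wanted = [t for t in types if t in repo_types]
            if wanted:
                result.update(repo.get_timeseries(wanted, period, geo_location_criteria))
        return result


collection = SimpleCollection([FakeRepo('precip.nc'), FakeRepo('temp.nc')],
                              [['precipitation'], ['temperature']])
print(collection.get_timeseries(('temperature', 'precipitation'), period=None))
```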

In [None]:
#         if time_axis is None:
#             time_axis = self.time_axis
#         else:
#             self.region_model.initialize_cell_environment(time_axis)
#         self.region_model.initial_state = self.get_initial_state_from_repo() if state is None else state
#         bbox = self.region_model.bounding_region.bounding_box(self.epsg)
#         period = time_axis.total_period()
#         sources = self.geo_ts_repository.get_timeseries(self._geo_ts_names, period,
#                                                         geo_location_criteria=bbox)
#         self.region_model.region_env = self._get_region_environment(sources)
#         self.region_model.interpolation_parameter = self.ip_repos.get_parameters(self.interpolation_id)
#         self.simulate()
# def _get_region_environment(self, sources):
#     region_env = api.ARegionEnvironment()
#     region_env.temperature = sources["temperature"]
#     region_env.precipitation = sources["precipitation"]
#     region_env.radiation = sources["radiation"]
#     region_env.wind_speed = sources["wind_speed"]
#     region_env.rel_hum = sources["relative_humidity"]
#     return region_env

The following shows how to set up a Shyft simulation using the `yaml_configs.YAMLSimConfig` class. Note that this is a **high level** approach, providing a working example for a simple simulation. More advanced users will want to eventually make use of direct API calls, as outlined in [Part II](advanced_simulation.ipynb).

At this point, you may want to have a look at the [configuration file](./nea-config/neanidelva_simulation.yaml) used in this example.

```
---
neanidelva:
  region_config_file: neanidelva_region.yaml
  model_config_file: neanidelva_model_calibrated.yaml
  datasets_config_file: neanidelva_datasets.yaml
  interpolation_config_file: neanidelva_interpolation.yaml
  start_datetime: 2013-09-01T00:00:00
  run_time_step: 86400  # 1 day (24 h) time step
  number_of_steps: 365  # 1 year
  region_model_id: 'neanidelva-ptgsk'
  #interpolation_id: 2   # this is optional (default 0)
  initial_state:
    repository:
      class: !!python/name:shyft.repository.generated_state_repository.GeneratedStateRepository
      params:
        model: !!python/name:shyft.api.pt_gs_k.PTGSKModel
    tags: []
...

```

The file is structured as follows:

`neanidelva` is the name of the simulation. Your configuration file may contain multiple "stanzas" or blocks of simulation configurations. You'll see below that we use the name to instantiate a configuration object.

`region_config_file` points to another yaml file that contains basic information about the region of the simulation. You can [explore that file here](./nea-config/neanidelva_region.yaml).

`model_config_file` contains the model parameters. Note that when you are calibrating the model, [this is the file](./nea-config/neanidelva_model_calibrated.yaml) that you would put your optimized parameters into once you have completed a calibration.

`datasets_config_file` contains details regarding the input datasets and the [repositories](../../repositories.rst) they are contained in. You can see [this file here](./nea-config/neanidelva_datasets.yaml).

`interpolation_config_file` provides details regarding how the observational data in your catchment or region will be interpolated to the domain of the simulation. If you are using a repository with distributed data, the interpolation is still used. [See this file](./nea-config/neanidelva_interpolation.yaml) for more details.

The following:

```
  start_datetime: 2013-09-01T00:00:00
  run_time_step: 86400  # 1 day (24 h) time step
  number_of_steps: 365  # 1 year
  region_model_id: 'neanidelva-ptgsk'
```

are considered self-explanatory. Note that `region_model_id` is simply a string name, but it should be **unique**. We will explain the details of `initial_state` later in this tutorial.

In [None]:
# Shyft imports
from shyft import api


### Exploring the Shyft API
In the following section we'll explore several components of the `shyft.api`. We consider the strength of Shyft to lie within this Application Programming Interface, or API. To the uninitiated, it adds quite a degree of complexity. However, once you understand the different components and paradigms of Shyft, you'll see that the flexibility of the API opens up a great number of possibilities for exploring hydrologic simulations.

To start, as in the first tutorial, we'll create a `YAMLSimConfig` object and a `ConfigSimulator`. This gets us going quickly and provides a `region_model` as a starting point. In this tutorial, however, the point is to work more closely with the different components of `shyft.api`. In particular, we hope to demonstrate how the underlying `region_model`, rather than a `simulator`, is developed. The latter is just a wrapper made for convenience when running models operationally. If you are interested in working with Shyft to explore algorithm performance, it is easier to work directly with the `region_model` class.

The API approach takes a bit more code to get started, but allows great flexibility later on. The first thing we'll do is to expose several of the attributes (which are mostly Shyft API classes) of the `simulator` object. Let's begin by getting those into our namespace:

In [None]:
# expose attributes of the simulator
region_model = simulator.region_model
region_model_id = simulator.region_model_id
interpolation_id = simulator.interpolation_id
 

In the [first tutorial](run_nea_nidelva.ipynb#The-simulator-and-the-region_model) we discussed the `region_model`. If you are unfamiliar with this class, we recommend reviewing the [description](region_model.rst). 

#### The concept of Shyft repositories

In Shyft, we consider input data to be a "source". Our source data resides in some kind of storage, be it a text file, a netcdf file, or a database; any storage format is possible. [Repositories](repositories.rst) are Python-based interfaces to data. Several have been created within Shyft already, but users are encouraged to create their own. A guiding paradigm of Shyft is that data should live as close to the source as possible (ideally, at the source). The repositories connect to the data source and make the data available to Shyft.

In [None]:
# expose the repositories
region_model_repo = simulator.region_model_repository
interpolation_param_repo = simulator.ip_repos
geo_ts_repo = simulator.geo_ts_repository
initial_state_repo = simulator.initial_state_repo

Now we have exposed the **repositories** that we connected to our `region_model` during configuration. Having access to the repositories means that we have direct access to the input data sources (found in `geo_ts_repository`). We also have several other repositories, including repositories for the interpolation parameters, the initial state, and the region_model. We'll explore some of these a bit deeper now, but first we'll expose a few more pieces of information from the region_model while we're at it.

In [None]:
epsg = region_model.bounding_region.epsg()
bbox = region_model.bounding_region.bounding_box(epsg)
period = region_model.time_axis.total_period()
geo_ts_names = ("temperature", "wind_speed", "precipitation", "relative_humidity", "radiation")

sources = geo_ts_repo.get_timeseries(geo_ts_names, period, geo_location_criteria=bbox)


The `epsg` is simply the projection information for our simulation domain, `bbox` provides the bounding box coordinates, and `period` gives the total period of the simulation. Lastly, we create a tuple of the 'geolocated timeseries names', `geo_ts_names`, and use it to get the *sources* out of our repository. Note that these names:

    temperature
    wind_speed
    precipitation
    relative_humidity
    radiation

are embedded in Shyft as the timeseries names required for simulations. In the current implementations, these are the default names used in repositories and, at present, the only forcing data required. If you were to develop new algorithms that required other forcings, you would need to define these in a custom repository. See `interfaces.py` for more details.
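Before handing a custom repository's output to the model, a cheap sanity check is to confirm that every required name is present and non-empty. `missing_forcings` is a hypothetical helper, not part of Shyft; it only assumes that `sources` behaves like the dictionary returned by `get_timeseries`.

```python
# The five forcing names listed above; a custom repository would need to
# return a source collection under each of these keys.
REQUIRED_FORCINGS = ("temperature", "wind_speed", "precipitation",
                     "relative_humidity", "radiation")


def missing_forcings(sources, required=REQUIRED_FORCINGS):
    """Return the required names that are absent from `sources` or empty."""
    return [name for name in required if len(sources.get(name, ())) == 0]


# e.g. missing_forcings(sources) should be [] for the Nidelva example
example = {"temperature": [1], "wind_speed": [1], "precipitation": [1],
           "relative_humidity": []}
print(missing_forcings(example))
```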

Before going further, let's look at what we have so far...

We won't look in detail at all the repositories, but let's take a look at the `geo_ts_repo`:

In [None]:
# explore geo_ts_repo
#help(geo_ts_repo)

The `geo_ts_repo` is a collection of geolocated timeseries repositories. It has an attribute, `.geo_ts_repositories`, which may look redundant at first; it is simply the list of the repositories this class is 'managing'.
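The 'managing' role can be sketched as a composite that forwards one request to several repositories and merges the results per variable name. This is an illustrative assumption about the pattern, not a copy of Shyft's class; `CompositeGeoTsRepository` and the `Stub` repository are hypothetical, although the `geo_ts_repositories` attribute name mirrors the one above.

```python
class CompositeGeoTsRepository:
    """Hypothetical composite: asks each wrapped repository for the requested
    names and concatenates the per-name source lists."""

    def __init__(self, repositories):
        self.geo_ts_repositories = list(repositories)

    def get_timeseries(self, names, period):
        merged = {name: [] for name in names}
        for repo in self.geo_ts_repositories:
            # each repository returns only the names it can serve
            for name, found in repo.get_timeseries(names, period).items():
                merged.setdefault(name, []).extend(found)
        return merged


class Stub:
    """Minimal stand-in repository backed by a plain dict."""

    def __init__(self, data):
        self.data = data

    def get_timeseries(self, names, period):
        return {n: list(self.data[n]) for n in names if n in self.data}


composite = CompositeGeoTsRepository([
    Stub({"precipitation": ["station_a"]}),
    Stub({"precipitation": ["station_b"], "temperature": ["station_t"]}),
])
merged = composite.get_timeseries(["precipitation", "temperature"], (0, 1))
print(merged)
```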

Maybe we want to look at the precipitation input series in more detail. We can get at those through this class. Note that this may not be the most typical way to look at your input data (presumably you have already inspected the raw netCDF files before the simulation), but if you wish to see the datasets from the "model" perspective, this is how you gain access. You might also want to run a simulation and then make a data correction; you could do that by accessing the values here. Each of the series types above has a specialized source vector type in Shyft. For precipitation it is an `api.PrecipitationSourceVector`. If we dig into this, we'll find some aspects familiar from the first tutorial.

Let's get the precipitation timeseries out of the repository for the period of the simulation first:

In [None]:
# above we already created a `sources` dictionary by
# using the `get_timeseries` method. This method takes a
# sequence of timeseries names and a period of type
# 'shyft.api._api.UtcPeriod' as input, and returns a
# dictionary keyed by the timeseries names
prec = sources['precipitation']

# `prec` is now an `api.PrecipitationSourceVector`; if you check
# its length, you'll see it has 10 elements:
print(len(prec))


We can explore further and see that each element is itself an `api.PrecipitationSource`, which has a timeseries (`ts`). Recall from the [first tutorial](run_nea_nidelva.ipynb#Visualizing-the-discharge-for-each-[sub-]catchment) that we can easily convert the `timeseries.time_axis` into datetime values for plotting.

Let's plot the precip of each of the sources:

In [None]:
fig, ax = plt.subplots(figsize=(15,10))

for pr in prec:
    # convert the start of each time-axis period to a datetime for plotting
    t = [dt.datetime.utcfromtimestamp(period_.start) for period_ in pr.ts.time_axis]
    p = pr.ts.values
    # label each source by the x coordinate of its location
    ax.plot(t, p, label=pr.mid_point().x)
fig.autofmt_xdate()
ax.legend(title="Precipitation Input Sources")
ax.set_ylabel("precip [mm/hr]")
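`dt.datetime.utcfromtimestamp` returns naive datetimes and is deprecated as of Python 3.12. A drop-in alternative is to request timezone-aware UTC datetimes directly. The helper name `to_utc_datetimes` is ours, and it assumes the period starts used in the plotting loop above can be converted to epoch seconds with `float`, as the original `utcfromtimestamp` call implies.

```python
import datetime as dt


def to_utc_datetimes(epoch_starts):
    """Convert epoch-second period starts into timezone-aware UTC datetimes."""
    return [dt.datetime.fromtimestamp(float(t), tz=dt.timezone.utc) for t in epoch_starts]


# in the notebook: to_utc_datetimes(p.start for p in pr.ts.time_axis)
stamps = to_utc_datetimes([0, 3600])
print(stamps)
```

Matplotlib plots aware datetimes the same way as naive ones, so the rest of the plotting cell does not need to change.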

Before we leave this section, we can also take a quick look at the `interpolation_param_repo`. This is a different type of repository: it contains the parameters passed to the interpolation algorithms, which take point-source timeseries and interpolate them to the Shyft cells, or, in API terms, `region_model.cells`. We'll quickly look at the `.params` attribute, which is a dictionary.

In [None]:
interpolation_param_repo.params

One quickly recognizes the same input source type keywords that are used as keys to the `params` dictionary. `params` is simply a dictionary of dictionaries which contains the parameters used by the interpolation model that is specific for each source type.
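Because `params` is a plain dictionary of dictionaries, trying an alternative interpolation setting is ordinary dictionary work. The sketch below copies before editing so that the repository's own parameters stay intact. `with_param` is a hypothetical helper, and the `example_param` key is a placeholder, not a real Shyft parameter name.

```python
import copy


def with_param(params, source_type, key, value):
    """Return a deep copy of a dict-of-dicts with one entry changed,
    leaving the original parameter set untouched."""
    updated = copy.deepcopy(params)
    updated[source_type][key] = value  # KeyError if source_type is unknown
    return updated


# placeholder structure shaped like interpolation_param_repo.params
base = {"precipitation": {"example_param": 1}, "temperature": {"example_param": 2}}
trial = with_param(base, "precipitation", "example_param", 5)
print(trial["precipitation"], base["precipitation"])
```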

In closing, we encourage you to understand the concept of **repositories** well. As a Shyft user, you will likely want to create your own repository to access your data directly rather than creating input files for Shyft. Keep in mind that the repositories are Python code, not part of Shyft's core C++ code. They are designed to provide an interface between the C++ code and potentially more 'pythonic' paradigms. In the following section, you'll see that we populate a C++ class from a repository collection.

#### The ARegionEnvironment class

The next thing we'll do is create an `api.ARegionEnvironment` instance to use in our custom simulation. Whereas `geo_ts_repo` is a Python interface that provides a collection of all the timeseries repositories, `region_env` is an API type that provides a container of the data "sources" specific to the model. We will now create an `api.ARegionEnvironment` from the sources we retrieved through `geo_ts_repo`. It may help to think of `region_env` as the container of input data for the `region_model`; in fact, that is exactly what it is.

In [None]:
def get_region_env(sources_):
    region_env_ = api.ARegionEnvironment()
    region_env_.temperature = sources_["temperature"]
    region_env_.precipitation = sources_["precipitation"]
    region_env_.radiation = sources_["radiation"]
    region_env_.wind_speed = sources_["wind_speed"]
    region_env_.rel_hum = sources_["relative_humidity"]
    return region_env_

region_env = get_region_env(sources)

What we have done here is convert our input data from the Python-based repositories into a C++ type used in the Shyft core. It may feel redundant with `geo_ts_repo`, but there are underlying differences. Still, you'll see that the 'sources' are now direct attributes of `region_env`:
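The body of `get_region_env` above is essentially a name mapping, with one rename: the repository key `relative_humidity` becomes the environment attribute `rel_hum`. Writing the mapping as data makes that explicit. `SOURCE_TO_ENV_ATTR` and `fill_env` are hypothetical; passed `api.ARegionEnvironment()` they should behave like `get_region_env(sources)`, but here they are exercised on a `SimpleNamespace` stand-in.

```python
from types import SimpleNamespace

# Repository key -> ARegionEnvironment attribute, as used in get_region_env above.
SOURCE_TO_ENV_ATTR = {
    "temperature": "temperature",
    "precipitation": "precipitation",
    "radiation": "radiation",
    "wind_speed": "wind_speed",
    "relative_humidity": "rel_hum",  # the one name that differs
}


def fill_env(env, sources_, mapping=SOURCE_TO_ENV_ATTR):
    """Assign each source collection to its environment attribute."""
    for key, attr in mapping.items():
        setattr(env, attr, sources_[key])
    return env


env = fill_env(SimpleNamespace(), {key: [key] for key in SOURCE_TO_ENV_ATTR})
print(env.rel_hum)
```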

In [None]:
print(len(region_env.precipitation))
type(region_env.precipitation[0])

#### Interpolation Parameters
In the same manner that we need to convert the sources from the Python based container, we'll also create an API object from the `interpolation_param_repo`.

In [None]:
interpolation_parameters = interpolation_param_repo.get_parameters(interpolation_id)

Okay, now we are set to *rebuild* our `region_model` from scratch. In the next few steps we're going to walk through initialization of the `region_model` to set it up for simulation.

### Initialization of the region_model
The two `shyft.api` types, `api.ARegionEnvironment` and `api.InterpolationParameter`, are used together to initialize the `region_model`. In the next step, all of the timeseries input sources are interpolated to the geolocated model cells. After this step, each cell in the model has its own `env_ts`, which contains the timeseries for that cell. Let's first do the interpolation; then we can explore `region_model.cells` a bit further.
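To build intuition for what "interpolated to the cells" means, here is a toy inverse-distance weighting in plain Python. It only illustrates the general idea; Shyft's own routines (for example kriging for temperature) are configured through `interpolation_parameters` and run in C++. `idw` is our own hypothetical helper.

```python
import math


def idw(cell_xy, sources, power=2.0):
    """Inverse-distance-weighted value at cell_xy from (x, y, value) point sources."""
    num = 0.0
    den = 0.0
    for x, y, value in sources:
        d = math.hypot(x - cell_xy[0], y - cell_xy[1])
        if d == 0.0:
            return value  # the cell sits exactly on a station: use its value
        w = 1.0 / d ** power  # closer stations get larger weights
        num += w * value
        den += w
    return num / den


# one cell between a near station (value 10) and a farther one (value 0)
print(idw((0.0, 0.0), [(1.0, 0.0, 10.0), (2.0, 0.0, 0.0)]))
```

Every cell gets its own weighted combination of the same point sources, which is why each cell ends up with its own `env_ts`.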

In [None]:
#region_model.run_interpolation(interpolation_parameters, region_model.time_axis, region_env)
region_model.interpolate(interpolation_parameters, region_env)

Okay, that was simple. Let's look at the timeseries in some individual cells. The following is a bit of a contrived example, but it shows some aspects of the api. We'll plot the temperature series of all the cells in one sub-catchment, and color them by elevation.

In [None]:
from matplotlib.cm import jet
from matplotlib.colors import Normalize

# get all the cells for one sub-catchment with 'id' == 1228
c1228 = [c for c in region_model.cells if c.geo.catchment_id() == 1228]

# for plotting, create an mpl normalizer based on min,max elevation
elv = [c.geo.mid_point().z for c in c1228]
norm = Normalize(min(elv), max(elv))

# plot with line color a function of elevation
fig, ax = plt.subplots(figsize=(15,10))

# cycle through the cells in c1228, pairing each temperature series with its elevation
for dat, z in zip([c.env_ts.temperature.values for c in c1228], elv):
    ax.plot(dat, color=jet(norm(z)), label=int(z))

# the following only builds the legend entries and is not related to Shyft
handles, labels = ax.get_legend_handles_labels()

# sort numerically by elevation (labels are strings, so convert back to int)
hl = sorted(zip(handles, labels), key=lambda pair: int(pair[1]))
handles2, labels2 = zip(*hl)

# show legend, but only every fifth entry
ax.legend(handles2[::5], labels2[::5], title='Elevation [m]')
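The list comprehension above selects one sub-catchment. To work with every sub-catchment at once, the cells can be grouped by `c.geo.catchment_id()` in a single pass. `cells_by_catchment` is a hypothetical helper; the only Shyft behaviour it assumes is the `geo.catchment_id()` call used above, and it is exercised here on `SimpleNamespace` stand-ins with made-up ids rather than real cells.

```python
from collections import defaultdict
from types import SimpleNamespace


def cells_by_catchment(cells):
    """Group cells into lists keyed by their catchment id."""
    groups = defaultdict(list)
    for c in cells:
        groups[c.geo.catchment_id()].append(c)
    return dict(groups)


def fake_cell(catchment_id):
    # minimal stand-in exposing only c.geo.catchment_id()
    return SimpleNamespace(geo=SimpleNamespace(catchment_id=lambda: catchment_id))


# in the notebook: cells_by_catchment(region_model.cells)
demo = cells_by_catchment([fake_cell(1228), fake_cell(42), fake_cell(1228)])
print({cid: len(cs) for cid, cs in demo.items()})
```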

Given the temperature kriging method, we expect higher elevations to have colder temperatures. As an exercise, you could explore this relationship using a scatter plot.
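One way to go beyond the scatter plot is to fit a straight line of temperature against elevation; the slope is then an apparent lapse rate. The sketch below is an ordinary least-squares fit in plain Python. `fit_lapse_rate` is hypothetical, and the numbers in the example are made up purely to exercise it.

```python
def fit_lapse_rate(elevations, temperatures):
    """Least-squares slope and intercept of temperature vs. elevation.

    With elevations in m and temperatures in degC, the slope is in degC per m.
    """
    n = len(elevations)
    mean_z = sum(elevations) / n
    mean_t = sum(temperatures) / n
    sxy = sum((z - mean_z) * (t - mean_t) for z, t in zip(elevations, temperatures))
    sxx = sum((z - mean_z) ** 2 for z in elevations)
    slope = sxy / sxx
    return slope, mean_t - slope * mean_z


# In the notebook one might use, for a single time step:
#   z = [c.geo.mid_point().z for c in c1228]
#   temps = [c.env_ts.temperature.values[0] for c in c1228]
#   ax.scatter(z, temps)
z = [0.0, 500.0, 1000.0, 1500.0]
temps = [15.0 - 0.0065 * zi for zi in z]  # synthetic, exactly linear
slope, intercept = fit_lapse_rate(z, temps)
print(slope, intercept)
```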

Now we're going to create a function that reads initial states from the `initial_state_repo`. In practice, this is already done by the `ConfigSimulator`, but to demonstrate lower-level functions, we'll reset the states of our `region_model`:

In [None]:
# create a function to read the states from the state repository
def get_init_state_from_repo(initial_state_repo_, region_model_id_=None, timestamp=None):
    state_id = 0
    if hasattr(initial_state_repo_, 'n'):  # No stored state, generated on-the-fly
        initial_state_repo_.n = region_model.size()
    else:
        states = initial_state_repo_.find_state(
            region_model_id_criteria=region_model_id_,
            utc_timestamp_criteria=timestamp)
        if len(states) > 0:
            state_id = states[0].state_id  # most_recent_state i.e. <= start time
        else:
            raise Exception('No initial state matching criteria.')
    return initial_state_repo_.get_state(state_id)
 
init_state = get_init_state_from_repo(initial_state_repo, region_model_id_=region_model_id, timestamp=region_model.time_axis.start)
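The function above takes `states[0]` and, judging by its comment, relies on `find_state` returning the most recent state at or before the start time first. If you write your own state repository, that selection rule can be sketched as follows. `StateInfo` and `most_recent_state` are hypothetical and do not reflect Shyft's own state-info type.

```python
from typing import NamedTuple


class StateInfo(NamedTuple):
    state_id: int
    utc_timestamp: int  # epoch seconds at which the state is valid


def most_recent_state(states, start):
    """Return the state with the latest timestamp <= start, or None if none qualify."""
    candidates = [s for s in states if s.utc_timestamp <= start]
    return max(candidates, key=lambda s: s.utc_timestamp, default=None)


stored = [StateInfo(1, 100), StateInfo(2, 300), StateInfo(3, 200)]
print(most_recent_state(stored, 250))
```

Returning `None` rather than raising leaves the caller free to decide, as the notebook function does with its `raise Exception(...)`.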


Don't worry too much about the function for now, but do take note of the `init_state` object that we created. This is another container; this time it contains `PTGSKState` objects, which are specific to the method stack implemented in the simulation (in this case `PTGSK`). If we explore an individual state object, we'll see that `init_state` contains, for each cell in our simulation, the state variables for each 'method' of the method stack.

Let's look more closely:

In [None]:
def print_pub_attr(obj):
    # only public attributes (names that do not start with an underscore)
    print([attr for attr in dir(obj) if not attr.startswith('_')])

print(len(init_state))
init_state_cell0 = init_state[0]
# gamma snow (gs) states
print_pub_attr(init_state_cell0.gs)

# Kirchner states
print_pub_attr(init_state_cell0.kirchner)
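Listing the names is a start; to see the actual state values, the same `dir` filter can be extended to collect public, non-callable attributes into a dictionary. `public_values` is our own helper; on the real state objects it only assumes that `getattr` works for the names `dir` reports. Here it is exercised on a `SimpleNamespace` stand-in with made-up values.

```python
from types import SimpleNamespace


def public_values(obj):
    """Map each public, non-callable attribute name of `obj` to its value."""
    out = {}
    for name in dir(obj):
        if name.startswith("_"):
            continue  # skip private and dunder attributes
        value = getattr(obj, name)
        if not callable(value):  # keep data, drop methods
            out[name] = value
    return out


# in the notebook: public_values(init_state_cell0.gs)
state_like = SimpleNamespace(albedo=0.6, swe=12.0)
print(public_values(state_like))
```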

#### Summary
We have now explored the `region_model` and looked at how to instantiate a `region_model` by using a `api.ARegionEnvironment`, containing a collection of timeseries sources, and passing an `api.InterpolationParameter` class containing the parameters to use for the data interpolation algorithms. The interpolation step "populated" our cells with data from the point sources.

The cells each contain all the information related to the simulation (their own timeseries, `env_ts`; their own model parameters, `parameter`; and other attributes and methods). In future tutorials we'll work with the cells' individual "resource collector" (`.rc`) and "state collector" (`.sc`) attributes.



