# Introduction to Pywr-DRB
## Overview:
If you want to learn how to use the [Pywr-DRB](https://github.com/Pywr-DRB/Pywr-DRB) water resource model, you are in the right place. 

This page is designed to introduce you to the Pywr-DRB code base, help you set up your environment, and show you how to access and begin interacting with a Pywr-DRB model instance.  

### Links:
- [The Pywr-DRB GitHub repository](https://github.com/Pywr-DRB/Pywr-DRB)
- [The Pywr-DRB documentation site](https://pywr-drb.github.io/Pywr-DRB/intro.html)


### Outline:
- [1.0 Getting Started](#10-getting-started)
- [2.0 Using the `pywrdrb` package](#20-using-the-pywrdrb-package)
    - [2.1 Creating a pywrdrb model instance](#21-creating-a-pywrdrb-model-instance)
    - [2.2 Loading a pywrdrb model](#22-loading-a-pywrdrb-model)
    - [2.3 Setting up the data recorder](#23-setting-up-the-data-recorder)
    - [2.4 Running a pywrdrb simulation](#24-running-a-pywrdrb-simulation)
    - [2.5 Loading output data](#25-loading-output-data)

***
## 1.0 Getting Started

From the [Pywr-DRB GitHub organization page](https://github.com/Pywr-DRB), navigate to the [Pywr-DRB repository](https://github.com/Pywr-DRB/Pywr-DRB).

Start by cloning the GitHub repository onto your machine:
```bash
git clone https://github.com/Pywr-DRB/Pywr-DRB.git
```

Once the repository is cloned, you can install the `pywrdrb` package.  

To create a virtual environment and install `pywrdrb` in editable mode on Windows, run the following from inside the cloned repository:
```bash
cd Pywr-DRB
python -m virtualenv venv
venv\Scripts\activate
pip install -e .
```

Now we are ready! Import the `pywrdrb` package to make sure it is accessible. 

***

```python
import pywrdrb
```
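
If the import fails, it can help to confirm which Python interpreter is active and whether it can find the package at all. The following is a minimal sketch using only the standard library; the helper name `is_importable` is hypothetical and is not part of `pywrdrb`.

```python
import importlib.util
import sys


def is_importable(package_name: str) -> bool:
    """Return True if the current interpreter can locate `package_name`."""
    return importlib.util.find_spec(package_name) is not None


# Inside an activated virtual environment, this path should point
# into the `venv` folder created during installation.
print("Interpreter:", sys.executable)
print("pywrdrb importable:", is_importable("pywrdrb"))
```

If the second line prints `False`, the most likely cause is that the virtual environment is not activated in the session running the notebook or script.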

## 2.0 Using the `pywrdrb` package

The `pywrdrb` package has a few key components, each supporting one step of the workflow:
- `pywrdrb.ModelBuilder`
- `pywrdrb.Model`
- `pywrdrb.OutputRecorder`
- `pywrdrb.Data`


### 2.1 Creating a pywrdrb model instance

Before we can run any simulations, we need to create an instance of the Pywr-DRB model, which will be stored in a `.json` file.  This `.json` file will contain all the information needed to run a simulation, including:
- Metadata such as start and end date and number of scenarios
- Relationships between different nodes
- Operational constraints
- URL paths for different input sources
- etc.

These `.json` files are unique to each simulation configuration, so we can create different instances for different inflow datasets or configurations. 

We use the `pywrdrb.ModelBuilder` to create this `.json` file. 

The `pywrdrb.ModelBuilder` takes three key arguments:
- `inflow_type` (str): the name of the dataset to use as input
- `start_date` (str): the start date of the simulation
- `end_date` (str): the end date of the simulation
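
Because the dates are passed as strings, it can be useful to validate them before building a model. The sketch below is not part of `pywrdrb`; the helper name `check_simulation_period` is hypothetical, and it assumes `YYYY-MM-DD` strings and a daily timestep that includes both the start and end dates.

```python
from datetime import date


def check_simulation_period(start_date: str, end_date: str) -> int:
    """Validate two 'YYYY-MM-DD' strings and return the number of daily timesteps."""
    start = date.fromisoformat(start_date)  # raises ValueError on a malformed date
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError(f"end_date {end_date} is before start_date {start_date}")
    # Assumption: both endpoints are simulated, so add one day.
    return (end - start).days + 1


n_days = check_simulation_period("1983-10-01", "2016-12-31")
print(f"Simulation period covers {n_days} days")
```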

The following code is used to generate a new `.json` file using the `"nhmv10_withObsScaled"` dataset. 

```python
# Create a model

# Initialize a model builder
mb = pywrdrb.ModelBuilder(
    inflow_type="nhmv10_withObsScaled",
    start_date="1983-10-01",
    end_date="2016-12-31",
)

# Make a model - this is stored as a dictionary attribute
mb.make_model()

# Export the data to a .json file
model_filename = "./model.json"
mb.write_model(model_filename)
```

Take a minute to double check that the `model.json` file exists. 
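
One way to do this check from Python is to confirm the file exists and print a short summary of its contents. This is a minimal sketch, not a `pywrdrb` feature: the helper name `summarize_model_file` is hypothetical, and it assumes the file follows the usual Pywr JSON layout with top-level `timestepper`, `nodes`, `edges`, and `parameters` keys. Missing keys are treated as empty.

```python
import json
from pathlib import Path


def summarize_model_file(path: str) -> dict:
    """Return a few summary facts about a Pywr-style model JSON file."""
    model_path = Path(path)
    if not model_path.is_file():
        raise FileNotFoundError(f"No model file found at {model_path.resolve()}")
    with model_path.open() as f:
        model_dict = json.load(f)
    # Assumed key names; .get() avoids a KeyError if the layout differs.
    timestepper = model_dict.get("timestepper", {})
    return {
        "start": timestepper.get("start"),
        "end": timestepper.get("end"),
        "n_nodes": len(model_dict.get("nodes", [])),
        "n_edges": len(model_dict.get("edges", [])),
        "n_parameters": len(model_dict.get("parameters", {})),
    }


# Guarded so the example only runs when the file is present.
if Path("./model.json").is_file():
    print(summarize_model_file("./model.json"))
```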

### 2.2 Loading a pywrdrb model

After the `.json` file has been created, we need to load that model instance.  

When loading the model, `pywr` initializes all of the underlying parameters and nodes and creates a `model` object which is ready for simulation. 

The line below shows how to load the model, using the `model_filename` corresponding to the `.json` file.

```python
# Load the model using the Model class, which is inherited from pywr
model = pywrdrb.Model.load(model_filename)
```

### 2.3 Setting up the data recorder

Once we have loaded the model, we are almost ready to run a simulation. 

First, we need to initialize a "Recorder" which will keep track of simulation data during the model run. 

Pywr provides a few existing Recorder classes, but we have created a custom class to maximize efficiency while still formatting and saving all the data of interest. 

The custom recorder is called `pywrdrb.OutputRecorder`.  This will save the data to an `.hdf5` file in a standardized format. 

You must provide the `OutputRecorder` with:
- The `model` object
- The `output_filename`

The code below sets up the `OutputRecorder`.

```python
# Specify the name for the output file
output_filename = "./pywrdrb_output.hdf5"

# Set up the OutputRecorder to track the model outputs
recorder = pywrdrb.OutputRecorder(
    model,
    output_filename,
)
```

### 2.4 Running a pywrdrb simulation
Now we are ready to run the simulation.  This step is easy, as shown below. 

The full simulation should take about 30 seconds to complete.
(You may see many warnings pop up; don't worry about those unless the simulation actually stops.)

```python
# Run the simulation
stats = model.run()
```

### 2.5 Loading output data

Great, you should now see a `pywrdrb_output.hdf5` file in your directory. 

In order to access the simulation results, you should use the `pywrdrb.Data` class. 

This `pywrdrb.Data` class is designed to load and store multiple different dataset types, but in this case we will focus on the output data.

Given that there are multiple datasets of interest, the `pywrdrb.Data` class uses a hierarchical structure for storing this data within the `pywrdrb.Data` object. 

After using `pywrdrb.Data().load()`, results are stored in the class as a nested dictionary structure following:

```python
data.results_set[datatype_label][scenario_number] -> pd.DataFrame
```

This will return a pd.DataFrame that contains the simulation data with a datetime index. 
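
To make the nesting concrete, the sketch below builds a stand-in for one such attribute using plain dictionaries and `pandas`. Everything in it is illustrative: the variable name `mock_res_storage`, the label `example_label`, and the numeric values are made up and do not come from a real simulation.

```python
import pandas as pd

# Made-up daily index and values, used only to illustrate the structure.
index = pd.date_range("1983-10-01", periods=3, freq="D")

# Hypothetical stand-in with the shape {datatype_label: {scenario_number: DataFrame}}
mock_res_storage = {
    "example_label": {
        0: pd.DataFrame(
            {"cannonsville": [1.0, 2.0, 3.0], "pepacton": [4.0, 5.0, 6.0]},
            index=index,
        )
    }
}

# Walk the two dictionary levels to reach each DataFrame.
for label, scenarios in mock_res_storage.items():
    for scenario_number, df in scenarios.items():
        print(label, scenario_number, df.shape, list(df.columns))
```

Indexing with `mock_res_storage["example_label"][0]` returns the DataFrame directly, which mirrors the access pattern described above.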


#### 2.5.1 Results sets

In this case, `results_set` is a key that refers to a specific type of variable. Examples include `res_storage` (reservoir storage data) and `major_flow` (streamflow at major model nodes). 


You must provide a list of `results_sets` to the `pywrdrb.Data` object. These results are then loaded and stored in the attribute with the matching name. For example, if you want to access the streamflow at major Pywr-DRB model nodes, then you want `results_sets = ['major_flow']`. 

You can find descriptions of each `results_set` type in [`pywrdrb.utils.results_sets`](../pywrdrb/utils/results_sets.py).

For more details on usage of the `pywrdrb.Data` class, see [Tutorial 04 Accessing Data.ipynb](./Tutorial%2004%20Accessing%20Data.ipynb)

#### 2.5.2 Dataset label after loading

As mentioned above, the `pywrdrb.Data` class is designed to handle multiple datasets in a standard way.  Given this context, we include a dataset label in the hierarchical data structure after loading.

```python
data.results_set[datatype_label][scenario_number] -> pd.DataFrame
```

For `pywrdrb` output data, the `datatype_label` is the name of the output file with the `.hdf5` extension removed. In this case, it is `pywrdrb_output`, since that is what we used for the `output_filename` earlier. 
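
The relationship between a filename and its label can be reproduced with the standard library. This sketch only mirrors the behavior described above; it is not necessarily how `pywrdrb` derives the label internally, and the helper name `label_from_filename` is hypothetical.

```python
from pathlib import Path


def label_from_filename(output_filename: str) -> str:
    """Return the file name without its directory or final extension."""
    return Path(output_filename).stem


print(label_from_filename("./pywrdrb_output.hdf5"))  # prints "pywrdrb_output"
```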

This is beneficial since we can have a single `pywrdrb.Data` object with multiple different types of simulation results using different names. 


**Now, given this context, the code below loads the major_flow and res_storage simulation results.**

```python
# Load simulation results

# Specify the datatypes and results sets to load
datatypes = ["output"]
results_sets = ["major_flow", "res_storage"]

# Initialize a data object
data = pywrdrb.Data(print_status=True)

# Load the data
data.load(
    datatypes=datatypes,
    output_filenames=[output_filename],
    results_sets=results_sets,
)

# Print a snapshot of the data
data.res_storage["pywrdrb_output"][0].head()
```

```text
Loading output data...
Loading major_flow data from pywrdrb_output
Loading res_storage data from pywrdrb_output
```


| | assunpink | beltzvilleCombined | blueMarsh | cannonsville | fewalter | greenLane | hopatcong | merrillCreek | mongaupeCombined | neversink | nockamixon | ontelaunee | pepacton | prompton | shoholaMarsh | stillCreek | wallenpaupack |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1983-10-01 | -40.090338 | -309.439656 | -1157.573542 | -467.211127 | -1245.654858 | -272.215962 | -69.567312 | -8.096763 | -652.610235 | -180.467158 | -629.074614 | 37.841606 | -718.86072 | -217.575621 | -168.538125 | 6.219197 | -664.924202 |
| 1983-10-02 | -27.102195 | -9.78103 | -926.477782 | -414.151996 | -1243.653429 | -215.953532 | -70.364748 | -7.820269 | -647.118544 | -128.663814 | -629.801064 | 38.223599 | -607.617983 | -218.634731 | -168.462883 | 6.924994 | -642.010452 |
| 1983-10-03 | -24.602824 | -10.515606 | -930.489667 | -407.828315 | -1245.283879 | -275.7933 | -72.319468 | -7.993999 | -654.009752 | -148.727358 | -631.755784 | 35.297333 | -597.46635 | -219.597291 | -168.979355 | 5.344234 | -655.986132 |
| 1983-10-04 | -39.119488 | -11.154192 | -932.586513 | -527.183689 | -1246.765943 | -307.13857 | -72.919379 | -8.383405 | -661.570641 | -192.254031 | -632.355695 | 34.486143 | -772.32135 | -220.436791 | -169.219203 | 5.05617 | -676.761876 |
| 1983-10-05 | -43.383176 | -111.040586 | -933.130129 | -526.719068 | -1248.265605 | -307.374282 | -73.124067 | -8.715289 | -663.426109 | -192.084592 | -632.560384 | 34.004635 | -771.640683 | -221.170619 | -169.269357 | 4.875007 | -678.77266 |


You should see a DataFrame with one column for each of the reservoirs in the model. 

Now you can get into the fun of looking at results and some data visualization!

