# Using Customized Data To Run PywrDRB

PywrDRB provides multiple `inflow_type` and `diversion_type` options for users to select directly during the model-building process. However, sometimes users may want to use their own customized data to run the model.

In this tutorial, we will walk you through:  
1) The path structure adopted in PywrDRB.  
2) How to use your own flow and diversion data to run the model.


## Path Structure in PywrDRB

In PywrDRB, we use a global instance of the [PathNavigator](https://github.com/philip928lin/PathNavigator) object to manage the paths associated with different datasets used in PywrDRB.

To get the customizable path configuration, call `pywrdrb.get_pn_config()`:


In [2]:
import pywrdrb
from pprint import pprint

pn_config = pywrdrb.get_pn_config()
pprint(pn_config)

{'flows/nhmv10': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\nhmv10',
 'flows/nhmv10_withObsScaled': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\nhmv10_withObsScaled',
 'flows/nwmv21': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\nwmv21',
 'flows/nwmv21_withObsScaled': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\nwmv21_withObsScaled',
 'flows/pub_nhmv10_BC_withObsScaled': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\pub_nhmv10_BC_withObsScaled',
 'flows/wrf1960s_calib_nlcd2016': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\wrf1960s_calib_nlcd2016',
 'flows/wrf2050s_calib_nlcd2016': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\wrf2050s_calib_nlcd2016',
 'flows/wrfaorc_calib_nlcd2016': 'C:\\Users\\CL\\Documents\\GitHub\\Pywr-DRB\\src\\pywrdrb\\data\\flows\\wrfaorc_calib_nlcd2016',
 'flows/wrfaorc_wi

From this output, you can see the inflow types that PywrDRB provides, including `nhmv10`, `nhmv10_withObsScaled`, `nwmv21`, `nwmv21_withObsScaled`, `wrf1960s_calib_nlcd2016`, `wrf2050s_calib_nlcd2016`, and `wrfaorc_calib_nlcd2016`, along with the corresponding diversion types of the same names.

The directories of the corresponding folders are stored in a dictionary, where the keys have the prefix `flows/` and `diversions/` to distinguish between flow data and diversion data.
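
The key convention can be illustrated without PywrDRB itself. The sketch below uses a small hand-written dictionary as a stand-in for `pn_config` (the paths and the `diversions/nhmv10` entry are placeholders), and `list_types` is a hypothetical helper, not part of PywrDRB. It groups the registered type names by prefix.

```python
def list_types(pn_config, prefix):
    """Return the type names registered under a key prefix such as 'flows/'."""
    return sorted(
        key[len(prefix):] for key in pn_config if key.startswith(prefix)
    )


# Stand-in for the dictionary returned by pywrdrb.get_pn_config();
# the paths are placeholders, not real locations.
example_config = {
    "flows/nhmv10": "/path/to/data/flows/nhmv10",
    "flows/nwmv21": "/path/to/data/flows/nwmv21",
    "diversions/nhmv10": "/path/to/data/diversions/nhmv10",
}

flow_types = list_types(example_config, "flows/")
diversion_types = list_types(example_config, "diversions/")
print("Flow types:", flow_types)
print("Diversion types:", diversion_types)
```

The same helper could be applied to the real dictionary to check which names are valid values for `inflow_type` and `diversion_type`.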

So, if you want to use your own data to run the simulation, you will need to add your folder directory to `pn_config` and load it into `pywrdrb` before building and running the model. 

Here is how. Assuming your own flow datasets (the required files are described below) are stored in an external folder `C:/my_data`, add `{"flows/my_data": "C:/my_data"}` to `pn_config`. You can then pass `my_data` as the `inflow_type` when using `ModelBuilder`.


In [3]:
# If we try to use the custom inflow type before adding it to the config, an error is raised

r"""
mb = pywrdrb.ModelBuilder(
    inflow_type='my_data', 
    diversion_type='nhmv10',
    start_date="1983-10-01",
    end_date="1985-12-31"
    )
"""
# Make a model (expect an error here if you run the block above and uncomment the line below)
# mb.make_model()


In [4]:
# For demonstration purposes, point my_data at the same directory as the
# nhmv10 flow type (in practice, use the actual path to your own folder)
pn_config = pywrdrb.get_pn_config()
pn_config["flows/my_data"] = pn_config["flows/nhmv10"]

pywrdrb.load_pn_config(pn_config)

# Now we can use the custom inflow type
mb = pywrdrb.ModelBuilder(
    inflow_type='my_data', 
    diversion_type='nhmv10',
    start_date="1983-10-01",
    end_date="1985-12-31"
    )

# Make a model (no error is expected now that the custom type is registered)
mb.make_model()
print("You successfully created a model with custom inflow type")

You successfully created a model with custom inflow type
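
A mistyped folder path is easy to miss until the model is built, so it can help to validate the folder before registering it. The sketch below is standard-library Python only. `add_flow_folder` is a hypothetical helper, not part of PywrDRB, and a temporary folder stands in for `C:/my_data`. With PywrDRB installed, the returned dictionary would be passed to `pywrdrb.load_pn_config` as in the cell above.

```python
from pathlib import Path
import tempfile


def add_flow_folder(pn_config, name, folder):
    """Return a copy of pn_config with 'flows/<name>' pointing at folder."""
    folder = Path(folder)
    if not folder.is_dir():
        raise FileNotFoundError(f"Flow folder does not exist: {folder}")
    updated = dict(pn_config)  # leave the caller's dictionary untouched
    updated[f"flows/{name}"] = str(folder)
    return updated


# Demonstration with a temporary folder standing in for C:/my_data.
with tempfile.TemporaryDirectory() as tmp:
    config = add_flow_folder({}, "my_data", tmp)
    registered = "flows/my_data" in config
    try:
        add_flow_folder({}, "missing", Path(tmp) / "does_not_exist")
        missing_rejected = False
    except FileNotFoundError:
        missing_rejected = True

print(registered, missing_rejected)
```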


## Datasets You Need in Your Custom "my_data" Folder



In [5]:
import pandas as pd
print("A flow type folder needs to contain the following files:\n")
pn = pywrdrb.get_pn_object()
files = pn.flows.nhmv10.list()
print(f"Files needed: {files}\n")
for file in files:
    df = pd.read_csv(pn.flows.nhmv10.get(file))
    print(f"File: {file}")
    print(df.iloc[:5, :5])  # print the first 5 rows and 5 columns
    print("\n")

A flow type folder needs to contain the following files:

Files needed: ['catchment_inflow_mgd.csv', 'gage_flow_mgd.csv', 'predicted_inflows_mgd.csv']

File: catchment_inflow_mgd.csv
     datetime  cannonsville    pepacton   neversink  wallenpaupack
0  1980-10-01    130.379373   89.325193   60.718368      33.438627
1  1980-10-02    320.876434  234.425929   58.710650     155.951484
2  1980-10-03    384.188487  261.300230   71.301161     196.969214
3  1980-10-04    345.220715  265.843778  115.661384     229.965044
4  1980-10-05    322.869093  257.040294   71.100223     208.087549


File: gage_flow_mgd.csv
     datetime  cannonsville    pepacton   neversink  wallenpaupack
0  1980-10-01    130.379373   89.325193   60.718368      33.438627
1  1980-10-02    320.876434  234.425929   58.710650     155.951484
2  1980-10-03    384.188487  261.300230   71.301161     196.969214
3  1980-10-04    345.220715  265.843778  115.661384     229.965044
4  1980-10-05    322.869093  257.040294   71.100223
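
The `_mgd` suffix in these file names suggests the values are in million gallons per day (MGD). If your own flows are in cubic feet per second (cfs), they would need converting before being written in the layout shown above (a `datetime` column followed by one column per node). The sketch below assumes that interpretation of the units. `cfs_rows_to_mgd_csv` and the sample values are hypothetical and only illustrate the conversion and layout.

```python
import csv
import io

CFS_TO_MGD = 0.646317  # 1 cubic foot per second in million gallons per day


def cfs_rows_to_mgd_csv(rows, columns):
    """Convert rows of (date, flow_1, flow_2, ...) in cfs to CSV text in MGD."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["datetime", *columns])
    for date, *flows in rows:
        writer.writerow([date, *(round(q * CFS_TO_MGD, 6) for q in flows)])
    return buffer.getvalue()


# Made-up sample values in cfs, for illustration only.
rows_cfs = [("1980-10-01", 100.0, 50.0), ("1980-10-02", 200.0, 0.0)]
csv_text = cfs_rows_to_mgd_csv(rows_cfs, ["cannonsville", "pepacton"])
print(csv_text)
```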

In [9]:
import pandas as pd
print("A diversion type folder needs to contain the following files:\n")
pn = pywrdrb.get_pn_object()
files = pn.diversions.list()
print(f"Files needed: {files}\n")
for file in files:
    df = pd.read_csv(pn.diversions.get(file))
    print(f"File: {file}")
    print(df.iloc[:5, :5])  # print the first 5 rows and 5 columns
    print("\n")

A diversion type folder needs to contain the following files:

Files needed: ['diversion_nj_extrapolated_mgd.csv', 'diversion_nyc_extrapolated_mgd.csv', 'predicted_diversions_mgd.csv']

File: diversion_nj_extrapolated_mgd.csv
     datetime  D_R_Canal
0  1945-01-01  86.850770
1  1945-01-02  82.101119
2  1945-01-03  61.066948
3  1945-01-04  48.853558
4  1945-01-05  56.995818


File: diversion_nyc_extrapolated_mgd.csv
     datetime  cannonsville  pepacton  neversink   aggregate
0  1945-01-01           NaN       NaN        NaN  473.057090
1  1945-01-02           NaN       NaN        NaN  473.057090
2  1945-01-03           NaN       NaN        NaN  473.057090
3  1945-01-04           NaN       NaN        NaN  759.854566
4  1945-01-05           NaN       NaN        NaN  790.545070


File: predicted_diversions_mgd.csv
     datetime  demand_nj_lag1_regression_disagg  \
0  1945-01-01                         85.923699   
1  1945-01-02                         81.227992   
2  1945-01-03        
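
Before registering a custom folder, it may be worth confirming that it contains every file listed above. The sketch below uses only the standard library. `missing_files` is a hypothetical helper, not part of PywrDRB, and the file names are copied from the listings in this section. A temporary folder with a single file stands in for an incomplete custom folder.

```python
from pathlib import Path
import tempfile

# File names copied from the listings printed in this section.
REQUIRED_FLOW_FILES = [
    "catchment_inflow_mgd.csv",
    "gage_flow_mgd.csv",
    "predicted_inflows_mgd.csv",
]
REQUIRED_DIVERSION_FILES = [
    "diversion_nj_extrapolated_mgd.csv",
    "diversion_nyc_extrapolated_mgd.csv",
    "predicted_diversions_mgd.csv",
]


def missing_files(folder, required):
    """Return the required file names that are not present in folder."""
    folder = Path(folder)
    return [name for name in required if not (folder / name).is_file()]


# Demonstration: a temporary folder that contains only one of the flow files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "gage_flow_mgd.csv").write_text("datetime,cannonsville\n")
    missing = missing_files(tmp, REQUIRED_FLOW_FILES)

print("Missing flow files:", missing)
```

An empty list means the folder has all the expected file names. It does not check the column layout inside each file.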

## More About the Global PathNavigator Object Used in PywrDRB

We can get the global `PathNavigator` object used in PywrDRB by running: `pn = pywrdrb.get_pn_object()`

This `pn` object contains all the directory and path information, allowing you to locate specific files used in PywrDRB within the file explorer.

More `pn` operations can be found in the [PathNavigator repository](https://github.com/philip928lin/PathNavigator). However, users should ONLY use `pn` to explore file and folder locations. Do not use it to modify anything unless you fully understand what you are doing.

In [10]:
pn = pywrdrb.get_pn_object()
print(f"The root data directory of pywrdrb: {pn.get()}")

The root data directory of pywrdrb: C:\Users\CL\Documents\GitHub\Pywr-DRB\src\pywrdrb\data


In [11]:
pn.scan(max_depth=2)  # scan the directory structure up to 2 levels deep
pn.tree()

data
├── catchment_withdrawals
│   └── sw_avg_wateruse_pywrdrb_catchments_mgd.csv
├── diversions
│   ├── diversion_nj_extrapolated_mgd.csv
│   ├── diversion_nyc_extrapolated_mgd.csv
│   └── predicted_diversions_mgd.csv
├── flows
│   ├── nhmv10
│   ├── nhmv10_withObsScaled
│   ├── nwmv21
│   ├── nwmv21_withObsScaled
│   ├── pub_nhmv10_BC_withObsScaled
│   ├── wrf1960s_calib_nlcd2016
│   ├── wrf2050s_calib_nlcd2016
│   ├── wrfaorc_calib_nlcd2016
│   ├── wrfaorc_withObsScaled
│   ├── _hydro_model_flow_output
│   └── _scaled_inflows
├── observations
│   ├── _raw
│   ├── catchment_inflow_mgd.csv
│   ├── gage_flow_mgd.csv
│   └── reservoir_storage_mg.csv
├── operational_constants
│   ├── constants.csv
│   ├── ffmp_reservoir_operation_daily_profiles.csv
│   ├── ffmp_reservoir_operation_monthly_profiles.csv
│   ├── ffmp_reservoir_operation_weekly_profiles.csv
│   └── istarf_conus.csv
└── spatial
    └── to_be_determined.txt

18 directories, 13 files
