## Farmer Agent Based Model
_Jim Yoon, Emily Rexer, & Travis Thurber_

The Farmer Agent Based Model (`FABMod`) integrates with `mosartwmpy` to model adaptive water demand based on crop economics.

TODO: etc etc

<br/>

__TODO:__ consider if something like `nbdev` is a good option for `FABMod`

<br/>

In [None]:
import xarray as xr
import pandas as pd
import numpy as np
import pickle

In [None]:
# These input files come from `mosartwmpy`

# Many-to-many relationships between GRanD Dam IDs and the grid cell IDs that can withdraw water from their reservoirs
dependency_database_path = 'input/reservoirs/dependency_database.parquet'

# Dam/reservoir parameters - in this case used to find which grid cell ID a particular GRanD Dam ID is located on
reservoir_parameter_path = 'input/reservoirs/grand_reservoir_parameters.nc'

# Output from `mosartwmpy` for the preceding year
# TODO this should be provided live from wmpy
simulation_output_path = 'output/istarf_validation/istarf_validation_1982*.nc'


In [None]:
# These input files belong to `FABMod`
# TODO they need to be consolidated into a single csv or parquet file

# TODO what really is this?
bias_correction_path = 'hist_avail_bias_correction_20201102.csv'

# This file has the relationships between NLDAS ID and lat/lon
# TODO perhaps we should just add the NLDAS IDs to the existing `mosartwmpy` domain file so that we don't need to use these
nldas_path = 'nldas.txt'

# This file has a list of NLDAS IDs that we care about
nldas_ids_path = 'nldas_ids.p'

# TODO what even is this?
historic_storage_path = 'hist_dependent_storage.csv'


<br/>

`mu` is the memory decay rate of each agent, between 0 and 1. Higher values mean faster decay: with `mu = 1` an agent remembers only the preceding year.

In [None]:
mu = 0.2
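
To make the role of `mu` concrete, here is a minimal sketch of the weight an agent places on information from each past year. It assumes `mu` enters only through the weighted update used later in this notebook, `(1 - mu) * previous + mu * new`; the helper name `memory_weights` is hypothetical.

```python
import numpy as np

def memory_weights(mu: float, years: int) -> np.ndarray:
    """Weight on the information from k years ago (k = 0 is the most recent year).

    Unrolling updated = (1 - mu) * previous + mu * new gives mu * (1 - mu) ** k.
    """
    k = np.arange(years)
    return mu * (1 - mu) ** k

# With mu = 0.2 the weights decay slowly; with mu = 1 only the latest year counts
weights_slow = memory_weights(0.2, 5)
weights_fast = memory_weights(1.0, 5)
print(weights_slow)
print(weights_fast)
```

Under this assumption, the weights for `mu = 0.2` shrink by 20% per year of age, while `mu = 1` puts all weight on the preceding year.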

<br/>

Begin by loading the inputs into memory:

In [None]:
# Grid cell to consumable reservoirs
dependency_database = pd.read_parquet(dependency_database_path)

# Placement of reservoirs on the grid
# TODO rather than use strings, need to read variable names from `mosartwmpy` config
reservoir_parameters = xr.open_dataset(reservoir_parameter_path)[['GRAND_ID', 'GRID_CELL_INDEX']].to_dataframe()

# Preceding year of `mosartwmpy` output, subset to the data `FABMod` needs and averaged over the whole year
# TODO rather than use strings, need to read variable names from `mosartwmpy` config
simulation_output = xr.open_mfdataset(simulation_output_path)[[
    'GINDEX', 'WRM_STORAGE', 'WRM_SUPPLY', 'RIVER_DISCHARGE_OVER_LAND_LIQ'
]].mean('time').to_dataframe().reset_index()

# Bias correction
# TODO what is it?
bias_correction = pd.read_csv(bias_correction_path)

# NLDAS IDs to lat/lon
# TODO see above, should just include this in `mosartwmpy` domain
nldas = pd.read_csv(nldas_path)

# Historic storage
# TODO what is it? Do we only need it during the warmup period?
historic_storage = pd.read_csv(historic_storage_path)

# NLDAS IDs we care about
# TODO should just include this in a unified input file that excludes things we don't care about
with open(nldas_ids_path, 'rb') as f:
    nldas_ids = pickle.load(f)
    

<br/>

Merge the reservoir grid cell locations into the dependency database.

In [None]:
# TODO remember to rely on the `mosartwmpy` config file for variable names
dependency_database = dependency_database.merge(reservoir_parameters, how='left', on='GRAND_ID').rename(columns={'GRID_CELL_INDEX': 'RESERVOIR_CELL_INDEX'})
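
A left merge like this silently duplicates rows if a `GRAND_ID` appears more than once in the reservoir parameters. As a hedged sketch on made-up data (the IDs and indices are illustrative only), the `validate` argument of `DataFrame.merge` can guard against that:

```python
import pandas as pd

# Made-up data for illustration; real IDs come from the input files
dependency_example = pd.DataFrame({
    'GRAND_ID': [1, 1, 2],
    'DEPENDENT_CELL_INDEX': [10, 11, 11],
})
reservoir_example = pd.DataFrame({
    'GRAND_ID': [1, 2],
    'GRID_CELL_INDEX': [100, 200],
})

# validate='many_to_one' raises pandas.errors.MergeError if GRAND_ID repeats on the right
merged_example = dependency_example.merge(
    reservoir_example, how='left', on='GRAND_ID', validate='many_to_one'
).rename(columns={'GRID_CELL_INDEX': 'RESERVOIR_CELL_INDEX'})

# A duplicated GRAND_ID on the right side is rejected instead of duplicating rows
duplicated = pd.concat([reservoir_example, reservoir_example.iloc[[0]]])
try:
    dependency_example.merge(duplicated, how='left', on='GRAND_ID', validate='many_to_one')
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
```

Whether the real reservoir parameters can contain repeated IDs is not known here, so treat the check as optional.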

<br/>

Find the total reservoir water volume each grid cell had access to:

In [None]:
# Merge the dependency database with the mean storage at reservoir locations, and aggregate per grid cell
# TODO remember to rely on the `mosartwmpy` config file for variable names
abm_data = dependency_database.merge(simulation_output[[
    'GINDEX', 'WRM_STORAGE'
]], how='left', left_on='RESERVOIR_CELL_INDEX', right_on='GINDEX').groupby('DEPENDENT_CELL_INDEX', as_index=False)[['WRM_STORAGE']].sum().rename(
    columns={'WRM_STORAGE': 'STORAGE_SUM'}
)
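
The aggregation above can be checked by hand on a toy example. The numbers below are made up: cell 10 draws from one reservoir and cell 11 draws from two, so its `STORAGE_SUM` should be the sum of both.

```python
import pandas as pd

# Made-up links between dependent grid cells and reservoir grid cells
links = pd.DataFrame({
    'DEPENDENT_CELL_INDEX': [10, 11, 11],
    'RESERVOIR_CELL_INDEX': [100, 100, 200],
})
# Made-up mean storage at the reservoir grid cells
storage = pd.DataFrame({'GINDEX': [100, 200], 'WRM_STORAGE': [5.0, 7.0]})

storage_sum_example = (
    links.merge(storage, how='left', left_on='RESERVOIR_CELL_INDEX', right_on='GINDEX')
    .groupby('DEPENDENT_CELL_INDEX', as_index=False)[['WRM_STORAGE']]
    .sum()
    .rename(columns={'WRM_STORAGE': 'STORAGE_SUM'})
)
print(storage_sum_example)
```

Note that a reservoir shared by several cells is counted in full for each of them, as in the cell above.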

<br/>

Find the previous year's mean supply and flow for each grid cell:

In [None]:
# Merge in the mean supply and mean channel outflow from the simulation results per grid cell
abm_data[[
    'WRM_SUPPLY', 'RIVER_DISCHARGE_OVER_LAND_LIQ'
]] =  abm_data[['DEPENDENT_CELL_INDEX']].merge(simulation_output[[
    'GINDEX', 'WRM_SUPPLY', 'RIVER_DISCHARGE_OVER_LAND_LIQ'
]], how='left', left_on='DEPENDENT_CELL_INDEX', right_on='GINDEX')[[
    'WRM_SUPPLY', 'RIVER_DISCHARGE_OVER_LAND_LIQ'
]]

<br/>

Determine the NLDAS IDs for each grid cell:

TODO: this should just be part of the `mosartwmpy` domain file and/or the `mosartwmpy` output instead

In [None]:
# Merge the lat/lon
abm_data[[
    'lat', 'lon'
]] = abm_data[['DEPENDENT_CELL_INDEX']].merge(simulation_output[[
    'GINDEX', 'lat', 'lon'
]], how='left', left_on='DEPENDENT_CELL_INDEX', right_on='GINDEX')[[
    'lat', 'lon'
]].round(4)

# Merge the NLDAS_ID
abm_data['NLDAS_ID'] = abm_data[['lat', 'lon']].merge(nldas[['CENTERY', 'CENTERX', 'NLDAS_ID']], left_on=['lat', 'lon'], right_on=['CENTERY', 'CENTERX'], how='left').NLDAS_ID
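
Because this merge joins on floating-point coordinates, it only matches when the rounded values are exactly equal on both sides. The sketch below uses made-up coordinates and made-up ID strings to show the rounding step and a simple count of unmatched cells; it is an illustration, not a description of the real `nldas.txt` contents.

```python
import pandas as pd

# Made-up grid cell coordinates with small floating-point noise
cells = pd.DataFrame({
    'lat': [35.06250001, 35.1875],
    'lon': [-120.0625, -119.93749999],
})
# Made-up lookup table in the same column layout as the NLDAS file
nldas_example = pd.DataFrame({
    'CENTERY': [35.0625, 35.1875],
    'CENTERX': [-120.0625, -119.9375],
    'NLDAS_ID': ['x1y1', 'x2y2'],
})

# Round before merging so both sides agree on the key values
matched = cells.round(4).merge(
    nldas_example, left_on=['lat', 'lon'], right_on=['CENTERY', 'CENTERX'], how='left'
)

# Cells with no match end up with a missing NLDAS_ID
unmatched_count = int(matched['NLDAS_ID'].isna().sum())
print(unmatched_count)
```

Counting unmatched cells in this way is one option for spotting coordinate mismatches before they are filtered out or zero-filled.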

<br/>

Subselect only the NLDAS IDs we care about:

In [None]:
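# Reset the index so the index-aligned column assignments below line up (TODO confirm nothing downstream needs the pre-filter index)
abm_data = abm_data.loc[abm_data.NLDAS_ID.isin(nldas_ids)].reset_index(drop=True)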
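
One thing to watch for after filtering: pandas aligns on the index when a merge result is assigned into a column, and a merge result always carries a fresh `0..n-1` index. A minimal sketch on made-up data shows how a filtered frame with gaps in its index can end up with wrong or missing values, and how a contiguous index avoids it:

```python
import pandas as pd

frame = pd.DataFrame({'NLDAS_ID': ['a', 'b', 'c', 'd']})
kept = frame.loc[frame.NLDAS_ID.isin(['b', 'd'])]  # index is now [1, 3]
lookup = pd.DataFrame({'NLDAS_ID': ['b', 'd'], 'value': [20.0, 40.0]})

# The merge result has index [0, 1], regardless of the index of `kept`
merged = kept[['NLDAS_ID']].merge(lookup, on='NLDAS_ID', how='left')

# Assignment aligns on index: row 1 ('b') receives the value for 'd', row 3 gets NaN
misaligned = kept.copy()
misaligned['value'] = merged['value']

# With a contiguous index the rows line up as intended
aligned = kept.reset_index(drop=True)
aligned['value'] = aligned[['NLDAS_ID']].merge(lookup, on='NLDAS_ID', how='left')['value']

print(misaligned)
print(aligned)
```

Replacing the column assignments with a plain `abm_data = abm_data.merge(...)` would be another way to sidestep the alignment question.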

<br/>

Merge in the other `FABMod` calibration/initial condition data:

TODO: figure out what is actually necessary during warmup vs after warmup (i.e. what data is live?)

In [None]:
# Merge historic storage
abm_data['STORAGE_SUM_OG'] = abm_data[['NLDAS_ID']].merge(historic_storage[['NLDAS_ID','STORAGE_SUM_OG']], on='NLDAS_ID', how='left')[['STORAGE_SUM_OG']]

In [None]:
# Merge bias correction, original supply in acreft, and original channel outflow
# TODO what are these
abm_data[[
    'sw_avail_bias_corr','WRM_SUPPLY_acreft_OG','RIVER_DISCHARGE_OVER_LAND_LIQ_OG'
]] = abm_data[['NLDAS_ID']].merge(bias_correction[[
    'NLDAS_ID','sw_avail_bias_corr','WRM_SUPPLY_acreft_OG','RIVER_DISCHARGE_OVER_LAND_LIQ_OG'
]], on='NLDAS_ID', how='left')[[
    'sw_avail_bias_corr','WRM_SUPPLY_acreft_OG','RIVER_DISCHARGE_OVER_LAND_LIQ_OG'
]]
# Seed the previous-year supply with the original supply; it is the memory term in the update below (TODO confirm intent)
abm_data['WRM_SUPPLY_acreft_prev'] = abm_data['WRM_SUPPLY_acreft_OG']

<br/>

Zero the missing data:

TODO: is this acceptable or does it mask underlying problems??

In [None]:
abm_data = abm_data.fillna(0)
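
One way to answer the question above is to count the missing values per column before zeroing them, so that an unexpectedly large count is visible rather than hidden. A small sketch on made-up data:

```python
import numpy as np
import pandas as pd

# Made-up frame with gaps, standing in for `abm_data`
example = pd.DataFrame({
    'STORAGE_SUM': [1.0, np.nan, 3.0],
    'WRM_SUPPLY': [np.nan, np.nan, 2.0],
})

# Record how many values are missing in each column before filling
missing_per_column = example.isna().sum()
print(missing_per_column)

# Then fill as before
filled = example.fillna(0)
```

What count would be acceptable for the real data is an open question for the model authors.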

<br/>

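Calculate a "demand factor" for each agent: the ratio of current to original (`_OG`) reservoir storage where the original storage is positive, otherwise the ratio of current to original river discharge where the original discharge is at least 0.1, otherwise 1: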

TODO: Solicit Jim for better describing what this means

In [None]:
abm_data['demand_factor'] = np.where(
    abm_data['STORAGE_SUM_OG'] > 0,
    abm_data['STORAGE_SUM'] / abm_data['STORAGE_SUM_OG'],
    np.where(
        abm_data['RIVER_DISCHARGE_OVER_LAND_LIQ_OG'] >= 0.1,
        abm_data['RIVER_DISCHARGE_OVER_LAND_LIQ'] / abm_data['RIVER_DISCHARGE_OVER_LAND_LIQ_OG'],
        1
    )
)
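
To see the three branches in action, here is the same expression applied to three made-up agents: one with original storage, one without storage but with enough original discharge, and one with neither.

```python
import numpy as np
import pandas as pd

# Made-up values chosen so that each row exercises a different branch
example = pd.DataFrame({
    'STORAGE_SUM': [50.0, 0.0, 0.0],
    'STORAGE_SUM_OG': [100.0, 0.0, 0.0],
    'RIVER_DISCHARGE_OVER_LAND_LIQ': [1.0, 3.0, 0.01],
    'RIVER_DISCHARGE_OVER_LAND_LIQ_OG': [1.0, 2.0, 0.05],
})

example['demand_factor'] = np.where(
    example['STORAGE_SUM_OG'] > 0,
    example['STORAGE_SUM'] / example['STORAGE_SUM_OG'],  # storage ratio
    np.where(
        example['RIVER_DISCHARGE_OVER_LAND_LIQ_OG'] >= 0.1,
        example['RIVER_DISCHARGE_OVER_LAND_LIQ'] / example['RIVER_DISCHARGE_OVER_LAND_LIQ_OG'],  # discharge ratio
        1,  # neither baseline is usable, so leave demand unchanged
    ),
)
print(example['demand_factor'].tolist())
```

The first agent gets the storage ratio, the second the discharge ratio, and the third falls back to 1.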

<br/>

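Update each agent's supply estimate as a `mu`-weighted blend of the previous estimate and the new information, then add the bias correction: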

TODO: Need Jim's help to understand what each variable means

In [None]:
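# New information: the original supply scaled by the demand factor (TODO confirm with Jim)
abm_data['WRM_SUPPLY_acreft_newinfo'] = abm_data['demand_factor'] * abm_data['WRM_SUPPLY_acreft_OG']

# Blend the previous estimate with the new information, weighted by the memory decay rate `mu`
abm_data['WRM_SUPPLY_acreft_updated'] = ((1 - mu) * abm_data['WRM_SUPPLY_acreft_prev']) + (mu * abm_data['WRM_SUPPLY_acreft_newinfo'])

# Carry the updated estimate forward as the next year's previous estimate
abm_data['WRM_SUPPLY_acreft_prev'] = abm_data['WRM_SUPPLY_acreft_updated']

# Add the bias correction to the updated estimate (TODO confirm units and meaning with Jim)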
abm_data['WRM_SUPPLY_acreft_bias_corr'] = abm_data['WRM_SUPPLY_acreft_updated'] + abm_data['sw_avail_bias_corr']
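
As a worked example of the update, the sketch below follows a single hypothetical agent over three years. The baseline supply of 100 and the constant demand factor of 0.5 are made-up numbers; only the update formula is taken from the cell above.

```python
mu = 0.2
original_supply = 100.0           # hypothetical baseline supply
demand_factors = [0.5, 0.5, 0.5]  # hypothetical demand factor in each year

previous = original_supply
history = []
for factor in demand_factors:
    new_info = factor * original_supply
    # Same blend as in the notebook: keep (1 - mu) of the old estimate, take mu of the new
    updated = (1 - mu) * previous + mu * new_info
    history.append(updated)
    previous = updated

print(history)
```

With these numbers the estimate moves from 100 toward 50 in steps of 20% of the remaining gap each year (roughly 90, 82, 75.6).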

<br/>

Finally, collect the bias-corrected water supply estimate for each agent into a dictionary keyed by the `abm_data` index:

TODO: Ask Jim what this really means

In [None]:
# TODO did we lose the conversion from m3 to acre-feet somewhere? (1 acre-foot is about 1233.48 m3)
water_constraints_by_farm = abm_data['WRM_SUPPLY_acreft_bias_corr'].to_dict()
abm_data['WRM_SUPPLY_acreft_bias_corr']
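
On the unit question: an acre-foot is the volume of one acre covered to a depth of one foot. If a conversion from cubic meters does turn out to be needed, a minimal sketch could look like the following. The helper name is hypothetical, the constant uses the international foot (0.3048 m), and whether the simulation output is a volume or a flow rate still needs to be confirmed before applying it.

```python
# One acre is 43,560 square feet, so one acre-foot is 43,560 cubic feet
M3_PER_ACRE_FOOT = 43560 * 0.3048 ** 3  # about 1233.48 cubic meters

def m3_to_acre_feet(volume_m3):
    """Convert a volume in cubic meters to acre-feet (hypothetical helper)."""
    return volume_m3 / M3_PER_ACRE_FOOT

print(M3_PER_ACRE_FOOT)
print(m3_to_acre_feet(1_000_000.0))
```

The function also works element-wise on a pandas Series or NumPy array, since it only uses division.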

<br/>

TODO: the rest of the stuff

In [None]:
# TODO