# Clearwater Modules Architecture

**Author:** Xavier Nogueira

# Installation and Setup

## Install

Carefully follow our **[Installation Instructions](README.md#getting-started)**, in particular:
- Creating a virtual environment for this repository (step 3)

## Import Python Dependencies

In [1]:
import clearwater_modules as cwm
import clearwater_modules.sorter as sorter
import numba
import random
import hvplot.xarray
import warnings
warnings.filterwarnings("ignore")

In [2]:
# Confirm that sub-modules are imported
dir(cwm)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 'base',
 'shared',
 'sorter',
 'tsm',
 'utils']

### If you get `ModuleNotFoundError`:

If you get this error:
```python
ModuleNotFoundError: No module named 'clearwater_modules'
```
Then:
1. Run the following terminal command with your local absolute path to this repo.
    - NOTE: Here we use Jupyter `!` magic command to run from the terminal via this notebook. 
2. Restart the kernel.
3. Rerun the import statements above.

See [4. Add your `ClearWater-modules-python` Path to Miniconda/Anaconda sites-packages](..ReadMe.md#4-add-your-clearwater-modules-python-path-to-minicondaanaconda-sites-packages).
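If you want to check importability from Python before changing your environment, the sketch below uses only the standard library. It is not the approach described in the README: it adds a directory to `sys.path` for the current session only. The helper name `ensure_importable` and the placeholder path are hypothetical.

```python
import importlib
import importlib.util
import sys
from pathlib import Path


def ensure_importable(package: str, repo_src: Path) -> bool:
    """Return True if `package` is importable, trying `repo_src` as a fallback."""
    if importlib.util.find_spec(package) is not None:
        return True  # already on the import path
    # Session-only fallback: put the directory that contains the package first.
    sys.path.insert(0, str(repo_src))
    importlib.invalidate_caches()
    return importlib.util.find_spec(package) is not None


# Placeholder path (an assumption): point it at the directory that
# contains the `clearwater_modules/` folder on your machine.
REPO_SRC = Path("/path/to/ClearWater-modules-python/src")
print(ensure_importable("clearwater_modules", REPO_SRC))
```

A `False` result means the package is still not visible, and the README steps linked above are the place to go next.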

# Writing/using a simple `Model` sub-class example

In this example we will be writing a `base.Model` sub-class that calculates the annual carbon sequestration in a forest for a given year timestep.

**Note:** Do not take the calculation too literally! I got it off ChatGPT in order to find a good, simple example for the code.

## Start by inheriting `base.Model` -> `CarbonSequestration(cwm.base.Model)`

In [53]:
class CarbonSequestration(cwm.base.Model):
    _variables: list[cwm.base.Variable] = []
    ...

## Next, use the `register_variable` decorator to add a few variables

To do this, make a sub-class of `base.Variable` with the decorator pointed at the model(s) you want to add the variables to. Note that the `models` argument of the decorator must be either a single sub-class of `base.Model`, or a list of them.

Next, just write instances of the new `base.Variable` sub-class. Each variable's `use` attribute must be set to one of `static`, `dynamic`, or `state`. Read below about what this means / how you should split up your variables.

Note that anything that needs to be calculated or input into a model should be encapsulated by a variable!

In [54]:
@cwm.base.register_variable(models=CarbonSequestration)
class Variable(cwm.base.Variable):
    ...
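To give a feel for how a registering decorator of this kind can work, here is a self-contained toy version. It is a sketch of the general pattern, not the implementation in `clearwater_modules`: `ToyModel`, `ToyCarbon`, `ToyVariable`, and `toy_register_variable` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class ToyModel:
    """Stand-in for a model base class; not `cwm.base.Model`."""
    _variables: list = []

    @classmethod
    def get_variable_names(cls) -> list[str]:
        return [v.name for v in cls._variables]


class ToyCarbon(ToyModel):
    # Each model gets its own list; otherwise all sub-classes would share one.
    _variables: list = []


def toy_register_variable(models):
    """Class decorator: every new instance is appended to each model's list."""
    targets = models if isinstance(models, (list, tuple)) else [models]

    def decorator(cls):
        plain_init = cls.__init__

        def registering_init(self, *args, **kwargs):
            plain_init(self, *args, **kwargs)
            for model in targets:
                model._variables.append(self)

        cls.__init__ = registering_init
        return cls

    return decorator


@toy_register_variable(models=ToyCarbon)
@dataclass
class ToyVariable:
    name: str
    use: str
    process: Optional[Callable] = None


ToyVariable(name="npp", use="static")
ToyVariable(name="carbon_content", use="static")
print(ToyCarbon.get_variable_names())  # ['npp', 'carbon_content']
print(ToyModel.get_variable_names())   # []
```

In this toy version, redefining `_variables` on the sub-class is what keeps one model's variables from leaking into another, which is one plausible reason for the `_variables` line in `CarbonSequestration` above.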

### Add our static variables

**Working Definition:** Static variables are any variables that will not change across the course of a simulation, regardless of how many time-steps are run.

Note that one can update static variables if they really want to by re-initializing the model class and providing new static variable inputs.

Here we will use the following static variables:
1. **Net Primary Productivity (NPP)**: The average annual NPP of the forest ecosystem (g/m²/year).
2. **Carbon Content**: The fraction of NPP that is composed of carbon (usually around 50%, but it can vary).

In [72]:
Variable(
    name='npp',
    long_name='Net Primary Productivity (NPP)',
    units='g/m^2/year',
    description='The annual average NPP of the forest ecosystem.',
    use='static',
)
Variable(
    name='carbon_content',
    long_name='Carbon Content ratio',
    units='ratio',
    description='The fraction of NPP that is composed of carbon (usually around 50%, but it can vary).',
    use='static',
)

# display the variables we have registered so far
display(CarbonSequestration.get_variable_names())

['npp', 'carbon_content']

### Add our dynamic variables

**Working Definition:** Dynamic variables are intermediate calculated variables that don't need to be passed to the next timestep. Every dynamic variable must be associated with a function via the `Variable.process` attribute (optional for other variable types). This "process" function is used to calculate it. **Importantly, the argument names of that function must match the names of the variables that will be passed in!**

In this simple example we will have only one dynamic variable:
1. **Annual carbon sequestration** (delta_C_annual):

   `delta_C_annual = npp * carbon_content * forest_area`
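Why the argument names matter can be illustrated with a small sketch that binds inputs by name using `inspect.signature`. This is only an assumption about the general mechanism, not the library's code; `call_by_name` and `delta_c_annual_py` are hypothetical, and the input values are arbitrary.

```python
import inspect


def delta_c_annual_py(npp: float, carbon_content: float, forest_area: float) -> float:
    return npp * carbon_content * forest_area


def call_by_name(process, values: dict):
    """Look up each argument of `process` by name in `values` and call it."""
    arg_names = list(inspect.signature(process).parameters)
    missing = [a for a in arg_names if a not in values]
    if missing:
        raise KeyError(f"no variable registered for argument(s): {missing}")
    return process(*(values[a] for a in arg_names))


# Arbitrary illustrative values; extra entries are simply ignored.
values = {"npp": 2.0, "carbon_content": 0.5, "forest_area": 10.0, "unused": 1.0}
print(call_by_name(delta_c_annual_py, values))  # 10.0
```

If an argument were misspelled (say `forest_aera`), the lookup would fail because no variable of that name exists.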

In [56]:
@numba.njit
def delta_C_annual(
    npp: float,
    carbon_content: float,
    forest_area: float,
) -> float:
    return npp * carbon_content * forest_area

In [57]:
Variable(
    name='delta_C_annual',
    long_name='Annual Carbon Delta',
    units='g',
    description='Annual change in forest carbon content',
    use='dynamic',
    process=delta_C_annual,
)

# display the variables we have registered so far
display(CarbonSequestration.get_variable_names())

['npp', 'carbon_content', 'delta_C_annual']

### Add our state variables

**Working Definition:** A state variable is the main input/output to each timestep. Notably, it can be updated between timesteps to allow interaction with other models. Our model needs to be initialized with state variable values, and no matter what settings are used in initialization, the state variable is stored in our main dataset (keep reading to see this).

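Our state variables are:
1. **Total carbon stock** (C_total), which is updated each year:

    `C_total = C_total + delta_C_annual`

2. **Forest area** (forest_area), whose process function takes only `forest_area` as input. Making it a state variable lets us overwrite it between timesteps later in this notebook.

State variables also require a process function.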

In [58]:
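@numba.njit
def C_total(
    C_total: float,
    delta_C_annual: float,
) -> float:
    return C_total + delta_C_annual

# Assumed pass-through process for `forest_area`, which is registered below
@numba.njit
def forest_area(forest_area: float) -> float:
    return forest_area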

In [59]:
Variable(
    name='C_total',
    long_name='Carbon total',
    units='g',
    description='Total forest carbon content',
    use='state',
    process=C_total,
)

Variable(
    name='forest_area',
    long_name='Area of the forest',
    units='m^2',
    description='Area of the forest, may change year by year with deforestation.',
    use='state',
    process=forest_area,
)

# display the variables we have registered so far
display(CarbonSequestration.get_variable_names())

['npp', 'carbon_content', 'delta_C_annual', 'C_total', 'forest_area']

In [60]:
# for state variables we can see them before initialization
display(CarbonSequestration.get_state_variables())

[Variable(name='C_total', long_name='Carbon total', units='g', description='Total forest carbon content', use='state', process=CPUDispatcher(<function C_total at 0x000002306625D760>)),
 Variable(name='forest_area', long_name='Area of the forest', units='m^2', description='Area of the forest, may change year by year with deforestation.', use='state', process=CPUDispatcher(<function forest_area at 0x000002306621F600>))]

## Now let's instantiate our new model

To instantiate a model we need to pass in a dictionary with our initial state variable values, any non-default changes to our static variables, and any other optional config settings.

In [74]:
initial_state_values = {'C_total': 1000, 'forest_area': 1000}
static_variable_values = {
    'carbon_content': 0.5,
    'npp': 10,
}

carbon_model = CarbonSequestration(
    initial_state_values=initial_state_values,  # mandatory
    static_variable_values=static_variable_values,  # mandatory/optional depending on defaults
    track_dynamic_variables=True,  # default is True
    hotstart_dataset=None,  # default is None
    time_dim='year',  # default is "timestep"
)

Initializing from dicts...
Model initialized from input dicts successfully!.


### All instantiated models have static, dynamic, and state variable properties

In [75]:
display(carbon_model.state_variables)

[Variable(name='C_total', long_name='Carbon total', units='g', description='Total forest carbon content', use='state', process=CPUDispatcher(<function C_total at 0x000002306625D760>)),
 Variable(name='forest_area', long_name='Area of the forest', units='m^2', description='Area of the forest, may change year by year with deforestation.', use='state', process=CPUDispatcher(<function forest_area at 0x000002306621F600>))]

In [76]:
display(carbon_model.static_variables)

[Variable(name='npp', long_name='Net Primary Productivity (NPP)', units='g/m^2/year', description='The annual average NPP of the forest ecosystem.', use='static', process=None),
 Variable(name='carbon_content', long_name='Carbon Content ratio', units='ratio', description='The fraction of NPP that is composed of carbon (usually around 50%, but it can vary).', use='static', process=None)]

In [77]:
display(carbon_model.dynamic_variables)

[Variable(name='delta_C_annual', long_name='Annual Carbon Delta', units='g', description='Annual change in forest carbon content', use='dynamic', process=CPUDispatcher(<function delta_C_annual at 0x0000023068D9FBA0>))]

### One can access their "computation order" which is calculated using a "dependency tree" approach in `sorter.py`

In [78]:
carbon_model.computation_order

[Variable(name='delta_C_annual', long_name='Annual Carbon Delta', units='g', description='Annual change in forest carbon content', use='dynamic', process=CPUDispatcher(<function delta_C_annual at 0x0000023068D9FBA0>)),
 Variable(name='C_total', long_name='Carbon total', units='g', description='Total forest carbon content', use='state', process=CPUDispatcher(<function C_total at 0x000002306625D760>)),
 Variable(name='forest_area', long_name='Area of the forest', units='m^2', description='Area of the forest, may change year by year with deforestation.', use='state', process=CPUDispatcher(<function forest_area at 0x000002306621F600>))]

In [79]:
print('Variable | Inputs\n------------------')
for i in carbon_model.computation_order:
    print(f'{i.name} | {sorter.get_process_args(i.process)}')

Variable | Inputs
------------------
delta_C_annual | ['npp', 'carbon_content', 'forest_area']
C_total | ['C_total', 'delta_C_annual']
forest_area | ['forest_area']
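A dependency-ordered computation list like the one above can be reproduced with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). This is a sketch of the idea only, not what `sorter.py` does internally. It assumes static and state values are already available at the start of a timestep, so only dynamic variables constrain the order.

```python
from graphlib import TopologicalSorter

# Process inputs as printed above.
process_args = {
    "delta_C_annual": ["npp", "carbon_content", "forest_area"],
    "C_total": ["C_total", "delta_C_annual"],
    "forest_area": ["forest_area"],
}

# Assumption: static and state values already exist when a timestep starts.
static_names = {"npp", "carbon_content"}
state_names = {"C_total", "forest_area"}
available = static_names | state_names

# Map each computed variable to the inputs that must be computed first.
graph = {
    name: {arg for arg in args if arg not in available}
    for name, args in process_args.items()
}

order = list(TopologicalSorter(graph).static_order())
print(order)
```

The only hard constraint in this small model is that `delta_C_annual` precedes `C_total`; the position of `forest_area` relative to the others is not determined by the dependencies.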


### Data is stored in `self.dataset`

In [80]:
carbon_model.dataset

## Running a timestep
All timesteps can be run independently. Optionally, one can update the state values with a float or an `xarray.DataArray`.

In [81]:
carbon_model.increment_timestep()
carbon_model.dataset
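As a hand check of what the first timestep should produce, the arithmetic can be repeated in plain Python with the initial values passed to `CarbonSequestration`. This assumes `C_total` is updated with the `delta_C_annual` computed in the same timestep, as the computation order suggests.

```python
# Initial values passed to the model at instantiation
npp = 10.0             # g/m^2/year
carbon_content = 0.5   # ratio
forest_area = 1000.0   # m^2
c_total = 1000.0       # g

# One timestep: dynamic variable first, then the state variable
delta_c_annual = npp * carbon_content * forest_area
c_total_next = c_total + delta_c_annual

print(delta_c_annual, c_total_next)  # 5000.0 6000.0
```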

## Running a loop of timesteps

Here we run 100 years of our model with the following hypothetical:
* For the first 50 years deforestation reduces forest area incrementally.
* 50 years in, a program begins that ends deforestation, and the forest grows back incrementally.

**This demonstrates how we can update state variables to interact with other models!**

In [82]:
%%time
for i in range(100):
    forest_area_change = random.uniform(0.0, 25)
    if i < 50:
        forest_area_change = -forest_area_change
    new_forest_area = (carbon_model.dataset.forest_area + forest_area_change).isel(year=-1)
    carbon_model.increment_timestep(update_state_values={'forest_area': new_forest_area})
carbon_model.dataset

CPU times: total: 1.02 s
Wall time: 1.02 s
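The scenario can also be traced without the library, which helps when sanity-checking the plots. The sketch below is a plain-Python stand-in, not the model itself. It assumes each year's carbon gain uses the forest area from the previous year, and it seeds the random generator so the run is repeatable.

```python
import random

rng = random.Random(42)  # seeded so the run is repeatable
npp, carbon_content = 10.0, 0.5
forest_area, c_total = 1000.0, 1000.0
area_history, carbon_history = [forest_area], [c_total]

for year in range(100):
    change = rng.uniform(0.0, 25.0)
    if year < 50:
        change = -change  # deforestation phase
    # Assumption: carbon is computed from the previous year's area,
    # then the externally supplied area replaces the state value.
    c_total += npp * carbon_content * forest_area
    forest_area += change
    area_history.append(forest_area)
    carbon_history.append(c_total)

# The area shrinks for 50 years and then recovers, so its minimum is at year 50.
print(min(area_history) == area_history[50])  # True
```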


In [83]:
carbon_model.dataset.hvplot(x='year', y='delta_C_annual', title='delta_C_annual')

In [84]:
carbon_model.dataset.hvplot(x='year', y='C_total', title='C_total')

# TSM `EnergyBudget` Example

Now that we understand how the code architecture works, we can explore a real example.

In [85]:
from clearwater_modules.tsm.model import EnergyBudget

In [23]:
# Confirm that sub-modules are imported
dir(cwm.tsm)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 'constants',
 'dynamic_variables',
 'model',
 'processes',
 'state_variables',
 'static_variables']

## Start by instantiating an `EnergyBudget`

Initial state variable values are always required. To see the names/info of a model's state variables, we can use `Model.get_state_variables()`.

In [24]:
EnergyBudget.get_state_variables()

[Variable(name='water_temp_c', long_name='Water temperature', units='degC', description='TSM state variable for water temperature', use='state', process=CPUDispatcher(<function t_water_c at 0x0000023060D6C5E0>)),
 Variable(name='surface_area', long_name='Surface area', units='m^2', description='Surface area', use='state', process=<function mock_equation at 0x000002305D98E480>),
 Variable(name='volume', long_name='Volume', units='m^3', description='Volume', use='state', process=<function mock_equation at 0x000002305D98E480>)]

In [25]:
initial_state_values = {
    'water_temp_c': 1.0,
    'volume': 1.0,
    'surface_area': 1.0,
}
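Since every state variable needs an initial value, a quick pre-flight check can catch a missing key before instantiation. The helper below is hypothetical and works on plain names; with the real model the names could be collected with `[v.name for v in EnergyBudget.get_state_variables()]`.

```python
def missing_state_values(required: list[str], provided: dict) -> list[str]:
    """Return the required state variable names that have no initial value."""
    return [name for name in required if name not in provided]


# Names taken from the `get_state_variables()` listing above
required = ["water_temp_c", "surface_area", "volume"]
initial_state_values = {"water_temp_c": 1.0, "volume": 1.0, "surface_area": 1.0}

print(missing_state_values(required, initial_state_values))  # []
```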

In [26]:
my_model = EnergyBudget(
    initial_state_values,
    time_dim='my_time_step',
)
my_model

Initializing from dicts...
Model initialized from input dicts successfully!.


<clearwater_modules.tsm.model.EnergyBudget at 0x2306863b310>

In [27]:
[i for i in dir(my_model) if i[0] != '_']

['all_variables',
 'computation_order',
 'dataset',
 'dynamic_variables',
 'dynamic_variables_names',
 'get_state_variables',
 'get_variable',
 'get_variable_names',
 'hotstart_dataset',
 'increment_timestep',
 'initial_state_values',
 'met_parameters',
 'register_variable',
 'state_variables',
 'state_variables_names',
 'static_variable_values',
 'static_variables',
 'static_variables_names',
 'temp_parameters',
 'time_dim',
 'track_dynamic_variables',
 'unregister_variables']

## TSM can be initialized with alternative met/temp parameters
**This is an example of a model-specific `__init__`**. For now we are using the defaults.

In [28]:
my_model.met_parameters

{'air_temp_c': 20,
 'q_solar': 400,
 'sed_temp_c': 5.0,
 'eair_mb': 1.0,
 'pressure_mb': 1013.0,
 'cloudiness': 0.1,
 'wind_speed': 3.0,
 'wind_a': 0.3,
 'wind_b': 1.5,
 'wind_c': 1.0,
 'wind_kh_kw': 1.0}

In [29]:
my_model.temp_parameters

{'stefan_boltzmann': 5.67e-08,
 'cp_air': 1005,
 'emissivity_water': 0.97,
 'gravity': -9.806,
 'a0': 6984.505294,
 'a1': -188.903931,
 'a2': 2.133357675,
 'a3': -0.01288580973,
 'a4': 4.393587233e-05,
 'a5': -8.023923082e-08,
 'a6': 6.136820929e-11,
 'pb': 1600.0,
 'cps': 1673.0,
 'h2': 0.1,
 'alphas': 0.0432,
 'richardson_option': True}
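A common way to support "defaults plus overrides" is to merge a user-supplied dictionary over a default one. The sketch below shows that pattern in isolation, using a subset of the default values printed above. How `EnergyBudget.__init__` names or handles such arguments is not shown in this notebook, so `merge_parameters` is purely illustrative.

```python
from typing import Optional

# Subset of the default meteorological parameters shown above
DEFAULT_MET = {"air_temp_c": 20, "q_solar": 400, "wind_speed": 3.0}


def merge_parameters(defaults: dict, overrides: Optional[dict] = None) -> dict:
    """Return a copy of `defaults` updated with `overrides`, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    return {**defaults, **overrides}


met = merge_parameters(DEFAULT_MET, {"air_temp_c": 25})
print(met)  # {'air_temp_c': 25, 'q_solar': 400, 'wind_speed': 3.0}
```

Rejecting unknown keys is a design choice that turns a misspelled parameter name into an immediate error rather than a silently ignored setting.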

In [30]:
my_model.time_dim

'my_time_step'

## All models have static, dynamic, and state variables

In [31]:
display(my_model.static_variables)

[Variable(name='stefan_boltzmann', long_name='Stefan-Boltzmann Constant', units='W m-2 K-4', description='The Stefan-Boltzmann constant.', use='static', process=None),
 Variable(name='cp_air', long_name='Specific Heat Capacity of Air', units='J kg-1 K-1', description='The specific heat capacity of air.', use='static', process=None),
 Variable(name='emissivity_water', long_name='Emissivity of Water', units='1', description='The emissivity of water.', use='static', process=None),
 Variable(name='gravity', long_name='Gravity', units='m s-2', description='The acceleration due to gravity.', use='static', process=None),
 Variable(name='a0', long_name='Albedo of Water', units='unitless', description='The albedo of water.', use='static', process=None),
 Variable(name='a1', long_name='Albedo of Water', units='unitless', description='The albedo of water.', use='static', process=None),
 Variable(name='a2', long_name='Albedo of Water', units='unitless', description='The albedo of water.', use='sta

In [32]:
display(my_model.dynamic_variables)

[Variable(name='air_temp_k', long_name='Air temperature', units='K', description='Air temperature', use='dynamic', process=CPUDispatcher(<function air_temp_k at 0x0000023060D2B9C0>)),
 Variable(name='water_temp_k', long_name='Water temperature', units='K', description='Water temperature', use='dynamic', process=CPUDispatcher(<function water_temp_k at 0x0000023060D56AC0>)),
 Variable(name='mixing_ratio_air', long_name='Mixing ratio of air', units='unitless', description='Mixing ratio of air', use='dynamic', process=CPUDispatcher(<function mixing_ratio_air at 0x0000023060D56C00>)),
 Variable(name='density_air', long_name='Density of air', units='kg/m^3', description='Density of air', use='dynamic', process=CPUDispatcher(<function density_air at 0x0000023060D56E80>)),
 Variable(name='density_water', long_name='Density of water', units='kg/m^3', description='Density of water', use='dynamic', process=CPUDispatcher(<function mf_density_water at 0x0000023060D556C0>)),
 Variable(name='esat_mb'

## One can access their "computation order" which is calculated using a "dependency tree" approach in `sorter.py`

In [33]:
my_model.computation_order

[Variable(name='air_temp_k', long_name='Air temperature', units='K', description='Air temperature', use='dynamic', process=CPUDispatcher(<function air_temp_k at 0x0000023060D2B9C0>)),
 Variable(name='water_temp_k', long_name='Water temperature', units='K', description='Water temperature', use='dynamic', process=CPUDispatcher(<function water_temp_k at 0x0000023060D56AC0>)),
 Variable(name='mixing_ratio_air', long_name='Mixing ratio of air', units='unitless', description='Mixing ratio of air', use='dynamic', process=CPUDispatcher(<function mixing_ratio_air at 0x0000023060D56C00>)),
 Variable(name='density_air', long_name='Density of air', units='kg/m^3', description='Density of air', use='dynamic', process=CPUDispatcher(<function density_air at 0x0000023060D56E80>)),
 Variable(name='density_water', long_name='Density of water', units='kg/m^3', description='Density of water', use='dynamic', process=CPUDispatcher(<function mf_density_water at 0x0000023060D556C0>)),
 Variable(name='esat_mb'

In [34]:
for i in my_model.computation_order:
    print(f'{i.name} | {sorter.get_process_args(i.process)}')

air_temp_k | ['air_temp_c']
water_temp_k | ['water_temp_c']
mixing_ratio_air | ['eair_mb', 'pressure_mb']
density_air | ['pressure_mb', 'air_temp_k', 'mixing_ratio_air']
density_water | ['water_temp_c']
esat_mb | ['water_temp_k', 'a0', 'a1', 'a2', 'a3', 'a4', 'a5', 'a6']
density_air_sat | ['water_temp_k', 'esat_mb', 'pressure_mb']
ri_number | ['gravity', 'density_air', 'density_air_sat', 'wind_speed']
ri_function | ['ri_number']
lv | ['water_temp_k']
cp_water | ['water_temp_c']
emissivity_air | ['air_temp_k']
wind_function | ['wind_a', 'wind_b', 'wind_c', 'wind_speed']
q_latent | ['ri_function', 'pressure_mb', 'density_water', 'lv', 'wind_function', 'esat_mb', 'eair_mb']
q_sensible | ['wind_kh_kw', 'ri_function', 'cp_air', 'density_water', 'wind_function', 'air_temp_k', 'water_temp_k']
q_sediment | ['pb', 'cps', 'alphas', 'h2', 'sed_temp_c', 'water_temp_c']
dTdt_sediment_c | ['alphas', 'h2', 'water_temp_c', 'sed_temp_c']
q_longwave_down | ['air_temp_k', 'emissivity_air', 'cloudiness', 

## Run 5 timesteps

In [35]:
TIME_STEPS = 5

In [36]:
@numba.jit(forceobj=True)
def run_n_timesteps(time_steps: int, model: EnergyBudget):
    for i in range(time_steps):
        model.increment_timestep()

In [37]:
%%time
run_n_timesteps(TIME_STEPS, my_model)

CPU times: total: 4.72 s
Wall time: 5.11 s


In [38]:
my_model.dataset