# Running the SUMMA setups
To begin, we have to point the paths in the configuration files that SUMMA will use at our local folders.
We do this by running the installation script as a shell command, which in a notebook means starting the line with the `!` operator.
Then we can import some basic libraries along with `pysumma`.

<br>

### You will need to edit these paths to be your folders

In [None]:
top = '/glade/work/ashleyvb'
folder = top+'/CAMELs'
folders = folder+'/summa_camels'
! cd {folders} && ./install_local_setup.sh
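The `!` line above hands the command to the shell. Outside a notebook, the same step can be done with the standard library. The following is a minimal sketch, not part of the original workflow: the helper name `run_in_dir` is hypothetical, and it is demonstrated with a harmless `echo` rather than the real install script.

```python
import subprocess
import tempfile

def run_in_dir(command, workdir):
    """Run a shell command inside workdir and return its stdout as text."""
    result = subprocess.run(
        command,
        shell=True,           # let the shell parse the command, as `!` does
        cwd=workdir,          # equivalent of `cd workdir` before the command
        capture_output=True,
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout

# Demonstrate with a harmless command in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    output = run_in_dir("echo setup complete", tmp)
    print(output.strip())
```

To run the install script this way, one would call `run_in_dir("./install_local_setup.sh", folders)`. Because of `check=True`, a failing script raises an exception instead of failing silently.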

First we check that we loaded the correct environment.

In [None]:
conda list pysumma

<br>
Then we load the imports.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pysumma as ps
import xarray as xr
import pandas as pd
from dask_jobqueue import PBSCluster
from dask.distributed import Client

In [None]:
NCORES=48
cluster = PBSCluster(n_workers = NCORES,
                     cores=NCORES,
                     processes=NCORES, 
                     memory="24GB",
                     project='UWAS0091',
                     queue='regular',
                     walltime='06:00:00')
client = Client(cluster)

### Check that you have workers. Do not run the rest of the cells until the workers show up.

In [None]:
print(client)
!qstat

<br>

# Interacting with SUMMA via the `Distributed` object

We are running a `Distributed` object, which has multiple `Simulation` objects inside, each corresponding to some spatial chunk. 
Before every run, we need to clear out the distributed folders with `rm -r /glade/work/ashleyvb/CAMELs/summa_camels/.pysumma/` so that file permissions do not get corrupted in the loops.
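The cleanup can also be done from Python instead of the shell. This is a minimal sketch under the assumption that the scratch folder is always named `.pysumma` and sits directly under the setup folder; the helper name `clear_pysumma_scratch` is hypothetical, and the demonstration uses a temporary directory rather than the real setup.

```python
import shutil
import tempfile
from pathlib import Path

def clear_pysumma_scratch(setup_dir):
    """Delete the .pysumma scratch folder under setup_dir; report whether it existed."""
    scratch = Path(setup_dir) / ".pysumma"
    existed = scratch.exists()
    # ignore_errors=True makes this a no-op when the folder is already gone
    shutil.rmtree(scratch, ignore_errors=True)
    return existed

# Demonstrate on a throwaway directory that mimics the layout
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / ".pysumma" / "chunk_0").mkdir(parents=True)
    removed = clear_pysumma_scratch(tmp)
    still_there = (Path(tmp) / ".pysumma").exists()
    print(removed, still_there)
```

Calling `clear_pysumma_scratch(folders)` before each new `Distributed` object would play the same role as the `rm -r` command.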

In [None]:
# for fewer basins, do not exceed number of basins in chunking
CHUNK = 8 #for all 671 basins
# get number of HRUs
attrib = xr.open_dataset(folders+'/settings.v1/attributes.nc')
the_hru = np.array(attrib['hruId'])
if len(the_hru) < CHUNK: CHUNK = len(the_hru)
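The capping logic above can be written as a small helper. This sketch also estimates how many chunks result, assuming the basins are split into consecutive groups of `chunk_size` with a shorter final group; that splitting rule is an assumption about how `Distributed` partitions the domain, and both function names are hypothetical.

```python
import math

def pick_chunk_size(n_hru, default=8):
    """Use the default chunk size, but never more than the number of HRUs."""
    return min(default, n_hru)

def count_chunks(n_hru, chunk_size):
    """Number of consecutive groups of chunk_size needed to cover n_hru (assumed rule)."""
    return math.ceil(n_hru / chunk_size)

for n in (671, 5):
    size = pick_chunk_size(n)
    print(n, size, count_chunks(n, size))
```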

<br>

To set up a `Distributed` object you must supply several pieces of information. 
First, supply the SUMMA executable; this could be either the compiled executable on your local machine, or a docker image. 
The second piece of information is the path to the file manager, which we just created through the install script. 

In [None]:
executable = top+'/summa/bin/summa.exe'

In [None]:
file_manager = folders+'/file_manager_truth.txt'
camels = ps.Distributed(executable, file_manager, num_workers=NCORES, chunk_size=CHUNK, client=client)
print(camels.manager) #possible days 1980-01-01 to 2018-12-31, we are running 1986-10-01 01:00 to 1991-10-02 0:00

<br>

# pySUMMA with all Forcing Files

We run pySUMMA for each set of forcing files on each basin. You can check how long a job has been running with the command `qstat -u <username>` in a terminal. Each run takes about 9 minutes for all 671 basins (less for a subset). We start with the original NLDAS files, the "truth run".

In [None]:
%%time
camels.run('local')
#all_status = [(n, s.status) for n, s in camels.simulations.items()] #if want to look at status if has errors
all_ds = [s.output.load() for n, s in camels.simulations.items()] #load it into memory so faster

<br>
We could write the output as several files instead of merging. However, if we want to merge, we can do the following.
First, automatically detect which variables have `hru` versus `gru` dimensions (depending on what we request for output, we may not have any `gru` variables):

In [None]:
hru_vars = [] # variables that have hru dimension
gru_vars = [] # variables that have gru dimension
for ds in all_ds:
    for name, var in ds.variables.items():
        # every chunk holds the same variables, so record each name only once
        if 'hru' in var.dims:
            if name not in hru_vars:
                hru_vars.append(name)
        elif 'gru' in var.dims:
            if name not in gru_vars:
                gru_vars.append(name)

<br>
Filter the variables for the merge. This takes seconds because we are writing a limited output, but it will take longer if you add more variables to the output.

In [None]:
%%time
hru_ds = [ds[hru_vars] for ds in all_ds]
gru_ds = [ds[gru_vars] for ds in all_ds]
hru_merged = xr.concat(hru_ds, dim='hru')
gru_merged = xr.concat(gru_ds, dim='gru')
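The detection and merge steps above can be tried on synthetic data before running SUMMA. This is a minimal sketch: the two small datasets stand in for the per-chunk outputs, the variable names are only illustrative, and `make_chunk` is a hypothetical helper.

```python
import numpy as np
import xarray as xr

def make_chunk(hru_ids, gru_ids, ntime=3):
    """Build a tiny synthetic dataset standing in for one chunk's output."""
    return xr.Dataset(
        {
            # illustrative variable names, one per spatial dimension
            "scalarSWE": (("time", "hru"), np.zeros((ntime, len(hru_ids)))),
            "averageRoutedRunoff": (("time", "gru"), np.ones((ntime, len(gru_ids)))),
        },
        coords={"time": np.arange(ntime), "hru": hru_ids, "gru": gru_ids},
    )

chunks = [make_chunk([1, 2], [1, 2]), make_chunk([3], [3])]

# Sort data variables by the spatial dimension they carry
hru_vars, gru_vars = set(), set()
for ds in chunks:
    for name, var in ds.data_vars.items():
        if "hru" in var.dims:
            hru_vars.add(name)
        elif "gru" in var.dims:
            gru_vars.add(name)

# Concatenate each group along its own spatial dimension
hru_merged = xr.concat([ds[sorted(hru_vars)] for ds in chunks], dim="hru")
gru_merged = xr.concat([ds[sorted(gru_vars)] for ds in chunks], dim="gru")
print(hru_merged.sizes["hru"], gru_merged.sizes["gru"])
```

Using sets keeps each variable name once no matter how many chunks are scanned, which is the only difference from the list-based cell.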

In [None]:
print(hru_merged)

In [None]:
%%time
hru_merged.to_netcdf(folders+'/output/merged_day/NLDAStruth_hru.nc')
gru_merged.to_netcdf(folders+'/output/merged_day/NLDAStruth_gru.nc')
del camels
del all_ds 
del hru_merged
del gru_merged

<br>
Here are the other runs, now as loops. The process is the same, but for clarity we divide it into two loops, one for the constant forcings and one for the MetSim forcings. Each loop takes about an hour using all 671 basins. We delete the large objects after every run to reduce memory needs.

In [None]:
%%time
# Constant
constant_vars= ['airpres','airtemp','LWRadAtm','pptrate','spechum','SWRadAtm','windspd','all']
for v in constant_vars:
    ! rm -rf {folders}/.pysumma
    file_manager = folders+'/file_manager_constant_' + v +'.txt'
    camels = ps.Distributed(executable, file_manager, num_workers=NCORES, chunk_size=CHUNK, client=client)   
    camels.run('local')
    #all_status = [(n, s.status) for n, s in camels.simulations.items()] #if want to look at status if has errors
    all_ds = [s.output.load() for n, s in camels.simulations.items()] #load it into memory so faster    
    hru_vars = [] # variables that have hru dimension
    gru_vars = [] # variables that have gru dimension
    for ds in all_ds:
        for name, var in ds.variables.items():
            if 'hru' in var.dims:
                hru_vars.append(name)
            elif 'gru' in var.dims:
                gru_vars.append(name)
    hru_ds = [ds[hru_vars] for ds in all_ds]
    gru_ds = [ds[gru_vars] for ds in all_ds]
    hru_merged = xr.concat(hru_ds, dim='hru')
    gru_merged = xr.concat(gru_ds, dim='gru')
    hru_merged.to_netcdf(folders+'/output/merged_day/NLDASconstant_' + v +'_hru.nc')
    gru_merged.to_netcdf(folders+'/output/merged_day/NLDASconstant_' + v +'_gru.nc')
    del camels
    del all_ds 
    del hru_merged
    del gru_merged
    print(v)

In [None]:
%%time
# Metsim
metsim_vars= ['airpres','airtemp','LWRadAtm','pptrate','spechum','SWRadAtm','windspd','all']
for v in metsim_vars:
    ! rm -rf {folders}/.pysumma
    file_manager = folders+'/file_manager_metsim_' + v +'.txt'
    camels = ps.Distributed(executable, file_manager, num_workers=NCORES, chunk_size=CHUNK, client=client)   
    camels.run('local')
    #all_status = [(n, s.status) for n, s in camels.simulations.items()] #if want to look at status if has errors
    all_ds = [s.output.load() for n, s in camels.simulations.items()] #load it into memory so faster    
    hru_vars = [] # variables that have hru dimension
    gru_vars = [] # variables that have gru dimension
    for ds in all_ds:
        for name, var in ds.variables.items():
            if 'hru' in var.dims:
                hru_vars.append(name)
            elif 'gru' in var.dims:
                gru_vars.append(name)
    hru_ds = [ds[hru_vars] for ds in all_ds]
    gru_ds = [ds[gru_vars] for ds in all_ds]
    hru_merged = xr.concat(hru_ds, dim='hru')
    gru_merged = xr.concat(gru_ds, dim='gru')
    hru_merged.to_netcdf(folders+'/output/merged_day/NLDASmetsim_' + v +'_hru.nc')
    gru_merged.to_netcdf(folders+'/output/merged_day/NLDASmetsim_' + v +'_gru.nc')
    del camels
    del all_ds 
    del hru_merged
    del gru_merged
    print(v)
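The two loops above differ only in the file manager name and the output prefix. As a minimal sketch of how that naming scheme could be captured in one place, the hypothetical helper `run_paths` below builds the paths for one run; the setup directory `/path/to/summa_camels` is a placeholder, not a real location.

```python
from pathlib import PurePosixPath

FORCING_VARS = ['airpres', 'airtemp', 'LWRadAtm', 'pptrate',
                'spechum', 'SWRadAtm', 'windspd', 'all']

def run_paths(setup_dir, kind, var):
    """Return the file manager and merged output paths for one forcing experiment."""
    setup = PurePosixPath(setup_dir)
    out = setup / "output" / "merged_day"
    return {
        "file_manager": str(setup / f"file_manager_{kind}_{var}.txt"),
        "hru_out": str(out / f"NLDAS{kind}_{var}_hru.nc"),
        "gru_out": str(out / f"NLDAS{kind}_{var}_gru.nc"),
    }

# One entry per (forcing kind, variable) pair, in the order the loops run them
plan = [run_paths("/path/to/summa_camels", kind, v)
        for kind in ("constant", "metsim")
        for v in FORCING_VARS]
print(len(plan))
print(plan[0]["file_manager"])
```

With such a plan, a single loop could iterate over both forcing kinds, at the cost of one longer-running cell.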

<br>

# Manipulating the Configuration of the pySUMMA Objects

Currently, none of the model decisions (or parameters) can be altered in a `Distributed` object. 
However, if we switch to `Simulation` objects and use the `Ensemble` class, we can run suites of different model configurations with relative ease. 
This code would take a long time to run on all 671 basins, so we throw an error if you try to run it with more than 10 basins (although in principle you could run it with as many as you want). 
This is a good stopping point for this notebook if you plan on running the entire dataset; you can then move on to the notebook `camels_analyze_entire_output.ipynb`.

In [None]:
if len(the_hru) >10: raise SystemExit("Stop right there!")

<br>
    
Since we must have a small subset of basins, we will proceed with the ensemble calculations.
Afterwards, you can run the notebook `camels_analyze_subset_output.ipynb`.
Before running the ensemble, though, switch to a `Simulation` object and rewrite the original simulation's configuration.

In [None]:
file_manager = folders+'/file_manager_truth.txt'
s = ps.Simulation(executable, file_manager)
s._write_configuration()

<br>

The configurations follow the exploration of [this paper.](https://doi.org/10.1002/2015WR017200)

Clark, M.P., Nijssen, B., Lundquist, J.D., Kavetski, D., Rupp, D.E., Woods, R.A., Freer, J.E., Gutmann, E.D., Wood, A.W., Gochis, D.J. and Rasmussen, R.M., 2015. A unified approach for process‐based hydrologic modeling: 2. Model implementation and case studies. Water Resources Research, 51(4), pp.2515-2542.

Of the model configurations discussed in this paper, the decisions that made the most difference are:

 - `groundwatr` choice of groundwater parameterization as:
   - `qTopmodl` the TOPMODEL parameterization (note: must set `hc_profile = pow_prof` and `bcLowrSoiH = zeroFlux`)
   - `bigBuckt` a big bucket (lumped aquifer model) in between the other two choices for complexity
   - `noXplict` no explicit groundwater parameterization
 - `snowIncept` choice of parameterization for snow interception as:
   - `stickySnow` maximum interception capacity is an increasing function of temperature
   - `lightSnow` maximum interception capacity is an inverse function of new snow density
 - `windPrfile` choice of wind profile as:
   - `exponential` an exponential wind profile extends to the surface
   - `logBelowCanopy` a logarithmic profile below the vegetation canopy

Choices `bigBuckt`, `lightSnow`, and `logBelowCanopy` are the defaults that we have already run. The paper showed the choice of `groundwatr` affecting the timing of runoff and the magnitude of evapotranspiration, `snowIncept` affecting the magnitude of canopy interception of snow, and `windPrfile` affecting the timing and magnitude of SWE and of latent and sensible heat. 

In [None]:
# qTopmodl vs bigBuckt groundwater only makes a difference in the baseflow variable; snowIncept and windPrfile make a difference in a few places

#alld = {'stomResist':np.array(['BallBerry','Jarvis']),'snowLayers':np.array(['jrdn1991','CLM_2010'])}
# Andrew recommended, layers doesn't seem to do anything with defaults

alld = {'groundwatr':np.array(['qTopmodl','bigBuckt']),'stomResist':np.array(['BallBerry','Jarvis']),'snowIncept':np.array(['stickySnow','lightSnow']),'windPrfile':np.array(['exponential','logBelowCanopy'])}
# according to paper

config = ps.ensemble.decision_product(alld)
param_ens = ps.Ensemble(executable, config, file_manager, num_workers=2, client=client) #8 eventually for 8 sims

<br>

Now we do what we did in the previous simulations, except that here we merge along a new dimension, the configuration decision identifier, instead of along `hru` and `gru`. The ensemble uses `++` as a delimiter to create a unique identifier for each simulation in the ensemble.

In [None]:
%%time
param_ens.run('local')
all_status = [(n, s.status) for n, s in param_ens.simulations.items()] #if want to look at status if has errors
all_ds = [s.output.load() for n, s in param_ens.simulations.items()] #load it into memory so faster  
all_name = [n for n, s in param_ens.simulations.items()]
all_merged = xr.concat(all_ds, pd.Index(all_name, name="decision"))
#all_merged.to_netcdf(folders+'/output/merged_day/NLDAStruth_configs.nc')
#del param_ens
#del all_ds 
#del all_merged
print(all_name)
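The identifiers printed above combine one option per decision. The sketch below imitates that enumeration with the standard library only, to show how many configurations the dictionary `alld` implies. It is not pysumma's implementation: `decision_combinations` is a hypothetical function, and the exact identifier format pysumma produces may differ.

```python
from itertools import product

decisions = {
    'groundwatr': ['qTopmodl', 'bigBuckt'],
    'stomResist': ['BallBerry', 'Jarvis'],
    'snowIncept': ['stickySnow', 'lightSnow'],
    'windPrfile': ['exponential', 'logBelowCanopy'],
}

def decision_combinations(options, delimiter='++'):
    """Map a delimiter-joined identifier to each combination of decision options."""
    names = list(options)
    combos = {}
    # Cartesian product: one option per decision, every possible pairing
    for values in product(*(options[n] for n in names)):
        combos[delimiter.join(values)] = dict(zip(names, values))
    return combos

combos = decision_combinations(decisions)
first = next(iter(combos))
print(len(combos))
print(first)
```

Four decisions with two options each give 2^4 = 16 configurations, which is worth keeping in mind when choosing `num_workers` for the ensemble.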

In [None]:
### Plot cumulative
fig, axes = plt.subplots(nrows=14, ncols=1, figsize=(20, 40))
axes = axes.flatten()
axes[0].set_title('Cumulative')

variables = list(all_merged.variables.keys())[7:22]

#start =  24*5*30 #summer
start =  24*10*30 #winter
stop = start + 1*100*24 

truth_plt = all_merged.isel(hru=0, time=slice(start+90*24, stop+90*24)) #.cumsum(dim='time')
#truth_plt = all_merged.isel(hru=0).cumsum(dim='time')

for idx, var in enumerate(variables[0:14]):
    for i, dec in enumerate(all_name):    
        truth_plt[var].isel(decision=i).plot(ax=axes[idx],label=dec)
    axes[idx].set_title('') 
    axes[idx].set_ylabel(var)
    axes[idx].set_xlabel('Date')
plt.tight_layout()
plt.legend()
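The index arithmetic in the plotting cell is easier to check when translated into dates. The sketch below assumes hourly output whose first timestep is 1986-10-01 01:00 (the run period quoted earlier in the notebook); if the output is written at another frequency, the dates change accordingly. The helper `window_dates` is hypothetical.

```python
from datetime import datetime, timedelta

SIM_START = datetime(1986, 10, 1, 1)  # assumed first output timestep

def window_dates(start_idx, stop_idx, first=SIM_START, step_hours=1):
    """Convert integer time indices to dates, assuming a fixed output step."""
    step = timedelta(hours=step_hours)
    return first + start_idx * step, first + stop_idx * step

# Same numbers as in the plotting cell
start = 24 * 10 * 30          # 300 days of hourly steps
stop = start + 1 * 100 * 24   # a 100-day window
offset = 90 * 24              # both ends are shifted by 90 days in the slice

begin, end = window_dates(start + offset, stop + offset)
print(begin, end)  # end corresponds to the exclusive stop of the slice
```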

In [None]:
file_manager = folders+'/file_manager_truth.txt'
s = ps.Simulation(executable, file_manager)
s.manager['simStartTime'] = '1980-04-01 01:00'
s.manager['simEndTime'] = '1980-04-03 00:00'
#s.decisions['groundwatr'] = 'qTopmodl'
#s.decisions['hc_profile'] = 'pow_prof'
#s.decisions['bcLowrSoiH'] = 'zeroFlux'
#s.decisions['vegeParTbl'] = 'USGS'
#s.decisions['veg_traits'] = 'CM_QJRMS1988'
#s.decisions['LAI_method'] = 'specified'
print(s.decisions)

In [None]:
s.run('local', run_suffix='_default')
assert s.status == 'Success'

In [None]:
print(s.stderr)
print(s.stdout)