# Running the SUMMA setup
To begin, we have to localize the paths in the configuration files that SUMMA will use.
We do this with a shell command: in a notebook, any line that starts with the `!` operator is passed to the shell.
Here, that command runs a script that completes the installation.
Then, we can import some basic libraries along with `pysumma`.
The `%pylab inline` magic command simply imports some standard scientific packages such as `numpy` and `matplotlib`.

In [1]:
top = '/glade/work/ashleyvb'
folder = top+'/CAMELs'
folders = folder+'/summa_camels'
!cd {folders}; ./install_local_setup.sh
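The `!` prefix only works inside IPython/Jupyter. If these steps are ever moved into a plain Python script, the install command in the cell above could be run with the standard-library `subprocess` module instead. This is a minimal sketch; `run_in_dir` is a hypothetical helper name, not part of pysumma.

```python
import subprocess
import sys
from pathlib import Path

def run_in_dir(cmd, cwd):
    """Run `cmd` (a list of arguments) inside directory `cwd` and return its stdout."""
    # check=True raises CalledProcessError on a nonzero exit status,
    # whereas a notebook `!` line only prints the failure and carries on.
    result = subprocess.run(cmd, cwd=Path(cwd), check=True,
                            capture_output=True, text=True)
    return result.stdout

# Usage in this notebook (hypothetical):
# run_in_dir(['./install_local_setup.sh'], folders)
```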

First we check that we loaded the correct environment.

In [2]:
conda list pysumma

# packages in environment at /glade/work/ashleyvb/miniconda3/envs/pysumma:
#
# Name                    Version                   Build  Channel
pysumma                   2.0.0                     dev_0    <develop>

Note: you may need to restart the kernel to use updated packages.


<br>
Then we load the imports.

In [3]:
%pylab inline
import pysumma as ps
import xarray as xr
import pandas as pd
from dask_jobqueue import PBSCluster
from dask.distributed import Client

Populating the interactive namespace from numpy and matplotlib


In [17]:
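NCORES = 48
# One PBS job whose 48 cores are split into 48 single-threaded worker processes
cluster = PBSCluster(n_workers=NCORES,     # total number of workers to start
                     cores=NCORES,         # cores per job
                     processes=NCORES,     # worker processes per job
                     memory="24GB",        # memory per job, shared by its workers
                     project='UWAS0091',   # allocation charged for the job
                     queue='regular',
                     walltime='06:00:00')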
client = Client(cluster)

Perhaps you already have a cluster running?
Hosting the HTTP server on port 36089 instead


In [20]:
# Check that we have workers; do not run the remaining cells until the workers show up.
print(client)
!qstat

<Client: 'tcp://10.148.14.116:39883' processes=48 threads=48, memory=24.00 GB>
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
2864024.chadmin1  Jupyter          ashleyvb          00:23:18 R regular         
2865165.chadmin1  dask-worker      ashleyvb          00:02:36 R regular         
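Instead of rerunning the check by hand until the workers appear, the wait can be automated. The sketch below is a generic polling helper (`wait_until` is a hypothetical name); the usage line assumes that `client.scheduler_info()['workers']` lists the connected workers, which is how `dask.distributed` reports them. Recent `dask.distributed` releases also provide `client.wait_for_workers(n_workers=...)`, which does the same job directly.

```python
import time

def wait_until(predicate, timeout=600.0, interval=5.0):
    """Call predicate() every `interval` seconds until it returns True.

    Returns True as soon as the predicate holds, or False once `timeout`
    seconds have passed without it holding.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Usage in this notebook (assumes `client` and `NCORES` from the cells above):
# ready = wait_until(lambda: len(client.scheduler_info()['workers']) >= NCORES)
```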



# Interacting with SUMMA via the `Distributed` object

We use a `Distributed` object, which contains multiple `Simulation` objects, each corresponding to a spatial chunk of the domain.
Before every run, remove the distributed run folders with `rm -r /glade/work/ashleyvb/CAMELs/summa_camels/.pysumma/` so that leftover folders and their permissions do not cause problems in the loops.
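The same cleanup can be done from Python, which is handy if the notebook is turned into a script. This is a minimal sketch, assuming the run folder is the `summa_camels` directory that holds `.pysumma`; `clear_pysumma_dir` is a hypothetical helper, not part of pysumma.

```python
import shutil
from pathlib import Path

def clear_pysumma_dir(run_dir):
    """Delete `run_dir/.pysumma` (like `rm -rf`) and report whether it is gone."""
    target = Path(run_dir) / '.pysumma'
    # ignore_errors=True: no exception if the folder is missing or cannot be removed,
    # so we check afterwards whether the removal actually worked
    shutil.rmtree(target, ignore_errors=True)
    return not target.exists()

# Usage in this notebook (hypothetical):
# assert clear_pysumma_dir(folders)
```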

<br>

## Instantiating a `Distributed` object

To set up a `Distributed` object you must supply several pieces of information.
First, supply the SUMMA executable; this can be either a compiled executable on your local machine or a Docker image.
Second, supply the path to the file manager, which we just created with the install script. We also pass the number of workers, the chunk size, and the Dask `client` created above.

In [21]:
executable = top+'/summa/bin/summa.exe'

In [22]:
file_manager = folders+'/file_manager.txt'
camels = ps.Distributed(executable, file_manager, num_workers=NCORES, chunk_size=8, client=client)
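With `chunk_size=8`, we understand each `Simulation` inside `camels` to handle a block of 8 GRUs (here, basins); SUMMA itself can restrict a run to such a block with its `-g startGRU countGRU` command-line option. The hypothetical helper below only illustrates the arithmetic of that split for 671 basins, assuming 1-based GRU indices; it is not pysumma's implementation.

```python
def gru_chunks(n_gru, chunk_size):
    """Split GRUs 1..n_gru into (startGRU, countGRU) pairs of at most chunk_size GRUs."""
    chunks = []
    for start in range(1, n_gru + 1, chunk_size):  # assumed 1-based GRU indices
        count = min(chunk_size, n_gru - start + 1)  # the last chunk may be smaller
        chunks.append((start, count))
    return chunks

chunks = gru_chunks(671, 8)
print(len(chunks), chunks[0], chunks[-1])  # 84 (1, 8) (665, 7)
```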

<br>

## Manipulating the configuration of the distributed object

Currently, none of the attributes that can be changed in a SUMMA `Simulation` object can be altered in a `Distributed` object, and the only one that can be viewed is the file manager. The file managers themselves were written in the notebook that created the forcing files.
To see a file manager, simply `print` it.

In [23]:
print(camels.manager)  # forcing covers 1980-01-01 to 2018-12-31; we run 1986-10-01 01:00 to 1991-10-02 00:00

'SUMMA_FILE_MANAGER_V1.0'    ! filemanager_version
'/glade/work/ashleyvb/CAMELs/summa_camels/settings.v1/'    ! settings_path
'/glade/work/ashleyvb/CAMELs/summa_camels/forcing/1hr/'    ! input_path
'/glade/work/ashleyvb/CAMELs/summa_camels/output/1hr/'    ! output_path
'modelDecisions.1hr.txt'    ! decisions_path
'[notUsed]'    ! meta_time
'[notUsed]'    ! meta_attr
'[notUsed]'    ! meta_type
'[notUsed]'    ! meta_force
'[notUsed]'    ! meta_localparam
'output_control2.txt'    ! output_control
'[notUsed]'    ! meta_localindex
'[notUsed]'    ! meta_basinparam
'[notUsed]'    ! meta_basinmvar
'../settings.v1/attributes.camels.v2.nc'    ! local_attributes
'../settings.v1/localParamInfo.txt'    ! local_param_info
'../settings.v1/basinParamInfo.txt'    ! basin_param_info
'../settings.v1/forcingFileList.1hr.txt'    ! forcing_file_list
'../settings.v1/coldState.8lyr.nc'    ! model_init_cond
'../settings.v1/trialParams.camels.v1.nc'    ! parameter_trial
'camels_1hr_'    ! output_prefix
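Each line of the file manager pairs a quoted value with a key after the `!` comment marker. A minimal parser for that layout, assuming values never contain `!` themselves; `parse_file_manager` is a hypothetical helper, not a pysumma function. Since `print` shows `str(camels.manager)`, the same text can be passed to it.

```python
def parse_file_manager(text):
    """Parse lines of the form  'value'    ! key  into a {key: value} dict."""
    entries = {}
    for line in text.splitlines():
        if '!' not in line:
            continue  # skip blank or malformed lines
        value, key = line.split('!', 1)
        entries[key.strip()] = value.strip().strip("'")  # drop padding and quotes
    return entries

# Usage in this notebook (hypothetical):
# fm = parse_file_manager(str(camels.manager))
# print(fm['output_path'], fm['output_prefix'])
```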


<br>

## Running pySUMMA for all Forcing Files

We now use pySUMMA to run SUMMA for all 671 basins with each set of forcing files. Each run takes about 9 minutes; to check how long a run has been going, use `qstat -u <username>` in a terminal. We start with the original NLDAS forcing files, the "truth run".

In [None]:
%%time
camels.run('local')
# all_status = [(n, s.status) for n, s in camels.simulations.items()]  # uncomment to inspect statuses after errors
all_ds = [s.output.load() for n, s in camels.simulations.items()]  # load outputs into memory for faster merging

Instead of merging, we could simply write each chunk's output to a separate file. To merge them, first detect automatically which variables have an `hru` dimension and which have a `gru` dimension (depending on the output control file, there may be no `gru` variables):

In [None]:
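hru_vars = []  # variables that have an hru dimension
gru_vars = []  # variables that have a gru dimension
for ds in all_ds:
    for name, var in ds.variables.items():
        # every chunk has the same variables, so record each name only once
        if 'hru' in var.dims:
            if name not in hru_vars:
                hru_vars.append(name)
        elif 'gru' in var.dims:
            if name not in gru_vars:
                gru_vars.append(name)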
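The same classification can be written as a function that works on any objects exposing `ds.variables` and `var.dims` (as xarray Datasets do), so it can be reused in the loops below and checked without SUMMA output. A sketch; `split_vars_by_dim` is a hypothetical name.

```python
def split_vars_by_dim(datasets, dims=('hru', 'gru')):
    """Return {dim: [names]} with each variable name listed once, in first-seen order."""
    found = {d: [] for d in dims}
    for ds in datasets:
        for name, var in ds.variables.items():
            for d in dims:
                # the first matching dimension wins, like the if/elif above
                if d in var.dims:
                    if name not in found[d]:
                        found[d].append(name)
                    break
    return found

# Usage in this notebook:
# found = split_vars_by_dim(all_ds)
# hru_vars, gru_vars = found['hru'], found['gru']
```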

Select those variables and concatenate the chunks. This takes only seconds because we write a limited set of output variables.

In [None]:
%%time
hru_ds = [ds[hru_vars] for ds in all_ds]
gru_ds = [ds[gru_vars] for ds in all_ds]
hru_merged = xr.concat(hru_ds, dim='hru')
gru_merged = xr.concat(gru_ds, dim='gru')

In [None]:
print(hru_merged)

In [None]:
%%time
hru_merged.to_netcdf(folders+'/output/merged_day/NLDAS_1hr_hru.nc')
gru_merged.to_netcdf(folders+'/output/merged_day/NLDAS_1hr_gru.nc')
del camels
del all_ds 
del hru_merged
del gru_merged

<br>
The remaining runs repeat the same steps inside loops. For clarity, we split them into two loops: one for the constant forcings and one for the MetSim forcings. With all 671 basins, each loop takes about an hour. Large objects are deleted after every run to reduce memory use.
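Both loops differ only in the forcing type (`constant` or `metsim`) and the variable name, and both enter the run through the file names. A hypothetical helper that builds those names in one place, matching the patterns the loops use:

```python
def experiment_paths(base, kind, var):
    """Return (file_manager, hru_output, gru_output) paths for one forcing experiment.

    `kind` is 'constant' or 'metsim'; `var` is a forcing name such as 'airtemp' or 'all'.
    """
    file_manager = f"{base}/file_manager_{kind}_{var}.txt"
    out_stem = f"{base}/output/merged_day/NLDAS{kind}_{var}"
    return file_manager, out_stem + "_hru.nc", out_stem + "_gru.nc"

# Usage inside either loop (hypothetical):
# file_manager, hru_nc, gru_nc = experiment_paths(folders, 'metsim', v)
```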

In [24]:
%%time
# Constant
constant_vars = ['all']  # full list: ['airpres','airtemp','LWRadAtm','pptrate','spechum','SWRadAtm','windspd','all']
for v in constant_vars:
    !rm -rf {folders}/.pysumma
    file_manager = folders + '/file_manager_constant_' + v + '.txt'
    camels = ps.Distributed(executable, file_manager, num_workers=NCORES, chunk_size=8, client=client)
    camels.run('local')
    # all_status = [(n, s.status) for n, s in camels.simulations.items()]  # uncomment to inspect statuses after errors
    all_ds = [s.output.load() for n, s in camels.simulations.items()]  # load outputs into memory for faster merging
    hru_vars = []  # variables that have an hru dimension
    gru_vars = []  # variables that have a gru dimension
    for ds in all_ds:
        for name, var in ds.variables.items():
            if 'hru' in var.dims:
                if name not in hru_vars:  # record each name only once
                    hru_vars.append(name)
            elif 'gru' in var.dims:
                if name not in gru_vars:
                    gru_vars.append(name)
    hru_ds = [ds[hru_vars] for ds in all_ds]
    gru_ds = [ds[gru_vars] for ds in all_ds]
    hru_merged = xr.concat(hru_ds, dim='hru')
    gru_merged = xr.concat(gru_ds, dim='gru')
    hru_merged.to_netcdf(folders+'/output/merged_day/NLDASconstant_' + v +'_hru.nc')
    gru_merged.to_netcdf(folders+'/output/merged_day/NLDASconstant_' + v +'_gru.nc')
    del camels
    del all_ds 
    del hru_merged
    del gru_merged
    print(v)

windspd


RuntimeError: There was an error during the simulation! Print the `stdout` and `stderr` for more information.

In [25]:
%%time
# Metsim
metsim_vars = ['airpres','airtemp','LWRadAtm','pptrate','spechum','SWRadAtm','windspd','all']
for v in metsim_vars:
    !rm -rf {folders}/.pysumma
    file_manager = folders + '/file_manager_metsim_' + v + '.txt'
    camels = ps.Distributed(executable, file_manager, num_workers=NCORES, chunk_size=8, client=client)
    camels.run('local')
    # all_status = [(n, s.status) for n, s in camels.simulations.items()]  # uncomment to inspect statuses after errors
    all_ds = [s.output.load() for n, s in camels.simulations.items()]  # load outputs into memory for faster merging
    hru_vars = []  # variables that have an hru dimension
    gru_vars = []  # variables that have a gru dimension
    for ds in all_ds:
        for name, var in ds.variables.items():
            if 'hru' in var.dims:
                if name not in hru_vars:  # record each name only once
                    hru_vars.append(name)
            elif 'gru' in var.dims:
                if name not in gru_vars:
                    gru_vars.append(name)
    hru_ds = [ds[hru_vars] for ds in all_ds]
    gru_ds = [ds[gru_vars] for ds in all_ds]
    hru_merged = xr.concat(hru_ds, dim='hru')
    gru_merged = xr.concat(gru_ds, dim='gru')
    hru_merged.to_netcdf(folders+'/output/merged_day/NLDASmetsim_' + v +'_hru.nc')
    gru_merged.to_netcdf(folders+'/output/merged_day/NLDASmetsim_' + v +'_gru.nc')
    del camels
    del all_ds 
    del hru_merged
    del gru_merged
    print(v)

airpres
airtemp
pptrate
spechum
SWRadAtm
windspd


RuntimeError: There was an error during the simulation! Print the `stdout` and `stderr` for more information.
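When `run` raises this error, the simulations inside the `Distributed` object can be inspected before rerunning anything, since the loop stopped before its `del camels` line. A minimal sketch that collects the `stderr` of every chunk whose status is not `'Success'`; it assumes each `Simulation` exposes `status` and `stderr`, as used in the single-simulation test in this notebook. `failed_simulations` is a hypothetical helper.

```python
def failed_simulations(simulations):
    """Map each simulation name to its stderr, for simulations that did not succeed."""
    # Assumes every value has `status` and `stderr` attributes, like pysumma Simulation objects
    return {name: sim.stderr
            for name, sim in simulations.items()
            if sim.status != 'Success'}

# Usage in this notebook:
# for name, err in failed_simulations(camels.simulations).items():
#     print(name, err)
```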

In [26]:
# If there are errors, run a short, non-distributed test simulation to see them.
# (file_manager still points to the last file manager used in the loop above.)
s = ps.Simulation(executable, file_manager)
s.decisions['simulStart'] = '1980-04-02 00:00'
s.decisions['simulFinsh'] = '1980-04-02 23:00'
print(s.decisions)
s.run('local', run_suffix='_default')
assert s.status == 'Success'

simulStart    '1980-04-02 00:00'   ! simulation start time
simulFinsh    '1980-04-02 23:00'   ! simulation end time
tmZoneInfo    utcTime              ! time zone information
soilCatTbl    STAS                 ! soil-category dataset
vegeParTbl    MODIFIED_IGBP_MODIS_NOAH ! vegetation-category dataset
soilStress    NoahType             ! choice of function for the soil moisture control on stomatal resistance
stomResist    BallBerry            ! choice of function for stomatal resistance
num_method    itertive             ! choice of numerical method
fDerivMeth    analytic             ! choice of method to calculate flux derivatives
LAI_method    monTable             ! choice of method to determine LAI and SAI
f_Richards    mixdform             ! form of Richards equation
groundwatr    bigBuckt             ! choice of groundwater parameterization
hc_profile    constant             ! choice of hydraulic conductivity profile
bcUpprTdyn    nrg_flux             ! type of upper boundary cond

AssertionError: 

In [27]:
print(s.stderr)
print(s.stdout)

Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG
STOP 1

file_suffix is '_default'.
file_master is '/glade/work/ashleyvb/CAMELs/summa_camels/.pysumma/_default/file_manager_metsim_all.txt'.
decisions file =  /glade/work/ashleyvb/CAMELs/summa_camels/.pysumma/_default/settings.v1/modelDecisions.1hr.txt
   1 simulStart: 1980-04-02 00:00
   2 simulFinsh: 1980-04-02 23:00
   3 tmZoneInfo: utcTime
   4 soilCatTbl: STAS
   5 vegeParTbl: MODIFIED_IGBP_MODIS_NOAH
   6 soilStress: NoahType
   7 stomResist: BallBerry
   8 num_method: itertive
   9 fDerivMeth: analytic
  10 LAI_method: monTable
  11 f_Richards: mixdform
  12 groundwatr: bigBuckt
  13 hc_profile: constant
  14 bcUpprTdyn: nrg_flux
  15 bcLowrTdyn: zeroFlux
  16 bcUpprSoiH: liq_flux
  17 bcLowrSoiH: drainage
  18 veg_traits: Raupach_BLM1994
  19 canopyEmis: difTrans
  20 snowIncept: lightSnow
  21 windPrfile: logBelowCanopy
  22 astability: louisinv
  23 canopySrad: BeersLaw
  24 alb_method: conDecay
  