# Running the SUMMA setup
To begin, we have to regionalize the paths in the configuration files that SUMMA will use.
This is accomplished by running a shell command, which in a notebook is done by starting a line with the `!` operator.
Here the command simply runs a script that completes the installation.
Then we can import some basic libraries along with `pysumma`.
The `%pylab inline` magic command imports some standard scientific packages such as `numpy` and `matplotlib`.

In [1]:
top = '/glade/work/ashleyvb'
folder = top+'/CAMELs'
folders = folder+'/summa_camels'
! cd /glade/work/ashleyvb/CAMELs/summa_camels && ./install_local_setup.sh
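
Outside a notebook the `!` operator is not available, so the same step can be done from plain Python. The following is a minimal sketch, assuming the setup script is executable and is meant to be run from inside its own folder; `run_in_dir` is a hypothetical helper name, not part of `pysumma`.

```python
import subprocess
from pathlib import Path


def run_in_dir(command, workdir):
    """Run `command` (a list of arguments) inside `workdir` and return its stdout.

    Raises FileNotFoundError if the folder is missing and
    subprocess.CalledProcessError if the command exits with a non-zero status.
    """
    workdir = Path(workdir)
    if not workdir.is_dir():
        raise FileNotFoundError(f"working directory does not exist: {workdir}")
    result = subprocess.run(
        command, cwd=workdir, check=True, capture_output=True, text=True
    )
    return result.stdout


# Assumed usage for this notebook (not executed here):
# print(run_in_dir(["./install_local_setup.sh"], folders))
```

Unlike `cd ...; ./script`, this stops with an error if the folder does not exist instead of running the script from the wrong location.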

First we check that we loaded the correct environment.

In [2]:
conda list pysumma

# packages in environment at /glade/work/ashleyvb/miniconda3/envs/pysumma:
#
# Name                    Version                   Build  Channel
pysumma                   2.0.0                     dev_0    <develop>

Note: you may need to restart the kernel to use updated packages.
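
The same check can be done programmatically, which is handy at the top of a script. This is a small sketch using only the standard library; `installed_version` is a hypothetical helper, and it reports the version of whatever environment the running kernel uses.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Assumed usage: compare against the version listed by `conda list pysumma`.
version = installed_version("pysumma")
if version is None:
    print("pysumma is not installed in this environment")
else:
    print("pysumma", version)
```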


<br>
Then we load the imports.

In [3]:
%pylab inline
import pysumma as ps
import xarray as xr
import pandas as pd
from dask_jobqueue import PBSCluster
from dask.distributed import Client

Populating the interactive namespace from numpy and matplotlib


In [4]:
NCORES=48
cluster = PBSCluster(n_workers = NCORES,
                     cores=NCORES,
                     processes=NCORES, 
                     memory="24GB",
                     project='UWAS0091',
                     queue='regular',
                     walltime='06:00:00')
client = Client(cluster)
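
As a rough check on the cluster request, the sketch below works out how these arguments translate into jobs and per-worker resources. It assumes the `dask_jobqueue` convention that `cores` and `memory` describe a single job that is split across `processes` workers, and that the number of jobs is `n_workers` divided by `processes`, rounded up; other versions may interpret `n_workers` differently. `worker_layout` is a hypothetical helper, not part of `dask_jobqueue`.

```python
import math


def worker_layout(n_workers, cores, processes, memory_gb):
    """Estimate jobs and per-worker resources for a job-queue cluster request.

    Assumption: `cores` and `memory_gb` are per job, and each job is split
    into `processes` single worker processes.
    """
    jobs = math.ceil(n_workers / processes)
    return {
        "jobs": jobs,
        "threads_per_worker": cores // processes,
        "memory_per_worker_gb": memory_gb / processes,
        "total_memory_gb": jobs * memory_gb,
    }


layout = worker_layout(n_workers=48, cores=48, processes=48, memory_gb=24)
print(layout)
```

Under these assumptions the request amounts to one job with 48 single-threaded workers sharing 24 GB, which matches the client summary and the single `dask-worker` job listed by `qstat` in the next cell.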

In [6]:
# Check that we have workers; do not run the rest of the cells until the workers show up.
print(client)
!qstat

<Client: 'tcp://10.148.2.107:33971' processes=48 threads=48, memory=24.00 GB>
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
2424017.chadmin1  Jupyter          ashleyvb          00:00:09 R regular         
2424079.chadmin1  dask-worker      ashleyvb          00:01:39 R regular         
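
Instead of re-running the cell by hand until the workers appear, the wait can be automated. The sketch below is a generic polling loop; `wait_until` is a hypothetical helper, and the condition passed to it is up to you. `dask.distributed` also offers `Client.wait_for_workers`, which may be simpler if your version supports it.

```python
import time


def wait_until(predicate, timeout=600.0, interval=5.0):
    """Poll `predicate` until it returns True or `timeout` seconds have passed.

    Returns True if the condition was met and False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


# Assumed usage with the client from this notebook (not executed here):
# client.wait_for_workers(NCORES)   # built-in blocking wait
```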


<br>

# Interacting with SUMMA via the `Distributed` object

We are running a `Distributed` object, which contains multiple `Simulation` objects, each corresponding to a spatial chunk of the domain. 
Before every run, we need to clear out the distributed folders with `rm -r /glade/work/ashleyvb/CAMELs/summa_camels/.pysumma/` so that file permissions do not get into an inconsistent state in the loops. 
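
The cleanup can also be done from Python, which avoids hard-coding the path in a shell command. This is a minimal sketch, assuming the scratch folder is always named `.pysumma` and lives directly inside the case folder; `clear_pysumma_scratch` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def clear_pysumma_scratch(case_dir):
    """Delete the `.pysumma` folder inside `case_dir` if it exists.

    Returns True if a folder was present before the call, False otherwise.
    """
    scratch = Path(case_dir) / ".pysumma"
    existed = scratch.exists()
    # ignore_errors=True mirrors `rm -rf`: a missing folder is not an error
    shutil.rmtree(scratch, ignore_errors=True)
    return existed


# Assumed usage for this notebook (not executed here):
# clear_pysumma_scratch(folders)
```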

<br>

## Instantiating a `Distributed` object

To set up a `Distributed` object you must supply several pieces of information. 
First, supply the SUMMA executable; this can be either a compiled executable on your local machine or a Docker image. 
The second piece of information is the path to the file manager, which we just created through the install script. 

In [7]:
executable = top+'/summa/bin/summa.exe'

In [8]:
file_manager = folders+'/file_manager.txt'
camels = ps.Distributed(executable, file_manager, num_workers=NCORES, chunk_size=8, client=client)
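
To get a feel for what `chunk_size=8` means, the sketch below splits a domain into consecutive chunks and counts them, using the 671 basins that this notebook runs. It only illustrates the arithmetic; how `pysumma` numbers or orders its chunks internally is not shown here, and `chunk_ranges` is a hypothetical helper.

```python
def chunk_ranges(n_units, chunk_size):
    """Split `n_units` items into consecutive (start, stop) ranges, stop exclusive."""
    return [
        (start, min(start + chunk_size, n_units))
        for start in range(0, n_units, chunk_size)
    ]


chunks = chunk_ranges(671, 8)
print(len(chunks), "chunks; last chunk covers", chunks[-1])
```

With these numbers there are 84 chunks, 83 of them full and a final one holding the remaining 7 basins, so the 48 workers each handle one or two chunks.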

<br>

## Manipulating the configuration of the distributed object

Currently, none of the attributes that can be changed in a SUMMA `Simulation` object can be altered in a `Distributed` object, and the only one that can be viewed is the file manager. The file managers themselves were written in the notebook that made the forcing files. 
To see a file manager, simply `print` it.

In [9]:
print(camels.manager) # possible dates are 1980-01-01 to 2018-12-31; we are running 1986-10-01 01:00 to 1991-10-02 00:00

'SUMMA_FILE_MANAGER_V1.0'    ! filemanager_version
'/glade/work/ashleyvb/CAMELs/summa_camels/settings.v1/'    ! settings_path
'/glade/work/ashleyvb/CAMELs/summa_camels/forcing/1hr/'    ! input_path
'/glade/work/ashleyvb/CAMELs/summa_camels/output/1hr/'    ! output_path
'modelDecisions.1hr.txt'    ! decisions_path
'[notUsed]'    ! meta_time
'[notUsed]'    ! meta_attr
'[notUsed]'    ! meta_type
'[notUsed]'    ! meta_force
'[notUsed]'    ! meta_localparam
'output_control2.txt'    ! output_control
'[notUsed]'    ! meta_localindex
'[notUsed]'    ! meta_basinparam
'[notUsed]'    ! meta_basinmvar
'../settings.v1/attributes.camels.v2.nc'    ! local_attributes
'../settings.v1/localParamInfo.txt'    ! local_param_info
'../settings.v1/basinParamInfo.txt'    ! basin_param_info
'../settings.v1/forcingFileList.1hr.txt'    ! forcing_file_list
'../settings.v1/coldState.8lyr.nc'    ! model_init_cond
'../settings.v1/trialParams.camels.v1.nc'    ! parameter_trial
'camels_1hr_'    ! output_prefix
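
Because the file manager cannot yet be edited through the `Distributed` object, it can be useful to read the printed text into a dictionary to check a setting. The sketch below assumes every entry has the form `'value'    ! key` as in the listing above; `parse_file_manager` is a hypothetical helper, not a `pysumma` function.

```python
def parse_file_manager(text):
    """Parse lines of the form `'value'    ! key` into a {key: value} dictionary."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # skip blank lines, pure comment lines, and lines without a key
        if not line or line.startswith("!") or "!" not in line:
            continue
        value, _, key = line.partition("!")
        entries[key.strip()] = value.strip().strip("'")
    return entries


sample = """'SUMMA_FILE_MANAGER_V1.0'    ! filemanager_version
'modelDecisions.1hr.txt'    ! decisions_path
'camels_1hr_'    ! output_prefix
"""
settings = parse_file_manager(sample)
print(settings["decisions_path"])
```

In the notebook, `parse_file_manager(str(camels.manager))` would be the assumed way to apply it to the printed text.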


<br>

## Running pySUMMA for all Forcing Files

We now run pySUMMA on all 671 basins for each set of forcing files. You can check how long a job has been running with the command `qstat -u <username>` in a terminal. Each run takes about 9 minutes. We start with the original NLDAS files, or the "truth run".

In [10]:
%%time
camels.run('local')
#all_status = [(n, s.status) for n, s in camels.simulations.items()] # uncomment to inspect each simulation's status if there are errors
all_ds = [s.output.load() for n, s in camels.simulations.items()] # load the outputs into memory so later steps are faster

CPU times: user 48.8 s, sys: 16.8 s, total: 1min 5s
Wall time: 9min 3s
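
With many simulations, scanning a long list of `(name, status)` pairs is tedious. The sketch below tallies the statuses instead, relying only on the `simulations` dictionary and the per-simulation `status` attribute used in the commented line above. The status strings in the example are placeholders, and both helpers are hypothetical.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_status(simulations):
    """Count how many simulations report each status value."""
    return Counter(sim.status for sim in simulations.values())


def names_with_status(simulations, status):
    """List the names of the simulations whose status equals `status`."""
    return [name for name, sim in simulations.items() if sim.status == status]


# Stand-in objects for illustration; in the notebook use camels.simulations.
fake = {
    "g1-8": SimpleNamespace(status="Success"),
    "g9-16": SimpleNamespace(status="Error"),
    "g17-24": SimpleNamespace(status="Success"),
}
print(summarize_status(fake))
print(names_with_status(fake, "Error"))
```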


We could write the output as several files instead of merging. However, if we want to merge, we can do the following.
First, automatically detect which variables have an hru dimension and which have a gru dimension (depending on what we request as output, there may not be any gru variables):

In [11]:
hru_vars = [] # variables that have hru dimension
gru_vars = [] # variables that have gru dimension
for ds in all_ds:
    for name, var in ds.variables.items():
        if 'hru' in var.dims:
            hru_vars.append(name)
        elif 'gru' in var.dims:
            gru_vars.append(name)
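
The loop above visits every chunk, so each variable name is appended once per chunk. Selecting with the repeated names still works, but the same logic can be wrapped in a function that records each name only once and can be reused in the later loops. This is a sketch: `split_by_dimension` is a hypothetical helper, and the stand-in objects below only mimic the `variables` and `dims` attributes of an `xarray` dataset.

```python
from types import SimpleNamespace


def split_by_dimension(datasets, first="hru", second="gru"):
    """Return two lists of variable names: those with `first` in their
    dimensions and, among the rest, those with `second`. Names keep the order
    in which they are first seen and appear only once."""
    first_vars, second_vars = [], []
    for ds in datasets:
        for name, var in ds.variables.items():
            if first in var.dims:
                if name not in first_vars:
                    first_vars.append(name)
            elif second in var.dims:
                if name not in second_vars:
                    second_vars.append(name)
    return first_vars, second_vars


def fake_dataset():
    """Build a stand-in for one chunk's output (illustration only)."""
    return SimpleNamespace(variables={
        "time": SimpleNamespace(dims=("time",)),
        "scalarSWE": SimpleNamespace(dims=("time", "hru")),
        "basinRunoff": SimpleNamespace(dims=("time", "gru")),  # placeholder name
    })


hru_names, gru_names = split_by_dimension([fake_dataset(), fake_dataset()])
print(hru_names, gru_names)
```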

Filter the variables for the merge. This takes only seconds because we are writing a limited output.

In [12]:
%%time
hru_ds = [ds[hru_vars] for ds in all_ds]
gru_ds = [ds[gru_vars] for ds in all_ds]
hru_merged = xr.concat(hru_ds, dim='hru')
gru_merged = xr.concat(gru_ds, dim='gru')

CPU times: user 8.21 s, sys: 1.75 s, total: 9.95 s
Wall time: 9.42 s


In [13]:
print(hru_merged)

<xarray.Dataset>
Dimensions:                (hru: 671, time: 43848)
Coordinates:
  * time                   (time) datetime64[ns] 1986-10-01T00:59:59.999986592 ... 1991-10-02
  * hru                    (hru) int64 1 2 3 4 5 6 7 ... 666 667 668 669 670 671
Data variables:
    pptrate                (time, hru) float64 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
    airtemp                (time, hru) float64 290.6 293.8 292.6 ... 303.2 300.3
    spechum                (time, hru) float64 0.01139 0.0129 ... 0.006427
    windspd                (time, hru) float64 4.275 4.809 4.036 ... 6.271 6.169
    SWRadAtm               (time, hru) float64 0.0 0.0 0.0 ... 294.0 300.4 295.8
    LWRadAtm               (time, hru) float64 377.2 400.1 381.0 ... 329.1 319.0
    airpres                (time, hru) float64 9.743e+04 9.944e+04 ... 9.412e+04
    scalarCanopyWat        (time, hru) float64 0.0 0.0 0.0 ... 0.003957 0.003082
    scalarSWE              (time, hru) float64 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
    scala
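
The `time: 43848` dimension can be checked against the run period given earlier (1986-10-01 01:00 to 1991-10-02 00:00 at an hourly step). The following sketch counts the steps with the standard library; `count_steps` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def count_steps(start, end, step):
    """Number of time stamps from `start` to `end` inclusive, spaced by `step`."""
    return (end - start) // step + 1


n_steps = count_steps(
    datetime(1986, 10, 1, 1), datetime(1991, 10, 2, 0), timedelta(hours=1)
)
print(n_steps)
```

The period spans 1827 days including the 1988 leap day, and 1827 × 24 = 43848, so the merged output covers the full simulation period.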

In [14]:
%%time
hru_merged.to_netcdf(folders+'/output/merged_day/NLDAS_1hr_hru.nc')
gru_merged.to_netcdf(folders+'/output/merged_day/NLDAS_1hr_gru.nc')
del camels
del all_ds 
del hru_merged
del gru_merged

CPU times: user 9.22 s, sys: 6.31 s, total: 15.5 s
Wall time: 17.4 s


<br>
Here are the other runs, now as loops. The process is the same, but for clarity we divide it into two loops, one for the constant forcings and one for the MetSim forcings. Each loop takes about an hour. We delete the large objects after every run to reduce memory needs.

In [18]:
# Set forcings to hold at constant or MetSim and create dictionaries
constant_vars= ['airpres','airtemp','LWRadAtm','pptrate','spechum','SWRadAtm','windspd','all']
metsim_vars  = ['airpres','airtemp','LWRadAtm','pptrate','spechum','SWRadAtm','windspd','all']
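
The two loops below differ only in the word `constant` or `metsim` that appears in the file names. The sketch below builds the file manager path and the two output paths for every scenario from that one word, following the naming used in the loops. `scenario_paths` is a hypothetical helper, and `/path/to/summa_camels` is a placeholder for `folders`.

```python
def scenario_paths(case_dir, kind, variables):
    """Build the file manager and merged-output paths for each forcing scenario.

    `kind` is the scenario family, for example 'constant' or 'metsim'.
    """
    rows = []
    for v in variables:
        rows.append({
            "variable": v,
            "file_manager": f"{case_dir}/file_manager_{kind}_{v}.txt",
            "hru_out": f"{case_dir}/output/merged_day/NLDAS{kind}_{v}_hru.nc",
            "gru_out": f"{case_dir}/output/merged_day/NLDAS{kind}_{v}_gru.nc",
        })
    return rows


plan = scenario_paths("/path/to/summa_camels", "constant", ["airpres", "all"])
for row in plan:
    print(row["file_manager"], "->", row["hru_out"])
```

Printing the plan before starting an hour-long loop is a cheap way to catch a misspelled file name.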

In [19]:
%%time
# Constant
for v in constant_vars:
    ! rm -rf /glade/work/ashleyvb/CAMELs/summa_camels/.pysumma 
    file_manager = folders+'/file_manager_constant_' + v +'.txt'
    camels = ps.Distributed(executable, file_manager, num_workers=NCORES, chunk_size=8, client=client)   
    camels.run('local')
    #all_status = [(n, s.status) for n, s in camels.simulations.items()] # uncomment to inspect each simulation's status if there are errors
    all_ds = [s.output.load() for n, s in camels.simulations.items()] # load the outputs into memory so later steps are faster
    hru_vars = [] # variables that have hru dimension
    gru_vars = [] # variables that have gru dimension
    for ds in all_ds:
        for name, var in ds.variables.items():
            if 'hru' in var.dims:
                hru_vars.append(name)
            elif 'gru' in var.dims:
                gru_vars.append(name)
    hru_ds = [ds[hru_vars] for ds in all_ds]
    gru_ds = [ds[gru_vars] for ds in all_ds]
    hru_merged = xr.concat(hru_ds, dim='hru')
    gru_merged = xr.concat(gru_ds, dim='gru')
    hru_merged.to_netcdf(folders+'/output/merged_day/NLDASconstant_' + v +'_hru.nc')
    gru_merged.to_netcdf(folders+'/output/merged_day/NLDASconstant_' + v +'_gru.nc')
    del camels
    del all_ds 
    del hru_merged
    del gru_merged
    print(v)

airpres
CPU times: user 1min 7s, sys: 21.5 s, total: 1min 29s
Wall time: 8min 36s


In [17]:
%%time
# Metsim
for v in metsim_vars:
    ! rm -rf /glade/work/ashleyvb/CAMELs/summa_camels/.pysumma 
    file_manager = folders+'/file_manager_metsim_' + v +'.txt'
    camels = ps.Distributed(executable, file_manager, num_workers=NCORES, chunk_size=8, client=client)   
    camels.run('local')
    #all_status = [(n, s.status) for n, s in camels.simulations.items()] # uncomment to inspect each simulation's status if there are errors
    all_ds = [s.output.load() for n, s in camels.simulations.items()] # load the outputs into memory so later steps are faster
    hru_vars = [] # variables that have hru dimension
    gru_vars = [] # variables that have gru dimension
    for ds in all_ds:
        for name, var in ds.variables.items():
            if 'hru' in var.dims:
                hru_vars.append(name)
            elif 'gru' in var.dims:
                gru_vars.append(name)
    hru_ds = [ds[hru_vars] for ds in all_ds]
    gru_ds = [ds[gru_vars] for ds in all_ds]
    hru_merged = xr.concat(hru_ds, dim='hru')
    gru_merged = xr.concat(gru_ds, dim='gru')
    hru_merged.to_netcdf(folders+'/output/merged_day/NLDASmetsim_' + v +'_hru.nc')
    gru_merged.to_netcdf(folders+'/output/merged_day/NLDASmetsim_' + v +'_gru.nc')
    del camels
    del all_ds 
    del hru_merged
    del gru_merged
    print(v)

airpres
airtemp
LWRadAtm
pptrate
spechum
SWRadAtm
windspd
all
CPU times: user 8min 56s, sys: 2min 54s, total: 11min 51s
Wall time: 1h 9min 49s
