# Running the SUMMA setups on HPC
Here, we run SUMMA using pysumma, on the setups we created in the previous notebook.
This is the HPC version of the notebook and only runs the full problem, the most complex problem.
The last section of the notebook computes summary statistics of the output to be used in the next notebook. In this HPC notebook, you cannot see the summary statistics calculations as the code is on the HPC system.

The complexity choice is
 - 3)   `lhs_config_prob = 1`: 8 different configurations with the default and the exploration of the parameter space.


Less complex choices are available only in the local (non-HPC) version of the notebook, and not this notebook. These are, in order of increasing complexity: 
 - 1)   `default_prob = 1`: the "default" configuration with the "default" parameters. By "default" we mean whatever you chose in the summa setup files. 
 - 2a) `lhs_prob = 1`: the default configuration with exploration of the parameter space.
 - 2b) `config_prob = 1`: the default parameters with 8 different configurations (choices that have been seen to affect the model output in previous research) 
 

Eight iterations of each loop are run for each problem, to cover a truth run and the 7 forcings each held to a daily constant in turn.
 
You can make the problem run for fewer years to lower computational costs by changing `the_start` and `the_end`. You can test with as little as a day between `the_start` and `the_end`, but if you want to make the plots in this notebook and run the next notebook you will need >1 year (1 year is considered the initialization period). 

<br>

### Make problem complexity choices here:
 
The only complexity choice you can make is to run a different length time-period. It is pre-populated to run 6 years as in the paper. You also need to choose how long the initialization period is for the error calculations. We suggest 183 to 365 days, with at least 1 more year of simulation. So for example, if you run 18 months of simulation you should choose your initialization to be 183 days.


In [1]:
default_prob = 0    #this should be 0 in this HPC notebook
lhs_prob = 0        #this should be 0 in this HPC notebook
config_prob = 0     #this should be 0 in this HPC notebook
lhs_config_prob = 1 #this should be 1 in this HPC notebook

In [2]:
the_start = '1990-10-01 00:00' #pre-populated to '1990-10-01 00:00' as in the paper.
the_end =   '1996-09-30 23:00' #pre-populated to '1996-09-30 23:00' as in the paper.
initialization_days = 365 #pre-populated to 365 days as in the paper.
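
As a quick sanity check on the choices above, the sketch below counts the simulated days and confirms that at least one year remains after the initialization period. The helper name `check_simulation_period` and its `min_eval_days` argument are hypothetical and not part of the notebook or pysumma; the count assumes hourly time steps, with the final step included.

```python
from datetime import datetime, timedelta

def check_simulation_period(start, end, init_days, min_eval_days=365,
                            fmt="%Y-%m-%d %H:%M"):
    """Return (total_days, eval_days, ok) for an hourly simulation window."""
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    # Count hourly steps, including the final one (assumes hourly forcing).
    n_steps = int((t1 - t0) / timedelta(hours=1)) + 1
    total_days = n_steps / 24
    # Days left for the error calculations once initialization is discarded.
    eval_days = total_days - init_days
    return total_days, eval_days, eval_days >= min_eval_days

# The values pre-populated above, as in the paper.
total, evaluated, ok = check_simulation_period(
    "1990-10-01 00:00", "1996-09-30 23:00", 365)
print(total, evaluated, ok)
```

The 18-month example from the text (183 days of initialization) passes the same check with exactly one year left over.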

<br>
Check that we loaded the correct environment. This should show pysumma version 3.0.

In [3]:
conda list summa

# packages in environment at /data/cigi/cjw-easybuild/conda/pysumma-2021-06:
#
# Name                    Version                   Build  Channel
pysumma                   3.0.3                    pypi_0    pypi

Note: you may need to restart the kernel to use updated packages.


<br>
Load the imports.

In [4]:
import numpy as np
import matplotlib.pyplot as plt
import pysumma as ps
import xarray as xr
import pandas as pd
import gc
import os

<br>
Set up the paths and regionalize the paths in the configuration files that SUMMA will use.

In [5]:
top_folder = os.path.join(os.getcwd(), 'summa_camels')
settings_folder = os.path.join(top_folder, 'settings')
ps_working = os.path.join(top_folder, '.pysumma')
regress_folder = os.path.join(os.getcwd(), 'regress_data')

In [6]:
! cd {top_folder}; chmod +x installTestCases_local.sh; ./installTestCases_local.sh

In [7]:
# get number of HRUs
attrib = xr.open_dataset(settings_folder+'/attributes.nc')
the_hru = np.array(attrib['hruId'])

<br>

# Setup HPC

This code sets up the HPC run by setting the executable, writing the start and end times to the file manager, and setting up workspaces on the HPC.

In [8]:
#define executable
executable =  '/usr/bin/summa.exe'

In [9]:
def replace_line_startwith(lines, flag, new_line_replacement):
    for i in range(len(lines)):
        if lines[i].startswith(flag):
            print(lines[i])
            lines[i] = new_line_replacement
    return lines
template_filemanager = os.path.join(settings_folder, "template_file_manager.1hr.txt")
print(template_filemanager)
# read the template, swap in the simulation start and end times, and write it back
with open(template_filemanager, "r") as f:
    lines = f.readlines()
new_lines1 = replace_line_startwith(lines, "simStartTime", "simStartTime         '{the_start}' !\n".format(the_start=the_start))
new_lines2 = replace_line_startwith(new_lines1, "simEndTime", "simEndTime           '{the_end}' !\n".format(the_end=the_end))
with open(template_filemanager, "w") as f:
    f.writelines(new_lines2)

/home/jovyan/work/summa_analyze_forcing_disagg/hpc/summa_camels/settings/template_file_manager.1hr.txt
simStartTime         '1990-10-01 00:00' !

simEndTime           '1996-09-30 23:00' !



In [10]:
import json
class NumpyEncoder(json.JSONEncoder):
    """
    Credit:
    https://stackoverflow.com/questions/26646362/numpy-array-is-not-json-serializable
    """
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, np.floating):
            return float(obj)
        if isinstance(obj, np.integer):
            return int(obj)
        return json.JSONEncoder.default(self, obj)

In [11]:
import tempfile
import shutil, os
!rm -rf summa_camels.zip
workspace_dir = os.path.join(os.getcwd(), 'workspace')
!mkdir -p {workspace_dir}
unzip_dir = tempfile.mkdtemp(dir=workspace_dir)
model_folder_name = "summa_camels"
model_folder = os.path.join(unzip_dir, model_folder_name)
shutil.make_archive(model_folder_name, 'zip', os.getcwd()+"/summa_camels")

'/home/jovyan/work/summa_analyze_forcing_disagg/hpc/summa_camels.zip'

<br>

# Exploring the Parameter Calibration Space with a Latin Hypercube

Here we make parameter sets selected by using a Latin Hypercube to get 10 different parameter sets for every HRU, in order to explore the calibration space. The default parameter space is still included. This exploration will show us if the results of forcing importance could change after calibration. 

We change only the parameters that are usually calibrated. You can remove parameters if you were not planning to ever calibrate them away from their defaults (likewise you could add parameters).

The absolute minimums and maximums will break simulations and zero out variables, so we do not use them; instead, we stay 5% away from the extremes. There are also some constraints on the parameters that must be followed:

* heightCanopyTop   > heightCanopyBottom
* critSoilTranspire > theta_res
* theta_sat         > critSoilTranspire
* fieldCapacity     > theta_res
* theta_sat         > fieldCapacity
* theta_sat         > theta_res
* critSoilTranspire > critSoilWilting
* critSoilWilting   > theta_res
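
The constraints listed above can be checked for any candidate parameter set with a few lines of Python. This is a minimal sketch: the function name `violated_constraints` and the dict layout (one value per parameter, for a single HRU) are assumptions made for illustration, and the numbers in the example are made up rather than taken from the SUMMA setup files.

```python
def violated_constraints(p):
    """Return the constraints from the list above that `p` breaks.

    `p` maps parameter name -> value for one HRU (hypothetical layout).
    """
    # (larger, smaller) pairs: each must satisfy p[larger] > p[smaller]
    pairs = [
        ("heightCanopyTop", "heightCanopyBottom"),
        ("critSoilTranspire", "theta_res"),
        ("theta_sat", "critSoilTranspire"),
        ("fieldCapacity", "theta_res"),
        ("theta_sat", "fieldCapacity"),
        ("theta_sat", "theta_res"),
        ("critSoilTranspire", "critSoilWilting"),
        ("critSoilWilting", "theta_res"),
    ]
    return [f"{hi} > {lo}" for hi, lo in pairs if not p[hi] > p[lo]]

# Illustrative values only (not the defaults of any real basin).
good = {"heightCanopyTop": 20.0, "heightCanopyBottom": 2.0,
        "theta_res": 0.05, "critSoilWilting": 0.10,
        "critSoilTranspire": 0.20, "fieldCapacity": 0.25, "theta_sat": 0.55}
bad = dict(good, theta_sat=0.18)  # saturation below two other soil thresholds

print(violated_constraints(good))
print(violated_constraints(bad))
```

An empty list means the set is admissible; otherwise the returned strings name the inequalities to repair before sampling.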

In [12]:
if lhs_prob==1 or lhs_config_prob==1: from pyDOE import lhs

In [13]:
if lhs_prob==1 or lhs_config_prob==1:
    file_manager = settings_folder+'/file_manager_truth.txt'
    s = ps.Simulation(executable, file_manager)
    s.manager['simStartTime'] = str(the_start)
    s.manager['simEndTime'] = str(the_end)  
    #Before running the ensemble that changes parameters we must write the original simulation's parameters.
    s.manager.write()

In [14]:
# print the default, min, and max as in /settings.v1/localParamInfo.txt and /settings.v1/basinParamInfo.txt
if lhs_prob==1 or lhs_config_prob==1:
    param_calib_hru = ['albedoRefresh', 'aquiferBaseflowExp', 'aquiferBaseflowRate', 'frozenPrecipMultip', 'heightCanopyBottom','heightCanopyTop', 'k_macropore', 
                   'k_soil', 'qSurfScale', 'summerLAI', 'tempCritRain', 'theta_sat', 'windReductionParam'] 
    param_calib_gru = ['routingGammaScale', 'routingGammaShape']

    for k in param_calib_hru:
        print(s.global_hru_params[k])
    for k in param_calib_gru:
        print(s.global_gru_params[k]) 

albedoRefresh             |       1.0000 |       1.0000 |      10.0000
aquiferBaseflowExp        |       2.0000 |       1.0000 |      10.0000
aquiferBaseflowRate       |       0.1000 |       0.0000 |       0.1000
frozenPrecipMultip        |       1.0000 |       0.5000 |       1.5000
heightCanopyBottom        |       2.0000 |       0.0000 |       5.0000
heightCanopyTop           |      20.0000 |       0.0500 |     100.0000
k_macropore               |       0.0010 |       1.0d-7 |       0.1000
k_soil                    |       7.5d-6 |       1.0d-7 |       1.0d-5
qSurfScale                |      50.0000 |       1.0000 |     100.0000
summerLAI                 |       3.0000 |       0.0100 |      10.0000
tempCritRain              |     273.1600 |     272.1600 |     274.1600
theta_sat                 |       0.5500 |       0.3000 |       0.6000
windReductionParam        |       0.2800 |       0.0000 |       1.0000
routingGammaScale         |       2.0d+4 |       1.0000 |       1.0d+5
routin

In [15]:
if lhs_prob==1 or lhs_config_prob==1:
    bounds_hru = np.full((len(param_calib_hru),3),1.0)
    bounds_gru = np.full((len(param_calib_gru),3),1.0)
    for i,k in enumerate(param_calib_hru): bounds_hru[i,]= s.global_hru_params.get_value(k)[0:3]
    for i,k in enumerate(param_calib_gru): bounds_gru[i,]= s.global_gru_params.get_value(k)[0:3]

In [16]:
# Define bounds and expand to size of LHS runs
if lhs_prob==1 or lhs_config_prob==1:
    numl = 10
    num_vars =  len(param_calib_hru) + len(param_calib_gru)
    names =  param_calib_hru + param_calib_gru
    bounds =  np.concatenate((bounds_hru, bounds_gru), axis=0)
    par_def = dict(zip(names, np.transpose(np.tile(bounds[:,0],(len(the_hru),1))) ))
    par_min = dict(zip(names, np.transpose(np.tile(bounds[:,1],(numl*len(the_hru),1))) ))
    par_max = dict(zip(names, np.transpose(np.tile(bounds[:,2],(numl*len(the_hru),1))) ))

<br>
We remove geographically distributed parameters from the default set, and then make the LHS parameter set plus default from the above bounds. Set 0 will be the default parameter set.

In [17]:
# remove geographically distributed parameters from default set
if lhs_prob==1 or lhs_config_prob==1:
    distributed_val = par_def.pop('heightCanopyBottom')
    distributed_val = par_def.pop('heightCanopyTop')
    distributed_val = par_def.pop('k_soil')
    distributed_val = par_def.pop('theta_sat')

In [18]:
# Make sure to obey parameter constraints
if lhs_prob==1 or lhs_config_prob==1:
    param = xr.open_dataset(settings_folder+'/parameters.nc')

    for i,h in enumerate(the_hru):
        lb_theta_sat = max(param[['critSoilTranspire','fieldCapacity','theta_res']].isel(hru=i).values()).values
        for j in range(0,numl): #say first numl belong to hru 0, second numl to hru 1, and so on
            if (par_min['theta_sat'][j + i*numl]<lb_theta_sat): par_min['theta_sat'][j + i*numl]=lb_theta_sat

    par_min['heightCanopyTop'] = par_max['heightCanopyBottom']

In [19]:
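# Add a buffer: buff is 5% of each parameter's range
# NOTE: as written, the bounds move in by 0.05*buff (0.25% of the range); drop the extra 0.05 factor for a full 5% buffer
if lhs_prob==1 or lhs_config_prob==1:
    buff = {key: (par_max.get(key) - par_min.get(key))*0.05 for key in set(par_max) }
    par_min = {key: par_min.get(key) + buff.get(key)*0.05 for key in set(buff) }
    par_max = {key: par_max.get(key) - buff.get(key)*0.05 for key in set(buff) }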

In [20]:
#Generate samples with Latin Hypercube Sampling, set seed by HRU ID so it is the same every time run
if lhs_prob==1 or lhs_config_prob==1:
    lhd = np.empty(shape=(num_vars,numl*len(the_hru)))
    for i, h in enumerate(the_hru):
        np.random.seed(h) #if the hru ID is not a number this will not work
        lhd[:,range(i*numl,(i+1)*numl)] = lhs(numl, samples=num_vars)
    lhd = dict(zip(names,lhd))
    samples = {key: par_min.get(key) + lhd.get(key)*(par_max.get(key) - par_min.get(key)) for key in set(par_max) }
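
To illustrate what a Latin Hypercube design provides, here is a minimal NumPy-only sketch. It is not pyDOE's implementation and is not the code used for the paper; the function name `latin_hypercube` is hypothetical. Each parameter (row) gets exactly one value in each of `n_samples` equal-width strata of the unit interval, and the unit values are then scaled to bounds in the same way as the `samples` line in the cell above. The three bound pairs are the `frozenPrecipMultip`, `tempCritRain`, and `qSurfScale` limits printed earlier.

```python
import numpy as np

def latin_hypercube(n_samples, n_params, rng):
    """Unit-cube Latin hypercube with shape (n_params, n_samples)."""
    # One random point inside each of n_samples equal-width strata ...
    strata = (np.arange(n_samples) + rng.random((n_params, n_samples))) / n_samples
    # ... then shuffle the strata independently for every parameter (row).
    order = rng.random((n_params, n_samples)).argsort(axis=1)
    return np.take_along_axis(strata, order, axis=1)

rng = np.random.default_rng(42)  # fixed seed so the design is repeatable
unit = latin_hypercube(n_samples=10, n_params=3, rng=rng)

# Scale from [0, 1) to the parameter bounds: min + u * (max - min).
lo = np.array([0.5, 272.16, 1.0])[:, None]
hi = np.array([1.5, 274.16, 100.0])[:, None]
samples = lo + unit * (hi - lo)
print(samples.shape)
```

Seeding a fresh generator per HRU, as the notebook does with `np.random.seed(h)`, would make each HRU's design reproducible in the same way.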

In [21]:
# make parameter sets
if lhs_prob==1 or lhs_config_prob==1:
    latin = {}
    latin[str(0)] = {'trial_parameters': {key: par_def.get(key) for key in set(par_def) }}
    for j in range(0,numl):
            latin[str(j+1)] = {'trial_parameters': {key: samples.get(key)[np.arange(j, len(the_hru)*numl, numl)] for key in set(samples) }}
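
The indexing in the cell above relies on the flat layout in which the first `numl` samples belong to HRU 0, the next `numl` to HRU 1, and so on. The small sketch below uses stand-in integers in place of real parameter values (and 4 HRUs, an assumption matching the four-element lists in the printed output) to show that `np.arange(j, n_hru*numl, numl)` selects sample `j` of every HRU.

```python
import numpy as np

n_hru, numl = 4, 10
flat = np.arange(n_hru * numl)  # stand-in for one parameter's flat sample array

for j in range(numl):
    idx = np.arange(j, n_hru * numl, numl)  # sample j of every HRU
    # Same positions as j + i*numl for HRU i = 0, 1, ..., n_hru-1
    assert (idx == j + np.arange(n_hru) * numl).all()

# Equivalent view: reshape to (HRU, sample) and transpose to (sample, HRU).
per_set = flat.reshape(n_hru, numl).T
print(per_set[3])
```

Row `j` of `per_set` holds the values that parameter set `j+1` assigns to the HRUs, one per HRU.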

<br>

# Manipulating the Configuration of the pysumma Objects

We need to run the parameter space with other model configurations, to see if the results seen on the default configuration hold true across the parameter space. The new configurations follow the exploration of [this paper](https://doi.org/10.1002/2015WR017200).

Clark, M.P., Nijssen, B., Lundquist, J.D., Kavetski, D., Rupp, D.E., Woods, R.A., Freer, J.E., Gutmann, E.D., Wood, A.W., Gochis, D.J. and Rasmussen, R.M., 2015. A unified approach for process‐based hydrologic modeling: 2. Model implementation and case studies. Water Resources Research, 51(4), pp.2515-2542.

Of the model configurations discussed in this paper, the decisions that made the most difference are:

 - `groundwatr` choice of groundwater parameterization as:
   - `qTopmodl` the topmodel parameterization (note must set hc_profile = pow_prof and bcLowrSoiH = zeroFlux)
   - `bigBuckt` a big bucket (lumped aquifer model) in between the other two choices for complexity
   - `noXplict` no explicit groundwater parameterization
 - `stomResist` choice of function for stomatal resistance as:
   - `BallBerry` Ball-Berry (1987) parameterization of physiological factors controlling transpiration
   - `Jarvis` Jarvis (1976) parameterization of physiological factors controlling transpiration
   - `simpleResistance` parameterized solely as a function of soil moisture limitations
 - `snowIncept` choice of parameterization for snow interception as:
   - `stickySnow` maximum interception capacity is an increasing function of temperature
   - `lightSnow` maximum interception capacity is an inverse function of new snow density
 - `windPrfile` choice of wind profile as:
   - `exponential` an exponential wind profile extends to the surface
   - `logBelowCanopy` a logarithmic profile below the vegetation canopy

Choices `bigBuckt`, `BallBerry`, `lightSnow`, and `logBelowCanopy` are the defaults that we have run already (see the decisions printed by the next code cell). The paper showed the choice of `groundwatr` affecting the timing of runoff and the magnitude of evapotranspiration, `stomResist` affecting the timing and magnitude of evapotranspiration, `snowIncept` affecting the magnitude of canopy interception of snow, and `windPrfile` affecting the timing and magnitude of SWE and of latent and sensible heat. We will not explore the `groundwatr` configurations here, as the differences in most simulated variables show up only post-calibration, and this study does not examine models calibrated for every setup of forcing configurations. Note, if you want to look at `qTopmodl`, you must set `bcLowrSoiH` to `zeroFlux` (we will leave it at `drainage`) and `hc_profile` to `pow_prof` (we will leave it at `constant`). You can add other configuration choices here; this notebook and the next notebook will still work properly, but the computations will take longer.

We make use of pysumma `Simulation` objects and the `Ensemble` class, to set up suites of different model decisions (and add to this the different sets of parameters in the next section).

In [22]:
if config_prob==1 or lhs_config_prob==1:
    file_manager = settings_folder+'/file_manager_truth.txt'
    s = ps.Simulation(executable, file_manager)
    s.manager['simStartTime'] = str(the_start)
    s.manager['simEndTime'] = str(the_end)  
    #Before running the ensemble that changes configuration we must write the original simulation's configuration.
    s.manager.write()    
    print(s.decisions)

soilCatTbl    STAS                 ! soil-category dataset
vegeParTbl    MODIFIED_IGBP_MODIS_NOAH ! vegetation-category dataset
soilStress    NoahType             ! choice of function for the soil moisture control on stomatal resistance
stomResist    BallBerry            ! choice of function for stomatal resistance
num_method    itertive             ! choice of numerical method
fDerivMeth    analytic             ! choice of method to calculate flux derivatives
LAI_method    specified            ! choice of method to determine LAI and SAI
f_Richards    mixdform             ! form of Richards equation
groundwatr    bigBuckt             ! choice of groundwater parameterization
hc_profile    constant             ! choice of hydraulic conductivity profile
bcUpprTdyn    nrg_flux             ! type of upper boundary condition for thermodynamics
bcLowrTdyn    zeroFlux             ! type of lower boundary condition for thermodynamics
bcUpprSoiH    liq_flux             ! type of upper boundary c

In [23]:
if config_prob==1 or lhs_config_prob==1:
    #alld = {'groundwatr':np.array(['qTopmodl','bigBuckt']),
    alld = {'stomResist':np.sort(np.array(['BallBerry','Jarvis'])),
            'snowIncept':np.sort(np.array(['stickySnow', 'lightSnow'])),
            'windPrfile':np.sort(np.array(['exponential','logBelowCanopy']))}
    config = ps.ensemble.decision_product(alld)

<br>
The ensemble uses `++` as a delimiter to create unique identifiers for each simulation in the ensemble. The default configuration will be run again. We do this so that each finished SUMMA *.nc output file is complete.
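
To make the naming scheme concrete, the sketch below builds every combination of the three decisions and joins the chosen options with `++`. It is an approximation written with `itertools` for illustration, not pysumma's own implementation; the function name `decision_product_sketch` is hypothetical, and the key format (leading and trailing `++`) is inferred from the run names printed later in this notebook.

```python
from itertools import product

def decision_product_sketch(options):
    """Map '++opt1++opt2++...++' -> {'decisions': {...}} for all combinations."""
    names = list(options)
    out = {}
    for combo in product(*(options[n] for n in names)):
        key = "++" + "++".join(combo) + "++"
        out[key] = {"decisions": dict(zip(names, combo))}
    return out

# Same options as the cell above, sorted alphabetically within each decision.
options = {
    "stomResist": sorted(["BallBerry", "Jarvis"]),
    "snowIncept": sorted(["stickySnow", "lightSnow"]),
    "windPrfile": sorted(["exponential", "logBelowCanopy"]),
}
ensemble = decision_product_sketch(options)
print(len(ensemble))
print(list(ensemble)[0])
```

Appending a parameter-set number to such a key gives identifiers of the form `++BallBerry++lightSnow++exponential++0`.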

<br>

# Run the Full Problem

Now we can use the following code to make the full problem, exploring the parameter space and the configurations together. This is the problem that is run on the HPC.
The next notebook in the series runs the most complete figures using this full problem.
First, we combine the decision sets. 

In [24]:
# make ensembles with parameter space (numl parameter sets plus 1 for default), should make 88
if lhs_config_prob==1:
    config_latin = {}
    for key_config in config.keys():
        c = config[key_config]
        for key_latin in latin.keys():
            l = latin[key_latin]
            config_latin[key_config+key_latin] = {**c,**l}
    print(len(config_latin))

88


In [25]:
config_latin1 = json.dumps(config_latin, cls=NumpyEncoder)
config_latin = json.loads(config_latin1)
list(config_latin.items())[:2]

[('++BallBerry++lightSnow++exponential++0',
  {'decisions': {'stomResist': 'BallBerry',
    'snowIncept': 'lightSnow',
    'windPrfile': 'exponential'},
   'trial_parameters': {'routingGammaScale': [20000.0,
     20000.0,
     20000.0,
     20000.0],
    'frozenPrecipMultip': [1.0, 1.0, 1.0, 1.0],
    'routingGammaShape': [2.5, 2.5, 2.5, 2.5],
    'tempCritRain': [273.16, 273.16, 273.16, 273.16],
    'aquiferBaseflowExp': [2.0, 2.0, 2.0, 2.0],
    'k_macropore': [0.001, 0.001, 0.001, 0.001],
    'qSurfScale': [50.0, 50.0, 50.0, 50.0],
    'windReductionParam': [0.28, 0.28, 0.28, 0.28],
    'aquiferBaseflowRate': [0.1, 0.1, 0.1, 0.1],
    'summerLAI': [3.0, 3.0, 3.0, 3.0],
    'albedoRefresh': [1.0, 1.0, 1.0, 1.0]}}),
 ('++BallBerry++lightSnow++exponential++1',
  {'decisions': {'stomResist': 'BallBerry',
    'snowIncept': 'lightSnow',
    'windPrfile': 'exponential'},
   'trial_parameters': {'routingGammaScale': [91789.26975278289,
     96797.28897945539,
     37295.97265782523,
     84

<br>

### HPC setup

In [26]:
constant_vars= ["truth",
                'constant_airpres','constant_airtemp','constant_LWRadAtm',
                'constant_pptrate','constant_spechum','constant_SWRadAtm',
                'constant_windspd', ]
config = {}
idx = 0 
for i, v in enumerate(constant_vars):
    if v == "truth":
        key = v
    else:
        key = 'run'+str(idx)
        idx+=1
    value = {'file_manager': '<PWD>/settings/file_manager_' + v +'.txt'}
    config[key] = value
config

{'truth': {'file_manager': '<PWD>/settings/file_manager_truth.txt'},
 'run0': {'file_manager': '<PWD>/settings/file_manager_constant_airpres.txt'},
 'run1': {'file_manager': '<PWD>/settings/file_manager_constant_airtemp.txt'},
 'run2': {'file_manager': '<PWD>/settings/file_manager_constant_LWRadAtm.txt'},
 'run3': {'file_manager': '<PWD>/settings/file_manager_constant_pptrate.txt'},
 'run4': {'file_manager': '<PWD>/settings/file_manager_constant_spechum.txt'},
 'run5': {'file_manager': '<PWD>/settings/file_manager_constant_SWRadAtm.txt'},
 'run6': {'file_manager': '<PWD>/settings/file_manager_constant_windspd.txt'}}

In [27]:
config_3 = {}
for k, v in config.items():
    for k2, v2 in config_latin.items():
        v_copy = v.copy()
        v_copy.update(v2)
        config_3["{}_{}".format(k, k2)] = v_copy
len(config_3)

704
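
The run count can be verified by hand. The short sketch below multiplies out the pieces described in this notebook; the variable names are introduced here for illustration only.

```python
n_forcing_runs = 8        # truth + 7 forcings, each held to a daily constant in turn
n_decisions = 2 * 2 * 2   # stomResist x snowIncept x windPrfile options
n_param_sets = 10 + 1     # numl Latin Hypercube sets + the default (set 0)

n_config_latin = n_decisions * n_param_sets   # configurations x parameter sets
n_total = n_forcing_runs * n_config_latin     # one SUMMA run per combination
print(n_config_latin, n_total)  # 88 704
```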

In [28]:
config_3 = json.dumps(config_3, cls=NumpyEncoder)
config_3 = json.loads(config_3)
list(config_3.items())[:1]

[('truth_++BallBerry++lightSnow++exponential++0',
  {'file_manager': '<PWD>/settings/file_manager_truth.txt',
   'decisions': {'stomResist': 'BallBerry',
    'snowIncept': 'lightSnow',
    'windPrfile': 'exponential'},
   'trial_parameters': {'routingGammaScale': [20000.0,
     20000.0,
     20000.0,
     20000.0],
    'frozenPrecipMultip': [1.0, 1.0, 1.0, 1.0],
    'routingGammaShape': [2.5, 2.5, 2.5, 2.5],
    'tempCritRain': [273.16, 273.16, 273.16, 273.16],
    'aquiferBaseflowExp': [2.0, 2.0, 2.0, 2.0],
    'k_macropore': [0.001, 0.001, 0.001, 0.001],
    'qSurfScale': [50.0, 50.0, 50.0, 50.0],
    'windReductionParam': [0.28, 0.28, 0.28, 0.28],
    'aquiferBaseflowRate': [0.1, 0.1, 0.1, 0.1],
    'summerLAI': [3.0, 3.0, 3.0, 3.0],
    'albedoRefresh': [1.0, 1.0, 1.0, 1.0]}})]

In [29]:
list(config_3.items())[-1]

('run6_++Jarvis++stickySnow++logBelowCanopy++10',
 {'file_manager': '<PWD>/settings/file_manager_constant_windspd.txt',
  'decisions': {'stomResist': 'Jarvis',
   'snowIncept': 'stickySnow',
   'windPrfile': 'logBelowCanopy'},
  'trial_parameters': {'routingGammaScale': [83207.19877638166,
    49491.07921518387,
    20889.472883106533,
    92847.62412584186],
   'frozenPrecipMultip': [1.4941952514849934,
    1.2748486107760686,
    0.5888941748010913,
    0.7699937851116837],
   'routingGammaShape': [2.4588535985052156,
    2.8862559247192694,
    2.3650149949138237,
    2.6095845423808197],
   'tempCritRain': [272.4013352030601,
    273.2630183365836,
    273.60907198223714,
    272.4881048005051],
   'aquiferBaseflowExp': [3.3185877636321988,
    4.103108209046647,
    8.969392915868646,
    5.342838448673438],
   'k_macropore': [0.07614168184790411,
    0.022773508289630037,
    0.046074219476830214,
    0.059670613437985257],
   'qSurfScale': [37.84599322964796,
    67.090573809382

In [30]:
if lhs_config_prob==1:
    import tempfile
    import shutil, os
    workspace_dir = os.path.join(os.getcwd(), 'workspace')
    !mkdir -p {workspace_dir}
    unzip_dir = tempfile.mkdtemp(dir=workspace_dir)
    model_folder_name = "summa_camels"
    model_folder = os.path.join(unzip_dir, model_folder_name)
    !unzip -o {model_folder_name}.zip -d {model_folder}
    !rm -rf {model_folder}/output
    !mkdir {model_folder}/output
    !mkdir {model_folder}/output/constant
    !mkdir {model_folder}/output/merged_day
    !mkdir {model_folder}/output/truth
    !mkdir {model_folder}/output/regress_data
    with open(os.path.join(model_folder, "output/regress_data/regress_param.json"), 'w') as f:
        json.dump({"initialization_days": initialization_days}, f)

Archive:  summa_camels.zip
   creating: /home/jovyan/work/summa_analyze_forcing_disagg/hpc/workspace/tmpcd_epm03/summa_camels/data/
   creating: /home/jovyan/work/summa_analyze_forcing_disagg/hpc/workspace/tmpcd_epm03/summa_camels/output/
   creating: /home/jovyan/work/summa_analyze_forcing_disagg/hpc/workspace/tmpcd_epm03/summa_camels/settings/
  inflating: /home/jovyan/work/summa_analyze_forcing_disagg/hpc/workspace/tmpcd_epm03/summa_camels/.DS_Store  
  inflating: /home/jovyan/work/summa_analyze_forcing_disagg/hpc/workspace/tmpcd_epm03/summa_camels/installTestCases_local.sh  
  inflating: /home/jovyan/work/summa_analyze_forcing_disagg/hpc/workspace/tmpcd_epm03/summa_camels/README  
   creating: /home/jovyan/work/summa_analyze_forcing_disagg/hpc/workspace/tmpcd_epm03/summa_camels/data/1hr/
   creating: /home/jovyan/work/summa_analyze_forcing_disagg/hpc/workspace/tmpcd_epm03/summa_camels/data/constant/
   creating: /home/jovyan/work/summa_analyze_forcing_disagg/hpc/workspace/tmpcd_epm

In [31]:
if lhs_config_prob==1:
    import json
    with open(os.path.join(model_folder, 'summa_options.json'), 'w') as outfile:
        json.dump(config_3, outfile)

    # check ensemble parameters    
    print("Number of ensemble runs: {}".format(len(config_3)))
    print(json.dumps(config_3, indent=4, sort_keys=True)[:800])
    print("...")

Number of ensemble runs: 704
{
    "run0_++BallBerry++lightSnow++exponential++0": {
        "decisions": {
            "snowIncept": "lightSnow",
            "stomResist": "BallBerry",
            "windPrfile": "exponential"
        },
        "file_manager": "<PWD>/settings/file_manager_constant_airpres.txt",
        "trial_parameters": {
            "albedoRefresh": [
                1.0,
                1.0,
                1.0,
                1.0
            ],
            "aquiferBaseflowExp": [
                2.0,
                2.0,
                2.0,
                2.0
            ],
            "aquiferBaseflowRate": [
                0.1,
                0.1,
                0.1,
                0.1
            ],
            "frozenPrecipMultip": [
                1.0,
                1.0,
           
...


<br>

### Submit model to HPC using New Job Submission Service Python Client

<div class="alert alert-block alert-info">
<b> Start HPC use  </b> (config_latin) config_latin SUMMA runs with each forcing </div>

In [32]:
if lhs_config_prob==1:
    from job_supervisor_client import *
    communitySummaSession = Session('summa', isJupyter=True)
    communitySummaJob = communitySummaSession.job() # create new job
    communitySummaJob.upload(model_folder)

In [33]:
if lhs_config_prob==1:
    communitySummaJob.submit(payload={
        "node": 256,
        "machine": "expanse",
        "file_manager_rel_path": "settings/file_manager_constant_airpres.txt"
    })

✅ job registered with ID: 1631202108zkAg


In [34]:
%%time
if lhs_config_prob==1:
    communitySummaJob.events(liveOutput=True)

📮Job ID: 1631202108zkAg
📍Destination: summa



types,message,time
JOB_QUEUED,"job [1631202108zkAg] is queued, waiting for registration",2021-09-09T15:41:47.799Z
JOB_REGISTERED,"job [1631202108zkAg] is registered with the supervisor, waiting for initialization",2021-09-09T15:41:49.622Z
SUMMA_HPC_CONNECTED,connected to HPC,2021-09-09T15:45:48.670Z
SUMMA_HPC_SUBMITTED,submitted SUMMA job to HPC,2021-09-09T15:45:48.670Z
JOB_INITIALIZED,initialized SUMMA job in HPC job queue with remote_id 5702626,2021-09-09T15:45:48.670Z
JOB_STATUS,RUNNING,2021-09-09T15:45:54.162Z
JOB_STATUS,RUNNING,2021-09-09T15:46:05.186Z
JOB_STATUS,RUNNING,2021-09-09T15:46:15.205Z
JOB_STATUS,RUNNING,2021-09-09T15:46:24.140Z
JOB_STATUS,RUNNING,2021-09-09T15:46:34.128Z


CPU times: user 40.4 s, sys: 2.52 s, total: 42.9 s
Wall time: 35min 24s


* When outputs are very large, the download is sometimes missing data. If that happens, re-execute the download cell below.

In [35]:
%%time
if lhs_config_prob==1:
    job_dir = os.path.join(model_folder, "{}".format(communitySummaJob.id))
    !mkdir -p {job_dir}/output
    communitySummaJob.download(job_dir)

file successfully downloaded under: /home/jovyan/work/summa_analyze_forcing_disagg/hpc/workspace/tmpcd_epm03/summa_camels/1631202108zkAg/1631202108zkAg.zip
CPU times: user 48.3 ms, sys: 21.2 ms, total: 69.5 ms
Wall time: 966 ms


In [36]:
!cd {job_dir} && unzip -o *.zip -d output

Archive:  1631202108zkAg.zip
   creating: output/regress_data/
  inflating: output/regress_data/error_data_configs_latin.nc  
  inflating: output/regress_data/regress_param.json  
  inflating: output/slurm-5702626.out  


In [37]:
error_path = os.path.join(job_dir, "output/regress_data/error_data_configs_latin.nc")
print(error_path)
xr.open_dataset(error_path)

/home/jovyan/work/summa_analyze_forcing_disagg/hpc/workspace/tmpcd_epm03/summa_camels/1631202108zkAg/output/regress_data/error_data_configs_latin.nc


In [38]:
! mkdir -p ./regress_data
! cp {error_path} ./regress_data

<div class="alert alert-block alert-info">
<b> Finish HPC use  </b> (config_latin) config_latin SUMMA runs with each forcing </div>

<br>

# Compute Error on Output
We calculate KGE statistics on the data. A KGE of 1 means perfect agreement, and a KGE <0 means the mean is a better guess. We use a modified KGE that avoids amplifying the ratio of the simulated mean to the truth mean when the truth mean is small, and avoids the dependence of the KGE metric on the units of measurement. Then, we scale the KGE so that it ranges from -1 to 1.
If the values are identical we use a KGE of 1.
We also keep summaries of the raw data (summed over time). 
This can take some time depending on how big a problem you ran. It takes about 1/100th of the time it took to run the whole problem. 
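
For readers who want to see the shape of such a calculation, here is a minimal sketch of one possible reading of the description above. It is an assumption, not the code that runs on the HPC: the bias term is taken as the difference of the means divided by the truth standard deviation (so it stays bounded when the truth mean is small and does not depend on units), and the scaling is taken as KGE / (2 - KGE), which maps the range (-inf, 1] onto (-1, 1]. The function name `scaled_modified_kge` is hypothetical.

```python
import numpy as np

def scaled_modified_kge(sim, obs):
    """Scaled, modified KGE of `sim` against the truth `obs` (sketch only)."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    if np.array_equal(sim, obs):
        return 1.0  # identical series: use a KGE of 1
    r = np.corrcoef(sim, obs)[0, 1]                      # linear correlation
    alpha = np.std(sim) / np.std(obs)                    # variability ratio
    beta = (np.mean(sim) - np.mean(obs)) / np.std(obs)   # assumed bias term
    kge = 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + beta ** 2)
    return kge / (2.0 - kge)                             # assumed scaling

obs = np.array([1.0, 2.0, 4.0, 3.0, 5.0])  # made-up truth series
print(scaled_modified_kge(obs, obs))
print(scaled_modified_kge(obs + 1.0, obs))
```

Multiplying both series by the same constant leaves this score unchanged, which is the unit independence the text refers to.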


All KGE code was moved over to the job submission service and executes on the HPC. The code is here:

[https://github.com/cybergis/Jupyter-xsede/blob/comet2expanse/cybergis/summa.py#L218](https://github.com/cybergis/Jupyter-xsede/blob/comet2expanse/cybergis/summa.py#L218)