# 01 - Automated Testing

*James McCreight, September 2022*

---

The pynhm automated testing is the basis for continuous integration (CI). Coupled with good coverage, CI allows for peace of mind and more rapid and robust code development. The tests themselves also provide a window into how to use the code base in many cases. 

However, the main reason to start with testing as the first notebook (after establishing the pynhm environment) is that the test data are used as input to many of the examples that will follow. This notebook gives a quick overview of generating the test data and running the pynhm tests but does not go into detail on the contents of the tests. 

Automated testing is typically performed on the command line, as shown here, though it can also be done from within python. This notebook is run with a python kernel in the pynhm_nb conda environment installed in notebook 00, and the command-line steps are executed through bash cell magics.

The automated testing uses pytest, a command-line executable with many options of its own. You can see the full listing of options by typing `pytest --help`. I will highlight here several that we will use below.


```
pytest --help

...
  --pdb                 start the interactive Python debugger on errors or KeyboardInterrupt.
  
...
  
  --capture=method      per-test capturing method: one of fd|sys|no|tee-sys.
  -s                    shortcut for --capture=no.
  
...
  
  -v, --verbose         increase verbosity.  
  
...

  -n numprocesses, --numprocesses=numprocesses
                        Shortcut for '--dist=load --tx=NUM*popen'. With 'auto', attempt to detect physical CPU
                        count. With 'logical', detect logical CPU count. If physical CPU count cannot be found,
                        falls back to logical count. This will be 0 when used with --pdb.
```

Pytest generally suppresses output to the terminal and keeps reporting to a minimum, on the assumption that every test typically passes. It reports which tests failed at the end of the run, and those can then be run individually with terminal output (`-s`), increased pytest verbosity (`-v`), and even interactive debugging (`--pdb`). The option to parallelize the tests (`-n=auto`) is helpful because it can dramatically reduce wait time.
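Since the tests can also be launched from within python, the flag combinations described above can be assembled programmatically. The following is a minimal sketch, not part of pynhm or pytest: `build_pytest_command` and `run_pytest` are hypothetical helpers, and any test ID passed to them is a placeholder. It calls pytest as `python -m pytest`, which is assumed to be acceptable here, so that the active environment's pytest is used.

```python
import subprocess
import sys


def build_pytest_command(target, show_output=False, verbose=False, debug=False, workers=None):
    """Assemble a pytest command line from the options highlighted above.

    Hypothetical helper for illustration; it is not part of pynhm or pytest.
    """
    # Invoke pytest through the current interpreter so the active environment is used.
    cmd = [sys.executable, "-m", "pytest"]
    if show_output:
        cmd.append("-s")  # shortcut for --capture=no
    if verbose:
        cmd.append("-v")
    if debug:
        cmd.append("--pdb")
    # Per the help text, numprocesses is 0 when used with --pdb, so skip -n when debugging.
    if workers is not None and not debug:
        cmd.extend(["-n", str(workers)])
    cmd.append(target)
    return cmd


def run_pytest(target, **kwargs):
    """Run pytest in a subprocess and return its exit code (0 means all tests passed)."""
    return subprocess.run(build_pytest_command(target, **kwargs)).returncode
```

For example, `build_pytest_command("some_test_file.py", workers="auto")` yields a parallel run, while passing `show_output=True, verbose=True, debug=True` builds the single-test debugging invocation. Note that `--pdb` needs an interactive terminal, so the debugging form is better suited to a shell than to a notebook cell.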

## Requirements: pynhm_nb virtual env
The pynhm_nb virtual environment was installed in notebook 00. You need this environment to proceed. __This notebook is to be run with a python kernel using the conda env: pynhm_nb.__ This means we'll pass python variables to bash cell magics below, which seemed to be the most portable solution (on Windows).


## pynhm_repo_root variable
Define the location of the pynhm repository. This should be the location you defined in notebook 00. 

In [1]:
pynhm_repo_root = '/Users/jamesmcc/usgs/pynhm'
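Because every later cell depends on this path, it can be worth checking it before continuing. The sketch below is one way to do so; `check_repo_root` is a hypothetical helper, and the two directory names it looks for (`test_data` and `autotest`) are simply the ones this notebook uses later, not an official description of the repository layout.

```python
import pathlib as pl


def check_repo_root(repo_root, required=("test_data", "autotest")):
    """Return the names of expected subdirectories missing from repo_root.

    Hypothetical helper: the directory names are the ones this notebook
    uses later, not an official pynhm layout check.
    """
    root = pl.Path(repo_root)
    return [name for name in required if not (root / name).is_dir()]
```

Calling `check_repo_root(pynhm_repo_root)` should return an empty list if the variable points at the repository; any names it returns indicate that the path is probably wrong.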

## Run PRMS to generate test answers and pynhm inputs

By default, the "tests" which run PRMS to generate answers and inputs for pynhm run for all 3 test domains. The options specific to a given conftest.py appear in the `pytest --help` output under "custom options", as will be shown later in this notebook (for the pynhm tests).

The three test domains have their basic data in these folders:

In [2]:
%%bash -s "$pynhm_repo_root"
pynhm_repo_root=$1
echo ${pynhm_repo_root}
for domain in hru_1 drb_2yr ucb_2yr; do
    ls -d ${pynhm_repo_root}/test_data/${domain}
    ls -C ${pynhm_repo_root}/test_data/${domain}
    echo
    echo
done


/Users/jamesmcc/usgs/pynhm/test_data/hru_1
cbh.nc		myparam.param	runtest.sh	tmax.cbh
control.test	output		scripts		tmax.nc
hru_1.yaml	prcp.cbh	sf_data		tmin.cbh
model.out	prcp.nc		soltab_debug	tmin.nc


/Users/jamesmcc/usgs/pynhm/test_data/drb_2yr
cbh.nc		model.out	prcp.cbh	sf_data		tmax.nc
control.test	myparam.param	prcp.nc		soltab_debug	tmin.cbh
drb_2yr.yaml	output		rhavg.cbh	tmax.cbh	tmin.nc


/Users/jamesmcc/usgs/pynhm/test_data/ucb_2yr
cbh.nc		output		runtest.sh	tmax.cbh	ucb_2yr.yaml
control.test	prcp.cbh	scripts		tmax.nc
model.out	prcp.nc		sf_data		tmin.cbh
myparam.param	rhavg.cbh	soltab_debug	tmin.nc




If your repository is not freshly cloned, your listing may differ from the one above because additional files will already have been generated (as we will generate below).

Note that on Windows, symlinks for control.test will be broken in the locations shown above (though somehow they are not broken in the CI Windows environment). To fix this, until we find a better solution, please run the following code block.

In [3]:
%%bash -s "$pynhm_repo_root"
pynhm_repo_root=$1

# *** ONLY NECESSARY TO RUN ON WINDOWS IF YOUR SYMLINKS ARE BROKEN ***
if [[ ! -L "${pynhm_repo_root}/test_data/hru_1/control.test" ]]; then
    cp ${pynhm_repo_root}/test_data/common/control.single_hru ${pynhm_repo_root}/test_data/hru_1/control.test
    cp ${pynhm_repo_root}/test_data/common/control.multi_hru ${pynhm_repo_root}/test_data/drb_2yr/control.test
    cp ${pynhm_repo_root}/test_data/common/control.multi_hru ${pynhm_repo_root}/test_data/ucb_2yr/control.test
fi
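The same repair can be sketched in python, which avoids relying on bash being available. This is an illustrative alternative, not the project's own fix: `restore_control_files` is a hypothetical helper, the file mapping is copied from the bash cell above, and, unlike the bash cell, it checks each domain's control.test individually rather than only the one in hru_1.

```python
import pathlib as pl
import shutil

# Mapping taken from the bash cell above: which common control file each domain uses.
CONTROL_SOURCES = {
    "hru_1": "control.single_hru",
    "drb_2yr": "control.multi_hru",
    "ucb_2yr": "control.multi_hru",
}


def restore_control_files(repo_root):
    """Copy the common control files over control.test where it is not a symlink.

    Hypothetical helper; returns the list of domains whose file was replaced.
    """
    test_data = pl.Path(repo_root) / "test_data"
    replaced = []
    for domain, source_name in CONTROL_SOURCES.items():
        target = test_data / domain / "control.test"
        if target.is_symlink():
            continue  # a symlink is left alone, mirroring the bash -L test
        shutil.copyfile(test_data / "common" / source_name, target)
        replaced.append(domain)
    return replaced
```

On a system where the symlinks are intact, `restore_control_files(pynhm_repo_root)` returns an empty list and changes nothing.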

The files listed above in each domain directory are the data needed to run PRMS in an NHM configuration on that domain. For the Delaware River Basin (drb_2yr) and the Upper Colorado Basin (ucb_2yr), the inputs cover 2 years. The inputs for hru_1 allow a 40-year run on a single HRU. More details about these domains will be provided in subsequent notebooks.

Now we will run PRMS for each of these domains and generate output in an `output/` subdirectory of each domain directory listed above. 

In [4]:
%%bash -s "$pynhm_repo_root"
pynhm_repo_root=$1

cd ${pynhm_repo_root}/test_data/scripts
pytest -n=auto test_run_domains.py

platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0
rootdir: /Users/jamesmcc/usgs/pynhm
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, env-0.6.2, cov-3.0.0
gw0 I / gw1 I / gw2 I / gw3 I / gw4 I / gw5 I
gw0 [4] / gw1 [4] / gw2 [4] / gw3 [4] / gw4 [4] / gw5 [4]

....                                                                     [100%]


## Convert PRMS outputs to netcdf

PRMS generates CSV output files. For example, for the DRB the file listing is:

In [5]:
%%bash -s "$pynhm_repo_root"
pynhm_repo_root=$1

ls -C ${pynhm_repo_root}/test_data/drb_2yr/output
echo "Number of files: $(ls -1 ${pynhm_repo_root}/test_data/drb_2yr/output | wc -l)"

albedo.csv		iso.csv			seginc_ssflow.csv
cap_infil_tot.csv	net_ppt.csv		slow_flow.csv
contrib_fraction.csv	net_rain.csv		slow_stor.csv
dprst_evap_hru.csv	net_snow.csv		snow_evap.csv
dprst_seep_hru.csv	newsnow.csv		snowcov_area.csv
dprst_sroff_hru.csv	orad_hru.csv		snowmelt.csv
dprst_stor_hru.csv	perv_actet.csv		soil_lower.csv
freeh2o.csv		pk_ice.csv		soil_lower_ratio.csv
gwres_flow.csv		pkwater_ante.csv	soil_moist.csv
gwres_in.csv		pkwater_equiv.csv	soil_moist_tot.csv
gwres_sink.csv		potet.csv		soil_rechr.csv
gwres_stor.csv		potet_lower.csv		soil_to_gw.csv
hru_actet.csv		potet_rechr.csv		soil_to_ssr.csv
hru_impervevap.csv	pptmix.csv		sroff.csv
hru_impervstor.csv	pptmix_nopack.csv	ssr_to_gw.csv
hru_intcpevap.csv	pref_flow.csv		ssres_flow.csv
hru_intcpstor.csv	pref_flow_infil.csv	ssres_in.csv
hru_lateral_flow.csv	pref_flow_stor.csv	ssres_stor.csv
hru_outflow.csv		prmx.csv		stats.csv
hru_ppt.csv		pst.csv			swrad.csv
hru_rain.csv		recharge.csv		tavgc.csv
hru_snow.csv		salb.csv		tavgf.csv
hr

In the next step, we convert these files to netcdf and also generate a handful of additional, derived files.

In [6]:
%%bash -s "$pynhm_repo_root"
pynhm_repo_root=$1

cd ${pynhm_repo_root}/test_data/scripts
# Currently, one test has a race condition that may occur when run in parallel. If it fails, just rerun for now.
pytest -n=auto test_nc_domains.py  

platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0
rootdir: /Users/jamesmcc/usgs/pynhm
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, env-0.6.2, cov-3.0.0
gw0 I / gw1 I / gw2 I / gw3 I / gw4 I / gw5 I
gw0 [307] / gw1 [307] / gw2 [307] / gw3 [307] / gw4 [307] / gw5 [307]

........................................................................ [ 23%]
........................................................................ [ 46%]
........................................................................ [ 70%]
........................................................................ [ 93%]
...................                                                      [100%]


In [7]:
%%bash -s "$pynhm_repo_root"
pynhm_repo_root=$1

cd ${pynhm_repo_root}/test_data/drb_2yr/output
ls -C *.nc
echo "Number of files: $(ls -1 ${pynhm_repo_root}/test_data/drb_2yr/output/*.nc | wc -l)"

albedo.nc		iso.nc			slow_stor_prev.nc
cap_infil_tot.nc	net_ppt.nc		snow_evap.nc
contrib_fraction.nc	net_rain.nc		snowcov_area.nc
dprst_evap_hru.nc	net_snow.nc		snowmelt.nc
dprst_seep_hru.nc	newsnow.nc		soil_lower.nc
dprst_sroff_hru.nc	orad_hru.nc		soil_lower_prev.nc
dprst_stor_hru.nc	perv_actet.nc		soil_lower_ratio.nc
dprst_stor_hru_prev.nc	pk_ice.nc		soil_moist.nc
freeh2o.nc		pk_ice_prev.nc		soil_moist_prev.nc
freeh2o_prev.nc		pkwater_ante.nc		soil_moist_tot.nc
gwres_flow.nc		pkwater_equiv.nc	soil_rechr.nc
gwres_flow_vol.nc	potet.nc		soil_rechr_prev.nc
gwres_in.nc		potet_lower.nc		soil_to_gw.nc
gwres_sink.nc		potet_rechr.nc		soil_to_ssr.nc
gwres_stor.nc		pptmix.nc		soltab_horad_potsw.nc
hru_actet.nc		pptmix_nopack.nc	soltab_potsw.nc
hru_impervevap.nc	pref_flow.nc		soltab_sunhrs.nc
hru_impervstor.nc	pref_flow_infil.nc	sroff.nc
hru_impervstor_prev.nc	pref_flow_stor.nc	sroff_vol.nc
hru_intcpevap.nc	pref_flow_stor_prev.nc	ssr_to_gw.nc
hru_intcpstor.nc	prmx.nc			ssres_flow.nc
hru_lateral_f
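The netcdf listing above includes files such as `gwres_flow_vol.nc`, `sroff_vol.nc`, and several `*_prev.nc` files that appear to have no counterpart in the CSV listing. One way to list such derived files is to compare file stems in the output directory. This is a minimal sketch; `derived_netcdf_names` is a hypothetical helper, and it assumes that a converted file keeps the stem of the CSV file it came from.

```python
import pathlib as pl


def derived_netcdf_names(output_dir):
    """Return sorted names of .nc files that have no same-named .csv file."""
    out = pl.Path(output_dir)
    csv_stems = {p.stem for p in out.glob("*.csv")}
    return sorted(p.name for p in out.glob("*.nc") if p.stem not in csv_stems)
```

For example, `derived_netcdf_names(f"{pynhm_repo_root}/test_data/drb_2yr/output")` would list the netcdf files in the DRB output directory that were not converted directly from a CSV file.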

These netcdf files are the results of running PRMS 5.2.1. These files are used for evaluating the results/simulations of pynhm and also as inputs to individual process models (e.g. PRMSRunoff) in pynhm. Netcdf files can be inspected on the command line with the ncdump utility. Though it's installed with the pynhm_nb environment, specifying the path to ncdump is a pain here. Instead, we'll display the datasets with xarray, which gives output very similar to `ncdump -h`. In the high-level metadata shown, note that the time durations and number of HRUs are evident by looking at the surface runoff variable (sroff) for each domain.

In [8]:
import pathlib as pl
import xarray as xr
from pynhm.constants import __pynhm_root__

for domain in ["hru_1", "drb_2yr", "ucb_2yr"]:
    print(domain)
    display(xr.open_dataset(pl.Path(f"{__pynhm_root__.parent}/test_data/{domain}/output/sroff.nc")))
    print()


hru_1



drb_2yr



ucb_2yr





## pynhm autotest
Now we can run the suite of pynhm tests, as we just generated all the answers and input data. This verifies that your pynhm code base and your virtual environment are copacetic (assuming the commit being tested passed CI). First, I will point out that `pytest --help` even returns options for the tests in the current directory under "custom options":

In [9]:
%%bash -s "$pynhm_repo_root"
pynhm_repo_root=$1

cd ${pynhm_repo_root}/autotest
pytest --help | grep -A7 "custom options"

custom options:
  --domain_yaml=DOMAIN_YAML
                        YAML file(s) for indiv domain tests. You can pass
                        multiples of this argument. Default value (not shown
                        here) is --domain_yaml=../test_data/drb_2yr/drb_2yr.yaml
  --print_ans           Print results and assert False for all domain tests
  --all_domains         Run all test domains



In [10]:
%%bash -s "$pynhm_repo_root"
pynhm_repo_root=$1

cd ${pynhm_repo_root}/autotest
pytest -n=auto --all_domains

platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0
rootdir: /Users/jamesmcc/usgs/pynhm
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, env-0.6.2, cov-3.0.0
gw0 I / gw1 I / gw2 I / gw3 I / gw4 I / gw5 I
gw0 [80] / gw1 [80] / gw2 [80] / gw3 [80] / gw4 [80] / gw5 [80]

...................................................................XxxxX [ 90%]
.X.X.X.X                                                                 [100%]
autotest/test_atmosphere.py::TestPRMSAtmosphere::test_init[drb_2yr]
autotest/test_atmosphere.py::TestPRMSAtmosphere::test_init[ucb_2yr]
autotest/test_atmosphere.py::TestPRMSAtmosphere::test_init[hru_1]
    warn(f"using tol = {tol} for variable {key}")

autotest/test_atmosphere.py::TestPRMSAtmosphere::test_init[drb_2yr]
autotest/test_atmosphere.py::TestPRMSAtmosphere::test_init[ucb_2yr]
autotest/test_atmosphere.py::TestPRMSAtmosphere::test_init[hru_1]
    warn(f"using tol = {tol} for variable {key}")

autotest/test_model.py::test_model[ucb_2yr-nhm]
autotest

We see that some tests are marked "x" for "expected failure". Some of these fail as expected (x) and some unexpectedly pass (X), because the expected failures typically apply to just one of the three domains. We also see generated warnings and the time taken.
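To make the progress characters concrete, the following sketch tallies them from progress lines. `tally_progress` is a hypothetical helper written for this illustration, and the two input strings are only the ends of the progress lines shown above, with the leading run of dots on the first line left out.

```python
from collections import Counter

# Meaning of the single-character outcomes pytest prints in its progress lines.
OUTCOME_NAMES = {
    ".": "passed",
    "F": "failed",
    "E": "error",
    "s": "skipped",
    "x": "xfailed",  # expected failure that did fail
    "X": "xpassed",  # expected failure that unexpectedly passed
}


def tally_progress(lines):
    """Count outcomes in pytest progress lines, ignoring the trailing '[ NN%]'."""
    counts = Counter()
    for line in lines:
        marks = line.split("[")[0].strip()
        counts.update(OUTCOME_NAMES.get(ch, "other") for ch in marks)
    return dict(counts)


# Ends of the two progress lines shown above (leading dots of the first line omitted).
print(tally_progress(["XxxxX [ 90%]", ".X.X.X.X [100%]"]))
```

For these fragments the tally is 3 xfailed and 6 xpassed, plus the 4 passing tests interleaved on the last line.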