# Preprocess PRMSAtmosphere.

This notebook demonstrates how to preprocess the atmospheric forcings used by the hydrology. If you are not varying parameters in `PRMSAtmosphere` over your model runs, this preprocessing can save considerable time. Certain workflows, like calibration, vary parameters in `PRMSAtmosphere` and therefore cannot preprocess its output.

Preprocessing `PRMSAtmosphere` is just one example of how you can preprocess all the inputs up to your process of interest, provided they do not vary in any way for your problem of interest. For example, the sub-models demonstrated in notebooks `01_multi-process_models.ipynb` and `02_prms_legacy_models.ipynb` have all their inputs preprocessed. But because `PRMSSolarGeometry` and `PRMSAtmosphere` behave slightly differently, this example of how to preprocess their outputs can be helpful.

This example further illustrates the flexible nature of how input data are handled by pywatershed. Below we'll run the atmosphere using an active `PRMSSolarGeometry` instance and also use the static output of `PRMSSolarGeometry` to drive `PRMSAtmosphere`. We'll start by preprocessing CBH files from PRMS-native format to NetCDF (as was previously demonstrated in `02_prms_legacy_models.ipynb`).

This notebook assumes you have an editable install of pywatershed (`pip install -e .` from the root of the cloned repository) so that the necessary domain information is available. See [this section of the DEVELOPER.md](https://github.com/EC-USGS/pywatershed/blob/develop/DEVELOPER.md#installing-pywatershed-in-development-mode) for additional details.

In [None]:
from copy import deepcopy
import pathlib as pl
from pprint import pprint
import shutil

import jupyter_black
import numpy as np
import pywatershed as pws
import xarray as xr

jupyter_black.load()  # auto-format the code in this notebook

pws_repo_root = pws.constants.__pywatershed_root__.parent

This is where we'll place all the output from this notebook. 

In [None]:
nb_output_dir = pl.Path("./04_preprocess_atm")
if nb_output_dir.exists():
    shutil.rmtree(nb_output_dir)
nb_output_dir.mkdir()

This notebook works with domains in the pywatershed repository; you can configure it for your own domains.

In [None]:
dom_name = "drb_2yr"
dom_dir = pws_repo_root / f"test_data/{dom_name}"
param_file = dom_dir / "myparam.param"
control_file = dom_dir / "nhm.control"
dis_file = dom_dir / "parameters_dis_hru.nc"

## Convert CBH files to NetCDF
For completeness, we'll start with PRMS-native inputs and process those to the NetCDF files that pywatershed will use.

In [None]:
params = pws.parameters.PrmsParameters.load(param_file)

cbh_files = {
    "prcp": dom_dir / "prcp.cbh",
    "tmax": dom_dir / "tmax.cbh",
    "tmin": dom_dir / "tmin.cbh",
}

cbh_dir = nb_output_dir / "cbh"
cbh_dir.mkdir(exist_ok=True)

for kk, vv in cbh_files.items():
    out_file = cbh_dir / f"{kk}.nc"
    pws.utils.cbh_file_to_netcdf({kk: vv}, params, out_file)

## Write solar geometry files
Below we'll demonstrate using an active instance of `PRMSSolarGeometry` and also using its static output to drive `PRMSAtmosphere`. Here we create the static `PRMSSolarGeometry` output that we need for the second case.

In [None]:
solar_geom_dir = nb_output_dir / "solar_geom"
solar_geom_dir.mkdir(exist_ok=True)

solar_geom_output_vars = ["soltab_horad_potsw", "soltab_potsw"]

control = pws.Control.load_prms(control_file, warn_unused_options=False)
control.options = control.options | {
    "netcdf_output_dir": solar_geom_dir,
    "netcdf_output_var_names": solar_geom_output_vars,
}

solar_geom = pws.PRMSSolarGeometry(control, None, params)
solar_geom.initialize_netcdf()
control.advance()
solar_geom.advance()
solar_geom.output()
del solar_geom

We'll take a look at some of the data, in particular the last time available in the file.

In [None]:
var = "soltab_potsw"
da = xr.open_dataarray(solar_geom_dir / f"{var}.nc", decode_timedelta=False)
display(da)
print(da[-1, 0:100].values)
da.close()

## Preprocess atmospheric forcings without solar geometry files present

When a `PRMSAtmosphere` object is initialized with a `netcdf_output_dir` argument, the adjusted forcings are written to this location. Unless one requests specific variables only, all variables are written.

Typically, the `soltab_potsw.nc` and `soltab_horad_potsw.nc` input files are not available as inputs. (These are only output in a fixed-width format by a version of PRMS 5.2.1 in the pynhm repository, and that output is translated to NetCDF when setting up test data.) Here we show how to get the CBH adjustments to output files using `PRMSSolarGeometry` instead of soltab files. The next section will show how to use the soltab files we created above.

In [None]:
cbh_files_dict = {ff.with_suffix("").name: ff for ff in cbh_dir.glob("*.nc")}
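
The dictionary comprehension above keys each NetCDF file by its name without the extension, so the keys (`prcp`, `tmax`, `tmin`) can be passed as keyword arguments to `PRMSAtmosphere`. The stand-alone sketch below illustrates the same mapping on empty placeholder files in a temporary directory. The helper name `files_by_stem` is hypothetical and not part of pywatershed.

```python
import pathlib as pl
import tempfile


def files_by_stem(directory: pl.Path, pattern: str = "*.nc") -> dict:
    """Map each matching file's stem (name without suffix) to its path."""
    # sorted() makes the key order deterministic; glob order is not guaranteed
    return {ff.stem: ff for ff in sorted(directory.glob(pattern))}


with tempfile.TemporaryDirectory() as tmp:
    demo_dir = pl.Path(tmp)
    # Empty placeholder files standing in for the converted CBH files
    for name in ("prcp", "tmax", "tmin"):
        (demo_dir / f"{name}.nc").touch()
    (demo_dir / "notes.txt").touch()  # ignored: does not match "*.nc"

    demo_files = files_by_stem(demo_dir)
    demo_keys = list(demo_files)

print(demo_keys)
```

For file names with a single extension, `ff.stem` gives the same result as `ff.with_suffix("").name` used in the notebook.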

In [None]:
atm_dir = nb_output_dir / "atm_without_solar_files"
atm_dir.mkdir(exist_ok=True)

control = pws.Control.load_prms(control_file, warn_unused_options=False)
control.options["netcdf_output_dir"] = atm_dir

solar_geom = pws.PRMSSolarGeometry(control, None, params)

atm = pws.PRMSAtmosphere(
    control,
    None,
    params,
    **cbh_files_dict,
    soltab_horad_potsw=solar_geom.soltab_horad_potsw,
    soltab_potsw=solar_geom.soltab_potsw,
)
atm.initialize_netcdf()
control.advance()
solar_geom.advance()
atm.advance()
atm.calculate(1)
atm.output()
del atm

In [None]:
var = "potet"
da = xr.open_dataarray(atm_dir / f"{var}.nc")
display(da)
print(da[-1, 0:100].values)
da.close()

## Preprocess atmospheric forcings with solar geometry files present
We repeat the above, dropping the `PRMSSolarGeometry` object as its information is now coming from the soltab files. 

In [None]:
cbh_files_dict = {ff.with_suffix("").name: ff for ff in cbh_dir.glob("*.nc")}
solar_files_dict = {
    ff.with_suffix("").name: ff for ff in solar_geom_dir.glob("*.nc")
}
atm_input_files_dict = cbh_files_dict | solar_files_dict
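
The `|` operator used above merges two dictionaries into a new one (Python 3.9 or later). Before handing the merged dictionary to `PRMSAtmosphere`, a quick check like the one sketched below can confirm that every expected input is present. The list of input names is taken from the cells in this notebook rather than queried from pywatershed, and the paths are placeholders.

```python
import pathlib as pl

# Input names used in this notebook (an assumption drawn from the cells above,
# not queried from pywatershed itself)
required_inputs = ["prcp", "tmax", "tmin", "soltab_horad_potsw", "soltab_potsw"]

# Placeholder paths standing in for the real NetCDF files
demo_cbh = {nn: pl.Path("cbh") / f"{nn}.nc" for nn in ("prcp", "tmax", "tmin")}
demo_solar = {
    nn: pl.Path("solar_geom") / f"{nn}.nc"
    for nn in ("soltab_horad_potsw", "soltab_potsw")
}

# dict | dict returns a new dict; on duplicate keys the right operand wins
demo_inputs = demo_cbh | demo_solar

missing = [nn for nn in required_inputs if nn not in demo_inputs]
if missing:
    raise KeyError(f"Missing atmosphere inputs: {missing}")
print(sorted(demo_inputs))
```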

In [None]:
atm_solar_files_dir = nb_output_dir / "atm_with_solar_files"
atm_solar_files_dir.mkdir(exist_ok=True)

control = pws.Control.load_prms(control_file, warn_unused_options=False)
control.options["netcdf_output_dir"] = atm_solar_files_dir

# No PRMSSolarGeometry instance is needed: the soltab inputs come from files.
atm = pws.PRMSAtmosphere(
    control,
    None,
    params,
    **atm_input_files_dict,
)
atm.initialize_netcdf()
control.advance()
atm.advance()
atm.calculate(1)
atm.output()
del atm

In [None]:
var = "potet"
da = xr.open_dataarray(atm_solar_files_dir / f"{var}.nc")
display(da)
print(da[-1, 0:100].values)
da.close()
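
Since the two approaches are expected to yield the same adjusted forcings, a natural follow-up is to compare their outputs numerically. The sketch below shows one way to do that with `numpy.allclose` on synthetic arrays; in this notebook the arrays would come from the `.values` of the corresponding `DataArray`s. The helper name `arrays_match` and the tolerances are illustrative assumptions.

```python
import numpy as np


def arrays_match(a, b, rtol=1e-6, atol=0.0) -> bool:
    """Return True if two arrays have the same shape and agree within tolerance."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        return False
    # equal_nan=True treats NaN values in the same positions as equal
    return bool(np.allclose(a, b, rtol=rtol, atol=atol, equal_nan=True))


# Synthetic stand-ins for potet from the two runs, shaped (time, nhru)
potet_active = np.array([[0.010, 0.020, np.nan]])
potet_from_files = potet_active.copy()
potet_perturbed = potet_active + 1.0e-3

same = arrays_match(potet_active, potet_from_files)
different = arrays_match(potet_active, potet_perturbed)
print(same, different)
```

A relative tolerance is used here because values written to and read back from files may differ slightly in precision; adjust it to suit your data.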