# Creating forcing datasets for pySUMMA

Here we will put together forcing data setups for CAMELs.
This requires hourly forcings of
 - temperature (K)
 - precipitation (kg/m^2/s)
 - shortwave radiation (W/m^2)
 - longwave radiation (W/m^2)
 - specific humidity (g/g)
 - air pressure (Pa)
 - wind speed (m/s)
 
NLDAS hourly forcings are considered to be "truth". 
We take NLDAS hourly forcings and give them their mean daily values for each variable in turn (or all variables). 
Then we take NLDAS daily forcings of maximum and minimum temperature, total precipitation, and mean wind speed, and redistribute them to hourly values, calculating the other variables with the MetSim Python package. 
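
As a minimal sketch of what "giving hourly forcings their mean daily values" means, the snippet below replaces each block of 24 hourly values with that day's mean. It uses plain NumPy on synthetic data; the helper name `constant_daily` is hypothetical, and it assumes the series starts at the beginning of a day and covers whole days.

```python
import numpy as np

def constant_daily(hourly):
    """Replace each block of 24 hourly values with that day's mean (hypothetical helper)."""
    hourly = np.asarray(hourly, dtype=float)
    if hourly.size % 24 != 0:
        raise ValueError("expected a whole number of 24-hour days")
    daily_mean = hourly.reshape(-1, 24).mean(axis=1)  # one mean per day
    return np.repeat(daily_mean, 24)                  # hold it constant for 24 hours

# Two synthetic days: a ramp from 0 to 23, then a flat day at 10
hourly = np.concatenate([np.arange(24.0), np.full(24, 10.0)])
constant = constant_daily(hourly)
```

The daily totals are unchanged by this operation; only the sub-daily variability is removed.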

Before we start, the first thing we need to do is make sure SUMMA and pysumma are installed. Commands:
* `cd /path/to/home`
* `git clone https://github.com/ashleymedin/summa.git`
* `cd summa`
* `git checkout develop`
* `cd build`

In the makefile, at a minimum you will need to set the following:
* F_MASTER         - top level summa directory
* FC               - compiler suite
* FC_EXE           - compiler executable
* INCLUDES         - path to include files
* LIBRARIES        - path to and libraries to include

This is set up in the repo for Cheyenne under Makefile_cheyenne, which you can copy and then build. Commands:
* `cp Makefile_cheyenne my_makefile`
* `./build_summa`

Next, get an Anaconda Python environment set up and install the environment there as follows. 

Check the operating system:
* `cd /path/to/home`
* `uname -m`
Then get Miniconda for that system. For 64-bit Linux systems like Cheyenne, the commands are:
* `wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh`
* `chmod 777 Miniconda3-latest-Linux-x86_64.sh`
* `./Miniconda3-latest-Linux-x86_64.sh`
Close and then re-open your terminal window.

Now create the conda environment with the extra packages, using these commands:
* `cd /path/to/home`
* `git clone https://github.com/ashleymedin/pysumma.git`
* `cd pysumma`
* `git checkout develop`
* `conda env create -f environment.yml`

This will make a Python environment that you can activate and deactivate. 
Activate it; the environment already includes MetSim, some plotting tools, and the dask tools for good measure: 
* `conda activate pysumma` 

Install it as a kernel in your Jupyter environments:
* `python -m ipykernel install --user --name=pysumma`

Then install pySUMMA
* `python setup.py develop`

Make sure to run `git pull origin` inside `/summa` and `/pysumma` periodically to keep things up to date. You need to rebuild SUMMA if there are changes. You *do not* need to run `setup.py` again if you used develop. 
If you want to deactivate, run `conda deactivate`.

<br>
Now, let's make the inputs for pySUMMA. First we check that we loaded the correct environment.

In [None]:
conda list metsim

<br>
Then we load some standard imports.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import geoviews as gv
import geopandas as gpd
import holoviews as hv
import xarray as xr
import ogr
import cartopy

hv.notebook_extension('bokeh')

<br>

### You will need to edit these paths to be your folders

In [None]:
# Some folder places
top = '/glade/work/ashleyvb'
folder = top+'/CAMELs'
folders = folder+'/summa_camels'
shapefile = folder+'/basin_set_full_res_simple/HCDN_nhru_final_671.shp'

<br>

# Our study location: the CAMELs basins

We are using MetSim on unstructured mesh arrangements as well as the structured latitude-longitude grid that we ran in the command line examples.
We will use a setup which consists of 671 "hydrologic response units" or HRUs.
An HRU typically delineates a watershed by topography, soil type, land use, or other defining feature.
Here, each CAMELs basin is run as a lumped model, so each one is an HRU.

Let's take a look. 
Because this is an unstructured mesh we will be defining the mesh elements by their respective basin identifiers. 

In [None]:
file_list = []
# Read the forcing file names, stripping quotes and newlines
with open(folders+'/settings.v1/forcingFileList.1hr.txt', 'r') as filelist:
    for line in filelist:
        file_list.append(folders+'/forcing/1hr/'+line.strip("'\n"))

In [None]:
# Some things get dumped with open_mfdataset, so keep them here
extra_vars0 = xr.open_dataset(file_list[0]) 
extra_vars0 = extra_vars0.assign_coords(hru=extra_vars0['hruId'])

### THIS IS IMPORTANT! Select here what basins you want to run. 

In [None]:
# All HRUs, select out all or some to look at, assuming have GRUs same as HRUs here
#the_hru = np.array(extra_vars0['hruId']) # run all
the_hru = np.array([1413500, 7261000, 9223000, 12488500])
the_gru = the_hru
extra_vars = extra_vars0.sel(hru=the_hru)
print(the_hru)

In [None]:
gdf = gpd.read_file(shapefile)
shapes = cartopy.io.shapereader.Reader(shapefile)
list(shapes.records())[0]

In [None]:
# Convert the data from an xarray dataset to a pandas dataframe 
out_df = extra_vars0['hru']
out_df = out_df.to_dataframe()
# Make sure we have some metadata to join with the shapefile
out_df['hru_id'] = gdf['hru_id'].values
#search for the ones with desired records
find_rec = out_df.loc[the_hru,:]['hru_id']
#look at attributes
desired_shapes = []
for i in find_rec:
    for s in shapes.records():
        if s.attributes['hru_id'] == i :
            desired_shapes.append(s)

In [None]:
# Create background
mapp =gv.tile_sources.StamenTerrainRetina.opts(width=900, height=500)
# Create the shape plot
poly = gv.Shape.from_records(shapes.records(), out_df, index=['hru_id'], on='hru_id',crs=cartopy.crs.PlateCarree())
poly2 = gv.Shape.from_records(desired_shapes,out_df.loc[the_hru,:],on='hru_id', crs=cartopy.crs.PlateCarree())
poly = poly.opts(cmap='plasma', tools=['hover'], colorbar=True, alpha=0.8)
poly2 = poly2.opts(fill_color='cyan', line_color='cyan', alpha=0.8)
# Plot
mapp*poly*poly2

Basins are colored by index values and are hoverable for index values and IDs. Selected basins are highlighted in cyan.

<br>

# Make SUMMA files with the correct HRUs
We have to select out only the HRUs of the basins we are using.
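
To illustrate what selecting HRUs by ID involves, here is a small NumPy sketch that finds the positions of the wanted IDs in a full ID array while preserving the requested order. The helper `positions_of` and the ordering of `all_ids` are made up for the example; in the notebook itself, xarray's `.sel(hru=the_hru)` does this lookup.

```python
import numpy as np

def positions_of(ids, wanted):
    """Return the index of each wanted ID within ids, in the order requested."""
    ids = np.asarray(ids)
    wanted = np.asarray(wanted)
    missing = wanted[~np.isin(wanted, ids)]
    if missing.size:
        raise KeyError(f"IDs not found: {missing.tolist()}")
    order = np.argsort(ids)  # lets us search an unsorted ID array
    return order[np.searchsorted(ids, wanted, sorter=order)]

# Illustrative ID array in an arbitrary order (the ordering is an assumption)
all_ids = np.array([9223000, 1413500, 12488500, 7261000, 1013500])
the_hru = np.array([1413500, 7261000, 9223000, 12488500])
idx = positions_of(all_ids, the_hru)
subset = all_ids[idx]
```

Failing loudly on a missing ID is a deliberate choice here: a silently dropped basin would otherwise only show up later as a dimension mismatch.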

In [None]:
# Attributes 
attrib = xr.open_dataset(folders+'/settings.v1/attributes.camels.v2.nc')
attrib = attrib.assign_coords(hru=attrib['hruId'])
attrib = attrib.assign_coords(gru=attrib['gruId'])
gg = attrib['gruId'] # save because gruId was missing from the parameter file
attrib = attrib.sel(hru=the_hru)
attrib = attrib.sel(gru=the_gru)
attrib = attrib.drop_vars(['hru','gru']) #summa doesn't like these to have coordinates
attrib.to_netcdf(folders+'/settings.v1/attributes.nc')

In [None]:
# Parameters
param = xr.open_dataset(folders+'/settings.v1/trialParams.camels.Oct2020.nc')
param = param.assign_coords(hru=param['hruId'])
param = param.assign_coords(gru=gg) # there should be a gruId in here, but there wasn't
param = param.sel(hru=the_hru)
param = param.sel(gru=the_gru)
param = param.drop_vars(['hru','gru']) #summa doesn't like these to have coordinates
param.to_netcdf(folders+'/settings.v1/parameters.nc')

<br>
Lastly we will need to make the constant initial conditions file.

### You will need to edit these paths to be your folders

In [None]:
! cd /glade/work/ashleyvb/CAMELs/summa_camels/settings.v1; source activate /glade/work/ashleyvb/miniconda3/envs/pysumma; python gen_coldstate.py attributes.nc init_cond.nc int

<br>

# Forcing files of NLDAS with Constant Daily Values
Here we turn the NLDAS hourly values into constant 24-hour values for each forcing variable. 
Each 24-hour block must represent a local day, so we shift to local time zones first; that way, later calculations on days work. 
Since SUMMA will impose that shortwave radiation is 0 when the sun is below the horizon, we distribute the constant shortwave radiation only during the daylight hours.
Other changes inside SUMMA are that specific humidity will be lowered so that relative humidity does not exceed 100%, and tiny wind speeds will be eliminated.

<br>
Merge everything in the folder along time. This takes ~2 minutes.
We set `data_vars='minimal'` so it won't add the time dimension to the variables that don't have it.

In [None]:
%%time
truth = xr.open_mfdataset(folders+'/forcing/1hr/*.nc',data_vars='minimal',combine='by_coords').load()
truth = truth.assign_coords(hru=truth['hruId'])
truth = truth.sel(hru=the_hru)

In [None]:
# Fix time encoding to be the same since the merge drops it
truth.time.encoding = extra_vars.time.encoding
# Also fix the variables that have no time dimension as they may get merged incorrectly by open_mfdataset
truth['data_step'] = extra_vars['data_step']
# Pull out HRUs
truth = truth.assign_coords(hru=truth['hruId'])
truth = truth.sel(hru=the_hru) # none of these are in here, so maybe a dummy file

<br>
Write this file for pySUMMA forcing and save the forcing file name.
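
The naming convention used for the forcing file and its list file can be sketched with the standard library alone. The helper `forcing_filename` and the dates below are hypothetical; the sketch writes to a temporary directory rather than to the real settings folder.

```python
from datetime import datetime
from pathlib import Path
import tempfile

def forcing_filename(prefix, t_first, t_last):
    """Build '<prefix>_<YYYYMMDD>-<YYYYMMDD>.nc' from the first and last time stamps."""
    return f"{prefix}_{t_first:%Y%m%d}-{t_last:%Y%m%d}.nc"

# Hypothetical first and last time stamps of the forcing record
name = forcing_filename("NLDAStruth", datetime(1980, 1, 1, 0), datetime(2018, 12, 31, 23))

# The forcing file list is a text file holding the forcing file name
with tempfile.TemporaryDirectory() as tmp:
    list_path = Path(tmp) / "forcingFileList.truth.txt"
    list_path.write_text(name)
    written = list_path.read_text()
```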

In [None]:
# save
t0 = truth['time'].values[0] 
tl = truth['time'].values[-1]
t0_s = pd.to_datetime(str(t0))
t0_sf =t0_s.strftime('%Y%m%d')
tl_s = pd.to_datetime(str(tl))
tl_sf =tl_s.strftime('%Y%m%d')
ffname ='NLDAStruth_' + t0_sf +'-' + tl_sf +'.nc'
truth.to_netcdf(folders+'/forcing/truth/'+ffname)
fflistname = folders+'/settings.v1/forcingFileList.truth.txt' 
with open(fflistname, "w") as file:
    file.write(ffname)

<br>
Now we can make the daily data, first shifting to local time zones with longitude.
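
The longitude-based offset is simple arithmetic: 360 degrees of longitude correspond to 24 hours, and the result is cast to an integer. A minimal NumPy sketch follows; the helper name and the sample longitudes are assumptions for illustration. Note that the integer cast truncates toward zero rather than rounding down.

```python
import numpy as np

DEG_PER_REV = 360.0   # degrees in a full revolution
HRS_PER_DAY = 24

def utc_offset_hours(longitude_deg):
    """Approximate whole-hour offset from UTC using longitude only (hypothetical helper)."""
    hours = np.asarray(longitude_deg, dtype=float) / DEG_PER_REV * HRS_PER_DAY
    return hours.astype(int)  # truncates toward zero, e.g. -6.67 becomes -6

# Illustrative western-hemisphere longitudes in degrees east
lons = np.array([-122.0, -100.0, -77.5])
offsets = utc_offset_hours(lons)
```

These are solar-time approximations, not legal time zones, which is all that is needed to group hours into local days.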

In [None]:
# Separate out variables with no time dimension and add offset
ds = truth
ds_withtime = ds.drop_vars([ var for var in ds.variables if 'time' not in ds[var].dims ])
ds_timeless = ds.drop_vars([ var for var in ds.variables if 'time'     in ds[var].dims ])
DEG_PER_REV = 360.0       # Number of degrees in full revolution
HRS_PER_DAY = 24
offset = (attrib['longitude'] / DEG_PER_REV) * HRS_PER_DAY
offset = offset.astype(int)
ds_withtime['offset'] = offset
ds_withtime = ds_withtime.assign_coords(hru=ds_timeless['hruId'])

<br>
Here are the time zone changes. 
This takes about a minute using all 671 basins; a subset of basins should be shorter.
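
To make the effect of the shift concrete, here is a NumPy sketch of shifting a series by a whole number of steps and padding the vacated positions with NaN, which is how xarray's `shift` fills by default. The helper `shift_hours` is hypothetical and works on a single 1-D series.

```python
import numpy as np

def shift_hours(values, n):
    """Shift a 1-D series by n steps, padding the vacated positions with NaN."""
    values = np.asarray(values, dtype=float)
    out = np.full_like(values, np.nan)
    if n > 0:
        out[n:] = values[:-n]    # move values later in time
    elif n < 0:
        out[:n] = values[-n:]    # move values earlier in time
    else:
        out[:] = values
    return out

utc = np.arange(6.0)
local = shift_hours(utc, -2)      # a negative offset pulls later values earlier
restored = shift_hours(local, 2)  # undoing the shift leaves NaN at the start
```

The round trip is not lossless: the hours pushed off the edge are gone, so the first or last few hours of the record end up as NaN.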

In [None]:
%%time
for t in np.unique(offset.data):
    ds = ds_withtime.where(offset==t,drop=True)
    ds = ds.shift(time=t)
    if t==np.unique(offset.data)[0]: ds0 = ds
    if t>np.unique(offset.data)[0]: ds0 = xr.concat([ds0,ds],dim='hru')
    print(t)

In [None]:
# Sort back, and drop offset, keep airtemp for later calculations
ds_withtime = ds0.sortby('hru')
ds_withtime = ds_withtime.drop_vars('offset') 
air24 = ds_withtime.get('airtemp')

<br>
Downsample hourly time-series data to daily data. 
This takes about 30 seconds.

In [None]:
%%time
truth24 = xr.merge([ds_timeless, ds_withtime.resample(time='1D').mean()]).load()
# Fix time encoding to be the same since the merge drops it
truth24.time.encoding = extra_vars.time.encoding

<br>
Then we upsample this back to hourly data for constant daily values. 
We need to undo the time zone changes after we upsample. 
This whole process takes about a minute using all 671 basins; a subset of basins should be shorter.
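
The reason an extra day is needed when upsampling can be seen in a small pandas sketch (pandas is already imported above as `pd`; the two-day series is synthetic). Resampling stops at the last time stamp, which is midnight of the final day, so without padding the final day would get only one hourly value.

```python
import pandas as pd

# Two synthetic daily values stamped at midnight
daily = pd.Series([1.0, 2.0], index=pd.date_range("2000-01-01", periods=2, freq="D"))

# Without padding, the upsample ends at the last stamp: 25 values, not 48
hourly_short = daily.resample("60min").ffill()

# Pad with a copy of the last day, upsample, then drop the extra stamp
fake = pd.Series([daily.iloc[-1]], index=[daily.index[-1] + pd.Timedelta(days=1)])
padded = pd.concat([daily, fake])
hourly = padded.resample("60min").ffill().iloc[:-1]
```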

In [None]:
# Add a fake day of data so the upsample runs through the end of the last day
day_fake = truth24.isel(time=-1)['time']+np.timedelta64(1,'D')
add_fake = truth24.isel(time=-1)
add_fake['time'] = day_fake
truth24_add = xr.concat([truth24, add_fake], dim='time',data_vars='minimal')

In [None]:
%%time
# Again we have to separate out variables with no time dimension.
ds = truth24_add
ds_withtime = ds.drop_vars([ var for var in ds.variables if 'time' not in ds[var].dims ])
ds_timeless = ds.drop_vars([ var for var in ds.variables if 'time'     in ds[var].dims ])
ds_withtime = ds_withtime.resample(time='1H').ffill()
ds_withtime['offset'] = offset
ds_withtime = ds_withtime.assign_coords(hru=ds_timeless['hruId'])
for t in np.unique(offset.data):
    ds = ds_withtime.where(offset==t,drop=True)
    ds = ds.shift(time=-t)
    if t==np.unique(offset.data)[0]: ds0 = ds
    if t>np.unique(offset.data)[0]: ds0 = xr.concat([ds0,ds],dim='hru')
    print(t)

In [None]:
# Sort back, and drop offset, and merge
ds_withtime = ds0.sortby('hru')
ds_withtime = ds_withtime.drop_vars('offset') 
constant_all = xr.merge([ds_timeless, ds_withtime])
constant_all = constant_all.sel(hru=the_hru) #put back in original order

In [None]:
# Take extra day off
constant_all = constant_all.isel(time=slice(0,-1))
# Fix time encoding to be the same since the merge drops it
constant_all.time.encoding = extra_vars.time.encoding

<br>

## Scale Constant SW Radiation
Scale the constant daily shortwave radiation so that the daily energy matches the "truth" once SUMMA sets the shortwave radiation to zero when the sun is below the horizon.  
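
The scaling idea can be sketched in NumPy for a single location: keep zeros wherever the original series is zero, and raise the constant value in the remaining hours so the daily mean is preserved. The helper `scale_constant_sw` is hypothetical, the data are synthetic, and the sketch ignores the time-zone handling done elsewhere in this notebook.

```python
import numpy as np

def scale_constant_sw(sw_hourly):
    """Spread each day's mean shortwave over its daylight hours only."""
    sw = np.asarray(sw_hourly, dtype=float).reshape(-1, 24)    # (days, hours)
    daylight = sw > 0                                          # where the sun is up
    daily_mean = sw.mean(axis=1, keepdims=True)
    daylight_frac = daylight.mean(axis=1, keepdims=True)       # fraction of lit hours
    safe_frac = np.where(daylight_frac > 0, daylight_frac, 1.0)  # avoid dividing by zero
    scaled = np.where(daylight, daily_mean / safe_frac, 0.0)
    return scaled.ravel()

# One synthetic day with 12 daylight hours
sw = np.zeros(24)
sw[6:18] = np.linspace(50.0, 600.0, 12)
scaled = scale_constant_sw(sw)
```

With half the day in darkness, the constant daylight value is twice the 24-hour mean, so the daily total matches the original series.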

In [None]:
# Find where 0's should be based on original NLDAS data
zero_one = truth['SWRadAtm']/truth['SWRadAtm']
zero_one = zero_one.fillna(0)

In [None]:
# Find how much too small shortwave is each day if we use these 0's
swr0 = zero_one*constant_all['SWRadAtm']
div = swr0.resample(time='1D').mean()/constant_all['SWRadAtm'].resample(time='1D').mean()

In [None]:
# Upsample, again adding a fake day of data so the upsample runs through the end of the last day
add_fake = div.isel(time=-1)
add_fake['time'] = day_fake
div_add = xr.concat([div, add_fake], dim='time')
div_add = div_add.resample(time='1H').ffill()

# Take extra day off
div = div_add.isel(time=slice(0,-1))

In [None]:
# Finally add back in this constant shortwave radiation
swr0 = swr0/div
constant_all['SWRadAtm']=swr0

<br>

## Files with Only One Variable Constant
Now make files with only one variable held at daily means and save forcing file names.

In [None]:
t0 = constant_all['time'].values[0] 
tl = constant_all['time'].values[-1]
t0_s = pd.to_datetime(str(t0))
t0_sf =t0_s.strftime('%Y%m%d')
tl_s = pd.to_datetime(str(tl))
tl_sf =tl_s.strftime('%Y%m%d')

In [None]:
constant_vars=['airpres','airtemp','LWRadAtm','pptrate','spechum','SWRadAtm','windspd']
for v in constant_vars:
    constant_one = truth.copy()
    constant_one[v]= constant_all[v]
    ffname ='NLDASconstant_' + v +'_forcing_' + t0_sf +'-' + tl_sf +'.nc'
    constant_one.to_netcdf(folders+'/forcing/constant/'+ffname)
    fflistname = folders+'/settings.v1/forcingFileList.constant_' + v + '.txt' 
    with open(fflistname, "w") as file:
        file.write(ffname)
    print(v)

<br>

# Check Files

To make sure things look how we want, we plot the constant dataset against the NLDAS "truth" dataset, and plot the cumulative variables to see how errors are compounding. 
We plot one HRU (the first one) for 2 months.
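
As a quick numerical counterpart to the plots, the sketch below (synthetic data, NumPy only) shows what to expect from the cumulative comparison: the constant series drifts away from the hourly one within each day, but the cumulative sums agree again at the end of every day, provided both series use the same day boundaries.

```python
import numpy as np

rng = np.random.default_rng(0)
hourly = rng.random(72)                                        # three synthetic days
constant = np.repeat(hourly.reshape(-1, 24).mean(axis=1), 24)  # daily means held constant

gap = np.cumsum(constant) - np.cumsum(hourly)
end_of_day_gap = gap[23::24]   # gap after the last hour of each day
```

If the end-of-day gaps in the real files are not close to zero, the day boundaries (time-zone shift) or the shortwave scaling are the first things to check.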

In [None]:
#Plot hourly
fig, axes = plt.subplots(nrows=7, ncols=1, figsize=(20, 20))
axes = axes.flatten()
axes[0].set_title('Hourly')

unit_str = ['($K$)', '($kg/m^2/s$)', '($W/m^2$)','($W/m^2$)','($g/g$)','($Pa$)', '($m/s$)',]

# Forcing variables to plot, listed in the same order as unit_str
variables = ['airtemp', 'pptrate', 'SWRadAtm', 'LWRadAtm', 'spechum', 'airpres', 'windspd']

start =  24*7*30 
stop = start + 2*30*24 
# Skip a further 90 days, using the same window for both datasets
truth_plt = truth.isel(hru=0, time=slice(start+90*24, stop+90*24))
constant_all_plt = constant_all.isel(hru=0, time=slice(start+90*24, stop+90*24))

for idx, var in enumerate(variables[0:7]):
    truth_plt[var].plot(ax=axes[idx],label='NLDAS')
    constant_all_plt[var].plot(ax=axes[idx],label='Constant')
    axes[idx].set_title('') 
    axes[idx].set_ylabel('{} {}'.format(var, unit_str[idx]))
    axes[idx].set_xlabel('Date')
plt.tight_layout()
plt.legend()

In [None]:
#Plot cumulative
fig, axes = plt.subplots(nrows=7, ncols=1, figsize=(20, 20))
axes = axes.flatten()
axes[0].set_title('Cumulative')

truth_plt = truth.isel(hru=0, time=slice(start+90*24, stop+90*24)).cumsum(dim='time')
constant_all_plt = constant_all.isel(hru=0, time=slice(start+90*24, stop+90*24)).cumsum(dim='time')

for idx, var in enumerate(variables[0:7]):
    truth_plt[var].plot(ax=axes[idx],label='NLDAS')
    constant_all_plt[var].plot(ax=axes[idx],label='Constant')
    axes[idx].set_title('') 
    axes[idx].set_ylabel('{} {}'.format(var, unit_str[idx]))
    axes[idx].set_xlabel('Date')
plt.tight_layout()
plt.legend()