# Creating input dataset for MetSim

Alright, now that you've gotten a feel for how to work with MetSim you may be wondering how to bring in new data.
In this portion of the tutorial we will put together a new MetSim setup for Reynolds Creek.
This is an experimental watershed in southwestern Idaho.
We will be looking at a single location for a year.
The data was downloaded in CSV format, which we will transform into NetCDF inputs.
As usual, we begin with some standard imports.

In [1]:
%matplotlib inline
import cartopy
import geoviews as gv
import geopandas as gpd
import holoviews as hv
import pandas as pd
import xarray as xr
from metsim import MetSim
import matplotlib.pyplot as plt
import matplotlib
import numpy as np

matplotlib.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['figure.dpi'] = 96
hv.notebook_extension('bokeh')

In [2]:
'''(gv.tile_sources.StamenTerrainRetina
 * gv.Points([(-116.51, 43.17)]).opts(style=dict(size=16, color='red'))
 * gv.Points([(-122.3321, 49.9),(-122.8443, 47.2529), (-122.5, 42.0)]).opts(style=dict(size=0))
).opts(width=900, height=600)'''

"(gv.tile_sources.StamenTerrainRetina\n * gv.Points([(-116.51, 43.17)]).opts(style=dict(size=16, color='red'))\n * gv.Points([(-122.3321, 49.9),(-122.8443, 47.2529), (-122.5, 42.0)]).opts(style=dict(size=0))\n).opts(width=900, height=600)"

# Put together the required meteorological data
We've gathered some data from the Reynolds Creek site that we will use as input.
There are two sets of CSVs, with precipitation and temperature data, respectively.
We have data for 2009 and 2010, and will be generating MetSim input for 2010.
To do this we must first convert it into an `xarray` dataset.
Before doing that though, let's just open up one of the temperature files and see what we're working with.

In [3]:
%%bash
head -n 10 ./reynolds_creek_data/daily/temp_2010.csv
echo ""

head: ./reynolds_creek_data/daily/temp_2010.csv: No such file or directory





Here we see that we have several pieces of information to weed through.
Luckily, pandas makes this quite easy. We can load this into a dataframe and easily select out the columns that we want.
As we can see from the header, we are going to be interested in the daily values `TMAX.D-1` and `TMIN.D-1`.
Similarly, the column of interest in the precipitation data is `PREC.I-1`.
So, let's load things up.
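
As a minimal sketch of that column selection, the snippet below reads a small inline CSV with pandas and keeps only the two daily temperature columns. The `Date` and `OTHER` columns and all of the values are made up for illustration; the real files carry additional header lines and may need `skiprows` or different column labels.

```python
import io

import pandas as pd

# Hypothetical stand-in for a temperature file; only the TMAX.D-1 and
# TMIN.D-1 column names come from the tutorial text.
csv_text = """Date,TMAX.D-1,TMIN.D-1,OTHER
2010-01-01,3.5,-4.0,9
2010-01-02,1.0,-6.5,9
"""

# Parse the date column and use it as the index
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# Keep only the columns we care about
temps = df[["TMAX.D-1", "TMIN.D-1"]]
print(temps)
```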

## Create the dataset with the relevant dimensions.

In [3]:
dates = pd.date_range('10/01/2022', '09/30/2023')
shape = (len(dates), 1, 1, )
dims = ('time', 'lat', 'lon', )

# We are running only one site, at these coordinates
lats = [48.66]
lons = [-119.84]
elev = 1359.4 # meters
coords = {'time': dates, 'lat': lats, 'lon': lons}

# Create the initial met data input data structure
met_data = xr.Dataset(coords=coords)
met_data

## Create the actual data arrays to put the data into.

In [7]:
for varname in ['prec', 't_min', 't_max']:
    met_data[varname] = xr.DataArray(data=np.full(shape, np.nan),
                                     coords=coords, dims=dims,
                                     name=varname)

## Read in the data and put it into the dataset

In [50]:
data = pd.read_csv('./salmon_cleaned_WY23.csv')
data.head(3)
# data = data[data['Date_Time'] < '2023-09-01'] 
# data['air_temp_set_1'] = data['air_temp_set_1'].interpolate(method='linear')
# data['precip_accum_set_1'] = data['precip_accum_set_1'].interpolate(method='linear')
# data.loc[data['precip_accum_set_1'] > 1100, 'precip_accum_set_1'] = 0

data['time'] = pd.to_datetime(data.time)

# data['precip_accum_set_1'] = data['precip_accum_set_1'].cummax()

# Set the 'time' column as the index of the dataframe
data.set_index('time', inplace=True)

prec_vals = data['accppt'].diff().resample('D').sum()
print(prec_vals.max())

# Resample the data to daily frequency and calculate the maximum and minimum temperatures
tmax_vals = data['airtemp'].resample('D').max()
tmin_vals = data['airtemp'].resample('D').min()

# Print the daily maximum and minimum temperatures
# print(daily_max_temp)
# print(daily_min_temp)
# print(prec_vals)

met_data['prec'].values[:, 0, 0] = prec_vals

# Assign the daily maximum and minimum temperatures to the met_data xarray, converting to Celsius
met_data['t_min'].values[:, 0, 0] = tmin_vals - 273.15
met_data['t_max'].values[:, 0, 0] = tmax_vals - 273.15

met_data.to_netcdf('./input/rc_forcing.nc')
met_data

35.559999999999995


PermissionError: [Errno 13] Permission denied: '/Users/clintonalden/Documents/Research/summa_work/processing/methow/input/rc_forcing.nc'
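
The precipitation step in the cell above, differencing the accumulated `accppt` series and then summing by day, can be illustrated on a small synthetic series. The values below are invented, and clipping negative increments is only one possible way to treat small dips in an accumulation gauge; it is an assumption of this sketch, not something the cell above does.

```python
import pandas as pd

# Synthetic accumulated precipitation (mm) at 12-hour spacing, with one
# small dip (3.0 -> 2.5) of the kind gauge noise can produce.
times = pd.to_datetime([
    "2022-10-01 00:00", "2022-10-01 12:00",
    "2022-10-02 00:00", "2022-10-02 12:00",
    "2022-10-03 00:00", "2022-10-03 12:00",
])
accum = pd.Series([0.0, 1.0, 3.0, 2.5, 4.0, 4.0], index=times)

# Step-to-step increments; the first value is NaN because it has no predecessor
increments = accum.diff()

# Daily totals as computed in the cell above (NaN is skipped by sum)
daily_raw = increments.resample("D").sum()

# Assumed variant: drop negative increments before summing
daily_clipped = increments.clip(lower=0).resample("D").sum()

print(daily_raw)
print(daily_clipped)
```

Note that `sum()` returns 0 for a day whose increments are all NaN, so gaps in the record can look like dry days; passing `min_count=1` to `sum` keeps such days as NaN instead.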

# Put together the required domain 

In [19]:
# We form the domain in a similar fashion
# First, by creating the data structure
coords = {'lat': lats, 'lon': lons}
domain = xr.Dataset(coords=coords)
domain['elev'] = xr.DataArray(data=np.full((1,1,), np.nan),
                          coords=coords,
                          dims=('lat', 'lon', ))
domain['mask'] = xr.DataArray(data=np.full((1,1,), np.nan),
                          coords=coords,
                          dims=('lat', 'lon', ))

# Add the data
domain['elev'][0, 0] = elev
domain['mask'][0, 0] = 1
domain.to_netcdf('./input/rc_domain.nc')
domain

In [20]:
prec_vals

time
2022-10-01     0.00
2022-10-02     0.00
2022-10-03     0.00
2022-10-04     0.00
2022-10-05     0.00
              ...  
2023-09-26     2.54
2023-09-27    10.16
2023-09-28     2.54
2023-09-29     2.54
2023-09-30     0.00
Freq: D, Name: accppt, Length: 365, dtype: float64

# Put together the required state

In [35]:
data['accppt'].max()

nan

In [42]:
# Finally, we create the state file - the dates are 90 days prior to 
# the MetSim run dates - as usual, create an empty data structure to
# read the data into
dates = pd.date_range('07/03/2022', '09/30/2022')
shape = (len(dates), 1, 1, )
dims = ('time', 'lat', 'lon', )
lats = [48.66]
lons = [-119.84]
elev = 1359.4 # meters
coords = {'time': dates, 'lat': lats, 'lon': lons}
state = xr.Dataset(coords=coords)
for varname in ['prec', 't_min', 't_max']:
    state[varname] = xr.DataArray(data=np.full(shape, np.nan),
                               coords=coords, dims=dims,
                               name=varname)

data = pd.read_csv("./salmon_cleaned_WY22.csv")

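# Parse timestamps first so the window filter compares datetimes, not strings
data['time'] = pd.to_datetime(data['time'])

# Keep only the 90-day state window (2022-07-03 through 2022-09-30)
in_window = (data['time'] >= '2022-07-03') & (data['time'] < '2022-10-01')
data = data[in_window].copy()

# data['airtemp'] = data['airtemp'].interpolate(method='linear')
# data['accppt'] = data['accppt'].interpolate(method='linear')
# data.loc[data['precip_accum_set_1'] > 1100, 'precip_accum_set_1'] = 0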

# data['accppt'] = data['accppt'].cummax()

# Set the 'time' column as the index of the dataframe
data.set_index('time', inplace=True)

prec_vals = data['accppt'].diff().resample('D').sum()

# Resample the data to daily frequency and calculate the maximum and minimum temperatures
tmax_vals = data['airtemp'].resample('D').max()
tmin_vals = data['airtemp'].resample('D').min()

# Do precip data
state['prec'].values[:, 0, 0] = prec_vals

# And now temp data and convert to C
state['t_min'].values[:, 0, 0] = tmin_vals - 273.15
state['t_max'].values[:, 0, 0] = tmax_vals - 273.15
state.to_netcdf('./input/rc_state.nc')
state

airtemp     0
accppt      0
pptrate     0
rh          0
airpres     0
spechum     0
LWRadAtm    0
dtype: int64
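
The state window is the 90 days that end the day before the run starts. Rather than typing the dates by hand, they can be derived from the run start, as in this small sketch (the variable names are illustrative, not part of MetSim):

```python
import pandas as pd

run_start = pd.Timestamp("2022-10-01")  # first day of the MetSim run
state_days = 90                         # length of the state record

# Daily dates ending the day before the run starts
state_dates = pd.date_range(end=run_start - pd.Timedelta(days=1),
                            periods=state_days, freq="D")

print(state_dates[0].date(), state_dates[-1].date(), len(state_dates))
```

For a run starting on 2022-10-01 this reproduces the window used above, 2022-07-03 through 2022-09-30.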


## Registering parameters and building the driver
Now that we've built all of the input files we need, we can run MetSim with our new setup.
Again, we build a simple configuration and run just as we did in the previous notebook.

In [41]:
dates = pd.date_range('10/01/2022', '09/30/2023')
params = {
    'time_step'    : "60",       
    'start'        : dates[0],
    'stop'         : dates[-1],
    'forcing'      : './input/rc_forcing.nc',     
    'domain'       : './input/rc_domain.nc',
    'state'        : './input/rc_state.nc',
    'forcing_fmt'  : 'netcdf',
    'out_dir'      : './output',
    'out_prefix': 'salmon',
    'scheduler'    : 'threading',
    'chunks'       : 
        {'lat': 1, 'lon': 1},
    'forcing_vars' : 
        {'prec' : 'prec', 't_max': 't_max', 't_min': 't_min'},
    'state_vars'   : 
        {'prec' : 'prec', 't_max': 't_max', 't_min': 't_min'},
    'domain_vars'  : 
        {'elev': 'elev', 'lat': 'lat', 'lon': 'lon', 'mask': 'mask'}
    }               

ms = MetSim(params)
ms.run()
output = ms.open_output().load()

ValueError: `y` must contain only finite values.
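
The error message indicates that non-finite values (NaN or inf) reached one of the numerical routines. One plausible source, offered here as an assumption rather than a diagnosis, is missing values in the forcing or state arrays. A quick check along these lines can count them before running; `count_nonfinite` is a hypothetical helper, not a MetSim function, and the example values are made up.

```python
import numpy as np


def count_nonfinite(arrays):
    """Return {name: number of NaN/inf entries} for a mapping of name -> array-like."""
    return {
        name: int((~np.isfinite(np.asarray(values, dtype=float))).sum())
        for name, values in arrays.items()
    }


# Made-up example values with one NaN and one inf
example = {
    "prec": [0.0, 2.5, np.nan, 1.0],
    "t_min": [-3.0, -1.5, 0.0, 2.0],
    "t_max": [4.0, np.inf, 6.0, 8.0],
}
report = count_nonfinite(example)
print(report)
```

With the datasets built above, the mapping could be formed as `{name: met_data[name].values for name in ['prec', 't_min', 't_max']}`, and likewise for `state`.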

# Let's look at what we've got
First, let's just look at our output by plotting the shortwave for the year.
We can see a clear annual cycle, as we would hope.

In [13]:
output['shortwave'].isel(lat=0, lon=0).plot()

# Finally, a simple comparison against some observations
While we're at it, let's compare MetSim's simulated shortwave radiation to some observations.
We will look only at January.
First, let's load in the observations and plot the two timeseries.
We see that MetSim generally gets the timing right but is occasionally off in magnitude.
To look at this a little further, we also show a scatter plot comparing these values.
Again, we see that MetSim and the observations are fairly well correlated, although there is a decent amount of spread.

In [None]:
df = pd.read_csv("./reynolds_creek_data/solar_rad_jan_2010.csv", skiprows=[0,1])
df.head()
df.index = pd.DatetimeIndex(df['Date'] + " " + df['Time'])

In [None]:
sliced = output.sel(time=slice('01/01/2010', '01/31/2010')).isel(lat=0, lon=0, drop=True)
sliced['shortwave'].plot()
df['SRADV.H-1 (watt) '].plot(marker = 'o', ls=':')

In [None]:
fig, ax = plt.subplots(figsize=(5,5))
ax.scatter(df['SRADV.H-1 (watt) '], sliced['shortwave'].sel(time=df.index))
ax.set_xlabel(r'Observed shortwave $(W/m^2)$')
ax.set_xlim([0,500])
ax.set_ylabel(r'MetSim shortwave $(W/m^2)$')
ax.set_ylim([0,500])
ax.plot([0, 500], [0, 500], color='k', linestyle='--')
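
To put numbers on "fairly well correlated", a few summary statistics can be computed alongside the scatter plot. The helper below is a hypothetical sketch using NumPy only; the sample values are made up and are not results from this run.

```python
import numpy as np


def comparison_stats(observed, simulated):
    """Bias, RMSE, and Pearson r over the pairs where both values are finite."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    valid = np.isfinite(obs) & np.isfinite(sim)
    obs, sim = obs[valid], sim[valid]
    diff = sim - obs
    return {
        "n": int(valid.sum()),
        "bias": float(diff.mean()),
        "rmse": float(np.sqrt((diff ** 2).mean())),
        "r": float(np.corrcoef(obs, sim)[0, 1]),
    }


# Made-up values: the simulation is offset by a constant 10 W/m^2,
# and the last pair is dropped because the observation is missing.
obs = [0.0, 100.0, 200.0, 300.0, np.nan]
sim = [10.0, 110.0, 210.0, 310.0, 50.0]
stats = comparison_stats(obs, sim)
print(stats)
```

With the variables from the cells above, the call would be `comparison_stats(df['SRADV.H-1 (watt) '], sliced['shortwave'].sel(time=df.index))`.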