# Getting started with MetSim

Before jumping into the many options that MetSim lets you change, we need a
baseline understanding of how to set up MetSim and what kinds of data go into it.
To build that, we will use the Reynolds Mountain East data that you've used
previously. This data already has an hourly timestep with all of the forcing
variables, which is sufficient to run SUMMA. We will first aggregate just the
temperature, precipitation, and wind speed to daily values, then use MetSim to
estimate the remaining inputs that SUMMA requires and disaggregate everything back
down to hourly timesteps. This shows how the estimation routines compare to the
originally observed forcings and what kinds of impacts they have on the simulation
of the hydrologic cycle. In this notebook we specifically explore how these
differences affect the snowpack.

With that, let's get started! Below are some standard imports. Notably, we've
added the `from metsim import MetSim` line, which imports the main object used
to run MetSim from a notebook environment.

In [None]:
# modules 
import os
import pysumma as ps
import xarray as xr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from metsim import MetSim

# Let's start by building input data for MetSim

As explained in the lecture, MetSim can take a minimal set of daily data,
estimate various quantities from it, and temporally disaggregate them down to a
finer timestep. Since our initial data for Reynolds Mountain East is already
hourly, we will try to reproduce it here. We start by producing the daily input
data for MetSim from the hourly Reynolds data; this step is effectively MetSim in reverse.

We also need to add a couple of parameters/attributes to the daily data. These are
contained in the attributes file for SUMMA, so we will pull that in as well.


In [None]:
hourly_data = xr.open_dataset('./data/reynolds/forcing_sheltered.nc')
attributes = xr.open_dataset('./settings/reynolds/summa_zLocalAttributes.nc')

With the data pulled in, we compute the daily minimum and maximum
temperatures (in Celsius, as MetSim requires) as well as the daily
average wind speed and daily total precipitation. To do this we can
use the `xarray.DataArray.resample` method. For the precipitation,
we also have to account for the fact that SUMMA expects a rate in mm per
second, while MetSim wants the total in mm over the timestep, so we multiply
the summed hourly rates by the length of the `data_step` in seconds.
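The aggregation just described can be sketched in plain Python, without xarray, to make the arithmetic explicit. This is a minimal sketch on made-up hourly records: the helper `aggregate_daily` and the sample values are hypothetical, and we assume a 3600 s `data_step` and the standard 273.15 offset between kelvin and degrees Celsius.

```python
from collections import defaultdict
from datetime import datetime, timedelta

KELVIN_OFFSET = 273.15     # K -> degC
DATA_STEP_SECONDS = 3600   # assumed length of one hourly forcing step (s)


def aggregate_daily(records, data_step=DATA_STEP_SECONDS):
    """Collapse (time, airtemp_K, pptrate_mm_per_s) records into daily Tmin/Tmax/prcp."""
    temps = defaultdict(list)
    precip = defaultdict(float)
    for time, airtemp, pptrate in records:
        day = time.date()
        temps[day].append(airtemp - KELVIN_OFFSET)
        # A rate in mm s-1 times the step length in s is a depth in mm for that step
        precip[day] += pptrate * data_step
    return {
        day: {"Tmin": min(values), "Tmax": max(values), "prcp": precip[day]}
        for day, values in temps.items()
    }


# Hypothetical input: two days of hourly data, warming 1 K per hour within each day,
# with rain (0.001 mm s-1) only in the first two hours of the first day
start = datetime(1998, 10, 1)
records = [
    (start + timedelta(hours=h), 270.15 + h % 24, 0.001 if h < 2 else 0.0)
    for h in range(48)
]
daily = aggregate_daily(records)
```

Multiplying each rate by the step length before summing gives the same total as summing the rates first and multiplying once, which is what the notebook cell does with `resample(...).sum()`.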

With those we have the complete forcings; however, we must still provide
the state file for MetSim, which is simply additional forcings for the
90 days preceding the simulation period. In this case we don't have
that data, so we make a subjective choice to fill it in. We need data
from 1998/07/03 up to (but not including) 1998/10/01, so I have chosen to
simply take the data from 1999/07/03 to 1999/09/30 and copy it back one
year. There are many other ways to get this data, such as using
climatological averages, or just shortening your simulation period if that is reasonable.
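The date arithmetic behind this choice is easy to check with the standard library. The sketch below (the constant names are our own) computes the 90-day window MetSim needs and confirms that taking positions 275 to 364 of a daily series starting on 1998/10/01, as the notebook's `isel(time=slice(365-90, 365))` does, and shifting them back 365 days lands exactly on that window.

```python
from datetime import date, timedelta

SIM_START = date(1998, 10, 1)  # first day of the simulation period
STATE_DAYS = 90                # length of the state window MetSim needs
SHIFT_DAYS = 365               # shift used to copy the following year backwards

# First day of the window MetSim needs (the window ends the day before SIM_START)
needed_start = SIM_START - timedelta(days=STATE_DAYS)

# Days we copy from: positions 275..364 of a daily series starting at SIM_START,
# mirroring daily_data.isel(time=slice(365 - 90, 365)) in the notebook
source_days = [SIM_START + timedelta(days=i) for i in range(SHIFT_DAYS - STATE_DAYS, SHIFT_DAYS)]

# Shift the copied dates back, as the notebook does with pd.Timedelta('365D')
shifted_days = [d - timedelta(days=SHIFT_DAYS) for d in source_days]

print(needed_start, source_days[0], source_days[-1])
print(shifted_days[0], shifted_days[-1])
```

A fixed 365-day shift only lines up with the calendar because no February 29 falls between these dates; if your period spans a leap day, check the shifted dates before copying data back.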

Then we compile everything, including latitude, longitude, elevation, and a `mask`
variable, into the dataset. The `mask` variable simply marks which cells to run.
This is useful for cutting out oceans, lakes, or other features in spatially
distributed runs, but here we simply set it to `1`, which marks the cell for
running (the site's elevation is above zero, so the expression in the cell below
evaluates to `1`). Finally, we save out the new dataset, which will be used in this
notebook as well as the subsequent notebooks.
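To illustrate what the mask does in a distributed run, here is a hypothetical 2 x 3 grid of elevations using the same `elevation > 0` rule as the notebook. Treating non-positive elevation as ocean is only a rough proxy assumed for this sketch; lakes or other excluded features would need their own criteria.

```python
# Hypothetical elevations (m) for a small 2 x 3 grid; values <= 0 stand in for ocean cells
elevations = [
    [-5.0, 12.0, 2100.0],
    [0.0, 1450.0, 2350.0],
]

# Same rule as the notebook: mark a cell for running (1) if its elevation is above zero
mask = [[1 if z > 0 else 0 for z in row] for row in elevations]

# (row, column) indices of the cells marked for running
cells_to_run = [(i, j) for i, row in enumerate(mask) for j, m in enumerate(row) if m == 1]
```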

In [None]:
# Resample to daily values; convert temperatures from K to degC for MetSim
tmin = hourly_data['airtemp'].resample({'time': 'D'}).min() - 273.15
tmin.name = 'Tmin'
tmax = hourly_data['airtemp'].resample({'time': 'D'}).max() - 273.15
tmax.name = 'Tmax'
wind = hourly_data['windspd'].resample({'time': 'D'}).mean()
wind.name = 'wind'
# Sum the hourly rates (mm s-1) and multiply by the step length (s) to get mm per day
precip = hourly_data['data_step'].dt.seconds * hourly_data['pptrate'].resample({'time': 'D'}).sum()
precip.name = 'prcp'

# Put it all into a single dataset
daily_data = xr.Dataset()
daily_data['Tmin'] = tmin
daily_data['Tmax'] = tmax
daily_data['prcp'] = precip
daily_data['wind'] = wind

# Generate state data: copy days 275-364 of the first year back 365 days to fill the 90 days before the start
state = daily_data.isel(time=slice(365-90, 365))
state = state.assign_coords({'time': state['time'] - pd.Timedelta('365D')})
daily_data = xr.concat([state, daily_data], dim='time')

# Add some attributes/parameters
daily_data['lat'] = attributes['latitude']
daily_data['lon'] = attributes['longitude']
daily_data['elev'] = attributes['elevation']
daily_data['mask'] = (attributes['elevation'] > 0).astype(int)

daily_data.to_netcdf('./data/reynolds/forcing_daily.nc')

# Getting a configuration set up

The MetSim configuration can be verbose, but it is quite flexible. We will
only cover the basic usage/explanations here, but you can browse [the docs](https://metsim.readthedocs.io/en/develop/)
for further information about how to set up various configurations.

The gist here is that we will set the input file paths (for the daily data we just generated),
then set some output paths. We then set up the run times and the time step (in minutes, so `60`
gives hourly output), and set the `period_ending` flag to `True`. This ensures that the timestamps
line up with the conventions used in SUMMA, reducing the need to postprocess the MetSim output for use in SUMMA.
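The difference between period-beginning and period-ending timestamps is easiest to see by listing the labels for one day of hourly steps. This is a generic illustration with a hypothetical helper `hourly_labels`, not MetSim's internal code: each hourly interval can be labeled by its start or by its end, and we assume `period_ending=True` corresponds to the latter.

```python
from datetime import datetime, timedelta


def hourly_labels(day_start, steps=24, step=timedelta(hours=1), period_ending=False):
    """Timestamps for each interval [t, t + step) within a day.

    period_ending=False labels each interval by its start; True labels it by its end.
    """
    offset = step if period_ending else timedelta(0)
    return [day_start + i * step + offset for i in range(steps)]


day = datetime(1998, 10, 1)
beginning = hourly_labels(day)                   # 00:00 ... 23:00 on 1998-10-01
ending = hourly_labels(day, period_ending=True)  # 01:00 on 1998-10-01 ... 00:00 on 1998-10-02
```

Both lists describe the same 24 intervals; only the label attached to each one moves by a step, which is why mismatched conventions show up as a one-hour shift.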

Following the top-level configuration is a series of special-purpose nested dictionaries.
The `chunks` section tells MetSim how to break up spatially distributed runs for parallel
processing. Because this is a point simulation, we can simply set the chunk size to 1. The
`forcing_vars`, `state_vars`, and `domain_vars` sections map the variable names in your data to
the names that MetSim expects, in the format `yourVarName: metsimVarName`. Finally, we set up the
output data in the `out_vars` section. The keys in this section are the output variable names as
MetSim calls them. Each sub-dictionary lets you rename the variable in the output file via the
`out_name` specification and do basic unit conversions via `units`, which again reduces the
amount of postprocessing needed to use MetSim output to run SUMMA.
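The renaming and unit-conversion bookkeeping that these sections describe can be sketched with plain dictionaries. The helpers `map_inputs` and `map_outputs` are hypothetical and are not how MetSim implements this; the internal units they start from (degrees Celsius, mm per timestep, and kPa) are assumptions made for illustration, as is the hourly `TIME_STEP_MINUTES`.

```python
# Hypothetical helpers mirroring the config's bookkeeping; NOT MetSim's implementation.
TIME_STEP_MINUTES = 60  # assumed output step, matching "time_step": 60

# (assumed internal unit, requested unit) -> conversion to apply
CONVERSIONS = {
    ("C", "K"): lambda x: x + 273.15,
    ("mm timestep-1", "mm s-1"): lambda x: x / (TIME_STEP_MINUTES * 60),
    ("kPa", "Pa"): lambda x: x * 1000.0,
}


def map_inputs(data, var_map):
    """Rename input variables using a {yourVarName: metsimVarName} mapping."""
    return {var_map[name]: values for name, values in data.items() if name in var_map}


def map_outputs(data, out_vars, internal_units):
    """Rename outputs via 'out_name' and convert units via 'units' when requested."""
    out = {}
    for name, spec in out_vars.items():
        values = data[name]
        target = spec.get("units")
        if target is not None and target != internal_units[name]:
            convert = CONVERSIONS[(internal_units[name], target)]
            values = [convert(v) for v in values]
        out[spec.get("out_name", name)] = values
    return out


# Usage with made-up values
daily_in = {"Tmin": [-3.0], "Tmax": [20.0], "prcp": [7.2], "wind": [2.5]}
forcing_vars = {"Tmin": "t_min", "Tmax": "t_max", "prcp": "prec", "wind": "wind"}
mapped = map_inputs(daily_in, forcing_vars)

hourly_out = {"temp": [0.0, 1.5], "prec": [3.6, 0.0], "air_pressure": [85.0, 85.2], "wind": [2.5, 2.4]}
out_vars = {
    "temp": {"out_name": "airtemp", "units": "K"},
    "prec": {"out_name": "pptrate", "units": "mm s-1"},
    "air_pressure": {"out_name": "airpres", "units": "Pa"},
    "wind": {"out_name": "windspd"},
}
internal_units = {"temp": "C", "prec": "mm timestep-1", "air_pressure": "kPa", "wind": "m s-1"}
out = map_outputs(hourly_out, out_vars, internal_units)
```

Entries without a `units` key (like `wind` here) are only renamed, which matches how the configuration above treats `shortwave`, `longwave`, `spec_humid`, and `wind`.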

In [None]:
config = {
    # Input files
    "domain": './data/reynolds/forcing_daily.nc',
    "forcing": './data/reynolds/forcing_daily.nc',
    "state": './data/reynolds/forcing_daily.nc',
    
    # Output location/naming
    "out_dir": './data/reynolds/',
    "out_prefix": 'forcing_metsim_uniform',
    
    # Run configuration/parameters
    "start": "1998/10/01",
    "stop": "2008/10/01",
    "time_step": 60,
    "period_ending": True,
    
    # Set up spatial chunking
    "chunks": {'hru': 1},

    # Set up input variable mapping
    "forcing_vars": {
        "Tmin": "t_min",
        "Tmax": "t_max",
        "prcp": "prec",
        "wind": "wind",
    },
    "state_vars": {
        "Tmin": "t_min",
        "Tmax": "t_max",
        "prcp": "prec",
        "wind": "wind",
    },
    "domain_vars": {
        "lon": "lon",
        "lat": "lat",
        "elev": "elev",
        "mask": "mask",
    },
    
    # Set up output specifications
    "out_vars": {
        'temp'        : {'out_name': 'airtemp', 'units': 'K'},
        'prec'        : {'out_name': 'pptrate', 'units': 'mm s-1'},
        'air_pressure': {'out_name': 'airpres', 'units': 'Pa'},
        'shortwave'   : {'out_name': 'SWRadAtm'},
        'longwave'    : {'out_name': 'LWRadAtm'},
        'spec_humid'  : {'out_name': 'spechum' },
        'wind'        : {'out_name': 'windspd' }
    },
}
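Before handing a configuration to MetSim, a few cheap consistency checks can catch typos. The `check_config` function below is hypothetical and not part of MetSim; the rules it enforces (start before stop, a `time_step` in minutes that divides a day evenly, matching forcing and state mappings, and the four domain variables used here) are our own choices based on this notebook's setup.

```python
from datetime import datetime

REQUIRED_DOMAIN_VARS = {"lat", "lon", "elev", "mask"}  # the domain variables used in this notebook


def check_config(cfg):
    """Hypothetical pre-flight checks for a MetSim-style config dict (not part of MetSim).

    Returns a list of problem descriptions and the number of output steps per day.
    """
    problems = []
    start = datetime.strptime(cfg["start"], "%Y/%m/%d")
    stop = datetime.strptime(cfg["stop"], "%Y/%m/%d")
    if start >= stop:
        problems.append("start must be before stop")
    if 1440 % cfg["time_step"] != 0:
        problems.append("time_step (minutes) should divide evenly into 1440 minutes per day")
    if set(cfg["forcing_vars"].values()) != set(cfg["state_vars"].values()):
        problems.append("forcing_vars and state_vars map to different MetSim variables")
    missing = REQUIRED_DOMAIN_VARS - set(cfg["domain_vars"].values())
    if missing:
        problems.append(f"domain_vars is missing {sorted(missing)}")
    return problems, 1440 // cfg["time_step"]


# A trimmed copy of the configuration above, so this sketch runs on its own
mapping = {"Tmin": "t_min", "Tmax": "t_max", "prcp": "prec", "wind": "wind"}
example = {
    "start": "1998/10/01",
    "stop": "2008/10/01",
    "time_step": 60,
    "forcing_vars": mapping,
    "state_vars": dict(mapping),
    "domain_vars": {"lon": "lon", "lat": "lat", "elev": "elev", "mask": "mask"},
}
problems, steps_per_day = check_config(example)
```

In the notebook itself you could pass the full `config` dictionary instead of the trimmed copy.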

# Running MetSim and preparing for running SUMMA

With a working configuration, running MetSim is simple.
We begin by instantiating the `MetSim` object with the configuration,
then call the `.run` method to kick things off.
Once the simulation is finished, the output can be opened with the
`.open_output` method. We will open the output, add a couple of
pieces of metadata for the SUMMA simulation (`data_step` and `hruId`), and write out the new data.

In [None]:
ms = MetSim(config)

In [None]:
ms.run()

In [None]:
with ms.open_output() as ds:
    ds['data_step'] = hourly_data['data_step']
    ds['hruId'] = hourly_data['hruId']
    out_ds = ds.load()
out_prefix = config["out_prefix"]
# Build the file name from the time span of the loaded output (ds is closed at this point)
out_suffix = ms.get_nc_output_suffix(out_ds["time"].to_series())
out_filename = f'{out_prefix}_{out_suffix}.nc'
out_dirname = os.path.abspath('./data/reynolds')
out_ds.to_netcdf(os.path.join(out_dirname, out_filename))

# Running SUMMA and comparing results

To compare how the MetSim forcing data stacks up against the observed data, we will run SUMMA simulations using both datasets.
To run the simulation with the MetSim-generated data, we just have to replace the path to the data in the forcing file list.


In [None]:
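summa_executable = 'summa.exe'
file_manager = './settings/reynolds/summa_fileManager.txt'

# Two simulations from the same setup; only the forcing file will differ
sim_default = ps.Simulation(summa_executable, file_manager)
sim_metsim  = ps.Simulation(summa_executable, file_manager)

# Point the second simulation at the MetSim-generated forcing file
sim_metsim.force_file_list.options[0].name = f"{out_dirname}/{out_filename}"

# Run both locally, with distinct suffixes so the outputs are kept apart
sim_default.run('local', run_suffix='default')
sim_metsim.run('local', run_suffix='metsim')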

From the plot below we can see that while the MetSim-derived forcings
do not perform as well as the observed forcings, they still produce SWE
that is reasonably close to the observed values. MetSim also has a number
of tunable parameters, such as the lapse rate or the fraction of shortwave
radiation transmitted on rainy days, which can be used to adjust the generated forcings.
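To put a number on "reasonably close", you could compute a few summary statistics between simulated and observed SWE. The sketch below uses made-up weekly SWE values and hypothetical helpers; bias, RMSE, peak SWE difference, and a melt-out date with a 1 mm threshold are our choice of metrics, not something the notebook prescribes. With real data you would first align the observed and simulated series on the same timestamps.

```python
import math
from datetime import date, timedelta


def swe_metrics(obs, sim):
    """Bias, RMSE, and peak difference (mm) between equal-length SWE series."""
    if not obs or len(obs) != len(sim):
        raise ValueError("series must be non-empty and the same length")
    diffs = [s - o for o, s in zip(obs, sim)]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"bias": bias, "rmse": rmse, "peak_diff": max(sim) - max(obs)}


def melt_out_day(series, dates, threshold=1.0):
    """First date at or after peak SWE where SWE falls below threshold (mm), else None."""
    peak = series.index(max(series))
    for value, day in zip(series[peak:], dates[peak:]):
        if value < threshold:
            return day
    return None


# Made-up weekly SWE values (mm), for illustration only
dates = [date(2006, 3, 1) + timedelta(weeks=i) for i in range(7)]
obs_swe = [0, 50, 120, 200, 150, 60, 0]
sim_swe = [0, 40, 110, 210, 170, 80, 10]
metrics = swe_metrics(obs_swe, sim_swe)
```

A positive bias means the simulation holds more snow than observed on average; comparing melt-out dates is often more informative for snowpack than RMSE alone, since a small timing error near melt can dominate the squared differences.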

In [None]:
obs = xr.open_dataset('./data/reynolds/ReynoldsCreek_valData.nc')
obs.sel(time=slice('2005/07/01', '2006/09/30'))['SWE'].plot(label='Observed', color='black')
# squeeze() drops the size-1 hru dimension so each SUMMA output plots as a line
sim_default.output['scalarSWE'].squeeze().plot(label='Observed forcings')
sim_metsim.output['scalarSWE'].squeeze().plot(label='MetSim forcings')
plt.legend()