<img src='https://repository-images.githubusercontent.com/121802384/c355bb80-7d42-11e9-9e0e-4729609f9fbc' alt='WRF-Hydro Logo' width="15%"/>

# Lesson 7 - Evaluating Model Performance 

## Overview
In this lesson, we discuss how to evaluate WRF-Hydro model performance using model output. After calibration and validation, we have a best parameter set for the WRF-Hydro model. Applying that parameter set to a basin of interest lets us generate hydrological variables such as streamflow, snow, and soil moisture. Many error functions are available for model performance evaluation; this lesson focuses on streamflow analysis.
As mentioned in Lesson 3, WRF-Hydro writes its output as standard netCDF4 files, and we use a few Python libraries and commands to work with them in this tutorial.

## Introduction to our Python environment and libraries
As mentioned in Lesson 3, we use Python 3 for all exercises in this tutorial. Please refer to Lesson 3 for details of the Python environment and libraries.

## Dataset for NetCDF4
In this lesson, we use `Dataset` from the Python `netCDF4` library to read WRF-Hydro outputs, which are netCDF4 files.

`Dataset('path-to-netcdf-file','r')`: Open a single netCDF file for reading.

After we open a dataset, we can extract the variables stored in the netCDF output file. This example extracts the streamflow variable:

`my_dataset = Dataset('path-to-netcdf-file','r')`

`streamflow_data = my_dataset.variables['streamflow']`


## Error Criteria
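The following error functions are used to evaluate WRF-Hydro model performance. The last column gives the value each function takes for a perfect match between simulation and observation.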

| Error Functions | Formula | Best Match Value |
| ------------- | ------------- | ------------- |
| Root Mean Square Error | $$ \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(OBS_i - SIM_i)^2} $$ | <center> 0 </center> |
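| Percent Bias | $$ \text{PBIAS} = \frac{\sum_{i=1}^{n} (OBS_i - SIM_i)}{\sum_{i=1}^{n} OBS_i} \times 100 $$ | <center> 0 </center> |
| Nash–Sutcliffe efficiency | $$ \text{NSE} = 1 - \frac{\sum_{i=1}^{n} (OBS_i - SIM_i)^2}{\sum_{i=1}^{n} (OBS_i - \overline{OBS})^2} $$ | <center> 1 </center> |
| Kling-Gupta Efficiency | $$ \text{KGE} = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2} $$ | <center> 1 </center> |

Here $OBS_i$ and $SIM_i$ are the observed and simulated values at time step $i$, $n$ is the number of time steps, and $\overline{OBS}$ is the mean of the observations. In KGE, $r$ is the linear correlation between simulation and observation, $\alpha$ is the ratio of their standard deviations (simulated over observed), and $\beta$ is the ratio of their means (simulated over observed). These are the components returned alongside KGE in the code examples below.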

## Examples
### Plot and Calculate Errors in Daily Streamflow

In this example, we plot simulated and observed daily streamflow from a real two-year (2019-2020) WRF-Hydro simulation and calculate the errors between them.

**Load WRF-Hydro Output**

In [None]:
path = "/home/docker/wrf-hydro-training/example_case/supplemental/wrf-hydro-output/"
usgsid = 13010065 # SNAKE RIVER AB JACKSON LAKE AT FLAGG RANCH WY
comid = 23123539

**Load the libraries**

In [None]:
import datetime
import numpy as np
import pandas as pd
import hydroeval as hydro
import matplotlib.pyplot as plt
import math
from netCDF4 import Dataset
from datetime import date, timedelta

**Read netCDF files and Extract Daily Streamflow Simulations**

In [None]:
# Simulated from 2019-01-01 to 2020-12-31
begDate = datetime.datetime(2019,1,1)
endDate = datetime.datetime(2021,1,1)
dt = endDate - begDate
nHours = dt.days*24
time1 = np.empty([nHours],np.int32)
data1 = np.empty([nHours],np.float32)
# Read simulated streamflow, one CHANOBS file per simulated hour
for hour in range(0,nHours):
    dcurrent = begDate + datetime.timedelta(seconds=3600*hour)
    fileIn = path + dcurrent.strftime('%Y%m%d%H') + '00.CHANOBS_DOMAIN1'
    todays = dcurrent.strftime('%Y%m%d%H')
    idTmp = Dataset(fileIn,'r')
    # Position of our reach (comid) within the file's feature_id list
    index = np.where(idTmp.variables['feature_id'][:] == comid)[0]
    time1[hour] = int(todays)
    data1[hour] = idTmp.variables['streamflow'][0][index]
    idTmp.close()  # release the file handle before opening the next file
timeday = np.arange(np.datetime64("2019-01-01"), np.datetime64("2021-01-01"))
timehour = np.arange(np.datetime64("2019-01-01"), np.datetime64("2021-01-01"), timedelta(hours=1))
base_hour = pd.DataFrame(data1[:])
# Convert cubic meters per second (cms) to cubic feet per second (cfs)
base_hour = base_hour * 35.314666
base_hour['time'] = timehour
base_hour.columns = ['Base_Hours','Time']
# Average the hourly values into daily means
base_day = base_hour.set_index('Time').resample('24H').mean()
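
The loop above opens one file per simulated hour. As a minimal sketch of that bookkeeping, the snippet below rebuilds the list of expected file names with the standard library only, assuming the same `YYYYMMDDHH00.CHANOBS_DOMAIN1` naming pattern used in the cell above. The helper name `chanobs_filenames` is hypothetical, and no files are opened.

```python
import datetime

def chanobs_filenames(beg, end):
    """Return one CHANOBS file name per hour in the half-open range [beg, end)."""
    n_hours = (end - beg).days * 24
    names = []
    for hour in range(n_hours):
        current = beg + datetime.timedelta(hours=hour)
        names.append(current.strftime('%Y%m%d%H') + '00.CHANOBS_DOMAIN1')
    return names

names = chanobs_filenames(datetime.datetime(2019, 1, 1),
                          datetime.datetime(2021, 1, 1))
print(len(names))   # number of hourly files expected for 2019-2020
print(names[0])     # first file name
print(names[-1])    # last file name
```

Because 2020 is a leap year, the two-year period spans 731 days, so the loop expects 17,544 hourly files. Checking that count against the contents of the output directory is a quick way to spot missing files before the read loop fails partway through.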

**Read .csv file and Extract Daily Streamflow Observations**

In [None]:
# Read Observed Streamflow
pathobs = path + str(usgsid) + '.csv'
obsdf = pd.read_csv(pathobs)
# cms to cfs
obsdf.obs = obsdf.obs * 35.314666
print (obsdf)

**Calculate Daily Streamflow Errors**

In [None]:
# Put simulation and observation into single-column DataFrames with matching indices
SIM = pd.DataFrame(base_day).reset_index(drop=True)
SIM.columns = ['flow']
OBS = pd.DataFrame(obsdf.obs).reset_index(drop=True)
OBS.columns = ['flow']
# Root mean square error
rmse = np.sqrt(np.mean((OBS-SIM)**2))
# Percent bias, as defined in the Error Criteria table
pbias = 100 * (OBS - SIM).sum() / OBS.sum()
# Nash-Sutcliffe efficiency
nse = hydro.evaluator(hydro.nse, SIM, OBS)
# Kling-Gupta efficiency (also returns its components r, alpha, beta)
kge, r, alpha, beta = hydro.evaluator(hydro.kge, SIM, OBS)
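
To see what these numbers mean, it can help to compute the four error functions by hand on a tiny series. The sketch below follows the formulas in the Error Criteria table using only the standard library. KGE is written in its correlation / variability-ratio / bias-ratio form, which is assumed here to correspond to the `r`, `alpha`, `beta` values unpacked above. The function names and the sample values are illustrative only and are not part of the lesson's data.

```python
import math

def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def pbias(obs, sim):
    """Percent bias: positive when the simulation underestimates."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, std ratio, and mean ratio."""
    n = len(obs)
    mean_obs, mean_sim = sum(obs) / n, sum(sim) / n
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    std_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / n)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / n
    r = cov / (std_obs * std_sim)      # linear correlation
    alpha = std_sim / std_obs          # variability ratio
    beta = mean_sim / mean_obs         # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Toy data: the simulation is always exactly 1 unit below the observation
obs = [2.0, 4.0, 6.0, 8.0]
sim = [1.0, 3.0, 5.0, 7.0]
print(rmse(obs, sim), pbias(obs, sim), nse(obs, sim), kge(obs, sim))
```

With a constant offset, timing and variability are reproduced perfectly, so the only penalty in KGE comes from the bias ratio (4/5), while PBIAS reports the same shortfall as 20 % of the observed total.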

**Plotting Daily Streamflow**

In [None]:
# Plot
fig = plt.figure()
plt.plot(timeday,base_day,"-b",label="WRFHydro-Simulation")
plt.scatter(timeday,obsdf.obs,marker='.',color='r',zorder=10,label="Observed")
plt.xlabel('Time (YEAR)')
plt.ylabel('Daily Streamflow (CFS)')
plt.legend(loc='upper left')
titlename = 'USGS : ' + str(usgsid)
plt.title(titlename)
plt.xticks(rotation=45);
fig.text(0.7,0.8,'OBS vs. Baseline')
RMSE="RMSE   = {:.2f}".format(rmse.item())
PBIAS="Pbias = {:.2f}".format(pbias.item())
NSE="NSE = {:.2f}".format(nse.item())
KGE="KGE = {:.2f}".format(kge.item())
fig.text(0.7,0.75,str(RMSE))
fig.text(0.7,0.70,str(PBIAS))
fig.text(0.7,0.65,str(NSE))
fig.text(0.7,0.60,str(KGE));

### Plot and Calculate Errors in Monthly Flow Volume
**Monthly Streamflow Volume**

In [None]:
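# Helper functions
def nofdays (x):
    # Number of days covered by x hourly records
    return math.ceil(x / 24)
def cf2kaf_convert (x):
    # Factor that converts a mean flow in cfs over x days into a volume
    # in thousand acre-feet (86400 s per day, 43560 cubic feet per acre-foot)
    return x*24*3600 / 43560 / 1000
# Monthly averages
base_mon = base_hour.set_index('Time').groupby(pd.Grouper(freq='M'))['Base_Hours'].mean()
# Number of days in each month
hours_in_mon = base_hour.set_index('Time').groupby(pd.Grouper(freq='M'))['Base_Hours'].count()
days_in_mon = hours_in_mon.apply(nofdays)
# Conversion factor for each month: mean cfs to thousand acre-feet
cf2kaf = days_in_mon.apply(cf2kaf_convert)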
# Monthly flow volume
base_tot = base_mon * cf2kaf
print (base_tot)
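
The conversion above multiplies a monthly mean flow in cfs by the number of seconds in the month and divides by 43,560 cubic feet per acre-foot and by 1,000. A small worked sketch with the standard library, using a hypothetical constant flow of 1,000 cfs; `monthly_volume_kaf` is an illustrative helper, not part of the lesson's code:

```python
import calendar

SECONDS_PER_DAY = 24 * 3600
CUBIC_FEET_PER_ACRE_FOOT = 43560

def monthly_volume_kaf(mean_cfs, year, month):
    """Volume in thousand acre-feet for a month with the given mean flow in cfs."""
    days = calendar.monthrange(year, month)[1]   # days in that month
    cubic_feet = mean_cfs * days * SECONDS_PER_DAY
    return cubic_feet / CUBIC_FEET_PER_ACRE_FOOT / 1000

jan = monthly_volume_kaf(1000.0, 2019, 1)   # 31 days
feb = monthly_volume_kaf(1000.0, 2020, 2)   # 29 days (leap year)

# Factor for a single day: mean daily cfs to thousand acre-feet
daily_factor = SECONDS_PER_DAY / CUBIC_FEET_PER_ACRE_FOOT / 1000

print(round(jan, 2), round(feb, 2), round(daily_factor, 6))
```

The single-day factor rounds to 0.001983, which is the constant used later in this lesson to convert daily cfs values to thousand acre-feet.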

**Read .csv file and Extract Monthly Observed Flow Volume**

In [None]:
# Read Observed Streamflow
pathobs = path + str(usgsid) + '.csv'
obsdf = pd.read_csv(pathobs)
obs_day = pd.DataFrame(obsdf.obs)
# cms to cfs
obs_day = obs_day * 35.314666
obs_day['newTime'] = timeday
obs_mon = obs_day.set_index('newTime').groupby(pd.Grouper(freq='M'))['obs'].mean()
obs_days_in_mon = obs_day.set_index('newTime').groupby(pd.Grouper(freq='M'))['obs'].count()
obs_cf2kaf = obs_days_in_mon.apply(cf2kaf_convert)
obs_tot = obs_mon * obs_cf2kaf
print (obs_tot)

**Calculate Monthly Flow Volume Errors**

In [None]:
# Root mean square error
rmse = np.sqrt(np.mean((obs_tot-base_tot)**2))
# Percent bias, as defined in the Error Criteria table
pbias = 100 * np.sum(obs_tot - base_tot) / np.sum(obs_tot)
# Nash-Sutcliffe efficiency
nse = hydro.evaluator(hydro.nse,base_tot,obs_tot)
# Kling-Gupta efficiency
kge,r,alpha,beta=hydro.evaluator(hydro.kge,base_tot,obs_tot)

**Plotting Monthly Flow Volume**

In [None]:
# Plot
fig = plt.figure()
xxx = np.arange(np.datetime64("2019-01"), np.datetime64("2021-01"))
plt.plot(xxx,base_tot,"-b",label="WRFHydro-Simulation")
plt.scatter(xxx,obs_tot,marker='.',color='r',zorder=10,label="Observed")
plt.xlabel('Time (YEAR)')
plt.ylabel('Monthly Flow Volume (kac-ft)')
plt.legend(loc='upper left')
titlename = 'USGS : ' + str(usgsid)
plt.title(titlename)
plt.xticks(rotation=45);
fig.text(0.7,0.8,'OBS vs. Baseline')
RMSE="RMSE   = {:.2f}".format(rmse.item())
PBIAS="Pbias = {:.2f}".format(pbias.item())
NSE="NSE = {:.2f}".format(nse.item())
KGE="KGE = {:.2f}".format(kge.item())
fig.text(0.7,0.75,str(RMSE))
fig.text(0.7,0.70,str(PBIAS))
fig.text(0.7,0.65,str(NSE))
fig.text(0.7,0.60,str(KGE));

### Plot Annual Accumulated Flow

**Re-Arrange Simulation and Observations**

In [None]:
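# Arrange daily streamflow as one column per year
newsyear=2019
neweyear=2020
nday = 365
obsval = []
baseval = []
# range() excludes neweyear, so only 2019 is processed here;
# 2020 is a leap year (366 days) and would not fit nday = 365
for j in range(newsyear,neweyear):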
    syearday = str(j) + '-01-01'
    eyearday = str(j) + '-12-31'
    obs1yr = obs_day[(obs_day.newTime >= syearday) & (obs_day.newTime <= eyearday)]
    base_day['Time'] = timeday
    sim1yr = base_day[(base_day.Time >= syearday) & (base_day.Time <= eyearday)]
    obs1yr_val=obs1yr['obs'].values; obs1yr_val=np.reshape(obs1yr_val,(nday,1))
    sim1yr_val=sim1yr['Base_Hours'].values; sim1yr_val=np.reshape(sim1yr_val,(nday,1))
    obsval.append(obs1yr_val)
    baseval.append(sim1yr_val)
obs_merge = np.concatenate(obsval,axis=1)
base_merge= np.concatenate(baseval,axis=1)
# cfs to kac-ft
obs_merge = obs_merge * 0.001983
base_merge = base_merge * 0.001983
df_obs = pd.DataFrame(obs_merge)
df_base = pd.DataFrame(base_merge)
# Median
base_median = df_base.median(axis=1)
basefin = base_median.cumsum()
obs_median = df_obs.median(axis=1)
obs_median_10q = df_obs.quantile(0.1,axis=1)
obs_median_90q = df_obs.quantile(0.9,axis=1)
obsfin = obs_median.cumsum()
obsfin_10q = obs_median_10q.cumsum()
obsfin_90q = obs_median_90q.cumsum()
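
The last step above takes, for each day of the year, the median across the available years and then accumulates those daily values. A minimal sketch of that median-then-accumulate logic with the standard library, on made-up volumes for three hypothetical years of three days each:

```python
import itertools
import statistics

# Hypothetical daily volumes (kac-ft): one inner list per year
years = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
    [5.0, 9.0, 1.0],
]

# Median across years for each day of the year
daily_median = [statistics.median(day_values) for day_values in zip(*years)]

# Running total of the daily medians
accumulated = list(itertools.accumulate(daily_median))

print(daily_median)
print(accumulated)
```

When only one year is available, the median and the percentiles all collapse onto that year's values, so the shaded percentile band has zero width.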

**Plotting Annual Accumulated Flow**

In [None]:
# Plot
fig = plt.figure()
ax = plt.axes()
xxx = [j for j in range(0,nday)]
plt.plot(basefin,"-b",label="WRFHydro-Simulation")
plt.plot(obsfin,marker='.',color='r',zorder=10,label="Observed")
plt.fill_between(xxx,obsfin_10q,obsfin_90q,facecolor="grey",color='grey',alpha=0.5,label='Observations (10th-90th percentile)')
plt.xlabel('Time (Day)')
plt.ylabel('Accumulated Flow (kac-ft)')
plt.legend(loc='upper left')
titlename = 'USGS : ' + str(usgsid)
plt.title(titlename)
ax.set_xticks([0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334])
ax.set_xticklabels(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.xticks(rotation=45);
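
The tick positions are the zero-based day of the year on which each month starts in a non-leap year. Rather than typing them by hand, they can be derived with the standard library; the helper name `month_start_days` is illustrative:

```python
import calendar

def month_start_days(year):
    """Zero-based day-of-year on which each month of the given year starts."""
    starts, day = [], 0
    for month in range(1, 13):
        starts.append(day)
        day += calendar.monthrange(year, month)[1]   # add days in this month
    return starts

ticks = month_start_days(2019)
print(ticks)
```

Calling the same helper with a leap year such as 2020 shifts every tick from March onward by one day.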

### Plot Flow Duration Curve

**Sorted Simulation and Observations**

In [None]:
# Sorting Streamflow
base_sort=np.sort(base_day.Base_Hours)[::-1]
exceedence = np.arange(1.,len(base_sort)+1) / len(base_sort)
obs_sort=np.sort(obs_day.obs)[::-1]
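
A flow duration curve pairs each flow, sorted from highest to lowest, with the percentage of time that flow is equaled or exceeded. The sketch below applies the same rank / n rule as the cell above to four made-up flows; `flow_duration` is a hypothetical helper written with the standard library only:

```python
def flow_duration(flows):
    """Return (exceedance_percent, sorted_flows) with flows in descending order."""
    ordered = sorted(flows, reverse=True)
    n = len(ordered)
    exceedance = [100.0 * rank / n for rank in range(1, n + 1)]
    return exceedance, ordered

exceedance_pct, ordered = flow_duration([10.0, 50.0, 20.0, 40.0])
for pct, flow in zip(exceedance_pct, ordered):
    print(f"{pct:5.1f}%  {flow:6.1f}")
```

The largest flow is exceeded or equaled 25 % of the time in this four-value sample, and the smallest 100 % of the time. Other plotting-position rules exist, for example rank / (n + 1); the rule used here matches the lesson's cell.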

**Plotting Flow Duration Curve**

In [None]:
# Plot
fig = plt.figure()
plt.plot(exceedence*100,base_sort,"-b",label="Baseline")
plt.plot(exceedence*100,obs_sort,"-r",label="Observed")
plt.ylabel("Flow Rate [CFS]")
plt.xlabel("Exceedance [%]")
titlename = 'USGS : ' + str(usgsid)
plt.title(titlename)
# Show the legend so the two curves can be told apart
plt.legend(loc='upper right')