# Evaluating model performance 
## Overview
Today we are going to calculate error metrics to evaluate the skill of models. By the end of this session we will have computed the:

- Mean Error - $\it{ME}$
- Mean Absolute Error - $\it{MAE}$
- Skill Score - $\it{SS}$

These have been introduced to you in class so they are (hopefully) at least somewhat familiar to you by now. Briefly, the $\it{ME}$ computes the mean of the differences between the modelled ($\it{m}$) and observed ($\it{o}$) series: 

\begin{equation*}
ME=\frac{1}{n}\sum_{i=1}^n (m_i - o_i)
\end{equation*} 

whereas the $\it{MAE}$ evaluates the mean of the $\it{absolute}$ differences between the modelled and observed series:

\begin{equation*}
MAE=\frac{1}{n}\sum_{i=1}^n | m_i - o_i |
\end{equation*} 

$\it{SS}$ then provides us with a means to weigh the performance of competing models against one another:

\begin{equation*}
SS=1-\frac{E}{E_{ref}}
\end{equation*} 

where $E_{ref}$ is the error of the reference model against which we are evaluating performance. Clearly, as the error of our new model ($E$) tends to zero, $\it{SS}$ approaches one. If the two errors are equal, $\it{SS}$ is zero. If the new model does not do as good a job as the reference ($E > E_{ref}$), $\it{SS}$ will be negative. 
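To make the three formulas concrete, here is a minimal sketch using made-up numbers. The arrays `m` and `o` and the reference error `mae_ref` are invented purely for illustration; they are not the Loughborough data, and the MAE is used as the error $E$ only as one possible choice.

```python
import numpy as np

# Hypothetical modelled (m) and observed (o) values, for illustration only
m = np.array([10.0, 12.0, 9.0, 11.0])
o = np.array([11.0, 10.0, 10.0, 9.0])

diff = m - o                  # differences: -1, 2, -1, 2
me = np.mean(diff)            # positives and negatives partly cancel
mae = np.mean(np.abs(diff))   # magnitudes only, so nothing cancels

# Skill score with the MAE as the error, against an assumed reference error
mae_ref = 3.0                 # hypothetical error of a reference model
ss = 1 - mae / mae_ref

print("ME = %.2f, MAE = %.2f, SS = %.2f" % (me, mae, ss))
```

Note how the $\it{ME}$ is smaller in magnitude than the $\it{MAE}$ here, because errors of opposite sign offset one another in the former but not in the latter.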

In what follows we will import the data processed in the last session, where we modelled the day-of-year mean temperature climatology using two approaches:

- 'doy_mean' = the temperature for any day was modelled as the arithmetic mean of all temperatures observed on the same calendar day (irrespective of year)
- 'clim' = a sine wave was fitted to the daily mean temperature series

We will then compute the error metrics for these models of the climatology, enabling us to evaluate which is the more appropriate climatology baseline in our assessment of daily mean air temperature forecasting skill for Loughborough campus. 

### Tips for the assignment 
Remember that your assignment is a write-up of the different modelling approaches, including your judgement of which method we should use in an operational forecasting application. All of these practical sessions will therefore be highly relevant, as they will essentially provide you with the output needed to complete the assignment. Note, however, that the sessions will *not* repeat things -- even though repeating them may be what you need to do. For example, today we will run code segments to compute error metrics, and it is expected that you will be able to do this $\it{yourself}$ to evaluate the performance of other models we test in the future. In practice, this means you are expected to copy/paste/edit the code segments I provide, so that you can, for example, use the error metrics -- across *all* models -- to underpin your conclusion as to which model we should use at Loughborough University. I will regularly remind you of this in the closing instructions for each practical session (under the section: 'Your challenge -- to be completed before the next practical'). The challenges there will often require you to adapt code from other practical sessions to get the job done, and the output may be very useful for your assignment. 

## Instructions

Begin by importing the pandas, numpy, and matplotlib modules:

In [53]:
import pandas as pd, numpy as np, matplotlib.pyplot as plt

We are now going to read in the .csv file we saved at the end of the last session. To enable this code block to run, you will need to replace the string assigned to 'fin' with the full path of where you saved the file; if you are unsure where this is, check your code from the last session. 

Importantly, I want you to pay close attention to how we read in the file using the read_csv method of the pandas module. You will use this in almost every session, and you may need to recycle this code independently for your assignment. Note that there are three arguments (or, 'options') whose values I set: 'filepath_or_buffer', 'parse_dates', and 'index_col'. There are many more you *could* set (see here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html), but these are all we need to make sure things are set up correctly. Often, getting your data read in correctly can take a few attempts as you iterate towards success: try -> inspect data -> change options -> try...

Run the code below to import the data: 

In [54]:
fin="C:/Users/gytm3/OneDrive - Loughborough University/Teaching/GYP035/201920/DoY_climatology_tmean.csv"
data=pd.read_csv(filepath_or_buffer=fin,parse_dates=True,index_col=0) # Note: you could ignore the "filepath_or_buffer=" bit
# and instead just have pd.read_csv(fin,parse_dates=True,index_col=0). Pandas will in this case just assume that the first 
# argument is the file path. 
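To see what 'parse_dates' and 'index_col' actually change, the sketch below reads the same small table twice, with and without those options. The three-row table is a hypothetical stand-in typed in by hand (values rounded for illustration), and `io.StringIO` is used only so that the example runs without needing a file on disk.

```python
import io
import pandas as pd

# Hypothetical three-row file standing in for the real .csv
csv_text = """TIMESTAMP,obs,doy_mean,clim
2016-04-08,9.28,10.22,9.47
2016-04-09,7.28,7.60,9.58
2016-04-10,7.00,7.25,9.69
"""

# Without the options: the dates are plain text in an ordinary column
raw = pd.read_csv(io.StringIO(csv_text))

# With the options: the first column becomes the index and is parsed as dates
demo = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col=0)

print(type(raw.index).__name__, list(raw.columns))
print(type(demo.index).__name__, list(demo.columns))
```

In the second case the dates are no longer listed among the columns: they have become the row labels, which is what lets us treat the table as a time series.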

How do you know that the data have been read in correctly? If you recall, last session we touched on this using the command 'data.columns' -- which shows the columns attribute of data (the names of our variables -- the column headers). There is, however, an even better way of getting an idea of whether your data have been read in correctly: the 'head' method. Run the code below to use this to inspect data. 

In [55]:
nrows=10 # Change this to see more or fewer rows from the 'head' method, below. 
data.head(nrows)

TIMESTAMP,obs,doy_mean,clim
2016-04-08,9.283875,10.221156,9.473328
2016-04-09,7.281344,7.598104,9.581643
2016-04-10,7.00274,7.250464,9.690412
2016-04-11,10.468208,7.79818,9.799605
2016-04-12,10.058594,7.81763,9.909188
2016-04-13,10.684615,9.476432,10.019128
2016-04-14,9.619125,9.503677,10.129395
2016-04-15,6.991844,9.448984,10.239954
2016-04-16,4.890479,9.318544,10.350773
2016-04-17,6.352,10.925599,10.461819


The syntax is simple, and, hopefully, intuitive. 

These inspections are particularly important for a few reasons: 1) they remind us how to 'index' variables within data (i.e. we can see what they are called); 2) they allow us to check that the dates have been read in correctly. Pandas is super-smart when it comes to recognizing dates and interpreting them correctly... But it does make some mistakes, and this check is critical. 
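One way such a mistake can arise is with dates whose day and month could be swapped. The sketch below is an illustrative example only (the date string is invented, and nothing here suggests our file is affected): it shows pandas reading an ambiguous date month-first unless told otherwise via the 'dayfirst' option of `pd.to_datetime`.

```python
import pandas as pd

ambiguous = "03/11/2019"  # 3 November or 11 March?

# By default pandas reads this string month-first
default_guess = pd.to_datetime(ambiguous)

# dayfirst=True tells pandas to read it day-first instead
day_first = pd.to_datetime(ambiguous, dayfirst=True)

print("Default month:", default_guess.month)
print("Day-first month:", day_first.month)
```

Dates written year-first (as in the table above) avoid this ambiguity, but a quick look at the first few rows is still the safest way to be sure.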

We are next going to compute the error metrics for doy_mean. Run the code below to find out the $\it{ME}$ and $\it{MAE}$. Read the comments to make sure you understand what is going on. 

In [56]:
me=np.mean(data["doy_mean"]-data["obs"]) # We use the 'mean' method of the numpy module to 
# calculate the mean of doy_mean-obs
print("Mean error is: ", me) # normal use of the print function to show
# the answer!
mae=np.mean(np.abs(data["doy_mean"]-data["obs"])) # same as above, but here we use the 
# numpy method 'abs' to compute the absolute value of doy_mean-obs. Remember
# that this means all negatives become positive, and all positives stay
# positive. 
print("Mean absolute error is: ", mae) # normal use of the print function to show
# the answer!

Mean error is:  3.934967682215745e-17
Mean absolute error is:  1.9636835620033317


Before you proceed, here are a few questions to talk through with a member of staff:

- Describe the result for $\it{ME}$ in your own words (I want to see if you understand the number format)

- Is $\it{ME}$ a useful measure in this instance? If not, why?

- What are the $\it{units}$ for these error metrics? 

We are now going to repeat the computation for these error metrics, but this time using our own $\it{function}$ to wrap up all the code for us. We do this because we $\it{know}$ that we will be re-using the above code in the future and, as a rule, code that will be re-used frequently is best written into a function -- it eliminates the chance of us making a mistake somewhere with copy/pasting: so long as we write the function correctly, it will $\it{always}$ return the correct output if given the correct input. 

In the code section below, I show you another feature of Python syntax: everything I write between the triple quotes (""") at the top of the function is a documentation string (a 'docstring'), which Python does not run as code. This is a convenient way of writing extended notes to document how the function works, along with any other information the programmer thinks is relevant.

Run the code below to have the python interpreter register our "error_metrics" function.

In [57]:
def error_metrics(obs,mod,summary=False):
    
    """
    This function returns the mean error (me) and 
    mean absolute error (mae). Details of required input,
    output, and notes are provided below. 
    
    Input: 
    
        - obs: column of a pandas dataframe (a Series type)
               corresponding to the observed quantity
        - mod: column of pandas dataframe (a Series type)
               corresponding to the modelled quantity. 
        - summary: boolean ('True' or 'False'). If True
               the function will print the me and mae. 
               
    Output: 
    
        - me:  mean error
        - mae: mean absolute error
        
    Notes: 
        - No checking of input is performed. 
        - Requires numpy
    
    Change log:
        - created 03/11/2019 by t.matthews@lboro.ac.uk
        
    """
    me=np.mean(mod-obs) 
    mae=np.mean(np.abs(mod-obs))
    if summary:
        # Note I use a different format 
        # to print here. It allows me 
        # to control how many digits
        # behind the decimal point I 
        # show (here %.2f = 2)
        print("ME = %.2f"%me)
        print("MAE = %.2f"%mae)
        
    return me,mae # This 'returns' variables from 'inside' 
                  # the function to the 'outside' of the 
                  # function. 
    

It will look like nothing happens when you run the above, but python will add this function to its memory bank (for the duration of this session). 
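One way to convince yourself that a function really has been registered is to read back its documentation. The text between the triple quotes is stored with the function, so it can be retrieved later. The sketch below uses a tiny made-up function, `double`, purely for illustration; the same approach should work for any function that has such a documentation string.

```python
def double(x):
    """Return twice the input value x."""
    return 2 * x

# The documentation string is stored with the function and can be read back
print(double.__doc__)

# help(double) shows the same text in a formatted way
result = double(3)
print(result)
```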

Below we will test that the function works by using it to compute the errors for doy_mean; they should match the values returned earlier. To begin with, set summary to True. This will have the $\it{function}$ print the error metrics to screen.

In [58]:
me,mae=error_metrics(data["obs"],data["doy_mean"],summary=True)

ME = 0.00
MAE = 1.96


Now we'll set summary to False and run the function again, this time passing clim as the modelled series:

In [59]:
me,mae=error_metrics(data["obs"],data["clim"],summary=False)

This stops the function printing the results. We will, instead, print me and mae ourselves. We do this to demonstrate that the syntax me,mae=error_metrics(data["obs"],data["clim"],summary=False) assigns the mean error computed by the function to the variable 'me', and the mean absolute error to the variable 'mae'; they now exist $\it{outside}$ the function and can be used for any other analysis we want to conduct (in this case just printing their values): 

In [60]:
print("ME = %.2f"%me)
print("MAE = %.2f"%mae)

ME = -0.00
MAE = 2.26
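The order of the names on the left of the '=' matters: the first name receives the first returned value, the second name the second. The sketch below illustrates this with a made-up helper, `min_and_max` (hypothetical, not part of this practical), and also shows what happens if only one name is given.

```python
def min_and_max(values):
    """Hypothetical helper: return the smallest and largest value."""
    return min(values), max(values)

# One name on the left: it receives both values bundled together (a tuple)
both = min_and_max([3, 1, 2])

# Two names on the left: each receives one value, in the order returned
lo, hi = min_and_max([3, 1, 2])

print(both, lo, hi)
```

This is why 'me' must come before 'mae' when calling error_metrics: swapping the names would silently swap the values.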


On the other hand, the variable 'summary' does not exist outside the function (it lives only inside the function, and we didn't ask the function to $\it{return}$ it). The following code will therefore raise an error:  

In [61]:
print(summary)

NameError: name 'summary' is not defined

### Your challenge -- to be completed before the next practical
- (1) Use the function we defined above to compute the me and mae for clim (the sine wave model created last time).
- (2) Compute $\it{SS}$ using the $\it{MAE}$. What does this tell us about the relative performance of the two  approaches for estimating the climatology?

Tip: the arithmetic operators in Python are quite intuitive (e.g. '-' is the subtraction operator and '/' is division). If you ever need to look up this syntax, consult the docs, e.g. the "Mapping Operators to Functions" table (https://docs.python.org/3/library/operator.html).
- (3) Try writing your own function to compute the $\it{SS}$. The answer you get from this function should match that computed in (2)

Before trying (4) you need to download "TomModel.csv" from Learn. The data here are generated with exactly the same method (and have the same labels: doy_mean & clim), but they were calibrated using data from 2016-2018 and we have here 'predictions' for 2019 only. This means we will evaluate the models using data that were $\it{not}$ involved at all in the fitting procedure. 

- (4) Compute the $\it{MAE}$ for doy_mean and clim in TomModel.csv. Which suffers more in performance? Why do you think this is? [Hint: look back at how doy_mean is defined -- and ask yourself what would be the impact of removing an entire year's worth of data?]