# Analysing the pipeline results

Drought is an extremely damaging natural hazard. Operational drought monitoring often uses satellite measurements to determine drought severity and extent. In Kenya, the National Drought Management Authority uses the **Vegetation Condition Index (VCI)** to decide whether to distribute emergency funds to counties ([Klisch and Atzberger 2016](https://www.mdpi.com/2072-4292/8/4/267/htm)).

Here we demonstrate the usefulness of our pipeline for predicting VCI from pre-existing hydrological and meteorological conditions (the weather!).

Our goal is to accurately predict **VCI** one month ahead. This would allow the Kenyan Drought Authority to distribute funds proactively, ahead of damaging conditions. Machine learning has already been applied to this problem with impressive results ([Adede et al. 2019](https://www.mdpi.com/2072-4292/11/9/1099/htm)). Can we do better?


### **NOTE 1**: 

We are using ready-to-use data that has already been through the pipeline.

Because downloading, preprocessing and training take a long time, the model here has already been trained. We use its predictions, which are saved in a zip file in the `data` directory.

In order to run the pipeline end-to-end and reproduce the steps prior to this notebook, you will need to run the `run_demo.py` script in the `scripts/` directory.

#### The data:
Make sure you unzip the file `zip_data.zip`.

This will produce a folder `zip_data`, which we will use as our base data directory. Remember, if you were to reproduce this analysis by running the pipeline end to end, you would simply use the default data directory at `data`.
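If you prefer to unzip from Python rather than from the file browser or the shell, a minimal sketch is below. It assumes the archive sits at `data/zip_data.zip` (adjust the path if you saved it elsewhere); the helper name `unzip_data` is ours for illustration and is not part of the pipeline.

```python
import zipfile
from pathlib import Path


def unzip_data(zip_path: Path, out_dir: Path) -> Path:
    """Extract the archive at zip_path into out_dir and return out_dir."""
    zip_path, out_dir = Path(zip_path), Path(out_dir)
    if not zip_path.exists():
        raise FileNotFoundError(f"{zip_path} not found; download it first")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    return out_dir


# Assumed location of the archive; change this to wherever you saved it.
archive_path = Path("data/zip_data.zip")
if archive_path.exists():
    unzip_data(archive_path, archive_path.parent)
```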

Inside `data/zip_data` we have two directories: **`features`** and **`models`**.

**`features`** contains the data that has been through the `preprocessors` and the `engineers`. It only contains data for the `test` set in order to reduce the memory requirements. 

**`models`** contains the predictions made by each model. It also contains a saved version of the trained model.

Here we only have two models: **`ealstm`** and **`previous_month`**, which is our baseline model.
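To check that the unzipped folder looks as described, we can list its subdirectories. This is a small sketch that makes no assumption about the layout below `features` and `models`; the helper name `summarise_data_dir` is hypothetical and not part of `src`.

```python
from pathlib import Path


def summarise_data_dir(data_dir: Path, max_depth: int = 2) -> list:
    """Return the directories under data_dir, up to max_depth levels deep,
    as POSIX-style paths relative to data_dir."""
    data_dir = Path(data_dir)
    found = []
    for path in sorted(data_dir.rglob("*")):
        rel = path.relative_to(data_dir)
        if path.is_dir() and len(rel.parts) <= max_depth:
            found.append(rel.as_posix())
    return found


data_dir = Path("data/zip_data")
if data_dir.exists():
    for name in summarise_data_dir(data_dir):
        print(name)
```

We would expect to see `features` and `models` at the top level, with one subdirectory per model somewhere beneath `models`.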

### **NOTE 2**:
Here the data only includes the baseline persistence model (`previous_month`) and the state-of-the-art Entity-Aware Long Short-Term Memory (EALSTM) network. See the `notebooks/docs/Pipeline.ipynb` notebook for more information about the other models that we currently accommodate.
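For intuition, a persistence baseline simply predicts that next month's VCI will equal this month's. The sketch below shows the idea with pandas. We are assuming that this is what `previous_month` does, as its name suggests; the implementation in the repository may differ, and the VCI values here are made up for illustration.

```python
import pandas as pd


def persistence_forecast(vci: pd.Series) -> pd.Series:
    """Predict each month's value as the previous month's observation."""
    return vci.shift(1)


# Hypothetical monthly VCI values for a single pixel (illustration only).
vci = pd.Series(
    [40.0, 45.0, 30.0, 35.0],
    index=pd.period_range("2018-01", periods=4, freq="M"),
    name="VCI",
)

forecast = persistence_forecast(vci)
# The first month has no predecessor, so its forecast is NaN.
print(pd.DataFrame({"observed": vci, "persistence": forecast}))
```

Any model worth deploying should beat this baseline, which is why we compare the EALSTM against it throughout.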

In [1]:
from pathlib import Path
import xarray as xr
import pandas as pd
import os

if Path('.').absolute().parents[1].name == 'ml_drought':
    os.chdir(Path('.').absolute().parents[1])

!pwd


/Users/tommylees/github/ml_drought


In [2]:
# load the pre-trained model
# load the truth vs. predictions
# make plot of model performance (true vs. predictions)
# analyse by region / VCI3M
# demonstrate how this analysis might be run for a different problem.
# how do we determine feature contributions?
# how do we explore spatial patterns?

First let's load both the true and predicted data!

In [3]:
# base data directory produced by unzipping zip_data.zip
data_dir = Path('data/zip_data')
assert data_dir.exists(), (
    'Make sure that you have downloaded and unzipped '
    'zip_data.zip. It contains the processed data '
    'required to run this notebook!'
)
# data_dir = Path('/Volumes/Lees_Extend/data/zip_data')  # author's local copy

We have a few convenient analysis tools for evaluating model performance.

Most of these functions are defined in `src/analysis/evaluation.py`. They are accessible through `src.analysis`.
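The two metrics we request in this notebook are `rmse` and `r2`. As a reminder of what they measure, here is a minimal NumPy sketch. It assumes the standard definitions (root mean squared error and the coefficient of determination); we have not checked that `src.analysis` computes them in exactly this way, and the example values are hypothetical.

```python
import numpy as np


def rmse(true, pred) -> float:
    """Root mean squared error: the typical size of the prediction error."""
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((true - pred) ** 2)))


def r2(true, pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ss_res = np.sum((true - pred) ** 2)
    ss_tot = np.sum((true - true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical observed and predicted VCI values (illustration only).
true = [10.0, 20.0, 30.0]
pred = [12.0, 18.0, 33.0]
print(rmse(true, pred), r2(true, pred))
```

Lower RMSE is better. With this definition, R² is at most 1 and becomes negative when the predictions are worse than always predicting the mean of the observations.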

In [6]:
from src.analysis import monthly_score, annual_scores
%load_ext autoreload
%autoreload 2

In [7]:
scores = annual_scores(
    data_path=data_dir, 
    models=['ealstm', 'previous_month'],
    metrics=['rmse', 'r2'],
    verbose=False,
    to_dataframe=True
)

scores

|   | month | ealstm_rmse | previous_month_rmse | ealstm_r2 | previous_month_r2 |
|---|-------|-------------|---------------------|-----------|-------------------|
| 0 | 1.0 | 12.328923 | 15.91806 | 0.577197 | 0.295483 |
| 1 | 2.0 | 10.03873 | 9.6098 | 0.720184 | 0.743631 |
| 2 | 3.0 | 15.299802 | 18.995626 | 0.434448 | 0.128205 |
| 3 | 4.0 | 15.628013 | 24.495406 | 0.511018 | -0.201306 |
| 4 | 5.0 | 21.278271 | 18.025452 | 0.18472 | 0.414927 |
| 5 | 6.0 | 13.961635 | 15.021294 | 0.665425 | 0.612703 |
| 6 | 7.0 | 14.345408 | 15.152611 | 0.599551 | 0.553215 |
| 7 | 8.0 | 13.139816 | 15.98964 | 0.66764 | 0.507862 |
| 8 | 9.0 | 14.346208 | 14.151048 | 0.60229 | 0.613059 |
| 9 | 10.0 | 14.993118 | 19.0527 | 0.531009 | 0.242681 |
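One quick way to read this table is to ask in which months the EALSTM beats the baseline. The sketch below rebuilds the two RMSE columns from the rows shown above so that it runs on its own; inside the notebook you could apply the same two lines directly to `scores`. The names `scores_subset` and `rmse_improvement` are ours for illustration.

```python
import pandas as pd

# RMSE columns copied from the rows shown above (months 1 to 10).
scores_subset = pd.DataFrame({
    "month": range(1, 11),
    "ealstm_rmse": [
        12.328923, 10.03873, 15.299802, 15.628013, 21.278271,
        13.961635, 14.345408, 13.139816, 14.346208, 14.993118,
    ],
    "previous_month_rmse": [
        15.91806, 9.6098, 18.995626, 24.495406, 18.025452,
        15.021294, 15.152611, 15.98964, 14.151048, 19.0527,
    ],
})

# Positive values mean the EALSTM has the lower (better) RMSE.
scores_subset["rmse_improvement"] = (
    scores_subset["previous_month_rmse"] - scores_subset["ealstm_rmse"]
)
ealstm_better = scores_subset.loc[
    scores_subset["rmse_improvement"] > 0, "month"
].tolist()
print(ealstm_better)
```

For the rows shown, the EALSTM has the lower RMSE in 7 of the 10 months; the baseline wins in months 2, 5 and 9.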
