In [4]:
from pathlib import Path
import os

if Path('.').absolute().parents[1].name == 'ml_drought':
    os.chdir(Path('.').absolute().parents[1])

from src import models
models.__all__

# Models

The machine learning models are trained on the data stored in the `data/features` directory.

## The DataLoader

This is probably the most impressive (in our opinion!) and most complex piece of code, and its function can be difficult to understand at first. The role of the `DataLoader` is to take the data from the `data/features/{experiment}/{train}/{timestep}/*.nc` directory and convert it to a numpy array. This is done **lazily** to save memory.
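To illustrate what lazy loading means here, the sketch below stores only file paths and reads each file when iteration reaches it. It is a minimal illustration, not the pipeline's `DataLoader`: the `LazyArrayLoader` class is hypothetical, and it reads `.npy` files with `numpy.load` as a stand-in for the NetCDF files the real loader consumes.

```python
from pathlib import Path
from typing import Iterator, List

import numpy as np


class LazyArrayLoader:
    """Hypothetical lazy loader: holds file paths, reads arrays on demand."""

    def __init__(self, files: List[Path]) -> None:
        # Only the paths are stored up front, so memory use stays small
        self.files = sorted(files)

    def __len__(self) -> int:
        return len(self.files)

    def __iter__(self) -> Iterator[np.ndarray]:
        for path in self.files:
            # Each file is read from disk only when the loop reaches it
            yield np.load(path)
```

Iterating with `for array in LazyArrayLoader(paths)` reads one file per iteration instead of loading every timestep up front.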

Because the `DataLoader` is responsible for loading the data and sending it to the models, it is also where the pipeline implements model options, such as `including_latlons`, passing one-hot-encoded month labels and passing surrounding pixels to the models. The underlying data remains the same but the way that it is passed to the models differs.
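As a rough illustration of how such options change the model input without touching the underlying data, the sketch below appends a one-hot-encoded month (and, optionally, a latitude and longitude pair) to a feature vector. The function names and the array layout are assumptions made for this example, not the `DataLoader`'s actual interface.

```python
from typing import Optional, Tuple

import numpy as np


def one_hot_month(month: int) -> np.ndarray:
    """Encode a calendar month (1-12) as a length-12 one-hot vector."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    encoded = np.zeros(12)
    encoded[month - 1] = 1.0
    return encoded


def build_input(
    features: np.ndarray,
    month: int,
    latlon: Optional[Tuple[float, float]] = None,
) -> np.ndarray:
    """Concatenate base features with the optional extras (hypothetical layout)."""
    parts = [features, one_hot_month(month)]
    if latlon is not None:
        # Latitude and longitude are appended as two extra input values
        parts.append(np.asarray(latlon, dtype=float))
    return np.concatenate(parts)
```

With two base features, `build_input(np.array([0.3, 0.7]), month=4)` returns 14 values, and passing `latlon` as well returns 16. The base features are identical in both cases.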

<img src="img/dataloader_diagram.png" style='background-color: #878787; border-radius: 25px; padding: 20px'>

## Models API

Despite their differences in complexity, the models share a standard API. The main entry point is the `model.train()` method.
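The idea of a shared interface can be sketched as an abstract base class that every model subclasses. This is an illustrative assumption about the design, not the repository's code: `SketchModelBase`, `MeanModel`, and their attributes are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Optional

import numpy as np


class SketchModelBase(ABC):
    """Hypothetical base class: every model exposes the same train() entry point."""

    def __init__(self, experiment: str = "one_month_forecast") -> None:
        self.experiment = experiment

    @abstractmethod
    def train(self) -> None:
        """Fit the model; each subclass decides what fitting means."""


class MeanModel(SketchModelBase):
    """Toy model that 'trains' by memorising the mean of its targets."""

    def __init__(self, targets: np.ndarray, experiment: str = "one_month_forecast") -> None:
        super().__init__(experiment)
        self.targets = targets
        self.mean_: Optional[float] = None

    def train(self) -> None:
        self.mean_ = float(np.mean(self.targets))
```

Calling code then only needs to know about `train()`, regardless of whether the model behind it is a baseline or a neural network.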

## Persistence

The first model is a parsimonious (simple) model used as a benchmark. Persistence simply takes the `target_variable` in the month prior to the `target_month` and predicts that the value will be the same.

For example, we are predicting `VHI` (our `target_variable`) in April (our `target_month`) using the value of `VHI` in March.
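In code, the persistence forecast amounts to shifting the series by one timestep. The sketch below assumes a hypothetical 1-D array of monthly `VHI` values for a single pixel; the numbers are made up for illustration, and the helper functions are not part of the `Persistence` class.

```python
import numpy as np


def persistence_forecast(series: np.ndarray) -> np.ndarray:
    """Predict each timestep (from the second onward) with the previous value."""
    return series[:-1]


def rmse(predicted: np.ndarray, observed: np.ndarray) -> float:
    """Root mean squared error between two equally shaped arrays."""
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


# Made-up monthly VHI values for one pixel: January, February, March, April
vhi = np.array([40.0, 42.0, 45.0, 43.0])

predictions = persistence_forecast(vhi)  # forecasts for February, March, April
observations = vhi[1:]                   # observed values in those months
error = rmse(predictions, observations)
```

Here the April forecast is simply the March value, `45.0`. Because it needs no training, the resulting error gives a floor that the more complex models should beat.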

<img src="img/persistence_diagram.png" style='background-color: #878787; border-radius: 25px; padding: 20px'>

In [35]:
ls data/features/one_month_forecast

normalizing_dict.pkl


In [30]:
model = models.Persistence(experiment='one_month_forecast')
model.train()

In [31]:
ls data/models/one_month_forecast/previous_month

## LinearRegression

The next simple model is `LinearRegression`.

<img src="img/regression_diagram.png" style='background-color: #878787; border-radius: 25px; padding: 20px'>

In [56]:
model = models.LinearRegression()

# [method for method in dir(model) if '__' not in method]
# help(model)

model.train()

Training linear_regression for experiment one_month_forecast


FileNotFoundError: [Errno 2] No such file or directory: 'data/features/one_month_forecast/train'

## Base Neural Network

The training of the neural networks follows this process:

<img src="img/train_nn_diagram.png" style='background-color: #878787; border-radius: 25px; padding: 20px'>

## Linear Neural Network

<img src="img/linear_nn_diagram.png" style='background-color: #878787; border-radius: 25px; padding: 20px'>

In [54]:
model = models.LinearNetwork(layer_sizes=[128])
model.train()

Training linear_network for experiment one_month_forecast


FileNotFoundError: [Errno 2] No such file or directory: 'data/features/one_month_forecast/train'

## Recurrent Neural Network (RNN) - LSTM Architecture

The `RecurrentNetwork` uses an LSTM (long short-term memory) architecture.

<img src="img/rnn_diagram.png" style='background-color: #878787; border-radius: 25px; padding: 20px'>

In [52]:
model = models.RecurrentNetwork(hidden_size=[128])
# help(models.RecurrentNetwork)
model.train()

Training rnn for experiment one_month_forecast


FileNotFoundError: [Errno 2] No such file or directory: 'data/features/one_month_forecast/train'

## Entity Aware LSTM - (EA RNN)

`EARecurrentNetwork` implements the [EA LSTM (entity-aware long short-term memory)](https://arxiv.org/pdf/1907.08456.pdf).

<img src="img/ea_rnn_diagram.png" style='background-color: #878787; border-radius: 25px; padding: 20px'>

In [53]:
model = models.EARecurrentNetwork(hidden_size=[128])
model.train()

Training ealstm for experiment one_month_forecast


FileNotFoundError: [Errno 2] No such file or directory: 'data/features/one_month_forecast/train'