# Description
I will use several deep learning models for my time series predictions. 
* LSTM
* Transformer
* dialated CNN

In all cases I will include daily snowfall as an exogenous variable.

## Environment
For $ reasons I will use Colab

In [1]:
# get colab status
try:
    import google.colab
    IN_COLAB = True
    %tensorflow_version 2.x
except:
    IN_COLAB = False

In [2]:
# data wrangling
import numpy as np
import pandas as pd
import os.path

# viz
import altair as alt
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from scipy import stats
from sklearn.preprocessing import MinMaxScaler

In [3]:
# local code with hack to avoid cloing full repo each time colab is run
if IN_COLAB:
    projectcode = r"thttps://github.com/chrisoyer/ski-snow-modeling/blob/master/src/analysis/project_utils/project_utils.py"
    ! wget $projectcode
from project_utils.project_utils import *

In [4]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.models import load_model

# Parameters

In [5]:
alt.renderers.enable(embed_options={'theme': 'vox'})
alt.data_transformers.disable_max_rows()
plt.style.use('seaborn')
plt.rc('figure', figsize=(11.0, 7.0))

logs_path = "./logs/visualize_graph"

# Load Data

In [6]:
if IN_COLAB:
    from google.colab import drive
    drive.mount('/content/gdrive')
    os.chdir(r'/content/gdrive/My Drive/data_sci/colab_datasets/ski/')
    all_data_path = r'./data/snow_data_clean.parquet'
    mirrored_strategy = tf.distribute.MirroredStrategy()
else:
    all_data_path = r'../../data/snow_data_clean.parquet'
!pwd

/c/Users/User/Documents/GitHub/ski-snow-modeling/src/analysis


In [7]:
snow_df = pd.read_parquet(all_data_path)

### Reshape for TF input
Shape should match (__samples__, __time steps__, __features__)

In [23]:
## Quick& dirty
def data_slim(source=snow_df, station=None):
    return (source.query('station==@station')
            [['dayofyr', 'base', 'snowfall']]
           .to_numpy()
           )
copper = data_slim(station="Copper Mountain")
copper

array([[ 67.        ,  18.        ,   0.        ],
       [ 68.        ,  18.        ,   0.        ],
       [ 69.        ,  18.        ,   0.        ],
       ...,
       [432.        ,  35.43418834,   0.        ],
       [433.        ,  35.56049869,   0.        ],
       [434.        ,  35.6400577 ,   0.        ]])

# Plotting Functions
* residuals over time
* y vs yhat
* train & extrapolate



In [9]:
def error_plotter(history):
    loss_train = history.history['train_loss']
    loss_val = history.history['val_loss']
    plt.plot(epochs, loss_train, 'g', label='Training loss')
    plt.plot(epochs, loss_val, 'b', label='validation loss')
    plt.title('Training and Validation loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.show()

# Timeseries Modeling

The evolution of snow base depth over time depends (not 1:1; a foot of powder is only a few inches of packed powder) on new snowfall and melting of old snow. I will start by modeling as a simple timeseries, and then include new snowfall as a predictor variable.

## Modeling Setup
I will use supersetting crossvalidation (walk-forward CV) since this is a time series problem.

# TF LSTM models
 

In [10]:
# scale data
def scale(X):
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaler = scaler.fit(X)
    scaled_X = scaler.transform(X)

    # invert transform
    inverted_X = scaler.inverse_transform(scaled_X)
    return scaler, inverse_scaler

In [33]:
def make_lstm(neurons, layers, batch_size, x_shape):
    """
    Parameters:
        neurons: width of layers, eg (4,5,6) implies first hidden layer has 4
            neuron, 2nd layer has 5, third layer has 6
        batch size: ...
        x_shape: (rows, features)
    Returns: unfitted model
    """
    input_shape = (x_shape[0], x_shape[1])
    model = Sequential()
    for layer in range(layers):
        model.add(LSTM(neurons, batch_input_shape=input_shape, 
                   stateful=True, dropout=0.2, recurrent_dropout=0.2,))
    model.add(Dense(1))
    metrics = ['mean absolute error']  # TODO: custom r2 func
    model.compile(loss='root_mean_squared_error', metrics=metrics,
                  optimizer='adam')
    return model

def fit_model(model, train, batch_size, n_epoch):
    """runs the training; returns model and history"""
    for i in range(n_epoch):
        history = model.fit(X, y, epochs=n_epoches, batch_size=batch_size, 
                            verbose=0, shuffle=False, callbacks=[])
        model.reset_states()
    return model, history

# Vanilla LSTM Model

In [35]:
lstm_100x1 = make_lstm(neurons=100, layers=1,
                       batch_size=20, x_shape=copper.shape)
lstm_100x1, lstm_100x1_hst = fit_model(lstm_100x1)

plot_model(lstm_100, show_shapes=True, to_file='model.png')

ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [2925, 3]

Batch Size: 1
Epochs: 3000
Neurons: 4
--------------
or 1 neuron

# Compare Models