# Deep Earthquake Prediction

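This notebook demonstrates new functionality for applying deep learning to the earthquake data: saving the training data per earthquake cycle, two helper classes, and functions for training and evaluating a model on those cycles.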

In [None]:
import gc
import os
import sys
import pandas as pd
import numpy as np
import pickle

import keras
from keras.models import Sequential, Model
from keras.layers import Dense, Conv1D, Flatten, Dropout, LSTM
from keras import backend as K

sys.path.append("../..")
from common.utils import progress
from earthquakes.engineering import save_earthquake_cycles, get_cycle
from earthquakes.deep import Scaler, KFoldCycles, train_on_cycles, evaluate_on_cycles

## Saving the earthquake data by cycle

In [None]:
# data_dir = "../data"
data_dir = "D:/KaggleData/earthquakes/"

# train = pd.read_csv(os.path.join(data_dir, "train.csv"))
# use a context manager so the file handle is closed after loading
with open(os.path.join(data_dir, "train.pickle"), "rb") as f:
    train = pickle.load(f)

The `engineering` module now has two additional functions:
1. `find_earthquakes`: calculates the exact timing of earthquakes in a few chunks (3 by default) to prevent memory issues, while maintaining speed.
2. `save_earthquake_cycles`: calls `find_earthquakes` and then saves the entire training data per cycle. Note that there will be 17 cycles in total, since there is still data after the last earthquake that we can use.

The whole thing runs in about 30 seconds, so no worries here:

In [None]:
save_earthquake_cycles(train, data_dir=data_dir)
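
To illustrate the chunked search described above, here is a minimal sketch. It is not the actual `find_earthquakes` implementation: the function name `find_cycle_starts` is hypothetical, and it assumes the target column counts down towards each earthquake and jumps back up right after it.

```python
import numpy as np


def find_cycle_starts(time_to_failure, n_chunks=3):
    """Return the indices at which a new cycle starts (hypothetical helper).

    Assumption: the values decrease within a cycle and jump upward
    on the first sample after an earthquake.
    """
    ttf = np.asarray(time_to_failure)
    boundaries = np.linspace(0, len(ttf), n_chunks + 1, dtype=int)
    starts = []
    for lo, hi in zip(boundaries[:-1], boundaries[1:]):
        # Take one extra sample so a jump exactly on a chunk boundary is not missed.
        chunk = ttf[lo:min(hi + 1, len(ttf))]
        jumps = np.flatnonzero(np.diff(chunk) > 0) + lo + 1
        starts.extend(jumps.tolist())
    return starts


toy = [3, 2, 1, 0, 4, 3, 2, 1, 0, 2, 1]
print(find_cycle_starts(toy))  # [4, 9]
```

Splitting at `n` such indices yields `n + 1` segments, which is why the data after the last earthquake forms a cycle of its own.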

After saving the cycles, we can delete the training data from memory, as we won't need it anymore. Instead, we will train on the cycles, one by one.

In [None]:
del train
gc.collect()

## Helper classes

There are two helper classes in the `deep` module. The first is the `Scaler`, which implements various scaling methods (see docstring for all of them).

In [None]:
# Initialize a Scaler with a custom scaling value
scaler = Scaler(method="value", value=300)
# Example: scale a few values
scaler.scale([150, 4.5, 2000])
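
As a rough picture of what scaling by a fixed value could look like, the sketch below divides the input by a constant. This is an assumption about `method="value"`, not taken from the `Scaler` source (check the docstring for the actual behaviour), and the `ValueScaler` class is hypothetical.

```python
import numpy as np


class ValueScaler:
    """Hypothetical stand-in: divides inputs by a fixed constant."""

    def __init__(self, value):
        if value == 0:
            raise ValueError("value must be non-zero")
        self.value = float(value)

    def scale(self, x):
        # Assumption: "value" scaling is a plain division by the constant.
        return np.asarray(x, dtype=float) / self.value

    def unscale(self, x):
        # Inverse of scale().
        return np.asarray(x, dtype=float) * self.value


sketch_scaler = ValueScaler(300)
print(sketch_scaler.scale([150, 4.5, 2000]))
```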

The second is `KFoldCycles`, a splitter like `sklearn`'s `KFold` that splits cycle numbers instead of sample indices.

In [None]:
splitter = KFoldCycles()
for train_cycles, val_cycles in splitter.split():
    print("train: {}, val: {}".format(train_cycles, val_cycles))
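
The idea of splitting cycle numbers into folds can be sketched in a few lines of plain Python. The function `kfold_cycles` and its defaults (`n_splits=4`, `seed=0`) are hypothetical; only the count of 17 cycles comes from this notebook, and the real `KFoldCycles` may split differently.

```python
import random


def kfold_cycles(n_cycles=17, n_splits=4, seed=0):
    """Yield (train_cycles, val_cycles) pairs of cycle numbers (hypothetical helper)."""
    cycles = list(range(n_cycles))
    random.Random(seed).shuffle(cycles)
    # Deal the shuffled cycles round-robin into n_splits folds.
    folds = [sorted(cycles[i::n_splits]) for i in range(n_splits)]
    for i, val_cycles in enumerate(folds):
        train_cycles = sorted(
            c for j, fold in enumerate(folds) if j != i for c in fold
        )
        yield train_cycles, val_cycles


for train_cycles, val_cycles in kfold_cycles():
    print("train: {}, val: {}".format(train_cycles, val_cycles))
```

Each cycle ends up in exactly one validation fold, so no single earthquake cycle is split between training and validation.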

## Training a Deep Learning model

Let's first define a Keras model:

In [None]:
model = Sequential()
model.add(Conv1D(32, kernel_size=15, strides=8, padding="causal", activation="relu", input_shape=(150000, 1)))
model.add(Conv1D(32, kernel_size=15, strides=3, padding="causal", activation="relu"))
model.add(Conv1D(32, kernel_size=15, strides=3, padding="causal", activation="relu"))
model.add(Conv1D(32, kernel_size=15, strides=3, padding="causal", activation="relu"))
# model.add(LSTM(16, activation="relu"))
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dense(1, activation="linear"))
model.compile(optimizer='adam',
              loss='mse')
model.summary()  # summary() prints the table itself and returns None
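
To see how the strides shrink the 150,000-sample input, the sequence lengths can be worked out by hand. With `padding="causal"` and no dilation, the output length of a `Conv1D` layer is the input length divided by the stride, rounded up, regardless of the kernel size. The helper name `causal_conv_lengths` below is only for illustration.

```python
def causal_conv_lengths(input_length, strides):
    """Output length after each causal Conv1D layer: ceil(length / stride)."""
    lengths = []
    length = input_length
    for stride in strides:
        length = (length + stride - 1) // stride  # integer ceiling division
        lengths.append(length)
    return lengths


lengths = causal_conv_lengths(150000, [8, 3, 3, 3])
print(lengths)           # [18750, 6250, 2084, 695]
print(lengths[-1] * 32)  # 22240 values reach the first Dense layer after Flatten
```

These numbers should match the output shapes reported in the model summary.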

The `deep` module also has two functions, `train_on_cycles` and `evaluate_on_cycles`, that make it easy to train and evaluate a model on different cycles.

In [None]:
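cv_losses = []
# Keep the untrained weights so every fold starts from the same initialization;
# otherwise later folds validate on cycles the model has already been trained on.
initial_weights = model.get_weights()
for train_cycles, val_cycles in splitter.split():
    model.set_weights(initial_weights)
    model = train_on_cycles(model, epochs=4, cycle_nrs=train_cycles, scaler=scaler, data_dir=data_dir)
    loss, cycle_losses, cycle_weights = evaluate_on_cycles(model, cycle_nrs=val_cycles, scaler=scaler, data_dir=data_dir)
    cv_losses.append(loss)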

print("Mean Cross-Validation loss: {}".format(np.mean(cv_losses)))
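
`evaluate_on_cycles` returns per-cycle losses and weights next to the overall loss. One plausible way to combine them is a weighted average, sketched below. This is an assumption and has not been checked against the `deep` module; the function name `weighted_mean_loss` is hypothetical.

```python
def weighted_mean_loss(cycle_losses, cycle_weights):
    """Weighted average of per-cycle losses (assumed combination rule)."""
    total_weight = sum(cycle_weights)
    if total_weight <= 0:
        raise ValueError("cycle_weights must sum to a positive number")
    weighted_sum = sum(l * w for l, w in zip(cycle_losses, cycle_weights))
    return weighted_sum / total_weight


# Toy numbers, not real results: the second cycle counts three times as much.
print(weighted_mean_loss([2.0, 4.0], [1, 3]))  # 3.5
```

Weighting matters here because cycles differ in length, so a plain mean would give short cycles the same influence as long ones.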