# Building a TensorFlow time series deep learning model

## Introduction

This project builds a neural network model to predict sunspot activity.
The goals are to:
- Build a neural network model from a provided dataset using TensorFlow.
- Predict sunspot activity with the model.

Before building and training the model, the data is prepared with basic exploratory data analysis (EDA), cleaning, and other manipulations as needed.

Modeling follows these steps:
1. Importing packages and loading data
2. Exploring the data and completing the cleaning process (optional)
3. Building a neural network
4. Evaluating the model

### Step 1: Importing packages and loading data

#### 1.1. Import packages

Import relevant Python packages.

In [54]:
# Standard operational packages
import csv
import tensorflow as tf
import numpy as np
import urllib.request  # `import urllib` alone does not guarantee that `urllib.request` is loaded
import pandas as pd

# Data preparation packages
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Modeling and evaluation packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, LSTM, Dense
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.losses import Huber

#### 1.2. Load the dataset

Download the `Sunspots.csv` dataset from the TensorFlow data storage and save it as `sunspots.csv` in a local directory.

In [55]:
url = 'https://storage.googleapis.com/download.tensorflow.org/data/Sunspots.csv'
urllib.request.urlretrieve(url, '../data/sunspots.csv')

('../data/sunspots.csv', <http.client.HTTPMessage at 0x7fb0371aa3d0>)

In [56]:
df = pd.read_csv('../data/sunspots.csv').drop(columns=['Unnamed: 0'])
df

| | Date | Monthly Mean Total Sunspot Number |
|---|---|---|
| 0 | 1749-01-31 | 96.7 |
| 1 | 1749-02-28 | 104.3 |
| 2 | 1749-03-31 | 116.7 |
| 3 | 1749-04-30 | 92.8 |
| 4 | 1749-05-31 | 141.7 |
| ... | ... | ... |
| 3230 | 2018-03-31 | 2.5 |
| 3231 | 2018-04-30 | 8.9 |
| 3232 | 2018-05-31 | 13.2 |
| 3233 | 2018-06-30 | 15.9 |


### Step 2: Exploring the data and completing the cleaning process

#### 2.1. Prepare the data

After downloading the dataset, prepare it for a neural network model:
- Explore the data
- Check for missing values
- Encode or scale the data as needed
- Split the original dataset into training and validation sets

#### 2.2. Explore the data

Inspect the data with:
- `shape`
- `info()`

In [57]:
df.shape

(3235, 2)

In [58]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3235 entries, 0 to 3234
Data columns (total 2 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   Date                               3235 non-null   object 
 1   Monthly Mean Total Sunspot Number  3235 non-null   float64
dtypes: float64(1), object(1)
memory usage: 50.7+ KB


#### 2.3. Check for missing values

Count the missing values in each column.

In [59]:
df.isna().sum()

Date                                 0
Monthly Mean Total Sunspot Number    0
dtype: int64

### Step 3: Building a neural network

#### 3.1. Create the training and testing data

From the DataFrame review, this project needs only the `Monthly Mean Total Sunspot Number` field.
1. Extract only that field from `df`.
2. Convert the DataFrame into a NumPy array.
3. Normalize the data with a scaler.
4. Set the row index at which to split the data.
5. Create a `training` set from the first 3,000 rows.
6. Create a `validation` set from the remaining rows.

In [60]:
#1. Extract only the field from `df`.
df = df.drop(columns=['Date'])
df

| | Monthly Mean Total Sunspot Number |
|---|---|
| 0 | 96.7 |
| 1 | 104.3 |
| 2 | 116.7 |
| 3 | 92.8 |
| 4 | 141.7 |
| ... | ... |
| 3230 | 2.5 |
| 3231 | 8.9 |
| 3232 | 13.2 |
| 3233 | 15.9 |


In [61]:
#2. Convert the DataFrame into a NumPy array.
series = np.array(df).reshape(-1,1)
series

array([[ 96.7],
       [104.3],
       [116.7],
       ...,
       [ 13.2],
       [ 15.9],
       [  1.6]])

In [62]:
#3. Normalize the data with a scaler.
scaler = MinMaxScaler()
series = scaler.fit_transform(series)
series

array([[0.24284279],
       [0.26192868],
       [0.29306881],
       ...,
       [0.03314917],
       [0.03992968],
       [0.00401808]])
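Min-max scaling maps each value `x` to `(x - min) / (max - min)`. The following is a minimal NumPy sketch of that formula, not the scikit-learn implementation; the toy values and the helper names `fit_min_max`, `transform`, and `inverse_transform` are hypothetical. The sketch fits the statistics on the training rows only, which is one common way to keep validation values from influencing the scaling. That differs from the cell above, where the scaler is fitted on the whole series.

```python
import numpy as np


def fit_min_max(train):
    """Return (min, max) computed from the training rows only."""
    return float(train.min()), float(train.max())


def transform(values, lo, hi):
    """Scale values with (x - lo) / (hi - lo)."""
    return (values - lo) / (hi - lo)


def inverse_transform(scaled, lo, hi):
    """Undo the scaling to recover values in the original units."""
    return scaled * (hi - lo) + lo


# Hypothetical toy series, shaped (n, 1) like `series` in the notebook.
toy = np.array([[10.0], [20.0], [40.0], [30.0], [50.0]])
train, valid = toy[:3], toy[3:]

lo, hi = fit_min_max(train)                  # 10.0 and 40.0
scaled_train = transform(train, lo, hi)      # values lie in [0, 1]
scaled_valid = transform(valid, lo, hi)      # may fall outside [0, 1]
restored = inverse_transform(scaled_train, lo, hi)
```

With this approach, validation values larger than the training maximum scale to numbers above 1, which is expected and usually harmless.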

In [63]:
#4. Set the row index at which to split the data.
split_rows = 3000

In [64]:
#5. Create a `training` set from the first 3,000 rows.
x_train = series[:split_rows]

In [65]:
#6. Create a `validation` set from the remaining rows.
x_valid = series[split_rows:]

In [66]:
x_train.shape, x_valid.shape

((3000, 1), (235, 1))

#### 3.2. Create a neural network model

1. Set parameters for dataset preparation.
2. Define a helper that turns a series into windowed batches.
3. Prepare the training and validation sets with the helper.
4. Create a neural network model.
5. Set a model checkpoint.
6. Set an optimizer.
7. Set a loss.

In [67]:
#1. Set parameters for data set preparation.
window_size = 30
batch_size = 32
shuffle_buffer_size = 1000

In [68]:
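#2. Define a helper that turns a series into windowed (input, label) batches.
def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    ds = tf.data.Dataset.from_tensor_slices(series)
    # Slide a window of window_size + 1 values over the series, one step at a time.
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
    # Each window is itself a dataset; flatten it into a single tensor.
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))
    ds = ds.shuffle(shuffle_buffer)
    # Inputs are the first window_size values; labels are the same window shifted by one step.
    ds = ds.map(lambda w: (w[:-1], w[1:]))
    return ds.batch(batch_size).prefetch(1)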
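To make the windowing concrete, here is a plain-Python sketch of the (input, label) pairs that `windowed_dataset` produces, ignoring shuffling and batching. The helper name `make_windows` and the toy series are hypothetical and only illustrate the slicing logic.

```python
def make_windows(values, window_size):
    """Return (inputs, labels) pairs built from windows of window_size + 1 values."""
    pairs = []
    # A series of n values yields n - window_size full windows.
    for start in range(len(values) - window_size):
        window = values[start:start + window_size + 1]
        # Inputs drop the last value; labels drop the first (shifted by one step).
        pairs.append((window[:-1], window[1:]))
    return pairs


toy_series = [0, 1, 2, 3, 4, 5]
pairs = make_windows(toy_series, window_size=3)
for inputs, labels in pairs:
    print(inputs, "->", labels)
# [0, 1, 2] -> [1, 2, 3]
# [1, 2, 3] -> [2, 3, 4]
# [2, 3, 4] -> [3, 4, 5]
```

Each label sequence is the input sequence shifted one step ahead, so the model learns to predict the next value at every time step of the window.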

In [69]:
#3. Prepare the training and validation sets with `windowed_dataset`.
train_set = windowed_dataset(x_train, window_size=window_size, batch_size=batch_size, shuffle_buffer=shuffle_buffer_size)
valid_set = windowed_dataset(x_valid, window_size=window_size, batch_size=batch_size, shuffle_buffer=shuffle_buffer_size)
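As a sanity check, the number of windows and batches per epoch can be worked out by hand. The sketch below assumes the settings used in this notebook (3,000 training rows, 235 validation rows, `window_size = 30`, `batch_size = 32`); the helper name `batches_per_epoch` is hypothetical.

```python
import math


def batches_per_epoch(n_rows, window_size, batch_size):
    """Return (number of windows, number of batches) for one pass over the data."""
    # Windows hold window_size + 1 values with shift=1 and drop_remainder=True,
    # so n_rows values yield n_rows - window_size full windows.
    n_windows = n_rows - window_size
    # The final batch may be smaller than batch_size, hence the ceiling.
    return n_windows, math.ceil(n_windows / batch_size)


train_windows, train_batches = batches_per_epoch(3000, 30, 32)
valid_windows, valid_batches = batches_per_epoch(235, 30, 32)
print(train_windows, train_batches)  # 2970 93
print(valid_windows, valid_batches)  # 205 7
```

The 93 training batches agree with the `93/Unknown` step count shown in the training log.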

In [70]:
#4. Create a neural network model.
model = Sequential([
    Conv1D(60, kernel_size=5, padding='causal', activation='relu', input_shape=[None,1]),
    LSTM(60, return_sequences=True),
    LSTM(60, return_sequences=True),
    Dense(30, activation='relu'),
    Dense(30, activation='relu'),
    Dense(1)
])
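The size of this architecture can be estimated by hand. The sketch below applies the standard parameter-count formulas for convolutional, LSTM, and dense layers with biases; the helper names and dictionary keys are hypothetical labels. The notebook does not show `model.summary()`, so treat the total as an expected value to compare against it.

```python
def conv1d_params(in_channels, filters, kernel_size):
    """Kernel weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * ((input_dim + units) * units + units)


def dense_params(input_dim, units):
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units


layer_params = {
    "conv1d": conv1d_params(in_channels=1, filters=60, kernel_size=5),
    "lstm_1": lstm_params(input_dim=60, units=60),
    "lstm_2": lstm_params(input_dim=60, units=60),
    "dense_1": dense_params(input_dim=60, units=30),
    "dense_2": dense_params(input_dim=30, units=30),
    "dense_out": dense_params(input_dim=30, units=1),
}
total_params = sum(layer_params.values())
print(layer_params)
print(total_params)  # 61231
```

Most of the parameters sit in the two LSTM layers, at 29,040 each.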

In [71]:
#5. Set a model checkpoint.
checkpoint_path = '../model/temp_checkpoint.ckpt'
checkpoint = ModelCheckpoint(filepath=checkpoint_path,
                             save_weights_only=True,
                             save_best_only=True,
                             monitor='val_mae',
                             verbose=1,
                            )

In [72]:
#6. Set an optimizer.
optimizer = SGD(learning_rate=0.00001,
                momentum=0.9)
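For intuition about what `momentum=0.9` does, here is a plain-Python sketch of the classic momentum update (velocity first, then weight) applied to a toy one-dimensional problem. The function name, the toy objective `w ** 2`, and the larger learning rate of 0.01 are assumptions for illustration; this is not the Keras source code.

```python
def sgd_momentum_step(weight, velocity, grad, learning_rate, momentum):
    """One momentum update: blend the previous velocity with the new gradient step."""
    velocity = momentum * velocity - learning_rate * grad
    return weight + velocity, velocity


# Toy objective f(w) = w ** 2, whose gradient is 2 * w.
first_weight, first_velocity = sgd_momentum_step(1.0, 0.0, 2.0, learning_rate=0.01, momentum=0.9)

weight, velocity = 1.0, 0.0
for _ in range(200):
    grad = 2.0 * weight
    weight, velocity = sgd_momentum_step(weight, velocity, grad, learning_rate=0.01, momentum=0.9)
```

After the first step the weight moves from 1.0 to about 0.98. Over the 200 steps the accumulated velocity carries it close to the minimum at 0.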

In [73]:
#7. Set a loss.
loss = Huber()
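The Huber loss is quadratic for small errors and linear for large ones, which makes it less sensitive to outliers than squared error. Below is a minimal NumPy sketch of the formula, assuming a threshold `delta` of 1.0 and a mean over all elements; the function name `huber` and the sample values are hypothetical.

```python
import numpy as np


def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss: 0.5 * e**2 if |e| <= delta, else delta * |e| - 0.5 * delta**2."""
    error = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_error = np.abs(error)
    quadratic = 0.5 * error ** 2
    linear = delta * abs_error - 0.5 * delta ** 2
    return float(np.mean(np.where(abs_error <= delta, quadratic, linear)))


small = huber([0.2, 0.5], [0.3, 0.1])  # both errors are below delta: quadratic branch
large = huber([0.0], [3.0])            # error above delta: linear branch
print(small, large)
```

Because the series is scaled to the range 0 to 1, absolute errors will usually stay below 1. In that regime the loss behaves like half the mean squared error.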

#### 3.3. Execute the neural network model

1. Compile the model.
2. Fit the model.
3. Evaluate the results.

In [74]:
#1. Compile the model.
model.compile(optimizer=optimizer,
              loss=loss,
              metrics=['mae'])
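Because the model is trained on the scaled series, the `mae` metric is expressed as a fraction of the data range, not in sunspot counts. The sketch below converts a scaled MAE back to the original units. The data range is an assumption inferred from the printed arrays (1.6 maps to 0.00401808, which implies a minimum of 0 and a maximum of about 398.2); in the notebook itself, `scaler.data_min_` and `scaler.data_max_` hold the fitted values. The helper name is hypothetical.

```python
def mae_in_original_units(scaled_mae, data_min, data_max):
    """Rescale an MAE measured on min-max scaled data back to the original units."""
    return scaled_mae * (data_max - data_min)


# Validation MAE reported by model.evaluate at the end of this notebook.
scaled_mae = 0.14925356209278107
# Assumption: range inferred from the printed outputs, not read from the scaler.
data_min, data_max = 0.0, 398.2

mae_sunspots = mae_in_original_units(scaled_mae, data_min, data_max)
print(round(mae_sunspots, 2))  # 59.43
```

Under these assumptions, a scaled MAE of about 0.149 corresponds to an average error of roughly 59 in the monthly mean sunspot number.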

In [75]:
#2. Fit the model.
model.fit(x=train_set,
          validation_data=valid_set,
          epochs=10,
          callbacks=[checkpoint])

Epoch 1/10
     93/Unknown - 6s 24ms/step - loss: 0.0416 - mae: 0.2225
Epoch 00001: val_mae improved from inf to 0.18830, saving model to ../model/temp_checkpoint.ckpt
Epoch 2/10
Epoch 00002: val_mae improved from 0.18830 to 0.18176, saving model to ../model/temp_checkpoint.ckpt
Epoch 3/10
Epoch 00003: val_mae improved from 0.18176 to 0.17612, saving model to ../model/temp_checkpoint.ckpt
Epoch 4/10
Epoch 00004: val_mae improved from 0.17612 to 0.17119, saving model to ../model/temp_checkpoint.ckpt
Epoch 5/10
Epoch 00005: val_mae improved from 0.17119 to 0.16684, saving model to ../model/temp_checkpoint.ckpt
Epoch 6/10
Epoch 00006: val_mae improved from 0.16684 to 0.16289, saving model to ../model/temp_checkpoint.ckpt
Epoch 7/10
Epoch 00007: val_mae improved from 0.16289 to 0.15917, saving model to ../model/temp_checkpoint.ckpt
Epoch 8/10
Epoch 00008: val_mae improved from 0.15917 to 0.15566, saving model to ../model/temp_checkpoint.ckpt
Epoch 9/10
Epoch 00009: val_mae improved from 0.

<keras.callbacks.History at 0x7fb02fc96a10>

In [76]:
#3. Evaluate the results.
model.load_weights(checkpoint_path)
print('evaluate:', model.evaluate(valid_set))
model.save('../model/tensorflow-sunspots.h5')


evaluate: [0.020133497193455696, 0.14925356209278107]
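The printed list holds the validation loss (Huber) followed by the validation MAE, both on the scaled series. The notebook stops at evaluation, so the following is a hedged sketch of how a one-step-ahead prediction could be made from the last `window_size` scaled values. The function `forecast_next` and the stand-in class `LastValueModel` are hypothetical; the stand-in only mimics the `(1, window_size, 1)` output shape of the trained model so the example runs without TensorFlow.

```python
import numpy as np


def forecast_next(model, scaled_series, window_size):
    """Predict the next scaled value from the last `window_size` observations."""
    window = np.asarray(scaled_series, dtype=np.float32)[-window_size:]
    batch = window.reshape(1, window_size, 1)      # (batch, time steps, features)
    prediction = np.asarray(model.predict(batch))  # expected shape: (1, window_size, 1)
    # The model outputs one value per time step; the last one is the one-step-ahead forecast.
    return float(prediction[0, -1, 0])


class LastValueModel:
    """Hypothetical stand-in with a Keras-like predict(); it echoes its input."""

    def predict(self, batch):
        return batch


toy = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
next_scaled = forecast_next(LastValueModel(), toy, window_size=30)
print(next_scaled)
```

In the notebook, the call would presumably be `forecast_next(model, series, window_size)`, and `scaler.inverse_transform([[next_scaled]])` would convert the result back to a sunspot number.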
