# Summary:

#### In this notebook, the optimal network configurations obtained from the runs in the preceding notebook ('02_Freddie_Freeloader.ipynb') are applied to the clean data set with added random noise. The noise is added in two different ways: 
#### 1) 'Add distortion': The noise is fed as a separate input to the model while the target variable remains the clean time series.
#### 2) 'Distort signal': The noise is added on top of the clean time series data, which is taken as the target variable.
#### In both scenarios, the model performance is measured for varying standard deviations of the noise.

#### The best performing model configuration will be used in the notebook '03_Blue_in_Green.ipynb' to analyze the impact of adding noise to the clean dataset.
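
As a rough illustration of the two scenarios, the arrays fed to a model could look like the sketch below. It is independent of the `Kind_of_Blue` class. The names `inputs_add`, `inputs_distort`, `target_add` and `target_distort`, the fixed random seed, and the choice of `scale=0.5` are assumptions made for this example only. For the second scenario, the sketch assumes that the noisy series itself serves as the target.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed is an assumption, used only for repeatability

n_points = 5 * 364   # same series length as in this notebook
num_periods = 10     # number of sine periods
t = np.arange(n_points)
clean = np.sin(2 * np.pi * num_periods * t / n_points)
noise = rng.normal(loc=0.0, scale=0.5, size=n_points)

# 1) 'Add distortion': noise is a second input column, the target stays clean
inputs_add = np.column_stack([clean, noise])      # shape (n_points, 2)
target_add = clean

# 2) 'Distort signal': noise is summed onto the series (assumed to be the target as well)
inputs_distort = (clean + noise).reshape(-1, 1)   # shape (n_points, 1)
target_distort = clean + noise
```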

# Table of contents
* [1. Load modules](#Part1_link)
* [2. Distortion next to clean time series](#Part2_link)
<br >&nbsp;&nbsp;&nbsp;[2.1 Evaluate model performance under varying noise levels](#Part2.1_link)
<br >&nbsp;&nbsp;&nbsp;[2.2 Visualize and save results](#Part2.2_link)
* [3. Distorted time series](#Part3_link)
<br >&nbsp;&nbsp;&nbsp;[3.1 Evaluate model performance under varying noise levels](#Part3.1_link)
<br >&nbsp;&nbsp;&nbsp;[3.2 Visualize and save results](#Part3.2_link)

<a id='Part1_link'></a>
# 1. Load modules

In [1]:
import sys
sys.path.append("../src/")
import Kind_of_Blue  # own class with a collection of methods used in this analysis

import tensorflow as tf

import matplotlib.pyplot as plt

plt.style.use('fivethirtyeight')

import numpy as np
import pandas as pd


<a id='Part2_link'></a>
# 2. Distortion next to clean time series

Run RNN and LSTM models with noise as a separate feature. Evaluate model performance for a range of standard deviations of the noise.

In [2]:
# set a range of dates on which the observations are made
idx = pd.date_range(end='7/1/2020', periods=5*364, freq='d')

# take a sine function as the observations
num_periods = 10  # number of sine periods
observations = [np.sin(2*np.pi*num_periods*x/len(idx)) for x in range(len(idx))]

# initialize object
mdq = Kind_of_Blue.Kind_of_Blue()

# set target feature 
mdq._selected_features = ['observations']

# set number of time points for 1/ future forecasting points and 2/ the past, historical time points
future_target_size = int(365/52)
past_history_size = int(1*365)

# specify model configuration: this is chosen based on the results from the previous notebook 02_Freddie_Freeloader.ipynb
units = 256  # number of units in each neural network layer
num_layers = 2  # total number of layers
epochs = 30


<a id='Part2.1_link'></a>
### 2.1 Evaluate model performance under varying noise levels

The following steps are repeated from the previous notebook, '02_Freddie_Freeloader.ipynb', and are grouped into one single step here for simplicity.
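
The train and validation sets consist of windows with `past_history_size` input points followed by `future_target_size` target points; note that `int(365/52)` evaluates to 7. The internals of `generate_train_and_val_data` are not shown in this notebook, so the following is only a minimal sketch of such a sliding-window split. The helper `make_windows` and the toy series length of 400 are hypothetical.

```python
import numpy as np

def make_windows(series, past_history_size, future_target_size):
    """Hypothetical helper: slice a 1-D series into (history, target) pairs."""
    x, y = [], []
    last_start = len(series) - past_history_size - future_target_size
    for start in range(last_start + 1):
        split = start + past_history_size
        x.append(series[start:split])                        # past, historical points
        y.append(series[split:split + future_target_size])   # future points to forecast
    # add a trailing feature axis: (samples, time steps, features)
    return np.array(x)[..., np.newaxis], np.array(y)[..., np.newaxis]

series = np.sin(np.linspace(0.0, 20.0 * np.pi, 400))  # toy series, not the notebook data
x, y = make_windows(series, past_history_size=365, future_target_size=7)
print(x.shape, y.shape)  # (29, 365, 1) (29, 7, 1)
```

The layout `(samples, time steps, features)` matches the shapes printed by the training cell, for example `x:(909, 365, 1), y:(909, 7, 1)`.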

In [None]:
# add random noise with zero mean and varying standard deviation as a separate feature to the input data 
standard_deviations = [0.01, 0.5, 1.0, 10.0]

# initialize results dictionary
res = {'model_type': [], 'std': [], 'val_mse': []
       , 'mse': [], 'total_training_time': []}

# model type 
model_types = ['RNN', 'LSTM']

for model_type in model_types:
    
    for std in standard_deviations:
        
        # generate Gaussian noise
        mean = 0.0
        noise = np.random.normal(loc=mean, scale=std, size=len(idx))

        # initialize dataframe to store time series
        df = pd.DataFrame(data={'observations': observations, 'noise': noise})
        df.index = idx

        # plot clean data and added noise
        df.plot(alpha=0.5)
        # save figure
        fig_name = model_type + '_std_' + str(std) + '.jpg'
        plt.savefig('../images/03_' + fig_name, dpi=500)

        # load dataframe into object
        mdq.df = df

        # initialize dataset from dataframe 
        mdq.initialize_dataset()

        # standardize data
        mdq.standardize_data()

        # generate train and validation data
        mdq.generate_train_and_val_data(future_target_size=future_target_size, past_history_size=past_history_size)

        # set number of steps per epoch
        num_samples = mdq._num_samples
        steps_per_epoch = int(num_samples/future_target_size)
        validation_steps = int(steps_per_epoch/2)

        # compile model
        mdq.compile_model(units=units, num_layers=num_layers, model_type=model_type)

        # fit model
        mdq.fit_model(epochs=epochs, steps_per_epoch=steps_per_epoch
                      ,validation_steps=validation_steps, model_type=model_type)

        # get errors
        history = mdq._histories[model_type]
        val_mse = history.history['val_mse'][-1]
        mse = history.history['mse'][-1]

        # get total training time
        total_training_time = sum(mdq._time_callbacks[model_type].times)

        # append results to results dictionary
        res['model_type'].append(model_type)
        res['std'].append(std)
        res['val_mse'].append(val_mse)
        res['mse'].append(mse)
        res['total_training_time'].append(total_training_time)

training set shape: x:(909, 365, 1), y:(909, 7, 1)
validation set shape: x:(174, 365, 1), y:(174, 7, 1)
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
 19/129 [===>..........................] - ETA: 19s - loss: 0.0114 - mse: 0.0114

<a id='Part2.2_link'></a>
### 2.2 Visualize and save results

In [None]:
# transform dictionary to dataframe
df_res = pd.DataFrame(res)

# store dataframe as csv locally
df_res.to_csv('../data/03_results_addedNoise.csv')
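
One possible way to inspect the stored results is to pivot them so that each noise level is a row and each model type is a column. The sketch below uses placeholder numbers that are not results from this notebook; `res_example`, `df_example` and `table` are names chosen for illustration.

```python
import pandas as pd

# placeholder numbers for illustration only, NOT results from this notebook
res_example = {
    'model_type': ['RNN', 'RNN', 'LSTM', 'LSTM'],
    'std': [0.01, 0.5, 0.01, 0.5],
    'val_mse': [0.1, 0.2, 0.3, 0.4],
}
df_example = pd.DataFrame(res_example)

# one row per noise level, one column per model type
table = df_example.pivot(index='std', columns='model_type', values='val_mse')
print(table)
```

Applied to `df_res`, the same pivot followed by `table.plot(logx=True, marker='o')` would draw one validation-error curve per model type over the noise level.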

<a id='Part3_link'></a>
# 3. Distorted time series

Run RNN and LSTM models with noise added on top of the clean data. Evaluate model performance for a range of standard deviations of the noise.
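
To get a feeling for how strongly each noise level distorts the signal, the noise variance can be compared to the variance of the clean sine, which is 0.5 for a unit-amplitude sine sampled over full periods. The short sketch below computes this signal-to-noise ratio for the standard deviations used in this notebook; the name `snr_by_std` is an assumption of this example.

```python
import numpy as np

n_points = 5 * 364
num_periods = 10
clean = np.sin(2 * np.pi * num_periods * np.arange(n_points) / n_points)

# variance of the clean signal; close to 0.5 for a unit-amplitude sine over full periods
signal_power = np.var(clean)

# ratio of signal variance to noise variance for each noise level
snr_by_std = {std: signal_power / std**2 for std in [0.01, 0.5, 1.0, 10.0]}

for std, snr in snr_by_std.items():
    print(f"std={std}: signal-to-noise ratio = {snr:.3g}")
```

The ratio drops from roughly 5000 at `std=0.01` to roughly 0.005 at `std=10.0`, so at the largest noise level the sine is almost entirely buried in noise.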

In [2]:
# set a range of dates on which the observations are made
idx = pd.date_range(end='7/1/2020', periods=5*364, freq='d')

# take a sine function as the observations
num_periods = 10  # number of sine periods
observations = [np.sin(2*np.pi*num_periods*x/len(idx)) for x in range(len(idx))]

# initialize object
mdq = Kind_of_Blue.Kind_of_Blue()

# set target feature 
mdq._selected_features = ['observations']

# set number of time points for 1/ future forecasting points and 2/ the past, historical time points
future_target_size = int(365/52)
past_history_size = int(1*365)

# specify model configuration: this is chosen based on the results from the previous notebook 02_Freddie_Freeloader.ipynb
units = 256  # number of units in each neural network layer
num_layers = 2  # total number of layers
epochs = 30


<a id='Part3.1_link'></a>
### 3.1 Evaluate model performance under varying noise levels

In [None]:
# standard deviations of the zero-mean Gaussian noise that is added on top of the clean time series
standard_deviations = [0.01, 0.5, 1.0, 10.0]

# initialize results dictionary
res = {'model_type': [], 'std': [], 'val_mse': []
       , 'mse': [], 'total_training_time': []}

# model type 
model_types = ['RNN', 'LSTM']

for model_type in model_types:
    
    for std in standard_deviations:
        
        # generate Gaussian noise
        mean = 0.0
        noise = np.random.normal(loc=mean, scale=std, size=len(idx))

        # initialize dataframe to store the distorted time series:
        # the noise is added on top of the clean observations (no separate noise column)
        df = pd.DataFrame(data={'observations': np.array(observations) + noise})
        df.index = idx

        # plot distorted data
        df.plot(alpha=0.5)
        # save figure
        fig_name = model_type + '_std_' + str(std) + '.jpg'
        plt.savefig('../images/03_' + fig_name, dpi=500)

        # load dataframe into object
        mdq.df = df

        # initialize dataset from dataframe 
        mdq.initialize_dataset()

        # standardize data
        mdq.standardize_data()

        # generate train and validation data
        mdq.generate_train_and_val_data(future_target_size=future_target_size, past_history_size=past_history_size)

        # set number of steps per epoch
        num_samples = mdq._num_samples
        steps_per_epoch = int(num_samples/future_target_size)
        validation_steps = int(steps_per_epoch/2)

        # compile model
        mdq.compile_model(units=units, num_layers=num_layers, model_type=model_type)

        # fit model
        mdq.fit_model(epochs=epochs, steps_per_epoch=steps_per_epoch
                      ,validation_steps=validation_steps, model_type=model_type)

        # get errors
        history = mdq._histories[model_type]
        val_mse = history.history['val_mse'][-1]
        mse = history.history['mse'][-1]

        # get total training time
        total_training_time = sum(mdq._time_callbacks[model_type].times)

        # append results to results dictionary
        res['model_type'].append(model_type)
        res['std'].append(std)
        res['val_mse'].append(val_mse)
        res['mse'].append(mse)
        res['total_training_time'].append(total_training_time)

training set shape: x:(909, 365, 1), y:(909, 7, 1)
validation set shape: x:(174, 365, 1), y:(174, 7, 1)
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
 19/129 [===>..........................] - ETA: 19s - loss: 0.0114 - mse: 0.0114

<a id='Part3.2_link'></a>
### 3.2 Visualize and save results

In [None]:
# transform dictionary to dataframe
df_res = pd.DataFrame(res)

# store dataframe as csv locally
df_res.to_csv('../data/03_results_distored.csv')