# Description

Our goal is to forecast Bitcoin closing prices using historical data. To accomplish this, we'll use a long short-term memory (LSTM) model, a special kind of recurrent neural network (RNN) that is often used to model time series data.

Before training a model, we must first perform a few data preprocessing steps that are either required by the model (e.g., all data must be represented numerically), or improve the performance of the model (e.g., normalization). This notebook covers the following preprocessing steps:

1. Extracting relevant data from the original dataset
2. Preparing time series data for LSTM
3. Creating training and test sets for model training
4. Normalizing the data
5. Reshaping data for LSTM

In [None]:
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
import pickle

from oit_helpers import DataPreprocessors, LSTM

sns.set(style="darkgrid", font_scale=1.5)
%matplotlib inline

## 1. Import the data

In [None]:
data_location = './datasets/all-crypto-currencies/crypto-markets.csv'
data = pd.read_csv(data_location, 
                   parse_dates=['date'], 
                   index_col="date")
data.head(3)

## 2. Extract Bitcoin closing prices

In [None]:
bit_data = (data
            .query('slug=="bitcoin"')
            .copy()[['close']])
bit_data.shape

In [None]:
plt.figure(figsize=(10,4))
(sns.lineplot(x=bit_data.index, y=bit_data.close)
 .set_title("Closing price of Bitcoin"));

## 3. Initialize preprocessor object

In [None]:
# our custom preprocessor class
dprep = DataPreprocessors(data=bit_data)

## 3. Prepare time series data

A time series is a sequence of data points, equally spaced through time. Do you see why our Bitcoin prices constitute a time series? At what interval are they measured? 

When modeling time series data, we often use a "sliding window": a fixed number of consecutive data points is used to predict the next data point (sometimes more) in the series. The number of consecutive data points in the window is referred to as the **timestep**. Consider `timestep = 90` for our data. The sliding window would first use closing prices 1 through 90 to predict the 91st closing price. Next, it would use closing prices 2 through 91 to predict the 92nd, and so on.
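
The sliding window described above can be sketched on a toy series. This is only an illustration: `make_windows` is a hypothetical function written for this example, and it is not necessarily how `create_timeseries_sequences()` in `oit_helpers.py` is implemented.

```python
import numpy as np

def make_windows(series, timestep):
    """Hypothetical helper: split a 1-D series into (history, target) pairs."""
    values = np.asarray(series, dtype=float)
    n_windows = len(values) - timestep
    if n_windows <= 0:
        raise ValueError("series must be longer than timestep")
    # Row i holds values[i : i + timestep]; its target is values[i + timestep].
    historical = np.stack([values[i:i + timestep] for i in range(n_windows)])
    target = values[timestep:]
    return historical, target

toy = np.arange(10)  # stand-in for closing prices: 0, 1, ..., 9
hist, tgt = make_windows(toy, timestep=3)
print(hist.shape, tgt.shape)  # (7, 3) (7,)
print(hist[0], tgt[0])        # [0. 1. 2.] 3.0
```

With 10 observations and a timestep of 3 there are 7 windows, so the first `timestep` observations never appear as targets.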

Explore the `create_timeseries_sequences()` function included in `oit_helpers.py` and make sure you see what's going on. Slice and dice the original data and compare it to the output of the function. Specifically, ensure you see how the predictor and predicted variables are formed, and where they fall in relation to the original dataset.

**Task**

Read about timesteps and sliding windows for time series data. Research how to choose the timestep -- what implications it has for time series models.

In [None]:
series_data = dprep.original_data.iloc[:,0] # we need a pandas series for the function
historical_all,target_all = dprep.create_timeseries_sequences(data=series_data,
                                                              timestep=90)

In [None]:
print(historical_all.shape)
print(target_all.shape)

## 4. Train test split

Before training a predictive model, you will almost always perform a train/test split on the original data. You split the data into a **training set**, which you use to build the model, and a **test set**, which you use only in the final evaluation of the trained model. Once you create a test set, set it aside until the very end. Even normalizing the training and test sets together is technically incorrect, because it allows information from the test set to *leak* into training.

One decision to make in a train/test split is the proportion of the original data assigned to each set. Generally, the training set should be the larger of the two, while the test set must still be large enough to give a reliable evaluation. With large datasets, a 90%/10% split is common.
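
As a rough sketch of what a proportional split looks like for time series windows, the example below keeps the earliest windows for training and the latest for testing, with no shuffling. `chronological_split` is a hypothetical function written for this example. Whether `dprep.train_test_split()` splits in exactly this way is an assumption, so check the helper's source.

```python
import numpy as np

def chronological_split(historical, target, prop_train=0.9):
    """Hypothetical helper: earliest windows for training, latest for testing."""
    if not 0 < prop_train < 1:
        raise ValueError("prop_train must be strictly between 0 and 1")
    n_train = int(len(historical) * prop_train)
    # Slicing (rather than shuffling) keeps the test set strictly later in time.
    return (historical[:n_train], historical[n_train:],
            target[:n_train], target[n_train:])

X = np.arange(40).reshape(20, 2)  # 20 toy windows with 2 values each
y = np.arange(20)                 # one toy target per window
X_train, X_test, y_train, y_test = chronological_split(X, y, prop_train=0.9)
print(X_train.shape, X_test.shape)  # (18, 2) (2, 2)
```

The return order mirrors the call in the next cell: training and test predictors first, then training and test targets.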

**Task**

Read about train/test splits and why they're important for building predictive models. Pay close attention to sizing recommendations and play around with the proportion supplied to `train_test_split()`.

In [None]:
historical_train, historical_test, target_train, target_test = (dprep
                                                                .train_test_split(historical=historical_all, 
                                                                                  target=target_all, 
                                                                                  prop_train=.9))

## 5. Normalize the data

Data normalization puts variables that are measured on different scales onto a common scale. We're only using one variable, but it's still good practice, and normalization also tends to make neural network training more stable. A canonical choice is min-max normalization, which rescales values to a fixed range (typically 0 to 1).

**Task**

Write some code that normalizes the training data using the min-max algorithm. Once you work out a solution, add it as a new method to the DataPreprocessors class.

**Tips**
* Read about [min-max scaling](https://towardsdatascience.com/everything-you-need-to-know-about-min-max-normalization-in-python-b79592732b79)
* Sklearn provides a class for min-max normalization: `from sklearn.preprocessing import MinMaxScaler` (already imported)
* `MinMaxScaler` requires 2D data, so you will have to reshape `target_train` using `target_train.reshape(-1,1)`
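
To make the arithmetic concrete, here is a small NumPy sketch on made-up numbers (not the Bitcoin data). Min-max scaling maps each value `x` to `(x - min) / (max - min)`. In this sketch the minimum and maximum are taken from the training values only and then reused for the test values, which is one way to avoid the leakage described in the train/test split section. The names `toy_train`, `toy_test`, and `min_max` are hypothetical.

```python
import numpy as np

toy_train = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
toy_test = np.array([30.0, 60.0])

# Learn the range from the training data only.
train_min, train_max = toy_train.min(), toy_train.max()

def min_max(values):
    """Scale values using the range learned from the training data."""
    return (values - train_min) / (train_max - train_min)

train_scaled = min_max(toy_train)  # values: 0, 0.25, 0.5, 0.75, 1
test_scaled = min_max(toy_test)    # values: 0.5, 1.25
print(train_scaled)
print(test_scaled)
```

Note that a test value above the training maximum scales to more than 1. That is expected when the scaler never sees the test set.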

In [None]:
# your code here

## 6. Save your data

In [None]:
with open('preprocessed_data_object.pkl','wb') as f:
    pickle.dump(dprep, f)
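
To pick the saved object up again later, it can be read back with `pickle.load`. The sketch below round-trips a plain dictionary through a temporary file as a stand-in for `dprep`. The dictionary contents and the temporary path are made up for illustration. To unpickle the real `DataPreprocessors` object, `oit_helpers` must be importable in the loading session, because pickle stores a reference to the class rather than its code.

```python
import os
import pickle
import tempfile

# Stand-in for the preprocessor object; the real notebook pickles `dprep`.
payload = {"timestep": 90, "prop_train": 0.9}
path = os.path.join(tempfile.mkdtemp(), "preprocessed_data_object.pkl")

# Write in binary mode, as in the cell above.
with open(path, "wb") as f:
    pickle.dump(payload, f)

# Read back in binary mode.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == payload)  # True
```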