#### Magic Commands
Magic commands (those that start with `%`) modify the configuration of a Jupyter Notebook. A number of magic commands are available by default (see the list [here](http://ipython.readthedocs.io/en/stable/interactive/magics.html)), and many more can be added with extensions. The magic command used in this section lets `matplotlib` display our plots directly in the notebook instead of saving them to a local file.

In [1]:
%matplotlib inline

# Exercise 4.01: Re-training a model dynamically
In this exercise, we re-train our model every time new data becomes available.

First, we import the libraries we need, including `cryptonic`. Cryptonic is a simple software application developed for this course that implements all the steps up to this section using Python classes and modules. Consider Cryptonic a template for how you could develop similar applications.

In [2]:
import pandas as pd
import numpy as np

In [3]:
from tqdm import tqdm_notebook

In [4]:
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')
import yfinance as yf

In [5]:
from cryptonic import Model
import cryptonic.models.normalizations as normalizations

### Fetching Real-time Data
Throughout this project we have been using data originally provided by the Yahoo Finance API. We have created an interface for collecting both real-time and historical data.

Our model is designed to work with daily data. Let's go ahead and collect historical daily data from Yahoo Finance.

In [6]:
ticker =  yf.Ticker("BTC-USD")
historic_data = ticker.history(period='max')

In [7]:
historic_data = historic_data.rename(columns={'Open':'open', 'High':'high', 'Low':'low', 'Close':'close', 'Volume':'volume'})
historic_data.index.names = ['date']
historic_data = historic_data[['open','high', 'low', 'close', 'volume']]
historic_data = historic_data.reset_index()

In [8]:
historic_data.head(3)

| | date | open | high | low | close | volume |
|---|---|---|---|---|---|---|
| 0 | 2014-09-17 | 465.86 | 468.17 | 452.42 | 457.33 | 21056800 |
| 1 | 2014-09-18 | 456.86 | 456.86 | 413.1 | 424.44 | 34483200 |
| 2 | 2014-09-19 | 424.1 | 427.83 | 384.53 | 394.8 | 37919700 |


The data contains practically the same variables as our earlier dataset. However, much of the data comes from an earlier period. Recent Bitcoin prices have been far more volatile than the prices of a few years ago. Before using this data in our model, let's filter it to dates from January 1, 2019 through December 31, 2019.

In [13]:
#
#  Using the Pandas API, filter the dataframe
#  for observations from 2019 only.
#
#  Hint: use the `date` column / variable.
#

# ISO 8601 strings (YYYY-MM-DD) avoid day/month ambiguity when parsed.
start_date = '2019-01-01'
end_date = '2019-12-31'
mask = ((historic_data['date'] >= start_date) & (historic_data['date']<= end_date))
model_data = historic_data[mask]
model_data = model_data.reset_index(drop=True)
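
The same kind of filter can be checked on synthetic data. The sketch below assumes a `date` column of `datetime64` values, builds a hypothetical frame covering 2018 to 2020 (not the real Yahoo Finance data), and uses `Series.between`, which is inclusive on both ends by default and so matches the two-sided mask above.

```python
import pandas as pd

# Hypothetical stand-in for `historic_data`: one row per day, 2018 through 2020.
demo = pd.DataFrame({"date": pd.date_range("2018-01-01", "2020-12-31", freq="D")})
demo["close"] = range(len(demo))

# `between` keeps rows where start <= date <= end.
in_2019 = demo["date"].between("2019-01-01", "2019-12-31")
demo_2019 = demo[in_2019].reset_index(drop=True)

print(len(demo_2019))
print(demo_2019["date"].iloc[0].date(), demo_2019["date"].iloc[-1].date())
```

Resetting the index after filtering matters later in this notebook, because the weekly slices are computed from row labels starting at 0.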

In [39]:
model_data

| | date | open | high | low | close | volume |
|---|---|---|---|---|---|---|
| 0 | 2019-01-01 | 3746.71 | 3850.91 | 3707.23 | 3843.52 | 4324200990 |
| 1 | 2019-01-02 | 3849.22 | 3947.98 | 3817.41 | 3943.41 | 5244856835 |
| 2 | 2019-01-03 | 3931.05 | 3935.69 | 3826.22 | 3836.74 | 4530215218 |
| 3 | 2019-01-04 | 3832.04 | 3865.93 | 3783.85 | 3857.72 | 4847965467 |
| 4 | 2019-01-05 | 3851.97 | 3904.90 | 3836.90 | 3845.19 | 5137609823 |
| ... | ... | ... | ... | ... | ... | ... |
| 360 | 2019-12-27 | 7238.14 | 7363.53 | 7189.93 | 7290.09 | 22777360995 |
| 361 | 2019-12-28 | 7289.03 | 7399.04 | 7286.91 | 7317.99 | 21365673026 |
| 362 | 2019-12-29 | 7317.65 | 7513.95 | 7279.87 | 7422.65 | 22445257701 |
| 363 | 2019-12-30 | 7420.27 | 7454.82 | 7276.31 | 7293.00 | 22874131671 |


In [15]:
M = Model(data=model_data,
          variable='close',
          predicted_period_size=7)

In [16]:
M.build()

<tensorflow.python.keras.engine.sequential.Sequential at 0x1b29fa03400>

In [17]:
M.train(epochs=100, verbose=1)

Train on 1 samples
Epoch 1/100
...
Epoch 100/100


<tensorflow.python.keras.callbacks.History at 0x1b29fe360f0>

We can now use the model to make predictions with the `predict()` method. Passing `denormalized=True` returns values in the original scale of the data; in our case, US dollars.

In [18]:
M.predict(denormalized=True)

array([7207.736 , 7135.076 , 7100.197 , 7158.1567, 7187.7617, 7288.418 ,
       7152.7134], dtype=float32)
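
What "denormalized" means depends on the normalization the model applies, and the internals of `cryptonic` are not shown in this notebook. As an illustration only, the sketch below assumes a point-relative normalization, in which each value in a window is expressed as its relative change from the window's first value. The function names are hypothetical and are not part of `cryptonic`.

```python
def point_relative_normalize(prices):
    """Express each price as its relative change from the first price."""
    base = prices[0]
    return [p / base - 1.0 for p in prices]


def point_relative_denormalize(normalized, base):
    """Invert the normalization, given the window's first price."""
    return [base * (n + 1.0) for n in normalized]


# Opening prices for the last four days in the table above.
week = [7238.14, 7289.03, 7317.65, 7420.27]

scaled = point_relative_normalize(week)
restored = point_relative_denormalize(scaled, base=week[0])

print(scaled)      # the first entry is the reference point
print(restored)    # back in US dollars, up to floating-point rounding
```

Under this assumption the model only ever sees small relative changes, and the base price of the window is all that is needed to map a prediction back to dollars.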

We now evaluate our model to inspect the statistics for the last epoch of training compared to a single test week.

In [19]:
M.evaluate()

{'mse': 0.0, 'rmse': 59.23, 'mape': 0.72}
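
For reference, RMSE and MAPE are commonly defined as in the sketch below. These are the standard formulas, not necessarily the way `cryptonic` computes them. The reported `mse` of `0.0` next to an `rmse` of `59.23` suggests the two are measured on different scales (for example, normalized values versus US dollars), but that is an assumption. The `actual` and `predicted` values here are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the inputs."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


actual = [100.0, 200.0, 400.0]       # hypothetical observed prices
predicted = [110.0, 190.0, 400.0]    # hypothetical model output

print(round(rmse(actual, predicted), 4))
print(round(mape(actual, predicted), 4))
```

RMSE penalizes large misses more heavily, while MAPE is scale-free, which makes it easier to compare across periods with very different price levels.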

Finally, we can save the trained model to disk for later use.

In [20]:
M.save('bitcoin_model_prod_v0.h5')

Our `Model()` class can also load a previously trained model when instantiated with the `path` parameter.

In [21]:
M = Model(path='bitcoin_model_prod_v0.h5',
          data=model_data,
          variable='close',
          predicted_period_size=7)

In [22]:
M.predict(denormalized=True)

array([7207.736 , 7135.076 , 7100.197 , 7158.1567, 7187.7617, 7288.418 ,
       7152.7134], dtype=float32)

### New Data, Re-train Old Model
One strategy discussed earlier is to re-train our model with new data. Here, our main concern is shaping the data the way the model has been configured to expect it. As an example, we will configure our model to predict one week using the previous 40 weeks. We first train the model on the first 40 weeks of 2019, then continue to re-train it over the following weeks until we reach week 51.

In [23]:
print('Number of full weeks: {}'.format(len(model_data) // 7))

Number of full weeks: 52


First, let's build a model with the first set of data. Notice how we use `7*40 + 7` as the indexer. This is because we use 40 weeks for training and 1 week for testing. 

In [24]:
M = Model(data=model_data.loc[0*7:7*40 + 7],
          variable='close',
          predicted_period_size=7)
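
One detail worth checking: on a default integer index, `.loc` slices by label and includes both endpoints, whereas `.iloc` excludes the stop position. The sketch below uses a hypothetical 364-row frame in place of `model_data` to show that `.loc[0:7*40 + 7]` selects 288 rows, one more than 41 full weeks.

```python
import pandas as pd

# Hypothetical stand-in for `model_data`: 364 daily rows labelled 0..363.
demo = pd.DataFrame({"close": range(364)})

stop = 7 * 40 + 7                    # 287

by_label = demo.loc[0:stop]          # labels 0..287, both ends included
by_position = demo.iloc[0:stop]      # positions 0..286, stop excluded

print(len(by_label), len(by_position))
```

Whether the extra row matters depends on how `Model` groups observations into weeks, which this notebook does not show.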

In [25]:
M.build()

<tensorflow.python.keras.engine.sequential.Sequential at 0x1b2a4fac400>

In [26]:
M.train()

<tensorflow.python.keras.callbacks.History at 0x1b2a64366a0>

In [29]:
#
#  Use the range function and the model_data
#  filtering parameters to split the data into
#  overlapping windows that advance 7 days at a
#  time. Then, re-train our model on each window
#  and collect the results.
#
results = []
for i in range(41, 52):
    j = i - 40
    print("Training model {0} for week {1}".format(j,i))
    M.train(model_data.loc[j*7:7*i + 7])
    results.append(M.evaluate())

Training model 1 for week 41
Training model 2 for week 42
Training model 3 for week 43
Training model 4 for week 44
Training model 5 for week 45
Training model 6 for week 46
Training model 7 for week 47
Training model 8 for week 48
Training model 9 for week 49
Training model 10 for week 50
Training model 11 for week 51
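
To see which rows each re-training step covers, the slice bounds used in the loop can be computed on their own. The sketch below needs no model; `window_bounds` is a hypothetical helper that mirrors the `j*7` and `7*i + 7` expressions in the loop.

```python
def window_bounds(first_week=41, last_week=51, train_weeks=40, days=7):
    """Return (week, start_label, stop_label) for each re-training step."""
    bounds = []
    for week in range(first_week, last_week + 1):
        step = week - train_weeks
        bounds.append((week, step * days, days * week + days))
    return bounds


for week, start, stop in window_bounds():
    print("week {}: rows {}..{} (span {})".format(week, start, stop, stop - start))
```

Each window slides forward by one week while keeping the same width, so the oldest week drops out as the newest arrives. The final stop label, 364, is one past the last row label shown earlier (363); label slicing on a sorted index tolerates this and returns only the rows that exist.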


In [30]:
M.predict(denormalized=True)

array([7187.145 , 7143.798 , 7113.7324, 7173.985 , 7200.346 , 7300.2896,
       7175.3203], dtype=float32)

### New Data, New Model
Another strategy is to create and train a new model every time new data is available. This approach tends to reduce catastrophic forgetting, but training time increases as the data grows.

Its implementation is quite simple.

Let's assume we have old data for 49 weeks of 2019 and, a week later, new data arrives. We represent this with the variables `old_data` and `new_data`.

In [31]:
old_data = model_data.loc[0*7:7*48 + 7]

In [32]:
new_data = model_data.loc[0*7:7*49 + 7]

In [33]:
M = Model(data=old_data,
          variable='close',
          predicted_period_size=7)

In [34]:
M.build()
M.train()

<tensorflow.python.keras.callbacks.History at 0x1b2aa8138d0>

In [35]:
M.predict(denormalized=True)

array([7286.304 , 7220.4487, 7410.7295, 7496.35  , 7523.3467, 7525.1533,
       7362.4614], dtype=float32)

Now, assume that new data is available. Using this technique, we create a new model and train it from scratch on `new_data`, which contains the old observations plus the newest week.

In [36]:
#
#  Re-instantiate the model with the Model()
#  class using the new_data variable instead
#  of the old_data one. 
#

M = Model(data=new_data,
          variable='close',
          predicted_period_size=7)

In [37]:
M.build()
M.train()

<tensorflow.python.keras.callbacks.History at 0x1b2af751470>

In [38]:
M.predict(denormalized=True)

array([6629.0273, 6590.4287, 6608.812 , 6624.322 , 6490.407 , 6532.2583,
       6315.8413], dtype=float32)

This approach is very simple to implement and tends to work well. We will be using this to deploy our application.
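
The pattern generalizes to a loop that discards the previous model each time a week of data arrives. The sketch below is model-agnostic: `MeanModel` is a deliberately trivial, hypothetical stand-in (it predicts the mean of the last seven training values), used only so that the loop runs without `cryptonic` or TensorFlow. The price series is synthetic.

```python
class MeanModel:
    """Hypothetical stand-in model: predicts the mean of the last 7 values."""

    def __init__(self, data):
        self.data = list(data)
        self.level = None

    def train(self):
        last_week = self.data[-7:]
        self.level = sum(last_week) / len(last_week)

    def predict(self, days=7):
        return [self.level] * days


def retrain_from_scratch(series, first_size, step=7):
    """Build and train a brand-new model each time `step` new values arrive."""
    forecasts = []
    for size in range(first_size, len(series) + 1, step):
        model = MeanModel(series[:size])   # fresh instance; no state carried over
        model.train()
        forecasts.append((size, model.predict()))
    return forecasts


prices = [float(day) for day in range(364)]   # synthetic daily values
runs = retrain_from_scratch(prices, first_size=7 * 49)

for size, forecast in runs:
    print(size, forecast[0])
```

Because every iteration trains on the full history up to that point, the cost of each run grows with the dataset, which is the trade-off noted at the start of this section.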