# Cryptocurrency prediction deep learning

### Cryptocurrency price prediction with deep learning

"Project" : "Análisis_criptomonedas"  
"Title" : "Cryptocurrency price prediction with deep learning"  
"Author" : "Cristian García Díaz"  
"Created" : "20180821"  
"Last modified" : "20180826"  
"Sources":  
>https://medium.com/activewizards-machine-learning-company/bitcoin-price-forecasting-with-deep-learning-algorithms-eb578a2387a3

## Contents
[1. Environment setup and data retrieval](#1)  
[2. Bitcoin market indicators](#2)  
[3. Data transformation](#3)  
[4. LSTM neural network](#4) 

## <a name="1"></a> 1. Environment setup and data retrieval

   - Install Anaconda.  
   - Install the required libraries, dependencies and packages.  
   - Create a working environment.
   - Create a folder structure to store the data.  
   - Configure the API key.  
   - Write a function that retrieves the data from the APIs. 

In [59]:
# Import the required libraries and packages
import numpy as np
import pandas as pd
import pickle
import quandl
from datetime import datetime
import os

# Import Plotly (offline mode; `py` refers to plotly.offline from here on)
import plotly.offline as py
import plotly.graph_objs as go
import plotly.figure_factory as ff
# Enable offline plotting inside the notebook
py.init_notebook_mode(connected=True)

from sklearn.preprocessing import MinMaxScaler

In [60]:
# Quandl API key, read from the QUANDL_API_KEY environment variable instead of being hard-coded
quandl.ApiConfig.api_key = os.environ.get("QUANDL_API_KEY")

In [61]:
# Create the folder that stores the downloaded data if it does not exist yet
os.makedirs("cryptocurrency_indicators_files", exist_ok=True)

In [62]:
# Helper that loads a Quandl dataset as a pandas DataFrame.
# A .pkl file is kept as a cache so the same data is not downloaded twice.
def get_quandl_data(quandl_id):
    """Download a Quandl dataset, caching it locally as a pickle file."""
    # Replace '/' only in the dataset id; os.path.join builds a portable path
    cache_path = os.path.join('cryptocurrency_indicators_files',
                              '{}.pkl'.format(quandl_id.replace('/', '-')))
    try:
        df = pd.read_pickle(cache_path)
        print('Dataset {} loaded from cache'.format(quandl_id))
    except (OSError, IOError):
        print('Downloading {} from Quandl'.format(quandl_id))
        df = quandl.get(quandl_id, returns="pandas")
        df.to_pickle(cache_path)
        print('Cached {} at {}'.format(quandl_id, cache_path))
    return df
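The caching pattern used by `get_quandl_data` does not depend on Quandl. The self-contained sketch below uses a hypothetical `load_with_cache` function and a fake fetcher standing in for `quandl.get`. It shows the two details that matter: only the dataset id has its `/` replaced (so the path's own directory separators are left alone), and `os.path.join` builds the path portably instead of hard-coding Windows backslashes.

```python
import os
import pickle
import tempfile

def load_with_cache(dataset_id, fetch, cache_dir):
    """Return fetch(dataset_id), caching the result as a pickle file in cache_dir.

    `fetch` is a hypothetical stand-in for quandl.get: any callable that takes
    the dataset id and returns a picklable object.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # '/' is not allowed in file names, so 'BCHAIN/MKPRU' becomes 'BCHAIN-MKPRU.pkl'
    cache_path = os.path.join(cache_dir, dataset_id.replace('/', '-') + '.pkl')
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    data = fetch(dataset_id)
    with open(cache_path, 'wb') as f:
        pickle.dump(data, f)
    return data

# Usage with a fake fetcher that records every call it receives
calls = []
def fake_fetch(dataset_id):
    calls.append(dataset_id)
    return {'id': dataset_id, 'values': [1, 2, 3]}

cache_dir = tempfile.mkdtemp()
first = load_with_cache('BCHAIN/MKPRU', fake_fetch, cache_dir)
second = load_with_cache('BCHAIN/MKPRU', fake_fetch, cache_dir)
print(len(calls))  # 1: the second call is served from the cache
```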

In [63]:
# Helper that plots every column of a DataFrame as a line series
def df_scatter(df, title, seperate_y_axis=False, y_axis_label='', scale='linear', initial_hide=False):
    # Column names, used as trace labels (e.g. ['BITSTAMP', 'COINBASE', 'ITBIT', 'KRAKEN'])
    label_arr = list(df)
    # One Series per column, in the same order as label_arr
    series_arr = list(map(lambda col: df[col], label_arr))
    
    # Figure layout
    layout = go.Layout(
        title = title,
        legend = dict(orientation='h'),
        xaxis = dict(type='date'),
        yaxis = dict(
            title = y_axis_label, 
            showticklabels = not seperate_y_axis,
            type = scale
        )
    )
    
    # Configuration shared by the additional y axes
    y_axis_config = dict(
        overlaying = 'y',
        showticklabels = False,
        type = scale
    )
    
    # Traces start either visible or hidden (shown only in the legend)
    visibility = 'visible'
    if initial_hide:
        visibility = 'legendonly'
        
    # Build one trace per series
    trace_arr = []
    for index, series in enumerate(series_arr):
        trace = go.Scatter(
            x = series.index,
            y = series,
            name = label_arr[index],
            visible = visibility
        )
        
        # Give each series its own y axis
        if seperate_y_axis:
            trace['yaxis'] = 'y{}'.format(index + 1)
            layout['yaxis{}'.format(index + 1)] = y_axis_config
        trace_arr.append(trace)
    
    fig = go.Figure(data = trace_arr, layout = layout)
    py.iplot(fig)

## <a name="2"></a> 2. Bitcoin market indicators
   - [2.1 Bitcoin price](#2.1)
   - [2.2 Total number of bitcoins](#2.2)
   - [2.3 Market value](#2.3) 
   - [2.4 Bitcoin addresses](#2.4) 
   - [2.5 Bitcoin to US dollar exchange trade volume](#2.5) 
   - [2.6 Number of bitcoin transactions](#2.6) 
   - [2.7 Cumulative number of bitcoin transactions](#2.7) 
   - [2.8 Bitcoin hash rate](#2.8) 
   - [2.9 Bitcoin difficulty](#2.9) 
   - [2.10 Bitcoin miners' revenue](#2.10) 
   
   "Sources":  
>https://www.quandl.com/data/BCHAIN-Blockchain

### <a name="2.1"></a> 2.1 Bitcoin price

In [64]:
# Retrieve the historical Bitcoin market price (USD)
price_btc = get_quandl_data("BCHAIN/MKPRU")

Dataset BCHAIN/MKPRU loaded from cache


### <a name="2.2"></a> 2.2 Total number of bitcoins

In [65]:
# Retrieve the historical total number of bitcoins mined so far
total_number_btc = get_quandl_data("BCHAIN/TOTBC")

Dataset BCHAIN/TOTBC loaded from cache


### <a name="2.3"></a> 2.3 Market value

In [66]:
# Retrieve the historical market capitalization of Bitcoin in USD (the total market value of all bitcoins)
market_capitalization_btc = get_quandl_data("BCHAIN/MKTCP")

Dataset BCHAIN/MKTCP loaded from cache


### <a name="2.4"></a> 2.4 Number of bitcoin addresses

In [67]:
# Retrieve the historical number of unique Bitcoin addresses used per day
address_btc = get_quandl_data("BCHAIN/NADDU")

Dataset BCHAIN/NADDU loaded from cache


### <a name="2.5"></a> 2.5 USD/BTC exchange trade volume

In [68]:
# Retrieve the historical USD/BTC exchange trade volume
exchange_trade_btc = get_quandl_data("BCHAIN/TRVOU")

Dataset BCHAIN/TRVOU loaded from cache


### <a name="2.6"></a> 2.6 Number of bitcoin transactions

In [69]:
# Retrieve the historical number of daily Bitcoin transactions
transactions_btc = get_quandl_data("BCHAIN/NTRAN")

Dataset BCHAIN/NTRAN loaded from cache


### <a name="2.8"></a> 2.8 Bitcoin hash rate

In [70]:
# Retrieve the historical Bitcoin hash rate.
# It is the estimated number of hashes per second computed by the network, measured in terahashes per second (TH/s).
# 1 TH/s = 10^12 hash/s = 1,000,000,000,000 hash/s = one trillion hashes per second.
hash_rate_btc = get_quandl_data("BCHAIN/HRATE")

Dataset BCHAIN/HRATE loaded from cache


### <a name="2.9"></a> 2.9 Bitcoin difficulty

In [71]:
# Retrieve the Bitcoin mining difficulty, a relative measure of how hard it is to find a new block.
# The difficulty is recalculated every 2016 blocks (about two weeks) so that a new block is created every 10 minutes on average.
difficulty_btc = get_quandl_data("BCHAIN/DIFF")

Dataset BCHAIN/DIFF loaded from cache
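The retargeting rule described above can be made concrete with a simplified sketch: every 2016 blocks the difficulty is scaled by the ratio between the expected time (2016 × 10 minutes) and the time those blocks actually took, with the change limited to a factor of 4 in either direction. This is for intuition only; Bitcoin's real code adjusts the compact-encoded target rather than a difficulty number, and `next_difficulty` is a hypothetical name.

```python
TARGET_BLOCK_TIME = 600        # seconds (10 minutes)
RETARGET_INTERVAL = 2016       # blocks between difficulty adjustments

def next_difficulty(old_difficulty, actual_timespan):
    """Simplified Bitcoin difficulty retarget (illustrative only).

    actual_timespan: seconds it took to mine the last RETARGET_INTERVAL blocks.
    """
    expected = RETARGET_INTERVAL * TARGET_BLOCK_TIME   # 1,209,600 s = 14 days
    # Limit the adjustment to a factor of 4 up or down
    timespan = min(max(actual_timespan, expected / 4), expected * 4)
    # Blocks came too fast (timespan < expected) -> difficulty rises, and vice versa
    return old_difficulty * expected / timespan

# Example: blocks arrived every 9 minutes instead of 10, so difficulty rises by 600/540
print(next_difficulty(1000.0, 2016 * 540))
```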


### <a name="2.10"></a> 2.10 Bitcoin miners' revenue

In [72]:
# Retrieve the historical miners' revenue
miners_revenue_btc = get_quandl_data("BCHAIN/MIREV")

Dataset BCHAIN/MIREV loaded from cache


## <a name="3"></a> 3. Data transformation

In [73]:
# Collect the indicators and their names so they can be merged into a single DataFrame (e.g. to compute correlations)
mesures_name= ["price_btc",
        "total_number_btc",
        "market_capitalization_btc",
        "address_btc",
        "exchange_trade_btc",
        "transactions_btc",
        "hash_rate_btc",
        "difficulty_btc",
        "miners_revenue_btc"]

mesures_data= [price_btc,
        total_number_btc,
        market_capitalization_btc,
        address_btc,
        exchange_trade_btc,
        transactions_btc,
        hash_rate_btc,
        difficulty_btc,
        miners_revenue_btc]

In [74]:
# Align the indicators: drop the 2018-08-26 row where present, so every series ends on the same day
fecha = pd.Timestamp(2018, 8, 26)

for i in range(len(mesures_name)):
    if mesures_data[i].index.max() == fecha:
        mesures_data[i] = mesures_data[i].drop(mesures_data[i].index[-1])
    # Position, name, time span covered and last date of each indicator
    print(i, mesures_name[i], mesures_data[i].index.max() - mesures_data[i].index.min(), mesures_data[i].index.max())

0 price_btc 3521 days 00:00:00 2018-08-25 00:00:00
1 total_number_btc 3521 days 00:00:00 2018-08-25 00:00:00
2 market_capitalization_btc 3521 days 00:00:00 2018-08-25 00:00:00
3 address_btc 3521 days 00:00:00 2018-08-25 00:00:00
4 exchange_trade_btc 3521 days 00:00:00 2018-08-25 00:00:00
5 transactions_btc 3521 days 00:00:00 2018-08-25 00:00:00
6 hash_rate_btc 3520 days 00:00:00 2018-08-25 00:00:00
7 difficulty_btc 3521 days 00:00:00 2018-08-25 00:00:00
8 miners_revenue_btc 3521 days 00:00:00 2018-08-25 00:00:00


In [75]:
# Merge the individual DataFrames into one, aligned on the date index
mesures_bitcoin = pd.concat(mesures_data, axis=1)

In [76]:
# Name each column after its indicator
mesures_bitcoin.columns = mesures_name
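The cells above concatenate the indicator DataFrames and then rename the columns by position. A sketch of an alternative that names each column while concatenating, assuming (as the renaming suggests) that every Quandl dataset holds a single value column. The toy `price` and `hash_rate` frames contain made-up values; `hash_rate` starts one day later, mirroring the one-day-shorter span of `hash_rate_btc` in the alignment output. With the default outer join the missing day becomes NaN, while `join='inner'` keeps only the dates present in every indicator.

```python
import numpy as np
import pandas as pd

# Toy single-column datasets (made-up values) standing in for the Quandl downloads
dates = pd.date_range('2018-08-20', periods=4, freq='D')
price = pd.DataFrame({'Value': [6500.0, 6400.0, 6450.0, 6700.0]}, index=dates)
hash_rate = pd.DataFrame({'Value': [50.1, 51.3, 49.8]}, index=dates[1:])  # starts one day later

names = ['price_btc', 'hash_rate_btc']
frames = [price, hash_rate]

# Take the first column of each frame as a Series named after its indicator
series = [df.iloc[:, 0].rename(name) for name, df in zip(names, frames)]

combined = pd.concat(series, axis=1)                      # outer join: NaN where a date is missing
combined_inner = pd.concat(series, axis=1, join='inner')  # only dates shared by all indicators
print(combined)
```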

In [77]:
# mesures_bitcoin = mesures_bitcoin.drop(['market_capitalization_btc'],axis=1)

In [78]:
# Split the data chronologically into training, test and evaluation sets
# Training
train_from_date = '2016-01-01'
train_end_date =  '2018-06-22'
# Test
test_from_date = '2018-06-23'
test_end_date = '2018-08-16'
# Evaluation: starts on 2018-08-10 rather than 2018-08-17 so that there are enough
# days to fill the 10-day input window; its first seven days overlap the test period
evaluation_from_date = '2018-08-10'
evaluation_end_date = '2018-08-22'

df_train = mesures_bitcoin.loc[train_from_date:train_end_date]
df_test = mesures_bitcoin.loc[test_from_date:test_end_date]
df_evaluation = mesures_bitcoin.loc[evaluation_from_date:evaluation_end_date]

# len() counts rows (days); DataFrame.size would count rows x columns
print("{} training days\n{} test days\n{} evaluation days\n".format(len(df_train), len(df_test), len(df_evaluation)))

train_days = df_train.count()
test_days = df_test.count()
evaluation_days = df_evaluation.count()
print(train_days, " from ", train_from_date, " to ", train_end_date)
print(test_days, " from ", test_from_date, " to ", test_end_date)
print(evaluation_days, " from ", evaluation_from_date, " to ", evaluation_end_date)
print("Total days ", train_days.price_btc + test_days.price_btc + evaluation_days.price_btc)

904 training days
55 test days
13 evaluation days

price_btc                    904
total_number_btc             904
market_capitalization_btc    904
address_btc                  904
exchange_trade_btc           904
transactions_btc             904
hash_rate_btc                904
difficulty_btc               904
miners_revenue_btc           904
dtype: int64  from  2016-01-01  to  2018-06-22
price_btc                    55
total_number_btc             55
market_capitalization_btc    55
address_btc                  55
exchange_trade_btc           55
transactions_btc             55
hash_rate_btc                55
difficulty_btc               55
miners_revenue_btc           55
dtype: int64  from  2018-06-23  to  2018-08-16
price_btc                    13
total_number_btc             13
market_capitalization_btc    13
address_btc                  13
exchange_trade_btc           13
transactions_btc             13
hash_rate_btc                13
difficulty_btc        

## <a name="4"></a> 4. LSTM neural network


In [79]:
import pandas as pd
import numpy as np
from time import time
from math import sqrt

from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error

In [80]:
# Explanatory (input) variables: every column from total_number_btc onwards
df_train_x = df_train.loc[:,"total_number_btc":]
# Target variable: the Bitcoin price
df_train_y = df_train.loc[:,"price_btc"]

# Explanatory (input) variables
df_test_x = df_test.loc[:,"total_number_btc":]
# Target variable
df_test_y = df_test.loc[:,"price_btc"]

# Explanatory (input) variables
df_evaluation_x = df_evaluation.loc[:,"total_number_btc":]
# Target variable
df_evaluation_y = df_evaluation.loc[:,"price_btc"]

In [81]:
# Window length (in days) used to build the LSTM input sequences
window_len = 10

In [82]:
# Names of the explanatory variables that are normalized within each window
mesures_name= ["total_number_btc",
        "market_capitalization_btc",
        "address_btc",
        "exchange_trade_btc",
        "transactions_btc",
        "hash_rate_btc",
        "difficulty_btc",
        "miners_revenue_btc"]

In [83]:
# mesures_name.remove('market_capitalization_btc')

In [84]:
mesures_name

['total_number_btc',
 'market_capitalization_btc',
 'address_btc',
 'exchange_trade_btc',
 'transactions_btc',
 'hash_rate_btc',
 'difficulty_btc',
 'miners_revenue_btc']

In [85]:
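# TRAINING
# Explanatory variables: sliding windows of window_len days; each column is divided
# by its value on the first day of the window, minus 1
LSTM_training_inputs = []

for i in range(len(df_train_x)-window_len):
    temp_set = df_train_x.iloc[i:(i+window_len)].copy()
    for j in mesures_name:
        temp_set.loc[:, j] = temp_set[j]/temp_set[j].iloc[0] - 1
    LSTM_training_inputs.append(temp_set)
    
# Target variable: relative price change from the first day of each window to the day after it ends
LSTM_training_outputs = (df_train_y[window_len:].values/df_train_y[:-window_len].values)-1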

In [86]:
# TEST
# Explanatory variables (same window normalization as for training)
LSTM_test_inputs = []

for i in range(len(df_test_x)-window_len):
    temp_set = df_test_x.iloc[i:(i+window_len)].copy()
    for j in mesures_name:
        temp_set.loc[:, j] = temp_set[j]/temp_set[j].iloc[0] - 1
    LSTM_test_inputs.append(temp_set)
    
# Target variable
LSTM_test_outputs = (df_test_y[window_len:].values/df_test_y[:-window_len].values)-1

In [87]:
# EVALUATION
# Explanatory variables (same window normalization as for training)
LSTM_evaluation_inputs = []

for i in range(len(df_evaluation_x)-window_len):
    temp_set = df_evaluation_x.iloc[i:(i+window_len)].copy()
    for j in mesures_name:
        temp_set.loc[:, j] = temp_set[j]/temp_set[j].iloc[0] - 1
    LSTM_evaluation_inputs.append(temp_set)
    
# Target variable
LSTM_evaluation_outputs = (df_evaluation_y[window_len:].values/df_evaluation_y[:-window_len].values)-1
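The training, test and evaluation cells above repeat the same two steps. A sketch of them as a single NumPy function; `make_windows` is a hypothetical helper and it assumes the inputs are already plain numeric arrays (e.g. `df_train_x.values` and `df_train_y.values`). Broadcasting does the per-column normalization in one step: dividing a `(window_len, n_features)` slice by its first row scales every column at once. For the 904 training days and `window_len = 10` it would produce 894 samples, the same count as `LSTM_training_outputs.size`.

```python
import numpy as np

def make_windows(features, target, window_len):
    """Build normalized LSTM samples.

    features: array of shape (n_days, n_features)
    target:   array of shape (n_days,)
    Returns X with shape (n_days - window_len, window_len, n_features), each window
    divided by its first row minus 1, and y with shape (n_days - window_len,),
    the relative change target[t + window_len] / target[t] - 1.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n_samples = len(features) - window_len
    # Broadcasting divides every column of the window by its first-day value
    X = np.stack([features[i:i + window_len] / features[i] - 1 for i in range(n_samples)])
    y = target[window_len:] / target[:-window_len] - 1
    return X, y

# Toy example: 4 days, 2 features, windows of 2 days
features = np.array([[1.0, 10.0], [2.0, 20.0], [4.0, 40.0], [8.0, 80.0]])
target = np.array([100.0, 110.0, 121.0, 133.1])
X, y = make_windows(features, target, window_len=2)
print(X.shape, y.shape)
```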

In [88]:
LSTM_training_inputs

[            total_number_btc  market_capitalization_btc  address_btc  \
 Date                                                                   
 2016-01-01          0.000000                   0.000000     0.000000   
 2016-01-02          0.000225                   0.007190    -0.251770   
 2016-01-03          0.000452                   0.011171     0.078535   
 2016-01-04          0.000700                  -0.002120    -0.048832   
 2016-01-05          0.000991                   0.009525     0.172110   
 2016-01-06          0.001263                   0.007233     0.176328   
 2016-01-07          0.001489                   0.004778     0.092153   
 2016-01-08          0.001728                   0.058588     0.051197   
 2016-01-09          0.002019                   0.043329     0.181023   
 2016-01-10          0.002284                   0.050864     0.195984   
 
             exchange_trade_btc  transactions_btc  hash_rate_btc  \
 Date                                                 

In [89]:
LSTM_training_outputs.size

894

In [90]:
LSTM_test_inputs

[            total_number_btc  market_capitalization_btc  address_btc  \
 Date                                                                   
 2018-06-23          0.000000                   0.000000     0.000000   
 2018-06-24          0.000117                  -0.100322    -0.215935   
 2018-06-25          0.000242                  -0.080073    -0.254452   
 2018-06-26          0.000357                  -0.083802    -0.114840   
 2018-06-27          0.000463                  -0.071017    -0.125926   
 2018-06-28          0.000552                  -0.086804    -0.152617   
 2018-06-29          0.000666                  -0.086073    -0.155726   
 2018-06-30          0.000777                  -0.113992    -0.035082   
 2018-07-01          0.000880                  -0.113745    -0.215570   
 2018-07-02          0.000987                  -0.054319    -0.271896   
 
             exchange_trade_btc  transactions_btc  hash_rate_btc  \
 Date                                                 

In [91]:
LSTM_test_outputs.size

45

In [92]:
# Convert each window from a DataFrame to a NumPy array (all the data is numeric),
# giving a 3-D array of shape (samples, window_len, features)
LSTM_training_inputs = np.array([np.array(window) for window in LSTM_training_inputs])

LSTM_test_inputs = np.array([np.array(window) for window in LSTM_test_inputs])

In [93]:
LSTM_evaluation_inputs = np.array([np.array(window) for window in LSTM_evaluation_inputs])

In [94]:
LSTM_evaluation_inputs.size

240

In [95]:
# 894 windows x 10 days x 8 features = 71520 values; LSTM_training_inputs.shape is (894, 10, 8)
LSTM_training_inputs.size

71520

In [96]:
# 45 windows x 10 days x 8 features = 3600 values; LSTM_test_inputs.shape is (45, 10, 8)
LSTM_test_inputs.size

3600

In [97]:
# Import the required Keras modules
from keras.models import Sequential
from keras.layers import Activation, Dense
from keras.layers import LSTM
from keras.layers import Dropout

### Training

In [98]:
# Function that builds the neural network model.
# It creates an empty Sequential model and adds an LSTM layer whose input shape is
# (window length, number of features), so each sample is an n x m matrix.
# A Dropout layer, a Dense output layer and the activation function follow.
def build_model(inputs, output_size, neurons, activ_func="linear",
                dropout=0.25, loss="mae", optimizer="adam"):
    model = Sequential()

    model.add(LSTM(neurons, input_shape=(inputs.shape[1], inputs.shape[2])))
    model.add(Dropout(dropout))
    model.add(Dense(units=output_size))
    model.add(Activation(activ_func))

    model.compile(loss=loss, optimizer=optimizer)
    return model
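As a sanity check on `build_model`, the number of trainable weights can be worked out by hand. A standard Keras `LSTM` layer has four gates, each with an input kernel, a recurrent kernel and a bias; the `Dense` layer has one weight per input plus a bias, and `Dropout` and `Activation` add none. The helper names below are hypothetical. For `neurons = 20` and the 8 input features this gives 2,320 LSTM weights plus 21 Dense weights, so `model_btc.summary()` should report 2,341 parameters in total.

```python
def lstm_param_count(units, input_dim):
    # Four gates (input, forget, cell candidate, output), each with an input kernel
    # (input_dim x units), a recurrent kernel (units x units) and a bias (units)
    return 4 * (input_dim * units + units * units + units)

def dense_param_count(input_dim, units):
    # Weight matrix plus one bias per output unit
    return input_dim * units + units

n_features = 8   # number of columns in mesures_name
neurons = 20     # as in build_model(..., neurons=20)
lstm = lstm_param_count(neurons, n_features)
dense = dense_param_count(neurons, 1)
print(lstm, dense, lstm + dense)  # 2320 21 2341
```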

In [99]:
# Seed NumPy's random number generator (on its own this does not make Keras/TensorFlow training fully reproducible)
np.random.seed(202)

# Initialize the model
model_btc = build_model(LSTM_training_inputs, output_size=1, neurons = 20)

# Normalized target: relative price change over each window (same as LSTM_training_outputs above)
LSTM_training_outputs = (df_train['price_btc'][window_len:].values/df_train['price_btc'][:-window_len].values)-1

# Start timing
start_time = time()

# Train the model; model_btc_history records the training loss of each epoch
model_btc_history = model_btc.fit(LSTM_training_inputs, LSTM_training_outputs, 
                            epochs=50, batch_size=1, verbose=2, shuffle=True)
# Elapsed training time in seconds
final_time = time() - start_time

Epoch 1/50
 - 6s - loss: 0.0728
Epoch 2/50
 - 5s - loss: 0.0494
Epoch 3/50
 - 5s - loss: 0.0434
Epoch 4/50
 - 5s - loss: 0.0418
Epoch 5/50
 - 5s - loss: 0.0397
Epoch 6/50
 - 5s - loss: 0.0392
Epoch 7/50
 - 5s - loss: 0.0357
Epoch 8/50
 - 5s - loss: 0.0354
Epoch 9/50
 - 5s - loss: 0.0364
Epoch 10/50
 - 5s - loss: 0.0354
Epoch 11/50
 - 5s - loss: 0.0352
Epoch 12/50
 - 4s - loss: 0.0350
Epoch 13/50
 - 5s - loss: 0.0347
Epoch 14/50
 - 5s - loss: 0.0350
Epoch 15/50
 - 5s - loss: 0.0342
Epoch 16/50
 - 6s - loss: 0.0346
Epoch 17/50
 - 6s - loss: 0.0348
Epoch 18/50
 - 7s - loss: 0.0338
Epoch 19/50
 - 6s - loss: 0.0338
Epoch 20/50
 - 6s - loss: 0.0337
Epoch 21/50
 - 7s - loss: 0.0330
Epoch 22/50
 - 6s - loss: 0.0332
Epoch 23/50
 - 7s - loss: 0.0334
Epoch 24/50
 - 6s - loss: 0.0332
Epoch 25/50
 - 5s - loss: 0.0336
Epoch 26/50
 - 4s - loss: 0.0333
Epoch 27/50
 - 4s - loss: 0.0332
Epoch 28/50
 - 5s - loss: 0.0333
Epoch 29/50
 - 5s - loss: 0.0340
Epoch 30/50
 - 5s - loss: 0.0329
Epoch 31/50
 - 5s -

In [100]:
# Training time
print('Neural network training time: {0:.3f} s'.format(final_time))

Neural network training time: 256.404 s


In [101]:
# Training loss (MAE) per epoch
history_error_btc = go.Scatter(x=model_btc_history.epoch, y=model_btc_history.history['loss'])
py.iplot([history_error_btc])

In [102]:
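# Undo the normalization: predicted price = (predicted relative change + 1) x price on the first day of the window
predicted = ((np.transpose(model_btc.predict(LSTM_training_inputs))+1) * df_train_y.values[:-window_len])[0]
observated = df_train_y.values[window_len:]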
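The cell above inverts the target normalization. A minimal round-trip check with made-up prices: turning prices into relative changes and then applying the same inverse expression recovers the original prices, assuming the model output has the `(n_samples, 1)` shape that Keras `predict` returns for a single-unit output layer. The inverse also makes explicit that each predicted price is built on the actual price observed `window_len` days earlier.

```python
import numpy as np

window_len = 3
prices = np.array([100.0, 105.0, 103.0, 110.0, 120.0, 118.0])  # made-up prices

# Forward: relative change over each window (the LSTM target)
targets = prices[window_len:] / prices[:-window_len] - 1

# Pretend the model predicted the targets perfectly, with shape (n_samples, 1)
model_output = targets.reshape(-1, 1)

# Inverse, same expression as in the cell above: transpose to (1, n), scale, take row 0
reconstructed = ((np.transpose(model_output) + 1) * prices[:-window_len])[0]
print(np.allclose(reconstructed, prices[window_len:]))  # True
```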

In [103]:
observated[893]

6733.90166667

In [104]:
predicted[893]

6618.029160011281

In [105]:
# Plot predicted vs. observed prices on the training set, by day number
trace1 = go.Scatter(
    x = np.arange(0, len(predicted), 1),
    y = predicted,
    mode = 'lines',
    name = 'Predicted',
    line = dict(color=('rgb(244, 146, 65)'), width=2)
)
trace2 = go.Scatter(
    x = np.arange(0, len(predicted), 1),
    y = observated,
    mode = 'lines',
    name = 'Observed',
    line = dict(color=('rgb(66, 244, 155)'), width=2)
)

data = [trace1, trace2]
layout = dict(title = 'Comparison of true prices (on the training dataset) with the prices our model predicted',
             xaxis = dict(title = 'Day number'), yaxis = dict(title = 'Price, USD'))
fig = dict(data=data, layout=layout)
py.iplot(fig, filename='results_demonstrating0')

In [106]:
# MSE
print("MSE: %.3f" % mean_squared_error(observated, predicted))
# RMSE Root Mean Square Error
RMSE = sqrt(mean_squared_error(observated, predicted))
print('RMSE: %.3f' % RMSE)
# MAE
print("MAE: %.3f" % mean_absolute_error(observated, predicted))

MSE: 102239.140
RMSE: 319.749
MAE: 134.273
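The three metrics printed above can be written directly in NumPy, which makes their relationship explicit: RMSE is the square root of MSE (√102239.140 ≈ 319.749), and because squaring weights large errors more heavily, RMSE is always at least as large as MAE. `regression_metrics` is a hypothetical helper; on the same inputs it should agree with scikit-learn's `mean_squared_error` and `mean_absolute_error`.

```python
import numpy as np

def regression_metrics(observed, predicted):
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = predicted - observed
    mse = np.mean(errors ** 2)        # mean squared error (USD^2)
    rmse = np.sqrt(mse)               # root mean squared error (USD)
    mae = np.mean(np.abs(errors))     # mean absolute error (USD)
    return mse, rmse, mae

# Made-up prices: errors of +10, -10 and +30 USD
mse, rmse, mae = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
print(mse, rmse, mae)
```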


In [118]:
# Same comparison plotted against calendar dates; the first window_len days have no prediction
Train_Dates = df_train_x.index[window_len:]

trace1 = go.Scatter(x=Train_Dates, y=observated, name= 'Actual Price',
                   line = dict(color = ('rgb(66, 244, 155)'),width = 2))
trace2 = go.Scatter(x=Train_Dates, y=predicted, name= 'Predicted Price',
                   line = dict(color = ('rgb(244, 146, 65)'),width = 2))
data = [trace1, trace2]
layout = dict(title = 'Comparison of true prices (on the training dataset) with prices our model predicted, by dates',
             xaxis = dict(title = 'Date'), yaxis = dict(title = 'Price, USD'))
fig = dict(data=data, layout=layout)
py.iplot(fig, filename='results_demonstrating1')

### Test

In [108]:
# Undo the normalization to obtain predicted prices for the test set
predicted_test = ((np.transpose(model_btc.predict(LSTM_test_inputs))+1) * df_test_y.values[:-window_len])[0]
observated_test = df_test_y.values[window_len:]

In [109]:
# Plot predicted vs. observed prices on the test set, by day number
trace1 = go.Scatter(
    x = np.arange(0, len(predicted_test), 1),
    y = predicted_test,
    mode = 'lines',
    name = 'Predicted',
    line = dict(color=('rgb(244, 146, 65)'), width=2)
)
trace2 = go.Scatter(
    x = np.arange(0, len(observated_test), 1),
    y = observated_test,
    mode = 'lines',
    name = 'Observed',
    line = dict(color=('rgb(66, 244, 155)'), width=2)
)

data = [trace1, trace2]
layout = dict(title = 'Comparison of true prices (on the test dataset) with prices our model predicted',
             xaxis = dict(title = 'Day number'), yaxis = dict(title = 'Price, USD'))
fig = dict(data=data, layout=layout)
py.iplot(fig, filename='results_demonstrating0')

In [110]:
# MSE
print("MSE: %.3f" % mean_squared_error(observated_test, predicted_test))
# RMSE Root Mean Square Error
RMSE = sqrt(mean_squared_error(observated_test, predicted_test))
print('RMSE: %.3f' % RMSE)
# MAE
print("MAE: %.3f" % mean_absolute_error(observated_test, predicted_test))


# Results of an earlier run:
# TRAIN
# MSE: 102208.495
# RMSE: 319.701
# MAE: 134.233
# TEST
# MSE: 106684.829
# RMSE: 326.626
# MAE: 264.080

MSE: 107707.390
RMSE: 328.188
MAE: 262.578


In [167]:
# Same comparison plotted against calendar dates.
# The first window_len (10) test days are only used as input, so predictions start on 2018-07-03
Test_Dates = df_test.index[window_len:]

trace1 = go.Scatter(x=Test_Dates, y=observated_test, name= 'Actual Price',
                   line = dict(color = ('rgb(66, 244, 155)'),width = 2))
trace2 = go.Scatter(x=Test_Dates, y=predicted_test, name= 'Predicted Price',
                   line = dict(color = ('rgb(244, 146, 65)'),width = 2))
data = [trace1, trace2]
layout = dict(title = 'Comparison of true prices (on the test dataset) with prices our model predicted, by dates',
             xaxis = dict(title = 'Date'), yaxis = dict(title = 'Price, USD'))
fig = dict(data=data, layout=layout)
py.iplot(fig, filename='results_demonstrating1')
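The offset between the test period and the predicted days can be checked with a plain date range: with `window_len = 10`, the first 10 of the 55 test days only serve as model input, so the 45 predicted prices start on 2018-07-03, matching the dates used in the plot above. This sketch rebuilds the dates with `pd.date_range` instead of reading them from `df_test`.

```python
import pandas as pd

window_len = 10
test_dates = pd.date_range('2018-06-23', '2018-08-16', freq='D')  # the test period, both ends included
prediction_dates = test_dates[window_len:]                         # one date per predicted price

print(len(test_dates), len(prediction_dates), prediction_dates[0].date())
```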

### Evaluation

In [112]:
# Undo the normalization to obtain predicted prices for the evaluation set
predicted_evaluation = ((np.transpose(model_btc.predict(LSTM_evaluation_inputs))+1) * df_evaluation_y.values[:-window_len])[0]
observated_evaluation = df_evaluation_y.values[window_len:]

In [113]:
# Plot predicted vs. observed prices on the evaluation set, by day number
trace1 = go.Scatter(
    x = np.arange(0, len(predicted_evaluation), 1),
    y = predicted_evaluation,
    mode = 'lines',
    name = 'Predicted',
    line = dict(color=('rgb(244, 146, 65)'), width=2)
)
trace2 = go.Scatter(
    x = np.arange(0, len(observated_evaluation), 1),
    y = observated_evaluation,
    mode = 'lines',
    name = 'Observed',
    line = dict(color=('rgb(66, 244, 155)'), width=2)
)

data = [trace1, trace2]
layout = dict(title = 'Comparison of true prices (on the evaluation dataset) with prices our model predicted',
             xaxis = dict(title = 'Day number'), yaxis = dict(title = 'Price, USD'))
fig = dict(data=data, layout=layout)
py.iplot(fig, filename='results_demonstrating0')

In [114]:
# MSE
print("MSE: %.3f" % mean_squared_error(observated_evaluation, predicted_evaluation))
# RMSE Root Mean Square Error
RMSE = sqrt(mean_squared_error(observated_evaluation, predicted_evaluation))
print('RMSE: %.3f' % RMSE)
# MAE
print("MAE: %.3f" % mean_absolute_error(observated_evaluation, predicted_evaluation))


# Results of an earlier run:
# TRAIN
# MSE: 102208.495
# RMSE: 319.701
# MAE: 134.233
# TEST
# MSE: 106684.829
# RMSE: 326.626
# MAE: 264.080

MSE: 3110.589
RMSE: 55.773
MAE: 48.335


In [116]:
# Same comparison plotted against calendar dates; the first window_len days have no prediction
Evaluation_Dates = df_evaluation_x.index[window_len:]

trace1 = go.Scatter(x=Evaluation_Dates, y=observated_evaluation, name= 'Actual Price',
                   line = dict(color = ('rgb(66, 244, 155)'),width = 2))
trace2 = go.Scatter(x=Evaluation_Dates, y=predicted_evaluation, name= 'Predicted Price',
                   line = dict(color = ('rgb(244, 146, 65)'),width = 2))
data = [trace1, trace2]
layout = dict(title = 'Comparison of true prices (on the evaluation dataset) with prices our model predicted, by dates',
             xaxis = dict(title = 'Date'), yaxis = dict(title = 'Price, USD'))
fig = dict(data=data, layout=layout)
py.iplot(fig, filename='results_demonstrating1')