"Project" : "Análisis_criptomonedas"  
"Title" : "Bitcoin price regression"  
"Author" : "Cristian García Díaz"  
"Created" : "20180821"  
"Last modified" : "20180826"  
"Sources":  
>http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html  
>http://www.aprendemachinelearning.com/regresion-lineal-en-espanol-con-python/  
>http://davidcoallier.com/blog/linear-regression-from-r-to-python/  

>https://datatofish.com/multiple-linear-regression-python/  
>https://towardsdatascience.com/simple-and-multiple-linear-regression-in-python-c928425168f9  

# Cryptocurrency Regression

### Bitcoin price regression

## Contents
[1. Environment setup and data retrieval](#1)  
[2. Bitcoin market indicators](#2)  
[3. Correlation of the Bitcoin indicators](#3)  
[4. Linear regression](#4) 

## <a name="1"></a> 1. Environment setup and data retrieval

   - Install Anaconda.  
   - Install the required libraries, dependencies, and packages.  
   - Create a working environment.
   - Create the folder structure for storing the data.  
   - Configure the API key.  
   - Define the function that retrieves data from the APIs. 
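The API key step can avoid hardcoding the credential in the notebook. A minimal sketch, assuming the key is stored in an environment variable called `QUANDL_API_KEY`; both that name and the helper `load_api_key` are hypothetical and chosen only for illustration:

```python
import os


def load_api_key(env_var="QUANDL_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError("Set the {} environment variable first".format(env_var))
    return key


# Usage (not executed here): quandl.ApiConfig.api_key = load_api_key()
```

Keeping the key outside the notebook means the file can be shared or committed without exposing the credential.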

In [1]:
# Import the required libraries, dependencies, and packages
import numpy as np
import pandas as pd
import pickle
import quandl
from datetime import datetime
import os

# Import Plotly
import plotly.offline as py
import plotly.graph_objs as go
import plotly.figure_factory as ff
# Enable offline (notebook) mode
py.init_notebook_mode(connected=True)

In [2]:
# Quandl API key
quandl.ApiConfig.api_key = "gjodR_eNGkrTQq24cufg"

In [3]:
# Create the folder that stores the downloaded data if it does not exist yet
if not os.path.exists("cryptocurrency_indicators_files"):
    os.mkdir('cryptocurrency_indicators_files')

In [4]:
# Define a helper that loads a Quandl dataset.
# The result is cached with pickle so the same data is not downloaded twice.
# The function returns a pandas DataFrame.

def get_quandl_data(quandl_id):
    """Download a Quandl dataset, caching it on disk as a .pkl file."""
    # Build a portable path; only the dataset id has its '/' replaced
    cache_path = os.path.join('cryptocurrency_indicators_files',
                              '{}.pkl'.format(quandl_id.replace('/', '-')))
    try:
        with open(cache_path, 'rb') as f:
            df = pickle.load(f)
        print('Dataset {} cargado del cache'.format(quandl_id))
    except (OSError, IOError):
        print('Descargando {} de Quandl'.format(quandl_id))
        df = quandl.get(quandl_id, returns="pandas")
        df.to_pickle(cache_path)
        print('Cargado {} de {} en el cache'.format(quandl_id, cache_path))
    return df
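The same load-or-download pattern can be written independently of Quandl. The following is a minimal sketch, not the notebook's own helper: `cached` and `fake_download` are hypothetical names, and the fake loader stands in for a slow remote request.

```python
import os
import pickle
import tempfile


def cached(cache_path, loader):
    """Return the pickled object at cache_path, calling loader() on a cache miss."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    value = loader()
    with open(cache_path, "wb") as f:
        pickle.dump(value, f)
    return value


calls = []


def fake_download():
    # Stand-in for a slow remote request such as quandl.get(...)
    calls.append(1)
    return {"BCHAIN/MKPRU": [1.0, 2.0]}


path = os.path.join(tempfile.mkdtemp(), "BCHAIN-MKPRU.pkl")
first = cached(path, fake_download)   # cache miss: calls fake_download
second = cached(path, fake_download)  # cache hit: reads the pickle instead
print(first == second, len(calls))
```

The loader runs only once; the second call is served from disk.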

In [5]:
# Define the function used to plot the data
def df_scatter(df, title, seperate_y_axis=False, y_axis_label='', scale='linear', initial_hide=False):
    # Column names of the DataFrame, e.g. label_arr = ['BITSTAMP', 'COINBASE', 'ITBIT', 'KRAKEN']
    label_arr = list(df)
    # Map each column name to its series and store the result in series_arr
    series_arr = list(map(lambda col: df[col], label_arr))

    # Layout of the figure
    layout = go.Layout(
        title = title,
        legend = dict(orientation='h'),
        xaxis = dict(type='date'),
        yaxis = dict(
            title = y_axis_label,
            showticklabels = not seperate_y_axis,
            type = scale
        )
    )

    # Configuration shared by the additional y axes
    y_axis_config = dict(
        overlaying = 'y',
        showticklabels = False,
        type = scale
    )

    # Initial visibility of the traces (True, False, or 'legendonly')
    visibility = True
    if initial_hide:
        visibility = 'legendonly'

    # Build one trace per data series
    trace_arr = []
    for index, series in enumerate(series_arr):
        trace = go.Scatter(
            x = series.index,
            y = series,
            name = label_arr[index],
            visible = visibility
        )

        # Add a separate axis for each series
        if seperate_y_axis:
            trace['yaxis'] = 'y{}'.format(index + 1)
            layout['yaxis{}'.format(index + 1)] = y_axis_config
        trace_arr.append(trace)

    fig = go.Figure(data = trace_arr, layout = layout)
    py.iplot(fig)

## <a name="2"></a> 2. Bitcoin market indicators
   - [2.1 Bitcoin price](#2.1)
   - [2.2 Total number of bitcoins](#2.2)
   - [2.3 Market capitalization](#2.3) 
   - [2.4 Bitcoin addresses](#2.4) 
   - [2.5 USD/BTC exchange volume](#2.5) 
   - [2.6 Number of Bitcoin transactions](#2.6) 
   - [2.7 Cumulative number of Bitcoin transactions](#2.7) 
   - [2.8 Bitcoin hash rate](#2.8) 
   - [2.9 Bitcoin difficulty](#2.9) 
   - [2.10 Bitcoin miners' revenue](#2.10) 
   
   "Sources":  
>https://www.quandl.com/data/BCHAIN-Blockchain

### <a name="2.1"></a> 2.1 Bitcoin price

In [6]:
# Retrieve the historical Bitcoin price
price_btc = get_quandl_data("BCHAIN/MKPRU")

Dataset BCHAIN/MKPRU cargado del cache


### <a name="2.2"></a> 2.2 Total number of bitcoins

In [7]:
# Retrieve the historical total number of bitcoins
total_number_btc = get_quandl_data("BCHAIN/TOTBC")

Dataset BCHAIN/TOTBC cargado del cache


### <a name="2.3"></a> 2.3 Market capitalization

In [8]:
# Retrieve the historical Bitcoin market capitalization in USD, i.e. the market value of Bitcoin
market_capitalization_btc = get_quandl_data("BCHAIN/MKTCP")

Dataset BCHAIN/MKTCP cargado del cache


### <a name="2.4"></a> 2.4 Number of Bitcoin addresses

In [9]:
# Retrieve the historical number of Bitcoin addresses used per day
address_btc = get_quandl_data("BCHAIN/NADDU")

Dataset BCHAIN/NADDU cargado del cache


### <a name="2.5"></a> 2.5 USD/BTC exchange volume

In [10]:
# Retrieve the historical USD/BTC exchange volume
exchange_trade_btc = get_quandl_data("BCHAIN/TRVOU")

Dataset BCHAIN/TRVOU cargado del cache


### <a name="2.6"></a> 2.6 Number of Bitcoin transactions

In [11]:
# Retrieve the historical number of Bitcoin transactions
transactions_btc = get_quandl_data("BCHAIN/NTRAN")

Dataset BCHAIN/NTRAN cargado del cache


### <a name="2.8"></a> 2.8 Bitcoin hash rate

In [12]:
# Retrieve the historical Bitcoin hash rate.
# This is the estimated network hash rate, measured in terahashes per second (TH/s).
# 1 TH/s = 10^12 hash/s = 1,000,000,000,000 hash/s, i.e. one trillion hashes per second.
hash_rate_btc = get_quandl_data("BCHAIN/HRATE")

Dataset BCHAIN/HRATE cargado del cache


### <a name="2.9"></a> 2.9 Bitcoin difficulty

In [13]:
# Retrieve the Bitcoin difficulty, a relative measure of how hard it is to mine a block.
# The difficulty is recalculated every 2,016 blocks so that blocks are added to the chain every 10 minutes on average.
difficulty_btc = get_quandl_data("BCHAIN/DIFF")

Dataset BCHAIN/DIFF cargado del cache
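Difficulty and hash rate are closely linked. A commonly used approximation is that a block at difficulty D takes about D × 2^32 hashes to find, so with a 10-minute (600 s) target the implied network hash rate is roughly D × 2^32 / 600. The sketch below applies that approximation; the function name `implied_hash_rate_ths` is hypothetical.

```python
def implied_hash_rate_ths(difficulty, block_time_s=600.0):
    """Approximate network hash rate in TH/s implied by a difficulty value."""
    hashes_per_block = difficulty * 2 ** 32  # expected hashes needed to find one block
    return hashes_per_block / block_time_s / 1e12  # H/s converted to TH/s


# Difficulty value that appears in the data for late August 2018
estimate = implied_hash_rate_ths(6389317000000.0)
print(estimate)
```

The result is on the order of 4.6e7 TH/s, the same order of magnitude as the `hash_rate_btc` values for those days, which is consistent with the very high correlation between the two indicators.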


### <a name="2.10"></a> 2.10 Bitcoin miners' revenue

In [14]:
# Retrieve the historical miners' revenue
miners_revenue_btc = get_quandl_data("BCHAIN/MIREV")

Dataset BCHAIN/MIREV cargado del cache


## <a name="3"></a>3. Correlation of the Bitcoin indicators

In [15]:
# Collect the indicators so they can be merged into a single DataFrame for the correlation analysis.
mesures_name= ["price_btc",
        "total_number_btc",
        "market_capitalization_btc",
        "address_btc",
        "exchange_trade_btc",
        "transactions_btc",
        "hash_rate_btc",
        "difficulty_btc",
        "miners_revenue_btc"]

mesures_data= [price_btc,
        total_number_btc,
        market_capitalization_btc,
        address_btc,
        exchange_trade_btc,
        transactions_btc,
        hash_rate_btc,
        difficulty_btc,
        miners_revenue_btc]

In [16]:
# Align all indicators on the same final day: drop the 2018-08-26 row from any series that already includes it
fecha=pd.Timestamp(2018, 8, 26)

for i in range(len(mesures_name)):
    if(mesures_data[i].index.max()== fecha):
        mesures_data[i] = mesures_data[i].drop(mesures_data[i].index[len(mesures_data[i])-1])
    print( i, mesures_name[i], mesures_data[i].index.max()-mesures_data[i].index.min(), mesures_data[i].index.max())

0 price_btc 3521 days 00:00:00 2018-08-25 00:00:00
1 total_number_btc 3521 days 00:00:00 2018-08-25 00:00:00
2 market_capitalization_btc 3521 days 00:00:00 2018-08-25 00:00:00
3 address_btc 3521 days 00:00:00 2018-08-25 00:00:00
4 exchange_trade_btc 3521 days 00:00:00 2018-08-25 00:00:00
5 transactions_btc 3521 days 00:00:00 2018-08-25 00:00:00
6 hash_rate_btc 3520 days 00:00:00 2018-08-25 00:00:00
7 difficulty_btc 3521 days 00:00:00 2018-08-25 00:00:00
8 miners_revenue_btc 3521 days 00:00:00 2018-08-25 00:00:00
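Instead of hardcoding the cut-off date, the common final day can be derived from the data. A minimal sketch on two toy series (the names `a`, `b`, `common_end`, and `trimmed` are illustrative only):

```python
import pandas as pd

# Two toy daily series: a ends on 2018-08-26, b ends one day earlier
a = pd.Series([1, 2, 3], index=pd.date_range("2018-08-24", periods=3, freq="D"))
b = pd.Series([4, 5], index=pd.date_range("2018-08-24", periods=2, freq="D"))

# Latest date present in every series
common_end = min(s.index.max() for s in (a, b))

# Label-based slicing on a DatetimeIndex includes the end point
trimmed = [s.loc[:common_end] for s in (a, b)]
print(common_end.date(), [len(s) for s in trimmed])
```

This keeps the alignment step valid when the notebook is rerun on a later date.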


In [17]:
# Concatenate the individual DataFrames into a single one
mesures_bitcoin = pd.concat(
    [mesures_data[0],
     mesures_data[1],
     mesures_data[2],
     mesures_data[3],
     mesures_data[4],
     mesures_data[5],
     mesures_data[6],
     mesures_data[7],
     mesures_data[8]],axis=1
)

In [18]:
# Rename the columns (assigning the whole list avoids mutating columns.values in place)
mesures_bitcoin.columns = mesures_name
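Concatenation and renaming can also be done in one step by passing a dictionary to `pd.concat`, whose keys become the column names. A minimal sketch on toy data, assuming each downloaded frame has a single column called `Value` (an assumption, since the raw frames are not shown here):

```python
import pandas as pd

dates = pd.date_range("2018-08-21", periods=3, freq="D")
# Toy frames standing in for the downloaded indicators
price = pd.DataFrame({"Value": [6434.6, 6401.2, 6575.2]}, index=dates)
txs = pd.DataFrame({"Value": [214983, 217777, 230333]}, index=dates)

frames = {"price_btc": price["Value"], "transactions_btc": txs["Value"]}
combined = pd.concat(frames, axis=1)  # dict keys become the column names
print(list(combined.columns))
```

Tying each name to its series in one place removes the risk of the name list and the data list drifting out of order.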

In [19]:
mesures_bitcoin.drop(['market_capitalization_btc'],axis=1)

Date,price_btc,total_number_btc,address_btc,exchange_trade_btc,transactions_btc,hash_rate_btc,difficulty_btc,miners_revenue_btc
2009-01-03,0.000000,50.0,1.0,0.000000e+00,1.0,,1.000000e+00,0.000000e+00
2009-01-04,0.000000,50.0,0.0,0.000000e+00,0.0,0.000000e+00,0.000000e+00,0.000000e+00
2009-01-05,0.000000,50.0,0.0,0.000000e+00,0.0,0.000000e+00,0.000000e+00,0.000000e+00
2009-01-06,0.000000,50.0,0.0,0.000000e+00,0.0,0.000000e+00,0.000000e+00,0.000000e+00
2009-01-07,0.000000,50.0,0.0,0.000000e+00,0.0,0.000000e+00,0.000000e+00,0.000000e+00
2009-01-08,0.000000,50.0,0.0,0.000000e+00,0.0,0.000000e+00,0.000000e+00,0.000000e+00
2009-01-09,0.000000,750.0,14.0,0.000000e+00,14.0,6.959438e-07,1.000000e+00,0.000000e+00
2009-01-10,0.000000,2300.0,31.0,0.000000e+00,31.0,1.541018e-06,1.000000e+00,0.000000e+00
2009-01-11,0.000000,7600.0,106.0,0.000000e+00,106.0,5.269289e-06,1.000000e+00,0.000000e+00
2009-01-12,0.000000,12050.0,96.0,0.000000e+00,95.0,4.424214e-06,1.000000e+00,0.000000e+00


In [20]:
mesures_bitcoin.tail()

Date,price_btc,total_number_btc,market_capitalization_btc,address_btc,exchange_trade_btc,transactions_btc,hash_rate_btc,difficulty_btc,miners_revenue_btc
2018-08-21,6434.559167,17221487.5,110812700000.0,459855.0,278407600.0,214983.0,47642200.0,6389317000000.0,12251870.0
2018-08-22,6401.246154,17223337.5,110250800000.0,449915.0,494275700.0,217777.0,47006970.0,6389317000000.0,12000750.0
2018-08-23,6575.229167,17225337.5,113260500000.0,484462.0,578694700.0,230333.0,50818350.0,6389317000000.0,13310070.0
2018-08-24,6434.881667,17227200.0,110855000000.0,444345.0,521498100.0,217352.0,47324590.0,6389317000000.0,12109210.0
2018-08-25,6543.645714,17229025.0,112740600000.0,487990.0,309487700.0,219848.0,48824180.0,6505039000000.0,12067490.0


In [21]:
# Correlation matrix of the indicators
mesures_correlacion = mesures_bitcoin.corr()

In [22]:
# Display the correlation matrix
mesures_correlacion

,price_btc,total_number_btc,market_capitalization_btc,address_btc,exchange_trade_btc,transactions_btc,hash_rate_btc,difficulty_btc,miners_revenue_btc
price_btc,1.0,0.465134,0.999782,0.677478,0.853488,0.582283,0.779903,0.775728,0.980981
total_number_btc,0.465134,1.0,0.458457,0.80861,0.342171,0.81869,0.420175,0.421509,0.457854
market_capitalization_btc,0.999782,0.458457,1.0,0.672858,0.852266,0.577893,0.786708,0.782611,0.978755
address_btc,0.677478,0.80861,0.672858,1.0,0.56859,0.983264,0.514078,0.511148,0.674568
exchange_trade_btc,0.853488,0.342171,0.852266,0.56859,1.0,0.482211,0.560481,0.55442,0.865675
transactions_btc,0.582283,0.81869,0.577893,0.983264,0.482211,1.0,0.449453,0.446921,0.577882
hash_rate_btc,0.779903,0.420175,0.786708,0.514078,0.560481,0.449453,1.0,0.994816,0.698855
difficulty_btc,0.775728,0.421509,0.782611,0.511148,0.55442,0.446921,0.994816,1.0,0.685947
miners_revenue_btc,0.980981,0.457854,0.978755,0.674568,0.865675,0.577882,0.698855,0.685947,1.0


In [23]:
# Correlation of each indicator with the price, in descending order
mesures_correlacion['price_btc'].sort_values(ascending=False)

price_btc                    1.000000
market_capitalization_btc    0.999782
miners_revenue_btc           0.980981
exchange_trade_btc           0.853488
hash_rate_btc                0.779903
difficulty_btc               0.775728
address_btc                  0.677478
transactions_btc             0.582283
total_number_btc             0.465134
Name: price_btc, dtype: float64
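By default, `DataFrame.corr()` computes the Pearson correlation coefficient. A minimal sketch on toy data that reproduces one entry of the matrix by hand (the column names `a` and `b` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.1, 5.9, 8.2]})

a, b = df["a"], df["b"]
# Pearson r: sum of centered cross-products divided by the product of centered norms
r_manual = ((a - a.mean()) * (b - b.mean())).sum() / np.sqrt(
    ((a - a.mean()) ** 2).sum() * ((b - b.mean()) ** 2).sum()
)
r_pandas = df.corr().loc["a", "b"]
print(np.isclose(r_manual, r_pandas))
```

Pearson correlation only measures linear association between levels, so series that simply trend upward over the same period can score highly without one driving the other.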

## <a name="4"></a>4. Linear regression

In [24]:
# Define the training, test, and evaluation periods
# Training
train_from_date = '2016-11-22'
train_end_date =  '2018-06-22'
# Test
test_from_date = '2018-06-23'
test_end_date = '2018-08-16'
# Evaluation
evaluation_from_date = '2018-08-17'
evaluation_end_date = '2018-08-22'

df_train = mesures_bitcoin.loc[train_from_date:train_end_date]
df_test = mesures_bitcoin.loc[test_from_date:test_end_date]
df_evaluation = mesures_bitcoin.loc[evaluation_from_date:evaluation_end_date]


# len() counts rows (days); .size would count rows x columns
print(len(df_train)," training days\n",len(df_test)," test days\n",len(df_evaluation)," evaluation days\n")

train_days = mesures_bitcoin.loc[train_from_date:train_end_date].count()
test_days = mesures_bitcoin.loc[test_from_date:test_end_date].count()
evalutacion_days = mesures_bitcoin.loc[evaluation_from_date:evaluation_end_date].count()
print(train_days, " from ",train_from_date," to ",train_end_date )
print(test_days, " from ",test_from_date," to ",test_end_date )
print(evalutacion_days, " from ",evaluation_from_date," to ",evaluation_end_date )

578  training days
 55  test days
 6  evaluation days

price_btc                    578
total_number_btc             578
market_capitalization_btc    578
address_btc                  578
exchange_trade_btc           578
transactions_btc             578
hash_rate_btc                578
difficulty_btc               578
miners_revenue_btc           578
dtype: int64  from  2016-11-22  to  2018-06-22
price_btc                    55
total_number_btc             55
market_capitalization_btc    55
address_btc                  55
exchange_trade_btc           55
transactions_btc             55
hash_rate_btc                55
difficulty_btc               55
miners_revenue_btc           55
dtype: int64  from  2018-06-23  to  2018-08-16
price_btc                    6
total_number_btc             6
market_capitalization_btc    6
address_btc                  6
exchange_trade_btc           6
transactions_btc             6
hash_rate_btc                6
difficulty_btc               6
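A date-based split like the one above relies on two pandas behaviours: label slicing on a `DatetimeIndex` includes both end points, and `len()` counts rows while `.size` counts cells. A minimal sketch on a toy frame (all names are illustrative):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2018-01-01", periods=10, freq="D")
df = pd.DataFrame({"price": np.arange(10.0), "volume": np.arange(10.0) * 2}, index=idx)

# .loc slicing on a DatetimeIndex includes both end points
train = df.loc["2018-01-01":"2018-01-06"]
test = df.loc["2018-01-07":"2018-01-10"]

print(len(train), len(test))  # number of rows, i.e. days
print(train.size)             # rows x columns, not days
```

Because the periods do not overlap and are ordered in time, the test rows are always later than every training row.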

In [25]:
import pandas as pd
import numpy as np
import matplotlib.pylab as plt

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

import statsmodels.api as sm

### Linear regression
#### The estimator used is OLS (ordinary least squares)

In [26]:
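# Independent variables (predictors).
# Note: X keeps every column, including the target price_btc, so the fit below is trivially perfect.
X = mesures_bitcoin.loc["2016-01-01":,:]
# Dependent variable (target)
y = mesures_bitcoin.loc["2016-01-01":,"price_btc"]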

In [27]:
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error

# Create the linear regression object
lm = linear_model.LinearRegression()

# Train the model
model = lm.fit(X,y)

# Compute the in-sample predictions
predictions = lm.predict(X)

print(predictions[:5])
print(predictions[-5:])

[429.34 432.33 433.94 428.13 433.  ]
[6434.55916667 6401.24615385 6575.22916667 6434.88166667 6543.64571429]


In [28]:
X

Date,price_btc,total_number_btc,market_capitalization_btc,address_btc,exchange_trade_btc,transactions_btc,hash_rate_btc,difficulty_btc,miners_revenue_btc
2016-01-01,429.340000,15028600.0,6.452379e+09,310331.0,2.854093e+07,164132.0,7.436044e+05,1.038803e+11,1.557216e+06
2016-01-02,432.330000,15031975.0,6.498774e+09,232199.0,2.040696e+07,123623.0,6.971292e+05,1.038803e+11,1.467328e+06
2016-01-03,433.940000,15035400.0,6.524461e+09,334703.0,1.506949e+07,142904.0,7.074570e+05,1.038803e+11,1.498829e+06
2016-01-04,428.130000,15039125.0,6.438701e+09,295177.0,2.332215e+07,141064.0,7.694240e+05,1.038803e+11,1.604631e+06
2016-01-05,433.000000,15043500.0,6.513836e+09,363742.0,2.194946e+07,170176.0,9.036860e+05,1.038803e+11,1.906932e+06
2016-01-06,431.900000,15047575.0,6.499048e+09,365051.0,2.048775e+07,187012.0,8.417189e+05,1.038803e+11,1.774245e+06
2016-01-07,430.750000,15050975.0,6.483207e+09,338929.0,1.915684e+07,177478.0,7.022931e+05,1.038803e+11,1.477472e+06
2016-01-08,453.710000,15054575.0,6.830411e+09,326219.0,5.847808e+07,177873.0,7.436044e+05,1.038803e+11,1.648782e+06
2016-01-09,447.040000,15058950.0,6.731953e+09,366508.0,4.119154e+07,190001.0,9.036860e+05,1.038803e+11,1.971446e+06
2016-01-10,450.150000,15062925.0,6.780576e+09,371151.0,1.765685e+07,178374.0,8.210632e+05,1.038803e+11,1.802401e+06


In [29]:
y.iloc[1]

432.33

In [30]:
print("The coefficients are: \n")
# Note: range(len(mesures_name)-1) stops before the last coefficient (miners_revenue_btc)
for i in range(len(mesures_name)-1):
    print( lm.coef_[i] ,"  ",mesures_name[i])

The coefficients are: 

0.9999999999999998    price_btc
-7.239217446294713e-20    total_number_btc
-1.0833602565443285e-22    market_capitalization_btc
4.398996505627667e-19    address_btc
1.7886152907151706e-20    exchange_trade_btc
-3.2968470047616474e-19    transactions_btc
-2.669689046391503e-21    hash_rate_btc
2.634269557848366e-24    difficulty_btc


In [31]:
# MSE (mean squared error)
print("MSE:\n",mean_squared_error(y, predictions))

MSE:
 3.4256856179220235e-23


In [32]:
from math import sqrt
# RMSE (root mean squared error)
print("RMSE:\n",sqrt(mean_squared_error(y, predictions)))

RMSE:
 5.852935688970128e-12


In [33]:
from sklearn.metrics import mean_absolute_error
# MAE (mean absolute error)
print("MAE:\n",mean_absolute_error(y, predictions))

MAE:
 3.283881890014572e-12
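The three error metrics differ only in how they aggregate the residuals. A small worked example with NumPy, using made-up numbers chosen so the arithmetic is easy to follow:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

errors = y_true - y_pred        # [1, 0, -2]
mse = np.mean(errors ** 2)      # (1 + 0 + 4) / 3
rmse = np.sqrt(mse)             # back in the units of the target
mae = np.mean(np.abs(errors))   # (1 + 0 + 2) / 3

print(mse, rmse, mae)
```

RMSE and MAE are both expressed in the units of the target (USD here), but RMSE penalises large errors more heavily.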


In [34]:
# lm.score returns the coefficient of determination R^2, not the variance
print("R^2:\n",lm.score(X,y))


R^2:
 1.0


In [35]:
# r2_score also returns the coefficient of determination R^2
print("R^2:\n",r2_score(y, predictions))

R^2:
 1.0


In [36]:
lm.intercept_

-4.547473508864641e-13
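A coefficient of roughly 1 on `price_btc`, near-zero coefficients elsewhere, and an R² of 1.0 are what one would expect when the target column is also among the predictors, which is the case here because `X` is built from every column of `mesures_bitcoin`. The sketch below illustrates the effect on synthetic data; all names are hypothetical and the regression is done with plain NumPy least squares rather than scikit-learn.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
feature = rng.normal(size=n)
target = 3.0 * feature + rng.normal(scale=2.0, size=n)  # noisy linear relationship


def r2_of_ols(X, y):
    """Fit OLS with an intercept via least squares and return R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return 1.0 - residuals.var() / y.var()


r2_clean = r2_of_ols(feature.reshape(-1, 1), target)
# Here the target itself is passed as an extra predictor
r2_leaky = r2_of_ols(np.column_stack([feature, target]), target)
print(round(r2_clean, 3), round(r2_leaky, 3))
```

In this notebook, a more informative fit would probably exclude `price_btc` from `X`, and likely `market_capitalization_btc` as well, since market capitalization is the price multiplied by the number of coins.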

In [56]:
X.shape

(968, 10)

In [37]:
# Add the constant (intercept) term
X = sm.add_constant(X)
# Fit the regression model
model = sm.OLS(y, X).fit()
# Compute the in-sample predictions
predictions = model.predict(X) 

print_model = model.summary()
print(print_model)

                            OLS Regression Results                            
Dep. Variable:              price_btc   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 2.430e+16
Date:                Tue, 04 Sep 2018   Prob (F-statistic):               0.00
Time:                        22:55:54   Log-Likelihood:                 6570.1
No. Observations:                 968   AIC:                        -1.312e+04
Df Residuals:                     958   BIC:                        -1.307e+04
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                                coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------------
const                 
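`sm.add_constant` adds a column of ones (named `const` for a DataFrame) so that OLS can estimate an intercept; this is also why `X` has 10 columns afterwards instead of 9. A minimal sketch of the same idea using only pandas and NumPy, on an exact toy line so the recovered parameters are easy to check (names are illustrative):

```python
import numpy as np
import pandas as pd

X_toy = pd.DataFrame({"x1": [0.0, 1.0, 2.0, 3.0]})
y_toy = 5.0 + 2.0 * X_toy["x1"]        # exact line: intercept 5, slope 2

X_const = X_toy.copy()
X_const.insert(0, "const", 1.0)        # column of ones, mimicking what add_constant produces
params, *_ = np.linalg.lstsq(X_const.to_numpy(), y_toy.to_numpy(), rcond=None)
print(dict(zip(X_const.columns, np.round(params, 6))))
```

The parameter attached to the `const` column is the intercept, and the remaining parameters are the slopes.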

In [38]:
# Create a DataFrame with the predictions
pred=pd.DataFrame(predictions)

In [39]:
# Rename the DataFrame column
pred =pred.rename(columns={0:'prediction_btc'})

In [40]:
# Create a DataFrame with the observations
obs=pd.DataFrame(y)

In [41]:
# Rename the DataFrame column
obs = obs.rename(columns={'price_btc':'observation_btc'})

In [42]:
# Resulting DataFrame
result = obs.join(pred,lsuffix='_obs', rsuffix='_pred')

In [43]:
# Plot the resulting DataFrame
df_scatter(result,'Observations vs predictions')

In [44]:
# Difference between observed and predicted prices, computed column-wise
result['dif'] = result['observation_btc'] - result['prediction_btc']

In [None]:
result

In [None]:
sum(result['dif'])

In [45]:
y = mesures_bitcoin.loc["2016-01-01":,"price_btc"]

In [46]:
predicted = mesures_bitcoin.loc["2018-06-23":,"price_btc"]

In [51]:
test_days = mesures_bitcoin.loc["2018-06-23":"2018-08-16",:]

In [52]:
test_days

Date,price_btc,total_number_btc,market_capitalization_btc,address_btc,exchange_trade_btc,transactions_btc,hash_rate_btc,difficulty_btc,miners_revenue_btc
2018-06-23,6332.573333,17109525.0,115050900000.0,469522.0,478403900.0,204144.0,38112980.0,5077499000000.0,12793130.0
2018-06-24,6141.605833,17111525.0,103508800000.0,368136.0,558291400.0,163295.0,40384620.0,5077499000000.0,12098140.0
2018-06-25,6037.008333,17113662.5,105838400000.0,350051.0,450337200.0,158022.0,43161060.0,5077499000000.0,13225420.0
2018-06-26,6211.4475,17115625.0,105409500000.0,415602.0,501623200.0,200173.0,39627400.0,5077499000000.0,12092550.0
2018-06-27,6218.595,17117450.0,106880300000.0,410397.0,363193900.0,195404.0,36850960.0,5077499000000.0,11395190.0
2018-06-28,6105.295,17118962.5,105064000000.0,397865.0,345979200.0,188510.0,30540870.0,5077499000000.0,9288788.0
2018-06-29,6107.896154,17120925.0,105148200000.0,396405.0,262900500.0,187624.0,39627400.0,5077499000000.0,12052690.0
2018-06-30,5908.7025,17122812.5,101936000000.0,453050.0,414797800.0,197881.0,38112980.0,5077499000000.0,11236720.0
2018-07-01,6381.390833,17124575.0,101964500000.0,368307.0,489427100.0,181065.0,35588940.0,5077499000000.0,10494420.0
2018-07-02,6374.754167,17126412.5,108801500000.0,341861.0,224327900.0,156247.0,37103370.0,5077499000000.0,11673360.0


In [53]:
predicted = mesures_bitcoin.loc['2018-06-23':'2018-08-16',"price_btc"]

In [55]:
test_days.shape

(55, 9)

In [54]:
# Create the linear regression object
lm = linear_model.LinearRegression()


# Train the model on the same columns that test_days has:
# X gained a 'const' column from sm.add_constant, which test_days lacks
model = lm.fit(X.drop(columns=['const'], errors='ignore'), y)

# Compute the predictions for the test period
predictions = lm.predict(test_days)


print(predictions[:5])
print(predicted[:5])

In [None]:
predictions

In [None]:
pred=pd.DataFrame(predictions)

In [None]:
# Rename the DataFrame column
pred =pred.rename(columns={0:'prediction_btc'})

In [None]:
predicted = mesures_bitcoin.loc["2018-06-23":"2018-08-16","price_btc"]

In [None]:
obs=pd.DataFrame(predicted)

In [None]:
obs = obs.reset_index()

In [None]:
obs = obs.drop(columns=['Date'])

In [None]:
result = obs.join(pred,lsuffix='_obs', rsuffix='_pred')

In [None]:
# Plot the predicted prices against the observed prices
trace1 = go.Scatter(
    x = np.arange(0, len(result), 1),
    y = result['prediction_btc'],
    mode = 'lines',
    name = 'Predicted',
    line = dict(color=('rgb(244, 146, 65)'), width=2)
)
trace2 = go.Scatter(
    x = np.arange(0, len(result), 1),
    y = result['price_btc'],
    mode = 'lines',
    name = 'Observed',
    line = dict(color=('rgb(66, 244, 155)'), width=2)
)

data = [trace1, trace2]
layout = dict(title = 'Comparison of true prices (on the test dataset) with prices our model predicted',
             xaxis = dict(title = 'Day number'), yaxis = dict(title = 'Price, USD'))
fig = dict(data=data, layout=layout)
py.iplot(fig, filename='results_demonstrating0')

In [None]:
from sklearn.metrics import mean_absolute_error
# MAE (mean absolute error) on the test period
print("MAE:\n",mean_absolute_error(result['price_btc'], result['prediction_btc']))
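For the test-period error to be meaningful, the model should be fitted only on data that precedes the test period. The sketch below shows that workflow end to end on synthetic data with plain NumPy; it is an illustration under assumed names and made-up numbers, not a rerun of the notebook's model.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.arange(100, dtype=float)
y_all = 10.0 + 0.5 * x + rng.normal(scale=1.0, size=100)

# Chronological split: first 80 points to fit, last 20 held out
x_train, x_test = x[:80], x[80:]
y_train, y_test = y_all[:80], y_all[80:]

# Fit intercept and slope on the training part only
A_train = np.column_stack([np.ones_like(x_train), x_train])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# Predict the held-out part and measure the error there
A_test = np.column_stack([np.ones_like(x_test), x_test])
y_hat = A_test @ coef
mae_test = np.mean(np.abs(y_test - y_hat))
print(round(mae_test, 3))
```

An error measured this way reflects how the model behaves on days it has not seen, which in-sample metrics cannot show.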