<a href="https://www.kaggle.com/code/dascient/crypto-forecast-using-statsmodels-varmax?scriptVersionId=216315524" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Welcome to our Universe
## [@donutz.ai](https://www.donutz.ai/#)

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

**Get our data**

In [None]:
# bring in data
df = pd.read_csv('/kaggle/input/all-crypto-currencies/crypto-markets.csv')

# isolate BTC
df = df.loc[df.symbol == 'BTC']
df.head()

In [None]:
df.set_index('date')['close'].tail(500).plot(figsize=(15,4), title='BTC - close')

**Scale data**

In [None]:
# Use scikit-learn to apply maximum absolute scaling
from sklearn.preprocessing import MaxAbsScaler

scaled_df = df.set_index('date')[['open','high','low','close']]

scaler = MaxAbsScaler()
scaler.fit(scaled_df)
scaled = scaler.transform(scaled_df)  # NumPy array of scaled values

# rebuild the DataFrame from the scaled values, keeping the date index and column names
scaled_df = pd.DataFrame(scaled, index=scaled_df.index, columns=scaled_df.columns)

# use first 1000 days as train
train = scaled_df[:1000]

# simulation window
window = 100 # days look ahead
test = scaled_df[1000:1000+window]

We train the model on the first 1000 days of data.

In [None]:
train.tail()

The model's predictions are then compared against the test set, which covers the 100 days immediately following the training period.

In [None]:
test.head()
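
To make the scaling and split steps concrete, here is a minimal sketch on a toy array. The numbers, the variable names, and the split point `n_train` are made up for illustration and are not taken from the BTC data.

```python
import numpy as np

# Toy price matrix: rows are days, columns are two price series (values are made up).
prices = np.array([
    [10.0, 12.0],
    [20.0, 18.0],
    [40.0, 24.0],
    [30.0, 21.0],
])

# Max-abs scaling divides each column by its largest absolute value,
# so every scaled value lies in [-1, 1].
col_max = np.abs(prices).max(axis=0)
scaled_toy = prices / col_max

# Chronological split: earlier rows train the model, later rows are held out.
n_train = 3  # hypothetical split point for this toy example
train_toy, test_toy = scaled_toy[:n_train], scaled_toy[n_train:]

print(col_max)
print(train_toy)
print(test_toy)
```

Splitting by position rather than at random keeps the test rows strictly after the training rows, which is what a forecasting comparison needs.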

**Initiate VARMAX Modeling**

We also output the model summary, which covers each variable (open, high, low, close).

In [None]:
# VARMAX model
from statsmodels.tsa.statespace.varmax import VARMAX

# training data: the four scaled price columns
data = train[['open','high','low','close']]
# fit model
model = VARMAX(data, freq = 'D', order=(1,1), seasonal_order=(0, 0, 0, 0),
                mle_regression = True,
                filter_concentrated = True)
model_fit = model.fit(disp=True)

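# one-step-ahead forecast for the first day after the training sample
yhat = model_fit.predict(len(data), len(data))

# clear the optimizer's iteration log before showing the summary
from IPython.display import clear_output
clear_output()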

model_fit.summary()
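
For intuition about `order=(1,1)`: a VARMA(1,1) process models each day's vector as an intercept, plus a matrix times the previous day's vector, plus the current shock and a matrix times the previous shock. The sketch below simulates such a process for two variables with NumPy. The intercept `c`, the coefficient matrices `A` and `M`, and the noise scale are made-up illustration values, not estimates from the fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for a 2-variable VARMA(1,1); chosen only for illustration.
c = np.array([0.1, 0.2])                 # intercept
A = np.array([[0.5, 0.1], [0.0, 0.4]])   # autoregressive coefficients (lag 1)
M = np.array([[0.3, 0.0], [0.0, 0.3]])   # moving-average coefficients (lag 1)

n_steps = 200
y = np.zeros((n_steps, 2))
shocks = rng.normal(scale=0.05, size=(n_steps, 2))

# Recursion: y_t = c + A y_{t-1} + e_t + M e_{t-1}
for t in range(1, n_steps):
    y[t] = c + A @ y[t - 1] + shocks[t] + M @ shocks[t - 1]

# For a stationary process the long-run mean solves mu = c + A mu.
long_run_mean = np.linalg.solve(np.eye(2) - A, c)
print(long_run_mean)
print(y[-50:].mean(axis=0))
```

Because all eigenvalues of `A` lie inside the unit circle here, the simulated series settles around the long-run mean; price levels often do not behave this way, which is one reason to inspect the diagnostics of a fitted model.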

In [None]:
# residual diagnostics for the variable at index 3 (close)
model_fit.plot_diagnostics(3,figsize=(20,5))
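
`plot_diagnostics` draws, for the chosen variable, the standardized residuals over time, their histogram with a density estimate, a normal Q-Q plot, and a correlogram. As a rough sketch of what the correlogram measures, the snippet below computes sample autocorrelations by hand. The residuals are synthetic white noise and `sample_acf` is a hypothetical helper introduced for illustration; with a fitted model one would pass its residuals instead.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(1)
residuals = rng.normal(size=1000)  # synthetic stand-in for model residuals

acf = sample_acf(residuals, max_lag=10)

# For white noise, autocorrelations beyond lag 0 should mostly fall within
# roughly +/- 1.96 / sqrt(n), the usual approximate 95% band.
band = 1.96 / np.sqrt(len(residuals))
print(acf.round(3))
print(band)
```

Autocorrelations well outside the band at several lags would suggest the model has left structure in the residuals.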

In [None]:
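# simulate `window` days starting at the end of the training sample and compare with the test set
pred = pd.DataFrame()
pred['prediction'] = model_fit.simulate(window, anchor='end').reset_index(drop=True)['close']
pred['observed'] = test.reset_index(drop=True)['close']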
pred.plot(figsize=(20,5),title='prediction vs observed',color=['blue','black'],style=['--','-'])

**Plot prediction - observed**

In [None]:
pred['prediction - observed'] = pred['prediction'] - pred['observed']
pred.plot(figsize=(20,5),title='forecast error',color=['blue','black','green'],style=['--','-',':'])
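
The error series can also be condensed into a few summary numbers. The sketch below computes mean absolute error, root mean squared error, and mean bias with NumPy. The two arrays are made-up values standing in for the `prediction` and `observed` columns, so the printed numbers say nothing about the model above.

```python
import numpy as np

# Made-up values standing in for the prediction and observed columns.
prediction = np.array([0.50, 0.52, 0.55, 0.53])
observed = np.array([0.48, 0.55, 0.50, 0.53])

error = prediction - observed

mae = np.mean(np.abs(error))          # mean absolute error
rmse = np.sqrt(np.mean(error ** 2))   # root mean squared error
bias = np.mean(error)                 # positive means over-prediction on average

print(mae, rmse, bias)
```

RMSE penalizes large misses more heavily than MAE, so a wide gap between the two points to a few large errors rather than a uniform offset.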