## [XGBoost](https://xgboost.readthedocs.io/en/latest/index.html)

I wanted to try this algorithm as an alternative to the [LSTM](https://keras.io/api/layers/recurrent_layers/lstm/) model for predicting the Bitcoin (BTC-USD) closing price.

> - XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable
> - it uses an *ensemble* of decision trees, where each new tree corrects the errors made by the trees already in the model
> - trees are added one at a time, up to `n_estimators` of them, or fewer if early stopping ends training once a validation score stops improving
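
To make "each new tree corrects errors" concrete, here is a toy sketch of plain gradient boosting for squared error, written in NumPy only. It is not XGBoost's algorithm: XGBoost adds second-order gradient information, regularization and many engineering optimizations, none of which appear here. The sketch assumes depth-1 trees (stumps) on a single made-up feature; the helper names `fit_stump`, `predict_stump` and `boost` are hypothetical. For the loss ½(y − f)², the negative gradient is the residual y − f, so each stump is fit to the current residuals and added with a small `learning_rate`.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares depth-1 tree: one threshold on x, one constant on each side."""
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best_sse, best = np.inf, (None, r.mean(), r.mean())
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # equal x values cannot be separated by a threshold
        left, right = rs[:i], rs[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse = sse
            best = ((xs[i - 1] + xs[i]) / 2, left.mean(), right.mean())
    return best

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    if threshold is None:
        return np.full(len(x), left_value)
    return np.where(x <= threshold, left_value, right_value)

def boost(x, y, n_trees=100, learning_rate=0.1):
    """Toy gradient boosting: every stump is fit to the residuals of the model so far."""
    pred = np.full(len(y), y.mean())  # start from a constant prediction
    stumps, train_mse = [], [float(np.mean((y - pred) ** 2))]
    for _ in range(n_trees):
        residuals = y - pred  # the errors the next tree should correct
        stump = fit_stump(x, residuals)
        pred = pred + learning_rate * predict_stump(stump, x)  # shrink each tree's contribution
        stumps.append(stump)
        train_mse.append(float(np.mean((y - pred) ** 2)))
    return y.mean(), stumps, train_mse

# Made-up data: a noisy sine curve
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)
base, stumps, train_mse = boost(x, y)
print(f"train MSE: {train_mse[0]:.3f} -> {train_mse[-1]:.3f}")
```

Because each stump is the least-squares fit to the current residuals and the step size is below 2, the training error can never increase from one tree to the next; `learning_rate` only controls how large each correction is.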



### Import dependencies

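```python
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
from tensorflow import keras  # not used in this section
from sklearn.model_selection import train_test_split  # not used here: the split below is by date
from sklearn.metrics import mean_squared_error
# XGBoost's scikit-learn-style regressor, aliased as `xgb` in this notebook
from xgboost import XGBRegressor as xgb
```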

### Load data

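```python
file_path = "https://bc-dataviz-bucket11.s3.us-east-2.amazonaws.com/BTC-USD.csv"
# parse_dates=True turns the Date index into a DatetimeIndex, so month strings
# such as '2021-06' select whole months when slicing with .loc later on
df = pd.read_csv(file_path, index_col="Date", parse_dates=True)
df.head()
```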

| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2014-09-17 | 465.864014 | 468.174011 | 452.421997 | 457.334015 | 457.334015 | 21056800 |
| 2014-09-18 | 456.859985 | 456.859985 | 413.104004 | 424.440002 | 424.440002 | 34483200 |
| 2014-09-19 | 424.102997 | 427.834991 | 384.532013 | 394.79599 | 394.79599 | 37919700 |
| 2014-09-20 | 394.673004 | 423.29599 | 389.882996 | 408.903992 | 408.903992 | 36863600 |
| 2014-09-21 | 408.084991 | 412.425995 | 393.181 | 398.821014 | 398.821014 | 26580100 |


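### Construct feature table from the Close column

Each row holds the closing price on day t (the target) and the closing prices of the previous five days (the features).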

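```python
features = df[['Close']].copy()
window_size = 5  # number of previous days used as features
for i in range(1, window_size + 1):
    # shift(i) moves values down i rows, so each row gets the close from i days earlier
    features[f'Close_T-{i}'] = features['Close'].shift(i)

# the first `window_size` rows have missing lags, so drop them
features = features.dropna()
features.head()
```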

| Date | Close | Close_T-1 | Close_T-2 | Close_T-3 | Close_T-4 | Close_T-5 |
|---|---|---|---|---|---|---|
| 2014-09-22 | 402.152008 | 398.821014 | 408.903992 | 394.79599 | 424.440002 | 457.334015 |
| 2014-09-23 | 435.790985 | 402.152008 | 398.821014 | 408.903992 | 394.79599 | 424.440002 |
| 2014-09-24 | 423.204987 | 435.790985 | 402.152008 | 398.821014 | 408.903992 | 394.79599 |
| 2014-09-25 | 411.574005 | 423.204987 | 435.790985 | 402.152008 | 398.821014 | 408.903992 |
| 2014-09-26 | 404.424988 | 411.574005 | 423.204987 | 435.790985 | 402.152008 | 398.821014 |
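
The lag-feature construction and the target/feature split can be wrapped in one reusable function. This is a minimal sketch; `make_lag_features` is a hypothetical helper name, not part of pandas. The sample prices are the first seven closes shown in the two tables above, so the resulting rows should match the first two rows of the feature table.

```python
import pandas as pd

def make_lag_features(close: pd.Series, window_size: int = 5) -> pd.DataFrame:
    """Target column plus its previous `window_size` values as features (hypothetical helper)."""
    name = close.name or "Close"
    columns = {name: close}
    for i in range(1, window_size + 1):
        # shift(i) moves each value i rows later, so row t sees the close from t-i
        columns[f"{name}_T-{i}"] = close.shift(i)
    # the first `window_size` rows lack a full history and are dropped
    return pd.DataFrame(columns).dropna()

prices = pd.Series(
    [457.334015, 424.440002, 394.79599, 408.903992, 398.821014, 402.152008, 435.790985],
    index=pd.date_range("2014-09-17", periods=7, freq="D"),
    name="Close",
)
feats = make_lag_features(prices, window_size=5)
y = feats[["Close"]]                 # target: today's close
X = feats.drop(columns=["Close"])    # features: the five previous closes
print(feats)
```

Every feature comes from `shift` with a positive offset, so a row never sees its own or any later close; that is what keeps the target out of the inputs.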


```python
# Target: today's close. Features: the five previous closes.
y = features[['Close']]
X = features.drop(columns=['Close'])
```

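```python
# Chronological split: train on data through June 2021, test from July 2021 on.
# Explicit end dates work whether the index holds strings or datetimes.
X_train = X.loc[:'2021-06-30']
y_train = y.loc[:'2021-06-30']
X_test = X.loc['2021-07-01':]
y_test = y.loc['2021-07-01':]
```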
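
Whether `.loc[:'2021-06']` keeps June depends on the index type. `pd.read_csv(..., index_col="Date")` leaves the dates as plain strings unless `parse_dates` is set, and on a sorted string index the slice compares labels lexicographically: `'2021-06-01'` sorts *after* `'2021-06'`, so every June row is cut off. Only a `DatetimeIndex` treats `'2021-06'` as "the whole month". The sketch below reproduces this on a small made-up series rather than the BTC data.

```python
import pandas as pd

dates = pd.date_range("2021-05-30", "2021-07-02", freq="D")
values = range(len(dates))
as_strings = pd.Series(values, index=dates.strftime("%Y-%m-%d"))  # like read_csv without parse_dates
as_datetimes = pd.Series(values, index=dates)                      # like read_csv with parse_dates=True

# String index: the slice stops before '2021-06-01', so all of June is dropped
train_str = as_strings.loc[:"2021-06"]
# DatetimeIndex: partial-string indexing includes all of June
train_dt = as_datetimes.loc[:"2021-06"]
# An explicit end date behaves the same on either index type
fixed = as_strings.loc[:"2021-06-30"]

print(train_str.index[-1], train_dt.index[-1].date(), fixed.index[-1])
```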

### Train model

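```python
# 1000 trees, each adding only 1% (learning_rate=0.01) of its fitted correction;
# "reg:squarederror" is the squared-error loss (also XGBRegressor's default objective)
xgb_model = xgb(n_estimators=1000, objective="reg:squarederror", learning_rate=0.01)
xgb_model.fit(X_train, y_train)
```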

XGBRegressor(learning_rate=0.01, n_estimators=1000,
             objective='reg:squarederror')

In [7]:
predictions = xgb_model.predict(X_test)
predictions[:10]

array([35716.11 , 35280.81 , 35248.293, 35242.26 , 35434.777, 35296.598,
       35296.598, 35280.81 , 32879.336, 35296.598], dtype=float32)

### Plot actual and predicted closing prices

In [8]:
test_df = y_test.copy()
test_df['pred'] = predictions
test_df.head()

| Date | Close | pred |
|---|---|---|
| 2021-07-01 | 33572.117188 | 35716.109375 |
| 2021-07-02 | 33897.046875 | 35280.808594 |
| 2021-07-03 | 34668.546875 | 35248.292969 |
| 2021-07-04 | 35287.78125 | 35242.261719 |
| 2021-07-05 | 33746.003906 | 35434.777344 |


```python
px.line(test_df, y=["Close"], title="Actual Close")
```

```python
px.line(test_df, y=["pred"], title="Predicted Close")
```
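
Two separate charts make it hard to judge accuracy, so a number helps. A common sanity check for a model built on lagged prices is a naive persistence baseline ("tomorrow's close equals today's close"). The sketch below compares root-mean-squared error (RMSE) for both, using only the five `test_df` rows printed above; the helper name `rmse` and the `persistence` column are hypothetical additions, and five rows are far too few to judge the model.

```python
import numpy as np
import pandas as pd

def rmse(actual: pd.Series, predicted: pd.Series) -> float:
    """Root-mean-squared error between two aligned series."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# The five rows of test_df shown above (Close = actual, pred = XGBoost prediction)
test_df = pd.DataFrame(
    {
        "Close": [33572.117188, 33897.046875, 34668.546875, 35287.78125, 33746.003906],
        "pred": [35716.109375, 35280.808594, 35248.292969, 35242.261719, 35434.777344],
    },
    index=pd.to_datetime(["2021-07-01", "2021-07-02", "2021-07-03", "2021-07-04", "2021-07-05"]),
)

# Persistence baseline: predict each day's close with the previous day's close.
# On the full test set, X_test["Close_T-1"] already holds this value for every row.
test_df["persistence"] = test_df["Close"].shift(1)
scored = test_df.dropna()  # the first row has no previous close within this excerpt

model_rmse = rmse(scored["Close"], scored["pred"])
baseline_rmse = rmse(scored["Close"], scored["persistence"])
print(f"model RMSE: {model_rmse:.1f}  persistence RMSE: {baseline_rmse:.1f}")
```

In the notebook itself, `np.sqrt(mean_squared_error(y_test, predictions))` computes the model's RMSE with the already-imported scikit-learn function, and `px.line(test_df, y=["Close", "pred"], title="Actual vs predicted Close")` draws both series on one chart.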