Project: Implementing Random Forest and
GradientBoostingRegressor Models
Imports: The script imports requests for making HTTP
requests, pandas for data manipulation, numpy for numerical
computations, and train_test_split, RandomForestRegressor,
and mean_squared_error from scikit-learn for splitting the
data, building the model, and evaluating it.
Data Retrieval: It constructs a URL to fetch historical price
data (candlestick data) for a specified cryptocurrency symbol
('BTCUSDT') and time interval (5 minutes) from the Binance
futures API, limited to 100 candles. It then fetches the data
and converts it into a pandas DataFrame.
Data Preprocessing: The script converts 'open_time' to a
datetime, derives 'month', 'day', and 'year' columns from it,
and drops the columns it does not use ('open_time',
'close_time', 'qav', 'is_best_match'). The 'close' column is
the target variable; the remaining columns are the features.
Model Training and Prediction: It splits the dataset into
training and test sets with a random 80/20 split, fits a
RandomForestRegressor with default hyperparameters to the
training data, and predicts the close price for the test
data. Because the features come from the same candle as the
target, the model estimates a candle's close from its other
fields rather than forecasting a future price.
Output: The script prints the mean squared error, the
predicted and actual close prices for the test set, their
differences, and the R^2 scores on the training and test sets.


In [2]:
import requests
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from datetime import datetime

In [3]:
symbol = 'BTCUSDT'
timeinterval = 5


In [4]:
url =  'https://fapi.binance.com/fapi/v1/klines?symbol=' + symbol + '&interval=' + str(timeinterval) + 'm' + '&limit=100'
data = requests.get(url).json()
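
The URL above is assembled by string concatenation. As a hedged alternative, the query string can be built with the standard library so that each parameter is encoded explicitly. The helper name `build_klines_url` and the `limit` default are illustrative assumptions, not part of the original notebook; the endpoint and parameter names are taken from the cell above.

```python
from urllib.parse import urlencode

# Endpoint copied from the cell above.
BASE_URL = "https://fapi.binance.com/fapi/v1/klines"


def build_klines_url(symbol: str, minutes: int, limit: int = 100) -> str:
    """Return the klines URL with encoded query parameters (hypothetical helper)."""
    query = urlencode({"symbol": symbol, "interval": f"{minutes}m", "limit": limit})
    return f"{BASE_URL}?{query}"


url = build_klines_url("BTCUSDT", 5)
print(url)
# To fetch the data, the URL could be passed to requests.get(url, timeout=10),
# followed by response.raise_for_status() before calling response.json().
```

Passing a timeout and checking the status code is a defensive choice: without them a stalled or failed request would surface later as a confusing parsing error.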

In [5]:
D = pd.DataFrame(data)
D.columns = ['open_time', 'open', 'high', 'low', 'close', 'volume', 'close_time', 'qav', 'num_trades',
             'taker_base_vol', 'taker_quote_vol', 'is_best_match']
display(D)

,open_time,open,high,low,close,volume,close_time,qav,num_trades,taker_base_vol,taker_quote_vol,is_best_match
0,1734001500000,100473.40,100634.50,100473.30,100490.60,296.493,1734001799999,29814842.78600,8915,171.899,17285637.68660,0
1,1734001800000,100490.50,100490.50,100211.60,100227.70,1248.631,1734002099999,125256019.75080,21670,425.239,42657058.19710,0
2,1734002100000,100227.60,100390.60,100210.30,100254.40,592.077,1734002399999,59375854.78310,11239,334.014,33499560.97340,0
3,1734002400000,100254.40,100413.00,100150.00,100327.50,1081.824,1734002699999,108445781.42210,16202,442.330,44348113.03930,0
4,1734002700000,100327.50,100446.80,100285.00,100446.80,424.178,1734002999999,42576000.73320,8462,247.890,24882815.26610,0
...,...,...,...,...,...,...,...,...,...,...,...,...
95,1734030000000,100833.80,100994.00,100768.00,100882.60,659.207,1734030299999,66484572.38850,14500,317.769,32054753.07450,0
96,1734030300000,100882.70,101055.30,100837.60,100952.30,862.023,1734030599999,87014677.81820,14205,638.631,64470245.86020,0
97,1734030600000,100952.30,101000.00,100754.30,100762.20,783.127,1734030899999,79004831.15780,11689,364.758,36799851.68560,0
98,1734030900000,100762.20,100786.50,100312.10,100479.00,2594.340,1734031199999,260844753.89080,36493,781.713,78610800.25310,0
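
The price and volume fields in this response are stored as strings, which is why later outputs report `dtype: object`. A minimal sketch of converting them to numeric types while building the DataFrame follows. The helper name `klines_to_frame` is hypothetical, and the sample row reuses the first row of the table above, with the price fields quoted as strings on the assumption that this is how they arrive.

```python
import pandas as pd

KLINE_COLUMNS = ['open_time', 'open', 'high', 'low', 'close', 'volume',
                 'close_time', 'qav', 'num_trades', 'taker_base_vol',
                 'taker_quote_vol', 'is_best_match']

# Columns assumed to arrive as strings and to hold decimal numbers.
FLOAT_COLUMNS = ['open', 'high', 'low', 'close', 'volume', 'qav',
                 'taker_base_vol', 'taker_quote_vol']


def klines_to_frame(rows):
    """Build a DataFrame from raw kline rows and cast numeric columns (hypothetical helper)."""
    frame = pd.DataFrame(rows, columns=KLINE_COLUMNS)
    frame[FLOAT_COLUMNS] = frame[FLOAT_COLUMNS].astype(float)
    frame['open_time'] = pd.to_datetime(frame['open_time'], unit='ms')
    return frame


# First row of the table above, used as sample input.
sample = [[1734001500000, "100473.40", "100634.50", "100473.30", "100490.60",
           "296.493", 1734001799999, "29814842.78600", 8915, "171.899",
           "17285637.68660", "0"]]
typed = klines_to_frame(sample)
print(typed.dtypes)
```

With numeric columns in place, arithmetic on prices works without a separate `astype(float)` call at each use.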


In [6]:
D['open_time'] = pd.to_datetime(D['open_time'], unit='ms')

In [7]:
D['month'] = D['open_time'].dt.month
D['day'] = D['open_time'].dt.day
D['year'] = D['open_time'].dt.year

In [8]:
D = D.drop(['open_time', 'close_time', 'qav', 'is_best_match'], axis=1)

In [9]:
X = D.drop(['close'], axis=1)
y = D['close']
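
The cell above uses the close of the same candle as the target. To predict the close of the next candle instead, the target can be shifted by one interval. The sketch below is one way to do that, not the notebook's method; the helper name `add_next_close_target` and the column name `target` are assumptions, and the sample prices are the first four close values from the table above.

```python
import pandas as pd


def add_next_close_target(frame):
    """Add a 'target' column holding the next row's close (hypothetical helper)."""
    out = frame.copy()
    out['close'] = out['close'].astype(float)
    # shift(-1) moves each close one row up, so row i holds the close of row i + 1.
    out['target'] = out['close'].shift(-1)
    # The last row has no following candle, so its target is NaN and is dropped.
    return out.dropna(subset=['target'])


candles = pd.DataFrame({'close': ["100490.60", "100227.70", "100254.40", "100327.50"]})
labelled = add_next_close_target(candles)
print(labelled)
```

With this layout, the features would be every column except `target`, so the current close becomes an input rather than the quantity being predicted.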

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
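
The split above shuffles the rows, so candles from the test set can fall between candles used for training. For time-ordered data, a chronological split is a common alternative. The following is a sketch on synthetic data (the random-walk prices are invented for illustration) using the `shuffle=False` option of `train_test_split`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic random-walk prices, used only to illustrate the split.
rng = np.random.default_rng(0)
demo = pd.DataFrame({'open': 100_000 + np.cumsum(rng.normal(0, 50, size=100))})
demo['close'] = demo['open'] + rng.normal(0, 30, size=100)

X_demo = demo[['open']]
y_demo = demo['close']

# shuffle=False keeps row order: the first 80% train, the last 20% test.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, shuffle=False)
print(X_tr.index.max(), X_te.index.min())
```

Every test row now comes after every training row, which is closer to how the model would be used on new candles.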

In [11]:
RF = RandomForestRegressor(random_state=42)
RF.fit(X_train, y_train)
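
The model above is fitted with default hyperparameters. A hedged sketch of tuning `n_estimators` and `max_depth` with time series cross-validation follows. The data is synthetic, the lag features are invented for illustration, and the grid values are assumptions chosen to keep the example fast.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic random-walk close prices with two lag features (illustrative only).
rng = np.random.default_rng(0)
frame = pd.DataFrame({'close': 100_000 + np.cumsum(rng.normal(0, 50, size=120))})
frame['lag_1'] = frame['close'].shift(1)
frame['lag_2'] = frame['close'].shift(2)
frame = frame.dropna()

X_demo = frame[['lag_1', 'lag_2']]
y_demo = frame['close']

# Each fold trains on earlier rows and validates on the rows that follow them.
tscv = TimeSeriesSplit(n_splits=3)
param_grid = {'n_estimators': [20, 50], 'max_depth': [3, None]}  # assumed values

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=tscv,
    scoring='neg_mean_squared_error',
)
search.fit(X_demo, y_demo)
print(search.best_params_)
```

`TimeSeriesSplit` is used instead of the default K-fold so that no fold validates on data older than its training rows.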

In [12]:
y_pred = RF.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print('Mean Squared Error:', mse)

Mean Squared Error: 12155.93712219719
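
The MSE is expressed in squared price units, which makes it hard to read against prices near 100,000. Taking the square root gives the RMSE in the same units as the close price. The sketch below uses the MSE printed above; the reference price of 101,000 is a rough level read from the table, included only to give a sense of scale.

```python
import math

mse = 12155.93712219719        # value printed by the cell above
rmse = math.sqrt(mse)          # back to price units
reference_price = 101_000      # rough price level, an assumption for scale
relative_error = rmse / reference_price

print(f"RMSE: {rmse:.2f}")
print(f"Relative to price level: {relative_error:.2%}")
```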


In [14]:
# Refits the same model as cell 11; with random_state=42 the result is identical
RF.fit(X_train, y_train)

In [15]:
# Predictions are a NumPy array; y_test is a pandas Series of strings
prd = RF.predict(X_test)
print(type(prd))
print('Predicted close price:', prd)
ard = y_test
print(type(ard))
print('Actual close price:', ard)
print('Difference between predicted and actual close price (predicted - actual)')
# Cast the actual prices to float so the subtraction is numeric
ard = ard.astype(float)
prd - ard

<class 'numpy.ndarray'>
Predicted close price: [101447.785 101225.955 101568.139 101224.62  101207.072 101250.923
 100805.321 101478.912 100279.289 100519.967 100775.51  100882.459
 101557.374 101008.551 101171.576 100327.656 101344.46  101497.667
 100486.085 101063.391]
<class 'pandas.core.series.Series'>
Actual close price: 83    101413.50
53    101354.60
70    101449.80
45    101485.00
44    101303.10
39    101388.30
22    100811.50
80    101413.70
10    100339.80
0     100490.60
18    100876.20
30    100800.00
73    101563.00
33    100989.20
90    100985.60
4     100446.80
76    101515.30
77    101564.90
12    100539.30
31    101000.00
Name: close, dtype: object
Difference between predicted and actual close price (predicted - actual)


83     34.285
53   -128.645
70    118.339
45   -260.380
44    -96.028
39   -137.377
22     -6.179
80     65.212
10    -60.511
0      29.367
18   -100.690
30     82.459
73     -5.626
33     19.351
90    185.976
4    -119.144
76   -170.840
77    -67.233
12    -53.215
31     63.391
Name: close, dtype: float64

In [16]:
y

0     100490.60
1     100227.70
2     100254.40
3     100327.50
4     100446.80
        ...    
95    100882.60
96    100952.30
97    100762.20
98    100479.00
99    100402.40
Name: close, Length: 100, dtype: object

In [17]:
# Evaluate model
mse = mean_squared_error(y_test, y_pred)
print('Mean Squared Error (MSE):', mse)

Mean Squared Error (MSE): 12155.93712219719


In [18]:
# RandomForestRegressor.score returns the coefficient of determination (R^2),
# not a classification accuracy
training_r2 = RF.score(X_train, y_train)
testing_r2 = RF.score(X_test, y_test)

print('Training R^2:', training_r2)
print('Testing R^2:', testing_r2)

Training R^2: 0.9932138121865949
Testing R^2: 0.9241393725832902
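
For a regressor, `score` returns the coefficient of determination, R^2 = 1 - SS_res / SS_tot, rather than a fraction of correct predictions. As a check, the sketch below recomputes the test MSE and R^2 by hand from the predicted and actual close prices printed earlier. The arrays are copied from that output, so the results agree with the reported values only up to the rounding of the printed predictions.

```python
import numpy as np

# Predicted and actual close prices copied from the printed test-set output.
predicted = np.array([
    101447.785, 101225.955, 101568.139, 101224.62, 101207.072, 101250.923,
    100805.321, 101478.912, 100279.289, 100519.967, 100775.51, 100882.459,
    101557.374, 101008.551, 101171.576, 100327.656, 101344.46, 101497.667,
    100486.085, 101063.391,
])
actual = np.array([
    101413.50, 101354.60, 101449.80, 101485.00, 101303.10, 101388.30,
    100811.50, 101413.70, 100339.80, 100490.60, 100876.20, 100800.00,
    101563.00, 100989.20, 100985.60, 100446.80, 101515.30, 101564.90,
    100539.30, 101000.00,
])

residuals = actual - predicted
ss_res = np.sum(residuals ** 2)                  # unexplained variation
ss_tot = np.sum((actual - actual.mean()) ** 2)   # total variation around the mean

mse = ss_res / len(actual)
r2 = 1 - ss_res / ss_tot
mae = np.mean(np.abs(residuals))                 # mean absolute error, in price units

print('MSE:', mse)
print('R^2:', r2)
print('MAE:', mae)
```

A high R^2 here mostly reflects that the features include the open, high, and low of the same candle, which already bracket the close tightly.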


In [19]:
import matplotlib.pyplot as plt
# Convert y_pred to a pandas Series with the same index as y_test
y_pred_series = pd.Series(y_pred, index=y_test.index)
# Plot actual close price against open price for the first 10 test rows.
# Prices are stored as strings, so cast them to float before plotting.
plt.figure(figsize=(10, 6))
plt.plot(X_test['open'].head(10).astype(float), y_test.head(10).astype(float), label='Actual Price')
# plt.plot(y_pred_series.index, y_pred_series.values, label='Predicted Price')
plt.xlabel('Open_price')
plt.ylabel('Close_price')
# plt.title('Actual vs Predicted Price')
plt.legend()
plt.show()