Model Workflow
Data Loading & Processing

Data source: ccxt, for real-time and historical OHLCV data (open, high, low, close and volume) on a digital currency (e.g. BTC), using a 1d (daily) timeframe
Feature Engineering: calculate price-, volume- and volatility-based indicators
- Price: Simple Moving Average (SMA), Exponential Moving Average (EMA), RSI, MACD (for momentum signals)
--> RSI: measures the speed and magnitude of recent price changes on a 0-100 scale
- Volume: OBV (On-Balance Volume), Money Flow Index (MFI)
- Volatility: ATR (Average True Range), Bollinger Band Width (BBW)
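To make the RSI entry concrete, here is a minimal pandas sketch of the indicator using Wilder-style exponential smoothing. The function name `rsi_wilder` is hypothetical, and the warm-up handling is an assumption that may differ from what a library such as `ta` does on the first rows.

```python
import numpy as np
import pandas as pd

def rsi_wilder(close: pd.Series, window: int = 14) -> pd.Series:
    """Relative Strength Index with Wilder smoothing (alpha = 1 / window)."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)        # positive moves, zero otherwise
    loss = (-delta).clip(lower=0.0)     # size of negative moves, zero otherwise
    avg_gain = gain.ewm(alpha=1.0 / window, min_periods=window, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1.0 / window, min_periods=window, adjust=False).mean()
    rs = avg_gain / avg_loss            # relative strength; inf when there are no losses
    return 100.0 - 100.0 / (1.0 + rs)

# Toy series: a steady rise pins RSI at 100, a steady fall pins it at 0
rising = pd.Series(np.arange(1.0, 41.0))
falling = pd.Series(np.arange(40.0, 0.0, -1.0))
print(rsi_wilder(rising).iloc[-1], rsi_wilder(falling).iloc[-1])
```

Values before the first full window are NaN here, which keeps warm-up rows out of any downstream model.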

Further Feature Engineering: Time Series

- Add weighted averages based on a regression on lagged values, using the partial autocorrelation function (PACF) to identify autoregressive lags for feature selection
- Implement time-series modelling to forecast short-term price movements using ARIMA, selecting the best model by AIC/BIC to obtain its orders (p: autoregressive order, q: moving-average order)
Fill missing data by interpolation, then normalize the features
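As a small illustration of this step, the sketch below fills gaps with time-based linear interpolation and then applies two common scalings. The toy series and variable names are assumptions for the example. In a real pipeline the scaling statistics would normally be computed on the training window only, to avoid look-ahead.

```python
import numpy as np
import pandas as pd

# Toy daily close series with gaps (values are made up for illustration)
idx = pd.date_range("2024-01-01", periods=6, freq="D")
close = pd.Series([100.0, np.nan, 104.0, np.nan, np.nan, 110.0], index=idx)

# Linear interpolation weighted by the time distance between observations
filled = close.interpolate(method="time")

# Two common normalizations
zscore = (filled - filled.mean()) / filled.std()
minmax = (filled - filled.min()) / (filled.max() - filled.min())

print(filled)
print(minmax)
```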

Feature Selection through Pre-calculated Technical Indicators 

Trend prediction: inputs SMA, EMA; model: linear regression (OLS) or support vector machine
Market regime detection: inputs RSI, ATR, SMA; model: K-means clustering
Buy/sell signals (binary classification): inputs RSI, MACD, OBV; model: Random Forest classifier
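The regime-detection and signal-classification steps above might be wired up as in the following sketch. It assumes scikit-learn is available and uses synthetic stand-ins for the indicator columns. The number of clusters (3), the label definition (next close higher than current close) and the 80/20 chronological split are illustrative assumptions, not choices taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 300
# Synthetic stand-ins for the indicator columns computed in the notebook
close = pd.Series(60000 + np.cumsum(rng.normal(0, 500, n)))
features = pd.DataFrame({
    "RSI": rng.uniform(10, 90, n),
    "MACD": rng.normal(0, 200, n),
    "OBV": np.cumsum(rng.normal(0, 1e4, n)),
    "ATR": rng.uniform(500, 3000, n),
    "SMA": close.rolling(14, min_periods=1).mean(),
})

# Market regimes: K-means on standardized RSI, ATR and SMA
regime_inputs = StandardScaler().fit_transform(features[["RSI", "ATR", "SMA"]])
features["regime"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(regime_inputs)

# Binary label: 1 if the next close is higher than the current one
target = (close.shift(-1) > close).astype(int)
X = features[["RSI", "MACD", "OBV"]].iloc[:-1]  # last row has no next close
y = target.iloc[:-1]

# Chronological split: train on the past, evaluate on the most recent 20%
split = int(len(X) * 0.8)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X.iloc[:split], y.iloc[:split])
pred = clf.predict(X.iloc[split:])
print("hold-out accuracy:", (pred == y.iloc[split:].to_numpy()).mean())
```

A chronological split is used instead of a shuffled one because shuffling would let the model train on rows that come after the ones it is tested on.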

Model Training & Prediction: Machine Learning

Time Series Modelling: ARIMA for short-term price forecasting, GARCH for volatility forecasting
Price Forecasting: LSTM, implemented with Keras, for long-term price forecasting
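For the GARCH part, the core of a GARCH(1,1) model is the variance recursion sigma2[t+1] = omega + alpha * r[t]^2 + beta * sigma2[t]. The sketch below only runs that recursion with hypothetical parameter values. In practice omega, alpha and beta would be estimated by maximum likelihood (for example with a dedicated package such as `arch`), and the returns are assumed to be demeaned.

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta):
    """Run the GARCH(1,1) conditional-variance recursion for given parameters.

    Returns an array of length len(returns) + 1 whose last element is the
    one-step-ahead variance forecast.
    """
    returns = np.asarray(returns, dtype=float)
    sigma2 = np.empty(len(returns) + 1)
    sigma2[0] = returns.var()  # start from the sample variance (an assumption)
    for t in range(len(returns)):
        sigma2[t + 1] = omega + alpha * returns[t] ** 2 + beta * sigma2[t]
    return sigma2

# Hypothetical parameters and synthetic daily returns
rng = np.random.default_rng(0)
daily_returns = rng.normal(0, 0.02, 250)
variance_path = garch11_variance(daily_returns, omega=1e-5, alpha=0.1, beta=0.85)
print("next-day volatility forecast:", np.sqrt(variance_path[-1]))
```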
Strategy Implementation:
Combine signals and implement reinforcement learning to create a trading strategy
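Before any reinforcement learning is added, the signal-combination step can be checked with a plain rule-based baseline. The sketch below is not reinforcement learning: it goes long only when two hypothetical signals agree, and it lags the position by one bar so that each day's return is earned on the previous day's decision. All series and names are made up for illustration.

```python
import pandas as pd

# Toy inputs: closing prices and two model signals (+1 = buy, -1 = sell)
close = pd.Series([100.0, 102.0, 101.0, 105.0, 104.0])
trend_signal = pd.Series([1, 1, -1, 1, 1])
rf_signal = pd.Series([1, -1, -1, 1, -1])

# Long (1) only when both signals say buy, otherwise flat (0)
position = ((trend_signal + rf_signal) == 2).astype(int)

# Yesterday's position earns today's return (avoids look-ahead)
returns = close.pct_change().fillna(0.0)
strategy_returns = position.shift(1).fillna(0) * returns
equity = (1 + strategy_returns).cumprod()
print(equity)
```

A baseline like this gives a reference equity curve against which a learned policy can later be compared.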

In [27]:
#Data Loading
import ccxt
import pandas as pd
import time
from datetime import datetime, timedelta

# Initialize exchange
exchange = ccxt.binance()

# Define symbol and timeframe
symbol = 'BTC/USDT'
timeframe = '1d'

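# Define a one-year window ending on the last day of February 2025
end_date = datetime(2025, 2, 28)
start_date = end_date - timedelta(days=365)

# Convert to milliseconds (Binance API uses Unix timestamp in milliseconds).
# Note: .timestamp() interprets a naive datetime in the local timezone.
since = int(start_date.timestamp() * 1000)

# Fetch historical OHLCV data. Only `since` is passed, so end_date does not
# cap the result: candles run from `since` up to the latest one returned.
ohlcv = exchange.fetch_ohlcv(symbol, timeframe, since=since)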

# Convert to DataFrame
data = pd.DataFrame(ohlcv, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
data['timestamp'] = pd.to_datetime(data['timestamp'], unit='ms')
data.set_index('timestamp', inplace=True)

# Display the first few rows
print(data.head())
print(data.tail())  # Check the latest date to ensure correct range


                open      high       low     close       volume
timestamp                                                      
2024-02-29  62432.11  63676.35  60364.70  61130.98  78425.07603
2024-03-01  61130.99  63114.23  60777.00  62387.90  47737.93473
2024-03-02  62387.90  62433.19  61561.12  61987.28  25534.73659
2024-03-03  61987.28  63231.88  61320.00  63113.97  28994.90903
2024-03-04  63113.97  68499.00  62300.00  68245.71  84835.16005
                open      high       low     close       volume
timestamp                                                      
2025-02-27  84250.09  87078.46  82716.49  84708.58  42505.45439
2025-02-28  84708.57  85120.00  78258.52  84349.94  83648.03969
2025-03-01  84349.95  86558.00  83824.78  86064.53  25785.05464
2025-03-02  86064.54  95000.00  85050.60  94270.00  54889.09045
2025-03-03  94269.99  94416.46  89117.00  90330.87  36236.60519
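The tail printed above runs to 2025-03-03, past the intended `end_date` of 2025-02-28, because `end_date` is never applied to the fetched rows. One way to enforce the window is a label-based slice on the datetime index, sketched here on a small synthetic frame. The frame and the name `trimmed` are illustrative only.

```python
from datetime import datetime

import pandas as pd

end_date = datetime(2025, 2, 28)

# Synthetic stand-in for the fetched frame, indexed by daily timestamps
idx = pd.date_range("2025-02-25", "2025-03-03", freq="D", name="timestamp")
data = pd.DataFrame({"close": range(len(idx))}, index=idx)

# .loc slicing on a DatetimeIndex is inclusive of the end label
trimmed = data.loc[:end_date]
print(trimmed.tail())
```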


In [30]:
#Calculation of Technical Indicators
from ta.trend import SMAIndicator, EMAIndicator, MACD
from ta.momentum import RSIIndicator
from ta.volatility import BollingerBands, AverageTrueRange
from ta.volume import OnBalanceVolumeIndicator, VolumeWeightedAveragePrice, money_flow_index
import numpy as np
from sklearn.linear_model import LinearRegression
from statsmodels.graphics.tsaplots import plot_pacf
import matplotlib.pyplot as plt


# Calculate technical indicators.
# fillna=True fills the warm-up rows instead of leaving NaN, so the earliest
# values (e.g. RSI = 100 on the first rows) come from incomplete windows.
data['SMA'] = SMAIndicator(data['close'], window=14, fillna=True).sma_indicator()
data['EMA'] = EMAIndicator(data['close'], window=14, fillna=True).ema_indicator()
data['RSI'] = RSIIndicator(data['close'], window=14, fillna=True).rsi()
macd = MACD(data['close'], window_fast=12, window_slow=26, window_sign=9, fillna=True)
data['MACD'] = macd.macd()
data['MACD_signal'] = macd.macd_signal()
data['MACD_hist'] = macd.macd_diff()
data['OBV'] = OnBalanceVolumeIndicator(data['close'], data['volume'], fillna=True).on_balance_volume()
data['MFI'] = money_flow_index(data['high'], data['low'], data['close'], data['volume'], window=14, fillna=True)
data['ATR'] = AverageTrueRange(data['high'], data['low'], data['close'], window=14, fillna=True).average_true_range()
bands = BollingerBands(data['close'], window=20, window_dev=2, fillna=True)
data['upper_band'] = bands.bollinger_hband()
data['middle_band'] = bands.bollinger_mavg()
data['lower_band'] = bands.bollinger_lband()
# Bollinger Band Width: band spread relative to the middle band
data['BBW'] = (data['upper_band'] - data['lower_band']) / data['middle_band']

# Inspect data
print(data.head())
print(data.tail())
print(data.info())


                open      high       low     close       volume           SMA  \
timestamp                                                                       
2024-02-29  62432.11  63676.35  60364.70  61130.98  78425.07603  61130.980000   
2024-03-01  61130.99  63114.23  60777.00  62387.90  47737.93473  61759.440000   
2024-03-02  62387.90  62433.19  61561.12  61987.28  25534.73659  61835.386667   
2024-03-03  61987.28  63231.88  61320.00  63113.97  28994.90903  62155.032500   
2024-03-04  63113.97  68499.00  62300.00  68245.71  84835.16005  63373.168000   

                     EMA         RSI        MACD  MACD_signal   MACD_hist  \
timestamp                                                                   
2024-02-29  61130.980000  100.000000    0.000000     0.000000    0.000000   
2024-03-01  61298.569333  100.000000  100.267123    20.053425   80.213698   
2024-03-02  61390.397422   74.446344  145.723049    45.187349  100.535700   
2024-03-03  61620.207099   85.595004  269.55441
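The SMA column in the output above is already populated on the first row, which shows that the early values are averages over fewer than 14 closes. The sketch below reproduces that behaviour with a plain pandas rolling mean on the first five closes from the output and contrasts it with a strict full-window mean. The variable names are illustrative.

```python
import pandas as pd

# First five closing prices taken from the output above
close = pd.Series([61130.98, 62387.90, 61987.28, 63113.97, 68245.71])

# Partial windows: average whatever history exists so far
partial = close.rolling(window=14, min_periods=1).mean()

# Strict windows: NaN until 14 observations are available
strict = close.rolling(window=14).mean()

print(partial.round(6).tolist())
print(strict.isna().all())
```

If partial-window values are not wanted as model inputs, one option is to drop the warm-up rows (for instance the first 26 rows, the slowest MACD window) before training.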

In [None]:
# # Calculate weighted averages using regression on lagged values
# lags = 5
# for i in range(1, lags + 1):
#     data[f'lag_{i}'] = data['close'].shift(i)
# data.dropna(inplace=True)

# X = data[[f'lag_{i}' for i in range(1, lags + 1)]]
# y = data['close']
# model = LinearRegression()
# model.fit(X, y)
# weights = model.coef_
# data['weighted_avg'] = np.dot(X, weights)

# # Plot PACF to identify autoregressive lags
# plot_pacf(data['close'], lags=20)
# plt.title('Partial Autocorrelation Function (PACF)')
# plt.show()