## Documentation

Building a trend-following trading algorithm using machine learning involves several steps. Here's a general framework to guide you through the process:

1. Data Collection: Gather historical price data for the assets you want to trade. Ensure the data includes relevant features like open, high, low, and closing prices, as well as any additional indicators you may want to use.

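2. Data Preprocessing: Prepare the data for training by cleaning, normalizing, and transforming it into a suitable format. Split the dataset chronologically into training and testing sets, so the model is evaluated on data that comes after everything it was trained on, and fit any scaler on the training portion only.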

3. Feature Engineering: Enhance the dataset by creating additional features derived from the existing ones. For trend-following strategies, you might consider indicators like moving averages, MACD, RSI, or Bollinger Bands. Experiment with various indicators to identify those that work best for your specific trading strategy.

4. Labeling: Assign labels to your dataset to indicate whether a particular period represents an uptrend, downtrend, or a neutral phase. This step is crucial for training a supervised machine learning model.

5. Model Selection: Choose an appropriate machine learning model that can capture the underlying patterns in your data and make predictions. Several models can be suitable for trend following, including logistic regression, support vector machines (SVM), random forests, gradient boosting machines (GBM), or even deep learning models like recurrent neural networks (RNNs).

6. Training: Feed your labeled dataset into the chosen model and train it on historical data. Adjust hyperparameters and evaluate the model's performance using evaluation metrics such as accuracy, precision, recall, or the area under the receiver operating characteristic (ROC) curve. When one label dominates the dataset, accuracy alone can be misleading, so compare it against a trivial baseline.

7. Testing and Validation: Assess the model's performance on unseen data by testing it on a held-out set that was not used for training or tuning. Monitor key performance metrics and check that they hold up on the most recent data before relying on the model in real time.

8. Backtesting: Implement your trained model into a trading strategy and simulate its performance on historical data to evaluate its profitability. This step allows you to assess whether the model can generate meaningful trading signals.

9. Risk Management: Incorporate risk management techniques to mitigate potential losses. Implement stop-loss orders, position sizing, or portfolio diversification strategies to control risk exposure.

10. Live Trading: Deploy your algorithm in a live trading environment with appropriate safeguards. Continuously monitor its performance and refine the model as needed based on real-time results.

Remember that successful trading algorithms require continuous monitoring, adjustment, and adaptation. Market conditions and dynamics can change, necessitating regular updates to the model and strategy. Additionally, risk management and thorough validation are crucial to ensure the algorithm's effectiveness in real-world trading scenarios.

In [15]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
import numpy as np

In [78]:
# Step 1: Data Collection
# Download daily OHLCV data for the NIFTY 50 index (^NSEI) from Yahoo Finance into a pandas DataFrame
df = yf.download("^NSEI", start="2021-06-12", end="2023-06-18")

[*********************100%***********************]  1 of 1 completed


In [79]:
# Step 2: Data Preprocessing
# Clean, normalize, and transform the data as needed
df.dropna(inplace=True)


In [80]:
df

Date,Open,High,Low,Close,Adj Close,Volume
2021-06-14,15791.400391,15823.049805,15606.500000,15811.849609,15811.849609,392900
2021-06-15,15866.950195,15901.599609,15842.400391,15869.250000,15869.250000,323300
2021-06-16,15847.500000,15880.849609,15742.599609,15767.549805,15767.549805,340200
2021-06-17,15648.299805,15769.349609,15616.750000,15691.400391,15691.400391,357600
2021-06-18,15756.500000,15761.500000,15450.900391,15683.349609,15683.349609,640800
...,...,...,...,...,...,...
2023-06-12,18595.050781,18633.599609,18559.750000,18601.500000,18601.500000,179500
2023-06-13,18631.800781,18728.900391,18631.800781,18716.150391,18716.150391,233200
2023-06-14,18744.599609,18769.699219,18690.000000,18755.900391,18755.900391,261400
2023-06-15,18774.449219,18794.099609,18669.050781,18688.099609,18688.099609,263000


In [81]:
# Move the Date index into a regular column
df.reset_index(inplace=True)

# Sort the rows in ascending date order
df.sort_values('Date', inplace=True)



In [82]:
# Normalize the price data using Min-Max scaling
scaler = MinMaxScaler()
df['Normalized Close'] = scaler.fit_transform(df['Close'].values.reshape(-1, 1))
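
Fitting the scaler on the full history lets the minimum and maximum of later prices influence earlier rows. A minimal sketch of one common alternative follows: fit on the training window only and reuse those statistics for later data. The synthetic price series and the 80/20 split point are assumptions made for illustration, not part of the notebook's data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Synthetic closing prices stand in for the downloaded data (assumption).
rng = np.random.default_rng(0)
close = pd.Series(15000 + rng.normal(0, 50, 500).cumsum(), name="Close")

# Treat the first 80% of rows as the training window (assumed split point).
split = int(len(close) * 0.8)
train_close = close.iloc[:split].to_frame()
test_close = close.iloc[split:].to_frame()

scaler = MinMaxScaler()
# Learn the min and max from the training window only.
train_scaled = scaler.fit_transform(train_close)
# Reuse those statistics for later data; values may fall outside [0, 1].
test_scaled = scaler.transform(test_close)

print(train_scaled.min(), train_scaled.max())
```

Scaled test values outside the 0 to 1 range are expected with this approach when later prices exceed the training range.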


In [83]:
df

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Normalized Close
0,2021-06-14,15791.400391,15823.049805,15606.500000,15811.849609,15811.849609,392900,0.146737
1,2021-06-15,15866.950195,15901.599609,15842.400391,15869.250000,15869.250000,323300,0.162987
2,2021-06-16,15847.500000,15880.849609,15742.599609,15767.549805,15767.549805,340200,0.134197
3,2021-06-17,15648.299805,15769.349609,15616.750000,15691.400391,15691.400391,357600,0.112640
4,2021-06-18,15756.500000,15761.500000,15450.900391,15683.349609,15683.349609,640800,0.110361
...,...,...,...,...,...,...,...,...
495,2023-06-12,18595.050781,18633.599609,18559.750000,18601.500000,18601.500000,179500,0.936447
496,2023-06-13,18631.800781,18728.900391,18631.800781,18716.150391,18716.150391,233200,0.968903
497,2023-06-14,18744.599609,18769.699219,18690.000000,18755.900391,18755.900391,261400,0.980156
498,2023-06-15,18774.449219,18794.099609,18669.050781,18688.099609,18688.099609,263000,0.960962


In [84]:
# Compute additional technical indicators if needed
# For example, you can calculate the simple moving average (SMA)
sma_window = 20
df['SMA'] = df['Close'].rolling(window=sma_window).mean()

# Compute other technical indicators as required


In [85]:
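# Define target variable: 1 if the close `lookback_period` rows ahead is above the current close, else -1
# Despite its name, lookback_period acts as a look-ahead horizon here
lookback_period = 5  # Adjust as needed
df['Target'] = df['Close'].shift(-lookback_period)
# Caution: the last `lookback_period` rows have no future close (NaN), so the comparison below labels them -1
df['Target'] = np.where(df['Target'] > df['Close'], 1, -1)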
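
Because a comparison against NaN evaluates to False, `np.where` assigns -1 to the final rows, where no future close exists yet. The sketch below shows one way to keep those rows as NaN so that a later `dropna` removes them. The helper name `forward_direction` and the toy prices are hypothetical.

```python
import numpy as np
import pandas as pd

def forward_direction(close: pd.Series, horizon: int) -> pd.Series:
    """Return 1.0 if the close `horizon` rows ahead is higher, -1.0 otherwise,
    and NaN where the future close is not available."""
    future = close.shift(-horizon)
    direction = pd.Series(np.where(future > close, 1.0, -1.0), index=close.index)
    # Mask the last `horizon` rows instead of silently labelling them -1.
    return direction.where(future.notna())

# Toy prices (assumption) to show the behaviour at the end of the series.
close = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0, 98.0])
target = forward_direction(close, horizon=2)
print(target.tolist())  # [-1.0, 1.0, 1.0, -1.0, nan, nan]
```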

In [86]:
# Inspect the first rows; NaN values are dropped later, after feature engineering

df.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Normalized Close,SMA,Target
0,2021-06-14,15791.400391,15823.049805,15606.5,15811.849609,15811.849609,392900,0.146737,,-1
1,2021-06-15,15866.950195,15901.599609,15842.400391,15869.25,15869.25,323300,0.162987,,-1
2,2021-06-16,15847.5,15880.849609,15742.599609,15767.549805,15767.549805,340200,0.134197,,-1
3,2021-06-17,15648.299805,15769.349609,15616.75,15691.400391,15691.400391,357600,0.11264,,1
4,2021-06-18,15756.5,15761.5,15450.900391,15683.349609,15683.349609,640800,0.110361,,1


In [87]:
# Step 3: Feature Engineering
# Create additional features derived from the existing ones

import pandas as pd
import talib

# Assuming you have a preprocessed DataFrame named 'df' with relevant columns

# Compute additional features using talib indicators
df['SMA'] = talib.SMA(df['Close'], timeperiod=20)
df['EMA'] = talib.EMA(df['Close'], timeperiod=20)
df['MACD'], _, _ = talib.MACD(df['Close'])
df['RSI'] = talib.RSI(df['Close'], timeperiod=14)
df['ADX'] = talib.ADX(df['High'], df['Low'], df['Close'], timeperiod=14)
df['Stoch_K'], df['Stoch_D'] = talib.STOCH(df['High'], df['Low'], df['Close'])

# Define any other features you want to engineer using talib indicators

# Print the updated DataFrame with additional features
df.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Normalized Close,SMA,Target,EMA,MACD,RSI,ADX,Stoch_K,Stoch_D
0,2021-06-14,15791.400391,15823.049805,15606.5,15811.849609,15811.849609,392900,0.146737,,-1,,,,,,
1,2021-06-15,15866.950195,15901.599609,15842.400391,15869.25,15869.25,323300,0.162987,,-1,,,,,,
2,2021-06-16,15847.5,15880.849609,15742.599609,15767.549805,15767.549805,340200,0.134197,,-1,,,,,,
3,2021-06-17,15648.299805,15769.349609,15616.75,15691.400391,15691.400391,357600,0.11264,,1,,,,,,
4,2021-06-18,15756.5,15761.5,15450.900391,15683.349609,15683.349609,640800,0.110361,,1,,,,,,
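
If TA-Lib is not installed, some of these indicators can be approximated with pandas alone. The sketch below is not TA-Lib's implementation: the helper name `add_indicators` is hypothetical, the data is synthetic, and the values may differ from TA-Lib during the warm-up period because the smoothing is seeded differently.

```python
import numpy as np
import pandas as pd

def add_indicators(df: pd.DataFrame, period: int = 20, rsi_period: int = 14) -> pd.DataFrame:
    """Add SMA, EMA, MACD line and RSI columns computed from 'Close'."""
    out = df.copy()
    close = out["Close"]
    out["SMA"] = close.rolling(window=period).mean()
    out["EMA"] = close.ewm(span=period, adjust=False).mean()
    # MACD line: 12-period EMA minus 26-period EMA.
    out["MACD"] = (close.ewm(span=12, adjust=False).mean()
                   - close.ewm(span=26, adjust=False).mean())
    # RSI with Wilder-style smoothing (alpha = 1 / rsi_period).
    delta = close.diff()
    avg_gain = delta.clip(lower=0).ewm(alpha=1 / rsi_period, adjust=False).mean()
    avg_loss = (-delta.clip(upper=0)).ewm(alpha=1 / rsi_period, adjust=False).mean()
    out["RSI"] = 100 - 100 / (1 + avg_gain / avg_loss)
    return out

# Synthetic random-walk prices (assumption) instead of downloaded data.
rng = np.random.default_rng(7)
prices = pd.DataFrame({"Close": 100 + rng.normal(0, 1, 200).cumsum()})
ind = add_indicators(prices)
print(ind.tail())
```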


In [92]:
df.dropna(inplace=True)
df

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Normalized Close,SMA,Target,EMA,MACD,RSI,ADX,Stoch_K,Stoch_D
33,2021-07-30,15800.599609,15862.799805,15744.849609,15763.049805,15763.049805,400000,0.132923,15788.252539,1,15779.856617,2.205335,48.380172,16.211937,61.670196,59.568037
34,2021-08-02,15874.900391,15892.900391,15834.650391,15885.150391,15885.150391,244800,0.167488,15796.400049,1,15789.884596,10.731267,54.793989,16.093105,77.416599,65.426020
35,2021-08-03,15951.549805,16146.900391,15914.349609,16130.750000,16130.750000,341300,0.237013,15811.220068,1,15822.347967,36.880826,64.380880,15.674999,87.036493,75.374429
36,2021-08-04,16195.250000,16290.200195,16176.150391,16258.799805,16258.799805,427300,0.273263,15833.247559,1,15863.914809,67.162878,68.170903,16.031763,96.574542,87.009211
37,2021-08-05,16288.950195,16349.450195,16210.299805,16294.599609,16294.599609,418200,0.283397,15853.995020,1,15904.932409,92.978568,69.158942,16.641805,94.231284,92.614107
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,2023-06-12,18595.050781,18633.599609,18559.750000,18601.500000,18601.500000,179500,0.936447,18453.867578,-1,18464.650763,153.271194,61.347101,29.156805,31.142826,53.114805
496,2023-06-13,18631.800781,18728.900391,18631.800781,18716.150391,18716.150391,233200,0.968903,18475.350098,-1,18488.603109,155.137058,65.514370,29.482508,37.846215,40.826922
497,2023-06-14,18744.599609,18769.699219,18690.000000,18755.900391,18755.900391,261400,0.980156,18504.057617,-1,18514.059993,158.001914,66.848856,30.000311,63.579881,44.189641
498,2023-06-15,18774.449219,18794.099609,18669.050781,18688.099609,18688.099609,263000,0.960962,18531.965137,-1,18530.635194,153.037258,62.412467,30.609760,72.650733,58.025610


In [97]:
# Step 4: Labeling
# Assign labels: 1 for uptrend, -1 for downtrend, 0 for neutral
import pandas as pd

# Assuming you have a DataFrame named 'df' with relevant columns including 'Close'

# Threshold on the one-day percentage change in the close (0.02 = 2%)
threshold = 0.02  # Adjust as needed

# Compute the daily return once and reuse it for both conditions
daily_return = df['Close'].pct_change()

df['label'] = 0  # Initialize all labels as 0 (neutral)
df.loc[daily_return > threshold, 'label'] = 1  # Assign 1 for uptrend
df.loc[daily_return < -threshold, 'label'] = -1  # Assign -1 for downtrend

# Show which labels occur in the data
df['label'].unique()

array([ 0, -1,  1])
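
The output confirms that all three labels occur, but not how often. With a 2% threshold on one-day changes, most days may end up neutral, so it is worth checking the label shares before training. The sketch below wraps the same thresholding rule in a hypothetical helper, `label_by_return`, and applies it to toy prices. The `periods` argument is an added assumption that allows multi-day changes to be labelled as well.

```python
import pandas as pd

def label_by_return(close: pd.Series, threshold: float = 0.02, periods: int = 1) -> pd.Series:
    """Label each row 1, -1 or 0 from the percentage change over `periods` rows."""
    ret = close.pct_change(periods)
    label = pd.Series(0, index=close.index)
    label[ret > threshold] = 1     # rise larger than the threshold
    label[ret < -threshold] = -1   # fall larger than the threshold
    return label

# Toy prices (assumption): +3%, about -2.9%, then +0.5%.
close = pd.Series([100.0, 103.0, 100.0, 100.5])
labels = label_by_return(close)
print(labels.tolist())                       # [0, 1, -1, 0]
print(labels.value_counts(normalize=True))   # share of each label
```

Note that this rule, like the cell above, labels a day by the move that has already happened, so it describes the current bar rather than a future one.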

In [98]:
# Step 5: Splitting the dataset
X = df.drop('label', axis=1)  # Features
y = df['label']  # Labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [102]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Assuming you have a preprocessed DataFrame named 'df' with relevant columns

# Convert the datetime column to a numerical representation (nanoseconds since the epoch)
df['Date'] = df['Date'].astype('int64')

# Split the dataset into features (X) and labels (y)
# Note: X still contains 'Target', which is built from future prices;
# drop it before using the model for real predictions
X = df.drop('label', axis=1)  # Features
y = df['label']  # Labels

# Split the dataset into training and testing sets
# Note: train_test_split shuffles rows by default; for time series,
# shuffle=False keeps the test set strictly later in time
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 6: Model Selection and Training
model = RandomForestClassifier()  # You can try other models as well
model.fit(X_train, y_train)

# Continue with further steps of your trading model
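
A shuffled split mixes past and future rows, which tends to flatter results on time series. The sketch below shows a chronological hold-out combined with scikit-learn's `TimeSeriesSplit` for walk-forward validation. The two synthetic features, the labels, and the fold count are assumptions chosen only to keep the example self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit

# Synthetic, time-ordered features and labels (assumption).
rng = np.random.default_rng(42)
n = 300
X = pd.DataFrame({"feat_a": rng.normal(size=n), "feat_b": rng.normal(size=n)})
y = pd.Series(np.where(X["feat_a"] + 0.5 * rng.normal(size=n) > 0, 1, -1))

# Hold out the most recent 20% of rows as the final test set.
split = int(n * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

# Walk-forward validation: every fold trains on earlier rows, validates on later ones.
tscv = TimeSeriesSplit(n_splits=4)
fold_scores = []
for train_idx, val_idx in tscv.split(X_train):
    fold_model = RandomForestClassifier(n_estimators=50, random_state=0)
    fold_model.fit(X_train.iloc[train_idx], y_train.iloc[train_idx])
    fold_scores.append(fold_model.score(X_train.iloc[val_idx], y_train.iloc[val_idx]))

print([round(score, 3) for score in fold_scores])
```

The hold-out rows are touched only once, after model and hyperparameters have been chosen on the walk-forward folds.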


In [101]:
type(X_train)

pandas.core.frame.DataFrame

In [104]:
# Step 7: Testing and Validation
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

Accuracy: 0.9680851063829787
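
A high accuracy is hard to interpret when one label dominates, which is plausible here given how rarely a 2% daily move occurs. One quick check is to compare against a classifier that always predicts the most frequent label. The sketch below uses made-up label counts (188 neutral, 6 up, 6 down) purely to illustrate the comparison. They are not the notebook's actual distribution.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Made-up, heavily imbalanced labels (assumption): 94% neutral.
y_true = np.array([0] * 188 + [1] * 6 + [-1] * 6)
X_dummy = np.zeros((len(y_true), 1))  # features are ignored by the baseline

# Baseline that always predicts the most frequent label.
baseline = DummyClassifier(strategy="most_frequent").fit(X_dummy, y_true)
y_base = baseline.predict(X_dummy)

acc = accuracy_score(y_true, y_base)
bal = balanced_accuracy_score(y_true, y_base)
print(f"Baseline accuracy: {acc:.3f}")           # 0.940
print(f"Baseline balanced accuracy: {bal:.3f}")  # 0.333
```

A model is informative only to the extent that it beats this baseline. Balanced accuracy, or per-class precision and recall, makes the gap easier to see.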


In [105]:
# Step 8: Backtesting
# Implement your trading strategy and simulate its performance on historical data

import backtrader as bt
import pandas as pd

# backtrader needs a datetime index. 'Date' was converted to integers for
# training, and indexing the feed by those integers raises
# "AttributeError: 'numpy.int64' object has no attribute 'to_pydatetime'",
# so build a copy of df indexed by real timestamps instead.
bt_df = df.copy()
bt_df.index = pd.to_datetime(bt_df['Date'].to_numpy())

# The model expects exactly the columns it was trained on, in the same order
feature_cols = [col for col in df.columns if col != 'label']

# Define a strategy based on your trained model
class MyStrategy(bt.Strategy):
    def __init__(self):
        self.model = model  # Replace 'model' with your trained model
        self.features = bt_df[feature_cols]

    def next(self):
        # Look up the feature row that belongs to the current bar's date
        current_date = pd.Timestamp(self.data.datetime.date(0))
        row = self.features.loc[[current_date]]

        # Make a prediction using your model
        prediction = self.model.predict(row)[0]

        # Illustrative trading logic only; replace with your own rules
        if prediction == 1 and not self.position:
            self.buy()
        elif prediction == -1 and self.position:
            self.close()

# Create a backtrader data feed from the DataFrame
data = bt.feeds.PandasData(dataname=bt_df)

# Initialize a backtrader Cerebro instance
cerebro = bt.Cerebro()

# Add the data feed to the Cerebro instance
cerebro.adddata(data)

# Add your strategy to the Cerebro instance
cerebro.addstrategy(MyStrategy)

# Set the initial capital
initial_capital = 100000
cerebro.broker.setcash(initial_capital)

# Set the commission scheme (if applicable)
# cerebro.broker.setcommission(commission=0.001)

# Run the backtest
# Note: this runs over rows the model was trained on, so the result is in-sample
cerebro.run()

# Print the final portfolio value
final_value = cerebro.broker.getvalue()
print(f"Final Portfolio Value: {final_value:.2f}")

In [None]:
# Step 9: Risk Management
# Incorporate risk management techniques into your strategy
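
As one possible starting point for this step, the sketch below sizes a position so that hitting the stop-loss costs a fixed fraction of account equity. The function name `position_size` and the example numbers are hypothetical. This is a common rule of thumb, not a recommendation.

```python
def position_size(equity: float, risk_fraction: float,
                  entry_price: float, stop_price: float) -> int:
    """Return the number of units to trade so that a move from entry to stop
    loses at most `risk_fraction` of `equity` (rounded down to whole units)."""
    risk_per_unit = abs(entry_price - stop_price)
    if risk_per_unit == 0:
        raise ValueError("entry_price and stop_price must differ")
    risk_budget = equity * risk_fraction
    return int(risk_budget // risk_per_unit)

# Example (assumed numbers): risk 2% of 100,000 with a stop 300 points below entry.
units = position_size(equity=100_000, risk_fraction=0.02,
                      entry_price=18_000, stop_price=17_700)
print(units)  # 6
```

With these numbers the risk budget is 2,000 and each unit risks 300, so six units fit within the budget.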


In [None]:
# Step 10: Live Trading
# Deploy your algorithm in a live trading environment with appropriate safeguards
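
One simple safeguard is a pre-trade check that blocks new orders after a daily loss limit is hit or when an order would exceed a position cap. The sketch below is broker-agnostic and hypothetical throughout: the class `TradingGuard`, its fields, and the limits are assumptions, and a real deployment would wire such a check into the broker's order path.

```python
from dataclasses import dataclass

@dataclass
class TradingGuard:
    """Pre-trade limits: a daily loss cap and a maximum absolute position."""
    max_daily_loss: float
    max_position: int

    def allows(self, daily_pnl: float, current_position: int, order_size: int) -> bool:
        """Return True if the order may be sent under the configured limits."""
        # Stop trading for the day once the loss limit is reached.
        if daily_pnl <= -self.max_daily_loss:
            return False
        # Reject orders that would push the position beyond the cap.
        return abs(current_position + order_size) <= self.max_position

guard = TradingGuard(max_daily_loss=2_000.0, max_position=10)
print(guard.allows(daily_pnl=-500.0, current_position=4, order_size=3))    # True
print(guard.allows(daily_pnl=-2_500.0, current_position=4, order_size=3))  # False
print(guard.allows(daily_pnl=0.0, current_position=8, order_size=5))       # False
```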
