 Let’s build a simple machine learning-based quant strategy using Python to predict stock price direction — whether the price will go up or down tomorrow — using historical stock data and a RandomForestClassifier.

🧠 GOAL:
Predict if a stock (e.g., Tesla) will go UP (1) or DOWN (0) tomorrow based on historical price indicators.

In [None]:
# Install the required libraries
!pip install yfinance ta scikit-learn -q

In [None]:
import yfinance as yf
import pandas as pd
import ta  # for technical indicators
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

In [None]:
ticker = 'TSLA'
df = yf.download(ticker, start='2020-01-01', end='2024-01-01')

In [None]:
from ta.momentum import RSIIndicator
from ta.trend import MACD

# Relative Strength Index: measures whether the stock is overbought or oversold
df['rsi'] = RSIIndicator(close=df['Close'].squeeze()).rsi()

# 10-day simple moving average: short-term trend
df['sma_10'] = df['Close'].rolling(window=10).mean()

# 50-day simple moving average: longer-term trend
df['sma_50'] = df['Close'].rolling(window=50).mean()

# MACD histogram (MACD line minus signal line): trend momentum and possible reversals
df['macd'] = MACD(close=df['Close'].squeeze()).macd_diff()

# 10-day rolling standard deviation of the close: how volatile the price has been
df['volatility'] = df['Close'].rolling(window=10).std()


In [None]:
df['target'] = (df['Close'].shift(-1) > df['Close']).astype(int)
df = df.iloc[:-1].copy()  # the last row has no next-day close, so its 0 label is meaningless

🔍 What This Means:
This line creates your prediction label — the value your machine learning model will try to predict.

df['Close'].shift(-1) → gets tomorrow’s close

df['Close'] → today’s close

The comparison df['Close'].shift(-1) > df['Close'] checks whether the next day's close is higher than today's

.astype(int) converts the result into:

1 → price went up

0 → price went down or stayed the same
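
To see the labeling on a small scale, here is a minimal sketch using five made-up closing prices (the numbers are hypothetical, not TSLA data). It also shows a detail that is easy to miss: the last day has no "tomorrow", so the comparison against NaN is False and the label comes out as 0 even though the true outcome is unknown.

```python
import pandas as pd

# Hypothetical closing prices for five consecutive trading days
close = pd.Series([100.0, 102.0, 101.0, 101.0, 105.0])

# Tomorrow's close aligned with today's row; NaN on the last day
next_close = close.shift(-1)

# 1 if tomorrow's close is strictly higher than today's, else 0
target = (next_close > close).astype(int)

print(pd.DataFrame({"close": close, "next_close": next_close, "target": target}))
```

Note that a flat day (101.0 followed by 101.0) is labeled 0, and so is the final row, which is why that row is usually dropped before training.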


💡 Why This Works for Quant Trading
Your ML model will now look for patterns in quant features like RSI, SMA, and MACD that tend to precede a price increase or decrease the next day.

So this becomes a binary classification problem:

"Based on today's indicators, should I expect TSLA to go up tomorrow?"


In [None]:
# Drop the warm-up rows where the rolling indicators are still NaN
df.dropna(inplace=True)
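
Why is dropna needed at all? Rolling indicators are undefined until their window has filled, so the first rows of the DataFrame contain NaNs. The sketch below illustrates this on a synthetic random-walk series (an assumption for illustration, not real market data), using a 10-day and a 50-day window.

```python
import numpy as np
import pandas as pd

# Hypothetical price series: 120 days of a random walk (not real market data)
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(size=120).cumsum())

demo = pd.DataFrame({
    "close": close,
    "sma_10": close.rolling(window=10).mean(),  # NaN for the first 9 rows
    "sma_50": close.rolling(window=50).mean(),  # NaN for the first 49 rows
})

nan_counts = demo.isna().sum()
clean = demo.dropna()

print(nan_counts)
print("rows before:", len(demo), "rows after:", len(clean))
```

The longest window decides how many rows are lost, so longer lookbacks cost more training data.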

In [None]:
features = ['rsi', 'sma_10', 'sma_50', 'macd', 'volatility']
X = df[features]
y = df['target']

🔍 What This Means:
You’re selecting the 5 quant indicators as your input features X.

And using the target column (price up/down tomorrow) as your label y.

Your model will now try to learn:

“Based on these indicators, can I predict whether Tesla will go up tomorrow?”

In [None]:
print("Shape of X:", X.shape)
print("Shape of y:", y.shape)
print("Any NaNs in X?", X.isna().sum().sum())
print("Any NaNs in y?", y.isna().sum())

In [None]:
from sklearn.model_selection import train_test_split

# Split 80% train, 20% test; no shuffle to preserve time order
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

print("Train set size:", X_train.shape[0])
print("Test set size:", X_test.shape[0])
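
A single 80/20 split gives only one out-of-sample estimate. One common alternative, not used in this notebook, is walk-forward validation with scikit-learn's TimeSeriesSplit, where every test fold comes strictly after its training data. Here is a minimal sketch on a placeholder array (X_demo is a hypothetical stand-in for the feature matrix):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical stand-in for the feature matrix: 100 time-ordered rows
X_demo = np.arange(100).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=4)

folds = []
for train_idx, test_idx in tscv.split(X_demo):
    folds.append((train_idx, test_idx))
    print(f"train rows 0..{train_idx[-1]}, test rows {test_idx[0]}..{test_idx[-1]}")
```

Each fold trains on an expanding window of past rows and tests on the block that follows, so the model never sees the future.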

In [None]:
from sklearn.ensemble import RandomForestClassifier

# Create the model with 100 trees, fixed random seed for reproducibility
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Train on the training data
clf.fit(X_train, y_train)
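
Once a random forest is fitted, its feature_importances_ attribute gives a rough view of which inputs the trees relied on. The sketch below uses synthetic data in which the label is deliberately driven by the 'rsi' column (an assumption made for illustration only), so it says nothing about which indicator matters for TSLA.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ['rsi', 'sma_10', 'sma_50', 'macd', 'volatility']

# Synthetic features; the label depends almost entirely on 'rsi' by construction
X_demo = pd.DataFrame(rng.normal(size=(300, 5)), columns=feature_names)
y_demo = (X_demo['rsi'] + 0.1 * rng.normal(size=300) > 0).astype(int)

demo_clf = RandomForestClassifier(n_estimators=100, random_state=42)
demo_clf.fit(X_demo, y_demo)

# Impurity-based importances, normalized to sum to 1
importances = pd.Series(
    demo_clf.feature_importances_, index=feature_names
).sort_values(ascending=False)
print(importances)
```

On the real model, the same two lines with clf and features would work, though impurity-based importances can be misleading when features are strongly correlated (as sma_10 and sma_50 are).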

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

# Predict on the test set
y_pred = clf.predict(X_test)

# Print detailed classification metrics
print(classification_report(y_test, y_pred))

# Optional: confusion matrix to see true vs predicted
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
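
An accuracy number only means something next to a baseline. A simple one is to always predict the most frequent class in the training set, which scikit-learn provides as DummyClassifier. The sketch below uses made-up labels with roughly 55% up days (a hypothetical proportion, not a measured one):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Hypothetical labels: roughly 55% "up" days (not real TSLA data)
y_train_demo = (rng.random(400) < 0.55).astype(int)
y_test_demo = (rng.random(100) < 0.55).astype(int)

# The dummy model ignores the features, so placeholders are enough
X_train_demo = np.zeros((400, 1))
X_test_demo = np.zeros((100, 1))

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train_demo, y_train_demo)

baseline_pred = baseline.predict(X_test_demo)
baseline_acc = accuracy_score(y_test_demo, baseline_pred)
print("Baseline accuracy:", baseline_acc)
```

If the random forest does not clearly beat this number on the real test set, it has probably not learned anything useful from the indicators.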