In [1]:
import yfinance as yf
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import precision_score
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV
import warnings

In [2]:
warnings.filterwarnings('ignore')

The code below builds a machine learning classifier that predicts whether a stock's next-day return will exceed a volatility-adjusted threshold, using technical indicators, market regime features, and volatility measures. Here is a step-by-step breakdown of what each part of the code does:

1. Calculating Advanced Technical Indicators:
The function calculate_advanced_indicators(df) computes several technical indicators for stock price analysis. These indicators are used in trading algorithms to help identify trends, momentum, and volatility.

Moving Averages:
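EMA (Exponential Moving Average) and WMA columns are calculated for several windows (5, 10, 20, 50, 100, 200) to help identify trends and smooth out fluctuations. Note that the WMA columns are computed with ewm(alpha=1/window), which is Wilder's exponential smoothing rather than a linearly weighted moving average.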
Momentum Indicators:
ROC (Rate of Change): Measures the percentage change of the stock price over a defined window.
MOM (Momentum): Measures the difference between the current price and the price a few periods earlier.
Volume-based Indicators:
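VWAP (Volume-Weighted Average Price): The rolling sum of close price times volume divided by the rolling sum of volume, giving an average price weighted by volume over the window.
Volume Force: Volume multiplied by the ROC for the same window, so that price moves on heavy volume count for more.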
Volatility Measures:
ParkinsonVolatility: This volatility measure is based on the daily high/low range, averaged over the window and annualized by multiplying by the square root of 252.
Support/Resistance Levels:
Calculations for Support and Resistance are done for different window sizes, showing the minimum and maximum price levels over specific periods.
Position In Range shows where the current close sits between the support and resistance levels (0 at support, 1 at resistance), helping to identify whether the price is near the bottom or the top of its recent range.
Enhanced Volatility Features:
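ATR: The column named ATR is the 14-day high/low range (highest high minus lowest low) divided by the 14-day mean close. This is a normalized range measure rather than the standard Average True Range, which averages the daily true range.
Volatility Ratio: The 10-day standard deviation of the close divided by the 30-day standard deviation, showing short-term volatility relative to longer-term volatility.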
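For comparison with the ATR and WMA columns in the code, here is a minimal sketch of a true-range-based ATR and a linearly weighted moving average. The helper names true_atr and linear_wma and the small demo frame are assumptions made for illustration; they are not part of the notebook.

```python
import numpy as np
import pandas as pd

def true_atr(df, window=14):
    """Wilder-style ATR built from the daily true range (hypothetical helper)."""
    prev_close = df['Close'].shift(1)
    # True range: the largest of the three candidate ranges for each day
    true_range = pd.concat([
        df['High'] - df['Low'],
        (df['High'] - prev_close).abs(),
        (df['Low'] - prev_close).abs(),
    ], axis=1).max(axis=1)
    # Wilder's smoothing is an exponential average with alpha = 1/window
    return true_range.ewm(alpha=1 / window, adjust=False).mean()

def linear_wma(series, window):
    """Linearly weighted moving average: the newest value gets the largest weight."""
    weights = np.arange(1, window + 1)
    return series.rolling(window).apply(
        lambda x: np.dot(x, weights) / weights.sum(), raw=True
    )

# Tiny made-up price frame, only to show the calls
demo = pd.DataFrame({
    'High':  [10.0, 11.0, 12.0, 11.5],
    'Low':   [9.0, 10.0, 10.5, 10.0],
    'Close': [9.5, 10.5, 11.0, 10.5],
})
atr = true_atr(demo, window=2)
wma = linear_wma(demo['Close'], window=3)
print(atr)
print(wma)
```

Unlike the range-over-mean column in the notebook, the true range accounts for overnight gaps through the previous close.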
2. Market Regime Features:
The function add_market_regime_features(df) adds features that help categorize the market into different "regimes" based on volatility and trends.

Trend Detection:
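The code classifies the trend by comparing the current close to its 200-period and 50-period moving averages: 2 (strong uptrend) when the close is above both, 1 (weak uptrend) when it is above the 200-period average but not the 50-period one, -2 (strong downtrend) when it is below both, and -1 (weak downtrend) otherwise.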
Volatility Regime:
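Volatility is measured as the standard deviation of daily percentage returns over a rolling 21-day window.
This value is compared with its own 252-day rolling mean and standard deviation, and the market is categorized into three regimes:
0: Low volatility (more than one standard deviation below the mean)
1: Normal volatility (within one standard deviation of the mean)
2: High volatility (more than one standard deviation above the mean)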
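The regime rule can be tried in isolation on synthetic prices. The sketch below mirrors the comparison used in add_market_regime_features; the function name classify_volatility_regime and the simulated returns are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def classify_volatility_regime(close, vol_window=21, lookback=252):
    """Label each row 0 (low), 1 (normal) or 2 (high) volatility (hypothetical helper)."""
    vol = close.pct_change().rolling(vol_window).std()
    mean = vol.rolling(lookback).mean()
    std = vol.rolling(lookback).std()
    # Comparisons against NaN are False, so rows without enough history fall into regime 1
    regime = np.where(vol > mean + std, 2, np.where(vol < mean - std, 0, 1))
    return pd.Series(regime, index=close.index, name='Volatility_Regime')

rng = np.random.default_rng(0)
# Synthetic returns: 400 calm days followed by 100 turbulent days
returns = np.concatenate([rng.normal(0, 0.005, 400), rng.normal(0, 0.03, 100)])
close = pd.Series(100 * np.cumprod(1 + returns))

regimes = classify_volatility_regime(close)
print(regimes.value_counts().sort_index())
```

One consequence worth noting: rows that lack a full 252-day history are labeled normal (1) by default, not flagged as missing.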
3. Creating the Target Variable:
The function create_target(df) defines the target variable that the model will predict.

Volatility-Adjusted Threshold:
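The threshold that the next day's return must exceed to count as positive is base_threshold * (1 + volatility), with a base of 1.5% and volatility taken as the 20-day standard deviation of daily returns. Higher volatility raises the threshold, and lower volatility lowers it. Because daily volatility is usually a few percent at most, the adjustment is small: a volatility of 0.02 raises the threshold from 1.5% to 1.53%.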
Target Variable (Target):
If the next day's return exceeds the dynamic threshold, the target is labeled 1 (indicating a sufficiently large positive move); otherwise, it is labeled 0.
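To get a feel for the size of the adjustment, the formula from create_target can be evaluated for a couple of volatility levels. This is a minimal sketch; the function name dynamic_threshold and the sample volatilities and returns are assumptions chosen for illustration.

```python
def dynamic_threshold(daily_vol, base_threshold=0.015):
    """Threshold formula used in create_target: base * (1 + volatility)."""
    return base_threshold * (1 + daily_vol)

# Illustrative daily volatilities (standard deviation of daily returns)
low = dynamic_threshold(0.01)
high = dynamic_threshold(0.03)
print(f"vol 1% -> threshold {low:.4%}")
print(f"vol 3% -> threshold {high:.4%}")

# Labeling a few made-up next-day returns against the higher threshold
sample_returns = [0.002, 0.016, 0.020]
labels = [int(r > high) for r in sample_returns]
print(labels)
```

With these inputs the threshold moves only from about 1.515% to about 1.545%, so the target is close to a fixed 1.5% cutoff.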
4. Creating an Ensemble Model:
The function create_ensemble_model() creates an ensemble of two machine learning models:

Random Forest Classifier (RandomForestClassifier): A tree-based ensemble model that helps make predictions by aggregating the results of multiple decision trees.
Gradient Boosting Classifier (GradientBoostingClassifier): A boosting technique that builds an ensemble of decision trees sequentially, each correcting the errors of the previous one.
These models are combined into a Voting Classifier with "soft" voting: instead of taking a majority vote on class labels, the ensemble averages the class probabilities predicted by the two models, with equal weight given to each.
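Soft voting can be checked directly: the ensemble's probabilities should match the plain average of the probabilities from its fitted members. The sketch below uses a small synthetic dataset from make_classification and reduced n_estimators for speed; both are assumptions for illustration, not the notebook's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)

# Synthetic data standing in for the real feature matrix
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ('rf', RandomForestClassifier(n_estimators=20, random_state=0)),
        ('gb', GradientBoostingClassifier(n_estimators=20, random_state=0)),
    ],
    voting='soft',
    weights=[1, 1],
)
ensemble.fit(X, y)

# Average the class probabilities of the two fitted members by hand
manual = np.mean([est.predict_proba(X) for est in ensemble.estimators_], axis=0)
matches = np.allclose(manual, ensemble.predict_proba(X))
print(matches)
```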

5. Predicting with Market Regime:
The function predict_with_regime(train, test, predictors, model) makes predictions on the test set based on the current market regime.

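Scaling: The features are standardized using a StandardScaler fitted on the training set only and then applied to the test set.
Market Regime-Aware Predictions: The model is refit separately for each of the three volatility regimes (low, normal, high), provided the regime has more than 50 training rows, and it predicts only the test rows in that regime. The probability threshold for a positive prediction rises with the regime: 0.70, 0.77, and 0.84 for regimes 0, 1, and 2. Test rows in a regime with too little training data default to a prediction of 0.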
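The regime-dependent cutoff in predict_with_regime is probability_threshold * (1 + regime * 0.1). A short sketch of what that means in practice follows; the helper name regime_threshold and the sample probabilities are assumptions for illustration.

```python
import numpy as np

def regime_threshold(regime, probability_threshold=0.7):
    """Cutoff applied to the positive-class probability in a given regime."""
    return probability_threshold * (1 + regime * 0.1)

thresholds = {regime: round(regime_threshold(regime), 2) for regime in (0, 1, 2)}
print(thresholds)

# The same made-up probabilities produce fewer positive calls as the regime rises
probas = np.array([0.72, 0.80, 0.90])
calls = {regime: (probas > regime_threshold(regime)).astype(int).tolist()
         for regime in (0, 1, 2)}
print(calls)
```

The model therefore has to be more confident before it signals a positive move in a high-volatility regime.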
6. Enhanced Backtesting:
The function enhanced_backtest(data, predictors, start=252, step=21) performs backtesting to evaluate the model's performance over time.

The backtest uses an expanding window: starting at row 252 and advancing 21 rows at a time, the model is trained on all data up to the current point and tested on the next 21 rows.
The model’s predictions are collected over time, and the results are aggregated.
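The index arithmetic of the backtest loop can be sketched without any model. The generator name expanding_window_splits and the 300-row example are assumptions for illustration; the start and step defaults match enhanced_backtest.

```python
def expanding_window_splits(n_rows, start=252, step=21):
    """Yield ((train_start, train_end), (test_start, test_end)) row ranges.

    The training range always starts at row 0 and grows by `step` each time;
    the test range is the next `step` rows, truncated at the end of the data.
    """
    for i in range(start, n_rows, step):
        yield (0, i), (i, min(i + step, n_rows))

splits = list(expanding_window_splits(300))
for train_range, test_range in splits:
    print(f"train rows {train_range}, test rows {test_range}")
```

Every test block lies strictly after its training block, so no future rows are used for fitting.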
7. Calculating Precision Scores:
The main execution part of the code calculates the precision score based on the model’s predictions.

Precision Score: The fraction of positive predictions that turn out to be correct (true positives divided by all predicted positives), indicating how reliable the model is when it predicts a positive outcome.
Precision by Regime: Precision is calculated separately for each market regime (volatility regime 0, 1, 2) to understand how well the model performs under different market conditions.
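Precision is easy to verify by hand on a toy example. The sketch below also shows what precision_score returns when there are no positive predictions at all, which is one way a regime can end up with a score of 0.0. The label arrays are made up for illustration.

```python
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0]

# Count true positives and false positives manually
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
manual_precision = tp / (tp + fp)
sklearn_precision = precision_score(y_true, y_pred)
print(manual_precision, sklearn_precision)

# With no positive predictions the ratio is undefined;
# zero_division=0 makes scikit-learn return 0.0 without a warning
no_positive_calls = precision_score([1, 0, 1], [0, 0, 0], zero_division=0)
print(no_positive_calls)
```

Because the notebook suppresses warnings, a 0.0 caused by the absence of positive predictions looks the same as a 0.0 caused by wrong predictions, so it can help to count predicted positives per regime as well.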
Main Execution:
Download Stock Data: The code retrieves the full available price history for the stock AAPL using the yfinance library.
Feature Engineering: It computes the advanced technical indicators, market regime features, and target variable, then keeps only rows from 2017-01-01 onward and drops rows with missing values.
Backtesting: It runs an enhanced backtest to predict the stock price movements, considering market regimes.
Precision Calculation: Finally, the precision score is computed for the model's overall performance and for each volatility regime.
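When scoring, targets and predictions need to refer to the same rows. A minimal sketch of aligning them by index label instead of by position follows; the dates, targets, and predictions are made up for illustration.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=6, freq='D')
data = pd.DataFrame({'Target': [0, 1, 0, 1, 1, 0]}, index=dates)

# Hypothetical backtest output that covers only the last four rows
predictions = pd.DataFrame({'Predictions': [1.0, 0.0, 1.0, 0.0]}, index=dates[2:])

# Select targets by the prediction index, then join the two side by side
aligned = data.loc[predictions.index, ['Target']].join(predictions)
print(aligned)
```

Selecting by label keeps the pairing correct even if the prediction frame does not end exactly at the last row of the data.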
Summary:
The code combines machine learning with technical analysis to predict stock price movements by considering both technical indicators and market conditions.
It uses an ensemble of models (Random Forest and Gradient Boosting) and refits them for each volatility regime, with the aim of making predictions better suited to current market conditions.
Backtesting is performed to evaluate the model's predictive performance, and precision scores are calculated to assess the accuracy of the model’s positive predictions.
Key Features of the Code:
Market Regime-Aware Modeling: Fits and predicts separately for each volatility regime, with the trend regime included as an input feature.
Dynamic Thresholds: Adjusts both the return threshold that defines the target and the probability threshold for a positive prediction based on volatility.
Ensemble Model: Combines two models with the aim of improving prediction accuracy.

In [3]:
def calculate_advanced_indicators(df):
    """Enhanced technical indicators with market regime detection"""
    # Advanced momentum features
    for window in [5, 10, 20, 50, 100, 200]:
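        # Standard EMA (span-based) and Wilder's smoothing (alpha = 1/window).
        # Note: the 'WMA' column is Wilder's smoothed average, not a linearly weighted MA.
        df[f'EMA_{window}'] = df['Close'].ewm(span=window, adjust=False).mean()
        df[f'WMA_{window}'] = df['Close'].ewm(alpha=1/window, adjust=False).mean()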
        
        # Dynamic momentum indicators
        df[f'ROC_{window}'] = df['Close'].pct_change(window) * 100
        df[f'MOM_{window}'] = df['Close'].diff(window)
        
        # Volume-weighted indicators
        df[f'VWAP_{window}'] = (df['Close'] * df['Volume']).rolling(window=window).sum() / df['Volume'].rolling(window=window).sum()
        df[f'Volume_Force_{window}'] = df['Volume'] * df[f'ROC_{window}']
        
        # Volatility measures
        df[f'ParkinsonVol_{window}'] = np.sqrt(
            (1 / (4 * np.log(2))) * 
            (np.log(df['High'] / df['Low'])**2).rolling(window).mean()
        ) * np.sqrt(252)
    
    # Market regime features
    df['Trend_Strength'] = abs(
        df['Close'].rolling(20).mean() - df['Close'].rolling(50).mean()
    ) / df['Close'].rolling(20).std()
    
    # Support/Resistance levels
    for window in [20, 50, 100]:
        df[f'Support_{window}'] = df['Low'].rolling(window=window).min()
        df[f'Resistance_{window}'] = df['High'].rolling(window=window).max()
        df[f'Position_In_Range_{window}'] = (
            (df['Close'] - df[f'Support_{window}']) / 
            (df[f'Resistance_{window}'] - df[f'Support_{window}'])
        )
    
    # Normalized 14-day high-low range (named ATR here; not the true-range-based ATR)
    df['ATR'] = (
        df['High'].rolling(14).max() - df['Low'].rolling(14).min()
    ) / df['Close'].rolling(14).mean()
    
    df['Volatility_Ratio'] = (
        df['Close'].rolling(10).std() / df['Close'].rolling(30).std()
    )
    
    return df

def add_market_regime_features(df):
    """Detect and encode market regimes"""
    # Trend detection
    df['Trend'] = np.where(
        df['Close'] > df['Close'].rolling(200).mean(),
        np.where(
            df['Close'] > df['Close'].rolling(50).mean(),
            2,  # Strong uptrend
            1   # Weak uptrend
        ),
        np.where(
            df['Close'] < df['Close'].rolling(50).mean(),
            -2, # Strong downtrend
            -1  # Weak downtrend
        )
    )
    
    # Volatility regime
    vol = df['Close'].pct_change().rolling(21).std()
    df['Volatility_Regime'] = np.where(
        vol > vol.rolling(252).mean() + vol.rolling(252).std(),
        2,  # High volatility
        np.where(
            vol < vol.rolling(252).mean() - vol.rolling(252).std(),
            0,  # Low volatility
            1   # Normal volatility
        )
    )
    
    return df

def create_target(df, volatility_window=20):
    """Dynamic threshold based on market conditions"""
    # Calculate volatility-adjusted threshold
    volatility = df['Close'].pct_change().rolling(volatility_window).std()
    base_threshold = 0.015  # 1.5% base threshold
    
    # Scale the base threshold by (1 + rolling standard deviation of daily returns)
    df['Dynamic_Threshold'] = base_threshold * (1 + volatility)
    
    # Create target with dynamic threshold
    df['Tomorrow'] = df['Close'].shift(-1)
    df['Return'] = (df['Tomorrow'] - df['Close']) / df['Close']
    df['Target'] = (df['Return'] > df['Dynamic_Threshold']).astype(int)
    
    return df

def create_ensemble_model():
    """Create an ensemble of different models"""
    rf = RandomForestClassifier(
        n_estimators=500,
        min_samples_split=20,
        min_samples_leaf=10,
        max_depth=10,
        max_features='sqrt',
        class_weight={0: 1, 1: 3},
        random_state=42,
        n_jobs=-1
    )
    
    gb = GradientBoostingClassifier(
        n_estimators=200,
        learning_rate=0.05,
        max_depth=4,
        min_samples_leaf=15,
        subsample=0.8,
        random_state=42
    )
    
    return VotingClassifier(
        estimators=[('rf', rf), ('gb', gb)],
        voting='soft',
        weights=[1, 1]
    )

def predict_with_regime(train, test, predictors, model, probability_threshold=0.7):
    """Make predictions considering market regime"""
    scaler = StandardScaler()
    train_scaled = scaler.fit_transform(train[predictors])
    test_scaled = scaler.transform(test[predictors])
    
    # Separate models for different volatility regimes
    predictions = pd.Series(index=test.index, dtype=float)
    
    for regime in [0, 1, 2]:  # Different volatility regimes
        regime_mask_train = train['Volatility_Regime'] == regime
        regime_mask_test = test['Volatility_Regime'] == regime
        
        if regime_mask_train.sum() > 50:  # Minimum samples for training
            model.fit(
                train_scaled[regime_mask_train], 
                train.loc[regime_mask_train, 'Target']
            )
            
            if regime_mask_test.sum() > 0:
                probas = model.predict_proba(test_scaled[regime_mask_test])
                # Adjust threshold based on regime
                adjusted_threshold = probability_threshold * (1 + regime * 0.1)
                predictions[regime_mask_test] = (probas[:, 1] > adjusted_threshold).astype(int)
    
    return predictions.fillna(0)


def enhanced_backtest(data, predictors, start=252, step=21):
    """Enhanced backtesting with regime-aware predictions"""
    model = create_ensemble_model()
    all_predictions = []
    
    for i in range(start, data.shape[0], step):
        train = data.iloc[0:i].copy()
        test = data.iloc[i:(i+step)].copy()
        predictions = predict_with_regime(train, test, predictors, model)
        
        # Convert the Series into a single-column DataFrame named 'Predictions'
        predictions_df = predictions.to_frame('Predictions')
        all_predictions.append(predictions_df)
    
    return pd.concat(all_predictions)

# Main execution
aapl = yf.Ticker("AAPL")
aapl = aapl.history(period="max")

# Create enhanced features
aapl = calculate_advanced_indicators(aapl)
aapl = add_market_regime_features(aapl)
aapl = create_target(aapl)
aapl = aapl.loc['2017-01-01':].copy()
aapl = aapl.dropna()

# Define predictors
exclude_columns = ['Target', 'Tomorrow', 'Return', 'Open', 'High', 'Low', 'Close', 'Volume', 
                  'Dynamic_Threshold']
predictors = [col for col in aapl.columns if col not in exclude_columns]

# Run enhanced backtest
predictions = enhanced_backtest(aapl, predictors)

# Calculate final precision score, aligning targets to predictions by index label
final_precision = precision_score(
    aapl.loc[predictions.index, 'Target'],
    predictions['Predictions'].fillna(0)
)
print(f"Final Precision Score: {final_precision}")

# Calculate precision by regime (a regime with no positive predictions scores 0.0)
targets = aapl.loc[predictions.index, 'Target']
regimes = aapl.loc[predictions.index, 'Volatility_Regime']
for regime in [0, 1, 2]:
    regime_mask = regimes == regime
    if regime_mask.sum() > 0:
        regime_precision = precision_score(
            targets[regime_mask],
            predictions['Predictions'][regime_mask].fillna(0),
            zero_division=0
        )
        print(f"Precision Score for Volatility Regime {regime}: {regime_precision}")

Final Precision Score: 0.4375
Precision Score for Volatility Regime 0: 0.0
Precision Score for Volatility Regime 1: 0.4375
Precision Score for Volatility Regime 2: 0.0
