<a href="https://colab.research.google.com/github/Navneeth08k/dayTradingModel/blob/main/GradientBoostingSwingTrading_IPYNB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Gradient Boosting approach to swing trading.
Gradient boosting strengths: Good for tabular data and can capture complex patterns in historcial data

Weaknesses: No time-awareness

In [28]:
#requirements install
pip install alpha_vantage pandas




In [56]:
from alpha_vantage.timeseries import TimeSeries
import pandas as pd

# Set your API key
api_key = 'WF7QE42TQTE34NG8'

# Initialize TimeSeries
ts = TimeSeries(key=api_key, output_format='pandas')

# Fetch daily data for a specific stock
symbol = 'AAPL'
data, meta_data = ts.get_daily(symbol=symbol, outputsize='full')

# Rename columns for clarity
data = data.rename(columns={
    '1. open': 'Open',
    '2. high': 'High',
    '3. low': 'Low',
    '4. close': 'Close',
    '5. volume': 'Volume'
})

# Reverse the order to make the earliest data first
data = data.sort_index()

# Display the first few rows
print(data.tail())


              Open      High     Low   Close      Volume
date                                                    
2025-01-03  243.36  244.1800  241.89  243.36  40244114.0
2025-01-06  244.31  247.3300  243.20  245.00  45045571.0
2025-01-07  242.98  245.5500  241.35  242.21  40855960.0
2025-01-08  241.92  243.7123  240.05  242.70  37628940.0
2025-01-10  240.01  240.1600  233.00  236.85  61373819.0


Below I will be adding technical indicators that are commonly used by swing traders. I added SMA with period 10, sma with period 50, rsi, and bollinger bands.

In [57]:
# Simple Moving Average (SMA)
data['sma_10'] = data['Close'].rolling(window=10).mean()
data['sma_50'] = data['Close'].rolling(window=50).mean()

# Relative Strength Index (RSI)
delta = data['Close'].diff()
gain = (delta.where(delta > 0, 0)).rolling(window=14).mean()
loss = (-delta.where(delta < 0, 0)).rolling(window=14).mean()
data['rsi'] = 100 - (100 / (1 + gain / loss))

# Bollinger Bands
data['bb_upper'] = data['sma_10'] + 2 * data['Close'].rolling(window=10).std()
data['bb_lower'] = data['sma_10'] - 2 * data['Close'].rolling(window=10).std()

# Drop rows with NaN values created by rolling computations
data = data.dropna()

# Display the first few rows with indicators
print(data[['Close', 'sma_10', 'sma_50', 'rsi', 'bb_upper', 'bb_lower']].head())


             Close   sma_10   sma_50        rsi    bb_upper   bb_lower
date                                                                  
2000-01-11   92.75  100.725  97.3678  41.466830  111.284896  90.165104
2000-01-12   87.19   99.375  97.5592  39.397971  112.970271  85.779729
2000-01-13   96.75   99.019  97.8892  44.896416  112.691677  85.346323
2000-01-14  100.44   98.782  98.2680  50.860887  112.243147  85.320853
2000-01-18  103.94   97.982  98.6744  54.227320  108.623169  87.340831


Below set target. I am saying that the target is at least 1 percent gain in 5 days

In [63]:
data["target"] = ((data["Close"].shift(-5) - data["Close"]) / data["Close"] >= 0.01).astype(int)  # 1% gain in 5 days



In [69]:
from sklearn.utils import resample
from sklearn.model_selection import TimeSeriesSplit

# Define features and target
X = data[['sma_10', 'sma_50', 'rsi', 'bb_upper', 'bb_lower', 'Volume', 'Close']]
y = data['target']

# Combine features and target into a single dataset
dataset = pd.concat([X, y], axis=1)

# Separate majority and minority classes
majority = dataset[dataset['target'] == 0]
minority = dataset[dataset['target'] == 1]

# Upsample minority class to balance the dataset
minority_upsampled = resample(minority,
                               replace=True,
                               n_samples=len(majority),
                               random_state=42)

# Combine majority and upsampled minority
balanced_dataset = pd.concat([majority, minority_upsampled])

# Separate features and target from the balanced dataset
X_balanced = balanced_dataset.drop('target', axis=1)
y_balanced = balanced_dataset['target']

# Use time-series split for training and testing
tscv = TimeSeriesSplit(n_splits=5)

for train_index, test_index in tscv.split(X_balanced):
    X_train, X_test = X_balanced.iloc[train_index], X_balanced.iloc[test_index]
    y_train, y_test = y_balanced.iloc[train_index], y_balanced.iloc[test_index]

    # Display shapes for debugging
    print(f"X_train shape: {X_train.shape}, y_train shape: {y_train.shape}")
    print(f"X_test shape: {X_test.shape}, y_test shape: {y_test.shape}")


X_train shape: (63, 7), y_train shape: (63,)
X_test shape: (63, 7), y_test shape: (63,)
X_train shape: (126, 7), y_train shape: (126,)
X_test shape: (63, 7), y_test shape: (63,)
X_train shape: (189, 7), y_train shape: (189,)
X_test shape: (63, 7), y_test shape: (63,)
X_train shape: (252, 7), y_train shape: (252,)
X_test shape: (63, 7), y_test shape: (63,)
X_train shape: (315, 7), y_train shape: (315,)
X_test shape: (63, 7), y_test shape: (63,)


Train the model. Then backtest it using a starting bankroll of 10000.

In [70]:
from sklearn.ensemble import GradientBoostingClassifier

# Initialize and train the model
model = GradientBoostingClassifier()
model.fit(X_train, y_train)

# Evaluate the model
accuracy = model.score(X_test, y_test)
print(f"Model Accuracy: {accuracy:.2f}")


Model Accuracy: 0.86


In [71]:
# Function to backtest on a single stock
def backtest_stock(data, symbol):
    X = data[["sma_10", "sma_50", "rsi", "bb_upper", "bb_lower", "Volume", "Close"]]
    y = data["target"]

    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

    # Train model
    model = GradientBoostingClassifier()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Backtest logic
    balance = initial_balance
    position = 0
    portfolio_values = []
    position_days = 0
    trade_log = []
    buy_signals = []
    sell_signals = []
    wins, losses = 0, 0
    entry_price = 0

    for i in range(len(y_test)):
        price = X_test.iloc[i]["Close"]
        if y_pred[i] == 1 and position == 0 and position_days == 0:  # Buy signal
            trade_amount = min(balance * fraction_to_trade, max_position_size) * (1 - transaction_fee - slippage)
            shares_to_buy = trade_amount / price
            position += shares_to_buy
            balance -= trade_amount
            position_days = holding_period
            entry_price = price
            buy_signals.append(i)
            trade_log.append({
                "Type": "Buy",
                "Price": price,
                "Shares": shares_to_buy,
                "Balance": balance,
                "Portfolio Value": balance + position * price
            })
        elif y_pred[i] == 0 and position > 0 and position_days <= 0:  # Sell signal
            trade_amount = position * price * (1 - transaction_fee - slippage)
            balance += trade_amount
            profit_or_loss = (price - entry_price) * position
            if profit_or_loss > 0:
                wins += 1
            else:
                losses += 1
            sell_signals.append(i)
            trade_log.append({
                "Type": "Sell",
                "Price": price,
                "Shares": position,
                "Balance": balance,
                "Profit/Loss": profit_or_loss,
                "Portfolio Value": balance
            })
            position = 0
            entry_price = 0




        position_days = max(position_days - 1, 0)
        portfolio_values.append(balance + position * price)

    print(data['target'].value_counts())
    print(f"y_pred unique values: {np.unique(y_pred)}")
    print(data.describe())



    final_balance = balance + position * X_test.iloc[-1]["Close"]
    cumulative_returns = (final_balance - initial_balance) / initial_balance
    daily_returns = np.diff(portfolio_values) / portfolio_values[:-1]
    sharpe_ratio = daily_returns.mean() / daily_returns.std() * np.sqrt(252)
    total_trades = wins + losses
    win_rate = wins / total_trades if total_trades else 0  # Assign 0 if no trades
    # Print summary
    print(f"Stock: {symbol}")
    print(f"Final Portfolio Value: ${final_balance:.2f}")
    print(f"Cumulative Returns: {cumulative_returns:.2%}")
    print(f"Sharpe Ratio: {sharpe_ratio:.2f}")
    print(f"Winning Trades: {wins}, Losing Trades: {losses}")
    print(f"Win Rate: {win_rate:.2%}")
    '''
    # Plot stock price with buy/sell signals and Bollinger Bands
    plt.figure(figsize=(14, 8))
    plt.plot(X_test["Close"].values, label="Stock Price", color="blue", alpha=0.7)
    plt.plot(X_test["bb_upper"].values, label="Bollinger Upper Band", color="orange", linestyle="--", alpha=0.6)
    plt.plot(X_test["bb_lower"].values, label="Bollinger Lower Band", color="orange", linestyle="--", alpha=0.6)
    plt.scatter(buy_signals, X_test["Close"].iloc[buy_signals], marker="^", color="green", label="Buy Signal", s=100)
    plt.scatter(sell_signals, X_test["Close"].iloc[sell_signals], marker="v", color="red", label="Sell Signal", s=100)
    plt.title(f"{symbol}: Stock Price with Buy/Sell Signals and Bollinger Bands")
    plt.xlabel("Time")
    plt.ylabel("Stock Price")
    plt.legend()
    plt.show()

    # Plot portfolio performance
    plt.figure(figsize=(14, 8))
    plt.plot(portfolio_values, label="Portfolio Value", color="purple")
    plt.title(f"{symbol}: Portfolio Value Over Time")
    plt.xlabel("Time")
    plt.ylabel("Portfolio Value")
    plt.legend()
    plt.show()
'''
    return pd.DataFrame(trade_log)


In [79]:
# Function to fetch, prepare data, and filter by date range
def fetch_and_prepare_data(symbol, start_date=None, end_date=None):
    data, meta_data = ts.get_daily(symbol=symbol, outputsize="full")
    data = data.rename(columns={
        "1. open": "Open",
        "2. high": "High",
        "3. low": "Low",
        "4. close": "Close",
        "5. volume": "Volume"
    })
    data = data.sort_index()  # Ensure data is in chronological order
    # Add features
    data["sma_10"] = data["Close"].rolling(window=10).mean()
    data["sma_50"] = data["Close"].rolling(window=50).mean()
    data["rsi"] = calculate_rsi(data["Close"], window=14)
    data["bb_upper"] = data["sma_10"] + 2 * data["Close"].rolling(window=10).std()
    data["bb_lower"] = data["sma_10"] - 2 * data["Close"].rolling(window=10).std()
    data["target"] = ((data["Close"].shift(-5) - data["Close"]) / data["Close"] >= 0.03).astype(int)  # 3% gain in 5 days
    data = data.dropna()  # Drop rows with NaN values

    # Filter by date range if specified
    if start_date and end_date:
        data = data.loc[start_date:end_date]
    return data

# Example for testing a 1-month interval
symbol = "NVDA"
start_date = "2008-01-01"
end_date = "2022-01-31"

# Fetch and prepare data for the specified interval
data = fetch_and_prepare_data(symbol, start_date=start_date, end_date=end_date)

# Run backtest
trade_log_df = backtest_stock(data, symbol)

# Display the trade log
print(trade_log_df)


target
0    2359
1    1187
Name: count, dtype: int64
y_pred unique values: [0 1]
              Open         High          Low        Close        Volume  \
count  3546.000000  3546.000000  3546.000000  3546.000000  3.546000e+03   
mean    113.434696   115.267548   111.452089   113.436568  1.469570e+07   
std     156.345434   158.803671   153.632060   156.346117  1.040812e+07   
min       6.000000     6.380000     5.750000     5.900000  1.141128e+06   
25%      14.530000    14.750000    14.320000    14.532500  8.049146e+06   
50%      22.000000    22.322500    21.685000    22.000000  1.199758e+07   
75%     182.692500   184.807500   180.322500   183.025000  1.809295e+07   
max     834.140000   835.000000   814.010000   827.940000  1.153631e+08   

            sma_10       sma_50          rsi     bb_upper     bb_lower  \
count  3546.000000  3546.000000  3546.000000  3546.000000  3546.000000   
mean    113.187183   111.802376    53.805043   121.820571   104.553794   
std     155.683603   

# We see that this model yielded a 7 percent return over a 14 year time frame. With a win rate of 66.67 percent.

However, I want to make a model with a higher yield by making more trades in less time. If I was to run this code on a data of 1 year, it likely wouldn't generate buy/sell signals. This is because gradient boosting is not inherently time-aware. Each row of data is treated independently.