# Stock Price Prediction Using XGBoost

In this notebook, we use historical stock data to predict future price movements using machine learning.
We will compute technical indicators, create target variables, and build XGBoost models for regression and classification.


In [None]:
!pip install -U yfinance pandas numpy scikit-learn xgboost

## 1. Import Required Libraries

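We import the Python libraries needed for downloading and handling data, building models, and evaluating them. The technical indicators are computed directly with pandas, so no separate technical-analysis package is required.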


In [1]:
import pandas as pd
import yfinance as yf
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, accuracy_score
import xgboost as xgb

## 2. Load Historical Stock Data

We load 1 year of historical data for NVIDIA (NVDA) using `yfinance`.
Unnecessary columns like Dividends and Stock Splits are removed.


In [3]:
ticker = 'NVDA'
data = yf.Ticker(ticker).history(period='1y')
data = data.drop(columns=['Dividends', 'Stock Splits'], errors='ignore')
data.tail()

Date,Open,High,Low,Close,Volume
2025-05-23 00:00:00-04:00,130.0,132.679993,129.160004,131.289993,198821300
2025-05-27 00:00:00-04:00,134.149994,135.660004,133.309998,135.5,192953600
2025-05-28 00:00:00-04:00,136.029999,137.25,134.789993,134.809998,304021100
2025-05-29 00:00:00-04:00,142.25,143.490005,137.910004,139.190002,370615200
2025-05-30 00:00:00-04:00,138.720001,139.619995,132.919998,135.130005,332443400


## 3. Feature Engineering: Technical Indicators

We create three common indicators used in technical analysis:

- **SMA (Simple Moving Average)**: This indicator helps smooth out price data by calculating the average closing price over a specific number of days. It shows the general direction of the trend by filtering out short-term fluctuations.

- **RSI (Relative Strength Index)**: RSI measures the speed and change of price movements to identify overbought or oversold conditions. It compares the average gains and losses over a set period to give a value between 0 and 100.

- **MACD (Moving Average Convergence Divergence)**: MACD is used to spot changes in the strength, direction, momentum, and duration of a trend. It is calculated by subtracting the longer-term average (26-day EMA) from the shorter-term average (12-day EMA).


In [4]:
# SMA
data['SMA_20'] = data['Close'].rolling(window=20).mean()

# RSI
delta = data['Close'].diff()
gain = delta.where(delta > 0, 0)
loss = -delta.where(delta < 0, 0)
gain_avg = gain.rolling(window=14).mean()
loss_avg = loss.rolling(window=14).mean()
rs = gain_avg / loss_avg
data['RSI_14'] = 100 - (100 / (1 + rs))

# MACD and Histogram
ema_12 = data['Close'].ewm(span=12, adjust=False).mean()
ema_26 = data['Close'].ewm(span=26, adjust=False).mean()
data['MACD_12_26_9'] = ema_12 - ema_26
data['MACD_S_12_26_9'] = data['MACD_12_26_9'].ewm(span=9, adjust=False).mean()
data['MACD_H_12_26_9'] = data['MACD_12_26_9'] - data['MACD_S_12_26_9']
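
The RSI cell above averages gains and losses with a plain 14-day rolling mean. A common alternative is Wilder-style exponential smoothing, which weights recent moves more heavily. The following is a minimal sketch of that variant, not the notebook's method: the function name `rsi_wilder` and the synthetic random-walk prices are illustrative assumptions, and `ewm(alpha=1/period)` only approximates Wilder's original seeding.

```python
import numpy as np
import pandas as pd

def rsi_wilder(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI using exponential smoothing with alpha = 1 / period (hypothetical helper)."""
    delta = close.diff()
    gain = delta.clip(lower=0)    # positive moves, zero elsewhere
    loss = -delta.clip(upper=0)   # magnitude of negative moves, zero elsewhere
    avg_gain = gain.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# Synthetic prices (a random walk), used only to exercise the function
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 60).cumsum())
rsi = rsi_wilder(close)
print(rsi.tail())
```

Either variant yields values between 0 and 100. They differ mainly in how quickly the indicator reacts to new price moves, so the two columns are not interchangeable as model features.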

## 4. Create Target Labels

We define three targets:
- **`Target`**: Next day closing price (for regression).
- **`Target_direction`**: Binary classification for up/down movement.
- **`Target_3d_up`**: Whether the closing price three trading days ahead is higher than today's close.


In [5]:
data['Target'] = data['Close'].shift(-1)
data['Target_direction'] = (data['Target'] > data['Close']).astype(int)
data['future_3d_return'] = (data['Close'].shift(-3) - data['Close']) / data['Close']
data['Target_3d_up'] = (data['future_3d_return'] > 0).astype(int)
data.dropna(inplace=True)
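
To see how `shift` aligns each row with its future value, and why the final rows are dropped, here is a small sketch that applies the same steps to six made-up closing prices. The `toy` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical closing prices for six consecutive trading days
toy = pd.DataFrame({"Close": [10.0, 11.0, 10.5, 12.0, 11.5, 13.0]})

toy["Target"] = toy["Close"].shift(-1)  # next day's close; NaN on the last row
toy["Target_direction"] = (toy["Target"] > toy["Close"]).astype(int)
toy["future_3d_return"] = (toy["Close"].shift(-3) - toy["Close"]) / toy["Close"]
toy["Target_3d_up"] = (toy["future_3d_return"] > 0).astype(int)

# Rows without a known future value contain NaN and are removed here
labelled = toy.dropna()
print(toy)
print(labelled)
```

A comparison against `NaN` evaluates to `False`, so the last three rows receive a label of 0 even though their future is unknown. Dropping the rows with missing values removes these spurious labels, leaving three of the six rows.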

## 5. Split Data into Features and Labels

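We select the indicators to use as features and split the data chronologically: the first 80% of rows form the training set and the last 20% form the test set. The rows are not shuffled, so every test observation comes after the training period.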


In [6]:
features = ['SMA_20', 'RSI_14', 'MACD_12_26_9', 'MACD_H_12_26_9']
X = data[features]

split_index = int(len(data) * 0.8)
X_train, X_test = X.iloc[:split_index], X.iloc[split_index:]
y_price_train, y_price_test = data['Target'].iloc[:split_index], data['Target'].iloc[split_index:]
y_dir_train, y_dir_test = data['Target_direction'].iloc[:split_index], data['Target_direction'].iloc[split_index:]
y_3d_train, y_3d_test = data['Target_3d_up'].iloc[:split_index], data['Target_3d_up'].iloc[split_index:]
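
A single chronological split gives one estimate of out-of-sample performance. If several estimates are wanted, scikit-learn's `TimeSeriesSplit` produces multiple folds that also keep training data strictly before test data. The sketch below uses a small stand-in array (`X_demo`, an assumption for illustration) rather than the notebook's feature matrix.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Stand-in for a time-ordered feature matrix: 10 rows, 2 columns
X_demo = np.arange(20).reshape(10, 2)

tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(X_demo))

for train_idx, test_idx in folds:
    # In every fold, all training indices precede all test indices
    print("train:", train_idx, "test:", test_idx)
```

Each fold extends the training window forward in time, so no fold trains on rows that come after the rows it is tested on.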


## 6. Train XGBoost Models

We train:
- A regression model for next-day price prediction.
- Two classification models: one for 1-day and one for 3-day price direction.


In [10]:
model_price = xgb.XGBRegressor(objective='reg:squarederror', random_state=42)
model_price.fit(X_train, y_price_train)

model_dir = xgb.XGBClassifier(eval_metric='logloss', random_state=42)
model_dir.fit(X_train, y_dir_train)

model_3d = xgb.XGBClassifier(eval_metric='logloss', random_state=42)
model_3d.fit(X_train, y_3d_train)


## 7. Evaluate Model Performance

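We use RMSE (Root Mean Squared Error) to measure the typical size of the error in the next-day price predictions, expressed in the same units as the price. Lower RMSE values mean the predictions are closer to the actual prices.

For the two direction classifiers we use accuracy: the fraction of test days on which the predicted direction matches the actual one.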



In [11]:
# Predictions
pred_price = model_price.predict(X_test)
rmse_price = np.sqrt(mean_squared_error(y_price_test, pred_price))

pred_dir = model_dir.predict(X_test)
acc_dir = accuracy_score(y_dir_test, pred_dir)

pred_3d = model_3d.predict(X_test)
acc_3d = accuracy_score(y_3d_test, pred_3d)
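
RMSE and accuracy are easier to interpret next to a simple baseline. The sketch below defines two hypothetical helpers: `rmse`, which can score a "tomorrow equals today" persistence forecast, and `majority_class_accuracy`, which scores always predicting the most frequent training label. The sample numbers are made up for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def majority_class_accuracy(y_train, y_test):
    """Accuracy obtained by always predicting the most frequent training label."""
    y_train = np.asarray(y_train)
    y_test = np.asarray(y_test)
    labels, counts = np.unique(y_train, return_counts=True)
    majority = labels[np.argmax(counts)]
    return float(np.mean(y_test == majority))

# Hypothetical values: today's close used as the forecast for the next close
close_today = [100.0, 102.0, 101.0, 105.0]
close_next = [102.0, 101.0, 105.0, 104.0]
persistence_rmse = rmse(close_next, close_today)

baseline_acc = majority_class_accuracy([1, 1, 0], [1, 0, 1, 1])
print(persistence_rmse, baseline_acc)
```

In this notebook, the analogous comparisons would be `rmse(y_price_test, data['Close'].iloc[split_index:])` against `rmse_price`, and `majority_class_accuracy(y_dir_train, y_dir_test)` against `acc_dir`. A model that does not beat these baselines has learned little beyond them.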


## 8. Print Prediction Results

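This cell prints the models' predictions from the most recent row of features: the predicted next-day price and the 1-day and 3-day movement directions. Because rows without a known 3-day target were dropped in step 4, that row is three trading days earlier than the last downloaded date.
It also prints the performance metrics, which show how well the models perform on the held-out test data.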



In [16]:
latest_features = data[features].iloc[-1:]
predicted_next_day_price = model_price.predict(latest_features)[0]
predicted_direction = model_dir.predict(latest_features)[0]
predicted_3d_movement = model_3d.predict(latest_features)[0]

# Date of the feature row; the prediction refers to the following trading day
feature_date = data.index[-1]

print(f"Predicting the trading day after: {feature_date.strftime('%d/%m/%y')}")
# NVDA is quoted in US dollars
print(f"\nPredicted next-day closing price: ${predicted_next_day_price:.2f}")
print("Predicted next-day direction:", "Up" if predicted_direction == 1 else "Down")
print("Predicted 3-day movement:", "Up" if predicted_3d_movement == 1 else "Down")
print(f"\nNext day price RMSE: {rmse_price:.4f}")
print(f"Next-Day Direction Accuracy: {acc_dir:.4f}")
print(f"3-Day Movement Accuracy: {acc_3d:.4f}")



Predicting the trading day after: 27/05/25

Predicted next-day closing price: $138.44
Predicted next-day direction: Up
Predicted 3-day movement: Up

Next day price RMSE: 10.0898
Next-Day Direction Accuracy: 0.4565
3-Day Movement Accuracy: 0.5000
