# AI-Based Market Direction Prediction (Up / Down)

**Module:** AI Applications – Individual Open Project  
**Track:** Machine Learning Application  


## 1. Problem Definition & Objective

### 1.1 Selected Project Track
Machine Learning based AI Application

### 1.2 Problem Statement
The objective of this project is to design an AI-based system that predicts the
direction of a financial market (Up or Down) for the next time step using
historical market data and technical indicators.

### 1.3 Real-World Relevance and Motivation
Accurately predicting market direction can assist investors and analysts in
decision-making by identifying potential trends and reducing uncertainty.
This project focuses on direction prediction rather than exact price forecasting:
a binary Up/Down target is simpler to model, evaluate, and interpret than a
continuous price.


## 2. Data Understanding & Preparation

### 2.1 Dataset Source
The dataset consists of publicly available historical daily data for the NIFTY 50
index (ticker `^NSEI`), downloaded from Yahoo Finance with the `yfinance` Python
library.

### 2.2 Data Loading and Exploration
Initial exploration covers the data structure (rows, columns, data types), the
available features, and basic summary statistics.
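As a minimal sketch of these exploration steps, the snippet builds a small synthetic OHLCV frame (a hypothetical stand-in, since the real download needs network access) and inspects it with standard pandas calls. On the notebook's real `data`, the same calls apply unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the downloaded NIFTY 50 frame: a synthetic
# random-walk close on business days, with Open/High/Low/Volume derived from it
rng = np.random.default_rng(42)
dates = pd.bdate_range("2018-01-01", periods=60)
close = 10_000 + rng.normal(0, 50, len(dates)).cumsum()
data = pd.DataFrame(
    {
        "Open": close + rng.normal(0, 10, len(dates)),
        "High": close + 60,
        "Low": close - 60,
        "Close": close,
        "Volume": rng.integers(100_000, 200_000, len(dates)),
    },
    index=dates,
)

# Structure: row count, column dtypes, non-null counts
data.info()

# Basic statistics for every numeric column
summary = data.describe()
print(summary.loc[["mean", "std", "min", "max"]])

# Missing values per column and the covered date range
print(data.isna().sum())
print(data.index.min().date(), "to", data.index.max().date())
```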

### 2.3 Data Cleaning and Preprocessing
This includes handling missing values and engineering technical indicators
(daily returns and 5- and 10-day moving averages); numerical features can
additionally be scaled before model training.

### 2.4 Handling Missing Values or Noise
Rows left incomplete by the next-day target shift and by the rolling-window
calculations are removed, and moving averages smooth out day-to-day noise in
the closing price.
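A minimal sketch of both steps on synthetic closing prices (hypothetical data with one missing quote injected on purpose). The notebook itself only drops incomplete rows with `dropna()`; forward-filling an isolated gap with `ffill()` is shown here as an assumed alternative, not as what the notebook does. The final comparison shows why a moving average counts as noise reduction: its day-to-day changes are much smaller than those of the raw close.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices (hypothetical data) with one missing quote injected
rng = np.random.default_rng(0)
dates = pd.bdate_range("2018-01-01", periods=250)
close = pd.Series(10_000 + rng.normal(0, 50, len(dates)).cumsum(), index=dates, name="Close")
close.iloc[7] = np.nan

# Assumption: carry the last known price forward over an isolated gap
close_filled = close.ffill()

# A 5-day moving average smooths day-to-day fluctuations; its first 4 values are NaN
ma_5 = close_filled.rolling(window=5).mean()

# Drop the warm-up rows created by the rolling window
features = pd.DataFrame({"Close": close_filled, "MA_5": ma_5}).dropna()

# Standard deviation of daily changes: raw close vs. smoothed series
print(close_filled.diff().std(), features["MA_5"].diff().std())
```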


```python
# Install the required library (only needed once)
!pip install yfinance
```




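```python
import pandas as pd
import yfinance as yf

# Download NIFTY 50 daily data (yfinance treats `end` as exclusive,
# so the data runs through the end of 2023)
data = yf.download("^NSEI", start="2018-01-01", end="2024-01-01")

# Recent yfinance versions return (Price, Ticker) MultiIndex columns; keep only
# the price level so that data["Close"] is a Series, not a one-column DataFrame
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)

# Display the first few rows
data.head()
```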




| Date       | Close        | High         | Low          | Open         | Volume |
|------------|--------------|--------------|--------------|--------------|--------|
| 2018-01-02 | 10442.200195 | 10495.200195 | 10404.650391 | 10477.549805 | 153400 |
| 2018-01-03 | 10443.200195 | 10503.599609 | 10429.549805 | 10482.650391 | 167300 |
| 2018-01-04 | 10504.799805 | 10513.0      | 10441.450195 | 10469.400391 | 174900 |
| 2018-01-05 | 10558.849609 | 10566.099609 | 10520.099609 | 10534.25     | 180900 |
| 2018-01-08 | 10623.599609 | 10631.200195 | 10588.549805 | 10591.700195 | 169000 |


```python
# Create the target variable: 1 if the next day's close is higher, else 0
data["Target"] = (data["Close"].shift(-1) > data["Close"]).astype(int)

# Drop the last row: it has no next-day close, and because NaN > x evaluates
# to False it would otherwise be labeled 0 (dropna() does not remove it)
data = data.iloc[:-1].copy()

# Check the result
data[["Close", "Target"]].head()
```


| Date       | Close        | Target |
|------------|--------------|--------|
| 2018-01-02 | 10442.200195 | 1      |
| 2018-01-03 | 10443.200195 | 1      |
| 2018-01-04 | 10504.799805 | 1      |
| 2018-01-05 | 10558.849609 | 1      |
| 2018-01-08 | 10623.599609 | 1      |


### 2.5 Dataset Description
This project uses daily price data for the NIFTY 50 index (`^NSEI`) obtained
from Yahoo Finance, covering January 2018 through December 2023. The dataset
includes Open, High, Low, Close, and Volume features. The target variable is 1
if the next trading day's closing price is strictly higher than the current
day's closing price, and 0 otherwise (including unchanged closes).
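The labeling rule, and one subtlety in it, can be checked on a toy series (hypothetical prices). Because `NaN > x` evaluates to `False`, the final row receives a label of 0 rather than NaN, so `dropna()` cannot be relied on to remove it; slicing it off explicitly is one way to handle this. An unchanged close is labeled 0, as the rule specifies.

```python
import pandas as pd

# Toy closing prices (hypothetical), including an unchanged day (10.5 -> 10.5)
df = pd.DataFrame(
    {"Close": [10.0, 11.0, 10.5, 10.5, 12.0]},
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

next_close = df["Close"].shift(-1)  # the last value becomes NaN
df["Target"] = (next_close > df["Close"]).astype(int)

# NaN > x evaluates to False, so the last row silently gets 0 and no NaN remains
print(df["Target"].isna().sum())  # 0, so dropna() would not remove it

# Drop the final row explicitly because its label is not based on a real next close
df = df.iloc[:-1]
print(df["Target"].tolist())  # [1, 0, 0, 1]
```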


## 3. Model / System Design

### 3.1 AI Technique Used
Supervised Machine Learning (Binary Classification)

### 3.2 System Architecture / Pipeline
The system follows a pipeline consisting of data collection, preprocessing,
feature engineering, model training, prediction, and evaluation.
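The pipeline stages can be sketched end to end with scikit-learn under stated assumptions: a synthetic random walk replaces the yfinance download, and a `StandardScaler` is placed in front of `LogisticRegression`. The scaler is an addition; the notebook's own model is trained on unscaled features. Wrapping the scaler in `make_pipeline` ensures it is fit on the training rows only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data collection: a synthetic random walk stands in for the yfinance download (assumption)
rng = np.random.default_rng(7)
dates = pd.bdate_range("2018-01-01", periods=500)
data = pd.DataFrame({"Close": 10_000 + rng.normal(0, 50, len(dates)).cumsum()}, index=dates)

# 2. Preprocessing: next-day target; the last row has no next close and is dropped
data["Target"] = (data["Close"].shift(-1) > data["Close"]).astype(int)
data = data.iloc[:-1].copy()

# 3. Feature engineering, then removal of rolling-window warm-up rows
data["Return"] = data["Close"].pct_change()
data["MA_5"] = data["Close"].rolling(window=5).mean()
data["MA_10"] = data["Close"].rolling(window=10).mean()
data = data.dropna()

# 4. Chronological 80/20 split (no shuffling)
X_train, X_test, y_train, y_test = train_test_split(
    data[["Return", "MA_5", "MA_10"]], data["Target"], test_size=0.2, shuffle=False
)

# 5. Training: the scaler is fit on the training rows only, inside the pipeline
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# 6. Prediction and 7. evaluation
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```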

### 3.3 Justification of Design Choices
Logistic Regression is used as the classifier because it is interpretable (one
coefficient per feature), fast to train, and designed for binary outcomes,
producing a probability for the Up class.
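To illustrate the interpretability argument, this sketch fits `LogisticRegression` on hypothetical standardized data and reads back `coef_` (one weight per feature) and `predict_proba` (the probability of each class). The notebook's feature names are reused purely as labels; the data is synthetic, so the printed values say nothing about NIFTY 50.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical standardized features; only the first one actually drives the label
rng = np.random.default_rng(1)
X = StandardScaler().fit_transform(rng.normal(size=(300, 3)))
y = (X[:, 0] + rng.normal(scale=1.0, size=300) > 0).astype(int)

clf = LogisticRegression().fit(X, y)

# One coefficient per feature: the sign gives the direction of the effect on
# P(Up), the magnitude its strength on the standardized scale
for name, coef in zip(["Return", "MA_5", "MA_10"], clf.coef_[0]):
    print(f"{name:>6}: {coef:+.3f}")

# Class probabilities for the first 3 rows; columns follow clf.classes_ == [0, 1]
proba = clf.predict_proba(X[:3])
print(proba.round(3))
```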


## 4. Core Implementation

This section implements data processing, feature engineering, model training,
and prediction in Python using pandas, yfinance, and scikit-learn.


```python
# Feature Engineering

# Daily returns
data["Return"] = data["Close"].pct_change()

# Moving averages
data["MA_5"] = data["Close"].rolling(window=5).mean()
data["MA_10"] = data["Close"].rolling(window=10).mean()

# Drop rows with NaN values created by the rolling calculations
data = data.dropna()

# Display feature columns
data[["Return", "MA_5", "MA_10", "Target"]].head()
```


| Date       | Return    | MA_5         | MA_10        | Target |
|------------|-----------|--------------|--------------|--------|
| 2018-01-15 | 0.005645  | 10668.640039 | 10591.584961 | 0      |
| 2018-01-16 | -0.003826 | 10681.330078 | 10617.409961 | 1      |
| 2018-01-17 | 0.008233  | 10712.6      | 10651.944922 | 1      |
| 2018-01-18 | 0.002637  | 10745.759961 | 10683.164941 | 1      |
| 2018-01-19 | 0.007183  | 10788.45     | 10716.75     | 1      |
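As a worked check of the two feature formulas, this sketch recomputes them on the five closing prices from the first `data.head()` output. A 5-day average needs five observations, so only the last of these rows has an `MA_5` value; by the same logic `MA_10` first appears on the tenth trading day, which is consistent with the feature table starting on 2018-01-15.

```python
import pandas as pd

# Closing prices of the first five trading days in the downloaded data
close = pd.Series(
    [10442.200195, 10443.200195, 10504.799805, 10558.849609, 10623.599609],
    index=pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-04", "2018-01-05", "2018-01-08"]),
    name="Close",
)

# Return_t = Close_t / Close_{t-1} - 1 (the first day has no previous close, so NaN)
returns = close.pct_change()

# MA_5 on day t is the mean of the closes from t-4 to t: only past and current prices
ma_5 = close.rolling(window=5).mean()

print(pd.DataFrame({"Close": close, "Return": returns, "MA_5": ma_5}))
```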


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Feature matrix and target
X = data[["Return", "MA_5", "MA_10"]]
y = data["Target"]

# Train-test split (time-series safe split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# Train Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
accuracy


0.564625850340136

```python
# Detailed evaluation
print("Classification Report:\n")
print(classification_report(y_test, y_pred))

# Show sample predictions next to the actual labels
results = X_test.copy()
results["Actual"] = y_test.values
results["Predicted"] = y_pred

results.head()
```


```
Classification Report:

              precision    recall  f1-score   support

           0       0.49      0.39      0.43       126
           1       0.60      0.70      0.65       168

    accuracy                           0.56       294
   macro avg       0.55      0.54      0.54       294
weighted avg       0.55      0.56      0.56       294
```



| Date       | Return    | MA_5         | MA_10        | Actual | Predicted |
|------------|-----------|--------------|--------------|--------|-----------|
| 2022-10-21 | 0.000703  | 17490.25     | 17299.944922 | 1      | 1         |
| 2022-10-24 | 0.008787  | 17574.039844 | 17348.919922 | 0      | 1         |
| 2022-10-25 | -0.004196 | 17607.919922 | 17416.199805 | 1      | 1         |
| 2022-10-27 | 0.004565  | 17652.859766 | 17477.534766 | 1      | 1         |
| 2022-10-28 | 0.002811  | 17697.430078 | 17554.779883 | 1      | 1         |


Because financial markets are noisy and largely stochastic, high accuracy is
not expected. The model reaches 56.5% accuracy on the 294-day test period. This
is above a 50% coin flip but slightly below the 57.1% obtained by always
predicting Up, since 168 of the 294 test days were Up days. Recall is 0.70 for
Up days but only 0.39 for Down days, suggesting the model leans toward
predicting Up rather than reliably anticipating declines.
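The comparison with a naive baseline is simple arithmetic on the reported numbers. The test labels are rebuilt from the `support` column of the classification report (126 Down, 168 Up); their order is irrelevant because the baseline predicts the same class every day.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Test-set label counts from the "support" column of the classification report:
# 126 Down (0) days and 168 Up (1) days; order does not matter for a constant predictor
y_test = np.array([0] * 126 + [1] * 168)

# Naive baseline: predict "Up" on every test day
always_up = np.ones_like(y_test)
baseline_accuracy = accuracy_score(y_test, always_up)

model_accuracy = 0.564625850340136  # value reported by the notebook

print(f"Always-Up baseline: {baseline_accuracy:.4f}")  # 0.5714
print(f"Logistic Regression: {model_accuracy:.4f}")    # 0.5646
```

In the notebook itself, `accuracy_score(y_test, np.ones_like(y_test))` gives the same figure directly, and scikit-learn's `DummyClassifier(strategy="most_frequent")`, fitted on `X_train, y_train`, gives the majority-class baseline learned from the training period.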



## 5. Evaluation & Analysis

### 5.1 Evaluation Metrics
Performance is evaluated on the chronologically last 20% of the data (294
trading days, starting 2022-10-21) using accuracy, precision, recall, and
F1-score.

### 5.2 Sample Predictions
The first five test-set predictions (2022-10-21 to 2022-10-28) are shown next
to the actual labels; the model predicts Up on all five days and is correct on
four of them.
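A confusion matrix summarizes the same comparison by counting each (actual, predicted) pair. As a small sketch, here it is for the five sample rows only; applying `confusion_matrix(y_test, y_pred)` to the full test set is the natural extension.

```python
from sklearn.metrics import confusion_matrix

# Actual and predicted labels of the five sample rows (2022-10-21 to 2022-10-28)
actual = [1, 0, 1, 1, 1]
predicted = [1, 1, 1, 1, 1]

# Rows are actual classes [0, 1]; columns are predicted classes [0, 1]
cm = confusion_matrix(actual, predicted, labels=[0, 1])
print(cm)
# [[0 1]
#  [0 4]]
```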

### 5.3 Performance Analysis and Limitations
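The main limitations are the small, price-only feature set, moving-average
features that enter as raw index levels (which drift over the sample period),
evaluation on a single train/test split, and weak detection of Down days.
Market direction is also driven by news and macroeconomic events that
historical prices do not capture.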
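Reliance on a single chronological split can be probed with walk-forward validation. This sketch swaps in scikit-learn's `TimeSeriesSplit` (a technique not used in the notebook) on synthetic random-walk data (an assumption), training on an expanding window of past days and testing on the block that follows; the spread of fold accuracies indicates how much a single split can vary.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic random-walk closes (assumption) with the notebook's three features
rng = np.random.default_rng(3)
dates = pd.bdate_range("2018-01-01", periods=750)
close = pd.Series(10_000 + rng.normal(0, 50, len(dates)).cumsum(), index=dates)
frame = pd.DataFrame({
    "Return": close.pct_change(),
    "MA_5": close.rolling(5).mean(),
    "MA_10": close.rolling(10).mean(),
    "Target": (close.shift(-1) > close).astype(int),
}).iloc[:-1].dropna()  # drop the unlabeled last row and the warm-up rows

X, y = frame[["Return", "MA_5", "MA_10"]], frame["Target"]

# Each fold trains on all earlier rows and tests on the next contiguous block
scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    scores.append(model.score(X.iloc[test_idx], y.iloc[test_idx]))

print(np.round(scores, 3), "mean:", round(float(np.mean(scores)), 3))
```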



## 6. Ethical Considerations & Responsible AI

### 6.1 Bias and Fairness
Financial market data is inherently noisy and influenced by external economic,
political, and social factors that are not fully captured in historical price
data. This may introduce bias in predictions, especially during abnormal market
conditions.

### 6.2 Dataset Limitations
The dataset is limited to historical price-based features and does not include
fundamental, macroeconomic, or sentiment data, which may affect predictive
performance.

### 6.3 Responsible Use of AI
This system is developed strictly for educational and analytical purposes and
does not provide financial, trading, or investment advice. Human judgment is
essential before making any real-world decisions.


## 7. Conclusion & Future Scope



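### 7.1 Conclusion
This project demonstrates the application of machine learning to predicting the
short-term direction of a financial market from historical price data and
technical indicators. A complete end-to-end pipeline was implemented, covering
data collection, preprocessing, feature engineering, model training, and
evaluation. The Logistic Regression model reached about 56% test accuracy,
close to the share of Up days in the test period, which shows how difficult
next-day direction prediction is with price-only features.

### 7.2 Future Scope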
The model can be enhanced by incorporating additional technical indicators,
alternative machine learning algorithms such as Random Forest or Support Vector
Machines, and advanced deep learning architectures like LSTM. Real-time data
integration and sentiment analysis could further improve predictive capability.
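Trying an alternative algorithm such as Random Forest mostly means swapping the estimator, since scikit-learn classifiers share the same `fit`/`predict` interface. A minimal sketch on synthetic data (an assumption), using the notebook's features and chronological split; the hyperparameters (`n_estimators=200`, `max_depth=5`) are illustrative, not tuned. On a pure random walk both models should land near chance, so the printed numbers only demonstrate the mechanics.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption) shaped like the notebook's feature table
rng = np.random.default_rng(5)
close = pd.Series(10_000 + rng.normal(0, 50, 600).cumsum())
frame = pd.DataFrame({
    "Return": close.pct_change(),
    "MA_5": close.rolling(5).mean(),
    "MA_10": close.rolling(10).mean(),
    "Target": (close.shift(-1) > close).astype(int),
}).iloc[:-1].dropna()

# Same chronological 80/20 split as the notebook
X_train, X_test, y_train, y_test = train_test_split(
    frame[["Return", "MA_5", "MA_10"]], frame["Target"], test_size=0.2, shuffle=False
)

# Candidate classifiers behind the same fit/predict interface
candidates = {
    "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression()),
    "RandomForest": RandomForestClassifier(n_estimators=200, max_depth=5, random_state=0),
}
results = {}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: {results[name]:.3f}")
```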
