# DATA PREPROCESSING

In [19]:
# Importing libraries

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

In [32]:
## Data loading

df = pd.read_csv('quantvision_financial_dataset_200.csv')
print(df.head())
print(df.info())
# List the column names
print(df.columns)

## One-Hot Encoding

columns_to_encode = ['asset_type','market_regime']
df_encoded = pd.get_dummies(df, columns=columns_to_encode, dtype=int)
print(df_encoded.head())

## Scaling Numerical features

columns_to_scale = ['lookback_days','technical_score','edge_density','slope_strength','candlestick_variance','pattern_symmetry']
scaler = StandardScaler()
df_encoded[columns_to_scale] = scaler.fit_transform(df_encoded[columns_to_scale])
print(df_encoded.head())

## Data Splitting

X = df_encoded.drop('future_trend', axis=1)
y = df_encoded['future_trend']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=9, stratify=y)

print(f"Training samples: {X_train.shape[0]}")
print(f"Testing samples: {X_test.shape[0]}")

   lookback_days asset_type market_regime  high_volatility  \
0             48     equity       bullish                0   
1             38      index       bullish                1   
2             24     equity       bullish                1   
3             52     equity       bullish                0   
4             17     equity       bullish                1   

   trend_continuation  technical_score  edge_density  slope_strength  \
0                   1            59.99         0.504           0.298   
1                   1            78.54         0.559           0.037   
2                   0            56.03         0.617           0.212   
3                   0            66.51         0.360           0.347   
4                   1            61.21         0.492           0.144   

   candlestick_variance  pattern_symmetry  future_trend  
0                 1.572             0.768             1  
1                 0.692             0.538             1  
2                 1.
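Note that the cell above fits `StandardScaler` on all 200 rows before `train_test_split`, so the test rows' mean and standard deviation leak into the scaled training features. A minimal sketch of a leak-free alternative, assuming the same column names: split first, then fit a `ColumnTransformer` (one-hot encoding plus scaling) on the training rows only. `split_then_scale` is a hypothetical helper, and the demo DataFrame is random synthetic data, not the QuantVision file (its `'bearish'` category is invented).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

CATEGORICAL = ['asset_type', 'market_regime']
NUMERIC = ['lookback_days', 'technical_score', 'edge_density',
           'slope_strength', 'candlestick_variance', 'pattern_symmetry']


def split_then_scale(frame, target='future_trend', test_size=0.2, random_state=9):
    """Split first, then learn encoding/scaling statistics from the training rows only."""
    X = frame.drop(columns=target)
    y = frame[target]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y)
    pre = ColumnTransformer(
        [('onehot', OneHotEncoder(handle_unknown='ignore'), CATEGORICAL),
         ('scale', StandardScaler(), NUMERIC)],
        remainder='passthrough',  # binary flags (high_volatility, trend_continuation) pass through
        sparse_threshold=0)       # always return a dense array
    X_tr = pre.fit_transform(X_tr)  # mean/std estimated on training rows only
    X_te = pre.transform(X_te)      # test rows reuse the training statistics
    return X_tr, X_te, y_tr, y_te, pre


# Synthetic demo data (hypothetical values, not the QuantVision dataset)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    'lookback_days': rng.integers(10, 60, n),
    'asset_type': rng.choice(['equity', 'index'], n),
    'market_regime': rng.choice(['bullish', 'bearish'], n),
    'high_volatility': rng.integers(0, 2, n),
    'trend_continuation': rng.integers(0, 2, n),
    'technical_score': rng.uniform(40, 90, n),
    'edge_density': rng.uniform(0.3, 0.7, n),
    'slope_strength': rng.uniform(0.0, 0.4, n),
    'candlestick_variance': rng.uniform(0.5, 2.0, n),
    'pattern_symmetry': rng.uniform(0.4, 0.9, n),
    'future_trend': rng.choice([0, 1], n, p=[0.1, 0.9]),
})

X_tr, X_te, y_tr, y_te, fitted_pre = split_then_scale(demo)
print(X_tr.shape, X_te.shape)
```

Keeping the encoder and scaler inside one fitted transformer also means new data can be prepared with `fitted_pre.transform(...)` using exactly the training statistics.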

# MODEL TRAINING & EVALUATION

## LOGISTIC REGRESSION

In [21]:
# Importing libraries
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
import numpy as np

In [22]:
# Training logistic regression model
log_model = LogisticRegression(
    solver='liblinear',
    max_iter=100,
    class_weight='balanced',
    C=0.1,
    random_state=9
)

log_model.fit(X_train, y_train)
pred_log = log_model.predict(X_test)
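`class_weight='balanced'` reweights each class by `n_samples / (n_classes * count_of_class)`, so the rare 'DOWN' rows weigh more during fitting. The sketch below shows that computation with `sklearn.utils.class_weight.compute_class_weight`; the 12 'DOWN' / 148 'UP' split is a hypothetical training distribution, since the notebook does not print its training class counts.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical training labels: 12 'DOWN' (0) and 148 'UP' (1) rows
y_demo = np.array([0] * 12 + [1] * 148)
classes = np.array([0, 1])

# What LogisticRegression(class_weight='balanced') uses internally
weights = compute_class_weight('balanced', classes=classes, y=y_demo)

# The same formula written out: n_samples / (n_classes * count of each class)
manual = len(y_demo) / (len(classes) * np.bincount(y_demo))

print(dict(zip(classes.tolist(), weights.round(3).tolist())))
```

Under these assumed counts, each 'DOWN' row counts roughly 12 times as much as an 'UP' row in the training loss.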

In [23]:
# Defining function for model evaluation
def metrics(y_true, y_pred):
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1 score": f1_score(y_true, y_pred),
        "Confusion matrix": confusion_matrix(y_true, y_pred)
    }

In [24]:
# Evaluating logistic regression model
log_metrics = metrics(y_test,pred_log)

## SIMPLE MLP

In [25]:
# Importing libraries

from sklearn.neural_network import MLPClassifier

In [26]:
# Training neural network
mlp = MLPClassifier(
    hidden_layer_sizes=(100, 50),
    activation='relu',
    solver='adam',
    max_iter = 100,
    random_state = 9
)

mlp.fit(X_train, y_train)
pred_mlp = mlp.predict(X_test)



In [27]:
# Evaluating neural network
mlp_metrics = metrics(y_test,pred_mlp)

# COMPARING BOTH MODELS

In [28]:
# Comparing Logistic regression model and Neural Network
comparison_table = pd.DataFrame(
    [log_metrics, mlp_metrics],
    index=["Logistic Regression", "Neural Network"]
)
comparison_table

| Model | Accuracy | Precision | recall | f1 score | Confusion matrix |
|---|---|---|---|---|---|
| Logistic Regression | 0.75 | 1.0 | 0.72973 | 0.84375 | [[3, 0], [10, 27]] |
| Neural Network | 0.9 | 0.923077 | 0.972973 | 0.947368 | [[0, 3], [1, 36]] |
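Every score in the comparison table can be recomputed by hand from the confusion matrices, which makes the precision/recall distinction used in Q1 concrete. A small sketch, assuming scikit-learn's layout `[[TN, FP], [FN, TP]]` for labels `[0, 1]` with 1 = 'UP' as the positive class; `metrics_from_confusion` is a hypothetical helper, not part of the notebook.

```python
def metrics_from_confusion(cm):
    """Accuracy, precision, recall and F1 from a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of predicted UP days that were UP
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of actual UP days that were caught
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"Accuracy": accuracy, "Precision": precision, "Recall": recall, "F1": f1}


# Confusion matrices copied from the comparison table
for name, cm in [("Logistic Regression", [[3, 0], [10, 27]]),
                 ("Neural Network", [[0, 3], [1, 36]])]:
    scores = metrics_from_confusion(cm)
    print(name, {k: round(v, 4) for k, v in scores.items()})
```

For logistic regression, precision is 27/27 (no false 'UP' calls) while recall is 27/37, because 10 real 'UP' days were predicted as 'DOWN'.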


# ANALYSIS AND FINANCIAL INTERPRETATION

Q1) Why does Logistic Regression perform well or poorly?

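ANS: The logistic regression model was very cautious. Every day it predicted as 'UP' really was 'UP' (100% precision), but because of this it missed some good opportunities (73% recall).

Because it uses a linear decision boundary, it couldn't cleanly separate the 'UP' days from the 'DOWN' days. It focused on catching all the 'DOWN' days correctly and in the process misclassified 10 'UP' days as 'DOWN'. The class_weight='balanced' setting likely reinforced this: it upweights the rare 'DOWN' class during training, so missing a 'DOWN' day is penalized much more than missing an 'UP' day.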



Q2) Why does the Neural Network perform better or worse?

ANS: The neural network was very greedy. It assumes the market will go up, predicting 'UP' on 39 of the 40 test days. It captured almost all the profitable days (97% recall) but did not correctly predict a single 'DOWN' day.

Looking at the first row of the confusion matrix, [0, 3], it failed to predict any of the 3 'DOWN' days (0 true negatives).

The neural network only minimizes its training loss (log-loss), and here it did so by almost always predicting 'UP'. Because 'DOWN' days are rare and, unlike the logistic regression, the MLP was trained without class weighting, misclassifying them adds very little to the total loss. In this way it likely settles into a poor local minimum instead of finding a solution that also separates the 'DOWN' days.
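One way to check how little the 'DOWN' days matter to accuracy is a majority-class baseline. A sketch using scikit-learn's `DummyClassifier`, assuming synthetic labels with the notebook's test-set class counts (3 'DOWN', 37 'UP'); the all-zero feature matrix is a placeholder because this baseline ignores features.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Synthetic labels with the same class counts as the notebook's test set
y_test_demo = np.array([0] * 3 + [1] * 37)
X_test_demo = np.zeros((len(y_test_demo), 1))  # placeholder features, ignored by the baseline

baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(X_test_demo, y_test_demo)  # learns only the majority label ('UP')
pred = baseline.predict(X_test_demo)

acc = accuracy_score(y_test_demo, pred)
down_recall = recall_score(y_test_demo, pred, pos_label=0)  # share of 'DOWN' days caught
print(f"accuracy={acc:.3f}, DOWN recall={down_recall:.2f}")
```

In the notebook itself the baseline would be fit on `X_train, y_train` and scored on `X_test, y_test`. Since the training majority is 'UP', it predicts 'UP' everywhere, which on this test set gives 37/40 = 0.925 accuracy, slightly above the neural network's 0.9. Accuracy alone therefore does not reward a model for recognising 'DOWN' days.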



Q3) The effect of volatility on predictions

ANS: High volatility usually adds noise, which makes the underlying signal harder for a model to learn.

The Logistic Regression model likely treated high volatility as a negative signal: it interpreted volatile data as risky, which caused it to stay out of the market and miss 10 'UP' days.

The neural network, on the other hand, likely ignored volatility and focused mostly on trend continuation signals, which made it behave in a highly bullish way.
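Whether logistic regression really treats `high_volatility` as a negative signal can be read directly from its fitted coefficients. A minimal sketch on synthetic data, assuming the notebook's model settings; `coefficient_table` is a hypothetical helper, and the demo labels are generated so that volatility lowers the chance of 'UP'.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def coefficient_table(model, feature_names):
    """Coefficients of a fitted binary LogisticRegression, most negative first.

    coef_ has shape (1, n_features); a negative weight pushes predictions
    toward class 0 ('DOWN'), a positive weight toward class 1 ('UP').
    """
    return pd.Series(model.coef_[0], index=feature_names).sort_values()


# Synthetic stand-in data (hypothetical): 'UP' is less likely when high_volatility == 1
rng = np.random.default_rng(9)
n = 500
X_demo = pd.DataFrame({
    'high_volatility': rng.integers(0, 2, n),
    'trend_continuation': rng.integers(0, 2, n),
})
logits = (1.5 * X_demo['trend_continuation'] - 2.0 * X_demo['high_volatility']).to_numpy()
y_demo = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# Same settings as log_model in the notebook
demo_model = LogisticRegression(solver='liblinear', C=0.1, class_weight='balanced', random_state=9)
demo_model.fit(X_demo, y_demo)
coefs = coefficient_table(demo_model, X_demo.columns)
print(coefs)
```

In the notebook, `coefficient_table(log_model, X_train.columns)` would show the sign of the real `high_volatility` weight. Its magnitude is not directly comparable with the standardized columns, because the binary flags were left unscaled.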


Q4) The role of trend continuation

ANS: Our test set is heavily skewed towards 'UP' (37 vs 3), and because the split is stratified, the full dataset has roughly the same ratio. This implies a strong uptrend in the data. The neural network likely assigned a large weight to the trend continuation feature and thus learned that if the trend is continuing, it should keep buying. This works well here, and in bullish markets generally, but the model would likely break down on a dataset where the trend is reversed.
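An MLP has no single coefficient per feature, but permutation importance can test whether it leans on `trend_continuation`: shuffle one column at a time and measure how much the score drops. A sketch on synthetic data where the label mostly copies `trend_continuation` by construction; the data and the `noise` column are hypothetical, while the hidden-layer sizes mirror the notebook's MLP.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (hypothetical): the label copies trend_continuation,
# with 10% of labels flipped; high_volatility and noise carry no signal.
rng = np.random.default_rng(9)
n = 400
X_demo = pd.DataFrame({
    'trend_continuation': rng.integers(0, 2, n),
    'high_volatility': rng.integers(0, 2, n),
    'noise': rng.normal(size=n),
})
flip = rng.random(n) < 0.1
y_demo = np.where(flip, 1 - X_demo['trend_continuation'], X_demo['trend_continuation'])

demo_mlp = MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=500, random_state=9)
demo_mlp.fit(X_demo, y_demo)

# Shuffle each column 10 times and record the mean drop in accuracy (the default scorer)
result = permutation_importance(demo_mlp, X_demo, y_demo, n_repeats=10, random_state=9)
importances = pd.Series(result.importances_mean, index=X_demo.columns).sort_values(ascending=False)
print(importances)
```

Applied as `permutation_importance(mlp, X_test, y_test, ...)` in the notebook, this would show which features the trained network actually uses. With only 3 'DOWN' test days, accuracy-based importances will be noisy.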

Q5) Situations where the model fails and why

ANS: The logistic regression model fails in a bullish market because it is very cautious and predicts 'DOWN' (sell) even when stocks look only slightly risky. This way we miss out on profit.

The neural network fails when the market trend shifts, because it keeps predicting 'UP' (buy) even when the market is going down: it has learned that always predicting 'UP' works well on this data.
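The claim that the neural network fails when the trend shifts can be probed by scoring the test predictions separately for each market regime. A minimal sketch; `metrics_by_group` is a hypothetical helper, and the short label lists (including the `'bearish'` regime) are made-up demo data.

```python
import numpy as np
import pandas as pd


def metrics_by_group(y_true, y_pred, groups):
    """Row count, accuracy and 'DOWN' recall computed separately for each group label."""
    frame = pd.DataFrame({'y_true': np.asarray(y_true),
                          'y_pred': np.asarray(y_pred),
                          'group': np.asarray(groups)})
    rows = {}
    for name, part in frame.groupby('group'):
        down = part[part['y_true'] == 0]  # actual 'DOWN' days in this group
        rows[name] = {
            'n': len(part),
            'accuracy': (part['y_true'] == part['y_pred']).mean(),
            'down_recall': (down['y_pred'] == 0).mean() if len(down) else np.nan,
        }
    return pd.DataFrame(rows).T


# Hypothetical demo labels and predictions
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 1]
regime = ['bullish'] * 4 + ['bearish'] * 4
table = metrics_by_group(y_true, y_pred, regime)
print(table)
```

In the notebook, `pd.get_dummies` and `train_test_split` both keep the original row index, so `metrics_by_group(y_test, pred_mlp, df.loc[X_test.index, 'market_regime'])` would break the neural network's test results down by regime. With only 3 'DOWN' days in the test set, per-regime figures rest on very few rows.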