In [5]:
import pandas as pd
import numpy as np

In [6]:
# Loading the dataset
df = pd.read_csv("/content/quantvision_financial_dataset_200.csv")
print(df.shape)
df.head()

(200, 11)


Unnamed: 0,lookback_days,asset_type,market_regime,high_volatility,trend_continuation,technical_score,edge_density,slope_strength,candlestick_variance,pattern_symmetry,future_trend
0,48,equity,bullish,0,1,59.99,0.504,0.298,1.572,0.768,1
1,38,index,bullish,1,1,78.54,0.559,0.037,0.692,0.538,1
2,24,equity,bullish,1,0,56.03,0.617,0.212,1.419,0.301,1
3,52,equity,bullish,0,0,66.51,0.36,0.347,0.699,0.498,1
4,17,equity,bullish,1,1,61.21,0.492,0.144,2.52,0.828,1


In [11]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Separate input features (X) and target variable (y)
# X contains all independent variables
# y contains the future price movement (0 = Down, 1 = Up)

X = df.drop("future_trend", axis=1)
y = df["future_trend"]

# Convert categorical variables into numerical form
X = pd.get_dummies(X, drop_first=True)

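# Standardize numerical features so that all features are on a similar scale.
# Note: the scaler is fit on the full dataset before splitting, so test-set
# statistics leak into the scaling; fitting on X_train only would avoid this.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Hold out 20% of the rows for testing (fixed seed for reproducibility).
# Note: the split is not stratified, so class proportions may differ between sets.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)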


In [16]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Train Logistic Regression model
log_model = LogisticRegression(max_iter=500)
log_model.fit(X_train, y_train)
# Predictions
y_pred_log = log_model.predict(X_test)

# Evaluation
print("Logistic Regression Results")
print("Accuracy:", accuracy_score(y_test, y_pred_log))
print("Precision:", precision_score(y_test, y_pred_log))
print("Recall:", recall_score(y_test, y_pred_log))
print("F1-score:", f1_score(y_test, y_pred_log))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_log))


Logistic Regression Results
Accuracy: 0.925
Precision: 0.9487179487179487
Recall: 0.9736842105263158
F1-score: 0.961038961038961
Confusion Matrix:
 [[ 0  2]
 [ 1 37]]


In [17]:
from sklearn.neural_network import MLPClassifier
# Train Neural Network (MLP)
mlp_model = MLPClassifier(
    hidden_layer_sizes=(64, 32),  # 2 dense layers
    activation='relu',
    max_iter=500,
    random_state=42
)

mlp_model.fit(X_train, y_train)

# Predictions
y_pred_mlp = mlp_model.predict(X_test)

# Evaluation
print("Neural Network (MLP) Results")
print("Accuracy:", accuracy_score(y_test, y_pred_mlp))
print("Precision:", precision_score(y_test, y_pred_mlp))
print("Recall:", recall_score(y_test, y_pred_mlp))
print("F1-score:", f1_score(y_test, y_pred_mlp))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_mlp))

Neural Network (MLP) Results
Accuracy: 0.925
Precision: 0.9487179487179487
Recall: 0.9736842105263158
F1-score: 0.961038961038961
Confusion Matrix:
 [[ 0  2]
 [ 1 37]]



## **1. Performance of Logistic Regression**

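Logistic Regression reaches 92.5% accuracy and 97.4% recall on the 40-sample test set. These figures need context: the test set contains 38 upward and only 2 downward cases, so a model that always predicts "up" would score 95% accuracy.

* Features such as **trend_continuation**, **technical_score**, and **slope_strength** are expected to have a **direct relationship** with future price movement.
* These indicators are designed to capture market direction and momentum.
* Logistic Regression is effective when the decision boundary between classes can be approximated by a **linear function** of the features.

As a result, the model achieves **high accuracy and recall** on the majority class and serves as a reasonable baseline for this problem. The confusion matrix also shows its limit: neither of the two downward cases in the test set is predicted correctly.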
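
The reported scores can be re-derived from the confusion matrix printed above, which also makes the class imbalance visible. The sketch below hard-codes that matrix (rows are true classes, columns are predicted classes, following scikit-learn's `confusion_matrix` layout). The variable names, the specificity and balanced-accuracy figures, and the majority-class comparison are illustrative additions, not part of the original notebook.

```python
# Confusion matrix reported for both models:
# [[TN, FP],
#  [FN, TP]]   rows = true class, columns = predicted class
tn, fp = 0, 2
fn, tp = 1, 37

total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)           # recall on the "up" class (1)
specificity = tn / (tn + fp)      # recall on the "down" class (0)
balanced_accuracy = (recall + specificity) / 2

# Accuracy of a trivial model that always predicts the larger class
majority_baseline = max(tp + fn, tn + fp) / total

print(f"accuracy          = {accuracy:.3f}")
print(f"precision         = {precision:.3f}")
print(f"recall (up)       = {recall:.3f}")
print(f"recall (down)     = {specificity:.3f}")
print(f"balanced accuracy = {balanced_accuracy:.3f}")
print(f"majority baseline = {majority_baseline:.3f}")
```

Accuracy, precision, and recall match the printed results. The recall on the "down" class is 0, and the majority-class baseline (0.950) is slightly above the model's accuracy (0.925).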



## **2. Performance of the Neural Network**

Neural networks are capable of learning complex, non-linear relationships. However, in this case:

* The dataset is relatively **small** (200 rows, of which 160 are used for training).
* The features are already **well-engineered technical indicators**.
* The relationship between features and the target variable appears to be mostly **linear**.

For these reasons, the neural network appears to learn much the **same decision boundary** as Logistic Regression. Both models produce **identical confusion matrices and evaluation scores** on the test set, indicating that the additional model complexity does not improve performance for this dataset. Note that matching confusion matrices do not by themselves guarantee that every individual prediction is identical.
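
Whether two models really make the same predictions can be checked element by element rather than inferred from matching scores. The following is a minimal sketch with hypothetical toy labels; `compare_predictions` and `confusion_counts` are helper names introduced here for illustration only.

```python
import numpy as np


def compare_predictions(pred_a, pred_b):
    """Return (identical, differing_indices) for two prediction arrays."""
    pred_a = np.asarray(pred_a)
    pred_b = np.asarray(pred_b)
    differing = np.flatnonzero(pred_a != pred_b)
    return differing.size == 0, differing


def confusion_counts(y_true, y_pred):
    """2x2 confusion matrix for binary 0/1 labels (rows = true, columns = predicted)."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(np.asarray(y_true), np.asarray(y_pred)):
        cm[t, p] += 1
    return cm


# Hypothetical toy labels: two models that err on different samples
y_true = np.array([1, 1, 1, 0])
pred_a = np.array([0, 1, 1, 1])
pred_b = np.array([1, 0, 1, 1])

same_matrix = np.array_equal(
    confusion_counts(y_true, pred_a), confusion_counts(y_true, pred_b)
)
identical, differing = compare_predictions(pred_a, pred_b)

print("same confusion matrix:", same_matrix)
print("identical predictions:", identical)
print("differing positions:", differing.tolist())
```

In the notebook, the same check would be `compare_predictions(y_pred_log, y_pred_mlp)`. The toy example shows why it is worth running: both prediction vectors give the same confusion matrix even though they disagree on two samples.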



## **3. Effect of Volatility on Predictions**

The **high_volatility** feature flags periods of increased market uncertainty.

* During high volatility, price movements become noisy and less predictable.
* Technical indicators may generate false or misleading signals.
* Both models can therefore be expected to misclassify more often during such periods, although with only three misclassified test samples the results above cannot confirm this directly.

This suggests that volatility reduces prediction reliability for both linear and non-linear models.
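
One way to test the volatility claim is to compute the misclassification rate separately for high- and low-volatility samples. The sketch below uses hypothetical toy data, and `error_rate_by_group` is a helper name introduced here, not something from the notebook.

```python
import pandas as pd


def error_rate_by_group(group, y_true, y_pred):
    """Misclassification rate ("mean") and sample count per group value."""
    frame = pd.DataFrame({
        "group": list(group),
        "error": [int(t != p) for t, p in zip(y_true, y_pred)],
    })
    return frame.groupby("group")["error"].agg(["mean", "count"])


# Hypothetical toy data, for illustration only
high_volatility = [0, 0, 0, 0, 1, 1, 1, 1]
y_true          = [1, 1, 1, 0, 1, 1, 0, 1]
y_pred          = [1, 1, 1, 1, 1, 0, 1, 1]

summary = error_rate_by_group(high_volatility, y_true, y_pred)
print(summary)
```

In the notebook, `X_test` is a scaled NumPy array, so the unscaled flag would have to be looked up through the index that `y_test` keeps from the original frame, for example `df.loc[y_test.index, "high_volatility"]`. With only 40 test rows, any difference between the groups should be read cautiously.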



## **4. Role of Trend Continuation**

The **trend_continuation** feature is expected to play a significant role in predicting future price movement.

* Financial markets often exhibit momentum, where existing trends persist.
* When a trend is continuing, both models predict upward price movement and are correct in most cases.
* This feature is likely to contribute to the **high recall** observed in the results.

Trend-following behavior is therefore a plausible key driver of model performance.
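
The strength of this relationship can be estimated directly from the data by comparing the share of upward outcomes with and without trend continuation. The following sketch uses a small hypothetical frame with the notebook's column names; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy rows, for illustration only
toy = pd.DataFrame({
    "trend_continuation": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "future_trend":       [1, 1, 1, 1, 0, 1, 1, 0, 0, 1],
})

# Share of each outcome within each trend_continuation group (each row sums to 1)
rates = pd.crosstab(
    toy["trend_continuation"], toy["future_trend"], normalize="index"
)
print(rates)

# Difference in the share of "up" outcomes between the two groups
up_rate_gap = rates.loc[1, 1] - rates.loc[0, 1]
print("up-rate gap:", round(up_rate_gap, 3))
```

Running the same `pd.crosstab` call on `df` would show how much more often `future_trend` is 1 when `trend_continuation` is 1. A gap near zero would argue against the feature being a key driver.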


## **5. Situations Where the Models Fail**

The models can be expected to fail in the following scenarios:

* Sudden market regime changes (e.g., bullish to bearish)
* Highly volatile or sideways markets
* Unexpected news or external economic events

Additionally, the dataset is **imbalanced**, with more instances of upward price movement than downward movement: 38 of the 40 test samples are upward cases. This causes both models to favor predicting the majority class (“Price Up”), leading to high accuracy but poor performance on rare downward movements. In the test set, both downward cases are misclassified.
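
Two common ways to reduce the effect of class imbalance are a stratified split and class weighting. The sketch below is one possible setup, not the notebook's method. The data is synthetic, and the 190/10 class ratio is an assumption chosen to mirror the test-set proportions, not a figure taken from the dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: 190 "up" rows, 10 "down" rows, two numeric features
y = np.array([1] * 190 + [0] * 10)
X = rng.normal(size=(200, 2)) + 1.5 * y[:, None]  # shift the "up" class

# stratify=y keeps the class proportions the same in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# class_weight="balanced" re-weights samples inversely to class frequency
model = LogisticRegression(class_weight="balanced", max_iter=500)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

n_down_test = int((y_test == 0).sum())
score = balanced_accuracy_score(y_test, y_pred)

print("down cases in test set:", n_down_test)
print("balanced accuracy:", round(score, 3))
```

Balanced accuracy averages the recall of both classes, so a model that ignores the minority class scores around 0.5 no matter how high its plain accuracy is.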



## **6. Overall Interpretation**

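The identical test performance of Logistic Regression and the Neural Network suggests that the network finds no additional non-linear structure in this dataset, which is consistent with a **largely linear** relationship between the engineered technical features and the target. A simple linear model is therefore as effective as a more complex neural network for this task. The result should still be read with care: both models score below the 95% majority-class baseline on the test set and miss both downward cases, so the agreement may reflect class imbalance as much as linear separability.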
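
A single 40-row test split gives a noisy comparison, so one option is to compare the two models with stratified cross-validation, with scaling placed inside a pipeline so that it is fit on each training fold only. The sketch below is a minimal version on synthetic stand-in data. The `make_classification` settings are assumptions made for illustration, while the model hyperparameters follow the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 200 rows, roughly 10% minority class
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4,
    weights=[0.1, 0.9], random_state=0,
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=500),
    "mlp": MLPClassifier(
        hidden_layer_sizes=(64, 32), activation="relu",
        max_iter=500, random_state=42,
    ),
}

scores = {}
for name, estimator in models.items():
    # The scaler is re-fit on the training folds only, avoiding leakage
    pipeline = make_pipeline(StandardScaler(), estimator)
    scores[name] = cross_val_score(
        pipeline, X, y, cv=cv, scoring="balanced_accuracy"
    )
    print(f"{name}: mean={scores[name].mean():.3f} std={scores[name].std():.3f}")
```

If the two models still score alike across folds on the real data, that would support the conclusion that the extra capacity of the neural network is not needed here.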

