# Assignment: Stock Prediction NIFTY 50
Based on the analysis performed in this notebook, this assignment focuses on building and evaluating models that predict the 'High' price of the NIFTY 50 index.

Concentrate on the following models and time windows:

**Models:**

*   KNN (K-Nearest Neighbors Regressor)
*   RNN (Simple Recurrent Neural Network)
*   GRU (Gated Recurrent Unit)
*   LSTM (Long Short-Term Memory)
*   Bidirectional LSTM

**Time Windows (Input Days):**

*   30 days
*   60 days
*   90 days

Train the deep learning models (RNN, GRU, LSTM, Bidirectional LSTM) for 50 epochs.

The goal is to train each of these models on the 'High' column for every time window, evaluate them with Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), and compare the results.
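
As a quick reference for the two metrics, the sketch below computes MAE and RMSE by hand with NumPy on a made-up four-point example. The helper names `mae` and `rmse` and the toy values are illustrative assumptions; the notebook itself uses the scikit-learn functions.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the errors, in price units."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

def rmse(y_true, y_pred):
    """Root Mean Squared Error: like MAE, but large errors weigh more."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy values (not taken from the dataset)
y_true = np.array([100.0, 102.0, 105.0, 103.0])
y_pred = np.array([101.0, 100.0, 105.0, 107.0])

# Absolute errors are 1, 2, 0, 4, so MAE = 7 / 4 = 1.75
# Squared errors are 1, 4, 0, 16, so RMSE = sqrt(21 / 4)
mae_value = mae(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
print(mae_value, rmse_value)
```

RMSE is never smaller than MAE on the same predictions, and the gap widens when a few errors are much larger than the rest.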



## Import libraries and load data

In [2]:
import numpy as np
import pandas as pd
from copy import deepcopy
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, SimpleRNN, LSTM, GRU, Bidirectional

# Load data
df = pd.read_csv('/content/data.csv')


## Create helper functions

In [3]:
def return_pairs(column, days):
    prices = list(column)
    X, y = [], []
    for i in range(len(prices) - days):
        X.append(prices[i:i+days])
        y.append(prices[i+days])
    return np.array(X), np.array(y)

# Neural network builders
def build_rnn(input_shape):
    model = Sequential([SimpleRNN(50, activation='tanh', input_shape=input_shape), Dense(1)])
    model.compile(optimizer='adam', loss='mse')
    return model

def build_lstm(input_shape):
    model = Sequential([LSTM(50, activation='tanh', input_shape=input_shape), Dense(1)])
    model.compile(optimizer='adam', loss='mse')
    return model

def build_gru(input_shape):
    model = Sequential([GRU(50, activation='tanh', input_shape=input_shape), Dense(1)])
    model.compile(optimizer='adam', loss='mse')
    return model

def build_bilstm(input_shape):
    model = Sequential([Bidirectional(LSTM(50, activation='tanh'), input_shape=input_shape), Dense(1)])
    model.compile(optimizer='adam', loss='mse')
    return model
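
To check what the windowing helper produces, the sketch below runs `return_pairs` (copied from the cell above so the snippet is self-contained) on a made-up series of ten values. The names `toy`, `X_toy`, and `y_toy` are illustrative only.

```python
import numpy as np

def return_pairs(column, days):
    # Same logic as the notebook helper: each sample is `days` consecutive
    # values, and the target is the value that immediately follows them.
    prices = list(column)
    X, y = [], []
    for i in range(len(prices) - days):
        X.append(prices[i:i + days])
        y.append(prices[i + days])
    return np.array(X), np.array(y)

toy = np.arange(10, 20)            # 10 values: 10, 11, ..., 19
X_toy, y_toy = return_pairs(toy, days=3)

print(X_toy.shape, y_toy.shape)    # (7, 3) (7,)
print(X_toy[0], y_toy[0])          # [10 11 12] 13
print(X_toy[-1], y_toy[-1])        # [16 17 18] 19
```

A series of length `n` yields `n - days` samples, and consecutive windows overlap in all but one value.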


## Prepare data (Target column is 'High')

In [4]:
target_col = 'High'
time_windows = [30, 60, 90]

high_data = {}
for days in time_windows:
    X, y = return_pairs(df[target_col], days)
    high_data[f"X_High_{days}"] = X
    high_data[f"y_High_{days}"] = y


## Define models

In [5]:
ml_model = ("KNN", KNeighborsRegressor())
dl_model_builders = {"RNN": build_rnn, "GRU": build_gru, "LSTM": build_lstm, "Bidirectional_LSTM": build_bilstm}


## Train models

In [6]:
trained_high_models = {}

for days in time_windows:
    X = high_data[f"X_High_{days}"]
    y = high_data[f"y_High_{days}"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

    # KNN
    knn = deepcopy(ml_model[1])
    knn.fit(X_train, y_train)
    y_train_pred, y_test_pred = knn.predict(X_train), knn.predict(X_test)
    trained_high_models[f"KNN_High_{days}"] = {'model': knn,
                                               'train_mae': mean_absolute_error(y_train, y_train_pred),
                                               'train_rmse': np.sqrt(mean_squared_error(y_train, y_train_pred)),
                                               'test_mae': mean_absolute_error(y_test, y_test_pred),
                                               'test_rmse': np.sqrt(mean_squared_error(y_test, y_test_pred))}

    # DL Models
    X_train_dl, X_test_dl = np.expand_dims(X_train, -1), np.expand_dims(X_test, -1)
    for name, builder in dl_model_builders.items():
        model_dl = builder((X_train.shape[1], 1))
        model_dl.fit(X_train_dl, y_train, epochs=50, batch_size=8, verbose=0)
        y_train_pred, y_test_pred = model_dl.predict(X_train_dl).flatten(), model_dl.predict(X_test_dl).flatten()
        trained_high_models[f"{name}_High_{days}"] = {'model': model_dl,
                                                      'train_mae': mean_absolute_error(y_train, y_train_pred),
                                                      'train_rmse': np.sqrt(mean_squared_error(y_train, y_train_pred)),
                                                      'test_mae': mean_absolute_error(y_test, y_test_pred),
                                                      'test_rmse': np.sqrt(mean_squared_error(y_test, y_test_pred))}
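
Note that `train_test_split` shuffles by default, so the training cell above mixes test windows in among training windows that overlap them almost entirely. That can make test errors look better than they would on genuinely unseen future data. A minimal sketch of a chronological split follows; `chronological_split` is a hypothetical helper, not part of the notebook, and the demo arrays are made up.

```python
import numpy as np

def chronological_split(X, y, test_size=0.1):
    """Hold out the last `test_size` fraction of samples, without shuffling."""
    n_test = max(1, int(round(len(X) * test_size)))
    split = len(X) - n_test
    return X[:split], X[split:], y[:split], y[split:]

# Demo data: 20 samples whose targets are simply their time index
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.arange(20)

X_tr, X_te, y_tr, y_te = chronological_split(X_demo, y_demo, test_size=0.1)
print(len(X_tr), len(X_te))   # 18 2
print(y_tr.max(), y_te.min()) # 17 18
```

Passing `shuffle=False` to `train_test_split` should give the same kind of split. Results from a chronological split are not directly comparable with the shuffled-split numbers reported in this notebook.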



## Comparison and evaluation

In [7]:
results_high_df = pd.DataFrame([{"Model": name, **metrics} for name, metrics in trained_high_models.items()])
results_high_df.sort_values(by='test_mae', ascending=True)

|    | Model | model | train_mae | train_rmse | test_mae | test_rmse |
|----|-------|-------|-----------|------------|----------|-----------|
| 10 | KNN_High_90 | `KNeighborsRegressor()` | 36.92914 | 58.769966 | 48.790144 | 74.204848 |
| 5 | KNN_High_60 | `KNeighborsRegressor()` | 36.850899 | 58.333589 | 52.819089 | 82.674612 |
| 0 | KNN_High_30 | `KNeighborsRegressor()` | 42.646429 | 68.669169 | 56.933037 | 89.679421 |
| 1 | RNN_High_30 | `<Sequential name=sequential, built=True>` | 6390.529515 | 8844.565414 | 5893.23049 | 8448.009992 |
| 6 | RNN_High_60 | `<Sequential name=sequential_4, built=True>` | 6418.710871 | 8877.674712 | 5963.325271 | 8380.258712 |
| 11 | RNN_High_90 | `<Sequential name=sequential_8, built=True>` | 6434.073696 | 8882.0126 | 6156.608613 | 8599.038758 |
| 2 | GRU_High_30 | `<Sequential name=sequential_1, built=True>` | 6929.283272 | 9345.664401 | 6410.650098 | 8931.117664 |
| 7 | GRU_High_60 | `<Sequential name=sequential_5, built=True>` | 6960.433434 | 9380.668503 | 6478.858361 | 8873.764212 |
| 4 | Bidirectional_LSTM_High_30 | `<Sequential name=sequential_3, built=True>` | 7065.829164 | 9448.90008 | 6546.643983 | 9031.121297 |
| 9 | Bidirectional_LSTM_High_60 | `<Sequential name=sequential_7, built=True>` | 7035.757016 | 9438.030763 | 6554.360588 | 8930.223338 |


## Key Findings from Evaluation

The following table summarizes the performance of each model for predicting the 'High' price of the NIFTY 50 index across different time windows. The models are evaluated based on Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) on both the training and test sets.

In [8]:
display(results_high_df.sort_values(by='test_mae', ascending=True))

|    | Model | model | train_mae | train_rmse | test_mae | test_rmse |
|----|-------|-------|-----------|------------|----------|-----------|
| 10 | KNN_High_90 | `KNeighborsRegressor()` | 36.92914 | 58.769966 | 48.790144 | 74.204848 |
| 5 | KNN_High_60 | `KNeighborsRegressor()` | 36.850899 | 58.333589 | 52.819089 | 82.674612 |
| 0 | KNN_High_30 | `KNeighborsRegressor()` | 42.646429 | 68.669169 | 56.933037 | 89.679421 |
| 1 | RNN_High_30 | `<Sequential name=sequential, built=True>` | 6390.529515 | 8844.565414 | 5893.23049 | 8448.009992 |
| 6 | RNN_High_60 | `<Sequential name=sequential_4, built=True>` | 6418.710871 | 8877.674712 | 5963.325271 | 8380.258712 |
| 11 | RNN_High_90 | `<Sequential name=sequential_8, built=True>` | 6434.073696 | 8882.0126 | 6156.608613 | 8599.038758 |
| 2 | GRU_High_30 | `<Sequential name=sequential_1, built=True>` | 6929.283272 | 9345.664401 | 6410.650098 | 8931.117664 |
| 7 | GRU_High_60 | `<Sequential name=sequential_5, built=True>` | 6960.433434 | 9380.668503 | 6478.858361 | 8873.764212 |
| 4 | Bidirectional_LSTM_High_30 | `<Sequential name=sequential_3, built=True>` | 7065.829164 | 9448.90008 | 6546.643983 | 9031.121297 |
| 9 | Bidirectional_LSTM_High_60 | `<Sequential name=sequential_7, built=True>` | 7035.757016 | 9438.030763 | 6554.360588 | 8930.223338 |


**Observations:**

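*   The **KNN** model has far lower test MAE and RMSE than the deep learning models (RNN, GRU, LSTM, Bidirectional LSTM) for all time windows. KNN test MAE ranges from 48.79 to 56.93, while the deep learning rows in the table above have test MAE between roughly 5,893 and 6,554, about two orders of magnitude higher.
*   Among the KNN models, the **90-day time window** performs best, with the lowest test MAE (48.79) and test RMSE (74.20).
*   The deep learning models vary somewhat among themselves, but all have errors in the thousands, and their training errors are as high as their test errors. This points to models that failed to fit the data at all rather than models that overfit. They are likely not well-tuned for this task with the current configuration and dataset; in particular, the networks receive raw, unscaled prices as inputs and targets.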
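
One plausible explanation, stated here as an assumption that this notebook does not verify, is the lack of scaling: `tanh` units saturate for inputs far outside roughly [-1, 1], and the `Dense(1)` head has to output values in the thousands. The sketch below shows min-max scaling with statistics taken from the training data only. The helpers `fit_minmax`, `scale`, and `unscale` and the demo values are hypothetical.

```python
import numpy as np

def fit_minmax(train_values):
    """Return (lo, hi) computed from training data only, to avoid leakage."""
    lo, hi = float(np.min(train_values)), float(np.max(train_values))
    if hi == lo:
        raise ValueError("training values are constant; cannot scale")
    return lo, hi

def scale(values, lo, hi):
    """Map values so that the training range becomes [0, 1]."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

def unscale(values, lo, hi):
    """Invert `scale`, returning values in original price units."""
    return np.asarray(values, dtype=float) * (hi - lo) + lo

# Made-up windows and targets in a typical index-price range
X_train_demo = np.array([[8000.0, 8100.0, 8200.0],
                         [8100.0, 8200.0, 8300.0]])
y_train_demo = np.array([8300.0, 8400.0])

lo, hi = fit_minmax(np.concatenate([X_train_demo.ravel(), y_train_demo]))
X_scaled = scale(X_train_demo, lo, hi)   # inputs for the network
y_scaled = scale(y_train_demo, lo, hi)   # targets for the network
y_back = unscale(y_scaled, lo, hi)       # predictions would be unscaled like this

print(X_scaled.min(), y_scaled.max())
```

If the networks were trained on scaled data, their predictions would need to pass through `unscale` before computing MAE and RMSE, so that the errors stay in index points and remain comparable with the KNN rows.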