
### Assignment Documentation

This assignment focuses on building and evaluating models that predict the **High** price of the NIFTY 50 index.

Use the following models and time windows:

*   **Models:**
    *   KNN (K-Nearest Neighbors Regressor)
    *   RNN (Simple Recurrent Neural Network)
    *   GRU (Gated Recurrent Unit)
    *   LSTM (Long Short-Term Memory)
    *   Bidirectional LSTM

*   **Time Windows (Input Days):**
    *   30 days
    *   60 days
    *   90 days

For the Deep Learning models (RNN, GRU, LSTM, Bidirectional LSTM), train them for **50 epochs**.

Train each of these models on the 'High' column for every time window, then compare them using mean absolute error (MAE) and root mean squared error (RMSE).
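As a quick reference for the two metrics, here is a minimal NumPy sketch of MAE and RMSE. The helper names `mae` and `rmse` and the sample prices are hypothetical, chosen only for illustration; the notebook itself computes both metrics with scikit-learn.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: the average of |y_true - y_pred|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the average squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical prices, not taken from the dataset.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 120.0, 140.0]

mae_value = mae(actual, predicted)    # errors are 2, 2, 0, 10
rmse_value = rmse(actual, predicted)  # squared errors are 4, 4, 0, 100
print(mae_value, rmse_value)
```

Because RMSE squares each error before averaging, the single 10-point miss raises it well above the MAE. The same effect explains why every RMSE reported in this notebook is larger than the matching MAE.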

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, GRU, LSTM, Bidirectional, Dense, Dropout

In [2]:
# Load the dataset
df = pd.read_csv('/content/data.csv')

# Display the first few rows
display(df.head())

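| Unnamed: 0 | Date       | Open    | High    | Low     | Close  |
|------------|------------|---------|---------|---------|--------|
| 0          | 2000-01-03 | 1482.15 | 1592.9  | 1482.15 | 1592.2 |
| 1          | 2000-01-04 | 1594.4  | 1641.95 | 1594.4  | 1638.7 |
| 2          | 2000-01-05 | 1634.55 | 1635.5  | 1555.05 | 1595.8 |
| 3          | 2000-01-06 | 1595.8  | 1639.0  | 1595.8  | 1617.6 |
| 4          | 2000-01-07 | 1616.6  | 1628.25 | 1597.2  | 1613.3 |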


## Data preprocessing



In [3]:
# 1. Select the 'High' column
high_prices = df['High']

# 2. Initialize a MinMaxScaler
scaler = MinMaxScaler()

# 3. Reshape the high_prices data
high_prices_reshaped = high_prices.values.reshape(-1, 1)

# 4. Fit and transform the data
scaled_high_prices = scaler.fit_transform(high_prices_reshaped)

# 5. Define a function to create time windowed sequences
def create_sequences(data, window_size):
    X, y = [], []
    for i in range(len(data) - window_size):
        X.append(data[i:(i + window_size), 0])
        y.append(data[i + window_size, 0])
    return np.array(X), np.array(y)

# 6. Apply the function for each time window
window_sizes = [30, 60, 90]
X_30, y_30 = create_sequences(scaled_high_prices, window_sizes[0])
X_60, y_60 = create_sequences(scaled_high_prices, window_sizes[1])
X_90, y_90 = create_sequences(scaled_high_prices, window_sizes[2])

# 7. Reshape input data for deep learning models
X_30_reshaped = X_30.reshape((X_30.shape[0], X_30.shape[1], 1))
X_60_reshaped = X_60.reshape((X_60.shape[0], X_60.shape[1], 1))
X_90_reshaped = X_90.reshape((X_90.shape[0], X_90.shape[1], 1))

print(f"Shape of X_30_reshaped: {X_30_reshaped.shape}")
print(f"Shape of y_30: {y_30.shape}")
print(f"Shape of X_60_reshaped: {X_60_reshaped.shape}")
print(f"Shape of y_60: {y_60.shape}")
print(f"Shape of X_90_reshaped: {X_90_reshaped.shape}")
print(f"Shape of y_90: {y_90.shape}")

Shape of X_30_reshaped: (6285, 30, 1)
Shape of y_30: (6285,)
Shape of X_60_reshaped: (6255, 60, 1)
Shape of y_60: (6255,)
Shape of X_90_reshaped: (6225, 90, 1)
Shape of y_90: (6225,)
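To make the windowing step concrete, the sketch below runs the same `create_sequences` logic on a tiny hypothetical series (the values 0 to 5, not market data). Each row of `X_toy` holds `window_size` consecutive values, and the matching entry of `y_toy` is the value that immediately follows.

```python
import numpy as np

def create_sequences(data, window_size):
    """Slide a window over a (n, 1) array; the target is the next value."""
    X, y = [], []
    for i in range(len(data) - window_size):
        X.append(data[i:i + window_size, 0])
        y.append(data[i + window_size, 0])
    return np.array(X), np.array(y)

# Hypothetical toy series with 6 observations.
toy = np.arange(6, dtype=float).reshape(-1, 1)
X_toy, y_toy = create_sequences(toy, window_size=3)

print(X_toy)  # rows: [0, 1, 2], [1, 2, 3], [2, 3, 4]
print(y_toy)  # next values: 3, 4, 5
```

A series of n observations yields n - window_size samples. That matches the shapes printed above: 6285 + 30, 6255 + 60, and 6225 + 90 each equal 6315 rows.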


## KNN model



In [4]:
def train_and_evaluate_knn(X, y, scaler, window_size):
    """Trains and evaluates a KNN Regressor for a given time window."""
    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Initialize a KNeighborsRegressor
    knn_model = KNeighborsRegressor(n_neighbors=5)

    # Train the KNN model
    knn_model.fit(X_train, y_train)

    # Make predictions on the testing data
    y_pred_scaled = knn_model.predict(X_test)

    # Inverse transform the predictions and actual test values
    y_test_original = scaler.inverse_transform(y_test.reshape(-1, 1))
    y_pred_original = scaler.inverse_transform(y_pred_scaled.reshape(-1, 1))

    # Calculate MAE and RMSE
    mae = mean_absolute_error(y_test_original, y_pred_original)
    rmse = np.sqrt(mean_squared_error(y_test_original, y_pred_original))

    # Print the results
    print(f"KNN Regressor with window size {window_size}:")
    print(f"  MAE: {mae:.4f}")
    print(f"  RMSE: {rmse:.4f}")

# Call the function for each time window size
train_and_evaluate_knn(X_30, y_30, scaler, 30)
train_and_evaluate_knn(X_60, y_60, scaler, 60)
train_and_evaluate_knn(X_90, y_90, scaler, 90)

KNN Regressor with window size 30:
  MAE: 63.0793
  RMSE: 100.9408
KNN Regressor with window size 60:
  MAE: 57.7650
  RMSE: 93.6251
KNN Regressor with window size 90:
  MAE: 50.4311
  RMSE: 78.6860
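For intuition about what the regressor does at prediction time, here is a minimal NumPy sketch of k-nearest-neighbours regression. It assumes Euclidean distance and an unweighted mean of the neighbours' targets, which correspond to the `KNeighborsRegressor` defaults. The function name `knn_predict` and the toy data are hypothetical, and this is not scikit-learn's actual implementation.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5):
    """Predict by averaging the targets of the k closest training rows."""
    distances = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distance to each row
    nearest = np.argsort(distances)[:k]                    # indices of the k smallest distances
    return float(np.mean(y_train[nearest]))

# Hypothetical training windows (2 values each) and their next-step targets.
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [10.0, 10.0]])
y_train = np.array([0.0, 1.0, 2.0, 10.0])

prediction = knn_predict(X_train, y_train, np.array([1.0, 1.0]), k=3)
print(prediction)  # mean of the targets 1, 0 and 2
```

The distant row `[10, 10]` is ignored because only the three closest windows contribute to the average.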


## RNN model



In [5]:
def train_and_evaluate_simple_rnn(input_shape, X_train, y_train, X_test, y_test, scaler, window_size):
    """Builds, trains, and evaluates a Simple RNN model."""
    # Build the Simple RNN model
    model = Sequential()
    model.add(tf.keras.Input(shape=input_shape))  # declare the input shape explicitly
    model.add(SimpleRNN(units=50, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(SimpleRNN(units=50))
    model.add(Dropout(0.2))
    model.add(Dense(units=1))

    # Compile the model
    model.compile(optimizer='adam', loss='mean_squared_error')

    # Train the model
    history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2, verbose=0)

    # Evaluate the model
    loss = model.evaluate(X_test, y_test, verbose=0)
    print(f"Simple RNN with window size {window_size} - Test Loss (MSE): {loss:.4f}")

    # Make predictions
    y_pred_scaled = model.predict(X_test)

    # Inverse transform the predictions and actual test values
    y_test_original = scaler.inverse_transform(y_test.reshape(-1, 1))
    y_pred_original = scaler.inverse_transform(y_pred_scaled)

    # Calculate MAE and RMSE
    mae = mean_absolute_error(y_test_original, y_pred_original)
    rmse = np.sqrt(mean_squared_error(y_test_original, y_pred_original))

    # Print the results
    print(f"Simple RNN with window size {window_size}:")
    print(f"  MAE: {mae:.4f}")
    print(f"  RMSE: {rmse:.4f}")

# Split data for each window size
X_30_train, X_30_test, y_30_train, y_30_test = train_test_split(X_30_reshaped, y_30, test_size=0.2, random_state=42)
X_60_train, X_60_test, y_60_train, y_60_test = train_test_split(X_60_reshaped, y_60, test_size=0.2, random_state=42)
X_90_train, X_90_test, y_90_train, y_90_test = train_test_split(X_90_reshaped, y_90, test_size=0.2, random_state=42)


# Call the function for each time window size
train_and_evaluate_simple_rnn((X_30_reshaped.shape[1], 1), X_30_train, y_30_train, X_30_test, y_30_test, scaler, 30)
train_and_evaluate_simple_rnn((X_60_reshaped.shape[1], 1), X_60_train, y_60_train, X_60_test, y_60_test, scaler, 60)
train_and_evaluate_simple_rnn((X_90_reshaped.shape[1], 1), X_90_train, y_90_train, X_90_test, y_90_test, scaler, 90)

Simple RNN with window size 30 - Test Loss (MSE): 0.0002
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step
Simple RNN with window size 30:
  MAE: 204.3943
  RMSE: 315.2109


Simple RNN with window size 60 - Test Loss (MSE): 0.0000
40/40 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step
Simple RNN with window size 60:
  MAE: 134.7820
  RMSE: 176.2921


Simple RNN with window size 90 - Test Loss (MSE): 0.0000
39/39 ━━━━━━━━━━━━━━━━━━━━ 1s 10ms/step
Simple RNN with window size 90:
  MAE: 81.0846
  RMSE: 130.7446
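The test loss above is measured on min-max scaled prices, while MAE and RMSE are reported in index points, so the two sets of numbers look very different. Min-max scaling divides every error by (max - min), which means the RMSE in original units equals the square root of the scaled MSE multiplied by that range. The sketch below checks the relationship on hypothetical prices and predictions. The scaling is done by hand rather than with `MinMaxScaler`, and none of the values come from the dataset.

```python
import numpy as np

# Hypothetical actual prices and model predictions in original units.
prices = np.array([1000.0, 1500.0, 2000.0, 3000.0])
preds = np.array([1100.0, 1400.0, 2100.0, 2800.0])

lo, hi = prices.min(), prices.max()

def scale(values):
    """Min-max scaling to [0, 1], using the range of the actual prices."""
    return (values - lo) / (hi - lo)

mse_scaled = np.mean((scale(prices) - scale(preds)) ** 2)  # what the Keras loss measures
rmse_original = np.sqrt(np.mean((prices - preds) ** 2))    # what the notebook reports as RMSE
rmse_from_scaled = np.sqrt(mse_scaled) * (hi - lo)         # convert the loss back to price units

print(f"{mse_scaled:.4f}", rmse_original, rmse_from_scaled)
```

A small scaled loss can therefore still correspond to an error of hundreds of index points when the price range is large. The `:.4f` format also hides detail: a loss printed as 0.0000 only means the scaled MSE is below 0.00005.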


## GRU model



In [6]:
def train_and_evaluate_gru(input_shape, X_train, y_train, X_test, y_test, scaler, window_size):
    """Builds, trains, and evaluates a GRU model."""
    # Build the GRU model
    model = Sequential()
    model.add(tf.keras.Input(shape=input_shape))  # declare the input shape explicitly
    model.add(GRU(units=50, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(GRU(units=50))
    model.add(Dropout(0.2))
    model.add(Dense(units=1))

    # Compile the model
    model.compile(optimizer='adam', loss='mean_squared_error')

    # Train the model
    history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2, verbose=0)

    # Evaluate the model
    loss = model.evaluate(X_test, y_test, verbose=0)
    print(f"GRU with window size {window_size} - Test Loss (MSE): {loss:.4f}")

    # Make predictions
    y_pred_scaled = model.predict(X_test)

    # Inverse transform the predictions and actual test values
    y_test_original = scaler.inverse_transform(y_test.reshape(-1, 1))
    y_pred_original = scaler.inverse_transform(y_pred_scaled)

    # Calculate MAE and RMSE
    mae = mean_absolute_error(y_test_original, y_pred_original)
    rmse = np.sqrt(mean_squared_error(y_test_original, y_pred_original))

    # Print the results
    print(f"GRU with window size {window_size}:")
    print(f"  MAE: {mae:.4f}")
    print(f"  RMSE: {rmse:.4f}")

# Call the function for each time window size
train_and_evaluate_gru((X_30_reshaped.shape[1], 1), X_30_train, y_30_train, X_30_test, y_30_test, scaler, 30)
train_and_evaluate_gru((X_60_reshaped.shape[1], 1), X_60_train, y_60_train, X_60_test, y_60_test, scaler, 60)
train_and_evaluate_gru((X_90_reshaped.shape[1], 1), X_90_train, y_90_train, X_90_test, y_90_test, scaler, 90)

GRU with window size 30 - Test Loss (MSE): 0.0002
40/40 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step
GRU with window size 30:
  MAE: 231.5595
  RMSE: 328.4980


GRU with window size 60 - Test Loss (MSE): 0.0001
40/40 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step
GRU with window size 60:
  MAE: 177.1460
  RMSE: 278.5381


GRU with window size 90 - Test Loss (MSE): 0.0001
39/39 ━━━━━━━━━━━━━━━━━━━━ 1s 19ms/step
GRU with window size 90:
  MAE: 153.3246
  RMSE: 208.2133


## LSTM model


In [7]:
def train_and_evaluate_lstm(input_shape, X_train, y_train, X_test, y_test, scaler, window_size):
    """Builds, trains, and evaluates an LSTM model."""
    # Build the LSTM model
    model = Sequential()
    model.add(tf.keras.Input(shape=input_shape))  # declare the input shape explicitly
    model.add(LSTM(units=50, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(units=50))
    model.add(Dropout(0.2))
    model.add(Dense(units=1))

    # Compile the model
    model.compile(optimizer='adam', loss='mean_squared_error')

    # Train the model
    history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2, verbose=0)

    # Evaluate the model
    loss = model.evaluate(X_test, y_test, verbose=0)
    print(f"LSTM with window size {window_size} - Test Loss (MSE): {loss:.4f}")

    # Make predictions
    y_pred_scaled = model.predict(X_test)

    # Inverse transform the predictions and actual test values
    y_test_original = scaler.inverse_transform(y_test.reshape(-1, 1))
    y_pred_original = scaler.inverse_transform(y_pred_scaled)

    # Calculate MAE and RMSE
    mae = mean_absolute_error(y_test_original, y_pred_original)
    rmse = np.sqrt(mean_squared_error(y_test_original, y_pred_original))

    # Print the results
    print(f"LSTM with window size {window_size}:")
    print(f"  MAE: {mae:.4f}")
    print(f"  RMSE: {rmse:.4f}")

# Call the function for each time window size
train_and_evaluate_lstm((X_30_reshaped.shape[1], 1), X_30_train, y_30_train, X_30_test, y_30_test, scaler, 30)
train_and_evaluate_lstm((X_60_reshaped.shape[1], 1), X_60_train, y_60_train, X_60_test, y_60_test, scaler, 60)
train_and_evaluate_lstm((X_90_reshaped.shape[1], 1), X_90_train, y_90_train, X_90_test, y_90_test, scaler, 90)

LSTM with window size 30 - Test Loss (MSE): 0.0004
40/40 ━━━━━━━━━━━━━━━━━━━━ 1s 10ms/step
LSTM with window size 30:
  MAE: 353.2603
  RMSE: 509.9431


LSTM with window size 60 - Test Loss (MSE): 0.0001
40/40 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step
LSTM with window size 60:
  MAE: 210.4903
  RMSE: 310.6218


LSTM with window size 90 - Test Loss (MSE): 0.0001
39/39 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step
LSTM with window size 90:
  MAE: 151.7005
  RMSE: 207.5123


## Bidirectional LSTM model



In [8]:
def train_and_evaluate_bidirectional_lstm(input_shape, X_train, y_train, X_test, y_test, scaler, window_size):
    """Builds, trains, and evaluates a Bidirectional LSTM model."""
    # Build the Bidirectional LSTM model
    model = Sequential()
    model.add(tf.keras.Input(shape=input_shape))  # declare the input shape explicitly
    model.add(Bidirectional(LSTM(units=50, return_sequences=True)))
    model.add(Dropout(0.2))
    model.add(Bidirectional(LSTM(units=50)))
    model.add(Dropout(0.2))
    model.add(Dense(units=1))

    # Compile the model
    model.compile(optimizer='adam', loss='mean_squared_error')

    # Train the model
    history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2, verbose=0)

    # Evaluate the model
    loss = model.evaluate(X_test, y_test, verbose=0)
    print(f"Bidirectional LSTM with window size {window_size} - Test Loss (MSE): {loss:.4f}")

    # Make predictions
    y_pred_scaled = model.predict(X_test)

    # Inverse transform the predictions and actual test values
    y_test_original = scaler.inverse_transform(y_test.reshape(-1, 1))
    y_pred_original = scaler.inverse_transform(y_pred_scaled)

    # Calculate MAE and RMSE
    mae = mean_absolute_error(y_test_original, y_pred_original)
    rmse = np.sqrt(mean_squared_error(y_test_original, y_pred_original))

    # Print the results
    print(f"Bidirectional LSTM with window size {window_size}:")
    print(f"  MAE: {mae:.4f}")
    print(f"  RMSE: {rmse:.4f}")

# Call the function for each time window size
train_and_evaluate_bidirectional_lstm((X_30_reshaped.shape[1], 1), X_30_train, y_30_train, X_30_test, y_30_test, scaler, 30)
train_and_evaluate_bidirectional_lstm((X_60_reshaped.shape[1], 1), X_60_train, y_60_train, X_60_test, y_60_test, scaler, 60)
train_and_evaluate_bidirectional_lstm((X_90_reshaped.shape[1], 1), X_90_train, y_90_train, X_90_test, y_90_test, scaler, 90)

Bidirectional LSTM with window size 30 - Test Loss (MSE): 0.0001
40/40 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step
Bidirectional LSTM with window size 30:
  MAE: 133.4445
  RMSE: 201.3737


Bidirectional LSTM with window size 60 - Test Loss (MSE): 0.0001
40/40 ━━━━━━━━━━━━━━━━━━━━ 1s 24ms/step
Bidirectional LSTM with window size 60:
  MAE: 191.2091
  RMSE: 284.7557


Bidirectional LSTM with window size 90 - Test Loss (MSE): 0.0001
39/39 ━━━━━━━━━━━━━━━━━━━━ 2s 35ms/step
Bidirectional LSTM with window size 90:
  MAE: 177.5884
  RMSE: 234.2708
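The four recurrent models above share the same layout (two stacked recurrent layers of 50 units with dropout, then a single dense output) and differ only in the cell type. One way to compare their capacity is to count trainable parameters. The sketch below uses the standard per-layer formulas, assuming the Keras defaults `use_bias=True` and, for GRU, `reset_after=True`. The helper names are hypothetical; calling `model.summary()` on the actual models is the authoritative check.

```python
def simple_rnn_params(input_dim, units):
    """One input kernel, one recurrent kernel, and one bias vector."""
    return units * (input_dim + units + 1)

def lstm_params(input_dim, units):
    """Four weight blocks (three gates plus the candidate cell state)."""
    return 4 * units * (input_dim + units + 1)

def gru_params(input_dim, units):
    """Three weight blocks; reset_after=True keeps two bias vectors per block."""
    return 3 * (units * (input_dim + units) + 2 * units)

# First recurrent layer of each model above: 50 units reading 1 feature per time step.
units, input_dim = 50, 1
first_layer = {
    "SimpleRNN": simple_rnn_params(input_dim, units),
    "GRU": gru_params(input_dim, units),
    "LSTM": lstm_params(input_dim, units),
    "Bidirectional LSTM": 2 * lstm_params(input_dim, units),  # forward and backward copies
}

for name, count in first_layer.items():
    print(f"{name}: {count}")
```

Under these assumptions the first layer grows from 2,600 parameters (SimpleRNN) to 20,800 (Bidirectional LSTM), so the gated models have several times more weights to fit in the same 50 epochs.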


## Summary and Analysis

In this notebook, we have performed the following steps:

1. **Data Loading and Preprocessing**: Loaded the NIFTY 50 index data and preprocessed the 'High' price column by scaling it and creating time windowed sequences for 30, 60, and 90 days.
2. **Model Training and Evaluation**: Trained and evaluated five different models (KNN, Simple RNN, GRU, LSTM, and Bidirectional LSTM) for predicting the 'High' price using the three specified time windows. The models were evaluated based on Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).

Here is a summary of the results obtained:

### KNN Regressor

| Window Size | MAE     | RMSE     |
|-------------|---------|----------|
| 30          | 63.0793 | 100.9408 |
| 60          | 57.7650 | 93.6251  |
| 90          | 50.4311 | 78.6860  |

*Analysis:* The KNN model's performance improved as the window size increased, with the 90-day window yielding the lowest MAE and RMSE.

### Simple RNN

| Window Size | MAE      | RMSE     |
|-------------|----------|----------|
| 30          | 204.3943 | 315.2109 |
| 60          | 134.7820 | 176.2921 |
| 90          | 81.0846  | 130.7446 |

*Analysis:* Like KNN, the Simple RNN improved with larger window sizes. Its errors are nonetheless well above KNN's at every window size (for example, an MAE of 81.0846 versus 50.4311 at 90 days).

### GRU

| Window Size | MAE      | RMSE     |
|-------------|----------|----------|
| 30          | 231.5595 | 328.4980 |
| 60          | 177.1460 | 278.5381 |
| 90          | 153.3246 | 208.2133 |

*Analysis:* The GRU also improved as the window size increased, but its MAE and RMSE are higher than those of both KNN and the Simple RNN at every window size.

### LSTM

| Window Size | MAE      | RMSE     |
|-------------|----------|----------|
| 30          | 353.2603 | 509.9431 |
| 60          | 210.4903 | 310.6218 |
| 90          | 151.7005 | 207.5123 |

*Analysis:* The LSTM also performed better with larger window sizes. Its MAE and RMSE are the highest of all models for the 30- and 60-day windows. At 90 days it is roughly on par with the GRU (MAE 151.7005 versus 153.3246).

### Bidirectional LSTM

| Window Size | MAE      | RMSE     |
|-------------|----------|----------|
| 30          | 133.4445 | 201.3737 |
| 60          | 191.2091 | 284.7557 |
| 90          | 177.5884 | 234.2708 |

*Analysis:* The Bidirectional LSTM does not follow the trend seen in the other models. Its best window is 30 days, where it has the lowest MAE and RMSE of the four deep learning models. At 60 and 90 days it performs worse than both the Simple RNN and the GRU.

### Overall Comparison

Based on the MAE and RMSE values, the **KNN Regressor with a 90-day window** appears to be the best performing model among those tested for predicting the 'High' price of the NIFTY 50 index, achieving the lowest MAE (50.4311) and RMSE (78.6860).

Among the deep learning models, the Simple RNN outperformed the GRU and LSTM at every window size, and its 90-day configuration gave the best deep learning result overall (MAE 81.0846, RMSE 130.7446). The Bidirectional LSTM was the strongest deep learning model at 30 days, but its errors were higher at 60 and 90 days.
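The ranking can be double-checked programmatically. The sketch below copies the MAE values from the tables above into a dictionary and picks the lowest-error model per window, first across all models and then across the deep learning models only. The helper `best_model` is a hypothetical name introduced for this illustration.

```python
# MAE values copied from the result tables above, keyed by model and window size.
mae_results = {
    "KNN": {30: 63.0793, 60: 57.7650, 90: 50.4311},
    "Simple RNN": {30: 204.3943, 60: 134.7820, 90: 81.0846},
    "GRU": {30: 231.5595, 60: 177.1460, 90: 153.3246},
    "LSTM": {30: 353.2603, 60: 210.4903, 90: 151.7005},
    "Bidirectional LSTM": {30: 133.4445, 60: 191.2091, 90: 177.5884},
}

def best_model(results, window, exclude=()):
    """Return the name of the model with the lowest error for one window size."""
    candidates = {
        name: scores[window]
        for name, scores in results.items()
        if name not in exclude
    }
    return min(candidates, key=candidates.get)

for window in (30, 60, 90):
    overall = best_model(mae_results, window)
    deep_only = best_model(mae_results, window, exclude=("KNN",))
    print(f"{window} days: best overall = {overall}, best deep learning = {deep_only}")
```

KNN has the lowest MAE at every window size. Among the deep learning models, the Bidirectional LSTM leads at 30 days and the Simple RNN at 60 and 90 days.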

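These results depend on the specific model configurations and hyperparameters used, so further tuning and experimentation could improve all of the models. They also depend on the evaluation setup: `train_test_split` shuffles by default, so the test windows are interleaved with, and overlap, the training windows. This likely favors KNN in particular, because a test window will usually have a near-identical neighbor in the training set, and it does not measure how well any model forecasts unseen future data.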
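One experiment worth trying is a chronological split, since every model in this notebook was evaluated on a randomly shuffled split (`train_test_split` shuffles unless told otherwise). The sketch below is a minimal NumPy version that keeps the earliest 80% of windows for training and the most recent 20% for testing. The function name `chronological_split` and the toy arrays are hypothetical. Results from this kind of split have not been computed here and would differ from the numbers reported above.

```python
import numpy as np

def chronological_split(X, y, test_size=0.2):
    """Hold out the last `test_size` fraction of samples, with no shuffling."""
    n_test = int(round(len(X) * test_size))
    split = len(X) - n_test
    return X[:split], X[split:], y[:split], y[split:]

# Hypothetical stand-in for windowed data: 10 samples in time order.
X_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = np.arange(10, dtype=float)

X_train, X_test, y_train, y_test = chronological_split(X_demo, y_demo)
print(y_train)  # the first 8 samples
print(y_test)   # the last 2 samples
```

Passing `shuffle=False` to `train_test_split` gives the same kind of ordered split without a custom helper. For a stricter evaluation, the scaler could also be fitted on the training portion only, so that the test period's price range does not influence the scaling.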