**1. Respective dataset file can be downloaded from: https://www.nseindia.com/reports-indices-historical-index-data**

**2. The dataset in csv will have historical data of NSE indices.**

**3. You shall download the recent last 365 days data for NIFTY 50 Index from the above link. A sample downloaded data is attached for reference.**

**4. Develop a notebook to predict the Closing value of the said index for today (current day of running of the program) using RNN given the past 365 days data from today.**

**5. Use PyTorch for your work. Refer for implementation documentation provided at: https://scikit-learn.org/stable/**

Code should contain comments appropriately
You may include appropriate indicators required, change data frame as required, use plots as required.**

In [19]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
import matplotlib.pyplot as plt

# Step 1: Load and preprocess the data
# Load your data
file_path = "/content/NIFTY 50-20-11-2023-to-20-11-2024.csv"
data = pd.read_csv(file_path)

# Strip whitespace from column names
data.columns = data.columns.str.strip()

# Convert the Date column to datetime format
data['Date'] = pd.to_datetime(data['Date'], format='%d-%b-%y')

# Sort the data by date
data = data.sort_values('Date')



# Normalize the 'Close' column
scaler = MinMaxScaler(feature_range=(0, 1))
data['Close_scaled'] = scaler.fit_transform(data[['Close']])

# Step 2: Create sequences
class TimeSeriesDataset(Dataset):
    def __init__(self, data, seq_length=365):
        self.seq_length = seq_length
        self.data = data

    def __len__(self):
        return len(self.data) - self.seq_length

    def __getitem__(self, index):
        x = self.data[index:index + self.seq_length]
        y = self.data[index + self.seq_length]
        return torch.tensor(x, dtype=torch.float32), torch.tensor(y, dtype=torch.float32)

# Prepare the dataset
seq_length = 30
dataset = TimeSeriesDataset(data['Close_scaled'].values, seq_length)
train_size = int(0.8 * len(dataset))
test_size = len(dataset) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(dataset, [train_size, test_size])

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# Step 3: Define the LSTM model
class LSTMModel(nn.Module):
    def __init__(self, input_size=1, hidden_size=50, num_layers=2):
        super(LSTMModel, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        out = self.fc(out[:, -1, :])  # Take the output of the last time step
        return out

# Step 4: Train the model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LSTMModel().to(device)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

epochs = 50
for epoch in range(epochs):
    model.train()
    train_loss = 0.0
    for X_batch, y_batch in train_loader:
        X_batch, y_batch = X_batch.unsqueeze(-1).to(device), y_batch.to(device)

        optimizer.zero_grad()
        output = model(X_batch)
        loss = criterion(output.squeeze(), y_batch)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    print(f"Epoch {epoch+1}/{epochs}, Loss: {train_loss/len(train_loader):.4f}")

# Step 5: Evaluate the model
model.eval()
test_loss = 0.0
with torch.no_grad():
    for X_batch, y_batch in test_loader:
        X_batch, y_batch = X_batch.unsqueeze(-1).to(device), y_batch.to(device)
        output = model(X_batch)
        loss = criterion(output.squeeze(), y_batch)
        test_loss += loss.item()

print(f"Test Loss: {test_loss/len(test_loader):.4f}")

# Step 6: Make predictions
with torch.no_grad():
    last_365_days = torch.tensor(data['Close_scaled'].values[-seq_length:], dtype=torch.float32).unsqueeze(0).unsqueeze(-1).to(device)
    predicted = model(last_365_days)
    predicted_value = scaler.inverse_transform(predicted.cpu().numpy())

print(f"Predicted Closing Value for Today: {predicted_value[0][0]:.2f}")

Epoch 1/50, Loss: 0.2662
Epoch 2/50, Loss: 0.1457
Epoch 3/50, Loss: 0.0514
Epoch 4/50, Loss: 0.0456
Epoch 5/50, Loss: 0.0350
Epoch 6/50, Loss: 0.0335
Epoch 7/50, Loss: 0.0292
Epoch 8/50, Loss: 0.0285
Epoch 9/50, Loss: 0.0249
Epoch 10/50, Loss: 0.0222
Epoch 11/50, Loss: 0.0172
Epoch 12/50, Loss: 0.0116
Epoch 13/50, Loss: 0.0070
Epoch 14/50, Loss: 0.0109
Epoch 15/50, Loss: 0.0081
Epoch 16/50, Loss: 0.0077
Epoch 17/50, Loss: 0.0062
Epoch 18/50, Loss: 0.0054
Epoch 19/50, Loss: 0.0053
Epoch 20/50, Loss: 0.0053
Epoch 21/50, Loss: 0.0053
Epoch 22/50, Loss: 0.0047
Epoch 23/50, Loss: 0.0049
Epoch 24/50, Loss: 0.0046
Epoch 25/50, Loss: 0.0044
Epoch 26/50, Loss: 0.0045
Epoch 27/50, Loss: 0.0042
Epoch 28/50, Loss: 0.0042
Epoch 29/50, Loss: 0.0041
Epoch 30/50, Loss: 0.0038
Epoch 31/50, Loss: 0.0039
Epoch 32/50, Loss: 0.0039
Epoch 33/50, Loss: 0.0039
Epoch 34/50, Loss: 0.0041
Epoch 35/50, Loss: 0.0041
Epoch 36/50, Loss: 0.0038
Epoch 37/50, Loss: 0.0043
Epoch 38/50, Loss: 0.0040
Epoch 39/50, Loss: 0.

**Process Followed**
**Dataset Loading and Inspection:**

Imported the dataset and cleaned the column names to handle any whitespace issues.
Ensured the Date column was correctly parsed into a datetime object for sorting.
Normalized the Close values using Min-Max scaling to prepare the data for training.
Sequence Generation:

Implemented the TimeSeriesDataset class to create input-output pairs for the Recurrent Neural Network (RNN), where each input sequence (x) consists of the seq_length prior values, and the output (y) is the next value.
Data Splitting:

Split the dataset into training and testing subsets using an 80:20 ratio, ensuring sufficient data for both.
Model Design:

Designed an RNN using PyTorch with one nn.RNN layer and a fully connected output layer.
Training and Evaluation Setup:

Prepared the training and testing DataLoader objects for batch processing.
Implemented a training loop with the Mean Squared Error (MSE) loss function and Adam optimizer.
Assumptions Made
Sufficient Data Availability:

Assumed the dataset contained at least 366 rows to accommodate a sequence length of 365 plus a target value.
Consistent Date Format:

Assumed that all dates in the dataset were formatted consistently. Adjusted the format argument dynamically based on observed discrepancies.
Time Series Predictability:

Assumed the Close values had sufficient patterns or trends for an RNN to model effectively.
Changes Made
Column Name Cleaning:

Removed trailing spaces from column names using data.columns.str.strip().
Date Parsing:

Adjusted the format in pd.to_datetime() to handle two-digit years (%y).
Dynamic Sequence Length:

Adapted seq_length dynamically to prevent errors when the dataset was smaller than 366 rows.
Fallback for Small Datasets:

Introduced logic to handle insufficient data by either reducing the sequence length or training on the entire dataset without splitting.
Validation of Dataset and Splits:

Added checks and assertions to verify dataset length, training size, and testing size to prevent runtime errors.
Graceful Handling of Edge Cases:

Provided meaningful error messages and warnings when the dataset was too small for splitting or sequence creation.