# Assignment: LSTM Model for Time Series Data

In this assignment, you will implement and train an LSTM model on time series data using PyTorch.
The dataset contains acceleration intensity time series along with labels indicating vascular or heart conditions.
Your task is to complete the provided code by filling in the sections marked `TODO`. The goal is to practice
data preprocessing, model implementation, training, and evaluation in a deep learning framework.

In [None]:
import pandas as pd
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split
import os

## 1. Data Preparation and Standardization
Complete the `load_time_series(subjectID)` function to standardize the input time series data.

In [None]:
def load_time_series(subjectID):
    filename = f'{subjectID}_90004_0_0.csv'
    filepath = os.path.join('/content/drive/MyDrive/UNC/Lab_BIG-S2/DataDownload/UKBdownload/Acceleration_intensity_time_series_field90004/file_120960_rows_1000files/', filename)
    series = pd.read_csv(filepath, usecols=[0])
    series = series.iloc[:, 0]

    # TODO: Implement standardization (subtract mean and divide by standard deviation)

    ##################################################################################

    series = series.fillna(0).astype(np.float32).to_numpy()
    return series
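
As a point of comparison for the standardization step, here is a minimal sketch of z-scoring a pandas series. The helper name `standardize` and the synthetic values are assumptions made for illustration, and the zero-variance guard is a design choice of this sketch, not a requirement of the assignment.

```python
import numpy as np
import pandas as pd

def standardize(series: pd.Series) -> pd.Series:
    """Z-score a series: subtract the mean, then divide by the standard deviation."""
    mean = series.mean()   # pandas skips NaN values by default
    std = series.std()     # sample standard deviation (ddof=1)
    if std == 0 or np.isnan(std):
        # Constant or empty series: skip the division (a choice made for this sketch)
        return series - mean
    return (series - mean) / std

# Synthetic stand-in for one subject's series (not real data)
example = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])
standardized = standardize(example)
```

Because `load_time_series` calls `fillna(0)` after the TODO block, any missing values end up at 0, which is the mean of a standardized series.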

## 2. Dataset Class
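In the `TimeSeriesDataset` class, fill in the missing code in `__getitem__` to split the loaded time series into sequences. Store the result in a variable named `sequences`, which the `return` statement converts to a tensor.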

In [None]:
class TimeSeriesDataset(Dataset):
    def __init__(self, dataframe, sequence_length=945, num_sequences=128):
        self.dataframe = dataframe
        self.num_sequences = num_sequences
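        self.sequence_length = sequence_length

    def __len__(self):
        # Required by DataLoader: one sample per row (subject) in the dataframe
        return len(self.dataframe)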

    def __getitem__(self, idx):
        row = self.dataframe.iloc[idx]
        subjectID, gender = row['subjectID'], row['gender']
        data = load_time_series(subjectID)

        # TODO: Split data into sequences of length `sequence_length`

        #############################################################

        return torch.tensor(sequences, dtype=torch.float32), torch.tensor(gender, dtype=torch.long), idx
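
One possible way to split a 1-D array into fixed-length sequences is a plain reshape, sketched below on synthetic data. The helper name `split_into_sequences` is hypothetical. The data directory name suggests 120960 rows per file, and 128 × 945 = 120960, so the default arguments would cover one series exactly; that reading of the path is an assumption.

```python
import numpy as np

def split_into_sequences(data, sequence_length, num_sequences):
    """Reshape a 1-D array to (num_sequences, sequence_length), dropping leftover samples."""
    needed = sequence_length * num_sequences
    if len(data) < needed:
        raise ValueError(f"need at least {needed} samples, got {len(data)}")
    # Consecutive, non-overlapping windows: row i holds samples i*L .. (i+1)*L - 1
    return data[:needed].reshape(num_sequences, sequence_length)

# Synthetic series with the assumed length of 120960 samples
data = np.arange(120960, dtype=np.float32)
sequences = split_into_sequences(data, sequence_length=945, num_sequences=128)
```

Other designs, such as overlapping or randomly sampled windows, are equally valid; the reshape is simply the shortest version.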

## 3. LSTM Model Training
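Implement the training loop in the `train_model` function. Your loop should update `total_loss`, `correct_train`, and `total_train`, which are used to report the loss and accuracy at the end of each epoch.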

In [None]:
def train_model(model, train_loader, criterion, optimizer, num_epochs=150):
    for epoch in range(num_epochs):
        model.train()
        total_loss = 0
        correct_train = 0
        total_train = 0

        # TODO: Complete the training loop

        ##################################

        train_accuracy = 100 * correct_train / total_train
        print(f'Epoch {epoch+1}, Loss: {total_loss/len(train_loader):.4f}, Train Accuracy: {train_accuracy:.2f}%')
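
The body of a training loop usually follows the pattern zero gradients, forward pass, loss, backward pass, optimizer step. The sketch below shows one epoch of that pattern on synthetic data. The assignment does not specify the model, so `TinyLSTM` and `train_one_epoch` are hypothetical names, and treating each sample as `num_sequences` time steps of `sequence_length` features is an assumption about the intended tensor layout.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

class TinyLSTM(nn.Module):
    """Hypothetical classifier: LSTM over the time axis, linear layer on the last hidden state."""
    def __init__(self, input_size, hidden_size=16, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        _, (h_n, _) = self.lstm(x)   # h_n: (num_layers, batch, hidden_size)
        return self.fc(h_n[-1])      # logits: (batch, num_classes)

def train_one_epoch(model, loader, criterion, optimizer):
    model.train()
    total_loss, correct, total = 0.0, 0, 0
    for sequences, labels, _ in loader:   # third item is the sample index
        optimizer.zero_grad()             # clear gradients from the previous step
        logits = model(sequences)
        loss = criterion(logits, labels)
        loss.backward()                   # backpropagate
        optimizer.step()                  # update the weights
        total_loss += loss.item()
        correct += (logits.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    return total_loss / len(loader), 100 * correct / total

torch.manual_seed(0)
# Synthetic data: 8 samples, each with 4 time steps of 5 features
x = torch.randn(8, 4, 5)
y = torch.randint(0, 2, (8,))
loader = DataLoader(TensorDataset(x, y, torch.arange(8)), batch_size=4)
model = TinyLSTM(input_size=5)
avg_loss, accuracy = train_one_epoch(
    model, loader, nn.CrossEntropyLoss(), optim.Adam(model.parameters(), lr=1e-3)
)
```

The three-way unpacking mirrors `TimeSeriesDataset.__getitem__`, which returns the sequences, the label, and the index.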

## 4. Evaluation
Complete the `evaluate_model` function to calculate the average loss and accuracy.

In [None]:
def evaluate_model(model, test_loader, criterion):
    model.eval()
    total_loss = 0
    total_test = 0
    correct_test = 0

    # TODO: Complete the evaluation logic

    #####################################

    avg_loss = total_loss / len(test_loader)
    test_accuracy = 100 * correct_test / total_test
    print(f'Evaluate Loss: {avg_loss:.4f}, Test Accuracy: {test_accuracy:.2f}%')
    return avg_loss, test_accuracy
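
Evaluation mirrors the training loop without the backward pass or optimizer step, and is typically wrapped in `torch.no_grad()` so no gradients are tracked. The sketch below is a minimal version on synthetic data; the function name `evaluate` and the flatten-plus-linear `stand_in` model are assumptions used only to keep the example self-contained.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion):
    model.eval()                          # switch layers such as dropout to inference mode
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():                 # gradients are not needed for evaluation
        for sequences, labels, _ in loader:
            logits = model(sequences)
            total_loss += criterion(logits, labels).item()
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    # Mean of per-batch losses, matching total_loss / len(test_loader) in the assignment
    return total_loss / len(loader), 100 * correct / total

torch.manual_seed(0)
# Synthetic data: 6 samples, each with 4 time steps of 5 features
x = torch.randn(6, 4, 5)
y = torch.randint(0, 2, (6,))
loader = DataLoader(TensorDataset(x, y, torch.arange(6)), batch_size=3)
stand_in = nn.Sequential(nn.Flatten(), nn.Linear(20, 2))   # placeholder, not an LSTM
avg_loss, accuracy = evaluate(stand_in, loader, nn.CrossEntropyLoss())
```

Averaging per-batch losses gives a smaller final batch the same weight as a full one; weighting each batch by its size is an alternative if exact per-sample averages matter.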

## Additional Questions
- What impact does the number of sequences (`num_sequences`) have on the performance of the LSTM model?
- How does changing the learning rate affect the model's convergence during training?
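
For the learning-rate question, a toy experiment can build intuition before running the full model. The sketch below applies plain gradient descent to f(w) = w², which is an assumption chosen for simplicity and not the LSTM loss; the function name and the three learning rates are illustrative.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 with fixed-step gradient descent and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2 * w        # derivative of w**2
        w = w - lr * grad   # each step multiplies w by (1 - 2 * lr)
    return w

# Small, moderate, and too-large learning rates
final = {lr: gradient_descent(lr) for lr in (0.01, 0.4, 1.1)}
```

With a small rate the iterate approaches the minimum slowly, with a moderate rate it gets there quickly, and with a rate above 1 the magnitude of `w` grows at every step. Real LSTM training is noisier, but the same qualitative trade-off is a useful starting hypothesis.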