
In [30]:
# 🔧 Step 1: Import libraries and select the device
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np
import os
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("Using device:", device)


Using device: cpu


In [22]:
DIR = r'/content/synthetic_stock_patterns.csv'

# Task
Explain the error in the provided code for a CNN-LSTM model for stock market prediction, fix it if possible, and incorporate the changes. If not, diagnose the error. The model is overfitting, with accuracy dropping to 16%. The task is to create a PyTorch script for a CNN-LSTM pretraining model for predicting stock market movement (up or down) using the data from "synthetic_stock_patterns.csv". The variable `DIR` holds the path to this file.

## Define the CNN-LSTM model

### Subtask:
Implement the CNN-LSTM model architecture in PyTorch, combining convolutional layers for feature extraction from the time series data and LSTM layers to capture sequential dependencies. The output layer should be designed for binary classification (up or down).


**Reasoning**:
Implement the CNN-LSTM model architecture in PyTorch as instructed, including convolutional layers, pooling, an LSTM layer, and a final fully connected layer for binary classification.
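The sequence length left after the convolution and pooling layers sets the input width of the final linear layer, so it is worth checking by hand. The following is a minimal sketch: the helper name `conv1d_output_length` is hypothetical, and the formula is the floor-mode output-length formula given in the PyTorch `Conv1d` and `MaxPool1d` documentation. The layer settings (kernel sizes, padding, stride, hidden size 128) are the ones used in this notebook.

```python
def conv1d_output_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution or pooling layer in floor mode."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


sequence_length = 60
hidden_size = 128

# Conv1d(kernel_size=3, padding=1) keeps the length unchanged
after_conv = conv1d_output_length(sequence_length, kernel_size=3, padding=1)
# MaxPool1d(kernel_size=2, stride=2) halves it, rounding down
after_pool = conv1d_output_length(after_conv, kernel_size=2, stride=2)
# The flattened LSTM output feeds the fully connected layer
fc_in_features = hidden_size * after_pool

print(after_conv, after_pool, fc_in_features)
```

For odd window lengths the pooling layer drops the last step, which is why integer division by 2 gives the same answer as the formula.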



In [31]:
class CNN_LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes, sequence_length):
        super(CNN_LSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.sequence_length = sequence_length

        # CNN layers
        self.conv1d = nn.Conv1d(input_size, 64, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.maxpool1d = nn.MaxPool1d(kernel_size=2, stride=2)

        # Work out the CNN output shape, which sets the LSTM and FC input sizes
        # Conv1d with kernel_size=3 and padding=1 preserves the sequence length
        # MaxPool1d with kernel_size=2 and stride=2 halves it, rounding down for odd lengths
        cnn_output_size = 64  # number of filters in conv1d
        pooled_sequence_length = self.sequence_length // 2  # matches MaxPool1d's floor behaviour

        self.lstm = nn.LSTM(cnn_output_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size * pooled_sequence_length, num_classes) # LSTM output is (batch_size, seq_len, hidden_size), need to flatten

    def forward(self, x):
        # x shape: (batch_size, sequence_length, input_size)
        # Permute to (batch_size, input_size, sequence_length) for Conv1d
        x = x.permute(0, 2, 1)

        # CNN forward pass
        out = self.conv1d(x)
        out = self.relu(out)
        out = self.maxpool1d(out)

        # Reshape for LSTM layer: (batch_size, pooled_sequence_length, cnn_output_size)
        out = out.permute(0, 2, 1)
        batch_size = out.size(0)

        # LSTM forward pass
        # LSTM input shape: (batch_size, pooled_sequence_length, cnn_output_size)
        # LSTM output shape: (batch_size, pooled_sequence_length, hidden_size)
        # Create the initial states on the input's device instead of relying on the global `device`
        h0 = torch.zeros(self.num_layers, batch_size, self.hidden_size, device=out.device)
        c0 = torch.zeros(self.num_layers, batch_size, self.hidden_size, device=out.device)

        out, _ = self.lstm(out, (h0, c0))

        # Flatten the output for the fully connected layer
        out = out.reshape(batch_size, -1)

        # Fully connected layer
        out = self.fc(out)
        return out

## Prepare the data

### Subtask:
Ensure the data is in the correct format for the CNN-LSTM model. This might involve reshaping the data to have the channels dimension in the correct place for the CNN layers and ensuring the time series data is suitable for the LSTM.


**Reasoning**:
Load the data, separate features and target, convert to numpy arrays, reshape features, convert target to numerical labels, split data, and convert to PyTorch tensors.
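The reshape step only gives meaningful windows if each CSV row stores its values day by day (all features for day 1, then all features for day 2, and so on). That column ordering is an assumption here, not something the notebook verifies. A small sketch with a toy array shows what the reshape does under that assumption; the helper name `reshape_windows` is hypothetical.

```python
import numpy as np


def reshape_windows(flat, sequence_length, num_features):
    """Reshape (num_samples, sequence_length * num_features) rows into
    (num_samples, sequence_length, num_features), assuming time-major column order."""
    num_samples, width = flat.shape
    if width != sequence_length * num_features:
        raise ValueError(
            f"expected {sequence_length * num_features} columns, got {width}"
        )
    return flat.reshape(num_samples, sequence_length, num_features)


# Toy data: 2 samples, 3 time steps, 2 features per step
flat = np.arange(12, dtype=np.float32).reshape(2, 6)
windows = reshape_windows(flat, sequence_length=3, num_features=2)

print(windows.shape)   # shape of the reshaped array
print(windows[0])      # first sample, one row per time step
```

If the file were instead laid out feature by feature (all opens, then all highs, and so on), the array would need a reshape to `(num_samples, num_features, sequence_length)` followed by a transpose, so it is worth inspecting the column names before trusting the windows.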



In [32]:
# Step 1: Load the data
df = pd.read_csv(DIR)

# Step 2: Separate features and target
features = df.drop('label', axis=1)
target = df['label']

# Step 3: Convert to NumPy arrays
features_np = features.values
target_np = target.values

# Step 4: Reshape features for CNN-LSTM
# Assuming the original features are structured as (num_samples, window_size * num_features_per_day)
# We need to reshape to (num_samples, sequence_length, num_features_per_day)
# In this dataset, window_size is 60 and num_features_per_day is 5 (open, high, low, close, volume)
sequence_length = 60
num_features_per_day = 5
num_samples = features_np.shape[0]
features_reshaped = features_np.reshape(num_samples, sequence_length, num_features_per_day)

# Step 5: Convert target to numerical format
label_mapping = {label: idx for idx, label in enumerate(np.unique(target_np))}
target_numerical = np.array([label_mapping[label] for label in target_np])
num_classes = len(label_mapping)

# Step 6: Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features_reshaped, target_numerical, test_size=0.2, random_state=42, stratify=target_numerical)

# Step 7: Convert to PyTorch tensors and move to device
X_train_tensor = torch.tensor(X_train, dtype=torch.float32).to(device)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32).to(device)
y_train_tensor = torch.tensor(y_train, dtype=torch.long).to(device)
y_test_tensor = torch.tensor(y_test, dtype=torch.long).to(device)

print("Data successfully processed and converted to tensors.")
print("X_train_tensor shape:", X_train_tensor.shape)
print("y_train_tensor shape:", y_train_tensor.shape)
print("X_test_tensor shape:", X_test_tensor.shape)
print("y_test_tensor shape:", y_test_tensor.shape)
print("Number of classes:", num_classes)

Data successfully processed and converted to tensors.
X_train_tensor shape: torch.Size([2800, 60, 5])
y_train_tensor shape: torch.Size([2800])
X_test_tensor shape: torch.Size([700, 60, 5])
y_test_tensor shape: torch.Size([700])
Number of classes: 7


## Training loop

### Subtask:
Implement the training loop, including defining the loss function and optimizer, and iterating through the data in batches to train the model.


**Reasoning**:
Implement the training loop for the CNN-LSTM model, including defining the loss function, optimizer, and iterating through batches for training and evaluating on the test set.
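The training loop in this section selects its best checkpoint by test accuracy, so the test score reported afterwards is likely to be optimistic. One possible remedy, sketched here with NumPy only, is to hold out a separate validation split for checkpoint selection and touch the test split once at the end. The function name `stratified_three_way_split`, the 15% fractions, and the balanced toy labels are all assumptions for illustration.

```python
import numpy as np


def stratified_three_way_split(labels, val_frac=0.15, test_frac=0.15, seed=42):
    """Return train/val/test index arrays with roughly equal class proportions."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, val_idx, test_idx = [], [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then slice them into three parts
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(len(idx) * test_frac))
        n_val = int(round(len(idx) * val_frac))
        test_idx.extend(idx[:n_test])
        val_idx.extend(idx[n_test:n_test + n_val])
        train_idx.extend(idx[n_test + n_val:])
    return np.array(train_idx), np.array(val_idx), np.array(test_idx)


# Toy labels: 3500 samples over 7 classes (class balance is assumed)
labels = np.repeat(np.arange(7), 500)
train_idx, val_idx, test_idx = stratified_three_way_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

With these indices, the per-epoch accuracy check and `torch.save` call would use the validation subset, and the final metrics cell would use the test subset.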



In [33]:
# Instantiate the model
input_size = num_features_per_day
hidden_size = 128 # Example hidden size
num_layers = 2 # Example number of layers
model = CNN_LSTM(input_size, hidden_size, num_layers, num_classes, sequence_length).to(device)

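# Define loss function and optimizer
lr = 1e-3  # Assumed value (Adam's default); the original learning rate is not shown in this notebook
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)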

# Create DataLoader for batching
train_dataset = torch.utils.data.TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True) # Example batch size

test_dataset = torch.utils.data.TensorDataset(X_test_tensor, y_test_tensor)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False) # Example batch size


# Training loop
num_epochs = 10 # Example number of epochs
best_accuracy = 0.0

for epoch in range(num_epochs):
    model.train() # Set model to training mode
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(train_loader):
        inputs, labels = inputs.to(device), labels.to(device)

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Print training loss per epoch
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss / len(train_loader):.4f}')

    # Evaluation on the test set
    model.eval() # Set model to evaluation mode
    with torch.no_grad(): # Disable gradient calculation
        correct = 0
        total = 0
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)  # .data is unnecessary inside torch.no_grad()
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        accuracy = 100 * correct / total
        print(f'Accuracy of the model on the {total} test samples: {accuracy:.2f} %')

        # Save the best model (note: selecting on the test set makes the reported test accuracy optimistic)
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            torch.save(model.state_dict(), 'best_cnn_lstm_model.pth')
            print(f'Saved best model with accuracy: {best_accuracy:.2f} %')

print("Finished Training")

Epoch [1/10], Loss: 1.9499
Accuracy of the model on the 700 test samples: 13.00 %
Saved best model with accuracy: 13.00 %
Epoch [2/10], Loss: 1.9447
Accuracy of the model on the 700 test samples: 14.29 %
Saved best model with accuracy: 14.29 %
Epoch [3/10], Loss: 1.9388
Accuracy of the model on the 700 test samples: 15.43 %
Saved best model with accuracy: 15.43 %
Epoch [4/10], Loss: 1.9262
Accuracy of the model on the 700 test samples: 19.14 %
Saved best model with accuracy: 19.14 %
Epoch [5/10], Loss: 1.8693
Accuracy of the model on the 700 test samples: 26.57 %
Saved best model with accuracy: 26.57 %
Epoch [6/10], Loss: 1.6803
Accuracy of the model on the 700 test samples: 38.57 %
Saved best model with accuracy: 38.57 %
Epoch [7/10], Loss: 1.4974
Accuracy of the model on the 700 test samples: 42.14 %
Saved best model with accuracy: 42.14 %
Epoch [8/10], Loss: 1.3518
Accuracy of the model on the 700 test samples: 48.57 %
Saved best model with accuracy: 48.57 %
Epoch [9/10], Loss: 1.30

## Evaluation

### Subtask:
Evaluate the trained model's performance on a test set using appropriate metrics for binary classification, such as accuracy, precision, recall, and F1-score.


**Reasoning**:
Evaluate the trained model's performance on a test set using appropriate metrics for multi-class classification and print the results.
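Weighted averages can hide classes that the model rarely gets right. As a complement to the aggregate metrics, a confusion matrix and per-class recall show where the errors fall. The sketch below uses NumPy only and toy labels; the helper names `confusion_matrix` and `per_class_recall` are defined here for illustration and are not the scikit-learn functions of similar name.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm


def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly (0 if the class is absent)."""
    support = cm.sum(axis=1)
    return np.divide(np.diag(cm), support, out=np.zeros(len(cm)), where=support > 0)


# Toy example with 3 classes
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(toy_true, toy_pred, num_classes=3)
recall = per_class_recall(cm)
accuracy = np.trace(cm) / cm.sum()

print(cm)
print(recall)
print(accuracy)
```

In the notebook, `true_labels` and `predicted_labels` from the evaluation cell could be passed in with `num_classes=7`.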



In [34]:
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Load the saved best model state dictionary
model = CNN_LSTM(input_size, hidden_size, num_layers, num_classes, sequence_length).to(device)
# map_location lets a checkpoint saved on GPU load on a CPU-only runtime
model.load_state_dict(torch.load('best_cnn_lstm_model.pth', map_location=device))

# Set the model to evaluation mode
model.eval()

# Initialize lists to store true labels and predicted labels
true_labels = []
predicted_labels = []

# Iterate through the test data loader and make predictions
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)  # .data is unnecessary inside torch.no_grad()

        true_labels.extend(labels.cpu().numpy())
        predicted_labels.extend(predicted.cpu().numpy())

# Calculate accuracy, precision, recall, and F1-score
accuracy = accuracy_score(true_labels, predicted_labels)
precision, recall, f1_score, _ = precision_recall_fscore_support(true_labels, predicted_labels, average='weighted')

# Print the calculated evaluation metrics
print(f'Test Accuracy: {accuracy:.4f}')
print(f'Test Precision: {precision:.4f}')
print(f'Test Recall: {recall:.4f}')
print(f'Test F1-score: {f1_score:.4f}')

Test Accuracy: 0.5186
Test Precision: 0.5235
Test Recall: 0.5186
Test F1-score: 0.5071


## Pretraining

### Subtask:
Describe how this trained model can be used as a pretraining step for a larger, main model.


**Reasoning**:
Explain the concept of pretraining with the CNN-LSTM model and how its learned weights can be transferred to a larger model.
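The weight transfer and layer freezing discussed in this section can be sketched in a few lines. Everything below is illustrative: `PretrainedStandIn` and `MainModel` are hypothetical classes with randomly initialised weights, standing in for the trained `CNN_LSTM` and an unspecified larger model, and the two-class head is an assumption. The sketch only relies on `state_dict`, `load_state_dict(strict=False)`, and `requires_grad`.

```python
import torch
import torch.nn as nn


class PretrainedStandIn(nn.Module):
    """Hypothetical stand-in for the pretrained model (same layer names and sizes)."""
    def __init__(self):
        super().__init__()
        self.conv1d = nn.Conv1d(5, 64, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(64, 128, num_layers=2, batch_first=True)
        self.fc = nn.Linear(128, 7)


class MainModel(nn.Module):
    """Hypothetical larger model that reuses the encoder and adds a new head."""
    def __init__(self):
        super().__init__()
        self.conv1d = nn.Conv1d(5, 64, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(64, 128, num_layers=2, batch_first=True)
        self.head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))


pretrained = PretrainedStandIn()
main_model = MainModel()

# Keep only the encoder weights; the old classifier (`fc`) is not transferred
encoder_prefixes = ("conv1d.", "lstm.")
transfer = {k: v for k, v in pretrained.state_dict().items()
            if k.startswith(encoder_prefixes)}
result = main_model.load_state_dict(transfer, strict=False)

# Freeze the transferred layers so only the new head trains at first
for name, param in main_model.named_parameters():
    if name.startswith(encoder_prefixes):
        param.requires_grad = False

trainable = [n for n, p in main_model.named_parameters() if p.requires_grad]
optimizer = torch.optim.Adam(
    (p for p in main_model.parameters() if p.requires_grad), lr=1e-3
)
print(result.missing_keys)
print(trainable)
```

For later fine-tuning, the frozen parameters can be switched back to `requires_grad = True` and a new optimizer created with a smaller learning rate.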



In [35]:
print("""
Using the trained CNN-LSTM model as a pretraining step for a larger, main model involves leveraging the knowledge learned by the CNN and/or LSTM layers on the synthetic stock pattern data. The idea is that these layers have learned to extract relevant features and temporal dependencies from the time series data, which can be beneficial for the main model, especially if the main model is trained on a more complex or larger dataset, or if the main dataset is limited.

Here's how the weights could be transferred:

1.  **Transferring CNN Weights:** The weights from the trained `conv1d` layer of the current `CNN_LSTM` model can be directly copied to the corresponding layer in the larger, main model (`relu` and `maxpool1d` have no learnable parameters, so there is nothing to copy for them). The main model would need a CNN component with a matching architecture (e.g., same number of input channels, filters, and kernel size) for this transfer to be straightforward.

2.  **Transferring LSTM Weights:** Similarly, the weights from the trained `lstm` layers (including the recurrent weights and biases) can be transferred to the LSTM layers in the larger, main model. Again, the main model's LSTM component should have a compatible architecture (e.g., same hidden size, number of layers).

3.  **Transferring Both CNN and LSTM Weights:** A common approach is to transfer the weights from both the CNN and LSTM parts of the pretrained model to the main model. This allows the main model to benefit from both the learned spatial (feature extraction) and temporal (sequence modeling) representations.

The potential benefits of using this pretraining approach for the main model are significant:

*   **Improved Feature Extraction:** The pretrained CNN layers can provide a good starting point for extracting relevant features from the stock data, which might be more robust than training from scratch, especially if the main dataset is small.
*   **Better Temporal Modeling:** The pretrained LSTM layers can help the main model better capture the temporal dynamics and dependencies in the stock market time series.
*   **Faster Convergence:** Starting with pretrained weights can often lead to faster convergence during the training of the main model, as the model doesn't have to learn basic feature extraction and temporal patterns from scratch.
*   **Reduced Overfitting (potentially):** Pretraining on a related task or dataset can act as a form of regularization, potentially reducing overfitting on the main task, especially if the main dataset is limited. This is particularly relevant given the overfitting reported in the task description.

Considerations and strategies when using pretraining include:

*   **Freezing Layers:** Initially, you might want to freeze the transferred layers (CNN and/or LSTM) in the main model and train only the new layers (e.g., the final classification layers). This allows the main model to adapt to the pretrained features without drastically changing them.
*   **Fine-tuning:** After initial training with frozen layers, you can unfreeze some or all of the transferred layers and continue training with a smaller learning rate. This allows the pretrained weights to be fine-tuned to the specific characteristics of the main dataset and task.
*   **Architecture Compatibility:** Ensure the architecture of the transferred layers in the main model is compatible with the pretrained model's layers for smooth weight transfer.
*   **Domain Relevance:** The effectiveness of pretraining depends on the relevance of the pretraining task and dataset to the main task. Synthetic stock patterns are related to stock market movement, but whether features learned on them transfer to real market data should be checked empirically.
""")



## Summary:

### Data Analysis Key Findings

*   The CNN-LSTM model was successfully implemented in PyTorch, combining convolutional layers for feature extraction and LSTM layers for sequential dependencies. The output layer is designed for multi-class classification, not binary as initially stated in the task description.
*   The data was successfully loaded, processed, and reshaped into PyTorch tensors with the shape `(number of samples, sequence length, number of features)`, which is `(2800, 60, 5)` for the training set and `(700, 60, 5)` for the testing set.
*   The target variable was converted to numerical format, and the number of unique classes was identified as 7, not 2 as implied by "binary classification (up or down)" in the task.
*   The training loop was implemented, and the model was trained for 10 epochs. The training loss fell from 1.95 at epoch 1 to 1.35 at epoch 8, and the best checkpoint reached 51.86% test accuracy. This is well above the initially reported 16% and above the roughly 14% chance level for 7 classes (assuming balanced classes), but it is still modest. Both loss and accuracy were still improving in the logged epochs, so the run had probably not converged.
*   Evaluation on the test set gave accuracy 0.52, weighted precision 0.52, weighted recall 0.52, and weighted F1-score 0.51. The model is learning the patterns only partially. Because the best checkpoint was selected on the same test set, these figures are likely somewhat optimistic.
*   The concept of using the trained model for pretraining was explained, detailing how CNN and LSTM weights could be transferred to a larger model and the potential benefits such as improved feature extraction, better temporal modeling, faster convergence, and potentially reduced overfitting.

### Insights or Next Steps

*   The discrepancy between the task description (binary classification) and the actual data (7 classes) needs to be addressed. Either the data should be filtered for binary outcomes, or the model and evaluation should be adjusted for multi-class classification.
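*   The model's modest performance (accuracy ~52%) suggests that the model architecture, hyperparameters, or training strategy need adjustment. The logs in this notebook show training loss still falling and test accuracy still rising, which points to under-training rather than the overfitting mentioned in the task, so training for more epochs with a held-out validation set is a reasonable first step. If overfitting does appear, regularization (dropout, weight decay), hyperparameter tuning, or a different model architecture could help.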
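
As one possible starting point for the regularization ideas above, the sketch below adds dropout and weight decay to a variant of the model. It is not the notebook's model: the class name `RegularizedCNNLSTM`, the dropout rate 0.3, the weight decay 1e-4, and the learning rate are all assumed values. It also classifies from the last LSTM time step instead of the flattened sequence, a deliberate change that shrinks the final layer from 3840 to 128 input features.

```python
import torch
import torch.nn as nn


class RegularizedCNNLSTM(nn.Module):
    """Hypothetical variant with dropout and a smaller classification head."""
    def __init__(self, input_size=5, hidden_size=128, num_layers=2,
                 num_classes=7, dropout=0.3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(input_size, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2, stride=2),
        )
        # `dropout` here applies between stacked LSTM layers (needs num_layers > 1)
        self.lstm = nn.LSTM(64, hidden_size, num_layers,
                            batch_first=True, dropout=dropout)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, features) -> Conv1d expects (batch, features, seq_len)
        out = self.conv(x.permute(0, 2, 1)).permute(0, 2, 1)
        out, _ = self.lstm(out)  # initial states default to zeros
        # Classify from the last time step only
        return self.fc(self.dropout(out[:, -1, :]))


model = RegularizedCNNLSTM()
# weight_decay adds L2 regularization through the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

logits = model(torch.randn(8, 60, 5))
print(logits.shape)
```

Whether these settings help on this dataset is an open question; they would need to be compared against the original model on a validation split.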
