# 2. Model Prototyping Notebook

This notebook is for interactively developing and testing the machine learning models before they are finalized in the `ml_pipeline` scripts.

## 2.1 Setup

Import libraries and load the feature-rich dataset.

In [None]:
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from pathlib import Path
import matplotlib.pyplot as plt

# Define project root and data paths.
# Assumes this notebook lives one level below the project root (e.g. in a notebooks/ folder).
project_root = Path('.').resolve().parent
processed_data_path = project_root / 'data' / 'processed' / 'feature_rich_data.csv'

In [None]:
if processed_data_path.exists():
    df = pd.read_csv(processed_data_path, index_col='date', parse_dates=True)
    print("Feature-rich data loaded successfully.")
    print(df.head())  # a bare df.head() inside an if-block is not rendered by Jupyter
else:
    print(f"Processed data file not found at: {processed_data_path}. "
          "Please run the ml_pipeline scripts first.")
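Before scaling, it is worth confirming that the loaded frame is usable: `MinMaxScaler` fails on non-numeric columns, NaN values pass through it unchanged and would turn the LSTM loss into NaN, and the sliding-window step assumes rows are in chronological order. The check below is a minimal sketch; the helper name `validate_features` is hypothetical (not part of `ml_pipeline`), and it assumes the `close` target column used in 2.2.1.

```python
import numpy as np
import pandas as pd


def validate_features(df: pd.DataFrame, target_column: str = "close") -> list:
    """Return a list of problems that would break scaling or windowing (empty if none).

    Hypothetical helper for this notebook; not part of ml_pipeline.
    """
    problems = []
    # MinMaxScaler can only handle numeric columns.
    non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    # NaNs survive scaling and would propagate into the model's loss.
    if df.isna().any().any():
        problems.append("missing values present")
    # Sliding windows assume rows are ordered oldest to newest.
    if not df.index.is_monotonic_increasing:
        problems.append("index is not sorted in ascending date order")
    if target_column not in df.columns:
        problems.append(f"target column '{target_column}' not found")
    return problems


# Toy frame with one text column and one missing price.
example = pd.DataFrame(
    {"close": [1.0, 2.0, np.nan], "ticker": ["A", "A", "A"]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)
print(validate_features(example))
```

In the notebook, `validate_features(df)` returning an empty list suggests the data is ready for the preparation step.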

## 2.2 Time-Series Prediction Model (LSTM-Transformer)

Here we build a simplified, LSTM-only version of the prediction model to test the concept.

### 2.2.1 Data Preparation

In [None]:
# For this prototype, we predict the next day's closing price from the previous 60 days of data.
sequence_length = 60
target_column = 'close'
test_size = 0.2

# Scale the data. Fit the scaler on the training period only, so that the
# min/max of the test period do not leak into training.
split_idx = int(len(df) * (1 - test_size))
scaler = MinMaxScaler()
scaler.fit(df.iloc[:split_idx])
df_scaled = pd.DataFrame(scaler.transform(df), columns=df.columns, index=df.index)

# Create sequences: each sample is `sequence_length` consecutive rows,
# labelled with the (scaled) target value of the following row.
sequences = []
labels = []
for i in range(len(df_scaled) - sequence_length):
    sequences.append(df_scaled.iloc[i:i+sequence_length].values)
    labels.append(df_scaled.iloc[i+sequence_length][target_column])

X = np.array(sequences)
y = np.array(labels)

# Split into training and testing sets; shuffle=False keeps the chronological order
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, shuffle=False)

print(f'X_train shape: {X_train.shape}')
print(f'y_train shape: {y_train.shape}')
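The loop above builds overlapping windows one `iloc` call at a time, which gets slow on long histories. An equivalent vectorized sketch uses NumPy's `sliding_window_view` (available since NumPy 1.20). The function name `make_sequences` is hypothetical, and it assumes the target is one of the columns of the array being windowed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_sequences(values: np.ndarray, target_idx: int, sequence_length: int):
    """Build (X, y) windows from a 2-D array of shape (n_rows, n_features).

    X[i] holds rows i .. i+sequence_length-1 and y[i] is the target column at
    row i+sequence_length, matching the loop-based construction.
    """
    # Windowing along axis 0 yields shape (n_windows, n_features, window);
    # move the window axis next to the batch axis -> (n_windows, window, n_features).
    windows = sliding_window_view(values, sequence_length, axis=0)
    windows = np.moveaxis(windows, -1, 1)
    # Drop the final window: it has no "next day" label.
    X = np.ascontiguousarray(windows[:-1])  # copy, since the view is read-only
    y = values[sequence_length:, target_idx]
    return X, y


demo = np.arange(20, dtype=float).reshape(10, 2)  # 10 rows, 2 features
X_demo, y_demo = make_sequences(demo, target_idx=1, sequence_length=3)
print(X_demo.shape, y_demo.shape)  # (7, 3, 2) (7,)
```

Applied to the notebook's data, `make_sequences(df_scaled.to_numpy(), df_scaled.columns.get_loc(target_column), sequence_length)` should yield the same `X` and `y` as the loop.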

### 2.2.2 Model Definition (Simple LSTM)

In [None]:
class SimpleLSTM(nn.Module):
    def __init__(self, input_size, hidden_size=50, num_layers=2):
        super().__init__()
        # dropout is applied between stacked LSTM layers (it has no effect when num_layers=1)
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, dropout=0.2)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x: (batch, sequence_length, input_size)
        h_lstm, _ = self.lstm(x)
        output = self.fc(h_lstm[:, -1, :])  # Use the hidden state of the last time step
        return output

input_dim = X_train.shape[2] # Number of features
model = SimpleLSTM(input_dim)
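The section title refers to an LSTM-Transformer, while the prototype above is LSTM-only. One way to combine the two, sketched here under our own assumptions (not necessarily the architecture used in `ml_pipeline`), is to let the LSTM encode the sequence and then apply self-attention over its hidden states with PyTorch's `nn.TransformerEncoder`. The class name `LSTMTransformer` and all hyperparameters are illustrative.

```python
import torch
import torch.nn as nn


class LSTMTransformer(nn.Module):
    """Hypothetical hybrid: LSTM encoder followed by Transformer self-attention."""

    def __init__(self, input_size, hidden_size=64, num_lstm_layers=1, nhead=4, num_attn_layers=1):
        super().__init__()
        # hidden_size must be divisible by nhead for multi-head attention.
        self.lstm = nn.LSTM(input_size, hidden_size, num_lstm_layers, batch_first=True)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_size,
            nhead=nhead,
            dim_feedforward=hidden_size * 2,
            dropout=0.1,
            batch_first=True,
        )
        self.attention = nn.TransformerEncoder(encoder_layer, num_layers=num_attn_layers)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        h, _ = self.lstm(x)          # (batch, seq_len, hidden_size)
        h = self.attention(h)        # self-attention across all time steps
        return self.fc(h[:, -1, :])  # predict from the last time step -> (batch, 1)


# Shape check with random data: 8 sequences of 60 steps with 5 features.
hybrid = LSTMTransformer(input_size=5)
out = hybrid(torch.randn(8, 60, 5))
print(out.shape)  # torch.Size([8, 1])
```

No positional encoding is added here on the assumption that the LSTM outputs already carry order information; a pure Transformer encoder would need one.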

### 2.2.3 Training (Prototype Loop)

In [None]:
# A simplified, full-batch training loop for demonstration: each epoch runs the
# whole training set through the model in a single forward/backward pass.
# The full script in `ml_pipeline` is more robust.

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32).view(-1, 1)

epochs = 10
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_train_t)
    loss = criterion(outputs, y_train_t)
    loss.backward()
    optimizer.step()
    print(f'Epoch {epoch+1}/{epochs}, Loss: {loss.item():.4f}')
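The loop above only reports training loss. A minimal sketch of scoring the held-out set: switch to `eval()` mode (disabling dropout), run the forward pass under `torch.no_grad()`, and map predictions back to price units. Because the scaler was fit on all columns together, `scaler.inverse_transform` expects a full feature matrix; the helper below instead inverts only the target column using the fitted `min_` and `scale_` attributes, which is the same per-column arithmetic `inverse_transform` applies. The helper names `inverse_scale_column` and `evaluate` are hypothetical.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.preprocessing import MinMaxScaler


def inverse_scale_column(scaled, scaler, column_idx):
    """Undo a fitted MinMaxScaler for a single column."""
    return (np.asarray(scaled) - scaler.min_[column_idx]) / scaler.scale_[column_idx]


def evaluate(model: nn.Module, X_test, y_test, scaler, column_idx):
    """Return the test RMSE in the target's original units."""
    model.eval()  # disable dropout
    with torch.no_grad():
        preds = model(torch.tensor(X_test, dtype=torch.float32)).squeeze(-1).numpy()
    preds_price = inverse_scale_column(preds, scaler, column_idx)
    actual_price = inverse_scale_column(y_test, scaler, column_idx)
    return float(np.sqrt(np.mean((preds_price - actual_price) ** 2)))


# Tiny self-check: inverting scaled values of column 0 recovers the originals.
demo_scaler = MinMaxScaler().fit(np.array([[10.0, 1.0], [20.0, 2.0], [30.0, 3.0]]))
print(inverse_scale_column([0.0, 0.5, 1.0], demo_scaler, 0))
```

In the notebook this would be called as `evaluate(model, X_test, y_test, scaler, df.columns.get_loc(target_column))`.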

## 2.3 Vision Model (YOLOv8)

Prototyping for the YOLO model is typically done via the command line or a dedicated training script (`train_vision.py`). This section provides an overview of the process.

### 2.3.1 Process Overview

1.  **Data Collection**: Gather screenshots of charts displaying the patterns you want to detect (e.g., bullish flags, head and shoulders).
2.  **Annotation**: Use a tool such as `labelImg` or `CVAT` to draw bounding boxes around the patterns in your images, and export the annotations in YOLO format. This produces one `.txt` label file per image containing the class index and normalized box coordinates.
3.  **Dataset Configuration**: Create a `.yaml` file that points to your training and validation image directories and lists the class names.
4.  **Training**: Run the training script (`ml_pipeline/train_vision.py`) which uses the `ultralytics` library to train the model.
5.  **Inference**: Once trained, the model (`best.pt`) can be loaded and used to predict patterns on new chart images.
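Steps 2 and 3 can be sketched in a few lines. A YOLO label line is the class index followed by the box center x, center y, width, and height, each divided by the image width or height so it falls in [0, 1]. The dataset file names a root directory, the train and validation image folders relative to it, and the class names. The helper names, the `images/train` / `images/val` layout, and the class names below are illustrative assumptions; check the Ultralytics dataset documentation for the keys your version expects.

```python
import tempfile
from pathlib import Path


def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box (corner coordinates) to a normalized YOLO label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def write_dataset_yaml(path, dataset_root, class_names):
    """Write a minimal Ultralytics-style dataset file (hypothetical directory layout)."""
    lines = [
        f"path: {dataset_root}",
        "train: images/train",
        "val: images/val",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    path = Path(path)
    path.write_text("\n".join(lines) + "\n")
    return path


# A 200x200 px box at (100, 50)-(300, 250) on a 640x480 chart screenshot.
print(to_yolo_line(0, 100, 50, 300, 250, img_w=640, img_h=480))  # 0 0.312500 0.312500 0.312500 0.416667

with tempfile.TemporaryDirectory() as tmp:
    yaml_path = write_dataset_yaml(
        Path(tmp) / "data.yaml", "datasets/chart_patterns", ["bull_flag", "head_and_shoulders"]
    )
    yaml_text = yaml_path.read_text()
print(yaml_text)
```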

### 2.3.2 Example: Running Inference (after training)

In [None]:
from ultralytics import YOLO

# Load your custom-trained model.
# Copy best.pt from your training run first (by default Ultralytics saves it under runs/detect/<run name>/weights/).
model_path = project_root / 'models' / 'vision' / 'best.pt'

if model_path.exists():
    vision_model = YOLO(str(model_path))
    print("YOLO vision model loaded successfully.")

    # Example of running prediction on a new image
    # image_path = project_root / 'data' / 'new_chart.png'
    # if image_path.exists():
    #     results = vision_model.predict(str(image_path))
    #     results[0].show()  # Display the image with bounding boxes
else:
    print(f'Vision model not found at {model_path}. Please train the model first.')