# Day Trading Model - Local Notebook

This notebook trains and evaluates the intraday classification model using Yahoo Finance 1-minute data. Run the notebook locally to experiment with parameters, inspect the dataset, and simulate the model's live signal output.

## Environment setup

1. Create a virtual environment and activate it.
2. Install dependencies with `pip install -r requirements.txt`.
3. (Optional) Export the broker environment variables if you plan to call the Alpaca paper-trading API:
   ```bash
   export BROKER_API_KEY=your-key
   export BROKER_API_SECRET=your-secret
   export BROKER_BASE_URL=https://paper-api.alpaca.markets
   ```
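
To confirm the variables are visible from inside the notebook, a check along the following lines may help. This is a minimal sketch: `read_broker_env` is a hypothetical helper and is not part of `trading_models`, which may read these settings differently.

```python
import os

# Hypothetical helper, not part of trading_models: collects the broker
# settings exported above and reports which ones are missing or empty.
REQUIRED_VARS = ("BROKER_API_KEY", "BROKER_API_SECRET", "BROKER_BASE_URL")


def read_broker_env(environ=None):
    """Return (settings, missing) for the three broker variables."""
    environ = os.environ if environ is None else environ
    settings = {name: environ.get(name, "") for name in REQUIRED_VARS}
    missing = [name for name, value in settings.items() if not value]
    return settings, missing


settings, missing = read_broker_env()
if missing:
    print("Broker variables not set:", ", ".join(missing))
```

Because the broker step is optional, the sketch only prints a notice instead of raising when a variable is absent.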

In [None]:
import os
import sys
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Ensure the project package is on the path
sys.path.append(str(Path.cwd()))

from trading_models.config import AppConfig
from trading_models.models.day_trading.config import DayTradingConfig
from trading_models.models.day_trading.data import load_or_download, describe_data
from trading_models.models.day_trading.features import engineer_features
from trading_models.models.day_trading.pipeline import DayTradingPipeline
from trading_models.models.day_trading.realtime import DayTradingStreamer
from trading_models.utils import save_json

In [None]:
app_cfg = AppConfig()
app_cfg.ensure_directories()
model_cfg = DayTradingConfig(symbol="AAPL", lookback_days=10, epochs=12)
model_cfg

## Download intraday dataset

We fetch 1-minute candles from Yahoo Finance using the [`yfinance`](https://github.com/ranaroussi/yfinance) library. The data is cached in `data/day_trading`, so subsequent runs load from disk instead of downloading again; pass `force=True` to refresh the cache.
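
The caching behaviour can be pictured with the sketch below. It is an illustration only, not the actual body of `load_or_download`: the function `load_or_fetch`, the CSV cache format, and the file name `AAPL_1m.csv` are assumptions, and a stand-in `fake_fetch` replaces the network call so the example runs offline.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_or_fetch(cache_path, fetch, force=False):
    """Return cached candles if present; otherwise call fetch() and cache the result."""
    cache_path = Path(cache_path)
    if cache_path.exists() and not force:
        return pd.read_csv(cache_path, index_col=0, parse_dates=True)
    frame = fetch()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(cache_path)
    return frame


calls = []


def fake_fetch():
    # Stand-in for a real download, e.g. a wrapper around
    # yfinance.download("AAPL", period="5d", interval="1m").
    calls.append(1)
    index = pd.date_range("2024-01-02 09:30", periods=3, freq="min")
    return pd.DataFrame({"Close": [100.0, 100.5, 100.2]}, index=index)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "day_trading" / "AAPL_1m.csv"  # hypothetical file name
    first = load_or_fetch(path, fake_fetch)               # downloads and caches
    second = load_or_fetch(path, fake_fetch)              # served from the cache
    third = load_or_fetch(path, fake_fetch, force=True)   # forces a fresh download
```

In this sketch only the first and third calls reach `fake_fetch`; the second one reads the cached file.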

In [None]:
raw_df = load_or_download(app_cfg, model_cfg, force=False)
describe_data(raw_df)

In [None]:
raw_df.tail()

## Feature engineering

We derive common technical indicators (simple/exponential moving averages, volatility, momentum, RSI) and create a binary target that indicates whether the next bar's return is greater than `threshold` (default 0.05%).
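
As a rough illustration of the target definition, the sketch below labels each bar by comparing the next bar's close-to-close return against the threshold. It assumes the threshold is stored as a fraction (0.05% = 0.0005) and that returns are computed from closing prices; `label_next_bar` is a hypothetical name, and `engineer_features` may differ in both respects.

```python
import pandas as pd

THRESHOLD = 0.0005  # assumption: 0.05% expressed as a fraction


def label_next_bar(close, threshold=THRESHOLD):
    """Return 1 where the next bar's return exceeds threshold, else 0."""
    # Return of the following bar, aligned to the current bar.
    next_return = close.pct_change().shift(-1)
    # The last bar has no "next" bar, so it cannot be labelled.
    labelled = next_return.dropna()
    return (labelled > threshold).astype(int)


close = pd.Series([100.0, 100.10, 100.12, 100.00])
target = label_next_bar(close)
print(target.tolist())
```

Only the first bar is labelled positive here: the move from 100.00 to 100.10 is a 0.1% return, while the next two moves fall below the threshold.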

In [None]:
features_df, feature_cols = engineer_features(raw_df, model_cfg)
features_df.head()

In [None]:
len(features_df), len(feature_cols)

## Train/validation split

The split is chronological to respect the time-series nature of intraday candles.

In [None]:
split_idx = int(len(features_df) * (1 - model_cfg.validation_size))
split_idx = max(1, min(len(features_df) - 1, split_idx))
train_df = features_df.iloc[:split_idx]
val_df = features_df.iloc[split_idx:]

X_train = train_df[feature_cols].values
y_train = train_df["target"].values
X_val = val_df[feature_cols].values
y_val = val_df["target"].values

split_idx, X_train.shape, X_val.shape

## Train the SGDClassifier model

We fit an `SGDClassifier` with `loss="log_loss"` (logistic regression), which produces class probabilities, and train it in epochs via `partial_fit`. The scaler and estimator are persisted to `artifacts/day_trading` for reuse by the web application and the real-time streamer.

In [None]:
pipeline = DayTradingPipeline(app_cfg, model_cfg)
pipeline.model

In [None]:
history = pipeline.model.fit(X_train, y_train, X_val, y_val)
val_metrics = pipeline.model.evaluate(X_val, y_val)
history, val_metrics

### Plot epoch metrics

In [None]:
plt.style.use('dark_background')
fig, ax = plt.subplots(figsize=(10, 4))
for key in ['accuracy', 'precision', 'recall', 'f1']:
    ax.plot([entry['epoch'] for entry in history], [entry[key] for entry in history], label=key)
ax.set_xlabel('Epoch')
ax.set_ylabel('Score')
ax.set_title('Validation metrics per epoch')
ax.legend()
plt.tight_layout()
plt.show()

### Confusion matrix on the validation window

In [None]:
from sklearn.metrics import ConfusionMatrixDisplay

fig, ax = plt.subplots(figsize=(4, 4))
y_pred_val = pipeline.model.predict(X_val)
ConfusionMatrixDisplay.from_predictions(y_val, y_pred_val, ax=ax, cmap='Blues')
ax.set_title('Validation confusion matrix')
plt.show()

## Persist model artifacts and metadata

In [None]:
pipeline.model.save()
pipeline.storage_dir, list(pipeline.storage_dir.iterdir())

## Save metrics with the `save_json` helper

In [None]:
pipeline_metadata = {
    "evaluation": val_metrics,
    "history": history,
    "metadata": {
        "config": model_cfg.__dict__,
        "features": feature_cols,
    },
}
save_json(pipeline.storage_dir / model_cfg.metrics_filename, pipeline_metadata)
pipeline_metadata

## Simulate real-time predictions

After training, we can instantiate the `DayTradingStreamer` to pull the most recent minute bars, compute features, and stream probability scores. In a production setup you would call `submit_market_order` from `trading_models.broker.alpaca_client` when the probability rises above your buy threshold or falls below your sell threshold.
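
One way to turn a probability into a trade decision is sketched below. The function name `signal_from_probability` and the 0.6 / 0.4 thresholds are illustrative assumptions, not values from the project. The sketch stops short of calling `submit_market_order`, whose signature is not shown in this notebook.

```python
def signal_from_probability(probability, long_threshold=0.6, short_threshold=0.4):
    """Map a positive-return probability to 'buy', 'sell', or 'hold'."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if short_threshold > long_threshold:
        raise ValueError("short_threshold must not exceed long_threshold")
    if probability >= long_threshold:
        return "buy"
    if probability <= short_threshold:
        return "sell"
    # Between the two thresholds the model is not confident enough to act.
    return "hold"


signals = [signal_from_probability(p) for p in (0.72, 0.55, 0.31)]
print(signals)
```

Keeping a gap between the two thresholds creates a neutral band, so small fluctuations around 0.5 do not trigger orders.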

In [None]:
streamer = DayTradingStreamer(pipeline)
stream_df = streamer.latest_points()
stream_df.tail()

### Plot latest prediction probabilities

In [None]:
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(stream_df['timestamp'], stream_df['probability'], label='Long probability', color='#facc15')
ax.set_ylabel('Probability')
ax.set_xlabel('Timestamp')
ax.set_title('Streaming probability of positive return')
ax.tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()

## Next steps

* Review and tune the feature windows, RSI period, and decision threshold to match your risk tolerance.
* Integrate portfolio sizing, stop-loss, and take-profit rules before sending orders to a live broker.
* Schedule retraining using the CLI (`python -m trading_models.cli train day_trading`) or deploy the Flask app to Heroku using the included `Procfile`.