# Sequence Trading System - Google Colab Quickstart

This notebook provides a complete setup and workflow for running the Sequence trading system on Google Colab.

**Features:**
- CNN-LSTM-Attention hybrid models for FX prediction
- Intrinsic time representations (directional-change bars)
- GDELT sentiment analysis integration
- Reinforcement learning (A3C) for trading policies
- Fast NPY data format (30-50x faster than CSV)

---

## 🚀 Step 1: Clone Repository & Install Dependencies

In [None]:
# Clone the Sequence repository
!git clone https://github.com/YOUR_USERNAME/Sequence.git
%cd Sequence

# Install dependencies
!pip install -q -r requirements.txt

# Install TimesFM foundation model (optional, for ensemble evaluation)
!pip install -q -e ./models/timesFM

print("\n✅ Installation complete!")

## ⚙️ Step 2: Setup Python Path (CRITICAL)

**⚠️ IMPORTANT**: Run this cell in every new Colab session before importing any Sequence modules!

In [None]:
import sys
from pathlib import Path

# Add Sequence root and run/ to Python path
# (assumes the working directory is the repository root, e.g. after `%cd Sequence`)
ROOT = Path.cwd()
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))
if str(ROOT / "run") not in sys.path:
    sys.path.insert(0, str(ROOT / "run"))

print("✅ Python path configured:")
print(f"   - ROOT: {ROOT}")
print(f"   - run/: {ROOT / 'run'}")

# Check NumPy compatibility (common Colab issue)
print("\n" + "="*60)
print("Checking NumPy compatibility...")
print("="*60)

try:
    import numpy as np
    # Import a package compiled against NumPy; a binary mismatch surfaces here,
    # not when importing NumPy's own modules
    import pandas as pd
    print("✅ NumPy compatibility check passed")
except ValueError as e:
    if "numpy.dtype size changed" in str(e):
        print("⚠️  NumPy binary incompatibility detected!")
        print("   This is a common Colab issue when packages were compiled")
        print("   against a different NumPy version.\n")
        print("🔧 SOLUTION:")
        print("   1. Run in a new cell: !pip install --upgrade numpy pandas scikit-learn scipy --quiet")
        print("   2. Restart runtime: Runtime → Restart runtime")
        print("   3. Re-run this cell\n")
        print("❌ Cannot continue until NumPy is fixed.")
        raise SystemExit("NumPy compatibility issue - see instructions above")
    else:
        raise
except ImportError as e:
    print(f"⚠️  Missing dependency ({e}). Run: !pip install -r requirements.txt")

# Verify imports work
print("\n" + "="*60)
print("Verifying Sequence imports...")
print("="*60)

try:
    from config.config import DataConfig, ModelConfig, TrainingConfig
    from train.features.agent_features import build_feature_frame
    from utils.logger import get_logger
    print("✅ All imports successful!")
except ImportError as e:
    print(f"❌ Import failed: {e}")
    print("Please restart the runtime and try again")

## 🎮 Step 3: Check GPU Availability

To enable GPU: **Runtime** → **Change runtime type** → **GPU** (T4 or better)

In [None]:
import torch

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"✅ GPU Available: {gpu_name}")
    print(f"   Memory: {gpu_memory:.1f} GB")
else:
    print("⚠️  No GPU detected")
    print("   Consider enabling GPU: Runtime → Change runtime type → GPU")

## 💾 Step 4: Mount Google Drive (Optional - Recommended for Data Persistence)

Mount Google Drive to save data and models across sessions.

In [None]:
import os
from pathlib import Path  # imported here so the cell also works on its own

from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

# Create Sequence data directory in Drive
drive_data_dir = Path('/content/drive/MyDrive/Sequence/data')
drive_data_dir.mkdir(parents=True, exist_ok=True)

# Set environment variable for data directory
os.environ['SEQUENCE_DATA_DIR'] = str(drive_data_dir)

print("✅ Google Drive mounted")
print(f"   Data directory: {drive_data_dir}")
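
How the repository reads `SEQUENCE_DATA_DIR` is not shown in this notebook. As a minimal sketch, assuming scripts fall back to the local `data/data` directory when the variable is unset, the lookup could look like this (`resolve_data_dir` is a hypothetical helper, not a function from the codebase):

```python
import os
from pathlib import Path


def resolve_data_dir(default: str = "data/data") -> Path:
    """Return SEQUENCE_DATA_DIR if it is set, otherwise a local default.

    Hypothetical helper for illustration only; the fallback path is an
    assumption based on the --input-root value used in the workflows.
    """
    configured = os.environ.get("SEQUENCE_DATA_DIR")
    return Path(configured) if configured else Path(default)


print(f"Data directory in use: {resolve_data_dir()}")
```

Printing the resolved path before a long data preparation run is a cheap way to confirm that output will land on Drive rather than on the ephemeral Colab disk.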

---

# 📊 Workflow Examples

Choose a workflow below based on your needs.

## Workflow 1: Data Preparation Only

Prepare a dataset with technical features and save it in NPY format (30-50x faster to load than CSV).

In [None]:
# Prepare GBPUSD data for 2023-2024
!python data/prepare_dataset.py \
  --pairs gbpusd \
  --years 2023,2024 \
  --t-in 120 \
  --t-out 10 \
  --task-type classification \
  --input-root data/data

# Output files created:
# - data/data/gbpusd/gbpusd_prepared.npy (fast binary format)
# - data/data/gbpusd/gbpusd_prepared.csv (backward compatibility)
# - data/data/gbpusd/gbpusd_prepared_metadata.json (feature info)
# - data/data/gbpusd/gbpusd_prepared_datetime.npy (datetime index)
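
To make `--t-in` and `--t-out` concrete, here is a minimal sketch of sliding-window sample construction. It assumes `t_in` is the number of input bars per sample and `t_out` is the prediction horizon in bars, and it uses a simple up/down label; the actual labeling in `prepare_dataset.py` may differ. `make_windows` is a hypothetical name.

```python
from typing import List, Tuple


def make_windows(
    prices: List[float], t_in: int, t_out: int
) -> List[Tuple[List[float], int]]:
    """Build (input_window, label) pairs from a price series.

    Assumed semantics: each window holds t_in consecutive prices, and the
    label is 1 when the price t_out bars after the window's last bar is
    higher than that last bar, else 0.
    """
    samples = []
    # The last usable start index leaves room for t_in inputs plus the horizon.
    for start in range(len(prices) - t_in - t_out + 1):
        window = prices[start:start + t_in]
        last_price = window[-1]
        future_price = prices[start + t_in - 1 + t_out]
        samples.append((window, 1 if future_price > last_price else 0))
    return samples


toy_prices = [1.0, 1.1, 1.2, 1.1, 1.0, 1.3]
for window, label in make_windows(toy_prices, t_in=3, t_out=2):
    print(window, "->", label)
```

Under these assumptions a series of `n` bars yields `n - t_in - t_out + 1` samples, so 1,000 bars with `--t-in 120 --t-out 10` would give 871 samples.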

## Workflow 2: Complete Training Pipeline

Download data, prepare features, and train a model end-to-end.

In [None]:
# Complete pipeline: download → prepare → train
!python run/training_pipeline.py \
  --pairs gbpusd \
  --run-histdata-download \
  --years 2023,2024 \
  --t-in 120 \
  --t-out 10 \
  --epochs 20 \
  --batch-size 64 \
  --learning-rate 1e-3 \
  --checkpoint-dir models

# Model checkpoint will be saved to: models/gbpusd_best_model.pt

## Workflow 3: Training with Intrinsic Time (Directional-Change Bars)

Use an event-driven time representation instead of fixed intervals.

In [None]:
# Prepare data with directional-change bars
!python data/prepare_dataset.py \
  --pairs gbpusd \
  --years 2024 \
  --t-in 120 \
  --t-out 10 \
  --intrinsic-time \
  --dc-threshold-up 0.0005 \
  --dc-threshold-down 0.0005

# Train on intrinsic time data
!python train/run_training.py \
  --pairs gbpusd \
  --epochs 20 \
  --batch-size 64
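
For intuition about what the two thresholds control, the following is a minimal sketch of directional-change event detection. It assumes the thresholds are relative price moves (so `0.0005` means 0.05%) and that the series starts in an uptrend; the repository's implementation may differ in both respects. `directional_change_events` is a hypothetical name.

```python
from typing import List, Tuple


def directional_change_events(
    prices: List[float], up: float = 0.0005, down: float = 0.0005
) -> List[Tuple[int, str]]:
    """Return (index, direction) for each confirmed directional-change event."""
    events: List[Tuple[int, str]] = []
    if not prices:
        return events
    mode = "up"          # assumed initial trend
    extreme = prices[0]  # running high in an uptrend, running low in a downtrend
    for i, price in enumerate(prices[1:], start=1):
        if mode == "up":
            if price > extreme:
                extreme = price  # new high extends the uptrend
            elif price <= extreme * (1 - down):
                events.append((i, "down"))  # reversal of at least `down` from the high
                mode, extreme = "down", price
        else:
            if price < extreme:
                extreme = price  # new low extends the downtrend
            elif price >= extreme * (1 + up):
                events.append((i, "up"))  # reversal of at least `up` from the low
                mode, extreme = "up", price
    return events


toy_prices = [100.0, 101.0, 99.5, 99.0, 100.5, 102.0, 100.9]
print(directional_change_events(toy_prices, up=0.01, down=0.01))
```

Smaller thresholds produce more events and therefore more bars per calendar day; larger thresholds filter out noise at the cost of fewer samples.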

## Workflow 4: Multi-Task Learning (Price + Volatility + Regime)

Train a model to predict multiple targets simultaneously.

In [None]:
# Prepare multi-task dataset
!python data/prepare_multitask_dataset.py \
  --pairs gbpusd \
  --years 2024 \
  --t-in 120 \
  --t-out 10

# Train multi-task model
!python train/run_training_multitask.py \
  --pairs gbpusd \
  --epochs 30 \
  --batch-size 64 \
  --learning-rate 1e-3

## Workflow 5: GDELT Sentiment Analysis Integration (BigQuery)

Add news sentiment features using the GDELT dataset on Google BigQuery.

In [None]:
# Authenticate with Google Cloud
import os

from google.colab import auth

auth.authenticate_user()

# Set your GCP project ID
os.environ['GOOGLE_CLOUD_PROJECT'] = 'your-project-id'  # Replace with your project ID

print("✅ Google Cloud authentication complete")

In [None]:
# Prepare data with GDELT sentiment features
!python data/prepare_dataset.py \
  --pairs gbpusd \
  --years 2024 \
  --t-in 120 \
  --t-out 10 \
  --include-sentiment \
  --use-bigquery-gdelt \
  --gdelt-themes "WB_1427_BUSINESS_FINANCE,WB_2327_BUSINESS_FINANCIAL_MARKETS"

# Train with sentiment features
!python train/run_training.py \
  --pairs gbpusd \
  --epochs 20 \
  --batch-size 64

## Workflow 6: Reinforcement Learning (A3C) with Backtesting

Train an RL agent to learn optimal trading policies on historical data.

In [None]:
# First, prepare data if not already done
!python data/prepare_dataset.py \
  --pairs gbpusd \
  --years 2023,2024 \
  --t-in 120 \
  --t-out 10

# Train A3C agent with backtesting environment
!python rl/run_a3c_training.py \
  --pair gbpusd \
  --env-mode backtesting \
  --historical-data data/data/gbpusd/gbpusd_prepared.csv \
  --num-workers 2 \
  --total-steps 100000 \
  --learning-rate 1e-4
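
To illustrate what a backtesting environment does, here is a toy sketch in plain Python. It is not the repository's environment class: the three-action space, the reward (position times next price change), and the omission of transaction costs are all assumptions made to keep the example small.

```python
from typing import List, Tuple


class ToyBacktestEnv:
    """Hypothetical stand-in that replays a fixed price series bar by bar."""

    ACTIONS = {0: -1, 1: 0, 2: 1}  # short, flat, long (assumed action space)

    def __init__(self, prices: List[float]) -> None:
        if len(prices) < 2:
            raise ValueError("need at least two prices")
        self.prices = prices
        self.t = 0

    def reset(self) -> float:
        """Rewind to the first bar and return its price as the observation."""
        self.t = 0
        return self.prices[0]

    def step(self, action: int) -> Tuple[float, float, bool]:
        """Apply an action, advance one bar, return (observation, reward, done)."""
        position = self.ACTIONS[action]
        change = self.prices[self.t + 1] - self.prices[self.t]
        self.t += 1
        done = self.t == len(self.prices) - 1
        return self.prices[self.t], position * change, done


env = ToyBacktestEnv([1.0, 1.5, 1.25, 2.0])
env.reset()
total_reward, done = 0.0, False
while not done:
    _, reward, done = env.step(2)  # always-long policy
    total_reward += reward
print(f"Total reward: {total_reward}")
```

An always-long policy simply earns the net price change over the series, which makes it a useful baseline to compare a trained agent against.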

---

# 📈 Model Evaluation

## Evaluate Trained Model

In [None]:
# Evaluate model on test set
!python eval/run_evaluation.py \
  --pairs gbpusd \
  --checkpoint-path models/gbpusd_best_model.pt \
  --t-in 120 \
  --t-out 10

# Outputs: accuracy, precision, recall, F1 score, confusion matrix
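
As a reference for reading these outputs, the sketch below computes the same metrics by hand for a binary task. It assumes two classes with label 1 as the positive class; the evaluation script may use more classes or a different averaging scheme. `binary_metrics` is a hypothetical helper.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, object]:
    """Compute accuracy, precision, recall, F1, and a 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Rows are true classes (0, 1); columns are predicted classes (0, 1).
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
for name, value in binary_metrics(y_true, y_pred).items():
    print(f"{name}: {value}")
```

On imbalanced direction labels, accuracy alone can look good for a model that always predicts the majority class, so precision, recall, and the confusion matrix are worth checking together.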

## Ensemble with TimesFM Foundation Model

In [None]:
# Ensemble your model with Google's TimesFM
!python eval/ensemble_timesfm.py \
  --pairs gbpusd \
  --years 2024 \
  --t-in 120 \
  --t-out 10 \
  --checkpoint-root models \
  --device cuda

---

# 🛠️ Utility Commands

## Check Prepared Data Files

In [None]:
import json
from pathlib import Path

pair = "gbpusd"
data_dir = Path(f"data/data/{pair}")

if data_dir.exists():
    print(f"📂 Data files for {pair.upper()}:")
    print("="*60)

    for file_path in sorted(data_dir.glob(f"{pair}_prepared*")):
        size_mb = file_path.stat().st_size / 1e6
        print(f"  {file_path.name:40} {size_mb:>8.2f} MB")

    # Show metadata
    metadata_path = data_dir / f"{pair}_prepared_metadata.json"
    if metadata_path.exists():
        with open(metadata_path) as f:
            metadata = json.load(f)
        print("\n📊 Dataset Info:")
        print("="*60)
        print(f"  Rows: {metadata['num_rows']:,}")
        print(f"  Features: {len(metadata['feature_columns'])}")
        print(f"  t_in: {metadata['t_in']}, t_out: {metadata['t_out']}")
        print(f"  Target type: {metadata['target_type']}")
else:
    print(f"❌ No data found for {pair.upper()}")
    print("   Run data preparation first")

## Check Model Checkpoints

In [None]:
from pathlib import Path

models_dir = Path("models")

if models_dir.exists():
    model_files = list(models_dir.glob("*.pt"))

    if model_files:
        print("🤖 Saved Models:")
        print("="*60)
        for model_path in sorted(model_files):
            size_mb = model_path.stat().st_size / 1e6
            print(f"  {model_path.name:40} {size_mb:>8.2f} MB")
    else:
        print("❌ No model checkpoints found")
        print("   Train a model first")
else:
    print("❌ models/ directory not found")

## Monitor GPU Usage

In [None]:
# Check GPU memory usage
!nvidia-smi

---

# 🔧 Troubleshooting

## Common Issues and Solutions

### Import Errors
**Error**: `ModuleNotFoundError: No module named 'config'`

**Solution**: Re-run the Step 2 (Setup Python Path) cell.

### Out of Memory
**Error**: `CUDA out of memory`

**Solution**: Reduce batch size:
```bash
--batch-size 32  # or even 16
```

### Slow Data Loading
**Issue**: Loading takes 30-60 seconds

**Solution**: The codebase now loads the NPY format automatically (30-50x faster than CSV). Make sure you are on the latest version and that data preparation has produced the `.npy` files.

### Missing Data Files
**Error**: `FileNotFoundError: gbpusd_prepared.csv`

**Solution**: Run data preparation first (Workflow 1)

---

## 📚 Additional Resources

- **Main Documentation**: [README.md](https://github.com/YOUR_USERNAME/Sequence/blob/main/README.md)
- **Colab Setup Guide**: [COLAB_SETUP.md](https://github.com/YOUR_USERNAME/Sequence/blob/main/COLAB_SETUP.md)
- **Configuration Reference**: [CLAUDE.md](https://github.com/YOUR_USERNAME/Sequence/blob/main/CLAUDE.md)
- **Architecture Guide**: [docs/ARCHITECTURE_OVERVIEW.md](https://github.com/YOUR_USERNAME/Sequence/blob/main/docs/ARCHITECTURE_OVERVIEW.md)

---

**Happy Trading! 🚀📈**