# Train on Google Colab (Cloud GPU)

This notebook is the **cloud runbook** for training all models in one go, without running the API server.

**What you upload first**
- Your SQLite DB file `data/app.db` (must contain `candles` + `instrument_meta`).

**What you download after training**
- `data/app.db` (contains ridge + deep models inside `trained_models` table)
- `data/models/pattern_seq.pt` (sequence-based candlestick pattern model)

Security:
- Don’t paste tokens into notebook outputs (use Secrets).
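
One way to follow this rule is to read tokens from the Colab Secrets panel at runtime instead of typing them into a cell. The sketch below assumes you are on Colab, where `google.colab.userdata` is available, and falls back to an environment variable elsewhere. The helper names `get_secret` and `mask`, and the secret name `HF_TOKEN` in the usage note, are illustrative and not part of the project.

```python
import os


def get_secret(name, default=None):
    """Read a secret from Colab Secrets, falling back to an environment variable."""
    try:
        from google.colab import userdata  # only importable inside Colab

        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # Not in Colab, secret not defined, or notebook access not granted
        pass
    return os.environ.get(name, default)


def mask(value, keep=4):
    """Return a printable form that hides all but the last `keep` characters."""
    if not value:
        return ''
    if len(value) <= keep:
        return '*' * len(value)
    return '*' * (len(value) - keep) + value[-keep:]
```

Typical use would be `token = get_secret('HF_TOKEN')` followed by `print(mask(token))`, so the full value never appears in a saved output cell.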

## 1) Pick a Free Cloud Notebook (Colab/Kaggle/others)

**Options**

| Platform | Free GPU? | Typical session limits | Notes |
|---|---:|---|---|
| Google Colab (Free) | Sometimes | ~2–12h, may disconnect | Best docs + Drive mount |
| Kaggle Notebooks | Often | Weekly GPU-hour quota plus a per-session time cap | Good for longer jobs; keep credentials in Kaggle Secrets |
| Other “free GPU” sites | Unreliable | Varies | Often limited or short-lived |

Your “train ALL NSE_EQ” job is **not guaranteed to finish in one session** on a free tier. The only practical approach on free tiers is to:
- run with **checkpointing/resume**, and
- continue over multiple sessions (or upgrade to paid for stability).
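
The checkpoint-and-resume idea can be illustrated with a generic sketch. This is not the project's own mechanism (the backend keeps its resume cursor in the SQLite DB); it only shows the pattern, using a hypothetical JSON cursor file and hypothetical helper names `load_cursor`, `save_cursor`, and `run_resumable`.

```python
import json
import os
import tempfile


def load_cursor(path):
    """Return the index of the next item to process (0 if there is no checkpoint)."""
    try:
        with open(path) as f:
            return int(json.load(f)['next_index'])
    except (FileNotFoundError, KeyError, TypeError, ValueError):
        return 0


def save_cursor(path, next_index):
    """Write the cursor atomically so a disconnect cannot leave a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    with os.fdopen(fd, 'w') as f:
        json.dump({'next_index': next_index}, f)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def run_resumable(items, train_one, cursor_path):
    """Process items in order, checkpointing after each; return the start index."""
    start = load_cursor(cursor_path)
    for i in range(start, len(items)):
        train_one(items[i])
        save_cursor(cursor_path, i + 1)
    return start
```

If the cursor file lives on a persistent location such as a mounted Drive folder, a new session picks up at the first unfinished item rather than starting over.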

Next cell prints runtime info.

In [None]:
import os, sys, platform, textwrap

print('python:', sys.version)
print('platform:', platform.platform())
print('cwd:', os.getcwd())
print('pid:', os.getpid())

## 2) Open a Hosted Runtime (GPU/TPU/CPU) and Verify Hardware

- **Colab**: Runtime → Change runtime type → Hardware accelerator → **GPU**.
- **Kaggle**: Notebook settings → Accelerator → **GPU**.

Next cells detect GPU/TPU and PyTorch CUDA.

In [None]:
# GPU check (works in Colab/Kaggle)
import os, shutil, subprocess

if shutil.which('nvidia-smi'):
    subprocess.run(['nvidia-smi'])
else:
    print('nvidia-smi not found (CPU runtime or TPU-only runtime).')

# TPU hint (Colab)
print('TPU_NAME:', os.environ.get('TPU_NAME'))

In [None]:
# PyTorch CUDA check
try:
    import torch

    print('torch:', torch.__version__)
    print('cuda_available:', torch.cuda.is_available())
    print('device_count:', torch.cuda.device_count())
    if torch.cuda.is_available():
        print('device0:', torch.cuda.get_device_name(0))
except Exception as e:
    print('torch import failed:', type(e).__name__, str(e)[:200])

## 3) Mount/Access Data (Google Drive / Kaggle Datasets / Direct Download)

For your backend training job, you want persistence for:
- SQLite DB (`DATABASE_PATH`) that stores instruments + job resume cursor
- Model artifacts (whatever your project writes under `data/`)

### Option A: Google Drive (Colab)
Mount Drive and use a folder like `MyDrive/DemoBackendAi/`.

### Option B: Kaggle Datasets
If you later export candles to a Kaggle dataset, you can mount it under `/kaggle/input/...`.

### Option C: Direct download
Useful for data that needs no authentication; the template cell includes a SHA-256 checksum check.

In [None]:
# Option A: Colab Drive mount
# (If you're in Kaggle, skip this cell.)
try:
    from google.colab import drive

    drive.mount('/content/drive')
    BASE_DIR = '/content/drive/MyDrive/DemoBackendAi'
except Exception as e:
    # Not in Colab
    BASE_DIR = '/content/DemoBackendAi'
    print('Drive mount skipped:', type(e).__name__)

import os
os.makedirs(BASE_DIR, exist_ok=True)
print('BASE_DIR=', BASE_DIR)

# Use one DATA_DIR everywhere
DATA_DIR = os.path.join(BASE_DIR, 'data')
os.makedirs(DATA_DIR, exist_ok=True)
print('DATA_DIR=', DATA_DIR)

In [None]:
# Option C: Direct download with checksum (example template)
# This is just a template; your Upstox candle fetch uses authenticated API calls instead.
import hashlib, os, urllib.request

def sha256_file(path: str) -> str:
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b''):
            h.update(chunk)
    return h.hexdigest()

# Example usage (disabled by default)
EXAMPLE_URL = None  # e.g. 'https://example.com/file.bin'
EXAMPLE_SHA256 = None

if EXAMPLE_URL and EXAMPLE_SHA256:
    out_path = os.path.join(DATA_DIR, 'download.bin')
    urllib.request.urlretrieve(EXAMPLE_URL, out_path)
    digest = sha256_file(out_path)
    print('sha256:', digest)
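    # hexdigest() is lowercase, so normalise the expected value before comparing
    assert digest == EXAMPLE_SHA256.lower(), 'SHA-256 mismatch: download is corrupt or was altered'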

## 4) Install Dependencies

First get the project code (git clone or zip upload), then install its requirements. Install PyTorch only if it’s missing (Colab usually includes it).

In [None]:
# ---- Choose ONE: clone from git OR upload zip ----
# Option 1: git clone (recommended)
REPO_URL = ''  # <- paste your repo URL (GitHub/GitLab). Prefer a PRIVATE repo.
REPO_DIR = os.path.join(BASE_DIR, 'repo')

if REPO_URL:
    import subprocess
    if os.path.exists(REPO_DIR) and os.listdir(REPO_DIR):
        print('Repo already exists:', REPO_DIR)
    else:
        os.makedirs(REPO_DIR, exist_ok=True)
        # Pass arguments as a list so URLs/paths with spaces or shell characters are safe
        subprocess.check_call(['git', 'clone', '--depth', '1', REPO_URL, REPO_DIR])

print('REPO_DIR=', REPO_DIR)

# Option 2: zip upload (Colab)
# from google.colab import files
# uploaded = files.upload()  # upload DemoBackendAi.zip
# !unzip -o DemoBackendAi.zip -d "$REPO_DIR"

In [None]:
# Install dependencies
import os, subprocess, sys

assert os.path.exists(os.path.join(REPO_DIR, 'requirements.txt')), 'Set REPO_DIR first (clone/unzip).'

# Use the kernel's own interpreter so packages land in the environment this notebook imports from
subprocess.check_call(
    [sys.executable, '-m', 'pip', 'install', '-q', '-r', 'requirements.txt'],
    cwd=REPO_DIR,
)

# Install torch only if missing (Colab usually has CUDA torch)
try:
    import torch  # noqa: F401
    print('torch already available')
except Exception:
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', 'torch'])

import fastapi, httpx
print('fastapi:', fastapi.__version__)
print('httpx:', httpx.__version__)

## 5) Upload Your Existing DB (recommended)

To train in the cloud, the fastest path is to upload your existing SQLite DB from your PC.

That DB should already contain:
- `candles` (historical candles)
- `instrument_meta` (NSE_EQ universe keys)

This notebook will place it at `./data/app.db` so the backend training code finds it automatically.
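
Before spending GPU time, it can help to confirm the uploaded file really contains both tables. The sketch below uses only the standard `sqlite3` module and checks table names, not their contents. The helper name `missing_tables` is illustrative, and the path in the usage note assumes the DB has already been placed at `data/app.db`.

```python
import os
import sqlite3

REQUIRED_TABLES = ('candles', 'instrument_meta')


def missing_tables(db_path, required=REQUIRED_TABLES):
    """Return the required table names that are absent from the SQLite file."""
    if not os.path.exists(db_path):
        # sqlite3.connect() would silently create an empty DB, so fail early instead
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'"
        ).fetchall()
    finally:
        conn.close()
    present = {row[0] for row in rows}
    return [name for name in required if name not in present]
```

For example, `missing_tables('data/app.db')` returning an empty list means both tables exist; anything else names what still has to be loaded locally before uploading again.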

In [None]:
# Upload DB + configure paths (Colab)
import os
from pathlib import Path

# Colab upload helper
try:
    from google.colab import files
except Exception:
    files = None

os.chdir(REPO_DIR)
os.makedirs('data', exist_ok=True)
os.makedirs('data/models', exist_ok=True)

# 1) Upload your app.db from your PC
if files is not None:
    uploaded = files.upload()  # upload a file named app.db
    if 'app.db' in uploaded:
        Path('data/app.db').write_bytes(uploaded['app.db'])
        print('Saved DB to data/app.db')
    else:
        print('Upload skipped or file not named app.db')
else:
    print('Not running in Colab; ensure data/app.db exists manually')

# 2) Make backend read the DB/model paths
os.environ['DATABASE_PATH'] = os.path.abspath('data/app.db')
os.environ['PATTERN_SEQ_MODEL_PATH'] = os.path.abspath('data/models/pattern_seq.pt')

print('DATABASE_PATH=', os.environ['DATABASE_PATH'])
print('PATTERN_SEQ_MODEL_PATH=', os.environ['PATTERN_SEQ_MODEL_PATH'])

In [None]:
# OPTIONAL: Hugging Face token (if you ever push models)
# Prefer loading the value from the platform's Secrets store over pasting it here.
# os.environ['HF_TOKEN'] = '...'

# OPTIONAL: Kaggle API credential setup (if needed)
# 1) Upload kaggle.json via notebook UI
# 2) Then:
# !mkdir -p ~/.kaggle
# !cp kaggle.json ~/.kaggle/
# !chmod 600 ~/.kaggle/kaggle.json
# !kaggle datasets list | head

## 6) Define Training Config (batch size, epochs, mixed precision) for Cloud Limits

Free tiers are time-limited. Keep configs conservative, and checkpoint often.

Effective batch size:

$$\text{effective\_batch}=\text{batch\_size}\times\text{grad\_accumulation}$$

Below is a general config object you can tune. Your project’s CLI already exposes many knobs; Section 7 shows the flags that are passed to the training script.
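
As a worked example of the formula, the sketch below computes the effective batch size and how many optimizer steps one epoch takes when gradients are accumulated. The function names and the sample count are illustrative assumptions, not values from the project.

```python
import math


def effective_batch(batch_size, grad_accumulation):
    """Samples that contribute to each optimizer step."""
    return batch_size * grad_accumulation


def optimizer_steps_per_epoch(n_samples, batch_size, grad_accumulation):
    """Optimizer steps per epoch, counting a final partial accumulation as one step."""
    micro_batches = math.ceil(n_samples / batch_size)
    return math.ceil(micro_batches / grad_accumulation)


# Hypothetical numbers: 10,000 training windows, batch_size=128, 4 accumulation steps
print(effective_batch(128, 4))                    # 512
print(optimizer_steps_per_epoch(10_000, 128, 4))  # 20
```

Raising `grad_accumulation` keeps GPU memory use tied to `batch_size` while the optimizer sees a larger batch, which is the usual reason to use it on small free-tier GPUs.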

In [None]:
from dataclasses import dataclass

@dataclass
class CloudTrainConfig:
    batch_size: int = 128
    grad_accumulation: int = 1
    epochs_long: int = 3
    epochs_intraday: int = 3
    sleep_seconds_per_chunk: float = 0.2
    sleep_seconds_per_symbol: float = 0.4
    page_size: int = 200

cfg = CloudTrainConfig()
print(cfg)
print('effective_batch=', cfg.batch_size * cfg.grad_accumulation)

## 7) Train All Models (NO API server needed)

You can train everything in one go with a single command. It prints:
- which model family is training
- progress percentage
- elapsed + ETA

This uses `scripts/train_all_models_local.py` (runs training directly in-process).
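
For reading the progress lines, it helps to know how an ETA is usually derived. The sketch below shows the common linear estimate (time so far, scaled by the remaining share of work). It is an assumption about the general technique, not the script's actual implementation, and `format_progress` is a hypothetical name.

```python
def format_progress(done, total, elapsed_s):
    """Build a progress line with percentage, elapsed time, and a linear ETA."""
    pct = 100.0 * done / total if total else 100.0
    # Linear extrapolation: average time per finished item times items remaining
    eta_s = elapsed_s * (total - done) / done if done else float('nan')
    return f'{pct:5.1f}% | elapsed {elapsed_s:.0f}s | ETA {eta_s:.0f}s'


print(format_progress(25, 100, 50.0))
```

A linear ETA is least reliable early in a run and when symbols differ widely in candle count, so treat it as a rough guide when deciding whether a job fits in the remaining session time.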

In [None]:
# Smoke run (quick sanity check)
# - trains ridge on DEFAULT_UNIVERSE
# - trains deep on first 3 NSE_EQ symbols (GPU required)
# - trains pattern-seq on first 3 NSE_EQ symbols
import os, subprocess

os.chdir(REPO_DIR)

cmd = (
    'python scripts/train_all_models_local.py '
    '--nse-max-symbols 3 '
    '--deep-epochs 1 --ps-epochs 1 '
    '--run-ridge-batch --run-deep-nse-eq --run-pattern-seq'
)
print(cmd)
subprocess.check_call(['bash', '-lc', cmd])

In [None]:
# Preflight: confirm your DB has the needed tables + universe
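import os, sys
os.chdir(REPO_DIR)
if REPO_DIR not in sys.path:
    sys.path.insert(0, REPO_DIR)  # make the repo's `app` package importable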

from app.core.db import db_conn
from app.universe.service import UniverseService

with db_conn() as conn:
    tables = [r['name'] for r in conn.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()]
print('tables:', sorted(tables)[:10], '... total=', len(tables))

uni = UniverseService()
print('NSE_EQ count:', uni.count(prefix='NSE_EQ|'))

In [None]:
# Full run: train all models (may take a long time)
import os, subprocess
os.chdir(REPO_DIR)

cmd = (
    'python scripts/train_all_models_local.py '
    '--nse-max-symbols 0 '
    '--run-ridge-batch --run-deep-nse-eq --run-pattern-seq'
)
print(cmd)
# Uncomment to start the full run:
# subprocess.check_call(['bash', '-lc', cmd])

In [None]:
# If you don't have GPU, you can still train everything except deep models:
import os, subprocess
os.chdir(REPO_DIR)

cmd = (
    'python scripts/train_all_models_local.py '
    '--no-run-deep-nse-eq '
    '--run-ridge-batch --run-pattern-seq'
)
print(cmd)
# subprocess.check_call(['bash', '-lc', cmd])

## 8) Download Trained Artifacts to Your PC

After training finishes, download these files:
- `data/app.db` (ridge + deep models live inside this DB)
- `data/models/pattern_seq.pt` (pattern sequence model)
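
Before downloading, a quick check that the `trained_models` table actually holds rows can save a wasted transfer. The sketch below only counts rows, since the table's columns are not described here. The helper name `count_rows` is illustrative.

```python
import os
import sqlite3


def count_rows(db_path, table='trained_models'):
    """Return the number of rows in `table`.

    Raises sqlite3.OperationalError if the table does not exist.
    """
    if not table.isidentifier():
        raise ValueError(f'unexpected table name: {table!r}')
    if not os.path.exists(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    finally:
        conn.close()
```

For example, `count_rows('data/app.db')` returning `0` after a full run suggests training wrote to a different DB path, which is worth checking against `DATABASE_PATH` before downloading.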

In [None]:
# Download artifacts (Colab)
import os
from pathlib import Path

try:
    from google.colab import files
except Exception:
    files = None

os.chdir(REPO_DIR)

db_path = Path('data/app.db')
ps_path = Path('data/models/pattern_seq.pt')

print('DB exists:', db_path.exists(), db_path)
print('pattern_seq exists:', ps_path.exists(), ps_path)

if files is not None:
    if db_path.exists():
        files.download(str(db_path))
    if ps_path.exists():
        files.download(str(ps_path))
else:
    print('Not in Colab; download files manually from the filesystem')