# SpectraMind V50 — Kaggle Notebook Template

**Purpose**: a safe, reproducible scaffold for running the NeurIPS 2025 Ariel Data Challenge
workflows on Kaggle without internet access.

This template supports:
- Environment detection (Kaggle vs local)
- Read-only data access at `/kaggle/input`
- Optional import of the SpectraMind V50 package if it is available as a Kaggle dataset
- Pinned dependencies preferred (see `requirements-kaggle.txt` in the repo)
- Reproducible config snapshot embedded in the notebook

## 0) Environment & Paths

```python
import os, sys, json, platform
from pathlib import Path

import numpy as np
import pandas as pd

# Detect Kaggle by the presence of its read-only input mount.
IS_KAGGLE = Path('/kaggle/input').exists()
COMP_DIR = Path('/kaggle/input/ariel-data-challenge-2025') if IS_KAGGLE else Path('./data')
print("Env:", "Kaggle" if IS_KAGGLE else "Local", "| Python:", sys.version.split()[0])

# Optional: SpectraMind source attached as a Kaggle dataset (no internet, no pip install).
SPECTRAMIND_DS = Path('/kaggle/input/spectramind-v50')
if IS_KAGGLE and (SPECTRAMIND_DS / 'src').is_dir():
    sys.path.insert(0, str(SPECTRAMIND_DS / 'src'))
    print("SpectraMind source path added:", SPECTRAMIND_DS / 'src')
```
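Adding a directory to `sys.path` does not guarantee that the package inside it is importable. A minimal sketch of a pre-flight check, assuming the package is named `spectramind` and lives under the attached dataset's `src/` folder as described in the Notes; the helper name `spectramind_importable` is hypothetical. It uses `importlib.util.find_spec`, which locates a top-level package without running its `__init__.py`.

```python
import importlib
import importlib.util
import sys
from pathlib import Path

def spectramind_importable(src_dir: Path, package: str = "spectramind") -> bool:
    """Return True if `package` is findable, adding `src_dir` to sys.path if it exists."""
    if src_dir.is_dir() and str(src_dir) not in sys.path:
        sys.path.insert(0, str(src_dir))
        importlib.invalidate_caches()  # make the import finders notice the new path entry
    # find_spec resolves a top-level package without executing its __init__.py
    return importlib.util.find_spec(package) is not None

# Hypothetical mount point, mirroring the template's assumed dataset slug.
print("spectramind importable:",
      spectramind_importable(Path("/kaggle/input/spectramind-v50/src")))
```

Locally this simply prints `False` unless the directory exists, so the cell is safe to keep in both environments.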

## 1) Config Snapshot (embed minimal JSON for provenance)

```python
# Store a minimal config snapshot for provenance/reproducibility
config = {
    "pipeline": ["calibrate", "predict"],
    "model": {
        "fgs1_encoder": "mamba_ssm-lite",
        "airs_encoder": "cnn-lite",
        "decoder": "heteroscedastic-head"
    },
    "data": {
        "competition": str(COMP_DIR),
        "bins": 283
    },
    "runtime": {
        "env": "kaggle" if IS_KAGGLE else "local",
        "python": platform.python_version()
    }
}
out_dir = Path('outputs')
out_dir.mkdir(parents=True, exist_ok=True)
snapshot = out_dir / 'config_snapshot.json'
with open(snapshot, 'w') as f:
    json.dump(config, f, indent=2)
print("Wrote:", snapshot)
```
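To tie outputs back to the exact snapshot, one option is a short content hash of the config. This is a sketch, not part of SpectraMind; `config_fingerprint` is a hypothetical helper. Serializing with `sort_keys=True` makes the hash independent of dictionary key order.

```python
import hashlib
import json

def config_fingerprint(cfg: dict, length: int = 12) -> str:
    """Short, stable SHA-256 prefix of a JSON-serializable config (key order ignored)."""
    # sort_keys + compact separators give a canonical serialization.
    canonical = json.dumps(cfg, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]

example = {"model": {"decoder": "heteroscedastic-head"}, "data": {"bins": 283}}
print("config fingerprint:", config_fingerprint(example))
```

In the notebook you would call `config_fingerprint(config)`. Note that the `runtime` section changes between Kaggle and local runs; hashing only `{"model": ..., "data": ...}` may be preferable if you want the same fingerprint across environments.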

## 2) Data Access (read competition files if present)

```python
def list_files(base: Path, suffixes=('.csv', '.parquet', '.json')):
    """Return sorted paths of all files under `base` whose suffix is in `suffixes`."""
    if not base.exists():
        return []
    return sorted(str(p) for p in base.rglob('*')
                  if p.is_file() and p.suffix.lower() in suffixes)

inventory = list_files(COMP_DIR)
print(f"Found {len(inventory)} matching files (showing up to 10):")
for p in inventory[:10]:
    print("-", p)
```
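A per-extension count complements the inventory by showing which file types the dataset actually contains before deciding how to read them. A minimal sketch; `suffix_counts` is a hypothetical helper, and `./data` is the template's local fallback (pass `COMP_DIR` in the notebook).

```python
from collections import Counter
from pathlib import Path

def suffix_counts(base: Path) -> Counter:
    """Count every file under `base` by lowercase suffix, with no truncation."""
    if not base.exists():
        return Counter()
    # Files without an extension are grouped under "<none>".
    return Counter(p.suffix.lower() or "<none>" for p in base.rglob("*") if p.is_file())

print(suffix_counts(Path("./data")).most_common())
```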

## 3) Minimal EDA (guarded)

```python
from IPython.display import display  # explicit import; only a builtin inside Jupyter

# Example: try to read train.csv / test.csv if they exist
def safe_read_csv(p: Path, n=5):
    """Read a CSV, show its shape and first `n` rows; return None on failure."""
    try:
        df = pd.read_csv(p)
    except Exception as e:
        print("Failed reading", p, "->", e)
        return None
    print(p.name, df.shape)
    display(df.head(n))
    return df

train_csv = COMP_DIR / 'train.csv'
test_csv  = COMP_DIR / 'test.csv'
train_df = safe_read_csv(train_csv) if train_csv.exists() else None
test_df  = safe_read_csv(test_csv) if test_csv.exists() else None
```

## 4) Optional: Import SpectraMind and Run Inference Hooks

```python
from IPython.display import display

# If the SpectraMind V50 package is attached as a Kaggle dataset, try to import its predict hook.
try:
    from spectramind.cli_hooks import notebook_predict
    HAVE_SM = True
    print("SpectraMind hooks available.")
except Exception as e:  # ImportError, or a missing dependency inside the package
    HAVE_SM = False
    print("SpectraMind hooks not available:", e)

# Inference stub: only the first 5 test ids are passed; replace the IDs/source for a full run.
sample_ids = None
if test_df is not None and 'id' in test_df.columns:
    sample_ids = test_df['id'].head(5).tolist()

if HAVE_SM:
    # Expected to return a DataFrame with columns: id, mu_000.., sigma_000..
    preds = notebook_predict(
        comp_dir=str(COMP_DIR),
        config=config,
        ids=sample_ids
    )
    display(preds.head())
    out_csv = Path('outputs') / 'submission.csv'
    out_csv.parent.mkdir(parents=True, exist_ok=True)
    preds.to_csv(out_csv, index=False)
    print("Wrote submission to", out_csv)
else:
    print("No spectramind package; keeping the template minimal. "
          "You can add a cell that writes a dummy submission matching the schema.")
```
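When the SpectraMind hooks are unavailable, a constant-prediction placeholder is enough to exercise the submission path end to end. This sketch assumes the `id`, `mu_000..`, `sigma_000..` layout stated in the hook comment, with 283 bins numbered `000` to `282` (zero-padding is an assumption); the helper `make_dummy_submission` and its default values are hypothetical placeholders, not meaningful predictions. Check the column names against the competition's official sample submission before using it.

```python
import numpy as np
import pandas as pd

def make_dummy_submission(ids, n_bins: int = 283,
                          mu_value: float = 0.0, sigma_value: float = 1.0) -> pd.DataFrame:
    """Constant-prediction frame in the assumed id / mu_XXX / sigma_XXX layout."""
    if sigma_value <= 0:
        raise ValueError("sigma must be positive")
    n = len(ids)
    mu = pd.DataFrame(np.full((n, n_bins), mu_value),
                      columns=[f"mu_{i:03d}" for i in range(n_bins)])
    sigma = pd.DataFrame(np.full((n, n_bins), sigma_value),
                         columns=[f"sigma_{i:03d}" for i in range(n_bins)])
    # All three frames share a 0..n-1 RangeIndex, so column-wise concat aligns rows.
    return pd.concat([pd.DataFrame({"id": list(ids)}), mu, sigma], axis=1)

dummy = make_dummy_submission(ids=[0, 1, 2])
print(dummy.shape)  # (3, 567): 1 id column + 283 mu + 283 sigma
```

In the notebook, `make_dummy_submission(test_df['id'])` followed by `.to_csv(Path('outputs') / 'submission.csv', index=False)` would feed the packaging step.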

## 5) Submission Packaging Helper

```python
# A tiny helper to zip the submission CSV, if desired (it only checks that the file exists).
import zipfile
from pathlib import Path

def zip_submission(csv_path: Path, zip_path: Path):
    # Raise instead of assert: asserts are skipped when Python runs with -O.
    if not csv_path.exists():
        raise FileNotFoundError(f"CSV not found: {csv_path}")
    with zipfile.ZipFile(zip_path, mode='w', compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(csv_path, arcname=csv_path.name)
    print("Created:", zip_path)

sub = Path('outputs') / 'submission.csv'
if sub.exists():
    zip_submission(sub, Path('submission.zip'))
else:
    print("No submission.csv found; skipping zip.")
```
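The packaging helper does not inspect the CSV's contents. A lightweight schema check could run before zipping; this sketch assumes the `id` / `mu_*` / `sigma_*` naming from the inference hook comment and numeric columns, and `check_submission` is a hypothetical helper, not the competition's official validator.

```python
import numpy as np
import pandas as pd

def check_submission(df: pd.DataFrame, n_bins: int = 283, id_col: str = "id") -> list:
    """Return a list of problems; an empty list means these basic checks passed."""
    problems = []
    mu_cols = [c for c in df.columns if c.startswith("mu_")]
    sigma_cols = [c for c in df.columns if c.startswith("sigma_")]
    if id_col not in df.columns:
        problems.append(f"missing '{id_col}' column")
    elif df[id_col].duplicated().any():
        problems.append("duplicate ids")
    if len(mu_cols) != n_bins or len(sigma_cols) != n_bins:
        problems.append(f"expected {n_bins} mu and sigma columns, "
                        f"got {len(mu_cols)} and {len(sigma_cols)}")
    # NaN and inf are both rejected here; to_numpy(dtype=float) assumes numeric columns.
    if not np.isfinite(df[mu_cols + sigma_cols].to_numpy(dtype=float)).all():
        problems.append("non-finite mu/sigma values")
    if sigma_cols and (df[sigma_cols].to_numpy(dtype=float) <= 0).any():
        problems.append("non-positive sigma values")
    return problems
```

A typical use would be `problems = check_submission(pd.read_csv(sub))`, zipping only if `problems` is empty.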

## Notes

- **No internet**: this notebook avoids `pip install` and assumes dependencies come from attached datasets / Kaggle image.
- **Reproducibility**: we snapshot a minimal config in `outputs/config_snapshot.json`; extend as needed.
- **Importing SpectraMind**: if you publish your repo as a Kaggle Dataset mounted at `/kaggle/input/spectramind-v50` with the package under `src/spectramind`, this template adds `src` to `sys.path` and tries to import its hooks.
- **Schema**: ensure your submission matches the challenge schema (283 μ, 283 σ bins).
- **Runtime**: keep the notebook's total runtime within the Kaggle limit (typically ≤9h), not just individual cells.
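The runtime note can be enforced inside long per-planet inference loops with a simple wall-clock budget. A minimal sketch; `RuntimeBudget` is a hypothetical helper, and the 9-hour budget and 30-minute safety margin are illustrative values, not official limits. `time.monotonic()` is used because it is unaffected by system clock changes.

```python
import time

class RuntimeBudget:
    """Track wall-clock time against a budget (seconds) so loops can stop early."""

    def __init__(self, budget_s: float, safety_margin_s: float = 0.0):
        self.start = time.monotonic()
        self.deadline = self.start + budget_s - safety_margin_s

    def elapsed(self) -> float:
        return time.monotonic() - self.start

    def remaining(self) -> float:
        return max(0.0, self.deadline - time.monotonic())

    def exhausted(self) -> bool:
        return time.monotonic() >= self.deadline

# Illustrative values: 9 h budget, keep 30 min in reserve for writing outputs.
budget = RuntimeBudget(budget_s=9 * 3600, safety_margin_s=30 * 60)
```

Inside a loop, checking `if budget.exhausted(): break` before each item leaves time to write a partial (for example, dummy-filled) submission instead of losing the whole run.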