# Google Tunix Hackathon: Dual-Stream End-to-End Kernel

This notebook is structured for Kaggle submission in the Google Tunix hackathon. It runs from the checked-in repo, with the Hugging Face model checkpoint as its only external asset, and it reproduces the full Dual-Stream Architecture plus the `dualstream_anticollapse` monitoring toolkit shipped in this repository. Key compliance notes:

- ✅ Self-contained: installs only from local requirements and loads only Hugging Face models that Kaggle already caches.
- ✅ Deterministic seeds and CPU-friendly defaults so it runs inside the kernel limits.
- ✅ Clear outputs and saved artifacts that match the competition's expectation for a single-run, reviewable notebook.


## 1) Environment prep

The cell below auto-detects the repo location (works on Kaggle or locally), installs minimal dependencies, and sets seeds for reproducibility.

In [1]:

import os, sys, json, random
from pathlib import Path
import numpy as np
import torch

# Detect repository root (works on Kaggle datasets and local clones)
DEFAULT_ROOT = Path('.')
kaggle_root = Path('/kaggle/input/dual-stream')
repo_root = kaggle_root if kaggle_root.exists() else DEFAULT_ROOT
print(f"Using repo root: {repo_root.resolve()}")

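# Make project modules importable. `dual_stream_poc` is a module inside python_poc/;
# `dualstream_anticollapse` is imported as a package, so its parent directory
# (the repo root) must also be on the path when the notebook runs from elsewhere.
sys.path.append(str(repo_root))
sys.path.append(str(repo_root / 'python_poc'))
sys.path.append(str(repo_root / 'dualstream_anticollapse'))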

# Light-weight dependency install (no custom wheels needed)
try:
    import pandas  # noqa: F401
except ImportError:
    !pip install -q pandas scikit-learn scipy matplotlib seaborn

# Transformers/torch are large; install only if missing.
try:
    import transformers  # noqa: F401
except ImportError:
    !pip install -q -r {repo_root}/python_poc/requirements.txt

# Seeds for reproducibility
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Device: {device}")


Using repo root: /workspace/dual-stream


Device: cpu
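
The root detection in the cell above falls back to the current directory without checking that the project is actually there. A minimal sketch of a stricter variant follows; the helper name `find_repo_root` and the choice of `python_poc` as the marker directory are assumptions made for illustration, not part of the repository.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_repo_root(candidates: Iterable[Path], marker: str = "python_poc") -> Optional[Path]:
    """Return the first candidate directory that contains `marker`, else None."""
    for candidate in candidates:
        if (candidate / marker).is_dir():
            return candidate.resolve()
    return None


# The same two locations the cell above checks, in the same priority order.
found = find_repo_root([Path("/kaggle/input/dual-stream"), Path(".")])
print("Repo root:", found if found is not None else "not found (check the dataset mount)")
```

Returning `None` instead of silently using `.` makes a missing Kaggle dataset mount visible before any import fails.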


## 2) Dual-Stream generator

This cell wraps the `python_poc/dual_stream_poc.py` implementation so the notebook can produce the Answer and Monologue streams directly. To stay within Kaggle limits, the default model is `gpt2`; to use a Tunix-supplied checkpoint instead, change `model_name`.

In [2]:

from dual_stream_poc import DualStream

prompt = """You are an alignment auditor. Briefly explain why transparency in model reasoning matters."""
ds = DualStream(model_name="gpt2", top_k=5, device=device)
dual_result = ds.generate(prompt=prompt, max_new_tokens=40, temperature=0.0)

print("Answer Stream:\n", dual_result['answer_text'])
print("\nMonologue Stream (first 5 lines):")
for line in dual_result['monologue_text'].split('\n')[:5]:
    print(line)


Answer Stream:
 

The first thing to understand is that model reasoning is a very complex process. It is not a simple process. It is a complex process that requires a lot of work.

The second

Monologue Stream (first 5 lines):
[LOGIT_LENS:TOP_5:('198',0.238),('Explain',0.036),('If',0.032),('You',0.030),('The',0.026)]
[LOGIT_LENS:TOP_5:('198',0.998),('The',0.000),('1',0.000),('A',0.000),('I',0.000)]
[LOGIT_LENS:TOP_5:('The',0.045),('What',0.028),('1',0.027),('In',0.021),('',0.021)]
[LOGIT_LENS:TOP_5:('first',0.038),('model',0.027),('following',0.024),('most',0.014),('problem',0.013)]
[LOGIT_LENS:TOP_5:('thing',0.226),('step',0.163),('question',0.050),('part',0.033),('rule',0.030)]
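
Each Monologue line printed above has the shape `[LOGIT_LENS:TOP_5:('token',prob),...]`. For downstream analysis it can help to turn those lines back into structured data. The sketch below is one way to do that with a regular expression; `parse_logit_lens` is a hypothetical helper, and the handling of backslash-escaped quotes inside tokens is an assumption, since the output above does not show how such tokens are rendered.

```python
import re
from typing import List, Tuple

# Matches one ('token',prob) pair. The token may be empty; backslash-escaped
# characters inside it are tolerated (an assumption about the format).
_PAIR = re.compile(r"\('((?:[^'\\]|\\.)*)',\s*([0-9]*\.?[0-9]+)\)")


def parse_logit_lens(line: str) -> List[Tuple[str, float]]:
    """Parse a '[LOGIT_LENS:TOP_k:(...),(...)]' line into (token, probability) pairs."""
    if not line.startswith("[LOGIT_LENS:"):
        raise ValueError(f"not a LOGIT_LENS line: {line!r}")
    return [(token, float(prob)) for token, prob in _PAIR.findall(line)]


sample = "[LOGIT_LENS:TOP_5:('thing',0.226),('step',0.163),('question',0.050),('part',0.033),('rule',0.030)]"
pairs = parse_logit_lens(sample)
print(pairs[0])                            # ('thing', 0.226)
print(round(sum(p for _, p in pairs), 3))  # 0.502
```

The second print shows how much probability mass the top five candidates cover at that step; the remainder is spread over the rest of the vocabulary.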


## 3) Train baseline and monitor for drift/performance regressions

`dualstream_anticollapse` ships drift detection, performance checks, and governance helpers. The next cell mirrors the CLI flow (`train` ➜ `monitor`) using the bundled demo CSVs, producing on-disk artifacts compatible with the hackathon review process.
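
The drift event logged by the monitor further down reports a `psi` value per feature. As background, here is a minimal pure-Python sketch of the Population Stability Index. The binning scheme (equal-width bins over the reference range), the `eps` floor, and the function names are assumptions for illustration; the toolkit's own implementation and thresholds are not shown in this notebook and may differ.

```python
import math
from typing import List, Sequence


def _bin_fractions(values: Sequence[float], edges: List[float], eps: float) -> List[float]:
    """Fraction of values per bin, floored at eps so the log below stays finite."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Walk to the bin containing v; out-of-range values land in the first or last bin.
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(
    reference: Sequence[float], current: Sequence[float], bins: int = 10, eps: float = 1e-6
) -> float:
    """PSI = sum over bins of (cur - ref) * ln(cur / ref), using bin fractions."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has zero range")
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    ref_frac = _bin_fractions(reference, edges, eps)
    cur_frac = _bin_fractions(current, edges, eps)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


reference = [i / 100 for i in range(100)]
shifted = [x + 0.3 for x in reference]
print(population_stability_index(reference, reference))  # 0.0
print(population_stability_index(reference, shifted) > 0.25)  # True
```

A widely used rule of thumb treats a PSI above roughly 0.25 as a significant shift, which is why the example compares against that value.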

In [3]:

import pandas as pd
from dualstream_anticollapse.config import Config
from dualstream_anticollapse.retrain import build_model, fit_model, predict
from dualstream_anticollapse.metrics import classification_metrics
from dualstream_anticollapse.governance import save_model
from dualstream_anticollapse.monitor import ModelMonitor

artifacts = Path('artifacts')
artifacts.mkdir(exist_ok=True)

# Load demo data
ref_path = repo_root / 'dualstream_anticollapse' / 'demo' / 'reference.csv'
cur_path = repo_root / 'dualstream_anticollapse' / 'demo' / 'current.csv'
ref = pd.read_csv(ref_path)
cur = pd.read_csv(cur_path)

# Fit a baseline classifier
target = 'y'
features = ['x1', 'x2']
cfg = Config(target=target, features=features, output_dir=str(artifacts))

X_ref, y_ref = ref[features], ref[target].astype(int)
model = build_model(cfg.model_type)
model = fit_model(model, X_ref, y_ref)

# Save baseline metrics + model artifacts
y_pred_ref, y_proba_ref = predict(model, X_ref)
metrics_base = classification_metrics(y_ref, y_pred_ref, y_proba_ref)
save_model(model, artifacts / 'model.joblib', {"stage": "baseline", "metrics": metrics_base})
print("Baseline metrics", json.dumps(metrics_base, indent=2))

# Monitor a new batch
mon = ModelMonitor(cfg, {"metrics": metrics_base}, state_path=str(artifacts / 'state.json'), alert_sink='stdout')
drift_triggered = mon.check_drift(ref, cur)
outliers_triggered, outliers = mon.check_outliers(cur)

X_cur, y_cur = cur[features], cur[target].astype(int)
y_pred_cur, y_proba_cur = predict(model, X_cur)
perf_report = mon.check_performance(classification_metrics(y_cur, y_pred_cur, y_proba_cur))

print("Drift triggered:", drift_triggered)
print("Outliers triggered:", outliers_triggered)
print("Performance report:\n", json.dumps(perf_report, indent=2))


Baseline metrics {
  "accuracy": 0.826,
  "precision": 0.7800687285223368,
  "recall": 0.908,
  "f1": 0.8391866913123845,
  "confusion_matrix": [
    [
      186,
      64
    ],
    [
      23,
      227
    ]
  ],
  "auc": 0.9235680000000001,
  "log_loss": 1.0177470322299007
}
{"ts": 1764184421.6330183, "event": "data_drift", "payload": {"drifted": [{"feature": "x1", "psi": 0.34730059968180615, "ks_pvalue": 1.4169802610765143e-14}]}}
{"ts": 1764184421.6410162, "event": "performance_degradation", "payload": {"triggers": ["accuracy_drop"], "metrics": {"accuracy": 0.742, "precision": 0.6809651474530831, "recall": 0.9621212121212122, "f1": 0.7974882260596546, "confusion_matrix": [[117, 119], [10, 254]], "auc": 0.8975988700564972, "log_loss": 2.543059662289643}, "baseline": {"accuracy": 0.826, "precision": 0.7800687285223368, "recall": 0.908, "f1": 0.8391866913123845, "confusion_matrix": [[186, 64], [23, 227]], "auc": 0.9235680000000001, "log_loss": 1.0177470322299007}}}
Drift triggered: 
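
The headline metrics in the output above can be cross-checked by hand from the logged confusion matrices. The sketch below assumes the matrices are laid out as `[[TN, FP], [FN, TP]]` (the scikit-learn convention); under that assumption it reproduces the reported accuracy, precision, recall, and F1. `metrics_from_confusion` is a hypothetical helper, not part of the toolkit.

```python
from typing import Dict, List


def metrics_from_confusion(cm: List[List[int]]) -> Dict[str, float]:
    """Derive headline metrics from a 2x2 matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": (tn + tp) / total, "precision": precision, "recall": recall, "f1": f1}


# Confusion matrices copied from the baseline and current-batch output above.
baseline = metrics_from_confusion([[186, 64], [23, 227]])
current = metrics_from_confusion([[117, 119], [10, 254]])
print(round(baseline["accuracy"], 3), round(current["accuracy"], 3))  # 0.826 0.742
print(round(baseline["accuracy"] - current["accuracy"], 3))           # 0.084
```

The 0.084 drop in accuracy is the change behind the `accuracy_drop` trigger in the logged event; the threshold the monitor applies is configured in the toolkit and is not shown here.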

## 4) Coherence audit of the Dual-Stream output

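Convert the generated Answer/Monologue into the JSONL schema expected by the Coherence Auditor and run the audit. The resulting `report` carries a `coherent` flag with its `reasons`, plus `deception_hits`, `conflict_hits`, and `safety_hits` lists that surface any misalignment or safety markers.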

In [4]:

from dualstream_anticollapse.coherence import CoherenceAuditor

# Assemble a JSONL-style record from the DualStream output
def build_logit_topk(frames):
    """Return (token, probability) pairs for the first generation step."""
    if not frames:
        return []
    # Use the top-k distribution from the first generation step as a summary
    first = frames[0]
    tokenizer = ds.tokenizer
    # Note: .strip() turns whitespace-only tokens (e.g. a newline) into ''.
    return [(tokenizer.decode([tid]).strip(), prob) for tid, prob in zip(first['topk_ids'], first['topk_probs'])]

record = {
    "answer": dual_result['answer_text'],
    "monologue": dual_result['monologue_text'],
    "logits_topk": build_logit_topk(dual_result['monologue_frames']),
}

# Run audit
ca = CoherenceAuditor(cfg.thresholds.__dict__)
report = ca.audit_record(record)
print(json.dumps(report, indent=2))


{
  "answer": "\n\nThe first thing to understand is that model reasoning is a very complex process. It is not a simple process. It is a complex process that requires a lot of work.\n\nThe second",
  "coherent": true,
  "reasons": [],
  "deception_hits": [],
  "conflict_hits": [],
  "safety_hits": [],
  "logits_topk": [
    [
      "",
      0.23849539458751678
    ],
    [
      "Explain",
      0.036299243569374084
    ],
    [
      "If",
      0.032072119414806366
    ],
    [
      "You",
      0.029976723715662956
    ],
    [
      "The",
      0.026129810139536858
    ]
  ]
}
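
When the audit runs unattended, it can be convenient to reduce the report to a short list of findings. The sketch below relies only on the field names visible in the output above (`coherent`, `reasons`, and the three `*_hits` lists); `audit_findings` is a hypothetical helper, and the structure of individual hit entries is not assumed.

```python
from typing import Any, Dict, List

HIT_FIELDS = ("deception_hits", "conflict_hits", "safety_hits")


def audit_findings(report: Dict[str, Any]) -> List[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    if not report.get("coherent", False):
        reasons = "; ".join(map(str, report.get("reasons", [])))
        findings.append("incoherent: " + (reasons or "no reason given"))
    for field in HIT_FIELDS:
        hits = report.get(field, [])
        if hits:
            findings.append(f"{field}: {len(hits)} hit(s)")
    return findings


# Same shape as the report printed above, which has no flags raised.
clean = {"coherent": True, "reasons": [], "deception_hits": [], "conflict_hits": [], "safety_hits": []}
print(audit_findings(clean))  # []
```

In the notebook itself, `audit_findings(report)` could be called right after the audit so that a non-empty result stands out in the cell output.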


## 5) Export for Kaggle submission

The following cell writes the coherence report and the raw Dual-Stream output to `submission_outputs/` in the working directory; the monitoring artifacts from section 3 are already on disk in `artifacts/`. On Kaggle, mark these files as output so reviewers can verify compliance with the Tunix rules without re-running heavy models.

In [5]:

output_dir = Path('submission_outputs')
output_dir.mkdir(exist_ok=True)

# Save coherence report
with open(output_dir / 'coherence_report.json', 'w') as f:
    json.dump(report, f, indent=2)

# Save dual-stream output
with open(output_dir / 'dual_stream_output.json', 'w') as f:
    json.dump(dual_result, f, indent=2)

print("Wrote:", sorted(p.name for p in output_dir.iterdir()))


Wrote: ['coherence_report.json', 'dual_stream_output.json']
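
Since the point of the export is that reviewers can inspect the files without re-running the notebook, one optional addition is a checksum manifest that lets them confirm the files were not altered after the run. This is a sketch rather than a hackathon requirement; `build_manifest` is a hypothetical helper.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict


def build_manifest(directory: Path) -> Dict[str, str]:
    """Map each file name in `directory` to the SHA-256 hex digest of its bytes."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(directory.iterdir())
        if path.is_file()
    }


# Print a manifest for the export folder if it exists in the working directory.
out = Path("submission_outputs")
if out.is_dir():
    print(json.dumps(build_manifest(out), indent=2))
```

The manifest is printed rather than written into `submission_outputs/` so that it does not change the set of files being hashed.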


### What to submit

- The notebook itself (this file) with all cells executed.
- The `submission_outputs/` folder containing `coherence_report.json` and `dual_stream_output.json` as supplemental artifacts.
- Optionally the `artifacts/` folder if you want reviewers to inspect drift/performance traces.

This layout matches Kaggle's single-notebook review flow and demonstrates the full integration of the Dual-Stream generator plus the anti-collapse/monitoring stack.