# Onboarding Notebook: Anticipatory Prefill Project

This notebook verifies that your environment is ready for development.
It:
1. Checks the Python version and GPU availability
2. Locates the repository, cloning it when running remotely
3. Installs dependencies
4. Loads the project config and model
5. Runs a basic inference
6. Runs a KV-cache sanity check

## 1. Runtime Checks & Setup

In [7]:
import sys
import os
import torch

print(f"Python Version: {sys.version}")
if torch.cuda.is_available():
    print(f"GPU Available: {torch.cuda.get_device_name(0)}")
    print(f"CUDA Version: {torch.version.cuda}")
else:
    print("WARNING: No GPU detected. Please enable GPU runtime.")

Python Version: 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0]
GPU Available: Tesla T4
CUDA Version: 12.6


## 2. Remote Setup (Colab/VS Code Remote)
If `src/` is not found in the current or parent directory (e.g., when VS Code is connected to a fresh Colab VM), the cell below clones the repo into a subdirectory so that `src/` is available.

In [8]:
import os

REPO_DIR = "OT1-APITS"

# Check if we are running locally (src exists relative to us) or need to clone
if os.path.exists("../src"):
    print("Running locally (parent directory has src).")
    REPO_ROOT = ".."
elif os.path.exists("src"):
    print("Running locally (current directory has src).")
    REPO_ROOT = "."
else:
    # We are likely in a fresh Colab VM
    print(f"Local src not found. Checking for clone in {REPO_DIR}...")
    if not os.path.exists(REPO_DIR):
        print("Cloning repo...")
        !git clone https://github.com/Samsam19191/OT1-APITS.git {REPO_DIR}
    else:
        print(f"Repo already cloned in {REPO_DIR}.")
    
    REPO_ROOT = REPO_DIR

print(f"Repository root set to: {REPO_ROOT}")

Local src not found. Checking for clone in OT1-APITS...
Cloning repo...
Cloning into 'OT1-APITS'...
remote: Enumerating objects: 12, done.
remote: Counting objects: 100% (12/12), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 12 (delta 0), reused 9 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (12/12), 7.79 KiB | 7.79 MiB/s, done.
Repository root set to: OT1-APITS
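The cell above only looks one level up for `src/`. If notebooks may live deeper in the tree (for example under `experiments/`), a more general option is to walk up from the working directory until a directory containing `src/` appears. A minimal sketch, assuming `src/` marks the repository root; `find_repo_root` is a hypothetical helper, not part of the project:

```python
import os
from typing import Optional


def find_repo_root(start: str = ".", marker: str = "src") -> Optional[str]:
    """Walk up from `start` and return the first directory containing `marker`.

    Hypothetical helper: `marker` defaults to "src", assuming that directory
    identifies the repository root. Returns None if no ancestor contains it.
    """
    current = os.path.abspath(start)
    while True:
        if os.path.isdir(os.path.join(current, marker)):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without a match
            return None
        current = parent
```

If `find_repo_root()` returns `None`, the notebook can fall back to cloning, as the cell above does.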


## 3. Install Dependencies

In [9]:
import os

req_file = os.path.join(REPO_ROOT, "requirements.txt")

if os.path.exists(req_file):
    print(f"Found requirements at: {req_file}")
    !pip install -r {req_file}
else:
    print(f"WARNING: requirements.txt not found at {req_file}")
    print("Current working directory files:", os.listdir("."))
    if os.path.exists(REPO_ROOT):
        print(f"Files in {REPO_ROOT}:", os.listdir(REPO_ROOT))
        
    print("Installing default packages as fallback...")
    !pip install torch transformers accelerate bitsandbytes sentencepiece protobuf

Current working directory files: ['.config', 'model_validation_report.md', 'OT1-APITS', 'README.md', 'experiments', 'sample_data']
Files in OT1-APITS: ['.git', 'model_validation_report.md', 'README.md', 'experiments']
Installing default packages as fallback...


KeyboardInterrupt: 
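The fallback branch above runs `pip` over the whole package list, even though some packages are already present (cell 1 imported `torch` before anything was installed). A minimal sketch of checking first which modules can be found, using only the standard library. The pip-name-to-module mapping is an assumption (notably, the `protobuf` distribution is imported as `google.protobuf`), and `is_importable` / `packages_to_install` are hypothetical helpers. `importlib.util.find_spec` locates a top-level module without executing it, so slow imports such as `bitsandbytes` are avoided.

```python
import importlib.util

# Assumed mapping from pip distribution names to importable module names.
PIP_TO_MODULE = {
    "torch": "torch",
    "transformers": "transformers",
    "accelerate": "accelerate",
    "bitsandbytes": "bitsandbytes",
    "sentencepiece": "sentencepiece",
    "protobuf": "google.protobuf",
}


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located by the import system."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # For dotted names, find_spec imports the parent package first
        # and raises if that parent is missing.
        return False


def packages_to_install(pip_to_module: dict) -> list:
    """Return the pip names whose corresponding module cannot be found."""
    return [pip for pip, mod in pip_to_module.items() if not is_importable(mod)]


missing = packages_to_install(PIP_TO_MODULE)
print("Missing packages:", missing or "none")
```

In the notebook, a non-empty `missing` list could then be passed to `!pip install {' '.join(missing)}`.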

## 4. Import & Load Model

In [None]:
import os
import sys

import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Add the repo root to sys.path so that `src` is importable.
# Compare absolute paths so re-running the cell does not append duplicates.
repo_root_abs = os.path.abspath(REPO_ROOT)
if repo_root_abs not in sys.path:
    sys.path.append(repo_root_abs)

try:
    # Try importing the project config from src.config
    from src.config import MODEL_NAME, LOAD_IN_4BIT, DEVICE, MAX_SEQ_LEN_TYPING
    print(f"Config loaded from src. Model: {MODEL_NAME}")
except ImportError as e:
    print(f"ImportError: {e}")
    print("Could not import src.config. Using defaults.")
    MODEL_NAME = "Qwen/Qwen2.5-1.5B-Instruct"
    LOAD_IN_4BIT = True
    DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
    # MAX_SEQ_LEN_TYPING has no default here and stays undefined.

print(f"Transformers version: {transformers.__version__}")
print(f"Loading model: {MODEL_NAME}...")

# Recent transformers versions expect quantization settings via
# `quantization_config` rather than a bare `load_in_4bit` argument.
quant_config = BitsAndBytesConfig(load_in_4bit=True) if LOAD_IN_4BIT else None

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    quantization_config=quant_config,
    trust_remote_code=True,
)

print("Model loaded successfully!")
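The `try`/`except` above is all-or-nothing: if `src.config` exists but lacks any one of the four names, Python raises `ImportError` and every value falls back to the defaults. A sketch of a per-name fallback using `importlib` and `getattr`; `load_config` and `CONFIG_DEFAULTS` are hypothetical names, and the defaults mirror those in the cell above (`DEVICE` is left out so the sketch does not depend on `torch`, and `MAX_SEQ_LEN_TYPING` is left out because the notebook gives it no default).

```python
import importlib
from typing import Any, Dict

# Hypothetical defaults, mirroring the fallback values used in the notebook.
CONFIG_DEFAULTS: Dict[str, Any] = {
    "MODEL_NAME": "Qwen/Qwen2.5-1.5B-Instruct",
    "LOAD_IN_4BIT": True,
}


def load_config(module_name: str = "src.config",
                defaults: Dict[str, Any] = CONFIG_DEFAULTS) -> Dict[str, Any]:
    """Read each key from `module_name`, falling back per key to `defaults`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        print(f"Could not import {module_name} ({exc}); using defaults.")
        return dict(defaults)  # return a copy so callers cannot mutate the defaults
    # Any key the module does not define keeps its default value.
    return {key: getattr(module, key, value) for key, value in defaults.items()}
```

Usage would be `cfg = load_config()` followed by `MODEL_NAME = cfg["MODEL_NAME"]`, so a partially filled config file still contributes the values it does define.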

## 5. Sanity Inference

In [None]:
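input_text = "def fibonacci(n):"
# Put inputs on the model's device; with device_map="auto" this may differ from DEVICE.
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Greedy decoding keeps the sanity check deterministic across runs.
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("--- Generation sanity check ---")
print(result)
print("-------------------------------")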
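`model.generate` hides the prefill and decode phases of inference. A minimal sketch of greedy decoding written out by hand, assuming a Hugging Face causal LM that accepts `past_key_values` and returns `logits` and `past_key_values` (as `model` above does): one forward pass over the prompt builds the KV cache (prefill), then each step feeds only the newest token and reuses the cache (decode). `greedy_decode_with_cache` is a hypothetical helper; it omits EOS handling and attention masks, so it assumes a single unpadded sequence.

```python
import torch


@torch.no_grad()
def greedy_decode_with_cache(model, input_ids: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
    """Greedy decoding that reuses the KV cache; returns prompt + new token ids."""
    # Prefill: one forward pass over the whole prompt builds the KV cache.
    out = model(input_ids=input_ids, use_cache=True)
    past = out.past_key_values
    next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
    generated = [next_token]
    for _ in range(max_new_tokens - 1):
        # Decode: feed only the newest token; earlier positions come from the cache.
        out = model(input_ids=next_token, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated.append(next_token)
    return torch.cat([input_ids, *generated], dim=-1)
```

For example, `greedy_decode_with_cache(model, inputs.input_ids, max_new_tokens=20)` can be decoded with `tokenizer.decode` like the output above. It may not match `generate` token for token, since a model's generation config can add settings such as a repetition penalty that this sketch does not apply.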

## 6. KV-Cache Sanity Check

In [None]:
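prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
prompt_len = inputs.input_ids.shape[1]

# Run a single forward pass with use_cache=True to populate the KV cache
with torch.no_grad():
    outputs = model(**inputs, use_cache=True)

kv_cache = outputs.past_key_values

if kv_cache is not None:
    print("✅ KV Cache extracted successfully.")
    print(f"Cache type: {type(kv_cache)}")
    print(f"Prompt length (tokens): {prompt_len}")
    if hasattr(kv_cache, "get_seq_length"):
        # Cache objects (e.g. DynamicCache) report how many positions they hold;
        # this should equal the prompt length.
        print(f"Sequence length in cache: {kv_cache.get_seq_length()}")
    else:
        # Fallback for the legacy tuple-based cache: one (key, value) pair per layer
        print(f"Layers: {len(kv_cache)}")
        print(f"Key shape (batch, kv_heads, seq, head_dim): {tuple(kv_cache[0][0].shape)}")
else:
    print("❌ Failed to extract KV Cache.")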
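The check above confirms that a cache is produced, not that reusing it is correct. Prefilling a prefix only pays off if continuing from its cache gives the same result as recomputing from scratch, so a stronger sanity check compares the two. A minimal sketch, assuming a Hugging Face causal LM as above; `cache_reuse_max_diff` is a hypothetical helper that works on token ids (tokenizing prefix and suffix separately may not give the same ids as tokenizing the joined string).

```python
import torch


@torch.no_grad()
def cache_reuse_max_diff(model, prefix_ids: torch.Tensor, suffix_ids: torch.Tensor) -> float:
    """Max abs difference between next-token logits computed two ways.

    (a) Prefill `prefix_ids` once, then feed only `suffix_ids` with the cached keys/values.
    (b) Run prefix + suffix in a single pass with no cache.
    Assumes batch size 1 and no padding, so no attention mask is passed.
    """
    # (a) Prefill the prefix, then extend using its cache.
    prefix_out = model(input_ids=prefix_ids, use_cache=True)
    # In recent transformers versions the cache object may be updated in place
    # by the next call; copy it first if the prefix cache must be reused again.
    extended = model(
        input_ids=suffix_ids,
        past_key_values=prefix_out.past_key_values,
        use_cache=True,
    )
    # (b) Reference: recompute everything from scratch.
    full_ids = torch.cat([prefix_ids, suffix_ids], dim=-1)
    full = model(input_ids=full_ids, use_cache=False)
    last_a = extended.logits[:, -1, :].float()
    last_b = full.logits[:, -1, :].float()
    return (last_a - last_b).abs().max().item()
```

With the notebook's model, the suffix could come from `tokenizer(" jumps over", return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)` and the prefix from `inputs.input_ids`. In full precision the difference should be close to zero; a 4-bit, half-precision model will show small nonzero differences, while a large gap or a different argmax suggests the cache is not being reused correctly.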