# RWKV-7 TTT Phantom Colab

This notebook demonstrates training and generation with the RWKV-7 TTT Phantom model using a single NVIDIA A100 40GB GPU.
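
Before running the rest of the notebook, it can help to confirm which GPU the runtime actually provides. The sketch below is not part of the repository; it shells out to `nvidia-smi` using only the standard library, and the helper name `list_gpus` is hypothetical.

```python
import shutil
import subprocess


def list_gpus():
    """Return one 'name, memory' string per visible NVIDIA GPU (empty list if none)."""
    # nvidia-smi is absent on CPU-only runtimes, so check for it first.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(gpus if gpus else "No NVIDIA GPU detected; select a GPU runtime before training.")
```

If the list is empty or shows a smaller card than expected, switch the runtime type before continuing.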

## Setup
Clone the repository and install the required packages. Replace `USER` in the URL with the account that hosts the repository.

In [None]:
!git clone https://github.com/USER/RWKV-7-TTT-Phantom.git
%cd RWKV-7-TTT-Phantom
!pip install -r requirements.txt

## Configuration
Define a small configuration (12 layers, embedding width 768, 12 heads of size 64) that fits comfortably in 40 GB of GPU memory.

In [None]:
from rwkv7_ttt_phantom import RWKV7PhantomConfig

config = RWKV7PhantomConfig(
    n_layer=12,
    n_embd=768,
    n_head=12,
    head_size=64,
    vocab_size=65536,
    ghost_ratio=0.75,
    core_ratio=0.25,
    ttt_enabled=True,
    ttt_lr=0.01,
    ttt_steps=1,
    max_depth_iter=1,
    use_svd_compression=True,
    svd_rank_ratio=0.5,
    use_sparsity=True,
    sparsity_ratio=0.3,
)
print(config)
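
To make the "fits comfortably" claim more concrete, here is a rough back-of-the-envelope sketch using plain Python values that mirror the configuration above. Everything in it is an assumption rather than the repository's own accounting: the helpers `check_config` and `rough_param_count` are hypothetical, the consistency rules (`n_head * head_size == n_embd`, `ghost_ratio + core_ratio == 1`) are inferred from the example values, and the per-layer cost of `12 * n_embd**2` is a generic transformer-style rule of thumb, not the real RWKV-7 TTT Phantom layer size. SVD compression, sparsity, and activation memory are ignored.

```python
# Plain-Python mirror of the values passed to RWKV7PhantomConfig above.
cfg = dict(
    n_layer=12, n_embd=768, n_head=12, head_size=64,
    vocab_size=65536, ghost_ratio=0.75, core_ratio=0.25,
)


def check_config(c):
    """Return a list of problems; an empty list means the values look consistent."""
    problems = []
    # ASSUMPTION: heads partition the embedding width exactly.
    if c["n_head"] * c["head_size"] != c["n_embd"]:
        problems.append("n_head * head_size does not equal n_embd")
    # ASSUMPTION: ghost and core ratios are meant to sum to 1.
    if abs(c["ghost_ratio"] + c["core_ratio"] - 1.0) > 1e-9:
        problems.append("ghost_ratio + core_ratio does not equal 1")
    return problems


def rough_param_count(c, per_layer_factor=12):
    """Order-of-magnitude parameter estimate, not the model's true count."""
    # ASSUMPTION: untied input embedding and output projection.
    embed = 2 * c["vocab_size"] * c["n_embd"]
    # ASSUMPTION: about per_layer_factor * n_embd^2 weights per layer.
    blocks = c["n_layer"] * per_layer_factor * c["n_embd"] ** 2
    return embed + blocks


problems = check_config(cfg)
params = rough_param_count(cfg)
# fp32 weights + gradients + two Adam moments = 16 bytes per parameter.
train_bytes = params * 16

print("config problems:", problems)
print(f"~{params / 1e6:.0f}M parameters, ~{train_bytes / 2**30:.1f} GiB for weights and optimizer state")
```

Under these assumptions the weight and optimizer footprint is only a few GiB, which leaves most of a 40 GB card for activations and test-time-training state.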

## Prepare Data
The training script expects tokenized data as a `.bin`/`.idx` pair. The cell below writes a tiny random dataset (10,000 tokens) so the pipeline can be exercised end to end; a model trained on it will not produce meaningful output.

In [None]:
import numpy as np
from pathlib import Path

Path("data").mkdir(exist_ok=True)

# Random token IDs for demonstration. uint16 covers 0..65535, which matches vocab_size=65536.
tokens = np.random.randint(0, config.vocab_size, size=10000, dtype=np.uint16)
with open("data/dummy.bin", "wb") as f:
    f.write(tokens.tobytes())

# Sequential chunk offsets every 512 tokens: 0, 512, ..., 9728.
# The final 272 tokens (9728..9999) have no closing offset.
idx = np.arange(0, len(tokens)+1, 512, dtype=np.int64)
with open("data/dummy.idx", "wb") as f:
    f.write(idx.tobytes())
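
As a sanity check on the files written above, the following sketch writes an equivalent pair into a temporary directory and reads it back. It assumes that consecutive entries in the `.idx` file delimit one chunk of the `.bin` file; the loader in the repository may interpret the index differently. The helpers `write_dummy` and `read_chunks` are hypothetical names introduced here.

```python
import tempfile
from pathlib import Path

import numpy as np


def write_dummy(prefix, n_tokens=10000, chunk=512, vocab_size=65536):
    """Write a .bin/.idx pair with the same layout as the cell above."""
    tokens = np.random.randint(0, vocab_size, size=n_tokens, dtype=np.uint16)
    tokens.tofile(f"{prefix}.bin")
    np.arange(0, n_tokens + 1, chunk, dtype=np.int64).tofile(f"{prefix}.idx")


def read_chunks(prefix):
    """Read the pair back and split the token stream at the stored offsets."""
    tokens = np.fromfile(f"{prefix}.bin", dtype=np.uint16)
    offsets = np.fromfile(f"{prefix}.idx", dtype=np.int64)
    # ASSUMPTION: each pair of consecutive offsets marks one chunk.
    chunks = [tokens[a:b] for a, b in zip(offsets[:-1], offsets[1:])]
    return tokens, offsets, chunks


with tempfile.TemporaryDirectory() as tmp:
    prefix = str(Path(tmp) / "dummy")
    write_dummy(prefix)
    tokens, offsets, chunks = read_chunks(prefix)

tail = len(tokens) - int(offsets[-1])
print(f"{len(tokens)} tokens, {len(offsets)} offsets, {len(chunks)} full chunks, {tail} tokens after the last offset")
```

The dtypes passed to `np.fromfile` must match the ones used when writing (`uint16` for tokens, `int64` for offsets); a mismatch silently yields wrong values rather than an error.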

## Training
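Run `train_model` to train on the prepared data. `data_path` is the shared prefix of the `data/dummy.bin` and `data/dummy.idx` files created above, and checkpoints and logs go to `output_dir`.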

In [None]:
from Training import train_model

model = train_model(
    data_path="data/dummy",
    output_dir="./output",
    config=config,
    batch_size=2,
    learning_rate=6e-4,
    num_epochs=1,
    use_wandb=False,
)
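
For a sense of how short this demonstration run is, the arithmetic below estimates the number of optimizer steps. It assumes that each training sequence is one full 512-token chunk and that no gradient accumulation is used; neither is confirmed by the training script, so treat the numbers as a sketch.

```python
import math

n_tokens, chunk, batch_size, num_epochs = 10000, 512, 2, 1

# ASSUMPTION: only full 512-token chunks are used as training sequences.
n_chunks = n_tokens // chunk
tokens_per_step = batch_size * chunk

# The step count depends on whether the loader keeps the last partial batch.
steps_keep_last = math.ceil(n_chunks / batch_size) * num_epochs
steps_drop_last = (n_chunks // batch_size) * num_epochs

print(f"{n_chunks} chunks, {tokens_per_step} tokens per step")
print(f"{steps_drop_last} to {steps_keep_last} steps in total")
```

A run this small finishes almost immediately; it verifies that the pipeline works, not that the model learns anything.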

## Text Generation
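Generate a continuation with the trained model. The prompt is a batch of token IDs rather than text, because this demo does not load a tokenizer.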

In [None]:
import torch
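# Assumption: `model.model` is a torch.nn.Module. The prompt must live on the same device as its weights.
device = next(model.model.parameters()).device
prompt = torch.tensor([[1, 2, 3]], device=device)  # token IDs, shape (batch=1, length=3)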
print(model.model.generate(prompt, max_new_tokens=20))
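
To illustrate what `max_new_tokens` controls, here is a minimal greedy decoding loop in plain Python. It is a generic sketch of autoregressive generation, not the repository's `generate` method, which may sample differently and may update its test-time-training state along the way. `toy_logits` is a hypothetical stand-in for a model.

```python
def toy_logits(tokens, vocab_size=16):
    """Hypothetical model stand-in: always prefers (last token + 1) mod vocab_size."""
    logits = [0.0] * vocab_size
    logits[(tokens[-1] + 1) % vocab_size] = 1.0
    return logits


def greedy_generate(prompt, max_new_tokens, logits_fn):
    """Append the highest-scoring token max_new_tokens times."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = logits_fn(tokens)
        # Greedy choice: index of the largest logit.
        next_token = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_token)
    return tokens


out = greedy_generate([1, 2, 3], max_new_tokens=20, logits_fn=toy_logits)
print(len(out), out)
```

In this sketch the result contains the 3 prompt tokens followed by 20 generated ones; whether the real `generate` returns the prompt as well is not stated in this notebook.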