# NanoGPT Constraint-First Training (Adult Entertainment)

This notebook trains a NanoGPT-style language model using **adult-oriented narrative text**.

- Dataset contains explicit sexual content.
- Model purpose is adult entertainment storytelling.
- Intended for consenting adults only.

This notebook handles **training only**.

In [None]:
# Cell 1 — Runtime & Environment Check
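import platform
import sys

print("Python:", sys.version)
print("Platform:", platform.platform())

assert sys.version_info >= (3, 9), "Python >= 3.9 required"

# torch is installed in Cell 2, so it may not be importable yet on a fresh runtime.
try:
    import torch
    print("PyTorch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch not installed yet; run Cell 2, then re-run this cell")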

In [None]:
# Cell 2 — Install Dependencies
# %pip installs into the environment of the running kernel.
%pip install torch==2.1.0 numpy tqdm

In [None]:
# Cell 3 — Clone Repository
!git clone https://github.com/jules-project/nanogpt-lite.git
%cd nanogpt-lite
# Print the checked-out commit so this run can be traced back to exact code.
!git rev-parse HEAD

In [None]:
# Cell 4 — Dataset Verification
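from pathlib import Path

data_dir = Path("data")
assert data_dir.is_dir(), "Missing data/ directory"

# If no text files are present, write a one-line placeholder so the run can proceed.
if not any(data_dir.rglob("*.txt")):
    (data_dir / "train.txt").write_text(
        "This is a sample text for demonstration.", encoding="utf-8"
    )
    print("WARNING: no .txt files found; wrote placeholder data/train.txt")

print("Text files in data/:", len(list(data_dir.rglob("*.txt"))))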
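
Cell 4 only confirms that text files exist. A size check can also help catch an accidental run on the placeholder file. The following is a minimal sketch, assuming the corpus is UTF-8 text in `.txt` files; `summarize_corpus` is a hypothetical helper and not part of the repository.

```python
from pathlib import Path


def summarize_corpus(data_dir):
    """Return (file_count, total_characters) for all .txt files under data_dir."""
    # Hypothetical helper: assumes every .txt file decodes as UTF-8.
    files = sorted(Path(data_dir).rglob("*.txt"))
    total_chars = sum(len(f.read_text(encoding="utf-8")) for f in files)
    return len(files), total_chars
```

Calling `summarize_corpus("data")` before training gives a quick sanity figure; a total of a few dozen characters suggests only the placeholder is present.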

In [None]:
# Cell 5 — Training Execution
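# NOTE: This uses the original, simplified train.py script.
# For a quick smoke test, reduce the iteration count and evaluation interval in place.
!sed -i 's/MAX_ITERS = 5000/MAX_ITERS = 100/' train.py
!sed -i 's/EVAL_INTERVAL = 500/EVAL_INTERVAL = 20/' train.py
# sed exits successfully even when nothing matched, so print the resulting values.
!grep -nE 'MAX_ITERS = |EVAL_INTERVAL = ' train.py

!python3 train.py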
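
The `sed` substitutions in Cell 5 match exact strings, so they change nothing if `train.py` uses different default values. One possible alternative is to patch the constants from Python and fail loudly when a constant is not found. This is a sketch under the assumption that `MAX_ITERS` and `EVAL_INTERVAL` are module-level integer assignments at the start of a line; `set_constant` is a hypothetical helper, not something the repository provides.

```python
import re
from pathlib import Path


def set_constant(script_path, name, value):
    """Rewrite a module-level `NAME = <int>` line, whatever its current value."""
    path = Path(script_path)
    source = path.read_text(encoding="utf-8")
    # Assumption: the constant is assigned once, at column 0, to an integer literal.
    pattern = re.compile(rf"^({re.escape(name)}\s*=\s*)\d+", flags=re.MULTILINE)
    patched, count = pattern.subn(rf"\g<1>{value}", source)
    if count != 1:
        raise ValueError(f"expected exactly one '{name} = <int>' line, found {count}")
    path.write_text(patched, encoding="utf-8")
```

With this helper, `set_constant("train.py", "MAX_ITERS", 100)` and `set_constant("train.py", "EVAL_INTERVAL", 20)` would replace the two `sed` calls, and a `ValueError` would signal that the script's layout differs from the assumption.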

In [None]:
# Cell 6 — Export Artifacts
import shutil
from pathlib import Path

# The model is saved as nanogpt_lite.pt in the root.
# Bundle the model and tokenizer into a single zip file.
output_dir = Path("training_run_artifacts")
output_dir.mkdir(exist_ok=True)

for artifact in (Path("nanogpt_lite.pt"), Path("tokenizer.json")):
    if artifact.exists():
        shutil.copy(artifact, output_dir)
    else:
        # Without this warning, a failed run would still produce an (empty) archive.
        print(f"WARNING: {artifact} not found; it will be missing from the archive")

# make_archive returns the path of the archive it created.
archive_path = shutil.make_archive(
    base_name="trained_model",
    format="zip",
    root_dir=str(output_dir),
)
print(f"Artifacts zipped to {archive_path}")

## Post-Training Notes

- Model artifacts are saved in `trained_model.zip`.
- These weights are intended for **adult storytelling inference**.
- See the inference documentation for next steps.
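
Before handing the archive to the inference step, it may be worth confirming that it contains both files. The following is a minimal sketch, assuming the archive stores `nanogpt_lite.pt` and `tokenizer.json` at its top level, as the file names in Cell 6 suggest; `missing_artifacts` is a hypothetical helper.

```python
import zipfile


def missing_artifacts(zip_path, expected=("nanogpt_lite.pt", "tokenizer.json")):
    """Return the expected file names that are absent from the archive."""
    # Assumption: artifacts sit at the archive root, not in a subdirectory.
    with zipfile.ZipFile(zip_path) as archive:
        present = set(archive.namelist())
    return [name for name in expected if name not in present]
```

An empty list from `missing_artifacts("trained_model.zip")` means both files were exported; any names returned point to a file that training did not produce.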