# Fine-Tune Mistral Nemo 12B for English Story Writing

This notebook fine-tunes **Mistral Nemo 12B Instruct** on your custom English novel data using QLoRA via Unsloth.

**Why Mistral Nemo 12B for English stories?**
- Frequently ranked among the strongest open-weight models for creative writing and narrative generation
- Comparatively lightly censored for its size, so it tends to produce vivid, uninhibited prose
- At 12B parameters, it is generally reported to beat Llama 3.1 8B and Gemma 2 9B on writing benchmarks
- 128K context window, useful for keeping long chapters coherent at inference time (training below uses 4,096 tokens)
- Basis for many popular community creative-writing models (ArliAI RPMax, Lyra, etc.)
- Fits on a free Colab T4 GPU with 4-bit QLoRA via Unsloth

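**Requirements:** Google Colab with a T4 GPU (free tier works) or Kaggle Notebooks. The upload and download cells use `google.colab`; on Kaggle, attach `train.jsonl` as a dataset and save outputs to `/kaggle/working` instead.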

## 1. Install Dependencies

In [None]:
%%capture
!pip install unsloth
# Reinstall Unsloth from GitHub (without touching its dependencies) to pick up the latest fixes
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

## 2. Load Model

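Unsloth reports that Mistral Nemo fits in 12 GB of VRAM with QLoRA. A T4 has 16 GB (about 15 GB usable on Colab), so there is some headroom, although longer sequences or larger batches can still run out of memory.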
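
As a rough sanity check on that claim, the weight memory can be estimated by hand. This is a back-of-envelope sketch, not a measurement: the parameter count (~12.2B), vocabulary size (131,072) and hidden size (5,120) are assumptions taken from Mistral Nemo's published config, and it assumes the input embedding and `lm_head` stay in 16-bit while the other linear layers are 4-bit. It ignores quantization constants, LoRA weights, optimizer state and activations, so real usage will be higher.

```python
def estimate_weight_gib(n_quantized: float, n_full: float,
                        bits_quantized: int = 4, bytes_full: int = 2) -> float:
    """Weight-only memory estimate in GiB for a partly quantized model."""
    total_bytes = n_quantized * bits_quantized / 8 + n_full * bytes_full
    return total_bytes / 1024**3


# Assumed figures for Mistral Nemo (verify against the model's config.json)
TOTAL_PARAMS = 12.2e9
VOCAB_SIZE, HIDDEN_SIZE = 131_072, 5_120
# Untied input embedding + output head, assumed to stay in 16-bit
EMBED_AND_HEAD = 2 * VOCAB_SIZE * HIDDEN_SIZE

weights_gib = estimate_weight_gib(TOTAL_PARAMS - EMBED_AND_HEAD, EMBED_AND_HEAD)
print(f"Estimated 4-bit weight memory: ~{weights_gib:.1f} GiB")
```

Under these assumptions the estimate comes to roughly 7.6 GiB, leaving a few GiB on a 12 GB card for adapters, optimizer state and activations. Compare it with the `memory_allocated()` figure printed after loading.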

In [None]:
from unsloth import FastLanguageModel
import torch

max_seq_length = 4096  # Nemo supports 128K, but 4096 is efficient for training
dtype = None            # Auto-detect
load_in_4bit = True     # QLoRA 4-bit quantization

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

print(f"Model loaded. GPU memory used: {torch.cuda.memory_allocated() / 1024**3:.1f} GB")

## 3. Configure LoRA Adapters

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r=32,                    # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,           # Adapter updates are scaled by lora_alpha / r = 0.5
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # Unsloth's checkpointing; Unsloth reports ~30% less VRAM
    random_state=3407,
)

# Print trainable parameters. PEFT's helper accounts for bitsandbytes' packed 4-bit
# weights, which a plain numel() sum would undercount (skewing the percentage).
model.print_trainable_parameters()
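
With `r=32` and `lora_alpha=16`, each adapted linear layer with `in` inputs and `out` outputs gains `r * (in + out)` trainable parameters (the low-rank matrices A and B), and the adapter's update is scaled by `lora_alpha / r` = 0.5 under PEFT's default (non-rsLoRA) scaling. The sketch below reproduces that arithmetic. The layer dimensions are assumptions taken from Mistral Nemo's published config (hidden size 5,120, MLP size 14,336, 40 layers, 32 query heads and 8 key/value heads of size 128); if they are right, the total should match the trainable-parameter count printed above.

```python
# Assumed Mistral Nemo dimensions (from its published config.json; verify locally)
HIDDEN, INTERMEDIATE, LAYERS = 5120, 14336, 40
N_HEADS, N_KV_HEADS, HEAD_DIM = 32, 8, 128


def lora_params(in_features: int, out_features: int, r: int) -> int:
    # LoRA adds A (r x in) and B (out x r) next to the frozen weight
    return r * (in_features + out_features)


def lora_params_per_layer(r: int) -> int:
    q_out = N_HEADS * HEAD_DIM      # query projection width
    kv_out = N_KV_HEADS * HEAD_DIM  # grouped-query key/value width
    shapes = {
        "q_proj": (HIDDEN, q_out),
        "k_proj": (HIDDEN, kv_out),
        "v_proj": (HIDDEN, kv_out),
        "o_proj": (q_out, HIDDEN),
        "gate_proj": (HIDDEN, INTERMEDIATE),
        "up_proj": (HIDDEN, INTERMEDIATE),
        "down_proj": (INTERMEDIATE, HIDDEN),
    }
    return sum(lora_params(i, o, r) for i, o in shapes.values())


r, lora_alpha = 32, 16
lora_total = LAYERS * lora_params_per_layer(r)
scaling = lora_alpha / r  # update = scaling * (B @ A)
print(f"Expected LoRA parameters: {lora_total:,}  (scaling = {scaling})")
```

Because the scaling is 0.5, `lora_alpha=16` with `r=32` makes each adapter update half as strong as it would be with `lora_alpha=32`; raising `r` without raising `lora_alpha` weakens it further.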

## 4. Upload and Load Dataset

Upload your `train.jsonl` file (generated by the Novel Writer pipeline).

Each line should be a JSON object with these keys (`input` may be an empty string): `{"instruction": "...", "input": "", "output": "..."}`
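
Malformed lines usually only surface as an error deep inside `load_dataset`, so it can help to check the file first. The following is a minimal validator sketch (the function names are hypothetical, not part of the Novel Writer pipeline). It treats all three keys as required, matching the format above, and flags blank lines, invalid JSON, missing keys and empty outputs.

```python
import json
from typing import Iterable, List, Tuple

REQUIRED_KEYS = ("instruction", "input", "output")


def validate_lines(lines: Iterable[str]) -> Tuple[int, List[Tuple[int, str]]]:
    """Return (number of usable rows, list of (line_number, problem))."""
    valid, problems = 0, []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((lineno, "blank line"))
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(row, dict):
            problems.append((lineno, "not a JSON object"))
            continue
        missing = [key for key in REQUIRED_KEYS if key not in row]
        if missing:
            problems.append((lineno, f"missing keys: {missing}"))
        elif not isinstance(row["output"], str) or not row["output"].strip():
            problems.append((lineno, "empty or non-string output"))
        else:
            valid += 1
    return valid, problems


def validate_jsonl(path: str) -> Tuple[int, List[Tuple[int, str]]]:
    with open(path, "r", encoding="utf-8") as f:
        return validate_lines(f)
```

After uploading, `valid, problems = validate_jsonl("train.jsonl")` followed by `print(valid, problems[:10])` shows how many rows are usable and where the first problems are.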

In [None]:
import os

# Upload your dataset (Colab only; on Kaggle, attach train.jsonl as a dataset and copy it here)
if not os.path.exists("train.jsonl"):
    from google.colab import files
    print("Please upload your train.jsonl file:")
    uploaded = files.upload()
else:
    print("train.jsonl already exists, skipping upload.")

# Count entries and preview the first one
import json
with open("train.jsonl", "r", encoding="utf-8") as f:
    lines = [line for line in f if line.strip()]  # skip blank lines
print(f"Dataset: {len(lines)} entries")
print("Sample entry:")
sample = json.loads(lines[0])
print(f"  instruction: {sample.get('instruction', '')[:100]}...")
print(f"  output: {sample.get('output', '')[:100]}...")

In [None]:
from datasets import load_dataset

dataset = load_dataset("json", data_files="train.jsonl", split="train")

# Split into train/validation (90/10)
split = dataset.train_test_split(test_size=0.1, seed=42)
train_dataset = split["train"]
eval_dataset = split["test"]

print(f"Training samples: {len(train_dataset)}")
print(f"Validation samples: {len(eval_dataset)}")

# Mistral Nemo uses [INST] prompt format
story_prompt = """<s>[INST] You are a talented fiction author. Write vivid, engaging prose with strong characters, sensory details, and natural dialogue. Continue the narrative in the established style.

{instruction} [/INST]{output}</s>"""

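def formatting_func(examples):
    instructions = examples["instruction"]
    # Fall back to empty inputs if the dataset has no "input" column
    inputs = examples.get("input") or [""] * len(instructions)
    outputs = examples["output"]
    texts = []
    for instruction, inp, output in zip(instructions, inputs, outputs):
        # Append the optional "input" context to the instruction when it is non-empty
        if inp:
            instruction = f"{instruction}\n\n{inp}"
        text = story_prompt.format(instruction=instruction, output=output)
        texts.append(text)
    return {"text": texts}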

train_dataset = train_dataset.map(formatting_func, batched=True)
eval_dataset = eval_dataset.map(formatting_func, batched=True)

# Preview a formatted sample
print("--- Formatted sample ---")
print(train_dataset[0]["text"][:500])
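
Examples longer than `max_seq_length` tokens are truncated during training, which cuts off the end of the passage (including the closing `</s>`), so the model sees fewer examples of where a passage should stop. A small helper like the hypothetical `length_report` below shows how many examples are affected. It runs here with a whitespace "tokenizer" stand-in; that stand-in is only for illustration.

```python
from typing import Callable, Dict, Sequence


def length_report(texts: Sequence[str], count_tokens: Callable[[str], int],
                  max_len: int) -> Dict[str, float]:
    """Summarize token lengths and count texts that would be truncated."""
    lengths = [count_tokens(t) for t in texts]
    return {
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        "over_limit": sum(1 for n in lengths if n > max_len),
    }


# Stand-in tokenizer (whitespace split) so the sketch runs on its own
demo = length_report(["a b c", "a b c d e"], lambda t: len(t.split()), max_len=4)
print(demo)
```

In the notebook, pass the real tokenizer instead: `length_report(train_dataset["text"], lambda t: len(tokenizer(t)["input_ids"]), max_seq_length)`. If many examples are over the limit, splitting chapters into shorter chunks is usually preferable to raising `max_seq_length` on a T4.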

## 5. Train

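**Training tips for creative writing:**
- A modest learning rate (1e-4 here) reduces the risk of overwriting the model's existing writing ability (catastrophic forgetting)
- 1-2 epochs are usually enough; more epochs tend to increase repetition and memorization of the training text
- Early stopping halts training once validation loss stops improving, which limits overfitting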
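
The settings in the next cell translate into an optimizer-step budget roughly as follows. This is an approximation: the effective batch is `per_device_train_batch_size * gradient_accumulation_steps` (times the number of GPUs), warmup is `ceil(total_steps * warmup_ratio)`, and the exact rounding of steps per epoch varies between `transformers` versions. One practical consequence: if a small dataset yields fewer than `eval_steps` (50) total steps, no evaluation runs, so early stopping and best-checkpoint selection never kick in.

```python
import math


def training_plan(n_train: int, per_device_batch: int = 2, grad_accum: int = 4,
                  epochs: int = 2, warmup_ratio: float = 0.05,
                  eval_steps: int = 50, n_gpus: int = 1) -> dict:
    """Approximate optimizer-step budget for the Trainer settings used below."""
    effective_batch = per_device_batch * grad_accum * n_gpus
    batches_per_epoch = math.ceil(n_train / (per_device_batch * n_gpus))
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_steps = steps_per_epoch * epochs
    return {
        "effective_batch": effective_batch,
        "total_steps": total_steps,
        "warmup_steps": math.ceil(total_steps * warmup_ratio),
        "evaluations": total_steps // eval_steps,
    }


# Hypothetical example: 1,000 rows become 900 training samples after the 90/10 split
print(training_plan(900))
```

In the notebook, call `training_plan(len(train_dataset))` to see the budget for your own data.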

In [None]:
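from unsloth import is_bfloat16_supported
from trl import SFTTrainer
from transformers import TrainingArguments, EarlyStoppingCallback

# Unsloth's helper is used instead of torch.cuda.is_bf16_supported(), which on some
# PyTorch versions reports True on a T4 (bf16 emulation) even though it lacks native bf16.
use_bf16 = is_bfloat16_supported()

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
        output_dir="checkpoints_nemo_english",
        num_train_epochs=2,                         # 1-2 epochs for creative writing
        per_device_train_batch_size=2,
        per_device_eval_batch_size=2,               # The default (8) can run out of memory on a T4 at 4096 tokens
        gradient_accumulation_steps=4,              # Effective batch size: 2 x 4 = 8
        warmup_ratio=0.05,
        learning_rate=1e-4,                         # Lower LR preserves base writing ability
        fp16=not use_bf16,                          # T4: fp16
        bf16=use_bf16,                              # Ampere or newer: bf16
        logging_steps=10,
        eval_strategy="steps",
        eval_steps=50,
        save_strategy="steps",
        save_steps=50,                              # Must be a multiple of eval_steps for load_best_model_at_end
        save_total_limit=3,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,
        seed=3407,
    ),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # Stop after 3 evaluations without improvement
)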

print("Starting training...")
stats = trainer.train()
print(f"Training complete! Total steps: {stats.global_step}")
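
`EarlyStoppingCallback(early_stopping_patience=3)` counts consecutive evaluations whose `eval_loss` fails to beat the best value so far, and stops training once that count reaches 3. The function below is a simplified, hypothetical re-implementation of that rule (using the callback's default threshold of 0) for reasoning about a sequence of validation losses; it is not the callback itself, and the losses are made up.

```python
from typing import Optional, Sequence


def early_stop_index(eval_losses: Sequence[float], patience: int = 3,
                     threshold: float = 0.0) -> Optional[int]:
    """Index of the evaluation at which training would stop, or None if it never does."""
    best: Optional[float] = None
    bad_evals = 0
    for i, loss in enumerate(eval_losses):
        if best is None or best - loss > threshold:
            best, bad_evals = loss, 0  # improvement: reset the patience counter
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return i
    return None


losses = [2.0, 1.8, 1.7, 1.75, 1.72, 1.71]  # made-up eval_loss values, one every 50 steps
eval_steps = 50
stop = early_stop_index(losses)
if stop is not None:
    print(f"Stops after evaluation {stop + 1}, around step {(stop + 1) * eval_steps}")
```

With `load_best_model_at_end=True`, the weights from the best evaluation (step 150 in this made-up run) are restored at the end of training.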

## 6. Test Generation

In [None]:
FastLanguageModel.for_inference(model)

test_prompts = [
    "Continue the story: The old lighthouse keeper climbed the spiral stairs one last time. After forty years, tonight would be his final watch.",
    "Write a scene: Two strangers meet in a rain-soaked cafe in Paris. One of them is hiding a secret.",
    "Describe the moment a warrior returns to her village after a decade of war, only to find it completely changed.",
]

for i, prompt in enumerate(test_prompts):
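    # The prompt already starts with "<s>", so stop the tokenizer from adding a second BOS token.
    # The system text is the same one used in `story_prompt` during training.
    inputs = tokenizer(
        "<s>[INST] You are a talented fiction author. Write vivid, engaging prose with strong "
        "characters, sensory details, and natural dialogue. Continue the narrative in the "
        f"established style.\n\n{prompt} [/INST]",
        return_tensors="pt",
        add_special_tokens=False,
    ).to("cuda")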

    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.8,
        top_p=0.9,
        top_k=50,
        do_sample=True,
        repetition_penalty=1.1,  # Helps prevent repetitive prose
    )
    response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)

    print(f"\n{'='*60}")
    print(f"Prompt {i+1}: {prompt}")
    print(f"{'='*60}")
    print(response)
    print(f"\n[{len(response)} chars generated]")
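
The `generate()` call above combines four sampling knobs. Roughly: `repetition_penalty` divides the positive logits (and multiplies the negative ones) of tokens already in the context, `temperature` rescales logits before the softmax, `top_k` keeps the k most likely tokens, and `top_p` keeps the smallest high-probability set whose cumulative probability reaches p; the next token is then sampled from what remains. The pure-Python sketch below mirrors that pipeline on a toy 4-token vocabulary to show the order of operations. It is a simplified illustration, not the `transformers` implementation, which handles edge cases such as ties and a minimum number of kept tokens differently.

```python
import math
import random
from typing import Dict, Iterable, List


def apply_repetition_penalty(logits: List[float], seen_ids: Iterable[int],
                             penalty: float) -> List[float]:
    """Make already-seen tokens less likely (divide positive, multiply negative logits)."""
    out = list(logits)
    for i in set(seen_ids):
        out[i] = out[i] * penalty if out[i] < 0 else out[i] / penalty
    return out


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def filter_candidates(logits: List[float], temperature: float, top_k: int,
                      top_p: float) -> Dict[int, float]:
    """Apply temperature, then top-k, then top-p; return renormalized {token_id: prob}."""
    probs = softmax([x / temperature for x in logits])
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    top_mass = sum(probs[i] for i in top)
    kept, cumulative = [], 0.0
    for i in top:
        kept.append(i)
        cumulative += probs[i] / top_mass  # top-p works on the top-k renormalized distribution
        if cumulative >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample_next_token(candidates: Dict[int, float], rng: random.Random) -> int:
    ids = list(candidates)
    return rng.choices(ids, weights=[candidates[i] for i in ids], k=1)[0]


# Toy example: token ids 0 and 3 already appear in the context
logits = [2.0, 1.0, 0.5, -1.0]
penalized = apply_repetition_penalty(logits, seen_ids=[0, 3], penalty=1.1)
candidates = filter_candidates(penalized, temperature=0.8, top_k=3, top_p=0.8)
print(candidates)
print(sample_next_token(candidates, random.Random(0)))
```

Lower `temperature` or `top_p` makes the prose safer but blander; a `repetition_penalty` much above ~1.2 can start penalizing ordinary words such as character names, which legitimately recur in fiction.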

## 7. Save Model

In [None]:
# Save LoRA adapters
model.save_pretrained("nemo_english_story_lora")
tokenizer.save_pretrained("nemo_english_story_lora")
print("LoRA adapters saved to nemo_english_story_lora/")

# Download as a zip (Colab only)
!zip -r nemo_english_story_lora.zip nemo_english_story_lora/
from google.colab import files
files.download("nemo_english_story_lora.zip")

## 8. (Optional) Save to Google Drive

In [None]:
# Uncomment to save to Google Drive instead
# from google.colab import drive
# drive.mount('/content/drive')
# !cp -r nemo_english_story_lora /content/drive/MyDrive/
# print("Saved to Google Drive!")

## 9. (Optional) Export to GGUF for Local Use

Export the model to GGUF format for running locally with llama.cpp or Ollama.
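
Ollama does not run a `.gguf` file directly; it imports one through a Modelfile (`FROM`, `TEMPLATE`, `SYSTEM` and `PARAMETER` directives) and `ollama create`. The sketch below writes such a Modelfile. The GGUF path is an assumption (Unsloth's output filename has varied between versions, so check the export folder), the `TEMPLATE` is a hand-written approximation of the `[INST] ... [/INST]` format used for training, and `<s>` is left out on the assumption that llama.cpp adds the BOS token itself. The sampling parameters are copied from the test-generation cell.

```python
from pathlib import Path

# Assumed location of the exported file; adjust to what save_pretrained_gguf produced
GGUF_PATH = "./nemo_english_story_gguf/unsloth.Q4_K_M.gguf"
SYSTEM_PROMPT = (
    "You are a talented fiction author. Write vivid, engaging prose with strong "
    "characters, sensory details, and natural dialogue. Continue the narrative "
    "in the established style."
)


def build_modelfile(gguf_path: str, system_prompt: str) -> str:
    """Return Modelfile text mirroring the training prompt and sampling settings."""
    # Go-template placeholders ({{ ... }}) are filled in by Ollama at run time
    template = (
        'TEMPLATE """[INST] {{ if .System }}{{ .System }}\n\n{{ end }}'
        '{{ .Prompt }} [/INST]"""'
    )
    return "\n".join([
        f"FROM {gguf_path}",
        template,
        f'SYSTEM """{system_prompt}"""',
        "PARAMETER temperature 0.8",
        "PARAMETER top_p 0.9",
        "PARAMETER top_k 50",
        "PARAMETER repeat_penalty 1.1",
        'PARAMETER stop "</s>"',
    ]) + "\n"


Path("Modelfile").write_text(build_modelfile(GGUF_PATH, SYSTEM_PROMPT), encoding="utf-8")
```

Copy `Modelfile` and the `nemo_english_story_gguf` folder to the machine running Ollama (keeping the relative layout), then run `ollama create nemo-story -f Modelfile` followed by `ollama run nemo-story`.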

In [None]:
# Uncomment to export to GGUF (takes ~10 min)
# model.save_pretrained_gguf(
#     "nemo_english_story_gguf",
#     tokenizer,
#     quantization_method="q4_k_m",  # Good balance of quality and size
# )
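# files.download("nemo_english_story_gguf/unsloth.Q4_K_M.gguf")  # exact filename may differ between Unsloth versions
# print("GGUF exported! Run it with llama.cpp (llama-cli -m unsloth.Q4_K_M.gguf), or import it into Ollama with a Modelfile containing 'FROM ./unsloth.Q4_K_M.gguf' and 'ollama create'.")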