# Simpsonify LoRA Training (Kohya_ss)

**Professional cartoon-style LoRA training using Kohya's training scripts**

This notebook trains a LoRA model to transform photos into cartoon-style images.

---

## Configuration Summary

- **Base Model**: Stable Diffusion v1.5
- **Training Method**: Kohya sd-scripts with LoRA
- **LoRA Rank**: 16, Alpha: 8-16
- **Resolution**: 512x512
- **Epochs**: 10-12
- **Expected Duration**: ~60-100 minutes on T4 GPU
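
The duration estimate depends mostly on the total number of optimizer steps. Under Kohya's dataset convention, a folder named `<repeats>_<concept>` contributes each image `repeats` times per epoch, so a rough step count is images × repeats × epochs ÷ batch size. The sketch below assumes a single image folder, no regularization images and no gradient accumulation. Aspect-ratio bucketing can shift the count slightly. The 30-image figure in the usage line is a made-up example, not this dataset.

```python
import math


def estimate_steps(num_images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    """Rough optimizer-step count for one Kohya image folder.

    Assumes no regularization images and no gradient accumulation.
    """
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# Hypothetical example: 30 images in a "20_simpsons" folder, 12 epochs, batch size 1
print(estimate_steps(30, 20, 12))
```

Doubling either the repeat count or the epoch count doubles the step count, and with it the training time.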

---

## 1. Setup Environment

In [None]:
# Check GPU
import torch
print("torch:", torch.__version__)
print("cuda:", torch.cuda.is_available())
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

In [None]:
# Install/upgrade PyTorch (optional - if needed)
# Uncomment if you need a specific PyTorch version

# !pip -q uninstall -y torch torchvision torchaudio
# !pip -q install torch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 --index-url https://download.pytorch.org/whl/cu118

## 2. Clone Kohya Training Scripts

In [None]:
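# Clone Kohya's repo (kohya_ss bundles sd-scripts as a git submodule)
# %cd changes the notebook's working directory; "!cd" would only affect a throwaway subshell
%cd /content
!rm -rf kohya_ss
!git clone --recurse-submodules https://github.com/bmaltais/kohya_ss.git

print("✓ Kohya scripts cloned")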

In [None]:
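# Install required dependencies
!pip install -q -U pip
!pip install -q accelerate diffusers transformers safetensors einops tqdm pillow voluptuous toml opencv-python imagesize peft bitsandbytes
# bitsandbytes is needed for Config B's AdamW8bit; peft is used by diffusers to load LoRAs in section 7.
# Config B's --xformers also needs an xformers build matching the installed torch; drop that flag if unavailable.
# Alternatively, install sd-scripts' own pinned requirements:
# !cd /content/kohya_ss/sd-scripts && pip install -r requirements.txt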

# Check accelerate version
import accelerate
print("accelerate:", accelerate.__version__)

## 3. Upload and Prepare Dataset
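
Kohya's `train_network.py` reads images from subfolders of the training directory whose names follow `<repeats>_<concept>` (for example `20_simpsons`). Before extracting, it can help to check that the ZIP's top-level folders match that pattern. `files.upload()` returns a dict of filename to file bytes, so `top_level_folders(uploaded[name])` works on the upload directly. This is a minimal sketch. `top_level_folders` is a hypothetical helper. A `None` value flags a folder without a repeat prefix, such as an extra wrapper folder or a `__MACOSX` folder. Files at the ZIP root are ignored by this check.

```python
import io
import re
import zipfile

# Kohya folder convention: "<repeats>_<concept>", e.g. "20_simpsons"
REPEAT_DIR = re.compile(r"^(\d+)_(.+)$")


def top_level_folders(zip_bytes: bytes) -> dict:
    """Map each top-level folder in the ZIP to its repeat count (None if the name doesn't match)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # Only entries inside a folder contain "/"; root-level files are skipped
        tops = {name.split("/", 1)[0] for name in zf.namelist() if "/" in name}
    result = {}
    for folder in sorted(tops):
        match = REPEAT_DIR.match(folder)
        result[folder] = int(match.group(1)) if match else None
    return result
```

If the only top-level entry is something like `dataset` with `None`, the images would end up one level too deep under `/content/dataset/train`. Re-zip from inside the wrapper folder, or point the paths below at the inner folder.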

In [None]:
# Upload your dataset ZIP file
from google.colab import files
uploaded = files.upload()

print("✓ File uploaded")

In [None]:
# Extract dataset
import zipfile
import os

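# Find the uploaded zip file (fail with a clear message if none was uploaded)
zip_files = [f for f in uploaded.keys() if f.endswith('.zip')]
assert zip_files, "No .zip file found in the upload"
zip_file = zip_files[0]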

# Extract to /content/dataset/train/
with zipfile.ZipFile(zip_file, 'r') as zip_ref:
    zip_ref.extractall('/content/dataset/train')

print("✓ Dataset extracted to /content/dataset/train")
!ls -la /content/dataset/train

In [None]:
# Generate captions for all images
from pathlib import Path

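# Kohya's train_network.py reads images from subfolders of train_data_dir named
# "<repeats>_<concept>": "20_simpsons" means each image is repeated 20 times per epoch.
# Adjust this path to match the folder name inside your ZIP:
img_dir = Path("/content/dataset/train/20_simpsons")

# Images sitting directly in /content/dataset/train (without such a subfolder) are
# skipped by train_network.py, so move them into e.g. /content/dataset/train/20_simpsons.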

imgs = sorted(p for p in img_dir.iterdir() if p.suffix.lower() in [".png", ".jpg", ".jpeg"])
assert imgs, f"No images found in {img_dir}"

# Caption text - customize this based on your style!
# For Simpsons style:
caption = "simpsons_style, cartoon, animated"

# For general cartoon style:
# caption = "cartoonify, 2D cartoon, flat colors, simple shapes"

# Create a .txt caption file for each image (overwrites any existing captions)
for p in imgs:
    p.with_suffix(".txt").write_text(caption, encoding="utf-8")

print(f"✓ Created captions for {len(imgs)} images")
print(f"Sample caption: {imgs[0].with_suffix('.txt').read_text()}")

## 4. Configure Training Parameters

**Two training configurations are provided:**
- **Config A**: Simpsons Style (original settings)
- **Config B**: Cartoonify Style (optimized settings)

Run only one of the two cells below. Both assign `CONFIG`, so whichever cell runs last takes effect.
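
Config B uses `cosine_with_restarts` with 50 warmup steps and 3 cycles. As far as I know, sd-scripts maps this name to the hard-restart cosine schedule from diffusers/transformers (`get_cosine_with_hard_restarts_schedule_with_warmup`). The function below re-implements that learning-rate multiplier in plain Python to show the shape of the schedule. Treat it as an illustration under that assumption, not as the trainer's code. The 350-step run in the usage loop is arbitrary.

```python
import math


def cosine_hard_restarts_multiplier(step: int, warmup_steps: int, total_steps: int, num_cycles: int) -> float:
    """LR multiplier in [0, 1]: linear warmup, then cosine decay that restarts num_cycles times."""
    if step < warmup_steps:
        # Linear warmup from 0 to 1
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Each cycle decays from 1 toward 0, then jumps back to 1 ("hard" restart)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))


# Effective LR for Config B (peak 5e-5) at a few steps of a hypothetical 350-step run
for s in (0, 25, 50, 100, 149, 151, 349):
    print(s, 5e-5 * cosine_hard_restarts_multiplier(s, 50, 350, 3))
```

With three cycles the learning rate returns to its peak twice after warmup. This gives the optimizer a few chances to move away from a poor region. A `constant` schedule (Config A) stays at the peak throughout.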

In [None]:
# === CONFIG A: Simpsons Style (Basic) ===

CONFIG = {
    "output_name": "simpsons_style_lora",
    "network_dim": 16,
    "network_alpha": 16,
    "max_train_epochs": 12,
    "learning_rate": "1e-4",
    "unet_lr": "1e-4",
    "text_encoder_lr": "5e-5",
    "lr_scheduler": "constant",
    "optimizer_type": "AdamW",
    "save_every_n_epochs": 1,
    "additional_args": ""
}

print("✓ Config A (Simpsons Style) loaded")
print(f"Training for {CONFIG['max_train_epochs']} epochs")
print(f"LoRA rank: {CONFIG['network_dim']}, alpha: {CONFIG['network_alpha']}")

In [None]:
# === CONFIG B: Cartoonify Style (Advanced) ===

CONFIG = {
    "output_name": "cartoonify_lora",
    "network_dim": 16,
    "network_alpha": 8,
    "max_train_epochs": 10,
    "learning_rate": "5e-5",
    "unet_lr": "5e-5",
    "text_encoder_lr": "5e-5",
    "lr_scheduler": "cosine_with_restarts",
    "lr_scheduler_num_cycles": 3,
    "lr_warmup_steps": 50,
    "optimizer_type": "AdamW8bit",
    "save_every_n_epochs": 2,
    "additional_args": """
  --xformers \
  --cache_latents \
  --cache_latents_to_disk \
  --noise_offset=0.05 \
  --adaptive_noise_scale=0.00357 \
  --min_snr_gamma=5 \
  --max_grad_norm=1.0
    """
}

print("✓ Config B (Cartoonify - Advanced) loaded")
print(f"Training for {CONFIG['max_train_epochs']} epochs")
print(f"LoRA rank: {CONFIG['network_dim']}, alpha: {CONFIG['network_alpha']}")
print("Advanced features: xformers, noise offset, min-SNR gamma")

## 5. Start Training

⚠️ **This will take roughly 60-100 minutes on a T4, depending on dataset size and number of epochs**
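
Shell commands built from multi-line Python strings are fragile. In a regular (non-raw) string, a trailing backslash is dropped together with its newline. Any newline that survives ends the shell command, so flags appended after it silently run as a separate, failing command. One way to avoid this is to build the arguments as a list and quote them with `shlex.join`. This is a sketch: `build_train_args` is a hypothetical helper, and it only covers a subset of the flags used in this notebook.

```python
import shlex


def build_train_args(config: dict, base_model: str, train_dir: str, output_dir: str) -> list:
    """Assemble a train_network.py argument list from a CONFIG-style dict (subset of flags)."""
    args = [
        "python", "train_network.py",
        f"--pretrained_model_name_or_path={base_model}",
        f"--train_data_dir={train_dir}",
        f"--output_dir={output_dir}",
        f"--output_name={config['output_name']}",
        "--network_module=networks.lora",
        f"--network_dim={config['network_dim']}",
        f"--network_alpha={config['network_alpha']}",
        f"--max_train_epochs={config['max_train_epochs']}",
        f"--learning_rate={config['learning_rate']}",
        f"--lr_scheduler={config['lr_scheduler']}",
        f"--optimizer_type={config['optimizer_type']}",
    ]
    if config["lr_scheduler"] == "cosine_with_restarts":
        args.append(f"--lr_scheduler_num_cycles={config.get('lr_scheduler_num_cycles', 3)}")
        args.append(f"--lr_warmup_steps={config.get('lr_warmup_steps', 50)}")
    # Free-form extra flags: split on any whitespace, so embedded newlines are harmless
    args.extend(config.get("additional_args", "").split())
    return args


example = {
    "output_name": "demo_lora", "network_dim": 16, "network_alpha": 8,
    "max_train_epochs": 10, "learning_rate": "5e-5",
    "lr_scheduler": "cosine_with_restarts", "optimizer_type": "AdamW8bit",
    "additional_args": "\n  --xformers\n  --min_snr_gamma=5\n",
}
print(shlex.join(build_train_args(example, "base-model-id", "/content/dataset/train", "/content/output")))
```

In the notebook, the joined string is a single line. It could then be run with `!cd /content/kohya_ss/sd-scripts && {cmd}`.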

In [None]:
# Create output directories
!mkdir -p /content/output /content/logs

# Build training command
base_cmd = f"""
cd /content/kohya_ss/sd-scripts && python train_network.py \
  --pretrained_model_name_or_path="stable-diffusion-v1-5/stable-diffusion-v1-5" \
  --train_data_dir="/content/dataset/train" \
  --output_dir="/content/output" \
  --output_name="{CONFIG['output_name']}" \
  --save_model_as="safetensors" \
  --caption_extension=".txt" \
  --network_module="networks.lora" \
  --network_dim={CONFIG['network_dim']} \
  --network_alpha={CONFIG['network_alpha']} \
  --resolution=512 \
  --train_batch_size=1 \
  --max_train_epochs={CONFIG['max_train_epochs']} \
  --learning_rate={CONFIG['learning_rate']} \
  --unet_lr={CONFIG['unet_lr']} \
  --text_encoder_lr={CONFIG['text_encoder_lr']} \
  --lr_scheduler="{CONFIG['lr_scheduler']}" \
  --mixed_precision="fp16" \
  --save_precision="fp16" \
  --optimizer_type="{CONFIG['optimizer_type']}" \
  --gradient_checkpointing \
  --max_data_loader_n_workers=2 \
  --save_every_n_epochs={CONFIG['save_every_n_epochs']} \
  --seed=42 \
  --console_log_level="INFO" \
  --console_log_file="/content/logs/train.log"
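""".strip()  # Python drops each trailing "\" + newline; strip() removes the outer newlines too

# Add scheduler-specific params if using cosine_with_restarts
# (these must land on the same shell line; a newline would start a new, failing command)
if CONFIG['lr_scheduler'] == 'cosine_with_restarts':
    base_cmd += f" --lr_scheduler_num_cycles={CONFIG.get('lr_scheduler_num_cycles', 3)}"
    base_cmd += f" --lr_warmup_steps={CONFIG.get('lr_warmup_steps', 50)}"

# Add additional args, collapsing their newlines and indentation into single spaces
base_cmd += " " + " ".join(CONFIG.get('additional_args', '').split())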

print("Starting training...\n")
print("Command:")
print(base_cmd)
print("\n" + "="*60 + "\n")

# Run training
!{base_cmd}

## 6. Check Training Log

In [None]:
# View last 50 lines of training log
!tail -50 /content/logs/train.log

In [None]:
# List all generated checkpoints
!ls -lh /content/output/*.safetensors

## 7. Test Trained Models

Generate test images with different checkpoints to compare quality.
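
Based on my reading of sd-scripts, intermediate saves are named `<output_name>-<epoch, zero-padded to 6 digits>.safetensors`. The last epoch is written only as `<output_name>.safetensors`. If that holds, the expected files for each config can be listed up front. For example, Config B (10 epochs, saving every 2) should produce epochs 2, 4, 6 and 8 plus the final file, so an `-000010` checkpoint would not exist. `expected_checkpoints` is a hypothetical helper built on that assumption.

```python
def expected_checkpoints(output_name: str, epochs: int, save_every: int) -> list:
    """Checkpoint filenames train_network.py is assumed to write (see note above)."""
    names = [
        f"{output_name}-{e:06d}.safetensors"
        # Intermediate saves only; the last epoch is excluded here
        for e in range(save_every, epochs, save_every)
    ]
    names.append(f"{output_name}.safetensors")  # final model, no epoch suffix
    return names


for name in expected_checkpoints("cartoonify_lora", epochs=10, save_every=2):
    print(name)
```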

In [None]:
import torch
from diffusers import StableDiffusionPipeline
from pathlib import Path
from IPython.display import display

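# Load base pipeline (the original runwayml/stable-diffusion-v1-5 repo was taken down;
# this repo mirrors the same SD 1.5 weights)
BASE = "stable-diffusion-v1-5/stable-diffusion-v1-5"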
OUTDIR = Path("/content/test_outputs")
OUTDIR.mkdir(parents=True, exist_ok=True)

pipe = StableDiffusionPipeline.from_pretrained(
    BASE,
    torch_dtype=torch.float16,
    safety_checker=None,
).to("cuda")
pipe.enable_attention_slicing()

print("✓ Pipeline loaded")

In [None]:
def run_test(lora_path: str, tag: str, prompts, seed=42, steps=25, cfg=7.0, lora_weight=1.0):
    """Test a LoRA checkpoint with multiple prompts and save the images to OUTDIR."""

    # Load LoRA under an explicit adapter name (diffusers otherwise auto-names it, e.g. "default_0")
    pipe.load_lora_weights(lora_path, adapter_name=tag)

    # Optional: adjust LoRA strength (requires the PEFT backend)
    try:
        pipe.set_adapters([tag], adapter_weights=[lora_weight])
    except Exception as e:
        print(f"⚠ Could not set LoRA weight, using default strength: {e}")

    g = torch.Generator("cuda").manual_seed(seed)

    images = []
    try:
        for i, p in enumerate(prompts, 1):
            img = pipe(
                prompt=p,
                negative_prompt="blurry, deformed, extra fingers, watermark, text",
                num_inference_steps=steps,
                guidance_scale=cfg,
                generator=g,
                height=512,
                width=512,
            ).images[0]

            fp = OUTDIR / f"{tag}_p{i}.png"
            img.save(fp)
            images.append(img)
            display(img)
    finally:
        # Unload LoRA so the next checkpoint starts from the clean base model
        pipe.unload_lora_weights()

    return images

In [None]:
# Test prompts - customize based on your trigger word!

# For simpsons_style:
prompts = [
    "portrait photo of a person, simpsons_style, cartoon, animated",
    "full body, outdoor street scene, simpsons_style, cartoon, animated",
    "group of people, living room, simpsons_style, cartoon, animated",
]

# For cartoonify:
# prompts = [
#     "portrait of a woman, cartoonify, 2D cartoon, flat colors",
#     "portrait of a man with beard, cartoonify, simple shapes, bold outline",
#     "smiling person, cartoonify, animated style",
# ]

In [None]:
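# Test multiple checkpoints
# Intermediate checkpoints are named <output_name>-<epoch:06d>.safetensors; the last epoch is
# saved only as <output_name>.safetensors. Adjust epochs to match CONFIG['save_every_n_epochs'].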

output_name = CONFIG['output_name']

lora_files = {
    "e06": f"/content/output/{output_name}-000006.safetensors",
    "e08": f"/content/output/{output_name}-000008.safetensors",
    "e10": f"/content/output/{output_name}-000010.safetensors",
    "final": f"/content/output/{output_name}.safetensors",
}

# Test each checkpoint
for tag, path in lora_files.items():
    if Path(path).exists():
        print(f"\n{'='*60}")
        print(f"Testing: {tag}")
        print(f"{'='*60}\n")
        run_test(path, tag, prompts)
    else:
        print(f"⚠ Checkpoint not found: {path}")

print("\n✓ All tests complete!")
print(f"Images saved to: {OUTDIR}")

## 8. Compare Checkpoints Side-by-Side

In [None]:
from PIL import Image

# Quick comparison between two checkpoints
prompt = "portrait of a woman, cartoonify, 2D cartoon, flat colors"
epochs_to_compare = [6, 8]  # Adjust based on available checkpoints

images = []
for epoch in epochs_to_compare:
    lora_path = f"/content/output/{output_name}-{epoch:06d}.safetensors"
    
    if not Path(lora_path).exists():
        print(f"‚ö† Checkpoint not found: {lora_path}")
        continue
    
    print(f"Testing epoch {epoch}...")
    
    pipe.load_lora_weights(lora_path)
    pipe.fuse_lora()
    
    img = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
    images.append(img)
    
    pipe.unfuse_lora()

# Create comparison grid
if images:
    grid_width = 512 * len(images)
    grid = Image.new('RGB', (grid_width, 512))
    for i, img in enumerate(images):
        grid.paste(img, (512*i, 0))

    grid.save("/content/comparison.png")
    display(grid)
    print("‚úì Comparison saved: comparison.png")
else:
    print("No images generated for comparison")

## 9. Download Trained Models

In [None]:
from google.colab import files
import os

# List all available models
print("Available models:\n")
!ls -lh /content/output/*.safetensors

# Download the final model
final_model = f"/content/output/{CONFIG['output_name']}.safetensors"
if os.path.exists(final_model):
    print(f"\nDownloading: {final_model}")
    files.download(final_model)
    print("✓ Download started!")
else:
    print(f"⚠ Model not found: {final_model}")

In [None]:
# Download a specific checkpoint
checkpoint_epoch = 8  # Change this to download different epochs

checkpoint_file = f"/content/output/{CONFIG['output_name']}-{checkpoint_epoch:06d}.safetensors"

if os.path.exists(checkpoint_file):
    print(f"Downloading checkpoint from epoch {checkpoint_epoch}...")
    files.download(checkpoint_file)
    print("✓ Download started!")
else:
    print(f"⚠ Checkpoint not found: {checkpoint_file}")

---

## Next Steps

1. **Download** your `.safetensors` file(s)
2. **Choose best checkpoint** based on test images
3. **Rename** (e.g., `my_cartoon_style.safetensors`)
4. **Copy** to your project:
   ```bash
   cp my_cartoon_style.safetensors /path/to/simpsonify/backend/models/
   ```
5. **Update** `backend/.env`:
   ```
   SD_LORA_PATH=/path/to/backend/models/my_cartoon_style.safetensors
   ```
6. **Restart** backend and test!
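
A `.safetensors` file starts with an 8-byte little-endian length, followed by a JSON header. The header lists every tensor and may include a `__metadata__` map of strings. Reading only that header is a cheap way to check that a downloaded checkpoint is intact before copying it into the backend, and to see which rank and alpha it was trained with. Kohya's scripts normally record training settings as `ss_*` metadata keys such as `ss_network_dim` and `ss_network_alpha`. Treat those key names as an assumption, and expect `None` if they are missing. `summarize_lora` is a hypothetical helper.

```python
import json
import struct


def read_safetensors_header(path: str) -> dict:
    """Return the JSON header of a .safetensors file (tensor index plus optional __metadata__)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian uint64
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize_lora(path: str) -> dict:
    header = read_safetensors_header(path)
    meta = header.get("__metadata__", {})
    tensor_names = [k for k in header if k != "__metadata__"]
    return {
        "num_tensors": len(tensor_names),
        # Assumed Kohya metadata keys; None if the file doesn't carry them
        "network_dim": meta.get("ss_network_dim"),
        "network_alpha": meta.get("ss_network_alpha"),
    }


# Usage (path is an example): print(summarize_lora("my_cartoon_style.safetensors"))
```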

---

## Training Tips

### Loss Analysis
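- **Good training**: Loss trends downward over time. Per-step values are noisy because each step samples a random noise level, so judge a smoothed curve rather than individual numbers
- **Overfitting**: Shows up in the samples more than in the loss: outputs start copying training images or ignoring the prompt. A loss that plateaus early more often points to a learning-rate or dataset problem
- **Optimal checkpoint**: Often around 60-80% of the way through training; confirm by comparing test images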
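
To make the trend readable, per-step loss values can be smoothed with an exponential moving average. A minimal sketch follows. How you collect the values is up to you; for example, sd-scripts can write TensorBoard logs via its `--logging_dir` flag, which this notebook's command does not set. The sample numbers in the usage line are made up.

```python
def ema(values, alpha: float = 0.1) -> list:
    """Exponential moving average; smaller alpha gives a smoother curve."""
    smoothed, current = [], None
    for v in values:
        current = v if current is None else alpha * v + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


# Made-up noisy loss values, only to show the call
print(ema([0.12, 0.08, 0.15, 0.07, 0.11, 0.06], alpha=0.3))
```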

### Configuration Guide

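**For stronger style transfer:**
- Increase `network_alpha` (16 → 32); the LoRA update is scaled by alpha / rank, so this raises its effective strength
- More epochs (12 → 15)
- Keep the learning rate, or raise it modestly (5e-5 → 1e-4); lowering it weakens the effect

**For subtle effects:**
- Lower `network_alpha` (16 → 8)
- Fewer epochs (12 → 8)
- Lower learning rate (1e-4 → 5e-5)
- Or keep the trained LoRA and lower its weight at inference (e.g. `adapter_weights=[0.6]` in `set_adapters`)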
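
In Kohya's LoRA module (`networks.lora`), the low-rank update `up @ down` is multiplied by `network_alpha / network_dim` and by an inference-time multiplier before it is added to the frozen weight. The sketch below reproduces that arithmetic on tiny nested-list matrices. Config A (16/16) applies the update at full scale, while Config B (8/16) applies it at half scale. The function names are illustrative, not sd-scripts APIs.

```python
def lora_scale(network_alpha: float, network_dim: int) -> float:
    """Scale applied to the LoRA update in Kohya-style LoRA: alpha / rank."""
    return network_alpha / network_dim


def lora_delta(down, up, network_alpha, network_dim, multiplier=1.0):
    """delta_W = multiplier * (alpha / rank) * (up @ down).

    down: rank x in_features, up: out_features x rank (nested lists).
    """
    scale = multiplier * lora_scale(network_alpha, network_dim)
    return [
        [scale * sum(up[i][r] * down[r][j] for r in range(network_dim)) for j in range(len(down[0]))]
        for i in range(len(up))
    ]


print("Config A scale:", lora_scale(16, 16))
print("Config B scale:", lora_scale(8, 16))
```

Because the scale multiplies every update, doubling alpha at fixed rank has roughly the same effect on training as doubling the LoRA learning rate.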

---

**Happy Training! üé®**