# Training a Color Palette ControlNet on the ArtCap Dataset

**Goal:** Fine-tune a ControlNet model to condition on **5-color horizontal stripe palettes** extracted from artworks.  
**Base Model:** Stable Diffusion v1.5  
**Dataset:** `SaFFire/artcap-color-palette-controlnet` (512×512 images + palettes + captions)  
**Hardware:** 2× GPU (multi-GPU setup via Accelerate)

This notebook:
1. Sets up a clean Python 3.10 virtual environment  
2. Installs compatible versions of PyTorch, Diffusers, etc.  
3. Clones the exact Diffusers version used for training  
4. Configures Accelerate for multi-GPU  
5. Launches the official ControlNet training script
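
To make the conditioning format concrete, here is a minimal sketch of what a 5-color horizontal stripe palette image could look like at 512×512. It is an assumption about the layout, not the dataset's actual generation code: it stacks five full-width bands of near-equal height and encodes them as a binary PPM using only the standard library. The names `stripe_rows` and `ppm_bytes` and the example colors are hypothetical.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def stripe_rows(palette: Sequence[RGB], size: int = 512) -> List[RGB]:
    """Return one color per pixel row so the palette forms stacked horizontal bands."""
    n = len(palette)
    # Row y belongs to band floor(y * n / size). This spreads the remainder
    # across bands when size is not divisible by n (512 / 5 = 102.4).
    return [palette[y * n // size] for y in range(size)]


def ppm_bytes(row_colors: Sequence[RGB], width: int = 512) -> bytes:
    """Encode the striped image as a binary PPM (P6) byte string."""
    header = f"P6\n{width} {len(row_colors)}\n255\n".encode("ascii")
    body = b"".join(bytes(color) * width for color in row_colors)
    return header + body


# Example colors are made up for illustration; real palettes come from the dataset.
example_palette = [
    (38, 70, 83),
    (42, 157, 143),
    (233, 196, 106),
    (244, 162, 97),
    (231, 111, 81),
]
rows = stripe_rows(example_palette)
image_data = ppm_bytes(rows)
```

Writing `image_data` to a file with a `.ppm` extension gives an image that most viewers can open, which is handy for a quick visual check of the band order.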

In [None]:
# ==========================================================
# CELL 1: Create Python 3.10 Virtual Environment
# ==========================================================
# The base Python on Kaggle/Colab may not match the pinned packages below, so we force 3.10 for compatibility

!sudo apt-get update -y
!sudo apt-get install python3.10 python3.10-distutils python3.10-venv -y

!python3.10 -m venv /content/py310
!/content/py310/bin/pip install --upgrade pip

Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Get:3 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:5 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:6 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ Packages [83.6 kB]
Get:7 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:8 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [2,204 kB]
Get:9 https://r2u.stat.illinois.edu/ubuntu jammy/main amd64 Packages [2,849 kB]
Get:10 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease [18.1 kB]
Get:11 https://r2u.stat.illinois.edu/ubuntu jammy/main all Packages [9,539 kB]
Hit:12 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Get:13 http://

### Install Dependencies in the Virtual Environment

In [None]:
# ==========================================================
# CELL 2: Install All Required Libraries
# ==========================================================
# We use a pinned set of versions for stability with ControlNet training

# Install PyTorch built for CUDA 11.8 (the cu118 wheels bundle their own CUDA runtime; the GPU driver must support CUDA 11.8 or newer)
!/content/py310/bin/pip install torch==2.1.0+cu118 torchvision==0.16.0+cu118 --index-url https://download.pytorch.org/whl/cu118

# Core Diffusers & training ecosystem
!/content/py310/bin/pip install diffusers==0.20.2
!/content/py310/bin/pip install transformers==4.33.0
!/content/py310/bin/pip install accelerate==0.23.0
!/content/py310/bin/pip install peft==0.6.0
!/content/py310/bin/pip install datasets==2.14.5
!/content/py310/bin/pip install xformers==0.0.22.post7
!/content/py310/bin/pip install huggingface_hub==0.16.4

# Utility libraries
!/content/py310/bin/pip install kagglehub matplotlib pillow

# Pin numpy below 2.0 and install pandas/pyarrow versions that work with it
# (many Diffusers scripts break with numpy 2.0+)
PY="/content/py310/bin/python3.10"
!$PY -m pip install "numpy<2" --force-reinstall
!$PY -m pip install --upgrade pyarrow==14.0.2 pandas==1.5.3

print("✅ All dependencies installed in Python 3.10 environment")

Looking in indexes: https://download.pytorch.org/whl/cu118
Collecting torch==2.1.0+cu118
  Downloading https://download.pytorch.org/whl/cu118/torch-2.1.0%2Bcu118-cp310-cp310-linux_x86_64.whl (2325.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 GB 10.1 MB/s 0:00:31
Collecting torchvision==0.16.0+cu118
  Downloading https://download.pytorch.org/whl/cu118/torchvision-0.16.0%2Bcu118-cp310-cp310-linux_x86_64.whl (6.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 127.6 MB/s 0:00:00
Collecting filelock (from torch==2.1.0+cu118)
  Downloading https://download.pytorch.org/whl/filelock-3.19.1-py3-none-any.whl.metadata (2.1 kB)
Collecting typing-extensions (from torch==2.1.0+cu118)
  Downloading https://download.pytorch.org/whl/typing_extensions-4.15.0-py3-none-any.whl.metadata (3.3 kB)
Collecting sympy (from torch==2.1.0+cu118)
  Downloading https://download.pytor
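
Because several packages are installed one after another, a later install can silently replace an earlier pin. The sketch below is one way to confirm that the environment still matches the versions listed in the install cell. It uses only `importlib.metadata` from the standard library. The function name `find_mismatches` and the script name `check_pins.py` are hypothetical, and the script only reports on the virtual environment if it is run with that environment's interpreter, for example `/content/py310/bin/python check_pins.py`.

```python
from importlib import metadata
from typing import Callable, Dict

# Pins copied from the install cell above.
PINNED: Dict[str, str] = {
    "torch": "2.1.0+cu118",
    "torchvision": "0.16.0+cu118",
    "diffusers": "0.20.2",
    "transformers": "4.33.0",
    "accelerate": "0.23.0",
    "peft": "0.6.0",
    "datasets": "2.14.5",
    "xformers": "0.0.22.post7",
    "huggingface_hub": "0.16.4",
    "pyarrow": "14.0.2",
    "pandas": "1.5.3",
}


def find_mismatches(
    pinned: Dict[str, str],
    lookup: Callable[[str], str] = metadata.version,
) -> Dict[str, str]:
    """Return {package: message} for every package whose installed version differs."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = "not installed"
        if found != wanted:
            mismatches[name] = f"expected {wanted}, found {found}"
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    for name, message in problems.items():
        print(f"{name}: {message}")
    print("OK" if not problems else f"{len(problems)} mismatch(es)")
```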

### Clone Exact Diffusers Version (v0.20.2)

In [None]:
# ==========================================================
# CELL 3: Clone Diffusers v0.20.2 (exact version used in training)
# ==========================================================
# The example training script should match the installed diffusers package (0.20.2), so we check out that release tag

!rm -rf diffusers
!git clone --branch v0.20.2 https://github.com/huggingface/diffusers

Cloning into 'diffusers'...
remote: Enumerating objects: 114942, done.
remote: Counting objects: 100% (447/447), done.
remote: Compressing objects: 100% (241/241), done.
remote: Total 114942 (delta 339), reused 213 (delta 205), pack-reused 114495 (from 4)
Receiving objects: 100% (114942/114942), 88.26 MiB | 33.30 MiB/s, done.
Resolving deltas: 100% (85808/85808), done.
Note: switching to '6fc8aff521590418576b698a8be1d276018367da'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false
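
A quick sanity check after cloning is to confirm that the training script exists and that the checked-out source declares the same version as the installed package. The sketch below assumes the repository keeps its version in `src/diffusers/__init__.py` as a `__version__ = "..."` line. The helper names `read_declared_version` and `check_clone` are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional


def read_declared_version(init_source: str) -> Optional[str]:
    """Extract the value of a top-level `__version__ = "..."` line, if present."""
    match = re.search(
        r'^__version__\s*=\s*["\']([^"\']+)["\']', init_source, re.MULTILINE
    )
    return match.group(1) if match else None


def check_clone(repo_dir: str = "diffusers", expected: str = "0.20.2") -> bool:
    """True if the clone contains the ControlNet script and declares `expected`."""
    repo = Path(repo_dir)
    script = repo / "examples" / "controlnet" / "train_controlnet.py"
    init_file = repo / "src" / "diffusers" / "__init__.py"  # assumed location
    if not script.is_file() or not init_file.is_file():
        return False
    declared = read_declared_version(init_file.read_text(encoding="utf-8"))
    return declared == expected


if __name__ == "__main__":
    print("clone matches pinned version:", check_clone())
```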



### Configure Accelerate with default settings

In [None]:
# ==========================================================
# CELL 4: Configure Accelerate 
# ==========================================================
# Run default config for Accelerate in the new Python environment
!/content/py310/bin/accelerate config default

accelerate configuration saved at /root/.cache/huggingface/accelerate/default_config.yaml
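
It can be useful to see what the default configuration actually recorded before launching. The sketch below reads the saved file with a deliberately small parser that only understands flat `key: value` lines. This is an assumption about the file's shape, and a real YAML library would be more robust. The key names `distributed_type` and `num_processes` are the ones Accelerate is expected to write. The helper names are hypothetical. Note that the launch cell passes `--multi_gpu` and `--num_processes` explicitly, and flags given on the command line take precedence over this file.

```python
from pathlib import Path
from typing import Dict

CONFIG_PATH = Path("/root/.cache/huggingface/accelerate/default_config.yaml")


def parse_flat_yaml(text: str) -> Dict[str, str]:
    """Parse top-level `key: value` lines; skip blanks, comments and indented lines."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#") or line[0].isspace():
            continue
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip().strip("'\"")
    return settings


def matches_launch(settings: Dict[str, str], num_processes: int = 2) -> bool:
    """True if the config describes a multi-GPU run with the expected process count."""
    return (
        settings.get("distributed_type") == "MULTI_GPU"
        and settings.get("num_processes") == str(num_processes)
    )


if CONFIG_PATH.is_file():
    saved = parse_flat_yaml(CONFIG_PATH.read_text(encoding="utf-8"))
    print("distributed_type:", saved.get("distributed_type"))
    print("num_processes:", saved.get("num_processes"))
    print("matches launch flags:", matches_launch(saved))
```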


### Launch ControlNet Training
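
The launch command below combines a per-GPU batch size, two processes, and gradient accumulation. The short sketch here works through that arithmetic with the values from the command. It assumes that `--max_train_steps` counts optimizer updates, and the class name `Schedule` is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Schedule:
    per_device_batch: int = 4      # --train_batch_size
    num_gpus: int = 2              # --num_processes
    grad_accum: int = 2            # --gradient_accumulation_steps
    max_steps: int = 4000          # --max_train_steps
    checkpoint_every: int = 2000   # --checkpointing_steps

    @property
    def effective_batch(self) -> int:
        """Images contributing to each optimizer update."""
        return self.per_device_batch * self.num_gpus * self.grad_accum

    @property
    def samples_seen(self) -> int:
        """Total images drawn over the run (with repeats if the dataset is smaller)."""
        return self.effective_batch * self.max_steps

    @property
    def checkpoint_steps(self) -> List[int]:
        """Steps at which a checkpoint is written."""
        return list(range(self.checkpoint_every, self.max_steps + 1, self.checkpoint_every))


schedule = Schedule()
print(schedule.effective_batch)   # 16
print(schedule.samples_seen)      # 64000
print(schedule.checkpoint_steps)  # [2000, 4000]
```

With only two checkpoints produced in total, `--checkpoints_total_limit=2` never has to delete anything in this configuration; it starts to matter if `--max_train_steps` is raised or `--checkpointing_steps` is lowered.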

In [None]:
# ==========================================================
# CELL 5: Launch ControlNet Training (Multi-GPU)
# ==========================================================
# Using the official Diffusers train_controlnet.py script

# Flag notes (kept out of the command: a comment after a trailing "\" breaks line continuation)
#   --num_processes                number of GPUs (adjust if needed)
#   --output_dir                   where checkpoints will be saved
#   --dataset_name                 Hugging Face dataset ID (--train_data_dir is meant for a local folder)
#   --gradient_accumulation_steps  effective batch size = 4 per GPU × 2 GPUs × 2 accumulation steps = 16
#   --checkpoints_total_limit      keep only the last 2 checkpoints, since Kaggle has limited disk space
#   --proportion_empty_prompts     50% of captions are replaced with an empty prompt, which encourages
#                                  the model to rely on the palette rather than the text

!/content/py310/bin/accelerate launch \
  --multi_gpu \
  --num_processes=2 \
  diffusers/examples/controlnet/train_controlnet.py \
  --pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5 \
  --output_dir=/kaggle/working/color-palette \
  --dataset_name=SaFFire/artcap-color-palette-controlnet \
  --conditioning_image_column=conditioning_image \
  --image_column=image \
  --caption_column=prompt \
  --resolution=512 \
  --train_batch_size=4 \
  --gradient_accumulation_steps=2 \
  --learning_rate=5e-5 \
  --lr_scheduler=cosine \
  --lr_warmup_steps=500 \
  --mixed_precision=fp16 \
  --enable_xformers_memory_efficient_attention \
  --gradient_checkpointing \
  --checkpointing_steps=2000 \
  --checkpoints_total_limit=2 \
  --max_train_steps=4000 \
  --proportion_empty_prompts=0.5 \
  --seed=42

# After training:
# Checkpoints will be saved in /kaggle/working/color-palette
# You can download them or push to Hugging Face manually

12/15/2025 11:18:03 - INFO - __main__ - Distributed environment: MULTI_GPU  Backend: nccl
Num processes: 2
Process index: 0
Local process index: 0
Device: cuda:0

Mixed precision type: fp16

12/15/2025 11:18:03 - INFO - __main__ - Distributed environment: MULTI_GPU  Backend: nccl
Num processes: 2
Process index: 1
Local process index: 1
Device: cuda:1

Mixed precision type: fp16

Downloading tokenizer_config.json: 100%|███████| 806/806 [00:00<00:00, 4.62MB/s]
Downloading vocab.json: 1.06MB [00:00, 28.5MB/s]
Downloading merges.txt: 525kB [00:00, 66.1MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████| 472/472 [00:00<00:00, 3.32MB/s]
Downloading config.json: 100%|█████████████████| 617/617 [00:00<00:00, 4.22MB/s]
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
Downloading scheduler_config.json: 100%|███████| 308/308 [00:00<00:00, 1.46MB/s]
{'sample_max_value