<h1>Fine-tuning script for Stable Diffusion for text2image using LoRA.</h1>

This notebook demonstrates how to fine-tune Stable Diffusion using LoRA (Low-Rank Adaptation) to generate images in a specific style or domain. The training process works by taking your custom dataset of images with captions, adding random noise to the images, and teaching the model to predict and remove that noise. By learning to "denoise" images from your dataset, the model learns the visual patterns and style of your data.
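
The noising step described above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: a linear beta schedule, a flat list of 16 numbers standing in for an image, and a placeholder "model" that always guesses zero noise. The helper names (`make_alpha_bars`, `add_noise`, `mse`) are hypothetical and are not taken from the training script, which works on latents and may use a different schedule.

```python
import math
import random


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (an assumption)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, noise, alpha_bar_t):
    """Blend clean values with noise: x_t = sqrt(a) * x0 + sqrt(1 - a) * noise."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * n for x, n in zip(x0, noise)]


def mse(pred, target):
    """Mean squared error between two equally sized lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


rng = random.Random(0)
alpha_bars = make_alpha_bars()

x0 = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # stand-in for image values
noise = [rng.gauss(0.0, 1.0) for _ in range(16)]  # the random noise that gets added
t = rng.randrange(len(alpha_bars))                # random timestep
x_t = add_noise(x0, noise, alpha_bars[t])         # noisy input the model would see

# A real model would take (x_t, t, caption) and return a noise estimate.
# Here a zero guess stands in for the model, just to show how the loss is formed.
zero_guess = [0.0] * len(noise)
loss = mse(zero_guess, noise)
print(f"timestep={t}, loss of a zero guess={loss:.3f}")
```

Larger timesteps keep less of the original signal, so the model has to rely more on the caption and on what it has learned about the dataset.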


The LoRA technique allows this learning to happen efficiently by adding small trainable adapter layers to the existing Stable Diffusion model, rather than retraining the entire model from scratch. After training, you'll have a lightweight LoRA adapter that can be loaded into any Stable Diffusion pipeline to generate new images that match the style of your training data.
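
To make the adapter idea concrete, here is a minimal sketch of a single LoRA-style linear layer in plain Python. It assumes the common formulation in which a frozen weight `W` is combined with a trainable low-rank product of two small matrices `A` and `B`, with `B` starting at zero so the adapter initially changes nothing. The names `matmul`, `lora_forward`, and `lora_param_count` are hypothetical, and the tiny dimensions are chosen only for readability.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, W, A, B, scale=1.0):
    """Frozen path x @ W plus the scaled low-rank update (x @ A) @ B."""
    base = matmul(x, W)
    update = matmul(matmul(x, A), B)
    return [[b + scale * u for b, u in zip(b_row, u_row)]
            for b_row, u_row in zip(base, update)]


def lora_param_count(d_in, d_out, rank):
    """Number of trainable values in the two adapter matrices."""
    return rank * (d_in + d_out)


d_in, d_out, rank = 8, 8, 2
rng = random.Random(0)

W = [[rng.gauss(0, 0.1) for _ in range(d_out)] for _ in range(d_in)]  # frozen weights
A = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(d_in)]   # trainable
B = [[0.0] * d_out for _ in range(rank)]                              # trainable, zero init
x = [[rng.gauss(0, 1) for _ in range(d_in)]]                          # one input row

y = lora_forward(x, W, A, B)
print("adapter is a no-op at init:", y == matmul(x, W))
print("full layer params:", d_in * d_out)
print("adapter params:", lora_param_count(d_in, d_out, rank))
```

Only `A` and `B` would receive gradient updates during fine-tuning, which is why the saved adapter is much smaller than the base model. The rank controls the trade-off between adapter size and how much the layer can change.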


Let's get started with setting up our environment and fine-tuning process!


In this challenge, you will fine-tune Stable Diffusion with LoRA to generate images in the Studio Ghibli style.
We will use the dataset from AtlasIA, which contains images of Moroccan culture rendered in the Studio Ghibli style.

First, let's get the training file:

In [None]:
!wget https://raw.githubusercontent.com/ybendou/hackai-2025/main/py/train_text_to_image_lora.py

In [None]:
#@title Some installs

!pip install -U datasets

## Before fine-tuning, let's connect to HuggingFace and Wandb

In [None]:
!huggingface-cli login

In [None]:
!wandb login

<h1>Running the fine-tuning </h1>

You can monitor the fine-tuning run and the generated images on your wandb profile.

In [None]:
MODEL_NAME="CompVis/stable-diffusion-v1-4"
DATASET_NAME="atlasia/Ghibli-style-morocco-dataset"
OUTPUT_DIR="sd-ghibli-model-lora"
!mkdir -p $OUTPUT_DIR
!mkdir -p "logs"

!python train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME --caption_column="text" \
  --resolution=256 --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=100 --checkpointing_steps=5000 \
  --learning_rate=1e-04 --lr_scheduler="constant" --rank=128 \
  --seed=42 \
  --output_dir=$OUTPUT_DIR \
  --validation_prompt="Craftsman, Moroccan Ghibli studio style" --report_to="wandb"
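
Once training has finished, the adapter can be tried out for generation. The helper below is a rough sketch, not part of the original notebook: `generate_with_lora` is a hypothetical name, and it assumes the training script saved its LoRA weights to the output directory in a format that the diffusers `load_lora_weights` method can read. It also assumes a CUDA GPU by default.

```python
def generate_with_lora(lora_dir, prompt,
                       base_model="CompVis/stable-diffusion-v1-4",
                       num_inference_steps=30, device="cuda"):
    """Load the base pipeline, attach the LoRA adapter, and return one PIL image."""
    # Imported lazily so that defining the helper needs neither a GPU nor the libraries.
    import torch
    from diffusers import StableDiffusionPipeline

    # Half precision is a common choice on GPU; fall back to float32 elsewhere.
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(base_model, torch_dtype=dtype)
    pipe.load_lora_weights(lora_dir)  # assumes the adapter was saved in lora_dir
    pipe = pipe.to(device)
    return pipe(prompt, num_inference_steps=num_inference_steps).images[0]


# Example usage (uncomment after training has produced the adapter):
# image = generate_with_lora("sd-ghibli-model-lora",
#                            "Craftsman, Moroccan Ghibli studio style")
# image.save("craftsman.png")
```

Using the same base model as in training keeps the adapter weights aligned with the layers they were trained against.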



<h1>Exercise</h1>

Run the fine-tuning and check the generated images at different epochs.


Go to train_text_to_image_lora.py, read the code and answer:

- What is the model trying to predict as its target?
- Why do you think training a model to 'denoise' images would help it generate new images?

