# Goal of this notebook

# Important thing before we start

**For the fine-tuning step, I used the `train_text_to_image_lora.py` script. I did not write this script myself; it comes from the reference at https://huggingface.co/blog/lora, so it is not included in this repository. Please follow that link to obtain the file.**

### Load all the necessary libraries

In [16]:
#! pip install git+https://github.com/huggingface/diffusers.git

In [17]:
#! pip install accelerate diffusers transformers datasets

### Setting up the fine-tuning (see references)

In [3]:
import os

os.environ["MODEL_NAME"] = "CompVis/stable-diffusion-v1-4"
os.environ["DATASET_NAME"] = "AdamLucek/oldbookillustrations-small"

In [4]:
! mkdir fine_tuned_model

In [5]:
! accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --dataloader_num_workers=8 \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=2501 \
  --learning_rate=3e-04 \
  --max_grad_norm=1 \
  --lr_scheduler="cosine" --lr_warmup_steps=0 \
  --output_dir="fine_tuned_model" \
  --mixed_precision="fp16" \
  --checkpointing_steps=500 \
  --seed=1337

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
2025-03-28 11:55:40.885899: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1743162941.165331    1661 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1743162941.253917    1661 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-28 11:55:41.848369: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in perf
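The flags passed to the training script determine how many images the optimizer actually sees. The sketch below works through that arithmetic for the values used above. It assumes that `--max_train_steps` counts optimizer updates and that the effective batch is the product of the per-device batch size, the accumulation steps, and the process count; `training_budget` is a hypothetical helper, not part of the training script.

```python
def training_budget(train_batch_size, gradient_accumulation_steps,
                    max_train_steps, num_processes=1):
    """Return (effective batch size, total images processed during training)."""
    # Gradients from several small batches are accumulated before each update,
    # so one optimizer step covers this many images.
    effective_batch = train_batch_size * gradient_accumulation_steps * num_processes
    # Assumption: max_train_steps counts optimizer updates, not forward passes.
    images_seen = effective_batch * max_train_steps
    return effective_batch, images_seen


# Values taken from the accelerate launch command above.
effective_batch, images_seen = training_budget(
    train_batch_size=1,
    gradient_accumulation_steps=4,
    max_train_steps=2501,
    num_processes=1,
)
print(effective_batch, images_seen)
```

Dividing `images_seen` by the number of images in the dataset would give a rough epoch count.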

### Fine-tuning successfully completed

### Inference

In [6]:
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

# Load the model and scheduler
model_id = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Move the pipeline to the GPU
pipe = pipe.to("cuda")

# Define the prompt
prompt = "A futuristic cityscape at sunset with flying cars and neon lights"

# Generate the image
image = pipe(prompt).images[0]

# Save the image
image.save("generated_image_no_fine_tune.png")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [7]:
model_path = 'fine_tuned_model'
pipe.unet.load_attn_procs(model_path)  # loads the LoRA weights saved at the top level of the output directory, i.e. the final training state

# Move the pipeline to the GPU
pipe = pipe.to("cuda")

# Define the prompt
prompt = "A futuristic cityscape at sunset with flying cars and neon lights"

# Generate the image
image = pipe(prompt).images[0]

# Save the image
image.save("generated_image_with_fine_tune.png")

  deprecate("load_attn_procs", "0.40.0", deprecation_message)
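The warning above says that `load_attn_procs` is deprecated. A minimal sketch of an alternative is shown here, assuming the pipeline-level `load_lora_weights` method of diffusers can read the same `fine_tuned_model` directory (recent diffusers versions may also need the `peft` package for this). `load_pipeline_with_lora` is a hypothetical helper name, and this is not necessarily how the original run was done.

```python
def load_pipeline_with_lora(model_id, lora_dir, device="cuda"):
    """Build a Stable Diffusion pipeline and attach LoRA weights from lora_dir."""
    # Imports are local so the helper can be defined without a GPU runtime.
    import torch
    from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    # Pipeline-level loader; assumed to accept the directory that
    # load_attn_procs read from in the cell above.
    pipe.load_lora_weights(lora_dir)
    return pipe.to(device)


# Example call (downloads the base model and needs a CUDA device):
# pipe = load_pipeline_with_lora("CompVis/stable-diffusion-v1-4", "fine_tuned_model")
```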


### Inference works. Now let's create some prompts and compare the original and the fine-tuned models

### Run inference on 15 prompts, generating one image per prompt with the original model and one with the fine-tuned model

In [8]:
! mkdir images_original

In [9]:
! mkdir images_fine_tune

In [1]:
prompts = ['A bald man with a long beard smiles faintly',
           'Vase with foliated ornaments',
           'A big black cat looking at the window',
           'A monkey wearing a feathered hat walks around the countryside',
           'A boy is reading a book in his bedroom',
           'A family is having a dinner',
          'A country cottage',
          'Wildflowers in a meadow',
          'A steam train on a bridge',
          'Children playing in a park',
          'A medieval castle',
          'A botanical illustration of a rose',
          'A map of Europe in the 18th century',
          'A caricature of a politician',
          'A scene from a Shakespeare play']

In [2]:
len(prompts)

15
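When two models are compared on the same prompts, differences in the random starting noise can hide the effect of fine-tuning. One option, sketched here as an assumption rather than what this notebook did, is to fix a seed per prompt and pass a `torch.Generator` to the pipeline. `plan_generation`, `generate_all`, and the base seed of 1337 are hypothetical choices for illustration.

```python
from pathlib import Path


def plan_generation(prompts, out_dir, base_seed=1337):
    """Pair each prompt with an output path and a fixed seed (1-based numbering)."""
    return [
        (prompt, Path(out_dir) / f"{i}.png", base_seed + i)
        for i, prompt in enumerate(prompts, start=1)
    ]


def generate_all(pipe, plan, device="cuda"):
    """Run the pipeline once per planned entry, seeding the sampler each time."""
    import torch  # local import: only needed when images are generated

    for prompt, path, seed in plan:
        # The same seed gives both models the same initial noise.
        generator = torch.Generator(device).manual_seed(seed)
        image = pipe(prompt, generator=generator).images[0]
        image.save(path)


# The plan itself is plain Python and can be inspected without a GPU.
demo_plan = plan_generation(["A country cottage", "A medieval castle"], "images_original")
print(demo_plan[0])
```

Calling `generate_all` with the same prompts and base seed for both output folders would then make each image pair differ only in the model weights.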

### Original model

In [12]:
model_id = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

# Number the output files from 1 so they match the prompt order
for i, prompt in enumerate(prompts, start=1):
  image = pipe(prompt).images[0]
  image.save(f"images_original/{i}.png")

In [13]:
# bundle the images into a zip archive for easy download
! zip -r images_original.zip images_original

  adding: images_original/ (stored 0%)
  adding: images_original/2.png (deflated 0%)
  adding: images_original/5.png (deflated 0%)
  adding: images_original/12.png (deflated 0%)
  adding: images_original/3.png (deflated 0%)
  adding: images_original/11.png (deflated 0%)
  adding: images_original/4.png (deflated 0%)
  adding: images_original/13.png (deflated 0%)
  adding: images_original/8.png (deflated 0%)
  adding: images_original/14.png (deflated 0%)
  adding: images_original/15.png (deflated 0%)
  adding: images_original/9.png (deflated 0%)
  adding: images_original/1.png (deflated 0%)
  adding: images_original/7.png (deflated 0%)
  adding: images_original/6.png (deflated 0%)
  adding: images_original/10.png (deflated 0%)


### Fine-Tuned model

In [14]:
model_path = 'fine_tuned_model'
pipe.unet.load_attn_procs(model_path)
pipe = pipe.to("cuda")

# Number the output files from 1 so they match the prompt order
for i, prompt in enumerate(prompts, start=1):
  image = pipe(prompt).images[0]
  image.save(f"images_fine_tune/{i}.png")

  deprecate("load_attn_procs", "0.40.0", deprecation_message)


In [15]:
! zip -r images_fine_tune.zip images_fine_tune

  adding: images_fine_tune/ (stored 0%)
  adding: images_fine_tune/2.png (deflated 0%)
  adding: images_fine_tune/5.png (deflated 0%)
  adding: images_fine_tune/12.png (deflated 1%)
  adding: images_fine_tune/3.png (deflated 0%)
  adding: images_fine_tune/11.png (deflated 0%)
  adding: images_fine_tune/4.png (deflated 0%)
  adding: images_fine_tune/13.png (deflated 0%)
  adding: images_fine_tune/8.png (deflated 0%)
  adding: images_fine_tune/14.png (deflated 0%)
  adding: images_fine_tune/15.png (deflated 0%)
  adding: images_fine_tune/9.png (deflated 0%)
  adding: images_fine_tune/1.png (deflated 0%)
  adding: images_fine_tune/7.png (deflated 0%)
  adding: images_fine_tune/6.png (deflated 0%)
  adding: images_fine_tune/10.png (deflated 0%)
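Since both folders use the same `1.png` to `15.png` naming, the two sets can be matched by file name for a visual comparison. The following is a minimal sketch, assuming Pillow is available (it is installed alongside diffusers); `pair_image_paths` and `save_side_by_side` are hypothetical helpers and were not used to produce the results.

```python
from pathlib import Path


def pair_image_paths(original_dir, fine_tuned_dir, count):
    """Return (original, fine-tuned) path pairs for images named 1.png..count.png."""
    return [
        (Path(original_dir) / f"{i}.png", Path(fine_tuned_dir) / f"{i}.png")
        for i in range(1, count + 1)
    ]


def save_side_by_side(left_path, right_path, out_path):
    """Paste two images next to each other and save the combined picture."""
    from PIL import Image  # local import: only needed when files are combined

    left, right = Image.open(left_path), Image.open(right_path)
    canvas = Image.new("RGB", (left.width + right.width, max(left.height, right.height)))
    canvas.paste(left, (0, 0))
    canvas.paste(right, (left.width, 0))
    canvas.save(out_path)


pairs = pair_image_paths("images_original", "images_fine_tune", 15)
print(len(pairs), pairs[0])
```

Looping over `pairs` and calling `save_side_by_side` would produce one comparison image per prompt.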


### See the Results_summary.pdf file for the generated images and my comments

### References:

https://huggingface.co/blog/lora - how to apply LoRA to Stable Diffusion (the code and the idea were taken from there)

https://huggingface.co/datasets/gigant/oldbookillustrations - the original old book illustrations dataset

https://huggingface.co/datasets/AdamLucek/oldbookillustrations-small?row=0 - the small dataset (the one I used)