---

📌 **This notebook has been updated in [jhj0517/finetuning-notebooks](https://github.com/jhj0517/finetuning-notebooks) repository!**

## Version : 1.0.6
---

In [None]:
#@title #(Optional) Check GPU

#@markdown To train a Flux LoRA, at least 32GB of VRAM (an A100 in Colab) is recommended.
#@markdown <br>You can check your GPU setup before you start.
!nvidia-smi
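
The `nvidia-smi` table can also be reduced to a simple pass/fail check against the 32GB recommendation. The following is a minimal sketch, not part of the original notebook: the helper names `parse_gpu_memory` and `meets_recommendation` are hypothetical, and it assumes the CSV query mode of `nvidia-smi`, which reports `memory.total` in MiB.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'name, memory_mib' lines into a list of (name, MiB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so commas inside a GPU name do not break parsing
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def meets_recommendation(memory_mib, required_gib=32):
    """Return True if the GPU has at least `required_gib` GiB of memory."""
    return memory_mib >= required_gib * 1024


if shutil.which("nvidia-smi"):
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode == 0:
        for name, mib in parse_gpu_memory(result.stdout):
            status = "OK" if meets_recommendation(mib) else "below the 32GB recommendation"
            print(f"{name}: {mib / 1024:.1f} GiB ({status})")
    else:
        print("nvidia-smi failed; no usable NVIDIA GPU is visible to this runtime.")
else:
    print("nvidia-smi not found; no NVIDIA GPU is visible to this runtime.")
```

A GPU that falls below the threshold may still load the model partially and then fail with an out-of-memory error during training, so it is worth checking before running the later cells.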

In [None]:
#@title #1. Install Dependencies
#@markdown This notebook is powered by https://github.com/ostris/ai-toolkit
#@markdown <br>You can ignore the "Restart session" popup after you run the cell.
!git clone -b fix/numpy-error https://github.com/jhj0517/ai-toolkit.git
%cd ai-toolkit
!git submodule update --init --recursive
!pip install -r requirements.txt

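# Pin numpy to work around the incompatibility reported in https://github.com/ostris/ai-toolkit/issues/267
import os
!pip install --quiet --force-reinstall --no-deps numpy==1.26.3
# Kill the current process so the runtime restarts and picks up the pinned numpy
os.kill(os.getpid(), 9)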


Cloning into 'ai-toolkit'...
remote: Enumerating objects: 5576, done.
remote: Counting objects: 100% (373/373), done.
remote: Compressing objects: 100% (90/90), done.
remote: Total 5576 (delta 328), reused 283 (delta 283), pack-reused 5203 (from 2)
Receiving objects: 100% (5576/5576), 30.56 MiB | 10.77 MiB/s, done.
Resolving deltas: 100% (3954/3954), done.
/content/ai-toolkit
Submodule 'repositories/batch_annotator' (https://github.com/ostris/batch-annotator) registered for path 'repositories/batch_annotator'
Submodule 'repositories/ipadapter' (https://github.com/tencent-ailab/IP-Adapter.git) registered for path 'repositories/ipadapter'
Submodule 'repositories/leco' (https://github.com/p1atdev/LECO) registered for path 'repositories/leco'
Submodule 'repositories/sd-scripts' (https://github.com/kohya-ss/sd-scripts.git) registered for path 'repositories/sd-scripts'
Cloning into '/content/ai-toolkit/repositories/batch_annotator'...
Cloning into '/content/ai-toolkit/repositorie

In [1]:
#@title # 2. (Optional) Mount Google Drive

#@markdown Mounting Google Drive is optional, but recommended so you can use a Google Drive path for your training image dataset.

#@markdown The dataset should have the following structure:

#@markdown Each image file should have a corresponding text file (`.txt`) with the same name.
#@markdown The text file contains prompts associated with the image.

#@markdown ### Example File Structure:
#@markdown ```
#@markdown your-dataset/
#@markdown ├── a (1).png         # Image file
#@markdown ├── a (1).txt         # Corresponding prompt for a (1).png
#@markdown ├── a (2).png         # Another image file
#@markdown ├── a (2).txt         # Corresponding prompt for a (2).png
#@markdown ```

from google.colab import drive
import os
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
!ls -l /content/drive/MyDrive/flux_training/


total 1678
-rw------- 1 root root 1717541 Jun 24 19:35 15_ohwxman.zip


In [4]:
!unzip "/content/drive/MyDrive/flux_training/15_ohwxxman.zip" -d "/content/dataset"


Archive:  /content/drive/MyDrive/flux_training/15_ohwxxman.zip
   creating: /content/dataset/15_ohwxman/
  inflating: /content/dataset/15_ohwxman/a (1).jpg  
 extracting: /content/dataset/15_ohwxman/a (1).txt  
  inflating: /content/dataset/15_ohwxman/a (10).jpg  
 extracting: /content/dataset/15_ohwxman/a (10).txt  
  inflating: /content/dataset/15_ohwxman/a (11).jpg  
 extracting: /content/dataset/15_ohwxman/a (11).txt  
  inflating: /content/dataset/15_ohwxman/a (12).jpg  
 extracting: /content/dataset/15_ohwxman/a (12).txt  
  inflating: /content/dataset/15_ohwxman/a (13).jpg  
 extracting: /content/dataset/15_ohwxman/a (13).txt  
  inflating: /content/dataset/15_ohwxman/a (14).jpg  
 extracting: /content/dataset/15_ohwxman/a (14).txt  
  inflating: /content/dataset/15_ohwxman/a (15).jpg  
 extracting: /content/dataset/15_ohwxman/a (15).txt  
  inflating: /content/dataset/15_ohwxman/a (16).jpg  
 extracting: /content/dataset/15_ohwxman/a (16).txt  
  inflating: /content/dataset/15_
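
Since every image needs a same-named caption file, it can help to verify the extracted dataset before training. The following is a minimal sketch, not part of the original notebook: the helper `find_missing_captions` and the `IMAGE_EXTS` set are hypothetical, and the search is recursive on the assumption that images may sit in a subfolder such as `15_ohwxman/`.

```python
from pathlib import Path

# Image extensions to check; this set is an assumption, extend it as needed
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def find_missing_captions(dataset_dir, caption_ext="txt"):
    """Return image paths under dataset_dir that have no same-named caption file."""
    missing = []
    for path in sorted(Path(dataset_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            # "a (1).jpg" is expected to be paired with "a (1).txt"
            if not path.with_suffix(f".{caption_ext}").exists():
                missing.append(path)
    return missing


dataset_dir = "/content/dataset"
if Path(dataset_dir).is_dir():
    missing = find_missing_captions(dataset_dir)
    if missing:
        print(f"{len(missing)} image(s) have no caption file:")
        for path in missing:
            print(f"  {path}")
    else:
        print("Every image has a matching caption file.")
else:
    print(f"{dataset_dir} does not exist yet.")
```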

In [5]:
#@title # 3. (Optional) Register Huggingface Token To Download Base Model

#@markdown If you don't have the complete base model files ([black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)) in your Drive, you need to sign in to Hugging Face to download the model.

#@markdown Get your token from https://huggingface.co/settings/tokens and register it in Colab's secrets as **`HF_TOKEN`**; it can then be used in any notebook. ('Read' permission is enough.)

#@markdown To register secrets in Colab, click the key-shaped icon in the left panel and enter your **`HF_TOKEN`** like this:

#@markdown ![image](https://media.githubusercontent.com/media/jhj0517/finetuning-notebooks/master/docs/screenshots/colab_secrets.png)

import os
from google.colab import userdata

hf_token = userdata.get('HF_TOKEN')
os.environ['HF_TOKEN'] = hf_token

print("HF_TOKEN environment variable has been set.")

HF_TOKEN environment variable has been set.


In [10]:
#@title # 4. Train with Parameters
import os
import sys
from collections import OrderedDict

# Make the cloned ai-toolkit package importable
sys.path.append('/content/ai-toolkit')
from toolkit.job import run_job

# Use hf_transfer for faster Hugging Face downloads
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

#@markdown ## Paths Configuration

#@markdown Set your dataset path and the output path for the LoRA here.
DATASET_DIR = "/content/dataset" # @param {type:"string"}
OUTPUT_DIR = '/content/drive/MyDrive/finetuning-notebooks/flux/outputs'  # @param {type:"string"}
LORA_NAME = 'ohwxman_v1'  # @param {type:"string"}
os.makedirs(OUTPUT_DIR, exist_ok=True)

#@markdown ## Base Model Configuration
#@markdown If you keep the default repo id here, you need to register a Hugging Face token in the previous section.
repo_id_or_path = 'black-forest-labs/FLUX.1-dev' # @param {type:"string"}
quantize = False # @param {type:"boolean"}

#@markdown ## Process Settings
#@markdown (`max_step_saves_to_keep` = how many checkpoints to keep during training.)
dtype = "bf16" # @param {type:"string"}
save_every = 500 # @param {type:"number"}
max_step_saves_to_keep = 2 # @param {type:"number"}
#@markdown Every `sample_every` steps, sample images are generated in the output directory from the prompts below so you can track your results.
#@markdown <br>The example prompts below use the trigger word "ohwxxman". A trigger word is not required.
sample_every = 100 # @param {type:"number"}
sample_seed = 42 # @param {type: "number"}
sample_steps = 20 # @param {type: "number"}
sample_prompt_1 = "ohwxxman, a headshot of a person with a neutral expression, studio lighting" # @param {type: "string"}
sample_prompt_2 = "ohwxxman, a face shot of a person looking playfully, soft lighting, plain background" # @param {type: "string"}
sample_prompt_3 = "ohwxxman, a portrait of a person smiling playfully, sharp focus, lit background" # @param {type: "string"}
### Add as many `sample_prompts` as you need ###
sample_prompts = [sample_prompt_1, sample_prompt_2, sample_prompt_3]

performance_log_every = 1000 # @param {type:"number"}
#@markdown ### Network
#@markdown You can train only specific layers. `only_if_contains` is applied only when `train_only_specific_layers` is True.
linear = 16 # @param {type:"number"}
linear_alpha = 8 # @param {type:"number"}
## network_kwargs
train_only_specific_layers = False # @param {type:"boolean"}
only_if_contains = ["transformer.single_transformer_blocks.7.proj_out", "transformer.single_transformer_blocks.20.proj_out"] # @param {type: "raw"}

#@markdown ## Dataset Settings
caption_ext = "txt" # @param {type:"string"}
caption_dropout_rate = 0.05 # @param {type:"number"}
shuffle_tokens = False # @param {type:"boolean"}
cache_latents_to_disk = True # @param {type:"boolean"}
resolution = "1024" # @param {type:"string"}
resolution = [int(res.strip()) for res in resolution.split(",")]

#@markdown ## Training Settings
batch_size = 1 # @param {type:"number"}
# Recommended range is 500 ~ 4000
steps = 3150 # @param {type:"number"}
gradient_accumulation_steps = 1 # @param {type:"number"}
train_dtype = "bf16" # @param {type:"string"}
lr = 1e-4 # @param {type:"number"}
train_unet = True # @param {type:"boolean"}
train_text_encoder = False # @param {type:"boolean"}
content_or_style = 'content' # @param ["content", "style", "balanced"]
gradient_checkpointing = True # @param {type:"boolean"}
noise_scheduler = 'flowmatch' # @param {type:"string"}
optimizer = 'adamw8bit' # @param {type:"string"}
# ema settings
use_ema = False # @param {type:"boolean"}
ema_decay = 0.99 # @param {type:"number"}

# Training
job_to_run = OrderedDict([
    ('job', 'extension'),
    ('config', OrderedDict([
        # this name will be the folder and filename name
        ('name', LORA_NAME),
        ('process', [
            OrderedDict([
                ('type', 'sd_trainer'),
                ('training_folder', OUTPUT_DIR),
                ('performance_log_every', performance_log_every),
                ('device', 'cuda:0'),
                ('network', OrderedDict([
                    ('type', 'lora'),
                    ('linear', linear),
                    ('linear_alpha', linear_alpha)
                ])),
                ('save', OrderedDict([
                    ('dtype', dtype),
                    ('save_every', save_every),
                    ('max_step_saves_to_keep', max_step_saves_to_keep)
                ])),
                ('datasets', [
                    OrderedDict([
                        ('folder_path', DATASET_DIR),
                        ('caption_ext', caption_ext),
                        ('caption_dropout_rate', caption_dropout_rate),
                        ('shuffle_tokens', shuffle_tokens),
                        ('cache_latents_to_disk', cache_latents_to_disk),
                        ('resolution', resolution)
                    ])
                ]),
                ('train', OrderedDict([
                    ('batch_size', batch_size),
                    ('steps', steps),
                    ('gradient_accumulation_steps', gradient_accumulation_steps),
                    ('train_unet', train_unet),
                    ('train_text_encoder', train_text_encoder),
                    ('content_or_style', content_or_style),
                    ('gradient_checkpointing', gradient_checkpointing),
                    ('noise_scheduler', noise_scheduler),
                    ('optimizer', optimizer),
                    ('lr', lr),
                    ('ema_config', OrderedDict([
                        ('use_ema', use_ema),
                        ('ema_decay', ema_decay)
                    ])),
                    ('dtype', train_dtype)
                ])),
                ('model', OrderedDict([
                    ('name_or_path', repo_id_or_path),
                    ('is_flux', True),
                    ('quantize', quantize),
                ])),
                ('sample', OrderedDict([
                    ('sampler', 'flowmatch'),
                    ('sample_every', sample_every),
                    ('width', 1024),
                    ('height', 1024),
                    ('prompts', sample_prompts),
                    ('neg', ''),
                    ('seed', sample_seed),
                    ('walk_seed', True),
                    ('guidance_scale', 4),
                    ('sample_steps', sample_steps)
                ]))
            ])
        ])
    ])),
    ('meta', OrderedDict([
        ('name', '[name]'),
        ('version', '1.0')
    ]))
])

# Conditional Parameters
if train_only_specific_layers:
    network = job_to_run["config"]["process"][0]["network"]
    network_kwargs = network.setdefault("network_kwargs", OrderedDict())
    network_kwargs["only_if_contains"] = only_if_contains

run_job(job_to_run)


{
    "type": "sd_trainer",
    "training_folder": "/content/drive/MyDrive/finetuning-notebooks/flux/outputs",
    "performance_log_every": 1000,
    "device": "cuda:0",
    "network": {
        "type": "lora",
        "linear": 16,
        "linear_alpha": 8
    },
    "save": {
        "dtype": "bf16",
        "save_every": 500,
        "max_step_saves_to_keep": 2
    },
    "datasets": [
        {
            "folder_path": "/content/dataset",
            "caption_ext": "txt",
            "caption_dropout_rate": 0.05,
            "shuffle_tokens": false,
            "cache_latents_to_disk": true,
            "resolution": [
                1024
            ]
        }
    ],
    "train": {
        "batch_size": 1,
        "steps": 3150,
        "gradient_accumulation_steps": 1,
        "train_unet": true,
        "train_text_encoder": false,
        "content_or_style": "content",
        "gradient_checkpointing": true,
        "noise_scheduler": "flowmatch",
        "optimizer": "ada

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

OutOfMemoryError: CUDA out of memory. Tried to allocate 72.00 MiB. GPU 0 has a total capacity of 22.16 GiB of which 41.38 MiB is free. Process 12629 has 22.12 GiB memory in use. Of the allocated memory 21.92 GiB is allocated by PyTorch, and 12.76 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
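
The error above reports a GPU with a total capacity of 22.16 GiB, which is below the 32GB recommended in the GPU check cell. One option the notebook already exposes is the `quantize` parameter. The following is a minimal sketch of deciding that setting from the detected VRAM; the helper names are hypothetical, and it is an assumption (not verified here) that enabling `quantize` lowers memory use enough for training to fit on a given card.

```python
RECOMMENDED_VRAM_GIB = 32.0  # recommendation stated in the GPU check cell


def suggest_quantize(total_vram_gib, recommended_gib=RECOMMENDED_VRAM_GIB):
    """Return True when the GPU has less memory than the recommendation."""
    return total_vram_gib < recommended_gib


def detect_total_vram_gib():
    """Return the total memory of GPU 0 in GiB, or None if torch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


total = detect_total_vram_gib()
if total is None:
    print("No CUDA device detected.")
else:
    # Assumption: pass this value as `quantize` in the training cell
    quantize = suggest_quantize(total)
    print(f"GPU 0 has {total:.2f} GiB; suggested quantize = {quantize}")
```

If training still runs out of memory with `quantize` enabled, switching to a GPU that meets the 32GB recommendation is the more reliable fix.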

In [None]:
#@title # 5. (Optional) Test Your LoRA
import torch
from diffusers import FluxPipeline

BASE_MODEL_REPO_ID_OR_PATH = "black-forest-labs/FLUX.1-dev" # @param {type: "string"}
PROMPT = "a Teddy dog looks up and says hello world" # @param {type: "string"}
YOUR_LORA_PATH = "/content/drive/MyDrive/finetuning-notebooks/flux/outputs/something/Your-First-Lora-V1.safetensors" # @param {type:"string"}

pipe = FluxPipeline.from_pretrained(BASE_MODEL_REPO_ID_OR_PATH, torch_dtype=torch.bfloat16)
pipe.to("cuda")
pipe.load_lora_weights(YOUR_LORA_PATH)

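out = pipe(
    prompt=PROMPT,
    # guidance_scale=0 with 4 steps are FLUX.1-schnell settings;
    # the values below match the sampling settings used during training
    guidance_scale=4.0,
    height=1024,
    width=1024,
    num_inference_steps=20,
    max_sequence_length=256,
).images[0]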
out.save("image.png")

from IPython.display import display
display(out)
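
The default `YOUR_LORA_PATH` above is a placeholder, so the actual file has to be located first. The following is a minimal sketch of finding the most recent LoRA file; the helper `find_latest_lora` is hypothetical, and it assumes the trainer writes `.safetensors` files into `<OUTPUT_DIR>/<LORA_NAME>/`, as suggested by the "this name will be the folder and filename name" comment in the training cell.

```python
from pathlib import Path


def find_latest_lora(output_dir, lora_name):
    """Return the most recently modified .safetensors file, or None if there is none."""
    folder = Path(output_dir) / lora_name
    if not folder.is_dir():
        return None
    candidates = list(folder.glob("*.safetensors"))
    if not candidates:
        return None
    # Pick the newest file by modification time
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Values taken from the training cell's defaults
latest = find_latest_lora(
    "/content/drive/MyDrive/finetuning-notebooks/flux/outputs", "ohwxman_v1"
)
print(latest if latest else "No .safetensors file found yet.")
```

The returned path can then be used as `YOUR_LORA_PATH` in the test cell.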