## Workspace Setup

To begin, install the packages the project needs. The cells below use the `!` and `%` prefixes, so they run as-is in a Jupyter or Colab cell; to run them in a terminal instead, drop the prefix (and use `cd` in place of `%cd`). The `huggingface-cli login` cell logs you in with your Hugging Face credentials, which is required because the models used here are hosted on Hugging Face.

### Run the following commands:


In [None]:
!git clone https://github.com/ostris/ai-toolkit.git
%cd ai-toolkit

In [None]:
!pip3 install -r requirements.txt

In [None]:

!pip install peft

In [None]:
!pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

In [None]:
!huggingface-cli login

In [None]:
!git submodule update --init --recursive
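
After the install cells finish, it can help to confirm that the key packages are importable before moving on. The following is a minimal sketch, not part of ai-toolkit: the helper name `find_missing_modules` and the module list are assumptions chosen for illustration, so adjust the list to match your environment.

```python
import importlib.util

# Assumed import names for packages installed above; edit to match your setup.
REQUIRED_MODULES = ["torch", "torchvision", "peft", "huggingface_hub"]


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```

If anything is reported missing, rerun the corresponding `pip install` cell before continuing.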



## Image Captioning / Labelling

Once the environment is set up, caption the training images. Upload your images into a single directory and set the variable `your_dir` in the captioning cell below to that directory's path. Running the cells writes a caption `.txt` file next to each image.

### Install Additional Packages

Before proceeding, ensure you have the following packages installed:


In [None]:

!pip install -U transformers
!pip install einops


In [None]:
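import os
from typing import List, Union
from unittest.mock import patch

from transformers.dynamic_module_utils import get_imports


def fixed_get_imports(filename: Union[str, os.PathLike]) -> List[str]:
    """Drop the flash_attn import that Florence-2's modeling file declares."""
    imports = get_imports(filename)
    # Only touch the Florence-2 file, and only if flash_attn is actually listed
    if str(filename).endswith("modeling_florence2.py") and "flash_attn" in imports:
        imports.remove("flash_attn")
    return imports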

In [None]:
!pip install -U oyaml transformers einops albumentations python-dotenv

import os

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

device = "cuda:0" if torch.cuda.is_available() else "cpu"
# Use half precision on GPU; fall back to float32 on CPU
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = 'microsoft/Florence-2-large'
# Workaround for the unnecessary flash_attn requirement in Florence-2's remote code
with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        attn_implementation="sdpa",
        torch_dtype=torch_dtype,  # keep the model dtype in sync with the inputs below
        trust_remote_code=True,
    )
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model.to(device)

your_dir = 'fold'  # directory containing the images to caption

prompt = "<MORE_DETAILED_CAPTION>"
image_exts = {'.jpg', '.jpeg', '.png'}  # formats the training config accepts
for filename in os.listdir(your_dir):
    stem, ext = os.path.splitext(filename)
    if ext.lower() not in image_exts:
        continue  # skip caption files and anything that is not an image
    image = Image.open(os.path.join(your_dir, filename)).convert("RGB")

    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device, torch_dtype)

    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
        do_sample=False,
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

    parsed_answer = processor.post_process_generation(generated_text, task=prompt, image_size=(image.width, image.height))
    # print(parsed_answer)
    # Write the caption next to the image, using the same base name
    with open(os.path.join(your_dir, f"{stem}.txt"), "w") as f:
        f.write(parsed_answer[prompt])



## Organizing Your Image Folder

Ensure your image folder is structured as follows:

Your Image Directory
│
├── img1.png
├── img1.txt
├── img2.png
├── img2.txt
└──

Make sure that each image has a corresponding text file with the same name.
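
The pairing rule above can be checked with a short script before training. This is a minimal sketch: the helper name `images_missing_captions` is hypothetical, and the extension set assumes the jpg, jpeg, and png formats mentioned in the training config comments.

```python
from pathlib import Path

# Assumed image formats, mirroring the ones named in the training config comments.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def images_missing_captions(folder):
    """Return sorted names of images in `folder` that lack a same-named .txt file."""
    folder = Path(folder)
    return sorted(
        path.name
        for path in folder.iterdir()
        if path.suffix.lower() in IMAGE_EXTENSIONS
        and not path.with_suffix(".txt").exists()
    )
```

Calling `images_missing_captions(your_dir)` should return an empty list when every image has its caption; any names it returns point to images that still need captioning.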



## Configure the YAML File for Training

Training is configured through a YAML file based on `config/examples/train_lora_flux_24gb.yaml`. If you edit that file directly, the important lines are:

- **Line 5**: Change the name of the model.
- **Line 30**: Specify the path to your image directory.
- **Lines 69-70**: Adjust the height and width to match your training images.

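You may also want to modify the sample prompts to better suit your training data, and adjust the batch size and gradient accumulation steps to fit your GPU memory. The cell below writes an equivalent configuration to `/content/mylora.yaml` from a few Python variables.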
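
When tuning batch size and gradient accumulation together, the quantity that usually matters is their product, often called the effective batch size. The sketch below illustrates that general relationship; the helper `effective_batch_size` is a hypothetical name, not part of ai-toolkit.

```python
def effective_batch_size(batch_size, gradient_accumulation_steps):
    """Number of samples that contribute to each optimizer update."""
    if batch_size < 1 or gradient_accumulation_steps < 1:
        raise ValueError("both values must be positive integers")
    return batch_size * gradient_accumulation_steps


# Values used in this notebook's config: batch_size=1, gradient_accumulation_steps=1
print(effective_batch_size(1, 1))
# A larger effective batch at the same per-step memory cost
print(effective_batch_size(1, 4))
```

Raising `gradient_accumulation_steps` instead of `batch_size` is a common way to get a larger effective batch when GPU memory is the limiting factor.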


In [18]:
your_dir = '/content/Olentzero'
lora_name = 'olentzero'
batch_size = 1
training_steps = 500
name_or_path = "black-forest-labs/FLUX.1-schnell"
width = 1024
height = 1024
sample_steps = 20

yaml = f"""---
job: extension
config:
  # this name will be the folder and filename name
  name: {lora_name}
  process:
    - type: 'sd_trainer'
      # root folder to save training sessions/samples/weights
      training_folder: "output"
      # uncomment to see performance stats in the terminal every N steps
#      performance_log_every: 1000
      device: cuda:0
      # if a trigger word is specified, it will be added to captions of training data if it does not already exist
      # alternatively, in your captions you can add [trigger] and it will be replaced with the trigger word
#      trigger_word: "p3r5on"
      network:
        type: "lora"
        linear: 16
        linear_alpha: 16
      save:
        dtype: float16 # precision to save
        save_every: 250 # save every this many steps
        max_step_saves_to_keep: 4 # how many intermittent saves to keep
      datasets:
        # datasets are a folder of images. captions need to be txt files with the same name as the image
        # for instance image2.jpg and image2.txt. Only jpg, jpeg, and png are supported currently
        # images will automatically be resized and bucketed into the resolution specified
        # on windows, escape back slashes with another backslash so
        # "C:\\path\\to\\images\\folder"
        - folder_path: {your_dir}
          caption_ext: "txt"
          caption_dropout_rate: 0.05  # will drop out the caption 5% of time
          shuffle_tokens: false  # shuffle caption order, split by commas
          cache_latents_to_disk: true  # leave this true unless you know what you're doing
          resolution: [1024]  # flux enjoys multiple resolutions
      train:
        batch_size: {batch_size}
        steps: {training_steps}  # total number of steps to train 500 - 4000 is a good range
        gradient_accumulation_steps: 1
        train_unet: true
        train_text_encoder: false  # probably won't work with flux
        gradient_checkpointing: true  # need this on unless you have a ton of vram
        noise_scheduler: "flowmatch" # for training only
        optimizer: "adamw8bit"
        lr: 1e-4
        # uncomment this to skip the pre training sample
#        skip_first_sample: true
        # uncomment to completely disable sampling
#        disable_sampling: true
        # enables the new bell-curved weighting. Experimental but may produce better results; comment out the next line to disable
        linear_timesteps: true

        # ema will smooth out learning, but could slow it down. Recommended to leave on.
        ema_config:
          use_ema: true
          ema_decay: 0.99

        # will probably need this if gpu supports it for flux, other dtypes may not work correctly
        dtype: bf16
      model:
        # huggingface model name or path
        name_or_path: {name_or_path}
        is_flux: true
        quantize: true  # run 8bit mixed precision
#        low_vram: true  # uncomment this if the GPU is connected to your monitors. It will use less vram to quantize, but is slower.
      sample:
        sampler: "flowmatch" # must match train.noise_scheduler
        sample_every: 250 # sample every this many steps
        width: {width}
        height: {height}
        prompts:
          - "Olentzero"
          - "A drawing of olentzero"
          - "Fat Olentzero"
          - "A car driven by Olentzero"
        neg: ""  # not used on flux
        seed: 42
        walk_seed: true
        guidance_scale: 4
        sample_steps: {sample_steps}
# you can add any additional meta info here. [name] is replaced with config name at top
meta:
  name: "[name]"
  version: '1.0'
"""

# The with-block closes the file automatically
with open('/content/mylora.yaml', 'w') as file:
    file.write(yaml)

## Training the Model

Once everything is configured, you can train your LoRA model. Run the following cell, or run the same command without the leading `!` in a terminal:

In [19]:
!python3 /content/ai-toolkit/run.py /content/mylora.yaml

Running 1 job
2024-12-05 11:50:43.788856: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-12-05 11:50:43.824246: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-12-05 11:50:43.834738: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-12-05 11:50:43.856887: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
  check_for_updates()
  return register
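
Once training finishes, the weights should be under the configured `training_folder` (`output` in this notebook), in a subfolder named after the LoRA. The following sketch lists the `.safetensors` files found there; the helper name `list_lora_weights` is hypothetical, and the folder layout is assumed from the `output/{lora_name}/{lora_name}.safetensors` path used in the inference cell.

```python
from pathlib import Path


def list_lora_weights(training_folder, lora_name):
    """Return the names of .safetensors files in training_folder/lora_name, sorted by name."""
    run_dir = Path(training_folder) / lora_name
    if not run_dir.is_dir():
        return []  # training has not produced this folder yet
    return sorted(path.name for path in run_dir.glob("*.safetensors"))
```

For example, `list_lora_weights("output", lora_name)` returns an empty list if nothing has been saved yet, which is a quick way to tell whether the run got far enough to write a checkpoint.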

## Running Inference with the New Model (FLUX.1 LoRA)

After training, you can run inference to generate images based on the trained model. Ensure you have the necessary packages installed:


In [None]:
!pip install -U diffusers accelerate transformers


Then, use the following code to load the model and generate an image:

In [None]:
import torch
from diffusers import DiffusionPipeline

model_id = 'black-forest-labs/FLUX.1-schnell'
adapter_id = f'output/{lora_name}/{lora_name}.safetensors'  # weights written by the training run

# Pick the best available device once and reuse it below
device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
# bfloat16 on GPU roughly halves memory use; keep float32 elsewhere
torch_dtype = torch.bfloat16 if device == 'cuda' else torch.float32

pipeline = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch_dtype)
pipeline.load_lora_weights(adapter_id)
pipeline.to(device)

# No negative prompt is passed: the training config notes it is not used on FLUX
prompt = "Olentzero drinking orange juice"

image = pipeline(
    prompt=prompt,
    num_inference_steps=50,
    generator=torch.Generator(device=device).manual_seed(1641421826),
    width=1152,
    height=768,
).images[0]
# image.save("output.png", format="PNG")



Finally, display the generated image:

In [None]:
display(image)