## Using Hugging Face DreamBooth

DreamBooth is used to fine-tune diffusion models on very few images. It works by associating a special word in the prompt with the example images.
https://huggingface.co/docs/diffusers/en/training/dreambooth

- During fine-tuning, the selected images have a high impact: if you train only on portraits, you will probably only be able to generate portraits. Fine-tuning will also probably degrade performance on other tasks.
- DreamBooth tries to reduce this issue by associating the fine-tuning target with a new word.

- Method:
    - Fine-tune the model to recognize a rarely used token, like "[V]".
    - Pair the rare token with a commonly used class noun, like "man", to form a new class called "[V] man".
    - Prior preservation loss: penalize drift by using the base model to generate class images (here from the plain prompt "a photo of a man") and adding the denoising loss on those images to the training loss. This keeps the model's notion of a generic man from deteriorating or drifting too far while it learns the new "[V] man" class.
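A minimal PyTorch sketch of that objective, assuming each batch stacks the instance ("[V] man") examples first and the generated class examples second; `dreambooth_loss` is a hypothetical helper name, not part of `DreamBoothTrainer`. The total loss is the instance denoising loss plus `prior_loss_weight` times the same loss on the class images:

```python
import torch
import torch.nn.functional as F


def dreambooth_loss(model_pred: torch.Tensor,
                    target: torch.Tensor,
                    prior_loss_weight: float = 1.0):
    """Prior-preservation loss for a batch laid out as [instance; class].

    Hypothetical helper: assumes the first half of the batch holds the
    instance examples and the second half the generated class examples.
    """
    # Split predictions and targets into instance and class (prior) halves
    pred_instance, pred_prior = torch.chunk(model_pred, 2, dim=0)
    target_instance, target_prior = torch.chunk(target, 2, dim=0)

    # Denoising loss on the subject images
    instance_loss = F.mse_loss(pred_instance.float(), target_instance.float())
    # Same loss on the class images keeps the generic "man" prior in place
    prior_loss = F.mse_loss(pred_prior.float(), target_prior.float())

    loss = instance_loss + prior_loss_weight * prior_loss
    return loss, instance_loss, prior_loss


# Toy usage: 2 instance + 2 class samples of 4x8x8 "latents"
torch.manual_seed(0)
pred = torch.randn(4, 4, 8, 8)
noise = torch.randn(4, 4, 8, 8)
loss, inst, prior = dreambooth_loss(pred, noise, prior_loss_weight=1.0)
print(f"total={loss.item():.3f} instance={inst.item():.3f} prior={prior.item():.3f}")
```

Setting `prior_loss_weight` to 0 turns this back into plain fine-tuning on the instance images.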

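The actual fine-tuning is done with LoRA (Low-Rank Adaptation):
- It does not update the original weights; instead it trains a new, much smaller set of weights. For a frozen weight matrix W of shape d × k, LoRA learns two low-rank matrices B (d × r) and A (r × k) with r << min(d, k). Their product BA has the same shape as W and is added to it, W' = W + BA, while nparam(A) + nparam(B) = r(d + k) << nparam(W) = dk.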
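A minimal PyTorch sketch of the idea for a single linear layer, not the helper's actual implementation: `LoRALinear`, the rank of 4 and the `alpha / rank` scaling are illustrative assumptions (the scaling follows a common LoRA convention). `B` starts at zero, so the wrapped layer initially behaves exactly like the frozen one:

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Minimal LoRA wrapper around a frozen nn.Linear (illustrative only)."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        self.base.requires_grad_(False)  # original W and bias stay frozen
        # Low-rank factors: A projects down to `rank`, B projects back up
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W x + b + scale * B (A x); with B = 0 this equals base(x)
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

    def merged_weight(self) -> torch.Tensor:
        # B @ A has the same shape as W and is added to it: W' = W + scale * BA
        return self.base.weight + self.scale * (self.lora_B @ self.lora_A)


base = nn.Linear(768, 768)
layer = LoRALinear(base, rank=4)
n_full = base.weight.numel()
n_lora = layer.lora_A.numel() + layer.lora_B.numel()
print(n_full, n_lora)  # 589824 6144
```

For a 768 × 768 projection, rank 4 trains about 1% of the parameters of the full matrix.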

In [9]:
import torch
from utils.finetuning_utils import DreamBoothTrainer

In [14]:
# SDXL on CUDA GPUs; otherwise the smaller SD 1.5 (e.g. on Apple MPS or CPU)
if torch.cuda.is_available():
    model_name = 'stabilityai/stable-diffusion-xl-base-1.0'
else:
    model_name = 'runwayml/stable-diffusion-v1-5'

In [18]:
# Define hyperparameters
hyperparameters = {
    "instance_prompt": "a photo of a [V] man",
    "class_prompt": "a photo of a man",
    "seed": 4329,
    "pretrained_model_name_or_path": model_name,
    "resolution": 1024 if torch.cuda.is_available() else 512, # SDXL base works at 1024x1024, SD 1.5 at 512x512
    "num_inference_steps": 50,
    "guidance_scale": 5.0,
    "num_class_images": 200,
    "prior_loss_weight": 1.0
}
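In diffusers pipelines, `guidance_scale` sets the strength of classifier-free guidance used when the base model generates the class images (`num_inference_steps` is the number of denoising steps). At each step the pipeline predicts the noise with and without the prompt and combines the two predictions; this is a stand-alone sketch of that formula, not the pipeline's own code:

```python
import torch


def classifier_free_guidance(noise_uncond: torch.Tensor,
                             noise_text: torch.Tensor,
                             guidance_scale: float) -> torch.Tensor:
    """Combine unconditional and prompt-conditioned noise predictions."""
    # Push the prediction away from "no prompt" toward "with prompt"
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)


noise_uncond = torch.zeros(1, 4, 8, 8)
noise_text = torch.ones(1, 4, 8, 8)
guided = classifier_free_guidance(noise_uncond, noise_text, guidance_scale=5.0)
print(guided.mean().item())  # 5.0
```

A scale of 1.0 returns the plain conditioned prediction; larger values follow the prompt more strongly at the cost of diversity.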

In [19]:
trainer = DreamBoothTrainer(hyperparameters)

10/25/2024 16:36:12 - INFO - utils.finetuning_utils - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: mps

Mixed precision type: no



In [20]:
# Generate the class images used by the prior-preservation loss
trainer.generate_class_images()

{'requires_safety_checker', 'image_encoder'} was not found in config. Values will be initialized to default values.


Loading pipeline components...:   0%|          | 0/6 [00:00<?, ?it/s]

Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of runwayml/stable-diffusion-v1-5.
{'mid_block_type', 'num_attention_heads', 'use_linear_projection', 'addition_embed_type', 'transformer_layers_per_block', 'addition_time_embed_dim', 'projection_class_embeddings_input_dim', 'timestep_post_act', 'time_embedding_act_fn', 'reverse_transformer_layers_per_block', 'time_embedding_type', 'dropout', 'conv_in_kernel', 'encoder_hid_dim_type', 'addition_embed_type_num_heads', 'resnet_skip_time_act', 'upcast_attention', 'time_embedding_dim', 'resnet_time_scale_shift', 'dual_cross_attention', 'class_embeddings_concat', 'num_class_embeds', 'class_embed_type', 'attention_type', 'cross_attention_norm', 'encoder_hid_dim', 'only_cross_attention', 'time_cond_proj_dim', 'mid_block_only_cross_attention', 'conv_out_kernel', 'resnet_out_scale_factor'} was not found in config. Values will be initialized to default values.
Loaded unet as UNet2DConditionModel from `unet` subfolder of runwayml/stable-

Generating class images:   0%|          | 0/50 [00:00<?, ?it/s]

RuntimeError: MPS backend out of memory (MPS allocated: 14.16 GB, other allocations: 252.47 MB, max allowed: 18.13 GB). Tried to allocate 4.00 GB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

In [None]:
# Initialize the tokenizer, text encoder, VAE and UNet
tokenizer, text_encoder, vae, unet = trainer.initialize_models()

In [None]:
# Noise scheduler used during training to add noise to the latents (forward diffusion)
from diffusers import DDPMScheduler

noise_scheduler = DDPMScheduler.from_pretrained(
    trainer.hyperparameters.pretrained_model_name_or_path,
    subfolder="scheduler"
)
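`add_noise` applies the closed-form forward diffusion step x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise, where alpha_bar_t is the cumulative product of (1 - beta_s). The sketch below re-implements it by hand, assuming a plain linear beta schedule (1e-4 to 0.02 over 1000 steps, as in the original DDPM paper); Stable Diffusion's own scheduler config uses a "scaled_linear" schedule instead, which `from_pretrained` loads for you:

```python
import torch


def linear_alphas_cumprod(num_train_timesteps: int = 1000,
                          beta_start: float = 1e-4,
                          beta_end: float = 0.02) -> torch.Tensor:
    # alpha_bar_t = prod_{s <= t} (1 - beta_s) for a linear beta schedule
    betas = torch.linspace(beta_start, beta_end, num_train_timesteps)
    return torch.cumprod(1.0 - betas, dim=0)


def add_noise(x0: torch.Tensor, noise: torch.Tensor,
              timesteps: torch.Tensor, alphas_cumprod: torch.Tensor) -> torch.Tensor:
    # Per-sample alpha_bar, reshaped to broadcast over (C, H, W)
    a = alphas_cumprod[timesteps].view(-1, *([1] * (x0.dim() - 1)))
    # q(x_t | x_0): x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise


torch.manual_seed(0)
alphas_cumprod = linear_alphas_cumprod()
x0 = torch.randn(2, 4, 8, 8)       # stand-in for VAE latents
noise = torch.randn_like(x0)
t = torch.tensor([0, 999])         # almost clean vs. almost pure noise
xt = add_noise(x0, noise, t, alphas_cumprod)
```

The model is then trained to recover `noise` from `xt` and `t`, which is what the training loop does with `target = noise`.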

In [None]:
# Attach trainable LoRA adapters to the UNet
unet = trainer.initialize_lora(unet)

In [None]:
optimizer, params_to_optimize = trainer.initialize_optimizer(unet)
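With LoRA, only the adapter weights should reach the optimizer. What `initialize_optimizer` does internally isn't shown here; a common pattern (an assumption) is to collect the parameters that still require gradients and hand them to AdamW. A toy model, where the first layer stands in for the frozen base weights, shows the effect:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 16))
model[0].requires_grad_(False)  # stand-in for frozen base weights

# Only parameters that still require gradients are optimized
params_to_optimize = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(params_to_optimize, lr=1e-4)  # lr is illustrative

frozen_before = model[0].weight.detach().clone()
trainable_before = model[1].weight.detach().clone()

# One toy optimization step
loss = model(torch.randn(4, 16)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

After the step the frozen layer is unchanged and only the trainable layer has moved.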

In [None]:
# Build the instance/class dataset and dataloader, then the learning-rate scheduler
train_dataset, train_dataloader = trainer.prepare_dataset(tokenizer, text_encoder)
lr_scheduler = trainer.initialize_scheduler(train_dataloader, optimizer)
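The `torch.chunk(..., 2, dim=0)` calls in the training loop only work if every batch holds the instance examples first and the class examples second. A sketch of a collate function producing that layout; the dictionary keys are hypothetical and `prepare_dataset` may name them differently:

```python
import torch


def collate_with_prior(examples):
    """Stack instance examples first, then class examples (hypothetical keys)."""
    pixel_values = [e["instance_images"] for e in examples]
    prompt_ids = [e["instance_prompt_ids"] for e in examples]
    # Appending the class examples after the instance ones yields [instance; class]
    pixel_values += [e["class_images"] for e in examples]
    prompt_ids += [e["class_prompt_ids"] for e in examples]
    return {
        "pixel_values": torch.stack(pixel_values).float(),
        "input_ids": torch.cat(prompt_ids, dim=0),
    }


# Two toy examples: instance images filled with 1, class images with -1
examples = [
    {
        "instance_images": torch.full((3, 8, 8), 1.0),
        "instance_prompt_ids": torch.full((1, 77), 1),
        "class_images": torch.full((3, 8, 8), -1.0),
        "class_prompt_ids": torch.full((1, 77), 2),
    }
    for _ in range(2)
]
batch = collate_with_prior(examples)
inst, cls = torch.chunk(batch["pixel_values"], 2, dim=0)
```

Note that with this layout each step processes twice `train_batch_size` images, which matters when sizing memory.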

In [6]:
unet, optimizer, train_dataloader, lr_scheduler = trainer.accelerator.prepare(
    unet, optimizer, train_dataloader, lr_scheduler)

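# Effective number of samples per optimizer update, across all processes
total_batch_size = (
    trainer.hyperparameters.train_batch_size
    * trainer.accelerator.num_processes
    * trainer.hyperparameters.gradient_accumulation_steps
)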

NameError: name 'trainer' is not defined
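The step bookkeeping around `total_batch_size` can be sketched in plain Python. This assumes the usual Accelerate behavior of one optimizer update per `gradient_accumulation_steps` batches, with a final partial cycle at the end of an epoch still counting as an update; `plan_training` and the example numbers are illustrative:

```python
import math


def plan_training(num_batches, train_batch_size, gradient_accumulation_steps,
                  num_processes=1, max_train_steps=None, num_train_epochs=1):
    # Samples that contribute to one optimizer update across all processes
    total_batch_size = train_batch_size * num_processes * gradient_accumulation_steps
    # Optimizer updates per epoch (a trailing partial cycle still counts)
    updates_per_epoch = math.ceil(num_batches / gradient_accumulation_steps)
    if max_train_steps is None:
        max_train_steps = num_train_epochs * updates_per_epoch
    # Epochs needed to reach max_train_steps
    num_train_epochs = math.ceil(max_train_steps / updates_per_epoch)
    return total_batch_size, updates_per_epoch, max_train_steps, num_train_epochs


# 5 batches per epoch, accumulate 2, stop after 400 updates
print(plan_training(num_batches=5, train_batch_size=1,
                    gradient_accumulation_steps=2, max_train_steps=400))
# (2, 3, 400, 134)
```

With very few instance images, `max_train_steps` rather than the epoch count is what really bounds training.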

In [None]:
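import torch.nn.functional as F
from tqdm.auto import tqdm

# Assumes `experiment` is an experiment-tracking object created earlier
# (e.g. a Comet ML Experiment); it is not defined in this notebook
global_step = 0
progress_bar = tqdm(
    range(trainer.hyperparameters.max_train_steps),
    disable=not trainer.accelerator.is_local_main_process,
)

for epoch in range(0, trainer.hyperparameters.num_train_epochs):
    unet.train()

    for step, batch in enumerate(train_dataloader):
        with trainer.accelerator.accumulate(unet):
            # Encode images into the VAE latent space and scale them
            pixel_values = batch["pixel_values"].to(dtype=vae.dtype)
            model_input = vae.encode(pixel_values).latent_dist.sample()
            model_input = model_input * vae.config.scaling_factor

            # Sample noise and one random timestep per image
            noise = torch.randn_like(model_input)
            bsz, channels, height, width = model_input.shape

            timesteps = torch.randint(
                0,
                noise_scheduler.config.num_train_timesteps,
                (bsz,),
                device=model_input.device
            )
            timesteps = timesteps.long()

            # Forward diffusion: noise the latents according to each timestep
            noisy_model_input = noise_scheduler.add_noise(
                model_input,
                noise,
                timesteps
            )

            # Assumes prepare_dataset pre-computed the text embeddings and stored
            # them under "input_ids"; raw token ids would first need text_encoder
            encoder_hidden_states = batch["input_ids"]

            model_pred = unet(
                noisy_model_input,
                timesteps,
                encoder_hidden_states,
                return_dict=False,
            )[0]

            # The regression target depends on what the model is trained to predict
            if noise_scheduler.config.prediction_type == "epsilon":
                target = noise
            elif noise_scheduler.config.prediction_type == "v_prediction":
                target = noise_scheduler.get_velocity(model_input, noise, timesteps)
            else:
                raise ValueError(
                    f"Unknown prediction type {noise_scheduler.config.prediction_type}")

            # The batch is laid out as [instance; class]: split it into the two halves
            model_pred, model_pred_prior = torch.chunk(model_pred, 2, dim=0)
            target, target_prior = torch.chunk(target, 2, dim=0)

            instance_loss = F.mse_loss(
                model_pred.float(), target.float(), reduction="mean")
            prior_loss = F.mse_loss(
                model_pred_prior.float(), target_prior.float(), reduction="mean")

            loss = instance_loss + trainer.hyperparameters.prior_loss_weight * prior_loss

            trainer.accelerator.backward(loss)
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()

        # Count a step only once gradients have been synced, i.e. after a full
        # gradient-accumulation cycle
        if trainer.accelerator.sync_gradients:
            global_step += 1
            progress_bar.update(1)

            loss_metrics = {
                "loss": loss.detach().item(),
                "prior_loss": prior_loss.detach().item(),
                "lr": lr_scheduler.get_last_lr()[0],
            }
            experiment.log_metrics(loss_metrics, step=global_step)
            progress_bar.set_postfix(**loss_metrics)

        if global_step >= trainer.hyperparameters.max_train_steps:
            break

    trainer.save_lora_weights(unet)

    # Also leave the epoch loop once the step budget is reached
    if global_step >= trainer.hyperparameters.max_train_steps:
        break

experiment.add_tag("dreambooth-training")
experiment.log_parameters(trainer.hyperparameters)
trainer.accelerator.end_training()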