### DreamBooth Model Fine-Tuning
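DreamBooth is a deep learning technique for fine-tuning existing text-to-image models, developed in 2022 by researchers at Google Research and Boston University. It was originally developed with Google's own Imagen text-to-image model, but DreamBooth implementations can be applied to other text-to-image models. After training on just three to five images of a subject, the model can generate more refined and personalized outputs.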

![](../../images/dreambooth.png)

Next, we use DreamBooth to fine-tune a Stable Diffusion XL model.

#### Notebook steps
1. Import boto3 and the SageMaker Python SDK
2. Build the DreamBooth fine-tuning image
3. Fine-tune the model
   * Configure hyperparameters
   * Create the training job
4. Test

#### 1. Import boto3 and the SageMaker Python SDK

In [52]:
import sagemaker
import boto3
from sagemaker.pytorch import PyTorch

# SageMaker session, default bucket, and execution role for this notebook
sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket()
role = sagemaker.get_execution_role()
account_id = boto3.client('sts').get_caller_identity().get('Account')
region_name = boto3.session.Session().region_name

# S3 locations for the training images, base models, and DreamBooth outputs
images_s3uri = 's3://{0}/dreambooth-xl/images/'.format(bucket)
models_s3uri = 's3://{0}/stable-diffusion/models/'.format(bucket)
dreambooth_s3uri = 's3://{0}/stable-diffusion/dreambooth/'.format(bucket)



#### 2. Build the DreamBooth XL fine-tuning image

In [None]:
!cd sd_xl_dreambooth && git clone https://github.com/huggingface/diffusers

In [None]:
%%writefile Dockerfile
## Change the region code below to the region you are using; this sample uses us-west-2
#FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04
FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04

RUN pip install wandb
RUN pip install xformers==0.0.18
RUN pip install bitsandbytes
#RUN export TORCH_CUDA_ARCH_LIST="7.5 8.0 8.6" && export FORCE_CUDA="1" && pip install ninja triton==2.0.0.dev20221120 && git clone https://github.com/xieyongliang/xformers.git /tmp/xformers && cd /tmp/xformers && git submodule update --init --recursive && pip install -r requirements.txt && pip install -e . 


ENV LANG=C.UTF-8
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE

* Build & push the Docker image

In [None]:
## Change the region code below to the region you are using; this sample uses us-west-2
!aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com

In [53]:
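## Define the ECR repo name. If your role's ECR permissions are scoped to repositories containing "sagemaker" (as in the AmazonSageMakerFullAccess managed policy), include it in the name.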
repo_name = "sd_xl_dreambooth_finetuning"

In [None]:
%%script env repo_name=$repo_name bash

#!/usr/bin/env bash

# This script shows how to build the Docker image and push it to ECR to be ready for use
# by SageMaker.

# The image name comes from the repo_name environment variable set by the cell magic above.
# It is used as the image name on the local machine and combined with the account and
# region to form the repository name for ECR.
algorithm_name=${repo_name}

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
aws ecr get-login-password --region ${region}|docker login --username AWS --password-stdin ${fullname}

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}
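
The build script composes the full image name as `<account>.dkr.ecr.<region>.amazonaws.com/<name>:latest`. As a minimal sketch, the same naming can be reproduced in Python, for example to double-check the `image_uri` that is later passed to the estimator. The helper name `ecr_image_uri` and the placeholder account ID are assumptions for illustration; they are not part of boto3 or the SageMaker SDK.

```python
def ecr_image_uri(account_id, region, repo_name, tag="latest"):
    """Compose an ECR image URI the same way the build script does."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo_name}:{tag}"


# Placeholder account ID (assumption): substitute your own account, region, and repo name
print(ecr_image_uri("123456789012", "us-west-2", "sd_xl_dreambooth_finetuning"))
```

In the notebook, `account_id`, `region_name`, and `repo_name` from the earlier cells could be passed in place of the placeholder values.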

* Prepare the training images

In [None]:
from huggingface_hub import snapshot_download

local_dir = "./dog"
snapshot_download(
    "diffusers/dog-example",
    local_dir=local_dir, repo_type="dataset",
    ignore_patterns=".gitattributes",
)

In [None]:
!chmod -R 777 ./sd_xl_dreambooth
!./sd_xl_dreambooth/s5cmd sync ./dog/ $images_s3uri
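
Because DreamBooth trains on only a handful of images, it can help to confirm what actually landed in the local folder. The following is a minimal sketch of such a check; the helper name `filter_training_images` and the list of accepted suffixes are assumptions, not something the dataset or the training script prescribes.

```python
import os

# Assumed set of image suffixes; adjust to match your own data
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".webp")


def filter_training_images(filenames):
    """Keep only file names that look like images, sorted for stable output."""
    return sorted(name for name in filenames if name.lower().endswith(IMAGE_SUFFIXES))


local_dir = "./dog"  # same directory used for snapshot_download above
if os.path.isdir(local_dir):
    images = filter_training_images(os.listdir(local_dir))
    print(f"{len(images)} training images found in {local_dir}")
```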

#### 3. Fine-tune the model

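   * image_uri: URI of the Docker image in the ECR repository
   * instance_type: instance type used for the training job; ml.g4dn.xlarge or ml.g5.xlarge is recommended
   * class_prompt: prompt describing the class of the subject
   * instance_prompt: prompt containing the keyword that identifies your images
   * model_name: name of the pretrained model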
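
Parameters like these end up as command-line flags for the training script. As a minimal sketch, assuming you prefer to keep them in a Python dict rather than editing the shell script by hand, the command line could be assembled as follows. The helper `build_launch_command` is hypothetical; the flag names and values are the ones used in `train.sh` in this notebook.

```python
import shlex


def build_launch_command(script, hyperparameters, flags=()):
    """Turn a dict of hyperparameters into an `accelerate launch` argument list."""
    cmd = ["accelerate", "launch", script]
    cmd += [f"--{name}" for name in flags]  # boolean switches without a value
    cmd += [f"--{name}={value}" for name, value in hyperparameters.items()]
    return cmd


hyperparameters = {
    "pretrained_model_name_or_path": "stabilityai/stable-diffusion-xl-base-1.0",
    "instance_data_dir": "/tmp/dog/",
    "instance_prompt": "a photo of sks dog",
    "resolution": 1024,
    "train_batch_size": 1,
    "max_train_steps": 500,
}
command = build_launch_command(
    "train_dreambooth_lora_sdxl.py",
    hyperparameters,
    flags=["gradient_checkpointing", "use_8bit_adam"],
)
# shlex.join quotes arguments that contain spaces, such as the instance prompt
print(shlex.join(command))
```

Keeping the values in one dict makes it easier to vary a single setting, such as `max_train_steps`, between runs.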
   

In [58]:
%%writefile ./sd_xl_dreambooth/train.sh


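# Stage a working directory for the training images and make s5cmd executable
mkdir -p /tmp/dog
ls -lt ./
chmod 777 ./s5cmd

# Install diffusers from source, then the SDXL DreamBooth example requirements
cd diffusers && pip install -e .
cd examples/dreambooth/ && pip install -r requirements_sdxl.txt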

cp -r /opt/ml/input/data/images/* /tmp/dog/

export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export INSTANCE_DIR="/tmp/dog/"
export OUTPUT_DIR="/tmp/ouput"
#export OUTPUT_DIR="/opt/ml/model/"
export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"
# S3 destination for the trained weights; replace with the dreambooth_s3uri for your own bucket
export dreambooth_s3uri="s3://sagemaker-us-west-2-687912291502/stable-diffusion/dreambooth/"

accelerate launch /opt/ml/code/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py \
  --gradient_checkpointing \
  --use_8bit_adam \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --pretrained_vae_model_name_or_path=$VAE_PATH \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-5 \
  --report_to="tensorboard" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --enable_xformers_memory_efficient_attention

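# Upload the trained weights to a timestamped S3 prefix (trailing slash stripped to avoid an empty path segment)
/opt/ml/code/s5cmd sync "$OUTPUT_DIR/" "${dreambooth_s3uri%/}/output/$(date +%Y-%m-%d-%H-%M-%S)/"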


Overwriting ./sd_xl_dreambooth/train.sh
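
The last line of `train.sh` uploads the trained weights under `output/<timestamp>/`, using the `date` format `%Y-%m-%d-%H-%M-%S`. A minimal Python sketch of the same naming scheme is shown below, which could be used to reconstruct a run's location from the notebook. The function name `run_output_uri` and the example bucket are assumptions. The sketch strips a trailing slash from the base URI so the result contains no empty path segment.

```python
from datetime import datetime


def run_output_uri(base_uri, now=None):
    """Build a timestamped S3 output URI in the style used by train.sh."""
    now = now or datetime.now()  # local time, like the shell `date` command
    stamp = now.strftime("%Y-%m-%d-%H-%M-%S")
    return f"{base_uri.rstrip('/')}/output/{stamp}/"


# Hypothetical bucket name, used only for illustration
example = run_output_uri(
    "s3://my-bucket/stable-diffusion/dreambooth/",
    now=datetime(2023, 8, 25, 10, 8, 30),
)
print(example)
```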


* Run a local test

In [59]:
!pip list|grep -i xformers

In [None]:
!./sd_xl_dreambooth/train.sh

   * Create the training job

In [None]:
import time
from sagemaker.estimator import Estimator
from sagemaker.pytorch.estimator import PyTorch

environment = {
    'PYTORCH_CUDA_ALLOC_CONF':'max_split_size_mb:32'
}

## The URI of the image that was built and pushed above
image_uri = "{}.dkr.ecr.{}.amazonaws.com/{}:latest".format(account_id, region_name, repo_name)
base_job_name = 'sd-xl-dreambooth-finetuning-high'
instance_type = 'ml.g5.2xlarge'
inputs = {
    'images': f"s3://{bucket}/dreambooth-xl/images/"
}

estimator = PyTorch(role=role,
                      entry_point='train.sh',
                      source_dir='./sd_xl_dreambooth/',
                      base_job_name=base_job_name,
                      instance_count=1,
                      instance_type=instance_type,
                      image_uri=image_uri,
                      environment=environment,
                      keep_alive_period_in_seconds=3600, # warm pool: keeps the instance and image ready for the next training job (rolling renewal, max 1 hour); requires a warm pool quota.
                      disable_profiler=True,
                      debugger_hook_config=False,
                      max_run=24*60*60*2)

estimator.fit(inputs)

Using provided s3_resource


INFO:sagemaker:Creating training-job with name: sd-xl-dreambooth-finetuning-high-2023-08-25-10-08-30-083


2023-08-25 10:08:34 Starting - Starting the training job...
2023-08-25 10:08:48 Starting - Preparing the instances for training......
2023-08-25 10:09:57 Downloading - Downloading input data...
2023-08-25 10:10:22 Training - Downloading the training image...........................
2023-08-25 10:14:48 Training - Training image download completed. Training in progress....
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2023-08-25 10:15:27,344 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2023-08-25 10:15:27,358 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-08-25 10:15:27,368 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2023-08-25 10:15:27,369 sagemaker_pytorch_container.training INFO     Invoking user training script.
2023-08-25 10:

In [57]:
print("Model artifact saved at:\n", dreambooth_s3uri)

Model artifact saved at:
 s3://sagemaker-us-west-2-687912291502/stable-diffusion/dreambooth/
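
The outline lists a test step after training. As a hedged sketch of what that could look like: first pick the most recent timestamped run among the S3 object keys, then, after downloading that folder locally, load the LoRA weights into the SDXL base pipeline with diffusers. The helper names `latest_run_prefix` and `generate_with_lora` are hypothetical, `base_prefix` must be whatever prefix the sync step actually wrote under, and the generation function assumes a GPU with `torch` and `diffusers` installed.

```python
def latest_run_prefix(keys, base_prefix):
    """Return the most recent timestamped run prefix found among S3 object keys."""
    runs = set()
    for key in keys:
        if key.startswith(base_prefix):
            stamp = key[len(base_prefix):].split("/", 1)[0]
            if stamp:
                runs.add(stamp)
    # Zero-padded timestamps sort chronologically as plain strings
    return f"{base_prefix}{max(runs)}/" if runs else None


def generate_with_lora(lora_dir, prompt="A photo of sks dog in a bucket"):
    """Load locally downloaded LoRA weights into SDXL and generate one image."""
    # Heavy imports stay inside the function so the helper above works without a GPU
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    pipe.load_lora_weights(lora_dir)
    return pipe(prompt, num_inference_steps=25).images[0]
```

The default prompt reuses the validation prompt from `train.sh`; the returned object is a PIL image that can be displayed in the notebook or saved with `.save("dog.png")`.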
