# Preparation

We use the example from [Lambda Labs](https://lambdalabs.com/blog/how-to-fine-tune-stable-diffusion-how-we-made-the-text-to-pokemon-model-at-lambda).


This example uses [this repository](https://github.com/justinpinkney/stable-diffusion/tree/fc3a4fd8d5e171627ef30769146c68b9c072dffb) for training.

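We clone the repository and then edit the YAML training configuration, `configs/stable-diffusion/pokemon.yaml` (the file passed to `main.py` via `--base`). Only its `data` section is shown.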

In [41]:
!git clone https://github.com/justinpinkney/stable-diffusion.git

Cloning into 'stable-diffusion'...
remote: Enumerating objects: 1631, done.[K
remote: Counting objects: 100% (721/721), done.[K
remote: Compressing objects: 100% (58/58), done.[K
remote: Total 1631 (delta 684), reused 663 (delta 663), pack-reused 910[K
Receiving objects: 100% (1631/1631), 73.86 MiB | 2.14 MiB/s, done.
Resolving deltas: 100% (1017/1017), done.


```yaml
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 4
    num_workers: 4
    num_val_workers: 0 # Avoid a weird val dataloader issue
    train:
      target: ldm.data.simple.hf_dataset_load_disk
      params:
        name: promt_dataset
        image_transforms:
        - target: torchvision.transforms.Resize
          params:
            size: 512
            interpolation: 3
        - target: torchvision.transforms.RandomCrop
          params:
            size: 512
        - target: torchvision.transforms.RandomHorizontalFlip
    validation:
      target: ldm.data.simple.TextOnly
      params:
        captions:
        - "fistula on a metal pipe"
        - "Photo of a fistula on a metal pipe"
        - "crack on a metal pipe"
        - "Photo of a crack on a metal pipe"
        output_size: 512
        n_gpus: 1 # small hack to make sure we see all our samples
```

In the `train` section we changed two lines (the validation `captions` were also replaced with prompts for our task):

`ldm.data.simple.hf_dataset -> ldm.data.simple.hf_dataset_load_disk`

`lambdalabs/pokemon-blip-captions -> promt_dataset`
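Changing only `target` is enough because every config node is built by importing the dotted path in `target` and calling the result with `params` as keyword arguments; this is what `instantiate_from_config` in `ldm/util.py` does via `get_obj_from_str`. Below is a simplified, self-contained sketch of that mechanism, not the repo's exact code (the real function handles a few special cases). The names `get_obj_from_str_sketch` and `instantiate_sketch` are hypothetical, and the usage swaps in a standard-library target so it runs without `ldm` installed.

```python
import importlib
from typing import Any, Mapping


def get_obj_from_str_sketch(path: str) -> Any:
    """Resolve a dotted path such as 'ldm.data.simple.hf_dataset_load_disk'."""
    module_name, attr = path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), attr)


def instantiate_sketch(config: Mapping[str, Any]) -> Any:
    """Call config['target'] with config['params'] passed as keyword arguments."""
    if "target" not in config:
        raise KeyError("Expected key `target` to instantiate.")
    return get_obj_from_str_sketch(config["target"])(**config.get("params", {}))


# Same shape as the YAML `train:` node, but with a standard-library target
frac = instantiate_sketch({"target": "fractions.Fraction",
                           "params": {"numerator": 3, "denominator": 4}})
print(frac)  # 3/4
```

With the edited YAML, the same call imports `ldm.data.simple` and calls `hf_dataset_load_disk(name="promt_dataset", image_transforms=[...])`.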


`ldm.data.simple.hf_dataset` is the function that reads the dataset; the original version loads it from the Hugging Face Hub by name and can be found at [this link](https://github.com/justinpinkney/stable-diffusion/blob/fc3a4fd8d5e171627ef30769146c68b9c072dffb/ldm/data/simple.py#L123).


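Next to that function, in `ldm/data/simple.py`, we add our own function that loads the dataset from disk instead. `promt_dataset` is expected to be a directory written with `save_to_disk` that contains `image` and `text` columns (the function asserts that these columns exist).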

In [None]:
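from datasets import load_from_disk, DatasetDict  # the package is `datasets`, not `dataset`
...  # existing code of ldm/data/simple.py (it already imports transforms, rearrange, instantiate_from_config)
def hf_dataset_load_disk(
    name,
    image_transforms=[],
    image_column="image",
    text_column="text",
    split='train',
    image_key='image',
    caption_key='txt',
    ):
    """Make a huggingface dataset, loaded from a local `save_to_disk` directory,
    with the appropriate list of transforms applied
    """
    ds = load_from_disk(name)
    # If the directory holds a DatasetDict (several splits), select the requested split
    if isinstance(ds, DatasetDict):
        ds = ds[split]
    image_transforms = [instantiate_from_config(tt) for tt in image_transforms]
    # ToTensor gives CHW floats in [0, 1]; rescale to [-1, 1] and move channels last (HWC)
    image_transforms.extend([transforms.ToTensor(),
                                transforms.Lambda(lambda x: rearrange(x * 2. - 1., 'c h w -> h w c'))])
    tform = transforms.Compose(image_transforms)

    assert image_column in ds.column_names, f"Didn't find column {image_column} in {ds.column_names}"
    assert text_column in ds.column_names, f"Didn't find column {text_column} in {ds.column_names}"

    # Applied lazily whenever items are read (`examples` is a batch dict of lists)
    def pre_process(examples):
        processed = {}
        processed[image_key] = [tform(im) for im in examples[image_column]]
        processed[caption_key] = examples[text_column]
        return processed

    ds.set_transform(pre_process)
    return ds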
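The two transforms appended at the end turn each image into what the model expects: `ToTensor` yields a float tensor of shape `(C, H, W)` with values in `[0, 1]`, `x * 2. - 1.` rescales it to `[-1, 1]`, and `rearrange(..., 'c h w -> h w c')` moves the channels last. A pure-Python sketch of the same arithmetic on nested lists, so it runs without torch (the function name is hypothetical; the real code operates on tensors):

```python
from typing import List

Image3D = List[List[List[float]]]


def to_model_input_sketch(chw: Image3D) -> Image3D:
    """Map a [C][H][W] image with values in [0, 1] to [H][W][C] with values in [-1, 1]."""
    channels, height, width = len(chw), len(chw[0]), len(chw[0][0])
    return [
        [[chw[c][y][x] * 2.0 - 1.0 for c in range(channels)] for x in range(width)]
        for y in range(height)
    ]


# 3 channels, 1 row, 2 columns
chw = [[[0.0, 1.0]], [[0.5, 0.5]], [[1.0, 0.0]]]
hwc = to_model_input_sketch(chw)
print(hwc)  # [[[-1.0, 0.0, 1.0], [1.0, 0.0, -1.0]]]
```

Channels-last is the layout this repo's datasets return; as far as we can tell, the diffusion model's `get_input` rearranges each batch back to `b c h w` before encoding.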

Note that the repository already contains `CocoImagesAndCaptionsTrain2017`, but it is designed for segmentation, so it does not fit our case.

In [1]:
%cd stable-diffusion/

/app/notebooks/stable-diffusion


In [27]:
%cd libs
%cd taming-transformers
!git clone https://github.com/CompVis/taming-transformers.git


/app/notebooks/libs
/app/notebooks/libs/taming-transformers
fatal: destination path 'taming-transformers' already exists and is not an empty directory.


In [28]:
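import sys
# Add the directory that *contains* the `taming` package (the repository root),
# not the `taming` package directory itself; adjust if the repo was cloned elsewhere
sys.path.append("/app/notebooks/libs/taming-transformers")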
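For `import taming` to work, `sys.path` must contain the directory that holds the `taming` package (the root of the cloned taming-transformers repository), not the `taming` directory itself. The sketch below checks a few candidate locations and prepends the first one that actually contains the package, so it also takes precedence over any other installed copy. The helper `add_package_root` and the candidate paths are hypothetical; adjust them to where the repository really lives.

```python
import sys
from pathlib import Path
from typing import Iterable, Optional


def add_package_root(package: str, candidates: Iterable[str]) -> Optional[str]:
    """Prepend to sys.path the first candidate directory that contains `package/`.

    Returns the directory that was added, or None if no candidate contains the package.
    """
    for candidate in candidates:
        root = Path(candidate).expanduser().resolve()
        if (root / package).is_dir():
            if str(root) not in sys.path:
                # insert(0) so this copy wins over other installed versions
                sys.path.insert(0, str(root))
            return str(root)
    return None


# Hypothetical locations based on the directories used in this notebook
found = add_package_root("taming", [
    "/app/notebooks/libs/taming-transformers",
    "/app/notebooks/libs/taming-transformers/taming-transformers",
])
print(found)
```

If `taming` was already imported earlier in the session, restart the kernel after fixing the path; Python caches modules in `sys.modules`.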

In [33]:
%pwd

'/app/notebooks/libs/taming-transformers'

In [34]:
%cd ../../


/app/notebooks


In [40]:
import taming

In [42]:
!python stable-diffusion/main.py \
    -t \
    --base /app/notebooks/stable-diffusion/configs/stable-diffusion/pokemon.yaml \
    --gpus 0 \
    --scale_lr False \
    --num_nodes 1 \
    --check_val_every_n_epoch 10 \
    --finetune_from /app/checkpoints/sd-v1-4-full-ema.ckpt

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Moving 0 files to the new cache system
0it [00:00, ?it/s]
Global seed set to 23
Running on GPUs 0
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Keeping EMAs of 688.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
^C
Traceback (most recent call last):
  File "stable-diffusion/main.py", line 670, in <module>
    model = instantiate_from_config(config.model)
  File "/app/notebooks/stable-diffusion/ldm/util.py", line 79, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "/app/notebooks/stable-diffusion/ldm/models/diffusion/ddpm.py", line 526,
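A note on `--gpus 0`: in PyTorch Lightning 1.x, which this repo's `main.py` uses through `Trainer.add_argparse_args`, a `--gpus` value without a comma is parsed as a GPU *count*, so `0` means zero GPUs; a value with a comma is parsed as a list of device *indices*, so `0,` selects the GPU with index 0. This is why the CompVis README examples pass `--gpus 0,`. The sketch below re-implements that parsing rule in plain Python for illustration only; `parse_gpus_arg` is hypothetical and not part of Lightning's API.

```python
from typing import List, Optional, Union


def parse_gpus_arg(value: str) -> Optional[Union[int, List[int]]]:
    """Illustrative re-implementation of the Lightning 1.x rule for `--gpus`.

    Returns None when no GPU is selected, an int for a GPU count
    (-1 meaning "all"), or a list of GPU indices.
    """
    value = value.strip()
    if "," in value:
        # Comma present: device indices, e.g. "0," -> [0], "0,1" -> [0, 1]
        indices = [int(part) for part in value.split(",") if part.strip()]
        return indices or None
    count = int(value)  # no comma: a number of GPUs
    return None if count == 0 else count


print(parse_gpus_arg("0"))    # None  (no GPU selected)
print(parse_gpus_arg("0,"))   # [0]   (GPU with index 0)
print(parse_gpus_arg("0,1"))  # [0, 1]
print(parse_gpus_arg("1"))    # 1     (one GPU, Lightning picks which)
```

So, to train on the first GPU, the training command would use `--gpus 0,`.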

In [26]:
from taming.modules.vqvae.quantize import VectorQuantizer2 as VectorQuantizer
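This is the same import that the repo's `ldm/models/autoencoder.py` relies on. A commonly reported cause of `ImportError: cannot import name 'VectorQuantizer2'` is an outdated `taming-transformers` package from PyPI shadowing the GitHub version; installing from the repository (`pip install git+https://github.com/CompVis/taming-transformers.git`) or putting the cloned repo first on `sys.path` usually resolves it. A quick diagnostic sketch that reports whether the symbol is available and which file the module was loaded from; `check_symbol` is a hypothetical helper.

```python
import importlib
from typing import Optional, Tuple


def check_symbol(module_name: str, symbol: str) -> Tuple[bool, Optional[str]]:
    """Return (symbol is available, file the module was loaded from)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module, its parent package, or one of its dependencies is not importable
        return False, None
    return hasattr(module, symbol), getattr(module, "__file__", None)


ok, location = check_symbol("taming.modules.vqvae.quantize", "VectorQuantizer2")
print(f"VectorQuantizer2 available: {ok}; loaded from: {location}")
```

If `location` points into `site-packages` rather than the cloned repository, the PyPI copy is the one being imported.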