# Text-guided image synthesis - Part 1: preparing the environment

In this notebook we install all dependencies and download the AI models for this project.

## Assumptions

 - We run on Linux.
 - The apt package manager is available.
 - We are root.
 - PyTorch, torchvision, and the NVIDIA drivers are installed.
 - Jupyter Notebook or JupyterLab is installed.

This is essentially the setup we get if we run the official PyTorch Docker image (https://hub.docker.com/r/pytorch/pytorch/).
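The assumptions above can be checked from inside the notebook before anything is installed. The following is a minimal sketch, not part of the original project: the helper name `check_assumptions` is hypothetical, and it only tests what can be observed cheaply (platform, `apt` on the PATH, effective user id, and whether `torch` and `torchvision` can be found). It does not verify the NVIDIA drivers.

```python
import importlib.util
import os
import platform
import shutil


def check_assumptions():
    """Return a dict mapping each assumption to True (holds) or False."""
    return {
        "linux": platform.system() == "Linux",
        "apt available": shutil.which("apt") is not None,
        # os.geteuid only exists on Unix-like systems
        "running as root": hasattr(os, "geteuid") and os.geteuid() == 0,
        # find_spec locates a package without importing it
        "torch found": importlib.util.find_spec("torch") is not None,
        "torchvision found": importlib.util.find_spec("torchvision") is not None,
    }


for name, ok in check_assumptions().items():
    print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```

If a line reports MISSING, the later cells may still work, but the corresponding step (for example `apt install` without root) is likely to fail.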

## Dependencies from apt

We need curl, wget, and git. The command below installs whichever of them is missing.

In [1]:
! apt install -y curl wget git

Reading package lists... Done
Building dependency tree       
Reading state information... Done
git is already the newest version (1:2.17.1-1ubuntu0.9).
The following NEW packages will be installed:
  curl libcurl4 wget
0 upgraded, 3 newly installed, 0 to remove and 3 not upgraded.
Need to get 694 kB of archives.
After this operation, 2007 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 wget amd64 1.19.4-1ubuntu2.2 [316 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 libcurl4 amd64 7.58.0-2ubuntu3.16 [220 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 curl amd64 7.58.0-2ubuntu3.16 [159 kB]
Fetched 694 kB in 0s (1749 kB/s)
debconf: delaying package configuration, since apt-utils is not installed

Selecting previously unselected package wget.
(Reading database ... 9772 files and directories currently installed.)
Preparing to unpack .../wget_1.19.4-1ubuntu2.2_amd64.

## Install required packages using pip

(pip has to be installed.)

We install:

 - Taming Transformers paper source code
 - TensorFlow (for super-resolution)
 - Image Super-Resolution package
 - CLIP source code
 - PyTorch Lightning
 - omegaconf
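Once the install cell in this section has run, it can be useful to confirm which versions actually ended up in the environment. The sketch below uses the standard-library `importlib.metadata` module; the helper name `installed_versions` is hypothetical. Only the three packages installed by name from PyPI are checked, because the distribution names of the three git-installed packages are not stated in this notebook.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


pypi_packages = ["omegaconf", "pytorch-lightning", "tensorflow"]
for name, version in installed_versions(pypi_packages).items():
    print(f"{name}: {version or 'not installed'}")
```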

In [2]:
!pip install omegaconf pytorch-lightning tensorflow
!pip install git+https://github.com/bfirsh/taming-transformers.git
!pip install git+https://github.com/openai/CLIP.git
!pip install git+https://github.com/idealo/image-super-resolution.git

Collecting omegaconf
  Downloading omegaconf-2.1.1-py3-none-any.whl (74 kB)
     |████████████████████████████████| 74 kB 2.4 MB/s eta 0:00:01
Collecting pytorch-lightning
  Downloading pytorch_lightning-1.6.0-py3-none-any.whl (582 kB)
     |████████████████████████████████| 582 kB 18.5 MB/s eta 0:00:01
Collecting tensorflow
  Downloading tensorflow-2.8.0-cp38-cp38-manylinux2010_x86_64.whl (497.6 MB)
     |████████████████████████████████| 497.6 MB 2.4 kB/s  eta 0:00:01
Collecting antlr4-python3-runtime==4.8
  Downloading antlr4-python3-runtime-4.8.tar.gz (112 kB)
     |████████████████████████████████| 112 kB 59.

## Download AI models

We download them to the weights/ directory inside the current working directory.

First we download the CLIP model, using the ViT-B/32 variant (ViT-B-32.pt).

In [3]:
!mkdir -p weights
!curl -o weights/ViT-B-32.pt https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  337M  100  337M    0     0  32.4M      0  0:00:10  0:00:10 --:--:-- 32.6M
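A download that was interrupted or corrupted can be caught with a checksum. The sketch below assumes that the 64-character hexadecimal component of the CLIP URL is the SHA-256 digest of the file, which is how the CLIP package's own downloader treats it. The helper names `expected_sha256` and `sha256_of` are hypothetical.

```python
import hashlib
from pathlib import Path

CLIP_URL = (
    "https://openaipublic.azureedge.net/clip/models/"
    "40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt"
)


def expected_sha256(url):
    """Assumption: the second-to-last path component is the file's SHA-256."""
    return url.rstrip("/").split("/")[-2]


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large checkpoints are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


weights = Path("weights/ViT-B-32.pt")
if weights.exists():
    matches = sha256_of(weights) == expected_sha256(CLIP_URL)
    print("checksum ok" if matches else "checksum MISMATCH, consider re-downloading")
else:
    print(f"{weights} not found")
```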


Now we download the VQGAN models. Which model to choose depends on what we plan to generate.
Each cell downloads one model. All models together take around 17.7 GB, so it is recommended to download only the ones you plan to use.
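To decide which models fit on disk, the checkpoint sizes can be added up ahead of time. The sketch below uses the sizes that curl reports in this notebook's outputs (in curl's `M` units); the helper names are hypothetical, and the small .yaml config files are ignored.

```python
import shutil

# Checkpoint sizes as reported by curl in this notebook's outputs (curl "M" units).
CHECKPOINT_SIZES_M = {
    "vqgan_imagenet_f16_16384.ckpt": 934,
    "coco.ckpt": 8045,
    "faceshq.ckpt": 3789,
    "wikiart_16384.ckpt": 958,
    "sflckr.ckpt": 4066,
}


def required_megabytes(selected):
    """Total size of the selected checkpoints."""
    return sum(CHECKPOINT_SIZES_M[name] for name in selected)


def fits_on_disk(selected, path="."):
    """Compare the required space with the free space on the given filesystem."""
    free_m = shutil.disk_usage(path).free / 2**20
    return required_megabytes(selected) <= free_m


print(required_megabytes(CHECKPOINT_SIZES_M))  # all five checkpoints -> 17792
print(fits_on_disk(["wikiart_16384.ckpt", "sflckr.ckpt"]))
```

The total of 17792 is consistent with the "around 17.7 GB" figure quoted above.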

ImageNet 16384 model - trained on ImageNet, a dataset of images covering many object categories. The number 16384 is the size of the VQGAN codebook, not the number of object classes. The model is good for generating objects.

In [4]:
!curl -L -o weights/vqgan_imagenet_f16_16384.ckpt -C - 'https://heibox.uni-heidelberg.de/f/867b05fc8c4841768640/?dl=1'
!curl -L -o weights/vqgan_imagenet_f16_16384.yaml -C - 'https://heibox.uni-heidelberg.de/f/274fb24ed38341bfa753/?dl=1'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
100  934M  100  934M    0     0  14.5M      0  0:01:04  0:01:04 --:--:-- 14.8M
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   692  100   692    0     0   1272      0 --:--:-- --:--:-- --:--:--  1272


COCO model - trained on COCO, a dataset of objects in everyday environments. Another model that is good for generating objects.

In [4]:
!curl -L -o weights/coco.yaml -C - 'https://dl.nmkd.de/ai/clip/coco/coco.yaml'
!curl -L -o weights/coco.ckpt -C - 'https://dl.nmkd.de/ai/clip/coco/coco.ckpt'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1980  100  1980    0     0   2625      0 --:--:-- --:--:-- --:--:--  2625
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 8045M  100 8045M    0     0  29.6M      0  0:04:31  0:04:31 --:--:-- 26.4M


FacesHQ model - trained on a dataset of faces. In principle the model is good for generating portraits (in practice it produces monsters, because this method does not generate good portraits).

In [4]:
!curl -L -o weights/faceshq.yaml -C - 'https://drive.google.com/uc?export=download&id=1fHwGx_hnBtC8nsq7hesJvs-Klv-P0gzT'
!curl -L -o weights/faceshq.ckpt -C - 'https://app.koofr.net/content/links/a04deec9-0c59-4673-8b37-3d696fe63a5d/files/get/last.ckpt?path=%2F2020-11-13T21-41-45_faceshq_transformer%2Fcheckpoints%2Flast.ckpt'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  1451  100  1451    0     0   1545      0 --:--:-- --:--:-- --:--:--  1545
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3789M  100 3789M    0     0  32.3M      0  0:01:57  0:01:57 --:--:-- 59.7M


WikiArt model - trained on a dataset of paintings. The model is good for generating paintings.

In [5]:
!curl -L -o weights/wikiart_16384.ckpt -C - 'http://eaidata.bmk.sh/data/Wikiart_16384/wikiart_f16_16384_8145600.ckpt'
!curl -L -o weights/wikiart_16384.yaml -C - 'http://eaidata.bmk.sh/data/Wikiart_16384/wikiart_f16_16384_8145600.yaml'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  958M  100  958M    0     0  7414k      0  0:02:12  0:02:12 --:--:-- 7680k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   920  100   920    0     0   2030      0 --:--:-- --:--:-- --:--:--  2026


Flickr model - trained on a Flickr dataset containing a lot of landscapes. The model is good for generating landscapes.

In [6]:
!curl -L -o weights/sflckr.yaml -C - 'https://heibox.uni-heidelberg.de/d/73487ab6e5314cb5adba/files/?p=%2Fconfigs%2F2020-11-09T13-31-51-project.yaml&dl=1'
!curl -L -o weights/sflckr.ckpt -C - 'https://heibox.uni-heidelberg.de/d/73487ab6e5314cb5adba/files/?p=%2Fcheckpoints%2Flast.ckpt&dl=1'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  1603  100  1603    0     0   2852      0 --:--:-- --:--:-- --:--:--  2852
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 4066M  100 4066M    0     0  14.9M      0  0:04:32  0:04:32 --:--:-- 14.9M
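Each VQGAN model consists of a .yaml config and a .ckpt checkpoint, and `curl -o` saves whatever the server returns, including an HTML error or confirmation page. The following sketch checks that both files of a model exist, are non-empty, and do not start like an HTML document. The model names follow the file names used in the cells above; the helper names `looks_like_html` and `check_model` are hypothetical.

```python
from pathlib import Path

VQGAN_MODELS = ["vqgan_imagenet_f16_16384", "coco", "faceshq", "wikiart_16384", "sflckr"]


def looks_like_html(path):
    """Heuristic: read only the first bytes and look for an HTML opening."""
    with open(path, "rb") as handle:
        head = handle.read(256).lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))


def check_model(name, weights_dir="weights"):
    """Return a list of problems found for one model (empty list means ok)."""
    problems = []
    for suffix in (".yaml", ".ckpt"):
        path = Path(weights_dir) / (name + suffix)
        if not path.is_file():
            problems.append(f"{path.name} is missing")
        elif path.stat().st_size == 0:
            problems.append(f"{path.name} is empty")
        elif looks_like_html(path):
            problems.append(f"{path.name} looks like an HTML page")
    return problems


for name in VQGAN_MODELS:
    problems = check_model(name)
    print(f"{name}: {'ok' if not problems else '; '.join(problems)}")
```

Models that were deliberately skipped will be reported as missing, so in practice the list would be trimmed to the models actually downloaded.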


Super-resolution models (weights for the Image Super-Resolution package).
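The weight file names in this section appear to encode the network architecture (for example `C6-D20-G64-G064-x2`). The sketch below parses such a name into an architecture name and a parameter dictionary, assuming the tokens correspond to the keys `C`, `D`, `G`, `G0`, `T`, and `x` that the ISR model classes accept as `arch_params`. The helper `parse_isr_weights_name` is hypothetical and not part of the ISR package.

```python
import re


def parse_isr_weights_name(filename):
    """Split e.g. 'rdn-C6-D20-G64-G064-x2_PSNR_epoch086.hdf5' into (arch, params)."""
    # Drop any directory part, then everything after the first underscore.
    model_id = filename.split("/")[-1].split("_")[0]
    arch, *tokens = model_id.split("-")
    params = {}
    for token in tokens:
        # "G0" is tried first so that "G064" becomes G0=64 while "G64" becomes G=64.
        match = re.fullmatch(r"(G0|[A-Za-z])(\d+)", token)
        if match is None:
            raise ValueError(f"unexpected token {token!r} in {filename!r}")
        params[match.group(1)] = int(match.group(2))
    return arch, params


print(parse_isr_weights_name("rdn-C6-D20-G64-G064-x2_PSNR_epoch086.hdf5"))
# → ('rdn', {'C': 6, 'D': 20, 'G': 64, 'G0': 64, 'x': 2})
```

According to the ISR README, a dictionary of this shape is what is passed as `arch_params` when constructing the model before loading the weights, so the parsed result can help keep the two in sync.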

In [14]:
!curl -o weights/rdn-C6-D20-G64-G064-x2_PSNR_epoch086.hdf5 https://public-asai-dl-models.s3.eu-central-1.amazonaws.com/ISR/rdn-C6-D20-G64-G064-x2/PSNR-driven/rdn-C6-D20-G64-G064-x2_PSNR_epoch086.hdf5
!curl -o weights/rdn-C3-D10-G64-G064-x2_PSNR_epoch134.hdf5 https://public-asai-dl-models.s3.eu-central-1.amazonaws.com/ISR/rdn-C3-D10-G64-G064-x2/PSNR-driven/rdn-C3-D10-G64-G064-x2_PSNR_epoch134.hdf5
!curl -o weights/rdn-C6-D20-G64-G064-x2_ArtefactCancelling_epoch219.hdf5 https://public-asai-dl-models.s3.eu-central-1.amazonaws.com/ISR/rdn-C6-D20-G64-G064-x2/ArtefactCancelling/rdn-C6-D20-G64-G064-x2_ArtefactCancelling_epoch219.hdf5
!curl -o weights/rrdn-C4-D3-G32-G032-T10-x4_epoch299.hdf5 https://public-asai-dl-models.s3.eu-central-1.amazonaws.com/ISR/rrdn-C4-D3-G32-G032-T10-x4-GANS/rrdn-C4-D3-G32-G032-T10-x4_epoch299.hdf5

--2022-03-27 18:32:50--  https://public-asai-dl-models.s3.eu-central-1.amazonaws.com/ISR/rdn-C3-D10-G64-G064-x2/PSNR-driven/rdn-C3-D10-G64-G064-x2_PSNR_epoch134.hdf5
Resolving public-asai-dl-models.s3.eu-central-1.amazonaws.com (public-asai-dl-models.s3.eu-central-1.amazonaws.com)... 52.219.170.46
Connecting to public-asai-dl-models.s3.eu-central-1.amazonaws.com (public-asai-dl-models.s3.eu-central-1.amazonaws.com)|52.219.170.46|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10694096 (10M) [binary/octet-stream]
Saving to: ‘rdn-C3-D10-G64-G064-x2_PSNR_epoch134.hdf5’


2022-03-27 18:32:54 (3.19 MB/s) - ‘rdn-C3-D10-G64-G064-x2_PSNR_epoch134.hdf5’ saved [10694096/10694096]

--2022-03-27 18:32:54--  https://public-asai-dl-models.s3.eu-central-1.amazonaws.com/ISR/rdn-C6-D20-G64-G064-x2/ArtefactCancelling/rdn-C6-D20-G64-G064-x2_ArtefactCancelling_epoch219.hdf5
Resolving public-asai-dl-models.s3.eu-central-1.amazonaws.com (public-asai-dl-models.s3.eu-central-1.amazon