# Robot Face Generator Tool # 
Written by: Jasper Bosschart
## Context ##

This is a tool that generates robot faces using image generation artificial intelligence (AI). <br>
The tool is built around the HuggingFace🤗 [Diffusers](https://huggingface.co/docs/diffusers/index) library. This library can be used to access a wide variety of diffusion AI models available on the HuggingFace🤗 [website](https://huggingface.co/). This tool uses an image generation model called ["stable-diffusion-v1-5"](https://huggingface.co/runwayml/stable-diffusion-v1-5) by runwayml. <br>
The tool is still a work in progress and is subject to change. It is currently built as a Jupyter Notebook in Python, as this allows the file to be run easily from a remote machine. This is necessary because the tool requires a large amount of computational power, more than a typical laptop can provide. Future development might allow for a standalone application.

## How does it work? ##
A diffusion model works as follows: 
It starts with an image completely filled with Gaussian noise: static, basically, like what you would see on a very old TV. From this initial (static) image, the diffusion model starts de-noising. This process is repeated many times until a "new" image starts to form. The noise-reduction patterns that the model uses are guided by a text prompt that you give it at the start. But how does it know how a pattern and a text prompt are related?
This is done through training. A diffusion model is first trained on a large data set of labeled images. Noise is gradually added to these labeled images, and the model is tasked with removing it each time. In this way it learns how to de-noise images, and it learns to link the de-noising patterns it develops to the labels the images carry.
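
The noising used during training can be sketched in a few lines. This is a toy illustration in plain Python, not the code Stable Diffusion itself runs: the "image" is just a short list of pixel values, and the linear noise schedule (`beta_start`, `beta_end`, `num_steps`) and the helper names `make_alpha_bars` and `add_noise` are assumptions made for the example.

```python
import math
import random


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after each step (linear schedule)."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, t, alpha_bars, rng):
    """Blend clean pixels with Gaussian noise for timestep t."""
    keep = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [keep * p + noise_scale * n for p, n in zip(pixels, noise)]
    return noisy, noise


rng = random.Random(0)
alpha_bars = make_alpha_bars()
clean = [0.5, -0.25, 1.0, 0.0]

# Early timestep: the pixels are barely disturbed.
slightly_noisy, _ = add_noise(clean, 10, alpha_bars, rng)
# Last timestep: almost nothing of the original is left.
mostly_noise, target_noise = add_noise(clean, 999, alpha_bars, rng)
```

During training the model would be shown the noisy pixels, the timestep, and the label, and asked to predict `target_noise`. Generation runs the idea in reverse, starting from pure noise.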

## How To Generate? ##

### Step 1: ###

A Jupyter Notebook is basically a Python file in which you can add fully formatted text around and in between snippets of code. It might look quite confusing and a little scary at first, but don't worry: you will be guided through the whole process as long as you keep reading the accompanying text for each code snippet. As mentioned before, we will use a library called Diffusers to load our image generation AI. For Diffusers and our AI to work well, we need some other libraries too. To download all the libraries the tool needs, run the code snippet below:

In [None]:
!pip install xformers
!pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
!pip install -U diffusers 
!pip install -U accelerate 
!pip install -U transformers

### Step 2: ###
Now that we have downloaded the necessary libraries for the tool, we need to import them into our current session, which is not done automatically. <br>
Some extra code is added to this snippet so that we can create a grid of images later in the file.

In [None]:
import torch
import accelerate 
import transformers
from diffusers import StableDiffusionPipeline

from PIL import Image
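def image_grid(imgs, rows, cols):
    # Every cell of the grid must be filled, so the counts have to match.
    assert len(imgs) == rows*cols

    # All images are assumed to share the size of the first one.
    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols*w, rows*h))

    # Paste row by row: column = i % cols, row = i // cols.
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i%cols*w, i//cols*h))
    return grid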
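
To see where `image_grid` places each image, the paste arithmetic can be run on its own. The helper `paste_boxes` below is hypothetical and only mirrors the `box=` expression from the snippet above, so it needs no image library.

```python
def paste_boxes(count, cols, w, h):
    """Top-left corner for each of `count` tiles, laid out row by row."""
    return [(i % cols * w, i // cols * h) for i in range(count)]


# Four 512x512 images in a grid with two columns
boxes = paste_boxes(count=4, cols=2, w=512, h=512)
print(boxes)  # [(0, 0), (512, 0), (0, 512), (512, 512)]
```

Because of the assertion in `image_grid`, `rows*cols` has to equal the number of images exactly; a grid with empty cells is rejected.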

### Step 3: ###
Now it is time to download the image generation model. This can take some time, so don't worry if it does not finish immediately. As you can see in the code snippet above, we specifically imported *StableDiffusionPipeline* from *diffusers*. This class does most of the hard work for us: it generates the initial noise image, sets up the timesteps for the de-noising iterations, and so forth. <br>
To make sure the model runs on a graphics card, *.to("cuda")* is added; without it, generating an image can take around 30 minutes instead of 30 seconds.
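
If you are not sure whether a graphics card is available, you can check before loading the model. The sketch below is an assumption about how one might do this, not part of the original tool: the helper `pick_device_and_dtype` is hypothetical, and falling back to `float32` on the CPU is a cautious choice, since half precision is mainly intended for GPUs.

```python
def pick_device_and_dtype():
    """Return a device name and a dtype name suited to the available hardware."""
    try:
        import torch
    except ImportError:
        # torch is not installed, so only the CPU defaults make sense
        return "cpu", "float32"
    if torch.cuda.is_available():
        return "cuda", "float16"
    return "cpu", "float32"


device, dtype_name = pick_device_and_dtype()
print(f"Running on {device} with {dtype_name}")
```

With torch imported, `getattr(torch, dtype_name)` gives the value to pass as `torch_dtype`, and `device` can be passed to `.to(...)`. If this prints `cpu`, expect generation to be very slow.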

In [None]:
#torch.backends.cuda.matmul.allow_tf32 = False
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")
#pipe.enable_sequential_cpu_offload()
#generator = torch.Generator("cuda").manual_seed(1024)

### Step 4: ###
In this step it is time to let your creativity roam free: you need to write a text prompt describing what you want to generate. As you can see, there are two prompts to fill in: a normal *prompt* and a *neg_prompt*, or negative prompt. The normal prompt lets you describe what you want from the diffusion model, while the negative prompt lets you describe what you don't want. 

You want an image of a boat, but no people on the boat? <br>
prompt="photograph of a boat" <br>
neg_prompt="people"



Some possible prompts you could use to increase the quality of your images and limit the negative aspects: <br>
prompt = "robot portrait, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, 8k" <br>
neg_prompt = "human features, ugly, beginner, amateur, painting, drawing, disfigured, uncanny valley"
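
Long prompts like these are easier to tweak when the keywords are kept in a list. A minimal sketch, assuming a hypothetical helper `build_prompt` that is not part of Diffusers; it only joins a subject and keywords into a comma-separated string:

```python
def build_prompt(subject, modifiers=()):
    """Join a subject and optional keywords into one comma-separated prompt."""
    parts = [subject, *modifiers]
    # Drop empty entries and stray whitespace so the prompt stays clean
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt("robot portrait", ["highly detailed", "sharp focus", "8k"])
neg_prompt = build_prompt("human features", ["ugly", "disfigured"])
print(prompt)      # robot portrait, highly detailed, sharp focus, 8k
print(neg_prompt)  # human features, ugly, disfigured
```

Adding or removing a keyword then means editing one list entry instead of retyping the whole string.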

In [None]:
prompt = "robot portrait, realistic"
neg_prompt = "painting"


### Step 5: ###
You are almost there. You just need to run the following code snippet and wait approximately 30 seconds. Don't sweat it if it takes a little longer; the server might be busy. 
The code snippet currently creates 2 images (a computational limitation) by running the text prompts from step 4 through the pipeline. The results are stored in *images* and displayed as a *grid*.

If the generation takes longer than 5 minutes, ask for help: you might be running the program without a graphics card.

In [None]:
num_images = 2
Multi_prompt = [prompt] * num_images
Multi_prompt_N = [neg_prompt] * num_images

images = pipe(prompt=Multi_prompt, negative_prompt=Multi_prompt_N).images

grid = image_grid(images, rows=1, cols=2)
display(grid)
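
If you want more than two images, one option is to generate them in several small batches instead of one large one. The sketch below only does the bookkeeping; `batch_sizes` and `grid_shape` are hypothetical helpers, and the limit of 2 images per batch is taken from the computational limitation mentioned above.

```python
import math


def batch_sizes(total, max_batch=2):
    """Split `total` images into batches no larger than `max_batch`."""
    if total < 0 or max_batch < 1:
        raise ValueError("total must be >= 0 and max_batch >= 1")
    full, rest = divmod(total, max_batch)
    return [max_batch] * full + ([rest] if rest else [])


def grid_shape(count, cols=2):
    """Rows and columns of a grid with room for `count` images."""
    return math.ceil(count / cols), cols


print(batch_sizes(5))  # [2, 2, 1]
print(grid_shape(6))   # (3, 2)
```

Each batch size could be passed to the pipeline, and the resulting images collected in one list. Keep in mind that `image_grid` requires `rows*cols` to equal the number of images, so pick a total that fills the grid completely.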

In [None]:
grid = image_grid(images, rows=1, cols=2)
display(grid)

In [None]:
grid.save("robot1.png")

In [None]:
images = pipe(prompt=prompt, negative_prompt=neg_prompt, num_images_per_prompt=2).images

# rows*cols must equal the number of generated images
grid = image_grid(images, rows=1, cols=len(images))
display(grid)

In [None]:
display(grid)

In [None]:
# Show a single image; valid indices run from 0 to len(images) - 1
display(images[-1])

## Sources ##
For this project the following sources have been used:
-  https://huggingface.co/blog/stable_diffusion#how-does-stable-diffusion-work
-  https://huggingface.co/runwayml/stable-diffusion-v1-5
-  https://huggingface.co/docs/diffusers/using-diffusers/write_own_pipeline
-  https://huggingface.co/docs/diffusers/v0.16.0/en/optimization/fp16

@InProceedings{Rombach_2022_CVPR, <br>
    author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn}, <br>
    title     = {High-Resolution Image Synthesis With Latent Diffusion Models}, <br>
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, <br>
    month     = {June}, <br>
    year      = {2022}, <br>
    pages     = {10684-10695} <br>
} <br>

In [None]:
torch.cuda.empty_cache()

In [None]:
!nvidia-smi
!nvcc --version

In [None]:
%cd ~/.local/lib/python3.8/site-packages
!rm -rf workspaces/
!mkdir workspaces