This notebook generates the image dataset and extracts features from it.

# Parse and get captions from VizWiz

In [None]:
!nvidia-smi -L

GPU 0: Tesla T4 (UUID: GPU-f2df5755-48ba-dc99-b18d-16c1dd176e2e)


In [None]:
from google.colab import drive
import os
import json 
from tqdm import tqdm
drive.mount('/gdrive', force_remount=True)

Mounted at /gdrive


In [None]:
annotations_folder = "/gdrive/MyDrive/Final Project/Data/Annotations"
images_folder = "/gdrive/MyDrive/Final Project/Data/Generated Images"

The annotations are JSON objects that contain the captions along with some other metadata.
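
As a rough illustration of the structure the later cells rely on, the sketch below writes a tiny annotation file and reads it back. Only the `annotations` list and its `image_id` and `caption` fields are taken from the code in this notebook; the sample values and the temporary folder are made up for the example, and the real files contain additional metadata.

```python
import json
import os
import tempfile

# Toy annotation file: the values are invented; only the field names
# ("annotations", "image_id", "caption") match what this notebook reads.
toy = {
    "annotations": [
        {"image_id": 0, "caption": "A red mug on a desk."},
        {"image_id": 0, "caption": "A coffee cup next to a keyboard."},
        {"image_id": 1, "caption": "A street sign at night."},
    ]
}

with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "train.json"), "w") as f:
        json.dump(toy, f)

    # Load every file in the folder, keyed by its name without the extension
    loaded = {}
    for file_name in os.listdir(folder):
        split = os.path.splitext(file_name)[0]  # "train.json" -> "train"
        with open(os.path.join(folder, file_name)) as f:
            loaded[split] = json.load(f)

print(sorted(loaded))                        # ['train']
print(len(loaded["train"]["annotations"]))   # 3
```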

In [None]:
files = os.listdir(annotations_folder)
files

['train.json', 'val.json']

Load the JSON objects into a dictionary for both the train and validation splits.

In [None]:
annotations = {}
for file_name in files:
  # Key each split by its file name without the ".json" extension
  with open(os.path.join(annotations_folder, file_name)) as f:
    annotations[file_name.replace(".json", "")] = json.load(f)

Extract the captions. We store them in the format:

```
{
  train : { <image_id> : <caption>, ... }
  val   : { <image_id> : <caption>, ... }
}
```

An image can have several captions; we keep only the first one for each image. We keep the ids with the captions so that it is easier to correlate the generated images with their corresponding captions.
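
The grouping step can be sketched on toy data as follows. This is a minimal illustration rather than the notebook's exact code: the helper name `first_caption_per_image` and the sample captions are hypothetical, while the `image_id` and `caption` fields and the choice to keep the first caption per image follow the cell that comes next.

```python
from collections import defaultdict

def first_caption_per_image(annotation_list):
    """Group captions by image id and keep the first caption of each image."""
    grouped = defaultdict(list)
    for item in annotation_list:
        grouped[item["image_id"]].append(item["caption"])
    return {image_id: caps[0] for image_id, caps in grouped.items()}

# Invented sample data: image 0 has two captions, image 1 has one
toy_annotations = [
    {"image_id": 0, "caption": "A red mug on a desk."},
    {"image_id": 0, "caption": "A coffee cup next to a keyboard."},
    {"image_id": 1, "caption": "A street sign at night."},
]

result = first_caption_per_image(toy_annotations)
print(result)
# {0: 'A red mug on a desk.', 1: 'A street sign at night.'}
```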

In [None]:
captions = {}
for key, val in annotations.items():
  # Group all captions by the image they describe
  id_map = {}
  for x in val['annotations']:
    id_map.setdefault(x['image_id'], []).append(x['caption'])

  # Keep only the first caption of each image
  captions[key] = {image_id: captions_list[0] for image_id, captions_list in id_map.items()}
  print(f"Loaded {len(captions[key])} captions for {key}. Sample : {list(captions[key].values())[0]}")

Loaded 23431 captions for train. Sample : ITS IS A BASIL LEAVES CONTAINER ITS CONTAINS THE NET WEIGHT TOO.
Loaded 7750 captions for val. Sample : A computer screen shows a repair prompt on the screen.


# Setup stable diffusion

This section sets up Stable Diffusion. It downloads the model so that we can generate an image by passing a prompt to it.

In [None]:
!pip install diffusers==0.4.0
!pip install transformers scipy ftfy
!pip install "ipywidgets>=7,<8"

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting diffusers==0.4.0
  Downloading diffusers-0.4.0-py3-none-any.whl (229 kB)
Collecting huggingface-hub>=0.10.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
Installing collected packages: huggingface-hub, diffusers
Successfully installed diffusers-0.4.0 huggingface-hub-0.11.1
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
Collecting ftfy
  Downloading ftfy-6.1.1-py3-none-any.whl (53 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64

You also need to accept the model license before downloading or using the weights. In this notebook we use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree.

You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).

Because Google Colab has disabled external widgets, we need to enable them explicitly. Run the following cell to be able to use `notebook_login`.

In [None]:
from google.colab import output
output.enable_custom_widget_manager()

Now you can log in with your user token.

In [None]:
from huggingface_hub import notebook_login

notebook_login()

Token is valid.
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /root/.huggingface/token
Login successful


### Stable Diffusion Pipeline

`StableDiffusionPipeline` is an end-to-end inference pipeline that you can use to generate images from text with just a few lines of code.

First, we load the pre-trained weights of all components of the model.

In addition to the model id [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4), we're also passing a specific `revision` and `torch_dtype` to the `from_pretrained` method.
Make sure you have successfully logged in, so that it can be verified that you have accepted the model's license.

We want to ensure that every free Google Colab instance can run Stable Diffusion, so we load the weights from the half-precision branch [`fp16`](https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/fp16) and tell `diffusers` to expect the weights in float16 precision by passing `torch_dtype=torch.float16`.

If you want the highest possible precision, remove `revision="fp16"` and `torch_dtype=torch.float16`, at the cost of higher memory usage.
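
To get a feel for the memory trade-off, the back-of-the-envelope sketch below compares the storage needed for the weights alone at 4 bytes (float32) and 2 bytes (float16) per parameter. The parameter count is a made-up round number used only for illustration, not the actual size of the model, and activations and other runtime memory are ignored.

```python
def weights_size_gib(num_params, bytes_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30

# Hypothetical parameter count, chosen only to make the arithmetic easy to follow
HYPOTHETICAL_PARAMS = 1_000_000_000

fp32 = weights_size_gib(HYPOTHETICAL_PARAMS, 4)  # float32: 4 bytes per parameter
fp16 = weights_size_gib(HYPOTHETICAL_PARAMS, 2)  # float16: 2 bytes per parameter

print(f"float32: {fp32:.2f} GiB, float16: {fp16:.2f} GiB")
# float32: 3.73 GiB, float16: 1.86 GiB
```

Whatever the real parameter count is, half precision halves the space taken by the weights.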

In [None]:
import torch
from diffusers import StableDiffusionPipeline

# make sure you're logged in (via `notebook_login()` above or `huggingface-cli login`)
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16)  

Next, let's move the pipeline to the GPU for faster inference.

In [None]:
pipe = pipe.to("cuda")

# Use the extracted captions to generate the images

Set up a seeded random generator so that results can be reproduced.

In [None]:
generator = torch.Generator("cuda").manual_seed(1024)
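
One caveat: a single generator advances its internal state with every call, so the image produced for a given caption depends on how many images were generated before it in the same run. If per-image reproducibility matters (for example after resuming an interrupted run), one option is to derive a separate seed for each image. The sketch below is an assumption-laden illustration, not part of the original notebook: the helper `seed_for` and the hashing scheme are hypothetical.

```python
import hashlib

def seed_for(image_id, base_seed=1024):
    """Derive a stable, image-specific seed from a base seed and an image id (hypothetical helper)."""
    digest = hashlib.sha256(f"{base_seed}:{image_id}".encode("utf-8")).digest()
    # Use the first 8 bytes and keep the value inside the signed 64-bit range
    return int.from_bytes(digest[:8], "big") % (2**63)

# The same id always maps to the same seed; different ids get different seeds
print(seed_for(42) == seed_for(42))
print(seed_for(42) != seed_for(43))

# Possible use inside the generation function (not executed here):
#   generator = torch.Generator("cuda").manual_seed(seed_for(image_id))
```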

Let's create a function that takes a file name, a prompt, and a target folder, generates an image from the prompt, and saves it.

In [None]:
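def generate_and_save_image(file_name, prompt, folder):
  """Generate one image for `prompt` and save it as `folder/file_name`."""
  print("Generating file", file_name)
  # A low step count (15) trades some image quality for speed
  image = pipe(prompt, num_inference_steps=15, generator=generator).images[0]
  image.save(os.path.join(folder, file_name))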

In [None]:
captions.keys()

In [None]:
from IPython.display import clear_output
for data_type, id_caption_pairs in captions.items():
  print("Running for ", data_type)
  path = os.path.join(images_folder, data_type)  # One sub-folder per split (train or val)
  os.makedirs(path, exist_ok=True)  # os.listdir would fail if the folder did not exist yet
  generated_images = set(os.listdir(path))  # Images from earlier runs are skipped below
  for image_id, prompt in id_caption_pairs.items():
    file_name = f"{image_id}.jpg"
    if file_name not in generated_images:
      generate_and_save_image(file_name, prompt, path)
      clear_output(wait=True)  # `wait` is a boolean: clear only once new output arrives
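
The loop above is resumable: files that already exist in the target folder are skipped, so an interrupted Colab session can pick up where it left off. The sketch below isolates that logic so it can be tried without a GPU. The helper `generate_missing` and the stand-in `fake_generator` are hypothetical names introduced for this illustration; the real notebook calls the Stable Diffusion pipeline instead.

```python
import os
import tempfile

def generate_missing(id_to_prompt, folder, make_file):
    """Create `<id>.jpg` for every id that has no file in `folder` yet."""
    os.makedirs(folder, exist_ok=True)
    existing = set(os.listdir(folder))
    created = []
    for image_id, prompt in id_to_prompt.items():
        file_name = f"{image_id}.jpg"
        if file_name in existing:
            continue  # Already generated in an earlier run
        make_file(os.path.join(folder, file_name), prompt)
        created.append(file_name)
    return created

def fake_generator(path, prompt):
    # Stand-in for the Stable Diffusion call: just writes the prompt as bytes
    with open(path, "wb") as f:
        f.write(prompt.encode("utf-8"))

with tempfile.TemporaryDirectory() as root:
    folder = os.path.join(root, "train")
    prompts = {10: "a cat", 11: "a dog", 12: "a bird"}
    # Simulate an interrupted run that only produced the first image
    first_run = generate_missing({10: "a cat"}, folder, fake_generator)
    # The second run skips 10.jpg and creates only the missing files
    second_run = generate_missing(prompts, folder, fake_generator)

print(first_run)   # ['10.jpg']
print(second_run)  # ['11.jpg', '12.jpg']
```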