**Conditional Image2Image:**

Source code for generating additional images from the original COCO food dataset. This script uses conditional image-to-image generation with the SDXL Turbo pipeline.

Reference: https://huggingface.co/stabilityai/sdxl-turbo

Tweak the parameters as required:

*   The code was run on Google Colab. Datasets are saved under the folder structure specified in the code, so follow that structure. If you use a different folder structure, update the path variables accordingly.

*   To generate multiple images for each input image, set the parameter `num_images_per_prompt` accordingly. Note that the script as written saves only the first generated image per input.

Each generated image is saved in the AugSD folder under the original numeric filename plus 100000000000. For example, the input image '000000112887.jpg' produces the generated image '100000112887.jpg'.
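
The naming rule above can be sketched as a small helper. The names `ID_OFFSET` and `augmented_filename` are illustrative and do not appear in the notebook; the offset value is the one stated above.

```python
import os

# Offset added to the numeric COCO image ID (value taken from the description above)
ID_OFFSET = 100000000000


def augmented_filename(filename: str) -> str:
    """Map an input filename such as '000000112887.jpg' to its generated counterpart."""
    stem, ext = os.path.splitext(filename)
    # int() drops the leading zeros; adding the offset restores a 12-digit name
    return f"{int(stem) + ID_OFFSET}{ext}"


print(augmented_filename("000000112887.jpg"))  # 100000112887.jpg
```

Because the offset only changes the leading digit, the original image ID can be recovered by subtracting it again.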

Author: Maria Mathews






In [None]:
from google.colab import drive
drive.mount('/content/gdrive/')
root_dir = '/content/gdrive/'

Mounted at /content/gdrive/


In [None]:
!unzip /content/gdrive/MyDrive/CV_Project/cake_coco/cake_image_train2017.zip

Archive:  /content/gdrive/MyDrive/CV_Project_Mine/cake_coco/cake_image_train2017.zip
  inflating: cake_image_train2017/000000000092.jpg  
  inflating: cake_image_train2017/000000000113.jpg  
  inflating: cake_image_train2017/000000000127.jpg  
  inflating: cake_image_train2017/000000000428.jpg  
  inflating: cake_image_train2017/000000000735.jpg  
  inflating: cake_image_train2017/000000000790.jpg  
  inflating: cake_image_train2017/000000000982.jpg  
  inflating: cake_image_train2017/000000001180.jpg  
  inflating: cake_image_train2017/000000001261.jpg  
  inflating: cake_image_train2017/000000001290.jpg  
  inflating: cake_image_train2017/000000001424.jpg  
  inflating: cake_image_train2017/000000001522.jpg  
  inflating: cake_image_train2017/000000001667.jpg  
  inflating: cake_image_train2017/000000001688.jpg  
  inflating: cake_image_train2017/000000001813.jpg  
  inflating: cake_image_train2017/000000002411.jpg  
  inflating: cake_image_train2017/000000002525.jpg  
  inflating: c

In [None]:
!nvidia-smi

Wed Apr 24 15:50:59 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   44C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [None]:
!pip install diffusers transformers accelerate
!pip install ipython

Collecting diffusers
  Downloading diffusers-0.27.2-py3-none-any.whl (2.0 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.0/2.0 MB[0m [31m15.1 MB/s[0m eta [36m0:00:00[0m
Collecting accelerate
  Downloading accelerate-0.29.3-py3-none-any.whl (297 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m297.6/297.6 kB[0m [31m18.6 MB/s[0m eta [36m0:00:00[0m
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.10.0->accelerate)
  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch>=1.10.0->accelerate)
  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch>=1.10.0->accelerate)
  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch>=1.10.0->accelerate)
  Using cached nvidia_cudnn_cu

In [None]:
import torch
import os
import json
from diffusers import AutoPipelineForImage2Image, EulerDiscreteScheduler
from diffusers.utils import make_image_grid, load_image
from torchvision import transforms
from PIL import Image
from google.colab import files
import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
import random
import shutil
import cv2

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.


0it [00:00, ?it/s]

In [None]:
#RUN ONLY ONCE. Download NLTK resources
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


True

In [None]:
#Save generated images to drive
init_image_path =  "/content/cake_image_train2017"
# Define the folder for the generated image path
gen_image_path = '/content/gdrive/MyDrive/CV_Project/cake_coco/AugSD'
os.makedirs(gen_image_path, exist_ok=True)

In [None]:
#Fetch Captions saved in the drive
captions_file_path = "/content/gdrive/MyDrive/CV_Project/all_food_captions_train2017.json"
with open(captions_file_path, "r") as f:
    captions_data = json.load(f)
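
The generation loop later looks up the captions of each image by scanning the whole caption list. A minimal sketch of an alternative, assuming each entry is a dict with `image_id` and `caption` keys (as the loop's lookup suggests), is to build the index once. The helper name `index_captions` and the sample entries are hypothetical.

```python
from collections import defaultdict


def index_captions(captions_data):
    """Group caption strings by image_id so each lookup is a single dict access."""
    captions_by_id = defaultdict(list)
    for entry in captions_data:
        captions_by_id[entry["image_id"]].append(entry["caption"])
    return captions_by_id


# Hypothetical sample entries, only to illustrate the expected structure
sample = [
    {"image_id": 92, "caption": "A slice of cake on a plate"},
    {"image_id": 92, "caption": "A dessert next to a fork"},
    {"image_id": 113, "caption": "A birthday cake with candles"},
]

captions_by_id = index_captions(sample)
print(captions_by_id[92])
```

Using `captions_by_id.get(int(image_id), [])` in the loop would also return an empty list for images without captions instead of failing.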

In [None]:
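pipeline = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/sdxl-turbo",
    torch_dtype=torch.float16,
    variant="fp16",
    # The two safety-checker arguments are not expected by the SDXL img2img
    # pipeline and are ignored (see the warning in the output below)
    safety_checker=None,
    requires_safety_checker=False,
    use_safetensors=True)
# enable_model_cpu_offload() moves each sub-model to the GPU only while it runs,
# so the pipeline is not moved to "cuda" manually beforehand
pipeline.enable_model_cpu_offload()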

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


model_index.json:   0%|          | 0.00/685 [00:00<?, ?B/s]

Fetching 18 files:   0%|          | 0/18 [00:00<?, ?it/s]

model.fp16.safetensors:   0%|          | 0.00/246M [00:00<?, ?B/s]

scheduler/scheduler_config.json:   0%|          | 0.00/459 [00:00<?, ?B/s]

text_encoder_2/config.json:   0%|          | 0.00/575 [00:00<?, ?B/s]

tokenizer/special_tokens_map.json:   0%|          | 0.00/586 [00:00<?, ?B/s]

tokenizer/merges.txt:   0%|          | 0.00/525k [00:00<?, ?B/s]

text_encoder/config.json:   0%|          | 0.00/565 [00:00<?, ?B/s]

tokenizer/tokenizer_config.json:   0%|          | 0.00/704 [00:00<?, ?B/s]

tokenizer/vocab.json:   0%|          | 0.00/1.06M [00:00<?, ?B/s]

tokenizer_2/special_tokens_map.json:   0%|          | 0.00/460 [00:00<?, ?B/s]

tokenizer_2/merges.txt:   0%|          | 0.00/525k [00:00<?, ?B/s]

tokenizer_2/tokenizer_config.json:   0%|          | 0.00/855 [00:00<?, ?B/s]

unet/config.json:   0%|          | 0.00/1.78k [00:00<?, ?B/s]

vae/config.json:   0%|          | 0.00/607 [00:00<?, ?B/s]

tokenizer_2/vocab.json:   0%|          | 0.00/1.06M [00:00<?, ?B/s]

model.fp16.safetensors:   0%|          | 0.00/1.39G [00:00<?, ?B/s]

diffusion_pytorch_model.fp16.safetensors:   0%|          | 0.00/167M [00:00<?, ?B/s]

diffusion_pytorch_model.fp16.safetensors:   0%|          | 0.00/5.14G [00:00<?, ?B/s]

Keyword arguments {'safety_checker': None, 'requires_safety_checker': False} are not expected by StableDiffusionXLImg2ImgPipeline and will be ignored.


Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

In [None]:
# Extract the nouns of a caption; these serve as the keywords for the label
def foodKeywordMapping(caption="A cheesy pepperoni pizza sitting on top of a pan"):
    # Tokenize the caption into words
    words = word_tokenize(caption)
    # Perform part-of-speech tagging to identify nouns
    tagged_words = pos_tag(words)
    # Keep all nouns (NN: singular noun, NNS: plural noun, NNPS: plural proper noun).
    # No food-specific filtering is applied, so non-food nouns such as 'top' are kept too.
    food_keywords = [word for word, tag in tagged_words if tag in ['NN', 'NNPS', 'NNS']]
    # Return the keywords in a dictionary
    return {'food_keywords': food_keywords}

print(foodKeywordMapping())

{'food_keywords': ['cheesy', 'pepperoni', 'pizza', 'top', 'pan']}
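
The generation loop merges the keywords of all captions of an image into one sorted, comma-separated label. A minimal standalone sketch of that merging step follows; `build_label` is an illustrative name, and the second keyword list is made up for the example.

```python
def build_label(keyword_lists):
    """Merge several keyword lists into one sorted, de-duplicated label string."""
    unique_keywords = set()
    for keywords in keyword_lists:
        unique_keywords.update(keywords)
    return ", ".join(sorted(unique_keywords))


example_keywords = [
    ["cheesy", "pepperoni", "pizza", "top", "pan"],  # output shown above
    ["pizza", "pan", "table"],                       # hypothetical second caption
]
print(build_label(example_keywords))  # cheesy, pan, pepperoni, pizza, table, top
```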


In [None]:
#Negative prompts to the pipeline
negative_prompt = "bad anatomy, bad hands, three hands, three legs, bad arms, missing legs, missing arms, poorly drawn face, bad face, fused face, cloned face, worst face, three crus, extra crus, fused crus, worst feet, three feet, fused feet, fused thigh, three thigh, fused thigh, extra thigh, worst thigh, missing fingers, extra fingers, ugly fingers, long fingers, horn, extra eyes, huge eyes, 2girl, amputation, disconnected limbs, cartoon, cg, 3d, unreal, animate, bad proportions, Deformed, Mutated, Text, Signature, dull colors, low contrast, disfigured, poor details, Drawing, Bad photography"

#No. of images to be generated
num_images_per_prompt= 1
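
The generation loop requires `num_inference_steps * strength` to be at least 1. A minimal sketch of a pre-flight check, assuming the image-to-image pipeline truncates that product to an integer to get the number of denoising steps actually run (my reading of diffusers; both helper names are hypothetical):

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Denoising steps actually executed, assuming the product is truncated to an int."""
    return min(int(num_inference_steps * strength), num_inference_steps)


def check_turbo_settings(num_inference_steps: int, strength: float) -> int:
    """Raise early if the chosen settings would leave no denoising step to run."""
    steps = effective_steps(num_inference_steps, strength)
    if steps < 1:
        raise ValueError(
            f"num_inference_steps * strength = {num_inference_steps * strength} is below 1"
        )
    return steps


print(check_turbo_settings(2, 1.0))  # settings used in this notebook
print(check_turbo_settings(2, 0.5))
```

Calling the check once before the loop makes an invalid combination fail immediately rather than partway through a long run.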

In [None]:
def image_exists(gen_image_path, gen_image_id):
    # Construct the full path of the image file
    image_path = os.path.join(gen_image_path, f"{gen_image_id}.jpg")
    # Check if the image file exists
    return os.path.exists(image_path)
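
`image_exists` checks the output folder once per input image. As a hedged alternative for resuming long runs, the sketch below lists the output folder once and filters the inputs against that set. `pending_inputs` and `ID_OFFSET` are hypothetical names; the offset is the one used by the naming scheme in this notebook.

```python
import os

ID_OFFSET = 100000000000  # same offset as the generated filenames


def pending_inputs(input_names, existing_names):
    """Return the .jpg inputs whose generated counterpart is not in existing_names."""
    existing = set(existing_names)
    pending = []
    for name in sorted(input_names):
        stem, ext = os.path.splitext(name)
        if ext != ".jpg":
            continue  # ignore anything that is not a .jpg image
        if f"{int(stem) + ID_OFFSET}.jpg" not in existing:
            pending.append(name)
    return pending


# Example with in-memory listings; in the notebook these would come from
# os.listdir(init_image_path) and os.listdir(gen_image_path)
print(pending_inputs(
    ["000000000092.jpg", "000000000113.jpg", "notes.txt"],
    ["100000000092.jpg"],
))
```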

In [None]:
counter = 0
# Iterate over the .jpg images in the input folder
for filename in os.listdir(init_image_path):
    if not filename.endswith(".jpg"):
        continue

    # Extract the image ID (filename without extension)
    image_id = os.path.splitext(filename)[0]

    # Avoid regenerating images. This assumes one generated image per input.
    gen_image_id = int(image_id) + 100000000000
    if image_exists(gen_image_path, gen_image_id):
        print(f"Image {gen_image_id}.jpg already exists in {gen_image_path}.")
        continue  # Skip to the next image

    # Otherwise resume image generation
    counter += 1
    print(f"#{counter} -- {image_id}")

    # Extract the captions for this image ID from captions_data
    captions = [entry["caption"] for entry in captions_data if entry["image_id"] == int(image_id)]
    if not captions:
        print(f"No captions found for image {image_id}; skipping.")
        continue

    # Collect the unique keywords across all captions of this image
    unique_food_keywords = set()
    for caption in captions:
        unique_food_keywords.update(foodKeywordMapping(caption)['food_keywords'])
    # Sorted, comma-separated keyword string (printed for reference only;
    # the prompt passed to the pipeline is one of the captions)
    label = ', '.join(sorted(unique_food_keywords))

    # Print the image ID, its captions, and its label
    print(f"Image ID: {image_id}")
    for i, caption in enumerate(captions, start=1):
        print(f"Caption {i}: {caption}")
    print(f"Label: {label}")

    # Pick one caption at random; random.choice works for any number of captions
    prompt = random.choice(captions)
    print(f"Prompt: {prompt}")

    # load_image opens the file and converts it to RGB (some inputs may be grayscale)
    init_image = load_image(os.path.join(init_image_path, filename))

    # When using SDXL-Turbo for image-to-image generation, make sure that
    # num_inference_steps * strength is larger than or equal to 1.
    # Per the SDXL-Turbo model card, guidance_scale=0.0 disables guidance,
    # so negative_prompt has no effect here.
    # Only the first generated image is kept, even if num_images_per_prompt > 1.
    gen_image = pipeline(prompt=prompt, negative_prompt=negative_prompt, strength=1.0, guidance_scale=0.0, num_inference_steps=2, num_images_per_prompt=num_images_per_prompt, image=init_image).images[0]

    # Show the input and generated image side by side
    # (display() is available by default in Colab/Jupyter)
    display(make_image_grid([init_image, gen_image], rows=1, cols=2))

    # Save the generated image
    gen_image_file = os.path.join(gen_image_path, f"{gen_image_id}.jpg")
    gen_image.save(gen_image_file)
    print(f"Saved image {gen_image_id} to: {gen_image_file}")

Image 100000521366.jpg already exists in /content/gdrive/MyDrive/CV_Project_Mine/cake_coco/AugSD.
Image 100000283624.jpg already exists in /content/gdrive/MyDrive/CV_Project_Mine/cake_coco/AugSD.
Image 100000123273.jpg already exists in /content/gdrive/MyDrive/CV_Project_Mine/cake_coco/AugSD.
Image 100000322592.jpg already exists in /content/gdrive/MyDrive/CV_Project_Mine/cake_coco/AugSD.
Image 100000324952.jpg already exists in /content/gdrive/MyDrive/CV_Project_Mine/cake_coco/AugSD.
Image 100000456500.jpg already exists in /content/gdrive/MyDrive/CV_Project_Mine/cake_coco/AugSD.
Image 100000340069.jpg already exists in /content/gdrive/MyDrive/CV_Project_Mine/cake_coco/AugSD.
Image 100000321718.jpg already exists in /content/gdrive/MyDrive/CV_Project_Mine/cake_coco/AugSD.
Image 100000252178.jpg already exists in /content/gdrive/MyDrive/CV_Project_Mine/cake_coco/AugSD.
Image 100000084273.jpg already exists in /content/gdrive/MyDrive/CV_Project_Mine/cake_coco/AugSD.
Image 100000069577.j