# GenAI 

This notebook shows how to use different Generative AI techniques, such as zero-shot learning for object detection and segmentation, as well as prompted inpainting.

Developed for reinvent by: Gonzalo Barbeito, Romil Shah, Andrea Montanari, Derek Graber, Matt Polloc, Junjie Tang, Fabian Benitez-Quiroz

## Prerequisites

- A notebook or instance running on a G5 (GPU) instance type
- A Python environment built from `requirements.txt`, e.g. `conda create -n summit-q4-2023 python=3.10 && conda activate summit-q4-2023 && pip install -r requirements.txt`


![image.png](./assets/0.png)

## Zero-shot foundation models (Object Detection)

Zero-shot learning is a Generative AI technique that allows a model to perform a machine learning task (such as detecting objects of a given class) without having been trained specifically on that class.

Modern zero-shot models for images let you find or segment objects using just a single prompt. For example, [Grounding DINO](https://github.com/IDEA-Research/GroundingDINO) combines a transformer-based detector with a grounded pre-training technique, so it learns the association between the language and vision modalities.
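To make the idea of a language-vision association concrete, here is a minimal sketch of prompt matching. It assumes each image region and each text prompt has already been mapped into a shared embedding space; the vectors, the `match_regions_to_prompts` helper, and the threshold are hand-written for illustration and are not taken from Grounding DINO.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def match_regions_to_prompts(region_embeddings, prompt_embeddings, threshold=0.8):
    """Label each region with its best-matching prompt, or None below the threshold."""
    labels = []
    for region in region_embeddings:
        scores = {name: cosine_similarity(region, vec)
                  for name, vec in prompt_embeddings.items()}
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] >= threshold else None)
    return labels

# Toy embeddings (assumption: a real model learns these vectors during pre-training).
prompt_embeddings = {
    "stop sign": [1.0, 0.0, 0.0],
    "traffic light": [0.0, 1.0, 0.0],
}
region_embeddings = [
    [0.9, 0.1, 0.0],   # close to the "stop sign" prompt
    [0.1, 0.95, 0.1],  # close to the "traffic light" prompt
    [0.1, 0.1, 0.9],   # close to neither prompt
]

labels = match_regions_to_prompts(region_embeddings, prompt_embeddings)
print(labels)  # ['stop sign', 'traffic light', None]
```

No class-specific training happens here: adding a new class only requires a new prompt embedding, which is the property that makes the approach zero-shot.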

### Example | Find objects using Grounding Dino

In the cells below, we can search for street objects such as stop signs and traffic lights just by prompting for them.
We have built a set of functions to abstract this functionality.

1. `GenAiModels` is a holder for an inference pipeline based on [Grounded-Segment-Anything](https://github.com/IDEA-Research/Grounded-Segment-Anything). This pipeline chains the following elements in succession:
    - [Grounding DINO](https://github.com/IDEA-Research/GroundingDINO): an object detection model. It takes a natural-language prompt as input and detects objects for which it has not been explicitly trained (zero-shot learning).
    - [SamProcessor](https://huggingface.co/docs/transformers/main/model_doc/sam#transformers.SamProcessor): an image processor that prepares images for the SAM model.
    - [SAM Model](https://github.com/facebookresearch/segment-anything): a model capable of segmenting objects in an image using bounding boxes or points as inputs.
    - [StableDiffusion](https://github.com/Stability-AI/stablediffusion): a model that generates images from text prompts. For this use case, we use a model capable of inpainting, which replaces a specific part of an image with content that makes sense in context.
2. `Prompts` is an object that holds the prompts used to select and replace parts of the image.
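The chaining described above can be sketched with toy stand-ins. Everything in this block is hypothetical: `detect`, `segment`, and `inpaint` are simplified placeholders that operate on a tiny grid of integers, not the real models, and they only illustrate how a box feeds a mask and a mask feeds the inpainting step.

```python
def detect(image, target):
    """Stand-in detector: bounding box (x0, y0, x1, y1) around pixels equal to `target`."""
    coords = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value == target]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))

def segment(image, box, target):
    """Stand-in segmenter: binary mask of `target` pixels that fall inside the box."""
    x0, y0, x1, y1 = box
    return [[1 if (x0 <= x <= x1 and y0 <= y <= y1 and value == target) else 0
             for x, value in enumerate(row)]
            for y, row in enumerate(image)]

def inpaint(image, mask, fill):
    """Stand-in inpainter: overwrite masked pixels with `fill`, keep the rest."""
    return [[fill if m else value for value, m in zip(row, mask_row)]
            for row, mask_row in zip(image, mask)]

# A 3x4 "image" where the value 7 marks the object we want to remove.
image = [
    [0, 0, 0, 0],
    [0, 7, 7, 0],
    [0, 7, 0, 0],
]

box = detect(image, target=7)          # stage 1: where is the object?
mask = segment(image, box, target=7)   # stage 2: which pixels belong to it?
result = inpaint(image, mask, fill=0)  # stage 3: replace those pixels

print(box)     # (1, 1, 2, 2)
print(result)  # [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

In the real pipeline the detector is driven by a text prompt rather than a pixel value, and the inpainting model generates new content instead of filling with a constant.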


In [None]:
# Install GroundingDINO from source and download its weights if it is not already importable
import subprocess
try:
    from groundingdino.util.inference import load_model
except ImportError:
    # Remove any stale checkout, then clone and install in editable mode
    subprocess.call("rm -rf GroundingDINO", shell=True)
    subprocess.call("git clone https://github.com/IDEA-Research/GroundingDINO.git && cd GroundingDINO/ && pip install -e . -q && cd ..", shell=True)
    # Create the weights directory first: wget -O does not create missing directories
    subprocess.call("mkdir -p weights && wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth -O ./weights/groundingdino_swinb_cogcoor.pth", shell=True)

## Restart the kernel for the Grounding DINO installation to take effect

In [None]:
%load_ext autoreload
%autoreload 2
# Import basic set of classes
import warnings
warnings.filterwarnings('ignore')

from IPython.display import display
from IPython.display import Image as python_image
from PIL import Image
from utils import (
    GenAiModels,
    load_image_func,
    Prompts,
    natural_sort,
    plot_images,    
)
import glob
import os

# Create the GenAImodels class to do object detection and segmentation
genai_models = GenAiModels()


### Import data from the object detection pipeline

In [None]:
# Load the sample image (PIL's Image is already imported in the cell above)
img_url = "./assets/northwest.jpg"
image_tmp = load_image_func(img_url, 512)
image_tmp

In [None]:
image_tmp = load_image_func(img_url, 512)

# define a set of prompts
prompt_object = Prompts(search_prompt="house on the right")
# execute the prompts for object detection
_, image_tmp = genai_models.prompt_segmentation(image_tmp, prompt_object)
image_tmp.convert("RGB")

## Perform segmentation using Segment Anything

Similarly, we can use Grounding DINO in tandem with a Generative AI segmentation model such as [Segment Anything](https://github.com/facebookresearch/segment-anything) to perform detailed segmentation without training for it. Like zero-shot object detection methods, this combination finds the relationship between language and fine-grained visual segmentation. These methods are usually trained on more than 10M images.
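When a detector and a segmenter are chained, their box formats have to agree. The sketch below assumes the detector returns boxes as normalized `(cx, cy, w, h)`, as Grounding DINO's inference helpers do, and that the segmenter expects pixel `(x0, y0, x1, y1)` corners. The helper name `cxcywh_to_xyxy` is ours, and whether `GenAiModels` performs this exact conversion internally is an assumption.

```python
def cxcywh_to_xyxy(box, image_width, image_height):
    """Convert a normalized center-format box to pixel corner coordinates."""
    cx, cy, w, h = box
    x0 = (cx - w / 2) * image_width
    y0 = (cy - h / 2) * image_height
    x1 = (cx + w / 2) * image_width
    y1 = (cy + h / 2) * image_height
    return (x0, y0, x1, y1)

# A box centered in a 512x512 image, a quarter of the width and half of the height.
pixel_box = cxcywh_to_xyxy((0.5, 0.5, 0.25, 0.5), image_width=512, image_height=512)
print(pixel_box)  # (192.0, 128.0, 320.0, 384.0)
```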

In [None]:
image_tmp = load_image_func(img_url, 512)
prompt_object = Prompts(search_prompt="stop sign. traffic lights. house on the right")
image_tmp, _ = genai_models.prompt_segmentation(image_tmp, prompt_object)
image_tmp.convert("RGB")

## Example: Use GenAI to blend a selected area into the background

In [None]:
image_tmp = load_image_func(img_url, 512)
prompt_object = Prompts(search_prompt="stop sign. cable. traffic lights. house on the right.")
image_tmp = genai_models.remove_from_image(image_tmp, prompt_object)
image_tmp.convert("RGB")

### Task: Use your own prompt to remove objects from the image

Use your own prompts to find and remove parts of the image. Some possible prompts are "car" and "house".

In [None]:
image = load_image_func(img_url, 512)
prompt_object = Prompts(search_prompt="car")
image_tmp = genai_models.remove_from_image(image, prompt_object)
image_tmp.convert("RGB")

## Replace an object using Stable Diffusion Inpainting

Traditional inpainting algorithms replace part of an image using the context provided by the rest of the image. Many phones now use this to remove unwanted objects or photobombers from pictures. Generative AI methods push this boundary further by replacing a selected area with an object described in a prompt. For example, you can replace the street with a flooded street.
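The role of the mask in inpainting can be illustrated with a small compositing sketch. This is not how Stable Diffusion generates content, and the `composite` helper is hypothetical. It only shows the contract: pixels where the mask is 1 come from the generated image, pixels where it is 0 are kept from the original, and fractional values blend the two.

```python
def composite(original, generated, mask):
    """Per-pixel blend: mask=1 takes the generated pixel, mask=0 keeps the original."""
    return [
        [m * g + (1 - m) * o for o, g, m in zip(orig_row, gen_row, mask_row)]
        for orig_row, gen_row, mask_row in zip(original, generated, mask)
    ]

original = [[10, 10, 10], [10, 10, 10]]   # e.g. the street as photographed
generated = [[99, 99, 99], [99, 99, 99]]  # e.g. the "flooded street" content
mask = [[0, 1, 1], [0, 0, 1]]             # 1 marks the area selected by the search prompt

blended = composite(original, generated, mask)
print(blended)  # [[10, 99, 99], [10, 10, 99]]
```

A soft mask value such as 0.5 mixes both sources, which is one way to avoid hard seams at the edge of the replaced area.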


In [None]:
image = load_image_func(img_url, 512)
prompt_object = Prompts(search_prompt="street",
                       replace_prompt="flooded street. 4k")
image_tmp = genai_models.search_replace(image, prompt_object, seed=50)
image_tmp.convert("RGB")
plot_images([image, image_tmp])

## Application: Use zero shot learning to create a verification job

Some customers want to ease the annotation burden on a private workforce. For example, Amazon facilities want a way to find the truck, trailer, and license plate in an image. Let's see an example.
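One way such a verification step could work, and this is an assumption since the notebook does not specify the workflow, is to compare the zero-shot box with an existing annotation and send only the disagreements to a human reviewer. The sketch uses intersection over union (IoU); the `needs_review` helper and the 0.5 threshold are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0 = max(box_a[0], box_b[0])
    iy0 = max(box_a[1], box_b[1])
    ix1 = min(box_a[2], box_b[2])
    iy1 = min(box_a[3], box_b[3])
    intersection = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

def needs_review(model_box, human_box, threshold=0.5):
    """Flag the pair for human review when the two boxes overlap too little."""
    return iou(model_box, human_box) < threshold

model_box = (0, 0, 10, 10)   # hypothetical zero-shot detection of the trailer
human_box = (5, 0, 15, 10)   # hypothetical existing annotation

print(round(iou(model_box, human_box), 3))  # 0.333
print(needs_review(model_box, human_box))   # True
```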

In [None]:
img_url = "./assets/truck2.webp"

image_tmp = load_image_func(img_url, 1024)

# define a set of prompts
prompt_1 = "truck. car license plates. trailer"
prompt_2 = "trailer"
prompt_3 = "back part of a truck"
prompt_4 = prompt_3 + ".car license plates"
prompt_object = Prompts(search_prompt=prompt_2)
# execute the prompts for object detection
_, image_tmp = genai_models.prompt_segmentation(image_tmp, prompt_object)
image_tmp.convert("RGB")

## Prompted Segmentation in a group of images

We can scale the concept of segmentation to a video by applying the same segmentation prompt across a set of frames. Let's use the images extracted from the rosbag.
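Frame order matters when a folder of images is treated as a video. The `natural_sort` helper used in the next cell comes from `utils` and its implementation is not shown here; the sketch below is one common way to write such a function, under the assumption that frames are numbered without zero padding.

```python
import re

def natural_sort_sketch(paths):
    """Sort paths so that numeric chunks compare as integers (frame2 before frame10)."""
    def key(path):
        # re.split with a capturing group alternates text and digit chunks.
        return [int(chunk) if chunk.isdigit() else chunk.lower()
                for chunk in re.split(r"(\d+)", path)]
    return sorted(paths, key=key)

frames = ["images/frame10.png", "images/frame2.png", "images/frame1.png"]
ordered = natural_sort_sketch(frames)

print(sorted(frames))  # ['images/frame1.png', 'images/frame10.png', 'images/frame2.png']
print(ordered)         # ['images/frame1.png', 'images/frame2.png', 'images/frame10.png']
```

A plain lexicographic sort would place frame 10 before frame 2, which would scramble the resulting animation.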

In [None]:
images_list = natural_sort(glob.glob('images/*.png'))
search_prompt="vehicle lane"
prompt_object = Prompts(search_prompt=search_prompt)
generated_list = genai_models.prompt_segmentation_list(images_list[:20], prompt_object)
generated_list[0].save("out.gif", save_all=True, append_images=generated_list[1:], duration=200, loop=0)
display(python_image('out.gif'))

## Replace area with a new texture

In the example below, we select "vehicle lane" and replace it across all frames with a "flooded street". We can also reconstruct the video by stitching the generated frames together.

In [None]:
images_list = natural_sort(glob.glob('images/*.png'))

search_prompt = "vehicle lane"
replace_prompt = "flooded street. 4k."
prompt_object = Prompts(search_prompt=search_prompt,
                        replace_prompt=replace_prompt,
                       negative_prompt="text. low quality. car")

generated_list_1 = genai_models.search_replace_list(images_list[:10], prompt_object, temporal_smoothing=False)

# Stitch the generated frames into an animated GIF
generated_list_1[0].save("out.gif", save_all=True, append_images=generated_list_1[1:], duration=200, loop=0)
display(python_image('out.gif'))

## Generate the same video, this time using temporal information

We can also use temporal information to produce a smooth transition across consecutive frames. In the example below, we replace the vehicle lane with a "snowy street". This smoothing relies on optical flow, the estimated motion of each pixel between consecutive frames.

Algorithms for optical flow need to learn how to:
  1. Find correspondence between points.
  2. Compute the relative offsets between points.
  3. Predict flow across large regions of space, even to parts of the image that lack texture for correspondence.
  4. Generalize to real data, which means working for objects and textures that were not seen in the training data.
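To show how flow can be used for smoothing, here is a minimal sketch under stated assumptions: the flow is given as integer `(dx, dy)` offsets per pixel, the previous output frame is warped along that flow, and the result is blended with the current frame. The helpers `warp_previous` and `smooth` are hypothetical; the internals of `search_replace_list` with `temporal_smoothing=True` are not shown in this notebook and may differ.

```python
def warp_previous(previous, flow, fallback):
    """Move pixels of `previous` along the flow so they line up with the current frame.

    flow[y][x] = (dx, dy) means the pixel now at (x, y) was at (x - dx, y - dy).
    Pixels whose source falls outside the image are taken from `fallback`.
    """
    height, width = len(previous), len(previous[0])
    warped = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = flow[y][x]
            sx, sy = x - dx, y - dy
            if 0 <= sx < width and 0 <= sy < height:
                row.append(previous[sy][sx])
            else:
                row.append(fallback[y][x])
        warped.append(row)
    return warped

def smooth(current, warped_previous, alpha=0.5):
    """Blend the current frame with the motion-compensated previous frame."""
    return [[alpha * c + (1 - alpha) * p for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(current, warped_previous)]

previous = [[0, 100, 0, 0]]  # bright pixel at x=1 in the previous generated frame
current = [[0, 0, 80, 0]]    # same object at x=2, generated slightly darker
flow = [[(1, 0)] * 4]        # everything moved one pixel to the right

warped = warp_previous(previous, flow, fallback=current)
smoothed = smooth(current, warped, alpha=0.5)

print(warped)    # [[0, 0, 100, 0]]
print(smoothed)  # [[0.0, 0.0, 90.0, 0.0]]
```

Without the warp, blending the two frames directly would smear the object across x=1 and x=2; compensating for motion first keeps it sharp while reducing frame-to-frame flicker.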

![](./assets/optical-flow1.jpg?modified=12345678) ![](./assets/optical-flow2.jpg?modified=12345678)

In [None]:
images_list = natural_sort(glob.glob('./images/*.png'))

search_prompt="vehicle lane"
replace_prompt = "snowy street, 4k"
prompt_object = Prompts(search_prompt=search_prompt,
                        replace_prompt=replace_prompt,
                        negative_prompt="low quality. cars. car. vehicle. car rear.")

generated_list = genai_models.search_replace_list(images_list[:15], 
                                                  prompt_object, 
                                                  temporal_smoothing=True, 
                                                  fancy_flow=False,
                                                  seed = 18)
generated_list[0].save("out_temp.gif", save_all=True, append_images=generated_list[1:], duration=200, loop=0)


display(python_image('out_temp.gif'))

## Using GenAI to get the optical flow

We can also use [Perceiver IO](https://arxiv.org/pdf/2107.14795.pdf) to estimate the optical flow from one image to the next.

![image](./assets/perceiver_io.jpg?modified=12345678)

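The first of the following cells renders the optical flow between two consecutive frames. The second reuses `search_replace_list` from the previous section, this time with `fancy_flow` set to `True` instead of `False`.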
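A flow field is usually visualized by mapping direction to hue and speed to saturation. The sketch below shows that convention for a single flow vector; whether `optical_flow_pipeline(..., render=True)` uses exactly this mapping is an assumption, and `flow_to_rgb` is a hypothetical helper.

```python
import colorsys
import math

def flow_to_rgb(dx, dy, max_magnitude):
    """Map one flow vector to an RGB triple: hue encodes direction, saturation encodes speed."""
    angle = math.atan2(dy, dx)               # direction in [-pi, pi]
    hue = (angle + math.pi) / (2 * math.pi)  # rescaled to [0, 1]
    magnitude = math.hypot(dx, dy)
    saturation = min(magnitude / max_magnitude, 1.0) if max_magnitude > 0 else 0.0
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))

still = flow_to_rgb(0.0, 0.0, max_magnitude=5.0)
moving_right = flow_to_rgb(5.0, 0.0, max_magnitude=5.0)

print(still)         # (255, 255, 255): no motion renders as white
print(moving_right)  # (0, 255, 255): full-speed motion to the right renders as cyan
```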

In [None]:
idx = 8
frame_1 = Image.open(images_list[idx])
frame_2 = Image.open(images_list[idx+1])

# Render the optical flow between two consecutive frames, both resized to 500x500
rendered_optical_flow = genai_models.optical_flow_pipeline(
    (frame_1.resize((500, 500)), frame_2.resize((500, 500))), render=True
)
Image.fromarray(rendered_optical_flow)

In [None]:
images_list = natural_sort(glob.glob('./images/*.png'))

search_prompt="vehicle lane"
replace_prompt = "snowy street, 4k"
prompt_object = Prompts(search_prompt=search_prompt,
                        replace_prompt=replace_prompt,
                        negative_prompt="low quality. cars. car. vehicle. car rear.")

generated_list = genai_models.search_replace_list(images_list[:15], 
                                                  prompt_object, 
                                                  temporal_smoothing=True, 
                                                  fancy_flow=True,
                                                  seed = 18)
generated_list[0].save("out_temp_fancy.gif", save_all=True, append_images=generated_list[1:], duration=200, loop=0)


display(python_image('out_temp_fancy.gif'))

## Save images as PNG and run the object detection pipeline again

We can save the generated images and trigger the object detection pipeline, as in the first notebook.

In [None]:
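import uuid

# Write the frames to a fresh, uniquely named output directory
new_dir = f"output/{uuid.uuid4()}/"
print(new_dir)
os.makedirs(new_dir, exist_ok=True)

# Save each frame with a zero-padded index so the files sort in frame order
for k1, image in enumerate(generated_list):
    file_name = f"{new_dir}{k1:05d}.png"
    print(file_name)
    image.save(file_name)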