# Lesson 3: Object Detection

<p style="background-color:#fff6e4; padding:15px; border-width:3px; border-color:#f5ecda; border-style:solid; border-radius:6px"> ⏳ <b>Note <code>(Kernel Starting)</code>:</b> This notebook takes about 30 seconds to be ready to use. You may start and watch the video while you wait.</p>


* In this classroom, the libraries have already been installed for you.
* If you would like to run this code on your own machine, install the following first:
    ```
    !pip install -q comet_ml transformers ultralytics torch
    ```

### Set up Comet

In [None]:
import comet_ml

Info about [Comet](https://www.comet.com/site/?utm_source=dlai&utm_medium=course&utm_campaign=prompt_engineering_for_vision_models&utm_content=dlai_L3)

In [None]:
comet_ml.init(anonymous=True, project_name="3: OWL-ViT + SAM")

In [None]:
exp = comet_ml.Experiment()

### Load the image

In [None]:
# To display the image
from PIL import Image

In [None]:
logged_artifact = exp.get_artifact("L3-data", "anmorgan24")

>Note: the images referenced in this notebook have already been uploaded to the Jupyter directory, in this classroom, for your convenience. For further details, please refer to the **Appendix** section located at the end of the lessons.

In [None]:
local_artifact = logged_artifact.download("./")

In [None]:
# Open and display the image
raw_image = Image.open("L3_data/dogs.jpg")
raw_image

### Get bounding boxes with OWL-ViT object detection model

>Note: the `transformers` library, which provides `pipeline`, is already installed for you in this classroom.

In [None]:
from transformers import pipeline

In [None]:
OWL_checkpoint = "./models/google/owlvit-base-patch32"

Info about [google/owlvit-base-patch32](https://huggingface.co/google/owlvit-base-patch32)

* Build the pipeline for the detector model.

In [None]:
# Load the model
detector = pipeline(
    model= OWL_checkpoint,
    task="zero-shot-object-detection"
)

In [None]:
# What you want to identify in the image
text_prompt = "dog"

In [None]:
output = detector(
    raw_image,
    candidate_labels = [text_prompt]
)

In [None]:
# Print the output to identify the bounding boxes detected
output
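The pipeline returns a list of dictionaries, one per detection, each holding a `score`, a `label`, and a `box` with `xmin`, `ymin`, `xmax`, and `ymax` pixel coordinates. The sketch below uses invented sample values (not real model output) and a hypothetical helper, `filter_detections`, to show one way low-confidence detections could be dropped before moving on.

```python
# Hypothetical sample that mimics the structure of the detector output.
# The numbers are invented for illustration only.
sample_output = [
    {"score": 0.62, "label": "dog",
     "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"score": 0.08, "label": "dog",
     "box": {"xmin": 300, "ymin": 40, "xmax": 350, "ymax": 90}},
]


def filter_detections(detections, min_score=0.1):
    """Keep only detections whose score is at least min_score."""
    return [d for d in detections if d["score"] >= min_score]


confident = filter_detections(sample_output, min_score=0.1)
for det in confident:
    box = det["box"]
    print(det["label"], round(det["score"], 2),
          [box["xmin"], box["ymin"], box["xmax"], box["ymax"]])
```

The cut-off of `0.1` is an arbitrary choice for this sketch; a suitable value depends on the image and the prompt.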

* Use the helper functions in `utils` to draw the boxes on top of the image.

>Note: `utils` is an additional file containing helper functions that have already been written for you to use in this classroom. For further details, please refer to the **Appendix** section located at the end of the lessons.

In [None]:
from utils import preprocess_outputs

In [None]:
input_scores, input_labels, input_boxes = preprocess_outputs(output)
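The source of `preprocess_outputs` is not shown in this lesson, so the following is only a guess at what such a helper might do: split the list of detection dictionaries into parallel lists of scores, labels, and `[xmin, ymin, xmax, ymax]` boxes. The function name `preprocess_outputs_sketch` and the sample values are hypothetical, and the extra level of nesting around the boxes is an assumption made to match the `input_boxes[0]` indexing used later in this notebook.

```python
def preprocess_outputs_sketch(detections):
    """Split detector output into scores, labels, and nested boxes.

    Assumption: boxes are returned nested one level deep
    ([[box, box, ...]]), which would explain the `input_boxes[0]`
    indexing used in this lesson.
    """
    scores, labels, boxes = [], [], []
    for det in detections:
        scores.append(det["score"])
        labels.append(det["label"])
        b = det["box"]
        boxes.append([b["xmin"], b["ymin"], b["xmax"], b["ymax"]])
    return scores, labels, [boxes]


# Invented sample data with the same structure as the detector output.
sample = [
    {"score": 0.62, "label": "dog",
     "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"score": 0.41, "label": "dog",
     "box": {"xmin": 150, "ymin": 30, "xmax": 260, "ymax": 210}},
]

sketch_scores, sketch_labels, sketch_boxes = preprocess_outputs_sketch(sample)
print(sketch_boxes[0])
```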

In [None]:
from utils import show_boxes_and_labels_on_image

In [None]:
# Show the image with the bounding boxes
show_boxes_and_labels_on_image(
    raw_image,
    input_boxes[0],
    input_labels,
    input_scores
)
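The implementation of `show_boxes_and_labels_on_image` lives in `utils` and is not reproduced here. As a rough idea of how boxes and captions can be drawn with Pillow alone, the sketch below uses a hypothetical function, `draw_boxes_sketch`, on a blank test image; the colour, line width, and caption format are arbitrary choices and not necessarily what the course helper does.

```python
from PIL import Image, ImageDraw


def draw_boxes_sketch(image, boxes, labels, scores):
    """Return a copy of `image` with each box outlined and captioned."""
    annotated = image.copy()  # leave the input image untouched
    draw = ImageDraw.Draw(annotated)
    for box, label, score in zip(boxes, labels, scores):
        xmin, ymin, xmax, ymax = box
        draw.rectangle([xmin, ymin, xmax, ymax], outline=(255, 0, 0), width=2)
        draw.text((xmin + 4, ymin + 4), f"{label} {score:.2f}", fill=(255, 0, 0))
    return annotated


# Blank white image standing in for a real photo.
blank = Image.new("RGB", (80, 60), (255, 255, 255))
annotated = draw_boxes_sketch(blank, [[5, 5, 60, 50]], ["dog"], [0.9])
```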

### Get segmentation masks using Mobile SAM

In [None]:
# Load the SAM model from the imported ultralytics library
from ultralytics import SAM

In [None]:
SAM_version = "mobile_sam.pt"

Info about [mobile_sam.pt](https://docs.ultralytics.com/models/mobile-sam/)

In [None]:
model = SAM(SAM_version)

* Generate an array using numpy.

In [None]:
import numpy as np

In [None]:
# Create an array of positive labels (1s), one per prediction generated above
labels = np.repeat(1, len(output))

In [None]:
# Print the labels
labels
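In SAM-style point prompts, a label of `1` conventionally marks foreground and `0` marks background, which is why every detection gets a `1` here. The short sketch below, using a made-up detection count, shows that `np.repeat(1, n)` and `np.ones(n, dtype=int)` build the same array, so either form can be used.

```python
import numpy as np

n_detections = 3  # hypothetical number of detected boxes

labels_repeat = np.repeat(1, n_detections)
labels_ones = np.ones(n_detections, dtype=int)

print(labels_repeat)
print(np.array_equal(labels_repeat, labels_ones))
```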

In [None]:
result = model.predict(
    raw_image,
    bboxes=input_boxes[0],
    labels=labels
)

In [None]:
result

In [None]:
masks = result[0].masks.data
masks

In [None]:
from utils import show_masks_on_image

In [None]:
# Visualize the masks
show_masks_on_image(
    raw_image,
    masks
)

>Note: The results you obtain from running this notebook may vary slightly from those shown by the instructor in the video.

### Image Editing: blur out faces

* Load the image.

In [None]:
from PIL import Image

In [None]:
image_path = "L3_data/people.jpg"


In [None]:
raw_image = Image.open(image_path)
raw_image

In [None]:
raw_image.size

* Resize the image.

In [None]:
# Compute the scale factor that brings the width to 600 px while preserving the aspect ratio
mywidth = 600
wpercent = mywidth / float(raw_image.size[0])
wpercent

In [None]:
# Compute the new height by applying the same scale factor to the original height
hsize = int( float(raw_image.size[1]) * wpercent )
hsize

In [None]:
# Resize the image
raw_image = raw_image.resize((mywidth, hsize))
raw_image

In [None]:
raw_image.size
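The resize steps above boil down to one scale factor applied to both dimensions. The sketch below wraps that arithmetic in a hypothetical helper, `size_for_width`, and applies it to made-up image sizes; like the cells above, it truncates the height with `int`.

```python
def size_for_width(original_size, target_width):
    """Return (width, height) scaled so the width equals target_width."""
    orig_w, orig_h = original_size
    scale = target_width / float(orig_w)
    return target_width, int(orig_h * scale)


# A made-up 1200x800 image scaled down to a width of 600.
new_size = size_for_width((1200, 800), 600)
print(new_size)
```

Because the same factor multiplies both sides, the width-to-height ratio of the result stays the same as the original, up to the truncation of the height.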

In [None]:
# Save the resized image
image_path_resized = "people_resized.jpg"
raw_image.save(image_path_resized)

### Detect faces

In [None]:
candidate_labels = ["human face"]

In [None]:
# Define a new Comet experiment for this new pipeline
exp = comet_ml.Experiment()

In [None]:
# Log raw image to the experiment
_ = exp.log_image(
    raw_image,
    name = "Raw image"
)

* Create bounding boxes with OWL-ViT.

In [None]:
# Apply detector model to the raw image
output = detector(
    raw_image,
    candidate_labels=candidate_labels
)

In [None]:
input_scores, input_labels, input_boxes = preprocess_outputs(output)

In [None]:
# Print values of the bounding box coordinates identified
input_boxes

#### Log the images and bounding boxes.

In [None]:
metadata = {
    "OWL prompt": candidate_labels,
    "SAM version": SAM_version,
    "OWL version": OWL_checkpoint
}

In [None]:
from utils import make_bbox_annots

In [None]:
annotations = make_bbox_annots(
    input_scores,
    input_labels,
    input_boxes,
    metadata
)

In [None]:
_ = exp.log_image(
    raw_image,
    annotations= annotations,
    metadata=metadata,
    name= "OWL output"
)

### Segmentation masks using SAM

In [None]:
result = model.predict(
    image_path_resized,
    bboxes=input_boxes[0],
    labels=np.repeat(1, len(input_boxes[0]))
)

### Blur entire image first

In [None]:
from PIL.ImageFilter import GaussianBlur

In [None]:
blurred_img = raw_image.filter(GaussianBlur(radius=5))

In [None]:
blurred_img 

In [None]:
masks = result[0].masks.data.cpu().numpy()

In [None]:
# Create an array of zeroes of the same shape as our image mask
total_mask = np.zeros(masks[0].shape)

In [None]:
# Add each output mask to the total_mask
for mask in masks:
    total_mask = np.add(total_mask,mask)

In [None]:
# Where there is any value other than zero (where any masks exist), show the blurred image
# Else, show the original (unblurred) image
output = np.where(
    np.expand_dims(total_mask != 0, axis=2),
    blurred_img,
    raw_image
)
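The compositing step relies on NumPy broadcasting: the combined mask has shape `(H, W)`, the images have shape `(H, W, 3)`, and adding a trailing axis to the mask lets one boolean per pixel select all three colour channels. The sketch below reproduces the idea on tiny made-up arrays, with constant-valued stand-ins for the original and blurred images.

```python
import numpy as np

# Two hypothetical 2x3 binary masks (1 = pixel belongs to a detected object).
masks = np.array([
    [[1, 0, 0],
     [0, 0, 0]],
    [[0, 0, 0],
     [0, 1, 1]],
], dtype=float)

# Union of all masks: True wherever at least one mask is set.
union = masks.sum(axis=0) != 0

# Stand-ins for the original and blurred RGB images, shape (H, W, 3).
original = np.full((2, 3, 3), 200, dtype=np.uint8)
blurred = np.full((2, 3, 3), 50, dtype=np.uint8)

# Expand the mask to (H, W, 1) so it broadcasts across the colour channels.
composite = np.where(union[..., np.newaxis], blurred, original)

print(composite[..., 0])
```

Summing along the first axis gives the same union as adding the masks one by one in a loop.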

In [None]:
import matplotlib.pyplot as plt

In [None]:
# Display the image with faces blurred
plt.imshow(output)

* Log this image in the **Comet** platform.

In [None]:
metadata = {
    "OWL prompt": candidate_labels,
    "SAM version": SAM_version,
    "OWL version": OWL_checkpoint
}

In [None]:
_ = exp.log_image(
    output,
    name="Blurred masks",
    metadata = metadata,
    annotations=None
)

### Blur just faces of those not wearing sunglasses

In [None]:
# New label
candidate_labels = ["a person without sunglasses"]

* Re-run the pipeline.

In [None]:
exp = comet_ml.Experiment()

In [None]:
_ = exp.log_image(raw_image, name="Raw image")

In [None]:
output = detector(raw_image, candidate_labels=candidate_labels)

In [None]:
input_scores, input_labels, input_boxes = preprocess_outputs(output)

In [None]:
# Print the bounding box coordinates
input_boxes

* Explore what is happening in the **Comet** platform.

In [None]:
from utils import make_bbox_annots

In [None]:
metadata = {
    "OWL prompt": candidate_labels,
    "SAM version": SAM_version,
    "OWL version": OWL_checkpoint,
}

In [None]:
annotations = make_bbox_annots(
    input_scores,
    input_labels,
    input_boxes,
    metadata
)

In [None]:
_ = exp.log_image(
    raw_image,
    annotations=annotations,
    metadata=metadata,
    name="OWL output no sunglasses"
)

In [None]:
result = model.predict(
    image_path_resized,
    bboxes=input_boxes[0],
    labels=np.repeat(1, len(input_boxes[0]))
)

In [None]:
from PIL.ImageFilter import GaussianBlur
blurred_img = raw_image.filter(GaussianBlur(radius=5))

In [None]:
masks = result[0].masks.data.cpu().numpy()

total_mask = np.zeros(masks[0].shape)
for mask in masks:
    total_mask = np.add(total_mask, mask)

In [None]:
# Composite the blurred regions onto the original image and display the result
output = np.where(
    np.expand_dims(total_mask != 0, axis=2),
    blurred_img,
    raw_image
)
plt.imshow(output)

* Analyze results in the **Comet** platform.

In [None]:
metadata = {
    "OWL prompt": candidate_labels,
    "SAM version": SAM_version,
    "OWL version": OWL_checkpoint,
}

In [None]:
_ = exp.log_image(
    output,
    name="Blurred masks no sunglasses",
    metadata=metadata,
    annotations=None
)

### Try it yourself!
Try the image-editing pipeline on the following images.


In [None]:
cafe_img = Image.open("L3_data/cafe.jpg")
cafe_img

In [None]:
crosswalk_img = Image.open("L3_data/crosswalk.jpg")
crosswalk_img

In [None]:
metro_img = Image.open("L3_data/metro.jpg")
metro_img

In [None]:
friends_img = Image.open("L3_data/friends.jpg")
friends_img

### Additional Resources

* For more on how to use [Comet](https://www.comet.com/site/?utm_source=dlai&utm_medium=course&utm_campaign=prompt_engineering_for_vision_models&utm_content=dlai_L3) for experiment tracking, check out this [Quickstart Guide](https://colab.research.google.com/drive/1jj9BgsFApkqnpPMLCHSDH-5MoL_bjvYq?usp=sharing) and the [Comet Docs](https://www.comet.com/docs/v2/?utm_source=dlai&utm_medium=course&utm_campaign=prompt_engineering_for_vision_models&utm_content=dlai_L3).
* This course is based on a set of two blog articles from Comet. Explore them for more on how to use newer versions of Stable Diffusion in this pipeline, additional tricks to improve your inpainting results, and a breakdown of the pipeline architecture:
  * [SAM + Stable Diffusion for Text-to-Image Inpainting](https://www.comet.com/site/blog/sam-stable-diffusion-for-text-to-image-inpainting/?utm_source=dlai&utm_medium=course&utm_campaign=prompt_engineering_for_vision_models&utm_content=dlai_L3)
  * [Image Inpainting for SDXL 1.0 Base Model + Refiner](https://www.comet.com/site/blog/image-inpainting-for-sdxl-1-0-base-refiner/?utm_source=dlai&utm_medium=course&utm_campaign=prompt_engineering_for_vision_models&utm_content=dlai_L3)