## 1. Install Packages

Packages provide pre-defined functions. GroundingDINO (GD for short) needs the packages below to run properly.

In [None]:
%pip install torch 
%pip install torchvision
%pip install requests
%pip install supervision
%pip install transformers
%pip install addict
%pip install yapf
%pip install timm
%pip install numpy
%pip install opencv-python
%pip install cython
%pip install pycocotools

## 2. Cloning GroundingDINO
GD is published online in a *repo* (short for repository), a project folder of code hosted publicly on GitHub. We will clone it onto our machine so we can use it freely.

In this cell, we import the `os` library, which allows us to interact with the operating system. We then use `os.getcwd()` to get the current working directory and store it in a variable named `projectdir`. Finally, we display the value of `projectdir` to verify the current directory.

In [None]:
import os
projectdir = os.getcwd()
projectdir

In this cell, we import the `torch` library for deep learning and the `requests` library for downloading files from the internet. We then clone the GroundingDINO repository from GitHub, which contains the code we'll be using in this workshop.

In [None]:
import torch
import requests

# Clone the repository
os.system("git clone https://github.com/IDEA-Research/GroundingDINO.git")
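
Running the clone cell a second time makes `git clone` fail because the `GroundingDINO` folder already exists, and `os.system` only returns the exit status instead of raising an error. A minimal sketch of a guard is shown below. The helper name `clone_if_missing` and its `run` parameter are hypothetical and not part of GroundingDINO or this workshop; `run` exists only so the command runner can be swapped out.

```python
import os
import subprocess


def clone_if_missing(repo_url, dest_dir, run=subprocess.run):
    """Clone repo_url into dest_dir unless dest_dir already exists.

    Returns True if a clone was started, False if it was skipped.
    """
    if os.path.isdir(dest_dir):
        # The folder is already there, so there is nothing to do
        return False
    # check=True raises CalledProcessError if git exits with an error
    run(["git", "clone", repo_url, dest_dir], check=True)
    return True
```

In the notebook this could be called as `clone_if_missing("https://github.com/IDEA-Research/GroundingDINO.git", os.path.join(projectdir, "GroundingDINO"))`.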

## 3. Download Weight File
When a machine learning model is trained, the information it has learnt is saved in a **weight file**.
Here, we will be downloading an already-existing weight file so that our model knows how to identify objects from the get-go!

In this cell, we create a directory called `weights` to store the weight file for our model. We then change our working directory to `weights` and download the weight file from a specified URL using the `requests` library. If the download is successful, we save the file in the `weights` directory and print a confirmation message. Otherwise, we print an error message.


In [None]:
# Create a directory for weights
weights_dir = os.path.join(projectdir,"weights")
os.makedirs(weights_dir, exist_ok=True)

# Change directory to the weights directory
os.chdir(weights_dir)
# Download the weight file
weight_url = "https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth"
weight_filename = os.path.basename(weight_url)
weight_filepath = os.path.join(weights_dir, weight_filename)

response = requests.get(weight_url)
if response.status_code == 200:
    with open(weight_filepath, 'wb') as f:
        f.write(response.content)
    print("Weight file downloaded successfully.")
else:
    print(f"Failed to download weight file. Status code: {response.status_code}")
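
Model weight files tend to be large, and the cell above downloads the file again every time it runs. One option is to skip the download when a plausible copy is already on disk. The sketch below assumes a hypothetical helper `needs_download` and an arbitrary minimum size; neither comes from the workshop code.

```python
import os


def needs_download(filepath, min_bytes=1):
    """Return True when filepath is missing or smaller than min_bytes."""
    if not os.path.isfile(filepath):
        return True
    # A very small file usually means an earlier download was cut short
    return os.path.getsize(filepath) < min_bytes
```

With this in place, the download code could be wrapped in `if needs_download(weight_filepath):` so that reruns reuse the existing file.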

## 4. Download Sample Images

This cell creates a directory called `data` to store images. We then change our working directory to `data` and download a set of images from specified URLs using the `requests` library. Each image is saved in the `data` directory with its respective filename. A confirmation message is printed for each successful download, and an error message is printed if a download fails.
Note that these images are only being downloaded for demo purposes in the workshop -- you can also add your own images later on!

In [None]:
# Create a directory for data
data_dir = os.path.join(projectdir,"data")
os.makedirs(data_dir, exist_ok=True)

# Change directory to the data directory
os.chdir(data_dir)

# URLs of the images to download
image_urls = {
    "compass.jpg": "https://unsplash.com/photos/xu2WYJek5AI/download?ixid=M3wxMjA3fDB8MXxzZWFyY2h8MTV8fGNvbXBhc3N8ZW58MHx8fHwxNjg5MTc2NzMyfDA&force=true&w=960",
    "air.jpg": "https://unsplash.com/photos/AlA8S9tALAs/download?ixid=M3wxMjA3fDB8MXxzZWFyY2h8MTR8fHBhcmFjaHV0ZXxlbnwwfHx8fDE2ODkwOTU1MTJ8MA&force=true&w=960",
    "ocean.jpg": "https://unsplash.com/photos/1PWhYZ_erME/download?ixid=M3wxMjA3fDB8MXxhbGx8fHx8fHx8fHwxNjg5MDA2MTk5fA&force=true&w=960",
    "snow.jpg": "https://unsplash.com/photos/MB1FuEh0AzU/download?ixid=M3wxMjA3fDB8MXxzZWFyY2h8NHx8c25vd2JvYXJkZXJzfGVufDB8MHx8fDE2ODkwMTk0NTB8MA&force=true&w=960",
    "hardware.jpg": "https://unsplash.com/photos/lllK4-63KTw/download?ixid=M3wxMjA3fDB8MXxzZWFyY2h8Mnx8Ym9sdCUyMGFuZCUyMHdhc2hlcnxlbnwwfHx8fDE2ODkxNzg1NTN8MA&force=true&w=960"
}

# Download each image
for filename, url in image_urls.items():
    response = requests.get(url)
    if response.status_code == 200:
        with open(os.path.join(data_dir, filename), 'wb') as f:
            f.write(response.content)
        print(f"{filename} downloaded successfully.")
    else:
        print(f"Failed to download {filename}. Status code: {response.status_code}")
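
A status code of 200 only says that the server answered; it does not guarantee that the response body is an image. A quick sanity check is to look at the first bytes of the data: JPEG data starts with `FF D8 FF` and PNG data with `89 50 4E 47 0D 0A 1A 0A`. The helper name `looks_like_image` is hypothetical, and this is a rough sketch rather than a full validation.

```python
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def looks_like_image(data):
    """Return True if the bytes begin with a JPEG or PNG signature."""
    return data.startswith(JPEG_MAGIC) or data.startswith(PNG_MAGIC)
```

In the download loop, `looks_like_image(response.content)` could be checked before the file is written.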

## 5. Load Model
In this section, we import the necessary modules and define paths for the GroundingDINO model configuration and weights. We then use the `load_model` function from the `groundingdino.util.inference` module to load the model with the specified configuration and weights. This prepares our model to be used right away!

In [None]:
import os
os.chdir(projectdir)
%cd GroundingDINO

In [None]:
from groundingdino.util.inference import load_model

# Define paths
groundingdino_dir = os.path.join(projectdir, "GroundingDINO")
model_config_path = os.path.join(groundingdino_dir, "groundingdino/config/GroundingDINO_SwinT_OGC.py")
weights_path = os.path.join(projectdir, "weights/groundingdino_swint_ogc.pth")

# Load model
model = load_model(model_config_path, weights_path)



## 6. Play with GroundingDINO🦖

In this cell, we perform object detection using the loaded GroundingDINO model. We define the image name, path, and text prompt for the detection, along with thresholds for boxes and text. We load the image using the `load_image` function and then use the `predict` function to perform object detection based on the specified text prompt. The detected objects are then annotated on the image using the `annotate` function. Finally, we display the annotated image using the `plot_image` function from the `supervision` module. This demonstrates the ability of our model to detect and highlight objects in an image based on textual descriptions.

In [None]:
import supervision as sv
from groundingdino.util.inference import load_image, predict, annotate

# Define constants and paths
IMAGE_NAME = "compass.jpg"
IMAGE_PATH = os.path.join(projectdir, "data", IMAGE_NAME)
TEXT_PROMPT = "compass"
BOX_THRESHOLD = 0.70
TEXT_THRESHOLD = 0.25
DEVICE = "cpu"  # Specify "cpu" as the device

# Load image
image_source, image = load_image(IMAGE_PATH)
print(image.shape)

# Perform object detection
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption=TEXT_PROMPT,
    box_threshold=BOX_THRESHOLD,
    text_threshold=TEXT_THRESHOLD,
    device=DEVICE  # Pass "cpu" as the device
)

# Annotate the image
annotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)

# Display the annotated image
sv.plot_image(annotated_frame, (16, 16))
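
To use the detections outside of `annotate`, it helps to know the box layout. The repository's `annotate` helper treats each box as normalized `(cx, cy, w, h)`: the box centre and size as fractions of the image width and height. Assuming that layout, a plain-Python sketch of the conversion to pixel corner coordinates might look like this (the function name is hypothetical):

```python
def cxcywh_to_xyxy(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to pixel (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    x_min = (cx - w / 2) * image_width
    y_min = (cy - h / 2) * image_height
    x_max = (cx + w / 2) * image_width
    y_max = (cy + h / 2) * image_height
    return (x_min, y_min, x_max, y_max)
```

For one detection this could be called as `cxcywh_to_xyxy(boxes[0].tolist(), w, h)` after `h, w = image_source.shape[:2]`.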



In this cell, we apply the GroundingDINO model for object detection on a different image and text prompt. We define the image name as "hardware.jpg" and the text prompt as "spanner". 

The same process is followed as before: we load the image, perform object detection using the `predict` function with the specified text prompt and thresholds, and then annotate the detected objects on the image. Finally, we display the annotated image to visualize the results of our object detection task. This showcases the versatility of our model in detecting various objects based on textual descriptions.

In [None]:
# Define constants and paths
IMAGE_NAME = "hardware.jpg"
IMAGE_PATH = os.path.join(projectdir, "data", IMAGE_NAME)
TEXT_PROMPT = "spanner"
BOX_THRESHOLD = 0.70
TEXT_THRESHOLD = 0.25
DEVICE = "cpu"  # Specify "cpu" as the device

# Load image
image_source, image = load_image(IMAGE_PATH)

# Perform object detection
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption=TEXT_PROMPT,
    box_threshold=BOX_THRESHOLD,
    text_threshold=TEXT_THRESHOLD,
    device=DEVICE  # Pass "cpu" as the device
)

# Annotate the image
annotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)

# Display the annotated image
sv.plot_image(annotated_frame, (16, 16))
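
The two thresholds play different roles. In the repository's `predict` function, each candidate box carries one score per prompt token: a box is kept when its highest token score exceeds `box_threshold`, and the tokens scoring above `text_threshold` form its label. The pure-Python sketch below mimics that logic on made-up numbers. The function name, tokens, and scores are illustrative assumptions, and the real code works on tensors.

```python
def filter_detections(token_scores, tokens, box_threshold, text_threshold):
    """Keep boxes whose best token score beats box_threshold.

    token_scores holds one list per candidate box, with one score per token.
    Returns a list of (phrase, best_score) pairs for the boxes that survive.
    """
    results = []
    for scores in token_scores:
        best = max(scores)
        if best <= box_threshold:
            continue  # Box dropped: no token matches strongly enough
        # The label is built from every token above text_threshold
        phrase = " ".join(t for t, s in zip(tokens, scores) if s > text_threshold)
        results.append((phrase, best))
    return results


tokens = ["red", "spanner"]
token_scores = [
    [0.30, 0.82],  # Strong match on "spanner", weaker match on "red"
    [0.10, 0.40],  # Weak match everywhere
]
print(filter_detections(token_scores, tokens, box_threshold=0.70, text_threshold=0.25))
```

Lowering `box_threshold` lets more boxes through, while raising `text_threshold` shortens the labels attached to them.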

___________________________________________________________________

# GD + OpenCV: Open Webcam and Snap Frames

In this cell, we expand the application of the GroundingDINO model to perform object detection on frames captured from a webcam. We define constants and paths for capturing and storing frames, along with the text prompt "face" for object detection. The process involves capturing frames at specified intervals, performing object detection on each frame using the `predict` function, annotating the detected objects, and finally displaying the annotated frames. This demonstrates how our model can be applied to real-time object detection in video streams or live camera feeds.


In [None]:
import os
import cv2
import time
import supervision as sv
from groundingdino.util.inference import load_image, predict, annotate

# Constants and paths
IMAGE_FOLDER = "snap"
TEXT_PROMPT = "face"
BOX_THRESHOLD = 0.50
TEXT_THRESHOLD = 0.25
DEVICE = "cpu"  # Specify "cpu" as the device
SNAP_INTERVAL = 2   # Interval to capture frames (in seconds)
NUM_SNAPS = 5     # Total number of frames to capture

# Change directory to the project directory
os.chdir(projectdir)

# Check if the image folder exists and delete its contents if it does
if os.path.exists(IMAGE_FOLDER):
    for filename in os.listdir(IMAGE_FOLDER):
        file_path = os.path.join(IMAGE_FOLDER, filename)
        try:
            if os.path.isfile(file_path):
                os.unlink(file_path)
        except Exception as e:
            print(e)

# Create folder if it doesn't exist
else:
    os.makedirs(IMAGE_FOLDER)

# Function to capture frames from webcam and save them to the folder
def capture_frames(folder, interval, num_snaps):
    cap = cv2.VideoCapture(0)  # 0 for default webcam
    if not cap.isOpened():
        print("Could not open the webcam.")
        return

    time.sleep(2)  # Delay start by 2 seconds

    frame_count = 0
    while frame_count < num_snaps:
        ret, frame = cap.read()
        if not ret:
            break

        cv2.imshow('snap', frame)
        cv2.waitKey(1)  # Give the preview window time to draw the frame
        frame_count += 1

        # Save the frame, then wait `interval` seconds before the next one
        image_name = f"snap_{frame_count}.jpg"
        cv2.imwrite(os.path.join(folder, image_name), frame)

        time.sleep(interval)
    cap.release()  # Free the webcam for other programs
    cv2.destroyAllWindows()


# Function to annotate images in the folder
def annotate_images(folder):
    annotated_images = []
    for filename in os.listdir(folder):
        if filename.endswith(".jpg") or filename.endswith(".jpeg") or filename.endswith(".png"):
            image_path = os.path.join(folder, filename)
            image_source, image = load_image(image_path)

            # Perform object detection
            boxes, logits, phrases = predict(
                model=model,
                image=image,
                caption=TEXT_PROMPT,
                box_threshold=BOX_THRESHOLD,
                text_threshold=TEXT_THRESHOLD,
                device=DEVICE
            )

            # Annotate the image
            annotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
            annotated_images.append((image_path, annotated_frame))

    return annotated_images

# Capture frames from webcam and save them
capture_frames(IMAGE_FOLDER,SNAP_INTERVAL, NUM_SNAPS)

# Annotate images in the folder and replace them
annotated_images = annotate_images(IMAGE_FOLDER)
for image_path, annotated_frame in annotated_images:
    cv2.imwrite(image_path, annotated_frame)

# Display annotated images (the third argument of sv.plot_image is a colormap, not a title)
for image_path, annotated_frame in annotated_images:
    print(f"Annotated Image: {image_path}")
    sv.plot_image(annotated_frame, (16, 16))
    
cv2.destroyAllWindows()
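
`os.listdir` returns names in arbitrary order, so the annotated snaps may not be displayed in the order they were taken. Below is a sketch of one way to list them by snap number, assuming the `snap_<n>.jpg` naming used above. The helper name `list_snaps` is hypothetical.

```python
import os
import re


def list_snaps(folder):
    """Return paths of snap_<n>.jpg files in folder, ordered by n."""
    pattern = re.compile(r"^snap_(\d+)\.jpg$")
    numbered = []
    for filename in os.listdir(folder):
        match = pattern.match(filename)
        if match:
            # Sort on the integer so that snap_10 comes after snap_2
            numbered.append((int(match.group(1)), filename))
    return [os.path.join(folder, name) for _, name in sorted(numbered)]
```

Looping over `list_snaps(IMAGE_FOLDER)` instead of `os.listdir(folder)` would keep the frames in capture order.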


______________________________________________________________________________________________________________________________________

## Now, time to use your own images!
Place your images in the `data` folder, then edit only the values of the variables below to run inference on them!

In [None]:
# Write the names of your image files here
image_names = ["example1.jpg",]
image_prompts = ["prompt for example 1",]  # One prompt per image, in the same order!

Run the cell below and see the results!

In [None]:
DEVICE = "cpu"
BOX_THRESHOLD = 0.70
TEXT_THRESHOLD = 0.25

# Pair each image with its prompt and run detection on one image at a time
for image_name, prompt in zip(image_names, image_prompts):
    path = os.path.join(projectdir, "data", image_name)
    image_source, image = load_image(path)

    boxes, logits, phrases = predict(
        model=model,
        image=image,
        caption=prompt,
        box_threshold=BOX_THRESHOLD,
        text_threshold=TEXT_THRESHOLD,
        device=DEVICE  # Pass "cpu" as the device
    )

    # Annotate the image
    annotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)

    # Display the annotated image
    sv.plot_image(annotated_frame, (16, 16))
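
Before running inference on your own files, it can save time to check the inputs first: the two lists need the same length, and every file needs to exist in the `data` folder. Below is a small sketch of such a check, using a hypothetical helper `check_inputs` that is not part of the workshop code.

```python
import os


def check_inputs(image_names, image_prompts, data_dir):
    """Return a list of problems found; an empty list means the inputs look usable."""
    problems = []
    if len(image_names) != len(image_prompts):
        problems.append(
            f"{len(image_names)} image(s) but {len(image_prompts)} prompt(s)"
        )
    for name in image_names:
        if not os.path.isfile(os.path.join(data_dir, name)):
            problems.append(f"missing file: {name}")
    return problems
```

Calling `check_inputs(image_names, image_prompts, os.path.join(projectdir, "data"))` and printing the result before the inference cell would surface these mistakes early.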