# **THEORETICAL**

1. **Main purpose of RCNN in object detection:**

RCNN (Region-based Convolutional Neural Network) detects objects by generating about 2,000 region proposals per image with selective search, extracting CNN features from each proposal, and classifying those features. Its primary goal is to both locate and classify the objects within an image.

2. **Difference between Fast RCNN and Faster RCNN:**
- Fast RCNN improves upon RCNN by running the CNN once over the whole image and pooling features for each proposal from the shared feature map (RoI pooling), instead of running a forward pass per region. It still relies on selective search for its proposals.
- Faster RCNN replaces selective search with a Region Proposal Network (RPN) that shares the backbone's features, so proposal generation becomes part of the network and runs significantly faster.

3.  **How does YOLO handle object detection in real-time?**

YOLO (You Only Look Once) treats object detection as a single regression problem, predicting bounding boxes and class probabilities simultaneously in a single forward pass, enabling real-time performance.
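To make the "single forward pass" idea concrete, the sketch below uses the configuration of the original YOLO (v1): a 7x7 grid, 2 boxes per cell, and 20 classes. The function names are hypothetical and only illustrate how the output tensor is laid out and which grid cell is responsible for an object.

```python
def output_shape(S=7, B=2, C=20):
    """Shape of the YOLOv1 output: each cell predicts B boxes
    (x, y, w, h, confidence) plus C class probabilities."""
    return (S, S, B * 5 + C)


def responsible_cell(x_center, y_center, img_w, img_h, S=7):
    """Return the (row, col) of the grid cell containing the object's center."""
    col = min(int(x_center / img_w * S), S - 1)
    row = min(int(y_center / img_h * S), S - 1)
    return row, col


print(output_shape())                        # layout of the prediction tensor
print(responsible_cell(320, 240, 640, 480))  # cell for an object centered in the image
```

Because every cell's boxes and class scores come out of the same tensor, no separate proposal stage is needed.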


4. **Explain the concept of Region Proposal Networks (RPN) in Faster RCNN.**

An RPN is a small fully convolutional network that slides over the backbone's feature map and, at each location, evaluates a fixed set of anchor boxes. For every anchor it outputs an objectness score and bounding box offsets; the top-scoring proposals are then classified and refined by the detection head.
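As a rough illustration of the anchors an RPN scores, the sketch below places 9 anchors (3 scales x 3 aspect ratios, the values used in the Faster RCNN paper) at every feature-map location. The function name, the stride of 16, and the choice to center anchors in the middle of each cell are assumptions made for this example.

```python
import math


def generate_anchors(feat_h, feat_w, stride,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) in image pixel coordinates."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map cell in image coordinates (assumed convention)
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; the area stays scale ** 2
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(feat_h=2, feat_w=3, stride=16)
print(len(anchors))  # 2 * 3 locations, 9 anchors each
```

The RPN then predicts one objectness score and four offsets per anchor.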



5. **How does YOLOv9 improve upon its predecessors?**

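YOLOv9 introduces Programmable Gradient Information (PGI), an auxiliary training branch that helps preserve gradient information in deep layers, and the Generalized Efficient Layer Aggregation Network (GELAN), a more parameter-efficient architecture for feature aggregation. Combined with an anchor-free detection head, these changes improve accuracy for a given compute budget.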



6. **What role does non-max suppression play in YOLO object detection?**

Non-Max Suppression (NMS) eliminates redundant bounding boxes: it keeps the box with the highest confidence score, suppresses every other box whose IoU with it exceeds a threshold, and repeats on the remaining boxes, so that each detected object ends up with a single box.
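A minimal pure-Python sketch of greedy NMS follows. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the 0.5 IoU threshold is just an illustrative default; production code would normally use a vectorized routine such as `torchvision.ops.nms`.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept by greedy NMS."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.75]
kept = nms(boxes, scores)
print(kept)
```

The second box overlaps the first almost completely, so only the first and third boxes survive.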

7. **Describe the data preparation process for training YOLOv9.**

- Data collection: Gather diverse labeled images for training.
- Annotation: Use tools like LabelImg to annotate bounding boxes and classes.
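- Normalization: Scale image pixel values (e.g., to the range [0, 1]).
- Data Augmentation: Apply transformations like rotation, scaling, or flipping to enhance dataset diversity, transforming the bounding boxes along with the images.
- Dataset Formatting: Prepare files in YOLO format (one text file per image, with one `class x_center y_center width height` line per object and coordinates normalized to [0, 1]).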
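The formatting step can be sketched as a small conversion from a pixel-space box to one line of a YOLO label file. The function name and the sample numbers are made up for illustration.

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel (x_min, y_min, x_max, y_max) box to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


line = to_yolo_line(0, (100, 50, 300, 250), img_w=640, img_h=480)
print(line)
```

Each annotated object contributes one such line to the text file that shares the image's base name.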

8. **What is the significance of anchor boxes in object detection models like YOLOv9?**

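Anchor boxes are predefined box shapes and sizes, usually chosen to match typical objects in the training set. An anchor-based detector predicts offsets relative to these anchors rather than raw coordinates, which helps it detect objects of various scales and aspect ratios efficiently. Anchor-based YOLO versions such as YOLOv3 work this way, whereas anchor-free heads predict box geometry directly at each feature-map location.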



9. **What is the key difference between YOLO and R-CNN architectures?**

- YOLO: Performs object detection as a single regression task in one pass.
- R-CNN: Uses a two-step approach, first generating region proposals and then classifying them, making it slower.

10. **Why is Faster RCNN considered faster than Fast RCNN?**

Faster RCNN integrates the RPN for generating region proposals, eliminating the need for an external proposal generator like selective search, which reduces computation time.

11. **What is the role of selective search in RCNN?**

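Selective search generates region proposals by over-segmenting the image and then greedily merging neighboring regions that are similar in color, texture, and size. Each proposal is then passed to a CNN for classification. Because it produces about 2,000 proposals per image, it is computationally intensive and a bottleneck in RCNN.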

12. **How does YOLOv9 handle multiple classes in object detection?**

YOLOv9 outputs class probabilities for each bounding box. For every detected bounding box, it predicts the likelihood of belonging to each class, and the highest probability determines the class.



13. **What are the key differences between YOLOv3 and YOLOv9?**

- YOLOv3: Uses Darknet-53 as the backbone and anchor-based detection.
- YOLOv9: Uses the GELAN architecture, is trained with Programmable Gradient Information (PGI), and uses an anchor-free head, giving better accuracy at comparable or lower compute.


14. **How is the loss function calculated in Faster RCNN?**

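Faster RCNN is trained with a multi-task loss, applied to both the RPN and the detection head:

- Classification loss: Cross-entropy (object vs. background in the RPN, per-class in the detection head).
- Regression loss: Smooth L1 loss on the bounding box offsets, computed only for positive (object) samples.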
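The two terms above can be sketched for a single region as follows. This is a simplified illustration, not torchvision's implementation: the function names, the `beta` parameter, and the `reg_weight` balancing factor are assumptions, and class 0 is treated as background.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1: quadratic for small errors, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta
        else:
            total += diff - 0.5 * beta
    return total


def cross_entropy(probs, true_class):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[true_class])


def detection_loss(probs, true_class, pred_offsets, target_offsets, reg_weight=1.0):
    """Classification loss plus box regression loss for one region."""
    cls_loss = cross_entropy(probs, true_class)
    # Background regions (class 0) have no ground-truth box to regress to
    reg_loss = smooth_l1(pred_offsets, target_offsets) if true_class > 0 else 0.0
    return cls_loss + reg_weight * reg_loss


print(smooth_l1([0.5, 2.0], [0.0, 0.0]))
print(detection_loss([0.1, 0.9], 1, [0.5, 2.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]))
```

In training, these per-region losses are averaged over a sampled mini-batch of anchors (RPN) or proposals (detection head).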

15. **Explain how YOLOv9 improves speed compared to earlier versions.**

YOLOv9 relies on the GELAN architecture for more efficient feature aggregation and on an anchor-free head, which reduces computational complexity for a given accuracy. The PGI auxiliary branch is used only during training, so it adds no cost at inference time.

16. **What are some challenges faced in training YOLOv9?**

- Requires large annotated datasets.
- Balancing detection of small and large objects.
- Computational resource constraints for large models.
- Overfitting in small datasets.

18. **What is the significance of fine-tuning in YOLO?**

Fine-tuning allows adapting a pre-trained YOLO model to a specific dataset, improving performance on a particular task while reducing training time and resource requirements.

19. **What is the concept of bounding box regression in Faster RCNN?**

Bounding box regression refines the coordinates of predicted bounding boxes to better align with ground truth by minimizing the difference (via regression loss).
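The refinement is usually expressed as offsets `(tx, ty, tw, th)` relative to an anchor or proposal, as in the RCNN family of papers. The sketch below shows that parameterization in both directions; the function names are hypothetical.

```python
import math


def encode_box(anchor, gt):
    """Regression targets that map an anchor onto a ground-truth box."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    acx, acy = anchor[0] + aw / 2, anchor[1] + ah / 2
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    gcx, gcy = gt[0] + gw / 2, gt[1] + gh / 2
    return ((gcx - acx) / aw, (gcy - acy) / ah, math.log(gw / aw), math.log(gh / ah))


def decode_box(anchor, deltas):
    """Apply predicted offsets to an anchor to get the refined box."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    acx, acy = anchor[0] + aw / 2, anchor[1] + ah / 2
    tx, ty, tw, th = deltas
    cx, cy = acx + tx * aw, acy + ty * ah   # shift the center
    w, h = aw * math.exp(tw), ah * math.exp(th)  # rescale width and height
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


anchor = (0, 0, 100, 100)
ground_truth = (10, 20, 110, 140)
targets = encode_box(anchor, ground_truth)
print(targets)
print(decode_box(anchor, targets))  # recovers the ground-truth box up to rounding
```

The network is trained so that its predicted offsets match `targets`; at inference, `decode_box` turns the predictions back into pixel coordinates.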

20. **Describe how transfer learning is used in YOLO.**

Transfer learning uses a pre-trained YOLO model trained on a large dataset (e.g., COCO) and fine-tunes it on a smaller, task-specific dataset by freezing initial layers and training later layers.



21. **What is the role of the backbone network in object detection models like YOLOv9?**

The backbone network (e.g., GELAN in YOLOv9, which builds on CSPNet and ELAN) extracts features from the input image, which are used by subsequent layers for object detection.

22. **How does YOLO handle overlapping objects?**

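YOLO predicts boxes at many grid locations and scales, so nearby or overlapping objects can each receive their own prediction. Non-Max Suppression (NMS) then removes duplicate boxes for the same object by retaining the bounding box with the highest confidence score and discarding others whose IoU with it exceeds a threshold.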

23. **What is the importance of data augmentation in object detection?**

Data augmentation increases dataset diversity, helps prevent overfitting, and improves model generalization by applying transformations like flipping, rotation, scaling, and color adjustment.
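One detail specific to object detection is that geometric augmentations must transform the boxes together with the image. A minimal sketch for a horizontal flip, assuming the image is a list of pixel rows and boxes are `(x1, y1, x2, y2)` in pixels (both function names are hypothetical):

```python
def hflip_image(image):
    """Flip an image, given as a list of pixel rows, left to right."""
    return [row[::-1] for row in image]


def hflip_boxes(boxes, img_w):
    """Mirror (x1, y1, x2, y2) boxes around the vertical center line."""
    return [(img_w - x2, y1, img_w - x1, y2) for (x1, y1, x2, y2) in boxes]


image = [[1, 2, 3],
         [4, 5, 6]]
boxes = [(10, 20, 50, 80)]

print(hflip_image(image))
print(hflip_boxes(boxes, img_w=100))
```

Flipping twice returns the original boxes, which is a quick sanity check for this kind of transform.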

24. **How is performance evaluated in YOLO-based object detection?**

Performance is measured using:
- Precision and recall.
- Intersection over Union (IoU).
- Mean Average Precision (mAP) for multiple classes.
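The sketch below shows how these metrics relate for one class. It assumes each detection has already been matched against the ground truth at some IoU threshold and marked as a true or false positive; the AP here uses all-point interpolation of the precision-recall curve, which is one common convention (benchmarks such as COCO additionally average over several IoU thresholds). mAP is then the mean of the per-class AP values.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(detections, num_gt):
    """AP for one class. `detections` is a list of (score, is_true_positive)."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    points = []  # (recall, precision) after each detection
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_gt, tp / (tp + fp)))

    ap, prev_recall = 0.0, 0.0
    for i, (recall, _) in enumerate(points):
        # Interpolated precision: best precision at this recall or higher
        best_precision = max(p for _, p in points[i:])
        ap += (recall - prev_recall) * best_precision
        prev_recall = recall
    return ap


detections = [(0.9, True), (0.8, False), (0.7, True)]
ap = average_precision(detections, num_gt=2)
print(precision_recall(tp=2, fp=1, fn=0))
print(ap)
```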

25. **How do the computational requirements of Faster RCNN compare to those of YOLO?**

Faster RCNN is computationally intensive due to its two-stage architecture, while YOLO is lightweight and faster, making it suitable for real-time applications.

26. **What role do convolutional layers play in object detection with RCNN?**

Convolutional layers extract spatial features from images, such as edges and textures, which are essential for region classification and bounding box prediction.



27. **How does the loss function in YOLO differ from other object detection models?**

YOLO combines multiple loss components:
- Localization loss for bounding box prediction.
- Confidence loss for object presence.
- Classification loss for class prediction.


28. **What are the key advantages of using YOLO for real-time object detection?**
- Single-stage architecture for speed.
- High inference speed suitable for real-time applications.
- Simultaneous detection and classification in one pass.

29. **How does Faster RCNN handle the trade-off between accuracy and speed?**

Faster RCNN uses RPNs to reduce the number of proposals, improving efficiency while maintaining accuracy through multi-stage processing.

30. **What is the role of the backbone network in both YOLO and Faster RCNN, and how do they differ?**

- In YOLO, the backbone (e.g., CSPNet) is designed for real-time feature extraction and lightweight processing.
- In Faster RCNN, the backbone (e.g., ResNet) focuses on extracting detailed features for accuracy over speed.

# **PRACTICAL**

In [6]:
pip install ultralytics opencv-python numpy


Note: you may need to restart the kernel to use updated packages.


In [11]:
# 1. How do you load and run inference on a custom image using the YOLOv8 model (labeled as YOLOv9)?

from ultralytics import YOLO

# Load the pretrained YOLOv8 nano model
model = YOLO('yolov8n.pt')  # For YOLOv9, pass a YOLOv9 checkpoint such as 'yolov9c.pt' instead
image_path = "images (1).jpeg"

# Run inference; predict() returns one Results object per input image
results = model.predict(image_path)



image 1/1 c:\Users\abhin\Desktop\ABHINAYA\ASSIGNMENT\images (1).jpeg: 416x640 1 dog, 69.4ms
Speed: 3.0ms preprocess, 69.4ms inference, 1.0ms postprocess per image at shape (1, 3, 416, 640)


In [12]:
# 2. How do you load the Faster RCNN model with a ResNet50 backbone and print its architecture?
import torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Load the Faster R-CNN model with a ResNet-50 FPN backbone and COCO-pretrained weights.
# weights="DEFAULT" replaces the deprecated pretrained=True (torchvision >= 0.13).
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")  # Use weights=None to skip the pretrained detection weights

# Print the architecture of the model
print(model)



Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to C:\Users\abhin/.cache\torch\hub\checkpoints\fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
100%|██████████| 160M/160M [00:34<00:00, 4.83MB/s] 


FasterRCNN(
  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d(64, eps=0.0)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64, eps=0.0)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64, eps=0.0)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256, eps=0.0)
          (relu): ReLU(

In [None]:
#3. How do you perform inference on an online image using the Faster RCNN model and print the predictions?

import torch
import requests
from PIL import Image
from io import BytesIO
import torchvision.transforms as T
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Load the pre-trained Faster R-CNN model
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")  # Replaces the deprecated pretrained=True
model.eval()  # Set the model to evaluation mode

# URL of the image
image_url = 'https://as1.ftcdn.net/v2/jpg/05/12/59/86/1000_F_512598633_WDA3L2yTL8ylYQtuz4ob7kHKNBL3T1b0.jpg'

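# Fetch and open the image
response = requests.get(image_url, timeout=30)
response.raise_for_status()  # Fail early on HTTP errors
img = Image.open(BytesIO(response.content)).convert("RGB")  # Ensure a 3-channel image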

# Define the image transform for Faster R-CNN
transform = T.Compose([
    T.ToTensor(),  # Convert image to tensor
])

# Apply the transformation
img_tensor = transform(img)

# Make the image batch-like (add a batch dimension)
img_tensor = img_tensor.unsqueeze(0)

# Perform inference
with torch.no_grad():  # Disable gradient calculation for inference
    prediction = model(img_tensor)

# Extract predictions (bounding boxes, labels, and scores)
boxes = prediction[0]['boxes']  # Bounding boxes
labels = prediction[0]['labels']  # Labels
scores = prediction[0]['scores']  # Scores for each detection

# Print the predictions
for i in range(len(boxes)):
    print(f"Prediction {i+1}:")
    print(f"Bounding Box: {boxes[i]}")
    print(f"Label: {labels[i]}")
    print(f"Score: {scores[i]}")


Prediction 1:
Bounding Box: tensor([483.7888, 240.9229, 723.1383, 521.0302])
Label: 1
Score: 0.9991336464881897
Prediction 2:
Bounding Box: tensor([ 63.2200, 225.4734, 317.1106, 523.9548])
Label: 1
Score: 0.9964587092399597
Prediction 3:
Bounding Box: tensor([271.3429, 193.1448, 386.7447, 485.1668])
Label: 1
Score: 0.9936636686325073
Prediction 4:
Bounding Box: tensor([434.1417, 271.4573, 530.8661, 502.3597])
Label: 1
Score: 0.9805803298950195
Prediction 5:
Bounding Box: tensor([804.0786, 367.3167, 836.8406, 396.5289])
Label: 1
Score: 0.9457072019577026
Prediction 6:
Bounding Box: tensor([205.6631, 236.0242, 291.8345, 437.3893])
Label: 1
Score: 0.9445033073425293
Prediction 7:
Bounding Box: tensor([801.0987, 367.0279, 820.9108, 396.2760])
Label: 1
Score: 0.9244999289512634
Prediction 8:
Bounding Box: tensor([241.0067, 348.4763, 268.9014, 373.5356])
Label: 47
Score: 0.9156351685523987
Prediction 9:
Bounding Box: tensor([298.3379, 266.8593, 530.3373, 516.7930])
Label: 1
Score: 0.88179641

In [18]:
import shutil
import os

# Path to the torch.hub cache folder
cache_dir = os.path.expanduser('~/.cache/torch/hub/')

# Remove the cached YOLOv5 repository. torch.hub stores it as
# 'ultralytics_yolov5_master'; ignore_errors avoids a FileNotFoundError
# when the folder does not exist.
shutil.rmtree(os.path.join(cache_dir, 'ultralytics_yolov5_master'), ignore_errors=True)

In [20]:
# 4. How do you load an image and perform inference using YOLOv9, then display the detected objects with bounding boxes and class labels?

#Don't know the ans
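One possible answer, sketched with the Ultralytics package installed at the top of this notebook. It assumes the same `yolov8n.pt` weights and image file as cell 1 (a YOLOv9 checkpoint such as `yolov9c.pt` could be passed instead); the helper names `describe_detections` and `detect_and_show` are hypothetical.

```python
def describe_detections(class_ids, confidences, names):
    """Build readable 'label confidence' captions for each detection."""
    return [f"{names[int(c)]} {float(p):.2f}" for c, p in zip(class_ids, confidences)]


def detect_and_show(image_path, weights="yolov8n.pt"):
    """Run YOLO on one image, print the detections, and display the annotated image."""
    # Imported here so the helper above can be used without these packages
    from ultralytics import YOLO
    import matplotlib.pyplot as plt

    model = YOLO(weights)
    result = model(image_path)[0]  # one Results object per input image

    # Class indices and confidences are stored as tensors on result.boxes
    captions = describe_detections(
        result.boxes.cls.tolist(), result.boxes.conf.tolist(), result.names
    )
    print(captions)

    annotated = result.plot()         # BGR array with boxes and labels drawn
    plt.imshow(annotated[..., ::-1])  # Convert BGR to RGB for matplotlib
    plt.axis("off")
    plt.show()


# Example call (downloads the weights on first use):
# detect_and_show("images (1).jpeg")
```

`result.plot()` does the drawing, so no manual OpenCV calls are needed; the captions are printed only to show how labels and scores can be read from the result.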

In [21]:
# 5.  How do you display bounding boxes for the detected objects in an image using Faster RCNN?
import torch
from torchvision import models, transforms
import cv2
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# Load the pre-trained Faster R-CNN model
model = models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")  # Replaces the deprecated pretrained=True
model.eval()  # Set the model to evaluation mode

# Load the image (replace with your image file) and force 3-channel RGB
img = Image.open('images (1).jpeg').convert('RGB')

# Convert the image to a [0, 1] float tensor; the model normalizes it internally
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert image to tensor
])

# Apply the transformation to the image
img_tensor = transform(img).unsqueeze(0)  # Add batch dimension

# Perform inference
with torch.no_grad():
    predictions = model(img_tensor)

# Extract predictions (boxes, labels, scores)
boxes = predictions[0]['boxes'].cpu().numpy()
labels = predictions[0]['labels'].cpu().numpy()
scores = predictions[0]['scores'].cpu().numpy()

# Filter out low-confidence predictions
threshold = 0.5
valid_boxes = boxes[scores >= threshold]
valid_labels = labels[scores >= threshold]
valid_scores = scores[scores >= threshold]

# Convert image to OpenCV format
img_cv = np.array(img)
img_cv = cv2.cvtColor(img_cv, cv2.COLOR_RGB2BGR)

# Draw bounding boxes and labels on the image
for box, label, score in zip(valid_boxes, valid_labels, valid_scores):
    x1, y1, x2, y2 = box.astype(int)
    cv2.rectangle(img_cv, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(img_cv, f'{label} {score:.2f}', (x1, y1 - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.9, (255, 0, 0), 2)

# Display the image with bounding boxes
cv2.imshow("Detected Objects", img_cv)
cv2.waitKey(0)
cv2.destroyAllWindows()





In [23]:
# 6. How do you perform inference on a local image using Faster RCNN?
import torch
from torchvision import models, transforms
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import cv2

# Load the pre-trained Faster R-CNN model
model = models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")  # Replaces the deprecated pretrained=True
model.eval()  # Set the model to evaluation mode

# Load the image from your local path (replace with your image file) as 3-channel RGB
img_path = 'images (1).jpeg'
img = Image.open(img_path).convert('RGB')

# Define the transformation pipeline (convert image to tensor)
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert the image to a tensor
])

# Apply the transformation to the image
img_tensor = transform(img).unsqueeze(0)  # Add a batch dimension

# Perform inference (detect objects)
with torch.no_grad():  # Disable gradient calculations since we're not training
    predictions = model(img_tensor)

# Extract predictions: boxes, labels, and scores
boxes = predictions[0]['boxes'].cpu().numpy()  # Bounding boxes
labels = predictions[0]['labels'].cpu().numpy()  # Class labels
scores = predictions[0]['scores'].cpu().numpy()  # Confidence scores

# Filter out boxes with low confidence (score threshold)
threshold = 0.5
valid_boxes = boxes[scores >= threshold]
valid_labels = labels[scores >= threshold]
valid_scores = scores[scores >= threshold]

# Convert the image to OpenCV format for visualization
img_cv = np.array(img)
img_cv = cv2.cvtColor(img_cv, cv2.COLOR_RGB2BGR)

# Loop through valid boxes and draw them on the image
for box, label, score in zip(valid_boxes, valid_labels, valid_scores):
    x1, y1, x2, y2 = box.astype(int)
    cv2.rectangle(img_cv, (x1, y1), (x2, y2), (0, 255, 0), 2)  # Draw bounding box
    cv2.putText(img_cv, f'{label} {score:.2f}', (x1, y1 - 10), 
                cv2.FONT_HERSHEY_SIMPLEX, 0.9, (255, 0, 0), 2)  # Add label and score

# Display the image with bounding boxes
cv2.imshow("Detected Objects", img_cv)
cv2.waitKey(0)
cv2.destroyAllWindows()


In [31]:
# 7. How can you change the confidence threshold for YOLO object detection and filter out low-confidence predictions?

# not sure about the ans
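A possible answer to question 7: Ultralytics accepts a `conf` argument at prediction time that discards boxes below that confidence, and the same filtering can be done by hand on the returned scores. The sketch below shows both; the helper names and the sample detections are hypothetical, and the weights file is the one used in cell 1.

```python
def filter_by_confidence(detections, threshold=0.5):
    """Keep only detections whose 'conf' value meets the threshold."""
    return [d for d in detections if d["conf"] >= threshold]


def predict_with_threshold(image_path, weights="yolov8n.pt", conf=0.5):
    """Run YOLO and let Ultralytics drop boxes below `conf`."""
    from ultralytics import YOLO  # imported here so the filter above has no dependencies

    model = YOLO(weights)
    result = model.predict(image_path, conf=conf)[0]
    return [
        {"cls": result.names[int(c)], "conf": float(p), "box": box}
        for c, p, box in zip(
            result.boxes.cls.tolist(),
            result.boxes.conf.tolist(),
            result.boxes.xyxy.tolist(),
        )
    ]


# Manual filtering on made-up detections
sample = [{"cls": "dog", "conf": 0.91}, {"cls": "cat", "conf": 0.32}]
print(filter_by_confidence(sample, threshold=0.5))

# Example call (downloads the weights on first use):
# print(predict_with_threshold("images (1).jpeg", conf=0.6))
```

Raising the threshold trades recall for precision: fewer false positives, but more missed objects.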



In [34]:
import matplotlib.pyplot as plt

# Example lists of training and validation loss (replace with your actual loss values)
train_losses = []
val_losses = []

# Simulating training loop (replace with your actual model training)
for epoch in range(10):  # Assuming 10 epochs for this example
    # Simulated loss values (replace with actual loss values during training)
    train_loss = 0.1 * epoch  # Example training loss (just a placeholder)
    val_loss = 0.1 * epoch + 0.05  # Example validation loss (just a placeholder)

    # Append the loss values to their respective lists
    train_losses.append(train_loss)
    val_losses.append(val_loss)

# Plot the training and validation loss curves
plt.figure(figsize=(10, 6))
plt.plot(range(10), train_losses, label="Training Loss", color='blue', marker='o')
plt.plot(range(10), val_losses, label="Validation Loss", color='red', marker='x')
plt.title("Training and Validation Loss Curves")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.grid(True)
plt.show()





In [36]:
import torch
import os
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt
from torchvision import transforms
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms import functional as F

# Step 1: Load pre-trained Faster R-CNN model
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")  # Replaces the deprecated pretrained=True
model.eval()  # Set the model to evaluation mode

# Step 2: Define the transformation for input images
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert image to tensor
])

# Step 3: Function to perform inference and display bounding boxes
def perform_inference_on_folder(image_folder_path):
    # Step 4: Iterate through the images in the folder
    for img_name in os.listdir(image_folder_path):
        img_path = os.path.join(image_folder_path, img_name)
        
        if img_path.lower().endswith(('.png', '.jpg', '.jpeg')):  # Check for valid image files (case-insensitive)
            # Step 5: Load the image
            img = Image.open(img_path).convert("RGB")
            
            # Step 6: Apply the transformation (Convert image to tensor)
            img_tensor = transform(img).unsqueeze(0)  # Add batch dimension
            
            # Step 7: Perform inference
            with torch.no_grad():  # Disable gradient calculation
                prediction = model(img_tensor)
            
            # Step 8: Get bounding boxes and labels
            boxes = prediction[0]['boxes']
            labels = prediction[0]['labels']
            scores = prediction[0]['scores']
            
            # Step 9: Set a confidence threshold
            confidence_threshold = 0.5
            filtered_boxes = boxes[scores >= confidence_threshold]
            filtered_labels = labels[scores >= confidence_threshold]
            filtered_scores = scores[scores >= confidence_threshold]
            
            # Step 10: Display the image with bounding boxes
            draw = ImageDraw.Draw(img)
            
            for box, label, score in zip(filtered_boxes, filtered_labels, filtered_scores):
                box = box.tolist()  # Convert tensor to list for drawing
                draw.rectangle(box, outline="red", width=3)  # Draw bounding box
                
                # Draw label text
                label_text = f"Label: {label.item()} Score: {score.item():.2f}"
                draw.text((box[0], box[1]), label_text, fill="red")
            
            # Step 11: Show the image with bounding boxes
            plt.figure(figsize=(8, 8))
            plt.imshow(img)
            plt.axis('off')
            plt.show()

# Step 12: Define the folder path containing the images
image_folder_path = "img"  # Replace with your folder path

# Step 13: Call the function to perform inference on the images in the folder
perform_inference_on_folder(image_folder_path)




In [40]:
import torch
import os
from PIL import Image, ImageDraw, ImageFont
import matplotlib.pyplot as plt
from torchvision import transforms
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms import functional as F

# Step 1: Load pre-trained Faster R-CNN model
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")  # Replaces the deprecated pretrained=True
model.eval()  # Set the model to evaluation mode

# Step 2: Define the transformation for input images
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert image to tensor
])

# Step 3: Function to perform inference and display bounding boxes with confidence scores
def perform_inference_on_image(image_path):
    # Step 4: Load the image
    img = Image.open(image_path).convert("RGB")
    
    # Step 5: Apply the transformation (Convert image to tensor)
    img_tensor = transform(img).unsqueeze(0)  # Add batch dimension
    
    # Step 6: Perform inference
    with torch.no_grad():  # Disable gradient calculation
        prediction = model(img_tensor)
    
    # Step 7: Get bounding boxes, labels, and scores
    boxes = prediction[0]['boxes']
    labels = prediction[0]['labels']
    scores = prediction[0]['scores']
    
    # Step 8: Set a confidence threshold
    confidence_threshold = 0.5
    filtered_boxes = boxes[scores >= confidence_threshold]
    filtered_labels = labels[scores >= confidence_threshold]
    filtered_scores = scores[scores >= confidence_threshold]
    
    # Step 9: Draw bounding boxes and labels with confidence scores
    draw = ImageDraw.Draw(img)
    
    for box, label, score in zip(filtered_boxes, filtered_labels, filtered_scores):
        box = box.tolist()  # Convert tensor to list for drawing
        draw.rectangle(box, outline="red", width=3)  # Draw bounding box
        
        # Create the label text (Label ID and confidence score)
        label_text = f"Label: {label.item()} Score: {score.item():.2f}"
        
        # Draw the label text near the bounding box
        draw.text((box[0], box[1] - 10), label_text, fill="red")
    
    # Step 10: Display the image with bounding boxes and confidence scores
    plt.figure(figsize=(8, 8))
    plt.imshow(img)
    plt.axis('off')
    plt.show()

# Step 11: Define the image path
image_path = "images (1).jpeg"  # Replace with the path to your image

# Step 12: Call the function to perform inference and display bounding boxes with confidence scores
perform_inference_on_image(image_path)




In [42]:
pip uninstall -y yolov5


Note: you may need to restart the kernel to use updated packages.




In [43]:
pip install yolov5


Collecting yolov5
  Downloading yolov5-7.0.14-py37.py38.py39.py310-none-any.whl.metadata (10 kB)
Collecting gitpython>=3.1.30 (from yolov5)
  Using cached GitPython-3.1.43-py3-none-any.whl.metadata (13 kB)
Collecting thop>=0.1.1 (from yolov5)
  Downloading thop-0.1.1.post2209072238-py3-none-any.whl.metadata (2.7 kB)
Collecting fire (from yolov5)
  Downloading fire-0.7.0.tar.gz (87 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting boto3>=1.19.1 (from yolov5)
  Downloading boto3-1.35.77-py3-none-any.whl.metadata (6.7 kB)
Collecting sahi>=0.11.10 (from yolov5)
  Downloading sahi-0.11.19-py3-none-any.whl.metadata (17 kB)
Collecting huggingface-hub<0.25.0,>=0.12.0 (from yolov5)
  Downloading huggingface_hub-0.24.7-py3-none-any.whl.metadata (13 kB)
Collecting roboflow>=0.2.29 (from yolov5)
  Downloading roboflow-1.1.49-py3-none-any.whl.metadata (9.7 kB)
Collecting botocore<1.36.0,>=1.35.77 (from boto3>=1.19.1->yolov5)
  Downl

  You can safely remove it manually.
  You can safely remove it manually.
ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'c:\\Users\\abhin\\anaconda3\\envs\\assignment_env\\Lib\\site-packages\\cv2\\cv2.pyd'
Consider using the `--user` option or check the permissions.



In [1]:
import torch
import cv2
from PIL import Image, ImageDraw
import numpy as np

# Step 1: Load the pre-trained YOLOv5 model (small version in this example)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model.eval()  # Set the model to evaluation mode

# Step 2: Load the image
image_path = 'images (1).jpeg'  # Replace with your image path
img = Image.open(image_path)

# Step 3: Perform inference on the image using the YOLO model
results = model(img)

# Step 4: Get the bounding boxes, confidence scores, and class labels.
# Each row of results.xyxy[0] is (x1, y1, x2, y2, confidence, class).
detections = results.xyxy[0]
boxes = detections[:, :4]        # Corner coordinates in pixels
scores = detections[:, 4]        # Confidence scores
labels = detections[:, 5].int()  # Class indices

# Step 5: Draw the bounding boxes on the image
draw = ImageDraw.Draw(img)
for box, label, score in zip(boxes, labels, scores):
    x1, y1, x2, y2 = (int(v) for v in box.tolist())

    # Draw bounding box
    draw.rectangle([x1, y1, x2, y2], outline="red", width=3)

    # Add class name and confidence score
    label_text = f"{results.names[label.item()]} {score.item():.2f}"
    draw.text((x1, y1 - 10), label_text, fill="red")

# Step 6: Save the resulting image with bounding boxes
output_image_path = 'output_image_with_bboxes.jpg'  # Path to save the image
img.save(output_image_path)

# Optionally, display the image
img.show()

print(f"Saved the image with bounding boxes to {output_image_path}")


Using cache found in C:\Users\abhin/.cache\torch\hub\ultralytics_yolov5_master


[31m[1mrequirements:[0m Ultralytics requirement ['gitpython>=3.1.30'] not found, attempting AutoUpdate...
Collecting gitpython>=3.1.30
  Downloading GitPython-3.1.43-py3-none-any.whl.metadata (13 kB)
Collecting gitdb<5,>=4.0.1 (from gitpython>=3.1.30)
  Downloading gitdb-4.0.11-py3-none-any.whl.metadata (1.2 kB)
Downloading GitPython-3.1.43-py3-none-any.whl (207 kB)
Downloading gitdb-4.0.11-py3-none-any.whl (62 kB)
Installing collected packages: gitdb, gitpython
Successfully installed gitdb-4.0.11 gitpython-3.1.43

[31m[1mrequirements:[0m AutoUpdate success  4.5s, installed 1 package: ['gitpython>=3.1.30']
[31m[1mrequirements:[0m  [1mRestart runtime or rerun command for updates to take effect[0m



YOLOv5  2024-12-10 Python-3.9.20 torch-2.5.1+cpu CPU

Downloading https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt to yolov5s.pt...
100%|██████████| 14.1M/14.1M [00:03<00:00, 4.24MB/s]

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
Adding AutoShape... 


Saved the image with bounding boxes to output_image_with_bboxes.jpg
