# Questions 05

A military organization wants to develop a machine learning model that can identify enemy
vehicles in satellite imagery. The model will take as input a satellite image and output a list
of bounding boxes that correspond to the locations of enemy vehicles in the image. Develop
an ML solution for this scenario with an example dataset.

# Installing and importing the necessary libraries

In [4]:
!pip install ultralytics -q


In [5]:
# For data manipulation
import pandas as pd
import numpy as np

# For saving the model
import pickle

# For warnings
import warnings
warnings.filterwarnings("ignore")


# importing the pre-trained YOLO model class
from ultralytics import YOLO
import yaml
import cv2
from google.colab.patches import cv2_imshow

## Here we use the pre-trained YOLOv8 model.

Many versions of YOLO are available. They differ mainly in speed, accuracy (measured as mAP, mean average precision), and number of parameters.

# About the dataset

## Dataset structure

Open Images V7 is structured in multiple components catering to varied computer vision challenges:

- Images: About 9 million images, often showcasing intricate scenes with an average of 8.3 objects per image.
- Bounding Boxes: Over 16 million boxes that demarcate objects across 600 categories.
- Segmentation Masks: These detail the exact boundary of 2.8M objects across 350 classes.
- Visual Relationships: 3.3M annotations indicating object relationships, properties, and actions.
- Localized Narratives: 675k descriptions combining voice, text, and mouse traces.
- Point-Level Labels: 66.4M labels across 1.4M images, suitable for zero/few-shot semantic segmentation.

Note that the `yolov8n.pt` checkpoint loaded below was trained on COCO (80 classes), not on Open Images V7, so the class IDs passed to `predict` below are COCO class IDs.

In [6]:
# loading the pre-trained model

base_model = YOLO("yolov8n.pt")


Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt to 'yolov8n.pt'...
100%|██████████| 6.23M/6.23M [00:00<00:00, 194MB/s]


In [24]:
# making the predictions and storing them in the results variable
# classes restricts the detections to the COCO class IDs
# 1 (bicycle), 2 (car), 3 (motorcycle), 5 (bus) and 7 (truck)

results = base_model.predict("/content/pexels-gül-işık-18984676.jpg", save=True, classes=[1,2,3,5,7])


image 1/1 /content/pexels-gül-işık-18984676.jpg: 640x448 2 bicycles, 2 cars, 1 motorcycle, 156.5ms
Speed: 4.4ms preprocess, 156.5ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 448)
Results saved to runs/detect/predict


In [22]:
# printing the box coordinates in xyxy format (x_min, y_min, x_max, y_max), in pixels

for result in results:
  print(result.boxes.xyxy)

tensor([[ 816.8940, 4492.8984, 2643.5737, 6294.9946],
        [2240.9717, 4190.5327, 3001.8154, 4675.4028],
        [   0.0000, 4101.8276, 1081.4702, 4951.4741],
        [4323.6787, 4360.7671, 4589.0752, 5135.5498],
        [3110.1943, 4632.0186, 3538.6572, 5682.8394]])


In [23]:
# printing the box coordinates in xywh format (x_center, y_center, width, height), in pixels

for result in results:
  print(result.boxes.xywh)

tensor([[1730.2339, 5393.9463, 1826.6797, 1802.0962],
        [2621.3936, 4432.9678,  760.8438,  484.8701],
        [ 540.7351, 4526.6509, 1081.4702,  849.6465],
        [4456.3770, 4748.1582,  265.3965,  774.7827],
        [3324.4258, 5157.4287,  428.4629, 1050.8208]])


# In this scenario, we have used the YOLO algorithm

YOLO (You Only Look Once) processes the image in a single forward pass and predicts all the objects in it at once.

It works by dividing the image into a grid of cells and predicting candidate rectangular bounding boxes for the cells, each with a confidence score and class scores.

Because neighbouring cells often propose boxes for the same object, the overlap between two candidate boxes is measured by intersection over union (IoU): the area the two boxes share divided by the total area they cover together.
The algorithm used to filter the bounding boxes based on their confidence is non-maximum suppression (NMS).

NMS sorts the candidate boxes by confidence, keeps the highest-scoring one, removes the remaining boxes that overlap it too much, and repeats this for the boxes that are left.

Each surviving box is then assigned the class with the highest predicted score.
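The IoU and NMS steps described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code ultralytics runs internally: the function names `iou` and `nms`, the toy boxes, and the 0.5 threshold are all assumptions chosen for the example.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    # Width and height are clipped at 0 for boxes that do not overlap
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + area_boxes - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept by non-maximum suppression."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]  # highest confidence first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Drop every remaining box that overlaps the kept box too much
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

# Toy example (made-up boxes): the first two overlap heavily, the third is separate
toy_boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
toy_scores = [0.9, 0.8, 0.7]
kept = nms(toy_boxes, toy_scores)
print(kept)
```

In the toy example the second box is suppressed because its IoU with the first box (81/119) exceeds the threshold, while the third box does not overlap and is kept.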


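Here, each bounding box is described by 4 values.


xywh (normalized, as used in YOLO label files and returned by `result.boxes.xywhn`):
x_center = (box_x_left + box_width/2) / image_width
y_center = (box_y_top + box_height/2) / image_height
width = box_width / image_width
height = box_height / image_height

The values printed above by `result.boxes.xywh` use the same layout but are in pixels, that is, without the division by the image size.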
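As a quick check of these formulas, the sketch below converts the first xyxy box printed earlier into xywh and then normalizes it. The helper names `xyxy_to_xywh` and `normalize_xywh` are hypothetical, and the image size of 5000 x 7000 pixels is an assumption made only for illustration, since the notebook does not show the original image dimensions.

```python
def xyxy_to_xywh(box):
    """Convert (x_min, y_min, x_max, y_max) to (x_center, y_center, width, height)."""
    x1, y1, x2, y2 = box
    w = x2 - x1
    h = y2 - y1
    return (x1 + w / 2, y1 + h / 2, w, h)

def normalize_xywh(xywh, image_width, image_height):
    """Divide pixel xywh values by the image size to get values in [0, 1]."""
    xc, yc, w, h = xywh
    return (xc / image_width, yc / image_height, w / image_width, h / image_height)

# First box from the xyxy output above (pixel coordinates)
first_box_xyxy = (816.8940, 4492.8984, 2643.5737, 6294.9946)
xywh = xyxy_to_xywh(first_box_xyxy)
print(xywh)  # should agree with the first row of the xywh output above, up to rounding

# ASSUMPTION: a 5000 x 7000 image, chosen only to make the example runnable
normalized = normalize_xywh(xywh, image_width=5000, image_height=7000)
print(normalized)
```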


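All the predictions are stored in the results variable, which is a list of ultralytics Results objects, one per input image. By iterating over it we can get the prediction details of each image; the detections themselves are in each result's boxes attribute, which is of the ultralytics Boxes type.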
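One way to turn these prediction details into plain Python records is sketched below. The helper `boxes_to_records` and the stand-in values are hypothetical and only illustrate the idea. The commented lines at the end assume the `results` list from the prediction cell and use the `xyxy`, `cls` and `conf` attributes of `result.boxes` together with the `result.names` mapping.

```python
def boxes_to_records(xyxy, class_ids, confidences, names):
    """Combine box coordinates, class IDs and confidences into a list of dicts."""
    records = []
    for coords, class_id, confidence in zip(xyxy, class_ids, confidences):
        records.append({
            "label": names[int(class_id)],       # class IDs arrive as floats
            "confidence": float(confidence),
            "xyxy": [float(v) for v in coords],  # x_min, y_min, x_max, y_max
        })
    return records

# Stand-in data (made up) so the sketch runs without a model
coco_subset = {1: "bicycle", 2: "car", 3: "motorcycle", 5: "bus", 7: "truck"}
sample = boxes_to_records(
    xyxy=[[10, 20, 110, 220]],
    class_ids=[2.0],
    confidences=[0.9],
    names=coco_subset,
)
print(sample)

# With the real predictions this would look roughly like:
# for result in results:
#     b = result.boxes
#     print(boxes_to_records(b.xyxy.tolist(), b.cls.tolist(), b.conf.tolist(), result.names))
```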






In [None]:
!pip freeze > requirements.txt