# Simple YOLO inference : LAB 1 Bonus

> This lab is meant to run locally, prefered on a windows machine because we'll use the OpenCV library

In this LAB 1  Bonus we will :

- Run YOLOv8 inference
- you can also run inference on your webcam if you run this notbook on windows
- We will also run inference on crime dataset


In [None]:
%pip install --upgrade torch torchvision
%pip install --upgrade ultralytics

In [1]:
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

cuda


## Simple inference tests

In [2]:
import torch
import torchvision
import ultralytics

print(f"PyTorch version: {torch.__version__}")
print(f"Torchvision version: {torchvision.__version__}")
print(f"Ultralytics version: {ultralytics.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU device: {torch.cuda.get_device_name(0)}")

PyTorch version: 2.5.1+cu121
Torchvision version: 0.20.1+cu121
Ultralytics version: 8.3.64
CUDA available: True
CUDA version: 12.1
GPU device: NVIDIA GeForce RTX 3060 Laptop GPU


In [3]:
from ultralytics import YOLO
import cv2

# Load pretrained nano version 
model = YOLO('yolov8n.pt')

def detect_objects(image_path):
    # Run inference
    results = model(image_path)
    
    # Process results
    for result in results:
        boxes = result.boxes  # Bounding box objects
        for box in boxes:
            x1, y1, x2, y2 = box.xyxy[0]  #(top, left, bottom, right)
            
            # Get confidence
            confidence = box.conf[0]
            
            # Get class name
            class_id = int(box.cls[0])
            class_name = result.names[class_id]
            
            print(f'Detected {class_name} with confidence {confidence:.2f} at location {x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}')

# Example usage
image_path = 'test.jpg'  # Replace with your image path
detect_objects(image_path)

# For webcam/video:
# results = model.predict(source="0", show=True)  # 0 for webcam if you run on windows
# results = model.predict(source="video.mp4", show=True)  # for video file
# results = model.predict(source="test.jpg", show=True)


image 1/1 d:\CODE\AI_Action_Recognition\test.jpg: 448x640 2 persons, 1 car, 1 airplane, 220.6ms
Speed: 10.0ms preprocess, 220.6ms inference, 858.8ms postprocess per image at shape (1, 3, 448, 640)
Detected airplane with confidence 0.85 at location 2, 282, 430, 433
Detected person with confidence 0.41 at location 378, 359, 403, 419
Detected person with confidence 0.30 at location 303, 372, 322, 437
Detected car with confidence 0.26 at location 599, 378, 623, 391


## Run inference on DDCASS

1. Download the dataset from kaggle here : [Kaggle DCSASS link](https://www.kaggle.com/datasets/mateohervas/dcsass-dataset?resource=download)
2. Unzip it in the current directory, we'll onyl use the "*shoplifting*" folder for our example.

In [4]:
import os
from ultralytics import YOLO
import cv2

# Load the YOLO Pose model
model = YOLO('yolov8n-pose.pt')  # Use the pose-specific YOLO model

# Path to the DCSASS Shoplifting directory
dcsass_path = "./DCSASS/Shoplifting"

# Iterate through directories and files
for folder in os.listdir(dcsass_path):
    folder_path = os.path.join(dcsass_path, folder)
    if os.path.isdir(folder_path):
        for file in os.listdir(folder_path):
            video_path = os.path.join(folder_path, file)
            
            # Perform pose detection
            results = model.predict(source=video_path, show=True)
            
            # Access and process the pose results
            for result in results:
                keypoints = result.keypoints  # Get the keypoints for each detected person
                print(f"Keypoints for frame: {keypoints}")



errors for large sources or long-running streams and videos. See https://docs.ultralytics.com/modes/predict/ for help.

Example:
    results = model(source=..., stream=True)  # generator of Results objects
    for r in results:
        boxes = r.boxes  # Boxes object for bbox outputs
        masks = r.masks  # Masks object for segment masks outputs
        probs = r.probs  # Class probabilities for classification outputs

video 1/1 (frame 1/120) d:\CODE\AI_Action_Recognition\DCSASS\Shoplifting\Shoplifting001_x264.mp4\Shoplifting001_x264_0.mp4: 480x640 (no detections), 125.7ms
video 1/1 (frame 2/120) d:\CODE\AI_Action_Recognition\DCSASS\Shoplifting\Shoplifting001_x264.mp4\Shoplifting001_x264_0.mp4: 480x640 (no detections), 29.0ms
video 1/1 (frame 3/120) d:\CODE\AI_Action_Recognition\DCSASS\Shoplifting\Shoplifting001_x264.mp4\Shoplifting001_x264_0.mp4: 480x640 (no detections), 26.8ms
video 1/1 (frame 4/120) d:\CODE\AI_Action_Recognition\DCSASS\Shoplifting\Shoplifting001_x264.mp4\Shopli

KeyboardInterrupt: 