# Object Detection Workshop

Official resources:
- Website: https://docs.ultralytics.com
- GitHub: https://github.com/ultralytics/ultralytics

References:
- Docs: https://docs.ultralytics.com
- Arguments for prediction: https://docs.ultralytics.com/modes/predict/#inference-arguments

## Install YOLOv8

In [1]:
%pip install -q ultralytics # download the package quietly

Note: you may need to restart the kernel to use updated packages.


## Load Pre-trained Model

![model_select.png](src/model_select.jpg)

In [2]:
from ultralytics import YOLO

model = YOLO('yolov8n.pt')    # yolov8n.pt is a pretrained model (the nano variant)
# pretrained models: 'yolov8n', 'yolov8s', 'yolov8m', 'yolov8l', 'yolov8x'
# model = YOLO('your_model.pt')    # if you have your own trained model, use this line instead of the one above

## Inference on an image

For the full list of prediction arguments (e.g. `conf`, `classes`, `save`), see the official page:
https://docs.ultralytics.com/modes/predict/#inference-arguments
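The `conf=0.5` argument used in the cells below sets a minimum confidence score: detections whose score does not exceed it are dropped. As a rough, library-independent sketch, assuming each detection is a hypothetical `(class_id, confidence)` pair (Ultralytics applies the threshold internally, together with non-maximum suppression), the filtering step looks like this:

```python
from typing import List, Tuple

# Hypothetical detection format for illustration: (class_id, confidence score)
Detection = Tuple[int, float]

def filter_by_confidence(detections: List[Detection], conf: float = 0.5) -> List[Detection]:
    """Keep only detections whose confidence exceeds the threshold."""
    return [d for d in detections if d[1] > conf]

# Made-up example scores, not real model output
dets = [(5, 0.87), (0, 0.86), (0, 0.85), (0, 0.81), (11, 0.26)]
print(filter_by_confidence(dets, conf=0.5))
```

Raising `conf` trades recall for precision: fewer false positives, but weak true detections disappear too.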


In [4]:
source = 'src/bus.jpg'
results = model.predict(source, save=True, conf=0.5)
# annotated results are saved under runs/detect/predict (predict2, predict3, ... on later runs)






image 1/1 /Users/fish/Desktop/CODE/acne-detector/src/bus.jpg: 640x480 3 persons, 1 bus, 76.6ms
Speed: 5.0ms preprocess, 76.6ms inference, 325.4ms postprocess per image at shape (1, 3, 640, 480)
Results saved to runs/detect/predict2


## Inference on a video

In [8]:
source = 'src/newjeans.mp4'
results = model.predict(source, save=True, conf=0.5)
# annotated results are saved under runs/detect/predict (predict2, predict3, ... on later runs)



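WARNING ⚠️ inference results will accumulate in RAM unless `stream=True` is passed, causing potential out-of-memory
errors for large sources or long-running streams and videos. See https://docs.ultralytics.com/modes/predict/ for help.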

Example:
    results = model(source=..., stream=True)  # generator of Results objects
    for r in results:
        boxes = r.boxes  # Boxes object for bbox outputs
        masks = r.masks  # Masks object for segment masks outputs
        probs = r.probs  # Class probabilities for classification outputs

video 1/1 (frame 1/507) /Users/fish/Desktop/CODE/acne-detector/src/newjeans.mp4: 384x640 4 persons, 39.7ms
video 1/1 (frame 2/507) /Users/fish/Desktop/CODE/acne-detector/src/newjeans.mp4: 384x640 4 persons, 41.8ms
video 1/1 (frame 3/507) /Users/fish/Desktop/CODE/acne-detector/src/newjeans.mp4: 384x640 4 persons, 43.4ms
video 1/1 (frame 4/507) /Users/fish/Desktop/CODE/acne-detector/src/newjeans.mp4: 384x640 4 persons, 39.1ms
video 1/1 (frame 5/507) /Users/fish/Desktop/CODE/acne-detector/src/newjeans.mp4: 384x640 4 persons, 39.4ms
video 1/1 (frame 6/507) /Users/fish/D
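The warning above suggests passing `stream=True`, which makes `model.predict` return a generator of `Results` objects instead of a list, so each frame can be processed and discarded as it arrives. The sketch below mimics that pattern without the library: plain lists of class IDs stand in for each frame's `r.boxes.cls` (an assumption for illustration), and a generator counts people (class 0) frame by frame.

```python
from typing import Iterable, Iterator, Sequence

PERSON_ID = 0  # COCO class ID for "person" in the pretrained YOLOv8 models

def people_per_frame(frames: Iterable[Sequence[float]]) -> Iterator[int]:
    """Yield the number of person detections in each frame, one frame at a time."""
    for cls_ids in frames:
        # Class IDs arrive as floats (like r.boxes.cls), so cast before comparing
        yield sum(1 for c in cls_ids if int(c) == PERSON_ID)

# Stand-in for per-frame class IDs; with Ultralytics this would roughly be
# (r.boxes.cls for r in model.predict(source, stream=True))
fake_frames = iter([[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 2.0], []])
for i, n in enumerate(people_per_frame(fake_frames), start=1):
    print(f"frame {i}: {n} people")
```

Because nothing is collected into a list, memory use stays flat regardless of video length.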

## Specify Certain Classes

In [5]:
# in the COCO dataset, 0: person, 1: bicycle, 2: car, 3: motorcycle, 4: airplane, 5: bus ...
classes = [0]   # here we only want to detect people
source = 'src/bus.jpg'
results = model.predict(source, save=True, conf=0.5, classes=classes)
# compare with the first example: the bus is no longer detected


image 1/1 /Users/fish/Desktop/CODE/acne-detector/src/bus.jpg: 640x480 3 persons, 53.3ms
Speed: 2.8ms preprocess, 53.3ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 480)
Results saved to runs/detect/predict2
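`classes=[0]` restricts the output to the listed class IDs, while `classes=None` (the default) keeps every class. A minimal post-hoc equivalent, again assuming hypothetical `(class_id, confidence)` pairs rather than the library's own `Results` objects:

```python
from typing import List, Optional, Sequence, Tuple

Detection = Tuple[int, float]  # hypothetical (class_id, confidence) pair

def filter_by_class(detections: List[Detection],
                    classes: Optional[Sequence[int]] = None) -> List[Detection]:
    """Keep detections whose class ID is in `classes`; None keeps everything."""
    if classes is None:
        return list(detections)
    wanted = set(classes)
    return [d for d in detections if d[0] in wanted]

dets = [(5, 0.87), (0, 0.86), (0, 0.85), (0, 0.81)]  # made-up example values
print(filter_by_class(dets, classes=[0]))  # only the three person boxes remain
```

Filtering inside `predict` is usually preferable, since unwanted classes are discarded before they ever reach your code.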


## Count Detected Objects

In [6]:
# 0: person, 1: bicycle, 2: car, 3: motorcycle, 4: airplane, 5: bus ...
source = 'src/bus.jpg'
results = model.predict(source, save=True, conf=0.5, classes=None)
for r in results:
    print(r.boxes.cls)  # r.boxes.cls holds the predicted class ID of each box (cls is short for class)

# tensor([5., 0., 0., 0.])
# 5: one bus
# 0: three people




image 1/1 /Users/fish/Desktop/CODE/acne-detector/src/bus.jpg: 640x480 3 persons, 1 bus, 51.6ms
Speed: 2.6ms preprocess, 51.6ms inference, 0.4ms postprocess per image at shape (1, 3, 640, 480)
Results saved to runs/detect/predict2
tensor([5., 0., 0., 0.])
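Rather than reading the raw tensor, you can tally detections per class name. A sketch using `collections.Counter`, where the class IDs are copied from the output above and `NAMES` is a hypothetical subset of the `model.names` dictionary (in the notebook you would iterate over `r.boxes.cls` and use `model.names` directly):

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical subset of model.names (the full COCO mapping has 80 entries)
NAMES: Dict[int, str] = {0: "person", 1: "bicycle", 2: "car", 3: "motorcycle", 4: "airplane", 5: "bus"}

def count_by_name(cls_ids: Iterable[float], names: Dict[int, str]) -> Counter:
    """Count detections per class name; IDs come in as floats, so cast to int."""
    return Counter(names[int(c)] for c in cls_ids)

cls_ids = [5.0, 0.0, 0.0, 0.0]  # values from tensor([5., 0., 0., 0.]) above
counts = count_by_name(cls_ids, NAMES)
print(counts["person"], counts["bus"])  # 3 1
```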


In [9]:
# count number of people
source = 'src/bus.jpg'
results = model.predict(source, save=True, conf=0.5, classes=None)
people_count = 0
for r in results:
    for c in r.boxes.cls:   # iterate over the class ID of each detected box
        # print(c)    # each c is a 0-d tensor, hence int(c) below
        print(model.names[int(c)])  # model.names[0]: 'person', model.names[5]: 'bus'
        if int(c) == 0:
            people_count += 1
print(f'There are {people_count} people')   # 3


image 1/1 /Users/fish/Desktop/CODE/acne-detector/src/bus.jpg: 640x480 3 persons, 1 bus, 56.8ms
Speed: 2.8ms preprocess, 56.8ms inference, 0.4ms postprocess per image at shape (1, 3, 640, 480)
Results saved to runs/detect/predict2
bus
person
person
person
There are 3 people
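The loop above hard-codes `0` as the person ID. A small variant, assuming the same `{id: name}` shape as `model.names`, looks the ID up by name first so the count works for any class label; the `names` dictionary and `cls_ids` list below are illustrative stand-ins for `model.names` and `r.boxes.cls`.

```python
from typing import Dict, Iterable

def count_label(cls_ids: Iterable[float], names: Dict[int, str], label: str) -> int:
    """Count detections of a class given by its name, e.g. 'person'."""
    # Invert the {id: name} mapping to find the ID for the requested label
    name_to_id = {name: idx for idx, name in names.items()}
    if label not in name_to_id:
        raise KeyError(f"unknown class label: {label!r}")
    target = name_to_id[label]
    return sum(1 for c in cls_ids if int(c) == target)

names = {0: "person", 5: "bus"}  # stand-in subset of model.names
cls_ids = [5.0, 0.0, 0.0, 0.0]   # same IDs as the bus.jpg output above
print(f"There are {count_label(cls_ids, names, 'person')} people")
```

Raising `KeyError` on an unknown label catches typos such as `'persons'` instead of silently returning 0.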


## Inference on Webcam

The code in this section may freeze or crash a Jupyter/IPython notebook kernel, because `cv2.imshow` opens a native window. Running it as a regular Python script (`.py` file) is safer.

In [2]:
%pip install opencv-python



In [10]:
import cv2
from ultralytics import YOLO

# Load the YOLOv8 model
model = YOLO('yolov8n.pt')
# model = YOLO('your_model.pt')

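# Open the video source: a file path, or 0 for the default webcam
video_path = "src/newjeans.mp4"
video_path = 0  # 0 selects the default webcam; delete this line to use the video file above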
cap = cv2.VideoCapture(video_path)

# Loop through the video frames, press 'q' to quit
while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()

    if success:
        # Run YOLOv8 inference on the frame
        results = model(frame)

        # Visualize the results on the frame
        annotated_frame = results[0].plot()

        # Display the annotated frame
        cv2.imshow("YOLOv8 Inference", annotated_frame)

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        # Break the loop if the end of the video is reached
        break

# Release the video capture object and close the display window
cap.release()
cv2.destroyAllWindows()


0: 384x640 4 persons, 38.7ms
Speed: 1.6ms preprocess, 38.7ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 4 persons, 40.5ms
Speed: 1.3ms preprocess, 40.5ms inference, 0.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 4 persons, 38.6ms
Speed: 1.1ms preprocess, 38.6ms inference, 0.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 6 persons, 38.2ms
Speed: 1.1ms preprocess, 38.2ms inference, 0.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 4 persons, 37.8ms
Speed: 1.1ms preprocess, 37.8ms inference, 0.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 5 persons, 40.7ms
Speed: 1.3ms preprocess, 40.7ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 4 persons, 40.4ms
Speed: 1.1ms preprocess, 40.4ms inference, 0.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 38.3ms
Speed: 1.1ms preprocess, 38.3ms inference, 0.4ms postprocess per image at shape (
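Each `Speed:` line reports preprocess, inference, and postprocess time in milliseconds per image. Summing them gives an approximate per-frame latency, and 1000 divided by that gives a rough frames-per-second figure. This ignores the time spent reading frames and drawing with OpenCV, so real throughput will be somewhat lower. A small parser, assuming the exact line format shown above:

```python
import re

# Matches the three timings in a line like
# "Speed: 1.6ms preprocess, 38.7ms inference, 0.5ms postprocess per image ..."
SPEED_RE = re.compile(
    r"Speed: ([\d.]+)ms preprocess, ([\d.]+)ms inference, ([\d.]+)ms postprocess"
)

def estimate_fps(speed_line: str) -> float:
    """Return an approximate FPS from one Ultralytics 'Speed:' log line."""
    match = SPEED_RE.search(speed_line)
    if match is None:
        raise ValueError("not a Speed line")
    total_ms = sum(float(x) for x in match.groups())  # per-frame latency in ms
    return 1000.0 / total_ms

line = "Speed: 1.6ms preprocess, 38.7ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)"
print(f"{estimate_fps(line):.1f} FPS")  # 24.5 FPS
```

For the first webcam frame above, 1.6 + 38.7 + 0.5 = 40.8 ms, or about 24.5 FPS for the model alone.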
