<a href="https://colab.research.google.com/github/itberrios/CV_tracking/blob/main/setup_tutorials/YOLOv8_video_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **YOLOv8**

This tutorial shows how to run YOLOv8 from [ultralytics](https://github.com/ultralytics/ultralytics) on a video. The documentation is available at https://docs.ultralytics.com/.



First, install the necessary libraries.

In [1]:
!pip install pytube
!pip install moviepy
!pip install ffmpeg

# bug fix for imageio-ffmpeg
!pip install imageio==2.4.1

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pytube
  Downloading pytube-12.1.2-py3-none-any.whl (57 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.0/57.0 KB 1.6 MB/s eta 0:00:00
Installing collected packages: pytube
Successfully installed pytube-12.1.2
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ffmpeg
  Downloading ffmpeg-1.4.tar.gz (5.1 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: ffmpeg
  Building wheel for ffmpeg (setup.py) ... done
  Created wheel for ffmpeg: filename=ffmpeg-1.4-py3-none-any.whl size=6084 sha256=3f6fadc02fe129d6b2064482f4dd44d1e3d11b9d63906e6e94122532fbb9e84b
  Stored in directory: /root/.cache/pip/wheels/30/33/46/5ab7eca55

In [2]:
!git clone https://github.com/ultralytics/ultralytics
%cd ultralytics
!pip install -r requirements.txt

Cloning into 'ultralytics'...
remote: Enumerating objects: 5438, done.
remote: Counting objects: 100% (357/357), done.
remote: Compressing objects: 100% (199/199), done.
remote: Total 5438 (delta 171), reused 272 (delta 150), pack-reused 5081
Receiving objects: 100% (5438/5438), 4.36 MiB | 910.00 KiB/s, done.
Resolving deltas: 100% (3543/3543), done.
/content/ultralytics
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting thop>=0.1.1
  Downloading thop-0.1.1.post2209072238-py3-none-any.whl (15 kB)
Installing collected packages: thop
Successfully installed thop-0.1.1.post2209072238


In [3]:
%cd ..
!pip install ultralytics

/content
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ultralytics
  Downloading ultralytics-8.0.49-py3-none-any.whl (303 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 303.7/303.7 KB 15.8 MB/s eta 0:00:00
Collecting sentry-sdk
  Downloading sentry_sdk-1.16.0-py2.py3-none-any.whl (184 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 184.3/184.3 KB 25.7 MB/s eta 0:00:00
Installing collected packages: sentry-sdk, ultralytics
Successfully installed sentry-sdk-1.16.0 ultralytics-8.0.49


### Base Library Import

In [4]:
import os
import numpy as np
import cv2
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams["figure.figsize"] = (20, 10)

### Get the YouTube video and download it

In [5]:
from pytube import YouTube

video_url = r'https://www.youtube.com/watch?v=lJqJEVE4xBs'
yt = YouTube(video_url)
print("Video Title: ", yt.title)

# download the highest-resolution progressive mp4 stream
# (progressive streams contain both video and audio)
video_path = yt.streams \
  .filter(progressive=True, file_extension='mp4') \
  .order_by('resolution') \
  .desc() \
  .first() \
  .download()

Video Title:  ASTONISHING! Horse stops to a walk mid-race but somehow still wins at Haydock! 🤯


### Load model

In [6]:
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt to yolov8n.pt...


  0%|          | 0.00/6.23M [00:00<?, ?B/s]

In [None]:
from ultralytics.yolo.utils.plotting import Annotator, Colors

colors = Colors()


def annotate_frame(results, frame):
    """Draw a labeled bounding box on `frame` for every detection in `results`."""
    # create the annotator once, so an empty `results` still returns the frame
    annotator = Annotator(frame)

    for result in results:
        # draw each detected box with its class name
        for box in result.boxes:
            bbox = box.xyxy[0]
            cls = int(box.cls)
            # frames read by OpenCV are BGR, so request BGR colors
            annotator.box_label(box=bbox, label=model.names[cls], color=colors(cls, bgr=True))

    # return the annotated frame
    return annotator.result()
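
Each box in a result carries a class index (`box.cls`) that `model.names` maps to a label. As a minimal sketch, the helper here tallies detections per class, similar in spirit to per-frame summaries such as `10 persons, 9 horses`. `count_detections` and the two-entry `names` mapping are hypothetical and introduced only for illustration; in this notebook you would pass `model.names` and `[int(box.cls) for box in result.boxes]` instead.

```python
from collections import Counter


def count_detections(class_ids, names):
    """Return a {label: count} dict for a sequence of class indices."""
    return dict(Counter(names[int(c)] for c in class_ids))


# Hypothetical subset of a class-index-to-label mapping (assumed for this example).
names = {0: "person", 17: "horse"}
counts = count_detections([0, 0, 17, 0, 17], names)
print(counts)
```

A plain dict keeps the result easy to log or compare from one frame to the next.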



## Run model on frames

In [None]:
cap = cv2.VideoCapture(video_path)

# resize images
# cap.set(3, 640) # width
# cap.set(4, 480) # height


if not cap.isOpened():
    print("Error opening video file")

# get frame rate
fps = cap.get(cv2.CAP_PROP_FPS)

# store frames
result_frames = []
frames = []
i = 0
while cap.isOpened():

    # read each video frame
    ret, frame = cap.read()

    # stop at the end of the video
    if not ret:
        break

    # keep an unannotated RGB copy of the frame
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frames.append(image)

    # get detections and draw them on the (BGR) frame
    results = model(source=frame, stream=True)
    result_frames.append(annotate_frame(results, frame))

    # cap the number of stored frames, otherwise Colab runs out of memory
    if i > 1000:
        break

    i += 1
 
# When everything done, release
# the video capture object
cap.release()
 
# Closes all the frames
cv2.destroyAllWindows()

Ultralytics YOLOv8.0.48 🚀 Python-3.8.10 torch-1.13.1+cu116 CUDA:0 (Tesla T4, 15102MiB)
YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs

0: 384x640 10 persons, 9 horses, 25.0ms
Speed: 4.1ms preprocess, 25.0ms inference, 5.9ms postprocess per image at shape (1, 3, 640, 640)

0: 384x640 11 persons, 9 horses, 9.7ms
Speed: 0.4ms preprocess, 9.7ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 640)

0: 384x640 10 persons, 8 horses, 12.3ms
Speed: 0.5ms preprocess, 12.3ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 640)

0: 384x640 9 persons, 9 horses, 10.7ms
Speed: 0.5ms preprocess, 10.7ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 640)

0: 384x640 11 persons, 10 horses, 10.7ms
Speed: 0.5ms preprocess, 10.7ms inference, 1.5ms postprocess per image at shape (1, 3, 640, 640)

0: 384x640 9 persons, 9 horses, 9.5ms
Speed: 0.7ms preprocess, 9.5ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 640)

0: 384x6

In [None]:
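# annotated frames are BGR (OpenCV); convert to RGB for matplotlib
plt.imshow(cv2.cvtColor(result_frames[0], cv2.COLOR_BGR2RGB))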

<matplotlib.image.AxesImage at 0x7f522c05c070>

### Run directly on the video

This will crash Colab because the video is too large.

In [None]:
# results = model.predict(source=video_path, show=True)

### Create detection video

In [None]:
result_frames[0].shape

(720, 1280, 3)
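
The frame shape helps explain why the processing loop stops after roughly 1000 frames. The following is a rough back-of-the-envelope estimate, not a measurement. It assumes 8-bit (`uint8`) pixels and counts only the two Python lists (`frames` and `result_frames`), ignoring model and library overhead; the frame count of 1002 is also an assumption based on the loop's `i > 1000` cutoff.

```python
# Rough memory estimate for frames kept in RAM (assumes uint8 pixels, 1 byte each).
height, width, channels = 720, 1280, 3   # frame shape from the notebook
frames_kept = 1002                       # assumed: the loop breaks once i exceeds 1000
lists = 2                                # `frames` and `result_frames`

bytes_per_frame = height * width * channels
total_bytes = bytes_per_frame * frames_kept * lists

print(f"{bytes_per_frame / 2**20:.2f} MiB per frame")
print(f"{total_bytes / 2**30:.2f} GiB for {frames_kept} frames in {lists} lists")
```

Under these assumptions the stored frames alone take several GiB, which is a large share of the RAM available on a standard Colab runtime.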

In [None]:
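# match the writer size to the annotated frames: (width, height)
height, width = result_frames[0].shape[:2]
out = cv2.VideoWriter('out.mp4', cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

for frame in result_frames:
    out.write(frame)

out.release()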
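
Storing every frame in a list before writing is what limits how much of the video can be processed. One alternative, sketched here under assumptions and not part of the original notebook, is to write each annotated frame as soon as it is produced, so only one frame is held at a time. The helper `stream_frames` is hypothetical; it takes plain callables so the sketch can run without OpenCV or a video file.

```python
def stream_frames(read_frame, annotate, write_frame, max_frames=None):
    """Read, annotate, and write frames one at a time; return how many were written.

    read_frame  -- callable returning (ok, frame), like cv2.VideoCapture.read
    annotate    -- callable mapping a frame to an annotated frame
    write_frame -- callable consuming an annotated frame, like cv2.VideoWriter.write
    """
    written = 0
    while max_frames is None or written < max_frames:
        ok, frame = read_frame()
        if not ok:  # end of stream
            break
        write_frame(annotate(frame))
        written += 1
    return written


# Stand-in objects so the sketch runs on its own (no video file needed).
fake_frames = iter([(True, "f0"), (True, "f1"), (True, "f2"), (False, None)])
sink = []
n = stream_frames(lambda: next(fake_frames), str.upper, sink.append)
print(n, sink)
```

In this notebook the call would look something like `stream_frames(cap.read, lambda f: annotate_frame(model(source=f, stream=True), f), out.write)`, with `out` opened before the call and released afterwards.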

Now let's add the audio back into our video.

In [None]:
from moviepy.editor import VideoFileClip

videoclip = VideoFileClip(video_path)
audioclip = videoclip.audio

Imageio: 'ffmpeg-linux64-v3.3.1' was not found on your computer; downloading it now.
Try 1. Download from https://github.com/imageio/imageio-binaries/raw/master/ffmpeg/ffmpeg-linux64-v3.3.1 (43.8 MB)

In [None]:
# trim audio file to length of video
video_length = len(result_frames)/fps
audioclip_trimmed = audioclip.subclip(0, video_length)

In [None]:
from moviepy.editor import CompositeAudioClip
detection_video = VideoFileClip('out.mp4')

new_audioclip = CompositeAudioClip([audioclip_trimmed])
detection_video.audio = new_audioclip
detection_video.write_videofile('out_with_audio.mp4')
     

[MoviePy] >>>> Building video out_with_audio.mp4
[MoviePy] Writing audio in out_with_audioTEMP_MPY_wvf_snd.mp3


100%|██████████| 884/884 [00:01<00:00, 503.73it/s]

[MoviePy] Done.
[MoviePy] Writing video out_with_audio.mp4



100%|█████████▉| 1002/1003 [01:22<00:00, 12.15it/s]


[MoviePy] Done.
[MoviePy] >>>> Video ready: out_with_audio.mp4 

