# Roboflow Video Inference

This Colab notebook mirrors [the Roboflow video inference repo](https://github.com/roboflow-ai/video-inference) but uses the Roboflow pip package for inference calls instead of `curl` commands.

Documentation for the Roboflow pip package can be found [via this link](https://docs.roboflow.com/python).

Video inference involves the following steps:
- break the video down into images using FFMPEG
- perform inference on each image and render a bounding box for each detected object
- stitch the images back together into a video
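
The three steps above can be sketched as a small set of Python helpers. This is only an outline under stated assumptions, not the notebook's actual code: the function names, the `frames_dir` argument, and the `infer` callback are hypothetical, while the ffmpeg flags mirror the ones used in the cells later in this notebook.

```python
from typing import Callable, List


def split_command(video: str, frames_dir: str, fps: int = 10) -> List[str]:
    """Step 1: ffmpeg command that samples `fps` frames per second into PNGs."""
    pattern = f"{frames_dir.rstrip('/')}/%04d.png"
    return ["ffmpeg", "-hide_banner", "-loglevel", "error",
            "-i", video, "-vf", f"fps={fps}", pattern]


def annotate_frames(frames: List[str], infer: Callable[[str], None]) -> int:
    """Step 2: call `infer` (a hypothetical callback) on every frame, in name order."""
    for frame in sorted(frames):
        infer(frame)
    return len(frames)


def stitch_command(frames_dir: str, output: str, rate: int = 10) -> List[str]:
    """Step 3: ffmpeg command that encodes the numbered PNGs as an H.264 video."""
    pattern = f"{frames_dir.rstrip('/')}/%04d.png"
    return ["ffmpeg", "-r", str(rate), "-i", pattern,
            "-vcodec", "libx264", "-pix_fmt", "yuv420p", output]
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is installed; building the list separately makes it easy to print and inspect before running it.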

# FFMPEG Set Up

### FFMPEG Installation

In [None]:
from IPython.display import clear_output
import os, urllib.request
HOME = os.path.expanduser("~")
pathDoneCMD = f'{HOME}/doneCMD.sh'
if not os.path.exists(f"{HOME}/.ipython/ttmg.py"):
    hCode = "https://raw.githubusercontent.com/yunooooo/gcct/master/res/ttmg.py"
    urllib.request.urlretrieve(hCode, f"{HOME}/.ipython/ttmg.py")

from ttmg import (
    loadingAn,
    textAn,
)

loadingAn(name="lds")
textAn("Installing Dependencies...", ty='twg')
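# GitHub no longer serves the unauthenticated git:// protocol, so install over https
os.system('pip install git+https://github.com/AWConant/jikanpy.git')
os.system('add-apt-repository -y ppa:jonathonf/ffmpeg-4')
os.system('apt-get update')
# -y answers the confirmation prompt, which would otherwise abort a non-interactive install
os.system('apt-get install -y mediainfo')
os.system('apt-get install -y ffmpeg')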
clear_output()
print('Installation finished.')

Installation finished.


### FFMPEG Execution



#### Access video

For now, upload one video at a time to the `videos_to_infer` directory; passing a glob of several videos results in them being stitched together.

You can mount Google Drive and use `cp` to copy a single file out of it, or manually upload a video from your local device.

Either way, make sure the video lands in the `videos_to_infer` directory.
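
To guard against accidentally leaving several videos in the directory, a check like the following could be run once the directory exists and the upload has finished. It is a minimal sketch: the helper name `find_single_video` and the list of extensions are assumptions, not part of the original notebook.

```python
from pathlib import Path

# Assumption: these are the container formats you expect to upload.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv"}


def find_single_video(directory: str) -> Path:
    """Return the only video in `directory`, or raise if there are zero or several."""
    videos = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )
    if len(videos) != 1:
        raise ValueError(
            f"expected exactly one video in {directory}, found {len(videos)}"
        )
    return videos[0]
```

For example, `find_single_video("/content/videos_to_infer")` would return the path to pass to ffmpeg, or fail early with a readable message.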

In [None]:
%cd /content/
!mkdir -p videos_to_infer
!mkdir -p inferred_videos
%cd videos_to_infer

/content
/content/videos_to_infer


#### Optional: Link your Google Drive to copy files to/from it

*   The process is outlined in the next 2 cells.



In [None]:
# OPTIONAL - link your g-drive to pull videos from
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
# OPTIONAL - copy your videos from g-drive to /content/
!cp /content/gdrive/MyDrive/TESTDATA/IMG_2167.mov /content/videos_to_infer

#### Break down video frames into images

In [None]:
# break video down into images - UPDATE THE PATH TO THE FILE!
# Example: for a video file named 'Test.mp4',
# update the path to: "/content/videos_to_infer/Test.mp4"
os.environ['inputFile'] = '/content/videos_to_infer/IMG_2167.mov'

# fps value: the number of frames to sample per second from the video
# a higher fps value samples more frames
# frames are written to /content/0001.png, /content/0002.png, ...,
# which is where the inference cell looks for them
!ffmpeg -hide_banner -loglevel error -i "$inputFile" -vf fps=10 "/content/%04d.png"
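
As a rough guide to what the `fps` value implies, the arithmetic can be written out as below. This is an illustrative sketch with hypothetical helper names; the exact number of images ffmpeg writes may differ by a frame because of rounding at the clip boundaries.

```python
def sampled_frame_count(duration_s: float, fps: float) -> int:
    """Approximate number of PNGs written for a clip lasting `duration_s` seconds."""
    return round(duration_s * fps)


def playback_seconds(frame_count: int, rate: float) -> float:
    """Length of a video assembled from `frame_count` images at `rate` frames per second."""
    return frame_count / rate


# A 12-second clip sampled at fps=10 yields about 120 images.
frames = sampled_frame_count(12.0, 10)
# Reassembled at 10 frames per second it plays for 12 seconds again;
# reassembled at 24 frames per second it plays for only 5 seconds.
same_speed = playback_seconds(frames, 10)
faster = playback_seconds(frames, 24)
```

The second function shows why the frame rate used when stitching (the `-r` flag later in this notebook) should match the sampling `fps` if you want the output to play at the original speed.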

# Video Inference Section

## Roboflow PIP package installation

In [None]:
!pip3 install roboflow

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting roboflow
  Downloading roboflow-1.0.9-py3-none-any.whl (56 kB)
Collecting cycler==0.10.0 (from roboflow)
  Downloading cycler-0.10.0-py2.py3-none-any.whl (6.5 kB)
Collecting idna==2.10 (from roboflow)
  Downloading idna-2.10-py2.py3-none-any.whl (58 kB)
Collecting pyparsing==2.4.7 (from roboflow)
  Downloading pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
Collecting python-dotenv (from roboflow)
  Downloading pyt

## Imports


In [None]:
from roboflow import Roboflow
import json
from time import sleep
from PIL import Image, ImageDraw
import io
import base64
import requests
from os.path import exists
import os, sys, re, glob

## Initialization

- Your private API key is found in Roboflow > YOUR_WORKSPACE > Roboflow API.
- NOTE: this is your private key, not your publishable key!

**Having trouble finding your API key, version number, or project ID?** The [documentation's quick start section](https://docs.roboflow.com/python) demonstrates how to find these in the Roboflow platform UI.
* [Obtaining Your API Key](https://docs.roboflow.com/rest-api#obtaining-your-api-key)
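
Because the key is private, one option is to keep it out of the notebook source and read it from an environment variable instead. The sketch below assumes a variable named `ROBOFLOW_API_KEY`; that name and the helper `load_api_key` are illustrative choices, not something the Roboflow package requires.

```python
import os


def load_api_key(var_name: str = "ROBOFLOW_API_KEY") -> str:
    """Read the private API key from an environment variable (the name is hypothetical)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"set the {var_name} environment variable before running inference"
        )
    return key
```

The result can then be passed along as `Roboflow(api_key=load_api_key())`, so the key never appears in a saved or shared copy of the notebook.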

In [None]:
!pip install ultralytics==8.0.20

from IPython import display
display.clear_output()

import ultralytics
ultralytics.checks()

from ultralytics import YOLO

#from IPython.display import display, Image

Ultralytics YOLOv8.0.20 🚀 Python-3.10.11 torch-2.0.1+cu118 CPU
Setup complete ✅ (2 CPUs, 12.7 GB RAM, 23.4/107.7 GB disk)


In [None]:
# workspace code
from roboflow import Roboflow
import json

# replace the placeholder with your own private API key; never share a real key
rf = Roboflow(api_key="YOUR API KEY HERE")
project = rf.workspace("pipette-detection").project("pipettes-detection")
# download version 11 of the dataset in YOLOv8 format
dataset = project.version(11).download("yolov8")

# template for your own project:
#rf = Roboflow(api_key="YOUR API KEY HERE")
#project = rf.workspace().project("YOUR PROJECT")
#dataset = project.version("YOUR VERSION").download("yolov8")

# grab the model from that project's version
model = project.version(11).model
print(model)

loading Roboflow workspace...
loading Roboflow project...
Downloading Dataset Version Zip in Pipettes-detection-11 to yolov8: 100% [58731758 / 58731758] bytes


Extracting Dataset Version Zip to Pipettes-detection-11 in yolov8:: 100%|██████████| 1842/1842 [00:00<00:00, 5804.22it/s]


{
  "id": "pipettes-detection/11",
  "name": "Pipettes detection",
  "version": "11",
  "classes": null,
  "overlap": 30,
  "confidence": 40,
  "stroke": 1,
  "labels": false,
  "format": "json",
  "base_url": "https://detect.roboflow.com/"
}


In [None]:
# HELPER FUNCTIONS BLOCK
import os
from PIL import ImageDraw, ImageFont
# ImageDraw.ImageDraw.font = ImageFont.truetype("Tests/fonts/FreeMono.ttf")

def draw_boxes(box, x0, y0, img, class_name, confidence):
    """Draw one labelled bounding box on `img` and return the image."""
    # OPTIONAL - color map, change the key-values for each color to make the
    # class output labels specific to your dataset
    color_map = {
        "pipette": "red",
    }

    # create a drawing context for the image
    bbox = ImageDraw.Draw(img)
    #font = ImageFont.truetype("arial.ttf", 15)

    # fall back to red for classes missing from color_map
    bbox.rectangle(box, outline=color_map.get(class_name, "red"), width=3)
    # write the class name and confidence just inside the top-left corner
    bbox.multiline_text((x0 + 1, y0 + 1), class_name + "\n" + str(confidence), fill='black', align='left')

    return img

def save_with_bbox_renders(img):
    """Save the annotated image to inferred_videos under its original file name."""
    # img.filename is set by PIL.Image.open()
    file_name = os.path.basename(img.filename)
    img.save('/content/inferred_videos/' + file_name)
    print(file_name)

## Execution

In [None]:
# perform inference on each image from the split up video
import glob
import PIL.Image
# %cd /content/inferred_videos
!pwd
# glob config values
# file_path = "/content/inferred_videos/"
# file_path = "/content/videos_to_infer/"
file_path = "/content/"
extension = "png"

# glob files based on location and file format; sorting keeps the frames in order
globbed_files = sorted(glob.glob(file_path + '*.' + extension))
print(globbed_files)

for image in globbed_files:
    # INFERENCE
    predictions = model.predict(image).json()['predictions']
    newly_rendered_image = PIL.Image.open(image)

    # RENDER
    # for each detection, draw its bounding box and label on the frame
    print(predictions)
    for prediction in predictions:
        # rip bounding box coordinates from current detection
        # note: infer returns the center point of the box as (x, y) plus width and height,
        # but Pillow's rectangle needs the top-left and bottom-right corners
        x0 = prediction['x'] - prediction['width'] / 2
        x1 = prediction['x'] + prediction['width'] / 2
        y0 = prediction['y'] - prediction['height'] / 2
        y1 = prediction['y'] + prediction['height'] / 2
        box = (x0, y0, x1, y1)

        # the label is hard-coded to 'pipette', the only class in color_map
        newly_rendered_image = draw_boxes(box, x0, y0, newly_rendered_image, 'pipette', prediction['confidence'])

    # WRITE
    save_with_bbox_renders(newly_rendered_image)

/content
['/content/0001.png', '/content/0002.png', '/content/0003.png', '/content/0004.png', '/content/0005.png', '/content/0006.png', '/content/0007.png', '/content/0008.png', '/content/0009.png', '/content/0010.png', '/content/0011.png', '/content/0012.png', '/content/0013.png', '/content/0014.png', '/content/0015.png', '/content/0016.png', '/content/0017.png', '/content/0018.png', '/content/0019.png', '/content/0020.png', '/content/0021.png', '/content/0022.png', '/content/0023.png', '/content/0024.png', '/content/0025.png', '/content/0026.png', '/content/0027.png', '/content/0028.png', '/content/0029.png', '/content/0030.png', '/content/0031.png', '/content/0032.png', '/content/0033.png', '/content/0034.png', '/content/0035.png', '/content/0036.png', '/content/0037.png', '/content/0038.png', '/content/0039.png', '/content/0040.png', '/content/0041.png', '/content/0042.png', '/content/0043.png', '/content/0044.png', '/content/0045.png', '/content/0046.png', '/content/0047.png', '/c
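
The coordinate conversion used in the cell above (center point plus width and height, converted to top-left and bottom-right corners) can be isolated into a small function. This is a sketch; the name `center_to_corners` is hypothetical, and the dictionary keys are the ones the cell reads from each prediction.

```python
from typing import Dict, Tuple


def center_to_corners(prediction: Dict[str, float]) -> Tuple[float, float, float, float]:
    """Convert a center-based box (x, y, width, height) to (x0, y0, x1, y1)."""
    half_w = prediction["width"] / 2
    half_h = prediction["height"] / 2
    return (
        prediction["x"] - half_w,  # left edge
        prediction["y"] - half_h,  # top edge
        prediction["x"] + half_w,  # right edge
        prediction["y"] + half_h,  # bottom edge
    )


# A 40x20 box centered at (100, 50) spans x from 80 to 120 and y from 40 to 60.
box = center_to_corners({"x": 100, "y": 50, "width": 40, "height": 20})
```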

You must match the FFMPEG wildcard pattern to the name of the first image of your video (i.e. the image you want to start with).

For example, if your first image is named `out0001.png`, use `out%04d.png`.

If your first image is named `/content/inferred_videos/Pi_test_video.mov_out0001.png`, use `/content/inferred_videos/Pi_test_video.mov_out%04d.png`.
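
Deriving the pattern from the first file name can also be automated. The helper below is a hypothetical sketch: it assumes the frame number is the last run of digits immediately before the file extension, and replaces it with a zero-padded `%0Nd` placeholder of the same width.

```python
import re


def frame_pattern(first_frame: str) -> str:
    """Replace the trailing frame number with a printf-style %0Nd placeholder."""
    match = re.search(r"(\d+)(\.[A-Za-z0-9]+)$", first_frame)
    if match is None:
        raise ValueError(f"no trailing frame number in {first_frame!r}")
    digits, extension = match.groups()
    # keep everything before the digits, then insert the placeholder
    return f"{first_frame[:match.start()]}%0{len(digits)}d{extension}"


short_pattern = frame_pattern("out0001.png")
long_pattern = frame_pattern("/content/inferred_videos/Pi_test_video.mov_out0001.png")
```

Here `short_pattern` is `out%04d.png` and `long_pattern` is `/content/inferred_videos/Pi_test_video.mov_out%04d.png`, matching the two examples above.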



In [None]:
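# stitch images together into a video
# -r sets the input frame rate; the frames were sampled at fps=10, so use -r 10 to keep the original playback speed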
!pwd
!ffmpeg -r 23 -s 1920x1080 -i /content/inferred_videos/%04d.png -vcodec libx264  -pix_fmt yuv420p testDetectionHand.mp4

/content
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libv

- TODO: add garbage collector instructions here for repeat testing

In [None]:
# CAUTION - deletes all data from the `inferred_videos` and `videos_to_infer` directories
# use this code block between runs for a fresh start between videos.
%cd /content/
!rm -r inferred_videos
!rm -r videos_to_infer
!mkdir inferred_videos
!mkdir videos_to_infer

/content
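
The same reset can be done in plain Python, which avoids the error `rm -r` prints when a directory does not exist yet. This is a minimal sketch: `reset_directories` is a hypothetical helper, and the directory names are the two used throughout this notebook.

```python
import shutil
from pathlib import Path


def reset_directories(root: str, names=("inferred_videos", "videos_to_infer")) -> None:
    """Delete and recreate each working directory under `root`."""
    for name in names:
        path = Path(root) / name
        # ignore_errors=True means a missing directory is not an error
        shutil.rmtree(path, ignore_errors=True)
        path.mkdir(parents=True)
```

Calling `reset_directories("/content")` between runs leaves both directories present and empty. As with the shell version, everything inside them is deleted.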



- TODO - util features such as printing a list of all unique classes?
- TODO - update docs in notebooks
- TODO - update test cases

# Tested Cases

- no images exist
- image has no detections
- image has no target_detection instance
- image has less object count
- image has less class count
- prediction falls into confidence range
- prediction outside of confidence range
- prediction has less than box req
- prediction has greater than box req
- prediction doesn't match target_detection name
- first prediction meets requirements for upload
- last prediction meets requirements for upload
- all similarities match too high, even when images look drastically different