# Video reading benchmark demo

Let's start with single-video reads:
here we compare the speed of various video readers.
There are many cells with little text, but it should be self-evident which is which.
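The cells below rely on the IPython `%%timeit` magic. Outside a notebook, a rough equivalent can be sketched with the standard library alone. The names `time_reader` and `dummy_reader` are hypothetical and only illustrate the idea; `dummy_reader` stands in for any of the decode loops, and the reported spread is the sample standard deviation, which may differ slightly from what `%%timeit` prints.

```python
import statistics
import time


def time_reader(read_fn, repeats=7):
    """Call read_fn() `repeats` times and return (mean_ms, stdev_ms)."""
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)


def dummy_reader():
    # Stand-in for a real decode loop; builds a fresh list on every call
    # so nothing accumulates between timing runs.
    return [i * i for i in range(10_000)]


mean_ms, stdev_ms = time_reader(dummy_reader)
print(f"{mean_ms:.3f} ms ± {stdev_ms:.3f} ms per run (7 runs)")
```

Returning a fresh list from the timed function, rather than appending to a global one, keeps memory use flat across repeats.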

In [3]:
path_to_video = "./videos/original/WUzgd7C1pWA.mp4"

In [2]:
import av
images_av = []

In [3]:
%%timeit

container = av.open(path_to_video)
for frame in container.decode(video=0):
    images_av.append(frame.to_image())

292 ms ± 9.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [4]:
print(len(images_av))

2616


In [5]:
import cv2
images_cv2 = []

In [6]:
%%timeit

cap = cv2.VideoCapture(path_to_video)
while cap.isOpened():
    ret, frame = cap.read()
    if ret:
        # OpenCV decodes to BGR; convert and store the RGB frame
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        images_cv2.append(rgb)
    else:
        break
cap.release()

96.6 ms ± 2.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [7]:
import torchvision
torchvision.set_video_backend("video_reader")
tv_frames = []

In [8]:
%%timeit
vframes, _, _ = torchvision.io.read_video(path_to_video)
tv_frames.append(vframes)

  "The pts_unit 'pts' gives wrong results and will be removed in a "


213 ms ± 12.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [4]:
from torchvision.io import VideoReader
tv_frames = []

In [11]:
%%timeit
video = VideoReader(path_to_video)
for frame in video:
    tv_frames.append(frame['data'])

238 ms ± 17.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [5]:
video = VideoReader(path_to_video)
video.get_metadata()

{'video': {'fps': [29.97002997002997], 'duration': [10.9109]},
 'audio': {'framerate': [48000.0], 'duration': [10.9]}}
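The metadata allows a quick sanity check on the frame count printed earlier. The sketch below assumes a constant frame rate and an accurate reported duration. It suggests that the 2616 frames in `images_av` correspond to several complete passes over the video, which would be consistent with `%%timeit` executing the cell body repeatedly while the list is never cleared.

```python
# Values copied from the get_metadata() output above.
fps = 29.97002997002997
duration_s = 10.9109

# Expected number of frames in one pass over the video.
expected_frames = round(fps * duration_s)
print(expected_frames)  # 327

# Length of images_av printed earlier, after the %%timeit cell had run.
accumulated = 2616
passes, remainder = divmod(accumulated, expected_frames)
print(passes, remainder)  # 8 0
```

Eight passes would match one calibration run plus the seven timed runs, although that split is an assumption about how `%%timeit` scheduled this particular cell.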

In [12]:
from decord import VideoReader, cpu
images_dcrd = []

In [13]:
%%timeit
# in memory: pass an open file object instead of a path
with open(path_to_video, 'rb') as f:
    vr = VideoReader(f, ctx=cpu(0))
for i in range(len(vr)):
    # the video reader will handle seeking and skipping in the most efficient manner
    images_dcrd.append(vr[i])

79.1 ms ± 1.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [None]:
images_dcrd = []

In [None]:
%%timeit
# "normal"
vr = VideoReader(path_to_video, ctx=cpu(0))
for i in range(len(vr)):
    # the video reader will handle seeking and skipping in the most efficient manner
    images_dcrd.append(vr[i])

### Benchmark results 

|    Reader   | avg time (ms) | stddev (ms) |      notes      |
|:-----------:|:-------------:|:------:|:---------------:|
| torchvision |      228      |  8.07  |    read_video   |
| torchvision |      242      |  29.3  | VideoReader API |
|     pyav    |      292      |  4.96  |                 |
|     cv2     |      96.6     |  1.47  |                 |
|    decord   |      79.1     |   1.2  |    in-memory    |
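To make the table easier to compare at a glance, the averages can be expressed relative to the fastest reader. This is a small sketch that only reuses the numbers from the table; the dictionary keys are labels chosen here for illustration.

```python
# Average times (ms) copied from the results table above.
avg_ms = {
    "torchvision (read_video)": 228.0,
    "torchvision (VideoReader API)": 242.0,
    "pyav": 292.0,
    "cv2": 96.6,
    "decord (in-memory)": 79.1,
}

# Express every reader as a multiple of the fastest one.
fastest = min(avg_ms, key=avg_ms.get)
slowdown = {name: t / avg_ms[fastest] for name, t in avg_ms.items()}

for name, ratio in sorted(slowdown.items(), key=lambda kv: kv[1]):
    print(f"{name:32s} {avg_ms[name]:6.1f} ms  {ratio:4.2f}x")
```

On these numbers decord is the baseline, cv2 is roughly 1.2x slower, and pyav roughly 3.7x slower.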

# GPU Contexts
At the moment, only decord supports direct GPU decoding (with a not insignificant build hassle).

Having said that, installing decord from source against FFmpeg from conda-forge actually yielded a performance benefit (79 ms vs. 101 ms on average).

In [1]:
from decord import VideoReader, cpu, gpu
images_dcrd = []

In [4]:
%%timeit
# "normal"
vr = VideoReader(path_to_video, ctx=cpu(0))
for i in range(len(vr)):
    # the video reader will handle seeking and skipping in the most efficient manner
    images_dcrd.append(vr[i])
    

77.6 ms ± 6.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [5]:
images_dcrd = []

In [6]:
%%timeit
# GPU context
vr = VideoReader(path_to_video, ctx=gpu(0))
for i in range(len(vr)):
    # the video reader will handle seeking and skipping in the most efficient manner
    images_dcrd.append(vr[i])

79.3 ms ± 2.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### Base results

Interestingly, the GPU context didn't significantly improve performance when reading a single video, something documented in https://github.com/dmlc/decord/issues/106.
The authors argue that the benefit shows up a) for different video encodings and b) with various bridges (e.g. copying the memory to GPU); I'm not sure whether there is a neat, platform-independent way of measuring this.


An interesting observation was made by Mike (from PyAV):

> There are some features we may elect to not implement because we don’t believe they fit the PyAV ethos. The only one that we’ve encountered so far is hardware decoding. The FFmpeg man page discusses the drawback of -hwaccel:
>> Note that most acceleration methods are intended for playback and will not be faster than software decoding on modern CPUs. Additionally, ffmpeg will usually need to copy the decoded frames from the GPU memory into the system memory, resulting in further performance loss.

This means that if we find a way to keep the frames in GPU memory, GPU decoding would be beneficial; otherwise it is rather useless.