# Dask and PyCamHD Benchmarking
In this notebook we benchmark how quickly pycamhd can fetch image frames from the raw data server, with and without Dask workers.

#### Imports

In [None]:
%matplotlib inline
import pycamhd as camhd
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

#### Get a list of frames to obtain from the server

In [None]:
dbcamhd = pd.read_json('dbcamhd.json', orient="records", lines=True)
fileindex = 2064
filename = dbcamhd['filename'][fileindex]
frame_count = dbcamhd['frame_count'][fileindex]
n_images = 256
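# Evenly spaced frame numbers; pass the dtype class itself, not an instance (np.int64() is the scalar 0)
frame_numbers = np.linspace(750, frame_count - 6000, n_images, dtype=np.int64)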
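To see what the frame selection produces, here is a minimal sketch with made-up inputs. `frame_count_demo` and `n_images_demo` are hypothetical values chosen for illustration; the notebook reads the real frame count from `dbcamhd.json`.

```python
import numpy as np

# Hypothetical inputs, not taken from dbcamhd.json
frame_count_demo = 26000
n_images_demo = 8

# Same call shape as above: skip the first 750 and the last 6000 frames,
# then pick n_images_demo evenly spaced frame numbers as 64-bit integers.
frame_numbers_demo = np.linspace(
    750, frame_count_demo - 6000, n_images_demo, dtype=np.int64)

print(frame_numbers_demo)
```

With these inputs the spacing is exactly 2750 frames, so the selection runs from 750 to 20000 inclusive.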

#### Load a set of frames using pycamhd
Here we use get_frame to fetch the frames one at a time in a loop. This is slow, so I have commented the cell out.

In [None]:
#%%time
#frames = [] # fastest to append the frames into a list of ndarrays
#for frame_number in frame_numbers:
#    frames.append(camhd.get_frame(filename, frame_number, 'rgb24'))

#### Load frames using pycamhd and dask
Here we use a Dask cluster and delayed functions to speed up fetching images from the raw data server. Scale the cluster up to ~20 Standard_D2_v3 nodes so that 32 workers fit.

In [None]:
from dask_kubernetes import KubeCluster
cluster = KubeCluster(n_workers=32)
cluster

In [None]:
from dask import delayed, compute
from dask.distributed import Client
client = Client(cluster)
client

In [None]:
%%time
delayed_frames = []
for frame_number in frame_numbers:
    delayed_frames.append(delayed(camhd.get_frame)(filename, frame_number, 'rgb24'))
frames = compute(*delayed_frames)
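The fan-out-and-gather shape of the cell above can be tried without a cluster. The sketch below is not Dask: it swaps in the standard library's `ThreadPoolExecutor` for the workers, and `fake_get_frame` is a hypothetical stand-in for `camhd.get_frame` that sleeps instead of contacting the server. It only illustrates why overlapping many slow, independent fetches helps.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_get_frame(filename, frame_number, pix_fmt):
    """Hypothetical stand-in for camhd.get_frame: sleep to mimic a network fetch."""
    time.sleep(0.01)
    return (filename, frame_number, pix_fmt)


def fetch_serial(filename, frame_numbers):
    # One fetch after another, like the commented-out loop earlier.
    return [fake_get_frame(filename, n, 'rgb24') for n in frame_numbers]


def fetch_parallel(filename, frame_numbers, n_workers):
    # Submit every fetch to a pool of n_workers threads and gather the results.
    # pool.map returns results in input order, as compute(*delayed_frames) does.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(
            lambda n: fake_get_frame(filename, n, 'rgb24'), frame_numbers))


demo_numbers = list(range(750, 750 + 16))  # hypothetical frame numbers
serial = fetch_serial('demo.mov', demo_numbers)
parallel = fetch_parallel('demo.mov', demo_numbers, n_workers=8)
print(serial == parallel)
```

Both paths return the same frames in the same order; only the wall time differs, which is what the benchmark measures.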

#### Show results of benchmark testing
We ran the above cells with n_images from 4 to 512 and n_workers from 0 to 128. The timings, in seconds, are tabulated below: rows are n_images, columns are n_workers, and NaN marks combinations with no recorded timing.

In [None]:
bench_s = pd.DataFrame(
    {'n_images': [4, 8, 16, 32, 64, 128, 256, 512],
     0: [8.8, 16.4, 34.2, 68.0, 143.0, 300.0, 615.0, 1189.0],
     2: [2.6, 6.3, 8.9, 18.8, 37.7, 76.0, 156.0, np.nan],
     4: [2.4, 2.5, 4.7, 9.4, 20.0, 38.9, 76.0, np.nan],
     8: [2.4, 2.5, 2.7, 5.1, 11.1, 20.5, 40.1, np.nan],
     16: [2.4, 2.4, 2.6, 3.3, 5.9, 12.0, 22.1, np.nan],
     32: [2.3, 2.4, 2.6, 3.5, 4.5, 9.1, 13.7, np.nan],
     64: [2.5, 2.5, 2.8, 3.4, 4.5, 7.0, 24.9, np.nan],
     96: [2.6, 2.6, 2.9, 4.0, 5.9, 7.7, 22.5, 25.0],
     128: [2.4, 2.4, 2.8, 3.7, 4.5, 9.5, 13.5, 26.8],
    })
bench_s.set_index('n_images', inplace=True)
bench_s

In [None]:
bench_fps = 1/bench_s.div(bench_s.index.to_series(), axis=0)
bench_fps
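Frames per second is one way to read the table; another is the speedup relative to the 0-worker column. The sketch below computes both on a small subset of the timings above (three rows, three columns, values copied from the table). The names `bench_subset`, `fps`, and `speedup` are introduced here for illustration only.

```python
import pandas as pd

# Subset of the timings (seconds) from the table above; columns are n_workers.
bench_subset = pd.DataFrame(
    {0: [143.0, 300.0, 615.0],
     8: [11.1, 20.5, 40.1],
     32: [4.5, 9.1, 13.7]},
    index=pd.Index([64, 128, 256], name='n_images'))

# Frames per second: n_images / seconds, aligned row by row on the index.
fps = bench_subset.rdiv(bench_subset.index.to_series(), axis=0)

# Speedup over the 0-worker run: (0-worker seconds) / (n-worker seconds).
speedup = bench_subset.rdiv(bench_subset[0], axis=0)

print(fps.round(2))
print(speedup.round(1))
```

`rdiv` divides its argument by the frame, so `fps` matches the `1/bench_s.div(...)` expression used above for the same cells.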

In [None]:
ax = bench_fps.plot(figsize=(12, 8), marker='.', markersize=10)
ax.set_ylabel('Frames Per Second');
ax.set_xlabel('Number of Frames');