![Degirum banner](https://raw.githubusercontent.com/DeGirum/PySDKExamples/main/images/degirum_banner.png)
## AI Inference on many video files
This notebook shows how to use DeGirum PySDK to run AI inference on multiple video files by multiplexing their frames into a single stream. The example aims for the lowest stable inference latency while maintaining reasonable throughput. It achieves this by using synchronous prediction mode and offloading video decoding to a separate thread.

This notebook works with the following inference options:

1. Run inference on DeGirum Cloud Platform;
2. Run inference on DeGirum AI Server deployed on localhost or on a computer in your LAN or VPN;
3. Run inference on DeGirum ORCA accelerator directly installed on your computer.

To try different options, you need to specify the appropriate `hw_location` option.

When running this notebook locally, you need to specify your cloud API access token in the [env.ini](../../env.ini) file, located in the same directory as this notebook.

When running this notebook in Google Colab, the cloud API access token should be stored in a user secret named `DEGIRUM_CLOUD_TOKEN`.

In [None]:
# make sure degirum-tools package is installed
!pip show degirum-tools || pip install degirum-tools

#### Specify where you want to run inferences, model_zoo_url, model_name, video file names, and other options here

In [None]:
# hw_location: where you want to run inference
#     "@cloud" to use DeGirum cloud
#     "@local" to run on local machine
#     IP address for AI server inference
# model_zoo_url: url/path for model zoo
#     cloud_zoo_url: valid for @cloud, @local, and ai server inference options
#     '': ai server serving models from local folder
#     path to json file: single model zoo in case of @local inference
# model_name: name of the model for running AI inference
# input_filenames: paths to video files for inference
# offload_preprocessing: True to do image preprocessing outside of inference call
# do_image_compression: True to do JPEG compression before sending image for inference
hw_location = "@cloud"
model_zoo_url = "degirum/public"
model_name = "mobilenet_v2_ssd_coco--300x300_quant_n2x_orca1_1"
input_filenames = [
    "https://raw.githubusercontent.com/DeGirum/PySDKExamples/main/images/Traffic.mp4",
    "https://raw.githubusercontent.com/DeGirum/PySDKExamples/main/images/Traffic.mp4",
    "https://raw.githubusercontent.com/DeGirum/PySDKExamples/main/images/Traffic.mp4",
    "https://raw.githubusercontent.com/DeGirum/PySDKExamples/main/images/Traffic.mp4",
]
offload_preprocessing = True  # do image preprocessing outside of inference call
do_image_compression = True  # do JPEG compression before sending image for inference

#### The rest of the cells below should run without any modifications

In [None]:
import degirum as dg, degirum_tools
import cv2, numpy, time, threading, queue
from contextlib import ExitStack

In [None]:
# Define stream multiplexing source:
# it reads frames from given video files in round robin fashion
# and puts them into given queue.
# If offload_preprocessing is enabled, it also performs image resizing
def mux_source(streams, frame_queue, model):
    phase = 0  # stream multiplexing phase counter

    while True:
        ret, frame = streams[phase].read()
        if not ret:
            break  # stop as soon as any stream runs out of frames

        if offload_preprocessing:
            # do image resizing
            frame = model._preprocessor.forward(frame)[0]

        frame_queue.put((frame, phase))

        phase = (phase + 1) % len(streams)  # advance mux phase

    frame_queue.put(None)  # send poison pill
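
The round-robin multiplexing pattern above can be tried without video files or DeGirum hardware. The following is a minimal sketch using only the Python standard library: plain lists stand in for video streams, and `fake_mux_source` and `run_consumer` are hypothetical names introduced for illustration, not part of PySDK.

```python
import queue
import threading


def fake_mux_source(streams, frame_queue):
    """Read items from the given iterables in round-robin order and enqueue them.

    Stops as soon as any stream is exhausted, then sends None as a poison pill.
    """
    iterators = [iter(s) for s in streams]
    phase = 0  # index of the stream to read next
    while True:
        try:
            frame = next(iterators[phase])
        except StopIteration:
            break
        frame_queue.put((frame, phase))  # blocks while the queue is full
        phase = (phase + 1) % len(iterators)
    frame_queue.put(None)  # poison pill: tells the consumer to stop


def run_consumer(streams, maxsize=10):
    """Start the producer thread and collect everything it enqueues."""
    frame_queue = queue.Queue(maxsize=maxsize)
    producer = threading.Thread(target=fake_mux_source, args=(streams, frame_queue))
    producer.start()
    received = []
    while True:
        item = frame_queue.get()
        if item is None:
            break  # got poison pill
        received.append(item)
    producer.join()
    return received


demo = run_consumer([["a0", "a1"], ["b0", "b1"]])
print(demo)
```

Because the queue is bounded, the producer blocks once `maxsize` items are waiting, so decoding cannot run arbitrarily far ahead of the consumer.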

In [None]:
with dg.load_model(
    model_name=model_name,
    inference_host_address=hw_location,
    zoo_url=model_zoo_url,
    token=degirum_tools.get_token(),
    ) as model, ExitStack() as stack:
    # create model object in `with` block to avoid server disconnections on each frame inference

    model.input_image_format = "JPEG" if do_image_compression else "RAW"
    model.measure_time = True

    # open video streams
    streams = [
        stack.enter_context(degirum_tools.open_video_stream(fn))
        for fn in input_filenames
    ]

    frame_queue = queue.Queue(maxsize=10)  # queue to enqueue frames
    start_times = []  # list of frame starting times
    end_times = []  # list of frame result receiving times

    # start frame retrieving thread
    mux_thread = threading.Thread(target=mux_source, args=(streams, frame_queue, model))
    mux_thread.start()

    # initialize progress indicator:
    # the mux stops when the shortest stream ends, so use the minimum frame count
    steps = int(min(stream.get(cv2.CAP_PROP_FRAME_COUNT) for stream in streams))
    progress = degirum_tools.Progress(steps * len(streams))

    # inference loop
    start_time = time.time()
    while True:
        # get frame from queue
        frame = frame_queue.get()
        if frame is None:
            break  # got poison pill: end loop

        # do inference and record times
        start_times.append(time.time())
        res = model(frame[0])
        end_times.append(time.time())

        progress.step()

    mux_thread.join()

    # print time statistics
    for s in sorted(model.time_stats().items()):
        print(s[1])
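
The cell above relies on `ExitStack` to open a variable number of video streams and close all of them when the `with` block exits. The sketch below illustrates that behavior using only the standard library. `open_fake_stream` and `demo_exit_stack` are hypothetical stand-ins introduced here, not DeGirum functions.

```python
from contextlib import ExitStack, contextmanager


def demo_exit_stack(names):
    """Open one fake stream per name and return the log of open/close events."""
    events = []

    @contextmanager
    def open_fake_stream(name):
        events.append(f"open {name}")
        try:
            yield name
        finally:
            events.append(f"close {name}")

    with ExitStack() as stack:
        # enter_context registers each context manager with the stack
        streams = [stack.enter_context(open_fake_stream(n)) for n in names]
        events.append(f"using {len(streams)} streams")
    # leaving the with block closes the streams in reverse order of opening
    return events


events = demo_exit_stack(["a", "b", "c"])
print(events)
```

The same cleanup runs if the body of the `with` block raises an exception, which is why the pattern suits a list of input files whose length is not known in advance.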

In [None]:
# process latency times
end_times = numpy.array(end_times)
start_times = numpy.array(start_times)
latency_times_ms = (end_times - start_times) * 1000

print("\nLatency Histogram")
latency_hist = numpy.histogram(latency_times_ms)
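# numpy.histogram returns N counts and N+1 bin edges;
# zip pairs each count with the left edge of its bin
for hval, bin_start in zip(latency_hist[0], latency_hist[1]):
    print(f"{bin_start:4.0f} ms:     {hval:4}")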
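
A histogram shows the shape of the latency distribution, while a few summary numbers are often easier to compare between runs. The sketch below is an illustration rather than part of the original notebook. It computes mean, median, and 95th-percentile latency plus overall throughput from start and end timestamp lists like those recorded above. It uses only the standard library, `summarize_latency` is a hypothetical helper, and the timestamps in the usage example are made up.

```python
import statistics


def summarize_latency(start_times, end_times):
    """Summarize per-frame latency (ms) and overall throughput (frames/s).

    Assumes both lists have the same length and at least two entries.
    """
    latencies_ms = [(e - s) * 1000 for s, e in zip(start_times, end_times)]
    # wall-clock span from the first inference start to the last inference end
    elapsed_s = end_times[-1] - start_times[0]
    return {
        "frames": len(latencies_ms),
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
        "p95_ms": statistics.quantiles(latencies_ms, n=20)[-1],
        "fps": len(latencies_ms) / elapsed_s,
    }


# made-up timestamps in seconds: four frames, each taking about 50 ms
summary = summarize_latency([0.0, 0.1, 0.2, 0.3], [0.05, 0.15, 0.25, 0.35])
for key, value in summary.items():
    print(f"{key}: {value}")
```

With the notebook's own data, the call would be `summarize_latency(list(start_times), list(end_times))`. Throughput computed this way includes the time spent waiting on the frame queue between inferences.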