## Performance Test for Multi-Model Inference
This notebook measures inference performance for several use cases involving multiple AI models:
* Baseline: each model running individually
* All models running side-by-side in a single thread
* All models running in parallel in multiple threads

This notebook works with the following inference options:

1. Run inference on DeGirum Cloud Platform;
2. Run inference on DeGirum AI Server deployed on localhost or on another computer in your LAN or VPN;
3. Run inference on DeGirum ORCA accelerator directly installed on your computer.

To try a different option, uncomment exactly **one** of the `dg.connect` lines in the code below and leave the other two commented out.

You also need to specify your cloud API access token, cloud zoo URLs, and AI server hostname in the [env.ini](env.ini) file, located in the same directory as this notebook.

#### Specify test options here

In [None]:
# list of models to test
model_names = [
    "yolo_v5s_pet_det--512x512_quant_n2x_orca_1",
    "mobilenet_v1_imagenet--224x224_quant_n2x_orca_1",
    "mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1",
]
iterations = 200  # how many iterations to run for each model
use_jpeg = True  # True: send JPEG-encoded input to the model; False: send raw bitmap
exclude_preprocessing = True  # exclude preprocessing step from timing measurements
batch_sizes = [2, 4, 8, 16]  # eager batch sizes to test


#### Specify where you want to run your inferences

In [None]:
import degirum as dg
import mytools

cloud_token = mytools.get_token()  # get cloud API access token from env.ini file
cloud_zoo_url = mytools.get_cloud_zoo_url()  # get cloud zoo URL from env.ini file

#
# Please UNCOMMENT only ONE of the following lines to specify where to run AI inference
#

# 1. Inference on the DeGirum Cloud Platform
zoo = dg.connect(dg.CLOUD, cloud_zoo_url, cloud_token)

# 2. Inference on DeGirum AI Server deployed on localhost or on another computer in your LAN or VPN
# zoo = dg.connect(mytools.get_ai_server_hostname(), cloud_zoo_url, cloud_token)

# 3. Inference on DeGirum ORCA accelerator installed on your computer
# zoo = dg.connect(dg.LOCAL, cloud_zoo_url, cloud_token)


#### The rest of the cells below should run without any modifications

In [None]:
import threading

# create models and input data
data = []
models = []
for model_name in model_names:
    model = zoo.load_model(model_name)
    model.image_backend = "opencv"  # select OpenCV backend
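    model.input_numpy_colorspace = "BGR"  # OpenCV images use BGR channel order
    # choose the format in which frames are sent to the model (JPEG or raw bitmap)
    model._model_parameters.InputImgFmt = ["JPEG" if use_jpeg else "RAW"]
    model.measure_time = True  # collect per-frame timing statistics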
    models.append(model)

    frame = "Images/TwoCats.jpg"
    if exclude_preprocessing:
        # preprocess the frame once up front so that preprocessing is not timed
        frame = model._preprocessor.forward(frame)[0]
    data.append(frame)

# define source of frames
def source(mi):
    for fi in range(iterations):
        yield data[mi]


# define timing results printer
def print_results(results):
    header = f"\n{' ':50} : " + " : ".join([f"{b:5}" for b in batch_sizes]) + "\n"
    lat = "Latency vs batch size (ms)" + header
    fps = "FPS vs batch size" + header

    for model_name, model_batch_results in results.items():
        lat += f"{model_name:50}"
        fps += f"{model_name:50}"
        for batch, model_result in model_batch_results.items():
            lat += f" : {model_result['time_stats']['FrameTotalDuration_ms'].avg:5.1f}"
            fps += f" : {iterations / model_result['elapsed']:5.1f}"

        lat += "\n"
        fps += "\n"

    print(lat)
    print(fps)


# run each model once to warm up the system
for mi, model in enumerate(models):
    model(data[mi])


#### Baseline: performance of each model running individually
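
The baseline comes down to draining a result generator while a timer runs, then dividing the frame count by the elapsed time. The sketch below shows that pattern using only the standard library. `Timer` is a hypothetical stand-in that assumes `mytools.Timer` returns elapsed seconds when called, and `fake_predict_batch` is a placeholder for `predict_batch`, not the DeGirum API.

```python
import time


class Timer:
    """Hypothetical stand-in for mytools.Timer: calling it returns elapsed seconds."""

    def __init__(self):
        self._start = time.perf_counter()

    def __call__(self):
        return time.perf_counter() - self._start


def fake_predict_batch(frames):
    """Placeholder for model.predict_batch: yields one result per input frame."""
    for frame in frames:
        yield frame.upper()


def measure_fps(frames):
    """Drain the result generator and return (frame count, elapsed seconds, FPS)."""
    timer = Timer()
    count = 0
    for _ in fake_predict_batch(frames):
        count += 1
    elapsed = timer()
    fps = count / elapsed if elapsed > 0 else float("inf")
    return count, elapsed, fps


count, elapsed, fps = measure_fps(["frame"] * 1000)
print(f"{count} frames in {elapsed * 1000:.2f} ms")
```

The timer starts immediately before the loop and is read immediately after it, so setup work such as changing the batch size is not included in the measurement.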

In [None]:
def run_sequentially():
    ret = {}
    for model_name in model_names:
        ret[model_name] = {}

    prog = mytools.Progress(len(model_names) * len(batch_sizes), speed_units="steps/s")
    for batch in batch_sizes:
        for mi, model_name in enumerate(model_names):
            models[mi].reset_time_stats()
            models[mi].eager_batch_size = batch
            models[mi].frame_queue_depth = batch
            models[mi].non_blocking_batch_predict = False

            t = mytools.Timer()
            for res in models[mi].predict_batch(source(mi)):
                pass

            ret[model_name][batch] = {
                "elapsed": t(),
                "time_stats": models[mi].time_stats(),
            }
            prog.step()

    return ret


sequential_results = run_sequentially()
print_results(sequential_results)


#### All models running in parallel in multiple threads
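
In this test each thread waits on a `threading.Barrier` before starting its timer, so that no model begins its run while the others are still being configured. The following is a minimal standard-library sketch of that synchronization pattern. The workload is a placeholder sum rather than real inference, and the names `worker`, `start_times`, and `results` are illustrative only.

```python
import threading
import time

workers = 3
barrier = threading.Barrier(workers)
start_times = {}
results = {}


def worker(index):
    # Per-thread setup would go here (for example, configuring a model).
    barrier.wait()  # block until every worker has finished its setup
    start_times[index] = time.perf_counter()
    results[index] = sum(range(10_000))  # placeholder workload


threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(sorted(results))
```

Each thread writes to its own dictionary key, so the shared dictionaries need no extra locking in this sketch.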

In [None]:
def run_in_threads():
    ret = {}
    for model_name in model_names:
        ret[model_name] = {}

    nmodels = len(model_names)

    prog = mytools.Progress(len(batch_sizes), speed_units="steps/s")
    for batch in batch_sizes:

        barr = threading.Barrier(nmodels)

        def run_one_model(mi):
            models[mi].reset_time_stats()
            models[mi].eager_batch_size = batch
            models[mi].frame_queue_depth = batch
            models[mi].non_blocking_batch_predict = False
            barr.wait()
            t = mytools.Timer()
            for res in models[mi].predict_batch(source(mi)):
                pass

            ret[model_names[mi]][batch] = {
                "elapsed": t(),
                "time_stats": models[mi].time_stats(),
            }

        threads = []
        for mi in range(nmodels):
            thread = threading.Thread(target=run_one_model, args=(mi,))
            thread.start()
            threads.append(thread)

        for thread in threads:
            thread.join()

        prog.step()

    return ret


in_threads_results = run_in_threads()
print_results(in_threads_results)


#### All models running side-by-side in a single thread
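
The single-thread test interleaves several result generators by advancing each one a step at a time and retiring it when it raises `StopIteration`. The sketch below shows that round-robin loop with plain Python generators. `countdown` is a hypothetical stand-in for a predictor, so it always yields a value, whereas a real non-blocking predictor may, as far as I understand, yield nothing useful on some steps.

```python
def countdown(name, n):
    """Stand-in for a predictor: yields n results, then stops."""
    for i in range(n):
        yield f"{name}{i}"


def round_robin(generators):
    """Advance each generator one step per pass until all are exhausted."""
    active = list(generators)
    finished = []  # indices in the order the generators ran out
    produced = []
    while any(gen is not None for gen in active):
        for index, gen in enumerate(active):
            if gen is None:
                continue
            try:
                produced.append(next(gen))
            except StopIteration:
                finished.append(index)
                active[index] = None  # retire the exhausted generator
    return produced, finished


produced, finished = round_robin(
    [countdown("a", 2), countdown("b", 3), countdown("c", 1)]
)
print(produced)  # ['a0', 'b0', 'c0', 'a1', 'b1', 'b2']
print(finished)  # [2, 0, 1]
```

Replacing an exhausted generator with `None` rather than removing it keeps the list indices aligned with the model indices, which is what lets the notebook look up the matching timer and model by position.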

In [None]:
def run_sidebyside():
    ret = {}
    for model_name in model_names:
        ret[model_name] = {}

    prog = mytools.Progress(len(batch_sizes), speed_units="steps/s")
    for batch in batch_sizes:
        predictors = []
        timers = []
        for mi, model_name in enumerate(model_names):
            models[mi].reset_time_stats()
            models[mi].eager_batch_size = batch
            models[mi].frame_queue_depth = batch
            models[mi].non_blocking_batch_predict = True
            predictors.append(models[mi].predict_batch(source(mi)))
            timers.append(mytools.Timer())

        while any(predictors):
            for mi, predictor in enumerate(predictors):
                if predictor is not None:
                    try:
                        next(predictor)
                    except StopIteration:
                        ret[model_names[mi]][batch] = {
                            "elapsed": timers[mi](),
                            "time_stats": models[mi].time_stats(),
                        }
                        predictors[mi] = None

        prog.step()

    return ret


sidebyside_results = run_sidebyside()
print_results(sidebyside_results)
