# Building a Multi-DNN DeepStream Application #
DeepStream pipelines can be constructed to perform complex analytics that involve multiple neural networks. One common use case for this would be to use a detector as a primary inference engine to localize an object and a classifier as a secondary inference engine. This is useful since classification models can often perform better on single objects within the frame. 

<a name='s1'></a>
## Planning the Video AI Application ##
Building a video AI application begins by designing the project based on the use case. For this walk-through, we will build a simple DeepStream pipeline to accurately detect cars from a highway traffic camera feed. This is the same pipeline as the sample application from the previous notebook. It includes several high-level steps: 
1. Read data from a video file
2. Perform deep learning inference on each frame
3. Perform post-processing with inference results
4. Write data to a file

<p><img src='images/steps.png' width=720></p>

<a name='s1.1'></a>
### Preview Input Video ###
We begin by using the `ffprobe` command line utility to understand the raw input video's format ([see documentation if needed](https://ffmpeg.org/ffprobe.html)). When using the `ffprobe` command, the `-i` option lets us read an input [URL](https://en.wikipedia.org/wiki/URL) and the `-hide_banner` option suppresses printing the banner to reduce the output. 

In [1]:
# DO NOT CHANGE THIS CELL
import os

# Set the input video path to an environment variable
os.environ['TARGET_VIDEO_PATH']='data/highway_traffic.h264'
os.environ['TARGET_VIDEO_PATH_MP4']='sample_30.mp4'

target_video_path=os.environ['TARGET_VIDEO_PATH']
target_video_path_mp4=os.environ['TARGET_VIDEO_PATH_MP4']

# Analyze video
!ffprobe -i $TARGET_VIDEO_PATH \
         -hide_banner

Input #0, h264, from 'data/road_traffic.h264':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 640x360 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1200k tbn, 60 tbc


In the next step, we convert the video file into a container file before playing it since the H.264 encoded video file can't be played directly in JupyterLab. The [FFmpeg](https://ffmpeg.org/) tool is a very fast video and audio converter with the general syntax: 
* `ffmpeg [global_options] {[input_file_options] -i input_url} ... {[output_file_options] output_url} ...` 

When using the `ffmpeg` command, the `-i` option lets us read an input URL, the `-loglevel quiet` option suppresses the logs to reduce the output, and the `-y` flag overwrites any existing output file with the same name. 

In [4]:
# DO NOT CHANGE THIS CELL
from IPython.display import Video

# Convert the H.264 encoded video file to MP4 container file - this will generate the sample_30.mp4 file
!ffmpeg -i $TARGET_VIDEO_PATH $TARGET_VIDEO_PATH_MP4 \
        -y \
        -loglevel quiet

# View the input video
Video(target_video_path_mp4, width=720)
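
The same conversion can also be driven from Python instead of a shell escape. The sketch below only assembles the argument list following the general `ffmpeg` syntax described earlier. The helper name `build_ffmpeg_command` is hypothetical, and actually running the command assumes `ffmpeg` is available on the `PATH`.

```python
import subprocess

def build_ffmpeg_command(input_url, output_url, overwrite=True, quiet=True):
    """Assemble an ffmpeg argument list: global options, then input, then output."""
    command = ["ffmpeg"]
    if overwrite:
        command.append("-y")                   # overwrite an existing output file
    if quiet:
        command += ["-loglevel", "quiet"]      # suppress log output
    command += ["-i", input_url, output_url]   # input URL followed by output URL
    return command

command = build_ffmpeg_command("data/highway_traffic.h264", "sample_30.mp4")
print(" ".join(command))

# To execute the conversion (assumes ffmpeg is installed):
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues when file names contain spaces.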

<p><img src='images/important.png' width=720></p>

Please make note of the **dimensions** of the video frames, **frame rate**, **codec**, **color model**, and **file format**. In this case, the file is compressed using the **H.264** codec. The video is made up of frames that are **640** pixels (width) by **360** pixels (height) using the **yuv420p** color encoding system at **30 FPS** (frames per second). 
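
These properties can also be pulled out of the `ffprobe` text programmatically. The following is a minimal sketch that parses the stream line from the `ffprobe` output shown earlier with regular expressions. The function name `parse_video_stream` is hypothetical, and the patterns are only assumed to fit this particular output layout.

```python
import re

# Stream description copied from the ffprobe output shown earlier
STREAM_LINE = (
    "Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), "
    "640x360 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1200k tbn, 60 tbc"
)

def parse_video_stream(line):
    """Extract codec, pixel format, frame size, and frame rate from an ffprobe stream line."""
    codec = re.search(r"Video:\s*(\w+)", line)
    pixel_format = re.search(r"Video:[^,]*,\s*(\w+)", line)
    size = re.search(r"\b(\d+)x(\d+)\b", line)
    fps = re.search(r"([\d.]+)\s*fps", line)
    return {
        "codec": codec.group(1) if codec else None,
        "pixel_format": pixel_format.group(1) if pixel_format else None,
        "width": int(size.group(1)) if size else None,
        "height": int(size.group(2)) if size else None,
        "fps": float(fps.group(1)) if fps else None,
    }

info = parse_video_stream(STREAM_LINE)
print(info)
```

For anything beyond a quick check, asking `ffprobe` for structured output (for example with `-print_format json -show_streams`) is likely more robust than parsing the human-readable text.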

<a name='s2'></a>
## Preparing the Deep Learning Model ##
For deep learning inference, we will use pre-trained models that are available on [**N**VIDIA **G**PU **C**loud](https://ngc.nvidia.com/), the hub for GPU-optimized AI and high-performance computing (HPC) software. NGC offers pre-trained models for many AI tasks.
We'll be using two purpose-built models from NGC - the [TrafficCamNet](https://catalog.ngc.nvidia.com/orgs/nvidia/models/tlt_trafficcamnet) object detection model and the [VehicleTypeNet](https://catalog.ngc.nvidia.com/orgs/nvidia/models/tlt_vehicletypenet) classification model. 

<a name='s2.1'></a>
### TrafficCamNet Object Detection Model ###
The TrafficCamNet Object Detection model, according to its [model card](https://catalog.ngc.nvidia.com/orgs/nvidia/models/tlt_trafficcamnet), detects one or more physical objects from four categories (cars, persons, road signs, and two-wheelers) _within an image_ and returns a box around each object, as well as a category label for each object. 

For example, given an input image/frame, the inference engine will generate the bounding box coordinates as well as the category labels: 
<p><img src='images/sample_inference.png' width=720></p>

We can then _optionally_ perform post-processing and draw the bounding boxes and text labels on top of the original frame. 
<p><img src='images/sample_bounding_boxes.png' width=720></p>

<a name='s2.3'></a>
### Installing NGC CLI ###
Pre-trained models can be downloaded from NGC using the [NGC CLI](https://ngc.nvidia.com/setup/installers/cli). We need to download and install the NGC CLI before using it. 

In [2]:
# DO NOT CHANGE THIS CELL
import os
os.environ['NGC_DIR']='/dli/task/ngc_assets'

# Download and install NGC CLI - this will create the ngc_assets folder
%env CLI=ngccli_cat_linux.zip
!mkdir -p $NGC_DIR/ngccli
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $NGC_DIR/ngccli
!unzip -o \
       -u "$NGC_DIR/ngccli/$CLI" \
       -d $NGC_DIR/ngccli/
!rm $NGC_DIR/ngccli/*.zip
os.environ["PATH"]="{}/ngccli:{}".format(os.getenv("NGC_DIR", ""), os.getenv("PATH", ""))

env: CLI=ngccli_cat_linux.zip
--2022-05-02 15:56:47--  https://ngc.nvidia.com/downloads/ngccli_cat_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 13.249.42.33, 13.249.42.36, 13.249.42.45, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|13.249.42.33|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 32589833 (31M) [application/zip]
Saving to: ‘/dli/task/ngc_assets/ngccli/ngccli_cat_linux.zip’


2022-05-02 15:56:47 (164 MB/s) - ‘/dli/task/ngc_assets/ngccli/ngccli_cat_linux.zip’ saved [32589833/32589833]

Archive:  /dli/task/ngc_assets/ngccli/ngccli_cat_linux.zip


### Downloading the Models ###
With the NGC CLI installed, we can download the TrafficCamNet and VehicleTypeNet models from NGC into the `ngc_assets` folder. 

In [3]:
# Download the purpose-built TrafficCamNet model from NGC
!ngc registry model download-version nvidia/tao/trafficcamnet:pruned_v1.0 --dest $NGC_DIR

# Download the purpose-built VehicleTypeNet model from NGC
!ngc registry model download-version nvidia/tao/vehicletypenet:pruned_v1.0 --dest $NGC_DIR

Downloaded 5.2 MB in 3s, Download speed: 1.73 MB/s               
----------------------------------------------------
Transfer id: trafficcamnet_vpruned_v1.0 Download status: Completed.
Downloaded local path: /dli/task/ngc_assets/trafficcamnet_vpruned_v1.0
Total files downloaded: 3 
Total downloaded size: 5.2 MB
Started at: 2022-05-02 15:56:51.637279
Completed at: 2022-05-02 15:56:54.642131
Duration taken: 3s
----------------------------------------------------
Downloaded 19.06 MB in 4s, Download speed: 4.77 MB/s               
----------------------------------------------------
Transfer id: vehicletypenet_vpruned_v1.0 Download status: Completed.
Downloaded local path: /dli/task/ngc_assets/vehicletypenet_vpruned_v1-1.0
Total files downloaded: 3 
Total downloaded size: 19.06 MB
Started at: 2022-05-02 15:56:58.210084
Completed at: 2022-05-02 15:57:02.215463
Duration taken: 4s
----------------------------------------------------


<a name='s3'></a>
## Building a Video AI Application ##

<a name='s3.1'></a>
### Pipeline Components ###
The sample pipeline architecture for a simple DeepStream application is shown below. We'll be adding new plugins based on their functionality. 
<p><img src='images/deepstream_pipeline.png' width=720></p>

This is the pipeline architecture of the application. We'll be using an object detection network to identify and localize the cars in the frames, followed by a secondary inference to classify vehicle types. 
<p><img src="images/deepstream_multi_gie_pipeline.png" width='720'></p>


* The [filesrc](https://gstreamer.freedesktop.org/documentation/coreelements/filesrc.html) plugin will read data from a local file. There are other plugins available for reading data from various types of sources like camera, RTSP streams, and disk. 
* The [h264parse](https://gstreamer.freedesktop.org/documentation/videoparsersbad/h264parse.html) plugin is used to parse the input elementary H.264 stream into frame-size bytes of data. 
* The [nvv4l2decoder](https://gist.github.com/nins-k/6f7377629000894deedc7f14d7ef64fe) plugin will decode the input data using the appropriate codec, which is H.264 in this case.
* The [nvstreammux](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvstreammux.html) plugin is a required stream aggregator. This video aggregator helps in accepting `n` input streams and converts them into sequential batched frames. Even though our simple application only consumes one input stream, this plugin is required as `nvinfer` accepts _batched_ buffers with the `NvDsBatchMeta` structure already attached. 
    * This plugin takes `width`,  `height`, and `batch-size` as parameters. 
* The [nvinfer](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html) plugin performs transformation (format conversion and scaling) on the input frame based on network requirements and passes the transformed data to the low-level library. This is the plugin that we will use to define the deep learning task(s) associated with our application. The low-level library pre-processes the transformed frames (performs normalization and mean subtraction) and produces final float RGB/BGR/GRAY planar data which are passed to the TensorRT engine for inferencing. The output type generated by the low-level library depends on the network type. We will configure one `nvinfer` instance for our `TrafficCamNet` object detection model and a second one for the `VehicleTypeNet` classification model. The plugin attaches the inference results such as _object class(es)_, _bounding box coordinates_, _etc._, to the metadata.
    * **Note**: Behaviour of inference is set through the config file. You can use the files provided below as a template or start the text file from scratch. 
* The [nvvideoconvert](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvvideoconvert.html) plugin converts frames from NV12 (YUV) to RGBA as required by `nvdsosd`. It is also capable of performing scaling, cropping, and rotating on the frames. 
* The [nvdsosd](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvdsosd.html) plugin draws bounding boxes and texts based on the metadata. It requires RGBA buffer as well as `NvDsBatchMeta`. 
* The [nvvideoconvert](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvvideoconvert.html) plugin converts frames from RGBA to I420 (YUV) as required by `avenc_mpeg4`. 
* The [capsfilter](https://gstreamer.freedesktop.org/documentation/coreelements/capsfilter.html) plugin does not modify data as such, but can enforce limitations on the data format. We use it to enforce the video conversion by `nvvideoconvert` to I420 (YUV) format. 
* The [avenc_mpeg4](https://gstreamer.freedesktop.org/documentation/libav/avenc_mpeg4.html) plugin encodes the I420 formatted frames using the MPEG4 codec. 
* The [filesink](https://gstreamer.freedesktop.org/documentation/coreelements/filesink.html) plugin writes incoming data to a file in the local file system. 

*More information about the plugins can be found in the [DeepStream Plugin Guide](https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#plugins-development-guide) and [GStreamer Plugin Guide](https://gstreamer.freedesktop.org/documentation/plugins_doc.html).*
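
The order of the plugins listed above can be summarized as a `gst-launch` style description string. The sketch below only builds that string. The function name `build_pipeline_description` is hypothetical, the file names are the ones used elsewhere in this notebook, and whether the resulting string parses correctly (for example with `Gst.parse_launch`) is an assumption that has not been verified here.

```python
def build_pipeline_description(video_path, pgie_config, sgie_config, output_path,
                               width=640, height=360, batch_size=1):
    """Return a gst-launch style description of the multi-DNN pipeline."""
    stages = [
        f"filesrc location={video_path}",
        "h264parse",
        "nvv4l2decoder",
        # The decoder feeds the muxer's first request pad (sink_0)
        f"m.sink_0 nvstreammux name=m width={width} height={height} batch-size={batch_size}",
        f"nvinfer config-file-path={pgie_config}",   # primary detector
        f"nvinfer config-file-path={sgie_config}",   # secondary classifier
        "nvvideoconvert",                            # NV12 -> RGBA for nvdsosd
        "nvdsosd",
        "nvvideoconvert",                            # RGBA -> I420 for the encoder
        "video/x-raw,format=I420",                   # caps filter
        "avenc_mpeg4 bitrate=2000000",
        f"filesink location={output_path}",
    ]
    return " ! ".join(stages)

description = build_pipeline_description(
    "data/highway_traffic.h264",
    "spec_files/pgie_config_trafficcamnet_03.txt",
    "spec_files/sgie_config_vehicletypenet_04.txt",
    "output_04_raw.mpeg4",
)
print(description)
```

Writing the pipeline out this way makes the data flow easy to review before creating and linking the elements one by one.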

Additionally, we will add a **callback function** on the sink pad of the `nvdsosd` plugin to access the metadata in the application. The application can then use this metadata to solve the given problem (in this case draw the bounding boxes and texts).

<a name='s3.2'></a>
### Initializing GStreamer and Pipeline ###
We initialize GStreamer with `Gst.init(list=None)` and instantiate a `Gst.Pipeline` object as `pipeline` to contain all the elements shown in the pipeline diagram. 

In [6]:
# DO NOT CHANGE THIS CELL
# Import necessary GStreamer libraries and DeepStream python bindings
import gi
gi.require_version('Gst', '1.0')
from gi.repository import GObject, Gst, GLib
import pyds

# Initialize GStreamer
Gst.init(None)

# Create Pipeline element that will form a connection of other elements
pipeline=Gst.Pipeline()
print('Created pipeline')

Created pipeline


<a name='s3.3'></a>
### Creating Pipeline Elements ###
We create each element in the pipeline using `Gst.ElementFactory.make(factoryname, name)` (see [documentation](https://lazka.github.io/pgi-docs/Gst-1.0/classes/ElementFactory.html#Gst.ElementFactory.make)). We can configure elements using `set_property(property_name, value)` (see [documentation](https://gstreamer.freedesktop.org/documentation/plugin-development/basics/args.html?gi-language=c)) with the required parameters as needed. In this step, we also add the elements to the pipeline with `Gst.Pipeline.add(element)`. 

In [7]:
# Create Source element for reading from a file and set the location property
source = Gst.ElementFactory.make("filesrc", "file-source")
source.set_property('location', "data/highway_traffic.h264")

# Create H264 Parser with h264parse as the input file is an elementary h264 stream
h264parser = Gst.ElementFactory.make("h264parse", "h264-parser")

# Create Decoder with nvv4l2decoder for accelerating decoding on GPU
decoder = Gst.ElementFactory.make("nvv4l2decoder", "nvv4l2-decoder")

# Create Streammux with nvstreammux to form batches for one or more sources and set properties
streammux = Gst.ElementFactory.make("nvstreammux", "Stream-muxer")
streammux.set_property('width', 640) 
streammux.set_property('height', 360) 
streammux.set_property('batch-size', 1)

# Create Primary GStreamer Inference Element with nvinfer to run inference on the decoder's output after batching
pgie=Gst.ElementFactory.make("nvinfer", "primary-inference")

# Create Secondary Inference Element with nvinfer to run inference on the pgie's output
sgie=Gst.ElementFactory.make("nvinfer", "secondary-inference")

# Create Convertor to convert from YUV to RGBA as required by nvdsosd
nvvidconv1=Gst.ElementFactory.make("nvvideoconvert", "convertor1")

# Create OSD with nvdsosd to draw on the converted RGBA buffer
nvosd=Gst.ElementFactory.make("nvdsosd", "onscreendisplay")

# Create Convertor to convert from RGBA to I420 as required by encoder
nvvidconv2=Gst.ElementFactory.make("nvvideoconvert", "convertor2")

# Create Capsfilter to enforce frame image format
capsfilter=Gst.ElementFactory.make("capsfilter", "capsfilter")
caps=Gst.Caps.from_string("video/x-raw, format=I420")
capsfilter.set_property("caps", caps)

# Create Encoder to encode I420 formatted frames using the MPEG4 codec
encoder=Gst.ElementFactory.make("avenc_mpeg4", "encoder")
encoder.set_property("bitrate", 2000000)

# Create Sink with filesink to write the encoded output to a file
sink=Gst.ElementFactory.make('filesink', 'filesink')
sink.set_property('location', 'output_04_raw.mpeg4')
sink.set_property("sync", 1)
print('Created elements')

Created elements


In [8]:
# Add elements to pipeline
pipeline.add(source)
pipeline.add(h264parser)
pipeline.add(decoder)
pipeline.add(streammux)
pipeline.add(pgie)
pipeline.add(sgie)
pipeline.add(nvvidconv1)
pipeline.add(nvosd)
pipeline.add(nvvidconv2)
pipeline.add(capsfilter)
pipeline.add(encoder)
pipeline.add(sink)
print('Added elements to pipeline')

Added elements to pipeline
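
`Gst.ElementFactory.make` returns `None` when an element cannot be created (for example, when a plugin is not installed), and the cells above do not check for that. The following is a minimal sketch of a guard. The helper name `make_element` is hypothetical, and the factory function is passed in as the `make` argument so the helper can be exercised without GStreamer.

```python
def make_element(make, factory_name, name, **properties):
    """Create an element with `make`, fail loudly if creation fails, then set properties."""
    element = make(factory_name, name)
    if element is None:
        raise RuntimeError(
            f"Unable to create element '{name}' from factory '{factory_name}'"
        )
    for prop, value in properties.items():
        # Python keyword arguments cannot contain '-', so names such as
        # batch_size are translated to GStreamer's hyphenated form (batch-size).
        element.set_property(prop.replace("_", "-"), value)
    return element

# Intended use in this notebook (requires GStreamer and DeepStream):
# streammux = make_element(Gst.ElementFactory.make, "nvstreammux", "Stream-muxer",
#                          width=640, height=360, batch_size=1)
```

Failing at creation time gives a clearer error than a later failure when adding or linking a `None` element.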


<a name='e1'></a>
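The `nvinfer` plugin needs to be configured through a [config file](./spec_files/pgie_config_trafficcamnet_03.txt) (`./spec_files/pgie_config_trafficcamnet_03.txt`). The secondary inference engine is configured the same way through its own file (`./spec_files/sgie_config_vehicletypenet_04.txt`).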

These properties are important to understand: 
* Properties related to models downloaded from NGC or trained with the TAO Toolkit: 
    * `tlt-encoded-model` - pathname of the TAO Toolkit encoded model
    * `tlt-model-key` - model load key for the TAO Toolkit encoded model
    * `labelfile-path` - pathname of a text file containing the labels for the model. The labels must be new line delimited for object detection models
    * `uff-input-blob-name` - name of the input blob in the UFF file
    * `output-blob-names` - array of output layer names
    * `input-dims` - dimensions of the model input as [channel; height; width; input-order] where input-order=0 for NCHW
    * `net-scale-factor` - pixel normalization factor _(default=1)_

* Recommended properties: 
    * `batch-size` - number of frames to be inferred together in a batch _(default=1)_

* Mandatory properties for detectors: 
    * `num-detected-classes` - number of classes detected by the network

* Optional properties for detectors: 
    * `cluster-mode` - clustering algorithm to use _(default=0 i.e. Group Rectangles)_. Refer to the section on [clustering algorithms supported by nvinfer](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html#cluster-mode-info) in the documentation for more information. 
    * `nms-iou-threshold` - maximum IOU score between two proposals after which the proposal with the lower confidence will be rejected
    * `pre-cluster-threshold` - detection threshold to be applied prior to clustering
    * `topk` - keep only top K objects with highest detection scores

* Other optional properties: 
    * `network-mode` - data format to be used for inference _(0=FP32, 1=INT8, 2=FP16 mode | default=0 i.e. FP32)_
    * `process-mode` - mode _(primary or secondary)_ in which the plugin operates _(default=1 i.e. primary)_
    * `model-color-format` - color format required by the model _(default=0 i.e. RGB)_
    * `interval` - number of consecutive batches to be skipped for inference _(primary mode only | default=0)_
    * `gie-unique-id` - unique ID to be assigned to the GIE to enable the application and other elements to identify detected bounding boxes and labels _(default=0)_
    * `gpu-id` - device ID of GPU to use for pre-processing/inference _(dGPU only)_
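
To make the layout of such a file more concrete, the sketch below parses a small configuration with Python's `configparser` and checks that the mandatory detector property is present. Every value in `SAMPLE_PGIE_CONFIG` is illustrative or a placeholder and is NOT copied from the provided spec file. The `[property]` and `[class-attrs-all]` group names follow the `nvinfer` documentation, and the helper name `load_nvinfer_properties` is hypothetical.

```python
import configparser

# Illustrative values and placeholders only; not the contents of the provided spec file
SAMPLE_PGIE_CONFIG = """
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=<model-key-from-model-card>
tlt-encoded-model=<path-to-encoded-model>
labelfile-path=<path-to-labels.txt>
uff-input-blob-name=<input-blob-name>
output-blob-names=<output-blob-1>;<output-blob-2>
input-dims=3;544;960;0
batch-size=1
network-mode=0
process-mode=1
num-detected-classes=4
gie-unique-id=1
cluster-mode=2

[class-attrs-all]
pre-cluster-threshold=0.2
topk=20
nms-iou-threshold=0.5
"""

MANDATORY_DETECTOR_KEYS = ("num-detected-classes",)

def load_nvinfer_properties(text):
    """Parse the [property] group and verify the mandatory detector keys exist."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    properties = dict(parser["property"])
    missing = [key for key in MANDATORY_DETECTOR_KEYS if key not in properties]
    if missing:
        raise ValueError(f"Missing mandatory detector properties: {missing}")
    return properties

properties = load_nvinfer_properties(SAMPLE_PGIE_CONFIG)
print(properties["num-detected-classes"], properties["input-dims"].split(";"))
```

A check like this can catch a missing property before the pipeline is started, when the error is easier to trace.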

You can find most of the information needed on the [model card](https://catalog.ngc.nvidia.com/orgs/nvidia/models/tlt_trafficcamnet): 

<p><img src='images/encryption_key.png' width=480></p>
<p><img src='images/model_card.png' width=720></p>

The blob names will be based on the deep learning model. For this pre-trained model from NGC, they can be found in the accompanying [text file](./ngc_assets/trafficcamnet_vpruned_v1.0/trafficnet_int8.txt) (`./ngc_assets/trafficcamnet_vpruned_v1.0/trafficnet_int8.txt`). 

<p><img src='images/blob_names.png' width=360></p>

In [9]:
# Set the location of the config file
pgie.set_property('config-file-path', 'spec_files/pgie_config_trafficcamnet_03.txt')
sgie.set_property('config-file-path', 'spec_files/sgie_config_vehicletypenet_04.txt')

<a name='s3.4'></a>
### Linking Pipeline Elements ###
Finally, we link all these elements in the order that data flow through the pipeline with `Gst.Element.link(Gst.Element)`.
<p><img src='images/important.png' width=720></p>

When connecting a source to `nvstreammux` (the muxer), the input's source pad, obtained from `get_static_pad(name='src')`, must be manually linked to a newly requested sink pad from the muxer using `get_request_pad(name)` and the pad template `sink_%u`. This enables the ability to have multiple sources feeding into the `nvstreammux` plugin. 

In [10]:
# Link elements together
source.link(h264parser)
h264parser.link(decoder)

# Link decoder source pad to streammux sink pad
decoder_srcpad=decoder.get_static_pad("src")    
streammux_sinkpad=streammux.get_request_pad("sink_0")
decoder_srcpad.link(streammux_sinkpad)

# Link the rest of the elements in the pipeline
streammux.link(pgie)
pgie.link(sgie)
sgie.link(nvvidconv1)
nvvidconv1.link(nvosd)
nvosd.link(nvvidconv2)
nvvidconv2.link(capsfilter)
capsfilter.link(encoder)
encoder.link(sink)
print('Linked elements in pipeline')

Linked elements in pipeline
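
`Gst.Element.link` returns a boolean, and the cell above ignores it, so a refused link would only surface later when the pipeline fails to start. The following is a minimal sketch of a helper that links a chain of elements and stops at the first failure. The name `link_many` is hypothetical, and it relies only on the `link` and `get_name` methods of the elements.

```python
def link_many(*elements):
    """Link elements pairwise in order and raise if any link is refused."""
    for upstream, downstream in zip(elements, elements[1:]):
        if not upstream.link(downstream):
            raise RuntimeError(
                f"Failed to link {upstream.get_name()} -> {downstream.get_name()}"
            )

# Intended use in this notebook (requires GStreamer and DeepStream):
# link_many(source, h264parser, decoder)
# link_many(streammux, pgie, sgie, nvvidconv1, nvosd,
#           nvvidconv2, capsfilter, encoder, sink)
```

The pad-level link between the decoder and `nvstreammux` still has to be made separately, since it goes through a request pad rather than `Gst.Element.link`.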


<a name='s3.5'></a>
### Probe to Access Metadata ###
We use probes, which are callback functions that interact with the pads of elements, to access metadata. To access the metadata, we can use the DeepStream Python bindings, `pyds`. DeepStream uses an extensible standard structure for metadata. The basic metadata structure `NvDsBatchMeta` starts with batch-level metadata, created inside the `Gst-nvstreammux` plugin (see below). The object-level metadata we are looking for is accessible through `NvDsBatchMeta` > `NvDsFrameMeta` > `NvDsObjectMeta`. Some metadata instances are stored in `GList` form, which requires the data to be cast to the appropriate structure using `pyds.NvDsFrameMeta.cast(data)` or `pyds.NvDsObjectMeta.cast(data)`. The `NvDsObjectMeta` contains the inference results from the deep learning neural networks, depending on the configuration. 
<p><img src='images/DS_plugin_metadata.png' width=480></p>

We write the user-defined callback function `osd_sink_pad_buffer_probe`. Inside the function, we iterate through all the metadata types that are attached to the buffer. We want to add a text label at the top of each frame and modify the border color of the bounding boxes. We also define a helper function, `analyze_meta`, that analyzes the metadata generated by the secondary inference plugin.

By the time the buffers reach the `nvdsosd` plugin, they will have the metadata the plugin needs to draw bounding boxes and text labels for the detected objects on the frames. This means that a probe will not be needed if the default settings are sufficient. Based on specific use cases, we can use a probe to access and modify the metadata used by `nvdsosd` to draw, but it would have to be inserted at or before the `sink` pad of the `nvdsosd` plugin. There are generally two types of metadata we are interested in for drawing: 
* `NvDsObjectMeta.rect_params` and `NvDsObjectMeta.text_params` related to the objects detected or
* `NvDsBatchMeta` > `NvDsFrameMeta` > `NvDsDisplayMeta` related to overlays we want to add to each frame
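
The `GList` traversal used to walk these metadata lists follows one pattern: cast the node's `data`, process it, then move to `next` until the list ends. The following is a minimal sketch that wraps this pattern in a generator. The name `iterate_glist` is hypothetical, and `_Node` is only a stand-in for a `pyds` list node so the example runs without DeepStream.

```python
def iterate_glist(node, cast=lambda data: data):
    """Yield cast(node.data) for every node of a GList-style linked list."""
    while node is not None:
        try:
            item = cast(node.data)
        except StopIteration:
            return
        yield item
        try:
            node = node.next
        except StopIteration:
            return

class _Node:
    """Stand-in for a GList node (hypothetical, for illustration only)."""
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node

numbers = _Node(1, _Node(2, _Node(3)))
print(list(iterate_glist(numbers)))  # [1, 2, 3]

# Intended use inside a probe (requires pyds):
# for frame_meta in iterate_glist(batch_meta.frame_meta_list, pyds.NvDsFrameMeta.cast):
#     for obj_meta in iterate_glist(frame_meta.obj_meta_list, pyds.NvDsObjectMeta.cast):
#         ...
```

Catching `StopIteration` inside the generator matters, because an uncaught `StopIteration` raised within a generator body is converted into a `RuntimeError`.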

 

In [11]:
PGIE_CLASS_ID_CAR=0

# Define helper function
def analyze_meta(obj_meta): 
    # Only car supports secondary inference
    if obj_meta.class_id == PGIE_CLASS_ID_CAR:     
        cls_meta=obj_meta.classifier_meta_list
        
        # Iterate through each class meta until the end
        while cls_meta is not None:
            cls=pyds.NvDsClassifierMeta.cast(cls_meta.data)
            # Get label info
            label_info=cls.label_info_list  
            
            # Iterate through each label info meta until the end
            while label_info is not None:
                # Cast data type of label from pyds.GList
                label_meta=pyds.glist_get_nvds_label_info(label_info.data)
                if cls.unique_component_id==2:
                    print('\t Type & Probability = {} {}%'.format(label_meta.result_label, round(label_meta.result_prob*100)))
                try:
                    label_info=label_info.next
                except StopIteration:
                    break
            
            try:
                cls_meta=cls_meta.next
            except StopIteration:
                break
    return None

In [12]:
from random import random

# Define the Probe Function
def osd_sink_pad_buffer_probe(pad, info):
    
    isCongested = False
    
    gst_buffer=info.get_buffer()

    # Retrieve batch metadata from the gst_buffer
    # Note that pyds.gst_buffer_get_nvds_batch_meta() expects the
    # C address of gst_buffer as input, which is obtained with hash(gst_buffer)
    batch_meta=pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame=batch_meta.frame_meta_list

    # Iterate through each frame in the batch metadata until the end
    while l_frame is not None:
        try:
            frame_meta=pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        frame_number=frame_meta.frame_num
        num_rects=frame_meta.num_obj_meta
        l_obj=frame_meta.obj_meta_list
        
        cars = 0
        persons = 0
        bicycles = 0
        road_signs = 0
        
        if num_rects >= 23:
            isCongested = True
            
        # Iterate through each object in the frame metadata until the end
        while l_obj is not None:
            try:
                obj_meta=pyds.NvDsObjectMeta.cast(l_obj.data)
                
                if obj_meta.obj_label == "car":
                    cars += 1
                elif obj_meta.obj_label == "person":
                    persons += 1
                elif obj_meta.obj_label == "bicycle":
                    bicycles += 1
                elif obj_meta.obj_label == "road_sign":
                    road_signs += 1
                
                # Call analyze_meta to print the secondary classifier results
                analyze_meta(obj_meta)
                
            except StopIteration:
                break
            
            
            # Set border color (red, green, blue, alpha) to random values
            obj_meta.rect_params.border_color.set(random(), random(), random(), random())
            
            try: 
                l_obj=l_obj.next
            except StopIteration:
                break

        # Acquire display metadata from pool and set number of labels to 1
        display_meta=pyds.nvds_acquire_display_meta_from_pool(batch_meta)
        display_meta.num_labels=1
        
        # Set text_params of the display metadata to local variable
        py_nvosd_text_params=display_meta.text_params[0]

        #Setting display text to be shown on screen
        py_nvosd_text_params.display_text = (
            f"Number of Objects: {num_rects}\nTraffic Congestion: {isCongested}"
            f"\nDetected Cars: {cars}\nDetected Persons: {persons}\nDetected Bicycles: {bicycles}"
            f"\nDetected Road_Signs: {road_signs}"
        )

        # Reading the display_text field here returns the C address of the
        # allocated string, so use pyds.get_string() to get the string content.
        print(pyds.get_string(py_nvosd_text_params.display_text))
        
        # Set the offsets where the string should appear
        py_nvosd_text_params.x_offset=500
        py_nvosd_text_params.y_offset=10

        # Set font, font-color (red, green, blue, alpha), and font-size
        py_nvosd_text_params.font_params.font_name="Serif"
        py_nvosd_text_params.font_params.font_size=13
        py_nvosd_text_params.font_params.font_color.set(1.0, 1.0, 1.0, 1.0)

        # Set text background color (red, green, blue, alpha)
        py_nvosd_text_params.set_bg_clr=1
        py_nvosd_text_params.text_bg_clr.set(0.0, 0.0, 0.0, 1.0)

        # Add to frame metadata
        pyds.nvds_add_display_meta_to_frame(frame_meta, display_meta)

        try:
            l_frame=l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
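
The per-frame counting done in the probe can be separated from the GStreamer plumbing, which makes the congestion rule easy to check on its own. The sketch below reproduces that logic on a plain list of labels. The names `summarize_frame` and `CONGESTION_THRESHOLD` are hypothetical, and the threshold of 23 objects is the one used in the probe.

```python
from collections import Counter

CONGESTION_THRESHOLD = 23  # same object count used in the probe

def summarize_frame(labels, threshold=CONGESTION_THRESHOLD):
    """Count detected objects per class and flag the frame as congested."""
    labels = list(labels)
    counts = Counter(labels)
    return {
        "num_objects": len(labels),
        "congested": len(labels) >= threshold,
        "cars": counts["car"],
        "persons": counts["person"],
        "bicycles": counts["bicycle"],
        "road_signs": counts["road_sign"],
    }

summary = summarize_frame(["car"] * 9)
print(summary)
```

Inside the probe, the list of labels would come from the `obj_label` field of each `NvDsObjectMeta` in the frame.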

With the pipeline defined and the elements linked, we add the callback function on the sink pad of the `nvdsosd` plugin using `Gst.Pad.add_probe(mask, callback)`. While attached, the probe notifies us when data passes on the pad. We can use `GST_PAD_PROBE_TYPE_BUFFER` or `GST_PAD_PROBE_TYPE_BUFFER_LIST` for `mask` when creating the probe. We designed the callback function to work with a single buffer, so we're using `Gst.PadProbeType.BUFFER`. 

In [13]:
# Add probe to nvdsosd plugin's sink
osdsinkpad=nvosd.get_static_pad("sink")
probe_id=osdsinkpad.add_probe(Gst.PadProbeType.BUFFER, osd_sink_pad_buffer_probe)
print('Attached probe')

Attached probe


<a name='s3.6'></a>
### Starting the Pipeline ###
The pipeline has a bus that we will use to monitor messages. We run a GLib/Gtk+ `MainLoop` (or iterate the default GLib main context regularly) and attach a watch/message handler to the bus with `Gst.Bus.add_signal_watch()`. This way, the `GLib.MainLoop` checks the bus for new messages and notifies the handler. The message handler is also a callback function, which we define as `bus_call`. It is called whenever the pipeline emits a message to the bus, and it should return `True` to stay attached to the bus. Since this callback function is used by many DeepStream applications, we have refactored it into a [Python script](common/bus_call.py) (`./common/bus_call.py`) for ease of use. When we use `inspect.getsource(object)` to look at the definition of the function, we find that it's a simple procedure that checks the message type and handles each message accordingly.

With the message handler in place, we put the pipeline in the `PLAYING` state and run the `MainLoop`. Finally, when the pipeline is finished, we put the pipeline into the `NULL` state to clean up.

In [14]:
# DO NOT CHANGE THIS CELL
from common.bus_call import bus_call

# Inspect the definition for bus_call
from inspect import getsource
print(getsource(bus_call))

In [15]:
# Create an event loop
loop=GLib.MainLoop()

# Feed GStreamer bus messages to loop
bus=pipeline.get_bus()
bus.add_signal_watch()
bus.connect("message", bus_call, loop)
print('Added bus message handler')

Added bus message handler
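
For readers who do not have `common/bus_call.py` at hand, a handler of this kind typically looks like the sketch below. This is an illustration written for this walk-through and is not the contents of the provided script. The name `example_bus_call` and the optional `message_type` parameter are hypothetical, and the parameter exists only so the function can be exercised without GStreamer.

```python
def example_bus_call(bus, message, loop, message_type=None):
    """Handle a bus message; returning True keeps the handler attached."""
    if message_type is None:
        # Imported lazily so the sketch can also be tried without GStreamer
        from gi.repository import Gst
        message_type = Gst.MessageType

    if message.type == message_type.EOS:
        print("End-of-stream")
        loop.quit()
    elif message.type == message_type.WARNING:
        err, debug = message.parse_warning()
        print(f"Warning: {err}: {debug}")
    elif message.type == message_type.ERROR:
        err, debug = message.parse_error()
        print(f"Error: {err}: {debug}")
        loop.quit()
    return True

# Intended use in this notebook (requires GStreamer):
# bus.connect("message", example_bus_call, loop)
```

Quitting the loop on end-of-stream and on errors is what lets `loop.run()` return so the pipeline can be set to the `NULL` state afterwards.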


In [None]:
# Start play back and listen to events
print("Starting pipeline")
pipeline.set_state(Gst.State.PLAYING)
try:
    loop.run()
except:
    pass

# Cleaning up as the pipeline comes to an end
pipeline.set_state(Gst.State.NULL)

Starting pipeline
Number of Objects: 9
Traffic Congestion: False                                            
Detected Cars: 9
Detected Persons: 0
Detected Bicycles: 0                                            
Detected Road_Signs: 0
Number of Objects: 9
Traffic Congestion: False                                            
Detected Cars: 9
Detected Persons: 0
Detected Bicycles: 0                                            
Detected Road_Signs: 0
Number of Objects: 8
Traffic Congestion: False                                            
Detected Cars: 8
Detected Persons: 0
Detected Bicycles: 0                                            
Detected Road_Signs: 0
Number of Objects: 9
Traffic Congestion: False                                            
Detected Cars: 9
Detected Persons: 0
Detected Bicycles: 0                                            
Detected Road_Signs: 0
Number of Objects: 8
Traffic Congestion: False                                            
Detected Cars: 8
Detected P

<a name='s5.7'></a>
### Viewing the Inference ###
In the next step, we convert the video file into a container file before playing it since the MPEG4 encoded video file can't be played directly in JupyterLab. The [FFmpeg](https://ffmpeg.org/) tool is a very fast video and audio converter with the general syntax: 
* `ffmpeg [global_options] {[input_file_options] -i input_url} ... {[output_file_options] output_url} ...` 

When using the `ffmpeg` command, the `-i` option lets us read an input URL, the `-loglevel quiet` option suppresses the logs to reduce the output, and the `-y` flag overwrites any existing output file with the same name. 

In [19]:
# DO NOT CHANGE THIS CELL
# Convert MPEG4 video file to MP4 container file
!ffmpeg -i /dli/task/output_04_raw.mpeg4 /dli/task/highway_traffic.mp4 \
        -y \
        -loglevel quiet

from IPython.display import Video
# View the output video written by ffmpeg above
Video("highway_traffic.mp4", width=720)