# Integrating Multiple TAO Models in a Single DeepStream Pipeline

In the previous notebook, we built a simple DeepStream pipeline for object detection. In this notebook, we extend that idea and build a pipeline for multi-class object detection, tracking, and attribute classification.

Note: The code in this notebook is inspired by a sample application that NVIDIA provides in a GitHub repository. You can find this repository [here](https://github.com/NVIDIA-AI-IOT/deepstream_python_apps). The test application uses more than one secondary inference engine. For the sake of simplicity, we have reduced this number to one.

![test2](../images/test2.png)

The architecture diagram above shows the application. Here, we have an additional model that identifies the car type. Plugging in the additional model works much like adding the primary model, but there are configuration considerations to take care of. This notebook also introduces a new element, the `nvtracker` plugin.

### Changes in configuration

Because this secondary classifier is only intended to run on objects that we believe are vehicles, we need to add new configuration parameters to produce this behavior. Two new parameters, `operate-on-gie-id` and `operate-on-class-ids`, let us control it.

The first, `operate-on-gie-id`, lets us configure a classifier to only execute on objects produced by
a particular inference engine. In this case, we configure the secondary classifier to only execute on
objects detected by the primary model. The second, `operate-on-class-ids`, lets us configure a
classifier to only execute on objects of a specific class. By combining these two, our secondary
classifier is configured to only evaluate the type of objects classified as
cars by our primary model.
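The filtering described above can be sketched in plain Python. This is only an illustration of the selection logic, not how `nvinfer` is implemented. The configuration excerpt is hypothetical: it assumes the primary engine has ID 1 and that vehicles are class 0, and the helper names `load_filter` and `should_classify` are made up for this example.

```python
import configparser

# Hypothetical excerpt of a secondary nvinfer configuration file.
# The values are illustrative assumptions, not copied from the real config.
SGIE_CONFIG = """
[property]
gie-unique-id=2
process-mode=2
operate-on-gie-id=1
operate-on-class-ids=0
"""


def load_filter(config_text):
    """Return (gie_id, class_ids) that the secondary classifier operates on."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    prop = parser["property"]
    gie_id = prop.getint("operate-on-gie-id")
    # Multiple class IDs are separated by semicolons, e.g. "0;2"
    class_ids = {int(c) for c in prop["operate-on-class-ids"].split(";") if c.strip()}
    return gie_id, class_ids


def should_classify(source_gie_id, class_id, gie_id, class_ids):
    """True if an object from source_gie_id with class_id passes both filters."""
    return source_gie_id == gie_id and class_id in class_ids


gie_id, class_ids = load_filter(SGIE_CONFIG)
print(should_classify(1, 0, gie_id, class_ids))  # vehicle from the primary engine
print(should_classify(1, 2, gie_id, class_ids))  # person from the primary engine
```

Only the first object passes both filters, so only vehicles found by the primary model would reach the car-type classifier.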



### Nvtracker

The plugin accepts NV12/RGBA data from the upstream component and scales (converts) the input buffer to a Luma buffer with a specific tracker width and height. (Tracker width and height must be specified in the configuration file's [tracker] section.) 
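As a rough sketch of what such a `[tracker]` section might look like and how it can be read, the snippet below parses a hypothetical configuration with `configparser`. The width, height, and GPU values are illustrative assumptions, and `read_tracker_properties` is a helper invented for this example.

```python
import configparser

# Hypothetical [tracker] section; the values are assumptions for illustration.
TRACKER_CONFIG = """
[tracker]
tracker-width=640
tracker-height=384
gpu-id=0
"""

REQUIRED_KEYS = ("tracker-width", "tracker-height")
INT_KEYS = ("tracker-width", "tracker-height", "gpu-id")


def read_tracker_properties(config_text):
    """Parse the [tracker] section and return its integer properties as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["tracker"]
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise KeyError("Missing tracker settings: " + ", ".join(missing))
    return {key: section.getint(key) for key in INT_KEYS if key in section}


tracker_properties = read_tracker_properties(TRACKER_CONFIG)
print(tracker_properties)
```

Each entry in the returned dictionary could then be passed to the tracker element with `set_property(key, value)`.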

The low-level library uses a CPU-based implementation of the Kanade-Lucas-Tomasi (KLT) tracker algorithm. The plugin also supports the Intersection over Union (IoU) tracker algorithm, which uses the overlap between the detector’s bounding boxes across frames to determine the object's unique ID.
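To make the IoU idea concrete, here is a minimal sketch of IoU-based ID assignment. It is not the low-level library that `nvtracker` loads; it is a simplified greedy matcher written for illustration. Boxes are assumed to be `(left, top, width, height)` tuples, and the 0.3 threshold is an arbitrary choice.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (left, top, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def assign_ids(tracks, detections, next_id, threshold=0.3):
    """Greedily match detections to the previous frame's tracks by IoU.

    tracks maps a track ID to its box in the previous frame.
    Returns the updated tracks and the next unused ID.
    """
    unmatched = dict(tracks)
    new_tracks = {}
    for det in detections:
        best_id, best_score = None, threshold
        for track_id, box in unmatched.items():
            score = iou(box, det)
            if score > best_score:
                best_id, best_score = track_id, score
        if best_id is None:
            # No sufficiently overlapping track: start a new one
            best_id = next_id
            next_id += 1
        else:
            del unmatched[best_id]
        new_tracks[best_id] = det
    return new_tracks, next_id


frame1, next_id = assign_ids({}, [(0, 0, 10, 10), (100, 100, 10, 10)], next_id=0)
frame2, next_id = assign_ids(frame1, [(2, 0, 10, 10), (200, 200, 10, 10)], next_id)
print(frame2)
```

In the second frame, the slightly shifted box keeps its ID because it overlaps its previous position, while the far-away box receives a new ID.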

![nvtracker](../images/nvtracker.png)

The tracker component updates each object’s metadata with a tracker ID. After this component, we
add a secondary neural network classifier. This classifier works on the objects
detected as vehicles and classifies the car type (e.g. coupe, sedan). After
inference on a car object, the classifier appends its result to that object's metadata. The
application can then use a callback function to access the metadata and analyze the
attributes of the objects.
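The sketch below shows one way a callback might read the tracker ID and the classifier results from an object's metadata. The field names (`object_id`, `classifier_meta_list`, `label_info_list`, `unique_component_id`, `result_label`, `result_prob`) follow the DeepStream Python bindings as we understand them, so verify them against your installed version. The helper names are invented for this example, and the cast functions are passed in as parameters so that the code can run here with stand-in objects instead of real DeepStream metadata.

```python
from types import SimpleNamespace


def iter_nvds_list(node, cast):
    """Yield cast(node.data) for every node of a pyds-style linked list."""
    while node is not None:
        try:
            item = cast(node.data)
        except StopIteration:
            return
        yield item
        try:
            node = node.next
        except StopIteration:
            return


def read_object_attributes(obj_meta, cast_classifier, cast_label):
    """Return (tracker_id, [(component_id, label, probability), ...])."""
    attributes = []
    for class_meta in iter_nvds_list(obj_meta.classifier_meta_list, cast_classifier):
        for label in iter_nvds_list(class_meta.label_info_list, cast_label):
            attributes.append(
                (class_meta.unique_component_id, label.result_label, label.result_prob)
            )
    return obj_meta.object_id, attributes


# Stand-in objects so the sketch runs without DeepStream installed
label = SimpleNamespace(result_label="sedan", result_prob=0.9)
label_node = SimpleNamespace(data=label, next=None)
class_meta = SimpleNamespace(unique_component_id=2, label_info_list=label_node)
class_node = SimpleNamespace(data=class_meta, next=None)
fake_obj = SimpleNamespace(object_id=7, classifier_meta_list=class_node)


def identity(value):
    return value


result = read_object_attributes(fake_obj, identity, identity)
print(result)
```

Inside a real pad probe, the call would presumably look like `read_object_attributes(obj_meta, pyds.NvDsClassifierMeta.cast, pyds.NvDsLabelInfo.cast)`, where `obj_meta` comes from `pyds.NvDsObjectMeta.cast`.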

### Building the Pipeline

Let us now build the pipeline in the same fashion as described in the previous notebook.

In [11]:
# Import Required Libraries 
import sys
sys.path.append('../source_code')
import gi
import time
import configparser
gi.require_version('Gst', '1.0')
from gi.repository import GObject, Gst, GLib
from common.bus_call import bus_call
import pyds

# Defining the Class Labels
PGIE_CLASS_ID_VEHICLE = 0
PGIE_CLASS_ID_BICYCLE = 1
PGIE_CLASS_ID_PERSON = 2
PGIE_CLASS_ID_ROADSIGN = 3

# Defining the input output video file 
INPUT_VIDEO_NAME  = '../videos/sample_720p.h264'
OUTPUT_VIDEO_NAME = "../videos/out_sec.mp4"

We define a function `make_elm_or_print_err()` to create our elements and report any errors if the creation fails.

Elements are created using the `Gst.ElementFactory.make()` function, which is part of the GStreamer library.

In [12]:
## Make Element or Print Error and any other detail
def make_elm_or_print_err(factoryname, name, printedname, detail=""):
    print("Creating", printedname)
    elm = Gst.ElementFactory.make(factoryname, name)
    if not elm:
        sys.stderr.write("Unable to create " + printedname + " \n")
        # Print the extra detail only when creation failed
        if detail:
            sys.stderr.write(detail)
    return elm

#### Initialise GStreamer and Create an Empty Pipeline

In [13]:
# Standard GStreamer initialization
Gst.init(None)

# Create gstreamer elements
# Create Pipeline element that will form a connection of other elements
print("Creating Pipeline \n ")
pipeline = Gst.Pipeline()

if not pipeline:
    sys.stderr.write(" Unable to create Pipeline \n")

Creating Pipeline 
 


#### Create Elements that are required for our pipeline 

In [14]:
# Creating elements required for the pipeline
# Source element for reading from file
source = make_elm_or_print_err("filesrc", "file-source","Source")
# Parse the data since the input is an elementary .h264 stream
h264parser = make_elm_or_print_err("h264parse", "h264-parser","h264 parse")
# For hardware accelerated decoding of the stream
decoder = make_elm_or_print_err("nvv4l2decoder", "nvv4l2-decoder","Nvv4l2 Decoder")
# Form batches from one or more sources
streammux = make_elm_or_print_err("nvstreammux", "Stream-muxer",'NvStreamMux')
# Run inference on the decoded stream, this property is set through a configuration file later
pgie = make_elm_or_print_err("nvinfer", "primary-inference" ,"pgie")
# Assign unique ids to each detected object
tracker = make_elm_or_print_err("nvtracker", "tracker",'tracker')
# Secondary inference for finding car type
sgie1 = make_elm_or_print_err("nvinfer", "secondary1-nvinference-engine",'sgie1')
# Convert output stream to formatted buffer accepted by Nvosd
nvvidconv = make_elm_or_print_err("nvvideoconvert", "convertor","nvvidconv")
# Draw on the buffer
nvosd = make_elm_or_print_err("nvdsosd", "onscreendisplay","nvosd")
# Queue to buffer the OSD output before conversion and encoding
queue = make_elm_or_print_err("queue", "queue", "Queue")
# Convert output for saving
nvvidconv2 = make_elm_or_print_err("nvvideoconvert", "convertor2","nvvidconv2")
# Encode the video as MPEG-4
encoder = make_elm_or_print_err("avenc_mpeg4", "encoder", "Encoder")
# Parse output from encoder
codeparser = make_elm_or_print_err("mpeg4videoparse", "mpeg4-parser", 'Code Parser')
# Create a container
container = make_elm_or_print_err("qtmux", "qtmux", "Container")
# Create sink for storing the output
sink = make_elm_or_print_err("filesink", "filesink", "Sink")

Creating Source
Creating h264 parse
Creating Nvv4l2 Decoder
Creating NvStreamMux
Creating pgie
Creating tracker
Creating sgie1
Creating nvvidconv
Creating nvosd
Creating Queue
Creating nvvidconv2
Creating Encoder
Creating Code Parser
Creating Container
Creating Sink


Now that we have created the elements, we can set various properties for our pipeline.

For sgie1, we use `operate-on-gie-id` and `operate-on-class-ids` in its configuration file so that it only evaluates the type of objects classified as cars by our primary model.

You can access the configuration files here: [pgie](../configs/config_infer_primary_trafficcamnet.txt), [sgie1](../configs/config_infer_secondary_vehicletypenet.txt)

In [15]:
# Set properties for elements
print("Playing file %s" %INPUT_VIDEO_NAME)
# Set input file
source.set_property('location', INPUT_VIDEO_NAME)
# Set input height, width, and batch size
streammux.set_property('width', 1920)
streammux.set_property('height', 1080)
streammux.set_property('batch-size', 1)
# Set timer (in microseconds) to wait after the first buffer is available
# to push the batch even if batch is never completely formed
streammux.set_property('batched-push-timeout', 4000000)
# Set configuration files for Nvinfer
pgie.set_property('config-file-path', "../configs/config_infer_primary_trafficcamnet.txt")
sgie1.set_property('config-file-path', "../configs/config_infer_secondary_vehicletypenet.txt")
# Set properties of tracker
config = configparser.ConfigParser()
config.read('../configs/tracker_config.txt')
config.sections()
for key in config['tracker']:
    if key == 'tracker-width' :
        tracker_width = config.getint('tracker', key)
        tracker.set_property('tracker-width', tracker_width)
    if key == 'tracker-height' :
        tracker_height = config.getint('tracker', key)
        tracker.set_property('tracker-height', tracker_height)
    if key == 'gpu-id' :
        tracker_gpu_id = config.getint('tracker', key)
        tracker.set_property('gpu_id', tracker_gpu_id)
    if key == 'll-lib-file' :
        tracker_ll_lib_file = config.get('tracker', key)
        tracker.set_property('ll-lib-file', tracker_ll_lib_file)
    if key == 'll-config-file' :
        tracker_ll_config_file = config.get('tracker', key)
        tracker.set_property('ll-config-file', tracker_ll_config_file)
    if key == 'enable-batch-process' :
        tracker_enable_batch_process = config.getint('tracker', key)
        tracker.set_property('enable_batch_process', tracker_enable_batch_process)

# Set encoder bitrate for output video
encoder.set_property("bitrate", 2000000)
# Set output file location, disable sync and async
sink.set_property("location", OUTPUT_VIDEO_NAME)
sink.set_property("sync", 0)
sink.set_property("async", 0)

Playing file ../videos/sample_720p.h264


We now add all the elements to the pipeline, link them in the required order, and then create a GStreamer bus watch so that all pipeline messages are fed through it.

In [16]:
# Add and link all elements to the pipeline
# Adding elements
print("Adding elements to Pipeline \n")
pipeline.add(source)
pipeline.add(h264parser)
pipeline.add(decoder)
pipeline.add(streammux)
pipeline.add(pgie)
pipeline.add(tracker)
pipeline.add(sgie1)
pipeline.add(nvvidconv)
pipeline.add(nvosd)
pipeline.add(queue)
pipeline.add(nvvidconv2)
pipeline.add(encoder)
pipeline.add(codeparser)
pipeline.add(container)
pipeline.add(sink)

# Linking elements
# Order: source -> h264parser -> decoder -> streammux -> pgie
#        -> tracker -> sgie1 -> nvvidconv -> nvosd -> queue -> nvvidconv2
#        -> encoder -> codeparser -> container -> sink

print("Linking elements in the Pipeline \n")

source.link(h264parser)
h264parser.link(decoder)

sinkpad = streammux.get_request_pad("sink_0")
if not sinkpad:
    sys.stderr.write(" Unable to get the sink pad of streammux \n")
    
srcpad = decoder.get_static_pad("src")
if not srcpad:
    sys.stderr.write(" Unable to get source pad of decoder \n")
    
srcpad.link(sinkpad)
streammux.link(pgie)
pgie.link(tracker)
tracker.link(sgie1)
sgie1.link(nvvidconv)
nvvidconv.link(nvosd)
nvosd.link(queue)
queue.link(nvvidconv2)
nvvidconv2.link(encoder)
encoder.link(codeparser)
codeparser.link(container)
container.link(sink)

Adding elements to Pipeline 

Linking elements in the Pipeline 



True
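The individual `link()` calls above each return a boolean that the cell does not check. As a hedged sketch, a small helper could link a chain of elements and stop at the first failure. `link_chain` and `FakeElement` are names invented for this example; `FakeElement` only stands in for `Gst.Element` so that the snippet runs without GStreamer.

```python
def link_chain(*elements):
    """Link elements left to right and raise on the first link that fails."""
    for upstream, downstream in zip(elements, elements[1:]):
        if not upstream.link(downstream):
            raise RuntimeError(
                "Could not link %s -> %s" % (upstream.get_name(), downstream.get_name())
            )


class FakeElement:
    """Stand-in for Gst.Element, exposing only link() and get_name()."""

    def __init__(self, name, linkable=True):
        self.name = name
        self.linkable = linkable
        self.linked_to = None

    def get_name(self):
        return self.name

    def link(self, other):
        # Mimic Gst.Element.link(), which returns True on success
        if self.linkable:
            self.linked_to = other
        return self.linkable


a, b, c = FakeElement("a"), FakeElement("b"), FakeElement("c")
link_chain(a, b, c)
print(a.linked_to.get_name(), b.linked_to.get_name())
```

With the real elements, the call might be `link_chain(streammux, pgie, tracker, sgie1, nvvidconv, nvosd, queue, nvvidconv2, encoder, codeparser, container, sink)`. The decoder-to-streammux connection would still be made by hand, because it uses a request pad.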

In [17]:
# Create an event loop and feed GStreamer bus messages to it
loop = GLib.MainLoop()

bus = pipeline.get_bus()
bus.add_signal_watch()
bus.connect ("message", bus_call, loop)

3

Our pipeline now carries the metadata forward, but so far we have not done anything with it. As shown in the pipeline diagram above, we now create a callback function that writes the relevant data on the frame when called. We then get the sink pad of the `nvosd` element and attach the callback to it as a probe.

This callback function is the same as used in the previous notebook.

In [18]:
# Working with metadata
def osd_sink_pad_buffer_probe(pad,info,u_data):
    
    obj_counter = {
        PGIE_CLASS_ID_VEHICLE:0,
        PGIE_CLASS_ID_PERSON:0,
        PGIE_CLASS_ID_BICYCLE:0,
        PGIE_CLASS_ID_ROADSIGN:0
    }
    # Reset frame number and number of rectangles to zero
    frame_number=0
    num_rects=0
    
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer ")
        return Gst.PadProbeReturn.OK

    # Retrieve metadata from gst_buffer
    # Note: since we use the pyds shared object library,
    # the input is the C address of gst_buffer
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break
        
        # Get frame number, number of rectangles to draw and object metadata
        frame_number=frame_meta.frame_num
        num_rects = frame_meta.num_obj_meta
        l_obj=frame_meta.obj_meta_list
        
        while l_obj is not None:
            try:
                obj_meta=pyds.NvDsObjectMeta.cast(l_obj.data)
            except StopIteration:
                break
            # Increment the count for this object's class and set the box border color
            # (values are in RGBA order, so this is blue with an alpha of 0.0)
            obj_counter[obj_meta.class_id] += 1
            obj_meta.rect_params.border_color.set(0.0, 0.0, 1.0, 0.0)
            try: 
                l_obj=l_obj.next
            except StopIteration:
                break
        
        # Setting metadata display configuration
        # Acquire display meta object
        display_meta=pyds.nvds_acquire_display_meta_from_pool(batch_meta)
        display_meta.num_labels = 1
        py_nvosd_text_params = display_meta.text_params[0]
        # Set display text to be shown on screen
        py_nvosd_text_params.display_text = "Frame Number={} Number of Objects={} Vehicle_count={} Person_count={}".format(frame_number, num_rects, obj_counter[PGIE_CLASS_ID_VEHICLE], obj_counter[PGIE_CLASS_ID_PERSON])
        # Set where the string will appear
        py_nvosd_text_params.x_offset = 10
        py_nvosd_text_params.y_offset = 12
        # Font, font colour and font size
        py_nvosd_text_params.font_params.font_name = "Serif"
        py_nvosd_text_params.font_params.font_size = 10
        # Set color (We used white)
        py_nvosd_text_params.font_params.font_color.set(1.0, 1.0, 1.0, 1.0)
        # Set text background colour (We used black)
        py_nvosd_text_params.set_bg_clr = 1
        py_nvosd_text_params.text_bg_clr.set(0.0, 0.0, 0.0, 1.0)
        # Print the display text in the console as well
        print(pyds.get_string(py_nvosd_text_params.display_text))
        pyds.nvds_add_display_meta_to_frame(frame_meta, display_meta)
        
        
        try:
            l_frame=l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK

In [19]:
# Adding probe to sinkpad of the OSD element
osdsinkpad = nvosd.get_static_pad("sink")
if not osdsinkpad:
    sys.stderr.write(" Unable to get sink pad of nvosd \n")
    
osdsinkpad.add_probe(Gst.PadProbeType.BUFFER, osd_sink_pad_buffer_probe, 0)

1

Now, with everything defined, we can start the playback and listen for events.

In [20]:
# Start the pipeline
print("Starting pipeline \n")
start_time = time.time()
pipeline.set_state(Gst.State.PLAYING)
try:
    loop.run()
except KeyboardInterrupt:
    # Allow a manual interrupt to stop playback and fall through to cleanup
    pass
# Cleanup
pipeline.set_state(Gst.State.NULL)
print("--- %s seconds ---" % (time.time() - start_time))

Starting pipeline 

Frame Number=0 Number of Objects=6 Vehicle_count=3 Person_count=3
Frame Number=1 Number of Objects=7 Vehicle_count=2 Person_count=5
Frame Number=2 Number of Objects=5 Vehicle_count=3 Person_count=2
Frame Number=3 Number of Objects=6 Vehicle_count=4 Person_count=2
Frame Number=4 Number of Objects=6 Vehicle_count=3 Person_count=3
Frame Number=5 Number of Objects=7 Vehicle_count=5 Person_count=2
Frame Number=6 Number of Objects=6 Vehicle_count=4 Person_count=2
Frame Number=7 Number of Objects=6 Vehicle_count=4 Person_count=2
Frame Number=8 Number of Objects=8 Vehicle_count=6 Person_count=2
Frame Number=9 Number of Objects=7 Vehicle_count=5 Person_count=2
Frame Number=10 Number of Objects=6 Vehicle_count=4 Person_count=2
Frame Number=11 Number of Objects=8 Vehicle_count=6 Person_count=2
Frame Number=12 Number of Objects=7 Vehicle_count=5 Person_count=2
Frame Number=13 Number of Objects=3 Vehicle_count=1 Person_count=2
Frame Number=14 Number of Objects=5 Vehicle_count=3 

The output video cannot be displayed directly in this notebook. To work around this, we convert it to a format that the Jupyter Notebook can play, using the `ffmpeg` command-line tool.

In [21]:
# Convert video profile to be compatible with the Notebook
!ffmpeg -loglevel panic -y -an -i ../videos/out_sec.mp4 -vcodec libx264 -pix_fmt yuv420p -profile:v baseline -level 3 ../videos/output_sec.mp4

Finally, we display the output in the notebook by creating an HTML video element.

In [22]:
# Display the output
from IPython.display import HTML
HTML("""
 <video width="640" height="480" controls>
 <source src="../videos/output_sec.mp4" type="video/mp4">
 </video>
""")