<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>

# Introduction to the DeepStream SDK #
The DeepStream SDK is a streaming analytics toolkit that can be used to build video AI applications. It simplifies the process by letting developers combine existing or custom plugins to construct video processing pipelines for their specific use case. DeepStream makes it easier than ever to get started building and deploying AI-based intelligent video analytics applications. 

When developing intelligent video analytics solutions, DeepStream helps users tackle laborious tasks like:
* Leveraging hardware for accelerated processing
* Optimizing the pipeline for high data throughput and low latency
* Optimizing neural network models for high-speed inference
* Processing data from multiple video streams simultaneously
* Keeping track of the metadata associated with each frame of a video

In doing so, we enable developers to prioritize important business decisions like: 
* Kind and number of video streams to analyze
* Type(s) of video analytics to perform
* Post-processing of the AI inference results

The DeepStream SDK allows developers to focus on the more *important* tasks related to the project's goal and impact. It empowers developers to build core deep learning networks and IP rather than design end-to-end solutions from scratch. 

## Learning Objectives ##
In this notebook, you will gain the foundational understanding necessary to use the NVIDIA DeepStream SDK effectively, including: 
* History of GStreamer and DeepStream
* Anatomy of a DeepStream Video AI Pipeline
* Different Types of DeepStream Plugins
* How Data Flow Through a DeepStream Pipeline

**Table of Contents**
<br>
This notebook covers the following sections: 
1. [Sample Video AI Application](#s1)
    * [Video Formats](#s1.1)
    * [Exercise #1 - Run Sample Application](#e1)
2. [GStreamer Foundations](#s2)
3. [Anatomy of a DeepStream Pipeline](#s3)
    * [Inspecting Plugins](#s3.1)
    * [Exercise #2 - Explore Plugins](#e2)
4. [Accessing DeepStream MetaData](#s4)
    * [Probe](#s4.1)

<a name='s1'></a>
## Sample Video AI Application ##
Let's look at a sample video AI application. In this lab, we will build DeepStream pipelines to analyze a parking garage camera feed. This sample application uses the same pipeline we will construct in the next notebook. For demonstration, we refactored the procedure into a [Python script](sample_apps/app_02.py). 

<a name='s1.1'></a>
### Video Formats ###
The input video file is an encoded video file with a **.h264** extension rather than the **.mp4** extension we might expect for a video file. The .mp4 extension denotes a container, which bundles everything needed to play back a video: the visual images, the audio tracks, and the metadata (e.g., bitrate, resolution, subtitles, and timestamps). The metadata also contains information about the **codec** used for the audio and video streams. The codec, which is a mashup of the words *co*de and *dec*ode, is a method used to compress (encode) a video into a smaller size for faster transmission. The encoded file can be decompressed (decoded) using the same codec for playback and processing. The most common video codecs include **[H.264](https://en.wikipedia.org/wiki/Advanced_Video_Coding)**, **[H.265](https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding)**, and **[MPEG4](https://en.wikipedia.org/wiki/MPEG-4)**. Separate from MPEG4, **[MP4](https://en.wikipedia.org/wiki/MPEG-4_Part_14)** is a container that can be used for playback in JupyterLab. Together, the container and the codec describe the video format, and new formats are continually being developed to improve quality, file size, and playback. We need to build the application based on the video format(s) of the input and the desired output. 

<p><img src='images/important.png' width=720></p>
When performing video analytics, it is likely that the application will consume H.264 encoded video streams instead of MP4 container files since only the video component is needed. 
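To make the container-versus-codec distinction concrete, the sketch below guesses whether the first bytes of a file look like an MP4 container or a raw H.264 stream. It is a heuristic only, and the function name `sniff_video_format` is hypothetical. It assumes that an MP4 file opens with an `ftyp` box and that a raw H.264 file uses the Annex B byte-stream format, which begins with a `00 00 01` or `00 00 00 01` start code.

```python
def sniff_video_format(header: bytes) -> str:
    """Guess the format of a video file from its first few bytes (heuristic)."""
    # MP4 files are built from boxes: a 4-byte size followed by a 4-byte type.
    # The first box is normally of type 'ftyp'.
    if len(header) >= 8 and header[4:8] == b"ftyp":
        return "mp4-container"
    # A raw H.264 elementary stream (Annex B) begins with a start code.
    if header.startswith(b"\x00\x00\x00\x01") or header.startswith(b"\x00\x00\x01"):
        return "h264-annexb"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_video_format(f.read(16))
```

Calling `sniff_file` on the `.h264` input and on the converted `.mp4` file should give different answers, even though both hold the same H.264 encoded video.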

<a name='e1'></a>
#### Exercise #1 - Run Sample Application ####

**Instructions**: <br>
* Execute the cell below to convert the H.264 encoded video file, which can't be played in JupyterLab, into an MP4 file for playback. 
    * The [FFmpeg](https://ffmpeg.org/) tool is a very fast video and audio converter with the general syntax: <br> `ffmpeg [global_options] {[input_file_options] -i input_url} ... {[output_file_options] output_url} ...`. <br> When using the `ffmpeg` command, the `-i` option lets us read an input URL, the `-loglevel quiet` option suppresses the logs, and the `-y` flag overwrites any existing output file with the same name. 
* Execute the cell below to see the converted input video. 
* Execute the cell below to run the DeepStream pipeline. Since we designed the pipeline to write an encoded output file using the MPEG4 codec, we also convert it into an MP4 container file for playback. 
* Afterward, execute the cell below to convert the MPEG4 encoded video output file into an MP4 file and play the output video. 
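The `ffmpeg` options described above can also be assembled from Python instead of a `!` shell line. This is a minimal sketch: the helper names `build_ffmpeg_command` and `convert` are hypothetical, and the file names in the usage note are placeholders. Passing the command as a list to `subprocess.run` avoids shell quoting problems when a path contains spaces.

```python
import subprocess


def build_ffmpeg_command(input_path, output_path, quiet=True, overwrite=True):
    """Return the ffmpeg argument list for a simple container conversion."""
    cmd = ["ffmpeg"]
    if overwrite:
        cmd.append("-y")                 # overwrite an existing output file
    if quiet:
        cmd += ["-loglevel", "quiet"]    # suppress the logs
    cmd += ["-i", input_path, output_path]
    return cmd


def convert(input_path, output_path):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(input_path, output_path), check=True)
```

For example, `convert("data/sample.h264", "sample.mp4")` would wrap a raw H.264 stream in an MP4 container, provided `ffmpeg` is on the `PATH`.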

In [1]:
# DO NOT CHANGE THIS CELL
from IPython.display import Video

# Encode the MP4 input as a raw H.264 stream for the pipeline - this will generate the sample_vid.h264 file
!ffmpeg -i /mnt/c/Users/drjai/Nvidia_Deepstream/input_vid.mp4 -c:v libx264 -crf 23 /mnt/c/Users/drjai/Nvidia_Deepstream/data/sample_vid.h264 -y -loglevel quiet

# Disabled: convert the H.264 encoded video file to MP4 container file - this would generate the sample_30.mp4 file
# (paths that contain spaces must be quoted)
#!ffmpeg -i "/mnt/c/Users/drjai/AI Video Nvidia/data/sample_30.h264" "/mnt/c/Users/drjai/AI Video Nvidia/sample_30.mp4" \
        #-y \
        #-loglevel quiet

# View the input video
Video('input_vid.mp4', width=720)

In [2]:
# DO NOT CHANGE THIS CELL
# Run the DeepStream pipeline - this will generate the output_02_encoded.mpeg4 file
%run sample_apps/app_02.py data/sample_vid.h264

In [5]:
# DO NOT CHANGE THIS CELL
# Convert the encoded video file for playback - this will generate the output_02.mp4 file
!ffmpeg -i /dli/task/output_02_encoded.mpeg4 /dli/task/output_02.mp4 \
        -y \
        -loglevel quiet

# View the output video
Video('output_02.mp4', width=720)

<a name='s2'></a>
## GStreamer Foundations ##
DeepStream utilizes an optimized graph architecture built using the open-source [GStreamer multimedia framework](https://gstreamer.freedesktop.org/). GStreamer is used for creating streaming media applications, ranging from a simple media player to complex video editing applications. GStreamer plugins can be mixed and matched into arbitrary pipelines to create custom applications. 

There are a few key concepts in GStreamer that we need to know before building our application. Understanding the terminologies and their roles in the software will help us rationalize the syntax for working with GStreamer and DeepStream. 
* **Elements** - Elements are at the core of GStreamer. Elements provide some sort of functionality when linked with other elements. For example, a source element provides data to a stream, a filter element processes a stream of data, and a sink element consumes data. Data flow downstream from source elements to sink elements, passing through filter elements. GStreamer offers a large collection of elements by default but also allows for writing new elements. 
    * A [sink](https://en.wikipedia.org/wiki/Sink_(computing)), in computing, is designed to receive data. 
* **Bins** - Bins are container elements that allow you to combine linked elements into a logical group. Bins can be handled in the same way as any other element. They are programmed to manage elements contained within, including state changes as well as bus messages, to ensure that data flow smoothly. This is useful when constructing complex pipelines that require many elements. 
* **Pipeline** - A pipeline is the top-level bin that also manages the synchronization and bus messages of the contained elements. 
* **Plugins** - Elements need to be encapsulated in a plugin to enable GStreamer to use it. A plugin is essentially a loadable block of code, usually recognized as a shared object file or a dynamically linked library. A plugin may contain the implementation of several elements, or just one. GStreamer provides building blocks in the form of plugins that can be used to construct an efficient video analytics pipeline. The DeepStream SDK features hardware-accelerated plugins that bring deep neural networks and other complex processing tasks into the stream processing pipeline. 
* **Bus** - The bus is the object responsible for delivering **messages** generated by the elements to the application. Every pipeline contains a bus by default, so an application only needs to set a message handler on the bus, which is similar to connecting a signal handler to an object. While the main loop is running, the bus is periodically checked for new messages, and the message handler is called whenever a new message is available. 
    * Messages notify the application of pipeline events. Message types include `GST_MESSAGE_EOS` (end-of-stream), `GST_MESSAGE_ERROR`, and `GST_MESSAGE_WARNING`. 
* **Pads** - Pads are used to negotiate links and dataflow between elements in GStreamer. A pad is the “port” on an element where links can be made with other elements for data to flow through. When data flow from element to element in a pipeline, they actually flow from the source pad of one element to the sink pad of another. Links are only allowed between two pads when the data types, or **capabilities**, are compatible. 
* **Buffers** and **Events** - All streams of data in GStreamer are chopped up into chunks and passed from a source pad of one element to a sink pad of another element as one of the two types of `GstMiniObject`: **events** (control) and **buffers** (content). A buffer is the basic unit of data transfer in GStreamer. Normally, it contains a chunk of video data that flows from one element to another. The DeepStream SDK attaches the DeepStream metadata object, `NvDsBatchMeta`, to the buffer. An event, on the other hand, contains information on the state of the stream flowing between two linked pads. Events can be used to indicate the end of a media stream. 
* **Queries** - Queries are used to get information about the stream. 
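The concepts above can be tied together with the GStreamer Python bindings (PyGObject). The sketch below is illustrative, not the course's application: it creates two core elements (`filesrc` and `fakesink`), adds them to a pipeline, links the source pad of one to the sink pad of the other, and waits on the bus for an end-of-stream or error message. The function name `play_to_fakesink` is hypothetical. For brevity it blocks on `timed_pop_filtered` instead of installing a message handler on a main loop, and the imports sit inside the function so the cell can be defined on a machine without GStreamer installed.

```python
def play_to_fakesink(location):
    """Run filesrc -> fakesink on a file and report how the stream ended."""
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)

    # The pipeline is the top-level bin; elements are created by factory name.
    pipeline = Gst.Pipeline.new("demo-pipeline")
    source = Gst.ElementFactory.make("filesrc", "file-source")
    sink = Gst.ElementFactory.make("fakesink", "fake-sink")
    if not pipeline or not source or not sink:
        raise RuntimeError("could not create the GStreamer elements")

    source.set_property("location", location)
    pipeline.add(source)
    pipeline.add(sink)

    # Linking connects the source pad of filesrc to the sink pad of fakesink.
    if not source.link(sink):
        raise RuntimeError("could not link filesrc to fakesink")

    pipeline.set_state(Gst.State.PLAYING)

    # Block until the bus delivers an end-of-stream or error message.
    bus = pipeline.get_bus()
    message = bus.timed_pop_filtered(
        Gst.CLOCK_TIME_NONE, Gst.MessageType.EOS | Gst.MessageType.ERROR
    )
    pipeline.set_state(Gst.State.NULL)

    if message.type == Gst.MessageType.ERROR:
        err, _debug = message.parse_error()
        return f"error: {err.message}"
    return "end-of-stream"
```

Because `fakesink` simply discards the data it receives, this pipeline does nothing useful with the file; it only demonstrates elements, links, state changes, and bus messages.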

<p><img src='images/important.png' width=720></p>

For the most part, all data in GStreamer flow one way through a link between elements. When data flow from one DeepStream element to another, the buffers are not recreated. Instead, buffer pointers are passed to avoid unnecessary copies and achieve high-speed performance. 

<p><img src='images/gstreamer.png' width='720px'></p>

For more information, please refer to [GStreamer Basics](https://gstreamer.freedesktop.org/documentation/application-development/basics/index.html). 

<a name='s3'></a>
## Anatomy of a DeepStream Pipeline ##
GStreamer and by extension DeepStream applications have a **plugin-based architecture**. Developers can interact with elements through the plugins they are encapsulated in. One single **plugin** may contain the implementation of several elements, or just one. It performs a specific function and has been created for the convenience of developers to leverage. When building a pipeline, we can select from a catalogue of available [GStreamer](https://gstreamer.freedesktop.org/documentation/plugins_doc.html) or [DeepStream](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_Intro.html#) plugins, or create new ones. An application can be thought of as a pipeline consisting of individual components (plugins), each representing a functional block like video decoding/encoding, scaling, inferencing, and more. 

The graph below shows the pipeline of a typical video analytics application, starting from consuming input videos to outputting insights. All the individual blocks are various plugins that are used. At the bottom are the different hardware engines that are utilized throughout the application. Where applicable, plugins are accelerated using the underlying hardware to deliver maximum performance. This could involve optimum memory management with zero-memory copy between plugins as well as the use of various accelerators to ensure the highest performance. 
<p><img src='images/deepstream_overview_graph_architecture.png' width='720px'></p>

* Streaming data can come over the network through RTSP or from a local file system or from a camera directly. The streams are captured using the CPU. Once the frames are in the memory, they are sent for decoding using the NVDEC accelerator. 
* After decoding, there is an _optional_ image pre-processing step where the input image can be pre-processed before inference. The pre-processing can be image dewarping or color space conversion. The `Gst-nvdewarper` plugin can dewarp the image from a fisheye or 360-degree camera. The `Gst-nvvideoconvert` plugin can perform color format conversion on the frame. These plugins use the GPU or the VIC (vision image compositor).
* The next step is to batch the frames for optimal inference performance. Batching is done using the `Gst-nvstreammux` plugin.
* Once the frames are batched, they are sent for inference. Inference can be done using TensorRT, NVIDIA’s inference accelerator runtime, or in a native framework such as TensorFlow or PyTorch using Triton Inference Server. TensorRT inference is performed using the `Gst-nvinfer` plugin, and inference through Triton is performed using the `Gst-nvinferserver` plugin. 
* After inference, the next step could involve tracking the object. There are several built-in reference trackers in the SDK, ranging from high performance to high accuracy. Object tracking is performed using the `Gst-nvtracker` plugin.
* To create visualization artifacts such as bounding boxes, segmentation masks, and labels, there is a visualization plugin called `Gst-nvdsosd`.
* Finally, to output the results, DeepStream presents various options: render the output with the bounding boxes on the screen, save the output to the local disk, stream out over RTSP, or just send the metadata to the cloud. For sending metadata to the cloud, DeepStream uses the `Gst-nvmsgconv` and `Gst-nvmsgbroker` plugins. `Gst-nvmsgconv` converts the metadata into a schema payload, and `Gst-nvmsgbroker` establishes the connection to the cloud and sends the telemetry data. There are several built-in broker protocols such as Kafka, MQTT, AMQP, and Azure IoT. Custom broker adapters can also be created.

By connecting different plugins into a pipeline, we can build complex applications for custom use cases.
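As a rough illustration of the stages listed above, the sketch below assembles a `gst-launch-1.0` style pipeline description as a string. Several things here are assumptions rather than the course's pipeline: the helper name `deepstream_launch_description`, the use of `nvv4l2decoder` as the hardware decoder element, `fakesink` as the output, and the default resolution. The tracker and message broker stages are left out, and the string has not been validated against a particular DeepStream release.

```python
def deepstream_launch_description(video_path, infer_config, width=1280, height=720):
    """Return a gst-launch-1.0 style description of a single-stream pipeline."""
    # Source branch: read the file, parse the H.264 stream, decode it,
    # then feed the first request pad of the batcher (named "mux" below).
    source_branch = [
        f"filesrc location={video_path}",
        "h264parse",
        "nvv4l2decoder",
        "mux.sink_0",
    ]
    # Main branch: batch, infer, convert, draw overlays, and discard the output.
    main_branch = [
        f"nvstreammux name=mux batch-size=1 width={width} height={height}",
        f"nvinfer config-file-path={infer_config}",
        "nvvideoconvert",
        "nvdsosd",
        "fakesink",
    ]
    return " ! ".join(source_branch) + " " + " ! ".join(main_branch)
```

The returned string could be passed to `gst-launch-1.0` on the command line. Paths that contain spaces would need additional quoting.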

<a name='s3.1'></a>
### Inspecting Plugins ### 
We can inspect plugins using `gst-inspect-1.0`. It's a tool that prints out information on available plugins, information about a particular plugin, or information about a particular element. When executed with no *plugin* or *element* argument, it will print a list of all plugins and elements together with a summary. When executed with a *plugin* or *element* argument, it will print information about that plugin or element.

In [6]:
# DO NOT CHANGE THIS CELL
!gst-inspect-1.0

coreelements:  capsfilter: CapsFilter
coreelements:  clocksync: ClockSync
coreelements:  concat: Concat
coreelements:  dataurisrc: data: URI source element
coreelements:  downloadbuffer: DownloadBuffer
coreelements:  fakesink: Fake Sink
coreelements:  fakesrc: Fake Source
coreelements:  fdsink: Filedescriptor Sink
coreelements:  fdsrc: Filedescriptor Source
coreelements:  filesink: File Sink
coreelements:  filesrc: File Source
coreelements:  funnel: Funnel pipe fitting
coreelements:  identity: Identity
coreelements:  input-selector: Input selector
coreelements:  multiqueue: MultiQueue
...

There are numerous plugins available for developers to use. You can learn more about them in the documentation for [GStreamer Plugins](https://gstreamer.freedesktop.org/documentation/plugins_doc.html) and [DeepStream Plugins](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_Intro.html#). Let's now inspect a specific plugin to learn more about it. 

In [7]:
# DO NOT CHANGE THIS CELL
!gst-inspect-1.0 h264parse

No such element or plugin 'h264parse'


When the element is installed, we get a lot of useful information, but for now we focus on the _description_. By inspecting the `h264parse` plugin, we see that it is intended for parsing H.264 streams. If the command instead reports `No such element or plugin 'h264parse'`, as in the output above, the element is not installed in the current environment; it ships with the GStreamer `gst-plugins-bad` module. Video data are typically streamed in encoded format to be efficient. We commonly use [H.264](https://en.wikipedia.org/wiki/H.264/MPEG-4_AVC) for compression and encoding, but other options like H.265, VC1, and MPEG-2, to name a few, are available. Compression facilitates accelerated processing by reducing the amount of data transmitted from one place to another. When building a pipeline, we can use this plugin if we need to parse H.264 video streams. 
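Before building a pipeline, it can help to confirm that each element it needs is actually installed. The sketch below wraps the `--exists` option of `gst-inspect-1.0`, which sets the exit code according to whether the named element or plugin is found. The helper name `element_available` is hypothetical, and the function simply returns `False` when the tool itself is not on the `PATH`.

```python
import shutil
import subprocess


def element_available(name):
    """Return True if gst-inspect-1.0 reports that the element or plugin exists."""
    tool = shutil.which("gst-inspect-1.0")
    if tool is None:
        return False  # GStreamer command-line tools are not installed
    result = subprocess.run(
        [tool, "--exists", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

For example, `[n for n in ("h264parse", "nvinfer") if not element_available(n)]` lists whichever of those two elements is missing from the current environment.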

<a name='e2'></a>
#### Exercise #2 - Explore Plugins ####
Let's inspect a DeepStream-specific plugin: `nvinfer`. 

**Instructions**: <br>
* Modify the cell below by replacing only the `<FIXME>` before executing it. 

In [8]:
!gst-inspect-1.0 nvinfer

Factory Details:
  Rank                     primary (256)
  Long-name                NvInfer plugin
  Klass                    NvInfer Plugin
  Description              Nvidia DeepStreamSDK TensorRT plugin
  Author                   NVIDIA Corporation. Deepstream for Tesla forum: https://devtalk.nvidia.com/default/board/209

Plugin Details:
  Name                     nvdsgst_infer
  Description              NVIDIA DeepStreamSDK TensorRT plugin
  Filename                 /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_infer.so
  Version                  6.0.0
  License                  Proprietary
  Source module            nvinfer
  Binary package           NVIDIA DeepStreamSDK TensorRT plugin
  Origin URL               http://nvidia.com/

GObject
 +----GInitiallyUnowned
       +----GstObject
             +----GstElement
                   +----GstBaseTransform
                         +----GstNvInfer

Pad Templates:
  SRC template: 'src'
    Availability: Always
    Capa


The `nvinfer` plugin does inferencing on input data using NVIDIA TensorRT. It can perform AI inference on (batched) images for classification, object detection, and segmentation tasks based on the trained model we provide. There are several properties that can be set related to the inference engine, including the `model-engine-file` property. We recommend setting properties via a configuration file through the `config-file-path` property. More information about DeepStream plugins can be found in the [DeepStream Plugin Guide](https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#plugins-development-guide). 
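To give a sense of what such a configuration file looks like, the sketch below renders a small `[property]` group with Python's `configparser`. It is a fragment under stated assumptions, not a complete or validated configuration: the helper name `render_nvinfer_config`, the engine file name, and the values are placeholders, and a real model usually needs further keys (labels, output parsing, pre-processing) documented in the Gst-nvinfer plugin reference.

```python
import configparser
import io


def render_nvinfer_config(engine_file, num_classes, batch_size=1):
    """Render a minimal [property] group for an nvinfer configuration file."""
    config = configparser.ConfigParser(interpolation=None)
    config["property"] = {
        "gpu-id": "0",
        "model-engine-file": engine_file,      # serialized TensorRT engine
        "batch-size": str(batch_size),
        "network-mode": "2",                   # 0=FP32, 1=INT8, 2=FP16
        "num-detected-classes": str(num_classes),
        "gie-unique-id": "1",
    }
    buffer = io.StringIO()
    # DeepStream sample configs are written as key=value without spaces.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()
```

Writing the returned text to a file and pointing the `config-file-path` property at it keeps the model settings out of the application code.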

<a name='s4'></a>
## Accessing DeepStream MetaData ##
`GstBuffer` is the basic unit of data transfer in GStreamer. As a buffer passes through the pipeline, metadata from each component is attached to it. Similarly, the DeepStream SDK attaches the DeepStream metadata object, `NvDsBatchMeta`, to the buffer. DeepStream metadata contains inference results from `Gst-nvinfer` and information from other plugins in the pipeline. DeepStream uses an extensible standard structure for metadata, starting with the batch-level metadata (`NvDsBatchMeta`) created inside the `Gst-nvstreammux` plugin. Subsidiary metadata structures hold frame, object, classifier, and display data. The metadata format is described in detail in the [SDK MetaData Documentation and API Guide](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_metadata.html). Having some familiarity with the metadata structure will help us extract the desired information.  

<a name='s4.1'></a>
### Probe ###
<p><img src='images/probe.png' width=720></p>

We use [probes](https://gstreamer.freedesktop.org/documentation/application-development/advanced/pipeline-manipulation.html#using-probes) to access this metadata. A probe is best envisioned as a listener on a pad, and we can use probes to access metadata at various points in the pipeline. Technically, a probe is a [callback function](https://en.wikipedia.org/wiki/Callback_(computer_programming)) that can be attached to a pad. While attached, the probe is called whenever data pass through the pad, which lets us easily interact with the data flowing through our pipeline. For more information on `GstPad` and probes, please visit GStreamer’s API Reference on [GstPad](https://gstreamer.freedesktop.org/documentation/gstreamer/gstpad.html?gi-language=c). 

<p><img src='images/important.png' width=720></p>

Since the video AI application will rely heavily on the metadata generated from the deep learning models, the probe callback function might be the most important piece when constructing a DeepStream pipeline. 
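A probe callback that walks the metadata hierarchy might look like the sketch below, which follows the pattern used in NVIDIA's DeepStream Python sample applications. It assumes the `pyds` bindings are installed. The names `walk_meta_list`, `class_count_probe`, and `attach_class_counter` are hypothetical, and attaching to the element's sink pad is an assumed choice for illustration. The imports are wrapped in `try`/`except` so the cell can still be defined on a machine without DeepStream.

```python
from collections import Counter

try:
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst
    import pyds
except (ImportError, ValueError):
    Gst = pyds = None  # DeepStream bindings are not available here


def walk_meta_list(node, cast):
    """Yield cast(node.data) for every node of a DeepStream metadata list."""
    while node is not None:
        try:
            item = cast(node.data)
        except StopIteration:
            return
        yield item
        try:
            node = node.next
        except StopIteration:
            return


def class_count_probe(pad, info, counts):
    """Buffer probe: tally detected objects per class ID into `counts`."""
    gst_buffer = info.get_buffer()
    if gst_buffer is None:
        return Gst.PadProbeReturn.OK
    # Batch-level metadata attached upstream; frames and objects hang off it.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    for frame_meta in walk_meta_list(batch_meta.frame_meta_list, pyds.NvDsFrameMeta.cast):
        for obj_meta in walk_meta_list(frame_meta.obj_meta_list, pyds.NvDsObjectMeta.cast):
            counts[obj_meta.class_id] += 1
    return Gst.PadProbeReturn.OK


def attach_class_counter(element):
    """Attach the probe to the element's sink pad and return the live Counter."""
    counts = Counter()
    pad = element.get_static_pad("sink")
    pad.add_probe(Gst.PadProbeType.BUFFER, class_count_probe, counts)
    return counts
```

Attaching the probe downstream of `nvinfer`, for example on the on-screen display element, means the object metadata has already been filled in by the time the callback runs.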

**Well Done!** When you're ready, let's move to the [next notebook](./03_building_a_DeepStream_application.ipynb).
