# 2. Intel® Edge AI Foundation Course

## LESSON 1
### Introduction to AI at the Edge

Get introduced to AI at the Edge, and find out about the topics you’ll learn throughout the rest of the course.

![image.png](attachment:image.png)

Welcome to the course! We'll first take a brief look at AI at the Edge, its importance, different edge applications, and some of the history behind it. From there, I'll take you through the course structure, how the course topics relate to each other, relevant tools and prerequisites, and then give you a quick look at the project at the end of the course.

**Comment:** Nice work! Most voice assistants send your query to the cloud for processing, while most self-driving cars need to perform their computations at the edge. Additionally, gathering insights from millions of sales transactions can reasonably use the higher compute available in the cloud, whereas a remote nature camera may not always be able to send its data over a connection.

## 2. What is AI at the Edge?

The edge means local (or near local) processing, as opposed to just anywhere in the cloud. This can be an actual local device like a smart refrigerator, or servers located as close as possible to the source (i.e. servers located in a nearby area instead of on the other side of the world).

The edge is useful where low latency is necessary, or where the network itself may not always be available. It is often motivated by a need for real-time decision-making in certain applications.

Many applications with the cloud get data locally, send the data to the cloud, process it, and send it back. The edge means there’s no need to send to the cloud; it can often be more secure (depending on edge device security) and have less impact on a network. Edge AI algorithms can still be trained in the cloud, but get run at the edge.

### QUIZ QUESTION
Match the AI applications to whether they are most likely run using the cloud or at the edge. This is to check your early intuition - it's okay if you aren't sure just yet!

![image.png](attachment:image.png)

## 3. Why is AI at the Edge Important?

- Network communication can be expensive (bandwidth, power consumption, etc.) and sometimes impossible (think remote locations or during natural disasters)
- Real-time processing is necessary for applications, like self-driving cars, that can't handle latency in making important decisions
- Edge applications could be using personal data (like health data) that could be sensitive if sent to cloud
- Optimization software, especially made for specific hardware, can help achieve great efficiency with edge AI models

## 4. Applications of AI at the Edge

- There are nearly endless possibilities with the edge.
- IoT devices are a big use of the edge.
- Not every app needs it. You can likely wait a second while your voice app asks the server a question, and the same goes for NASA engineers processing the latest black hole data.

## 5. Historical Context



- Cloud computing has gotten a lot of the news in recent years, but the edge is also growing in importance.
- [Per Intel®](https://www.intel.com/content/www/us/en/internet-of-things/infographics/guide-to-iot.html), IoT growth has gone from 2 billion devices in 2006 to a projected 200 billion by 2020.
- From the first network ATMs in the 1970s, to the World Wide Web in the '90s, and on up to smart meters in the early 2000s, we've come a long way.
- From the constant use devices like phones to smart speakers, smart refrigerators, locks, warehouse applications and more, the IoT pool keeps expanding.

![image.png](attachment:image.png)



## 6. Course Structure

- In this course, we’ll largely focus on AI at the Edge using the [Intel® Distribution of OpenVINO™ Toolkit](https://software.intel.com/en-us/openvino-toolkit).
- First, we’ll start off with pre-trained models available in the OpenVINO™ Open Model Zoo. Even without needing huge amounts of your own data and costly training, you can deploy powerful models already created for many applications.
- Next, you’ll learn about the Model Optimizer, which can take a model you trained in frameworks such as TensorFlow, PyTorch, Caffe and more, and create an Intermediate Representation (IR) optimized for inference with OpenVINO™ and Intel® hardware.
- Third, you’ll learn about the Inference Engine, where the actual inference is performed on the IR model.
- Lastly, we'll hit some more topics on deploying at the edge, including things like handling input streams, processing model outputs, and the lightweight MQTT architecture used to publish data from your edge models to the web.

## 7. Why Are the Topics Distinct?

- Pre-trained models can be used to explore your options without the need to train a model. This pre-trained model can then be used with the Inference Engine, as it will already be in IR format. This can be integrated into your app and deployed at the edge.
- If you created your own model, or are leveraging a model not already in IR format (TensorFlow, PyTorch, Caffe, MXNet, etc), use the Model Optimizer first. This will then feed to the Inference Engine, which can be integrated into your app and deployed at the edge.
- While you'll be able to perform some amazingly efficient inference after feeding into the Inference Engine, you'll still want to appropriately handle the output for the edge application, and that's what we'll hit in the final lesson.

![image.png](attachment:image.png)

**Comment:** Nice work! There's no training of models here, but the Open Model Zoo is available with plenty of Pre-Trained Models. If you have your own model, you can convert it with the Model Optimizer, and then perform inference on it with the Inference Engine.

## 8. Relevant Tools and Prerequisites


### Summary

- *Prerequisites:*
    - Understand some of the basics of computer vision and how AI models are created.
    - Basic Python or C++ experience. This first course is mainly in Python, although C++ can be used with the Intel® Distribution of OpenVINO™ Toolkit easily as well (and can be faster in a completed app!).
- We will not be training models in this course, as our focus is on optimization & deployment at the edge.
- Classroom workspaces will be available for exercises, so no set-up is required if you plan to use them.

### Local Set-up
- Make sure to make note of the [hardware requirements](https://software.intel.com/en-us/openvino-toolkit/hardware) for the Intel® Distribution of OpenVINO™ Toolkit if you want to work locally.
- If you do want to do the exercises on your local machine (or perhaps even on a set-up like a Raspberry Pi with an [Intel® Neural Compute Stick 2](https://software.intel.com/en-us/articles/intel-neural-compute-stick-2-and-open-source-openvino-toolkit)), you can follow the instructions below for your operating system.

**Note:** The classroom workspaces in this course use the 2019 R3 version of the toolkit, so there may be some variance in syntax from newer versions.

[Download the Toolkit](https://software.intel.com/en-us/openvino-toolkit/choose-download?)

### Intel® DevCloud - Edge
There is also the new [Intel® DevCloud](https://software.intel.com/en-us/devcloud/edge) platform for testing out edge environments. This allows you to have access to a range of Intel® hardware such as CPUs, GPUs, FPGAs, Neural Compute Stick, and more. Later courses will get more into the hardware side of things, but this is another option for working with an edge environment.

![image.png](attachment:image.png)

## 9. What You Will Build

In the project at the end of the course, you’ll build and deploy a People Counter App at the Edge. In the project, you will:

- Convert a model to an Intermediate Representation (IR)
- Use the IR with the Inference Engine
- Process the output of the model to gather relevant statistics
- Send those statistics to a server, and
- Perform analysis on both the performance and further use cases of your model.

**Project Demo**
Below, you can find a quick video demo of the project running and returning statistics on the number of people in frame, average duration spent in frame, and the total number of people counted so far. This is all sent from an edge application to a web server. Note that there is no audio associated with this application.



## 10. Recap

### Recap Summary

In this introductory lesson, we covered:

- The basics of the edge
- The importance of the edge and its history
- Edge applications
- The structure of the course
    - Pre-Trained Models
    - The Model Optimizer
    - The Inference Engine
    - More edge topics (MQTT, servers, etc.)
- An overview of the project

We'll kick things off next by starting to look at Pre-Trained Models, and how they are useful for Edge Applications.

## LESSON 2


### Leveraging Pre-Trained Models

Utilize Pre-Trained Models from the Intel® Distribution of OpenVINO™ Toolkit to build powerful edge applications, without the need to train your own model.

![image.png](attachment:image.png)

## 1. Introduction

In this lesson we'll cover:

- Basics of the Intel® Distribution of OpenVINO™ Toolkit
- Different Computer Vision model types
- Available Pre-Trained Models in the Software
- Choosing the right Pre-Trained Model for your App
- Loading and Deploying a Basic App with a Pre-Trained Model

## 2. The Intel® Distribution of OpenVINO™ Toolkit

The OpenVINO™ Toolkit’s name comes from “**Open** **V**isual **I**nferencing and **N**eural Network **O**ptimization”. It is largely focused on optimizing neural network inference, and is open source.

It is developed by Intel®, and helps support fast inference across Intel® CPUs, GPUs, FPGAs and Neural Compute Stick with a common API. OpenVINO™ can take models built with multiple different frameworks, like TensorFlow or Caffe, and use its Model Optimizer to optimize for inference. This optimized model can then be used with the Inference Engine, which helps speed inference on the related hardware. It also has a wide variety of Pre-Trained Models already put through Model Optimizer.

By optimizing for model speed and size, OpenVINO™ enables running at the edge. This does not mean an increase in inference accuracy - this needs to be done in training beforehand. The smaller, quicker models OpenVINO™ generates, along with the hardware optimizations it provides, are great for lower resource applications. For example, an IoT device does not have the benefit of multiple GPUs and unlimited memory space to run its apps.

![image.png](attachment:image.png)


We’ll be using the OpenVINO™ toolkit throughout the course, but if you’re ready to dive in already, you can visit the [main site](https://software.intel.com/en-us/openvino-toolkit) for the Intel® Distribution of OpenVINO™ Toolkit now.

## 3. Pre-Trained Models in OpenVINO™


In general, pre-trained models refer to models where training has already occurred, and often have high, or even cutting-edge accuracy. Using pre-trained models avoids the need for large-scale data collection and long, costly training. Given knowledge of how to preprocess the inputs and handle the outputs of the network, you can plug these directly into your own app.

In OpenVINO™, Pre-Trained Models refer specifically to the Model Zoo, in which the Free Model Set contains pre-trained models already converted using the Model Optimizer. These models can be used directly with the Inference Engine.

### Further Research

We’ll come back to the various pre-trained models available with the OpenVINO™ Toolkit shortly, but you can get a headstart by checking out the documentation [here](https://software.intel.com/en-us/openvino-toolkit/documentation/pretrained-models).

## 4. Types of Computer Vision Models

We covered three types of computer vision models in the video: Classification, Detection, and Segmentation.

Classification determines a given “class” that an image, or an object in an image, belongs to, from a simple yes/no to thousands of classes. These usually have some sort of “probability” by class, so that the highest probability is the determined class, but you can also see the top 5 predictions as well.
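As a minimal sketch of reading such an output, the snippet below turns a made-up array of raw class scores into probabilities with softmax, then picks the top class and the top 5. The `scores` values and the `softmax` and `top_k` helpers are hypothetical illustrations, not part of the OpenVINO™ API.

```python
import numpy as np


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def top_k(probabilities, k=5):
    """Return the indices of the k highest probabilities, best first."""
    return np.argsort(probabilities)[::-1][:k]


# Hypothetical raw scores for a 10-class classifier
scores = np.array([0.1, 2.5, 0.3, 4.0, 1.2, 0.0, 3.1, 0.7, 0.2, 1.9])
probabilities = softmax(scores)

best_class = int(np.argmax(probabilities))  # the determined class
top5 = top_k(probabilities, k=5)            # the five most likely classes
print(best_class, top5.tolist())
```

In a real app, the class indices would then be looked up in the label list that ships with the model.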

Detection gets into determining that objects appear at different places in an image, and oftentimes draws bounding boxes around the detected objects. It also usually has some form of classification that determines the class of an object in a given bounding box. The bounding boxes have a confidence threshold so you can throw out low-confidence detections.

Segmentation classifies sections of an image by classifying each and every pixel. These networks are often post-processed in some way to avoid phantom classes here and there. Within segmentation are the subsets of semantic segmentation and instance segmentation - the first wherein all instances of a class are considered as one, while the second actually considers separate instances of a class as separate objects.

![image.png](attachment:image.png)

### Further Research
Here is a useful [Medium](https://medium.com/analytics-vidhya/image-classification-vs-object-detection-vs-image-segmentation-f36db85fe81) post if you want to go a little further on types of computer vision models.

## 5. Case Studies in Computer Vision

We focused on SSD, ResNet and MobileNet in the video. SSD is an object detection network that combined classification with object detection through the use of default bounding boxes at different network levels. ResNet utilized residual layers to “skip” over sections of layers, helping to avoid the vanishing gradient problem with very deep neural networks. MobileNet utilized layers like 1x1 convolutions to help cut down on computational complexity and network size, leading to fast inference without substantial decrease in accuracy.
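To get a feel for why 1x1 convolutions cut down on computation, the sketch below compares multiply-accumulate counts for a standard convolution against a depthwise convolution followed by a 1x1 (pointwise) convolution, the building block described in the MobileNet paper. The layer sizes are made up for illustration, and the two helper functions are hypothetical names.

```python
def standard_conv_cost(kernel, in_ch, out_ch, feature_size):
    """Multiply-accumulates for a standard convolution (stride 1, same padding)."""
    return kernel * kernel * in_ch * out_ch * feature_size * feature_size


def separable_conv_cost(kernel, in_ch, out_ch, feature_size):
    """Multiply-accumulates for a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = kernel * kernel * in_ch * feature_size * feature_size
    pointwise = in_ch * out_ch * feature_size * feature_size
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 32 -> 64 channels, 112x112 feature map
standard = standard_conv_cost(3, 32, 64, 112)
separable = separable_conv_cost(3, 32, 64, 112)
ratio = separable / standard
print(f"standard: {standard:,}  separable: {separable:,}  ratio: {ratio:.3f}")
```

The ratio works out to `1/out_ch + 1/kernel**2`, so with a 3x3 kernel the separable version needs roughly an eighth to a ninth of the computation.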

One additional note here on the ResNet architecture - the paper itself actually theorizes that very deep neural networks have convergence issues due to exponentially lower convergence rates, as opposed to just the vanishing gradient problem. The vanishing gradient problem is also thought to be helped by the use of normalization of inputs to each different layer, which is not specific to ResNet. The ResNet architecture itself, at multiple different numbers of layers, was shown to converge faster during training than a “plain” network without the residual layers.

![image.png](attachment:image.png)

#### [Single Shot Multibox Detector (SSD)](https://arxiv.org/abs/1512.02325)

![image.png](attachment:image.png)

Great! ResNet helped open the door for substantially deeper neural networks than were possible before. “Skip” layers helped neural networks avoid the [vanishing gradient problem](https://towardsdatascience.com/the-vanishing-gradient-problem-69bf08b15484) that otherwise occurs in deep networks.


### Further Research
Getting used to reading research papers is a key skill to build when working with AI and Computer Vision. Below, you can find the original research papers on some of the networks we discussed in this section.

- [SSD](https://arxiv.org/abs/1512.02325)
- [YOLO](https://arxiv.org/abs/1506.02640)
- [Faster RCNN](https://arxiv.org/abs/1506.01497)
- [MobileNet](https://arxiv.org/abs/1704.04861)
- [ResNet](https://arxiv.org/abs/1512.03385)
- [Inception](https://arxiv.org/pdf/1409.4842.pdf)

## 6. Available Pre-Trained Models in OpenVINO™

Most of the Pre-Trained Models supplied by OpenVINO™ fall into either face detection, human detection, or vehicle-related detection. There is also a model around detecting text, and more!

Models in the Public Model Set must still be run through the Model Optimizer, but have their original models available for further training and fine-tuning. Models in the Free Model Set are already converted to Intermediate Representation format, and do not have the original model available. These models can be easily obtained with the Model Downloader tool provided in the files installed with OpenVINO™.

The SSD and MobileNet architectures we discussed previously are often the main part of the architecture used for many of these models.

![image.png](attachment:image.png)

**Comment:** Nice work! There are a ton of pre-trained models available through the OpenVINO™ Toolkit, although so far there are none using [Generative Adversarial Networks (GANs)](https://en.wikipedia.org/wiki/Generative_adversarial_network), which would be useful in the case of [Art Generation](https://towardsdatascience.com/gangogh-creating-art-with-gans-8d087d8f74a1).

You can check out the full list of pre-trained models available in the Intel® Distribution of OpenVINO™ [here](https://software.intel.com/en-us/openvino-toolkit/documentation/pretrained-models). As we get into the Model Optimizer in the next lesson, you’ll find it’s quite easy to take pre-trained models available from other sources and use them with OpenVINO™ as well.



## 7. Loading Pre-Trained Models

Make sure to click the button below before you get started to source the correct environment.

<button id="ulab-button-3e515cac" class="ulab-btn--primary"></button>

In this exercise, you'll work to download and load a few of the pre-trained models available 
in the OpenVINO toolkit.

First, you can navigate to the [Pre-Trained Models list](https://software.intel.com/en-us/openvino-toolkit/documentation/pretrained-models) in a separate window or tab, as well as the page that gives all of the model names [here](https://docs.openvinotoolkit.org/latest/_models_intel_index.html).

Your task here is to download the below three pre-trained models using the Model Downloader tool, as detailed on the same page as the different model names. Note that you *do not need to download all of the available pre-trained models* - doing so would cause your workspace to crash, as the workspace will limit you to 3 GB of downloaded models.

### Task 1 - Find the Right Models
Using the [Pre-Trained Model list](https://software.intel.com/en-us/openvino-toolkit/documentation/pretrained-models), determine which models could accomplish the following tasks (there may be some room here in determining which model to download):
- Human Pose Estimation
- Text Detection
- Determining Car Type & Color

### Task 2 - Download the Models
Once you have determined which model best relates to the above tasks, use the Model Downloader tool to download them into the workspace for the following precision levels:
- Human Pose Estimation: All precision levels
- Text Detection: FP16 only
- Determining Car Type & Color: INT8 only

**Note**: When downloading the models in the workspace, add the `-o` argument (along with any other necessary arguments) with `/home/workspace` as the output directory. The default download directory will not allow the files to be written there within the workspace, as it is a read-only directory.

### Task 3 - Verify the Downloads
You can verify the download of these models by navigating to: `/home/workspace/intel` (if you followed the above note), and checking whether a directory was created for each of the three models, with included subdirectories for each precision, with respective `.bin` and `.xml` for each model.

**Hint**: Use the `-h` argument with the Model Downloader tool if you need to check out the possible arguments to include when downloading specific models and precisions.

## 8. Solution: Loading Pre-Trained Models

### Choosing Models
I chose the following models for the three tasks:

- Human Pose Estimation: [human-pose-estimation-0001](https://docs.openvinotoolkit.org/latest/_models_intel_human_pose_estimation_0001_description_human_pose_estimation_0001.html)
- Text Detection: [text-detection-0004](http://docs.openvinotoolkit.org/latest/_models_intel_text_detection_0004_description_text_detection_0004.html)
- Determining Car Type & Color: [vehicle-attributes-recognition-barrier-0039](https://docs.openvinotoolkit.org/latest/_models_intel_vehicle_attributes_recognition_barrier_0039_description_vehicle_attributes_recognition_barrier_0039.html)

### Downloading Models
To navigate to the directory containing the Model Downloader:
```
cd /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader
```

Within there, you'll notice a `downloader.py` file, and can use the `-h` argument with it to see available arguments. For this exercise, `--name` for model name, and `--precisions`, used when only certain precisions are desired, are the important arguments. Note that running `downloader.py` without these will download all available pre-trained models, which will be multiple gigabytes. You can do this on your local machine, if desired, but the workspace will not allow you to store that much information.

Note: In the classroom workspace, you will not be able to write to the `/opt/intel` directory, so you should also use the `-o` argument to specify your output directory as `/home/workspace` (which will download into a created `intel` folder therein).

#### Downloading Human Pose Model

```
sudo ./downloader.py --name human-pose-estimation-0001 -o /home/workspace
```

#### Downloading Text Detection Model

```
sudo ./downloader.py --name text-detection-0004 --precisions FP16 -o /home/workspace
```

#### Downloading Car Metadata Model

```
sudo ./downloader.py --name vehicle-attributes-recognition-barrier-0039 --precisions INT8 -o /home/workspace
```

### Verifying Downloads

The downloader itself will tell you the directories these get saved into, but to verify yourself, first start in the `/home/workspace` directory (or the same directory as the Model Downloader if on your local machine without the `-o` argument). From there, you can `cd intel`, and then you should see three directories - one for each downloaded model. Within those directories, there should be separate subdirectories for the precisions that were downloaded, and then `.xml` and `.bin` files within those subdirectories, that make up the model.
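If you would rather check from Python than by hand, a small sketch like the one below lists every `.xml` file under a download directory and reports whether a matching `.bin` file sits next to it. The `find_ir_models` helper is hypothetical, and the demo runs against a temporary directory with a made-up model name; in the workspace you would point it at `/home/workspace/intel` instead.

```python
import tempfile
from pathlib import Path


def find_ir_models(root):
    """Map each .xml file under root to whether a matching .bin file exists."""
    results = {}
    for xml_path in sorted(Path(root).rglob("*.xml")):
        relative_name = str(xml_path.relative_to(root))
        results[relative_name] = xml_path.with_suffix(".bin").exists()
    return results


# Demo on a throwaway directory that mimics <model>/<precision>/<model>.xml and .bin
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "example-model-0001" / "FP16"
    model_dir.mkdir(parents=True)
    (model_dir / "example-model-0001.xml").touch()
    (model_dir / "example-model-0001.bin").touch()
    found = find_ir_models(tmp)

for name, has_bin in found.items():
    print(name, "OK" if has_bin else "missing .bin")
```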

## 9. Optimizations on the Pre-Trained Models

In the exercise, you dealt with different precisions of the different models. Precisions are related to floating point values - less precision means less memory used by the model, and less compute resources. However, there are some trade-offs with accuracy when using lower precision. There is also fusion, where multiple layers can be fused into a single operation. These are achieved through the Model Optimizer in OpenVINO™, although the Pre-Trained Models have already been run through that process. We’ll return to these optimization techniques in the next lesson.
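The memory side of that trade-off can be illustrated with plain NumPy. This is only a sketch of how storage size and rounding error change between FP32 and FP16 for a made-up array of weights; it does not show how the Model Optimizer itself converts a model.

```python
import numpy as np

# Hypothetical layer with one million weights, stored as 32-bit floats
rng = np.random.default_rng(0)
weights_fp32 = rng.random(1_000_000, dtype=np.float32)

# The same weights stored as 16-bit floats
weights_fp16 = weights_fp32.astype(np.float16)

# Half the bits per value means half the memory
print(weights_fp32.nbytes, weights_fp16.nbytes)

# The cost is a small rounding error relative to the FP32 values
max_error = float(np.max(np.abs(weights_fp32 - weights_fp16.astype(np.float32))))
print(max_error)
```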

## 10. Choosing the Right Model for Your App


Make sure to test out different models for your application, comparing and contrasting their use cases and performance for your desired task. Remember that a little bit of extra processing may yield even better results, but needs to be implemented efficiently.

This goes both ways - you should try out different models for a single use case, but you should also consider how a given model can be applied to multiple use cases. For example, being able to track human poses could help in physical therapy applications to assess and track progress of limb movement range over the course of treatment.

![image.png](attachment:image.png)

**Comment:** This is a tough one, and quite open to interpretation. Here’s how I answered it:

- **Detect People, Vehicles and Bikes** - Matches To: Traffic Light Optimization
- **Pedestrian Detection** - Matches To: Assess Traffic Levels in Retail Aisles
- **Identify Roadside Objects** - Matches To: Delivery Robot
- **Human Pose Estimation** - Matches To: Monitor Form When Working Out


Great job! There’s quite a bit of leeway here on which of these might work best with what, but hopefully this gets you thinking about some ways you can use the pre-trained models in new applications.

## 11. Pre-processing Inputs

The pre-processing needed for a network will vary, but usually this is something you can check out in any related documentation, including in the OpenVINO™ Toolkit documentation. It can even matter what library you use to load an image or frame - OpenCV, which we’ll use to read and handle images in this course, reads them in the BGR format, which may not match the RGB images some networks may have used to train with.

Outside of channel order, you also need to consider image size, and the order of the image data, such as whether the color channels come first or last in the dimensions. Certain models may require a certain normalization of the images for input, such as pixel values between 0 and 1, although some networks also do this as their first layer.

In OpenCV, you can use `cv2.imread` to read in images in BGR format, and `cv2.resize` to resize them. The loaded images are NumPy arrays, so you can also use array methods like `.transpose` and `.reshape` on them, which are useful for switching the array dimension order.
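As a rough sketch of the other pre-processing steps mentioned above, the snippet below uses a small synthetic array in place of an image loaded with `cv2.imread`. It assumes a network that wants RGB input, values between 0 and 1, and channels first; whether a given model needs any of these steps depends on its documentation.

```python
import numpy as np

# Stand-in for an image from cv2.imread: height x width x channels, BGR, uint8
bgr_image = np.zeros((4, 6, 3), dtype=np.uint8)
bgr_image[..., 0] = 255  # fill the first (blue) channel

# Reverse the channel axis to go from BGR to RGB
rgb_image = bgr_image[..., ::-1]

# Scale pixel values from 0-255 down to 0-1
normalized = rgb_image.astype(np.float32) / 255.0

# Move channels first, (H, W, C) -> (C, H, W), then add a batch dimension
chw = normalized.transpose((2, 0, 1))
batch = chw.reshape(1, *chw.shape)

print(batch.shape)
```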

## 12. Exercise: Pre-processing Inputs
Make sure to click the button below before you get started to source the correct environment.

<button id="ulab-button-dcdc9e86" class="ulab-btn--primary"></button>

Now that we have a few pre-trained models downloaded, it's time to preprocess the inputs
to match what each of the models expects as their input. We'll use the same models as before
as a basis for determining the preprocessing necessary for each input file.

As a reminder, our three models are:
- Human Pose Estimation: [human-pose-estimation-0001](https://docs.openvinotoolkit.org/latest/_models_intel_human_pose_estimation_0001_description_human_pose_estimation_0001.html)
- Text Detection: [text-detection-0004](http://docs.openvinotoolkit.org/latest/_models_intel_text_detection_0004_description_text_detection_0004.html)
- Determining Car Type & Color: [vehicle-attributes-recognition-barrier-0039](https://docs.openvinotoolkit.org/latest/_models_intel_vehicle_attributes_recognition_barrier_0039_description_vehicle_attributes_recognition_barrier_0039.html)

**Note:** For ease of use, these models have been added into the `/home/workspace/models`
directory. For example, if you need to use the Text Detection model, you could find it at:

```bash
/home/workspace/models/text_detection_0004.xml
```

Each link above contains the documentation for the related model. In our case, we want to 
focus on the **Inputs** section of the page, wherein important information regarding the input
shape, order of the shape (such as color channel first or last), and the order of the color
channels, is included.

Your task is to fill out the code in three functions within `preprocess_inputs.py`, one for 
each of the three models. We have also included a potential sample image for each of the 
three models, that will be used with `test.py` to check whether the
input for each model has been adjusted as expected for proper model input.

Note that each image is **currently loaded as BGR with H, W, C order** in the `test.py` file,
so any necessary preprocessing to change that should occur in your three work files. 
Note that **BGR** order is used, as the OpenCV function we use to read images loads as
BGR, and not RGB.

When finished, you should be able to run the `test.py` file and pass all three tests.




## 13. Preprocessing Inputs - Solution

### Pose Estimation

Let's start with `pose_estimation`, and its [related documentation](https://docs.openvinotoolkit.org/latest/_models_intel_human_pose_estimation_0001_description_human_pose_estimation_0001.html).

I see it is in [B, C, H, W] format, with a shape of 1x3x256x456, and an expected color order
of BGR.

Since we're loading the image with OpenCV, I know it's already in BGR format. From there, 
I need to resize the image to the desired shape, but that's going to get me 256x456x3
(height x width x channels). Note that `cv2.resize` takes its target size as (width, height).

```
preprocessed_image = cv2.resize(preprocessed_image, (456, 256))
```

So, I need to transpose the image, where the 3rd dimension, containing the channels,
is placed first, with the other two following.

```
preprocessed_image = preprocessed_image.transpose((2,0,1))
```

Lastly, I still need to add the `1` for the batch size at the start. I can actually just reshape
to "add" the extra dimension.

```
preprocessed_image = preprocessed_image.reshape(1,3,256,456)
```

### Text Detection

Next, let's look at `text_detection`, and its [related documentation](http://docs.openvinotoolkit.org/latest/_models_intel_text_detection_0004_description_text_detection_0004.html).

This will actually be a very similar process to above! As such, you might actually consider
whether you could add a standard "helper" for each of these, where you could just add the
desired input shape, and perform the same transformations. Note that it does require knowing
for sure that all the steps (being in BGR, resizing, transposing, reshaping) are needed for each.

Here, the only change needed is for resizing (as well as the dimensions fed into reshape):

```
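# cv2.resize expects (width, height); the model input is 768 high by 1280 wide
preprocessed_image = cv2.resize(preprocessed_image, (1280, 768))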
```

### Car Metadata

Lastly, let's cover `car_meta`, and its [related documentation](https://docs.openvinotoolkit.org/latest/_models_intel_vehicle_attributes_recognition_barrier_0039_description_vehicle_attributes_recognition_barrier_0039.html).

Again, all we need to change is how the image is resized, and making sure we `reshape` 
correctly:

```
preprocessed_image = cv2.resize(preprocessed_image, (72, 72))
```

### Video Explanation

Using the documentation pages for each model, I ended up noticing they needed essentially the same preprocessing, outside of the height and width of the input to the network. The images coming from `cv2.imread` were already going to be BGR, and all the models wanted BGR inputs, so I didn't need to do anything there. However, each image was coming in as height x width x channels, and each of these networks wanted channels first, along with an extra dimension at the start for batch size.

So, for each network, the preprocessing needed to 1) re-size the image, 2) move the channels from last to first, and 3) add an extra dimension of `1` to the start. Here is the function I created for this, which I could call for each separate network:
```
import cv2


def preprocessing(input_image, height, width):
    '''
    Given an input image, height and width:
    - Resize to height and width
    - Transpose the final "channel" dimension to be first
    - Reshape the image to add a "batch" of 1 at the start
    '''
    # cv2.resize takes the target size as (width, height)
    image = cv2.resize(input_image, (width, height))
    # (H, W, C) -> (C, H, W)
    image = image.transpose((2,0,1))
    image = image.reshape(1, 3, height, width)

    return image
```
Then, for each model, I can just call this function with the height and width from the documentation:


#### Human Pose
```
preprocessed_image = preprocessing(preprocessed_image, 256, 456)
```
#### Text Detection
```
preprocessed_image = preprocessing(preprocessed_image, 768, 1280)
```
#### Car Meta
```
preprocessed_image = preprocessing(preprocessed_image, 72, 72)
```

### Testing
To test your implementation, you can just run `python test.py`.

## 14. Handling Network Outputs

We covered the primary outputs created by the computer vision model types discussed earlier: classes, bounding boxes, and semantic labels.

Classification networks typically output an array with the softmax probabilities by class; the argmax of those probabilities can be matched up to an array by class for the prediction.

Bounding boxes typically come out with multiple bounding box detections per image, with each box first having a class and confidence. Low confidence detections can be ignored. From there, there are also an additional four values, two of which are an X, Y pair, while the other two may be the opposite corner pair of the bounding box, or otherwise a height and width.
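Since the last two values can mean either an opposite corner or a width and height, it helps to be able to convert between the two. The following is a minimal sketch with hypothetical helper names and a made-up box; which format a given network uses has to be checked in its documentation.

```python
def corners_to_xywh(xmin, ymin, xmax, ymax):
    """Convert opposite corners to a top-left corner plus width and height."""
    return xmin, ymin, xmax - xmin, ymax - ymin


def xywh_to_corners(x, y, width, height):
    """Convert a top-left corner plus width and height to opposite corners."""
    return x, y, x + width, y + height


# Hypothetical box given as (xmin, ymin, xmax, ymax) in pixels
box = (50, 40, 200, 160)
as_xywh = corners_to_xywh(*box)
round_trip = xywh_to_corners(*as_xywh)
print(as_xywh, round_trip)
```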

Semantic labels give the class for each pixel. Sometimes, these are flattened in the output, or a different size than the original image, and need to be reshaped or resized to map directly back to the input.
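A small sketch of that reshaping and resizing is shown below. It assumes a flattened output holding one score per class per pixel, arranged class by class, which is only one possible layout; some networks output class indices directly. All sizes and values here are made up.

```python
import numpy as np

num_classes, out_h, out_w = 3, 2, 2

# Hypothetical flattened output: one score per class per pixel
flat_output = np.array([
    0.90, 0.10, 0.20, 0.30,  # class 0 scores for the 4 pixels
    0.05, 0.80, 0.10, 0.30,  # class 1 scores
    0.05, 0.10, 0.70, 0.40,  # class 2 scores
], dtype=np.float32)

# Restore the (classes, height, width) layout, then pick the best class per pixel
scores = flat_output.reshape(num_classes, out_h, out_w)
class_map = np.argmax(scores, axis=0)

# Nearest-neighbour upscale by a factor of 2 to match a hypothetical 4x4 input
scale = 2
resized = np.repeat(np.repeat(class_map, scale, axis=0), scale, axis=1)
print(resized)
```

Nearest-neighbour scaling is used here because interpolating between class indices would produce labels that don't correspond to any class.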

### Quiz Information
In a network like SSD that we discussed earlier, the output is a series of bounding boxes for potential object detections, each typically including a confidence value that indicates how confident the model is about that particular detection.

Therefore, inference performed on a given image will output an array with multiple bounding box predictions including: the class of the object, the confidence, and two corners (made of `xmin`, `ymin`, `xmax`, and `ymax`) that make up the bounding box, in that order.
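A minimal sketch of extracting boxes from such an array follows. It assumes each row is laid out exactly as described above and that the coordinates are normalized to the 0-1 range; both are assumptions for illustration, and a specific model's documentation may list extra fields or a different order. The helper name `extract_boxes` is hypothetical.

```python
import numpy as np

def extract_boxes(detections, image_width, image_height, min_confidence=0.5):
    """Keep confident detections and scale their corners to pixel coordinates.

    Assumes each row is [class, confidence, xmin, ymin, xmax, ymax] with
    coordinates normalized to the 0-1 range.
    """
    boxes = []
    for class_id, confidence, xmin, ymin, xmax, ymax in detections:
        if confidence < min_confidence:
            continue  # Ignore low-confidence detections
        boxes.append((
            int(class_id), float(confidence),
            int(xmin * image_width), int(ymin * image_height),
            int(xmax * image_width), int(ymax * image_height),
        ))
    return boxes

detections = np.array([
    [1, 0.9, 0.25, 0.5, 0.75, 1.0],   # confident detection, kept
    [2, 0.2, 0.0, 0.0, 0.5, 0.5],     # below the threshold, dropped
])
print(extract_boxes(detections, image_width=400, image_height=200))
```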

#### QUIZ QUESTION
![image.png](attachment:image.png)

**comment:** Nice work! Being able to extract the output correctly from a neural network is crucial to implementing inference with an app.


### Further Research
- [Here](https://towardsdatascience.com/understanding-ssd-multibox-real-time-object-detection-in-deep-learning-495ef744fab) is a great write-up on working with SSD and its output.
- This [post](https://thegradient.pub/semantic-segmentation/) gets into more of the differences in moving from models with bounding boxes to those using semantic segmentation.

## 15. Running Your First Edge App

You have now learned the key parts of working with a pre-trained model: obtaining the model, preprocessing inputs for it, and handling its output. In the upcoming exercise, you’ll load a pre-trained model into the Inference Engine, and call functions to preprocess the input and handle the output in the appropriate locations, from within an edge app. We’ll still be abstracting away some of the steps of dealing with the Inference Engine API until a later lesson, but these should work similarly across different models.

## 16. Exercise: Deploy An App at the Edge


### Deploy Your First Edge App


So far, you've downloaded some pre-trained models, handled their inputs, and learned how
to handle outputs. In this exercise, you'll implement the handling of the outputs of our three
models from before, and get to see inference actually performed by adding these models
to some example edge applications. 

There's a lot of code still involved behind the scenes here. With the Pre-Trained Models 
available with the OpenVINO toolkit, you don't need to worry about the Model Optimizer, but
there is still work done to load the model into the Inference Engine. We won't learn about 
this code until later, so in this case, you'll just need to call your functions to handle the input
and output of the model within the app.

If you do want a sneak preview of some of the code that interfaces with the Inference Engine,
you can check it out in `inference.py`. You'll work out of the `handle_models.py` file, as 
well as adding function calls within the edge app in `app.py`.

## TODOs

In `handle_models.py`, you will need to implement `handle_pose`, `handle_text`, and
`handle_car`.

In `app.py`, first, you'll need to use the input shape of the network to call the `preprocessing`
function. Then, you need to call `handle_output` with the appropriate model argument 
in order to get the right handling function. With that function, you can then feed in the
output of the inference request in order to extract the output. 

Note that there is some additional post-processing done for you in `create_output_image`
within `app.py` to help display the output back onto the input image.

## Testing the apps

To test your implementations, you can use `app.py` to run each edge application, with
the following arguments:
- `-t`: The model type, which should be one of `"POSE"`, `"TEXT"`, or `"CAR_META"`
- `-m`: The location of the model .xml file
- `-i`: The location of the input image used for testing
- `-c`: A CPU extension file, if applicable. See below for what this is for the workspace.

The results of your output will be saved for viewing in the `outputs` directory.

Here is an example of running the app with the related arguments:

```
python app.py -i "images/blue-car.jpg" -t "CAR_META" -m "/home/workspace/models/vehicle-attributes-recognition-barrier-0039.xml" -c "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"
```

## Model Documentation

Once again, here are the links to the models, so you can use the **Output** section to help
you get started (there are additional comments in the code to assist):

- Human Pose Estimation: [human-pose-estimation-0001](https://docs.openvinotoolkit.org/latest/_models_intel_human_pose_estimation_0001_description_human_pose_estimation_0001.html)
- Text Detection: [text-detection-0004](http://docs.openvinotoolkit.org/latest/_models_intel_text_detection_0004_description_text_detection_0004.html)
- Determining Car Type & Color: [vehicle-attributes-recognition-barrier-0039](https://docs.openvinotoolkit.org/latest/_models_intel_vehicle_attributes_recognition_barrier_0039_description_vehicle_attributes_recognition_barrier_0039.html)


## 17. Solution: Deploy an App at the Edge
This was a tough one! It takes a little while to step through this solution, as I want to give you some of my own techniques for approaching this rather difficult problem first. The solution video is split into three parts. The first focuses on adding the preprocessing and output handling calls within the app itself, and then on how I would approach implementing the Car Meta model's output handling. The second and third parts cover the Pose Estimation and Text Detection models.

### Early Steps and Car Meta Model Output Handling

The code for calling preprocessing and utilizing the output handling functions from within app.py is fairly straightforward:
```
preprocessed_image = preprocessing(image, h, w)
```
This just feeds in the input image, along with the height and width of the network, which the given `inference_network.load_model` function returned for you.
```
output_func = handle_output(args.t)
processed_output = output_func(output, image.shape)
```
This is partly based on the helper function I gave you, which can return the correct output handling function by feeding in the model type. The second line actually sends the output of inference and image shape to whichever output handling function is appropriate.
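For reference, here is one way such a helper could be written. This is only a sketch: the version provided in `handle_models.py` may be implemented differently, and the three handler bodies below are placeholders so the example runs on its own.

```python
def handle_pose(output, input_shape):
    return "pose"  # Placeholder body for illustration only

def handle_text(output, input_shape):
    return "text"  # Placeholder body for illustration only

def handle_car(output, input_shape):
    return "car"  # Placeholder body for illustration only

def handle_output(model_type):
    """Return the output handling function for the given model type."""
    handlers = {
        "POSE": handle_pose,
        "TEXT": handle_text,
        "CAR_META": handle_car,
    }
    # Returns None for an unrecognized model type
    return handlers.get(model_type)

output_func = handle_output("CAR_META")
print(output_func.__name__)  # handle_car
```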

#### Car Meta Output Handling
Given that the two outputs for the Car Meta Model are "type" and "color", and are just the softmax probabilities by class, I wanted you to just return the np.argmax, or the index where the highest probability was determined.

```
def handle_car(output, input_shape):
    '''
    Handles the output of the Car Metadata model.
    Returns two integers: the argmax of each softmax output.
    The first is for color, and the second for type.
    '''
    # Get rid of unnecessary dimensions
    color = output['color'].flatten()
    car_type = output['type'].flatten()
    # TODO 1: Get the argmax of the "color" output
    color_pred = np.argmax(color)
    # TODO 2: Get the argmax of the "type" output
    type_pred = np.argmax(car_type)

    return color_pred, type_pred
```
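To check the function without running inference, you can feed it a dummy output dictionary. In the sketch below, the output shapes and the label lists are assumptions based on my reading of the model documentation, so confirm them against the **Output** section of the model's page before relying on them.

```python
import numpy as np

# Assumed label order; confirm against the model's Output documentation
CAR_COLORS = ["white", "gray", "yellow", "red", "green", "blue", "black"]
CAR_TYPES = ["car", "bus", "truck", "van"]

def handle_car(output, input_shape):
    """Return the argmax of the "color" and "type" softmax outputs."""
    color = output['color'].flatten()
    car_type = output['type'].flatten()
    return np.argmax(color), np.argmax(car_type)

# Dummy output: highest color score at index 5, highest type score at index 0
dummy_output = {
    'color': np.zeros((1, 7, 1, 1)),
    'type': np.zeros((1, 4, 1, 1)),
}
dummy_output['color'][0, 5, 0, 0] = 0.8
dummy_output['type'][0, 0, 0, 0] = 0.9

color_pred, type_pred = handle_car(dummy_output, (480, 640, 3))
print(CAR_COLORS[color_pred], CAR_TYPES[type_pred])  # blue car
```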

#### Run the Car Meta Model
I have moved the models used in the exercise into a models subdirectory in the /home/workspace directory, so the path used can be a little bit shorter.
```
python app.py -i "images/blue-car.jpg" -t "CAR_META" -m "/home/workspace/models/vehicle-attributes-recognition-barrier-0039.xml" -c "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"
```

For the other models, make sure to update the input image `-i`, model type `-t`, and model `-m` accordingly.
### Pose Estimation Output Handling
Handling the car output was fairly straightforward by using `np.argmax`, but the outputs for the pose estimation and text detection models are a bit trickier. However, there's a lot of similar code between the two. In this second part of the solution, I'll go into detail on the pose estimation model, and then we'll finish with a quick video on handling the output of the text detection model.


Pose Estimation is more difficult, and doesn't have as nicely named outputs. I noted you just need the second one in this exercise, called 'Mconv7_stage2_L2', which is just the keypoint heatmaps, and not the associations between these keypoints. From there, I created an empty array to hold the output heatmaps once they are re-sized, as I decided to iterate through each heatmap 1 by 1 and re-size it, which can't be done in place on the original output.

```
def handle_pose(output, input_shape):
    '''
    Handles the output of the Pose Estimation model.
    Returns ONLY the keypoint heatmaps, and not the Part Affinity Fields.
    '''
    # TODO 1: Extract only the second blob output (keypoint heatmaps)
    heatmaps = output['Mconv7_stage2_L2']
    # TODO 2: Resize the heatmap back to the size of the input
    # Create an empty array to handle the output map
    out_heatmap = np.zeros([heatmaps.shape[1], input_shape[0], input_shape[1]])
    # Iterate through and re-size each heatmap
    for h in range(len(heatmaps[0])):
        out_heatmap[h] = cv2.resize(heatmaps[0][h], input_shape[0:2][::-1])

    return out_heatmap
```

Note that the `input_shape[0:2][::-1]` line is taking the original image shape of HxWxC, taking just the first two (HxW), and reversing them to be WxH as `cv2.resize` uses.
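The slicing can be checked on its own with a plain tuple. The image size here is made up for illustration:

```python
# Shape of an original image as reported by NumPy/OpenCV: (height, width, channels)
input_shape = (750, 1000, 3)

height_width = input_shape[0:2]    # (750, 1000)
width_height = height_width[::-1]  # (1000, 750), the (width, height) order cv2.resize expects

print(width_height)  # (1000, 750)
```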

### Text Detection Model Handling
Thanks for sticking in there! The code for the text detection model is pretty similar to the pose estimation one, so let's finish things off.


Text Detection had a very similar output processing function, just using the 'model/segm_logits/add' output and only needing to resize over two "channels" of output. I likely could have extracted this out into its own output handling function that both Pose Estimation and Text Detection could have used.
```
def handle_text(output, input_shape):
    '''
    Handles the output of the Text Detection model.
    Returns ONLY the text/no text classification of each pixel,
        and not the linkage between pixels and their neighbors.
    '''
    # TODO 1: Extract only the first blob output (text/no text classification)
    text_classes = output['model/segm_logits/add']
    # TODO 2: Resize this output back to the size of the input
    out_text = np.empty([text_classes.shape[1], input_shape[0], input_shape[1]])
    for t in range(len(text_classes[0])):
        out_text[t] = cv2.resize(text_classes[0][t], input_shape[0:2][::-1])

    return out_text
```
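As a sketch of the shared helper mentioned above, both handlers could delegate the per-channel resize loop to one function. The names `resize_channels` and `nearest_resize` are hypothetical. To keep the example dependency-free, it uses a simple nearest-neighbor resize in NumPy as a stand-in for `cv2.resize`; in the actual app you would pass `resize_fn=cv2.resize` instead.

```python
import numpy as np

def nearest_resize(array, size):
    """NumPy stand-in for cv2.resize: size is (width, height), nearest neighbor."""
    width, height = size
    rows = np.arange(height) * array.shape[0] // height
    cols = np.arange(width) * array.shape[1] // width
    return array[rows][:, cols]

def resize_channels(blob, input_shape, resize_fn=nearest_resize):
    """Resize every channel of a 1xCxHxW blob to the input image's height and width."""
    channels = blob[0]
    out = np.zeros([len(channels), input_shape[0], input_shape[1]])
    for c, channel in enumerate(channels):
        # Reverse (H, W) to the (W, H) order the resize function expects
        out[c] = resize_fn(channel, input_shape[0:2][::-1])
    return out

# Dummy blob: 2 channels of 4x6 values, resized to an 8x12 "image"
blob = np.random.rand(1, 2, 4, 6)
resized = resize_channels(blob, (8, 12, 3))
print(resized.shape)  # (2, 8, 12)
```

With a helper like this, `handle_pose` and `handle_text` would differ only in which named output they pass in.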

## 18. Recap

In this lesson we covered:

- Basics of the Intel® Distribution of OpenVINO™ Toolkit
- Different Computer Vision model types
- Available Pre-Trained Models in the Software
- Choosing the right Pre-Trained Model for your App
- Loading and Deploying a Basic App with a Pre-Trained Model

## 19. Lesson Glossary

### Edge Application
Applications with inference run on local hardware, sometimes without network connections, such as Internet of Things (IoT) devices, as opposed to the cloud. Less data needs to be streamed over a network connection, and real-time decisions can be made.

### OpenVINO™ Toolkit
The [Intel® Distribution of OpenVINO™ Toolkit](https://software.intel.com/en-us/openvino-toolkit) enables deep learning inference at the edge by including both neural network optimizations for inference as well as hardware-based optimizations for Intel® hardware.

### Pre-Trained Model
Computer Vision and/or AI models that are already trained on large datasets and available for use in your own applications. These models are often trained on datasets like [ImageNet](https://en.wikipedia.org/wiki/ImageNet). Pre-trained models can either be used as is or used in transfer learning to further fine-tune a model. The OpenVINO™ Toolkit provides a number of [pre-trained models](https://software.intel.com/en-us/openvino-toolkit/documentation/pretrained-models/) that are already optimized for inference.

### Transfer Learning
The use of a pre-trained model as a basis for further training of a neural network. Using a pre-trained model can help speed up training as the early layers of the network have feature extractors that work in a wide variety of applications, and often only late layers will need further fine-tuning for your own dataset. OpenVINO™ does not deal with transfer learning, as all training should occur prior to using the Model Optimizer.

### Image Classification
A form of inference in which an object in an image is determined to be of a particular class, such as a cat vs. a dog.

### Object Detection
A form of inference in which objects within an image are detected, and a bounding box is output based on where in the image the object was detected. Usually, this is combined with some form of classification to also output which class the detected object belongs to.

### Semantic Segmentation
A form of inference in which objects within an image are detected and classified on a pixel-by-pixel basis, with all objects of a given class given the same label.

### Instance Segmentation
Similar to semantic segmentation, this form of inference is done on a pixel-by-pixel basis, but different objects of the same class are separately identified.

### [SSD](https://arxiv.org/abs/1512.02325)
A neural network combining object detection and classification, with different feature extraction layers directly feeding to the detection layer, using default bounding box sizes and shapes.

### [YOLO](https://arxiv.org/abs/1506.02640)
One of the original neural networks to only take a single look at an input image, whereas earlier networks ran a classifier multiple times across a single image at different locations and scales.

### [Faster R-CNN](https://arxiv.org/abs/1506.01497)
A network, expanding on [R-CNN](https://arxiv.org/pdf/1311.2524.pdf) and [Fast R-CNN](https://arxiv.org/pdf/1504.08083.pdf), that integrates advances made in the earlier models by adding a Region Proposal Network on top of the Fast R-CNN model for an integrated object detection model.

### [MobileNet](https://arxiv.org/abs/1704.04861)
A neural network architecture optimized for speed and size with minimal loss of inference accuracy through the use of techniques like [1x1 convolutions](https://stats.stackexchange.com/questions/194142/what-does-1x1-convolution-mean-in-a-neural-network). As such, MobileNet is more useful in mobile applications than substantially larger and slower networks.

### [ResNet](https://arxiv.org/abs/1512.03385)
A very deep neural network that made use of residual, or “skip” layers that pass information forward by a couple of layers. This helped deal with the [vanishing gradient problem](https://towardsdatascience.com/the-vanishing-gradient-problem-69bf08b15484) experienced by deeper neural networks.

### [Inception](https://arxiv.org/pdf/1409.4842.pdf)
A neural network making use of multiple different convolutions at each “layer” of the network, such as 1x1, 3x3 and 5x5 convolutions. The top architecture from the original paper is also known as GoogLeNet, an homage to [LeNet](http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf), an early neural network used for character recognition.

### Inference Precision
Precision refers to the level of detail used to represent the weights and biases in a neural network, whether in floating point or integer precision. Lower precision generally leads to some loss of accuracy, but with a positive trade-off in network speed and size.

## LESSON 3
### The Model Optimizer
Explore the Model Optimizer, which allows you to take models trained with many different Deep Learning frameworks and create an Intermediate Representation useful with the Inference Engine.

![image.png](attachment:image.png)

## 1. Introduction

In this lesson we'll cover:

- Basics of the Model Optimizer
- Different Optimization Techniques and their impact on model performance
- Supported Frameworks in the Intel® Distribution of OpenVINO™ Toolkit
- Converting from models in those frameworks to Intermediate Representations
- And a bit on Custom Layers

## 2. The Model Optimizer

The Model Optimizer helps convert models in multiple different frameworks to an Intermediate Representation, which is used with the Inference Engine. If a model is not one of the pre-converted models in the Pre-Trained Models OpenVINO™ provides, it is a required step to move onto the Inference Engine.

As part of the process, it can perform various optimizations that can help shrink the model size and help make it faster, although this will not give the model higher inference accuracy. In fact, there will be some loss of accuracy as a result of potential changes like lower precision. However, these losses in accuracy are minimized.

### Local Configuration
Configuring the Model Optimizer is pretty straightforward for your local machine, given that you already have OpenVINO™ installed. You can navigate to your OpenVINO™ install directory first, which is usually `/opt/intel/openvino`. Then, head to `/deployment_tools/model_optimizer/install_prerequisites`, and run the `install_prerequisites.sh` script therein.

![image.png](attachment:image.png)

### Developer Documentation

You can find the developer documentation [here](https://docs.openvinotoolkit.org/2019_R3/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) for working with the Model Optimizer. We’ll delve deeper into it throughout the lesson.

## 3. Optimization Techniques

Here, I mostly focused on three optimization techniques: quantization, freezing and fusion. Note that at the end of the video when I mention hardware optimizations, those are done by the Inference Engine (which we’ll cover in the next lesson), not the Model Optimizer.

### Quantization
Quantization is related to the topic of precision I mentioned before, or how many bits are used to represent the weights and biases of the model. During training, having these very accurate numbers can be helpful, but it’s often the case in inference that the precision can be reduced without substantial loss of accuracy. Quantization is the process of reducing the precision of a model.

With the OpenVINO™ Toolkit, models usually default to FP32, or 32-bit floating point values, while FP16 and INT8, for 16-bit floating point and 8-bit integer values, are also available (INT8 is only currently available in the Pre-Trained Models; the Model Optimizer does not currently support that level of precision). FP16 and INT8 will lose some accuracy, but the model will be smaller in memory and compute times faster. Therefore, quantization is a common method used for running models at the edge.
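The size and accuracy trade-off can be seen with a plain NumPy cast. This is only an illustration of reduced precision on random stand-in weights, not how the Model Optimizer itself converts a model:

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-in for a layer's FP32 weights
weights_fp32 = rng.standard_normal(1000).astype(np.float32)

# Reduce precision from 32-bit to 16-bit floating point
weights_fp16 = weights_fp32.astype(np.float16)

size_ratio = weights_fp16.nbytes / weights_fp32.nbytes
max_error = np.max(np.abs(weights_fp32 - weights_fp16.astype(np.float32)))

print(size_ratio)  # 0.5, so half the memory
print(max_error)   # small but non-zero rounding error
```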

### Freezing
Freezing in this context is used for TensorFlow models. Freezing TensorFlow models will remove certain operations and metadata only needed for training, such as those related to backpropagation. Freezing a TensorFlow model is usually a good idea whether before performing direct inference or converting with the Model Optimizer.

### Fusion
Fusion relates to combining multiple layer operations into a single operation. For example, a batch normalization layer, activation layer, and convolutional layer could be combined into a single operation. This can be particularly useful for GPU inference, where the separate operations may occur on separate GPU kernels, while a fused operation occurs on one kernel, thereby incurring less overhead in switching from one kernel to the next.
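As a small worked example of fusion, the sketch below folds a batch normalization step into the weights and bias of the layer before it, so that two operations become one. A fully connected layer stands in for the convolution to keep the arithmetic short, and the function name `fold_batch_norm` is hypothetical. It illustrates the general idea only, and is not the Model Optimizer's own implementation.

```python
import numpy as np

def fold_batch_norm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch norm parameters into the preceding layer's weight and bias."""
    scale = gamma / np.sqrt(var + eps)
    fused_weight = weight * scale[:, None]  # Scale each output row
    fused_bias = (bias - mean) * scale + beta
    return fused_weight, fused_bias

rng = np.random.default_rng(0)
weight, bias = rng.standard_normal((4, 3)), rng.standard_normal(4)
gamma, beta = rng.standard_normal(4), rng.standard_normal(4)
mean, var = rng.standard_normal(4), rng.random(4) + 0.5
x = rng.standard_normal(3)

# Two separate operations: linear layer, then batch normalization
separate = gamma * ((weight @ x + bias) - mean) / np.sqrt(var + 1e-5) + beta
# One fused operation using the folded parameters
fused_weight, fused_bias = fold_batch_norm(weight, bias, gamma, beta, mean, var)
fused = fused_weight @ x + fused_bias

print(np.allclose(separate, fused))  # True
```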

![image.png](attachment:image.png)

**comment:** The Model Optimizer is performing some of these behind the scenes, so we can focus more on building out our application.

### Further Research
- If you’d like to learn more about quantization, check out this [helpful post](https://nervanasystems.github.io/distiller/quantization.html).
- You can find out more about optimizations performed by the Model Optimizer in the OpenVINO™ Toolkit [here](https://docs.openvinotoolkit.org/2019_R3/_docs_MO_DG_prepare_model_Model_Optimization_Techniques.html).

## 4. Supported Frameworks

The supported frameworks with the OpenVINO™ Toolkit are:

- Caffe
- TensorFlow
- MXNet
- ONNX (which can support PyTorch and Apple ML models through another conversion step)
- Kaldi

These are all open source, just like the OpenVINO™ Toolkit. Caffe is originally from UC Berkeley, TensorFlow is from Google Brain, MXNet is from the Apache Software Foundation, ONNX is a combined effort of Facebook and Microsoft, and Kaldi was originally an individual’s effort. Most of these are fairly multi-purpose frameworks, while Kaldi is primarily focused on speech recognition.

There are some differences in how exactly to handle these, although most differences are handled under the hood of the OpenVINO™ Toolkit. For example, TensorFlow has some different steps for certain models, or frozen vs. unfrozen models. However, most of the functionality is shared across all of the supported frameworks.

![image.png](attachment:image.png)

### Further Research
In case you aren’t familiar with any of these frameworks, feel free to check out the sites for each below:

- [Caffe](https://caffe.berkeleyvision.org/)
- [TensorFlow](https://www.tensorflow.org/)
- [MXNet](https://mxnet.apache.org/)
- [ONNX](https://onnx.ai/)
- [Kaldi](https://kaldi-asr.org/doc/dnn.html)

## 5. Intermediate Representations

Intermediate Representations (IRs) are the OpenVINO™ Toolkit’s standard structure and naming for neural network architectures. A `Conv2D` layer in TensorFlow, `Convolution` layer in Caffe, or `Conv` layer in ONNX are all converted into a `Convolution` layer in an IR.

The IR can be loaded directly into the Inference Engine, and is actually made of two output files from the Model Optimizer: an XML file and a binary file. The XML file holds the model architecture and other important metadata, while the binary file holds weights and biases in a binary format. You need both of these files in order to run inference. Any desired optimizations, such as changes to precision, will have occurred while the IR is generated by the Model Optimizer. You can generate certain precisions with the `--data_type` argument, which is usually FP32 by default.
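Since both files are needed, it can be handy to derive the binary file's path from the XML path and confirm the pair exists before loading. The sketch below assumes the two files share a base name and sit in the same directory; the helper names and the example path are hypothetical.

```python
import os

def ir_file_pair(model_xml):
    """Return the (.xml, .bin) paths for an IR, assuming they share a base name."""
    model_bin = os.path.splitext(model_xml)[0] + ".bin"
    return model_xml, model_bin

def ir_is_complete(model_xml):
    """Check that both halves of the Intermediate Representation exist on disk."""
    return all(os.path.isfile(path) for path in ir_file_pair(model_xml))

print(ir_file_pair("models/example-model.xml"))
# ('models/example-model.xml', 'models/example-model.bin')
```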

![image.png](attachment:image.png)

**Comment:** The Model Optimizer works almost like a translator here, making the Intermediate Representation a shared dialect of all the supported frameworks, which can be understood by the Inference Engine.

### Further Research

- You can find the main developer documentation on converting models in the OpenVINO™ Toolkit [here](https://docs.openvinotoolkit.org/2019_R3/_docs_MO_DG_prepare_model_convert_model_Converting_Model.html). We’ll cover how to do so with TensorFlow, Caffe and ONNX (useful for PyTorch) over the next several pages.
- You can find the documentation on different layer names when converted to an IR [here](https://docs.openvinotoolkit.org/2019_R3/_docs_MO_DG_prepare_model_Supported_Frameworks_Layers.html).
- Finally, you can find more in-depth data on each of the Intermediate Representation layers themselves [here](https://docs.openvinotoolkit.org/2019_R3/_docs_MO_DG_prepare_model_convert_model_IRLayersCatalogSpec.html).

## 6. Using the Model Optimizer with TensorFlow Models

Once the Model Optimizer is configured, the next thing to do with a TensorFlow model is to determine whether to use a frozen or unfrozen model. You can either freeze your model, which I would suggest, or use the separate instructions in the documentation to convert a non-frozen model. Some models in TensorFlow may already be frozen for you, so you can skip this step.

From there, you can feed the model into the Model Optimizer, and get your Intermediate Representation. However, there may be a few items specific to TensorFlow for that stage, which you’ll need to feed into the Model Optimizer before it can create an IR for use with the Inference Engine.

TensorFlow models can vary for what additional steps are needed by model type, being unfrozen or frozen, or being from the TensorFlow Detection Model Zoo. Unfrozen models usually need the `--mean_values` and `--scale` parameters fed to the Model Optimizer, while the frozen models from the Object Detection Model Zoo don’t need those parameters. However, the frozen models will need TensorFlow-specific parameters like `--tensorflow_use_custom_operations_config` and `--tensorflow_object_detection_api_pipeline_config`. Also, `--reverse_input_channels` is usually needed, as TF model zoo models are trained on RGB images, while OpenCV usually loads as BGR. Certain models, like YOLO, DeepSpeech, and more, have their own separate pages.

### TensorFlow Object Detection Model Zoo
The models in the TensorFlow Object Detection Model Zoo can be used to even further extend the pre-trained models available to you. These are in TensorFlow format, so they will need to be fed to the Model Optimizer to get an IR. The models are just focused on object detection with bounding boxes, but there are plenty of different model architectures available.

### Further Research
- The developer documentation for Converting TensorFlow Models can be found [here](https://docs.openvinotoolkit.org/2019_R3/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_TensorFlow.html). You’ll work through this process in the next exercise.
- TensorFlow also has additional models available in the [TensorFlow Detection Model Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md). By converting these over to Intermediate Representations, you can expand even further on the pre-trained models available to you.

## 7. Exercise: Convert a TF Model



In this exercise, you'll convert a TensorFlow Model from the Object Detection Model Zoo
into an Intermediate Representation using the Model Optimizer.

As noted in the related [documentation](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_TensorFlow.html), 
there is a difference in method when using a frozen graph vs. an unfrozen graph. Since
freezing a graph is a TensorFlow-based function and not one specific to OpenVINO itself,
in this exercise, you will only need to work with a frozen graph. However, I encourage you to
try to freeze and load an unfrozen model on your own as well.

For this exercise, first download the SSD MobileNet V2 COCO model from [here](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz). Use the `tar -xvf` 
command with the downloaded file to unpack it.

From there, find the **Convert a TensorFlow\* Model** header in the documentation, and
feed in the downloaded SSD MobileNet V2 COCO model's `.pb` file. 

If the conversion is successful, the terminal should let you know that it generated an IR model.
The locations of the `.xml` and `.bin` files, as well as execution time of the Model Optimizer,
will also be output.

**Note**: Converting the TF model will take a little over one minute in the workspace.

### Hints & Troubleshooting

Make sure to pay attention to the note in this section regarding the 
`--reverse_input_channels` argument. 
If you are unsure about this argument, you can read more [here](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_Converting_Model_General.html#when_to_reverse_input_channels).

There is additional documentation specific to converting models from TensorFlow's Object
Detection Zoo [here](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_Object_Detection_API_Models.html).
You will likely need both the `--tensorflow_use_custom_operations_config` and
`--tensorflow_object_detection_api_pipeline_config` arguments fed with their 
related files.

## 8. Solution: Convert a TensorFlow Model

First, you can start by checking out the additional documentation specific to TensorFlow
models from the Object Detection Model Zoo [here](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_Object_Detection_API_Models.html).

I noticed three additional arguments that were important here:

- `--tensorflow_object_detection_api_pipeline_config`
- `--tensorflow_use_custom_operations_config`
- `--reverse_input_channels`

The first of these just needs the `pipeline.config` file that came with the downloaded model.

The second of these needs a JSON support file for TensorFlow models. I found that the
`ssd_v2_support.json` extension worked with the MobileNet model here.

The final of these is due to the TensorFlow models being trained on RGB images, but the
Inference Engine otherwise defaulting to BGR.

Now, given that I was in the directory with the frozen model file from TensorFlow, here was the 
full command to convert my model:

```
python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model frozen_inference_graph.pb --tensorflow_object_detection_api_pipeline_config pipeline.config --reverse_input_channels --tensorflow_use_custom_operations_config /opt/intel/openvino/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json
```

### From Video

Here's what I entered to convert the SSD MobileNet V2 model from TensorFlow:

`python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model frozen_inference_graph.pb --tensorflow_object_detection_api_pipeline_config pipeline.config --reverse_input_channels --tensorflow_use_custom_operations_config /opt/intel/openvino/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json`

This is pretty long! I would suggest considering setting a path environment variable for the Model Optimizer if you are working locally on a Linux-based machine. You could do something like this:

`export MOD_OPT=/opt/intel/openvino/deployment_tools/model_optimizer`

And then when you need to use it, you can utilize it with `$MOD_OPT/mo.py` instead of entering the full long path each time. In this case, that would also help shorten the path to the `ssd_v2_support.json` file used.
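If you prefer to launch the conversion from a Python script, the same idea can be sketched by reading the `MOD_OPT` variable and building the argument list. The helper name `build_mo_command` is hypothetical, and the fallback path is the default install location mentioned earlier. The sketch only builds and prints the command; it does not run the Model Optimizer.

```python
import os

DEFAULT_MOD_OPT = "/opt/intel/openvino/deployment_tools/model_optimizer"

def build_mo_command(input_model, extra_args=()):
    """Build the Model Optimizer command as a list of arguments."""
    # Use the MOD_OPT environment variable if set, else the default install path
    mod_opt = os.environ.get("MOD_OPT", DEFAULT_MOD_OPT)
    return ["python", os.path.join(mod_opt, "mo.py"),
            "--input_model", input_model, *extra_args]

command = build_mo_command(
    "frozen_inference_graph.pb",
    ["--tensorflow_object_detection_api_pipeline_config", "pipeline.config",
     "--reverse_input_channels"],
)
print(" ".join(command))
```

The resulting list could then be passed to `subprocess.run` once you are happy with the arguments.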

## 9. Using the Model Optimizer with Caffe Models

The process for converting a Caffe model is fairly similar to the TensorFlow one, although there’s nothing about freezing the model this time around, since that’s a TensorFlow concept. Caffe does have some differences in the set of supported model architectures. Additionally, Caffe models need to feed both the `.caffemodel` file and a `.prototxt` file into the Model Optimizer. If they have the same name, only the model needs to be directly input as an argument, while if the `.prototxt` file has a different name than the model, it should be fed in with `--input_proto` as well.
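The naming rule can be sketched as a small helper that only adds `--input_proto` when the `.prototxt` name does not match the model name. The function name `caffe_mo_args` is hypothetical, and the file names are just examples:

```python
import os

def caffe_mo_args(caffemodel_path, prototxt_path=None):
    """Build Model Optimizer arguments for a Caffe model.

    --input_proto is added only when the .prototxt name differs from the model name.
    """
    args = ["--input_model", caffemodel_path]
    default_proto = os.path.splitext(caffemodel_path)[0] + ".prototxt"
    if prototxt_path is not None and prototxt_path != default_proto:
        args += ["--input_proto", prototxt_path]
    return args

print(caffe_mo_args("squeezenet_v1.1.caffemodel", "deploy.prototxt"))
# ['--input_model', 'squeezenet_v1.1.caffemodel', '--input_proto', 'deploy.prototxt']
print(caffe_mo_args("squeezenet_v1.1.caffemodel", "squeezenet_v1.1.prototxt"))
# ['--input_model', 'squeezenet_v1.1.caffemodel']
```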

### Further Research

The developer documentation for Converting Caffe Models can be found [here](https://docs.openvinotoolkit.org/2019_R3/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_Caffe.html). You’ll work through this process in the next exercise.

## 10. Exercise: Convert a Caffe Model


In this exercise, you'll convert a Caffe Model into an Intermediate Representation using the 
Model Optimizer. You can find the related documentation [here](https://docs.openvinotoolkit.org/2018_R5/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_Caffe.html).

For this exercise, first download the SqueezeNet V1.1 model by cloning [this repository](https://github.com/DeepScale/SqueezeNet). 

Follow the documentation above and feed in the Caffe model to the Model Optimizer.

If the conversion is successful, the terminal should let you know that it generated an IR model.
The locations of the `.xml` and `.bin` files, as well as execution time of the Model Optimizer,
will also be output.

### Hints & Troubleshooting

You will need to specify `--input_proto` if the `.prototxt` file is not named the same as the model.

There is an important note in the documentation after the section **Supported Topologies** 
regarding Caffe models trained on ImageNet. If you notice poor performance in inference, you
may need to specify mean and scale values in your arguments.

```
python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model squeezenet_v1.1.caffemodel --input_proto deploy.prototxt
```
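
To make the mean and scale hint more concrete, here is a small Python sketch of the per-channel normalization that mean and scale values describe: the mean is subtracted from each channel value, and the result is divided by the scale. The function name and all numbers are illustrative assumptions, not values from the SqueezeNet documentation.

```python
from typing import List, Sequence


def normalize_pixel(pixel: Sequence[float],
                    mean: Sequence[float],
                    scale: Sequence[float]) -> List[float]:
    """Subtract the per-channel mean, then divide by the per-channel scale."""
    return [(p - m) / s for p, m, s in zip(pixel, mean, scale)]


# Illustrative numbers only (a single 3-channel pixel)
pixel = [128.0, 64.0, 255.0]
mean = [100.0, 60.0, 200.0]
scale = [2.0, 4.0, 5.0]
print(normalize_pixel(pixel, mean, scale))  # [14.0, 1.0, 11.0]
```

If a model was trained on inputs normalized this way but receives raw pixel values at inference time, its predictions can degrade, which is why the documentation note matters.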

## 11. Convert a Caffe Model - Solution

First, you can start by checking out the documentation specific to Caffe models [here](https://docs.openvinotoolkit.org/2018_R5/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_Caffe.html).

I did notice an additional helpful argument here: `--input_proto`, which is used to specify
a `.prototxt` file to pair with the `.caffemodel` file when the model name and `.prototxt`
filename do not match.

Now, given that I was in the directory with the Caffe model file & `.prototxt` file, here was the 
full command to convert my model:

```
python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model squeezenet_v1.1.caffemodel --input_proto deploy.prototxt
```

### From Video

Here's what I entered to convert the Squeezenet V1.1 model from Caffe:

```
python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model squeezenet_v1.1.caffemodel --input_proto deploy.prototxt
```

## 12. Using the Model Optimizer with ONNX Models

The process for converting an ONNX model is again quite similar to the previous two, although there are no ONNX-specific arguments to the Model Optimizer. So, you’ll only have the general arguments for items like changing the precision.

Additionally, if you are working with PyTorch or Apple ML models, they need to be converted to ONNX format first, which is done outside of the OpenVINO™ Toolkit. See the link further down on this page if you are interested in doing so.

### Further Research

- The developer documentation for Converting ONNX Models can be found [here](https://docs.openvinotoolkit.org/2019_R3/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_ONNX.html). You’ll work through this process in the next exercise.
- ONNX also has additional models available in the [ONNX Model Zoo](https://github.com/onnx/models). By converting these over to Intermediate Representations, you can expand even further on the pre-trained models available to you.

### PyTorch to ONNX

- If you are interested in converting a PyTorch model using ONNX for use with the OpenVINO™ Toolkit, check out this [link](https://michhar.github.io/convert-pytorch-onnx/) for the steps to do so. From there, you can follow the steps for ONNX models to get an Intermediate Representation.

## 13. Exercise: Convert an ONNX Model


Make sure to click the button below before you get started to source the correct environment.

<button id="ulab-button-0bd71d51" class="ulab-btn--primary"></button>

### Exercise Instructions

In this exercise, you'll convert an ONNX Model into an Intermediate Representation using the 
Model Optimizer. You can find the related documentation [here](https://docs.openvinotoolkit.org/2018_R5/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_ONNX.html).

For this exercise, first download the bvlc_alexnet model from [here](https://s3.amazonaws.com/download.onnx/models/opset_8/bvlc_alexnet.tar.gz). Use the `tar -xvf` command with the downloaded file to unpack it.

Follow the documentation above and feed in the ONNX model to the Model Optimizer.

If the conversion is successful, the terminal should let you know that it generated an IR model.
The locations of the `.xml` and `.bin` files, as well as execution time of the Model Optimizer,
will also be output.

### PyTorch models

Note that we will only cover converting directly from an ONNX model here. If you are interested
in converting a PyTorch model using ONNX for use with OpenVINO, check out this [link](https://michhar.github.io/convert-pytorch-onnx/) for the steps to do so. From there, you can follow the steps in the rest
of this exercise once you have an ONNX model.


## 14. Convert an ONNX Model - Solution

First, you can start by checking out the documentation specific to ONNX models [here](https://docs.openvinotoolkit.org/2018_R5/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_ONNX.html).

Now, given that I was in the directory with the ONNX model file, here was the 
full command to convert my model:

```
python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model model.onnx
```

### From Video

Here's what I entered to convert the AlexNet model from ONNX:

```
python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model model.onnx
```

## 15. Cutting Parts of a Model


Cutting a model is mostly applicable for TensorFlow models. As we saw earlier in converting these models, they sometimes have some extra complexities. Some common reasons for cutting are:

- The model has pre- or post-processing parts that don’t translate to existing Inference Engine layers.
- The model has a training part that is convenient to be kept in the model, but is not used during inference.
- The model is too complex with many unsupported operations, so the complete model cannot be converted in one shot.
- The model is one of the supported SSD models. In this case, you need to cut a post-processing part off.
- There could be a problem with model conversion in the Model Optimizer or with inference in the Inference Engine. Cutting the model can help to localize the issue.

There are two main command line arguments to use for cutting a model with the Model Optimizer, named intuitively as `--input` and `--output`. They are used to feed in the layer names that should become the new entry or exit points of the model.
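
As a rough mental model of what choosing new entry and exit points does, the Python sketch below treats a model as a simple chain of named layers and keeps only the part between the two. Real models are graphs rather than chains, and both the layer names and the `cut_chain` helper are made up for illustration; this is not how the Model Optimizer is implemented.

```python
from typing import List, Optional


def cut_chain(layers: List[str],
              new_input: Optional[str] = None,
              new_output: Optional[str] = None) -> List[str]:
    """Keep the layers from new_input through new_output (both inclusive)."""
    start = layers.index(new_input) if new_input is not None else 0
    end = layers.index(new_output) + 1 if new_output is not None else len(layers)
    if start >= end:
        raise ValueError("new_input must come before new_output")
    return layers[start:end]


# Hypothetical layer names: pre-processing, a network body, post-processing
model = ["preprocess", "conv1", "relu1", "fc", "softmax", "postprocess"]
print(cut_chain(model, new_input="conv1", new_output="softmax"))
```

Here the pre- and post-processing layers are dropped, which mirrors the first reason for cutting listed above.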

### Developer Documentation
You guessed it - [here’s](https://docs.openvinotoolkit.org/2019_R3/_docs_MO_DG_prepare_model_convert_model_Cutting_Model.html) the developer documentation for cutting a model.

## 16. Supported Layers

Earlier, we saw some of the supported layers when looking at the names when converting from a supported framework to an IR. While that list is useful for one-offs, you probably don’t want to check whether each and every layer in your model is supported. Instead, you can simply run the Model Optimizer and see what converts.

What happens when a layer isn’t supported by the Model Optimizer? One potential solution is the use of custom layers, which we’ll go into more shortly. Another solution is actually running the given unsupported layer in its original framework. For example, you could potentially use TensorFlow to load and process the inputs and outputs for a specific layer you built in that framework, if it isn’t supported with the Model Optimizer. Lastly, there are also unsupported layers for certain hardware, that you may run into when working with the Inference Engine. In this case, there are sometimes extensions available that can add support. We’ll discuss that approach more in the next lesson.

![image.png](attachment:image.png)

**Comment:** While just about every layer you’d likely be using in your own neural network is supported with the Model Optimizer, sometimes you’ll need to make use of Custom Layers, which we’ll cover next.

### Supported Layers List
Check out the full list of supported layers [here](https://docs.openvinotoolkit.org/2019_R3/_docs_MO_DG_prepare_model_Supported_Frameworks_Layers.html).

## 17. Custom Layers

Custom layers are a necessary and important feature of the OpenVINO™ Toolkit, although you shouldn’t have to use them very often, if at all, thanks to the breadth of supported layers. However, it’s useful to know that they exist and how to use them if the need arises.

The [list of supported layers](https://docs.openvinotoolkit.org/2019_R3/_docs_MO_DG_prepare_model_Supported_Frameworks_Layers.html) from earlier very directly relates to whether a given layer is a custom layer. Any layer not in that list is automatically classified as a custom layer by the Model Optimizer.
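
The classification rule can be pictured with a short Python sketch: anything whose type is missing from the supported set counts as custom. The supported set below is a tiny, made-up subset rather than the real list, and `find_custom_layers` is a hypothetical helper, not a Model Optimizer function.

```python
from typing import Iterable, List, Set


def find_custom_layers(model_layer_types: Iterable[str],
                       supported_types: Set[str]) -> List[str]:
    """Return the layer types not found in the supported set, in first-seen order."""
    custom: List[str] = []
    for layer_type in model_layer_types:
        if layer_type not in supported_types and layer_type not in custom:
            custom.append(layer_type)
    return custom


# Tiny, made-up subset of a supported-layers list
supported = {"Conv2D", "Relu", "MaxPool", "Softmax"}
model_types = ["Conv2D", "Relu", "Cosh", "MaxPool", "Cosh", "Softmax"]
print(find_custom_layers(model_types, supported))  # ['Cosh']
```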

To actually add custom layers, there are a few differences depending on the original model framework. In both TensorFlow and Caffe, the first option is to register the custom layers as extensions to the Model Optimizer.

For Caffe, the second option is to register the layers as Custom, then use Caffe to calculate the output shape of the layer. You’ll need Caffe on your system to do this option.

For TensorFlow, its second option is to actually replace the unsupported subgraph with a different subgraph. The final TensorFlow option is to actually offload the computation of the subgraph back to TensorFlow during inference.

You’ll get a chance to practice this in the next exercise. Again, as this is an advanced topic, we won’t delve too much deeper here, but feel free to check out the linked documentation if you want to know more.

### Further Research

You’ll get a chance to get hands on with Custom Layers next, but feel free to check out the [developer documentation](https://docs.openvinotoolkit.org/2019_R3/_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer.html) in the meantime.

If you’re interested in the option to use TensorFlow to operate on a given unsupported layer, you should also make sure to read the [documentation here](https://docs.openvinotoolkit.org/2019_R3/_docs_MO_DG_prepare_model_customize_model_optimizer_Offloading_Sub_Graph_Inference.html).

## 18. Exercise: Custom Layers

# Custom Layers

Make sure to click the button below before you get started to source the correct environment.

<button id="ulab-button-c7cfa177" class="ulab-btn--primary"></button>

This exercise is adapted from [this repository](https://github.com/david-drew/OpenVINO-Custom-Layers).

Note that the classroom workspace is running OpenVINO 2019.r3, while this exercise was
originally created for 2019.r2. This exercise will work appropriately in the workspace, but there
may be some other differences you need to account for if you use a custom layer yourself.

The steps below walk you through the full process of creating a custom layer; as such,
there is not a related solution video. Note that custom layers are an advanced topic, and one
that is not expected to be used often (if at all) in most use cases of the OpenVINO toolkit. This
exercise is meant to introduce you to the concept, but you won't need to use it again in the 
rest of this course.

## Example Custom Layer: The Hyperbolic Cosine (cosh) Function

We will follow the steps involved for implementing a custom layer using the simple 
hyperbolic cosine (cosh) function. The cosh function is mathematically calculated as:

```
cosh(x) = (e^x + e^-x) / 2
```

As a function that calculates a value for the given value x, the cosh function is very simple 
when compared to most custom layers. Though the cosh function may not represent a "real" 
custom layer, it serves the purpose of this tutorial as an example for working through the steps 
for implementing a custom layer.
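
As a quick sanity check of the formula, the Python sketch below computes cosh directly from the exponential definition and compares it with the standard library's `math.cosh`. The function name `cosh_from_definition` is just an illustrative choice.

```python
import math


def cosh_from_definition(x: float) -> float:
    """Hyperbolic cosine computed directly from the exponential definition."""
    return (math.exp(x) + math.exp(-x)) / 2


for x in (-2.0, 0.0, 0.5, 3.0):
    # math.isclose guards against tiny floating-point differences
    assert math.isclose(cosh_from_definition(x), math.cosh(x))
    print(x, cosh_from_definition(x))
```

The custom layer built in the rest of this exercise applies this same element-wise computation to every value in its input.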

## Build the Model

First, export the below paths to shorten some of what you need to enter later:

```
export CLWS=/home/workspace/cl_tutorial
export CLT=$CLWS/OpenVINO-Custom-Layers
```

Then run the following to create the TensorFlow model including the `cosh` layer.

```
mkdir $CLWS/tf_model
python $CLT/create_tf_model/build_cosh_model.py $CLWS/tf_model
```

You should receive a message similar to:

```
Model saved in path: /tf_model/model.ckpt
```

## Creating the *`cosh`* Custom Layer

### Generate the Extension Template Files Using the Model Extension Generator

We will use the Model Extension Generator tool to automatically create templates for all the 
extensions needed by the Model Optimizer to convert and the Inference Engine to execute 
the custom layer.  The extension template files will be partially replaced by Python and C++ 
code to implement the functionality of `cosh` as needed by the different tools.  To create 
the four extensions for the `cosh` custom layer, we run the Model Extension Generator 
with the following options:

- `--mo-tf-ext` = Generate a template for a Model Optimizer TensorFlow extractor
- `--mo-op` = Generate a template for a Model Optimizer custom layer operation
- `--ie-cpu-ext` = Generate a template for an Inference Engine CPU extension
- `--ie-gpu-ext` = Generate a template for an Inference Engine GPU extension 
- `--output_dir` = set the output directory.  Here we are using `$CLWS/cl_cosh` as the target directory to store the output from the Model Extension Generator.

To create the four extension templates for the `cosh` custom layer, given we are in the `$CLWS`
directory, we run the command:

```
mkdir cl_cosh
```

```bash
python /opt/intel/openvino/deployment_tools/tools/extension_generator/extgen.py new --mo-tf-ext --mo-op --ie-cpu-ext --ie-gpu-ext --output_dir=$CLWS/cl_cosh
```

The Model Extension Generator will start in interactive mode and prompt us with questions 
about the custom layer to be generated.  Use the text between the `[]`'s to answer each 
of the Model Extension Generator questions as follows:

```
Enter layer name: 
[cosh]

Do you want to automatically parse all parameters from the model file? (y/n)
...
[n]

Enter all parameters in the following format:
...
Enter 'q' when finished:
[q]

Do you want to change any answer (y/n) ? Default 'no'
[n]

Do you want to use the layer name as the operation name? (y/n)
[y]

Does your operation change shape? (y/n)  
[n]

Do you want to change any answer (y/n) ? Default 'no'
[n]
```

When complete, the output text will appear similar to:
```
Stub file for TensorFlow Model Optimizer extractor is in /home/<user>/cl_tutorial/cl_cosh/user_mo_extensions/front/tf folder
Stub file for the Model Optimizer operation is in /home/<user>/cl_tutorial/cl_cosh/user_mo_extensions/ops folder
Stub files for the Inference Engine CPU extension are in /home/<user>/cl_tutorial/cl_cosh/user_ie_extensions/cpu folder
Stub files for the Inference Engine GPU extension are in /home/<user>/cl_tutorial/cl_cosh/user_ie_extensions/gpu folder
```

Template files (containing source code stubs) that may need to be edited have just been 
created in the following locations:

- TensorFlow Model Optimizer extractor extension: 
  - `$CLWS/cl_cosh/user_mo_extensions/front/tf/`
  - `cosh_ext.py`
- Model Optimizer operation extension:
  - `$CLWS/cl_cosh/user_mo_extensions/ops`
  - `cosh.py`
- Inference Engine CPU extension:
  - `$CLWS/cl_cosh/user_ie_extensions/cpu`
  - `ext_cosh.cpp`
  - `CMakeLists.txt`
- Inference Engine GPU extension:
  - `$CLWS/cl_cosh/user_ie_extensions/gpu`
  - `cosh_kernel.cl`
  - `cosh_kernel.xml`

Instructions on editing the template files are provided in later parts of this tutorial.  
For reference, or to copy to make the changes quicker, pre-edited template files are provided 
by the tutorial in the `$CLT` directory.

<hr/>

## Using Model Optimizer to Generate IR Files Containing the Custom Layer 

We will now use the generated extractor and operation extensions with the Model Optimizer 
to generate the model IR files needed by the Inference Engine.  The steps covered are:

1. Edit the extractor extension template file (already done - we will review it here)
2. Edit the operation extension template file (already done - we will review it here)
3. Generate the Model IR Files

### Edit the Extractor Extension Template File

For the `cosh` custom layer, the generated extractor extension does not need to be modified 
because the layer parameters are used without modification.  Below is a walkthrough of 
the Python code for the extractor extension that appears in the file 
`$CLWS/cl_cosh/user_mo_extensions/front/tf/cosh_ext.py`.

1. Using the text editor, open the extractor extension source file `$CLWS/cl_cosh/user_mo_extensions/front/tf/cosh_ext.py`.
2. The class is defined with the unique name `coshFrontExtractor` that inherits from the base extractor `FrontExtractorOp` class.  The class variable `op` is set to the name of the layer operation and `enabled` is set to tell the Model Optimizer to use (`True`) or exclude (`False`) the layer during processing.

    ```python
    class coshFrontExtractor(FrontExtractorOp):
        op = 'cosh' 
        enabled = True
    ```

3. The `extract` function is overridden to allow modifications while extracting parameters from layers within the input model.

    ```python
    @staticmethod
    def extract(node):
    ```

4. The layer parameters are extracted from the input model and stored in `param`.  This is where the layer parameters in `param` may be retrieved and used as needed.  For the `cosh` custom layer, the `op` attribute is simply set to the name of the operation extension used.

    ```python
    proto_layer = node.pb
    param = proto_layer.attr
    # extracting parameters from TensorFlow layer and prepare them for IR
    attrs = {
        'op': __class__.op
    }
    ```

5. The attributes for the specific node are updated. This is where we can modify or create attributes in `attrs` before updating `node` with the results. Finally, the `enabled` class variable is returned.

    ```python
    # update the attributes of the node
    Op.get_op_class_by_name(__class__.op).update_node_stat(node, attrs)
    
    return __class__.enabled
    ```

### Edit the Operation Extension Template File

For the `cosh` custom layer, the generated operation extension does not need to be modified 
because the shape (i.e., dimensions) of the layer output is the same as the input shape.  
Below is a walkthrough of the Python code for the operation extension that appears in 
the file  `$CLWS/cl_cosh/user_mo_extensions/ops/cosh.py`.

1. Using the text editor, open the operation extension source file `$CLWS/cl_cosh/user_mo_extensions/ops/cosh.py` 
2. The class is defined with the unique name `coshOp` that inherits from the base operation `Op` class.  The class variable `op` is set to `'cosh'`, the name of the layer operation.

    ```python
    class coshOp(Op):
        op = 'cosh'
    ```

3. The `coshOp` class initializer `__init__` function will be called for each layer created.  The initializer must initialize the super class `Op` by passing the `graph` and `attrs` arguments along with a dictionary of the mandatory properties for the `cosh` operation layer that define the type (`type`), operation (`op`), and inference function (`infer`).  This is where any other initialization needed by the `coshOp` operation can be specified.

    ```python
    def __init__(self, graph, attrs):
        mandatory_props = dict(
            type=__class__.op,
            op=__class__.op,
            infer=coshOp.infer            
        )
        super().__init__(graph, mandatory_props, attrs)
    ```

4. The `infer` function is defined to provide the Model Optimizer information on a layer, specifically returning the shape of the layer output for each node.  Here, the layer output shape is the same as the input and the value of the helper function `copy_shape_infer(node)` is returned.

    ```python
    @staticmethod
    def infer(node: Node):
        # ==========================================================
        # You should add your shape calculation implementation here
        # If a layer input shape is different to the output one
        # it means that it changes shape and you need to implement
        # it on your own. Otherwise, use copy_shape_infer(node).
        # ==========================================================
        return copy_shape_infer(node)
    ```
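
To picture what "the output shape is the same as the input" means for shape inference, here is a toy Python sketch. `ToyNode` and `toy_copy_shape_infer` are hypothetical stand-ins written for this illustration; they are not the Model Optimizer's `Node` class or its `copy_shape_infer` helper, and the shape values are arbitrary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyNode:
    """Stand-in for a graph node with one input and one output (hypothetical)."""
    in_shape: List[int]
    out_shape: Optional[List[int]] = None


def toy_copy_shape_infer(node: ToyNode) -> None:
    """Set the output shape to a copy of the input shape."""
    node.out_shape = list(node.in_shape)


node = ToyNode(in_shape=[1, 3, 224, 224])
toy_copy_shape_infer(node)
print(node.out_shape)  # [1, 3, 224, 224]
```

A layer that changes shape (for example, a pooling layer) would need its own calculation here instead of a plain copy.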

### Generate the Model IR Files

With the extensions now complete, we use the Model Optimizer to convert and optimize 
the example TensorFlow model into IR files that will run inference using the Inference Engine.  
To create the IR files, we run the Model Optimizer (`mo.py` in the command below) with 
the following options:

- `--input_meta_graph model.ckpt.meta`
  - Specifies the model input file.  

- `--batch 1`
  - Explicitly sets the batch size to 1 because the example model has an input dimension of "-1".
  - TensorFlow allows "-1" as a variable indicating "to be filled in later", however the Model Optimizer requires explicit information for the optimization process.  

- `--output "ModCosh/Activation_8/softmax_output"`
  - The full name of the final output layer of the model.

- `--extensions $CLWS/cl_cosh/user_mo_extensions`
  - Location of the extractor and operation extensions for the custom layer to be used by the Model Optimizer during model extraction and optimization. 

- `--output_dir $CLWS/cl_ext_cosh`
  - Location to write the output IR files.
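
The idea behind pinning the batch size can be sketched in a few lines of Python. This only illustrates the concept of replacing an undefined "-1" dimension with an explicit value; `set_batch` is a hypothetical helper, the example shape is made up, and none of this reflects how the Model Optimizer is implemented internally.

```python
from typing import List


def set_batch(shape: List[int], batch: int) -> List[int]:
    """Replace an undefined (-1) leading dimension with an explicit batch size."""
    if batch <= 0:
        raise ValueError("batch must be a positive integer")
    resolved = list(shape)
    if resolved and resolved[0] == -1:
        resolved[0] = batch
    return resolved


# Hypothetical input shape with an undefined batch dimension
print(set_batch([-1, 224, 224, 3], batch=1))  # [1, 224, 224, 3]
```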

To create the model IR files that will include the `cosh` custom layer, we run the commands:

```bash
cd $CLWS/tf_model
python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_meta_graph model.ckpt.meta --batch 1 --output "ModCosh/Activation_8/softmax_output" --extensions $CLWS/cl_cosh/user_mo_extensions --output_dir $CLWS/cl_ext_cosh
```

The output will appear similar to:

```
[ SUCCESS ] Generated IR model.
[ SUCCESS ] XML file: /home/<user>/cl_tutorial/cl_ext_cosh/model.ckpt.xml
[ SUCCESS ] BIN file: /home/<user>/cl_tutorial/cl_ext_cosh/model.ckpt.bin
[ SUCCESS ] Total execution time: x.xx seconds.
```

<hr/>

## Inference Engine Custom Layer Implementation for the Intel® CPU

We will now use the generated CPU extension with the Inference Engine to execute 
the custom layer on the CPU.  The steps are:

1. Edit the CPU extension template files.
2. Compile the CPU extension library.
3. Execute the Model with the custom layer.

You *will* need to make the changes in this section to the related files.

Note that the classroom workspace only has an Intel CPU available, so we will not perform
the necessary steps for GPU usage with the Inference Engine.

### Edit the CPU Extension Template Files

The generated CPU extension includes the template file `ext_cosh.cpp` that must be edited 
to fill-in the functionality of the `cosh` custom layer for execution by the Inference Engine.  
We also need to edit the `CMakeLists.txt` file to add any header file or library dependencies 
required to compile the CPU extension.  In the next sections, we will walk through and edit 
these files.

#### Edit `ext_cosh.cpp`

We will now edit the `ext_cosh.cpp` by walking through the code and making the necessary 
changes for the `cosh` custom layer along the way.

1. Using the text editor, open the CPU extension source file `$CLWS/cl_cosh/user_ie_extensions/cpu/ext_cosh.cpp`.

2. To implement the `cosh` function to efficiently execute in parallel, the code will use the parallel processing supported by the Inference Engine through the use of the Intel® Threading Building Blocks library.  To use the library, at the top we must include the header [`ie_parallel.hpp`](https://docs.openvinotoolkit.org/2019_R3.1/ie__parallel_8hpp.html) file by adding the `#include` line as shown below.

    Before:

    ```cpp
    #include "ext_base.hpp"
    #include <cmath>
    ```

    After:

    ```cpp
    #include "ext_base.hpp"
    #include "ie_parallel.hpp"
    #include <cmath>
    ```

3. The class `coshImpl` implements the `cosh` custom layer and inherits from the extension layer base class `ExtLayerBase`.

    ```cpp
    class coshImpl: public ExtLayerBase {
        public:
    ```

4. The `coshImpl` constructor is passed the `layer` object that it is associated with to provide access to any layer parameters that may be needed when implementing the specific instance of the custom layer.

    ```cpp
    explicit coshImpl(const CNNLayer* layer) {
      try {
        ...
    ```

5. The `coshImpl` constructor configures the input and output data layout for the custom layer by calling `addConfig()`.  In the template file, the line is commented-out and we will replace it to indicate that `layer` uses `DataConfigurator(ConfLayout::PLN)` (plain or linear) data for both input and output.

    Before:

    ```cpp
    ...
    // addConfig({DataConfigurator(ConfLayout::PLN), DataConfigurator(ConfLayout::PLN)}, {DataConfigurator(ConfLayout::PLN)});

    ```

    After:

    ```cpp
    addConfig(layer, { DataConfigurator(ConfLayout::PLN) }, { DataConfigurator(ConfLayout::PLN) });
    ```

6. The constructor is now complete, catching and reporting certain exceptions that may have been thrown before exiting.

    ```cpp
      } catch (InferenceEngine::details::InferenceEngineException &ex) {
        errorMsg = ex.what();
      }
    }
    ```

7. The `execute` method is overridden to implement the functionality of the `cosh` custom layer.  The `inputs` and `outputs` are the data buffers passed as [`Blob`](https://docs.openvinotoolkit.org/2019_R3.1/_docs_IE_DG_Memory_primitives.html) objects.  The template file will simply return `NOT_IMPLEMENTED` by default.  To calculate the `cosh` custom layer, we will replace the `execute` method with the code needed to calculate the `cosh` function in parallel using the [`parallel_for3d`](https://docs.openvinotoolkit.org/2019_R3.1/ie__parallel_8hpp.html) function.

    Before:

    ```cpp
      StatusCode execute(std::vector<Blob::Ptr>& inputs, std::vector<Blob::Ptr>& outputs,
        ResponseDesc *resp) noexcept override {
        // Add here implementation for layer inference
        // Examples of implementations you can find in Inference Engine tool samples/extensions folder
        return NOT_IMPLEMENTED;
    ```

    After:
    ```cpp
      StatusCode execute(std::vector<Blob::Ptr>& inputs, std::vector<Blob::Ptr>& outputs,
        ResponseDesc *resp) noexcept override {
        // Add implementation for layer inference here
        // Examples of implementations are in OpenVINO samples/extensions folder

        // Get pointers to source and destination buffers
        float* src_data = inputs[0]->buffer();
        float* dst_data = outputs[0]->buffer();

        // Get the dimensions from the input (output dimensions are the same)
        SizeVector dims = inputs[0]->getTensorDesc().getDims();

        // Get dimensions:N=Batch size, C=Number of Channels, H=Height, W=Width
        int N = static_cast<int>((dims.size() > 0) ? dims[0] : 1);
        int C = static_cast<int>((dims.size() > 1) ? dims[1] : 1);
        int H = static_cast<int>((dims.size() > 2) ? dims[2] : 1);
        int W = static_cast<int>((dims.size() > 3) ? dims[3] : 1);

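        // Perform (in parallel) the hyperbolic cosine given by:
        //    cosh(x) = (e^x + e^-x)/2
        // Each (b, c, h) task handles one row of W contiguous elements
        // (assumes the plain layout is a contiguous N, C, H, W buffer).
        parallel_for3d(N, C, H, [&](int b, int c, int h) {
          size_t offset = ((static_cast<size_t>(b) * C + c) * H + h) * W;
          for (size_t ii = offset; ii < offset + W; ii++) {
            dst_data[ii] = (exp(src_data[ii]) + exp(-src_data[ii]))/2;
          }
        });
        return OK;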
    }
    ```
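
When splitting work across batch, channel, and height, it helps to check that every element of the buffer is visited exactly once. The sequential Python sketch below does that for a flat buffer, assuming the plain layout is contiguous in N, C, H, W order; `cosh_nchw` is an illustrative function, not part of the extension code.

```python
import math
from itertools import product
from typing import List


def cosh_nchw(src: List[float], n: int, c: int, h: int, w: int) -> List[float]:
    """Apply cosh to a flat NCHW buffer, one (batch, channel, row) run at a time."""
    assert len(src) == n * c * h * w
    dst = [0.0] * len(src)
    for b, ch, row in product(range(n), range(c), range(h)):
        # Offset of the first element of this row in the flat buffer
        offset = ((b * c + ch) * h + row) * w
        for i in range(offset, offset + w):
            dst[i] = (math.exp(src[i]) + math.exp(-src[i])) / 2
    return dst


data = [0.1 * i for i in range(1 * 2 * 3 * 4)]
result = cosh_nchw(data, n=1, c=2, h=3, w=4)
# Every element was visited exactly once, so this matches an element-wise cosh
assert all(math.isclose(r, math.cosh(x)) for r, x in zip(result, data))
```

Each `(b, ch, row)` combination owns a disjoint run of `w` elements, which is what makes the rows safe to process in parallel.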

#### Edit `CMakeLists.txt`

Because the implementation of the `cosh` custom layer makes use of the parallel processing 
supported by the Inference Engine, we need to add the Intel® Threading Building Blocks 
dependency to `CMakeLists.txt` before compiling.  We will add paths to the header 
and library files and add the Intel® Threading Building Blocks library to the list of link libraries. 
We will also rename the `.so`.

1. Using the text editor, open the CPU extension CMake file `$CLWS/cl_cosh/user_ie_extensions/cpu/CMakeLists.txt`.
2. At the top, rename the `TARGET_NAME` so that the compiled library is named `libcosh_cpu_extension.so`:

    Before:

    ```cmake
    set(TARGET_NAME "user_cpu_extension")
    ```

    After:
    
    ```cmake
    set(TARGET_NAME "cosh_cpu_extension")
    ```

3. We modify the `include_directories` to add the header include path for the Intel® Threading Building Blocks library located in `/opt/intel/openvino/deployment_tools/inference_engine/external/tbb/include`:

    Before:

    ```cmake
    include_directories (PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/common
    ${InferenceEngine_INCLUDE_DIRS}
    )
    ```

    After:
    ```cmake
    include_directories (PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/common
    ${InferenceEngine_INCLUDE_DIRS}
    "/opt/intel/openvino/deployment_tools/inference_engine/external/tbb/include"
    )
    ```

4. We add the `link_directories` with the path to the Intel® Threading Building Blocks library binaries at `/opt/intel/openvino/deployment_tools/inference_engine/external/tbb/lib`:

    Before:

    ```cmake
    ...
    #enable_omp()
    ```

    After:
    ```cmake
    ...
    link_directories(
    "/opt/intel/openvino/deployment_tools/inference_engine/external/tbb/lib"
    )
    #enable_omp()
    ```

5. Finally, we add the Intel® Threading Building Blocks library `tbb` to the list of link libraries in `target_link_libraries`:

    Before:

    ```cmake
    target_link_libraries(${TARGET_NAME} ${InferenceEngine_LIBRARIES} ${intel_omp_lib})
    ```

    After:

    ```cmake
    target_link_libraries(${TARGET_NAME} ${InferenceEngine_LIBRARIES} ${intel_omp_lib} tbb)
    ```

### Compile the Extension Library

To run the custom layer on the CPU during inference, the edited extension C++ source code 
must be compiled to create a `.so` shared library used by the Inference Engine. 
In the following steps, we will now compile the extension C++ library.

1. First, we run the following commands to use CMake to set up for compiling:

    ```bash
    cd $CLWS/cl_cosh/user_ie_extensions/cpu
    mkdir -p build
    cd build
    cmake ..
    ```

    The output will appear similar to:     

    ```
    -- Generating done
    -- Build files have been written to: /home/<user>/cl_tutorial/cl_cosh/user_ie_extensions/cpu/build
    ```

2. The CPU extension library is now ready to be compiled.  Compile the library using the command:

    ```bash
    make -j $(nproc)
    ```

    The output will appear similar to: 

    ```
    [100%] Linking CXX shared library libcosh_cpu_extension.so
    [100%] Built target cosh_cpu_extension
    ```

<hr/>

## Execute the Model with the Custom Layer

### Using a C++ Sample

To start on a C++ sample, we first need to build the C++ samples for use with the Inference
Engine:

```bash
cd /opt/intel/openvino/deployment_tools/inference_engine/samples/
./build_samples.sh
```

This will take a few minutes to compile all of the samples.

Next, we will try running the C++ sample without including the `cosh` extension library to see 
the error describing the unsupported `cosh` operation using the command:  

```bash
~/inference_engine_samples_build/intel64/Release/classification_sample_async -i $CLT/pics/dog.bmp -m $CLWS/cl_ext_cosh/model.ckpt.xml -d CPU
```

The error output will be similar to:

```
[ ERROR ] Unsupported primitive of type: cosh name: ModCosh/cosh/Cosh
```

We will now run the command again, this time with the `cosh` extension library specified 
using the `-l $CLWS/cl_cosh/user_ie_extensions/cpu/build/libcosh_cpu_extension.so` option 
in the command:

```bash
~/inference_engine_samples_build/intel64/Release/classification_sample_async -i $CLT/pics/dog.bmp -m $CLWS/cl_ext_cosh/model.ckpt.xml -d CPU -l $CLWS/cl_cosh/user_ie_extensions/cpu/build/libcosh_cpu_extension.so
```

The output will appear similar to:

```
Image /home/<user>/cl_tutorial/OpenVINO-Custom-Layers/pics/dog.bmp

classid probability
------- -----------
0       0.9308984  
1       0.0691015

total inference time: xx.xxxxxxx
Average running time of one iteration: xx.xxxxxxx ms

Throughput: xx.xxxxxxx FPS

[ INFO ] Execution successful
```

### Using a Python Sample

First, we will try running the Python sample without including the `cosh` extension library 
to see the error describing the unsupported `cosh` operation using the command:  

```bash
python /opt/intel/openvino/deployment_tools/inference_engine/samples/python_samples/classification_sample_async/classification_sample_async.py -i $CLT/pics/dog.bmp -m $CLWS/cl_ext_cosh/model.ckpt.xml -d CPU
```

The error output will be similar to:

```
[ INFO ] Loading network files:
/home/<user>/cl_tutorial/tf_model/model.ckpt.xml
/home/<user>/cl_tutorial/tf_model/model.ckpt.bin
[ ERROR ] Following layers are not supported by the plugin for specified device CPU:
ModCosh/cosh/Cosh, ModCosh/cosh_1/Cosh, ModCosh/cosh_2/Cosh
[ ERROR ] Please try to specify cpu extensions library path in sample's command line parameters using -l or --cpu_extension command line argument
```

We will now run the command again, this time with the `cosh` extension library specified 
using the `-l $CLWS/cl_cosh/user_ie_extensions/cpu/build/libcosh_cpu_extension.so` option 
in the command:

```bash
python /opt/intel/openvino/deployment_tools/inference_engine/samples/python_samples/classification_sample_async/classification_sample_async.py -i $CLT/pics/dog.bmp -m $CLWS/cl_ext_cosh/model.ckpt.xml -l $CLWS/cl_cosh/user_ie_extensions/cpu/build/libcosh_cpu_extension.so -d CPU
```

The output will appear similar to:

```
Image /home/<user>/cl_tutorial/OpenVINO-Custom-Layers/pics/dog.bmp

classid probability
------- -----------
0      0.9308984
1      0.0691015
```

**Congratulations!** You have now implemented a custom layer with the Intel® Distribution of OpenVINO™ Toolkit.

<!--
%%ulab_page_divider
--><hr/>

# Feed an IR to the Inference Engine

Make sure to click the button below before you get started to source the correct environment.

<button id="ulab-button-6f2a60e5" class="ulab-btn--primary"></button>

Earlier in the course, you were focused on working with the Intermediate Representation (IR)
models themselves, while mostly glossing over the use of the actual Inference Engine with
the model.

Here, you'll import the Python wrapper for the Inference Engine (IE), and practice using 
different IRs with it. You will first add each IR as an `IENetwork`, and check whether the layers 
of that network are supported by the classroom CPU.

Since the classroom workspace is using an Intel CPU, you will also need to add a CPU
extension to the `IECore`.

Once you have verified all layers are supported (when the CPU extension is added),
you will load the given model into the Inference Engine.

Note that the `.xml` file of the IR should be given as an argument when running the script.

To test your implementation, you should be able to successfully load each of the three IR
model files we have been working with throughout the course so far, which you can find in the
`/home/workspace/models` directory.

<!--
%%ulab_page_divider
--><hr/>

# Inference Requests

Make sure to click the button below before you get started to source the correct environment.

<button id="ulab-button-ceb2f99a" class="ulab-btn--primary"></button>

In the previous exercise, you loaded Intermediate Representations (IRs) into the Inference
Engine. Now that we've covered some of the topics around requests, including the difference
between synchronous and asynchronous requests, you'll add additional code to make
inference requests to the Inference Engine.

Given an `ExecutableNetwork` that is the IR loaded into the Inference Engine, your task is to:

1. Perform a synchronous request
2. Start an asynchronous request given an input image frame
3. Wait for the asynchronous request to complete

Note that we'll cover handling the results of the request shortly, so you don't need to worry
about that just yet. This will get you practice with both types of requests with the Inference
Engine.

You will perform the above tasks within `inference.py`. This will take three arguments,
one for the model, one for the test image, and the last for what type of inference request
should be made.

You can use `test.py` afterward to verify your code successfully makes inference requests.

<!--
%%ulab_page_divider
--><hr/>

# Integrate the Inference Engine in An Edge App

Make sure to click the button below before you get started to source the correct environment.

<button id="ulab-button-d44d77ce" class="ulab-btn--primary"></button>

You've come a long way from the first lesson where most of the code for working with
the OpenVINO toolkit was happening in the background. You worked with pre-trained models,
moved up to converting any trained model to an Intermediate Representation with the
Model Optimizer, and even got the model loaded into the Inference Engine and began making
inference requests.

In this final exercise of this lesson, you'll close off the OpenVINO workflow by extracting
the results of the inference request, and then integrating the Inference Engine into an existing
application. You'll still be given some of the overall application infrastructure, as more of that
will come in the next lesson, but all of that is outside of OpenVINO itself.

You will also add code allowing you to try out various confidence thresholds with the model,
as well as changing the visual look of the output, like bounding box colors.

Now, it's up to you which exact model you want to use here, although you are able to just
re-use the model you converted with TensorFlow before for an easy bounding box detector.

Note that this application will run with a video instead of just images like we've done before.

So, your tasks are to:

1. Convert a bounding box model to an IR with the Model Optimizer.
2. Pre-process the model as necessary.
3. Use an async request to perform inference on each video frame.
4. Extract the results from the inference request.
5. Add code to make the requests and feed back the results within the application.
6. Perform any necessary post-processing steps to get the bounding boxes.
7. Add a command line argument to allow for different confidence thresholds for the model.
8. Add a command line argument to allow for different bounding box colors for the output.
9. Correctly utilize the command line arguments from #7 and #8 within the application.
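
Tasks 4 and 6 can be sketched as follows. This is a minimal illustration, not the exercise solution: it assumes an SSD-style detection output shaped `[1, 1, N, 7]`, where each row is `[image_id, label, confidence, x_min, y_min, x_max, y_max]` with corner coordinates normalized to the 0-1 range. Check your own model's output layout before relying on it. `extract_boxes` is a hypothetical helper name.

```python
def extract_boxes(output, width, height, threshold=0.5):
    """Return pixel-space boxes whose confidence meets the threshold.

    `output` is assumed to be indexable as output[0][0][i] -> 7 values:
    [image_id, label, confidence, x_min, y_min, x_max, y_max].
    """
    boxes = []
    for detection in output[0][0]:
        confidence = float(detection[2])
        if confidence < threshold:
            continue
        # Scale the normalized corner coordinates to the frame size.
        x_min = int(detection[3] * width)
        y_min = int(detection[4] * height)
        x_max = int(detection[5] * width)
        y_max = int(detection[6] * height)
        boxes.append((x_min, y_min, x_max, y_max, confidence))
    return boxes


# Two fake detections on a 200x100 frame; only the first passes the default threshold.
fake_output = [[[
    [0, 1, 0.9, 0.25, 0.5, 0.75, 1.0],
    [0, 1, 0.3, 0.0, 0.0, 1.0, 1.0],
]]]
print(extract_boxes(fake_output, width=200, height=100))
```

Each returned tuple can then be drawn on the frame, for example with `cv2.rectangle`, using whichever color the user selected.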

When you are done, feed your model to `app.py`, and it will generate `out.mp4`, which you
can download and view. *Note that this app will take a little bit longer to run.* Also, if you need
to re-run inference, delete the `out.mp4` file first.

You only need to feed the model with `-m` before adding the customization; you should set
defaults for any additional arguments you add for the color and confidence so that the user
does not always need to specify them.

```bash
python app.py -m {your-model-path.xml}
```
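
One possible way to add the two optional arguments with defaults is sketched below. The flag names `-ct` and `-c`, the default values, and the `COLORS` table are assumptions made for illustration; only `-m` comes from the exercise text.

```python
import argparse

# Hypothetical color names mapped to OpenCV-style BGR tuples.
COLORS = {"BLUE": (255, 0, 0), "GREEN": (0, 255, 0), "RED": (0, 0, 255)}


def build_parser():
    parser = argparse.ArgumentParser(description="Run inference on an input video")
    parser.add_argument("-m", required=True, help="Path to the IR .xml file")
    # Optional arguments get defaults so the user can omit them.
    parser.add_argument("-ct", type=float, default=0.5,
                        help="Confidence threshold for drawing a box")
    parser.add_argument("-c", choices=sorted(COLORS), default="BLUE",
                        help="Bounding box color")
    return parser


def parse_args(argv=None):
    args = build_parser().parse_args(argv)
    args.c = COLORS[args.c]  # convert the color name to a BGR tuple
    return args


args = parse_args(["-m", "model.xml", "-ct", "0.6", "-c", "GREEN"])
print(args.m, args.ct, args.c)
```

Restricting the color to a fixed set of `choices` keeps invalid names from reaching the drawing code.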

<!--
%%ulab_page_divider
--><hr/>

# Handling Input Streams

Make sure to click the button below before you get started to source the correct environment.

<button id="ulab-button-5de618db" class="ulab-btn--primary"></button>

It's time to really get into the thick of things for running your app at the edge. Being able to
appropriately handle an input stream is a big part of having a working AI or computer vision
application. 

In your case, you will be implementing a function that can handle camera, video or webcam
data as input. While unfortunately the classroom workspace won't allow for webcam usage,
you can also try that portion of your code out on your local machine if you have a webcam
available.

As such, the tests here will focus on using a camera image or a video file. You will not need to
perform any inference on the input frames, but you will need to do a few other image
processing techniques to show you have some of the basics of OpenCV down.

Your tasks are to:

1. Implement a function that can handle camera image, video file or webcam inputs
2. Use `cv2.VideoCapture()` and open the capture stream
3. Re-size the frame to 100x100
4. Add Canny Edge Detection to the frame with min & max values of 100 and 200, respectively
5. Save down the image or video output
6. Close the stream and any windows at the end of the application

You won't be able to test a webcam input in the workspace unfortunately, but you can use
the included video and test image to test your implementations.
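
The tasks above could be approached along these lines. This is a sketch under stated assumptions, not the official solution: the `"CAM"` keyword, the list of image extensions, the output file names, and the 30 fps `mp4v` writer settings are all hypothetical choices.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".bmp", ".png"}  # assumed set of image types


def classify_input(input_arg):
    """Return "webcam", "image" or "video" for a command line input value."""
    if input_arg == "CAM":  # hypothetical keyword for selecting the webcam
        return "webcam"
    extension = os.path.splitext(input_arg)[1].lower()
    return "image" if extension in IMAGE_EXTENSIONS else "video"


def capture_stream(input_arg):
    """Resize each frame, run Canny edge detection, and save the result."""
    import cv2  # imported here so classify_input works without OpenCV installed

    kind = classify_input(input_arg)
    # Device index 0 is the usual default webcam.
    cap = cv2.VideoCapture(0 if kind == "webcam" else input_arg)
    if not cap.isOpened():
        raise IOError("Could not open input: {}".format(input_arg))

    writer = None
    if kind != "image":
        fourcc = cv2.VideoWriter_fourcc(*"mp4v")  # codec availability varies by system
        writer = cv2.VideoWriter("out.mp4", fourcc, 30, (100, 100))

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, (100, 100))
        edges = cv2.Canny(frame, 100, 200)
        if kind == "image":
            cv2.imwrite("output_image.jpg", edges)
            break
        # Canny returns a single channel; convert so the color writer accepts it.
        writer.write(cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR))

    # Release the writer and capture, then close any OpenCV windows.
    if writer is not None:
        writer.release()
    cap.release()
    cv2.destroyAllWindows()
```

With a webcam the loop runs until the stream stops returning frames, so a real app would also want a key press or frame limit to end it.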

<!--
%%ulab_page_divider
--><hr/>

# Processing Model Outputs

Make sure to click the button below before you get started to source the correct environment.

<button id="ulab-button-4fb9f776" class="ulab-btn--primary"></button>

Let's say you have a cat and two dogs at your house. 

If both dogs are in a room together, they are best buds, and everything is going well.

If the cat and dog #1 are in a room together, they are also good friends, and everything is fine.

However, if the cat and dog #2 are in a room together, they don't get along, and you may need
to either pull them apart, or at least play a pre-recorded message from your smart speaker
to tell them to cut it out.

In this exercise, you'll receive a video where some combination of the cat and dogs may be
in view. You also will have an IR that is able to determine which of these, if any, are on screen.

While the best model for this is likely an object detection model that can identify different
breeds, I have provided you with a very basic (and overfit) model that will return three classes,
one for one or fewer pets on screen, one for the bad combination of the cat and dog #2, and
one for the fine combination of the cat and dog #1. This is within the exercise directory - `model.xml`.

It is up to you to add code that will print to the terminal anytime the bad combination of the 
cat and dog #2 are detected together. **Note**: It's important to consider whether you really
want to output a warning *every single time* both pets are on-screen - is your warning helpful
if it re-starts every 30th of a second, with a video at 30 fps?
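
One simple way to avoid repeating the warning on every frame is to warn only when the bad combination newly appears. The sketch below assumes the model's classes are indexed in the order listed above, so that index 1 is the bad combination; verify this against the actual model before using it. `frames_to_warn` is a hypothetical helper.

```python
BAD_COMBO_CLASS = 1  # assumed index of "cat with dog #2"; verify against the model


def frames_to_warn(class_ids, bad_class=BAD_COMBO_CLASS):
    """Return the frame indices at which a warning should be printed.

    A warning fires only when the bad class newly appears, meaning the
    previous frame was not already classified as the bad class.
    """
    warn_at = []
    previous = None
    for index, current in enumerate(class_ids):
        if current == bad_class and previous != bad_class:
            warn_at.append(index)
        previous = current
    return warn_at


# Per-frame class predictions for a short, made-up clip.
per_frame_classes = [0, 0, 1, 1, 1, 2, 2, 1, 1, 0]
for index in frames_to_warn(per_frame_classes):
    print("Warning at frame {}: cat and dog #2 are together!".format(index))
```

A noisy model may flicker between classes, in which case requiring a minimum number of frames between warnings would be a reasonable refinement.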

<!--
%%ulab_page_divider
--><hr/>

# Server Communications

Make sure to click the button below before you get started to source the correct environment.

<button id="ulab-button-66f8bc80" class="ulab-btn--primary"></button>

In this exercise, you will practice showing off your new server communication skills
for sending statistics over MQTT and images with FFMPEG.

The application itself is already built and able to perform inference, and a node server is set
up for you to use. The main node server is already fully ready to receive communications from
MQTT and FFMPEG. The MQTT node server is fully configured as well. Lastly, the ffserver is 
already configured for FFMPEG too.

The current application simply performs inference on a frame, gathers some statistics, and then 
continues onward to the next frame. 

## Tasks

Your tasks are to:

- Add any code for MQTT to the project so that the node server receives the calculated stats
  - This includes importing the relevant Python library
  - Setting IP address and port
  - Connecting to the MQTT client
  - Publishing the calculated statistics to the client
- Send the output frame (**not** the input image, but the processed output) to the ffserver

## Additional Information

Note: Since you are given the MQTT Broker Server and Node Server for the UI, you need 
certain information to correctly configure, publish and subscribe with MQTT.
- The MQTT port to use is 3001 - the classroom workspace only allows ports 3000-3009
- The topics that the UI Node Server is listening to are "class" and "speedometer"
- The Node Server will attempt to extract information from any JSON received from the MQTT server with the keys "class_names" and "speed"
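
Using the port, topics, and JSON keys listed above, publishing the statistics might look like the sketch below. It assumes the `paho-mqtt` Python package; the host name and keepalive value are placeholders, and the helper names are hypothetical.

```python
import json

MQTT_HOST = "localhost"  # placeholder; use the address of your MQTT broker
MQTT_PORT = 3001         # port given in the exercise notes
MQTT_KEEPALIVE = 60      # assumed keepalive interval in seconds


def connect_mqtt():
    """Create a client and connect it to the broker."""
    # Imported lazily so the rest of the sketch runs without paho-mqtt installed.
    import paho.mqtt.client as mqtt

    # paho-mqtt 1.x style constructor; newer releases may need extra arguments.
    client = mqtt.Client()
    client.connect(MQTT_HOST, MQTT_PORT, MQTT_KEEPALIVE)
    return client


def publish_stats(client, class_names, speed):
    """Publish one JSON message per topic the UI Node Server listens to."""
    client.publish("class", json.dumps({"class_names": class_names}))
    client.publish("speedometer", json.dumps({"speed": speed}))
```

In the app, `connect_mqtt()` would be called once before the frame loop, `publish_stats(...)` once per processed frame, and `client.disconnect()` after the loop ends.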

## Running the App

First, get the MQTT broker and UI installed.

- `cd webservice/server`
- `npm install`
- When complete, `cd ../ui`
- And again, `npm install`

You will need *four* separate terminal windows open in order to see the results. The steps
below should be done in a different terminal based on number. You can open a new terminal
in the workspace in the upper left (File>>New>>Terminal).

1. Get the MQTT broker installed and running.
  - `cd webservice/server/node-server`
  - `node ./server.js`
  - You should see a message that `Mosca server started.`.
2. Get the UI Node Server running.
  - `cd webservice/ui`
  - `npm run dev`
  - After a few seconds, you should see `webpack: Compiled successfully.`
3. Start the ffserver
  - `sudo ffserver -f ./ffmpeg/server.conf`
4. Start the actual application. 
  - First, you need to source the environment for OpenVINO *in the new terminal*:
    - `source /opt/intel/openvino/bin/setupvars.sh -pyver 3.5`
  - To run the app, I'll give you two items to pipe in with `ffmpeg` here, with the rest up to you:
    - `-video_size 1280x720`
    - `-i - http://0.0.0.0:3004/fac.ffm`

Your app should begin running, and you should also see the MQTT broker server noting
information getting published.

In order to view the output, click on the "Open App" button below in the workspace.

<button id="ulab-button-2c5c842f" class="ulab-btn--primary"></button>

## 19. Recap


In this lesson we covered:

- Basics of the Model Optimizer
- Different Optimization Techniques and their impact on model performance
- Supported Frameworks in the Intel® Distribution of OpenVINO™ Toolkit
- Converting from models in those frameworks to Intermediate Representations
- And a bit on Custom Layers

## 20. Lesson Glossary

### Model Optimizer
A command-line tool used for converting a model from one of the supported frameworks to an Intermediate Representation (IR), including certain performance optimizations, that is compatible with the Inference Engine.

### Optimization Techniques
Optimization techniques adjust the original trained model in order to either reduce the size of or increase the speed of a model in performing inference. Techniques discussed in the lesson include quantization, freezing and fusion.

### Quantization
Reduces precision of weights and biases (to lower precision floating point values or integers), thereby reducing compute time and size with some (often minimal) loss of accuracy.

### Freezing
In TensorFlow this removes metadata only needed for training, as well as converting variables to constants. Also a term in training neural networks, where it often refers to freezing layers themselves in order to fine tune only a subset of layers.

### Fusion
The process of combining certain operations together into one operation and thereby needing less computational overhead. For example, a batch normalization layer, activation layer, and convolutional layer could be combined into a single operation. This can be particularly useful for GPU inference, where the separate operations may occur on separate GPU kernels, while a fused operation occurs on one kernel, thereby incurring less overhead in switching from one kernel to the next.

### Supported Frameworks
The Intel® Distribution of OpenVINO™ Toolkit currently supports models from five frameworks (which themselves may support additional model frameworks): Caffe, TensorFlow, MXNet, ONNX, and Kaldi.

### Caffe
The “Convolutional Architecture for Fast Feature Embedding” (CAFFE) framework is an open-source deep learning library originally built at UC Berkeley.

### TensorFlow
TensorFlow is an open-source deep learning library originally built at Google. As an Easter egg for anyone who has read this far into the glossary, this was also the first deep learning framework your instructor learned, back in 2016 (pre-V1!).

### MXNet
Apache MXNet is an open-source deep learning library built by the Apache Software Foundation.

### ONNX
The “Open Neural Network Exchange” (ONNX) framework is an open-source deep learning library originally built by Facebook and Microsoft. PyTorch and Apple-ML models are able to be converted to ONNX models.

### Kaldi
While still open-source like the other supported frameworks, Kaldi is mostly focused around speech recognition data, with the others being more generalized frameworks.

### Intermediate Representation
A set of files converted from one of the supported frameworks, or available as one of the Pre-Trained Models. This has been optimized for inference through the Inference Engine, and may be at one of several different precision levels. Made of two files:

- .xml - Describes the network topology
- .bin - Contains the weights and biases in a binary file

### Supported Layers
Layers [supported](https://docs.openvinotoolkit.org/2019_R3/_docs_MO_DG_prepare_model_Supported_Frameworks_Layers.html) for direct conversion from supported framework layers to intermediate representation layers through the Model Optimizer. While nearly every layer you will ever use in the supported frameworks is supported, there is sometimes a need for handling Custom Layers.

### Custom Layers
Custom layers are those outside of the list of known, supported layers, and are typically a rare exception. Handling custom layers in a neural network for use with the Model Optimizer depends somewhat on the framework used; other than adding the custom layer as an extension, you otherwise have to follow [instructions](https://docs.openvinotoolkit.org/2019_R3/_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer.html) specific to the framework.



# LESSON 4

## The Inference Engine
Dive deeper into the Inference Engine, and perform inference in the OpenVINO™ Toolkit. By the end, you’ll know the full workflow for OpenVINO™ fundamentals and be ready to integrate into an app.

![image.png](attachment:image.png)


## 1. Introduction

In this lesson we'll cover:

- Basics of the Inference Engine
- Supported Devices
- Feeding an Intermediate Representation to the Inference Engine
- Making Inference Requests
- Handling Results from the Inference Engine
- Integrating the Inference Model into an App

## 2. The Inference Engine

The Inference Engine runs the actual inference on a model. It only works with the Intermediate Representations that come from the Model Optimizer, or the Intel® Pre-Trained Models in OpenVINO™ that are already in IR format.

Where the Model Optimizer made some improvements to the size and complexity of the models to reduce memory use and computation time, the Inference Engine provides hardware-based optimizations to get even further improvements from a model. This really empowers your application to run at the edge and use as few device resources as possible.

The Inference Engine has a straightforward API to allow easy integration into your edge application. The Inference Engine itself is actually built in C++ (at least for the CPU version), leading to overall faster operations; however, it is very common to utilize the built-in Python wrapper to interact with it in Python code.

![image.png](attachment:image.png)

**Comment:** The Inference Engine, as the name suggests, does the real legwork of inference at the edge

### Developer Documentation
You can find the developer documentation [here](https://docs.openvinotoolkit.org/2019_R3/_docs_IE_DG_Deep_Learning_Inference_Engine_DevGuide.html) for working with the Inference Engine. We’ll delve deeper into it throughout the lesson.

## 3. Supported Devices

The supported devices for the Inference Engine are all Intel® hardware, and cover a variety of device types: CPUs, GPUs (including integrated graphics processors), FPGAs, and VPUs. You likely know what CPUs and GPUs are already, but maybe not the others.

FPGAs, or Field Programmable Gate Arrays, are able to be further configured by a customer after manufacturing. Hence the “field programmable” part of the name.

VPUs, or Vision Processing Units, are going to be like the Intel® Neural Compute Stick. They are small, but powerful devices that can be plugged into other hardware, for the specific purpose of accelerating computer vision tasks.

### Differences Among Hardware
Mostly, how the Inference Engine operates on one device will be the same as other supported devices; however, you may remember me mentioning a CPU extension in the last lesson. That’s one difference, that a CPU extension can be added to support additional layers when the Inference Engine is used on a CPU.

There are also some differences among supported layers by device, which are covered in the documentation linked at the bottom of this page. Another important one to note is regarding when you use an Intel® Neural Compute Stick (NCS). An easy, fairly low-cost method of testing out an edge app locally, outside of your own computer, is to use the NCS2 with a Raspberry Pi. The Model Optimizer is not supported directly with this combination, so you may need to create an Intermediate Representation on another system first, although there are [some instructions](https://software.intel.com/en-us/articles/model-downloader-optimizer-for-openvino-on-raspberry-pi) for one way to do so on-device. The Inference Engine itself is still supported with this combination.

![image.png](attachment:image.png)

### Further Research
Depending on your device, the different plugins do have some differences in functionality and optimal configurations. You can read more on Supported Devices [here](https://docs.openvinotoolkit.org/2019_R3/_docs_IE_DG_supported_plugins_Supported_Devices.html).


## 4. Using the Inference Engine with an IR

### `IECore` and `IENetwork`

To load an IR into the Inference Engine, you’ll mostly work with two classes in the `openvino.inference_engine` library (if using Python):

- `IECore`, which is the Python wrapper to work with the Inference Engine
- `IENetwork`, which is what will initially hold the network and get loaded into `IECore`

The next step after importing is to set a couple of variables to actually use the `IECore` and `IENetwork`. According to the [IECore documentation](https://docs.openvinotoolkit.org/2019_R3/classie__api_1_1IECore.html), no arguments are needed to initialize it. To use [IENetwork](https://docs.openvinotoolkit.org/2019_R3/classie__api_1_1IENetwork.html), you need to pass arguments named `model` and `weights` to initialize it: the XML and binary files that make up the model’s Intermediate Representation.

### Check Supported Layers
In the [IECore documentation](https://docs.openvinotoolkit.org/2019_R3/classie__api_1_1IECore.html), there is another function called `query_network`, which takes in an `IENetwork` and a device name as arguments, and returns the layers the Inference Engine supports on that device. You can then iterate through the layers in the `IENetwork` you created, and check whether they are among the supported layers. If a layer is not supported, a CPU extension may be able to help.

The `device_name` argument is just a string naming which device is being used - `"CPU"`, `"GPU"`, `"FPGA"`, or `"MYRIAD"` (which applies for the Neural Compute Stick).

### CPU extension
If layers were successfully built into an Intermediate Representation with the Model Optimizer, some may still be unsupported by default with the Inference Engine when run on a CPU. However, there is likely support for them using one of the available CPU extensions.

These do differ by operating system a bit, although they should still be in the same overall location. If you navigate to your OpenVINO™ install directory, then `deployment_tools`, `inference_engine`, `lib`, `intel64`:

- On Linux, you’ll see a few CPU extension files available for AVX and SSE. That’s a bit outside of the scope of the course, but look up Advanced Vector Extensions if you want to know more there. In the classroom workspace, the SSE file will work fine.
    - Intel® Atom processors use SSE4, while Intel® Core processors will utilize AVX.
    - This is especially important to make note of when transferring a program from a Core-based laptop to an Atom-based edge device. If the incorrect extension is specified in the application, the program will crash.
    - AVX systems can run SSE4 libraries, but not vice-versa.
- On Mac, there’s just a single CPU extension file.

You can add these directly to the `IECore` using their full path. After you’ve added the CPU extension, if necessary, you should re-check that all layers are now supported. If they are, it’s finally time to load the model into the IECore.

### Further Research
As you get more into working with the Inference Engine in the next exercise and into the future, here are a few pages of documentation I found useful in working with it.

- [IE Python API](https://docs.openvinotoolkit.org/2019_R3/ie_python_api.html)
- [IE Network](https://docs.openvinotoolkit.org/2019_R3/classie__api_1_1IENetwork.html)
- [IE Core](https://docs.openvinotoolkit.org/2019_R3/classie__api_1_1IECore.html)


## 5. Exercise: Feed an IR to the Inference Engine


Make sure to click the button below before you get started to source the correct environment.

<button id="ulab-button-6f2a60e5" class="ulab-btn--primary"></button>

Earlier in the course, you were focused on working with the Intermediate Representation (IR)
models themselves, while mostly glossing over the use of the actual Inference Engine with
the model.

Here, you'll import the Python wrapper for the Inference Engine (IE), and practice using 
different IRs with it. You will first add each IR as an `IENetwork`, and check whether the layers 
of that network are supported by the classroom CPU.

Since the classroom workspace is using an Intel CPU, you will also need to add a CPU
extension to the `IECore`.

Once you have verified all layers are supported (when the CPU extension is added),
you will load the given model into the Inference Engine.

Note that the `.xml` file of the IR should be given as an argument when running the script.

To test your implementation, you should be able to successfully load each of the three IR
model files we have been working with throughout the course so far, which you can find in the
`/home/workspace/models` directory.

## 6. Solution: Feed an IR to the Inference Engine

To start, I'll import the `IENetwork` and `IECore` from the Inference Engine. The first of
these is what will hold the Intermediate Representation object, while the second is the 
Python Plugin for working with the Inference Engine.

```
from openvino.inference_engine import IENetwork, IECore
```

I will go ahead and initialize a `plugin` variable with the `IECore` object now as well.

```
plugin = IECore()
```

Now, I can load the separate IR models into an `IENetwork` object, which I'll call `net`.

```
net = IENetwork(model=model_xml, weights=model_bin)
```

As discussed before, the `.xml` file of the IR is the model architecture, while the `.bin` file
contains weights and biases. `model_xml` and `model_bin` should be the paths to these
files.

Before loading this `net` into the `plugin`, we need to check whether all of the layers are
supported by the `plugin`. We can get the plugin's supported layers of the IR by using the
`.query_network()` function of an [`IECore`](https://docs.openvinotoolkit.org/latest/classie__api_1_1IECore.html):

```
supported_layers = plugin.query_network(network=net, device_name="CPU")
```

Note that the `device_name` argument here can also be `"GPU"`, `"FPGA"` or `"MYRIAD"`, depending
on what hardware is being used. `"MYRIAD"` is used for the Intel Neural Compute Stick.

Then, we can iterate through the layers in the `net` itself, to gather any unsupported layers.

```
unsupported_layers = [l for l in net.layers.keys() if l not in supported_layers]
```

If the length of the `unsupported_layers` list is not zero, the model is not going to be able
to run on this device with the Inference Engine. So, we should add a `print` statement 
or `log` to the console that an extension may be necessary, and then exit the program.

In our case, we will be able to run the included IRs by adding a CPU extension. For Linux
machines, like the classroom workspace, these are usually found in the following directory:
```
<OpenVINO install dir>/deployment_tools/inference_engine/lib/intel64
```

I hard-coded the location of the relevant CPU extension into the starter code. In this case,
it is the `libcpu_extension_sse4.so` CPU extension that you'll want to add.

Back in the `IECore` [documentation](https://docs.openvinotoolkit.org/latest/classie__api_1_1IECore.html),
we see there is an `.add_extension` function, which just takes the path to the CPU
extension and device name:

```
plugin.add_extension(cpu_extension, "CPU")
```

Now, if we check for unsupported layers again, there should be zero that are unsupported.

Finally, we can use the `.load_network` function from `IECore` to load the model into the Inference Engine:

```
plugin.load_network(net, "CPU")
```

To make sure your implementation works appropriately, you should use each of the three
models in the workspace with `feed_network.py`, such as:

```bash
python feed_network.py -m /home/workspace/models/human-pose-estimation-0001.xml
```

Now, we're ready to start making inference requests, which we'll look at next.

### From Video


First, add the additional libraries (os may not be needed depending on how you get the model file names):
```
### Load the necessary libraries

import os
from openvino.inference_engine import IENetwork, IECore
```
Then, to load the Intermediate Representation and feed it to the Inference Engine:

```
def load_to_IE(model_xml):
    ### Load the Inference Engine API
    plugin = IECore()

    ### Load IR files into their related class
    model_bin = os.path.splitext(model_xml)[0] + ".bin"
    net = IENetwork(model=model_xml, weights=model_bin)

    ### Add a CPU extension, if applicable.
    plugin.add_extension(CPU_EXTENSION, "CPU")

    ### Get the supported layers of the network
    supported_layers = plugin.query_network(network=net, device_name="CPU")

    ### Check for any unsupported layers, and let the user
    ### know if anything is missing. Exit the program, if so.
    unsupported_layers = [l for l in net.layers.keys() if l not in supported_layers]
    if len(unsupported_layers) != 0:
        print("Unsupported layers found: {}".format(unsupported_layers))
        print("Check whether extensions are available to add to IECore.")
        exit(1)

    ### Load the network into the Inference Engine
    plugin.load_network(net, "CPU")

    print("IR successfully loaded into Inference Engine.")

    return
```
Note that a more optimal approach here would actually check whether a CPU extension was added as an argument by the user, but to keep things simple, I hard-coded it for the exercise.
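
A sketch of that alternative is shown below: the CPU extension becomes an optional argument, and it is only added when the user supplied a path. The `-c`/`--cpu_extension` flag and the `add_extension_if_given` helper are hypothetical names chosen for illustration; `plugin` stands for the `IECore` object created earlier.

```python
import argparse


def get_args(argv=None):
    parser = argparse.ArgumentParser(description="Load an IR into the Inference Engine")
    parser.add_argument("-m", required=True, help="Path to the IR .xml file")
    # Hypothetical optional flag; None means no extension was requested.
    parser.add_argument("-c", "--cpu_extension", default=None,
                        help="Full path to a CPU extension library")
    return parser.parse_args(argv)


def add_extension_if_given(plugin, cpu_extension, device="CPU"):
    """Call plugin.add_extension only when a path was supplied for a CPU device."""
    if cpu_extension and "CPU" in device:
        plugin.add_extension(cpu_extension, device)
        return True
    return False


args = get_args(["-m", "model.xml"])
print(args.cpu_extension)  # None, because no extension path was passed
```

The unsupported-layer check would then run after this call, so a missing extension is still reported to the user.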

### Running Your Implementation
You should make sure your implementation runs with all three pre-trained models we worked with earlier (and you are welcome to also try the models you converted in the previous lesson from TensorFlow, Caffe and ONNX, although your workspace may not have these stored). I placed these in the `/home/workspace/models` directory for easier use, and because the workspace will reset the `/opt` directory between sessions.

`python feed_network.py -m /home/workspace/models/human-pose-estimation-0001.xml`

You can run the other two by updating the model name in the above.



## 7. Sending Inference Requests to the IE


After you load the `IENetwork` into the `IECore`, you get back an `ExecutableNetwork`, which is what you will send inference requests to. There are two types of inference requests you can make: Synchronous and Asynchronous. There is an important difference between the two on whether your app sits and waits for the inference or can continue and do other tasks.

With an `ExecutableNetwork`, synchronous requests just use the `infer` function, while asynchronous requests begin with `start_async`, and then you can `wait` until the request is complete. These requests are `InferRequest` objects, which will hold both the input and output of the request.
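
The two request types can be sketched as follows. This is an illustration rather than the exercise solution: `exec_net` is assumed to be the `ExecutableNetwork` returned by `load_network`, `input_blob` the name of the network's input, and `run_sync` and `run_async` are hypothetical helper names.

```python
def run_sync(exec_net, input_blob, image):
    """Blocking request: returns the outputs once inference has finished."""
    return exec_net.infer({input_blob: image})


def run_async(exec_net, input_blob, image, request_id=0):
    """Non-blocking request followed by an explicit wait for the result."""
    exec_net.start_async(request_id=request_id, inputs={input_blob: image})
    # Other work could happen here while the request is in flight.
    status = exec_net.requests[request_id].wait(-1)  # -1 blocks until done
    if status != 0:  # 0 is the OK status code
        raise RuntimeError("Inference request failed with status {}".format(status))
    return exec_net.requests[request_id].outputs
```

The input name would typically be read from the network, for example with `next(iter(net.inputs))`.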

We'll look a little deeper into the difference between synchronous and asynchronous on the next page.

### Further Research
- [Executable Network documentation](https://docs.openvinotoolkit.org/2019_R3/classie__api_1_1ExecutableNetwork.html)
- [Infer Request documentation](https://docs.openvinotoolkit.org/2019_R3/classie__api_1_1InferRequest.html)

## 8. Asynchronous Requests


### Synchronous
Synchronous requests will wait and do nothing else until the inference response is returned, blocking the main thread. In this case, only one frame is being processed at once, and the next frame cannot be gathered until the current frame’s inference request is complete.

### Asynchronous
You may have heard of asynchronous if you do front-end or networking work. In that case, you want to process things asynchronously, so in case the response for a particular item takes a long time, you don’t hold up the rest of your website or app from loading or operating appropriately.

Asynchronous, in our case, means other tasks may continue while waiting on the IE to respond. This is helpful when you want other things to still occur, so that the app is not completely frozen by the request if the response hangs for a bit.

Where the main thread was blocked in synchronous, asynchronous does not block the main thread. So, you could have a frame sent for inference, while still gathering and pre-processing the next frame. You can make use of the "wait" process to wait for the inference result to be available.

You could also use this with multiple webcams, so that the app could "grab" a new frame from one webcam while performing inference for the other.
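
The difference can be illustrated without the Inference Engine at all. The sketch below is a simulation only: `grab_frame` and `fake_infer` are hypothetical stand-ins that simply sleep for a moment, and a `ThreadPoolExecutor` from the Python standard library plays the role of the asynchronous request. It is meant to show the overlap between grabbing the next frame and running inference on the current one, not the actual OpenVINO API.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def grab_frame(i):
    """Hypothetical stand-in for capturing and pre-processing frame i."""
    time.sleep(0.02)
    return i

def fake_infer(frame):
    """Hypothetical stand-in for an inference call; not an OpenVINO API."""
    time.sleep(0.02)
    return frame * 2

def run_sync(num_frames):
    """Grab a frame, then block on inference before grabbing the next one."""
    results = []
    for i in range(num_frames):
        frame = grab_frame(i)
        results.append(fake_infer(frame))
    return results

def run_async(num_frames):
    """Start inference on the current frame and grab the next one meanwhile."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        frame = grab_frame(0)
        for i in range(num_frames):
            pending = pool.submit(fake_infer, frame)  # comparable to starting a request
            if i + 1 < num_frames:
                frame = grab_frame(i + 1)             # overlaps with the inference
            results.append(pending.result())          # comparable to waiting on it
    return results

start = time.perf_counter()
sync_results = run_sync(5)
sync_time = time.perf_counter() - start

start = time.perf_counter()
async_results = run_async(5)
async_time = time.perf_counter() - start

print(f"sync: {sync_time:.2f}s, async: {async_time:.2f}s")
```

Both versions produce the same results; on most machines the asynchronous version should finish sooner, because the frame grab no longer waits for the previous inference to end.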

![image.png](attachment:image.png)

**Comment:**  This is a simplistic example here, as really even in the synchronous case, you’d probably still want some of the app’s functionality to stick around. But, dealing with network calls to servers and APIs is a big use of asynchronous.

### Further Research
- For more on Synchronous vs. Asynchronous, check out this [helpful post](https://whatis.techtarget.com/definition/synchronous-asynchronous-API).
- You can also check out the [documentation](https://docs.openvinotoolkit.org/2019_R3/_docs_IE_DG_Integrate_with_customer_application_new_API.html) on integrating the Inference Engine into an application to see the different function calls from an Inference Request for sync (`Infer`) vs. async (`StartAsync`).
- Lastly, for further practice with Asynchronous Inference Requests, you can check out this [useful demo](https://github.com/opencv/open_model_zoo/blob/master/demos/object_detection_demo_ssd_async/README.md). You’ll get a chance to practice with Synchronous and Asynchronous Requests in the upcoming exercise.



## 9. Exercise: Inference Requests

In the previous exercise, you loaded Intermediate Representations (IRs) into the Inference
Engine. Now that we've covered some of the topics around requests, including the difference
between synchronous and asynchronous requests, you'll add additional code to make
inference requests to the Inference Engine.

Given an `ExecutableNetwork` that is the IR loaded into the Inference Engine, your task is to:

1. Perform a synchronous request
2. Start an asynchronous request given an input image frame
3. Wait for the asynchronous request to complete

Note that we'll cover handling the results of the request shortly, so you don't need to worry
about that just yet. This will get you practice with both types of requests with the Inference
Engine.

You will perform the above tasks within `inference.py`. This will take three arguments,
one for the model, one for the test image, and the last for what type of inference request
should be made.

You can use `test.py` afterward to verify your code successfully makes inference requests.

## 10. Solution Inference Requests


To get started, let's check out the [documentation](https://docs.openvinotoolkit.org/latest/classie__api_1_1ExecutableNetwork.html) for `ExecutableNetwork`.

I noticed two functions that seem relevant to the task at hand: `infer` and `start_async`.

While the second function pretty clearly refers to an asynchronous request based on the name, 
we can see from the notes that `infer` is used for synchronous requests. In fact, it looks like 
that function will directly return our results.

### Synchronous Request

So, if we have an `ExecutableNetwork` called `exec_net`, to make a synchronous request, 
we can do this:

```
result = exec_net.infer({'data': frame})
```

So that one is pretty quick and easy; you just need to feed the image frame in and the
synchronous request is made, returning the inference results.

### Asynchronous Request

The asynchronous request is a two-parter. First, you use the `start_async` function, which is
going to return an `InferRequest` object.

```
exec_net.start_async(request_id=request_id, inputs={input_blob: frame})
```

The `input_blob` here is just the input layer of the network (`next(iter(net.inputs))`).
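
The `next(iter(...))` idiom simply takes the first key of a dictionary-like object. Here is a minimal sketch, with a plain dictionary standing in for `net.inputs` (the layer name `"data"` and the shape are made up for illustration):

```python
# Plain dict standing in for net.inputs; the key and shape are illustrative only.
inputs = {"data": [1, 3, 256, 456]}

# iter() walks the keys in insertion order; next() takes the first one.
input_blob = next(iter(inputs))
print(input_blob)  # data
```

For a network with more than one input, this would pick only the first one, so the input names would need to be handled explicitly in that case.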

Since it's asynchronous, your device could be used to go ahead and capture the next frame 
of input while the current one is having inference performed on it. Therefore, to continue 
processing with the current frame, the application will need to `wait` until the current inference 
request is finished.

You can find the [documentation](https://docs.openvinotoolkit.org/latest/classie__api_1_1InferRequest.html) for `wait`
within `InferRequest`'s documentation. Back in `ExecutableNetwork`, we can see it has
a class attribute of `requests`, where you can work with the current tuple of `InferRequests`.

To use `wait`, we want to feed in a `-1` as an argument, as that will make the process wait
until the inference results are available. Otherwise, we might try to extract and handle the
model's outputs, with nothing to use.

```
status = exec_net.requests[request_id].wait(-1)
```

We can use a `request_id` of `0` here as we don't have any other requests, but if you had
another request to make, you'd use a `1` for `request_id`. 

This will return a status code - if it's `0`, the inference request is complete, and the results
can now be extracted. Let's look at that next.

## From Video

### Synchronous Solution
```
def sync_inference(exec_net, input_blob, image):
    '''
    Performs synchronous inference
    Return the result of inference
    '''
    result = exec_net.infer({input_blob: image})

    return result
```
    
### Asynchronous Solution

```
import time

def async_inference(exec_net, input_blob, image):
    '''
    Performs asynchronous inference
    Returns the `exec_net`
    '''
    exec_net.start_async(request_id=0, inputs={input_blob: image})
    while True:
        status = exec_net.requests[0].wait(-1)
        if status == 0:
            break
        else:
            time.sleep(1)
    return exec_net
```
I don't actually need `time.sleep()` here - using the `-1` with `wait()` is able to perform similar functionality.

### Testing
You can run the test file to check your implementations using inference on multiple models.
```
python test.py
```

## 11. Handling Results

You saw at the end of the previous exercise that the inference requests are stored in a `requests` attribute in the `ExecutableNetwork`. There, we focused on the fact that the `InferRequest` object had a `wait` function for asynchronous requests.

Each `InferRequest` also has a few attributes - namely, `inputs`, `outputs` and `latency`. As the names suggest, inputs in our case would be an image frame, outputs contains the results, and latency notes the inference time of the current request, although we won’t worry about that right now.

It may be useful for you to print out exactly what the `outputs` attribute contains after a request is complete. For now, you can ask it for the `data` under the `"prob"` key, or sometimes `output_blob` ([see related documentation](https://docs.openvinotoolkit.org/2019_R3/classInferenceEngine_1_1Blob.html)), to get an array of the probabilities returned from the inference request.
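
As a rough sketch of what handling a classification result might look like, the snippet below picks the most likely classes from a list of probabilities. The `outputs` dictionary, its values, and the `top_k` helper are illustrative assumptions; with the real API the value would be an array-like object rather than a plain list.

```python
def top_k(probabilities, k=3):
    """Return (class_index, probability) pairs for the k highest probabilities."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical stand-in for the outputs of a completed request.
outputs = {"prob": [0.05, 0.70, 0.10, 0.15]}

best = top_k(outputs["prob"], k=2)
print(best)  # [(1, 0.7), (3, 0.15)]
```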

![image.png](attachment:image.png)

## 12. Integrating into Your App

In the upcoming exercise, you’ll put all your skills together, as well as adding some further customization to your app.

### Further Research
There’s a ton of great potential edge applications out there for you to build. Here are some examples to hopefully get you thinking:

- [Intel®’s IoT Apps Across Industries](https://www.intel.com/content/www/us/en/internet-of-things/industry-solutions.html)
- [Starting Your First IoT Project](https://hackernoon.com/the-ultimate-guide-to-starting-your-first-iot-project-8b0644fbbe6d)
- [OpenVINO™ on a Raspberry Pi and Intel® Neural Compute Stick](https://www.pyimagesearch.com/2019/04/08/openvino-opencv-and-movidius-ncs-on-the-raspberry-pi/)

## 13. Exercise: Integrate into an App

## Integrate the Inference Engine in An Edge App

You've come a long way from the first lesson where most of the code for working with
the OpenVINO toolkit was happening in the background. You worked with pre-trained models,
moved up to converting any trained model to an Intermediate Representation with the
Model Optimizer, and even got the model loaded into the Inference Engine and began making
inference requests.

In this final exercise of this lesson, you'll close off the OpenVINO workflow by extracting
the results of the inference request, and then integrating the Inference Engine into an existing
application. You'll still be given some of the overall application infrastructure, as more of that
will come in the next lesson, but all of that is outside of OpenVINO itself.

You will also add code allowing you to try out various confidence thresholds with the model,
as well as changing the visual look of the output, like bounding box colors.

Now, it's up to you which exact model you want to use here, although you are able to just
re-use the model you converted with TensorFlow before for an easy bounding box detector.

Note that this application will run with a video instead of just images like we've done before.

So, your tasks are to:

1. Convert a bounding box model to an IR with the Model Optimizer.
2. Pre-process the model as necessary.
3. Use an async request to perform inference on each video frame.
4. Extract the results from the inference request.
5. Add code to make the requests and feed back the results within the application.
6. Perform any necessary post-processing steps to get the bounding boxes.
7. Add a command line argument to allow for different confidence thresholds for the model.
8. Add a command line argument to allow for different bounding box colors for the output.
9. Correctly utilize the command line arguments in #7 and #8 within the application.

When you are done, feed your model to `app.py`, and it will generate `out.mp4`, which you
can download and view. *Note that this app will take a little bit longer to run.* Also, if you need
to re-run inference, delete the `out.mp4` file first.

You only need to feed the model with `-m` before adding the customization; you should set
defaults for any additional arguments you add for the color and confidence so that the user
does not always need to specify them.

```bash
python app.py -m {your-model-path.xml}
```

## 14. Solution: Integrate into an App


## Integrate the Inference Engine

Let's step through the tasks one by one, with a potential approach for each.

> Convert a bounding box model to an IR with the Model Optimizer.

I used the SSD Mobilenet V2 architecture from TensorFlow from the earlier lesson here. Note
that the original was downloaded in a separate workspace, so I needed to download it again
and then convert it.

```
python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model frozen_inference_graph.pb --tensorflow_object_detection_api_pipeline_config pipeline.config --reverse_input_channels --tensorflow_use_custom_operations_config /opt/intel/openvino/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json
```

> Extract the results from the inference request

```
self.exec_network.requests[0].outputs[self.output_blob]
```

> Add code to make the requests and feed back the results within the application

```
self.exec_network.start_async(request_id=0, inputs={self.input_blob: image})
...
status = self.exec_network.requests[0].wait(-1)
```

> Add a command line argument to allow for different confidence thresholds for the model

I chose to use `-ct` as the argument name here, and added it to the existing arguments.

```
optional.add_argument("-ct", help="The confidence threshold to use with the bounding boxes", default=0.5)
```

I set a default of 0.5, so it does not need to be input by the user every time. 

> Add a command line argument to allow for different bounding box colors for the output

Similarly, I added the `-c` argument for inputting a bounding box color.
Note that in my approach, I chose to only allow "RED", "GREEN" and "BLUE", which also
impacts what I'll do in the next step; there are many possible approaches here.

```
optional.add_argument("-c", help="The color of the bounding boxes to draw; RED, GREEN or BLUE", default='BLUE')
```

> Correctly utilize the command line arguments in #7 and #8 within the application

Both of these will come into play within the `draw_boxes` function. For the first, a new line
should be added before extracting the bounding box points that checks whether `box[2]`
(i.e. the probability of a given box) is above `args.ct`, assuming you have passed
`args.ct` as an argument to the `draw_boxes` function. If it is not, the box
should not be drawn. Without this check, every candidate box will be drawn, which could be a ton of
very unlikely bounding box detections.

The second is just a small adjustment to the `cv2.rectangle` function that draws the
bounding boxes we found to be above `args.ct`. I actually added a function to match
the different potential colors up to their BGR values first, due to how I took them in from the
command line:

```
def convert_color(color_string):
    '''
    Get the BGR value of the desired bounding box color.
    Defaults to Blue if an invalid color is given.
    '''
    colors = {"BLUE": (255,0,0), "GREEN": (0,255,0), "RED": (0,0,255)}
    out_color = colors.get(color_string)
    if out_color:
        return out_color
    else:
        return colors['BLUE']
```

I can also add the tuple returned from this function as an additional `color` argument to feed to
`draw_boxes`.

Then, the line where the bounding boxes are drawn becomes:

```
cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color, 1)
```
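
The threshold check and coordinate scaling described above can also be sketched without OpenCV, which makes the logic easy to test on its own. The `boxes_above_threshold` helper and the sample detections are made up for illustration; they assume the 7-value layout used in this exercise, where index 2 is the confidence and indices 3 to 6 are the normalized `xmin`, `ymin`, `xmax` and `ymax`.

```python
def boxes_above_threshold(detections, width, height, threshold=0.5):
    """
    Keep detections whose confidence meets the threshold, and scale their
    normalized corners to pixel coordinates for a frame of the given size.
    """
    boxes = []
    for box in detections:
        if box[2] >= threshold:
            xmin = int(box[3] * width)
            ymin = int(box[4] * height)
            xmax = int(box[5] * width)
            ymax = int(box[6] * height)
            boxes.append((xmin, ymin, xmax, ymax))
    return boxes

# Made-up detections; the first two values of each row are placeholders here.
detections = [
    [0, 1, 0.9, 0.25, 0.5, 0.75, 1.0],  # confident detection, kept
    [0, 1, 0.3, 0.0, 0.0, 0.5, 0.5],    # unlikely detection, dropped
]

kept = boxes_above_threshold(detections, width=640, height=480, threshold=0.6)
print(kept)  # [(160, 240, 480, 480)]
```

Each tuple in `kept` could then be passed to `cv2.rectangle` together with the chosen color.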

Using the converted TF model from earlier (placed in the current directory), I was able to
run my app with the below:

```bash
python app.py -m frozen_inference_graph.xml
```

Or, if I added additional customization with a confidence threshold of 0.6 and blue boxes:

```bash
python app.py -m frozen_inference_graph.xml -ct 0.6 -c BLUE
```

[Note that I placed my customized app actually in `app-custom.py`]


## From Video

Note: There is one small change from the code on-screen for running on Linux machines versus Mac. On Mac, `cv2.VideoWriter` uses `cv2.VideoWriter_fourcc('M','J','P','G')` to write an `.mp4` file, while Linux uses `0x00000021`.

### Functions in `inference.py`

I covered the `async` and `wait` functions here, as they're split out slightly differently than we saw in the last exercise.

First, it's important to note that output and input blobs were grabbed higher above when the network model is loaded:

```
self.input_blob = next(iter(self.network.inputs))
self.output_blob = next(iter(self.network.outputs))
```
From there, you can mostly use similar code to before:
```

    def async_inference(self, image):
        '''
        Makes an asynchronous inference request, given an input image.
        '''
        self.exec_network.start_async(request_id=0, 
            inputs={self.input_blob: image})
        return
```

```

    def wait(self):
        '''
        Checks the status of the inference request.
        '''
        status = self.exec_network.requests[0].wait(-1)
        return status
```
You can grab the network output using the appropriate `request` with the `output_blob` key:
```

    def extract_output(self):
        '''
        Returns a list of the results for the output layer of the network.
        '''
        return self.exec_network.requests[0].outputs[self.output_blob]
```
#### `app.py`
The next steps in `app.py`, before customization, are largely based on using the functions in `inference.py`:
```

    ### Initialize the Inference Engine
    plugin = Network()

    ### Load the network model into the IE
    plugin.load_model(args.m, args.d, CPU_EXTENSION)
    net_input_shape = plugin.get_input_shape()

    ...

        ### Pre-process the frame
        p_frame = cv2.resize(frame, (net_input_shape[3], net_input_shape[2]))
        p_frame = p_frame.transpose((2,0,1))
        p_frame = p_frame.reshape(1, *p_frame.shape)

        ### Perform inference on the frame
        plugin.async_inference(p_frame)

        ### Get the output of inference
        if plugin.wait() == 0:
            result = plugin.extract_output()
            ### Update the frame to include detected bounding boxes
            frame = draw_boxes(frame, result, args, width, height)
            # Write out the frame
            out.write(frame)
```
The `draw_boxes` function is used to extract the bounding boxes and draw them back onto the input image.
```

def draw_boxes(frame, result, args, width, height):
    '''
    Draw bounding boxes onto the frame.
    '''
    for box in result[0][0]: # Output shape is 1x1x100x7
        conf = box[2]
        if conf >= 0.5:
            xmin = int(box[3] * width)
            ymin = int(box[4] * height)
            xmax = int(box[5] * width)
            ymax = int(box[6] * height)
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 0, 255), 1)
    return frame
```
### Customizing `app.py`
Adding the customization only took a few extra steps.

### Parsing the command line arguments
First, you need to add the additional command line arguments:
```

    c_desc = "The color of the bounding boxes to draw; RED, GREEN or BLUE"
    ct_desc = "The confidence threshold to use with the bounding boxes"

    ...

    optional.add_argument("-c", help=c_desc, default='BLUE')
    optional.add_argument("-ct", help=ct_desc, default=0.5)
```
The names and descriptions here, and even how you use the default values, can be up to you.

### Handle the new arguments
I also needed to process these arguments a little further. This is pretty open based on your own implementation - since I took in a color string, I need to convert it to a BGR tuple for use as an OpenCV color.
```

def convert_color(color_string):
    '''
    Get the BGR value of the desired bounding box color.
    Defaults to Blue if an invalid color is given.
    '''
    colors = {"BLUE": (255,0,0), "GREEN": (0,255,0), "RED": (0,0,255)}
    out_color = colors.get(color_string)
    if out_color:
        return out_color
    else:
        return colors['BLUE']
```
I then need to call this with the related argument, as well as make sure the confidence threshold argument is a float value.
```

    args.c = convert_color(args.c)
    args.ct = float(args.ct)
```
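
An alternative, sketched below, is to let `argparse` do the conversion and validation itself through `type=float` and `choices`. This is one possible approach rather than the one used in the video; the `build_argparser` name is an assumption, and only the `-m`, `-c` and `-ct` arguments from this exercise are shown.

```python
import argparse

def build_argparser():
    """Build a parser for the model path, box color and confidence threshold."""
    parser = argparse.ArgumentParser(description="Run inference on an input video")
    parser.add_argument("-m", required=True,
                        help="The location of the model XML file")
    # choices rejects anything other than the three supported color names.
    parser.add_argument("-c", default="BLUE", choices=["RED", "GREEN", "BLUE"],
                        help="The color of the bounding boxes to draw")
    # type=float converts the string from the command line before it is stored.
    parser.add_argument("-ct", type=float, default=0.5,
                        help="The confidence threshold to use with the bounding boxes")
    return parser

# Parse an explicit list here instead of sys.argv so the sketch runs anywhere.
args = build_argparser().parse_args(["-m", "frozen_inference_graph.xml", "-ct", "0.6"])
print(args.m, args.c, args.ct)
```

With this variant, an unsupported color stops the program with a usage message instead of silently falling back to blue, so which behavior you prefer is a design choice.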
### Adding customization to `draw_boxes()`
The final step was to integrate these new arguments into my `draw_boxes()` function. I needed to make sure that the arguments are fed to the function:
```
frame = draw_boxes(frame, result, args, width, height)
```
and then I can use them where appropriate in the updated function.
```

def draw_boxes(frame, result, args, width, height):
    '''
    Draw bounding boxes onto the frame.
    '''
    for box in result[0][0]: # Output shape is 1x1x100x7
        conf = box[2]
        if conf >= args.ct:
            xmin = int(box[3] * width)
            ymin = int(box[4] * height)
            xmax = int(box[5] * width)
            ymax = int(box[6] * height)
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), args.c, 1)
    return frame
```
With everything implemented, I could run my app as such (given I re-used the previously converted TF model from the Model Optimizer lesson) if I wanted blue bounding boxes and a confidence threshold of `0.6`:

```
python app.py -m frozen_inference_graph.xml -ct 0.6 -c BLUE
```

## 15. Behind the Scenes of Inference Engine

I noted early on that the Inference Engine is built and optimized in C++, although that’s just the CPU version. There are some differences in what is actually occurring under the hood with the different devices. You are able to work with a shared API to interact with the Inference Engine, while largely being able to ignore these differences.

## Why C++?
Why is the Inference Engine built in C++, at least for CPUs? In fact, many different Computer Vision and AI frameworks are built with C++, and have additional Python interfaces. OpenCV and TensorFlow, for example, are built primarily in C++, but many users interact with the libraries in Python. C++ is faster and more efficient than Python when well implemented, and it also gives the user more direct access to the items in memory and such, and they can be passed between modules more efficiently.

C++ is compiled & optimized ahead of runtime, whereas Python basically gets read line by line when a script is run. On the flip side, Python can make it easier for prototyping and fast fixes. It’s fairly common then to be using a C++ library for the actual Computer Vision techniques and inferencing, but with the application itself in Python, and interacting with the C++ library via a Python API.

### Optimizations by Device
The exact optimizations differ by device with the Inference Engine. While interacting with the Inference Engine is mostly the same from your end, there are actually separate plugins within it for working with each device type.

CPUs, for instance, rely on the Intel® Math Kernel Library for Deep Neural Networks, or MKL-DNN. CPUs also have some extra work to help improve device throughput, especially for CPUs with higher numbers of cores.

GPUs utilize the Compute Library for Deep Neural Networks, or clDNN, which uses OpenCL within. Using OpenCL introduces a small overhead right when the GPU Plugin is loaded, but this is only a one-time overhead cost. The GPU Plugin works best with FP16 models over FP32 models.

Getting to VPU devices, like the Intel® Neural Compute Stick, there are additional costs associated with it being a USB device. It’s actually recommended to be processing four inference requests at any given time, in order to hide the costs of data transfer from the main device to the VPU.
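
The idea of keeping several requests in flight can be sketched with the standard library alone. This is a simulation under stated assumptions, not Inference Engine code: `fake_infer` is a hypothetical stand-in that sleeps to mimic data transfer plus compute time, and the worker threads play the role of the request slots.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor

NUM_REQUESTS = 4  # number of requests kept in flight, per the recommendation above

def fake_infer(frame):
    """Hypothetical stand-in for transfer plus inference; not an OpenVINO API."""
    time.sleep(0.01)
    return frame + 100

def run_pipeline(frames, num_requests=NUM_REQUESTS):
    """Keep up to num_requests inferences running and return results in order."""
    results = []
    in_flight = deque()
    with ThreadPoolExecutor(max_workers=num_requests) as pool:
        for frame in frames:
            if len(in_flight) == num_requests:
                # All slots are busy: wait for the oldest request first.
                results.append(in_flight.popleft().result())
            in_flight.append(pool.submit(fake_infer, frame))
        # Drain the requests that are still running.
        while in_flight:
            results.append(in_flight.popleft().result())
    return results

pipeline_results = run_pipeline(range(10))
print(pipeline_results)
```

Because the oldest request is always collected first, the results come back in frame order even though up to four inferences overlap.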

![image.png](attachment:image.png)

### Further Research
- The best programming language for machine learning and deep learning is still being debated, but here’s a [great blog post](https://towardsdatascience.com/what-is-the-best-programming-language-for-machine-learning-a745c156d6b7) to give you some further background on the topic.

- You can check out the [Optimization Guide](https://docs.openvinotoolkit.org/2019_R3/_docs_optimization_guide_dldt_optimization_guide.html) for more on the differences in optimization between devices.

## 16. Recap

In this lesson we covered:

- Basics of the Inference Engine
- Supported Devices
- Feeding an Intermediate Representation to the Inference Engine
- Making Inference Requests
- Handling Results from the Inference Engine
- Integrating the Inference Model into an App

## 17. Lesson Glossary


### Inference Engine
Provides a library of computer vision functions, supports calls to other computer vision libraries such as OpenCV, and performs optimized inference on Intermediate Representation models. Works with various plugins specific to different hardware to support even further optimizations.

### Synchronous
Such requests wait for a given request to be fulfilled prior to continuing on to the next request.

### Asynchronous
Such requests can happen simultaneously, so that the start of the next request does not need to wait on the completion of the previous.

### IECore
The main Python wrapper for working with the Inference Engine. Also used to load an `IENetwork`, check the supported layers of a given network, as well as add any necessary CPU extensions.

### IENetwork
A class to hold a model loaded from an Intermediate Representation (IR). This can then be loaded into an `IECore` and returned as an `ExecutableNetwork`.

### ExecutableNetwork
An instance of a network loaded into an `IECore` and ready for inference. It is capable of both synchronous and asynchronous requests, and holds a tuple of `InferRequest` objects.

### InferRequest
Individual inference requests, such as image by image, to the Inference Engine. Each of these contains its inputs as well as the outputs of the inference request once complete.

# LESSON 5
## Deploying an Edge App
With the OpenVINO™ Toolkit fundamentals down, you’re ready to move onto more topics to get your edge app up and running. Learn about handling input streams, MQTT and more as you finish the course!

![image.png](attachment:image.png)


## 1. Introduction

In this lesson we'll cover:

- Basics of OpenCV
- Handling Input Streams in OpenCV
- Processing Model Outputs for Additional Useful Information
- The Basics of MQTT and its use with IoT devices
- Sending statistics and video streams to a server
- Performance basics
- And finish up by thinking about additional model use cases, as well as end user needs


## 2. OpenCV Basics

OpenCV is an open-source library for various image processing and computer vision techniques that runs on a highly optimized C++ back-end, although it is available for use with Python and Java as well. It’s often helpful as part of your overall edge applications, whether using its built-in computer vision techniques or handling image processing.

### Uses of OpenCV
OpenCV has a lot of uses. In your case, you’ll largely focus on its ability to capture and read frames from video streams, as well as different pre-processing techniques, such as resizing an image to the expected input size of your model. It also has other pre-processing techniques like converting from one color space to another, which may help in extracting certain features from a frame. There are also plenty of computer vision techniques included, such as Canny Edge detection, which helps to extract edges from an image, and it extends even to a suite of different machine learning classifiers for tasks like face detection.

### Useful OpenCV functions
- `VideoCapture` - can read in a video or image and extract a frame from it for processing
- `resize` - used to resize a given frame
- `cvtColor` - can convert between color spaces
    - You may remember from a while back that TensorFlow models are usually trained with RGB images, while OpenCV is going to load frames as BGR. There was a technique with the Model Optimizer that would build the TensorFlow model to appropriately handle BGR. If you did not add that additional argument at the time, you could use this function to convert each image to RGB, but that’s going to add a little extra processing time.
- `rectangle` - useful for drawing bounding boxes onto an output image
- `imwrite` - useful for saving down a given image

See the link further down below for more tutorials on OpenCV if you want to dive deeper.

![image.png](attachment:image.png)

**Comment:** The only one of these it isn’t used for typically is the full suite of neural network training you find in the frameworks like TensorFlow and Caffe - it does have some features for [Neural Networks](https://docs.opencv.org/2.4/modules/ml/doc/neural_networks.html) available, but you’d usually use libraries for that.

### Further Research
OpenCV has some [pretty extensive tutorials](https://docs.opencv.org/master/d9/df8/tutorial_root.html) available if you want to dive deeper into this useful computer vision library. We'll look at some of the relevant material on handling camera and video inputs next.



## 3. Handling Input Streams

Being able to efficiently handle video files, image files, or webcam streams is an important part of an edge application. If I were to be running the webcam on my Macbook for instance and performing inference, a surprisingly large amount of resources get used up simply to use the webcam. That’s why it’s useful to utilize the OpenCV functions built for this - they are about as optimized for general use with input streams as you will find.

### Open & Read A Video
We saw the `cv2.VideoCapture` function in the previous video. This function takes either a zero for webcam use, or the path to the input image or video file. That’s just the first step, though. This “capture” object must then be opened with `capture.open`.

Then, you can basically make a loop by checking if `capture.isOpened`, and you can read a frame from it with `capture.read`. This read function actually returns two items, a boolean and the frame. If the boolean is false, there are no further frames to read, such as if the video is over, so you should `break` out of the loop.

### Closing the Capture
Once there are no more frames left to capture, there are a couple of extra steps to end the process with OpenCV.

- First, you’ll need to `release` the capture, which will allow OpenCV to release the captured file or stream
- Second, you’ll likely want to use `cv2.destroyAllWindows`. This will make sure any additional windows, such as those used to view output frames, are closed out
- Additionally, you may want to add a call to `cv2.waitKey` within the loop, and break the loop if your desired key is pressed. For example, if the key pressed is 27, that’s the Escape key on your keyboard - that way, you can close the stream midway through with a single button. Otherwise, you may get stuck with an open window that’s a bit difficult to close on its own.


## 4. Exercise: Handling Input Streams


It's time to really get in the thick of things for running your app at the edge. Being able to
appropriately handle an input stream is a big part of having a working AI or computer vision
application. 

In your case, you will be implementing a function that can handle camera, video or webcam
data as input. While unfortunately the classroom workspace won't allow for webcam usage,
you can also try that portion of your code out on your local machine if you have a webcam
available.

As such, the tests here will focus on using a camera image or a video file. You will not need to
perform any inference on the input frames, but you will need to do a few other image
processing techniques to show you have some of the basics of OpenCV down.

Your tasks are to:

1. Implement a function that can handle camera image, video file or webcam inputs
2. Use `cv2.VideoCapture()` and open the capture stream
3. Re-size the frame to 100x100
4. Add Canny Edge Detection to the frame with min & max values of 100 and 200, respectively
5. Save down the image or video output
6. Close the stream and any windows at the end of the application

You won't be able to test a webcam input in the workspace unfortunately, but you can use
the included video and test image to test your implementations.


# Handling Input Streams - Solution

Let's walk through each of the tasks.

> Implement a function that can handle camera image, video file or webcam inputs

The main thing here is just to check the `input` argument passed to the command line.

This will differ by application, but in this implementation, the argument parser makes note
that "CAM" is an acceptable input meaning to use the webcam. In that case, the `input_stream`
should be set to `0`, as `cv2.VideoCapture()` can use the system camera when set to zero.

The next is checking whether the input name is a filepath containing an image file type, 
such as `.jpg` or `.png`. If so, you'll just set the `input_stream` to that path. You should also
set the flag here to note it is a single image, so you can save down the image as part of one
of the later steps.

The last one is for a video file. It's mostly the same as the image, as the `input_stream` is the
filepath passed to the `input` argument, but you don't need to use a flag here.

A last thing you should consider in your app here is exception handling - does your app just
crash if the input is invalid or missing, or does it still log useful information to the user?
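
The three cases above can be sketched as a small helper. The `resolve_input` name, the `IMAGE_EXTENSIONS` set and the returned `(input_stream, image_flag)` pair are assumptions made for illustration; your own implementation may organize this differently.

```python
import os

# Extensions treated as single images; this particular set is an assumption.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}

def resolve_input(input_arg):
    """
    Map the command line input to (input_stream, image_flag).
    "CAM" selects the system camera (0); otherwise the path is used as given,
    with image_flag set when the file extension looks like an image.
    """
    if input_arg == "CAM":
        return 0, False
    extension = os.path.splitext(input_arg)[1].lower()
    if extension in IMAGE_EXTENSIONS:
        return input_arg, True
    return input_arg, False

print(resolve_input("CAM"))             # (0, False)
print(resolve_input("blue-car.jpg"))    # ('blue-car.jpg', True)
print(resolve_input("test_video.mp4"))  # ('test_video.mp4', False)
```

For the exception handling mentioned above, one option would be to check the path with `os.path.isfile` before opening the capture and log a clear message if it is missing.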

> Use `cv2.VideoCapture()` and open the capture stream

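```
import cv2

capture = cv2.VideoCapture(input_stream)
# Open the same source that was resolved above (0 for "CAM", else the path)
capture.open(input_stream)

while capture.isOpened():
    # flag is False once no more frames can be read
    flag, frame = capture.read()
    if not flag:
        break
```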

It's a bit outside of the instructions, but it's also important to check whether a key gets 
pressed within the while loop, to make it easier to exit. 

You can use:
```
key_pressed = cv2.waitKey(60)
```
to check for a key press, and then
```
if key_pressed == 27:
    break
```
to break the loop, if needed. Key 27 is the Escape button.

> Re-size the frame to 100x100

```
image = cv2.resize(frame, (100, 100))
```

> Add Canny Edge Detection to the frame with min & max values of 100 and 200, respectively

Canny Edge detection is useful for detecting edges in an image, and has been a useful
computer vision technique for extracting features. This was a step just so you could get a little
more practice with OpenCV.

```
edges = cv2.Canny(image,100,200)
```

> Display the resulting frame if it's video, or save it if it is an image

For video:
```
cv2.imshow('display', edges)
```
For a single image:
```
cv2.imwrite('output.jpg', edges)
```

> Close the stream and any windows at the end of the application

Make sure to close your windows here so you don't get stuck with them on-screen.

```
capture.release()
cv2.destroyAllWindows()
```

I can then test both an image and a video with the following:

```bash
python app.py -i blue-car.jpg
```

```bash
python app.py -i test_video.mp4
```
## From Video

**Note**: There are two small changes from the code on-screen for running on Linux machines versus Mac.

- On Mac, `cv2.VideoWriter` uses `cv2.VideoWriter_fourcc('M','J','P','G')` to write an `.mp4` file, while Linux uses `0x00000021`.
- On Mac, the output of the given code using Canny Edge Detection will run fine. However, on Linux, you'll need to use `np.dstack` to make a 3-channel array to write back to the out file, or else the video won't be able to be opened correctly: `frame = np.dstack((frame, frame, frame))`


## 7. Gathering Useful Information from Model Outputs

Training neural networks focuses a lot on accuracy, such as detecting the right bounding boxes and having them placed in the right spot. But what should you actually do with bounding boxes, semantic masks, classes, etc.? How would a self-driving car make a decision about where to drive based solely off the semantic classes in an image?

It’s important to get useful information from your model. Information from one model could even feed into an additional model: for example, traffic data from one set of days could be used to predict traffic on another set of days, such as those near a sporting event.

For the traffic example, you’d likely want to count how many bounding boxes there are, but also make sure to only count once for each vehicle until it leaves the screen. You could also consider which part of the screen they come from, and which part they exit from. Does the left turn arrow need to last longer near to a big event, as all the cars seem to be heading in that direction?

In an earlier exercise, you played around a bit with the confidence threshold of bounding box detections. That’s another way to extract useful statistics - are you making sure to throw out low confidence predictions?
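
As a rough sketch of that idea, the snippet below drops low confidence detections before counting. The dictionary layout and the sample values are assumptions for illustration only; a real model's output will need its own parsing.

```python
def filter_detections(detections, threshold=0.5):
    """Keep only detections whose confidence meets the threshold.

    Each detection is assumed to be a dict with a "confidence" key;
    adapt the lookup to your model's actual output layout.
    """
    return [d for d in detections if d["confidence"] >= threshold]


# Hypothetical detections for one frame.
frame_detections = [
    {"label": "car", "confidence": 0.92},
    {"label": "car", "confidence": 0.31},
    {"label": "truck", "confidence": 0.67},
]

kept = filter_detections(frame_detections, threshold=0.5)
print("Vehicles counted in this frame:", len(kept))
```

Counting each vehicle only once across frames would still need some form of tracking on top of this per-frame filter.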


## 8. Exercise: Process Model Outputs

Make sure to click the button below before you get started to source the correct environment.

<button id="ulab-button-4fb9f776" class="ulab-btn--primary"></button>

Let's say you have a cat and two dogs at your house. 

If both dogs are in a room together, they are best buds, and everything is going well.

If the cat and dog #1 are in a room together, they are also good friends, and everything is fine.

However, if the cat and dog #2 are in a room together, they don't get along, and you may need
to either pull them apart, or at least play a pre-recorded message from your smart speaker
to tell them to cut it out.

In this exercise, you'll receive a video where some combination of the cat and dogs may be
in view. You also will have an IR that is able to determine which of these, if any, are on screen.

While the best model for this is likely an object detection model that can identify different
breeds, I have provided you with a very basic (and overfit) model that will return three classes,
one for one or fewer pets on screen, one for the bad combination of the cat and dog #2, and
one for the fine combination of the cat and dog #1. This is within the exercise directory - `model.xml`.

It is up to you to add code that will print to the terminal anytime the bad combination of the 
cat and dog #2 are detected together. **Note**: It's important to consider whether you really
want to output a warning *every single time* both pets are on-screen - is your warning helpful
if it re-starts every 30th of a second, with a video at 30 fps?

## 8. Solution: Process Model Outputs

My approach in this exercise was to check if the bad combination of pets was on screen,
but also to track whether I already warned them in the current incident. Now, I might also
consider re-playing the warning after a certain time period in a single consecutive incident,
but the provided video file does not really have that long of consecutive timespans.

I also output a "timestamp" by checking how many frames had been processed so far 
at 30 fps.

The next step of this, which we'll look at shortly, is how you could actually send this 
information over the Internet, so that you could get an alert or even stream the video,
if necessary.

As we get further into the lesson and consider the costs of streaming images and/or video
to a server, another consideration here could be that you also save down the video *only*
when you run into this problem situation. You could potentially have a running 30 second loop
as well stored on the local device that is constantly refreshed, but the leading 30 seconds is
stored anytime the problematic pet combination is detected.
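
One way to picture that running 30 second loop is a fixed-length buffer that silently drops its oldest frame whenever a new one arrives. The sketch below assumes a 30 fps input, and the `RollingFrameBuffer` class is hypothetical, not part of the exercise code.

```python
from collections import deque

FPS = 30              # assumed frame rate of the input video
BUFFER_SECONDS = 30   # how much leading footage to keep


class RollingFrameBuffer:
    """Keep only the most recent frames, discarding the oldest automatically."""

    def __init__(self, fps=FPS, seconds=BUFFER_SECONDS):
        self.frames = deque(maxlen=fps * seconds)

    def add(self, frame):
        self.frames.append(frame)

    def snapshot(self):
        """Return a copy of the buffered frames, oldest first."""
        return list(self.frames)
```

Inside the video loop you would call `add(frame)` on every frame, and only call `snapshot()` (and write the frames out) when the problematic combination is detected.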

To run the app, I just used:

```
python app.py -m model.xml
```

Since the model was provided here in the same directory.


### From Video

Before the video loop, I added:
```
counter = 0
incident_flag = False
```
Within the loop, after a frame is read, I make sure to increment the counter: `counter+=1`.

I made an `assess_scene` function for most of the processing:
```
def assess_scene(result, counter, incident_flag):
    '''
    Based on the determined situation, potentially send
    a message to the pets to break it up.
    '''
    if result[0][1] == 1 and not incident_flag:
        timestamp = counter / 30
        print("Log: Incident at {:.2f} seconds.".format(timestamp))
        print("Break it up!")
        incident_flag = True
    elif result[0][1] != 1:
        incident_flag = False

    return incident_flag
```
And I call that within the loop right after the result is available:
```
incident_flag = assess_scene(result, counter, incident_flag)
```
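
The idea mentioned earlier of re-playing the warning after a certain time within one long incident could look something like the sketch below. The function name `should_warn`, the 10 second cooldown, and the 30 fps assumption are all illustrative choices rather than part of the solution code.

```python
FPS = 30  # assumed frame rate, matching the timestamp calculation above


def should_warn(bad_combo, counter, last_warning_frame, cooldown_seconds=10):
    """Decide whether to warn on this frame.

    Returns (warn, last_warning_frame). last_warning_frame is None when
    no incident is in progress.
    """
    if not bad_combo:
        # Incident over: reset so the next incident warns right away.
        return False, None

    if last_warning_frame is None:
        return True, counter

    if counter - last_warning_frame >= cooldown_seconds * FPS:
        # Same incident is still going after the cooldown: warn again.
        return True, counter

    return False, last_warning_frame
```

Here the frame number of the last warning replaces the boolean flag, so the same value tracks both whether an incident is active and how long ago the pets were last warned.
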

## 9. Intro to MQTT

### MQTT
MQTT stands for MQ Telemetry Transport, where the MQ came from an old IBM product line called IBM MQ for Message Queues (although MQTT itself does not use queues). That doesn’t really give many hints about its use.

MQTT is a lightweight publish/subscribe architecture that is designed for resource-constrained devices and low-bandwidth setups. It is used a lot for Internet of Things devices, or other machine-to-machine communication, and has been around since 1999. Port 1883 is reserved for use with MQTT.

### Publish/Subscribe
In the publish/subscribe architecture, there is a broker, or hub, that receives messages published to it by different clients. The broker then routes the messages to any clients subscribing to those particular messages.

This is managed through the use of what are called “topics”. One client publishes to a topic, while another client subscribes to the topic. The broker handles passing the message from the publishing client on that topic to any subscribers. These clients therefore don’t need to know anything about each other, just the topic they want to publish or subscribe to.

MQTT is one example of this type of architecture, and is very lightweight. It works well for small messages such as the count of bounding boxes; although an MQTT payload can technically carry binary data, the protocol is a poor fit for streaming video frames, so this course does not use it for that. Publish/subscribe is also used with self-driving cars, such as with the Robot Operating System, or ROS for short. There, a stop light classifier may publish on one topic, with an intermediate system that determines when to brake subscribing to that topic, and then that system could publish to another topic that the actual brake system itself subscribes to.
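
To make the routing idea concrete, here is a toy in-memory broker. It is only a sketch of the publish/subscribe pattern, not an MQTT implementation, and the `ToyBroker` class and topic names are made up for illustration.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker; not a real MQTT implementation."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Route the message to every subscriber of this topic only.
        for callback in self.subscribers[topic]:
            callback(message)


broker = ToyBroker()
received = []
broker.subscribe("stop_light", received.append)

broker.publish("stop_light", "red")   # delivered to the subscriber
broker.publish("speedometer", 42)     # no subscribers, so it is dropped
print(received)
```

Notice that the publisher never references the subscriber directly; both sides only agree on the topic name.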

### Further Research
- Visit the [main site](http://mqtt.org/) for MQTT
- A [helpful post](https://internetofthingsagenda.techtarget.com/definition/MQTT-MQ-Telemetry-Transport) on more of the basics of MQTT

## 10. Communicating with MQTT

There is a useful Python library for working with MQTT called `paho-mqtt`. Within it, there is a submodule called `client`, which is how you create an MQTT client that can publish or subscribe to the broker.

To do so, you’ll need to know the IP address of the broker, as well as the port for it. With those, you can connect the client, and then begin to either publish or subscribe to topics.

Publishing involves feeding in the topic name, as well as a dictionary containing a message that is dumped to JSON. Subscribing just involves feeding in the topic name to be subscribed to.
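
A minimal sketch of both directions is shown below. The helper names `publish_stat` and `on_message` are assumptions for illustration. The helpers only rely on the client having a `publish(topic, payload)` method and on received messages having `topic` and `payload` attributes, which matches how `paho-mqtt` behaves, but they do not import the library themselves. Newer `paho-mqtt` releases (2.x) changed the `Client` constructor, so check the documentation for your installed version before copying the commented usage.

```python
import json


def publish_stat(client, topic, name, value):
    """Publish one statistic as a JSON payload, e.g. {"speed": 42}.

    `client` is expected to be a connected paho-mqtt client, but any
    object with a compatible publish(topic, payload) method will work.
    """
    payload = json.dumps({name: value})
    client.publish(topic, payload)
    return payload


def on_message(client, userdata, msg):
    """Subscriber callback: decode the JSON payload of a received message."""
    data = json.loads(msg.payload.decode("utf-8"))
    print("Received on {}: {}".format(msg.topic, data))
    return data


# With paho-mqtt installed and a broker running, usage would look like:
#   import paho.mqtt.client as mqtt
#   client = mqtt.Client()
#   client.on_message = on_message
#   client.connect(broker_ip, broker_port)   # placeholders for your broker
#   client.subscribe("speedometer")
#   publish_stat(client, "speedometer", "speed", 42)
```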

You’ll need the [documentation](https://pypi.org/project/paho-mqtt/) for `paho-mqtt` to answer the quiz below.

![image.png](attachment:image.png)

### Further Research
- As usual, documentation is your friend. Make sure to check out the documentation on [PyPi](https://pypi.org/project/paho-mqtt/) related to the `paho-mqtt` Python library if you want to learn more about its functionality.
- Intel® has a [pretty neat IoT tutorial](https://software.intel.com/en-us/SetupGateway-MQTT) on working with MQTT with Python you can check out as well.

## 11. Streaming Images to a Server

Sometimes, you may still want a video feed to be streamed to a server. A security camera that detects a person where they shouldn’t be and sends an alert is useful, but you likely want to then view the footage. Since MQTT is not a good fit for streaming images, we have to look elsewhere.

At the start of the course, we noted that network communications can be expensive in cost, bandwidth and power consumption. Video streaming consumes a ton of network resources, as it requires a lot of data to be sent over the network, clogging everything up. Even with high-speed internet, multiple users streaming video can cause things to slow down. As such, it’s important to first consider whether you even need to stream video to a server, or at least only stream it in certain situations, such as when your edge AI algorithm has detected a particular event.

### FFmpeg
Of course, there are certainly situations where streaming video is necessary. The FFmpeg library is one way to do this. The name comes from “fast forward” MPEG, meaning it’s supposed to be a fast way of handling the MPEG video standard (among others).

In our case, we’ll use the `ffserver` feature of FFmpeg, which, similar to MQTT, will actually have an intermediate FFmpeg server that video frames are sent to. The final Node server that displays a webpage will actually get the video from that FFmpeg server.

There are other ways to handle streaming video as well. In Python, you can also use a flask server to do some similar things, although we’ll focus on FFmpeg here.

### Setting up FFmpeg
You can download FFmpeg from ffmpeg.org. Using `ffserver` in particular requires a configuration file that we will provide for you. This config file sets the port and IP address of the server, as well as settings like the ports to receive video from, and the framerate of the video. These settings can also allow it to listen to the system stdout buffer, which is how you can send video frames to it in Python.

### Sending frames to FFmpeg
With the `sys` Python library, you can use `sys.stdout.buffer.write(frame)` and `sys.stdout.flush()` to send the frame to the `ffserver` when it is running.

If you have a `ffmpeg` folder containing the configuration file for the server, you can launch the `ffserver` with the following from the command line:
```
sudo ffserver -f ./ffmpeg/server.conf
```
From there, you need to actually pipe the information from the Python script to FFmpeg. To do so, you add the | symbol after the python script (as well as being after any related arguments to that script, such as the model file or CPU extension), followed by `ffmpeg` and any of its related arguments.

For example:
```
python app.py -m "model.xml" | ffmpeg -framerate 24
```
And so on with additional arguments before or after the pipe symbol depending on whether they are for the Python application or for FFmpeg.
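
Because the pipe carries everything the script writes to stdout, any ordinary `print()` output would end up mixed into the video data. A small sketch of keeping the two apart is below; the helper names `send_frame` and `log` are hypothetical.

```python
import sys


def send_frame(frame_bytes, stream=None):
    """Write one raw frame to a binary stream and flush it immediately.

    Defaults to sys.stdout.buffer, which is what the pipe to ffmpeg reads.
    """
    if stream is None:
        stream = sys.stdout.buffer
    stream.write(frame_bytes)
    stream.flush()


def log(message):
    """Send text to stderr so it never mixes with the video bytes on stdout."""
    print(message, file=sys.stderr)
```

With an OpenCV frame you could pass the array itself or its bytes; the optional `stream` argument mainly exists so the function can be exercised without a real pipe.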

### Further Research
We covered FFmpeg and `ffserver`, but as you may guess, there are also other ways to stream video to a browser. Here are a couple other options you can investigate for your own use:

- [Set up Your Own Server on Linux](https://opensource.com/article/19/1/basic-live-video-streaming-server)
- [Use Flask and Python](https://www.pyimagesearch.com/2019/09/02/opencv-stream-video-to-web-browser-html-page/)

## 12. Handling Statistics and Images from a Node Server

[Node.js](https://nodejs.org/en/about/) is an open-source environment for servers, where JavaScript can be run outside of a browser. Consider a social media page, for instance - that page is going to contain different content for each different user, based on their social network. Node allows for JavaScript to run outside of the browser to gather the various relevant posts for each given user, and then send those posts to the browser.

In our case, a Node server can be used to handle the data coming in from the MQTT and FFmpeg servers, and then actually render that content for a web page user interface.

### More on Front-End
Check out the [Front End Developer Nanodegree program](https://www.udacity.com/course/front-end-web-developer-nanodegree--nd0011) if you want to learn more of these skills!



## 13. Exercise: Server Communications

# Server Communications

Make sure to click the button below before you get started to source the correct environment.

<button id="ulab-button-66f8bc80" class="ulab-btn--primary"></button>

In this exercise, you will practice showing off your new server communication skills
for sending statistics over MQTT and images with FFMPEG.

The application itself is already built and able to perform inference, and a node server is set
up for you to use. The main node server is already fully ready to receive communications from
MQTT and FFMPEG. The MQTT node server is fully configured as well. Lastly, the ffserver is 
already configured for FFMPEG too.

The current application simply performs inference on a frame, gathers some statistics, and then 
continues onward to the next frame. 

## Tasks

Your tasks are to:

- Add any code for MQTT to the project so that the node server receives the calculated stats
  - This includes importing the relevant Python library
  - Setting IP address and port
  - Connecting to the MQTT client
  - Publishing the calculated statistics to the client
- Send the output frame (**not** the input image, but the processed output) to the ffserver

## Additional Information

Note: Since you are given the MQTT Broker Server and Node Server for the UI, you need 
certain information to correctly configure, publish and subscribe with MQTT.
- The MQTT port to use is 3001 - the classroom workspace only allows ports 3000-3009
- The topics that the UI Node Server is listening to are "class" and "speedometer"
- The Node Server will attempt to extract information from any JSON received from the MQTT server with the keys "class_names" and "speed"

## Running the App

First, get the MQTT broker and UI installed.

- `cd webservice/server`
- `npm install`
- When complete, `cd ../ui`
- And again, `npm install`

You will need *four* separate terminal windows open in order to see the results. The steps
below should be done in a different terminal based on number. You can open a new terminal
in the workspace in the upper left (File>>New>>Terminal).

1. Get the MQTT broker installed and running.
  - `cd webservice/server/node-server`
  - `node ./server.js`
  - You should see a message that `Mosca server started.`.
2. Get the UI Node Server running.
  - `cd webservice/ui`
  - `npm run dev`
  - After a few seconds, you should see `webpack: Compiled successfully.`
3. Start the ffserver
  - `sudo ffserver -f ./ffmpeg/server.conf`
4. Start the actual application. 
  - First, you need to source the environment for OpenVINO *in the new terminal*:
    - `source /opt/intel/openvino/bin/setupvars.sh -pyver 3.5`
  - To run the app, I'll give you two items to pipe in with `ffmpeg` here, with the rest up to you:
    - `-video_size 1280x720`
    - `-i - http://0.0.0.0:3004/fac.ffm`

Your app should begin running, and you should also see the MQTT broker server noting
information getting published.

In order to view the output, click on the "Open App" button below in the workspace.

## 14. Solution: Server Communications

Let's focus on MQTT first, and then FFmpeg.

### MQTT

First, I import the MQTT Python library. I use an alias here so the library is easier to work with.

```
import paho.mqtt.client as mqtt
```

I also need to `import socket` so I can connect to the MQTT server. Then, I can get the 
IP address and set the port for communicating with the MQTT server.

```
HOSTNAME = socket.gethostname()
IPADDRESS = socket.gethostbyname(HOSTNAME)
MQTT_HOST = IPADDRESS
MQTT_PORT = 3001
MQTT_KEEPALIVE_INTERVAL = 60
```

This will set the IP address and port, as well as the keep alive interval. The keep alive interval
is used so that the server and client will communicate every 60 seconds to confirm their
connection is still open, if no other communication (such as the inference statistics) is received.

Creating the client and connecting it to the broker can be accomplished with:

```
client = mqtt.Client()
client.connect(MQTT_HOST, MQTT_PORT, MQTT_KEEPALIVE_INTERVAL)
```

Note that `mqtt` in the above was my import alias - if you used something different, that line
will also differ slightly, although will still use `Client()`.

The final piece for MQTT is to actually publish the statistics through the connected client.

```
import json  # needed for json.dumps below

topic = "some_string"
client.publish(topic, json.dumps({"stat_name": statistic}))
```

The topic here should match to the relevant topic that is being subscribed to from the other
end, while the JSON being published should include the relevant name of the statistic for
the node server to parse (with the name like the key of a dictionary), with the statistic passed
in with it (like the items of a dictionary).

```
client.publish("class", json.dumps({"class_names": class_names}))
client.publish("speedometer", json.dumps({"speed": speed}))
```

And, at the end of processing the input stream, make sure to disconnect.

```
client.disconnect()
```

### FFmpeg

FFmpeg does not actually have any real specific imports, although we do want the standard
`sys` library:

```
import sys
```

This is used as the `ffserver` can be configured to read from `sys.stdout`. Once the output
frame has been processed (drawing bounding boxes, semantic masks, etc.), you can write
the frame to the `stdout` buffer and `flush` it.

```
sys.stdout.buffer.write(frame)  
sys.stdout.flush()
```

And that's it! As long as the MQTT and FFmpeg servers are running and configured
appropriately, the information should be able to be received by the final node server, 
and viewed in the browser.

To run the app itself, with the UI server, MQTT server, and FFmpeg server also running, do:

```
python app.py | ffmpeg -v warning -f rawvideo -pixel_format bgr24 -video_size 1280x720 -framerate 24 -i - http://0.0.0.0:3004/fac.ffm
```

This will feed the output of the app to FFmpeg.
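
Since `rawvideo` input has no headers, FFmpeg relies on `-video_size` and `-pixel_format` to know where one frame ends and the next begins, so each frame the app writes needs to be exactly that many bytes. The sketch below works through the arithmetic and rebuilds the command above as an argument list; the helper names are hypothetical.

```python
def frame_size_bytes(width, height, channels=3):
    """Bytes in one raw frame; bgr24 uses 3 bytes (B, G, R) per pixel."""
    return width * height * channels


def build_ffmpeg_args(width, height, framerate, url):
    """Assemble the ffmpeg argument list used in the command above."""
    return [
        "ffmpeg", "-v", "warning",
        "-f", "rawvideo",
        "-pixel_format", "bgr24",
        "-video_size", "{}x{}".format(width, height),
        "-framerate", str(framerate),
        "-i", "-",   # read the frames from stdin (the pipe)
        url,
    ]


print(frame_size_bytes(1280, 720))
print(" ".join(build_ffmpeg_args(1280, 720, 24, "http://0.0.0.0:3004/fac.ffm")))
```

For 1280x720 in `bgr24` that comes to 2,764,800 bytes per frame, so if your output frame has a different shape it would need resizing to 1280x720 before being written.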

### From Video

**Note:** You will need to use port 3001 in the workspace for MQTT within `app.py` instead of the standard port 1883.

**Note:** The port here is 3001, instead of the normal MQTT port of 1883, as our classroom workspace environment only allows ports from 3000-3009 to be used. The real importance is here to make sure this matches to what is set for the MQTT broker server to be listening on, which in this case has also been set to 3001 (you can see this in `config.js` within the MQTT server's files in the workspace).


## 15. Analyzing Performance Basics

We’ve talked a lot about optimizing for inference and running apps at the edge, but it’s important not to skip past the accuracy of your edge AI model. Lighter, quicker models are helpful for the edge, and certain optimizations like lower precision that help with these will have some impact on accuracy, as we discussed earlier on.

No amount of skillful post-processing and attempting to extract useful data from the output will make up for a poor model choice, or one where too many sacrifices were made for speed.

Of course, it all depends on the exact application as to how much loss of accuracy is acceptable. Detecting a pet getting into the trash likely can handle less accuracy than a self-driving car in determining where objects are on the road.

The considerations of speed, size and network impacts are still very important for AI at the Edge. Faster models can free up computation for other tasks, lead to less power usage, or allow for use of cheaper hardware. Smaller models can also free up memory for other tasks, or allow for devices with less memory to begin with. We’ve also discussed some of the network impacts earlier. Especially for remote edge devices, the power costs of heavy network communication may significantly hamper their use.

Lastly, there can be other differences in cloud vs edge costs other than just network effects. While potentially lower up front, cloud storage and computation costs can add up over time. Data sent to the cloud could be intercepted along the way. Whether this is better or not at the edge does depend on a secure edge device, which isn’t always the case for IoT.
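
To compare models on the speed side of these trade-offs, a simple starting point is to time repeated calls and average them. The sketch below uses a stand-in workload; `average_latency_ms` and `fake_inference` are hypothetical names, and you would swap in your own inference call.

```python
import time


def average_latency_ms(fn, runs=20):
    """Call fn() `runs` times and return the mean wall-clock time in ms."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return (elapsed / runs) * 1000.0


def fake_inference():
    """Stand-in workload; replace with your real inference call."""
    sum(i * i for i in range(10000))


latency = average_latency_ms(fake_inference)
print("Average latency: {:.3f} ms".format(latency))
```

Averaging over several runs smooths out one-off delays, though the numbers will still vary by device, which is exactly why measuring on the target edge hardware matters.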

![image.png](attachment:image.png)

### Further Research
- We'll cover more on performance with the Intel® Distribution of OpenVINO™ Toolkit in later courses, but you can check out the [developer docs](https://docs.openvinotoolkit.org/2019_R3/_docs_IE_DG_Intro_to_Performance.html) here for a preview.
- Did you know [Netflix uses 15% of worldwide bandwidth](https://www.sandvine.com/hubfs/downloads/phenomena/phenomena-presentation-final.pdf) with its video streaming? Cutting down on streaming your video to the cloud vs. performing work at the edge can vastly cut down on network costs.

## 16. Model Use Cases

It’s important to think about any additional use cases for a given model or application you build, which can reach far beyond the original training set or intended use. For example, object detection can be used for so many things, and focusing on certain classes along with some post-processing can lead to very different applications.

## 17. Concerning End User Needs

If you are building an app for certain end users, it’s very important to consider their needs. Knowing their needs can inform the various trade-offs you make regarding model decisions (speed, size, accuracy, etc.), what information to send to servers, security of information, etc. If they have more resources available, you might go for a higher accuracy but more resource-intensive app, while an end user with remote, low-power devices will likely have to sacrifice some accuracy for a lighter, faster app, and need some additional considerations about network usage.

This is just to get you thinking - building edge applications is about more than just models and code.

## 18. Recap

In this lesson we covered:

- Basics of OpenCV
- Handling Input Streams in OpenCV
- Processing Model Outputs for Additional Useful Information
- The Basics of MQTT and its use with IoT devices
- Sending statistics and video streams to a server
- Performance basics
- Thinking about additional model use cases, as well as end user needs

## 19. Lesson Glossary

### OpenCV
A computer vision (CV) library filled with many different computer vision functions and other useful image and video processing and handling capabilities.

### MQTT
A publisher-subscriber protocol often used for IoT devices due to its lightweight nature. The paho-mqtt library is a common way of working with MQTT in Python.

### Publish-Subscribe Architecture
A messaging architecture made up of publishers that send messages to a central broker without knowing who the subscribers are. These messages are posted on a given “topic”, which subscribers can listen to without having to know the publisher itself, just the “topic”.

### Publisher
In a publish-subscribe architecture, the entity that is sending data to a broker on a certain “topic”.

### Subscriber
In a publish-subscribe architecture, the entity that is listening to data on a certain “topic” from a broker.

### Topic
In a publish-subscribe architecture, data is published to a given topic, and subscribers to that topic can then receive that data.

### FFmpeg
Software that can help convert or stream audio and video. In the course, the related `ffserver` software is used to stream to a web server, which can then be queried by a Node server for viewing in a web browser.

### Flask
A [Python framework](https://www.fullstackpython.com/flask.html) useful for web development and another potential option for video streaming to a web browser.

### Node Server
A web server built with Node.js that can handle HTTP requests and/or serve up a webpage for viewing in a browser.



## 20. Course Recap


You’ve accomplished something amazing! You went from the basics of AI at the Edge, built your skills with pre-trained models, the Model Optimizer, Inference Engine and more with the Intel® Distribution of OpenVINO™ Toolkit, and even learned more about deploying an app at the edge. Best of luck on the project, and I look forward to seeing what you build next!

### Intel® DevMesh
Check out the [Intel® DevMesh](https://devmesh.intel.com/) website for some more awesome projects others have built, join in on existing projects, or even post some of your own!

### Continuing with the Toolkit
If you want to learn more about OpenVINO, you can download the toolkit here:

[Download the Toolkit](https://software.intel.com/en-us/openvino-toolkit/choose-download?)
