# Agenda

## 1. [Introduction](#s1)

## 2. [OpenVINO™ Overview](#s6) - Kashchikhin

## 3. [OpenVINO™ Deep Learning Workbench](#s7) - Kashchikhin

## 4. [OpenVINO™ API](#s7) – Tugaryov

Object Detection sample: http://127.0.0.1:5665/jupyter/lab/tree/tutorials/object_detection_ssd/tutorial_object_detection_ssd.ipynb
Without the downloader and with updated cells

## 5. [Practice](#s15) – Tugaryov

Task 1: apply pre-defined blur method to given image at inferred coordinates (photo with several faces)

Task 2: add blurring logic to pre-defined video processor

Task 3: replace each face on the photo with a smile with corresponding emotion

Task 4: emotional smile on the video

** Task 5: Telegram Bot - Smiler

# Intro

## Example #1

![](pictures/deep-learning.png)

Author: Mark Robins, Intel, [Link](https://www.intel.ru/content/www/ru/ru/artificial-intelligence/posts/deep-learning-training-and-inference.html)

## Example #2

![](pictures/use-cases.png)

Author: Mark Robins, Intel, [Link](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/difference-between-ai-machine-learning-deep-learning.html)

## Example #3

![](pictures/system.png)

Author: Mark Robins, Intel, [Link](https://www.intel.ru/content/www/ru/ru/artificial-intelligence/posts/deep-learning-training-and-inference.html)

## Key Concepts
1. Neural Network

Artificial neural networks (ANNs) consist of node layers: an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to nodes in the next layer and has associated weights and a non-linear activation function.
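
To make the layer structure concrete, here is a minimal sketch of a forward pass through one hidden layer and one output node. It uses plain Python lists; the weights, biases, and the choice of ReLU and sigmoid activations are illustrative assumptions, not taken from any real model.

```python
import math


def relu(x):
    # Non-linear activation: pass positive values, zero out negative ones
    return max(0.0, x)


def sigmoid(x):
    # Non-linear activation that squashes any value into the (0, 1) range
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(weights . inputs + bias) per node."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


# Hypothetical parameters: 2 inputs -> 2 hidden nodes -> 1 output node
hidden_weights = [[1.0, -1.0], [0.5, 0.5]]
hidden_biases = [0.0, -0.5]
output_weights = [[1.0, 2.0]]
output_biases = [0.0]


def forward(inputs):
    hidden = dense(inputs, hidden_weights, hidden_biases, relu)
    return dense(hidden, output_weights, output_biases, sigmoid)


prediction = forward([2.0, 1.0])
print(prediction)
```

Calling `forward` on new data without changing the weights is exactly what the next concept, inference, refers to.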

2. Inference

Process of neural network execution: feeding data to the network and getting the results. 

3. Dataset

A dataset is a collection of data that can be treated by a neural network as a single unit for analytic and prediction purposes.

4. Optimization

Accelerating the inference of deep learning models by applying special methods that require no model retraining or fine-tuning, such as post-training quantization.
Quantization is the process of transforming models trained in floating-point precision into models with integer representation, with floating/fixed-point quantization operations between the layers.
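
The idea of mapping floating-point values to integers can be sketched as follows. This is a simplified symmetric per-tensor scheme with a single scale factor, chosen for illustration; it is an assumption and not the algorithm that the OpenVINO™ tools apply. The weight values are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


# Hypothetical FP32 weights of one layer
weights = [0.02, -1.27, 0.64, 0.0]

q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

for original, approx in zip(weights, restored):
    print(f'{original:+.4f} -> {approx:+.4f}')
```

The round trip is lossy: each restored value can differ from the original by up to half of `scale`, which is why accuracy should be measured again after quantization.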

5. Accuracy (?)

A measure of how well a neural network solves its task. Accuracy can be represented by different metrics depending on the task.
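
As an illustration of task-dependent metrics, the sketch below computes top-1 accuracy for a classification task and intersection over union (IoU) for a pair of detection boxes. Both metrics are common choices picked here as examples; the labels and boxes are made-up sample data.

```python
def top1_accuracy(predicted_labels, true_labels):
    """Share of samples whose predicted label equals the true label."""
    if not true_labels or len(predicted_labels) != len(true_labels):
        raise ValueError('Label lists must be non-empty and of equal length')
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


def intersection_over_union(box_a, box_b):
    """Overlap of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0


# Made-up sample data
predicted = ['happy', 'sad', 'neutral', 'happy']
expected = ['happy', 'sad', 'happy', 'happy']
accuracy = top1_accuracy(predicted, expected)

iou = intersection_over_union((0, 0, 2, 2), (1, 1, 3, 3))

print(f'accuracy={accuracy}, iou={iou:.3f}')
```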

6. Deployment (?)

# OpenVINO™

The OpenVINO™ toolkit is a comprehensive toolkit for quickly developing applications and solutions that solve a variety of tasks including emulation of human vision, automatic speech recognition, natural language processing, recommendation systems, and many others. Based on the latest generations of artificial neural networks, including Convolutional Neural Networks (CNNs), recurrent and attention-based networks, the toolkit extends computer vision and non-vision workloads across Intel® hardware, maximizing performance. It accelerates applications with high-performance AI and deep learning inference deployed from edge to cloud.

## General Information

![](pictures/about_vino.png)

## OpenVINO™ capabilities

![](pictures/openvino_toolkit.png)

## OpenVINO™ tools

![](pictures/additional_tools.png)

## Fine-tuned and optimized OpenVINO™ models
### For various tasks
Select a **pretrained** model or models suitable for your needs from the Open Model Zoo.

![](pictures/models.png)

# OpenVINO™ Deep Learning Workbench

Deep Learning Workbench (DL Workbench) is an official OpenVINO™ graphical interface designed to make the production of pretrained deep learning models significantly easier.

DL Workbench combines OpenVINO™ tools to assist you with the most commonly used tasks: import a model, analyze its performance and accuracy, visualize the outputs, optimize and prepare the model for deployment in a matter of minutes. DL Workbench will take you through the full OpenVINO™ workflow, providing the opportunity to learn about various toolkit components.

![](pictures/openvino_toolkit-dl-wb-highlighted.png)

## DL Workbench capabilities

![](pictures/openvino_dl_wb.png)

## DL Workbench Workflow
1. Select a model
    * Import a model
    * Download from Open Model Zoo
2. Create a project
    * Select a target environment
    * Select or create a dataset
    * Run inference
3. Optimize the model
    * Apply INT8 calibration
        * If performance is more important than accuracy: Default method
        * If accuracy is of paramount importance: Accuracy Aware method
4. Assess the quality
    * Analyze throughput and latency
    * Measure accuracy
    * Visualize model output
    * etc.
5. Find an optimal inference configuration (profile the model)
    * Find an optimal batch and stream combination
6. Prepare to deploy
    * Create a deployment package

## DL WB Workflow

### 1. Select/Import a model

#### From Open Model Zoo

![](pictures/import_model_wb_omz.png)

#### Upload a local model

![](pictures/upload-local-model.png)

### 2. Import or create a dataset

We have created a custom dataset which you can use during this workshop.

**TODO**: insert link.

1. Execute the following cell to download an archive with images.

In [None]:
!wget -O dataset.zip LINK_TO_DATASET

2. You should see a `dataset.zip` on the left in the file tree;

3. Download it on your machine (right-click + `Download`);
4. Unarchive it;
5. Go to DL Workbench;

![](pictures/validation_dataset_import.png)

6. Drag & drop the images from the archive to create a not-annotated dataset.

![](pictures/custom_dataset.png)

### 3. Create a project (Run inference)

![](pictures/create_project_selected.png)

![](pictures/dashboard-page.png)

### 4. Analyze the model

#### Performance

![](pictures/analyze.png)

#### Predictions

![](pictures/predictions.png)

### 5. Optimize the model

#### INT8 Quantization

![](pictures/calibration-int8.png)

#### Analyze the Improvements

![](pictures/dashboard-parent-vs-optimized.png)

### 6. Profile the model

Profile the model to find the best inference parameters.

NOTE: Both original and optimized models could be profiled.

![](pictures/group_inference.png)

![](pictures/group_inference_results_01.png)

## DL WB Workflow recap

![](pictures/DL_WB_workflow.gif)

## App - Face Replacer
Face detection, Emotion recognition

#### (INSERT) Picture before - picture after

### Plan
#### Part 0 - Obtain a model
0. Go to DL WB
1. Find a suitable face-detection model 
2. Experiment with it, optimize, assess results
3. Export\download the model

#### Part 1 - OpenVINO Python API + minimal app

Show tutorial_object_detection for OV Python API

4. Prerequisites
    * copy the model path from DL WB
    * sample data (video) is placed in the folder with this notebook
5. OpenVINO Python API for work with neural networks
6. Image/video pre-processing with OpenCV
7. Neural network execution - Inference
8. Results processing
    * Describe the model and its output so that it is understandable how to post-process
9. Have a video with faces replaced

#### Part 2 - Enriching/Building upon the app / Adding new functionality

9. Prepare another neural network
10. Integrate new network in the app

OR - give a choice of either following the presenter with deployment or continuing with the emotion recognition

#### Part 3 - Deploy the app
11. Prepare deployment package\bundle with model and download it
    * Ubuntu - go to DL WB
    * non-Ubuntu - supply with os-specific bundles
12. Prepare platform
    * Copy/download the necessary assets (OpenVINO deployment package, model)
    * Prepare environment using setupvars
13. Prepare sample\application and a Telegram bot
    * Clone the repository with the template
    * Copy your code from the notebook and integrate in the template
14. Deploy
15. Enjoy


#### Demo of the completed bot/application in case we run out of time

# OpenVINO™ API

The purpose of this tutorial is to examine a sample application that was created using the [Intel® Distribution of Open Visual Inference & Neural Network Optimization (OpenVINO™) toolkit](https://software.intel.com/openvino-toolkit). The tutorial walks step by step through object detection on images. Object detection is performed by running a pre-trained network with the Inference Engine of the Intel® Distribution of OpenVINO™ toolkit.

Object detection in computer vision is the task of finding objects in an image and locating them.

The tutorial guides you through the following steps:

1. [Import required modules](#1.-Import-Required-Modules) 
3. [Configure inference: path to a model and other data](#3.-Configure-an-Inference)
4. [Initialize the OpenVINO™ runtime](#4.-Initialize-the-OpenVINO™-Runtime)
5. [Read the model](#5.-Read-the-Model)
6. [Make the model executable](#6.-Make-the-Model-Executable)
7. [Prepare an image for model inference](#7.-Prepare-an-Image-for-Model-Inference)
8. [Infer the model](#8.-Infer-the-Model)
9. [Show predictions](#9.-Show-Predictions)

### 1. Import Required Modules

Import the Python* modules that you will use in the sample code:
- [os](https://docs.python.org/3/library/os.html#module-os) is a standard Python module used for working with file paths.
- [cv2](https://docs.opencv.org/trunk/) is an OpenCV module used to work with images.
- [NumPy](http://www.numpy.org/) is an array manipulation module used to process images as arrays.
- [OpenVINO Inference Engine](https://docs.openvinotoolkit.org/latest/openvino_docs_IE_DG_Deep_Learning_Inference_Engine_DevGuide.html) is an OpenVINO™ Python API module used for inference.
- [IPython](https://ipython.readthedocs.io/en/stable/index.html) is an IPython API used for showing images and videos in the notebook.

Run the cell below to import the modules. 

In [None]:
import os
import cv2
import numpy as np
from openvino.inference_engine import IECore
from IPython.display import HTML, Image, display

### 3. Configure an Inference

Once you have the OpenVINO™ IR of your model, you can start experimenting with it by inferring it and inspecting its output. 

> **NOTE**: Copy the paths to the `.xml` and `.bin` files from the DL Workbench UI and paste them below.

#### Required parameters

Parameter| Explanation
---|---
**face_detection_model_xml**| Path to the `.xml` file of OpenVINO™ IR of your model
**face_detection_model_bin**| Path to the `.bin` file of OpenVINO™ IR of your model

In [None]:
# Model IR files
face_detection_model_xml = 'data/models/face-detection-adas-0001.xml'
face_detection_model_bin = 'data/models/face-detection-adas-0001.bin'

#### Optional Parameters

Experiment with optional parameters after you go through the full workflow of the tutorial.

Parameter| Explanation
---|---
**input_image_path**| Path to an input image. Use the `car.bmp` image placed in the directory of the notebook or, if you have imported a dataset in the DL Workbench, copy the path to an image in the dataset.
**device**| Specify the [target device](https://docs.openvinotoolkit.org/latest/workbench_docs_Workbench_DG_Select_Environment.html) to infer on: CPU, GPU, or MYRIAD. Note that the device must be present. For this tutorial, use `CPU` which is known to be present.
**prob_threshold**| Probability threshold to filter detection results

In [None]:
# Input image file. 
input_image_path = 'data/input_image.JPG'

# Input video file
input_video_path = 'data/input.mp4'

# Output video file
output_video_path = 'data/output.mp4'

# Device to use
device = 'CPU'

# Minimum percentage threshold to detect an object
prob_threshold = 50

print(
f'''Configuration parameters settings:
    model_xml={face_detection_model_xml},
    model_bin={face_detection_model_bin},
    input_image_path={input_image_path},
    device={device}, 
    prob_threshold={prob_threshold}''',
)

### 4. Initialize the OpenVINO™ Runtime

Once you define the parameters, create the `IECore` object that gives access to the OpenVINO™ runtime capabilities.

In [None]:
# Create an Inference Engine instance
ie_core = IECore()

### 5. Read the Model

Load the IR of your model into memory.

In [None]:
# Read the network from IR files
face_detection_network = ie_core.read_network(model=face_detection_model_xml, weights=face_detection_model_bin)

### 6. Make the Model Executable

Reading a network is not enough to start a model inference. The model must be loaded to a particular abstraction representing a particular accelerator. In OpenVINO™, this abstraction is called *plugin*. A network loaded to a plugin becomes executable and will be inferred in one of the next steps. 

After loading, we keep necessary model information such as names of input and output blobs: `input_blob` and `output_blob`. Let's remember the input dimensions of your model:
- `n` - input batch size
- `c` - number of input channels. Often, it is `1` or `3`, which means that the model expects either a grayscale or a color image.
- `h` - input image height
- `w` - input image width
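
To see what the `n, c, h, w` layout means for the data itself, here is a small sketch that rearranges an image from the height-width-channel (HWC) order used by OpenCV into a one-image NCHW batch. It uses nested Python lists to stay dependency-free; the notebook code does the same with NumPy `transpose` and `reshape`. The helper names and the tiny image are illustrative assumptions.

```python
def hwc_to_nchw(image):
    """Convert one H x W x C image (rows of pixels) to a 1 x C x H x W batch."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    chw = [
        [[image[y][x][ch] for x in range(width)] for y in range(height)]
        for ch in range(channels)
    ]
    # Wrap in one more list: batch size n = 1
    return [chw]


def shape(nested):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Hypothetical image: 2 rows, 3 columns, 3 channel values per pixel
image = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[10, 11, 12], [13, 14, 15], [16, 17, 18]],
]

batch = hwc_to_nchw(image)
print(shape(image), '->', shape(batch))
```

After the conversion, `batch[0][0]` holds the first channel of every pixel as an `h x w` plane, which is the form the network input expects.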

In [None]:
face_detection_executable_network = ie_core.load_network(network=face_detection_network, device_name=device)

# Store names of input and output blobs
face_detection_input_blob = next(iter(face_detection_network.input_info))
face_detection_output_blob = next(iter(face_detection_network.outputs))

# Read the input dimensions: n=batch size, c=number of channels, h=height, w=width
face_detection_n, face_detection_c, face_detection_h, face_detection_w = face_detection_network.input_info[face_detection_input_blob].input_data.shape
print(f'Loaded the model into the Inference Engine for the {device} device.')
print(f'Face Detection model input dimensions: n={face_detection_n}, c={face_detection_c}, h={face_detection_h}, w={face_detection_w}')

### 7. Prepare an Image for Model Inference

Now let's read and prepare the input image by resizing and re-arranging its dimensions according to the input dimensions of the model.

In [None]:
# Define the function to load the input image
def load_input_image(input_path):   
    # Use OpenCV to load the input image
    image = cv2.imread(input_path)
    return image

# Define the function to pre-process the input image
def pre_process_input_image(image, n, c, h, w):
    # Resize the image dimensions from image to model input w x h
    in_frame = cv2.resize(image, (w, h))
    # Change data layout from HWC to CHW
    in_frame = in_frame.transpose((2, 0, 1))  
    # Reshape to input dimensions
    in_frame = in_frame.reshape((n, c, h, w))
    return in_frame

def show_images(image: np.ndarray):
    _, data = cv2.imencode('.jpg', image) 
    image = Image(data=data)
    display(image)

# Use OpenCV to load the input image
original_image = cv2.imread(input_image_path)
original_image_h, original_image_w, *_ = original_image.shape

# Resize the input image
input_frame = pre_process_input_image(original_image, face_detection_n, face_detection_c, face_detection_h, face_detection_w)

# Display the input image
show_images(original_image)

### 8. Infer the Model

Run the inference on the prepared input frame.

In [None]:
face_detection_inference_results = face_detection_executable_network.infer(
    inputs={
        face_detection_input_blob: input_frame
    }
)   

### 9. Show Predictions

The next step is to parse the inference results and draw boxes over the objects detected in the image.

A result of model inference (`res`) is an array of predictions. Each prediction `obj` has the following structure:

- `obj[0]`: ID of the image in the batch
- `obj[1]`: class ID, or the type of a detected object
- `obj[2]`: confidence level that the detected object is an instance of the predicted class
- `obj[3]`: minimum x coordinate (left edge) of the detected object
- `obj[4]`: minimum y coordinate (top edge) of the detected object
- `obj[5]`: maximum x coordinate (right edge) of the detected object
- `obj[6]`: maximum y coordinate (bottom edge) of the detected object

The coordinates are normalized to the `[0, 1]` range, so the code below multiplies them by the width and height of the original image.
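
The structure above can be turned into pixel boxes with a few lines of code. This is a sketch under stated assumptions: element `0` is taken to be the image ID (the SSD-style 7-value layout), the confidence threshold is a fraction rather than the percentage used in the notebook, and `detections_to_boxes` with its sample input is hypothetical. Unlike the notebook code, it also clamps boxes to the image borders.

```python
def detections_to_boxes(detections, image_width, image_height, min_confidence):
    """Filter predictions by confidence and convert them to pixel boxes."""
    boxes = []
    for _image_id, _class_id, confidence, x_min, y_min, x_max, y_max in detections:
        if confidence < min_confidence:
            continue
        # Scale normalized coordinates and clamp them to the image area
        left = max(0, min(image_width, int(x_min * image_width)))
        top = max(0, min(image_height, int(y_min * image_height)))
        right = max(0, min(image_width, int(x_max * image_width)))
        bottom = max(0, min(image_height, int(y_max * image_height)))
        # Skip boxes that are empty after clamping
        if right > left and bottom > top:
            boxes.append((left, top, right, bottom, confidence))
    return boxes


# Made-up predictions: [image_id, class_id, confidence, x_min, y_min, x_max, y_max]
detections = [
    [0, 1, 0.98, 0.25, 0.125, 0.5, 0.375],
    [0, 1, 0.30, 0.5, 0.5, 0.75, 0.75],       # below the threshold
    [0, 1, 0.75, -0.0625, 0.5, 0.25, 1.125],  # partly outside the image
]

boxes = detections_to_boxes(detections, image_width=640, image_height=480, min_confidence=0.5)
print(boxes)
```

Clamping matters later in the practice tasks, where the box is used to slice the image: a negative index would silently select the wrong region.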

For each detected object, the output from the model will include an integer to indicate which type of the object, such as car or human, has been detected. To translate the integer into a more readable text string, use a label mapping file. The label mapping file is a text file of the format `n: string` (for example, `7: car`) that is loaded into a lookup table to be used later when labeling detected objects.

Now we have an image where every detected object is bounded with a box with class id and confidence level. To replace class ids with their names, you need a label mapping file. You can find the sample label mapping file in the current directory with the name `labels.txt`.
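
A label mapping in the `n: string` format described above can be loaded into a lookup table as sketched below. `parse_label_map` is a hypothetical helper, and the sample lines (apart from `7: car`) are invented. The function accepts any iterable of lines, so an opened `labels.txt` file object can be passed in directly.

```python
def parse_label_map(lines):
    """Build {class_id: label} from lines of the form 'n: string'."""
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        class_id, _, name = line.partition(':')
        labels[int(class_id)] = name.strip()
    return labels


# Sample content; '15: person' is an invented entry
sample_lines = ['7: car', '15: person', '']

labels = parse_label_map(sample_lines)

# Fall back to a placeholder for class IDs missing from the file
print(labels.get(7, 'unknown'), labels.get(3, 'unknown'))
```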

In [None]:
# Function to process inference results
def process_face_detection_results(original_image, results):
    # Draw on a copy so that the original image stays untouched
    processed_image = original_image.copy()
    # Get output results
    result = results[face_detection_output_blob]
    color = (12.5, 255, 255)
    original_input_h, original_input_w, *_ = original_image.shape
    
        
    # Loop through all possible results
    for face in result[0][0]:
        probability = round(face[2] * 100, 1)
        
        # If probability is more than the specified threshold, draw and label the box 
        if probability > prob_threshold:
            # Get coordinates of the box containing the detected object
            xmin = int(face[3] * original_input_w)
            ymin = int(face[4] * original_input_h)
            xmax = int(face[5] * original_input_w)
            ymax = int(face[6] * original_input_h)

            # Draw the box and label for the detected object
            cv2.rectangle(processed_image, (xmin, ymin), (xmax, ymax), color, 4)
            cv2.putText(processed_image, f'{probability} %', (xmin, ymin - 7), cv2.FONT_HERSHEY_COMPLEX, 1, color, 2)
    return processed_image

processed_image = process_face_detection_results(original_image, face_detection_inference_results)

show_images(processed_image)

# Practice

## Task 1: apply pre-defined blur method to given image at inferred coordinates

In [None]:
def blur_region(image: np.ndarray) -> np.ndarray:
    height, width = image.shape[:2]
    pixels_count = 16
    temp = cv2.resize(image, (pixels_count, pixels_count), interpolation=cv2.INTER_LINEAR)
    return cv2.resize(temp, (width, height), interpolation=cv2.INTER_NEAREST)

In [None]:
# Work on a copy so that the original image stays untouched
processed_image = original_image.copy()
original_input_h, original_input_w = processed_image.shape[:2]

face_detection_inference_result = face_detection_inference_results[face_detection_output_blob]

for detected_face in face_detection_inference_result[0][0]:
    confidence = round(detected_face[2] * 100, 1)

    # If confidence is more than the specified threshold, blur the face
    if confidence > prob_threshold:

        # Get coordinates of the box containing the detected face
        xmin = int(detected_face[3] * original_input_w)
        ymin = int(detected_face[4] * original_input_h)
        xmax = int(detected_face[5] * original_input_w)
        ymax = int(detected_face[6] * original_input_h)

        face = processed_image[ymin:ymax, xmin:xmax]
        processed_image[ymin:ymax, xmin:xmax] = blur_region(face)

show_images(processed_image)

## Task 2: add blurring logic to pre-defined video processor

In [None]:
input_video_stream = cv2.VideoCapture(input_video_path)

input_frame_width = int(input_video_stream.get(3))   # float `width`
input_frame_height = int(input_video_stream.get(4))  # float `height`

In [None]:
def prapare_out_video_stream(input_video_stream: cv2.VideoCapture, output_video_file_path: str) -> cv2.VideoWriter:
    width  = int(input_video_stream.get(3))
    height = int(input_video_stream.get(4))
    video_writer = cv2.VideoWriter(output_video_file_path, cv2.VideoWriter_fourcc(*'avc1'), 20, (width, height))
    return video_writer

output_video_stream = prapare_out_video_stream(input_video_stream, output_video_path)

In [None]:
while input_video_stream.isOpened():
    # Read the next frame from the input video
    frame_read, original_frame = input_video_stream.read()
    # Exit from the loop if no frame was read (the video is over)
    if not frame_read:
        break
    
    # Prepare frame for inference
    in_frame = pre_process_input_image(original_frame, face_detection_n, face_detection_c, face_detection_h, face_detection_w )
    
    
    face_detection_inference_results = face_detection_executable_network.infer(inputs={face_detection_input_blob: in_frame})  
    
    inference_result = face_detection_inference_results[face_detection_output_blob]

    for detected_face in inference_result[0][0]:
        probability = round(detected_face[2] * 100, 1)

        # If probability is more than the specified threshold, blur the face
        if probability > prob_threshold:

            # Get coordinates of the box containing the detected object
            xmin = int(detected_face[3] * input_frame_width)
            ymin = int(detected_face[4] * input_frame_height)
            xmax = int(detected_face[5] * input_frame_width)
            ymax = int(detected_face[6] * input_frame_height)

            face = original_frame[ymin:ymax, xmin:xmax]
            original_frame[ymin:ymax, xmin:xmax] = blur_region(face)
    
    # Write the resulting frame to the output stream
    output_video_stream.write(original_frame)
    
input_video_stream.release()
# Save the resulting video
output_video_stream.release()

In [None]:
# Show the resulting video
HTML(f"""<video width="600" height="400" controls><source src="{output_video_path}" type="video/mp4"></video>""")

## Task 3: replace each face on the photo with a smile with corresponding emotion

What is the next step? Neural networks are often combined into pipelines, where the results of the first network serve as the input for the next one.
Let's build a pipeline from two networks: the first finds a person in the video, and the second recognizes the emotions of this person.

We have already run the first network and found the person in the video.
The next step is to find a network for emotion recognition.
There is a good neural network in the [Open Model Zoo](https://docs.openvinotoolkit.org/latest/omz_models_group_intel.html): the [emotions-recognition-retail-0003 network](https://docs.openvinotoolkit.org/latest/omz_models_model_emotions_recognition_retail_0003.html).
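
The shape of such a two-stage pipeline can be sketched without any model at all. In the sketch below, `fake_detect_faces` and `fake_recognize_emotion` are hypothetical stand-ins for the two networks, and the frame is a tiny grayscale grid; only the data flow (detect, crop, classify each crop) reflects the pipeline described above.

```python
def run_pipeline(frame, detect_faces, recognize_emotion):
    """Feed every region found by the first stage into the second stage."""
    results = []
    for xmin, ymin, xmax, ymax in detect_faces(frame):
        # Crop the detected region: rows first, then columns
        face = [row[xmin:xmax] for row in frame[ymin:ymax]]
        results.append(((xmin, ymin, xmax, ymax), recognize_emotion(face)))
    return results


def fake_detect_faces(frame):
    # Stand-in for the face detection network: returns fixed boxes
    return [(0, 0, 2, 2), (2, 1, 4, 3)]


def fake_recognize_emotion(face):
    # Stand-in for the emotion network: a brightness rule instead of a model
    pixel_count = sum(len(row) for row in face)
    mean = sum(sum(row) for row in face) / pixel_count
    return 'happy' if mean > 100 else 'neutral'


# Tiny made-up grayscale frame: 3 rows x 4 columns
frame = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [10, 10, 10, 10],
]

results = run_pipeline(frame, fake_detect_faces, fake_recognize_emotion)
print(results)
```

Passing the two stages in as functions keeps the loop unchanged when the stand-ins are replaced with real inference calls.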


### Step 1: Download emotions-recognition-retail-0003 network
Run the Model Downloader with the needed arguments to download the emotions-recognition-retail-0003 network:

In [None]:
!python3 ~/intel/openvino_2021/deployment_tools/open_model_zoo/tools/downloader/downloader.py --name emotions-recognition-retail-0003 --precision FP16 --output_dir data/model
!mv data/model/intel/emotions-recognition-retail-0003/FP16/emotions-recognition-retail-0003.* data/models/

In [None]:
# Model IR files
emotion_recognition_model_xml = 'data/models/emotions-recognition-retail-0003.xml'
emotion_recognition_model_bin = 'data/models/emotions-recognition-retail-0003.bin'

### Step 2: Read the network

In [None]:
emotion_recognition_network = ie_core.read_network(emotion_recognition_model_xml, emotion_recognition_model_bin)

### Step 3: Load the network to a device

Use the instance of `IECore`.
The class `IECore` has a special function called `load_network`, which loads a network to a device.
This function prepares the network for the first inference on the device 
and returns an instance of the network prepared for an inference (execution). 
This function has many parameters, but in this case, you need to know only about two of them:
* `network` - instance of `IENetwork`
* `device_name` - string, contains a device name to infer a model on: CPU, GPU and so on.

In [None]:
emotion_recognition_network_loaded_on_device = ie_core.load_network(emotion_recognition_network, device)

### Step 4: Open the input video

In [None]:
input_video_stream = cv2.VideoCapture(input_video_path)

input_frame_width = int(input_video_stream.get(3))   # float `width`
input_frame_height = int(input_video_stream.get(4))  # float `height`

### Step 5: Create an output video stream

In [None]:
output_video_stream = prapare_out_video_stream(input_video_stream, output_video_path)

In [None]:
emotion_recognition_input_layer = next(iter(emotion_recognition_network.input_info))
emotion_recognition_input_blob = emotion_recognition_network.input_info[emotion_recognition_input_layer].input_data

print(f'Input layer of the emotions-recognition-retail-0003 is {emotion_recognition_input_layer}')

In [None]:
emotion_recognition_n, emotion_recognition_c, emotion_recognition_h, emotion_recognition_w = emotion_recognition_input_blob.shape

print(f'Input shape of the emotion recognition network is n = {emotion_recognition_n}, c={emotion_recognition_c}, h={emotion_recognition_h}, w={emotion_recognition_w}')

In [None]:
emotion_recognition_output_layer = next(iter(emotion_recognition_network.outputs))

### Step 6: Prepare a frame and run inference

In [None]:
def emotion_recognition_inference(face_frame: np.ndarray) -> np.ndarray:
    prepared_frame = pre_process_input_image(face_frame, emotion_recognition_n, emotion_recognition_c, emotion_recognition_h, emotion_recognition_w)

    # Run the inference as you did earlier
    inference_results = emotion_recognition_network_loaded_on_device.infer({
        emotion_recognition_input_layer: prepared_frame
    })

    # To understand the inference result of this model, check the documentation:
    # https://docs.openvinotoolkit.org/latest/_models_intel_emotions_recognition_retail_0003_description_emotions_recognition_retail_0003.html
    return inference_results[emotion_recognition_output_layer]

### Step 7: Draw boxes and emotions in a frame

In [None]:
def get_smile_by_index(emotion_inference_result: np.ndarray) -> np.ndarray:
    emotions = ['neutral', 'happy', 'sad', 'surprise', 'anger']
    emotion_index = np.argmax(emotion_inference_result.flatten()) 
    smile_path = f'./data/{emotions[emotion_index]}.png'
    return cv2.imread(smile_path, -1)

In [None]:
def emotion_recognition_inference_postpprocess(image, recognized_emotions, xmin, xmax, ymin, ymax):
    # Overlay the smile picture on the face region
    w = xmax - xmin
    h = ymax - ymin
    
    smile = get_smile_by_index(recognized_emotions)
    resized_smile = cv2.resize(smile, (w, h))
    
    alpha_s = resized_smile[:, :, 3] / 255.0
    alpha_l = 1.0 - alpha_s
    for c in range(0, 3):
        image[ymin:ymax, xmin:xmax, c] = (alpha_s * resized_smile[:, :, c] + alpha_l * image[ymin:ymax, xmin:xmax, c])
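
The loop above is alpha blending: each output channel is `alpha * smile + (1 - alpha) * image`, where `alpha` comes from the fourth (transparency) channel of the smile picture. A minimal per-pixel sketch in plain Python, with a hypothetical helper name and made-up pixel values; it rounds the result, which is a choice made for this example.

```python
def blend_pixel(foreground_rgba, background_rgb):
    """Blend one RGBA pixel over one RGB pixel using the alpha channel."""
    alpha = foreground_rgba[3] / 255.0
    return tuple(
        round(alpha * fg + (1.0 - alpha) * bg)
        for fg, bg in zip(foreground_rgba[:3], background_rgb)
    )


background = (0, 0, 255)

opaque = blend_pixel((255, 0, 0, 255), background)     # fully covers the background
transparent = blend_pixel((255, 0, 0, 0), background)  # leaves the background as is
partial = blend_pixel((255, 0, 0, 51), background)     # alpha = 51 / 255 = 0.2

print(opaque, transparent, partial)
```

This is why the smile pictures are read with the `-1` flag in `cv2.imread`: without the alpha channel there is nothing to blend with, and the whole rectangle would be overwritten.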


In [None]:
# Load the image
original_image = load_input_image(input_image_path)
original_image_h, original_image_w, *_ = original_image.shape

# Resize the input image
in_frame = pre_process_input_image(original_image, face_detection_n, face_detection_c, face_detection_h, face_detection_w)

# Display the input image
print("Input image:")
show_images(original_image)

In [None]:
face_detection_inference_results = face_detection_executable_network.infer(inputs={face_detection_input_blob: in_frame})

face_detection_inference_result = face_detection_inference_results[face_detection_output_blob]
color = (12.5, 255, 255)
        
processed_image = original_image
# Loop through all possible results
for detected_face in face_detection_inference_result[0][0]:
    probability = round(detected_face[2] * 100, 1)

    # If probability is more than the specified threshold, replace the face with a smile
    if probability > prob_threshold:
        # Get coordinates of the box containing the detected object
        xmin = int(detected_face[3] * original_image_w)
        ymin = int(detected_face[4] * original_image_h)
        xmax = int(detected_face[5] * original_image_w)
        ymax = int(detected_face[6] * original_image_h)

        face = original_image[ymin:ymax, xmin:xmax]
                
        recognized_emotions = emotion_recognition_inference(face)
        emotion_recognition_inference_postpprocess(processed_image, recognized_emotions, xmin, xmax, ymin, ymax)


In [None]:
show_images(processed_image)

### Step 8: Loop over frames in the input video

In [None]:
input_video_stream = cv2.VideoCapture(input_video_path)

input_frame_width = int(input_video_stream.get(3))   # float `width`
input_frame_height = int(input_video_stream.get(4))  # float `height`

In [None]:
output_video_stream = prapare_out_video_stream(input_video_stream, output_video_path)

In [None]:
while input_video_stream.isOpened():
    
    # Read the next frame from the input video
    frame_read, original_frame = input_video_stream.read()
    # Exit from the loop if no frame was read (the video is over)
    if not frame_read:
        break
    face_detection_frame = pre_process_input_image(original_frame, face_detection_n, face_detection_c, face_detection_h, face_detection_w)
    face_detection_inference_results = face_detection_executable_network.infer(inputs={face_detection_input_blob: face_detection_frame})

    face_detection_inference_result = face_detection_inference_results[face_detection_output_blob]

    # Loop through all possible results
    for detected_face in face_detection_inference_result[0][0]:
        probability = round(detected_face[2] * 100, 1)

        # If probability is more than the specified threshold, replace the face with a smile
        if probability > prob_threshold:
            # Get coordinates of the box containing the detected object
            xmin = int(detected_face[3] * input_frame_width)
            ymin = int(detected_face[4] * input_frame_height)
            xmax = int(detected_face[5] * input_frame_width)
            ymax = int(detected_face[6] * input_frame_height)

            face = original_frame[ymin:ymax, xmin:xmax]

            recognized_emotions = emotion_recognition_inference(face)
            emotion_recognition_inference_postpprocess(original_frame, recognized_emotions, xmin, xmax, ymin, ymax)
    
    output_video_stream.write(original_frame)
    
input_video_stream.release()
# Save the resulting video
output_video_stream.release()

Now the face of the person (Artyom) in the resulting video is replaced with a smile that matches the recognized emotion:

In [None]:
# Show the resulting video
HTML(f"""<video width="600" height="400" controls><source src="{output_video_path}" type="video/mp4"></video>""")