# Agenda

## 1. [Introduction](#s1)

## 2. [OpenVINO™ Overview](#s6) - Kashchikhin

## 3. [OpenVINO™ Deep Learning Workbench](#s7) - Kashchikhin

## 4. [OpenVINO™ API](#s7) – Tugaryov

Object Detection sample: http://127.0.0.1:5665/jupyter/lab/tree/tutorials/object_detection_ssd/tutorial_object_detection_ssd.ipynb
W/o downloader and w/updated cells

## 5. [Practice](#s15) – Tugaryov

Task 1: apply pre-defined blur method to given image at inferred coordinates (photo with several faces)

Task 2: add blurring logic to pre-defined video processor

Task 3: replace each face on the photo with a smile with corresponding emotion

Task 4: emotional smile on the video

** Task 5: Telegram Bot - Smiler

# 1. Intro

### What You Will Learn

Welcome to the workshop! ... We will use OpenVINO™ framework and its graphical interface Deep Learning Workbench to make your Deep Learning journey easy and exciting. 

During this workshop you will:

1. Learn the basics of neural model analysis and optimization:
    - what a model is and how it works
    - how to measure its performance and analyze the quality
    - how to tune the model for enhanced performance
2. Write your own AI application __________

And good luck from our team: 

### Why Deep Learning

Deep Learning is highly popular right now due to significant breakthroughs in the artificial neural networks area, which have encouraged businesses to use Deep Learning solutions as part of their strategy. From digital assistants and chatbots in customer service to object recognition in retail, and much more, Deep Learning has enabled the development of various revolutionary AI applications. With Deep Learning, the algorithm does not need to be taught about the essential features. It can discover features from data on its own using a neural network. The exceptional performance of Deep Learning algorithms on difficult tasks involving massive quantities of data, coupled with the growing availability of pre-trained models, has made Deep Learning very appealing to many companies.
Example

### What Is Inference 

While training is the process of teaching a model to perform a particular AI task, Deep Learning inference is the process of using a trained model to make predictions against previously unseen data. A trained model is often modified and simplified before being deployed. Some neural models can be large and complex, with hundreds of layers of artificial neurons and weights connecting them. The larger the model, the more memory and energy it consumes to run, and the longer the response time (or latency) from when you input data to the model until you receive a result.

![](pictures/deep-learning.png)

Author: Mark Robins, Intel, [Link](https://www.intel.ru/content/www/ru/ru/artificial-intelligence/posts/deep-learning-training-and-inference.html)


But sometimes the use case requires that inference run very fast or at very low power. For example, a self-driving car must be able to detect and respond within milliseconds in order to avoid an accident. In such cases, there is a desire to simplify the model after training to reduce power and latency, even if this simplification results in a slight reduction in accuracy. There are several ways to optimize a model. Later in this workshop, we will use an optimization method called INT8 calibration, which reduces the numerical precision of the weights from, for example, 32-bit floating-point numbers down to 8-bit integers, resulting in a smaller model and faster computation.
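
To make the idea of reduced precision concrete, here is a minimal sketch of symmetric 8-bit quantization applied to a handful of made-up FP32 weights. It only illustrates the arithmetic: the function names `quantize_int8` and `dequantize` and the sample values are assumptions for this example, and the calibration performed by the OpenVINO™ tools is more elaborate than this.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map float weights to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor to avoid division by zero
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the quantized integers."""
    return [q * scale for q in quantized]


# Hypothetical FP32 weights, used only for illustration
weights = [0.82, -1.27, 0.05, 0.0, 0.64]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Each value now fits in 1 byte instead of 4, at the cost of a small rounding error
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

The rounding error of each weight is bounded by half of the scale, which is why a small accuracy drop is the usual price for the smaller model.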

Let's take a look at inference using a real-life example: a vehicle detection model. The model running alongside the traffic camera constantly processes a video to detect vehicles entering the intersection. If a vehicle enters when the light is red, multiple images of the vehicle are captured and fed into the model, which finds an image that includes a license plate and transmits it for further processing. At the server, a first inference is run to localize the license plate in the image, and a second inference is run to read the characters on the license plate. Finally, the license plate information is sent to the data center where an application looks up the car’s owner based on the license plate and flags potential traffic violations to be reviewed.

![](pictures/system.png)

Author: Mark Robins, Intel, [Link](https://www.intel.ru/content/www/ru/ru/artificial-intelligence/posts/deep-learning-training-and-inference.html)

# 2. OpenVINO™ Toolkit

The OpenVINO™ toolkit is a comprehensive toolkit for optimizing pretrained deep learning models of various use cases to achieve high performance and prepare them for deployment on Intel® platforms. Based on the latest generations of artificial neural networks, including Convolutional Neural Networks (CNNs), recurrent and attention-based networks, the toolkit extends computer vision and non-vision workloads across Intel® hardware, maximizing performance. It accelerates applications with high-performance AI and deep learning inference deployed from edge to cloud.

## Introducing OpenVINO

![](pictures/about_vino.png)

## OpenVINO™ Capabilities

![](pictures/openvino_toolkit.png)

## OpenVINO™ Toolkit

![](pictures/additional_tools.png)

## OpenVINO™ Deep Learning Models

A model is a network that has been trained over a set of data using a certain framework. Since Deep Learning technologies are used in various industrial applications, it is crucial to have an effective solution for each specific use case. OpenVINO Open Model Zoo provides a range of public and Intel pre-trained models to resolve a variety of different tasks, such as classification, object detection, segmentation and many others.

![](pictures/models.png)

## OpenVINO (Propaganda)

![](pictures/ov-repository.png)

[Link](https://github.com/openvinotoolkit/openvino)

-----------------

![](pictures/ov-coursera.png)

[Link](https://www.coursera.org/learn/intel-openvino)

-------------------

![](pictures/ov-summit.png)

[Source](https://blogs.intel.com/psg/openvino-toolkit-wins-vision-product-of-the-year-award-in-best-developer-tools-category-at-embedded-vision-summit/)

--------------------



# 3. Deep Learning Workbench: OpenVINO™ Quickstart

Deep Learning Workbench (DL Workbench) is the official OpenVINO™ graphical interface designed to make the production of pretrained deep learning models significantly easier.
With DL Workbench you can start working with your deep learning model right from your browser: import a model, analyze its performance and accuracy, visualize the outputs, optimize and prepare the model for deployment in a matter of minutes. 


![](pictures/openvino_toolkit-dl-wb-highlighted.png)

## DL Workbench capabilities

![](pictures/openvino_dl_wb.png)

## DL Workbench Workflow


### 1. Import a model

The first step is to import a model. You can either select a model from the Open Model Zoo or import your own model. Open Model Zoo contains a list of publicly available models and Intel pre-trained models that solve many use cases, such as classification, object detection, segmentation and many others. Once you decide which model solves your particular use case, select it and start importing. In case you have trained your model yourself or did not find the required model in the Open Model Zoo, you can upload the files and start the import. 

#### Import Open Model Zoo Model

![](pictures/import_model_wb_omz.png)

#### Upload Your Custom Model

![](pictures/upload-local-model.png)

### 2. Import Dataset

Then, we proceed by importing the dataset. To import a validation dataset, upload a dataset in a supported format that suits your use case. The dataset can be either annotated or not annotated.

1. Use the following link to download the dataset: [Link](https://github.com/dl-wb-experiments/face-hiding-workshop/files/7043878/dataset.zip).
2. Unarchive it.
3. Go to DL Workbench.

![](pictures/validation_dataset_import.png)

4. Drag & drop the images from the archive to create a not annotated dataset.

![](pictures/custom_dataset.png)

### 3. Run Inference

Let’s create our first project that will be based on our model with the custom datasets. For that, our final step is to select the environment. Target is a machine that hosts one or several accelerators. Device is a hardware accelerator on which a model is executed. By default, your Local Target is selected. Once we create the project, the first baseline inference happens. During the baseline inference our model is evaluated from the performance perspective. 

![](pictures/create_project_selected.png)

![](pictures/dashboard-page.png)

### 4. Analyze the Model

When the inference stage is finished, we can see how many frames per second our model can process and with what latency. Latency is the time required to process one image. The lower the value, the better. Throughput is the number of images (frames) processed per second. A higher throughput value means better performance. Since our model is fully floating-point, we might benefit if we try to execute it in integer precision. Let's check how our model works and proceed to optimize it.
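
The two metrics can be illustrated with a short timing sketch. It is not how DL Workbench measures performance: `measure` and `fake_infer` are hypothetical names, and the "inference" is simulated with a short sleep so that the example runs without a model.

```python
import time
from statistics import mean
from typing import Callable, List, Tuple


def measure(infer: Callable[[], None], runs: int = 20) -> Tuple[float, float]:
    """Return (average latency in ms, throughput in frames per second)."""
    latencies: List[float] = []
    started = time.perf_counter()
    for _ in range(runs):
        t0 = time.perf_counter()
        infer()  # one synchronous request that processes one frame
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - started
    latency_ms = mean(latencies) * 1000
    throughput_fps = runs / total
    return latency_ms, throughput_fps


def fake_infer() -> None:
    # Stand-in for a real inference call; sleeps for about 2 ms
    time.sleep(0.002)


latency_ms, throughput_fps = measure(fake_infer)
print(f'latency: {latency_ms:.2f} ms, throughput: {throughput_fps:.1f} FPS')
```

With one request at a time, throughput is roughly the inverse of latency. The two metrics diverge once several requests run in parallel or several images are processed per request.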

#### Performance

![](pictures/analyze.png)

#### Visualize Predictions

DL Workbench enables you to visually estimate how well a model recognizes images by testing the model on particular sample images. This considerably enhances the analysis of inference results, giving you an opportunity not only to estimate the performance, but also to visually understand whether the model works correctly and the accuracy is tolerable for client applications.



![](pictures/predictions.png)

### 5. Optimize the Model

#### INT8 Calibration
One of the recommended ways to accelerate your model performance is to perform 8-bit integer (INT8) calibration.
A model in INT8 precision takes up less memory and has higher throughput capacity.
Often this performance boost is achieved at the cost of a small accuracy reduction. 

![](pictures/calibration-int8.png)

-------------------------------

![](pictures/int8-page.png)

#### Analyze the Improvements


![](pictures/dashboard-parent-vs-optimized.png)

### 6. Profile the Model

The inference of a network is the execution of a computational graph consisting of different operations. Internally, the execution resources are split/pinned into execution streams. 

Streams – number of requests running in parallel. Available cores are evenly distributed between the streams. 

The neural model can be organized in such a way that the amount of input data can vary, which makes it possible to process a batch of input data in one pass through the neural network. This can significantly improve the performance.

Batch – number of images propagated to the network at a time.

During execution of a model, streams, as well as inference requests in a stream, can be distributed inefficiently among hardware cores, which can reduce model speed. Some target and topology combinations work best with increased batch size and some with multiple parallel inference streams, while some suffer from those options. Predicting performance is extremely complicated, so we recommend trying different batch and stream combinations to boost the performance.
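
The search over batch and stream combinations can be sketched as follows. The helper names and the latency numbers are hypothetical. The throughput formula assumes that every stream completes one batch per latency interval, which is a simplification of real hardware behavior.

```python
from itertools import product
from typing import Dict, Iterable, List, Tuple


def cores_per_stream(total_cores: int, streams: int) -> int:
    """Cores are split evenly between streams; leftover cores are ignored here."""
    if streams < 1 or streams > total_cores:
        raise ValueError('streams must be between 1 and the number of cores')
    return total_cores // streams


def throughput_fps(batch: int, streams: int, latency_ms: float) -> float:
    """Frames per second when every stream finishes one batch per latency interval."""
    return batch * streams * 1000.0 / latency_ms


def candidate_configs(batches: Iterable[int], streams: Iterable[int]) -> List[Tuple[int, int]]:
    """All (batch, streams) pairs worth benchmarking."""
    return list(product(batches, streams))


# Hypothetical measured latencies in milliseconds for each (batch, streams) pair
measured: Dict[Tuple[int, int], float] = {
    (1, 1): 10.0, (1, 2): 14.0, (2, 1): 18.0, (2, 2): 30.0,
}
configs = candidate_configs([1, 2], [1, 2])
best = max(configs, key=lambda cfg: throughput_fps(cfg[0], cfg[1], measured[cfg]))
print(best, cores_per_stream(total_cores=8, streams=best[1]))
```

In this made-up data set, two streams with batch 1 win even though their latency is higher than that of a single stream, which is the kind of trade-off the experiments are meant to reveal.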

![](pictures/explore-inference.png)

--------------------

![](pictures/inference-table.png)

Let’s recap briefly what you have learned at this stage:

1. What a model is and how it works
2. How to measure its performance
3. How to accelerate the model using INT8 calibration
4. How different options affect model performance 

Our next step is to apply this knowledge to build the ____ application. Before we proceed, we should determine our model location. For that, go to the Learn OpenVINO tab, select Model Inference with OpenVINO API and copy the model path.

## Copy Paths to the Model
TODO: Move to the next section

![](pictures/model-paths.png)

# OpenVINO™ API

The purpose of this tutorial is to examine a sample application that was created using the [Intel® Distribution of Open Visual Inference & Neural Network Optimization (OpenVINO™) toolkit](https://software.intel.com/openvino-toolkit). The tutorial walks step by step through object detection on images. Object detection is performed by running a pre-trained network with the Inference Engine of the Intel® Distribution of OpenVINO™ toolkit.

Object Detection in Computer Vision is a task of finding objects and locating them in the image.

The tutorial guides you through the following steps:

1. [Import required modules](#1.-Import-Required-Modules)
2. [Configure inference: path to a model and other data](#2.-Configure-an-Inference)
3. [Initialize the OpenVINO™ runtime](#3.-Initialize-the-OpenVINO™-Runtime)
4. [Read the model](#4.-Read-the-Model)
5. [Make the model executable](#5.-Make-the-Model-Executable)
6. [Load the model to the device](#6.-Load-the-model-to-the-device)
7. [Prepare an image for model inference](#7.-Prepare-an-Image-for-Model-Inference)
8. [Infer the model](#8.-Infer-the-Model)
9. [Show predictions](#9.-Show-Predictions)

### 1. Import Required Modules

Import the Python* modules that you will use in the sample code:
- [os](https://docs.python.org/3/library/os.html#module-os) is a standard Python module used for filename parsing.
- [cv2](https://docs.opencv.org/trunk/) is an OpenCV module used to work with images.
- [NumPy](http://www.numpy.org/) is an array manipulation module used to process images as arrays.
- [OpenVINO Inference Engine](https://docs.openvinotoolkit.org/latest/openvino_docs_IE_DG_Deep_Learning_Inference_Engine_DevGuide.html) is an OpenVINO™ Python API module used for inference.
- [IPython](https://ipython.readthedocs.io/en/stable/index.html) is used for showing images and videos in the notebook.

Run the cell below to import the modules. 

In [None]:
import os
import cv2
import numpy as np
from openvino.inference_engine import IECore
from IPython.display import HTML, Image, display

### 2. Configure an Inference

Once you have the OpenVINO™ IR of your model, you can start experimenting with it by inferring it and inspecting its output. 

> **NOTE**: Copy the paths to the `.xml` and `.bin` files from the DL Workbench UI and paste them below.

#### Required Parameters

Parameter| Explanation
---|---
**face_detection_model_xml**| Path to the `.xml` file of OpenVINO™ IR of your model
**face_detection_model_bin**| Path to the `.bin` file of OpenVINO™ IR of your model

In [None]:
# Model IR files
face_detection_model_xml = 'data/models/face-detection-adas-0001.xml'
face_detection_model_bin = 'data/models/face-detection-adas-0001.bin'

#### Optional Parameters

Experiment with optional parameters after you go through the full workflow of the tutorial.

Parameter| Explanation
---|---
**input_image_path**| Path to an input image. Use the `car.bmp` image placed in the directory of the notebook or, if you have imported a dataset in the DL Workbench, copy the path to an image in the dataset.
**device**| Specify the [target device](https://docs.openvinotoolkit.org/latest/workbench_docs_Workbench_DG_Select_Environment.html) to infer on: CPU, GPU, or MYRIAD. Note that the device must be present. For this tutorial, use `CPU` which is known to be present.
**prob_threshold**| Probability threshold, in percent (0 to 100), used to filter detection results

In [None]:
# Input image file. 
input_image_path = 'data/input_image.JPG'

# Input video file
input_video_path = 'data/input.mp4'

# Output video file
output_video_path = 'data/output.mp4'

# Device to use
device = 'CPU'

# Minimum percentage threshold to detect an object
prob_threshold = 50

print(
f'''Configuration parameters settings:
    model_xml={face_detection_model_xml},
    model_bin={face_detection_model_bin},
    input_image_path={input_image_path},
    device={device}, 
    prob_threshold={prob_threshold}''',
)

### 3. Initialize the OpenVINO™ Runtime

Once you define the parameters, create the `IECore` object that provides access to the OpenVINO™ runtime capabilities.

In [None]:
# Create an Inference Engine instance
ie_core = IECore()

### 4. Read the Model

Load the IR of your model into memory.

In [None]:
# Read the network from IR files
face_detection_network = ie_core.read_network(model=face_detection_model_xml, weights=face_detection_model_bin)

### 5. Make the Model Executable

Reading a network is not enough to start a model inference. The model must be loaded to a particular abstraction representing a particular accelerator. In OpenVINO™, this abstraction is called a *plugin*. A network loaded to a plugin becomes executable and will be inferred in one of the next steps. 

Before loading, we store the necessary model information, such as the names of the input and output blobs: `face_detection_input_blob` and `face_detection_output_blob`. We also store the input dimensions of the model:
- `n` - input batch size
- `c` - number of input channels. Often, it is `1` or `3`, which means that the model expects either a grayscale or a color image.
- `h` - input image height
- `w` - input image width

In [None]:
# Store names of input and output blobs
face_detection_input_blob = next(iter(face_detection_network.input_info))
face_detection_output_blob = next(iter(face_detection_network.outputs))

# Read the input dimensions: n=batch size, c=number of channels, h=height, w=width
face_detection_network_input_shape = face_detection_network.input_info[face_detection_input_blob].input_data.shape
face_detection_input_batch, face_detection_input_channels, face_detection_input_height, face_detection_input_width = face_detection_network_input_shape
print(f'Face Detection model input dimensions: n={face_detection_input_batch}, c={face_detection_input_channels}, h={face_detection_input_height}, w={face_detection_input_width}')

### 6. Load the model to the device
> **NOTE**: For details, see the `IECore` Python API documentation: https://docs.openvinotoolkit.org/latest/ie_python_api/classie__api_1_1IECore.html#ac9a2e043d14ccfa9c6bbf626cfd69fcc

In [None]:
# Load the network to the device, then report success
face_detection_executable_network = ie_core.load_network(network=face_detection_network, device_name=device)
print(f'Loaded the model into the Inference Engine for the {device} device.')

### 7. Prepare an Image for Model Inference

Now let's read and prepare the input image by resizing and re-arranging its dimensions according to the input dimensions of the model.
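
The re-arrangement of dimensions can be tried on a tiny array first. The sketch below uses only NumPy and a made-up 2x3 image, so the sizes and values are assumptions for illustration. It shows how a pixel stored in HWC order (height, width, channels) ends up in the NCHW order that the model expects.

```python
import numpy as np

# Hypothetical 2x3 "image" with 3 channels in HWC layout, as OpenCV would return it
height, width, channels = 2, 3, 3
image_hwc = np.arange(height * width * channels, dtype=np.uint8).reshape(height, width, channels)

# Move the channel axis first: HWC -> CHW
image_chw = image_hwc.transpose((2, 0, 1))

# Add the batch dimension: CHW -> NCHW with n=1
image_nchw = image_chw[np.newaxis, ...]

print(image_hwc.shape, image_chw.shape, image_nchw.shape)

# The same pixel value is reachable in both layouts
row, col, channel = 1, 2, 0
assert image_hwc[row, col, channel] == image_nchw[0, channel, row, col]
```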

In [None]:
# Define the function to load the input image
def load_input_image(input_path):   
    # Use OpenCV to load the input image
    image = cv2.imread(input_path)
    return image

# Define the function to pre-process (resize, transpose, reshape) the input image
def pre_process_input_image(image: np.ndarray, target_height: int, target_width: int) -> np.ndarray:
    # Resize the image dimensions from image to model input w x h
    resized_image = cv2.resize(image, (target_width, target_height))
    
    # Change data layout from HWC to CHW
    transposed_image = resized_image.transpose((2, 0, 1))
    
    n = 1 # Batch is always 1 in our case
    c = 3 # Channels is always 3 in our case
    
    # Reshape to input dimensions
    reshaped_image = transposed_image.reshape((n, c, target_height, target_width))
    return reshaped_image

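# Define the function to display an image in the notebook
def show_images(image: np.ndarray):
    # Encode the image as JPEG so that IPython can render it
    _, data = cv2.imencode('.jpg', image) 
    image = Image(data=data)
    display(image)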

In [None]:
# Use OpenCV to load the input image
original_image = cv2.imread(input_image_path)

# Prepare the image
input_frame = pre_process_input_image(original_image, target_height=face_detection_input_height, target_width=face_detection_input_width)

# Display the input image
show_images(original_image)

### 8. Infer the Model

Now run the inference by feeding the prepared image to the model.

In [None]:
face_detection_inference_results = face_detection_executable_network.infer(
    inputs={
        face_detection_input_blob: input_frame
    }
)

A model can have several outputs. For this reason, the `infer` method returns a dictionary where the keys are the names of the output layers and the values are the inference results for the corresponding outputs.
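
The shape of such a dictionary can be explored without running a model. In the sketch below the output names `detections` and `scores` and the array shapes are made up for illustration. They are not the outputs of the face detection model used in this tutorial.

```python
import numpy as np

# Hypothetical result of `infer`: output names and shapes are made up for illustration
inference_results = {
    'detections': np.zeros((1, 1, 5, 7), dtype=np.float32),
    'scores': np.zeros((1, 5), dtype=np.float32),
}

# Inspect every output the model produced
output_shapes = {name: blob.shape for name, blob in inference_results.items()}
for name, shape in output_shapes.items():
    print(f'{name}: {shape}')

# A model with a single output can be read without knowing the name in advance
single_output = {'detections': inference_results['detections']}
only_name = next(iter(single_output))
only_blob = single_output[only_name]
```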

In [None]:
face_detection_inference_results = face_detection_inference_results[face_detection_output_blob]

### 9. Show Predictions

The next step is to parse the inference results and draw boxes over the objects detected in the image.

The result of model inference (`face_detection_inference_results`) is an array of predictions. Each prediction `object` has the following structure:

- `object[2]`: confidence, from 0 to 1, that the detected object is an instance of the predicted class
- `object[3]`: minimum x coordinate of the detected object, as a fraction of the image width
- `object[4]`: minimum y coordinate of the detected object, as a fraction of the image height
- `object[5]`: maximum x coordinate of the detected object, as a fraction of the image width
- `object[6]`: maximum y coordinate of the detected object, as a fraction of the image height
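
As a sketch of how one such prediction maps to pixel coordinates, the hypothetical helper `to_pixel_box` below converts a single prediction and additionally clamps the coordinates to the image bounds. The clamping is an optional safeguard assumed for this example, and the sample values are made up.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int, float]


def to_pixel_box(prediction: List[float], image_width: int, image_height: int,
                 threshold_percent: float) -> Optional[Box]:
    """Convert one prediction to (xmin, ymin, xmax, ymax, confidence %) or None."""
    confidence = round(prediction[2] * 100, 1)
    if confidence <= threshold_percent:
        return None

    def clamp(value: float, upper: int) -> int:
        # Keep coordinates inside the image so that slicing never goes out of bounds
        return max(0, min(upper, int(value)))

    xmin = clamp(prediction[3] * image_width, image_width)
    ymin = clamp(prediction[4] * image_height, image_height)
    xmax = clamp(prediction[5] * image_width, image_width)
    ymax = clamp(prediction[6] * image_height, image_height)
    return xmin, ymin, xmax, ymax, confidence


# Hypothetical prediction: indices 0 and 1 are not used in this tutorial
prediction = [0.0, 1.0, 0.93, -0.02, 0.25, 0.5, 1.1]
box = to_pixel_box(prediction, image_width=640, image_height=480, threshold_percent=50)
print(box)
```

Clamping matters when a detected box extends slightly past the image border, because a negative index would otherwise select the wrong region when the face is cropped with array slicing.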

In [None]:
def parse_face_detection_results(inference_results: np.ndarray, original_image_width: int, original_image_height: int) -> list:
    detected_faces = []
    
    for inference_result in inference_results[0][0]:
        confidence = round(inference_result[2] * 100, 1)

        # Keep only the detections whose confidence exceeds the specified threshold
        if confidence > prob_threshold:

            # Get coordinates of the box containing the detected object
            xmin = int(inference_result[3] * original_image_width)
            ymin = int(inference_result[4] * original_image_height)
            xmax = int(inference_result[5] * original_image_width)
            ymax = int(inference_result[6] * original_image_height)

            detected_face = (xmin, ymin, xmax, ymax, confidence)
            detected_faces.append(detected_face)
            
    return detected_faces

In [None]:
# Function to process inference results
def post_process_face_detection(inference_results: np.ndarray, original_image: np.ndarray) -> np.ndarray:
    original_image_height, original_image_width, *_ = original_image.shape
    
    processed_image = original_image.copy()
    
    # Color of the boxes and labels in BGR order
    color = (12.5, 255, 255)
    
    detected_faces = parse_face_detection_results(inference_results, original_image_width, original_image_height)
    # Loop through all detected faces
    for detected_face in detected_faces:
        xmin, ymin, xmax, ymax, confidence = detected_face

        # Draw the box and label for the detected object
        cv2.rectangle(processed_image, (xmin, ymin), (xmax, ymax), color, 4)
        cv2.putText(processed_image, 
                    f'{confidence} %', (xmin, ymin - 7), 
                    cv2.FONT_HERSHEY_COMPLEX, 1, color, 2)
            
    return processed_image

In [None]:
processed_image = post_process_face_detection(face_detection_inference_results, original_image)

show_images(processed_image)

# Practice

## Task 1: Apply pre-defined blur method to given image at inferred coordinates

Define a function to blur an image
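
The `blur` function in this task hides a face by shrinking the region to 16x16 pixels and enlarging it back with nearest-neighbor interpolation, which produces a pixelated look. The NumPy-only sketch below shows a related but not identical technique, block averaging, on a made-up image. The name `pixelate` and the grid size are assumptions for illustration.

```python
import numpy as np


def pixelate(image: np.ndarray, blocks: int = 4) -> np.ndarray:
    """Replace each cell of a blocks x blocks grid with the cell's mean color."""
    height, width = image.shape[:2]
    result = image.copy()
    # Grid boundaries along each axis
    rows = np.linspace(0, height, blocks + 1, dtype=int)
    cols = np.linspace(0, width, blocks + 1, dtype=int)
    for top, bottom in zip(rows[:-1], rows[1:]):
        for left, right in zip(cols[:-1], cols[1:]):
            cell = image[top:bottom, left:right]
            if cell.size:  # skip empty cells when the image is smaller than the grid
                result[top:bottom, left:right] = cell.mean(axis=(0, 1))
    return result


# Hypothetical 8x8 BGR image with random content
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
pixelated = pixelate(image, blocks=2)
print(pixelated[0, 0], pixelated[7, 7])
```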

In [None]:
def blur(image: np.ndarray) -> np.ndarray:
    height, width = image.shape[:2]
    pixels_count = 16
    resized_image = cv2.resize(image, (pixels_count, pixels_count), interpolation=cv2.INTER_LINEAR)
    return cv2.resize(resized_image, (width, height), interpolation=cv2.INTER_NEAREST)

Next, define one more function that processes the inference results of the face detection network. It uses the `blur` function to blur each part of the image that contains a face.

In [None]:
def blur_postprocessing(face_detection_inference_result: np.ndarray, original_image: np.ndarray) -> np.ndarray:
    original_image_height, original_image_width,  _ = original_image.shape
    processed_image = original_image.copy()
    
    detected_faces = parse_face_detection_results(face_detection_inference_result, original_image_width, original_image_height)

    for detected_face in detected_faces:
        xmin, ymin, xmax, ymax, _ = detected_face
        face = original_image[ymin:ymax, xmin:xmax]
        processed_image[ymin:ymax, xmin:xmax] = blur(face)
            
    return processed_image

Now prepare a function that runs the inference. It must prepare the image for inference (use `pre_process_input_image` for this) and infer the image with `face_detection_executable_network`.

In [None]:
def face_detection_indeference(image: np.ndarray)-> np.ndarray:
    # 1. Prepare the image
    input_frame = pre_process_input_image(image, target_width=face_detection_input_width, target_height=face_detection_input_height)

    # 2. Infer the model
    face_detection_inference_results = face_detection_executable_network.infer(inputs={face_detection_input_blob: input_frame})  
    return face_detection_inference_results[face_detection_output_blob]

In [None]:
inference_result = face_detection_indeference(original_image)

# 3. Blur faces on image
processed_image = blur_postprocessing(inference_result, original_image)

show_images(processed_image)

## Task 2: Add blurring logic to pre-defined video processor

Inference of a single image is done. The next task is to process all frames of a video.

In [None]:
input_video_stream = cv2.VideoCapture(input_video_path)

In [None]:
def prepare_output_video_stream(input_video_stream: cv2.VideoCapture, output_video_file_path: str) -> cv2.VideoWriter:
    # Read the frame size of the input video
    width  = int(input_video_stream.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(input_video_stream.get(cv2.CAP_PROP_FRAME_HEIGHT))
    video_writer = cv2.VideoWriter(output_video_file_path, cv2.VideoWriter_fourcc(*'avc1'), 20, (width, height))
    return video_writer

output_video_stream = prepare_output_video_stream(input_video_stream, output_video_path)

In [None]:
while input_video_stream.isOpened():
    # 1. Read the next frame from the input video 
    return_code, original_frame = input_video_stream.read()
    if not return_code:
        break
        
    # 2. Detect faces in the frame
    inference_result = face_detection_indeference(original_frame)

    # 3. Blur faces in the frame
    processed_image = blur_postprocessing(inference_result, original_frame)
    
    # 4. Write the resulting frame to the output stream
    output_video_stream.write(processed_image)
    

input_video_stream.release()
# Save the resulting video
output_video_stream.release()

In [None]:
# Show the resulting video
HTML(f"""<video width="600" height="400" controls><source src="{output_video_path}" type="video/mp4"></video>""")

## Task 3: replace each face on the photo with a smile with corresponding emotion

### Step 1: Download a Pretrained Model from the Open Model Zoo

OpenVINO™ toolkit includes the [Model Optimizer](http://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) used to convert and optimize trained models into Intermediate Representation (IR) model files, and the [Inference Engine](http://docs.openvinotoolkit.org/latest/_docs_IE_DG_Deep_Learning_Inference_Engine_DevGuide.html), which uses the IR model files to run an inference on hardware devices. The IR model files are created from models trained in popular frameworks, like Caffe\*, TensorFlow\*, and others. 

OpenVINO™ [Model Downloader](http://docs.openvinotoolkit.org/latest/_tools_downloader_README.html) downloads common inference models from the [Intel® Open Model Zoo](https://github.com/opencv/open_model_zoo). 

Let's download the `emotions-recognition-retail-0003` model first.

In [None]:
!python3 ~/intel/openvino_2021/deployment_tools/open_model_zoo/tools/downloader/downloader.py --name emotions-recognition-retail-0003 --precision FP16 --output_dir data/model

In [None]:
# Model IR files
emotion_recognition_model_xml = 'data/model/intel/emotions-recognition-retail-0003/FP16/emotions-recognition-retail-0003.xml'
emotion_recognition_model_bin = 'data/model/intel/emotions-recognition-retail-0003/FP16/emotions-recognition-retail-0003.bin'

In [None]:
# call ie_core.read_network to read the OpenVINO IR model
emotion_recognition_network = ie_core.read_network(emotion_recognition_model_xml, emotion_recognition_model_bin)

### Step 3: Load the network to a device

Use the instance of `IECore`.
The class `IECore` has a special function called `load_network`, which loads a network to a device.
This function prepares the network for the first inference on the device 
and returns an instance of the network prepared for an inference (execution). 
This function has many parameters, but in this case, you need to know only about two of them:
* `network` - instance of `IENetwork`
* `device_name` - string, contains a device name to infer a model on: CPU, GPU and so on.

In [None]:
emotion_recognition_network_loaded_on_device = ie_core.load_network(emotion_recognition_network, device)

In [None]:
emotion_recognition_input_layer = next(iter(emotion_recognition_network.input_info))
emotion_recognition_input_blob = emotion_recognition_network.input_info[emotion_recognition_input_layer].input_data

print(f'Input layer of the emotions-recognition-retail-0003 is {emotion_recognition_input_layer}')

In [None]:
emotion_recognition_input_batch, emotion_recognition_input_channels, emotion_recognition_input_height, emotion_recognition_input_width = emotion_recognition_input_blob.shape

print(f'Input shape of the emotion recognition network is n={emotion_recognition_input_batch}, c={emotion_recognition_input_channels}, h={emotion_recognition_input_height}, w={emotion_recognition_input_width}')

In [None]:
emotion_recognition_output_layer = next(iter(emotion_recognition_network.outputs))

### Step 6: Prepare a frame and run inference

In [None]:
def emotion_recognition_inference(face_frame: np.ndarray):
    prepared_frame = pre_process_input_image(face_frame, target_width=emotion_recognition_input_width, target_height=emotion_recognition_input_height)
    
    # Run the inference as you did earlier
    inference_results = emotion_recognition_network_loaded_on_device.infer({
        emotion_recognition_input_layer: prepared_frame
    })
    
    # To understand the inference results of this model, check the documentation:
    # https://docs.openvinotoolkit.org/latest/_models_intel_emotions_recognition_retail_0003_description_emotions_recognition_retail_0003.html
    return inference_results[emotion_recognition_output_layer]

In [None]:
original_frame = load_input_image('./data/emotion.jpg')

# Infer the emotion recognition model on the image
emotion_inference_results = emotion_recognition_inference(original_frame)

# Flatten the output to a one-dimensional array with one score per emotion
emotion_scores = emotion_inference_results.flatten()
emotions = ['neutral', 'happy', 'sad', 'surprise', 'anger']

show_images(original_frame)

print('Inference results:')
for index, prediction in enumerate(emotion_scores):
    emotion = emotions[index]
    print(f'{emotion}:\t{prediction}')
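
The per-emotion scores can be reduced to a single label by picking the highest one. The sketch below does this in plain Python with made-up scores. `top_emotion` is a hypothetical helper, and the label order is the one used in the cells of this notebook.

```python
from typing import Dict, List, Tuple

EMOTIONS = ['neutral', 'happy', 'sad', 'surprise', 'anger']


def top_emotion(scores: List[float]) -> Tuple[str, float]:
    """Return the label with the highest score together with that score."""
    if len(scores) != len(EMOTIONS):
        raise ValueError(f'expected {len(EMOTIONS)} scores, got {len(scores)}')
    best_index = max(range(len(scores)), key=lambda index: scores[index])
    return EMOTIONS[best_index], scores[best_index]


# Hypothetical flattened model output
scores = [0.05, 0.80, 0.03, 0.10, 0.02]
label, score = top_emotion(scores)

# Labels ordered from the most to the least likely emotion
ranked: Dict[str, float] = dict(sorted(zip(EMOTIONS, scores), key=lambda item: item[1], reverse=True))
print(label, score, list(ranked))
```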

### Step 16: Draw boxes and emotions in a frame

In [None]:
# Load the image
original_image = load_input_image(input_image_path)
original_image_height, original_image_width, *_ = original_image.shape

# Display the input image
print("Input image:")
show_images(original_image)

In [None]:
def get_smile_by_index(emotion_inference_result: np.ndarray) -> np.ndarray:
    emotions = ['neutral', 'happy', 'sad', 'surprise', 'anger']
    emotion_index = np.argmax(emotion_inference_result.flatten()) 
    smile_path = f'./data/{emotions[emotion_index]}.png'
    # Read the image unchanged to keep its alpha channel
    return cv2.imread(smile_path, cv2.IMREAD_UNCHANGED)

In [None]:
def emotion_recognition_inference_postprocess(image: np.ndarray, recognized_emotions: np.ndarray, xmin: int, ymin: int, xmax: int, ymax: int):
    # Size of the face region to cover with a smile
    width = xmax - xmin
    height = ymax - ymin
    
    # Pick the smile for the most probable emotion and fit it to the face region
    smile = get_smile_by_index(recognized_emotions)
    resized_smile = cv2.resize(smile, (width, height))

    # Blend the smile over the face in place, using the alpha channel of the smile image
    alpha_s = resized_smile[:, :, 3] / 255.0
    alpha_l = 1.0 - alpha_s
    for c in range(0, 3):
        image[ymin:ymax, xmin:xmax, c] = (alpha_s * resized_smile[:, :, c] + alpha_l * image[ymin:ymax, xmin:xmax, c])
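
The blending performed in the function above follows the usual alpha compositing formula, `result = alpha * overlay + (1 - alpha) * background`. The NumPy-only sketch below applies it to a made-up two-pixel image so that the effect of the alpha channel is easy to check. The function name `alpha_blend` and the sample data are assumptions for illustration.

```python
import numpy as np


def alpha_blend(background: np.ndarray, overlay_bgra: np.ndarray) -> np.ndarray:
    """Blend a BGRA overlay onto a BGR background of the same height and width."""
    alpha = overlay_bgra[:, :, 3:4].astype(np.float32) / 255.0  # shape (h, w, 1)
    foreground = overlay_bgra[:, :, :3].astype(np.float32)
    blended = alpha * foreground + (1.0 - alpha) * background.astype(np.float32)
    return blended.round().astype(np.uint8)


# Hypothetical 1x2 image: black background, white overlay
background = np.zeros((1, 2, 3), dtype=np.uint8)
overlay = np.full((1, 2, 4), 255, dtype=np.uint8)
overlay[0, 0, 3] = 0      # first pixel fully transparent
overlay[0, 1, 3] = 255    # second pixel fully opaque

result = alpha_blend(background, overlay)
print(result)
```

Keeping the alpha values as an array of shape `(h, w, 1)` lets NumPy broadcast them over the three color channels, so no explicit loop over channels is needed.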


In [None]:
processed_image = original_image.copy()
face_detection_inference_results = face_detection_indeference(original_image)
original_image_height, original_image_width, _ = original_image.shape

faces_coordinates = parse_face_detection_results(face_detection_inference_results, original_image_width, original_image_height)

for face_coordinates in faces_coordinates:
    xmin, ymin, xmax, ymax, confidence = face_coordinates
    face = original_image[ymin:ymax, xmin:xmax]

    emotion_predictions = emotion_recognition_inference(face)
    
    emotion_recognition_inference_postprocess(processed_image, emotion_predictions, xmin, ymin, xmax, ymax)

In [None]:
show_images(processed_image)

### Step 17: Loop over frames in the input video

In [None]:
input_video_stream = cv2.VideoCapture(input_video_path)

original_video_width = int(input_video_stream.get(cv2.CAP_PROP_FRAME_WIDTH))
original_video_height = int(input_video_stream.get(cv2.CAP_PROP_FRAME_HEIGHT))

In [None]:
output_video_stream = prepare_output_video_stream(input_video_stream, output_video_path)

In [None]:
while input_video_stream.isOpened():
    # 1. Read the next frame from the input video 
    return_code, original_frame = input_video_stream.read()
    if not return_code:
        break
        
    # 2. apply face replacement from previous step
    face_detection_inference_results = face_detection_indeference(original_frame)
    
    faces_coordinates = parse_face_detection_results(face_detection_inference_results, original_video_width, original_video_height)

    for face_coordinates in faces_coordinates:

        xmin, ymin, xmax, ymax, _ = face_coordinates
        face = original_frame[ymin:ymax, xmin:xmax]

        emotion_predictions = emotion_recognition_inference(face)

        emotion_recognition_inference_postprocess(original_frame, emotion_predictions, xmin, ymin, xmax, ymax)
    # 3. Write the resulting frame to the output stream
    output_video_stream.write(original_frame)
    
input_video_stream.release()
# Save the resulting video
output_video_stream.release()

In [None]:
# Show the resulting video
HTML(f"""<video width="600" height="400" controls><source src="{output_video_path}" type="video/mp4"></video>""")