# Agenda

## 1. [Introduction](#s1)

## 2. [OpenVINO™ Overview](#s6)

## 3. [OpenVINO™ Deep Learning Workbench](#s7)

### 3.1 [DL Workbench Workflow](#s7)

## 5. [Practice](#s15)

# Intro

## Key Concepts
1. Neural Network

Artificial neural networks (ANNs) consist of node layers: an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to other nodes and has an associated weight and an activation function.

![](./pictures/neural_network.svg)

By Glosser.ca - Own work, [Link](https://commons.wikimedia.org/w/index.php?curid=24913461) CC BY-SA 3.0
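
As a rough illustration of the definition above, the sketch below computes the output of one layer of artificial neurons in plain Python. The weights, the biases, and the choice of ReLU as the activation function are made-up values for illustration and are not taken from any real model.

```python
def relu(x: float) -> float:
    """Activation function: pass positive values, zero out negative ones."""
    return max(0.0, x)


def dense_layer(inputs, weights, biases):
    """Compute the outputs of one fully connected layer.

    Each neuron sums its weighted inputs, adds a bias and applies the activation.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(relu(weighted_sum))
    return outputs


# Two inputs feeding a hidden layer of three neurons (illustrative numbers)
hidden = dense_layer(
    inputs=[1.0, 2.0],
    weights=[[0.5, -1.0], [1.0, 1.0], [-2.0, 0.5]],
    biases=[0.0, 0.5, 1.0],
)
print(hidden)
```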

2. Inference

Process of neural network execution: feeding data to the network and getting the results. 

3. Dataset

A dataset is a collection of data that can be treated by a neural network as a single unit for analytic and prediction purposes.

4. Optimization

Accelerating the inference of deep learning models by applying special methods that do not require model retraining or fine-tuning, such as post-training quantization.
Quantization is the process of transforming models trained in floating-point precision into models with an integer representation, with floating/fixed-point quantization operations between the layers.
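
To make the idea concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is not the algorithm used by the OpenVINO tools; the way the scale is chosen and the helper names `quantize` and `dequantize` are assumptions made for illustration.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers using one shared scale (symmetric scheme)."""
    q_max = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / q_max if max_abs else 1.0
    quantized = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Restore approximate float values from the integer representation."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.02, 1.0]
q_weights, scale = quantize(weights)
restored = dequantize(q_weights, scale)

print(q_weights)
print([round(r, 3) for r in restored])
```

The restored values differ from the originals by at most half of the scale, which is the price paid for the smaller integer representation.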

5. Accuracy

A measure of how well a neural network solves its task. Accuracy can be represented by different metrics depending on the task.
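
For a classification task, one common metric is top-1 accuracy: the share of samples whose predicted label matches the ground truth. A minimal sketch with made-up labels (the `top1_accuracy` helper is hypothetical):

```python
def top1_accuracy(predicted_labels, true_labels):
    """Share of predictions that match the ground-truth labels."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError('Both lists must have the same length')
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


predicted = ['cat', 'dog', 'dog', 'cat']
expected = ['cat', 'dog', 'cat', 'cat']
print(top1_accuracy(predicted, expected))  # 3 of 4 predictions are correct: 0.75
```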

6. Deployment

## Simple Workflow

![](./pictures/infer.PNG)

# OpenVINO

OpenVINO™ toolkit is a comprehensive toolkit for quickly developing applications and solutions that solve a variety of tasks including emulation of human vision, automatic speech recognition, natural language processing, recommendation systems, and many others. Based on the latest generations of artificial neural networks, including Convolutional Neural Networks (CNNs), recurrent and attention-based networks, the toolkit extends computer vision and non-vision workloads across Intel® hardware, maximizing performance. It accelerates applications with high-performance AI and deep learning inference deployed from edge to cloud.

## General Information

![](pictures/about_vino.png)

## OpenVINO capabilities

![](pictures/openvino_toolkit.png)

## OpenVINO tools

![](pictures/additional_tools.png)

## Fine-tuned and optimized OpenVINO models
### For various tasks

![](pictures/models.png)

# [Deep Learning Workbench In Depth](./dl_workbench.ipynb)

There you can find DL WB interface examples and a sample workflow.

## App - Face Replacer
Face detection, Emotion recognition

#### (INSERT) Picture before - picture after

### Plan
#### Part 0 - Obtain a model
0. Go to DL WB
1. Find a suitable face-detection model 
2. Experiment with it, optimize, assess results
3. Export\download the model

#### Part 1 - OpenVINO Python API + minimal app

Show tutorial_object_detection for OV Python API

4. Prerequisites
    * copy the model path from DL WB
    * sample data (video) is placed in the folder with this notebook
5. OpenVINO Python API for work with neural networks
6. Image/video pre-processing with OpenCV
7. Neural network execution - Inference
8. Results processing
    * Describe the model and its output so that it is understandable how to post-process
9. Have a video with faces replaced

#### Part 2 - Enriching/Building upon the app / Adding new functionality

9. Prepare another neural network
10. Integrate new network in the app

OR - give a choice of either following the presenter with deployment or continuing with the emotion recognition

#### Part 3 - Deploy the app
11. Prepare deployment package\bundle with model and download it
    * Ubuntu - go to DL WB
    * non-Ubuntu - supply with os-specific bundles
12. Prepare platform
    * Copy/download the necessary assets (OpenVINO deployment package, model)
    * Prepare environment using setupvars
13. Prepare sample\application and a Telegram bot
    * Clone the repository with the template
    * Copy your code from the notebook and integrate in the template
14. Deploy
15. Enjoy


#### Demo of the completed bot\application in case we run out of time

# Face Hiding Workshop Practice

## Step 0. Preparation.

First of all, we need to install the requirements for this workshop.
We prepared a specific package to process the inference results of RetinaFace. In addition, we need packages like numpy to work with tensors and IPython to show a video in the notebook.

In [None]:
!pip install -r requirements.txt

The next preparation step is to set some constants: the paths to the input video, the resulting video, and the model.

In [None]:
from pathlib import Path

# Directory with the models for the workshop
WORKSHOP_MODEL_PATH = Path('./data') / 'model'

# Paths to the model in the Inference Engine format (.xml and .bin)
# You can use the INT8 model instead
RETINA_FACE_MODEL_PATH_XML = WORKSHOP_MODEL_PATH / 'retinaface-resnet50-pytorch.xml'
RETINA_FACE_MODEL_PATH_BIN = WORKSHOP_MODEL_PATH / 'retinaface-resnet50-pytorch.bin'

DEVICE = 'CPU'

DATA_PATH = Path('./data')
INPUT_VIDEO = str(DATA_PATH / 'input.mp4')
OUTPUT_VIDEO = str(DATA_PATH / 'output.MP4')

Now let's show the input video

In [None]:
from IPython.display import HTML

# Show a source video
HTML(f"""<video width="600" height="400" controls><source src="{INPUT_VIDEO}" type="video/mp4"></video>""")

In [None]:
# Import OpenCV to work with videos and images
import cv2

# Import the Inference Engine
from openvino.inference_engine import IECore, IENetwork

# Import the module for processing inference results
from RetinaFacePostProcessing.retinaface_post_processing import RetinaFacePostPostprocessor

import numpy as np

Our first function creates the output video writer.

In [None]:
def prapare_out_video_stream(input_video_stream: cv2.VideoCapture, output_video_file_path: str) -> cv2.VideoWriter:
    # Take the frame size from the input video
    width = int(input_video_stream.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(input_video_stream.get(cv2.CAP_PROP_FRAME_HEIGHT))
    # Write the video with the 'avc1' codec at 20 frames per second
    video_writer = cv2.VideoWriter(output_video_file_path, cv2.VideoWriter_fourcc(*'avc1'), 20, (width, height))
    return video_writer

### Step 1: Create an instance of the OpenVINO Inference Engine `IECore` class
This class represents an Inference Engine entity 
and allows you to manipulate plugins using unified interfaces. 

In [None]:
ie_core = IECore()

### Step 2: Read the prepared model

To read the model, call the `read_network` method of `IECore`. It returns an instance of the `IENetwork` class.
The method has two parameters: 
 1. path to the .xml file of the model 
 2. path to the .bin file of the model

In [None]:
retinaface_network = ie_core.read_network(RETINA_FACE_MODEL_PATH_XML, RETINA_FACE_MODEL_PATH_BIN)

### Step 3: Get the name of the input layer of the model

To infer a model, you need to know the input layers of the model.
The `retinaface_network` object contains information about the inputs of the network in the `input_info` property,
which is a dictionary: key - name of the input layer, value - representation of the network input.
In this case, you need to get the name and the blob of the input. `retinaface_input_name` should be a string, `retinaface_input_blob` should be a `DataPtr`.

In [None]:
retinaface_input_name = next(iter(retinaface_network.input_info))
retinaface_input_blob = retinaface_network.input_info[retinaface_input_name].input_data

print(f'Input layer of the RetinaFace is {retinaface_input_name}')
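
The `next(iter(...))` idiom used above takes the first key of a dictionary. The sketch below shows it on a plain `dict` that stands in for `input_info`; the layer name and the shape are hypothetical.

```python
# A plain dictionary standing in for `retinaface_network.input_info`;
# the key and the value below are hypothetical
input_info = {'data': {'shape': [1, 3, 640, 640]}}

# iter() walks over the dictionary keys, next() takes the first one
input_name = next(iter(input_info))
input_shape = input_info[input_name]['shape']

print(input_name, input_shape)
```

This works well for models with a single input. For a model with several inputs, it would be safer to pick the layer by its name.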

### Step 3: Get shape (dimensions) of the input layer of the network

* n - batch size (number of images processed at once)
* c - number of input image channels (usually 3 - R, G and B) 
* h - height
* w - width

In [None]:
retinaface_batch, retinaface_channels, retinaface_input_layer_h, retinaface_input_layer_w = retinaface_input_blob.shape

print(f'Input shape of the RetinaFace is [{retinaface_batch}, {retinaface_channels}, {retinaface_input_layer_h}, {retinaface_input_layer_w}]')

In [None]:
retinaface_output_blob = next(iter(retinaface_network.outputs))

### Step 4: Load the network to a device

Use the instance of `IECore`.
The class `IECore` has a special function called `load_network`, which loads a network to a device.
This function prepares the network for the first inference on the device 
and returns an instance of the network prepared for an inference (execution). 
This function has many parameters, but in this case, you need to know only about two of them:
* `network` - instance of `IENetwork`
* `device_name` - string, contains a device name to infer a model on: CPU, GPU and so on.

In [None]:
retinaface_loaded_to_device = ie_core.load_network(retinaface_network, DEVICE)

### Step 5: Open the input video

In [None]:
input_video_stream = cv2.VideoCapture(INPUT_VIDEO)

### Step 6: Pre-processing

In [None]:
def face_detection_pre_processing(input_frame: np.ndarray, batch: int, channels: int, input_layer_height: int, input_layer_width: int) -> np.ndarray:
    # Resize the frame to the network input 
    resized_frame = cv2.resize(input_frame, (input_layer_width, input_layer_height))
    
    # Change the data layout from HWC to CHW
    transposed_frame = resized_frame.transpose((2, 0, 1))  
    
    # Reshape the frame to the network input 
    reshaped_frame = transposed_frame.reshape((batch, channels, input_layer_height, input_layer_width))
    
    return reshaped_frame
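
The layout change can be checked without a model. This sketch applies the same transpose and reshape to a synthetic frame; the 4x6 frame size is arbitrary, and resizing is left out so that only NumPy is needed.

```python
import numpy as np

# A synthetic frame in the HWC layout used by OpenCV: height 4, width 6, 3 channels
frame = np.arange(4 * 6 * 3, dtype=np.uint8).reshape((4, 6, 3))

# HWC -> CHW: the channel axis moves to the front
chw_frame = frame.transpose((2, 0, 1))

# CHW -> NCHW: add the batch dimension expected by the network
nchw_frame = chw_frame.reshape((1, 3, 4, 6))

print(frame.shape, chw_frame.shape, nchw_frame.shape)

# The pixel at row 1, column 2 keeps its three channel values
print(frame[1, 2], nchw_frame[0, :, 1, 2])
```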

### Step 7: Inference

In [None]:
def face_detection_inference(input_frame: np.ndarray) -> dict:
    feed_dict = {
        retinaface_input_name: input_frame
    }
    
    # All is ready for the main thing - inference!
    # You have read and loaded the network to the device, prepared input data and now you are ready to infer.
    
    # To start an inference, call the `infer` function of the `retinaface_loaded_to_device` variable.
    # It takes the input data as a dictionary: key - name of the input layer, value - input data.
    inference_result = retinaface_loaded_to_device.infer(feed_dict)
    
    # Great! The `inference_result` variable contains output data after inference of the network.
    # `inference_result` is a dictionary, 
    #  where key is the name of the output layer, 
    #        value is data from the blob.
    
    return inference_result

### Step 9: Prepare for post-processing

In [None]:
# Create Output video stream
output_video_stream = prapare_out_video_stream(input_video_stream, OUTPUT_VIDEO)

# Get input height and width (OpenCV returns them as floats)
input_frame_width = int(input_video_stream.get(cv2.CAP_PROP_FRAME_WIDTH))
input_frame_height = int(input_video_stream.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Create the post-processor
postprocessor = RetinaFacePostPostprocessor(origin_image_size=[input_frame_width, input_frame_height], 
                                            input_image_size=[retinaface_input_layer_w, retinaface_input_layer_h])
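
The post-processor receives both the original frame size and the network input size. A likely reason, and this is an assumption about the package rather than a description of its code, is that detections are computed in the coordinates of the resized frame and have to be mapped back to the original one. A minimal sketch of such a mapping, with a hypothetical `scale_box` helper:

```python
def scale_box(box, input_size, origin_size):
    """Map (xmin, ymin, xmax, ymax) from network-input to original-frame coordinates.

    Sizes are given as (width, height), like in the constructor call above.
    """
    x_scale = origin_size[0] / input_size[0]
    y_scale = origin_size[1] / input_size[1]
    xmin, ymin, xmax, ymax = box
    return (xmin * x_scale, ymin * y_scale, xmax * x_scale, ymax * y_scale)


# A box found on a 640x640 network input, mapped to a 1280x720 frame
print(scale_box((160, 320, 320, 480), input_size=(640, 640), origin_size=(1280, 720)))
```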

### Step 8: Function for processing inference results

In [None]:
face_to_swap = cv2.imread('./data/neutral.png')

In [None]:
def draw_boxes_around_face_in_frame(original_frame: np.ndarray, face_box: np.ndarray):
    # Get the coordinates of the detected face and clamp them to the frame borders,
    # so that the slice below always matches the size of the resized image
    frame_height, frame_width = original_frame.shape[:2]
    xmin = max(int(face_box[0]), 0)
    ymin = max(int(face_box[1]), 0)

    xmax = min(int(face_box[2]), frame_width)
    ymax = min(int(face_box[3]), frame_height)
    
    # Get the confidence for the detected face
    confidence = face_box[4]
    
    w = xmax - xmin
    h = ymax - ymin
    
    # Skip empty boxes: cv2.resize fails on an empty target size
    if w <= 0 or h <= 0:
        return
    
    # Replace the face with the prepared image
    resized_face_to_swap = cv2.resize(face_to_swap, (w, h))
    original_frame[ymin:ymax, xmin:xmax] = resized_face_to_swap
    
    # Convert the confidence to a percentage
    confidence = round(confidence * 100, 1)
    
    # Color of the label (green in BGR)
    color = (0, 255, 0)
    
    # Create the label text
    text = f'{confidence}%'

    # Put the label above the face
    cv2.putText(original_frame, text, (xmin, ymin - 7), cv2.FONT_HERSHEY_COMPLEX, 2, color, 2)
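
The replacement itself is a NumPy slice assignment. The sketch below reproduces it on a small synthetic frame, clamping the box to the frame borders first, since a slice that runs past an edge would be smaller than the patch. The `paste_patch` helper is hypothetical, and a constant fill stands in for the resized image so that OpenCV is not needed.

```python
import numpy as np


def paste_patch(frame: np.ndarray, box, fill_value: int) -> bool:
    """Overwrite the box region of the frame; return False if the box is empty."""
    frame_height, frame_width = frame.shape[:2]
    # Clamp the box to the frame so that the slice has the expected size
    xmin, ymin = max(int(box[0]), 0), max(int(box[1]), 0)
    xmax, ymax = min(int(box[2]), frame_width), min(int(box[3]), frame_height)
    if xmax <= xmin or ymax <= ymin:
        return False
    # A constant patch stands in for the resized replacement image
    patch = np.full((ymax - ymin, xmax - xmin, 3), fill_value, dtype=frame.dtype)
    frame[ymin:ymax, xmin:xmax] = patch
    return True


frame = np.zeros((4, 6, 3), dtype=np.uint8)
# The box extends past the right and bottom edges of the 6x4 frame
pasted = paste_patch(frame, box=(4, 2, 10, 10), fill_value=255)
print(pasted, int(frame.sum()))
```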

In [None]:
def add_face_detection_inference_result_in_frame(original_frame: np.ndarray, inference_result: dict):
    # Convert the raw network output to a list of detected faces
    detected_faces = postprocessor.process_output(inference_result)
    
    for detected_face in detected_faces:
        # Replace each detected face and label it with the confidence
        draw_boxes_around_face_in_frame(original_frame, detected_face)

### Step 10: Loop over frames in the input video

In [None]:
while input_video_stream.isOpened():
    # Read the next frame from the input video 
    ret, frame = input_video_stream.read()
    # Check if the video is over
    if not ret:
        # Exit from the loop if the video is over
        break 
    
    # Prepare frame for inference
    in_frame = face_detection_pre_processing(frame, retinaface_batch, retinaface_channels, retinaface_input_layer_h, retinaface_input_layer_w)
    
    # Run the face detection network on the prepared frame
    inference_result = face_detection_inference(in_frame)
    
    # Replace the detected faces in the original frame
    add_face_detection_inference_result_in_frame(frame, inference_result)
    
    # Write the resulting frame to the output stream
    output_video_stream.write(frame)
    
input_video_stream.release()
# Save the resulting video
output_video_stream.release()

In [None]:
from IPython.display import HTML

# Show the resulting video
HTML(f"""<video width="600" height="400" controls><source src="{OUTPUT_VIDEO}" type="video/mp4"></video>""")

Do you see the replaced faces in the video? 
If yes, you did everything right!
**Good Work!** 

## Section 16: Practice (Part 2)

What is the next step? Neural networks are often combined into pipelines, where the results of the first neural network are used as the input for the next one. 
Let's try to build a pipeline from two networks: the first finds a person in the video, and the second recognizes the emotions of this person.
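
The structure of such a pipeline can be sketched with two stub functions in place of the networks. Both stubs and their return values are hypothetical; only the data flow matters here.

```python
def detect_faces(frame):
    """Stub for the first network: return bounding boxes (xmin, ymin, xmax, ymax)."""
    return [(1, 0, 3, 2)]


def recognize_emotion(face):
    """Stub for the second network: return a label for the cropped face."""
    return 'happy'


def run_pipeline(frame):
    """Feed every region found by the first stage to the second stage."""
    results = []
    for xmin, ymin, xmax, ymax in detect_faces(frame):
        # Crop the region: rows are indexed by y, columns by x
        face = [row[xmin:xmax] for row in frame[ymin:ymax]]
        results.append(((xmin, ymin, xmax, ymax), recognize_emotion(face)))
    return results


# A 3x4 "frame" made of nested lists is enough to follow the data flow
frame = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(run_pipeline(frame))
```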

We have already run the first network and found the person in the video.
The next step is to find a network for emotion recognition.
There is a good neural network in the [Open Model Zoo](https://docs.openvinotoolkit.org/2019_R1/_docs_Pre_Trained_Models.html) - [emotions-recognition-retail-0003 network](https://docs.openvinotoolkit.org/2019_R1/_emotions_recognition_retail_0003_description_emotions_recognition_retail_0003.html).

### Step 1: Download emotions-recognition-retail-0003 network
Run the Model Downloader with the needed arguments to download the emotions-recognition-retail-0003 network:

In [None]:
!python3 ~/intel/openvino_2021/deployment_tools/open_model_zoo/tools/downloader/downloader.py --name emotions-recognition-retail-0003 --precision FP16 --output_dir data/model

This model is already in the OpenVINO format, so you do not need to convert it.

After downloading the model you can use it:

### Step 2: Read the prepared model
The IENetwork class is designed to work with a model in the Inference Engine. This class contains information about the network model read from the Intermediate Representation and allows you to manipulate some model parameters such as layer affinity and output layers.

As before, call the `read_network` method of `IECore`, which returns an instance of the IENetwork class. The method has two parameters:

 1. path to the .xml file of the model
 2. path to the .bin file of the model

In [None]:
emotion_recognition_network = ie_core.read_network('data/model/intel/emotions-recognition-retail-0003/FP16/emotions-recognition-retail-0003.xml', 'data/model/intel/emotions-recognition-retail-0003/FP16/emotions-recognition-retail-0003.bin')
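
The paths above follow the directory layout visible in the call: `<output_dir>/intel/<model name>/<precision>/<model name>.xml`. A small helper can build both paths from these parts. `model_paths` is a hypothetical name, and the layout is taken from the paths in this notebook rather than from the Model Downloader documentation.

```python
from pathlib import Path


def model_paths(output_dir: str, name: str, precision: str):
    """Build the .xml and .bin paths of a downloaded Intel model."""
    model_dir = Path(output_dir) / 'intel' / name / precision
    return model_dir / f'{name}.xml', model_dir / f'{name}.bin'


xml_path, bin_path = model_paths('data/model', 'emotions-recognition-retail-0003', 'FP16')
print(xml_path.as_posix())
print(bin_path.as_posix())
```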

### Step 3: Load the network to a device

Use the instance of `IECore`.
The class `IECore` has a special function called `load_network`, which loads a network to a device.
This function prepares the network for the first inference on the device 
and returns an instance of the network prepared for an inference (execution). 
This function has many parameters, but in this case, you need to know only about two of them:
* `network` - instance of `IENetwork`
* `device_name` - string, contains a device name to infer a model on: CPU, GPU and so on.

In [None]:
emotion_recognition_network_loaded_on_device = ie_core.load_network(emotion_recognition_network, 'CPU')

### Step 4: Open the input video

In [None]:
input_video_stream = cv2.VideoCapture(INPUT_VIDEO)

### Step 5: Create an output video stream

In [None]:
output_video_stream = prapare_out_video_stream(input_video_stream, OUTPUT_VIDEO)

In [None]:
emotion_recognition_input_layer = next(iter(emotion_recognition_network.input_info))
emotion_recognition_input_blob = emotion_recognition_network.input_info[emotion_recognition_input_layer].input_data

print(f'Input layer of the emotions-recognition-retail-0003 is {emotion_recognition_input_layer}')

In [None]:
emotion_recognition_batch, emotion_recognition_channels, emotion_recognition_input_layer_h, emotion_recognition_input_layer_w = emotion_recognition_input_blob.shape

print(f'Input shape of the emotions-recognition-retail-0003 is [{emotion_recognition_batch}, {emotion_recognition_channels}, {emotion_recognition_input_layer_h}, {emotion_recognition_input_layer_w}]')

In [None]:
emotion_recognition_output_layer = next(iter(emotion_recognition_network.outputs))

### Step 6: Prepare a frame and run inference

In [None]:
def emotion_infer(face):
    # Resize the frame to the network input 
    resized_frame = cv2.resize(face, (emotion_recognition_input_layer_w, emotion_recognition_input_layer_h))
    
    # Change the data layout from HWC to CHW
    transposed_frame = resized_frame.transpose((2, 0, 1))  
    
    # Reshape the frame to the network input 
    reshaped_frame = transposed_frame.reshape((emotion_recognition_batch, emotion_recognition_channels, emotion_recognition_input_layer_h, emotion_recognition_input_layer_w))

    # Run the inference as you did earlier
    inference_results = emotion_recognition_network_loaded_on_device.infer({
        emotion_recognition_input_layer: reshaped_frame
    })
    # To understand the inference result of this model, check the documentation:
    # https://docs.openvinotoolkit.org/latest/_models_intel_emotions_recognition_retail_0003_description_emotions_recognition_retail_0003.html
    return inference_results[emotion_recognition_output_layer]

### Step 7: Draw boxes and emotions in a frame

In [None]:
def get_smile_by_index(emotion_inference_result: np.ndarray) -> np.ndarray:
    emotions = ['neutral', 'happy', 'sad', 'surprise', 'anger']
    emotion_index = np.argmax(emotion_inference_result.flatten()) 
    smile_path = f'./data/{emotions[emotion_index]}.png'
    return cv2.imread(smile_path)
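
The lookup can be tried on a synthetic output. The sketch assumes that the network returns one score per emotion in a `[1, 5, 1, 1]` array, in the order of the `emotions` list; the scores are made up.

```python
import numpy as np

emotions = ['neutral', 'happy', 'sad', 'surprise', 'anger']

# Made-up scores in the assumed [1, 5, 1, 1] output shape
scores = np.array([0.05, 0.80, 0.05, 0.07, 0.03], dtype=np.float32).reshape((1, 5, 1, 1))

# flatten() turns the array into a vector, argmax() returns the index of the top score
emotion_index = int(np.argmax(scores.flatten()))
print(emotions[emotion_index])  # happy
```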

In [None]:
def emotion_recognition_inference_postpprocess(original_frame, detected_face, emotion_result, x_limits, y_limits):
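    # Pick the image that corresponds to the recognized emotion
    smile = get_smile_by_index(emotion_result)
    # Compute the size of the face region
    w = x_limits[1] - x_limits[0]
    h = y_limits[1] - y_limits[0]

    # Resize the image to the face region and put it into the frame
    resized_smile = cv2.resize(smile, (w, h))
    
    original_frame[y_limits[0]:y_limits[1], x_limits[0]:x_limits[1]] = resized_smile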

### Step 8: Loop over frames in the input video

In [None]:
while input_video_stream.isOpened():
    
    # Read the next frame from the input video 
    ret, original_frame = input_video_stream.read()
    # Check if the video is over
    if not ret:
        # Exit from the loop if the video is over
        break 
    face_detection_frame = face_detection_pre_processing(original_frame, retinaface_batch, retinaface_channels, retinaface_input_layer_h, retinaface_input_layer_w)
    face_detection_inference_result = face_detection_inference(face_detection_frame)
    
    detected_faces = postprocessor.process_output(face_detection_inference_result)
    
    frame_height, frame_width = original_frame.shape[:2]
    for detected_face in detected_faces:
        # Get the coordinates of the detected face, clamped to the frame borders
        xmin = max(int(detected_face[0]), 0)
        ymin = max(int(detected_face[1]), 0)

        xmax = min(int(detected_face[2]), frame_width)
        ymax = min(int(detected_face[3]), frame_height)
        
        # Skip empty regions: they cannot be resized
        if xmax <= xmin or ymax <= ymin:
            continue
        
        # Crop the face from the frame
        emotion_recognition_frame = original_frame[ymin:ymax, xmin:xmax]
    
        # Recognize the emotion and replace the face with the matching image
        emotion_recognition_result = emotion_infer(emotion_recognition_frame)
        emotion_recognition_inference_postpprocess(original_frame, detected_face, emotion_recognition_result, (xmin, xmax), (ymin, ymax))
    
    # Write the resulting frame to the output stream
    output_video_stream.write(original_frame)
    
input_video_stream.release()
# Save the resulting video
output_video_stream.release()

Now the face of the person (Artyom) in the resulting video is replaced with an image of the recognized emotion:

In [None]:
# Show the resulting video
HTML(f"""<video width="600" height="400" controls><source src="{OUTPUT_VIDEO}" type="video/mp4"></video>""")

![](pictures/thankyou.PNG)