## <a id='toc'></a>Table of Contents

1. [Exercise in Pre-trained Models](#1)
    1. [Exercise 1.1](#1.1)
    2. [Exercise 1.2](#1.2)
    3. [Exercise 1.3](#1.3)
2. [Exercise in Model Optimizer](#2)
    1. [Exercise 2.1](#2.1)
    2. [Exercise 2.2](#2.2)
    3. [Exercise 2.3](#2.3)
3. [Exercise in Inference Engine](#3)
    1. [Exercise 3.1](#3.1)
    2. [Exercise 3.2](#3.2)
    3. [Exercise 3.3](#3.3)
4. [Exercise in Deploying an Edge App](#4)
    1. [Exercise 4.1](#4.1)
    2. [Exercise 4.2](#4.2)
    3. [Exercise 4.3](#4.3)


## Exercise 1 in Pre-trained Models (Lesson 2) <a id='1'></a>

### <a id='1.1'></a>1.1 Download three pre-trained models

Download three pre-trained models from the [pre-trained models page](https://software.intel.com/en-us/openvino-toolkit/documentation/pretrained-models) for the following tasks, then verify the downloads:

- *human pose estimation* (all precision levels)
- *text detection* (FP16)
- *determining car type and color* (INT8)

```shell
# go to the Model Downloader directory (Linux)
cd /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader
# download the three models; without --precisions, all available precision levels are downloaded
sudo ./downloader.py --name human-pose-estimation-0001 -o {output directory for the downloaded models}
sudo ./downloader.py --name text-detection-0004 --precisions FP16 -o {output directory}
sudo ./downloader.py --name vehicle-attributes-recognition-barrier-0039 --precisions INT8 -o {output directory}
# on a local computer, saving to the downloader directory creates a new intel/ folder there; each model folder should contain a .xml and a .bin file
```
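To verify the downloads, it is enough to check that every `.xml` topology file has a matching `.bin` weights file next to it. The sketch below does that with `pathlib`; `verify_ir_pairs` is a hypothetical helper (not part of the downloader), and it simply searches the output directory recursively, so it does not depend on the exact folder layout.

```python
from pathlib import Path


def verify_ir_pairs(root):
    """Return (complete, missing): .xml files with and without a matching .bin.

    Hypothetical helper: an IR counts as complete when a .bin weights file
    with the same name sits in the same folder as the .xml topology file.
    """
    complete, missing = [], []
    for xml_path in sorted(Path(root).rglob("*.xml")):
        if xml_path.with_suffix(".bin").is_file():
            complete.append(xml_path)
        else:
            missing.append(xml_path)
    return complete, missing
```

For example, `verify_ir_pairs("intel")` run from the output directory lists the complete IRs first and any `.xml` files that are missing their weights second.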

### <a id='1.2'></a>1.2 Preprocess the inputs to match what each model expects
- **<font size='3'>[human pose estimation](https://docs.openvinotoolkit.org/latest/_models_intel_human_pose_estimation_0001_description_human_pose_estimation_0001.html) </font>**
- **<font size='3'>[text detection](http://docs.openvinotoolkit.org/latest/_models_intel_text_detection_0004_description_text_detection_0004.html) </font>**
- **<font size='3'>[vehicle attributes recognition](https://docs.openvinotoolkit.org/latest/_models_intel_vehicle_attributes_recognition_barrier_0039_description_vehicle_attributes_recognition_barrier_0039.html) </font>**

```python
import cv2
import numpy as np

def pose_estimation(input_image):
    '''
    Given some input image, preprocess the image so that
    it can be used with the related pose estimation model
    you downloaded previously. You can use cv2.resize()
    to resize the image.
    '''
    preprocessed_image = np.copy(input_image)

    # TODO: Preprocess the image for the pose estimation model
    # OpenCV loads images as BGR arrays in H*W*C order
    # cv2.resize takes the target size as (W, H) and returns an H*W*C array
    preprocessed_image = cv2.resize(preprocessed_image, (456, 256))
    # move the color channels from the last dimension to the first, giving C*H*W
    preprocessed_image = preprocessed_image.transpose(2, 0, 1)
    # add a leading batch dimension of size 1 with reshape, giving 1*C*H*W
    preprocessed_image = preprocessed_image.reshape(1, 3, 256, 456)
    
    return preprocessed_image


def text_detection(input_image):
    '''
    Given some input image, preprocess the image so that
    it can be used with the related text detection model
    you downloaded previously. You can use cv2.resize()
    to resize the image.
    '''
    preprocessed_image = np.copy(input_image)

    # TODO: Preprocess the image for the text detection model
    preprocessed_image = cv2.resize(preprocessed_image,(1280,768))
    preprocessed_image = preprocessed_image.transpose(2,0,1)
    preprocessed_image = preprocessed_image.reshape(1,3,768,1280)
    
    return preprocessed_image


def car_meta(input_image):
    '''
    Given some input image, preprocess the image so that
    it can be used with the related car metadata model
    you downloaded previously. You can use cv2.resize()
    to resize the image.
    '''
    preprocessed_image = np.copy(input_image)

    # TODO: Preprocess the image for the car metadata model
    preprocessed_image = cv2.resize(preprocessed_image,(72,72))
    preprocessed_image = preprocessed_image.transpose(2,0,1)
    preprocessed_image = preprocessed_image.reshape(1,3,72,72)
    
    return preprocessed_image
```
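All three functions rely on the same layout change, and the order of operations matters: `transpose(2,0,1)` has to come before the reshape. The NumPy-only sketch below uses a tiny synthetic array instead of a real image to show that transposing first keeps each channel plane intact, while reshaping the H*W*C array directly mixes values from different channels.

```python
import numpy as np

# A tiny synthetic "image" in OpenCV's H*W*C layout: 2 rows, 3 columns, 3 channels
h, w = 2, 3
hwc = np.arange(h * w * 3).reshape(h, w, 3)

# Correct: channels first (C*H*W), then a batch axis of size 1 (1*C*H*W)
nchw = hwc.transpose(2, 0, 1)[np.newaxis, ...]
# Same result using reshape after the transpose, as in the functions above
nchw_reshaped = hwc.transpose(2, 0, 1).reshape(1, 3, h, w)

# Wrong: reshaping directly reinterprets the H*W*C memory order as C*H*W
scrambled = hwc.reshape(1, 3, h, w)

print(nchw.shape, scrambled.shape)                     # (1, 3, 2, 3) (1, 3, 2, 3)
print(np.array_equal(nchw, nchw_reshaped))             # True
print(np.array_equal(nchw[0, 0], hwc[:, :, 0]))        # True: plane 0 is channel 0 (blue)
print(np.array_equal(scrambled[0, 0], hwc[:, :, 0]))   # False: pixels and channels mixed
```

Both arrays have the shape the model expects, so a missing transpose does not raise an error; it only shows up as wrong predictions.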

### <a id='1.3'></a>1.3 Handle the input and output of the pose, text, and car-attribute models within the app

* <font size='3'>First, modify ```app.py``` and ```handle_models.py``` to print the keys of the model output (```print(output.keys())```), taking pose estimation as an example. Running ```python app.py``` with the location of the image (```-i```), the model file (```-m```), the model type (```-t```), and the CPU extension file (```-c```) prints two keys: ```'Mconv7_stage2_L2'``` and ```'Mconv7_stage2_L1'```. Then, back in ```handle_models.py```, print the shape of the value for each key and compare it with the output description in the model documentation to determine which key to use.</font>

* **<font size='3'>handle_pose function</font>**

```python
import cv2
import numpy as np


def handle_pose(output, input_shape):
    '''
    Handles the output of the Pose Estimation model.
    Returns ONLY the keypoint heatmaps, and not the Part Affinity Fields.
    '''
    # TODO 1: Extract only the second blob output (keypoint heatmaps)
    # the second output blob contains heatmap keypoints with shape of 1*19*32*57
    # heatmaps[0]: 19*32*57
    heatmaps = output['Mconv7_stage2_L2']
    
    # TODO 2: Resize the heatmap back to the size of the input
    # input_shape: H*W*C
    # input_shape[0:2][::-1]: W*H, this order is required by cv2.resize function, output is in H*W order
    pose_output = np.zeros([heatmaps.shape[1],input_shape[0],input_shape[1]])
    # len(heatmaps[0]): 19
    for i in range(len(heatmaps[0])):
        pose_output[i] = cv2.resize(heatmaps[0][i],input_shape[0:2][::-1])

    return pose_output


def handle_text(output, input_shape):
    '''
    Handles the output of the Text Detection model.
    Returns ONLY the text/no text classification of each pixel,
        and not the linkage between pixels and their neighbors.
    '''
    # TODO 1: Extract only the first blob output (text/no text classification)
    txt_bin = output['model/segm_logits/add']
    # TODO 2: Resize this output back to the size of the input
    txt_output = np.empty([txt_bin.shape[1],input_shape[0],input_shape[1]])
    for i in range(len(txt_bin[0])):
        txt_output[i] = cv2.resize(txt_bin[0][i],input_shape[0:2][::-1])

    return txt_output


def handle_car(output, input_shape):
    '''
    Handles the output of the Car Metadata model.
    Returns two integers: the argmax of each softmax output.
    The first is for color, and the second for type.
    '''
    # TODO 1: Get the argmax of the "color" output
    color_arg = output['color'].flatten().argmax()
    # TODO 2: Get the argmax of the "type" output
    type_arg = output['type'].flatten().argmax()
    return color_arg,type_arg


def handle_output(model_type):
    '''
    Returns the related function to handle an output,
        based on the model_type being used.
    '''
    if model_type == "POSE":
        return handle_pose
    elif model_type == "TEXT":
        return handle_text
    elif model_type == "CAR_META":
        return handle_car
    else:
        return None


'''
The below function is carried over from the previous exercise.
You just need to call it appropriately in `app.py` to preprocess
the input image.
'''
def preprocessing(input_image, height, width):
    '''
    Given an input image, height and width:
    - Resize to width and height
    - Transpose the final "channel" dimension to be first
    - Reshape the image to add a "batch" of 1 at the start 
    '''
    image = np.copy(input_image)
    image = cv2.resize(image, (width, height))
    image = image.transpose((2,0,1))
    image = image.reshape(1, 3, height, width)

    return image
```
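The key-inspection step described above can be wrapped in two small helpers. This is a sketch with hypothetical names (`describe_outputs`, `find_key_by_shape`); it uses a dictionary of zero-filled NumPy arrays as a stand-in for the real inference output, with the pose model shapes that `test.py` expects (38 channels on `Mconv7_stage2_L1`, 19 keypoint-heatmap channels on `Mconv7_stage2_L2`).

```python
import numpy as np


def describe_outputs(output):
    """Print each output blob name with its shape (hypothetical helper)."""
    for name, blob in output.items():
        print(name, blob.shape)


def find_key_by_shape(output, expected_shape):
    """Return the first output key whose blob has `expected_shape`, or None."""
    for name, blob in output.items():
        if tuple(blob.shape) == tuple(expected_shape):
            return name
    return None


# Stand-in for the pose model's output dictionary (zeros instead of real values)
fake_output = {
    "Mconv7_stage2_L1": np.zeros((1, 38, 32, 57), dtype=np.float32),
    "Mconv7_stage2_L2": np.zeros((1, 19, 32, 57), dtype=np.float32),
}
describe_outputs(fake_output)
print(find_key_by_shape(fake_output, (1, 19, 32, 57)))  # Mconv7_stage2_L2
```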

* **<font size='3'>app.py</font>**

```python
import argparse
import cv2
import numpy as np

from handle_models import handle_output, preprocessing
from inference import Network


CAR_COLORS = ["white", "gray", "yellow", "red", "green", "blue", "black"]
CAR_TYPES = ["car", "bus", "truck", "van"]


def get_args():
    '''
    Gets the arguments from the command line.
    '''

    parser = argparse.ArgumentParser("Basic Edge App with Inference Engine")
    # -- Create the descriptions for the commands

    c_desc = "CPU extension file location, if applicable"
    d_desc = "Device, if not CPU (GPU, FPGA, MYRIAD)"
    i_desc = "The location of the input image"
    m_desc = "The location of the model XML file"
    t_desc = "The type of model: POSE, TEXT or CAR_META"

    # -- Add required and optional groups
    parser._action_groups.pop()
    required = parser.add_argument_group('required arguments')
    optional = parser.add_argument_group('optional arguments')

    # -- Create the arguments
    required.add_argument("-i", help=i_desc, required=True)
    required.add_argument("-m", help=m_desc, required=True)
    required.add_argument("-t", help=t_desc, required=True)
    optional.add_argument("-c", help=c_desc, default=None)
    optional.add_argument("-d", help=d_desc, default="CPU")
    args = parser.parse_args()

    return args


def get_mask(processed_output):
    '''
    Given the processed output for a semantic mask,
    returns a 3-channel mask that can be combined with the original image.
    '''
    # Create an empty array for other color channels of mask
    empty = np.zeros(processed_output.shape)
    # Stack to make a Green mask where text detected
    mask = np.dstack((empty, processed_output, empty))

    return mask


def create_output_image(model_type, image, output):
    '''
    Using the model type, input image, and processed output,
    creates an output image showing the result of inference.
    '''
    if model_type == "POSE":
        # Remove final part of output not used for heatmaps
        output = output[:-1]
        # Get only pose detections above 0.5 confidence, set to 255
        for c in range(len(output)):
            output[c] = np.where(output[c]>0.5, 255, 0)
        # Sum along the "class" axis
        output = np.sum(output, axis=0)
        # Get semantic mask
        pose_mask = get_mask(output)
        # Combine with original image
        image = image + pose_mask
        return image
    elif model_type == "TEXT":
        # Get only text detections above 0.5 confidence, set to 255
        output = np.where(output[1]>0.5, 255, 0)
        # Get semantic mask
        text_mask = get_mask(output)
        # Add the mask to the image
        image = image + text_mask
        return image
    elif model_type == "CAR_META":
        # Get the color and car type from their lists
        color = CAR_COLORS[output[0]]
        car_type = CAR_TYPES[output[1]]
        # Scale the output text by the image shape
        scaler = max(int(image.shape[0] / 1000), 1)
        # Write the text of color and type onto the image
        image = cv2.putText(image, 
            "Color: {}, Type: {}".format(color, car_type), 
            (50 * scaler, 100 * scaler), cv2.FONT_HERSHEY_SIMPLEX, 
            2 * scaler, (255, 255, 255), 3 * scaler)
        return image
    else:
        print("Unknown model type, unable to create output image.")
        return image


def perform_inference(args):
    '''
    Performs inference on an input image, given a model.
    '''
    # Create a Network for using the Inference Engine
    inference_network = Network()
    # Load the model in the network, and obtain its input shape
    n, c, h, w = inference_network.load_model(args.m, args.d, args.c)

    # Read the input image
    image = cv2.imread(args.i)

    ### TODO: Preprocess the input image
    preprocessed_image = preprocessing(image,h,w)

    # Perform synchronous inference on the image
    inference_network.sync_inference(preprocessed_image)

    # Obtain the output of the inference request
    output = inference_network.extract_output()

    ### TODO: Handle the output of the network, based on args.t
    ### Note: This will require using `handle_output` to get the correct
    ###       function, and then feeding the output to that function.
    processed_output = handle_output(args.t)(output,image.shape)

    # Create an output image based on network
    output_image = create_output_image(args.t, image, processed_output)

    # Save down the resulting image
    cv2.imwrite("outputs/{}-output.png".format(args.t), output_image)


def main():
    args = get_args()
    perform_inference(args)


if __name__ == "__main__":
    main()
```
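In the `CAR_META` branch, `handle_car` returns two indices and `create_output_image` looks them up in `CAR_COLORS` and `CAR_TYPES`. The sketch below reproduces that lookup with made-up softmax values in the `(1, 7, 1, 1)` and `(1, 4, 1, 1)` shapes that `test.py` expects; like the exercise code, it assumes the label order in the two lists matches the model's output order.

```python
import numpy as np

CAR_COLORS = ["white", "gray", "yellow", "red", "green", "blue", "black"]
CAR_TYPES = ["car", "bus", "truck", "van"]

# Made-up softmax outputs with the expected blob shapes
color_prob = np.array([0.05, 0.05, 0.0, 0.1, 0.0, 0.7, 0.1]).reshape(1, 7, 1, 1)
type_prob = np.array([0.8, 0.05, 0.1, 0.05]).reshape(1, 4, 1, 1)

# flatten() turns (1, N, 1, 1) into (N,), so argmax() is the class index
color_idx = int(color_prob.flatten().argmax())
type_idx = int(type_prob.flatten().argmax())
print(CAR_COLORS[color_idx], CAR_TYPES[type_idx])  # blue car
```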

* **<font size='3'>Calling the app</font>**
```shell
python {path to app.py} -m {path to model .xml file} -i {path to input image} -t {model type: 'POSE'/'TEXT'/'CAR_META'} -c {CPU extension file; in the workspace: '/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so'}
# the output image is saved as outputs/{model type}-output.png, so the outputs/ directory must already exist
```

* **<font size='3'>After running ```app.py```, the output images are shown below; from left to right: pose detection, text detection, and car attribute detection.</font>**

<img src='https://r953259c958904xjupyterlfkz3ibsg.udacity-student-workspaces.com/files/outputs/POSE-output.png?_xsrf=2%7C75ac3ac5%7C03fdd4b1ed1bc0978e7d959920fcd572%7C1583120289' width='300' style='float: left;' title='pose'>
<img src='https://r953259c958904xjupyterlfkz3ibsg.udacity-student-workspaces.com/files/outputs/TEXT-output.png?_xsrf=2%7C75ac3ac5%7C03fdd4b1ed1bc0978e7d959920fcd572%7C1583120289' width='300' style='float: left;'>
<img src='https://r953259c958904xjupyterlfkz3ibsg.udacity-student-workspaces.com/files/outputs/CAR_META-output.png?_xsrf=2%7C75ac3ac5%7C03fdd4b1ed1bc0978e7d959920fcd572%7C1583120289' width='300' align='left'>

## Exercise 2 in Model Optimizer (Lesson 3) <a id='2'></a>
    
**Back to [TOC](#toc)**    

### <a id='2.1'></a>2.1 Convert a TensorFlow Model 
* **Feed in the downloaded SSD MobileNet V2 COCO model's ```.pb``` file; see the [TensorFlow conversion documentation](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_TensorFlow.html)**
* **```--reverse_input_channels```: see [when to reverse input channels](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_Converting_Model_General.html#when_to_reverse_input_channels)**
* **```--tensorflow_object_detection_api_pipeline_config``` and ```--tensorflow_use_custom_operations_config```: see the [Object Detection API conversion documentation](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_Object_Detection_API_Models.html)**

```shell
# download a file from a URL into the current directory (Linux): wget <url>; for a long path, store it in a variable (var=path) and use $var/file
# extract the archive: tar -xvf path_to_file
# cd to the directory where the model was extracted
# TensorFlow models are trained on RGB images, but the app loads images with OpenCV in BGR order, so add --reverse_input_channels
python <INSTALL_DIR>/deployment_tools/model_optimizer/mo_tf.py --input_model <INPUT_MODEL>.pb --reverse_input_channels --tensorflow_object_detection_api_pipeline_config <path_to_pipeline.config> --tensorflow_use_custom_operations_config <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json
```
* **This generates two files, model_name.xml and model_name.bin, in the same directory and shows the elapsed time.**
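Conceptually, `--reverse_input_channels` saves the app from flipping every frame itself: the app keeps feeding OpenCV's BGR arrays and the converted IR treats them as the RGB input the TensorFlow model was trained on. The NumPy sketch below shows the equivalent app-side flip on a synthetic two-pixel image; it illustrates the idea only and is not how the Model Optimizer implements the flag internally.

```python
import numpy as np

# A 1x2 "image" as OpenCV would load it: channel order is B, G, R
bgr = np.array([[[255, 0, 0],     # pure blue pixel
                 [0, 0, 255]]],   # pure red pixel
               dtype=np.uint8)

# Reversing the last (channel) axis swaps B and R, giving RGB order
rgb = bgr[..., ::-1]
print(rgb.tolist())  # [[[0, 0, 255], [255, 0, 0]]]
```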

### <a id='2.2'></a>2.2 Convert a Caffe Model into IR using Model Optimizer
* **Clone the repository from GitHub: ```git clone <repository_url>```**
* **Specify ```--input_proto``` if the ```.prototxt``` file is not named the same as the ```.caffemodel``` file. [Converting a Caffe model](https://docs.openvinotoolkit.org/2018_R5/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_Caffe.html#caffe_specific_conversion_params)**
* **If you notice poor inference results, you may need to specify mean values (```--mean_values [R,G,B]```) and a scale value (```--scale <value>```) in your arguments. [Specifying the parameters](https://docs.openvinotoolkit.org/2018_R5/_docs_MO_DG_prepare_model_convert_model_Converting_Model_General.html)**

```shell
# cd to SqueezeNet/SqueezeNet_v1.1 first
python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model squeezenet_v1.1.caffemodel --input_proto deploy.prototxt
```
* **will generate two files: model_name.xml and model_name.bin in the same directory and show you the execution time.**
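The normalization that `--mean_values` and `--scale` describe is, per channel, (pixel - mean) / scale. The sketch below applies that formula with NumPy; `normalize` is a hypothetical helper, and the 127.5/127.5 values are illustrative only, since the correct numbers come from how the original model was trained. Passing them to the Model Optimizer bakes this step into the IR, so the app does not have to do it.

```python
import numpy as np


def normalize(image, mean_values, scale):
    """Hypothetical helper: per-channel (pixel - mean) / scale."""
    image = image.astype(np.float32)
    # one mean per channel; broadcasts over the last (channel) axis
    mean = np.asarray(mean_values, dtype=np.float32)
    return (image - mean) / scale


# A 1x2 image: one black pixel and one white pixel
pixels = np.array([[[0, 0, 0], [255, 255, 255]]], dtype=np.uint8)
print(normalize(pixels, [127.5, 127.5, 127.5], 127.5).tolist())
# [[[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]]
```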

### <a id='2.3'></a>2.3 Convert an ONNX Model into IR using Model Optimizer
* **Download the file from a link into the current directory (Linux): ```wget <url>```, then extract it: ```tar -xvf path_to_file```**
* **See the [documentation for converting an ONNX model](https://docs.openvinotoolkit.org/2018_R5/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_ONNX.html)**

```shell
# cd to bvlc_alexnet first
python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model <model_name>.onnx
```
* **will generate two files: model_name.xml and model_name.bin in the same directory and show you the execution time.**
* **PyTorch and Apple Core ML models must first be converted to ONNX outside of OpenVINO before they can be converted with the Model Optimizer.**

## Exercise 3 in Inference Engine (IE) (Lesson 4) <a id='3'></a>
    
**Back to [TOC](#toc)**

### <a id='3.1'></a>3.1 Feed an IR to IE
* **Import the Python wrapper for IE: ```from openvino.inference_engine import IECore, IENetwork``` ([IECore](https://docs.openvinotoolkit.org/2019_R3/classie__api_1_1IECore.html), [IENetwork](https://docs.openvinotoolkit.org/2019_R3/classie__api_1_1IENetwork.html))**
* **Load each IR as an ```IENetwork``` and check whether the CPU supports all of its layers with ```ie.query_network(network, device_name)``` ([IENetLayer](https://docs.openvinotoolkit.org/2019_R3/classie__api_1_1IENetLayer.html))**
* **Since the workspace uses an Intel CPU, add a CPU extension to ```IECore```: ```ie = IECore()```, then ```ie.add_extension(extension_path=CPU_EXTENSION, device_name="CPU")```**
* **After verifying that all layers are supported, load the IR (the model is the ```.xml``` file, the weights are the ```.bin``` file) into IE to create an ```ExecutableNetwork```: ```ie.load_network(network, device_name)```**
* **[Data Structures of IE API](https://docs.openvinotoolkit.org/2019_R3/ie_python_api.html)**

```python
import argparse
### TODO: Load the necessary libraries
from openvino.inference_engine import IECore, IENetwork
import os

CPU_EXTENSION = "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"

def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Load an IR into the Inference Engine")
    # -- Create the descriptions for the commands
    m_desc = "The location of the model XML file"

    # -- Create the arguments
    parser.add_argument("-m", help=m_desc)
    args = parser.parse_args()

    return args


def load_to_IE(model_xml):
    ### TODO: Load the Inference Engine API
    plugin = IECore()
    
    ### TODO: Load IR files into their related class
    model_bin = os.path.splitext(model_xml)[0] + '.bin'
    net = IENetwork(model=model_xml, weights=model_bin)
    
    ### TODO: Add a CPU extension, if applicable. It's suggested to check
    ###       your code for unsupported layers for practice before 
    ###       implementing this. Not all of the models may need it.
    plugin.add_extension(extension_path=CPU_EXTENSION, device_name="CPU")    
    
    ### TODO: Get the supported layers of the network
    supported_layers = plugin.query_network(net,'CPU')
    
    ### TODO: Check for any unsupported layers, and let the user
    ###       know if anything is missing. Exit the program, if so.
    unsupported_layers = [l for l in net.layers.keys() if l not in supported_layers]
    
    if len(unsupported_layers) != 0:
        print('Unsupported layers found: %s' %unsupported_layers)
        print('Check if extension available to be added')
        exit(1)
        
    ### TODO: Load the network into the Inference Engine
    exec_net = plugin.load_network(network=net, device_name="CPU", num_requests=2)
    print("IR successfully loaded into Inference Engine.")

    return


def main():
    args = get_args()
    load_to_IE(args.m)


if __name__ == "__main__":
    main()
```
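`query_network` returns a dictionary that maps each supported layer name to a device name, so the check above is a membership test on its keys. When layers are missing, grouping them by layer type makes it easier to tell which extension is needed. The sketch below uses a hypothetical helper and made-up layer data; with a real network, the name-to-type mapping could be built as `{name: layer.type for name, layer in net.layers.items()}` in the 2019 R3 API.

```python
from collections import defaultdict


def unsupported_by_type(layer_types, supported):
    """Group unsupported layer names by layer type (hypothetical helper).

    layer_types: dict of layer name -> layer type
    supported:   dict of layer name -> device, as returned by query_network()
    """
    grouped = defaultdict(list)
    for name, layer_type in layer_types.items():
        if name not in supported:
            grouped[layer_type].append(name)
    return dict(grouped)


# Made-up data for illustration only
layer_types = {
    "data": "Input",
    "conv1": "Convolution",
    "priorbox_0": "PriorBoxClustered",
    "priorbox_1": "PriorBoxClustered",
    "detection_out": "DetectionOutput",
}
supported = {"data": "CPU", "conv1": "CPU", "detection_out": "CPU"}
print(unsupported_by_type(layer_types, supported))
# {'PriorBoxClustered': ['priorbox_0', 'priorbox_1']}
```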

### <a id='3.2'></a>3.2 Send Inference Requests to IE
* **```inference.py```: performs synchronous and asynchronous [inference requests](https://docs.openvinotoolkit.org/2019_R3/classie__api_1_1InferRequest.html) on an input image**
* **```test.py```: tests ```inference.py```**
* **A synchronous request blocks until inference completes, so only one frame is processed at a time; an asynchronous request does not hold everything up if the response is slow, so the app can send a frame for inference and start preprocessing the next frame while waiting**
* **[```ExecutableNetwork```](https://docs.openvinotoolkit.org/2019_R3/classie__api_1_1ExecutableNetwork.html)**
* **```exec_net.requests[request_id]```: the collection of inference requests, e.g., ```exec_net.requests[0]``` is the first request**
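The difference between the two request types can be simulated without OpenVINO. In the sketch below, `fake_infer` and `fake_preprocess` are hypothetical stand-ins that just sleep, and Python's `concurrent.futures` plays the role of `start_async`/`wait`: the async version submits one frame and preprocesses the next while the previous result is still being computed. Both runners return the same results; only the total time differs.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_infer(frame):
    """Stand-in for a slow inference call (NOT the OpenVINO API)."""
    time.sleep(0.05)
    return frame * 10


def fake_preprocess(frame):
    """Stand-in for per-frame preprocessing."""
    time.sleep(0.05)
    return frame + 1


def run_sync(frames):
    # Each inference blocks before the next frame is preprocessed
    return [fake_infer(fake_preprocess(f)) for f in frames]


def run_async(frames):
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for f in frames:
            ready = fake_preprocess(f)  # overlaps with the pending inference
            if pending is not None:
                results.append(pending.result())  # similar to wait(-1)
            pending = pool.submit(fake_infer, ready)  # similar to start_async
        if pending is not None:
            results.append(pending.result())
    return results


frames = list(range(5))
for runner in (run_sync, run_async):
    start = time.perf_counter()
    out = runner(frames)
    print(runner.__name__, out, round(time.perf_counter() - start, 2), "s")
```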

**inference.py**
```python
import argparse
import cv2
from helpers import load_to_IE, preprocessing

CPU_EXTENSION = "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"

def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Load an IR into the Inference Engine")
    # -- Create the descriptions for the commands
    m_desc = "The location of the model XML file"
    i_desc = "The location of the image input"
    r_desc = "The type of inference request: Async ('A') or Sync ('S')"

    # -- Create the arguments
    parser.add_argument("-m", help=m_desc)
    parser.add_argument("-i", help=i_desc)
    parser.add_argument("-r", help=r_desc)
    args = parser.parse_args()

    return args


def async_inference(exec_net, input_blob, image):
    ### TODO: Add code to perform asynchronous inference
    ### Note: Return the exec_net
    # start_async returns the InferRequest handle for request slot 0
    infer_request = exec_net.start_async(request_id=0, inputs={input_blob: image})
    # wait(-1) blocks until the inference result becomes available;
    # a status of 0 means the request completed successfully
    status = infer_request.wait(-1)
    if status != 0:
        raise RuntimeError("Asynchronous inference failed with status {}".format(status))
    return exec_net


def sync_inference(exec_net, input_blob, image):
    ### TODO: Add code to perform synchronous inference
    ### Note: Return the result of inference
    res = exec_net.infer(inputs={input_blob:image})
    return res


def perform_inference(exec_net, request_type, input_image, input_shape):
    '''
    Performs inference on an input image, given an ExecutableNetwork
    '''
    # Get input image
    image = cv2.imread(input_image)
    # Extract the input shape
    n, c, h, w = input_shape
    # Preprocess it (applies for the IRs from the Pre-Trained Models lesson)
    preprocessed_image = preprocessing(image, h, w)

    # Get the input blob for the inference request
    input_blob = next(iter(exec_net.inputs))

    # Perform either synchronous or asynchronous inference
    request_type = request_type.lower()
    if request_type == 'a':
        output = async_inference(exec_net, input_blob, preprocessed_image)
    elif request_type == 's':
        output = sync_inference(exec_net, input_blob, preprocessed_image)
    else:
        print("Unknown inference request type, should be 'A' or 'S'.")
        exit(1)

    # Return the result dict (sync) or the exec_net (async) for testing purposes
    return output


def main():
    args = get_args()
    exec_net, input_shape = load_to_IE(args.m, CPU_EXTENSION)
    perform_inference(exec_net, args.r, args.i, input_shape)


if __name__ == "__main__":
    main()
```

**test.py**
```python
from helpers import load_to_IE, preprocessing
from inference import perform_inference

CPU_EXTENSION = "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"

MODEL_PATH = "/home/workspace/models/"

OUTPUT_SHAPES = {
    "POSE": {"Mconv7_stage2_L1": (1, 38, 32, 57),
             "Mconv7_stage2_L2": (1, 19, 32, 57)},
    "TEXT": {"model/link_logits_/add": (1, 16, 192, 320),
             "model/segm_logits/add": (1, 2, 192, 320)},
    "CAR META": {"color": (1, 7, 1, 1),
                 "type": (1, 4, 1, 1)}
}

def pose_test():
    counter = 0
    model = MODEL_PATH + "human-pose-estimation-0001.xml"
    image = "images/sitting-on-car.jpg"
    counter += test(model, "POSE", image)

    return counter


def text_test():
    counter = 0
    model = MODEL_PATH + "text-detection-0004.xml"
    image = "images/sign.jpg"
    counter += test(model, "TEXT", image)

    return counter


def car_test():
    counter = 0
    model = MODEL_PATH + "vehicle-attributes-recognition-barrier-0039.xml"
    image = "images/blue-car.jpg"
    counter += test(model, "CAR META", image)

    return counter


def test(model, model_type, image):
    # Synchronous Test
    counter = 0
    try:
        # Load IE separately to check InferRequest latency
        exec_net, input_shape = load_to_IE(model, CPU_EXTENSION)
        result = perform_inference(exec_net, "S", image, input_shape)
        output_blob = next(iter(exec_net.outputs))
        # Check for matching output shape to expected
        assert result[output_blob].shape == OUTPUT_SHAPES[model_type][output_blob]
        # Check latency is > 0; i.e. a request occurred
        assert exec_net.requests[0].latency > 0.0
        counter += 1
    except:
        print("Synchronous Inference failed for {} Model.".format(model_type))
    # Asynchronous Test
    try:
        # Load IE separately to check InferRequest latency
        exec_net, input_shape = load_to_IE(model, CPU_EXTENSION)
        exec_net = perform_inference(exec_net, "A", image, input_shape)
        output_blob = next(iter(exec_net.outputs))
        # Check for matching output shape to expected
        assert exec_net.requests[0].outputs[output_blob].shape == OUTPUT_SHAPES[model_type][output_blob]
        # Check latency is > 0; i.e. a request occurred
        assert exec_net.requests[0].latency > 0.0
        counter += 1
    except:
        print("Asynchronous Inference failed for {} Model.".format(model_type))

    return counter


def feedback(tests_passed):
    print("You passed {} of 6 tests.".format(int(tests_passed)))
    if tests_passed == 6:
        print("Congratulations!")
    else:
        print("See above for additional feedback.")


def main():
    counter = pose_test() + text_test() + car_test()
    feedback(counter)


if __name__ == "__main__":
    main()
```

**helpers.py**
```python
import os
import cv2
import numpy as np
from openvino.inference_engine import IENetwork, IECore

'''
The below functions are carried over from previous exercises.
They are already called appropriately in inference.py.
'''
def load_to_IE(model_xml, cpu_extension):
    # Load the Inference Engine API
    plugin = IECore()

    # Load IR files into their related class
    model_bin = os.path.splitext(model_xml)[0] + ".bin"
    net = IENetwork(model=model_xml, weights=model_bin)

    # Add a CPU extension, if applicable.
    if cpu_extension:
        plugin.add_extension(cpu_extension, "CPU")

    # Get the supported layers of the network
    supported_layers = plugin.query_network(network=net, device_name="CPU")

    # Check for any unsupported layers, and let the user
    # know if anything is missing. Exit the program, if so.
    unsupported_layers = [l for l in net.layers.keys() if l not in supported_layers]
    if len(unsupported_layers) != 0:
        print("Unsupported layers found: {}".format(unsupported_layers))
        print("Check whether extensions are available to add to IECore.")
        exit(1)

    # Load the network into the Inference Engine
    exec_net = plugin.load_network(net, "CPU")

    # Get the input layer
    input_blob = next(iter(net.inputs))

    # Get the input shape
    input_shape = net.inputs[input_blob].shape

    return exec_net, input_shape


def preprocessing(input_image, height, width):
    '''
    Given an input image, height and width:
    - Resize to width and height
    - Transpose the final "channel" dimension to be first
    - Reshape the image to add a "batch" of 1 at the start 
    '''
    image = np.copy(input_image)
    image = cv2.resize(image, (width, height))
    image = image.transpose((2,0,1))
    image = image.reshape(1, 3, height, width)

    return image
```

### <a id='3.3'></a>3.3 Integrate IE into Edge App
* **Procedure to integrate IE into an edge app:**
     * step 1: obtain an IR, either directly (a model already in IR format) or by converting a pre-trained model with the Model Optimizer
     * step 2: load the IR into the app with IE
     * step 3: add any necessary pre-processing code for the input frame
     * step 4: make an inference request and perform inference
     * step 5: handle and process the output
$$\text{step 1} \to \text{step 2} \to \text{step 3} \to \text{step 4} \to \text{step 5}$$
* **What the app does: converts a bounding box model to IR with the MO (done previously in [Exercise 2.1](#2.1)); uses an asynchronous request to perform inference on each video frame and extracts the results from the inference request; makes the requests and feeds the results back within the application; and performs any necessary post-processing to get the bounding boxes**
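The bounding-box post-processing can also be written without a Python loop. The sketch below assumes the `[1, 1, N, 7]` output layout described in the `app.py` comments (`[image_id, label, conf, x_min, y_min, x_max, y_max]`, with coordinates normalized to [0, 1]); `boxes_from_detections` is a hypothetical helper that returns pixel coordinates ready for `cv2.rectangle`, truncating to integers the same way `int()` does in `draw_boxes`.

```python
import numpy as np


def boxes_from_detections(result, width, height, threshold=0.5):
    """Convert a [1, 1, N, 7] detection blob into pixel boxes above `threshold`."""
    detections = result.reshape(-1, 7)              # one row per detection
    kept = detections[detections[:, 2] > threshold]  # column 2 is the confidence
    scale = np.array([width, height, width, height], dtype=np.float32)
    return (kept[:, 3:7] * scale).astype(int)        # x_min, y_min, x_max, y_max


# Made-up output with two detections, only one above the 0.5 threshold
fake = np.array([[[[0, 1, 0.9, 0.25, 0.25, 0.5, 0.75],
                   [0, 1, 0.3, 0.0, 0.0, 1.0, 1.0]]]], dtype=np.float32)
print(boxes_from_detections(fake, width=640, height=480).tolist())
# [[160, 120, 320, 360]]
```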

**app.py**
```python
import argparse
import cv2
from inference import Network

INPUT_STREAM = "test_video.mp4"
CPU_EXTENSION = "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"

def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Run inference on an input video")
    # -- Create the descriptions for the commands
    m_desc = "The location of the model XML file"
    i_desc = "The location of the input file"
    d_desc = "The device name, if not 'CPU'"    
    ### TODO: Add additional arguments and descriptions for:
    ###       1) Different confidence thresholds used to draw bounding boxes
    t_desc = "The confidence threshold used to draw bounding boxes"    
    ###       2) The user choosing the color of the bounding boxes
    c_desc = "The color chosen for the bounding boxes, RED, GREEN, or BLUE"

    # -- Add required and optional groups
    parser._action_groups.pop()
    required = parser.add_argument_group('required arguments')
    optional = parser.add_argument_group('optional arguments')

    # -- Create the arguments
    required.add_argument("-m", help=m_desc, required=True)
    optional.add_argument("-i", help=i_desc, default=INPUT_STREAM)
    optional.add_argument("-d", help=d_desc, default='CPU')
    optional.add_argument("-t", help=t_desc, default=0.5)
    optional.add_argument("-c", help=c_desc, default='RED')
    args = parser.parse_args()

    return args

def convert_color(color_str):
    colors = {'BLUE':(255,0,0),'GREEN':(0,255,0),'RED':(0,0,255)}
    color = colors.get(color_str)
    if color:
        return color
    else:
        return colors['RED']     # default color is red

# The IR model outputs a blob with the shape: [1, 1, N, 7], where N is the number of detected bounding boxes.
# For each detection, the description has the format: [image_id, label, conf, x_min, y_min, x_max, y_max]
def draw_boxes(frame, result, args, width, height):
    for box in result[0][0]:        
        if box[2] > args.t:
            xmin = int(box[3]*width)
            ymin = int(box[4]*height)
            xmax = int(box[5]*width)
            ymax = int(box[6]*height)
            cv2.rectangle(frame, (xmin,ymin), (xmax,ymax), args.c, 1)
    return frame

def infer_on_video(args):
    args.c = convert_color(args.c)
    args.t = float(args.t)
    ### TODO: Initialize the Inference Engine
    plugin = Network()
    ### TODO: Load the network model into the IE
    plugin.load_model(args.m,args.d,CPU_EXTENSION)
    input_shape = plugin.get_input_shape()
    # Get and open video capture
    cap = cv2.VideoCapture(args.i)
    cap.open(args.i)

    # Grab the shape of the input 
    width = int(cap.get(3))
    height = int(cap.get(4))

    # Create a video writer for the output video
    # The second argument should be `cv2.VideoWriter_fourcc('M','J','P','G')`
    # on Mac, and `0x00000021` on Linux
    out = cv2.VideoWriter('out.mp4', 0x00000021, 30, (width,height))
    
    # Process frames until the video ends, or process is exited
    while cap.isOpened():
        # Read the next frame
        flag, frame = cap.read()
        if not flag:
            break
        key_pressed = cv2.waitKey(60)

        ### TODO: Pre-process the frame
        prep_frame = cv2.resize(frame, (input_shape[3],input_shape[2])).transpose(2,0,1)        
        prep_frame = prep_frame.reshape(1,*prep_frame.shape)
        ### TODO: Perform inference on the frame
        plugin.async_inference(prep_frame)
        ### TODO: Get the output of inference
        if plugin.wait() == 0:
            res = plugin.extract_output()
            ### TODO: Update the frame to include detected bounding boxes
            # Draw inside the if-block so `res` is always defined for this frame
            frame = draw_boxes(frame, res, args, width, height)
        # Write out the frame
        out.write(frame)
        # Break if escape key pressed
        if key_pressed == 27:
            break

    # Release the out writer, capture, and destroy any OpenCV windows
    out.release()
    cap.release()
    cv2.destroyAllWindows()


def main():
    args = get_args()
    infer_on_video(args)


if __name__ == "__main__":
    main()
```
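The pre-processing step in `infer_on_video` converts OpenCV's height x width x channels (HWC) frame into the 1 x C x H x W (NCHW) batch that the IR expects. The minimal NumPy sketch below shows only that layout change; the 300x300 input size is an assumption for illustration, and the `cv2.resize` call is left out. Note that `cv2.resize` takes its target size as `(width, height)`, which is why the app passes `(input_shape[3], input_shape[2])`.

```python
import numpy as np

def to_nchw(frame_hwc):
    """Convert an H x W x C frame (OpenCV layout) into a 1 x C x H x W batch."""
    chw = frame_hwc.transpose((2, 0, 1))  # HWC -> CHW
    return chw.reshape(1, *chw.shape)     # prepend batch dimension N = 1

# Hypothetical frame already resized to a 300x300 network input
frame = np.arange(300 * 300 * 3, dtype=np.int64).reshape(300, 300, 3)
batch = to_nchw(frame)
print(batch.shape)  # (1, 3, 300, 300)
```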
**inference.py**
```python
'''
Contains code for working with the Inference Engine.
You'll learn how to implement this code and more in
the related lesson on the topic.
'''

import os
import sys
import logging as log
from openvino.inference_engine import IENetwork, IECore

class Network:
    '''
    Load and store information for working with the Inference Engine,
    and any loaded models.
    '''

    def __init__(self):
        self.plugin = None
        self.network = None
        self.input_blob = None
        self.output_blob = None
        self.exec_network = None
        self.infer_request = None


    def load_model(self, model, device="CPU", cpu_extension=None):
        '''
        Load the model given IR files.
        Defaults to CPU as device for use in the workspace.
        Inference requests are made separately (see async_inference).
        '''
        model_xml = model
        model_bin = os.path.splitext(model_xml)[0] + ".bin"

        # Initialize the plugin
        self.plugin = IECore()

        # Add a CPU extension, if applicable
        if cpu_extension and "CPU" in device:
            self.plugin.add_extension(cpu_extension, device)

        # Read the IR as an IENetwork
        self.network = IENetwork(model=model_xml, weights=model_bin)

        # Load the IENetwork into the plugin
        self.exec_network = self.plugin.load_network(self.network, device)

        # Get the input layer
        self.input_blob = next(iter(self.network.inputs))
        self.output_blob = next(iter(self.network.outputs))

        return


    def get_input_shape(self):
        '''
        Gets the input shape of the network
        '''
        return self.network.inputs[self.input_blob].shape


    def async_inference(self, image):
        '''
        Makes an asynchronous inference request, given an input image.
        '''
        ### TODO: Start asynchronous inference
        self.exec_network.start_async(request_id=0,inputs={self.input_blob:image})        
        return


    def wait(self):
        '''
        Checks the status of the inference request.
        '''
        ### TODO: Wait for the async request to be complete
        status = self.exec_network.requests[0].wait()
        return status


    def extract_output(self):
        '''
        Returns a list of the results for the output layer of the network.
        '''
        ### TODO: Return the outputs of the network from the output_blob
        res = self.exec_network.requests[0].outputs[self.output_blob]
        return res
```

* **Running the app generates an ```out.mp4``` file with bounding boxes drawn around the detected moving vehicles**
```shell
python app.py -m frozen_inference_graph.xml
```
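As the comments above `draw_boxes` note, the detection output has shape [1, 1, N, 7], each row being [image_id, label, conf, x_min, y_min, x_max, y_max], with coordinates normalized so that multiplying by the frame width/height gives pixels. The sketch below isolates the filter-and-scale step in plain Python so it can be checked without OpenCV or a model; `boxes_from_detections` is a hypothetical helper name and the sample detections are made up.

```python
def boxes_from_detections(result, threshold, width, height):
    """Return pixel-space (xmin, ymin, xmax, ymax) boxes with conf > threshold.

    result: nested sequence shaped [1][1][N][7]; each row is
    [image_id, label, conf, x_min, y_min, x_max, y_max] in normalized coords.
    """
    boxes = []
    for det in result[0][0]:
        if det[2] > threshold:
            boxes.append((int(det[3] * width), int(det[4] * height),
                          int(det[5] * width), int(det[6] * height)))
    return boxes

# Two made-up detections on a 640x480 frame; only the first passes 0.5
sample = [[[[0, 1, 0.9, 0.25, 0.25, 0.50, 0.75],
            [0, 1, 0.3, 0.00, 0.00, 1.00, 1.00]]]]
print(boxes_from_detections(sample, 0.5, 640, 480))  # [(160, 120, 320, 360)]
```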

```python
from IPython.display import HTML
HTML("""
<video width="400" height="300" controls>
  <source src="./out.mp4" type="video/mp4">
</video>
""")
```

## Exercise 4 in Deploy an Edge App (Lesson 5) <a id='4'></a>
    
**Back to [TOC](#toc)**

### <a id='4.1'></a>4.1 Handle Input Streams
* **Task 1: Implement a function that can handle an image file, a video file, or webcam input**
* **Task 2: Use ```cv2.VideoCapture()``` to open the capture stream and loop while it is open (```capture.isOpened()```). In each iteration, call ```capture.read()```, which returns two values: a boolean (```False``` when no frame could be read) and the frame itself; break the loop when the boolean is ```False```**
* **Task 3: Resize the frame to 100x100**
* **Task 4: Apply Canny edge detection to the frame with min and max threshold values of 100 and 200, respectively**
* **Task 5: Save the image or video output, then close the stream and any windows at the end of the application**
* **[OpenCV documentation and tutorial](https://docs.opencv.org/master/d9/df8/tutorial_root.html)**
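
Task 1 boils down to deciding, from the `-i` argument, which of the three input types is being used. A plain-Python sketch of that dispatch follows; the `CAM` keyword matches the app below, while the helper name `input_kind` and the extra `.jpeg`/`.png` extensions are assumptions (the app itself checks only `.jpg` and `.bmp`).

```python
import os

# Assumption: extensions treated as still images (the app checks only .jpg/.bmp)
IMAGE_EXTS = {".jpg", ".jpeg", ".bmp", ".png"}

def input_kind(source):
    """Classify the -i argument as 'cam', 'image' or 'video'."""
    if source == "CAM":
        return "cam"  # the app maps this to cv2.VideoCapture(0)
    ext = os.path.splitext(source)[1].lower()
    return "image" if ext in IMAGE_EXTS else "video"

for src in ("CAM", "dog.JPG", "test_video.mp4"):
    print(src, "->", input_kind(src))
```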

**app.py**
```python
import argparse
import cv2
import numpy as np

def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Handle an input stream")
    # -- Create the descriptions for the commands
    i_desc = "The location of the input file"

    # -- Create the arguments
    parser.add_argument("-i", help=i_desc)
    args = parser.parse_args()

    return args


def capture_stream(args):
    ### TODO: Handle image, video or webcam
    flag = False
    if args.i == "CAM":
        args.i = 0  # 0 selects the default webcam
    elif args.i.endswith('.jpg') or args.i.endswith('.bmp'):
        flag = True

    ### TODO: Get and open video capture
    capture = cv2.VideoCapture(args.i)
    capture.open(args.i)

    if not flag:
        # The second argument should be `cv2.VideoWriter_fourcc('M','J','P','G')`
        # on Mac, and `0x00000021` on Linux
        # 100x100 to match desired resizing
        out_video = cv2.VideoWriter('out_video.mp4', 0x00000021, 20, (100,100))
        while capture.isOpened():
            pressed_key = cv2.waitKey(120)
            if pressed_key == 27:  # Esc key
                break
            # read() returns False once there are no more frames
            boo, frame = capture.read()
            if not boo:
                break
            ### TODO: Re-size the frame to 100x100
            frame = cv2.resize(frame, (100,100))
            ### TODO: Add Canny Edge Detection to the frame,
            ###       with min & max values of 100 and 200
            ###       Make sure to use np.dstack after to make a 3-channel image
            frame = cv2.Canny(frame, 100, 200)
            frame = np.dstack((frame, frame, frame))
            ### TODO: Write out the frame, depending on image or video
            out_video.write(frame)
        # Finalize the video file; without release() the .mp4 may be unreadable
        out_video.release()
    else:
        frame = cv2.imread(args.i)
        frame = cv2.resize(frame, (100,100))
        # Canny takes the two thresholds as separate arguments, not a tuple
        frame = cv2.Canny(frame, 100, 200)
        frame = np.dstack((frame, frame, frame))
        cv2.imwrite('out_image.jpg', frame)

    ### TODO: Close the stream and any windows at the end of the application
    capture.release()
    cv2.destroyAllWindows()
    

def main():
    args = get_args()
    capture_stream(args)


if __name__ == "__main__":
    main()
```
* **Running the app on a video generates an ```out_video.mp4``` file with Canny edge detection applied to each frame**
```shell
python app.py -i test_video.mp4
```
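
`cv2.Canny` returns a single-channel edge map, while the `VideoWriter` in this app is opened for 3-channel color frames (its default), which is why the code stacks the edge map three times with `np.dstack`. The NumPy-only sketch below shows that shape change on a hand-made stand-in for a Canny output; the single horizontal "edge" is purely illustrative.

```python
import numpy as np

# Stand-in for a 100x100 Canny output: single channel, values 0 or 255
edges = np.zeros((100, 100), dtype=np.uint8)
edges[50, :] = 255  # one horizontal "edge"

# Replicate the single channel into the B, G and R planes
frame3 = np.dstack((edges, edges, edges))
print(edges.shape, "->", frame3.shape)  # (100, 100) -> (100, 100, 3)
```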

```python
HTML("""
<video width="400" height="300" controls>
  <source src="./out_video.mp4" type="video/mp4">
</video>
""")
```

### <a id='4.2'></a>4.2 Process Model Output
* **Suppose a video shows two pairings: the cat with dog#1, who are friends, and the cat with dog#2, who don't get along. The goal is to print a warning message to the terminal whenever the cat and dog#2 are seen together.**
* **```model.xml```: an object detection model that can identify different breeds. It returns three classes: one for one or fewer pets on screen, one for the bad combination (the cat and dog#2), and one for the fine combination (the cat and dog#1).**
* **```python app.py -m model.xml -i pets.mp4```: running the app prints the warning message "Warning: Break up!" at 3.8 s, 10.8 s, and 17.8 s.**

```python
import argparse
import cv2
from inference import Network

INPUT_STREAM = "pets.mp4"
CPU_EXTENSION = "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"

def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Run inference on an input video")
    # -- Create the descriptions for the commands
    m_desc = "The location of the model XML file"
    i_desc = "The location of the input file"
    d_desc = "The device name, if not 'CPU'"

    # -- Add required and optional groups
    parser._action_groups.pop()
    required = parser.add_argument_group('required arguments')
    optional = parser.add_argument_group('optional arguments')

    # -- Create the arguments
    required.add_argument("-m", help=m_desc, required=True)
    optional.add_argument("-i", help=i_desc, default=INPUT_STREAM)
    optional.add_argument("-d", help=d_desc, default='CPU')
    args = parser.parse_args()

    return args


def infer_on_video(args):
    # Initialize the Inference Engine
    plugin = Network()

    # Load the network model into the IE
    plugin.load_model(args.m, args.d, CPU_EXTENSION)
    net_input_shape = plugin.get_input_shape()

    # Get and open video capture
    cap = cv2.VideoCapture(args.i)
    cap.open(args.i)

    # Process frames until the video ends, or process is exited
    count = 0 
    pet_flag = False
    while cap.isOpened():
        # Read the next frame
        flag, frame = cap.read()
        if not flag:
            break
        key_pressed = cv2.waitKey(60)
        count += 1
        # Pre-process the frame
        p_frame = cv2.resize(frame, (net_input_shape[3], net_input_shape[2]))
        p_frame = p_frame.transpose((2,0,1))
        p_frame = p_frame.reshape(1, *p_frame.shape)

        # Perform inference on the frame
        plugin.async_inference(p_frame)

        # Get the output of inference
        if plugin.wait() == 0:
            result = plugin.extract_output()
            ### TODO: Process the output, second class is the combination of cat and dog#2
            if result[0][1] == 1 and not pet_flag:
                print('Warning: Break up!')
                # 30 frames per second
                print('Incident at {} seconds'.format(count/30))
                pet_flag = True
            elif result[0][1] != 1:
                pet_flag = False

        # Break if escape key pressed
        if key_pressed == 27:
            break

    # Release the capture and destroy any OpenCV windows
    cap.release()
    cv2.destroyAllWindows()


def main():
    args = get_args()
    infer_on_video(args)


if __name__ == "__main__":
    main()
```
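
The `pet_flag` logic above is edge-triggered: it warns only when the bad-combination class switches on and re-arms once it switches off, so one continuous sighting produces one warning. The sketch below isolates that logic in plain Python; `incident_times` is a hypothetical helper, the per-frame 0/1 values stand in for `result[0][1]`, and the default of 30 fps mirrors the app's `count/30` assumption.

```python
def incident_times(class_flags, fps=30.0):
    """Return timestamps (s) at which the 'cat and dog#2' flag turns on.

    class_flags: per-frame values, 1 when the bad combination is detected.
    Frames are counted from 1, as `count` is in app.py.
    """
    times = []
    active = False
    for count, flag in enumerate(class_flags, start=1):
        if flag == 1 and not active:
            times.append(count / fps)  # rising edge: report once
            active = True
        elif flag != 1:
            active = False             # re-arm when the pair separates
    return times

print(incident_times([0, 1, 1, 0, 1], fps=1))  # [2.0, 5.0]
```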

```python
HTML("""
<video width="400" height="300" controls>
  <source src="./pets.mp4" type="video/mp4">
</video>
""")
```

### <a id='4.3'></a>4.3 Server Communications and Run App on Web
* **Task 1: Add any code for MQTT to the project so that the node server receives the calculated stats**
* **Task 2: Send the output frame (the processed output, not the input image) to the ffserver**
* **Task 3: Get the MQTT broker and UI installed and running: ```cd webservice/server``` --> ```npm install``` --> when complete, ```cd ../ui``` --> ```npm install``` again. Then open four terminals, as follows.**
    * Terminal 1: Get the MQTT broker installed and running. ```cd webservice/server/node-server``` --> ```node ./server.js``` --> You should see a message that ```Mosca server started```.
    * Terminal 2: Get the UI Node Server running. ```cd webservice/ui``` --> ```npm run dev``` --> After a few seconds, you should see ```webpack: Compiled successfully```.
    * Terminal 3: Start the ffserver: ```sudo ffserver -f ./ffmpeg/server.conf```. Here ```ffmpeg``` is a folder containing the configuration file that sets the server's port and IP address, the ports to receive video from, and the frame rate.
    * Terminal 4: Start the actual application. First, you need to source the environment for OpenVINO in the new terminal: ```source /opt/intel/openvino/bin/setupvars.sh -pyver 3.5```
* **[```paho-mqtt``` library](https://pypi.org/project/paho-mqtt/): an MQTT client library for Python, used to connect to the broker (server) by its IP address and port.**
* **```ffserver``` feature of [FFmpeg](https://www.ffmpeg.org/): FFmpeg is software for handling video and audio streams; ffserver acts as an intermediate server that receives the video frames and forwards them to the Node server for display on a webpage. The [Flask](https://www.pyimagesearch.com/2019/09/02/opencv-stream-video-to-web-browser-html-page/) library in Python can also be used for this purpose.**
* **[```Node.js``` server](https://nodejs.org/en/about/): an open-source server environment that runs JavaScript outside of a browser; here it serves dynamic content to the user's browser.**
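
Per the hints in the code below, the UI listens on the topics `class` and `speedometer` and expects the JSON keys `class_names` and `speed`. The sketch builds those (topic, payload) pairs without touching the network, so the message format can be checked in isolation; the helper name `build_messages` is hypothetical. With paho-mqtt, each pair would then be sent with `client.publish(topic, payload)` on a connected `mqtt.Client()`, as the app does.

```python
import json

def build_messages(class_names, speed):
    """Return the (topic, JSON payload) pairs the UI server listens for."""
    return [
        ("class", json.dumps({"class_names": class_names})),
        ("speedometer", json.dumps({"speed": speed})),
    ]

for topic, payload in build_messages(["road", "car"], 62):
    print(topic, payload)
# class {"class_names": ["road", "car"]}
# speedometer {"speed": 62}
```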

**app.py**
```python
import argparse
import cv2
import numpy as np
import socket
import json
from random import randint
from inference import Network
### TODO: Import any libraries for MQTT and FFmpeg
import sys
import paho.mqtt.client as mqtt

INPUT_STREAM = "test_video.mp4"
CPU_EXTENSION = "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"
ADAS_MODEL = "/home/workspace/models/semantic-segmentation-adas-0001.xml"


CLASSES = ['road', 'sidewalk', 'building', 'wall', 'fence', 'pole', 
'traffic_light', 'traffic_sign', 'vegetation', 'terrain', 'sky', 'person',
'rider', 'car', 'truck', 'bus', 'train', 'motorcycle', 'bicycle', 'ego-vehicle']

# MQTT server environment variables
HOSTNAME = socket.gethostname()
IPADDRESS = socket.gethostbyname(HOSTNAME)
MQTT_HOST = IPADDRESS
MQTT_PORT = 3001 ### TODO: Set the Port for MQTT, the MQTT port to use is 3001
MQTT_KEEPALIVE_INTERVAL = 60

def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Run inference on an input video")
    # -- Create the descriptions for the commands
    i_desc = "The location of the input file"
    d_desc = "The device name, if not 'CPU'"

    # -- Create the arguments
    parser.add_argument("-i", help=i_desc, default=INPUT_STREAM)
    parser.add_argument("-d", help=d_desc, default='CPU')
    args = parser.parse_args()

    return args


def draw_masks(result, width, height):
    '''
    Draw semantic mask classes onto the frame.
    '''
    # Create a mask with color by class
    classes = cv2.resize(result[0].transpose((1,2,0)), (width,height), 
        interpolation=cv2.INTER_NEAREST)
    unique_classes = np.unique(classes)
    out_mask = classes * (255/20)
    
    # Stack the mask so FFmpeg understands it
    out_mask = np.dstack((out_mask, out_mask, out_mask))
    out_mask = np.uint8(out_mask)

    return out_mask, unique_classes


def get_class_names(class_nums):
    class_names= []
    for i in class_nums:
        class_names.append(CLASSES[int(i)])
    return class_names


def infer_on_video(args, model):
    ### TODO: Connect to the MQTT server
    mqttc = mqtt.Client()
    mqttc.connect(MQTT_HOST, MQTT_PORT, MQTT_KEEPALIVE_INTERVAL)
    
    # Initialize the Inference Engine
    plugin = Network()

    # Load the network model into the IE
    plugin.load_model(model, args.d, CPU_EXTENSION)
    net_input_shape = plugin.get_input_shape()

    # Get and open video capture
    cap = cv2.VideoCapture(args.i)
    cap.open(args.i)

    # Grab the shape of the input 
    width = int(cap.get(3))
    height = int(cap.get(4))

    # Process frames until the video ends, or process is exited
    while cap.isOpened():
        # Read the next frame
        flag, frame = cap.read()
        if not flag:
            break
        key_pressed = cv2.waitKey(60)

        # Pre-process the frame
        p_frame = cv2.resize(frame, (net_input_shape[3], net_input_shape[2]))
        p_frame = p_frame.transpose((2,0,1))
        p_frame = p_frame.reshape(1, *p_frame.shape)

        # Perform inference on the frame
        plugin.async_inference(p_frame)

        # Get the output of inference
        if plugin.wait() == 0:
            result = plugin.extract_output()
            # Draw the output mask onto the input
            out_frame, classes = draw_masks(result, width, height)
            class_names = get_class_names(classes)
            speed = randint(50,70)
            
            ### TODO: Send the class names and speed to the MQTT server
            ### The topics that the UI Node Server is listening to are "class" and "speedometer"
            ### Hint: The UI web server will check for a "class" and
            ### "speedometer" topic. Additionally, it expects "class_names"
            ### and "speed" as the json keys of the data, respectively.
            mqttc.publish('class', json.dumps({'class_names': class_names}))
            mqttc.publish('speedometer', json.dumps({'speed': speed}))

            ### TODO: Send frame to the ffmpeg server
            # Inside the if-block so out_frame is always defined here
            sys.stdout.buffer.write(out_frame)
            sys.stdout.flush()

        # Break if escape key pressed
        if key_pressed == 27:
            break

    # Release the capture and destroy any OpenCV windows
    cap.release()
    cv2.destroyAllWindows()
    ### TODO: Disconnect from MQTT
    mqttc.disconnect()

def main():
    args = get_args()
    model = ADAS_MODEL
    infer_on_video(args, model)


if __name__ == "__main__":
    main()
```
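
`draw_masks` turns the per-pixel class indices into a grayscale image by scaling each index by 255/20 and stacking three copies so FFmpeg receives a 3-channel frame, while `np.unique` plus `get_class_names` recover which classes appear. The sketch below runs the same arithmetic on a tiny hand-made 2x3 class map (made-up values, no model or OpenCV needed); `CLASSES` is copied from the app.

```python
import numpy as np

CLASSES = ['road', 'sidewalk', 'building', 'wall', 'fence', 'pole',
           'traffic_light', 'traffic_sign', 'vegetation', 'terrain', 'sky', 'person',
           'rider', 'car', 'truck', 'bus', 'train', 'motorcycle', 'bicycle', 'ego-vehicle']

# Hypothetical 2x3 class map: road (0), sky (10), car (13)
classes = np.array([[0, 0, 13],
                    [10, 13, 13]])

gray = np.uint8(classes * (255 / 20))   # class index -> gray level (truncated)
mask = np.dstack((gray, gray, gray))    # 3 channels for FFmpeg
names = [CLASSES[int(i)] for i in np.unique(classes)]

print(mask.shape, gray[0, 2], names)
```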

* **Run ```app.py``` in Terminal 4; the MQTT broker terminal should log the information as it is published**
    * Pipe the output frames into FFmpeg with ```python app.py | ffmpeg {args}```, using arguments such as ```-f``` (format), ```-pixel_format```, ```-video_size```, and ```-framerate```.
    * The video runs slowly because semantic segmentation is computationally intensive.

```bash
python app.py | ffmpeg -video_size 1280x720  -f rawvideo -pixel_format bgr24 -framerate 24 -i - http://0.0.0.0:3004/fac.ffm
```
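
With `-f rawvideo -pixel_format bgr24 -video_size 1280x720`, FFmpeg reads 1280 x 720 x 3 bytes from stdin for each frame, so every array written with `sys.stdout.buffer.write` must be exactly that size. This assumes the input video really is 1280x720; a different size would misalign the stream. The small check below computes the expected byte count and compares it with a NumPy frame; `expected_frame_bytes` is a hypothetical helper.

```python
import numpy as np

def expected_frame_bytes(width, height, channels=3):
    """Bytes per frame for packed 8-bit rawvideo such as bgr24."""
    return width * height * channels

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # H x W x C, as OpenCV stores it
print(expected_frame_bytes(1280, 720), frame.nbytes)  # 2764800 2764800
```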

```python
HTML("""
<video width="400" height="300" controls>
  <source src="./Vehicle_Edge_Application.mp4" type="video/mp4">
</video>
""")
```