# DPU example: Semantic-FPN
---

## Aim/s
* This notebook shows an example DPU application. The application, as well as the DPU IP, is pulled from the official 
[Vitis AI GitHub Repository](https://github.com/Xilinx/Vitis-AI).
* Description: Semantic-FPN for segmentation on Cityscapes.
* Input size: 1024x512
* Task: Segmentation

## References
* [Vitis AI product page](https://www.xilinx.com/products/design-tools/vitis/vitis-ai.html).

## Last revised
* December 14, 2023
    * Initial revision. MM, Alpha Data Parallel Systems Ltd.

----

In [None]:
import cv2
import glob
import time
import numpy as np
import IPython
from PIL import Image

from pynq_dpu import DpuOverlay

## 1. Prepare the overlay
We will download the overlay onto the board and prepare the model.

In [None]:
ol = DpuOverlay('dpu.bit')
ol.load_model('pt_SemanticFPN-mobilenetv2.xmodel')

## 2. Utility functions

In this section, we will prepare a few functions and the input data for later use.

First we need to define helper functions to prepare the input images for the model. At this
stage we will also define a post-processing function to decode the output of the model and
apply a color palette to annotate the semantic information of the image.
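
As a small worked example of the standardisation step, the sketch below applies the same per-channel shift and scale to a toy two-pixel array. The array `toy` is made up for illustration; only the mean and standard deviation values come from this notebook.

```python
import numpy as np

# Per-channel statistics used by this notebook (RGB order).
MEAN = np.array([123.675, 116.28, 103.53])
STD = np.array([58.395, 57.12, 57.375])

# Toy 1x2 RGB "image": one pixel equal to the mean,
# one pixel exactly one standard deviation above it.
toy = np.stack([MEAN, MEAN + STD]).reshape(1, 2, 3)

# Broadcasting applies the (3,) statistics along the last (channel) axis.
standardised = (toy - MEAN) / STD

print(standardised)
```

The first pixel maps to roughly 0 in every channel and the second to roughly 1, which is the range the model input is centred on.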

In [None]:
MEAN = (123.675, 116.28, 103.53)
STD = (58.395, 57.12, 57.375)

def preprocess(image, size=(1024, 512)):
    '''
    Function to prepare the input for the Semantic-FPN model.
    The input images are first converted to the RGB color space,
    then resized and finally shifted and scaled to standardise the
    color channel values.
    
    Args:
        image: 3 dimensional array corresponding to the input BGR image.
        size (optional): (new_width, new_height) tuple corresponding to 
                         the size of the input the Semantic-FPN model is
                         expecting. Set to 1024x512 by default.
        
    Returns:
        3 dimensional array corresponding to the input for Semantic-FPN.
        The array has a shape of (new_height, new_width, 3).
    '''
    rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    resized_image = cv2.resize(rgb_image, size, interpolation=cv2.INTER_LINEAR)
    processed_image = (resized_image - MEAN) / STD
    return processed_image.astype(np.float32)

def get_palette():
    '''
    Function to generate the color palette to annotate the semantic information.
    
    Returns:
        2 dimensional array of the shape (256, 3) corresponding to the color
        values for each channel of the colorised image. While only 19 classes
        are predicted by the model, the cv2.LUT function requires a palette for
        all 256 possible values of the 8-bit unsigned integers in the mask.
    '''
    palette = [128, 64, 128,
               244, 35, 232,
               70, 70, 70,
               102, 102, 156,
               190, 153, 153,
               153, 153, 153,
               250, 170, 30,
               220, 220, 0,
               107, 142, 35,
               152, 251, 152,
               70, 130, 180,
               220, 20, 60,
               255, 0, 0,
               0, 0, 142,
               0, 0, 70,
               0, 60, 100,
               0, 80, 100,
               0, 0, 230,
               119, 11, 32]

    palette = palette + [0] * (256 * 3 - len(palette))
    
    return np.array(palette, dtype=np.uint8).reshape((-1, 3))

PALETTE = get_palette()

def colorise_mask(mask):
    '''
    Function to colorise the output mask of the model.
    
    Args:
        mask: 2 dimensional array corresponding to the color index of each
              pixel classified during segmentation.
    
    Returns:
        3 dimensional array corresponding to the image colorised according to
        the provided mask.
    '''
    uint_mask = mask.astype(np.uint8)
    channels = [cv2.LUT(uint_mask, PALETTE[:, i]) for i in range(3)]
    return np.dstack(channels)

def postprocess(output_tensors):
    '''
    Function to decode the output of the model and colorise the segmentation
    results.
    
    Args:
        output_tensors: buffers allocated for the DPU to store the output of
                        the model.
    
    Returns:
        3 dimensional array corresponding to the image colorised according to
        the output of the model.
    '''
    output = output_tensors[0][0]
    seg_pred = np.argmax(output, axis=2)
    return colorise_mask(seg_pred)
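
To make the decoding step concrete, here is a minimal NumPy-only sketch on toy data. It assumes a 2x2 "image" with 3 class scores per pixel (the real model predicts 19 classes) and uses NumPy fancy indexing in place of `cv2.LUT`; the names `scores` and `toy_palette` are hypothetical and the score values are invented.

```python
import numpy as np

# Toy model output: 2x2 pixels, 3 class scores per pixel.
scores = np.array([[[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]],
                   [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]]], dtype=np.float32)

# First three entries of the palette defined above.
toy_palette = np.array([[128, 64, 128],
                        [244, 35, 232],
                        [70, 70, 70]], dtype=np.uint8)

# Pick the highest-scoring class for each pixel (channel axis is last).
class_ids = np.argmax(scores, axis=2)

# Indexing the palette with the class map yields an (H, W, 3) colour image.
coloured = toy_palette[class_ids]

print(class_ids)
print(coloured.shape)
```

Indexing with the class map does the same per-pixel lookup that `colorise_mask` performs one channel at a time.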

Finally, use a search pattern to collect all the `png` files for segmentation.

In [None]:
search_pattern = r'img/segm/*.png'
original_images = glob.glob(search_pattern)
total_images = len(original_images)
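
`glob.glob` returns paths in an arbitrary, filesystem-dependent order, so an image index may not refer to the same file on every board. The sketch below shows one possible way to make the order deterministic and to fail early when nothing matches. The helper `collect_images` is hypothetical and is demonstrated on a temporary directory rather than on `img/segm`.

```python
import glob
import os
import tempfile

def collect_images(pattern):
    """Return the matching paths in sorted order, failing early if none match."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f'No files match {pattern!r}')
    return paths

# Demonstrate on a throwaway directory containing two PNG files and one other file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('b.png', 'a.png', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()
    found = [os.path.basename(p)
             for p in collect_images(os.path.join(tmp, '*.png'))]

print(found)
```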

## 3. Use VART

Now we should be able to use VART to do semantic segmentation.

In [None]:
dpu = ol.runner

input_tensors = dpu.get_input_tensors()
output_tensors = dpu.get_output_tensors()

shape_in = input_tensors[0].dims
shape_out = output_tensors[0].dims

We can define a few buffers to store input and output data. They will be reused during 
multiple runs.

In [None]:
input_data = [np.empty(shape_in, dtype=np.float32, order="C")]
output_data = [np.empty(shape_out, dtype=np.float32, order="C")]

image = input_data[0]
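
The reason `image` can be reused is that it refers to the same array object as `input_data[0]`, so writing into it in place also updates the buffer handed to the DPU. A minimal sketch with a scaled-down shape follows. The layout `(batch, height, width, channels)` and the `toy_` names are assumptions for illustration; the real shape is whatever the runner reports.

```python
import numpy as np

# Assumed, scaled-down stand-in for shape_in: (batch, height, width, channels).
toy_shape_in = (1, 4, 8, 3)
toy_input_data = [np.empty(toy_shape_in, dtype=np.float32, order="C")]

# Same array object as the list element, not a copy.
toy_image = toy_input_data[0]

# A pre-processed frame without the batch axis.
frame = np.ones((4, 8, 3), dtype=np.float32)

# In-place assignment: the frame is broadcast over the batch axis and
# written into the existing buffer. Plain `toy_image = frame` would only
# rebind the name and leave the buffer untouched.
toy_image[...] = frame

print(toy_input_data[0].shape)
```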

Remember that we have a list of `original_images`. We can now define a new function `run()` which takes the image index as input, prepares the image for the model, executes the DPU on that input and finally post-processes the output of the DPU. With the argument 
`display` set to `True`, the colorised output of the model is displayed.

In [None]:
def run(image_index, display=False):
    '''
    Function to handle a single input for the DPU and process the output of the run.
    
    Args:
        image_index: int corresponding to the index of the collected image to be processed.
        display (optional): boolean flag; when set, the colorised output is displayed.
        
    Returns:
        None, the annotated image is displayed when the display flag is set.
        
    Raises:
        AssertionError: when the provided image_index is not less than the number of
                        input images found.
    
    '''
    assert image_index < total_images, \
    f'Please specify an image index less than {total_images}. Index provided: {image_index}.'
    
    # Read input image
    input_image = cv2.imread(original_images[image_index])
    
    # Pre-processing
    image_data = preprocess(input_image)
    
    # Copy the data into the DPU input buffer and trigger execution
    image[...] = image_data
    job_id = dpu.execute_async(input_data, output_data)
    dpu.wait(job_id)
    
    # Post-processing
    output_image = postprocess(output_data)
    
    if display:
        IPython.display.display(Image.fromarray(output_image))

In [None]:
run(0, display=True)

We can also run it for multiple images as shown below. In this example we have only used 1 thread; in principle, users should be able to boost the performance by employing more threads.

In [None]:
time1 = time.time()
for i in range(total_images):
    run(i)
time2 = time.time()
fps = total_images / (time2 - time1)
print(f"Performance: {fps} FPS")
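
The remark above about using more threads can be sketched with `concurrent.futures`. This is only an illustration under stated assumptions: `fake_infer` is a hypothetical stand-in for one pre-process, DPU and post-process round trip, and it does not call the DPU. Note that `run()` as written shares `input_data` and `output_data`, so each worker would need its own buffers before it could be called concurrently, and any speed-up would depend on the hardware.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def fake_infer(index):
    """Hypothetical stand-in for one inference round trip (no DPU involved)."""
    # Each call allocates its own buffer, so no state is shared between threads.
    buffer = np.full((2, 2), index, dtype=np.float32)
    return int(buffer.sum())

# Map the work over a small pool; results come back in input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(fake_infer, range(4)))

print(results)
```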

We will need to remove references to `vart.Runner` and let Python garbage-collect the unused graph objects. This will make sure we can run other notebooks without any issue.

In [None]:
del dpu
del ol

----

Copyright (C) 2021 Xilinx, Inc

SPDX-License-Identifier: Apache-2.0 License

----
