# Import pose estimation model

## Define output format

Let's load the JSON file that describes the human pose task. It is in COCO format: the category descriptor pulled from the annotations file, modified slightly to add a neck keypoint. We use this task description to create a topology tensor, an intermediate data structure that describes the part linkages and which channels of the part affinity field each linkage corresponds to.

In [1]:
import os
os.environ['MPLCONFIGDIR'] = os.getcwd() + "/configs/" # Specify Matplotlib config folder

import json
# Requires https://github.com/NVIDIA-AI-IOT/trt_pose
import trt_pose.coco
from trt_pose.draw_objects import DrawObjects
from trt_pose.parse_objects import ParseObjects

with open('human_pose.json', 'r') as f:
    human_pose = json.load(f)

topology = trt_pose.coco.coco_category_to_topology(human_pose)

parse_objects = ParseObjects(topology)
draw_objects = DrawObjects(topology)
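
To make the topology tensor more concrete, here is a minimal pure-Python sketch of the conversion. It assumes the layout used by `trt_pose` as we understand it: one row per skeleton link, holding the two part affinity field channel indices followed by the two zero-based keypoint indices. The function name `category_to_topology_rows` and the three-keypoint category are hypothetical and only serve as an illustration; the real function returns a Torch tensor.

```python
def category_to_topology_rows(coco_category):
    """Build one [paf_channel_a, paf_channel_b, keypoint_a, keypoint_b] row per link.

    Assumption: link k uses part affinity field channels 2k and 2k + 1,
    and COCO skeleton entries are 1-based keypoint indices.
    """
    rows = []
    for k, (source, target) in enumerate(coco_category['skeleton']):
        rows.append([2 * k, 2 * k + 1, source - 1, target - 1])
    return rows


# Hypothetical toy category with three keypoints and two links
toy_category = {
    'keypoints': ['nose', 'neck', 'left_shoulder'],
    'skeleton': [[1, 2], [2, 3]],
}

topology_rows = category_to_topology_rows(toy_category)
for row in topology_rows:
    print(row)
```

Each row tells the parser which pair of channels to read when scoring a link between two detected keypoints.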

## Import TensorRT optimized model

Next, we'll load our model. It was optimized in another notebook and saved, so we do not need to optimize it again and can simply load it. Note that TensorRT optimizations are device-specific, so an optimized model can only be used on a platform similar to the one it was built on.

In [2]:
import torch
# Requires https://github.com/NVIDIA-AI-IOT/torch2trt
from torch2trt import TRTModule

OPTIMIZED_MODEL = 'resnet18_baseline_att_224x224_A_epoch_249_trt.pth'

model_trt = TRTModule()
model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))

<All keys matched successfully>

# Define video-processing pipeline

## Pre-process image for TRT_Pose

Next, let's define a function that preprocesses the image, which arrives in BGR8 / HWC format. It converts the image to RGB, turns it into a normalized CHW Torch tensor on the GPU, and adds a leading batch dimension.

In [3]:
import cv2
import torchvision.transforms as transforms
import PIL.Image

device = torch.device('cuda')
# Per-channel ImageNet mean and standard deviation (RGB order)
mean = torch.Tensor([0.485, 0.456, 0.406]).to(device)
std = torch.Tensor([0.229, 0.224, 0.225]).to(device)

def preprocess(image):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # BGR -> RGB
    image = PIL.Image.fromarray(image)
    image = transforms.functional.to_tensor(image).to(device)  # HWC uint8 -> CHW float in [0, 1]
    image.sub_(mean[:, None, None]).div_(std[:, None, None])  # Normalize each channel in place
    return image[None, ...]  # Add the batch dimension
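
The arithmetic performed by `preprocess` can be checked without a GPU. The sketch below reproduces the same steps on nested Python lists: channel reordering, scaling to [0, 1], per-channel normalization, and the HWC to CHW transposition with a batch dimension. The name `preprocess_reference` is hypothetical, and the sketch is only meant for reasoning about values and shapes, not as a replacement for the Torch version.

```python
MEAN = (0.485, 0.456, 0.406)  # Same per-channel mean as the cell above (RGB order)
STD = (0.229, 0.224, 0.225)   # Same per-channel standard deviation


def preprocess_reference(image_bgr):
    """image_bgr: H x W x 3 nested lists of 0-255 integers in BGR order.

    Returns nested lists shaped 1 x 3 x H x W (batch, channel, row, column).
    """
    height = len(image_bgr)
    width = len(image_bgr[0])
    planes = []
    for c in range(3):
        src = 2 - c  # RGB channel c is stored at BGR index 2 - c
        plane = [
            [(image_bgr[y][x][src] / 255.0 - MEAN[c]) / STD[c] for x in range(width)]
            for y in range(height)
        ]
        planes.append(plane)
    return [planes]


# A 1 x 2 image: one pure red pixel and one black pixel (BGR order)
tiny = [[[0, 0, 255], [0, 0, 0]]]
batch = preprocess_reference(tiny)
print(len(batch), len(batch[0]), len(batch[0][0]), len(batch[0][0][0]))
```

For the red pixel, the red channel becomes (1.0 - 0.485) / 0.229, while the green and blue channels become negative because their raw value of 0 lies below the mean.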

## Access video feed

A CSI camera is used for experimentation. A custom interface will be developed later to handle all sorts of video feeds, including WiFi cameras. The output window is also defined in this cell. See the console for details about the video acquisition pipeline.

In [4]:
# Requires https://github.com/NVIDIA-AI-IOT/jetcam
from jetcam.csi_camera import CSICamera
from jetcam.utils import bgr8_to_jpeg

WIDTH, HEIGHT = 224, 224 # Defined by the model

camera = CSICamera(width=WIDTH, height=HEIGHT, capture_fps=30)

import ipywidgets
from IPython.display import display

image_w = ipywidgets.Image(format='jpeg')

display(image_w)

Image(value=b'', format='jpeg')

## Processing loop

The *execute()* function contains the whole analysis process:
- Read the image
- Pre-process it to Torch format
- Infer key-points
- Draw the skeleton on the input image
- Update the output window

In [5]:
camera.running = True
def execute(change):
    image = cv2.rotate(change['new'], cv2.ROTATE_180)  # Rotate the new frame by 180 degrees
    data = preprocess(image)
    cmap, paf = model_trt(data)  # Confidence maps and part affinity fields
    cmap, paf = cmap.detach().cpu(), paf.detach().cpu()
    counts, objects, peaks = parse_objects(cmap, paf)#, cmap_threshold=0.15, link_threshold=0.15)
    draw_objects(image, counts, objects, peaks)  # Draw the skeleton in place
    image_w.value = bgr8_to_jpeg(image[:, ::-1, :])  # Mirror horizontally for display
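
If the key-points are needed as pixel coordinates rather than only drawn, the parser output can be decoded by hand. The sketch below assumes the layout we understand `trt_pose` to use: `counts[0]` is the number of detected people, `objects[0][i][j]` is the peak index of key-point `j` for person `i` (or -1 when missing), and `peaks[0][j][k]` is a normalized `(y, x)` pair. The function name `keypoints_in_pixels` is hypothetical and the inputs here are plain nested lists standing in for the tensors.

```python
def keypoints_in_pixels(counts, objects, peaks, width, height):
    """Return one {keypoint_index: (x, y)} dict per detected person.

    Assumed layout (not verified against the library here):
      counts[0]        -> number of people in the image
      objects[0][i][j] -> peak index for keypoint j of person i, -1 if absent
      peaks[0][j][k]   -> normalized (y, x) position of peak k for keypoint j
    """
    people = []
    for i in range(int(counts[0])):
        found = {}
        for j, k in enumerate(objects[0][i]):
            k = int(k)
            if k < 0:
                continue  # This keypoint was not detected for this person
            y_norm, x_norm = peaks[0][j][k]
            found[j] = (int(round(x_norm * width)), int(round(y_norm * height)))
        people.append(found)
    return people


# Hypothetical output: one person, keypoint 0 found, keypoint 1 missing
toy_counts = [1]
toy_objects = [[[0, -1]]]
toy_peaks = [[[[0.5, 0.25]], [[0.0, 0.0]]]]

people = keypoints_in_pixels(toy_counts, toy_objects, toy_peaks, width=224, height=224)
print(people)
```

With real tensors, the same indexing applies after the `.cpu()` call in `execute()`; values may need converting with `float()` and `int()`.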

In [6]:
camera.observe(execute, names='value')

In [7]:
camera.unobserve(execute, names='value')
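
The `observe` and `unobserve` calls above attach and detach `execute` as a callback that fires whenever the camera publishes a new frame. The sketch below illustrates that pattern with a hypothetical `FakeCamera` class written in plain Python. It is only a stand-in: it mimics the `change['new']` dictionary that the callback receives and is not how the camera class is actually implemented.

```python
class FakeCamera:
    """Hypothetical stand-in for a camera that notifies observers of new frames."""

    def __init__(self):
        self.value = None
        self._callbacks = []

    def observe(self, callback, names='value'):
        self._callbacks.append(callback)

    def unobserve(self, callback, names='value'):
        self._callbacks.remove(callback)

    def push_frame(self, frame):
        """Store a new frame and notify every registered callback."""
        old, self.value = self.value, frame
        for callback in list(self._callbacks):
            callback({'name': 'value', 'old': old, 'new': frame})


received = []

def execute(change):
    # The real callback would preprocess, infer, and draw; here we only record the frame
    received.append(change['new'])


camera = FakeCamera()
camera.observe(execute, names='value')
camera.push_frame('frame-1')   # execute() is called
camera.unobserve(execute, names='value')
camera.push_frame('frame-2')   # execute() is no longer called
print(received)
```

Only the first frame reaches the callback, which is why running the `unobserve` cell stops the processing loop.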