## This notebook is an example of how to pipeline two models. 
A video stream from a local camera is processed by the person detection model. Each detected person is then cropped from the frame and passed to the pose detection model, one bounding box at a time.
The combined result is then displayed.
OpenCV is required to run this sample.

This script works with the following inference options:

1. [DeGirum Cloud Platform](https://cs.degirum.com),
1. DeGirum-hosted AI server node shared via Peer-to-Peer VPN,
1. AI server node hosted by you in your local network,
1. AI server running on your local machine,
1. DeGirum ORCA accelerator directly installed on your local machine.

To try different options, you just need to change the `inference_option` in the code below.

The script needs a web camera connected to the machine running this code. The `camera_index` also needs to be specified in the code below.
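
As a rough illustration of how `inference_option` selects one of the five choices listed above, the sketch below maps each number to its description. The dictionary and the `describe_inference_option` function are hypothetical names introduced here for illustration; they are not part of `mytools` or PySDK, and the actual helper may work differently.

```python
# Descriptions copied from the option list above.
# NOTE: this mapping and function are hypothetical, not the mytools implementation.
INFERENCE_OPTIONS = {
    1: "DeGirum Cloud Platform",
    2: "DeGirum-hosted AI server node shared via Peer-to-Peer VPN",
    3: "AI server node hosted by you in your local network",
    4: "AI server running on your local machine",
    5: "DeGirum ORCA accelerator directly installed on your local machine",
}


def describe_inference_option(option: int) -> str:
    """Return the description for an option number, or raise on an unknown one."""
    try:
        return INFERENCE_OPTIONS[option]
    except KeyError:
        valid = ", ".join(str(k) for k in sorted(INFERENCE_OPTIONS))
        raise ValueError(
            f"unknown inference option {option!r}; expected one of {valid}"
        ) from None


print(f"Inference option = '{describe_inference_option(1)}'")
```

Validating the number up front gives a clear error message instead of a connection failure later in the notebook.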

### Specify where you want to run your inferences and the camera index here

In [1]:
inference_option = 1  # <<< change it according to your needs, selecting from the list of inference options above
camera_index = 0      # camera index; 0 is default camera

### The rest of the cells below should run without any modifications

In [2]:
import degirum as dg # import DeGirum PySDK
import mytools, cv2

In [3]:
# connect to model zoo according to selected inference option
zoo = mytools.connect_model_zoo(inference_option)

Inference option = 'DeGirum Cloud Platform'


In [4]:
# load models compiled for the DeGirum ORCA AI accelerator
# (change the end of the model name to "...n2x_cpu_1" to run it on CPU)
people_det_model = zoo.load_model("yolo_v5s_person_det--512x512_quant_n2x_orca_1")
pose_model = zoo.load_model("mobilenet_v1_posenet_coco_keypoints--353x481_quant_n2x_orca_1")

# adjust pose model properties
pose_model.output_pose_threshold = 0.2 # lower threshold
pose_model.overlay_line_width = 1
pose_model.overlay_alpha = 1
pose_model.overlay_show_probabilities = False
pose_model.overlay_show_labels = False
pose_model.image_backend = 'opencv' 
pose_model.input_numpy_colorspace = 'BGR'
pose_model._model_parameters.InputImgFmt = ['JPEG']

# adjust people model properties
people_det_model.image_backend = 'opencv'
people_det_model._model_parameters.InputImgFmt = ['JPEG']
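
The comment in the cell above suggests switching to CPU by changing the end of the model name. A minimal sketch of that renaming is shown below. The function name `to_cpu_model_name` is hypothetical, and the sketch assumes that a matching `..._n2x_cpu_1` variant of each model exists in the model zoo you are connected to, which is not verified here.

```python
# Suffixes taken from the model names used in this notebook.
ORCA_SUFFIX = "_n2x_orca_1"
CPU_SUFFIX = "_n2x_cpu_1"


def to_cpu_model_name(name: str) -> str:
    """Swap the ORCA suffix of a model name for the CPU suffix (hypothetical helper)."""
    if not name.endswith(ORCA_SUFFIX):
        raise ValueError(f"model name {name!r} does not end with {ORCA_SUFFIX!r}")
    return name[: -len(ORCA_SUFFIX)] + CPU_SUFFIX


print(to_cpu_model_name("yolo_v5s_person_det--512x512_quant_n2x_orca_1"))
```

The resulting string could then be passed to `zoo.load_model(...)` in place of the original name.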

In [5]:
# open video stream from local camera 
stream = mytools.open_video_stream(camera_index)

Successfully opened video stream


In [7]:
# AI prediction loop
# Press 'x' or 'q' to stop
with mytools.Display("Poses") as display:
    
    with pose_model: # performance optimization to keep connection to nested model open
        
        # run person detection model on a camera stream
        for people in people_det_model.predict_batch(mytools.video_source(stream)):
            # prepare list of bboxes of detected persons
            person_boxes = [person['bbox'] for person in people.results]
            if not person_boxes:
                # nobody detected: show the plain frame so the display keeps updating
                display.show(people.image)
                continue

            # prepare list of images cropped around each detected person
            person_crops = [ mytools.Display.crop(people.image, box) for box in person_boxes ]

            # for each detected person detect the pose
            all_poses = None # accumulated result
            for poses, box in zip(pose_model.predict_batch(person_crops), person_boxes):

                for r in poses.results: # convert pose coordinates back to original image coordinates
                    for p in r['landmarks']:
                        p['landmark'][0] += box[0]
                        p['landmark'][1] += box[1]

                if all_poses is None: # accumulate all detected poses
                    all_poses = poses
                    all_poses._input_image = people.image_overlay
                else:
                    all_poses._inference_results += poses.results

            display.show(all_poses.image_overlay)
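
The coordinate conversion inside the loop above can be tried in isolation. The pose model sees only a crop, so its landmarks are relative to the crop's top-left corner; adding the bounding box's first two values (assumed here to be the left and top edges, as the loop's use of `box[0]` and `box[1]` implies) maps them back into the full frame. The sketch below uses plain dictionaries shaped like the `results` entries in the loop; `shift_landmarks` is a hypothetical name, and unlike the loop it works on a copy rather than modifying the results in place.

```python
from copy import deepcopy


def shift_landmarks(pose_results, box):
    """Return a copy of pose_results with landmarks moved from crop to frame coordinates.

    pose_results: list of dicts with a 'landmarks' list, each item holding
                  a 'landmark' [x, y] pair relative to the crop.
    box:          bounding box of the crop in the frame; box[0] and box[1]
                  are taken as the left and top offsets.
    """
    x_off, y_off = box[0], box[1]
    shifted = deepcopy(pose_results)  # leave the caller's data untouched
    for pose in shifted:
        for point in pose["landmarks"]:
            point["landmark"][0] += x_off
            point["landmark"][1] += y_off
    return shifted


# One pose with two landmarks, expressed in crop coordinates
crop_poses = [{"landmarks": [{"landmark": [10.0, 20.0]}, {"landmark": [15.0, 40.0]}]}]
person_box = [100.0, 50.0, 200.0, 300.0]

frame_poses = shift_landmarks(crop_poses, person_box)
print(frame_poses[0]["landmarks"][0]["landmark"])  # [110.0, 70.0]
print(frame_poses[0]["landmarks"][1]["landmark"])  # [115.0, 90.0]
```

Once every crop's landmarks are in frame coordinates, the per-person result lists can be concatenated and drawn on a single image, which is what the loop does when it accumulates `all_poses`.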

In [8]:
stream.release() # release camera stream