# Action Recognition with DeGirum

Human action recognition identifies actions over time in a video. The list of actions in this notebook is extensive (400 in total) and covers Person Actions (for example, drawing, drinking, laughing), Person-Person Actions (for example, hugging, shaking hands), and Person-Object Actions (for example, opening present, mowing the lawn, playing "instrument"). The label list contains several parent-child groupings, such as braiding hair and brushing hair; salsa dancing and robot dancing; or playing violin and playing guitar. For more information about the labels and the dataset, see the research paper "The Kinetics Human Action Video Dataset".

This notebook demonstrates live human action recognition with OpenVINO, using the [Action Recognition Models](https://docs.openvino.ai/2024/omz_models_group_intel.html#action-recognition-models) from [Open Model Zoo](https://github.com/openvinotoolkit/open_model_zoo), specifically the Encoder and Decoder from [action-recognition-0001](https://docs.openvino.ai/2024/omz_models_model_action_recognition_0001.html). Together, the two models form a sequence-to-sequence ("seq2seq") system that identifies human activities from the Kinetics-400 dataset. The models use the Video Transformer approach with a ResNet34 encoder.

#### Specify where to run inference, the model zoo URL, the model names, and the video source

In [None]:
# hw_location: where you want to run inference
#     "@cloud" to use DeGirum cloud
#     "@local" to run on local machine
#     IP address for AI server inference
# model_zoo_url: url/path for model zoo
#     cloud_zoo_url: valid for @cloud, @local, and ai server inference options
#     '': ai server serving models from local folder
#     path to json file: single model zoo in case of @local inference
# encoder_model_name, decoder_model_name: names of the models for running AI inference
# video_source: video source for inference
#     camera index for local camera
#     URL of RTSP stream
#     URL of YouTube video
#     path or URL to a video file (mp4 etc.)

hw_location = "@cloud"
model_zoo_url = "https://cs.degirum.com/degirum/openvino_demos"
encoder_model_name = "encoder_action_recognition--224x224_float_openvino_cpu_1"
decoder_model_name = "decoder_action_recognition--224x224_float_openvino_cpu_1"
video_source = "https://archive.org/serve/ISSVideoResourceLifeOnStation720p/ISS%20Video%20Resource_LifeOnStation_720p.mp4"  


#### Imports

In [None]:
import degirum as dg
import degirum_tools
import numpy as np
import cv2
from IPython import display
from preprocessor_action_rec_encoder import ActionRecEncoderPreprocessor
from postprocessor_action_rec_decoder import ActionRecDecoderPostprocessor
from image_overlay import display_text_fnc
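
The `ActionRecEncoderPreprocessor` module imported above is not shown in this notebook. As a rough illustration of what such a preprocessor typically does before encoding a frame, the sketch below computes the geometry of a "resize the shorter side, then center-crop a square" step. The helper names `resize_dims` and `center_crop_box` are hypothetical and are not part of DeGirum or OpenVINO; the actual preprocessor may differ.

```python
def resize_dims(width, height, size):
    """Scale (width, height) so that the shorter side equals `size`."""
    scale = size / min(width, height)
    return round(width * scale), round(height * scale)


def center_crop_box(width, height, size):
    """Return (x0, y0, x1, y1) of a centered `size` x `size` crop."""
    x0 = (width - size) // 2
    y0 = (height - size) // 2
    return x0, y0, x0 + size, y0 + size


# Hypothetical example: a 1280x720 frame prepared for a 224x224 encoder input.
w, h = resize_dims(1280, 720, 224)
box = center_crop_box(w, h, 224)
print((w, h), box)
```

Cropping the center keeps the aspect ratio intact, at the cost of discarding the left and right edges of a wide frame.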

#### Load the Action Recognition model

In [None]:
action_rec_zoo = dg.connect(hw_location, model_zoo_url, degirum_tools.get_token())
encoder_model = action_rec_zoo.load_model(encoder_model_name)
decoder_model = action_rec_zoo.load_model(decoder_model_name, custom_postprocessor = ActionRecDecoderPostprocessor)
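
The custom postprocessor `ActionRecDecoderPostprocessor` is defined outside this notebook, so its exact behavior is not shown here. A plausible minimal sketch, assuming the decoder emits one raw score (logit) per Kinetics-400 class, is a softmax followed by a top-k selection that yields `label` and `score` entries like the ones read during visualization. The functions `softmax` and `top_k` and the four-label list are illustrative assumptions only.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=3):
    """Return the k most probable classes as dicts with 'label' and 'score'."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [{"label": labels[i], "score": probs[i]} for i in ranked]


# Hypothetical 4-class example (the real model has 400 classes).
labels = ["hugging", "shaking hands", "playing guitar", "mowing the lawn"]
logits = [2.0, 0.5, 3.0, -1.0]
results = top_k(logits, labels, k=3)
for r in results:
    print(f"{r['label']},{r['score'] * 100:.2f}%")
```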

In [None]:
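# Decoder input dimensions used when reshaping the encoder outputs; for the
# [1x16x512] decoder input these are expected to be 16 and 512.
dec_w, dec_h = decoder_model.model_info.InputW[0], decoder_model.model_info.InputC[0]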
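
To see how per-frame encoder outputs become a single decoder input, the sketch below replays the concatenate, transpose, and reshape steps on dummy data. It assumes each encoder call returns an array of shape `[1, 512, 1, 1]`, which is the documented output of the Open Model Zoo encoder; the values are placeholders, not real embeddings.

```python
import numpy as np

sample_duration = 16   # encoded frames per clip, as used in this notebook
embedding_size = 512   # assumed size of one encoder embedding

# Dummy encoder outputs: frame t is filled with the value t so it can be traced.
encoder_output = [
    np.full((1, embedding_size, 1, 1), t, dtype=np.float32)
    for t in range(sample_duration)
]

stacked = np.concatenate(encoder_output, axis=0)   # shape (16, 512, 1, 1)
moved = stacked.transpose((2, 3, 0, 1))            # shape (1, 1, 16, 512)
decoder_input = moved.reshape((1, sample_duration, embedding_size))
decoder_input = decoder_input.astype(np.float32)   # shape (1, 16, 512)

print(stacked.shape, moved.shape, decoder_input.shape)
```

The transpose moves the two singleton axes to the front, so the final reshape only drops one of them and keeps the frame order along the second axis.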

#### Run Action Recognition on Video

In [None]:
num_frames = 100  # Maximum number of frames to read from video, set to 0 for all frames.
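sample_duration = 16  # Number of encoded frames the decoder consumes per prediction.
skip_first_frames = 600  # Number of frames to skip at the start of the video.
size = encoder_model.model_info.InputC[0]  # Frame size passed to the encoder preprocessor.
use_popup = False  # True: show frames in an OpenCV window; False: render inline in the notebook.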
with degirum_tools.open_video_stream(video_source) as video_stream:
    w, h, fps = degirum_tools.get_video_stream_properties(video_stream)
    fps = 30
    if num_frames == 0:
        total_frames = int(video_stream.get(cv2.CAP_PROP_FRAME_COUNT))
    else:
        total_frames = num_frames
    encoder_output = []
    decoder_output = []
    counter = 0
    frames = []
    text_template = "{label},{conf:.2f}%"
    progress = degirum_tools.Progress(total_frames)
    if use_popup:
        title = "Press ESC to Exit"
        cv2.namedWindow(title, cv2.WINDOW_GUI_NORMAL | cv2.WINDOW_AUTOSIZE)  
        
    for i, frame in enumerate(degirum_tools.video_source(video_stream)):
        if i < skip_first_frames:
            continue
        if i == skip_first_frames+total_frames:
            break
        counter = counter + 1
        
        # Adaptive resize for visualization: cap the longer side at 1280 pixels.
        scale = 1280 / max(frame.shape)
        if scale < 1:
            frame = cv2.resize(frame, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)

        if counter % 2 == 0:
            preprocessed = ActionRecEncoderPreprocessor(size).preprocess_frame_for_encoder(frame)
            encoder_output.append(encoder_model(preprocessed).results[0]["data"])
            if len(encoder_output) == sample_duration:
                decoder_input = np.concatenate(encoder_output, axis=0)
                # Organize the input tensor for the decoder (shape: [1x16x512]).
                decoder_input = decoder_input.transpose((2, 3, 0, 1))
                decoder_input = decoder_input.reshape((1, dec_w, dec_h))
                decoder_input = decoder_input.astype(np.float32)
                result_de = decoder_model(decoder_input) 
                decoder_output = result_de.results
                encoder_output = []     

        # Visualize the results.
        if decoder_output:
            for idx in range(len(result_de.results)):
                display_text = text_template.format(
                    label=result_de.results[idx]['label'],
                    conf=result_de.results[idx]['score']*100,
                )
                display_text_fnc(size, frame, display_text, idx)

        # Use this workaround if you experience flickering.
        if use_popup:
            cv2.imshow(title, frame)
            key = cv2.waitKey(1)
            # escape = 27
            if key == 27:
                break
        else:
            # Encode numpy array to jpg.
            _, encoded_img = cv2.imencode(".jpg", frame, params=[cv2.IMWRITE_JPEG_QUALITY, 90])
            # Create an IPython image (named `img` to avoid shadowing the loop index `i`).
            img = display.Image(data=encoded_img)
            # Display the image in this notebook.
            display.clear_output(wait=True)
            display.display(img)
        progress.step()

    # Close the pop-up window, if one was opened.
    if use_popup:
        cv2.destroyAllWindows()
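
The loop above samples every second frame after the skipped prefix and runs the decoder once 16 encoded frames have accumulated. The following sketch reproduces only that bookkeeping, without any video or model, to show which frame indices end up in each clip. The function `clip_frame_indices` is a hypothetical helper written for this illustration.

```python
def clip_frame_indices(skip_first_frames, total_frames, sample_duration=16, stride=2):
    """Return the frame indices that make up each complete decoder clip."""
    clips, current = [], []
    counter = 0
    for i in range(skip_first_frames, skip_first_frames + total_frames):
        counter += 1
        if counter % stride == 0:        # keep every `stride`-th processed frame
            current.append(i)
            if len(current) == sample_duration:
                clips.append(current)    # a full clip triggers one decoder call
                current = []
    return clips


# Settings used in this notebook: skip 600 frames, then read 100.
clips = clip_frame_indices(skip_first_frames=600, total_frames=100)
for n, clip in enumerate(clips):
    print(f"clip {n}: frames {clip[0]}..{clip[-1]}")
```

With these settings the sketch yields three complete clips, the first covering frames 601 to 631, so the first label only appears after 32 frames have been read. The last two sampled frames never fill a clip and produce no prediction.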
