![Degirum banner](https://raw.githubusercontent.com/DeGirum/PySDKExamples/main/images/degirum_banner.png)
## Video Classification with R3D_18

This notebook demonstrates how to use **DeGirum PySDK** to run AI inference on video data with the **R3D_18 video classification model**. The model processes sequences of video frames to recognize and classify human activities.  

In this notebook, we provide two workflows:  

1. Running inference on a **video file**.  
2. Running inference on a **live video stream** through a **GStreamer pipeline**.  

The first workflow prints the top-5 activity labels with their scores, and the second shows the result overlay for each clip of the live stream, making it easy to interpret the actions detected in the video.  

This notebook works with the following inference options:  

- Run inference on **DeGirum Cloud Platform**  
- Run inference on **DeGirum AI Server** deployed on localhost or on a computer in your LAN or VPN  
- Run inference on a **DeGirum Hailo accelerator** directly installed on your computer  

To try a different option, set `inference_host_address` accordingly in the configuration cell below.  

In [None]:
!degirum token create

#### Specify where you want to run your inferences

In [None]:
# inference_host_address: where you want to run inference
#     "@cloud" to use DeGirum cloud
#     "@local" to run on local machine
#     IP address for AI server inference
# model_zoo_url: url/path for model zoo
#     cloud_zoo_url: valid for @cloud, @local, and ai server inference options
#     '': ai server serving models from local folder
#     path to json file: single model zoo in case of @local inference
# model_name: name of the video classification model
# video_source: video source for inference
#     path to video file (mp4 etc); infer_clip below reads the source to its end,
#     so use a finite recording here rather than a camera or live stream
model_name = "r3d_18--112x112_quant_hailort_multidevice_1"
inference_host_address = "@local"
model_zoo_url = "degirum/hailo"

# Specify the video you will run inference on
video_source = "../assets/archery.mp4"

### Simple predictions on a video file

To begin with, let’s take a look at how to run predictions on a video file using the **R3D_18 video classification model**.  
For this, we define a simple function that reads frames from the video, prepares them into a fixed-length clip, and then runs inference with the model.

In [4]:
import cv2
import numpy as np
import degirum as dg

# Load model
model = dg.load_model(
    model_name=model_name,
    inference_host_address=inference_host_address,
    zoo_url=model_zoo_url,
)

# Inference
def infer_clip(path, T=16, size=112):
    """Run inference on a video clip using the R3D_18 model."""
    cap = cv2.VideoCapture(path)
    frames = []

    # Read video frames
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(cv2.resize(frame, (size, size)), cv2.COLOR_BGR2RGB)
        frames.append(frame.astype(np.uint8))
    cap.release()

    if not frames:
        print(f"No frames could be read from {path}")
        return

    # Pad or sample frames to fixed length T
    if len(frames) < T:
        frames += [frames[-1]] * (T - len(frames))
    idxs = np.linspace(0, len(frames) - 1, T, dtype=int)
    selected_frames = [frames[i] for i in idxs]

    # Prepare input tensor: stack T RGB frames along the channel axis -> (1, size*size, 3*T)
    x = np.concatenate(selected_frames, axis=-1).reshape(1, size * size, 3 * T).astype(np.uint8)

    # Run inference
    out = model(x)

    # Handle results
    if (
        isinstance(out.results, list)
        and out.results
        and isinstance(out.results[0], dict)
        and "label" in out.results[0]
    ):
        print(out.results[:5])
    else:
        scores = out.results[0]["data"].reshape(-1)
        top5 = np.argsort(-scores)[:5]
        labels = getattr(model, "label_dictionary", None)
        if labels:
            print([labels.get(int(i), str(int(i))) for i in top5])
        else:
            print(top5.tolist())


# Run inference
infer_clip(video_source)

[{'category_id': 5, 'label': 'archery', 'score': 15.736677169799805}, {'category_id': 161, 'label': 'hurling (sport)', 'score': 12.822478294372559}, {'category_id': 141, 'label': 'golf chipping', 'score': 12.822478294372559}, {'category_id': 142, 'label': 'golf driving', 'score': 11.365378379821777}, {'category_id': 153, 'label': 'hitting baseball', 'score': 9.471148490905762}]
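
The pad-or-sample step and the channel packing used in `infer_clip` can be checked in isolation on synthetic frames, without a video file or a model. The following is a minimal sketch, not part of PySDK: `select_clip_indices` and `pack_clip` are hypothetical helper names introduced here for illustration, and the synthetic "video" simply fills frame `i` with the value `i` so the selected frames are easy to trace in the packed tensor.

```python
import numpy as np


def select_clip_indices(num_frames, T=16):
    """Hypothetical helper: pick T frame indices from a video of num_frames frames.

    Short videos repeat their last frame; longer ones are sampled uniformly.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # Padding the list with the last frame is the same as clamping the indices
    padded_len = max(num_frames, T)
    idxs = np.linspace(0, padded_len - 1, T, dtype=int)
    return np.minimum(idxs, num_frames - 1)


def pack_clip(frames):
    """Hypothetical helper: stack T frames of shape (H, W, 3) into (1, H*W, 3*T) uint8."""
    T = len(frames)
    h, w, c = frames[0].shape
    x = np.concatenate(frames, axis=-1)  # (H, W, 3*T)
    return x.reshape(1, h * w, c * T).astype(np.uint8)


size, T = 112, 16

# Synthetic 40-frame video: every pixel of frame i holds the value i
video = [np.full((size, size, 3), i, dtype=np.uint8) for i in range(40)]

idxs = select_clip_indices(len(video), T)
clip = pack_clip([video[i] for i in idxs])
print(clip.shape)  # (1, 12544, 48)

# A 4-frame video is padded by repeating its last frame
short_idxs = select_clip_indices(4, T)
print(short_idxs.tolist())
```

Because each pixel row of the packed tensor holds the 3 channels of the first selected frame, then the 3 channels of the second, and so on, the last axis always has length `3 * T`. This is why the reshape should use `3 * T` instead of a fixed number when `T` is a parameter.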


#### Using GStreamer with the PyGObject module

We will use PyGObject to fetch frames through GStreamer inside our custom video generator. PyGObject is a Python package that provides bindings for GObject-based libraries such as GStreamer. To install it, follow the steps below:

`sudo apt install python3-gi python3-gi-cairo`

`pip install PyGObject`

In [None]:
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import numpy as np
import cv2
import degirum as dg

# Initialize GStreamer
Gst.init(None)

# GStreamer pipeline: capture from a V4L2 camera, convert to BGR, expose frames through appsink
gstreamer_pipeline = (
    "v4l2src device=/dev/video0 ! "
    "videoconvert ! video/x-raw,format=BGR ! "
    "appsink name=sink emit-signals=true max-buffers=1 drop=true"
)

# Clip / preprocess settings for R3D_18
T_CLIP = 16       # number of frames per clip
RESIZE_TO = 112   # model expects 112x112
# The model expects packed FCR input as uint8 with shape (1, 112*112, 3*T)

# Load Hailo model 
model = dg.load_model(
    model_name=model_name,
    inference_host_address=inference_host_address,
    zoo_url=model_zoo_url
)

# Helper: FCR pack (preprocess)
def pack_frames_to_fcr(frames, size=RESIZE_TO, T=T_CLIP):
    """
    frames: list of T frames, each (size, size, 3), RGB, uint8
    returns: (1, size*size, 3*T), uint8
    """
    x = np.concatenate(frames, axis=-1)           # (H, W, 3*T)
    x = x.reshape(1, size * size, 3 * T)          # (1, H*W, 3*T)
    return x.astype(np.uint8)

# Generator: yields model-ready clips
#   - Pull frames from GStreamer
#   - Preprocess: resize to 112x112, BGR->RGB, pack T frames
def custom_video_generator(pipeline_str=gstreamer_pipeline, T=T_CLIP, size=RESIZE_TO):
    pipeline = Gst.parse_launch(pipeline_str)
    sink = pipeline.get_by_name("sink")
    if sink is None:
        raise RuntimeError("appsink named 'sink' not found in pipeline.")
    pipeline.set_state(Gst.State.PLAYING)

    try:
        frames = []
        while True:
            sample = sink.emit("pull-sample")
            if sample is None:
                break

            # Extract BGR frame
            buf = sample.get_buffer()
            caps = sample.get_caps()
            s = caps.get_structure(0)
            width = s.get_value("width")
            height = s.get_value("height")
            ok, map_info = buf.map(Gst.MapFlags.READ)
            if not ok:
                continue
            try:
                # Copy the pixel data so the frame stays valid after the buffer is unmapped
                frame_bgr = np.frombuffer(map_info.data, np.uint8).reshape((height, width, 3)).copy()
            finally:
                buf.unmap(map_info)

            # --- Preprocess for R3D_18 ---
            # 1) Resize to 112x112
            # 2) Convert BGR -> RGB
            frame_rgb112 = cv2.cvtColor(
                cv2.resize(frame_bgr, (size, size)),
                cv2.COLOR_BGR2RGB
            ).astype(np.uint8)
            frames.append(frame_rgb112)

            # When we have T frames, pack and yield
            if len(frames) == T:
                x = pack_frames_to_fcr(frames, size=size, T=T)
                yield x
                frames.clear()
    finally:
        pipeline.set_state(Gst.State.NULL)

# Run prediction and display
try:
    for result in model.predict_batch(custom_video_generator()):
        # DeGirum provides an overlay image with labels/probabilities
        cv2.imshow("Webcam Inference", result.image_overlay)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
except KeyboardInterrupt:
    pass
finally:
    cv2.destroyAllWindows()
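
The pipeline string above is tied to a V4L2 camera at `/dev/video0`. One way to reuse `custom_video_generator` with other sources is to build the string from the source description. The sketch below is an assumption-laden illustration: `build_pipeline` is a hypothetical helper, it relies on the standard `uridecodebin` element to decode URIs (for example RTSP streams or local files), and it has not been tuned for any particular camera or stream.

```python
from pathlib import Path

# Same appsink settings as the pipeline used in the generator above
APPSINK = "appsink name=sink emit-signals=true max-buffers=1 drop=true"


def build_pipeline(source):
    """Hypothetical helper: return a GStreamer pipeline string for a given source.

    - "/dev/video*" paths use v4l2src (local camera)
    - strings containing "://" are treated as URIs (e.g. rtsp://...)
    - anything else is treated as a local file path and converted to a file URI
    """
    source = str(source)
    if source.startswith("/dev/video"):
        head = f"v4l2src device={source}"
    elif "://" in source:
        head = f'uridecodebin uri="{source}"'
    else:
        head = f'uridecodebin uri="{Path(source).resolve().as_uri()}"'
    # Convert whatever was decoded to BGR so the generator can reshape it as (H, W, 3)
    return f"{head} ! videoconvert ! video/x-raw,format=BGR ! {APPSINK}"


camera_pipeline = build_pipeline("/dev/video0")
stream_pipeline = build_pipeline("rtsp://example.invalid/stream")  # placeholder URL
file_pipeline = build_pipeline("clip.mp4")  # placeholder file name
print(camera_pipeline)
```

The result can be passed as `custom_video_generator(pipeline_str=build_pipeline(...))`. Keep in mind that `drop=true max-buffers=1` suits live sources. For a file, it discards frames whenever inference is slower than decoding, so you may want different appsink settings there.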

#### Conclusion

In this notebook, we demonstrated how to use the **R3D_18 video classification model** with DeGirum PySDK for activity recognition in videos.  
We covered two workflows:  

1. Running predictions on a **video file** with a simple function that reads the frames and prepares a fixed-length clip for the model.  
2. Running predictions on a **live video stream** using a **GStreamer-based generator with PyGObject**, which allows direct access to camera or RTSP feeds.  

Both approaches highlight the flexibility of DeGirum PySDK when working with video data. You can adapt the frame generator to apply custom preprocessing, such as resizing, cropping, or filtering, to better suit your application needs.  
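
As one example of such custom preprocessing, resizing a wide camera frame directly to 112x112 distorts its aspect ratio. Cropping the frame to a centered square first avoids that. The sketch below uses only NumPy slicing. `center_crop_square` is a hypothetical helper, not a PySDK function, and whether cropping improves results for your footage is something to verify experimentally.

```python
import numpy as np


def center_crop_square(frame):
    """Hypothetical helper: crop an (H, W, C) frame to a centered square of side min(H, W)."""
    h, w = frame.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return frame[top:top + side, left:left + side]


# Synthetic 480x640 frame whose first channel stores the column index (mod 256)
frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame[:, :, 0] = (np.arange(640) % 256).astype(np.uint8)

cropped = center_crop_square(frame)
print(cropped.shape)  # (480, 480, 3)
```

In the generators above, the crop would go right before the `cv2.resize(...)` call, so that the square region is what gets scaled to the model input size.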

These methods let you integrate video classification into real-world scenarios, whether you are working with offline recordings or real-time video streams. With DeGirum’s built-in overlays and multiple inference backends (Cloud, AI Server, or Hailo accelerator), you can deploy and visualize activity recognition results across a variety of environments.