# A Practical Guide to Running Object Detection Models: YOLOX Use Case

This notebook provides a practical guide for running object detection models on Tenstorrent hardware devices using the TT-BUDA compiler stack. *For detailed information on model compatibility, please refer to the [models support table](#) to check which model works with which Tenstorrent device(s).*

In this example, we demonstrate how to use the [YOLOX](https://github.com/Megvii-BaseDetection/YOLOX) model on Tenstorrent AI accelerator hardware to label objects in a video file. The input file `video.mp4` is processed, and the output with labeled objects is saved as `output_video.mp4`.

**Note on terminology:**

TT-BUDA is the official Tenstorrent AI/ML compiler stack, and PyBUDA is its Python interface. TT-BUDA is the core technology; PyBUDA lets users access TT-BUDA's features directly from Python, including importing model architectures and weights from PyTorch, TensorFlow, ONNX, and TFLite.

## Guide Overview

In this guide, we cover the steps for running the **YOLOX** model to detect objects in a video. The results are saved in a new video file with bounding boxes and labels overlaid on detected objects.

You will learn how to:
- Set up the appropriate libraries and environment
- Run YOLOX for video labeling
- Save the output with detected labels

## Step 1: Import libraries

Make sure that you have an active Python environment with the latest version of PyBUDA installed.

In [None]:
import subprocess
import os
import cv2
import numpy as np
import pybuda
import requests
import torch
import logging
from queue import Queue
from pybuda._C.backend_api import BackendDevice
from yolox.data.data_augment import preproc as preprocess
from yolox.data.datasets import COCO_CLASSES
from yolox.exp import get_exp
from yolox.utils import demo_postprocess, multiclass_nms

## Step 2: Configure PyBUDA Parameters and Device-Specific Settings

Several optional configurations can be adjusted before compiling and running a model on Tenstorrent hardware. Some are required for a given model to compile at all; others are tunable parameters that can improve performance.

In [None]:
def run_yolox_video(variant, video_path, output_path, batch_size=1):
    """
    Process all frames of a video file using YOLOX and generate a new video file containing object detection results.
    param variant: YOLOX model variant (e.g., yolox_nano, yolox_tiny, yolox_s, yolox_m, yolox_l, yolox_x)
    param video_path: input video file path
    param output_path: output video file path
    param batch_size: number of copies of each frame stacked into one inference batch (default: 1)
    """
    logging.basicConfig(level=logging.DEBUG, format='%(asctime)s | %(levelname)s | %(message)s')
    logging.info("Starting YOLOX video processing.")

    # Set PyBUDA configuration parameters
    compiler_cfg = pybuda.config._get_global_compiler_config()
    compiler_cfg.balancer_policy = "Ribbon"
    compiler_cfg.default_df_override = pybuda.DataFormat.Float16_b
    os.environ["PYBUDA_DECOMPOSE_SIGMOID"] = "1"
    logging.info("PyBuda configuration set.")

    # Device specific configurations
    available_devices = pybuda.detect_available_devices()
    logging.info(f"Available devices: {available_devices}")
    if available_devices:
        if available_devices[0] == BackendDevice.Wormhole_B0:
            if variant not in ["yolox_nano", "yolox_s"]:
                os.environ["PYBUDA_FORK_JOIN_BUF_QUEUES"] = "1"
                os.environ["PYBUDA_FORK_JOIN_EXPAND_OUTPUT_BUFFERS"] = "1"
                os.environ["PYBUDA_FORK_JOIN_SKIP_EXPANDING_BUFFERS"] = "1"
                os.environ["TT_BACKEND_TIMEOUT"] = "7200"  
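
The device-specific branch above can also be expressed as a small pure function, which makes the chosen flags easy to inspect before they are written to the environment. The following is only a sketch: the helper names `device_env_overrides` and `apply_overrides` are hypothetical, and the device is passed as a plain string (`"Wormhole_B0"`) as a stand-in for the `BackendDevice` enum value used in the cell above. The flag names and values are copied from that cell.

```python
import os

# Flags copied from the configuration cell above; grouping them is illustrative.
FORK_JOIN_FLAGS = {
    "PYBUDA_FORK_JOIN_BUF_QUEUES": "1",
    "PYBUDA_FORK_JOIN_EXPAND_OUTPUT_BUFFERS": "1",
    "PYBUDA_FORK_JOIN_SKIP_EXPANDING_BUFFERS": "1",
    "TT_BACKEND_TIMEOUT": "7200",
}
# Variants that the cell above runs without the extra fork-join flags.
SMALL_VARIANTS = ("yolox_nano", "yolox_s")


def device_env_overrides(device_name, variant):
    """Return the env overrides for a (device, variant) pair (hypothetical helper).

    `device_name` is a plain string stand-in for the BackendDevice enum.
    Nothing is written to os.environ here.
    """
    overrides = {"PYBUDA_DECOMPOSE_SIGMOID": "1"}  # set for every variant
    if device_name == "Wormhole_B0" and variant not in SMALL_VARIANTS:
        overrides.update(FORK_JOIN_FLAGS)
    return overrides


def apply_overrides(overrides, environ=os.environ):
    """Write the overrides into the given mapping (os.environ by default)."""
    environ.update(overrides)
    return environ
```

Keeping the decision separate from the side effect means the flags can be logged or compared between variants before compilation starts.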

## Step 3: Load Model Weights and Prepare the Model

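This step builds the YOLOX model for the specified variant, loads the pretrained weights from `weight/{variant}.pth` (the checkpoint must already be present at that path), and wraps the model in a `pybuda.PyTorchModule` so that PyBUDA can compile it in the next step.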

In [None]:
weight_name = f"weight/{variant}.pth"
model_name = "yolov3" if variant == "yolox_darknet" else variant.replace("_", "-")
exp = get_exp(exp_name=model_name)
model = exp.get_model()
ckpt = torch.load(weight_name, map_location="cpu")
model.load_state_dict(ckpt["model"])
model.eval()
tt_model = pybuda.PyTorchModule(f"pt_{variant}", model)
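
The cell above assumes that the checkpoint file is already on disk. A minimal sketch of a "download only if missing" guard is shown below. The helper `ensure_weights` is hypothetical, and the download URL is deliberately left as a parameter because this guide does not specify where the checkpoints are hosted. `exp_name_for` simply mirrors the naming rule used in the cell above.

```python
import urllib.request
from pathlib import Path


def exp_name_for(variant):
    """Map a variant name to the YOLOX experiment name, as in the cell above."""
    return "yolov3" if variant == "yolox_darknet" else variant.replace("_", "-")


def ensure_weights(variant, url, weight_dir="weight"):
    """Return the checkpoint path, downloading it first only if it is missing.

    `url` must be supplied by the caller; no default location is assumed.
    """
    path = Path(weight_dir) / f"{variant}.pth"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(path))
    return path
```

With such a guard in place, `torch.load(str(ensure_weights(variant, url)), map_location="cpu")` would replace the direct `torch.load(weight_name, ...)` call.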

## Step 4: Prepare Dummy Input and Compile Model

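A random dummy input with the variant's input resolution is pushed through `run_inference` once. Because this is the first call, it triggers compilation and verifies that the model runs on the hardware before any video frames are processed.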

In [None]:
input_shape = (416, 416) if variant in ["yolox_nano", "yolox_tiny"] else (640, 640)
dummy_input = torch.randn(1, 3, input_shape[0], input_shape[1])
try:
    dummy_output_queue = pybuda.run_inference(tt_model, inputs=[(dummy_input, )], input_count=batch_size)
    dummy_output = dummy_output_queue.get()
    logging.info("Model compiled successfully with dummy inference.")
except Exception as e:
    logging.error(f"Model compilation failed: {e}")
    return
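
The input resolution also determines how many candidate boxes the model emits per image, which is useful when sanity-checking the output shape logged in Step 6. The sketch below assumes the standard YOLOX detection head with strides 8, 16, and 32 (the default used by `demo_postprocess`) and square grid cells; the function names are hypothetical. For a COCO-trained model each candidate has 85 values: 4 box coordinates, 1 objectness score, and 80 class scores.

```python
def input_shape_for(variant):
    """Input resolution per variant, matching the rule in the cell above."""
    return (416, 416) if variant in ("yolox_nano", "yolox_tiny") else (640, 640)


def expected_num_predictions(input_shape, strides=(8, 16, 32)):
    """Number of candidate boxes: one per grid cell at each stride.

    Assumes the default three-level YOLOX head (strides 8, 16, 32).
    """
    height, width = input_shape
    return sum((height // s) * (width // s) for s in strides)


# 640x640 input: 80*80 + 40*40 + 20*20 candidates
n_large = expected_num_predictions(input_shape_for("yolox_s"))
# 416x416 input: 52*52 + 26*26 + 13*13 candidates
n_small = expected_num_predictions(input_shape_for("yolox_tiny"))
```

Under these assumptions, the raw output for a 640x640 input would have shape `(batch_size, 8400, 85)`, and for a 416x416 input `(batch_size, 3549, 85)`.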

## Step 5: Open Video and Initialize Video Writer

The video is opened for reading, and an output video writer is initialized to save the processed frames.

In [None]:
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
    logging.error(f"Failed to open video file {video_path}")
    return
width, height = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
original_fps = cap.get(cv2.CAP_PROP_FPS)
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
out = cv2.VideoWriter(output_path, fourcc, original_fps, (width, height))
logging.info(f"Output video writer initialized: {output_path}")
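
One practical caveat: depending on the container and backend, `cap.get(cv2.CAP_PROP_FPS)` can report `0` or `NaN` when the frame rate is unknown, and a writer created with such a value may produce an unplayable file. A small guard such as the following can be applied before constructing the writer. This is a sketch; `safe_fps` and its default of 30 frames per second are assumptions, not part of the original notebook.

```python
import math


def safe_fps(reported_fps, default=30.0):
    """Return a usable frame rate, falling back to `default` when the
    reported value is missing, NaN, or not positive (hypothetical helper)."""
    if reported_fps is None or math.isnan(reported_fps) or reported_fps <= 0:
        return default
    return float(reported_fps)
```

In the cell above this would be used as `original_fps = safe_fps(cap.get(cv2.CAP_PROP_FPS))`. It is also worth checking `out.isOpened()` after creating the writer, since a missing codec otherwise fails silently.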

## Step 6: Process Video Frames

Each frame is read, preprocessed, and passed to the model for inference. Detected objects are drawn on each frame, and the frame is saved to the output video.
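
To make the preprocessing step concrete: YOLOX's `preproc` is understood to resize the frame while preserving its aspect ratio, pad the remainder up to the model input size, and return the scale factor as `ratio`. That factor is what later maps boxes back to the original frame (`boxes_xyxy /= ratio`). The sketch below reproduces only the geometry, not the pixel operations; `letterbox_geometry` is a hypothetical name.

```python
def letterbox_geometry(frame_hw, input_hw):
    """Compute the aspect-preserving scale and the resized frame size.

    frame_hw: (height, width) of the original frame
    input_hw: (height, width) expected by the model
    Returns (ratio, (resized_height, resized_width)).
    """
    frame_h, frame_w = frame_hw
    input_h, input_w = input_hw
    # The tighter of the two constraints decides the scale.
    ratio = min(input_h / frame_h, input_w / frame_w)
    resized_hw = (int(frame_h * ratio), int(frame_w * ratio))
    return ratio, resized_hw


# A 1280x720 frame fed to a 640x640 model is halved to 640x360;
# the remaining rows of the 640x640 input are padding.
ratio, resized = letterbox_geometry((720, 1280), (640, 640))
```

Because the model sees the scaled frame, a box predicted at x = 100 in model coordinates corresponds to x = 100 / `ratio` in the original frame.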

### **Run inference on the targeted device**


Running a model on a Tenstorrent device involves two parts: compilation and runtime.

Compilation -- TT-BUDA is a compiler: it takes a model architecture graph and compiles it for the target hardware. Compilation can take anywhere from a few seconds to a few minutes, depending on the model, and only needs to happen once. When you execute the following block of code, the compilation logs will be displayed.

Runtime -- once the model has been compiled and loaded onto the device, the user can push new inputs, which execute immediately.

The `run_inference` API can achieve both steps in a single call. If it's the first call, the model will compile. Any subsequent calls will execute runtime only.

Please refer to the documentation for alternative APIs such as `initialize_pipeline` and `run_forward`.
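
The compile-once, run-many behavior described above can be illustrated with a toy stand-in. The class below is not part of PyBUDA and does not touch any hardware; it only mimics the pattern in which the first call pays the compilation cost and later calls reuse the compiled artifact. All names in it are hypothetical.

```python
class CompileOnceRunner:
    """Toy model of a compile-then-run API (illustration only, not PyBUDA)."""

    def __init__(self, compile_fn, run_fn):
        self._compile_fn = compile_fn  # expensive, called at most once
        self._run_fn = run_fn          # cheap, called for every input
        self._compiled = None
        self.compile_count = 0

    def run(self, inputs):
        # First call: compile and cache. Later calls: runtime only.
        if self._compiled is None:
            self._compiled = self._compile_fn()
            self.compile_count += 1
        return self._run_fn(self._compiled, inputs)


# "Compilation" here just produces a scale factor; "runtime" applies it.
runner = CompileOnceRunner(lambda: 2, lambda scale, x: scale * x)
first = runner.run(3)   # compiles, then runs
second = runner.run(5)  # runs only
```

This is why the dummy inference in Step 4 is useful: it moves the one-time compilation cost out of the per-frame loop.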
 

In [None]:
frame_count = 0  # Frame counter used in the log messages below
while True:
        ret, frame = cap.read()
        if not ret:
            logging.info("End of video file reached.")
            break
        frame_count += 1
        logging.info(f"Processing frame {frame_count}")

        # Image preprocessing
        img, ratio = preprocess(frame, input_shape)
        img_tensor = torch.from_numpy(img).unsqueeze(0).float()  # (1, 3, H, W)
        batch_input = torch.cat([img_tensor] * batch_size, dim=0)  # (batch_size, 3, H, W)

        try:
            
            # Run inference on Tenstorrent device
            logging.debug(f"Before run_inference call for frame {frame_count}")
            output_queue = pybuda.run_inference(None, inputs=[(batch_input, )], input_count=batch_size)
            output = output_queue.get()
            logging.debug(f"After run_inference call for frame {frame_count}")
            logging.debug(f"Inference completed for frame {frame_count}.")
        except Exception as e:
            logging.error(f"Inference failed for frame {frame_count}: {e}")
            continue

        try:
            # Output processing
            if isinstance(output, list):
                output = output[0]
            if isinstance(output, pybuda.Tensor):
                output_array = output.value().detach().float().numpy()
            elif isinstance(output, torch.Tensor):
                output_array = output.detach().float().numpy()
            else:
                logging.error(f"Unexpected output type: {type(output)}")
                continue

            # Check output array shape
            logging.debug(f"Output array shape: {output_array.shape}")
            logging.debug(f"Output array dtype: {output_array.dtype}")
            logging.debug(f"Output array stats: min={np.min(output_array)}, max={np.max(output_array)}, mean={np.mean(output_array)}")

            # If the output array is 1-dimensional, reshape it to 2-dimensional
            if output_array.ndim == 1:
                logging.warning(f"Output array is 1-dimensional for frame {frame_count}. Reshaping to 2D.")
                output_array = output_array.reshape(1, -1)
                logging.debug(f"Reshaped output array shape: {output_array.shape}")

            # Check for NaN or Inf values in the output array
            if np.isnan(output_array).any() or np.isinf(output_array).any():
                logging.error(f"Invalid values (NaN or Inf) in output_array for frame {frame_count}")
                continue

            # Post-processing 
            predictions = demo_postprocess(output_array, input_shape)[0]
            if predictions is None or predictions.shape[0] == 0:
                logging.info(f"No objects detected in frame {frame_count}.")
                out.write(frame)  
                continue

            boxes = predictions[:, :4]
            scores = predictions[:, 4:5] * predictions[:, 5:]
            boxes_xyxy = np.ones_like(boxes)
            boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2.0
            boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2.0
            boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2.0
            boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2.0
            boxes_xyxy /= ratio

            dets = multiclass_nms(
                boxes_xyxy, scores, nms_thr=0.45, score_thr=0.1
            )
            logging.debug(f"Post-processing completed for frame {frame_count}.")
        except Exception as e:
            logging.error(f"Post-processing failed for frame {frame_count}: {e}")
            continue


        #score_threshold = 0.30

        if dets is not None:
            final_boxes, final_scores, final_cls_inds = dets[:, :4], dets[:, 4], dets[:, 5]
            for box, score, cls_ind in zip(final_boxes, final_scores, final_cls_inds):

                #if score < score_threshold:
                #    continue

                class_name = COCO_CLASSES[int(cls_ind)]
                x_min, y_min, x_max, y_max = map(int, box)
               
                cv2.rectangle(
                    frame,
                    (x_min, y_min),
                    (x_max, y_max),
                    (255, 0, 255),
                    1,
                )
                
                cv2.putText(
                    frame,
                    f"{class_name} {score:.2f}",
                    (x_min, y_min - 5),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.5,
                    (255, 0, 255),
                    2,
                )
                

        
        out.write(frame)
        logging.debug(f"Frame {frame_count} written to output video.")
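
The post-processing in the loop above has two conceptual parts: converting boxes from center format `(cx, cy, w, h)` to corner format `(x_min, y_min, x_max, y_max)` in original-frame coordinates, and suppressing overlapping detections. The pure-Python sketch below shows both ideas for a single class. It is a simplified illustration, not the implementation of `multiclass_nms` in YOLOX, which operates on NumPy arrays per class; the thresholds are the ones passed in the cell above.

```python
def cxcywh_to_xyxy(box, ratio):
    """Convert a center-format box to corner format and undo the resize scale."""
    cx, cy, w, h = box
    return ((cx - w / 2) / ratio, (cy - h / 2) / ratio,
            (cx + w / 2) / ratio, (cy + h / 2) / ratio)


def iou(a, b):
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, nms_thr=0.45, score_thr=0.1):
    """Return indices of kept boxes, highest score first (single class)."""
    order = sorted((i for i, s in enumerate(scores) if s > score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= nms_thr for j in keep):
            keep.append(i)
    return keep
```

For example, two boxes that overlap with an IoU of about 0.68 collapse to the higher-scoring one at `nms_thr=0.45`, while a distant third box is kept.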

## Step 7: Release Resources

After the last frame has been written, release the video reader and writer so that the output file is closed and finalized on disk.



In [None]:
cap.release()
out.release()
logging.info("Video processing completed.")
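
If an unexpected exception escapes the frame loop, the two `release()` calls above are never reached and the output file may be left incomplete. One way to guard against this is a small context manager that releases its resources on exit. This is a sketch: `released` is a hypothetical helper that works with any object exposing a `release()` method, which includes `cv2.VideoCapture` and `cv2.VideoWriter`.

```python
from contextlib import contextmanager


@contextmanager
def released(*resources):
    """Yield the given resources and call release() on each one on exit,
    even if the body raises (hypothetical helper)."""
    try:
        yield resources
    finally:
        for resource in resources:
            resource.release()
```

Under this pattern the frame loop would sit inside `with released(cap, out):`, and Step 7 would reduce to the final log message.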

## **Result**

<div align="center">
    <img src="https://github.com/eSlimKorea/Model-TEST-TT-BUDA/blob/main/Model%20Test/CV_Models/YOLOX/output_1.jpg?raw=true" width="600" style="margin: 10px;">
    <img src="https://github.com/eSlimKorea/Model-TEST-TT-BUDA/blob/main/Model%20Test/CV_Models/YOLOX/output_2.jpg?raw=true" width="600" style="margin: 10px;">
</div>
<div align="center">
    <img src="https://github.com/eSlimKorea/Model-TEST-TT-BUDA/blob/main/Model%20Test/CV_Models/YOLOX/output_3.jpg?raw=true" width="600" style="margin: 10px;">
    <img src="https://github.com/eSlimKorea/Model-TEST-TT-BUDA/blob/main/Model%20Test/CV_Models/YOLOX/output_4.jpg?raw=true" width="600" style="margin: 10px;">
</div>