# A Practical Guide to Running Object Detection Models: YOLOX Use Case

This notebook provides a practical guide for running object detection models on Tenstorrent hardware devices using the TT-BUDA compiler stack. *For detailed information on model compatibility, please refer to the [models support table](#) to check which model works with which Tenstorrent device(s).*

In this example, we demonstrate how to use the [YOLOX](https://github.com/Megvii-BaseDetection/YOLOX) model on Tenstorrent AI accelerator hardware to label objects in a video file. The input file `video.mp4` is processed, and the output with labeled objects is saved as `output_video.mp4`.

**Note on terminology:**

TT-BUDA is the official Tenstorrent AI/ML compiler stack, and PyBUDA is its Python interface. TT-BUDA is the core technology; PyBUDA lets users access its features directly from Python, including importing model architectures and weights from PyTorch, TensorFlow, ONNX, and TFLite.

## Guide Overview

In this guide, we cover the steps for running the **YOLOX** model to detect objects in a video. The results are saved in a new video file with bounding boxes and labels overlaid on detected objects.

You will learn how to:
- Set up the appropriate libraries and environment
- Run YOLOX for video labeling
- Save the output with detected labels
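
As a rough picture of the video-labeling flow, the sketch below groups frames into batches and passes each batch through a detector callable. It is a minimal, hardware-free illustration: `batched`, `label_frames`, and the `detect` callable are hypothetical names introduced here, standing in for the video reader and the YOLOX inference call used in this notebook.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def label_frames(
    frames: Iterable[T],
    detect: Callable[[Sequence[T]], Sequence[R]],
    batch_size: int = 1,
) -> List[R]:
    """Run `detect` on each batch and return per-frame results in input order."""
    results: List[R] = []
    for batch in batched(frames, batch_size):
        results.extend(detect(batch))
    return results
```

In a real run, `frames` would come from the input video and `detect` would wrap preprocessing, inference on the device, and postprocessing.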

## Step 1: Import libraries

Make sure that you have an active Python environment with the latest version of PyBUDA installed.

In [None]:
import subprocess
import os
import cv2
import numpy as np
import pybuda
import requests
import torch
import logging
from queue import Queue
from pybuda._C.backend_api import BackendDevice
from yolox.data.data_augment import preproc as preprocess
from yolox.data.datasets import COCO_CLASSES
from yolox.exp import get_exp
from yolox.utils import demo_postprocess, multiclass_nms


## Step 2: Set PyBUDA Configuration and Device-Specific Settings

In [None]:
def run_yolox_video(variant, video_path, output_path, batch_size=1):
    """
    Process all frames of a video file using YOLOX and generate a new video file containing object detection results.
    param variant: YOLOX model variant (e.g., yolox_nano, yolox_tiny, yolox_s, yolox_m, yolox_l, yolox_x)
    param video_path: input video file path
    param output_path: output video file path
    param batch_size: number of inputs per inference call, passed to PyBUDA as input_count (default 1)
    """
    logging.basicConfig(level=logging.DEBUG, format='%(asctime)s | %(levelname)s | %(message)s')
    logging.info("Starting YOLOX video processing.")

    # Set PyBUDA configuration parameters
    compiler_cfg = pybuda.config._get_global_compiler_config()
    compiler_cfg.balancer_policy = "Ribbon"
    compiler_cfg.default_df_override = pybuda.DataFormat.Float16_b
    os.environ["PYBUDA_DECOMPOSE_SIGMOID"] = "1"
    logging.info("PyBuda configuration set.")

    # Device specific configurations
    available_devices = pybuda.detect_available_devices()
    logging.info(f"Available devices: {available_devices}")
    if available_devices:
        if available_devices[0] == BackendDevice.Wormhole_B0:
            if variant not in ["yolox_nano", "yolox_s"]:
                os.environ["PYBUDA_FORK_JOIN_BUF_QUEUES"] = "1"
                os.environ["PYBUDA_FORK_JOIN_EXPAND_OUTPUT_BUFFERS"] = "1"
                os.environ["PYBUDA_FORK_JOIN_SKIP_EXPANDING_BUFFERS"] = "1"
                os.environ["TT_BACKEND_TIMEOUT"] = "7200"  
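
The device-specific branch above can also be expressed as a small pure function, which makes the rule easy to check without hardware. This is a sketch only: `wormhole_overrides` and the use of a plain string for the architecture are hypothetical conventions introduced here, and the variable names and values are copied from the cell above.

```python
from typing import Dict

# Variants that, per the branch above, skip the extra fork-join settings.
SMALL_VARIANTS = ("yolox_nano", "yolox_s")


def wormhole_overrides(arch_name: str, variant: str) -> Dict[str, str]:
    """Return the environment overrides the branch above would apply.

    `arch_name` is a plain string (hypothetical convention, e.g. "Wormhole_B0")
    used instead of BackendDevice so the function runs without PyBUDA.
    """
    if arch_name != "Wormhole_B0" or variant in SMALL_VARIANTS:
        return {}
    return {
        "PYBUDA_FORK_JOIN_BUF_QUEUES": "1",
        "PYBUDA_FORK_JOIN_EXPAND_OUTPUT_BUFFERS": "1",
        "PYBUDA_FORK_JOIN_SKIP_EXPANDING_BUFFERS": "1",
        "TT_BACKEND_TIMEOUT": "7200",
    }
```

The result could then be applied with `os.environ.update(wormhole_overrides("Wormhole_B0", variant))`.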

## Step 3: Load Model Weights and Prepare the Model

This step loads the YOLOX checkpoint for the specified variant into the model and wraps the model in a `pybuda.PyTorchModule` so that PyBUDA can compile it. The code below expects the weights to already be present at `weight/{variant}.pth`.

In [None]:
weight_name = f"weight/{variant}.pth"
model_name = "yolov3" if variant == "yolox_darknet" else variant.replace("_", "-")
exp = get_exp(exp_name=model_name)
model = exp.get_model()
ckpt = torch.load(weight_name, map_location="cpu")
model.load_state_dict(ckpt["model"])
model.eval()
tt_model = pybuda.PyTorchModule(f"pt_{variant}", model)
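
The naming rule in the cell above (underscores become hyphens, with `yolox_darknet` mapped to `yolov3`) can be isolated and checked on its own. The helpers below are a minimal sketch; `resolve_yolox_names`, `require_checkpoint`, and the `weight_dir` parameter are hypothetical names introduced for illustration.

```python
from pathlib import Path
from typing import Tuple


def resolve_yolox_names(variant: str, weight_dir: str = "weight") -> Tuple[str, Path]:
    """Map a variant such as 'yolox_s' to its experiment name and checkpoint path."""
    exp_name = "yolov3" if variant == "yolox_darknet" else variant.replace("_", "-")
    return exp_name, Path(weight_dir) / f"{variant}.pth"


def require_checkpoint(path: Path) -> Path:
    """Fail early with a clear message if the checkpoint file is missing."""
    if not path.is_file():
        raise FileNotFoundError(f"YOLOX checkpoint not found: {path}")
    return path
```

Checking for the file before calling `torch.load` gives a clearer error than a failure deep inside model loading.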

## Step 4: Prepare Dummy Input and Compile Model

A dummy input is created to initialize and test the model on the hardware.

In [None]:
input_shape = (416, 416) if variant in ["yolox_nano", "yolox_tiny"] else (640, 640)
dummy_input = torch.randn(1, 3, input_shape[0], input_shape[1])
try:
    dummy_output_queue = pybuda.run_inference(tt_model, inputs=[(dummy_input, )], input_count=batch_size)
    dummy_output = dummy_output_queue.get()
    logging.info("Model compiled successfully with dummy inference.")
except Exception as e:
    logging.error(f"Model compilation failed: {e}")
    return
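
The input shape chosen above also determines how real video frames are scaled before inference and how detected boxes map back to the original frame. The sketch below works through that arithmetic in plain Python. It assumes aspect-preserving resizing with a single scale ratio, which is my understanding of what YOLOX's `preproc` does; the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def input_shape_for(variant: str) -> Tuple[int, int]:
    """Same rule as the cell above: small variants use 416x416, others 640x640."""
    return (416, 416) if variant in ("yolox_nano", "yolox_tiny") else (640, 640)


def letterbox_geometry(img_h: int, img_w: int, input_shape: Tuple[int, int]) -> Tuple[float, int, int]:
    """Return (ratio, resized_h, resized_w) for an aspect-preserving resize.

    Assumption: the frame is scaled by one ratio so it fits inside input_shape,
    and the remaining area is padding.
    """
    ratio = min(input_shape[0] / img_h, input_shape[1] / img_w)
    return ratio, int(img_h * ratio), int(img_w * ratio)


def box_to_original(box: Box, ratio: float) -> Box:
    """Map a box from model-input coordinates back to the original frame."""
    x1, y1, x2, y2 = box
    return (x1 / ratio, y1 / ratio, x2 / ratio, y2 / ratio)
```

For example, a 1280x720 frame fed to a 640x640 model is scaled by 0.5 to 640x360, and box coordinates from the model are divided by 0.5 to return to frame coordinates.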

## Step 5: Configure PyBUDA Parameters

Several optional configurations can be adjusted before compiling and running a model on Tenstorrent hardware. Some are required for the model to compile; others are tunable parameters that can improve performance.

For the BERT model, two key parameters are required for compilation:

* `default_df_override`
* `default_dram_parameters`

In [None]:
# Set PyBuda configurations
compiler_cfg = pybuda.config._get_global_compiler_config()
compiler_cfg.default_df_override = pybuda._C.DataFormat.Float16_b
compiler_cfg.default_dram_parameters = False
compiler_cfg.balancer_policy = "Ribbon"
os.environ["PYBUDA_RIBBON2"] = "1"
os.environ["TT_BACKEND_OVERLAY_MAX_EXTRA_BLOB_SIZE"] = f"{81*1024}"
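
Because settings such as `PYBUDA_RIBBON2` are process-wide environment variables, it can help to apply them in a scoped way and restore the previous values afterwards. The context manager below is a minimal sketch using only the standard library; `temporary_env` is a hypothetical helper, and it assumes the variables only need to be set while the model is being compiled and run.

```python
import os
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def temporary_env(overrides: Dict[str, str]) -> Iterator[None]:
    """Apply environment overrides and restore the previous values on exit."""
    previous = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, old in previous.items():
            if old is None:
                os.environ.pop(key, None)  # variable was not set before
            else:
                os.environ[key] = old


# The overlay blob size from the cell above: 81 KiB expressed in bytes, as a string.
OVERLAY_BLOB_BYTES = str(81 * 1024)
```

Usage would look like `with temporary_env({"PYBUDA_RIBBON2": "1"}): ...` around the compile and inference calls.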

## Step 6: Instantiate Tenstorrent device

The first time we use PyBUDA, we must initialize a `TTDevice` object which serves as the abstraction over the target hardware.

In [None]:
tt0 = pybuda.TTDevice(
    name="tt_device_0",  # here we can give our device any name we wish, for tracking purposes
)

In this example, the `tt0` object is created without specifying the device architecture. PyBUDA will automatically detect and define the architecture based on which Tenstorrent device it is run on.

### Specifying the Architecture
If you want to specify the target device architecture, you can pass the `arch` parameter. Here’s how it can be done:

```python
# Create a TTDevice instance with a specified architecture
tt0 = pybuda.TTDevice(
    name="tt_device_0",  # You can give your device any name for tracking purposes
    arch=pybuda.BackendDevice.Grayskull  # Optionally set the target device architecture
)
```

## Step 7: Create a PyBUDA module from PyTorch model

Next, we must wrap the PyTorch model in a `pybuda.PyTorchModule` object. This tells the BUDA compiler which model architecture and AI framework it has to compile.

We then "place" this module onto the previously initialized `TTDevice`.

In [None]:
# Create module
pybuda_module = pybuda.PyTorchModule(
    name = "pt_bert_question_answering",  # give the module a name, this will be used for tracking purposes
    module=model  # specify the model that is being targeted for compilation
)

# Place module on device
tt0.place_module(module=pybuda_module)

## Step 8: Push the (tokenized) inputs into the model input queue

In [None]:
# Push inputs
tt0.push_to_inputs(input_tokens)

## Step 9: Run inference on the targeted device

Running a model on a Tenstorrent device involves two parts: compilation and runtime.

Compilation -- TT-BUDA is a compiler: it takes a model architecture graph and compiles it for the target hardware. Compilation can take anywhere from a few seconds to a few minutes, depending on the model, and only needs to happen once. When you execute the following block of code, the compilation logs are displayed.

Runtime -- once the model has been compiled and loaded onto the device, the user can push new inputs, which execute immediately.

The `run_inference` API can achieve both steps in a single call. If it's the first call, the model will compile. Any subsequent calls will execute runtime only.

Please refer to the documentation for alternative APIs such as `initialize_pipeline` and `run_forward`.
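
The compile-once, run-many behavior described above follows a common lazy-initialization pattern. The class below is a hardware-free stand-in that illustrates the pattern only; it is not PyBUDA's implementation, and `CompileOnceRunner` and `compile_fn` are hypothetical names.

```python
from typing import Any, Callable, Optional


class CompileOnceRunner:
    """Stand-in (not PyBUDA): compile on the first call, reuse the result afterwards."""

    def __init__(self, compile_fn: Callable[[], Callable[[Any], Any]]) -> None:
        self._compile_fn = compile_fn
        self._compiled: Optional[Callable[[Any], Any]] = None
        self.compile_count = 0  # how many times compilation actually ran

    def run(self, x: Any) -> Any:
        if self._compiled is None:  # first call pays the compilation cost
            self._compiled = self._compile_fn()
            self.compile_count += 1
        return self._compiled(x)  # later calls are runtime only
```

This is why the first inference call in a session is much slower than the ones that follow.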

In [None]:
# Run inference on Tenstorrent device
output_q = pybuda.run_inference()  # executes compilation (if first time) + runtime
output = output_q.get()  # get the next result from the output queue

## Step 10: Data Postprocessing

Data postprocessing is done to convert the model outputs into a readable / useful format. For NLP models, this usually means receiving the logit outputs from the model, extracting the top matching tokens, and then decoding the tokens into text.

In [None]:
# Data postprocessing
answer_start = output[0].value().argmax().item()
answer_end = output[1].value().argmax().item()
answer = tokenizer.decode(input_tokens["input_ids"][0, answer_start : answer_end + 1])
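
For a detection model such as YOLOX, the corresponding postprocessing step filters overlapping boxes; this notebook imports `demo_postprocess` and `multiclass_nms` from `yolox.utils` for that purpose. The sketch below is a plain-Python greedy non-maximum suppression, shown only to illustrate the idea. It is not the `yolox.utils` implementation, and the function names and the 0.45 threshold are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.45) -> List[int]:
    """Return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

In practice NMS is applied per class, so boxes of different classes do not suppress each other.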

## Step 11: Print and evaluate outputs

In [None]:
# Print outputs
print(f"Input context:\n{context}")
print(f"\nInput question:\n{question}")
print(f"\nOutput from model running on TTDevice:\n{answer}")

## Step 12: Shut Down PyBUDA

In [None]:
pybuda.shutdown()
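
If an earlier step raises an exception, the shutdown call above is never reached. One way to guarantee cleanup is to wrap the work in a context manager. The helper below is a minimal sketch that takes the shutdown function as an argument so it can be exercised without hardware; `device_session` is a hypothetical name.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def device_session(shutdown: Callable[[], None]) -> Iterator[None]:
    """Run the body, then always call `shutdown`, even if the body raises."""
    try:
        yield
    finally:
        shutdown()
```

In the notebook this would be used as `with device_session(pybuda.shutdown): ...`, with the inference code inside the `with` block.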