<a href="https://colab.research.google.com/github/Samahu/ouster-yolov5-demo/blob/main/Ouster_Yolo5_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Performing Object Detection on Lidar Data using YOLOv5
![](https://raw.githubusercontent.com/Samahu/ouster-yolov5-demo/main/media/intro.png)

In this tutorial, we are going to use a pretrained YOLOv5 model and perform inference on lidar data generated by Ouster sensors.

For simplicity, we are going to work with recorded data, but most of the ideas discussed here also apply when streaming directly from a sensor.

# What is YOLOv5?
YOLOv5 (You Only Look Once, version 5) is a popular real-time object detection model. A single forward pass of one neural network predicts the bounding boxes and class labels of multiple objects in an image or video frame, which keeps inference fast while still handling objects of various sizes and aspect ratios. The pretrained YOLOv5 models released by Ultralytics are trained on the COCO dataset, whose 80 classes include `person`.

In this tutorial we will:
* Evaluate a pretrained YOLOv5 model's performance on example lidar data.



# Step 1: Install Requirements
By default, Google Colab comes with the dependencies required to run PyTorch preinstalled, so we only need to install the `ouster-sdk` PyPI package. It provides the objects and methods needed to connect directly to Ouster sensors and to record or replay pcap files that capture the sensor packets.

> Note: When prompted to restart the runtime, click "RESTART RUNTIME" and re-run the `pip install ouster-sdk` command again.

In [None]:
%pip install ouster-sdk



Next, we import the Python modules we need throughout the tutorial, including _NumPy_, _OpenCV_ and _Torch_. After the imports, we print the Torch version and whether a GPU is available.

In [None]:
# Import NumPy, OpenCV, PIL and Torch, plus a few standard-library helpers
import time
import cv2
import numpy as np
from PIL import Image
import itertools
import torch

print(f"Setup complete. Using torch {torch.__version__} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")

Setup complete. Using torch 2.0.0+cu118 (Tesla T4)


# Step 2: Getting Lidar Data
Next, we download some sample lidar data recorded from an Ouster sensor in pcap format. A pcap recording from an Ouster sensor must be accompanied by the metadata JSON file generated during recording; without this file, the sensor data cannot be properly replayed from the pcap file.

In [None]:
import progressbar
import urllib.request

pbar = None
def show_progress(block_num, block_size, total_size):
    global pbar
    if pbar is None:
        pbar = progressbar.ProgressBar(maxval=total_size)
        pbar.start()

    downloaded = block_num * block_size
    if downloaded < total_size:
        pbar.update(downloaded)
    else:
        pbar.finish()
        pbar = None

# Download the file from the URL and save it locally
metadata_url = "https://raw.githubusercontent.com/ouster-lidar/yolov5-ouster-lidar-data/main/Ouster-YOLOv5-sample.json"
pcap_url = "https://media.githubusercontent.com/media/ouster-lidar/yolov5-ouster-lidar-data/main/Ouster-YOLOv5-sample.pcap"

urllib.request.urlretrieve(metadata_url, "Ouster-YOLOv5-sample.json", show_progress)
urllib.request.urlretrieve(pcap_url, "Ouster-YOLOv5-sample.pcap", show_progress)

100% (7157 of 7157) |####################| Elapsed Time: 0:00:00 Time:  0:00:00
100% (156016072 of 156016072) |##########| Elapsed Time: 0:00:01 Time:  0:00:01


('Ouster-YOLOv5-sample.pcap', <http.client.HTTPMessage at 0x7f81891f3820>)

# Step 3: Stream Lidar Data

Now we import the Ouster classes and utilities that let us connect to an Ouster sensor or replay a captured stream from a pcap file.

In [None]:
from ouster import client
from ouster.client import ChanField, LidarScan
from ouster.pcap import Pcap
from ouster.sdk.util import resolve_metadata
from ouster.client._utils import AutoExposure

The following class, **ScanCapture**, helps us stream sensor data from a pcap file. It follows the same interface as OpenCV's **VideoCapture**, so it is simple to use with existing OpenCV examples.

We only need to provide the path to the pcap file; ScanCapture automatically locates the accompanying metadata JSON using the `resolve_metadata` function.
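The idea behind this lookup can be illustrated with a deliberately simplified, hypothetical helper, `find_metadata_for`, which only checks for a JSON file with the same base name next to the pcap. The actual matching rules of ouster-sdk's `resolve_metadata` may differ (for example, it may accept other file names), so treat this as an illustration rather than a description of the SDK.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_metadata_for(pcap_path: str) -> Optional[str]:
    """Hypothetical helper: return '<name>.json' next to the pcap if it exists, else None."""
    candidate = Path(pcap_path).with_suffix(".json")
    return str(candidate) if candidate.is_file() else None


# Try it in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    pcap = Path(tmp) / "Ouster-YOLOv5-sample.pcap"
    pcap.touch()
    before = find_metadata_for(str(pcap))  # None: no metadata file yet
    pcap.with_suffix(".json").write_text("{}")
    after = find_metadata_for(str(pcap))   # path ending in Ouster-YOLOv5-sample.json
```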

In [None]:
class ScanCapture:
    """Minimal OpenCV VideoCapture-like wrapper around an Ouster pcap recording."""

    def __init__(self, source, loop=True) -> None:
        self._loop = loop
        self._open_pcap(source)

    @property
    def sensor_info(self):
        return self._info

    @property
    def scans_loop(self):
        return self._scans_iter

    def _open_pcap(self, file_path):
        # Locate the metadata json that accompanies the pcap file
        meta_file = resolve_metadata(file_path)
        if meta_file is None:
            raise FileNotFoundError(f"no metadata json found for {file_path}")
        with open(meta_file, "r") as f:
            self._info = client.SensorInfo(f.read())
        # Batch lidar packets into complete LidarScan objects
        self._scans = client.Scans(Pcap(file_path, self._info), complete=True)
        # Keep ONE persistent iterator so consecutive read() calls return
        # consecutive scans. itertools.cycle caches every scan in memory,
        # so only loop over short clips.
        self._scans_iter = itertools.cycle(self._scans) if self._loop else iter(self._scans)

    def read(self):
        """Return (True, scan) for the next LidarScan, or (False, None) when exhausted."""
        try:
            return True, next(self._scans_iter)
        except StopIteration:
            return False, None

    def release(self):
        self._scans.close()
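Because **ScanCapture** mimics **VideoCapture**, consuming it follows the familiar `read()` loop. The sketch below shows that pattern without needing a pcap file: `ListCapture` is a hypothetical stand-in that serves items from an in-memory list. It also shows why a single persistent iterator matters: every `read()` call advances the same iterator, so no item is skipped or repeated, and `loop=True` restarts from the beginning once the list is exhausted.

```python
import itertools


class ListCapture:
    """Hypothetical stand-in that mimics ScanCapture.read() over an in-memory list."""

    def __init__(self, items, loop=False):
        # One iterator shared by all read() calls
        self._it = itertools.cycle(items) if loop else iter(items)

    def read(self):
        try:
            return True, next(self._it)
        except StopIteration:
            return False, None

    def release(self):
        pass  # nothing to close for an in-memory list


# VideoCapture-style loop: stop when read() reports no more frames
cap = ListCapture(["scan0", "scan1", "scan2"])
frames = []
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frames.append(frame)
cap.release()

# With loop=True the source never runs dry, so bound the loop yourself
looping_cap = ListCapture(["a", "b"], loop=True)
looped = [looping_cap.read()[1] for _ in range(3)]  # ['a', 'b', 'a']
```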

# Step 4: Run Inference With Pre-Trained Model
To keep this tutorial as short as possible, we download and use a pretrained YOLOv5 model from Torch Hub.

In [None]:
# Load YOLOv5 model
%pip install ultralytics
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)


Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2023-5-3 Python-3.10.11 torch-2.0.0+cu118 CUDA:0 (Tesla T4, 15102MiB)



[31m[1mrequirements:[0m /root/.cache/torch/hub/requirements.txt not found, check failed.


Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
Adding AutoShape... 


The following function, `model_infer`, takes the YOLOv5 `model` we have just downloaded as its first argument and a `frame` as its second. Here, `frame` is one of the data layers, or channels, that Ouster sensors produce: _RANGE, SIGNAL, REFLECTIVITY_ and _NEAR_IR_. For more information about these data layers, please refer to the sensor manual.

`model_infer` does the following:
* converts the frame to a format the YOLOv5 model can work with.
* performs object detection and classification using the model.
* keeps only the bounding boxes of objects classified as class 0, which is `person` in the COCO classes of the pretrained model.

A second helper, `draw_results`, highlights each detected person by drawing a labelled rectangle around it.

In [None]:
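# Model inference method
def model_infer(model, frame):
    """Run YOLOv5 on a single-channel frame scaled to [0, 1].

    Returns an (N, 4) tensor of x1, y1, x2, y2 boxes for class 0 ("person" in COCO).
    """
    # Scale to 8-bit grayscale; YOLOv5's AutoShape wrapper expands it to 3 channels
    image = Image.fromarray((np.clip(frame, 0, 1) * 255).astype(np.uint8))
    results = model(image)
    # Each row of results.pred[0] is [x1, y1, x2, y2, confidence, class]
    class_indices = results.pred[0][:, -1].int()
    return results.pred[0][class_indices == 0, :4]


def draw_results(image, objects):
    """Draw a labelled box for each detection on a float BGR image in [0, 1]."""
    for box in objects:
        x1, y1, x2, y2 = map(int, box)
        # (0, 1, 0) is green because the image is still in the [0, 1] range
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 1, 0), 1)
        cv2.putText(image, 'person', (x1, y1),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.4, (0, 1, 0), 1, cv2.LINE_8)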
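To see what the class filter in `model_infer` does, here is a NumPy sketch that applies the same logic to a made-up detections array. It assumes the YOLOv5 convention used above, where each row of `results.pred[0]` is `[x1, y1, x2, y2, confidence, class]`; the helper `filter_detections` and its optional `min_conf` threshold are illustrative additions, not part of the original notebook.

```python
import numpy as np


def filter_detections(pred, class_id=0, min_conf=0.0):
    """Return x1, y1, x2, y2 boxes of rows matching class_id with confidence >= min_conf.

    pred is an (N, 6) array of [x1, y1, x2, y2, confidence, class] rows.
    """
    pred = np.asarray(pred, dtype=np.float32)
    keep = (pred[:, 5].astype(int) == class_id) & (pred[:, 4] >= min_conf)
    return pred[keep, :4]


# Made-up detections: two people (class 0) and a car (class 2 in COCO)
pred = np.array([
    [10, 20, 30, 60, 0.90, 0],
    [40, 15, 55, 70, 0.35, 0],
    [70, 30, 120, 60, 0.80, 2],
])

people = filter_detections(pred)                   # both person boxes
confident = filter_detections(pred, min_conf=0.5)  # only the first person
print(people.shape, confident.shape)  # (2, 4) (1, 4)
```

If you prefer to filter inside the model, the YOLOv5 Torch Hub model also exposes `model.classes` and `model.conf` attributes (for example `model.classes = [0]`); check that they behave as expected in the YOLOv5 version you are using.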


## Step 4.1: Destaggering Frames

Ouster sensors deliver each scan as a 2D image in which every column holds the measurements taken at the same moment, one measurement block per column. Because the sensor's beams are arranged at slightly different horizontal (azimuth) angles, the pixels in a single column do not all point in the same direction. As a result, the raw image is staggered: each row is offset horizontally by a fixed number of pixels relative to the others.

The following image shows a raw, staggered scan as captured by the sensor:

![](https://static.ouster.dev/sdk-docs/_images/lidar_scan_staggered.png)

This image is hard to interpret. To actually view the content of the scan, and to run inference on it with the YOLOv5 model, we first need to shift each row back into alignment, an operation known as destaggering. The following image shows the result of destaggering the frame above:

![](https://static.ouster.dev/sdk-docs/_images/lidar_scan_destaggered.png)

For more information, refer to [Staggering and Destaggering in the Ouster SDK documentation](https://static.ouster.dev/sdk-docs/reference/lidar-scan.html#staggering-and-destaggering).
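To make the idea concrete, here is a minimal NumPy sketch of destaggering. It assumes each row `u` is circularly shifted by a per-row pixel offset; for a real sensor these offsets come from the metadata (recent ouster-sdk versions expose them as `sensor_info.format.pixel_shift_by_row`), and `client.destagger` applies them for you. The helper name `destagger_rows`, the toy 4 x 8 image, the shift values, and the shift direction are illustrative assumptions, not the SDK's implementation.

```python
import numpy as np


def destagger_rows(field, pixel_shift_by_row, inverse=False):
    """Hypothetical sketch: circularly shift row u of an (H, W) field by pixel_shift_by_row[u].

    inverse=True shifts the other way, re-staggering a destaggered image.
    The shift direction is an assumption made for illustration.
    """
    field = np.asarray(field)
    if field.ndim != 2 or len(pixel_shift_by_row) != field.shape[0]:
        raise ValueError("expected a 2D field and one shift per row")
    sign = -1 if inverse else 1
    out = np.empty_like(field)
    for u, shift in enumerate(pixel_shift_by_row):
        out[u] = np.roll(field[u], sign * shift)
    return out


# Toy 4 x 8 "scan" with made-up per-row shifts
staggered = np.arange(32).reshape(4, 8)
shifts = [0, 2, 4, 6]
destaggered = destagger_rows(staggered, shifts)
print(destaggered[1])  # [14 15  8  9 10 11 12 13]

# Re-staggering recovers the original image
restored = destagger_rows(destaggered, shifts, inverse=True)
print(np.array_equal(restored, staggered))  # True
```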

## Step 4.2: Auto Leveling
Before we proceed with reading sensor data and performing inference, we need to briefly explain the concept of auto leveling. Some of the values in the point cloud are very fine measurements; for example, range is measured in millimeters, so its values can vary from a few millimeters to many thousands. These values do not fit within the typical range of an 8-bit image, which is 0 to 255, so we need to rescale them.

It may seem straightforward to scale the image by its maximum value, `image/np.max(image)*255`, but a single scale factor doesn't work well in practice. Consider the case where most values emitted by the sensor lie between 50 mm and 500 mm, but a few sparse values are 10000 mm or beyond. Dividing by the maximum would map every value at or below 500 mm to at most 5% of the output range (about 13 out of 255), producing a nearly black, low-contrast image. A better approach is to ignore the extreme values (outliers) that contribute little to the image: take, for example, the 3rd and 97th percentiles of the image values, clip any values outside them, and stretch the remaining range to fill the output. This process is known as **auto leveling**; ouster-sdk implements it through the **AutoExposure** interface. For more information, refer to the [ouster-sdk image processing documentation](https://static.ouster.dev/sdk-docs/cpp/ouster_client/image_processing.html).
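The percentile approach can be sketched in a few lines of NumPy. This is a simplified stand-in, not the SDK's **AutoExposure**: the helper names `auto_level` and `max_scale`, the 3rd/97th percentile defaults, and the synthetic range image are assumptions for illustration, and the SDK's implementation may differ, for example in its default percentiles and in how it carries its scaling between frames.

```python
import numpy as np


def auto_level(image, lo_pct=3.0, hi_pct=97.0):
    """Simplified auto leveling: map the [lo, hi] percentiles onto [0, 1] and clip outliers."""
    image = np.asarray(image, dtype=np.float32)
    lo, hi = np.percentile(image, [lo_pct, hi_pct])
    if hi <= lo:  # flat image, nothing to stretch
        return np.zeros_like(image)
    return np.clip((image - lo) / (hi - lo), 0.0, 1.0)


def max_scale(image):
    """Naive scaling by the maximum value, for comparison."""
    image = np.asarray(image, dtype=np.float32)
    return image / image.max()


# Hypothetical range image: mostly 50-500 mm returns plus a few 10 m outliers
rng = np.random.default_rng(0)
ranges = rng.uniform(50, 500, size=(128, 1024))
ranges[0, :5] = 10_000

naive = max_scale(ranges)
leveled = auto_level(ranges)
print(f"median brightness, max scaling:   {np.median(naive):.3f}")
print(f"median brightness, auto leveling: {np.median(leveled):.3f}")
```

With these synthetic values, the naive version leaves the median pixel at a few percent brightness, while the percentile version places it near the middle of the range.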


In [None]:
# Define a post processor for each channel. AutoExposure keeps its scaling
# state between frames, so each channel gets its own instance.
post_processors = {
    ChanField.RANGE: AutoExposure(),
    ChanField.SIGNAL: AutoExposure(),
    ChanField.REFLECTIVITY: AutoExposure(),
    ChanField.NEAR_IR: AutoExposure()
}

In [None]:
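def get_frame_from_scan(scan, channel):
    """Return one channel of a scan as a destaggered float32 image scaled to [0, 1]."""
    image = scan.field(channel).astype(np.float32)
    # Uses the global `scan_capture` (created below) for the sensor metadata
    image = client.destagger(scan_capture.sensor_info, image)
    # AutoExposure rescales the image in place
    post_processors[channel](image)
    return image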

Now that we have all the methods we need, let's process the pcap file we downloaded earlier. Since we want to write the inference results to a video file and stop at the end of the recording, we set the `loop` parameter of **ScanCapture** to **False**.

In [None]:
pcap_file_path = "./Ouster-YOLOv5-sample.pcap"
scan_capture = ScanCapture(pcap_file_path, loop=False)

channels = [ChanField.RANGE, ChanField.SIGNAL, ChanField.REFLECTIVITY, ChanField.NEAR_IR]

fps = 10   # TODO: read from metadata
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = None  # created once the size of the first stacked frame is known

while True:
    ret, scan = scan_capture.read()

    if not ret:
        break

    images = []
    for channel in channels:
        image = get_frame_from_scan(scan, channel)
        results = model_infer(model, image)
        # Convert to a 3-channel image so the boxes can be drawn in color
        image = cv2.cvtColor(image, cv2.COLOR_GRAY2BGR)
        draw_results(image, results)
        images.append(image)

    # Stack the four channels vertically and convert [0, 1] floats to 8-bit
    stacked_images = cv2.vconcat(images)
    converted_image = cv2.convertScaleAbs(stacked_images * 255)

    if out is None:
        # VideoWriter expects (width, height) matching the frames written to it
        height, width = converted_image.shape[:2]
        out = cv2.VideoWriter('output.avi', fourcc, fps, (width, height))
    out.write(converted_image)

scan_capture.release()
if out is not None:
    out.release()
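The frame size given to `cv2.VideoWriter` must match the frames written to it; a mismatch can silently produce an empty or unplayable file. The sketch below shows the size arithmetic for four stacked channels using synthetic 128 x 1024 images. The helper `stack_channels` is hypothetical, and its `np.rint`-based conversion only approximates what `cv2.convertScaleAbs` does for non-negative inputs.

```python
import numpy as np


def stack_channels(frames):
    """Hypothetical helper: stack [0, 1] grayscale frames vertically into one uint8 BGR image."""
    stacked = np.vstack(frames)                              # (len(frames) * H, W)
    bgr = np.repeat(stacked[:, :, np.newaxis], 3, axis=2)    # gray -> 3 identical channels
    return np.clip(np.rint(bgr * 255), 0, 255).astype(np.uint8)


# Four synthetic 128 x 1024 images standing in for RANGE, SIGNAL, REFLECTIVITY, NEAR_IR
frames = [np.full((128, 1024), v, dtype=np.float32) for v in (0.0, 0.25, 0.5, 1.0)]
frame = stack_channels(frames)

height, width = frame.shape[:2]
video_size = (width, height)  # cv2.VideoWriter takes (width, height), not (rows, cols)
print(frame.shape, video_size)  # (512, 1024, 3) (1024, 512)
```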




# Display the results
To display the results within Google Colab, we first need to convert the video to MP4 format.

In [None]:
!ffmpeg -y -i output.avi output.mp4


In [None]:
from IPython.display import HTML
from base64 import b64encode
with open('output.mp4', 'rb') as f:
    mp4 = f.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()

HTML("""
<video width="1024" height="512" controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

# Conclusion and Next Steps
This was a quick introduction showing the viability of using pretrained YOLOv5 models on lidar data generated by Ouster sensors. From here, you can clone the repo associated with this Jupyter notebook and run the provided Python example `yolo5_opencv.py` locally. You can also try `yolo5_simpleviz.py`, which lets you view the point cloud in 3D.
