# Real-time Object Detection Using YOLOv10 in a Jupyter Notebook

This tutorial shows how to turn a DJI drone's video feed into a surveillance or object detection system from a Jupyter Notebook. On Windows, VSCode with the Jupyter extension gives an efficient workflow.

## Install Jupyter and Prerequisites:

!pip install ultralytics
!pip install opencv-python opencv-contrib-python


# Install the GPU build below if your system has a capable NVIDIA GPU (tested on an RTX 3070 Ti with 8 GB VRAM and 32 GB RAM); otherwise skip this step and install the CPU build

In [None]:
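# Pin torchvision and torchaudio to the releases built against torch 2.0.1 to avoid resolver conflicts
!pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2+cu118 --index-url https://download.pytorch.org/whl/cu118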

# CPU-version below

In [None]:
!pip install torch torchvision torchaudio
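
After either install, it can help to confirm which device PyTorch will actually use before loading a model. The following is a minimal sketch; `pick_device` is a hypothetical helper name, not part of PyTorch or Ultralytics, and it falls back to the CPU when `torch` is missing or CUDA is unavailable.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda:0" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    # If torch is not installed at all, report the CPU fallback
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda:0" if torch.cuda.is_available() else "cpu"


print("Inference device:", pick_device())
```

The returned string can be passed as the `device` argument of an Ultralytics prediction call, for example `model("assets/test.jpg", device=pick_device())`.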

## Load the YOLOv10n model and perform detection

In [1]:

from ultralytics import YOLOv10

# Load the YOLO model
model = YOLOv10("models/yolov10n.pt")

# Test the model with a sample image 
results = model("assets/test.jpg", show=True)  

image 1/1 c:\Users\91902\Downloads\DJI-G2-YOLOVX\assets\test.jpg: 640x640 5 persons, 2 ties, 4.0ms
Speed: 6.1ms preprocess, 4.0ms inference, 103.2ms postprocess per image at shape (1, 3, 640, 640)
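
The `Speed` line reports three per-image stage timings, and their sum gives the end-to-end latency. As a worked example using the numbers printed above (the helper name `latency_summary` is hypothetical):

```python
def latency_summary(preprocess_ms: float, inference_ms: float, postprocess_ms: float):
    """Return (total latency in ms, frames per second) for one image."""
    total_ms = preprocess_ms + inference_ms + postprocess_ms
    return total_ms, 1000.0 / total_ms


# Stage timings copied from the Speed line of the test image
total_ms, fps = latency_summary(6.1, 4.0, 103.2)
print(f"{total_ms:.1f} ms per image, about {fps:.1f} FPS")
```

For this run the total is 113.3 ms, roughly 8.8 FPS, and the model's forward pass accounts for only 4.0 ms of it.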


## Run detection on a video with a custom YOLOv10 pothole-detection model

In [None]:


from ultralytics import YOLOv10

# Load the YOLO model
model = YOLOv10("models/pthlebest.pt")

# Test the model with a sample video
results = model("assets/pothole.mp4", show=True, save=True)  

WARNING ⚠️ inference results will accumulate in RAM unless `stream=True` is passed, causing potential out-of-memory
errors for large sources or long-running streams and videos. See https://docs.ultralytics.com/modes/predict/ for help.

Example:
    results = model(source=..., stream=True)  # generator of Results objects
    for r in results:
        boxes = r.boxes  # Boxes object for bbox outputs
        masks = r.masks  # Masks object for segment masks outputs
        probs = r.probs  # Class probabilities for classification outputs

video 1/1 (frame 1/505) c:\Users\91902\Downloads\DJI-G2-YOLOVX\testvideo.mp4: 384x640 1 person, 6 cars, 149.9ms
video 1/1 (frame 2/505) c:\Users\91902\Downloads\DJI-G2-YOLOVX\testvideo.mp4: 384x640 1 person, 7 cars, 25.7ms
video 1/1 (frame 3/505) c:\Users\91902\Downloads\DJI-G2-YOLOVX\testvideo.mp4: 384x640 1 person, 6 cars, 19.8ms
video 1/1 (frame 4/505) c:\Users\91902\Downloads\DJI-G2-YOLOVX\testvideo.mp4: 384x640 1 person, 7 cars, 19.4ms
video 1/1 (frame 5/505) c:\Users\91902\Downloads\DJI-G2-YOLOVX\testvideo.mp4: 384x640 1 person, 6 cars, 18.9ms
...
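
The predict-mode example printed in the output above uses `stream=True`, which yields one `Results` object per frame instead of keeping every frame's results in memory. A minimal sketch of that pattern follows; `count_detections` and `run_on_video` are hypothetical helper names, and the sketch assumes that `r.boxes.cls` holds class indices and that `model.names` maps them to labels, as in the Ultralytics `Results` API.

```python
from collections import Counter


def count_detections(results, names):
    """Tally detected class labels over an iterable of Results-like objects."""
    counts = Counter()
    for r in results:  # with stream=True this consumes one frame at a time
        for cls_id in r.boxes.cls.tolist():
            counts[names[int(cls_id)]] += 1
    return counts


def run_on_video(model, source="assets/pothole.mp4"):
    """Run streaming inference on a video and return per-class detection counts."""
    # stream=True returns a generator, so results are not accumulated in RAM
    results = model(source, stream=True, save=True)
    return count_detections(results, model.names)
```

With the model loaded as in the cell above, `run_on_video(model)` would process the video frame by frame and return a `Counter` of label totals.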

# DJI-G2-O3 Detection with an NDI/HX Cosmostreamer Box via the Ethernet Port

### NDI Python library

In [None]:
!pip install ndi-python

In [None]:
import sys
import numpy as np
import cv2 as cv
import NDIlib as ndi
from ultralytics import YOLO

def main():
    # Initialize NDI
    if not ndi.initialize():
        return 0

    # Find NDI sources
    ndi_find = ndi.find_create_v2()
    if ndi_find is None:
        return 0

    sources = []
    while not len(sources) > 0:
        print('Looking for sources ...')
        ndi.find_wait_for_sources(ndi_find, 1000)
        sources = ndi.find_get_current_sources(ndi_find)

    # Create NDI receiver
    ndi_recv_create = ndi.RecvCreateV3()
    ndi_recv_create.color_format = ndi.RECV_COLOR_FORMAT_BGRX_BGRA
    ndi_recv = ndi.recv_create_v3(ndi_recv_create)
    if ndi_recv is None:
        return 0

    ndi.recv_connect(ndi_recv, sources[0])
    ndi.find_destroy(ndi_find)

    # Load YOLOv10 model
    yolo_model = YOLO('models/yolov10n.pt')

    cv.startWindowThread()

    while True:
        # Receive NDI video frame
        t, v, _, _ = ndi.recv_capture_v2(ndi_recv, 5000)

        if t == ndi.FRAME_TYPE_VIDEO:
            print('Video data received (%dx%d).' % (v.xres, v.yres))
            frame = np.copy(v.data)

            # Drop the alpha channel so the frame has exactly 3 channels (BGR)
            if frame.shape[2] == 4:
                frame = cv.cvtColor(frame, cv.COLOR_BGRA2BGR)

            # Perform YOLOv10 inference on the frame
            results = yolo_model(frame)

            # The model returns a list of results; use the first one
            if isinstance(results, list) and len(results) > 0:
                result = results[0]
                # plot() already returns a BGR array, which is what cv.imshow
                # expects, so no color conversion is needed
                annotated_frame = result.plot()

                # Display the annotated frame
                cv.imshow('YOLOv10 Detection with NDI Stream', annotated_frame)

            ndi.recv_free_video_v2(ndi_recv, v)

        if cv.waitKey(1) & 0xff == 27:  # Exit on 'Esc' press
            break

    # Cleanup
    ndi.recv_destroy(ndi_recv)
    ndi.destroy()
    cv.destroyAllWindows()

    return 0


if __name__ == "__main__":
    sys.exit(main())
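
The receiver above connects to `sources[0]`, which may not be the drone feed when several NDI senders are on the network. Below is a small sketch of choosing a source by name instead; `pick_source` is a hypothetical helper, and it assumes each source object exposes an `ndi_name` attribute, as the objects returned by `ndi.find_get_current_sources` in `ndi-python` do.

```python
def pick_source(sources, keyword):
    """Return the first source whose NDI name contains keyword (case-insensitive).

    Falls back to the first source when nothing matches, and returns None
    when the list is empty.
    """
    keyword = keyword.lower()
    for source in sources:
        if keyword in source.ndi_name.lower():
            return source
    return sources[0] if sources else None
```

In `main()`, `ndi.recv_connect(ndi_recv, pick_source(sources, 'cosmo'))` could replace the `sources[0]` lookup. The keyword `'cosmo'` is only a guess at how the Cosmostreamer names its stream, so print the `ndi_name` values first to check.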


# OpenVINO for faster inference on Intel CPUs/GPUs (tested on an Intel i7-12650)
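
The inference cell in this section loads an OpenVINO IR file (`models/openvino/yolov10n.xml`), which has to be exported from the PyTorch weights first. The sketch below shows one hedged way to do that. It assumes the official `ultralytics` package, whose `export(format="openvino")` writes a `<name>_openvino_model` directory next to the weights; `expected_ir_path` and `export_to_openvino` are hypothetical helper names.

```python
from pathlib import Path


def expected_ir_path(weights: str) -> Path:
    """Path of the .xml file that an Ultralytics OpenVINO export is expected to produce."""
    p = Path(weights)
    return p.with_name(f"{p.stem}_openvino_model") / f"{p.stem}.xml"


def export_to_openvino(weights: str = "models/yolov10n.pt") -> Path:
    """Export the weights to OpenVINO IR and return the expected .xml path."""
    from ultralytics import YOLO  # lazy import: only needed when exporting

    YOLO(weights).export(format="openvino")
    return expected_ir_path(weights)
```

Run the export after the `openvino-dev` install below. The notebook keeps its IR files under `models/openvino/`, so either move the exported `.xml` and `.bin` files there or point `model_path` at the exported directory.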

In [1]:
!pip install openvino-dev


Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting openvino-dev
  Downloading openvino_dev-2024.5.0-17288-py3-none-any.whl.metadata (15 kB)
Collecting networkx<=3.1.0 (from openvino-dev)
  Downloading networkx-3.1-py3-none-any.whl.metadata (5.3 kB)
Collecting openvino==2024.5.0 (from openvino-dev)
  Downloading openvino-2024.5.0-17288-cp39-cp39-win_amd64.whl.metadata (8.5 kB)
Downloading openvino_dev-2024.5.0-17288-py3-none-any.whl (4.6 MB)
Downloading openvino-2024.5.0-17288-cp39-cp39-win_amd64.whl (37.4 MB)


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.3.1+cu118 requires torch==2.3.1+cu118, but you have torch 2.0.1 which is incompatible.
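
The resolver error above means the installed `torchaudio` was built for a different `torch` release than the one present. Below is a minimal sketch for spotting such a mismatch from inside the notebook. `same_minor` and `installed_versions` are hypothetical helper names, and comparing only the major.minor part is a simplifying assumption; the exact pairing of releases is listed on the PyTorch site.

```python
from importlib import metadata


def same_minor(version_a: str, version_b: str) -> bool:
    """True if two version strings share major.minor, ignoring local tags like +cu118."""
    def major_minor(version: str):
        return tuple(version.split("+")[0].split(".")[:2])

    return major_minor(version_a) == major_minor(version_b)


def installed_versions(packages=("torch", "torchvision", "torchaudio")):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions())
print(same_minor("2.3.1+cu118", "2.0.1"))  # the pair from the error message
```

The second call prints `False` for the pair in the error message. torchvision uses its own version numbering, so `same_minor` is only meaningful for the torch/torchaudio pair.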


In [None]:
import sys
import numpy as np
import cv2 as cv
from openvino.runtime import Core  # Import OpenVINO runtime
import time

def main():
    # Load YOLOv10 model with OpenVINO
    ie = Core()
    model_path = "models/openvino/yolov10n.xml"  # OpenVINO IR format
    compiled_model = ie.compile_model(model=model_path, device_name="AUTO")
    input_layer = compiled_model.input(0)

    # Load the video file
    video_path = "assets/djig2.mp4" # replace your video path here
    cap = cv.VideoCapture(video_path)

    if not cap.isOpened():
        print(f"Error: Could not open video file {video_path}")
        return 0

    cv.startWindowThread()

    while True:
        # Read a frame from the video
        ret, frame = cap.read()
        if not ret:
            print("End of video file reached or error occurred.")
            break

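        # Preprocess the frame (assumes an Ultralytics export, which expects RGB input scaled to [0, 1])
        input_blob = cv.resize(frame, (640, 640))  # Resize to model input size
        input_blob = cv.cvtColor(input_blob, cv.COLOR_BGR2RGB)  # OpenCV reads frames as BGR
        input_blob = input_blob.transpose((2, 0, 1))  # Change data layout from HWC to CHW
        input_blob = np.expand_dims(input_blob, axis=0)  # Add batch dimension
        input_blob = input_blob.astype(np.float32) / 255.0  # Convert to float32 and scale to [0, 1]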

        # Perform YOLOv10 inference on the frame
        start_time = time.time()
        results = compiled_model([input_blob])[compiled_model.outputs[0]]
        inference_time = time.time() - start_time
        print(f"Inference Time: {inference_time:.2f} seconds")

        # Post-processing the results and annotating the frame
        # For simplicity, let's use the function `process_results` to handle it
        annotated_frame = process_results(frame, results)

        # Display the annotated frame
        cv.imshow('YOLOv10 Inference on Video', annotated_frame)

        if cv.waitKey(1) & 0xff == 27:  # Exit on 'Esc' press
            break

    # Cleanup
    cap.release()
    cv.destroyAllWindows()

    return 0

def process_results(frame, results):
    # Placeholder for post-processing: a full version would decode the model
    # output and draw the detections. For now the frame is returned unchanged.
    return frame

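if __name__ == "__main__":
    # Note: inside Jupyter, sys.exit() raises SystemExit when the loop ends;
    # call main() directly to avoid that message.
    sys.exit(main())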
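
The `process_results` placeholder above returns the frame untouched. One way it could be filled in is sketched below, under a loud assumption: that the exported YOLOv10 model emits a `(1, N, 6)` tensor whose rows are `[x1, y1, x2, y2, score, class_id]` in the 640x640 input coordinate space. Check `compiled_model.outputs[0].shape` before relying on this. `decode_detections` and `draw_detections` are hypothetical helper names.

```python
def decode_detections(output, frame_w, frame_h, conf_thres=0.25, input_size=640):
    """Convert raw rows into (x1, y1, x2, y2, score, class_id) in frame pixels.

    Assumes output has shape (1, N, 6) and that the frame was resized
    (not letterboxed) to input_size x input_size, as in the cell above.
    """
    sx, sy = frame_w / input_size, frame_h / input_size
    detections = []
    for x1, y1, x2, y2, score, cls_id in output[0]:
        if score < conf_thres:
            continue  # skip low-confidence rows
        detections.append((int(x1 * sx), int(y1 * sy), int(x2 * sx), int(y2 * sy),
                           float(score), int(cls_id)))
    return detections


def draw_detections(frame, detections):
    """Draw boxes and labels on a BGR frame in place and return it."""
    import cv2 as cv  # lazy import so decode_detections works without OpenCV

    for x1, y1, x2, y2, score, cls_id in detections:
        cv.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv.putText(frame, f"{cls_id}: {score:.2f}", (x1, max(y1 - 5, 0)),
                   cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return frame
```

The body of `process_results` could then become `return draw_detections(frame, decode_detections(results, frame.shape[1], frame.shape[0]))`.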


Inference Time: 0.05 seconds
Inference Time: 0.03 seconds
Inference Time: 0.03 seconds
Inference Time: 0.03 seconds
Inference Time: 0.04 seconds
Inference Time: 0.03 seconds
Inference Time: 0.04 seconds
Inference Time: 0.04 seconds
Inference Time: 0.03 seconds
Inference Time: 0.03 seconds
Inference Time: 0.03 seconds
Inference Time: 0.04 seconds
Inference Time: 0.03 seconds
Inference Time: 0.04 seconds
Inference Time: 0.04 seconds
Inference Time: 0.04 seconds
Inference Time: 0.04 seconds
Inference Time: 0.04 seconds
Inference Time: 0.03 seconds
Inference Time: 0.03 seconds
Inference Time: 0.02 seconds
Inference Time: 0.03 seconds
Inference Time: 0.03 seconds
Inference Time: 0.03 seconds
Inference Time: 0.03 seconds
Inference Time: 0.04 seconds
Inference Time: 0.03 seconds
Inference Time: 0.04 seconds
Inference Time: 0.04 seconds
Inference Time: 0.04 seconds
Inference Time: 0.02 seconds
Inference Time: 0.03 seconds
Inference Time: 0.04 seconds
Inference Time: 0.03 seconds
...

SystemExit: 0

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
