I used Google Colab so that I don't have to download all the large dependencies and the dataset locally. It also provides a free GPU, which makes running the model much faster.


To use the Colab GPU, go to Runtime -> Change runtime type and select any available GPU from the Hardware accelerator menu.
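After switching the runtime, a quick way to confirm that the GPU is visible is to ask PyTorch, which Ultralytics installs as a dependency. The sketch below is a small check under that assumption; the helper name describe_accelerator is hypothetical, and it falls back gracefully if torch is not installed yet.

```python
import importlib.util


def describe_accelerator() -> str:
    """Report which accelerator PyTorch can see (hypothetical helper)."""
    # torch may not be installed yet (e.g. before `pip install ultralytics`)
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    if torch.cuda.is_available():
        # Name of the first visible CUDA device, e.g. the Colab GPU
        return f"GPU: {torch.cuda.get_device_name(0)}"
    return "CPU only"


print(describe_accelerator())
```

If this prints "CPU only" in Colab, the runtime type change has not taken effect yet.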

This script performs real-time object detection using YOLOv8 in a web browser within Google Colab. The workflow consists of:


1. *Loading the YOLOv8 model* – Uses ultralytics to load YOLOv8 for object detection.
2. *Initializing a webcam stream* – Uses JavaScript (via Javascript from IPython.display) to capture frames from the webcam.
3. *Processing frames in real time* – Each webcam frame is converted to an image and passed through YOLOv8 for detection.
4. *Overlaying detections* – Bounding boxes for detected objects are drawn on a transparent layer, which is displayed on top of the live video.
5. *Looping until stopped* – The script keeps capturing and processing frames until the user clicks on the video to stop it.










In [None]:
!pip install ultralytics


In [None]:
# Import the necessary libraries
from base64 import b64decode, b64encode  # Base64 encoding/decoding of image data
from google.colab.output import eval_js  # Runs JavaScript in the browser and returns its result to Python
from IPython.display import display, Javascript  # Injects JavaScript into the notebook output
from PIL import Image  # Image processing
from ultralytics import YOLO
from ultralytics.engine.results import Results
import io
import numpy as np

# Load the YOLO model with a specified variant
model_type = 'yolov8n.pt'  # lightweight YOLOv8 nano model (weights are downloaded on first use)
yolo_model = YOLO(model_type)
frame_resolution = [640, 480]  # webcam frame resolution as [width, height]
image_compression = 0.8  # JPEG quality (0 to 1) used when encoding captured frames
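frame_resolution and image_compression reach the browser by being formatted into the JavaScript source with a Python f-string (in initiate_camera_stream). This works because str([640, 480]) happens to be a valid JavaScript array literal. A slightly safer pattern, sketched below as an alternative rather than what this notebook does, is json.dumps, which also escapes quotes inside strings such as the status text that capture_snapshot interpolates. js_literal is a hypothetical helper name.

```python
import json

frame_resolution = [640, 480]  # [width, height], as in the notebook
image_compression = 0.8        # JPEG quality passed to canvas.toDataURL


def js_literal(value) -> str:
    """Render a Python value as a JavaScript literal (hypothetical helper)."""
    return json.dumps(value)


# What the notebook's f-string produces
fstring_header = f"const FRAME_RES = {frame_resolution};\nconst IMAGE_QUALITY = {image_compression};"

# The json.dumps-based equivalent
json_header = (
    f"const FRAME_RES = {js_literal(frame_resolution)};\n"
    f"const IMAGE_QUALITY = {js_literal(image_compression)};"
)
print(json_header)

# json.dumps escapes embedded quotes; plain f-string interpolation would not
print(js_literal('Status: "busy"'))
```

For these two numeric settings both approaches give identical output; the difference only matters for arbitrary strings.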

The following code makes webcam-based object detection possible in Google Colab. It only handles capturing frames in the browser and displaying the results; it has little to do with the detection itself, so we won't go into its details. Similar snippets for Colab webcam access are widely available online.

### Why Use JavaScript for the Webcam in Colab?

Google Colab's Python code runs on a cloud machine, so it has no direct access to local hardware like your webcam.

JavaScript is required because:

✅ Colab runs in a browser – JavaScript can access hardware like the webcam through the browser's navigator.mediaDevices API.

✅ Real-time processing – JavaScript streams the video in the browser continuously, while Python requests one frame at a time, runs detection on it, and sends the overlay back.

✅ Efficient image capture – JavaScript encodes each frame as a compressed JPEG in base64 text form, so it can be passed between the browser and the Colab Python kernel as a plain string.
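To make the transfer step concrete: the browser sends each frame as a data URL of the form data:image/jpeg;base64,<payload>, and decode_js_image splits off the header at the comma before decoding. The stdlib-only sketch below (helper names are hypothetical) round-trips some bytes through that format and shows that base64 itself adds about one third to the size: the size savings come from JPEG compression, while base64 only makes the bytes safe to pass around as text.

```python
from base64 import b64decode, b64encode


def to_data_url(payload: bytes, mime: str = "image/jpeg") -> str:
    """Wrap raw bytes in a base64 data URL, like canvas.toDataURL does (hypothetical helper)."""
    return f"data:{mime};base64,{b64encode(payload).decode('ascii')}"


def from_data_url(url: str) -> bytes:
    """Strip the 'data:...;base64,' header and decode the payload (hypothetical helper)."""
    _, b64_part = url.split(",", 1)
    return b64decode(b64_part)


payload = bytes(range(256)) * 12  # 3072 bytes of stand-in "image" data
url = to_data_url(payload)
encoded_len = len(url.split(",", 1)[1])

# base64 emits 4 output characters for every 3 input bytes
print(len(payload), encoded_len)
assert from_data_url(url) == payload
```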

In [None]:
def initiate_camera_stream():
    """Initializes the webcam stream in a browser using JavaScript."""
    js_code = Javascript(f'''
    const FRAME_RES = {frame_resolution};
    const IMAGE_QUALITY = {image_compression};
    ''' + '''
    var videoElement;
    var container = null;
    var mediaStream;
    var snapshotCanvas;
    var snapshotImage;
    var statusLabel;

    var pendingResolve = null;
    var stopStream = false;

    function cleanUp() {
        mediaStream.getVideoTracks()[0].stop();
        videoElement.remove();
        container.remove();
        videoElement = null;
        container = null;
        mediaStream = null;
        snapshotImage = null;
        snapshotCanvas = null;
        statusLabel = null;
    }

    function captureFrame() {
        if (!stopStream) {
            window.requestAnimationFrame(captureFrame);
        }
        if (pendingResolve) {
            var imageData = "";
            if (!stopStream) {
                snapshotCanvas.getContext('2d').drawImage(videoElement, 0, 0, FRAME_RES[0], FRAME_RES[1]);
                imageData = snapshotCanvas.toDataURL('image/jpeg', IMAGE_QUALITY)
            }
            var resolver = pendingResolve;
            pendingResolve = null;
            resolver(imageData);
        }
    }

    async function setupWebcam() {
        if (container !== null) {
            return mediaStream;
        }

        container = document.createElement('div');
        container.style.border = '2px solid black';
        container.style.padding = '3px';
        container.style.width = '100%';
        container.style.maxWidth = '600px';
        document.body.appendChild(container);

        const statusContainer = document.createElement('div');
        statusContainer.innerHTML = "<span>Status: </span>";
        statusLabel = document.createElement('span');
        statusLabel.innerText = 'Idle';
        statusLabel.style.fontWeight = 'bold';
        statusContainer.appendChild(statusLabel);
        container.appendChild(statusContainer);

        videoElement = document.createElement('video');
        videoElement.style.display = 'block';
        videoElement.width = container.clientWidth - 6;
        videoElement.setAttribute('playsinline', '');
        videoElement.onclick = () => { stopStream = true; };
        mediaStream = await navigator.mediaDevices.getUserMedia(
            {video: { facingMode: "environment"}});
        container.appendChild(videoElement);

        snapshotImage = document.createElement('img');
        snapshotImage.style.position = 'absolute';
        snapshotImage.style.zIndex = 1;
        snapshotImage.onclick = () => { stopStream = true; };
        container.appendChild(snapshotImage);

        const exitMessage = document.createElement('div');
        exitMessage.innerHTML =
            '<span style="color: red; font-weight: bold;">' +
            'Click here or on the video to stop the demo</span>';
        container.appendChild(exitMessage);
        exitMessage.onclick = () => { stopStream = true; };

        videoElement.srcObject = mediaStream;
        await videoElement.play();

        snapshotCanvas = document.createElement('canvas');
        snapshotCanvas.width = FRAME_RES[0];
        snapshotCanvas.height = FRAME_RES[1];
        window.requestAnimationFrame(captureFrame);

        return mediaStream;
    }
    async function captureImage(status, imageData) {
        if (stopStream) {
            cleanUp();
            stopStream = false;
            return '';
        }

        var setupTime = Date.now();
        mediaStream = await setupWebcam();

        var displayTime = Date.now();
        if (status != "") {
            statusLabel.innerHTML = status;
        }

        if (imageData != "") {
            var videoRect = videoElement.getClientRects()[0];
            snapshotImage.style.top = videoRect.top + "px";
            snapshotImage.style.left = videoRect.left + "px";
            snapshotImage.style.width = videoRect.width + "px";
            snapshotImage.style.height = videoRect.height + "px";
            snapshotImage.src = imageData;
        }

        var captureTime = Date.now();
        var result = await new Promise((resolve, reject) => pendingResolve = resolve);
        stopStream = false;

        return {
            'setup': displayTime - setupTime,
            'display': captureTime - displayTime,
            'capture': Date.now() - captureTime,
            'img': result,
        };
    }
    ''')
    display(js_code)
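The core of the JavaScript above is a small handshake: captureImage parks a resolver in pendingResolve and waits, and the next requestAnimationFrame callback (captureFrame) hands it a freshly encoded frame, or an empty string once stopStream is set. As an illustration only, here is the same pattern sketched with Python's asyncio; FrameGate and its methods are hypothetical names, and a timer stands in for the browser's animation frames.

```python
import asyncio
from typing import List, Optional


class FrameGate:
    """Hypothetical asyncio analogue of the pendingResolve handshake."""

    def __init__(self) -> None:
        self.pending: Optional[asyncio.Future] = None
        self.stop_requested = False
        self.frames_seen = 0

    def on_animation_frame(self) -> None:
        # Like captureFrame(): if someone is waiting, hand them the current frame
        self.frames_seen += 1
        if self.pending is not None:
            data = "" if self.stop_requested else f"frame-{self.frames_seen}"
            waiter, self.pending = self.pending, None
            waiter.set_result(data)

    async def capture(self) -> str:
        # Like captureImage(): park a future and wait for the next frame
        self.pending = asyncio.get_running_loop().create_future()
        return await self.pending


async def demo(requests: int = 3, ticks: int = 5) -> List[str]:
    gate = FrameGate()

    async def ticker() -> None:
        # Stand-in for requestAnimationFrame firing roughly every 10 ms
        for _ in range(ticks):
            await asyncio.sleep(0.01)
            gate.on_animation_frame()

    task = asyncio.create_task(ticker())
    frames = [await gate.capture() for _ in range(requests)]
    await task
    return frames


print(asyncio.run(demo()))
```

The design lets Python pull frames at its own pace: the browser keeps rendering video, but a frame is only encoded when Python is actually waiting for one.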





In [None]:
def capture_snapshot(status_text, encoded_image):
    """Sends the status text and latest overlay to the browser, then waits for the next webcam frame."""
    return eval_js(f'captureImage("{status_text}", "{encoded_image}")')

def decode_js_image(js_result) -> Image.Image:
    """Decodes a base64 image string from JavaScript response."""
    _, b64_string = js_result['img'].split(',')
    image_bytes = b64decode(b64_string)
    return Image.open(io.BytesIO(image_bytes))

def overlay_transparent_detections(detection_output: Results) -> Image.Image:
    """Creates a transparent background with bounding boxes over detected objects."""
    transparent_layer = np.zeros([detection_output.orig_shape[0], detection_output.orig_shape[1], 4], dtype=np.uint8)
    detected_overlay = detection_output.plot(font='verdana', masks=False, img=transparent_layer)
    detected_overlay[:, :, 3] = (detected_overlay.max(axis=2) > 0).astype(int) * 255
    return Image.fromarray(detected_overlay, 'RGBA')
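overlay_transparent_detections relies on a simple trick: boxes are drawn onto an all-zero RGBA array, and afterwards every pixel with any non-zero channel is made fully opaque while the rest stay transparent. The NumPy sketch below reproduces just that alpha step; a filled rectangle drawn by slicing stands in for Results.plot, and alpha_from_drawn_pixels is a hypothetical name.

```python
import numpy as np


def alpha_from_drawn_pixels(layer: np.ndarray) -> np.ndarray:
    """Make drawn pixels opaque and everything else transparent (hypothetical helper)."""
    out = layer.copy()
    # Same rule as the notebook: any non-zero channel -> alpha 255, else alpha 0
    out[:, :, 3] = (out.max(axis=2) > 0).astype(np.uint8) * 255
    return out


height, width = 4, 6
layer = np.zeros((height, width, 4), dtype=np.uint8)  # fully transparent canvas
layer[1:3, 2:5, :3] = (0, 255, 0)                     # stand-in "box": 2 x 3 green pixels

overlay = alpha_from_drawn_pixels(layer)
print(overlay[:, :, 3])
```

Because the overlay is transparent everywhere except the boxes, the browser can stack it on the live video without hiding the picture.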

The main loop below captures webcam frames, runs object detection on each one, and overlays the results on the video.

It stops when the user clicks on the video or on the red stop message.

In [None]:
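# Start the video stream and process frames
initiate_camera_stream()
captured_image = ''  # data URL of the latest detection overlay ('' = nothing to draw yet)
while True:
    response = capture_snapshot('Capturing...', captured_image)
    # '' means the user stopped the demo; an empty 'img' means it was stopped mid-capture
    if not response or not response['img']:
        break
    frame = decode_js_image(response)
    # Pass the PIL image directly: Ultralytics assumes NumPy arrays are BGR,
    # so np.array(frame) (which is RGB) would swap the red and blue channels
    for result in yolo_model(source=frame, verbose=False):
        overlay_image = overlay_transparent_detections(result)
        with io.BytesIO() as buffer:
            overlay_image.save(buffer, format='png')
            base64_str = str(b64encode(buffer.getvalue()), 'utf-8')
            captured_image = f'data:image/png;base64,{base64_str}'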
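Besides the image, each response from captureImage carries three timings in milliseconds: setup (creating or reusing the webcam elements), display (updating the status and overlay) and capture (waiting for the next frame). The loop above ignores them, but they are handy for spotting where time goes. A minimal sketch, assuming you append each response dict to a list inside the loop; summarize_timings is a hypothetical helper, and the sample values are illustrative inputs, not measurements.

```python
from statistics import mean
from typing import Dict, List


def summarize_timings(responses: List[dict]) -> Dict[str, float]:
    """Average the per-frame timing fields returned by captureImage (hypothetical helper)."""
    keys = ("setup", "display", "capture")
    return {key: float(mean(r[key] for r in responses)) for key in keys}


# Illustrative stand-in responses (not real measurements); 'img' omitted for brevity
sample = [
    {"setup": 0, "display": 2, "capture": 16},
    {"setup": 0, "display": 4, "capture": 18},
]
print(summarize_timings(sample))
```

Note that time spent in YOLO inference happens between calls, so it does not appear in these fields; time it separately in Python if you need it.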
