[![Labellerr](https://storage.googleapis.com/labellerr-cdn/%200%20Labellerr%20template/notebook.webp)](https://www.labellerr.com)

# **Fine-Tune YOLO for Threat Detection**

[![labellerr](https://img.shields.io/badge/Labellerr-BLOG-black.svg)](https://www.labellerr.com/blog/)
[![Youtube](https://img.shields.io/badge/Labellerr-YouTube-b31b1b.svg)](https://www.youtube.com/@Labellerr)
[![Github](https://img.shields.io/badge/Labellerr-GitHub-green.svg)](https://github.com/Labellerr/Hands-On-Learning-in-Computer-Vision)

## 🎯 Objective

This notebook walks through a complete workflow for building a threat detection system with a fine-tuned YOLO model. You create a dataset by extracting frames from videos, annotate weapons such as guns and knives, train the model on those annotations, and run it on a video to highlight potential threats frame by frame.


## 🚀 Key Features

* **Frame Extraction**: Automatically extract image frames from source videos to build a training dataset.
* **Format Conversion**: Convert annotations from COCO JSON format to the YOLO segmentation format.
* **Model Training**: Fine-tune a pre-trained YOLO11 model (`yolo11x.pt`) on a custom threat dataset.
* **Video Inference Pipeline**: Process a video frame by frame, overlaying bounding boxes, translucent highlights, and confidence labels on detected threats.


## 📚 Libraries & Prerequisites

* **Core Libraries**: `ultralytics`, `opencv-python`.
* **Environment**: A Python environment with GPU support is highly recommended for faster model training.
* **Dataset**: A video dataset containing various threat objects (e.g., guns, knives) and a corresponding `annotation.json` file.


### **Create Dataset and Annotation**

The process begins with creating a dataset. The utility repository used in this notebook includes a script that extracts frames from source videos at a specified interval; images can also be collected from other sources. The images are then annotated with segmentation masks for the threat objects, and the annotations are exported in COCO JSON format.

In [None]:
# Collect various images of threats like guns from the internet.
# Annotate the images using a tool like Labellerr
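
Frame extraction at a fixed interval can be sketched roughly as follows. This is a minimal illustration using OpenCV, not the utility repository's own script; the function names `frame_indices` and `extract_frames`, the output file naming, and the 30 FPS fallback are assumptions made for the example.

```python
from pathlib import Path


def frame_indices(total_frames: int, fps: float, every_seconds: float) -> list[int]:
    """Return the indices of the frames to keep, one every `every_seconds`."""
    # Guard against a zero step when FPS is missing or the interval is tiny.
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


def extract_frames(video_path: str, out_dir: str, every_seconds: float = 1.0) -> int:
    """Save one frame every `every_seconds` as a JPEG; return the number saved."""
    import cv2  # imported here so frame_indices() works without OpenCV installed

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise FileNotFoundError(f"Could not open video: {video_path}")

    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # assumed fallback if FPS is unknown
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    keep = set(frame_indices(total, fps, every_seconds))

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    saved = 0
    index = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if index in keep:
            cv2.imwrite(str(Path(out_dir) / f"frame_{index:06d}.jpg"), frame)
            saved += 1
        index += 1

    cap.release()
    return saved
```

Sampling at an interval rather than keeping every frame avoids filling the dataset with near-duplicate images, which add annotation effort without adding much variety.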

### **Convert COCO JSON Annotation to YOLO format**

The annotated data in COCO JSON must be converted into the YOLO segmentation format that the model requires for training. We use a helper script from the cloned repository to perform this conversion, which generates the necessary `.txt` label files and the `data.yaml` configuration file.

In [None]:
!git clone https://github.com/Labellerr/yolo_finetune_utils.git

In [None]:
from yolo_finetune_utils.coco_yolo_converter.seg_converter import coco_to_yolo_converter

result = coco_to_yolo_converter(
    json_path='./annotation.json',
    images_dir='./dataset',
    output_dir='yolo_format',
)
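
To make the target format concrete: a YOLO segmentation label file has one line per object, made up of the class index followed by the polygon's x/y coordinates normalized by the image width and height. The sketch below illustrates that mapping for a single COCO polygon. It is not the implementation of `coco_to_yolo_converter`, and the function name `coco_polygon_to_yolo_line` is hypothetical.

```python
def coco_polygon_to_yolo_line(class_id, polygon, img_w, img_h):
    """Convert one flat COCO polygon [x1, y1, x2, y2, ...] (pixels)
    into a YOLO segmentation label line with normalized coordinates."""
    if len(polygon) < 6 or len(polygon) % 2 != 0:
        raise ValueError("A polygon needs at least 3 (x, y) points")

    coords = []
    for i in range(0, len(polygon), 2):
        # Normalize to [0, 1] and clamp points that fall slightly outside the image.
        x = min(max(polygon[i] / img_w, 0.0), 1.0)
        y = min(max(polygon[i + 1] / img_h, 0.0), 1.0)
        coords.append(f"{x:.6f}")
        coords.append(f"{y:.6f}")

    return " ".join([str(class_id)] + coords)


# A triangle in a 200x100 image, assigned to class index 0.
line = coco_polygon_to_yolo_line(0, [0, 0, 100, 0, 100, 50], img_w=200, img_h=100)
print(line)  # 0 0.000000 0.000000 0.500000 0.000000 0.500000 0.500000
```

Note that COCO category IDs are arbitrary integers, while YOLO class indices are zero-based and contiguous, so a real converter also needs a mapping between the two.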

### **Train YOLO11 Model on a Custom Dataset**

With the dataset correctly formatted, we can now fine-tune a YOLO11 model. Starting from the pre-trained `yolo11x.pt` checkpoint lets us leverage transfer learning; the command below trains on our threat detection dataset for 200 epochs at an image size of 640 with a batch size of 20.

In [None]:
!pip install ultralytics
from ultralytics import YOLO

In [None]:
!yolo task=detect mode=train data="path/to/dataset.yaml" model="yolo11x.pt" epochs=200 imgsz=640 batch=20
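
The same training run can also be launched from Python instead of the CLI. The sketch below mirrors the command's arguments using the Ultralytics `YOLO(...).train(...)` call; the `TRAIN_ARGS` dictionary and the `train_threat_model` wrapper are names introduced here for illustration, and the dataset path remains a placeholder.

```python
# Arguments copied from the CLI command above.
TRAIN_ARGS = {
    "data": "path/to/dataset.yaml",  # placeholder, same as in the CLI command
    "epochs": 200,
    "imgsz": 640,
    "batch": 20,
}


def train_threat_model(weights: str = "yolo11x.pt", **overrides):
    """Fine-tune the given checkpoint; keyword overrides replace TRAIN_ARGS entries."""
    from ultralytics import YOLO  # imported lazily; requires `pip install ultralytics`

    model = YOLO(weights)
    return model.train(**{**TRAIN_ARGS, **overrides})
```

For example, `train_threat_model(batch=8)` would reuse the same settings with a smaller batch, which can help if the GPU runs out of memory with the large `yolo11x` checkpoint.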

### **Run Inference with the Fine-Tuned YOLO Model**

After training, we build an inference pipeline to process new videos. We load the best weights from the training run, open the input video, and run a prediction on each frame. Every detection above the confidence threshold is highlighted with a translucent red overlay, a bounding box, and a label showing the class and confidence.

The annotated frames are written to a new output video file with the same frame rate and resolution as the input.

In [None]:
import cv2

# Load the YOLO11 model
model = YOLO('./runs/detect/train3/weights/best.pt', task="detect")

# Open video capture
cap = cv2.VideoCapture("./assests/wep2.mp4")

# Define video writer to save output
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
output = cv2.VideoWriter('output.mp4', fourcc, fps, (width, height))

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Run inference on the frame
    results = model(frame)
    
    # Process each detected box
    for box in results[0].boxes:
        # Get box coordinates
        x1, y1, x2, y2 = map(int, box.xyxy[0])

        # Get confidence and class
        confidence = float(box.conf[0])
        class_id = int(box.cls[0])
        # Look up the class name from the model instead of hard-coding it
        class_name = model.names[class_id]
        
        if confidence > 0.3:  # Confidence threshold
            # Create translucent overlay
            overlay = frame.copy()
            cv2.rectangle(overlay, (x1, y1), (x2, y2), (0, 0, 255), -1)  # Solid red rectangle
            alpha = 0.4  # Transparency factor
            cv2.addWeighted(overlay, alpha, frame, 1 - alpha, 0, frame)

            # Draw bounding box
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 1)

            # Add label
            label = f'{class_name} {confidence *100:.2f}%'
            cv2.putText(frame, label, (x1, y1 - 10), 
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)

    # Write the frame with annotation to output video
    output.write(frame)

# Release resources
cap.release()
output.release()
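
A frame-level alert (for example, a "Threat Detected" banner) can be derived from the same detections by separating the decision from the drawing code. The sketch below is pure Python and assumes the detections have already been unpacked into simple records; the `Detection` class and the `filter_detections` and `threat_detected` helpers are hypothetical names, and 0.3 mirrors the confidence threshold used in the loop above.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    class_name: str
    confidence: float
    box: tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def filter_detections(detections, threshold=0.3):
    """Keep only detections whose confidence exceeds the threshold."""
    return [d for d in detections if d.confidence > threshold]


def threat_detected(detections, threshold=0.3):
    """Return True if at least one detection passes the threshold."""
    return len(filter_detections(detections, threshold)) > 0


# Example values for a single frame (made up for illustration).
frame_detections = [
    Detection("gun", 0.82, (40, 60, 180, 220)),
    Detection("knife", 0.12, (300, 310, 340, 400)),
]

kept = filter_detections(frame_detections)
print([d.class_name for d in kept])       # ['gun']
print(threat_detected(frame_detections))  # True
```

Inside the video loop, a `True` result could trigger a single `cv2.putText` call that draws the warning once per frame, rather than once per box.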
