# **Technical Report documented by Yufu Niu**

## **Problem Overview**
### **The objective of this assignment is to locate vehicles in the bird's-eye-view (BEV) image. The following data are provided:**
**1. The fixed-camera view image with marked lane lines and stop lines (red lines) plus measured physical distances**  
**2. The corresponding Google Earth BEV image with marked lane lines and stop lines (red lines) plus measured physical distances**  
**3. A KML file**  
**4. The fixed-camera video**

## **Workflow**
**1: Load object detection model**  
**2: Create a Video Writer for saving frames to a new video file**  
**3: Load background subtractor**  
**4: Process video frame by frame**  
**5: Load the BEV image**  
**6: Manually record paired pixel locations along the marked red lines in the camera image and the BEV image using ImageJ**  
**7: Calculate Homography**  
**8: Map the bounding box locations from camera coordinates to BEV coordinates**  
**9: Draw vehicle locations on the BEV image**

## **Python implementation**

## **Step1: Load object detection model**

In [None]:
from ultralytics import YOLO
import cv2
import numpy as np
import json

In [None]:
#load yolo - try yolo models at different scales
#model = YOLO('yolov8s.pt')
#model = YOLO('yolov8m.pt')
model = YOLO('yolov8x.pt')

In [None]:
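cap = cv2.VideoCapture('./video_01.mp4')
# Check that the video loaded by reading one frame (its shape is used for the video writer)
ret, frame = cap.read()
if not ret:
    raise ValueError("Failed to read video")
# Rewind so the processing loop starts at the first frame and list indices match frame indices
cap.set(cv2.CAP_PROP_POS_FRAMES, 0)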


## **Step2: Create a video writer for saving frames to a new video file**

In [None]:
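import os

# Frame size and frame rate of the input video
height, width, channels = frame.shape
fps = cap.get(cv2.CAP_PROP_FPS)
# Make sure the output directory exists; VideoWriter does not create it
os.makedirs('./results', exist_ok=True)
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
output_video = cv2.VideoWriter('./results/yolo_annotated_video_01.mp4', fourcc, fps, (width, height))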

## **Step3: Load background subtractor**

In [None]:
bg = cv2.createBackgroundSubtractorMOG2(
    history = 50, varThreshold = 50, detectShadows = False)

In [None]:
# define vehicle class and color
vehicle_classes = {
    'car': {'color': (0, 255, 0), 'radius': 4},    # green
    'truck': {'color': (0, 0, 255), 'radius': 6},  # red
    'bus': {'color': (255, 0, 0), 'radius': 8}     # blue
}

## **Step4: Process video frame by frame**

In [None]:
all_frames_boxes = []  # list to store info for each frame
while True:
    ret, frame = cap.read()
    if not ret:
        break
        
    bgmask = bg.apply(frame)  # apply the background subtractor to get a foreground (motion) mask
    bgmask = cv2.medianBlur(bgmask, 5)  # median filter removes small noisy dots from the mask
    
    # Run YOLOv8 prediction
    results = model.predict(
        source=frame,      # input frame
        imgsz=640,         # resize for model
        conf=0.25,         # confidence threshold
        iou=0.45,          # IoU threshold
        device='cpu',      # Using CPU
        stream=False
    )

    # Get the first result (single frame)
    res = results[0]
    frame_boxes = []
    
    # Draw boxes for detected vehicles
    if hasattr(res, 'boxes') and len(res.boxes):  # proceed only if the result has at least one box
        for box in res.boxes:
            # Bounding box coordinates
            x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())  # corner coordinates as integer pixels
            confidence = float(box.conf[0])  # detection confidence
            class_id = int(box.cls[0])  # predicted class id
            class_name = model.names.get(class_id, str(class_id))  # class name for that id

            # Only draw vehicle classes
            if class_name in vehicle_classes:
                motion_region = bgmask[y1:y2, x1:x2]  # foreground mask inside the bounding box
                if motion_region.size == 0:  # skip degenerate (zero-area) boxes
                    continue
                motion_ratio = np.mean(motion_region > 0)  # fraction of foreground (moving) pixels
                if motion_ratio < 0.03:  # almost no motion: treat as static background and skip
                    continue

                color = vehicle_classes[class_name]['color']  # drawing color for this vehicle class

                cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)  # draw the bounding box
                cv2.putText(frame, f"{class_name} {confidence:.2f}", (x1, y1 - 6),  # label with class and confidence
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)

                # Bottom-center of the box approximates where the vehicle touches the ground
                x_center = int((x1 + x2) / 2)
                y_bottom = int(y2)

                frame_boxes.append({
                    'class': class_name,
                    'conf': confidence,
                    'bbox': [x1, y1, x2, y2],
                    'center': [x_center, y_bottom]
                })

    all_frames_boxes.append(frame_boxes)
    output_video.write(frame)

# Release resources
cap.release()
output_video.release()
# Save to JSON
with open('./results/vehicle_bboxes.json', 'w') as f:
    json.dump(all_frames_boxes, f)
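
The static-background filter in the loop above can be checked in isolation. The following is a minimal sketch on a synthetic mask, not the code used to produce the results; the helper name `motion_ratio` and the clamping of the box to the image are assumptions added for illustration.

```python
import numpy as np

def motion_ratio(fg_mask, bbox):
    """Fraction of foreground (non-zero) pixels inside bbox = (x1, y1, x2, y2)."""
    h, w = fg_mask.shape[:2]
    x1, y1, x2, y2 = bbox
    # Clamp the box to the image so slicing never goes out of range.
    x1, x2 = max(0, x1), min(w, x2)
    y1, y2 = max(0, y1), min(h, y2)
    if x2 <= x1 or y2 <= y1:
        return 0.0  # degenerate box: treat as static
    region = fg_mask[y1:y2, x1:x2]
    return float(np.mean(region > 0))

# Synthetic 100x100 mask with one 20x20 "moving" blob marked as 255.
mask = np.zeros((100, 100), dtype=np.uint8)
mask[40:60, 40:60] = 255

moving = motion_ratio(mask, (40, 40, 60, 60))   # box exactly on the blob
static = motion_ratio(mask, (0, 0, 20, 20))     # box on empty background
partial = motion_ratio(mask, (30, 30, 50, 50))  # box covering a quarter of the blob
print(moving, static, partial)
```

With the 0.03 threshold used above, the first and third boxes would be kept and the second would be discarded. Note that a vehicle that stays stationary long enough to be absorbed into the background model would also be discarded by this rule.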

## **Step5: Load the BEV image**

In [17]:
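# Load the BEV image
bev_image = cv2.imread('2020-25-06.png')  # or .jpg
if bev_image is None:  # cv2.imread returns None instead of raising on failure
    raise ValueError("Failed to read BEV image")
plane_image = bev_image.copy()  # working copy to draw on; the original stays untouched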

## **Step6: Manually record paired pixel locations along the marked red lines in the camera image and the BEV image using ImageJ**

In [13]:
# Paired points as (x, y) pixels: row i of camera_coordiate corresponds to row i of bev_coordiante
camera_coordiate = np.array([[551,1032],[684,926],[872,766],[1133,503],[1105,905],[1155,835],
                            [1264,682],[1295,636],[1148,504],[1625,598],[1161,914],[1848,1020],[703,354],[727,397]])
bev_coordiante = np.array([[506,262],[527,328],[566,394],[691,605],[456,416],[473,447],
                          [513,535],[532,572],[683,610],[449,715],[320,541],[244,544],[1168,544],[1044,520]])


## **Step7: Calculate Homography**

In [14]:
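# Estimate the 3x3 homography that maps camera pixels to BEV pixels
# (the default method is a least-squares fit over all point pairs)
H, status = cv2.findHomography(camera_coordiate, bev_coordiante)
if H is None:
    raise ValueError("Homography estimation failed")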
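
To illustrate how a homography can be estimated from four or more point pairs and how its quality can be checked, here is a NumPy-only sketch of the textbook direct linear transform (DLT) with Hartley normalization. It is not necessarily what `cv2.findHomography` does internally, and the names `estimate_homography_dlt`, `project`, and `H_true` as well as the synthetic points are assumptions made for this example.

```python
import numpy as np

def _normalize(pts):
    """Hartley normalization: zero-mean points with mean distance sqrt(2)."""
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2) / mean_dist
    T = np.array([[s, 0, -s * centroid[0]],
                  [0, s, -s * centroid[1]],
                  [0, 0, 1]])
    return (pts - centroid) * s, T

def estimate_homography_dlt(src, dst):
    """Estimate H (3x3) with dst ~ H @ src from N >= 4 point pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_n, T_src = _normalize(src)
    dst_n, T_dst = _normalize(dst)
    rows = []
    for (x, y), (u, v) in zip(src_n, dst_n):
        # Each pair contributes two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    H_n = vt[-1].reshape(3, 3)
    H = np.linalg.inv(T_dst) @ H_n @ T_src  # undo the normalization
    return H / H[2, 2]

def project(H, pts):
    """Apply H to (N, 2) points and divide by the third homogeneous component."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Synthetic check: generate pairs from a known homography and recover it.
H_true = np.array([[0.5, 0.1, 100.0],
                   [-0.05, 0.8, 50.0],
                   [1e-4, 2e-4, 1.0]])
src_pts = np.array([[0, 0], [1000, 0], [1000, 800], [0, 800], [500, 400], [250, 600]])
dst_pts = project(H_true, src_pts)

H_est = estimate_homography_dlt(src_pts, dst_pts)
errors = np.linalg.norm(project(H_est, src_pts) - dst_pts, axis=1)
print("max reprojection error (px):", errors.max())
```

The same per-point reprojection error, computed on the manually recorded pairs, is one way to spot a mislabelled point: a pair with a much larger error than the others is a candidate for re-measurement in ImageJ.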

## **Step8: Map the bounding box locations from camera coordinates to BEV coordinates**

In [15]:
# Take the bottom-center of each bounding box and map it to BEV pixel coordinates
bev_coords_all_frames = []

for frame_boxes in all_frames_boxes:
    frame_bev = []
    for box in frame_boxes:
        x1, y1, x2, y2 = box['bbox']
        x_center = (x1 + x2) / 2
        y_bottom = y2
        vehicle_pixel = np.array([x_center, y_bottom, 1])  # homogeneous coordinates
        
        # Map to BEV
        vehicle_bev = np.dot(H, vehicle_pixel)  # apply the homography H to the point
        vehicle_bev /= vehicle_bev[2]  # divide by the third component to get pixel coordinates

        frame_bev.append({
            'class': box['class'],
            'conf': box['conf'],
            'bev_x': float(vehicle_bev[0]),  # plain Python floats for JSON
            'bev_y': float(vehicle_bev[1])
        })
    bev_coords_all_frames.append(frame_bev)
    
# Save to JSON
with open('./results/vehicle_bev_coords.json', 'w') as f:
    json.dump(bev_coords_all_frames, f)
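
The per-box mapping above can also be written for all points of a frame at once. The sketch below is a hedged, NumPy-only variant that additionally flags points landing outside the BEV image; the function name `map_to_bev`, the demo homography `H_demo`, and the image size are hypothetical values chosen for illustration.

```python
import numpy as np

def map_to_bev(H, points, bev_shape):
    """Project (N, 2) camera pixels with H and flag points inside the BEV image."""
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # (N, 3) homogeneous coordinates
    mapped = homog @ np.asarray(H, dtype=float).T     # same as H @ p for every row p
    bev = mapped[:, :2] / mapped[:, 2:3]              # divide by the third component
    height, width = bev_shape[:2]
    inside = ((bev[:, 0] >= 0) & (bev[:, 0] < width) &
              (bev[:, 1] >= 0) & (bev[:, 1] < height))
    return bev, inside

# Hypothetical homography: scale by 0.5 and shift by (100, 50).
H_demo = np.array([[0.5, 0.0, 100.0],
                   [0.0, 0.5, 50.0],
                   [0.0, 0.0, 1.0]])
bottom_centers = [[200, 300], [4000, 300]]  # second point maps far to the right
bev_pts, inside = map_to_bev(H_demo, bottom_centers, bev_shape=(800, 1200))
print(bev_pts, inside)
```

Filtering on `inside` before drawing avoids placing markers for detections whose projected position falls outside the BEV image, which can happen for vehicles far from the region covered by the calibration points.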

## **Step9: Draw vehicle locations on the BEV image**

In [19]:
# Draw the vehicle locations of a single frame on the BEV image
# Frame index 201 is used as an example
frame_idx = 201
frame_bev = bev_coords_all_frames[frame_idx]  # list of dicts with 'bev_x', 'bev_y'
for v in frame_bev:
    x_px = int(v['bev_x'])
    y_px = int(v['bev_y'])
    cls = v['class']
    color = vehicle_classes.get(cls, {'color': (0,0,255)})['color']
    radius = vehicle_classes.get(cls, {'radius': 5})['radius']
    cv2.circle(plane_image, (x_px, y_px), radius, color, -1)
    cv2.putText(plane_image, cls, (x_px-20, y_px-20),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 1)
# Save the 2D plane visualization
cv2.imwrite(f'./results/bev_with_vehicles_{frame_idx}.png', plane_image)

# Save the corresponding camera frame for comparison
cap = cv2.VideoCapture('./video_01.mp4')
cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
ret, camera_frame = cap.read()
cap.release()
if not ret:
    raise ValueError(f"Failed to read frame {frame_idx}")

cv2.imwrite(f'./results/camera_frame_{frame_idx}.png', camera_frame)


## **Conclusion**
**In this report, a YOLOv8 model was used to detect vehicles in a fixed-camera video. A background subtractor was applied to identify static background so that misleading detections such as trees and advertising boards could be removed. ImageJ was then used to manually record paired pixel locations along the marked red lines in the camera image and the BEV image, and these pairs were used to calculate a homography. This gives a coordinate mapping from the fixed-camera image to the Google Earth BEV image. Finally, the transformed locations of the detected vehicles were drawn on the BEV image.**

## **Suggestions for Further Improvement**

## **The proposed workflow can be further improved by the following suggestions:** 
**1. The current homography was calculated from manually picked pixel locations, which may not be accurate enough. Further work could use the KML file to extract geospatial information so that pixel locations can be mapped to GPS locations, which should provide higher accuracy.**  
**2. The measured physical distances were not used. They could verify the accuracy of the manual ImageJ labels by comparing the distance between two points in the camera image and in the BEV image. In addition, vehicle locations on the BEV image could be scaled to physical units using these distances, which is important when estimating real-time speed on the BEV image.**  
**3. A more powerful deep learning model could be tried to further improve the accuracy of the object detection.**  
**4. Currently only dots are used to mark the vehicles in the BEV image. A bounding box or polygon could further constrain the shape of each vehicle on the BEV image.**
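
As an illustration of suggestion 2, the sketch below converts BEV pixels to metres using one measured distance and then estimates a speed from consecutive BEV positions. It assumes the BEV image has a uniform scale, and all names (`metres_per_pixel`, `speed_kmh`) and numbers (the 50 m reference distance, the track, the 25 fps frame rate) are hypothetical and not taken from the provided data.

```python
import numpy as np

def metres_per_pixel(p_a, p_b, distance_m):
    """Scale of the BEV image from two pixel points a known distance apart."""
    pixel_dist = np.linalg.norm(np.asarray(p_a, dtype=float) - np.asarray(p_b, dtype=float))
    return distance_m / pixel_dist

def speed_kmh(track_px, fps, scale_m_per_px):
    """Mean speed along a track of BEV positions, one position per frame."""
    track = np.asarray(track_px, dtype=float)
    step_px = np.linalg.norm(np.diff(track, axis=0), axis=1)  # pixels moved per frame
    mean_speed_ms = step_px.mean() * scale_m_per_px * fps     # metres per second
    return mean_speed_ms * 3.6                                # convert to km/h

# Hypothetical reference: two BEV points 500 px apart, measured as 50 m on the ground.
scale = metres_per_pixel((100, 100), (400, 500), distance_m=50.0)

# Hypothetical track of one vehicle over three consecutive frames (5 px per frame).
track = [(100, 100), (103, 104), (106, 108)]
v = speed_kmh(track, fps=25.0, scale_m_per_px=scale)
print(scale, v)
```

Estimating speed this way requires associating detections of the same vehicle across frames, which the current workflow does not do; a tracker would be needed to build such a track.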
