Detect car in image using YOLOv5 model

This notebook uses the dataset at https://www.kaggle.com/datasets/sshikamaru/car-object-detection. Its .csv file gives only the bounding-box corner coordinates (xmin, ymin, xmax, ymax) of the cars in each image.
The YOLO format, however, requires {class_id, center_x, center_y, width, height}, with the four box values normalized by the image width and height. So, the input data has to be converted into YOLO format.

In [8]:
import os
import pandas as pd
import cv2
import shutil
from sklearn.model_selection import train_test_split

# Paths

csv_path = r'C:\Users\Siddhant\Documents\Python\Project5\train_solution_bounding_boxes.csv'
train_images_dir = r'C:\Users\Siddhant\Documents\Python\Project5\training_images'
test_images_dir = r'C:\Users\Siddhant\Documents\Python\Project5\testing_images'
output_dir = r'C:\Users\Siddhant\Documents\Python\Project5\dataset'

# Output directories
train_images_out = os.path.join(output_dir, 'train/images/')
train_labels_out = os.path.join(output_dir, 'train/labels/')
val_images_out = os.path.join(output_dir, 'val/images/')
val_labels_out = os.path.join(output_dir, 'val/labels/')

# Create output directories
os.makedirs(train_images_out, exist_ok=True)
os.makedirs(train_labels_out, exist_ok=True)
os.makedirs(val_images_out, exist_ok=True)
os.makedirs(val_labels_out, exist_ok=True)

# Load CSV
df = pd.read_csv(csv_path)

# Create YOLO annotations
def convert_to_yolo(image_name, xmin, ymin, xmax, ymax, img_width, img_height):
    center_x = ((xmin + xmax) / 2) / img_width
    center_y = ((ymin + ymax) / 2) / img_height
    box_width = (xmax - xmin) / img_width
    box_height = (ymax - ymin) / img_height
    return f"0 {center_x} {center_y} {box_width} {box_height}\n"

# Process each row in the CSV
annotations = {}
for _, row in df.iterrows():
    image_name = row['image']
    xmin, ymin, xmax, ymax = row['xmin'], row['ymin'], row['xmax'], row['ymax']
    
    # Load image to get dimensions
    image_path = os.path.join(train_images_dir, image_name)
    img = cv2.imread(image_path)
    if img is None:
        print(f"Image {image_name} not found. Skipping.")
        continue
    img_height, img_width, _ = img.shape

    # Convert to YOLO format
    yolo_annotation = convert_to_yolo(image_name, xmin, ymin, xmax, ymax, img_width, img_height)

    # Store annotation
    if image_name not in annotations:
        annotations[image_name] = []
    annotations[image_name].append(yolo_annotation)

# Save annotations to .txt files
for image_name, yolo_annotations in annotations.items():
    txt_name = os.path.splitext(image_name)[0] + '.txt'
    txt_path = os.path.join(train_labels_out, txt_name)
    with open(txt_path, 'w') as f:
        f.writelines(yolo_annotations)

# Split into train and val sets
train_files, val_files = train_test_split(list(annotations.keys()), test_size=0.2, random_state=42)

# Copy images into the train/val folders; only val labels need to be moved
for image_name in train_files:
    shutil.copy(os.path.join(train_images_dir, image_name), os.path.join(train_images_out, image_name))
    # Train labels were already written to train/labels above, so there is nothing to move

for image_name in val_files:
    shutil.copy(os.path.join(train_images_dir, image_name), os.path.join(val_images_out, image_name))
    txt_name = os.path.splitext(image_name)[0] + '.txt'
    shutil.move(os.path.join(train_labels_out, txt_name), os.path.join(val_labels_out, txt_name))

print("Dataset preparation complete!")

Dataset preparation complete!


Note that the labels directories now contain the annotations as YOLO-formatted .txt files, one per image.

Now, let us pull YOLOv5, whose pretrained weights we will use as the starting point for car detection.
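As a sanity check on the conversion above, a label line can be converted back to pixel corners and compared with the original CSV row. The sketch below is only an illustration: `yolo_to_pixel_box` is a hypothetical helper, and the image size and box values are made-up numbers, not taken from the dataset.

```python
def yolo_to_pixel_box(label_line, img_width, img_height):
    """Convert one YOLO label line back to (class_id, xmin, ymin, xmax, ymax) in pixels."""
    class_id, cx, cy, w, h = label_line.split()
    cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
    xmin = (cx - w / 2) * img_width
    ymin = (cy - h / 2) * img_height
    xmax = (cx + w / 2) * img_width
    ymax = (cy + h / 2) * img_height
    return int(class_id), xmin, ymin, xmax, ymax


# Hypothetical image size and box, used only to illustrate the round trip
img_w, img_h = 676, 380
xmin, ymin, xmax, ymax = 100.0, 180.0, 250.0, 240.0

# Same arithmetic as convert_to_yolo in the cell above
line = (
    f"0 {((xmin + xmax) / 2) / img_w} {((ymin + ymax) / 2) / img_h} "
    f"{(xmax - xmin) / img_w} {(ymax - ymin) / img_h}"
)

recovered = yolo_to_pixel_box(line, img_w, img_h)
print(recovered)  # should match the original corners up to floating-point error
```

If the recovered corners differ noticeably from the CSV values, the width and height were probably swapped or the wrong image size was used.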

In [11]:
!git clone https://github.com/ultralytics/yolov5.git

 


Now, let us create the .yaml file for YOLO. Under the 'yolov5' directory, create config.yaml with the content below.

nc: 1

names: ['car']

train: path_where_kaggledataset_was_downloaded\dataset\train\images

val: path_where_kaggledataset_was_downloaded\dataset\val\images
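The same file can also be generated from Python, which avoids typing the dataset path twice. This is a minimal sketch: `build_yolo_config` is a hypothetical helper, and `data/dataset` is a placeholder for wherever the dataset folder was created. Paths are written with forward slashes, which Windows also accepts.

```python
from pathlib import Path


def build_yolo_config(dataset_root, class_names):
    """Return the text of a YOLOv5 dataset config for the given root and classes."""
    root = Path(dataset_root)
    lines = [
        f"nc: {len(class_names)}",
        f"names: {class_names}",
        f"train: {(root / 'train' / 'images').as_posix()}",
        f"val: {(root / 'val' / 'images').as_posix()}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder dataset location; replace with the real output_dir
config_text = build_yolo_config("data/dataset", ["car"])
print(config_text)
```

To save it, write `config_text` to config.yaml inside the cloned yolov5 directory, for example with `Path("yolov5/config.yaml").write_text(config_text)`.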


Now, let us train the model.

C:\Users\Siddhant\yolov5\train.py: Runs the train.py script from the cloned YOLOv5 repository.

--img 640: Sets the input image size to 640x640 pixels.

--batch 16: Specifies the batch size for training (number of images processed simultaneously).

--epochs 2: Sets the number of training epochs (complete passes through the dataset).

Note: 2 epochs is very low and will limit the number of successful detections. A typical value for large datasets is 100. More epochs improve the chance of detecting a car when one is present in an image, but training time increases drastically as well.

--data C:\Users\Siddhant\yolov5\config.yaml: Points to the YAML file containing dataset information (like class names and image paths).

--weights yolov5s.pt: Uses pre-trained YOLOv5s weights as a starting point for training.

--name car_detection: Assigns a name to the training run, which will be used for the output directory
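Since the number of epochs is changed between runs, it can help to assemble the command from parameters instead of editing a long string. The sketch below only uses the flags listed above. `build_train_command` is a hypothetical helper, and the relative paths are placeholders for the real locations of train.py and config.yaml.

```python
import sys


def build_train_command(train_script, data_yaml, epochs,
                        img_size=640, batch_size=16,
                        weights="yolov5s.pt", run_name="car_detection"):
    """Build the argument list for a YOLOv5 training run."""
    return [
        sys.executable, str(train_script),
        "--img", str(img_size),
        "--batch", str(batch_size),
        "--epochs", str(epochs),
        "--data", str(data_yaml),
        "--weights", weights,
        "--name", run_name,
    ]


# Placeholder paths; point these at the cloned repository and config file
cmd = build_train_command("yolov5/train.py", "yolov5/config.yaml", epochs=25)
print(" ".join(cmd))

# To actually start training, pass the list to subprocess:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids quoting problems when a path contains spaces.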

In [17]:
!python C:\Users\Siddhant\yolov5\train.py --img 640 --batch 16 --epochs 2 --data C:\Users\Siddhant\yolov5\config.yaml --weights yolov5s.pt --name car_detection

Command 'git fetch origin' timed out after 5 seconds

train: weights=yolov5s.pt, cfg=, data=C:\Users\Siddhant\yolov5\config.yaml, hyp=yolov5\data\hyps\hyp.scratch-low.yaml, epochs=2, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, evolve_population=yolov5\data\hyps, resume_evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=yolov5\runs\train, name=car_detection, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest, ndjson_console=False, ndjson_file=False
YOLOv5  v7.0-397-gde62f93c Python-3.9.10 torch-2.5.1+cpu CPU

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw




Now, let us validate the trained model. best.pt holds the weights from the best-performing epoch of training.
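Because train.py is run with exist_ok=False, repeated runs with the same --name are saved under numbered directories (car_detection3 and car_detection5 appear in this notebook), so the path to best.pt changes from run to run. A minimal sketch for locating the most recent one is shown here. `latest_best_weights` is a hypothetical helper, and it assumes that an unnumbered directory is the first run and that higher numbers are newer.

```python
import re
from pathlib import Path


def latest_best_weights(train_runs_dir, run_name):
    """Return the best.pt of the highest-numbered run directory, or None if none exists."""
    pattern = re.compile(rf"^{re.escape(run_name)}(\d*)$")
    candidates = []
    for run_dir in Path(train_runs_dir).iterdir():
        match = pattern.match(run_dir.name)
        best = run_dir / "weights" / "best.pt"
        if match and best.is_file():
            # An empty suffix means the first run with this name
            index = int(match.group(1) or 1)
            candidates.append((index, best))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


# Placeholder location of the training runs inside the cloned repository
runs_dir = Path("yolov5/runs/train")
if runs_dir.is_dir():
    print(latest_best_weights(runs_dir, "car_detection"))
```

Runs that stopped before writing best.pt are skipped, so the returned path can be passed directly to --weights.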

In [19]:
!python C:\Users\Siddhant\yolov5\val.py --weights C:\Users\Siddhant\yolov5\runs\train\car_detection3/weights/best.pt --data C:\Users\Siddhant\yolov5\config.yaml --img 640

val: data=C:\Users\Siddhant\yolov5\config.yaml, weights=['C:\\Users\\Siddhant\\yolov5\\runs\\train\\car_detection3/weights/best.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=yolov5\runs\val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5  v7.0-397-gde62f93c Python-3.9.10 torch-2.5.1+cpu CPU

Fusing layers... 
Model summary: 157 layers, 7012822 parameters, 0 gradients, 15.8 GFLOPs

val: Scanning C:\Users\Siddhant\Documents\Python\Project5\dataset\val\labels.cache... 71 images, 0 backgrounds, 0 corrupt: 100%|##########| 71/71 [00:00<?, ?it/s]




                 Class     Images  Instances        

The results are not good. Let us retrain the model with more epochs.

In [21]:
!python C:\Users\Siddhant\yolov5\train.py --img 640 --batch 16 --epochs 25 --data C:\Users\Siddhant\yolov5\config.yaml --weights yolov5s.pt --name car_detection

Command 'git fetch origin' timed out after 5 seconds

train: weights=yolov5s.pt, cfg=, data=C:\Users\Siddhant\yolov5\config.yaml, hyp=yolov5\data\hyps\hyp.scratch-low.yaml, epochs=25, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, evolve_population=yolov5\data\hyps, resume_evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=yolov5\runs\train, name=car_detection, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest, ndjson_console=False, ndjson_file=False
YOLOv5  v7.0-397-gde62f93c Python-3.9.10 torch-2.5.1+cpu CPU

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_p




In [22]:
!python C:\Users\Siddhant\yolov5\val.py --weights C:\Users\Siddhant\yolov5\runs\train\car_detection5/weights/best.pt --data C:\Users\Siddhant\yolov5\config.yaml --img 640

val: data=C:\Users\Siddhant\yolov5\config.yaml, weights=['C:\\Users\\Siddhant\\yolov5\\runs\\train\\car_detection5/weights/best.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=yolov5\runs\val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5  v7.0-397-gde62f93c Python-3.9.10 torch-2.5.1+cpu CPU

Fusing layers... 
Model summary: 157 layers, 7012822 parameters, 0 gradients, 15.8 GFLOPs

val: Scanning C:\Users\Siddhant\Documents\Python\Project5\dataset\val\labels.cache... 71 images, 0 backgrounds, 0 corrupt: 100%|##########| 71/71 [00:00<?, ?it/s]

                 Class     Images  Instances          P

The results are much better, as can be seen in the val_batchx_pred images saved by val.py. Let us now find out how many cars the model detects in the test images.

In [24]:
!python C:\Users\Siddhant\yolov5\detect.py --weights C:\Users\Siddhant\yolov5\runs\train\car_detection5\weights\best.pt --source C:\Users\Siddhant\Documents\Python\Project5\testing_images --img 640 --conf 0.25 --save-txt --save-conf

detect: weights=['C:\\Users\\Siddhant\\yolov5\\runs\\train\\car_detection5\\weights\\best.pt'], source=C:\Users\Siddhant\Documents\Python\Project5\testing_images, data=yolov5\data\coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=True, save_format=0, save_csv=False, save_conf=True, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=yolov5\runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5  v7.0-397-gde62f93c Python-3.9.10 torch-2.5.1+cpu CPU

Fusing layers... 
Model summary: 157 layers, 7012822 parameters, 0 gradients, 15.8 GFLOPs
image 1/175 C:\Users\Siddhant\Documents\Python\Project5\testing_images\vid_5_25100.jpg: 384x640 (no detections), 94.1ms
image 2/175 C:\Users\Siddhant\Documents\Python\Project5\testing_images\vid_5_25120.jpg: 384x640 (no detections), 9
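With --save-txt, detect.py writes one .txt file per image that has at least one detection, so counting those files answers the question of how many test images contain a detected car. The sketch below is a rough illustration: `summarize_detections` is a hypothetical helper, the labels path is an assumption based on project=yolov5\runs\detect and name=exp in the log above (the actual folder may be exp2, exp3, and so on), and the total of 175 images is taken from the "image 1/175" line.

```python
from pathlib import Path


def summarize_detections(labels_dir, total_images):
    """Count images with detections and total boxes from detect.py label files."""
    boxes_per_image = {}
    for txt_file in sorted(Path(labels_dir).glob("*.txt")):
        # Each non-empty line is one detected box
        lines = [ln for ln in txt_file.read_text().splitlines() if ln.strip()]
        if lines:
            boxes_per_image[txt_file.stem] = len(lines)
    detected = len(boxes_per_image)
    return {
        "images_with_detections": detected,
        "images_without_detections": total_images - detected,
        "total_boxes": sum(boxes_per_image.values()),
    }


# Assumed output location; adjust to the exp folder printed at the end of the run
labels_dir = Path("yolov5/runs/detect/exp/labels")
if labels_dir.is_dir():
    print(summarize_detections(labels_dir, total_images=175))
```

Images reported as "(no detections)" in the log produce no label file, which is why they are counted by subtraction from the total.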