# **Data Preprocessing**

This section converts the bounding box annotations from corner coordinates into the YOLO label format used to train object detection models such as YOLOv10. The original dataset describes each box by its pixel coordinates:

`ymin, xmin, ymax, xmax`

The conversion turns these into the YOLO layout:

`class_id, center_x, center_y, width, height`

This first step only changes the box representation. The values are still in pixels and are normalized by the image size in a later step.
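Written out, with all box quantities in pixels and $W$, $H$ the image width and height, the conversion done here and the normalization applied later are:

$$
\begin{aligned}
c_x &= \frac{x_{\min} + x_{\max}}{2}, & c_y &= \frac{y_{\min} + y_{\max}}{2}, & w &= x_{\max} - x_{\min}, & h &= y_{\max} - y_{\min},\\
\hat{c}_x &= \frac{c_x}{W}, & \hat{c}_y &= \frac{c_y}{H}, & \hat{w} &= \frac{w}{W}, & \hat{h} &= \frac{h}{H}.
\end{aligned}
$$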

In [1]:
import pandas as pd

# Load CSV file
csv_file = '/content/Licplatesdetection_train.csv'  # Replace with your CSV file path
df = pd.read_csv(csv_file)

# Function to convert (ymin, xmin, ymax, xmax) to (center_x, center_y, width, height)
def convert_bbox(ymin, xmin, ymax, xmax):
    center_x = (xmin + xmax) / 2
    center_y = (ymin + ymax) / 2
    width = xmax - xmin
    height = ymax - ymin
    return center_x, center_y, width, height

# Define class_id (Assuming a single class, use 0 as default class_id, change if needed)
class_id = 0

# Collect the converted rows in a list; it is turned into a DataFrame below
converted_data = []

for index, row in df.iterrows():
    img_id = row['img_id']
    ymin = row['ymin']
    xmin = row['xmin']
    ymax = row['ymax']
    xmax = row['xmax']

    # Convert the bounding box coordinates
    center_x, center_y, width, height = convert_bbox(ymin, xmin, ymax, xmax)

    # Append the data as a new row: [img_id, class_id, center_x, center_y, width, height]
    converted_data.append([img_id, class_id, center_x, center_y, width, height])

# Create a new DataFrame with the converted data
columns = ['img_id', 'class_id', 'center_x', 'center_y', 'width', 'height']
converted_df = pd.DataFrame(converted_data, columns=columns)

# Save the new CSV file
output_csv = 'converted_annotations.csv'  # Path to save the new CSV
converted_df.to_csv(output_csv, index=False)

print(f"Converted CSV file saved as {output_csv}")

Converted CSV file saved as converted_annotations.csv
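The loop above calls `iterrows()` once per annotation, which gets slow on large CSV files. The same conversion can be sketched with column arithmetic in pandas. `corners_to_center` is a hypothetical helper, and the sample row is made up for illustration:

```python
import pandas as pd

def corners_to_center(df: pd.DataFrame, class_id: int = 0) -> pd.DataFrame:
    """Convert pixel corner boxes (ymin, xmin, ymax, xmax) to pixel center/size boxes."""
    return pd.DataFrame({
        'img_id': df['img_id'],
        'class_id': class_id,                       # scalar is broadcast to every row
        'center_x': (df['xmin'] + df['xmax']) / 2,
        'center_y': (df['ymin'] + df['ymax']) / 2,
        'width': df['xmax'] - df['xmin'],
        'height': df['ymax'] - df['ymin'],
    })

# Made-up annotation row (not taken from the dataset)
sample = pd.DataFrame({'img_id': ['1.jpg'], 'ymin': [200], 'xmin': [100], 'ymax': [260], 'xmax': [300]})
print(corners_to_center(sample))
```

The result has the same columns as `converted_annotations.csv`, so it could be written with `to_csv(..., index=False)` in the same way.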


In [2]:
!unzip /content/Licplatesdetection_train.zip

Archive:  /content/Licplatesdetection_train.zip
   creating: license_plates_detection_train/
  inflating: license_plates_detection_train/1.jpg  
  ...

The next step normalizes these boxes. YOLO expects every coordinate as a fraction of the image size, so `center_x` and `width` are divided by the image width, and `center_y` and `height` by the image height, giving values between 0 and 1.

The script reads the converted annotations from the CSV file, loads each image to obtain its dimensions, normalizes the boxes belonging to that image, and saves the result to a new CSV file.
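As a worked example of the normalization, the sketch below formats one box as a YOLO label line. `to_yolo_line`, the 640×480 image size and the box values are illustrative assumptions, and the six-decimal precision is an arbitrary choice (the notebook itself writes floats at full precision):

```python
def to_yolo_line(class_id, center_x, center_y, width, height, img_width, img_height, precision=6):
    """Format one pixel-space center/size box as a normalized YOLO label line."""
    values = (
        center_x / img_width,
        center_y / img_height,
        width / img_width,
        height / img_height,
    )
    # Each normalized value should lie in [0, 1]; anything outside points to a
    # bad annotation or a wrong image size
    if not all(0.0 <= v <= 1.0 for v in values):
        raise ValueError(f"Normalized values out of range: {values}")
    return f"{int(class_id)} " + " ".join(f"{v:.{precision}f}" for v in values)

# Hypothetical 640x480 image with a box centred at (200, 230), 200 px wide and 60 px high
print(to_yolo_line(0, 200, 230, 200, 60, img_width=640, img_height=480))
# → 0 0.312500 0.479167 0.312500 0.125000
```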

In [3]:
import pandas as pd
import cv2
import os

# Load the converted CSV file
csv_file = 'converted_annotations.csv'  # Replace with your CSV file path
df = pd.read_csv(csv_file)

# Define the images folder
images_folder = '/content/license_plates_detection_train'  # Replace with your images folder path

# Function to normalize the bounding box values
def normalize_bbox(center_x, center_y, width, height, img_width, img_height):
    norm_center_x = center_x / img_width
    norm_center_y = center_y / img_height
    norm_width = width / img_width
    norm_height = height / img_height
    return norm_center_x, norm_center_y, norm_width, norm_height

# List to store normalized data
normalized_data = []

# Group the CSV by img_id so we can process all bounding boxes for a single image
grouped = df.groupby('img_id')

# Loop through each image in the CSV
for img_id, group in grouped:
    # Get the path of the image
    image_path = os.path.join(images_folder, img_id)

    # Load the image to get its dimensions
    img = cv2.imread(image_path)
    if img is None:
        print(f"Image not found: {image_path}")
        continue

    img_height, img_width, _ = img.shape

    # Get the bounding box values for this image and normalize them
    for _, row in group.iterrows():
        class_id = row['class_id']
        center_x = row['center_x']
        center_y = row['center_y']
        width = row['width']
        height = row['height']

        # Normalize the bounding box values
        norm_center_x, norm_center_y, norm_width, norm_height = normalize_bbox(center_x, center_y, width, height, img_width, img_height)

        # Append normalized values
        normalized_data.append([img_id, class_id, norm_center_x, norm_center_y, norm_width, norm_height])

# Create a DataFrame for normalized data and save to a new CSV file
normalized_columns = ['img_id', 'class_id', 'norm_center_x', 'norm_center_y', 'norm_width', 'norm_height']
normalized_df = pd.DataFrame(normalized_data, columns=normalized_columns)
normalized_df.to_csv('normalized_annotations.csv', index=False)

print("Normalized CSV file saved as 'normalized_annotations.csv'")


Normalized CSV file saved as 'normalized_annotations.csv'


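This step prepares the custom license plate dataset for training. It normalizes the bounding boxes, writes one label file per image, and splits the images and labels into training and validation sets using the directory layout expected by YOLO (You Only Look Once) object detection models: `custom_dataset/train/images`, `custom_dataset/train/labels`, `custom_dataset/validation/images` and `custom_dataset/validation/labels`. It normalizes again from `converted_annotations.csv`, so it does not depend on the `normalized_annotations.csv` file written above.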
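Because one image can carry several boxes, the train/validation split should be made per image rather than per annotation row; otherwise the same image, with only part of its boxes, can end up in both subsets. One way to do this is scikit-learn's `GroupShuffleSplit`, sketched here with made-up data (`split_by_image` is a hypothetical helper):

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

def split_by_image(df, test_size=0.2, random_state=42):
    """Split annotation rows so that every image lands in exactly one subset."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=random_state)
    # Rows sharing an img_id form one group and are never separated
    train_idx, val_idx = next(splitter.split(df, groups=df['img_id']))
    return df.iloc[train_idx], df.iloc[val_idx]

# Made-up annotations: image '2.jpg' has two boxes
toy = pd.DataFrame({'img_id': ['1.jpg', '2.jpg', '2.jpg', '3.jpg', '4.jpg'], 'class_id': 0})
train_df, val_df = split_by_image(toy)
print(set(train_df['img_id']) & set(val_df['img_id']))  # → set()
```

Note that `test_size` here is a fraction of images, not of boxes.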

In [4]:
import os
import shutil
import pandas as pd
import cv2
from sklearn.model_selection import train_test_split

# Load the converted (pixel-space) annotations; boxes are normalized per image below
csv_file = 'converted_annotations.csv'  # Replace with your original CSV file path
df = pd.read_csv(csv_file)

# Define the images folder
images_folder = '/content/license_plates_detection_train'

# Define the target directory for the dataset
output_dir = 'custom_dataset'  # The base directory for train and validation folders
train_folder = os.path.join(output_dir, 'train')
val_folder = os.path.join(output_dir, 'validation')

# Create the directory structure if it doesn't exist
for folder in [train_folder, val_folder]:
    os.makedirs(os.path.join(folder, 'images'), exist_ok=True)
    os.makedirs(os.path.join(folder, 'labels'), exist_ok=True)

# Function to normalize the bounding box values
def normalize_bbox(center_x, center_y, width, height, img_width, img_height):
    norm_center_x = center_x / img_width
    norm_center_y = center_y / img_height
    norm_width = width / img_width
    norm_height = height / img_height
    return norm_center_x, norm_center_y, norm_width, norm_height

# Function to save labels to a text file in the required format
def save_label_file(image_name, bboxes, folder):
    label_path = os.path.join(folder, 'labels', f'{image_name}.txt')
    with open(label_path, 'w') as f:
        for bbox in bboxes:
            class_id, norm_center_x, norm_center_y, norm_width, norm_height = bbox
            # Write the normalized bbox values to the file
            f.write(f'{int(class_id)} {norm_center_x} {norm_center_y} {norm_width} {norm_height}\n')

# Function to copy images to the correct folder
def copy_image(image_name, src_folder, dst_folder):
    src_image_path = os.path.join(src_folder, image_name)
    dst_image_path = os.path.join(dst_folder, 'images', image_name)
    shutil.copyfile(src_image_path, dst_image_path)

# Helper function to process data
def process_data(dataframe, src_images_folder, target_folder):
    grouped = dataframe.groupby('img_id')

    for img_id, group in grouped:
        # Load the image to get its dimensions
        image_path = os.path.join(src_images_folder, img_id)
        img = cv2.imread(image_path)
        if img is None:
            print(f"Image not found: {image_path}")
            continue

        img_height, img_width, _ = img.shape

        # Get the bounding box values for the image and normalize them
        bboxes = []
        for _, row in group.iterrows():
            class_id = row['class_id']
            center_x = row['center_x']
            center_y = row['center_y']
            width = row['width']
            height = row['height']

            # Normalize the bounding box values
            norm_center_x, norm_center_y, norm_width, norm_height = normalize_bbox(center_x, center_y, width, height, img_width, img_height)

            # Append normalized values
            bboxes.append([class_id, norm_center_x, norm_center_y, norm_width, norm_height])

        # Copy the image to the appropriate folder (train or validation)
        copy_image(img_id, src_images_folder, target_folder)
        # Save the labels in the .txt file with the image name
        save_label_file(os.path.splitext(img_id)[0], bboxes, target_folder)

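# Split by image, not by annotation row, so that all boxes of one image
# end up in the same subset (80% train, 20% validation)
image_ids = df['img_id'].unique()
train_ids, val_ids = train_test_split(image_ids, test_size=0.2, random_state=42)
train_df = df[df['img_id'].isin(train_ids)]
val_df = df[df['img_id'].isin(val_ids)]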

# Process the train data
process_data(train_df, images_folder, train_folder)

# Process the validation data
process_data(val_df, images_folder, val_folder)

print("Dataset directory structure created successfully.")

Dataset directory structure created successfully.
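Before training, it can be worth checking that every copied image has a label file and that each label line holds five fields with normalized coordinates. `check_yolo_split` is a hypothetical helper written for this layout (`images/` and `labels/` inside each split folder), not part of any YOLO library:

```python
import os

def check_yolo_split(split_dir):
    """Return a list of problems found in a YOLO-style split folder (images/ + labels/)."""
    problems = []
    images = {os.path.splitext(f)[0] for f in os.listdir(os.path.join(split_dir, 'images'))}
    labels = {os.path.splitext(f)[0] for f in os.listdir(os.path.join(split_dir, 'labels'))}
    for stem in sorted(images - labels):
        problems.append(f"missing label for image {stem}")
    for stem in sorted(labels - images):
        problems.append(f"label without image: {stem}")
    for stem in sorted(labels):
        with open(os.path.join(split_dir, 'labels', f'{stem}.txt')) as f:
            for line_no, line in enumerate(f, start=1):
                parts = line.split()
                if not parts:
                    continue  # ignore blank lines
                if len(parts) != 5:
                    problems.append(f"{stem}.txt:{line_no}: expected 5 fields, got {len(parts)}")
                    continue
                # Fields 2-5 are the normalized center_x, center_y, width, height
                if not all(0.0 <= float(v) <= 1.0 for v in parts[1:]):
                    problems.append(f"{stem}.txt:{line_no}: coordinates not normalized")
    return problems

# Only runs where the dataset folders from the cell above exist
for split in ['custom_dataset/train', 'custom_dataset/validation']:
    if os.path.isdir(os.path.join(split, 'images')):
        issues = check_yolo_split(split)
        print(split, 'OK' if not issues else issues[:5])
```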


In [7]:
import os
import zipfile

def zip_folder(folder_path, output_zip_path):
    # Create a ZipFile object
    with zipfile.ZipFile(output_zip_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
        # Walk through the folder
        for root, _, files in os.walk(folder_path):
            for file in files:
                file_path = os.path.join(root, file)
                # Store the file relative to the folder's parent, so the archive keeps the top-level folder name
                zipf.write(file_path, os.path.relpath(file_path, os.path.join(folder_path, '..')))

# Specify the folder you want to zip and the output zip file name
folder_to_zip = '/content/custom_dataset'  # Replace with your folder path
output_zip_file = 'dataset.zip'  # Desired output zip file name

# Call the function to zip the folder
zip_folder(folder_to_zip, output_zip_file)

print(f"Folder '{folder_to_zip}' zipped as '{output_zip_file}'")



Folder '/content/custom_dataset' zipped as 'dataset.zip'
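The `zip_folder` function above can also be replaced by the standard library's `shutil.make_archive`. In this sketch, `zip_dataset` is a hypothetical wrapper: with `root_dir` set to the parent folder and `base_dir` to the dataset folder, entries keep the `custom_dataset/` prefix, as in the original function. Unlike `zip_folder`, `make_archive` may also store entries for the directories themselves, which is normally harmless when extracting:

```python
import os
import shutil

def zip_dataset(dataset_dir, archive_base='dataset'):
    """Zip dataset_dir so that entries keep the folder name as prefix (e.g. custom_dataset/train/...)."""
    dataset_dir = os.path.abspath(dataset_dir)
    return shutil.make_archive(
        archive_base,                            # output name without extension; '.zip' is appended
        'zip',
        root_dir=os.path.dirname(dataset_dir),   # archive paths are relative to this folder
        base_dir=os.path.basename(dataset_dir),  # only this sub-folder is packed
    )

# Only runs where the dataset folder from the cells above exists
if os.path.isdir('/content/custom_dataset'):
    print(zip_dataset('/content/custom_dataset'))
```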
