## ✅ **Plan for Conversion**
1. **Create new folders** inside `train`, `test`, and `val`:  
   - `train/annotations/`  
   - `val/annotations/`  
   - `test/annotations/`  
   - These will store `.xml` (Pascal VOC) or `.json` (COCO) files.

2. **Write a script to convert YOLO to Pascal VOC XML**  
   - Read YOLO `.txt` files and extract bounding boxes.  
   - Convert to XML format with image size information.  
   - Save in the `annotations/` folder.

3. **(Alternative) Write a script to convert YOLO to COCO JSON**  
   - Read YOLO `.txt` files and extract bounding boxes.  
   - Store data in COCO JSON format.  
   - Save in the `annotations/` folder.

---

### 📌 **Which format do you prefer?**  
- **Pascal VOC (`.xml`)** (Recommended for Faster R-CNN in TensorFlow)  
- **COCO (`.json`)** (If you want to try EfficientDet later)  
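
For the COCO alternative, the main difference is the box layout: COCO stores `[x_min, y_min, width, height]` in absolute pixels and collects all images of a split in one JSON file. The following is a minimal sketch, not the notebook's code. The function name `yolo_lines_to_coco` and its input layout are assumptions made for illustration, and image sizes are passed in rather than read from disk.

```python
import json

def yolo_lines_to_coco(samples, class_mapping):
    """Build a COCO-style dict from YOLO annotations.

    samples: list of (file_name, img_width, img_height, yolo_lines), where each
    YOLO line is "class_id x_center y_center width height" with coordinates
    normalized to [0, 1]. (Hypothetical input layout.)
    """
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": cid, "name": name} for cid, name in class_mapping.items()],
    }
    ann_id = 1
    for image_id, (file_name, img_w, img_h, lines) in enumerate(samples, start=1):
        coco["images"].append(
            {"id": image_id, "file_name": file_name, "width": img_w, "height": img_h}
        )
        for line in lines:
            parts = line.split()
            if len(parts) != 5:
                continue  # skip malformed lines
            class_id = int(parts[0])
            xc, yc, w, h = (float(v) for v in parts[1:])
            # Scale to pixels, then shift from box center to top-left corner
            box_w, box_h = w * img_w, h * img_h
            x_min = xc * img_w - box_w / 2
            y_min = yc * img_h - box_h / 2
            coco["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": class_id,
                "bbox": [x_min, y_min, box_w, box_h],
                "area": box_w * box_h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco

# Made-up sample: one 640x480 image with a single centered box
samples = [("tile_001.png", 640, 480, ["0 0.5 0.5 0.25 0.5"])]
coco = yolo_lines_to_coco(samples, {0: "Building"})
print(json.dumps(coco["annotations"][0]["bbox"]))
```

This sketch keeps the YOLO class IDs unchanged as category IDs. Some training tools expect category IDs to start at 1, so check the requirements of the framework you plan to use.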

To convert YOLO `.txt` annotations to Pascal VOC `.xml` format, follow these steps:

### **1️⃣ Create a Folder Structure**
- Inside each dataset split (`train`, `val`, `test`), create a new folder for the XML annotations (named `xml_annotations` here rather than the generic `annotations/` used in the plan).
  ```
  /train
    /images
    /labels
    /xml_annotations
  /val
    /images
    /labels
    /xml_annotations
  /test
    /images
    /labels
    /xml_annotations
  ```

### **2️⃣ Convert YOLO `.txt` to Pascal VOC `.xml`**
You'll need:
- Image dimensions (width, height)
- Class mapping (for your case, class `0` is "Building")

### **3️⃣ Python Script for Conversion**
The script will:
- Read `.txt` files
- Convert YOLO format (relative `x_center, y_center, width, height`) to Pascal VOC format (absolute `xmin, ymin, xmax, ymax`)
- Save `.xml` files in the respective `xml_annotations` folder.
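
Written out, the coordinate conversion is short. The symbols here are notation introduced for illustration: $W$ and $H$ are the image width and height in pixels, and $(x_c, y_c, w, h)$ is a YOLO box normalized to $[0, 1]$.

$$
\begin{aligned}
x_{\min} &= \left(x_c - \tfrac{w}{2}\right) W, & y_{\min} &= \left(y_c - \tfrac{h}{2}\right) H, \\
x_{\max} &= \left(x_c + \tfrac{w}{2}\right) W, & y_{\max} &= \left(y_c + \tfrac{h}{2}\right) H.
\end{aligned}
$$

For example, on a $640 \times 480$ image the YOLO box $(0.5, 0.5, 0.25, 0.5)$ becomes $(x_{\min}, y_{\min}, x_{\max}, y_{\max}) = (240, 120, 400, 360)$.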

### **Step 1: Import Libraries**

**Purpose:** Import the libraries the script needs:
- `os`: For handling file system paths and directory operations.
- `cv2`: For reading images to obtain their dimensions.
- `shutil`: For moving and copying annotation and image files.

In [6]:
import os
import cv2
import shutil

In [7]:
# Mapping from YOLO class IDs to human-readable class names
class_mapping = {0: "Building"}

In [8]:
# Directory paths for labels and images
label_dir = "../Data/tiny_object_detection_yolo/filtered_labels"
image_dir = "../Data/tiny_object_detection_yolo/filtered_images"

# Subfolders representing different splits of the dataset
sub_folders = ["train", "test", "val"]

### **Step 2:** Define the `yolo_to_voc` Function
- Converts bounding box coordinates from YOLO format (normalized center and size) to VOC format (absolute pixel corners).

In [9]:
def yolo_to_voc(yolo_bbox, img_width, img_height):
    """
    Convert YOLO format bounding box to VOC format (xmin, ymin, xmax, ymax).
    """
    try:
        # Parse YOLO bounding box values (class_id, x_center, y_center, width, height)
        class_id, x_center, y_center, width, height = map(float, yolo_bbox)
    except ValueError:
        # Skip invalid annotation lines
        print(f"Skipping invalid annotation: {yolo_bbox}")
        return None

    # Convert normalized YOLO coordinates to absolute pixel values
    x_center, y_center, width, height = (
        x_center * img_width,
        y_center * img_height,
        width * img_width,
        height * img_height,
    )

    # Calculate bounding box corners
    xmin = int(x_center - width / 2)
    ymin = int(y_center - height / 2)
    xmax = int(x_center + width / 2)
    ymax = int(y_center + height / 2)

    return class_id, xmin, ymin, xmax, ymax
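
Boxes that touch the image border can end up slightly outside the image after conversion, so some pipelines clamp the corners to the image bounds before writing them out. The helper below is a hedged sketch: `clamp_voc_box` is a hypothetical name that is not part of the notebook, and it assumes 0-based pixel coordinates where `xmax` and `ymax` may equal the image width and height.

```python
def clamp_voc_box(xmin, ymin, xmax, ymax, img_width, img_height):
    """Clamp box corners to the image; return None if the box becomes empty."""
    xmin = max(0, min(xmin, img_width))
    ymin = max(0, min(ymin, img_height))
    xmax = max(0, min(xmax, img_width))
    ymax = max(0, min(ymax, img_height))
    if xmax <= xmin or ymax <= ymin:
        return None  # degenerate box after clamping
    return xmin, ymin, xmax, ymax

# A box that overflows on the left and right of a 640x480 image
print(clamp_voc_box(-5, 10, 700, 300, 640, 480))
# A box that lies entirely outside the image
print(clamp_voc_box(650, 10, 700, 300, 640, 480))
```

Returning `None` for empty boxes mirrors how `yolo_to_voc` signals an invalid annotation, so the same `if converted_bbox:` style check can filter both cases.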

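### **Step 3:** Loop Through Each Dataset Split
- For each split (`train`, `test`, `val`), read the YOLO annotations, convert each box to VOC coordinates, and move the processed YOLO `.txt` file into a `txt` subfolder. Note that this cell keeps the converted boxes in memory only; it does not write `.xml` files.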

In [11]:
for split in sub_folders:
    # Define paths for labels, images, and output folder
    labels_path = os.path.join(label_dir, split)
    images_path = os.path.join(image_dir, split)
    txt_output_path = os.path.join(labels_path, "txt")

    # Create output directory if it doesn't exist
    os.makedirs(txt_output_path, exist_ok=True)

    for label_file in os.listdir(labels_path):
        if label_file.endswith(".txt"):  # Process only .txt annotation files
            image_file = label_file.replace(".txt", ".png")
            image_path = os.path.join(images_path, image_file)

            # Check for different image file extensions (PNG, JPG, JPEG)
            if not os.path.exists(image_path):
                image_file = label_file.replace(".txt", ".jpg")
                image_path = os.path.join(images_path, image_file)

            if not os.path.exists(image_path):
                image_file = label_file.replace(".txt", ".jpeg")
                image_path = os.path.join(images_path, image_file)

            label_path = os.path.join(labels_path, label_file)

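            if os.path.exists(image_path):  # Ensure image file exists
                img = cv2.imread(image_path)  # Load image to get dimensions
                if img is None:  # cv2.imread returns None for unreadable files
                    print(f"Skipping unreadable image: {image_path}")
                    continue
                img_height, img_width = img.shape[:2]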

                with open(label_path, "r") as f:
                    lines = f.readlines()  # Read all lines from annotation file

                bboxes = []  # Converted bounding boxes (kept in memory only; not written to disk here)
                for line in lines:
                    yolo_bbox = line.strip().split()  # Split annotation line into values
                    converted_bbox = yolo_to_voc(yolo_bbox, img_width, img_height)  # Convert YOLO to VOC
                    if converted_bbox:
                        bboxes.append(converted_bbox)

                # Move processed label file to output directory
                shutil.move(label_path, os.path.join(txt_output_path, label_file))

    print(f"✅ Conversion completed for {split} set. YOLO annotations moved to {txt_output_path}")

✅ Conversion completed for train set. YOLO annotations moved to ../Data/tiny_object_detection_yolo/filtered_labels/train/txt
✅ Conversion completed for test set. YOLO annotations moved to ../Data/tiny_object_detection_yolo/filtered_labels/test/txt
✅ Conversion completed for val set. YOLO annotations moved to ../Data/tiny_object_detection_yolo/filtered_labels/val/txt
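
The loop above builds `bboxes` for each image but does not write them anywhere. One way to produce the Pascal VOC `.xml` files described in the plan is sketched below with the standard-library `xml.etree.ElementTree`. The function name `write_voc_xml`, the output location, and the fixed `depth` of 3 are assumptions for illustration; the element names follow the usual Pascal VOC layout.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def write_voc_xml(xml_path, image_file, img_width, img_height, bboxes, class_mapping):
    """Write one Pascal VOC annotation file.

    bboxes: iterable of (class_id, xmin, ymin, xmax, ymax) in pixels,
    in the shape returned by yolo_to_voc.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = image_file

    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(img_width)
    ET.SubElement(size, "height").text = str(img_height)
    ET.SubElement(size, "depth").text = "3"  # assumption: 3-channel images

    for class_id, xmin, ymin, xmax, ymax in bboxes:
        obj = ET.SubElement(root, "object")
        # class_id arrives as a float from yolo_to_voc, so cast before lookup
        ET.SubElement(obj, "name").text = class_mapping[int(class_id)]
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), (xmin, ymin, xmax, ymax)):
            ET.SubElement(box, tag).text = str(value)

    ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)

# Usage with made-up values, written to a temporary directory
out_dir = tempfile.mkdtemp()
xml_path = os.path.join(out_dir, "tile_001.xml")
write_voc_xml(xml_path, "tile_001.png", 640, 480, [(0.0, 240, 120, 400, 360)], {0: "Building"})
```

In the Step 3 loop, a call like this would go just before `shutil.move`, with `xml_path` pointing into the split's `xml_annotations` folder.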


### **Step 4:** Combine Files
- For each dataset split (`train`, `test`, `val`), the function:
  - Checks that the image directory and the `txt` label directory exist.
  - Copies each image and its matching label into `combined_dir/<split>`, so images and labels sit side by side.
  - Prints a warning for every image that has no matching label.

In [12]:
def combine_files(image_dir, label_dir, combined_dir):
    """
    Combine images and corresponding YOLO txt labels into a single directory structure.
    """
    for folder in ['train', 'test', 'val']:
        # Define paths for images and labels
        folder_image_dir = os.path.join(image_dir, folder)
        folder_label_dir_txt = os.path.join(label_dir, folder, 'txt')
        combined_folder = os.path.join(combined_dir, folder)

        # Create combined folder if it doesn't exist
        os.makedirs(combined_folder, exist_ok=True)

        # Ensure required folders exist
        if not os.path.exists(folder_image_dir) or not os.path.exists(folder_label_dir_txt):
            print(f"Error: Missing folder structure for {folder}")
            continue

        # Get list of image and label files
        image_files = [f for f in os.listdir(folder_image_dir) if f.endswith(('.png', '.jpg', '.jpeg'))]
        label_files_txt = [f for f in os.listdir(folder_label_dir_txt) if f.endswith('.txt')]

        for image_file in image_files:
            # Find corresponding annotation file (same stem, .txt extension)
            label_file_txt = os.path.splitext(image_file)[0] + '.txt'
            if label_file_txt in label_files_txt:
                # Copy image and label to the combined directory
                shutil.copy(os.path.join(folder_image_dir, image_file), os.path.join(combined_folder, image_file))
                shutil.copy(os.path.join(folder_label_dir_txt, label_file_txt), os.path.join(combined_folder, label_file_txt))
            else:
                print(f"Warning: No label found for {image_file} in {folder}")
        
        print(f"All files copied successfully to {folder}.")

In [14]:
def main():
    # Define directory paths
    image_dir = '../Data/tiny_object_detection_yolo/filtered_images'
    label_dir = '../Data/tiny_object_detection_yolo/filtered_labels'
    combined_dir = '../Data/tiny_object_detection_yolo/Yolo__Data'
    
    # Combine images and labels into a single dataset
    combine_files(image_dir, label_dir, combined_dir)


### **Step 5:** Execution Flow
- The Step 3 loop converts the YOLO boxes to VOC coordinates and moves the processed label files into a `txt` subfolder of each split.
- The `combine_files` function then copies images and labels into a single directory per split for training.
- The `main()` function defines the directory paths and calls `combine_files`.

In [15]:
if __name__ == "__main__":
    main()

All files copied successfully to train.
All files copied successfully to test.
All files copied successfully to val.
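
After the copy, a quick check that every image in a combined split folder has a label with the same stem can catch pairing mistakes early. This is a minimal sketch: `find_unpaired_images` is a hypothetical helper, and the demo runs against a temporary directory rather than the real dataset.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}

def find_unpaired_images(split_dir):
    """Return sorted names of images in split_dir that lack a matching .txt label."""
    split_dir = Path(split_dir)
    label_stems = {p.stem for p in split_dir.glob("*.txt")}
    return sorted(
        p.name
        for p in split_dir.iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES and p.stem not in label_stems
    )

# Demo on a throwaway directory: a.png has a label, b.jpg does not
demo = Path(tempfile.mkdtemp())
for name in ("a.png", "a.txt", "b.jpg"):
    (demo / name).touch()
print(find_unpaired_images(demo))
```

Pointing the function at each split folder under `combined_dir` should return an empty list when every image was copied together with its label.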


### **Summary of the Process:**

- 1️⃣. **YOLO to VOC Conversion**: `yolo_to_voc` converts object detection annotations from YOLO format (normalized center, width, and height) to VOC format (absolute pixel corners `xmin, ymin, xmax, ymax`). Writing these boxes to Pascal VOC XML files is the intended end result, but the cells shown here stop at the coordinate conversion.

- 2️⃣. **Processing Dataset**: The script processes the annotation files in each dataset split (`train`, `test`, `val`). For each image, it reads the YOLO annotations, converts them to VOC coordinates, and moves the original `.txt` file into a `txt` subfolder.

- 3️⃣. **File Organization**: The `combine_files` function copies each image and its annotation file into one folder per split (`train`, `test`, `val`), simplifying dataset usage.
