# Convert YOLO annotations to VOC


YOLO (You Only Look Once) and PASCAL VOC are two common annotation formats for object detection. Converting YOLO annotations to VOC makes a dataset usable with frameworks, evaluation tools, and benchmarks that expect VOC-style XML, such as tooling built around the PASCAL VOC challenges.

The conversion rewrites each YOLO text annotation file as a VOC XML file. The steps are:

## 1. YOLO Annotation Format:

In YOLO, each image has its own text file, and each line of that file describes one bounding box. The format of each line is:

    <class_id> <center_x> <center_y> <width> <height>

`class_id` is a 0-based integer. `center_x`, `center_y`, `width`, and `height` are normalized to the range 0 to 1 relative to the image width and height.
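
To make the line format concrete, here is a minimal sketch that parses one YOLO line and scales it to pixel corner coordinates. The names `YoloBox`, `parse_yolo_line`, and `to_pixel_corners`, as well as the 640x480 image size, are illustrative assumptions and not part of the notebook.

```python
from typing import NamedTuple


class YoloBox(NamedTuple):
    """One YOLO annotation line (hypothetical helper type)."""
    class_id: int
    center_x: float
    center_y: float
    width: float
    height: float


def parse_yolo_line(line: str) -> YoloBox:
    """Parse one '<class_id> <center_x> <center_y> <width> <height>' line."""
    class_id, cx, cy, w, h = line.split()
    return YoloBox(int(class_id), float(cx), float(cy), float(w), float(h))


def to_pixel_corners(box: YoloBox, image_width: int, image_height: int):
    """Scale a normalized center/size box to pixel corner coordinates."""
    xmin = (box.center_x - box.width / 2) * image_width
    ymin = (box.center_y - box.height / 2) * image_height
    xmax = (box.center_x + box.width / 2) * image_width
    ymax = (box.center_y + box.height / 2) * image_height
    return xmin, ymin, xmax, ymax


# A box centered in a 640x480 image, a quarter as wide and half as tall
box = parse_yolo_line("0 0.5 0.5 0.25 0.5")
corners = to_pixel_corners(box, image_width=640, image_height=480)
print(corners)  # (240.0, 120.0, 400.0, 360.0)
```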

## 2. VOC Annotation Format:

In VOC format, annotations are stored in XML files. Here's a basic template for a VOC XML file:

<annotation>
	<folder></folder>
	<filename>image.jpg</filename>
	<size>
		<width>width</width>
		<height>height</height>
		<depth>3</depth>
	</size>
	<object>
		<name>class_name</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>xmin</xmin>
			<ymin>ymin</ymin>
			<xmax>xmax</xmax>
			<ymax>ymax</ymax>
		</bndbox>
	</object>
</annotation>
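
As an illustration of how this structure is consumed, the following sketch reads the objects back out of a VOC document with `xml.etree.ElementTree`. The function name `read_voc_objects` and the sample values (`dog`, a 640x480 image) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Sample document following the template above (values are made up)
VOC_XML = """<annotation>
    <folder></folder>
    <filename>image.jpg</filename>
    <size><width>640</width><height>480</height><depth>3</depth></size>
    <object>
        <name>dog</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>240</xmin><ymin>120</ymin><xmax>400</xmax><ymax>360</ymax>
        </bndbox>
    </object>
</annotation>"""


def read_voc_objects(xml_text):
    """Return a list of (name, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        corners = tuple(
            int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((obj.findtext("name"), *corners))
    return objects


objects = read_voc_objects(VOC_XML)
print(objects)  # [('dog', 240, 120, 400, 360)]
```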


## 3. Conversion Script (Python) for single file:

You can use a Python script to read YOLO annotations and generate VOC XML files. Here's a basic script:

### Imports

In [22]:
# For file operations
import os

# For working with XML files
import xml.etree.ElementTree as ET

# For displaying a progress bar
from tqdm import tqdm

### Main Code

In [14]:
def yolo_to_voc(yolo_path, voc_path, image_width, image_height):
    # Read YOLO annotations from file
    with open(yolo_path, 'r') as file:
        lines = file.readlines()

    # Create the root element for the XML tree
    root = ET.Element("annotation")

    # Add the 'folder' element to the root
    folder = ET.SubElement(root, "folder")
    folder.text = ""  # Replace with the appropriate folder name

    # Add the 'filename' element to the root
    filename = ET.SubElement(root, "filename")
    # Assumes the image is a .jpg file with the same base name as the label file
    filename.text = os.path.splitext(os.path.basename(yolo_path))[0] + '.jpg'

    # Add the 'size' element to the root
    size = ET.SubElement(root, "size")
    width = ET.SubElement(size, "width")
    height = ET.SubElement(size, "height")
    depth = ET.SubElement(size, "depth")
    width.text = str(image_width)
    height.text = str(image_height)
    depth.text = "3"

    # Process each line in the YOLO file (each line corresponds to a bounding box)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # Skip blank lines, which would otherwise fail to unpack
        class_id, center_x, center_y, bbox_width, bbox_height = map(float, parts)

        # Add the 'object' element to the root for each bounding box
        obj = ET.SubElement(root, "object")

        # Add the 'name' element to the 'object' element
        name = ET.SubElement(obj, "name")
        name.text = str(int(class_id) + 1)  # Shift YOLO's 0-based class ID to a 1-based label

        # Add the 'pose' element to the 'object' element
        pose = ET.SubElement(obj, "pose")
        pose.text = "Unspecified"

        # Add the 'truncated' element to the 'object' element
        truncated = ET.SubElement(obj, "truncated")
        truncated.text = "0"

        # Add the 'difficult' element to the 'object' element
        difficult = ET.SubElement(obj, "difficult")
        difficult.text = "0"

        # Add the 'bndbox' element to the 'object' element
        bndbox = ET.SubElement(obj, "bndbox")
        xmin = ET.SubElement(bndbox, "xmin")
        ymin = ET.SubElement(bndbox, "ymin")
        xmax = ET.SubElement(bndbox, "xmax")
        ymax = ET.SubElement(bndbox, "ymax")

        # Calculate and set bounding box coordinates based on YOLO format
        xmin.text = str(int((center_x - bbox_width / 2) * image_width))
        ymin.text = str(int((center_y - bbox_height / 2) * image_height))
        xmax.text = str(int((center_x + bbox_width / 2) * image_width))
        ymax.text = str(int((center_y + bbox_height / 2) * image_height))

    # Create an XML tree and write it to the specified VOC XML file
    tree = ET.ElementTree(root)
    tree.write(voc_path)
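
The function above writes a numeric label into `<name>`, whereas PASCAL VOC annotations normally store a class name string there. If the class names are known, the ID can be mapped to a name before it is written. The sketch below shows one way to do that; `CLASS_NAMES` and `class_name_for` are hypothetical, and the names in the list are placeholders rather than the classes of the dataset used in this notebook.

```python
import xml.etree.ElementTree as ET

# Hypothetical class list: the index of each name is its YOLO class ID
CLASS_NAMES = ["cat", "dog", "bird"]


def class_name_for(class_id, class_names=CLASS_NAMES):
    """Return the name for a 0-based YOLO class ID, or the ID as text if unknown."""
    if 0 <= class_id < len(class_names):
        return class_names[class_id]
    return str(class_id)


# Build an <object> element whose <name> holds the class name
obj = ET.Element("object")
name = ET.SubElement(obj, "name")
name.text = class_name_for(1)
xml_text = ET.tostring(obj, encoding="unicode")
print(xml_text)  # <object><name>dog</name></object>
```

Inside `yolo_to_voc`, the same lookup could replace the `str(int(class_id) + 1)` expression.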

### Trial

#### Input

In [12]:
!cat  '/run/media/spritan/New Volume/VinData/data_processed/step3_split_data/out_split_MONOCHROME1/train/labels/0a4fbc9ade84a7abd1680eb8ba031a9d_R13.txt'

7 0.7026666666666667 0.35133333333333333 0.24533333333333332 0.24533333333333332
4 0.3888333333333333 0.36733333333333335 0.36866666666666664 0.36866666666666664
6 0.7026666666666667 0.35133333333333333 0.24533333333333332 0.24533333333333332
8 0.7323333333333333 0.312 0.03866666666666667 0.03866666666666667
6 0.3888333333333333 0.36733333333333335 0.36866666666666664 0.36866666666666664
7 0.3888333333333333 0.36733333333333335 0.36866666666666664 0.36866666666666664


#### Run

In [15]:
# Example usage:
yolo_path = "/run/media/spritan/New Volume/VinData/data_processed/step3_split_data/out_split_MONOCHROME1/train/labels/0a4fbc9ade84a7abd1680eb8ba031a9d_R13.txt"
voc_path = "./voc_annotation.xml"
image_width = 512  # Replace with the actual width of the image
image_height = 512  # Replace with the actual height of the image

yolo_to_voc(yolo_path, voc_path, image_width, image_height)
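
Note that `tree.write(voc_path)` emits the XML without line breaks or indentation. For a human-readable file, the tree can be indented before it is written. The sketch below assumes Python 3.9 or newer, where `ET.indent` is available, and uses a small stand-in tree rather than a real annotation.

```python
import xml.etree.ElementTree as ET

# Small stand-in for the tree built by yolo_to_voc
root = ET.Element("annotation")
ET.SubElement(root, "filename").text = "image.jpg"
size = ET.SubElement(root, "size")
ET.SubElement(size, "width").text = "512"

# Without indentation, the whole document is serialized on one line
compact = ET.tostring(root, encoding="unicode")

# ET.indent (Python 3.9+) adds whitespace to the tree in place
ET.indent(root, space="    ")
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

In `yolo_to_voc`, calling `ET.indent(root)` just before `tree.write(voc_path)` would have the same effect on the written file.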

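#### Output

The listing below was captured in an earlier execution (`In [8]`) than the function definition (`In [14]`) and the run (`In [15]`) above, so its `<folder>` and `<name>` values and its indentation differ from what the current function writes. The listing is also cut off partway through the third object.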

In [8]:
!cat './voc_annotation.xml'

<annotation>
    <folder>VOC2012</folder>
    <filename>0a4fbc9ade84a7abd1680eb8ba031a9d_R13.jpg</filename>
    <size>
        <width>512</width>
        <height>512</height>
        <depth>3</depth>
    </size>
    <object>
        <name>7</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>296</xmin>
            <ymin>117</ymin>
            <xmax>422</xmax>
            <ymax>242</ymax>
        </bndbox>
    </object>
    <object>
        <name>4</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>104</xmin>
            <ymin>93</ymin>
            <xmax>293</xmax>
            <ymax>282</ymax>
        </bndbox>
    </object>
    <object>
        <name>6</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>296</xmin>
        

## 4. Conversion Script (Python) for entire folder:

In [19]:
yolo_dir_path = "/run/media/spritan/New Volume/VinData/data_processed/step3_split_data/out_split_MONOCHROME1/val/labels"
voc_dir_path  = "/run/media/spritan/New Volume/VinData/data_processed/step3_split_data/out_split_MONOCHROME1/val/labelsVOC"
image_width = 512  # Applied to every file in the folder; replace with the actual image width
image_height = 512  # Applied to every file in the folder; replace with the actual image height

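# Create the output folder if it does not exist yet
os.makedirs(voc_dir_path, exist_ok=True)

# Keep only the YOLO label files
yolo_files = [f for f in os.listdir(yolo_dir_path) if f.endswith('.txt')]

for file in tqdm(yolo_files, desc="processing:"):
    yolo_path = os.path.join(yolo_dir_path, file)
    voc_file = os.path.splitext(file)[0] + '.xml'
    voc_path = os.path.join(voc_dir_path, voc_file)
    yolo_to_voc(yolo_path, voc_path, image_width, image_height)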

processing:: 100%|██████████| 867/867 [00:00<00:00, 1372.63it/s]
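
After a batch run, one way to spot-check the results is to invert the conversion and compare against the original YOLO values. The sketch below shows the inverse formula; `voc_to_yolo` is a hypothetical helper, and the 640x480 numbers are made up for the example.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, image_width, image_height):
    """Convert pixel corner coordinates back to normalized center/size values."""
    center_x = (xmin + xmax) / 2 / image_width
    center_y = (ymin + ymax) / 2 / image_height
    width = (xmax - xmin) / image_width
    height = (ymax - ymin) / image_height
    return center_x, center_y, width, height


# Corners of a box centered in a 640x480 image
recovered = voc_to_yolo(240, 120, 400, 360, image_width=640, image_height=480)
print(recovered)  # (0.5, 0.5, 0.25, 0.5)
```

Because `yolo_to_voc` truncates pixel coordinates with `int()`, values recovered from real annotations can differ from the originals by roughly one pixel relative to the image size, so a comparison should allow for that tolerance.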


## 5. Complete Code

In [21]:
import os
import xml.etree.ElementTree as ET
from tqdm import tqdm

def yolo_to_voc(yolo_path, voc_path, image_width, image_height):
    with open(yolo_path, 'r') as file:
        lines = file.readlines()

    root = ET.Element("annotation")

    folder = ET.SubElement(root, "folder")
    folder.text = ""

    filename = ET.SubElement(root, "filename")
    # Assumes the image is a .jpg file with the same base name as the label file
    filename.text = os.path.splitext(os.path.basename(yolo_path))[0] + '.jpg'

    size = ET.SubElement(root, "size")
    width = ET.SubElement(size, "width")
    height = ET.SubElement(size, "height")
    depth = ET.SubElement(size, "depth")
    width.text = str(image_width)
    height.text = str(image_height)
    depth.text = "3"

    for line in lines:
        parts = line.split()
        if not parts:
            continue  # Skip blank lines, which would otherwise fail to unpack
        class_id, center_x, center_y, bbox_width, bbox_height = map(float, parts)

        obj = ET.SubElement(root, "object")
        name = ET.SubElement(obj, "name")
        name.text = str(int(class_id) + 1)  # Shift YOLO's 0-based class ID to a 1-based label

        pose = ET.SubElement(obj, "pose")
        pose.text = "Unspecified"

        truncated = ET.SubElement(obj, "truncated")
        truncated.text = "0"

        difficult = ET.SubElement(obj, "difficult")
        difficult.text = "0"

        bndbox = ET.SubElement(obj, "bndbox")
        xmin = ET.SubElement(bndbox, "xmin")
        ymin = ET.SubElement(bndbox, "ymin")
        xmax = ET.SubElement(bndbox, "xmax")
        ymax = ET.SubElement(bndbox, "ymax")

        xmin.text = str(int((center_x - bbox_width / 2) * image_width))
        ymin.text = str(int((center_y - bbox_height / 2) * image_height))
        xmax.text = str(int((center_x + bbox_width / 2) * image_width))
        ymax.text = str(int((center_y + bbox_height / 2) * image_height))

    tree = ET.ElementTree(root)
    tree.write(voc_path)

yolo_dir_path = "/run/media/spritan/New Volume/VinData/data_processed/step3_split_data/out_split_MONOCHROME1/val/labels"
voc_dir_path  = "/run/media/spritan/New Volume/VinData/data_processed/step3_split_data/out_split_MONOCHROME1/val/labelsVOC"
image_width = 512  # Replace with the actual width of the image
image_height = 512  # Replace with the actual height of the image

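# Create the output folder if it does not exist yet
os.makedirs(voc_dir_path, exist_ok=True)

# Keep only the YOLO label files
yolo_files = [f for f in os.listdir(yolo_dir_path) if f.endswith('.txt')]

for file in tqdm(yolo_files, desc="processing:"):
    yolo_path = os.path.join(yolo_dir_path, file)
    voc_file = os.path.splitext(file)[0] + '.xml'
    voc_path = os.path.join(voc_dir_path, voc_file)
    yolo_to_voc(yolo_path, voc_path, image_width, image_height)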