**Extracting class and bounding box information from XML files**

In this section we will extract the class names and bounding box coordinates from the xml annotation files created in the earlier annotation step.

In [None]:
import os
import glob
import pandas as pd
import io
import xml.etree.ElementTree as ET

In [None]:
# Displaying all the xml files
path = '/content/drive/MyDrive/Bayesian Quest/JMJTL-BQ-ObjectDetection/data'
allFiles = glob.glob(path + '/*.xml')
allFiles

['/content/drive/MyDrive/Bayesian Quest/JMJTL-BQ-ObjectDetection/data/pothole1.xml',
 '/content/drive/MyDrive/Bayesian Quest/JMJTL-BQ-ObjectDetection/data/pothole2.xml',
 '/content/drive/MyDrive/Bayesian Quest/JMJTL-BQ-ObjectDetection/data/pothole3.xml',
 '/content/drive/MyDrive/Bayesian Quest/JMJTL-BQ-ObjectDetection/data/pothole4.xml',
 '/content/drive/MyDrive/Bayesian Quest/JMJTL-BQ-ObjectDetection/data/pothole5.xml',
 '/content/drive/MyDrive/Bayesian Quest/JMJTL-BQ-ObjectDetection/data/pothole6.xml',
 '/content/drive/MyDrive/Bayesian Quest/JMJTL-BQ-ObjectDetection/data/pothole7.xml',
 '/content/drive/MyDrive/Bayesian Quest/JMJTL-BQ-ObjectDetection/data/pothole8.xml',
 '/content/drive/MyDrive/Bayesian Quest/JMJTL-BQ-ObjectDetection/data/pothole10.xml',
 '/content/drive/MyDrive/Bayesian Quest/JMJTL-BQ-ObjectDetection/data/pothole9.xml',
 '/content/drive/MyDrive/Bayesian Quest/JMJTL-BQ-ObjectDetection/data/pothole11.xml',
 '/content/drive/MyDrive/Bayesian Quest/JMJTL-BQ-ObjectDetectio

Next we need to parse the xml files and extract the information they contain. We will use the 'ElementTree' module from Python's built-in 'xml' package to parse each file and get the relevant information.

We first get the 'tree' object and then the 'root' of the xml file. The root element contains all the other elements as children. Let us start by listing all the elements contained in the xml file.

In the code below we loop over the children of the root and print the tag of each one. The output shows the major elements, and if we open the xml file we can see the same elements listed there.

In the output below, the elements named 'object' are the bounding boxes we annotated in the earlier step. These elements contain the bounding box information we need. We will see how to extract it in a minute.

In [None]:
xml_file = allFiles[0]
# Get the tree object
tree = ET.parse(xml_file)
# Get the root of the xml file
root = tree.getroot()
# Extracting the tag from each child
for child in root:
    print(child.tag)

folder
filename
path
source
size
segmented
object
object
object
object
object
object
object
object
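
To make the nesting of these elements easier to picture, here is a minimal sketch that parses a small hand-written annotation and prints it as an indented outline. The xml string, its values and the helper name `describe` are made up for illustration; only the element names follow the output above, and the root tag 'annotation' is an assumption based on the usual labelImg layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation written by hand; it is NOT a copy of pothole1.xml.
SAMPLE_XML = """
<annotation>
    <folder>data</folder>
    <filename>sample.jpeg</filename>
    <size>
        <width>275</width>
        <height>183</height>
    </size>
    <object>
        <name>pothole</name>
        <bndbox>
            <xmin>64</xmin>
            <ymin>78</ymin>
            <xmax>130</xmax>
            <ymax>107</ymax>
        </bndbox>
    </object>
</annotation>
"""

def describe(element, depth=0):
    """Return one indented line per element, showing its tag and trimmed text."""
    text = (element.text or '').strip()
    lines = ['  ' * depth + element.tag + (': ' + text if text else '')]
    # Recurse into the children, indenting one level deeper each time
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines

sample_root = ET.fromstring(SAMPLE_XML.strip())
outline = describe(sample_root)
print('\n'.join(outline))
```

The outline shows that 'name' and 'bndbox' sit side by side inside each 'object', while the four coordinates sit one level deeper inside 'bndbox'.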


Let us first extract the filename of this xml file using the root.find() method. We need to specify which element we want to look up, which in our case is 'filename', and to get the filename as a string we read the element's .text attribute.

In [None]:
filename = root.find('filename').text
filename

'pothole1.jpeg'

Let us now get the width and height of the image. We can see from the xml file that these are contained in the element 'size'. We will use the find() method to reach the 'width' and 'height' elements and then convert their text into integers.

In [None]:
width = int(root.find('size').find('width').text)
height = int(root.find('size').find('height').text)
print(width,height)

275 183


Our next task is to extract the class names and the bounding box coordinates. Both are contained in each of the 'object' elements. The class is stored in the child element 'name', and the coordinates are stored in the child element 'bndbox' under the element names 'xmin', 'ymin', 'xmax' and 'ymax'. Let us look at one of the object elements as a sample.

In [None]:
members = root.findall('object')
member = members[0]
print(member.find('name').text)
print(member.find('bndbox').find('xmin').text)

pothole
64
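
The remaining three coordinates can be read in the same way. As a minimal sketch, the hypothetical helper `box_of` below collects the class name and all four corners of one object element. It uses `findtext()` with a path such as 'bndbox/xmin', which gives the same result as chaining two `find()` calls and reading `.text`. The sample element is hand-written for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written 'object' element, for illustration only.
SAMPLE_OBJECT = """
<object>
    <name>pothole</name>
    <bndbox>
        <xmin>64</xmin>
        <ymin>78</ymin>
        <xmax>130</xmax>
        <ymax>107</ymax>
    </bndbox>
</object>
"""

def box_of(member):
    """Return the class name and integer box corners of one 'object' element."""
    box = {'class': member.findtext('name')}
    for corner in ('xmin', 'ymin', 'xmax', 'ymax'):
        # 'bndbox/xmin' walks object -> bndbox -> xmin in a single call
        box[corner] = int(member.findtext('bndbox/' + corner))
    return box

sample_member = ET.fromstring(SAMPLE_OBJECT.strip())
sample_box = box_of(sample_member)
print(sample_box)
```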


From the above we can see the class name and one of the bounding box values. Now that we have seen all the moving parts of what we want to do, let us encapsulate all of this into a function and extract the information into a pandas dataframe.

In [None]:
def xml_to_pd(path):
    """Iterates through all .xml files (generated by labelImg) in a given directory and combines
    them in a single Pandas dataframe.

    Parameters
    ----------
    path : str
        The path containing the .xml files
    Returns
    -------
    Pandas DataFrame
        The produced dataframe
    """

    xml_list = []
    # List down all the files within the path
    for xml_file in glob.glob(path + '/*.xml'):
        # Get the tree and the root of the xml files
        tree = ET.parse(xml_file)
        root = tree.getroot()
        # Get the filename, width and height from the respective elements
        filename = root.find('filename').text
        width = int(root.find('size').find('width').text)
        height = int(root.find('size').find('height').text)
        # Extract the class names and the bounding boxes of the classes
        for member in root.findall('object'):
            bndbox = member.find('bndbox')
            value = (filename,
                     width,
                     height,
                     member.find('name').text,
                     int(bndbox.find('xmin').text),
                     int(bndbox.find('ymin').text),
                     int(bndbox.find('xmax').text),
                     int(bndbox.find('ymax').text),
                     )
            xml_list.append(value)
    # Consolidate all the information into a data frame
    column_name = ['filename', 'width', 'height',
                   'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df
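
The function above assumes that every file has the 'filename' and 'size' elements and that every object has a complete 'bndbox'. If an element is missing, `find()` returns None and the `.text` access raises an AttributeError. In case some annotation files are incomplete, a more defensive variant could look like the sketch below. The helper name `rows_from_root` and the demo xml are hypothetical, and silently skipping incomplete entries is only one possible choice.

```python
import xml.etree.ElementTree as ET

def rows_from_root(root):
    """Yield one tuple per complete 'object'; skip objects without a full box."""
    filename = root.findtext('filename')
    width = root.findtext('size/width')
    height = root.findtext('size/height')
    if None in (filename, width, height):
        return  # file-level information is missing, so yield nothing
    for member in root.findall('object'):
        name = member.findtext('name')
        corners = [member.findtext('bndbox/' + c)
                   for c in ('xmin', 'ymin', 'xmax', 'ymax')]
        if name is None or None in corners:
            continue  # incomplete object: skip it instead of raising
        yield (filename, int(width), int(height), name, *map(int, corners))

# Hand-written demo: the second object has no 'bndbox' and is skipped.
DEMO_XML = """<annotation>
  <filename>demo.jpeg</filename>
  <size><width>100</width><height>50</height></size>
  <object><name>pothole</name>
    <bndbox><xmin>1</xmin><ymin>2</ymin><xmax>30</xmax><ymax>40</ymax></bndbox>
  </object>
  <object><name>vehicle</name></object>
</annotation>"""

demo_rows = list(rows_from_root(ET.fromstring(DEMO_XML)))
print(demo_rows)
```

The tuples have the same field order as the columns used in xml_to_pd, so they could be appended to the same list.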

In [None]:
pothole_df = xml_to_pd(path)
pothole_df

           filename  width  height       class  xmin  ymin  xmax  ymax
  0   pothole1.jpeg    275     183     pothole    64    78   130   107
  1   pothole1.jpeg    275     183     pothole    44   105   131   154
  2   pothole1.jpeg    275     183     pothole    12   151    59   177
  3   pothole1.jpeg    275     183  vegetation   163    33   254    58
  4   pothole1.jpeg    275     183     pothole   115    54   142    74
...             ...    ...     ...         ...   ...   ...   ...   ...
 60  pothole18.jpeg    201     251     vehicle     9    99    52   128
 61  pothole18.jpeg    201     251     vehicle    85    61   120    86
 62  pothole18.jpeg    201     251     vehicle   106     5   147    45
 63  pothole18.jpeg    201     251     vehicle    91    44   117    61
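
Before using the dataframe further, it can help to run a quick sanity check on it. The sketch below is one possible check, not part of the original workflow. It rebuilds a small dataframe from three of the rows shown above (in the notebook, pothole_df could be used instead), counts the boxes per class and verifies that each box has a positive size and lies inside its image. The names `sample_df`, `box_w`, `box_h` and `valid` are introduced here for illustration.

```python
import pandas as pd

# Three rows copied from the output above, so the block runs on its own.
sample_df = pd.DataFrame(
    [('pothole1.jpeg', 275, 183, 'pothole', 64, 78, 130, 107),
     ('pothole1.jpeg', 275, 183, 'vegetation', 163, 33, 254, 58),
     ('pothole18.jpeg', 201, 251, 'vehicle', 9, 99, 52, 128)],
    columns=['filename', 'width', 'height',
             'class', 'xmin', 'ymin', 'xmax', 'ymax'])

# Box dimensions in pixels
box_w = sample_df['xmax'] - sample_df['xmin']
box_h = sample_df['ymax'] - sample_df['ymin']

# A box counts as valid here if it has positive size and fits inside the image
valid = ((box_w > 0) & (box_h > 0)
         & (sample_df['xmin'] >= 0) & (sample_df['ymin'] >= 0)
         & (sample_df['xmax'] <= sample_df['width'])
         & (sample_df['ymax'] <= sample_df['height']))

# Number of annotated boxes per class
class_counts = sample_df['class'].value_counts()
print(class_counts)
print('all boxes valid:', bool(valid.all()))
```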
