# <span style="color:purple">GIS and Machine Learning for Object Detection in Satellite Imagery</span>

<img src="img/robot.jpg">

## <span style="color:blue">Step 4: Generate TFRecords from the train/test splits</span>

We now need to create the TFRecord files required to train an object detection model in TensorFlow.

Two format conversions are needed. First, we convert the XML annotation files for all images in the train and test folders into one CSV file per split. Second, we convert each CSV file into a TFRecord file. We'll use this Jupyter notebook for the first conversion and a Python script for the second.

## 1. Conversion from XML files to CSV

```python
# Import needed modules
import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET
```

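```python
# Helper function: collect one row per labeled object from every XML file in `path`
def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(os.path.join(path, '*.xml')):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        size = root.find('size')
        for member in root.findall('object'):
            bbox = member.find('bndbox')
            # Look elements up by tag rather than by position, and parse coordinates
            # via float() because some labeling tools write values such as "123.0"
            value = (root.find('filename').text,
                     int(size.find('width').text),
                     int(size.find('height').text),
                     member.find('name').text,
                     int(float(bbox.find('xmin').text)),
                     int(float(bbox.find('ymin').text)),
                     int(float(bbox.find('xmax').text)),
                     int(float(bbox.find('ymax').text))
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df
```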

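`xml_to_csv` assumes Pascal VOC-style annotation files, such as those written by labeling tools like LabelImg (an assumption; the tool used is not named here). The sketch below parses a small, made-up annotation string with the same tag-based lookups and prints the rows it would contribute to the CSV. The file name, image size, and the `cafo` class label are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical Pascal VOC annotation for one image with two labeled objects
SAMPLE_XML = """<annotation>
  <filename>tile_001.jpg</filename>
  <size><width>512</width><height>512</height><depth>3</depth></size>
  <object>
    <name>cafo</name>
    <bndbox><xmin>34</xmin><ymin>50</ymin><xmax>120</xmax><ymax>160</ymax></bndbox>
  </object>
  <object>
    <name>cafo</name>
    <bndbox><xmin>300</xmin><ymin>310</ymin><xmax>410</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>"""


def annotation_rows(xml_text):
    """Return one (filename, width, height, class, xmin, ymin, xmax, ymax) tuple per object."""
    root = ET.fromstring(xml_text)
    filename = root.findtext('filename')
    width = int(root.findtext('size/width'))
    height = int(root.findtext('size/height'))
    rows = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        # Parse via float() first in case coordinates were written as decimals
        coords = [int(float(box.findtext(tag))) for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        rows.append((filename, width, height, obj.findtext('name'), *coords))
    return rows


for row in annotation_rows(SAMPLE_XML):
    print(row)
# ('tile_001.jpg', 512, 512, 'cafo', 34, 50, 120, 160)
# ('tile_001.jpg', 512, 512, 'cafo', 300, 310, 410, 400)
```

Each `<object>` becomes its own row, so an image with several labeled objects appears several times in the CSV with the same `filename`.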
```python
workspace = ""  # Set workspace path
if workspace:
    os.chdir(workspace)

# The output folder must exist before DataFrame.to_csv can write into it
os.makedirs('data', exist_ok=True)

for directory in ['train', 'test']:
    image_path = os.path.join(os.getcwd(), 'images', directory)
    print("Processing images at {0}...".format(image_path))
    xml_df = xml_to_csv(image_path)
    print(xml_df)
    xml_df.to_csv(os.path.join('data', '{0}_labels.csv'.format(directory)), index=False)
    print('Successfully converted xml to csv.\n')
```

This cell writes two files: `data/train_labels.csv` and `data/test_labels.csv`.
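Before building TFRecords, it can help to sanity-check the CSVs, since a box that falls outside its image usually signals a labeling mistake. The sketch below uses only the standard library: it counts boxes per class and collects the file names of rows whose box is empty or out of bounds. The sample rows are made up and follow the column layout that `xml_to_csv` writes.

```python
import csv
import io
from collections import Counter


def check_labels(csv_file):
    """Count boxes per class and collect file names of degenerate or out-of-bounds boxes."""
    counts = Counter()
    bad_rows = []
    for row in csv.DictReader(csv_file):
        w, h = int(row['width']), int(row['height'])
        xmin, ymin = int(row['xmin']), int(row['ymin'])
        xmax, ymax = int(row['xmax']), int(row['ymax'])
        counts[row['class']] += 1
        # A valid box has positive size and lies entirely inside the image
        if not (0 <= xmin < xmax <= w and 0 <= ymin < ymax <= h):
            bad_rows.append(row['filename'])
    return counts, bad_rows


# Made-up CSV content; the second box extends past the 512-pixel image width
sample = io.StringIO(
    "filename,width,height,class,xmin,ymin,xmax,ymax\n"
    "tile_001.jpg,512,512,cafo,34,50,120,160\n"
    "tile_002.jpg,512,512,cafo,400,20,530,90\n"
)
counts, bad = check_labels(sample)
print(counts)  # Counter({'cafo': 2})
print(bad)     # ['tile_002.jpg']
```

To check a real file, pass `open('data/train_labels.csv', newline='')` instead of the in-memory sample.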

## 2. Conversion from CSV to TFRecords

The [3_generate_tfrecord.py](https://github.com/Qberto/ML_ObjectDetection_CAFO/blob/master/3_generate_tfrecord.py) script then reads these CSV files and generates the TFRecord files.
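The script's internals are not reproduced here. Converters of this kind commonly group the CSV rows by image, normalize each box corner to the range [0, 1], and map class names to integer IDs before writing one `tf.train.Example` per image. The sketch below shows only that pure-Python grouping step, under those assumptions. The feature keys follow the naming convention of the TensorFlow Object Detection API; `CLASS_IDS` and the sample rows are hypothetical; the image bytes (`image/encoded`) and the TensorFlow serialization are left out.

```python
import csv
import io

# Hypothetical class mapping; IDs start at 1 because the Object Detection API reserves 0 for background
CLASS_IDS = {'cafo': 1}


def group_examples(csv_file, class_ids):
    """Group CSV rows by image and build the per-image values a tf.train.Example would hold."""
    examples = {}
    for row in csv.DictReader(csv_file):
        w, h = int(row['width']), int(row['height'])
        ex = examples.setdefault(row['filename'], {
            'image/filename': row['filename'],
            'image/width': w,
            'image/height': h,
            'image/object/bbox/xmin': [],
            'image/object/bbox/ymin': [],
            'image/object/bbox/xmax': [],
            'image/object/bbox/ymax': [],
            'image/object/class/text': [],
            'image/object/class/label': [],
        })
        # Pixel coordinates are divided by the image width/height to normalize them
        ex['image/object/bbox/xmin'].append(int(row['xmin']) / w)
        ex['image/object/bbox/ymin'].append(int(row['ymin']) / h)
        ex['image/object/bbox/xmax'].append(int(row['xmax']) / w)
        ex['image/object/bbox/ymax'].append(int(row['ymax']) / h)
        ex['image/object/class/text'].append(row['class'])
        ex['image/object/class/label'].append(class_ids[row['class']])
    return examples


# Made-up rows: two boxes on the same image
sample = io.StringIO(
    "filename,width,height,class,xmin,ymin,xmax,ymax\n"
    "tile_001.jpg,512,512,cafo,128,256,384,512\n"
    "tile_001.jpg,512,512,cafo,0,0,64,32\n"
)
examples = group_examples(sample, CLASS_IDS)
print(examples['tile_001.jpg']['image/object/bbox/xmin'])  # [0.25, 0.0]
```

In a complete converter, each of these lists would be wrapped in a `tf.train.Feature`, combined into a `tf.train.Example`, and written with `tf.io.TFRecordWriter`.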

#### Please note:

At this stage you should have TensorFlow installed on your system and the `research` folder of the TensorFlow models repository (https://github.com/tensorflow/models/tree/master/research) available in your workspace.

To run this script on the train and test subsets, make sure a `data` folder exists in your workspace (it should already hold the two CSV files), then run the following commands from your command prompt:

```
python 3_generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record
python 3_generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record
```
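If you prefer to stay inside the notebook, the same two invocations can be issued from Python. This is a minimal sketch that assumes the script sits in the current working directory and accepts exactly the `--csv_input` and `--output_path` flags used in the commands above; `tfrecord_commands` and `run_all` are hypothetical helper names.

```python
import subprocess
import sys


def tfrecord_commands(splits=('train', 'test'), script='3_generate_tfrecord.py'):
    """Build one argument list per split, mirroring the command-line invocations."""
    return [
        [sys.executable, script,
         '--csv_input=data/{0}_labels.csv'.format(split),
         '--output_path=data/{0}.record'.format(split)]
        for split in splits
    ]


def run_all():
    for cmd in tfrecord_commands():
        # check=True raises CalledProcessError if the script exits with a non-zero status
        subprocess.run(cmd, check=True)


# run_all()  # uncomment to run from the workspace root
```

Using `sys.executable` instead of a bare `python` makes the script run with the same interpreter, and therefore the same TensorFlow installation, as the notebook kernel.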

You should now see the `train.record` and `test.record` files in your `data` folder.
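A quick way to confirm that each file actually contains records, without importing TensorFlow, is to walk the TFRecord framing: every record is an 8-byte little-endian length, a 4-byte CRC of that length, the payload, and a 4-byte CRC of the payload. The sketch below counts records that way and does not verify the CRCs; `count_tfrecords` is a hypothetical helper, and the fake records exist only to exercise the logic.

```python
import io
import struct


def count_tfrecords(stream):
    """Count records by reading each 8-byte length header and skipping CRCs and payload."""
    count = 0
    while True:
        header = stream.read(8)
        if not header:
            break  # clean end of file
        if len(header) < 8:
            raise ValueError('truncated record header')
        (length,) = struct.unpack('<Q', header)
        # Skip the length CRC, the payload, and the payload CRC (CRCs are not checked)
        body = stream.read(4 + length + 4)
        if len(body) < 4 + length + 4:
            raise ValueError('truncated record body')
        count += 1
    return count


# Two fake records with zeroed CRC fields, only to exercise the framing logic
fake = b''.join(struct.pack('<Q', len(p)) + b'\0' * 4 + p + b'\0' * 4 for p in (b'abc', b'hello'))
print(count_tfrecords(io.BytesIO(fake)))  # 2
```

For a real file, use `with open('data/train.record', 'rb') as f: print(count_tfrecords(f))`. With TensorFlow available, `sum(1 for _ in tf.data.TFRecordDataset('data/train.record'))` gives the same count and also validates the data.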

For reference, see the data folder in [this repository](https://github.com/Qberto/ML_ObjectDetection_CAFO/tree/master/data) for examples of what your CSV and TFRecord files should look like.