<div class = "alert alert-block alert-success">
    
# <span style='color:Blue'> NOTEBOOK 1: Formatting and Pre-Processing Object-Aligned Annotations for Nightingale </span>

## Make sure you are running the *nightingale_env* kernel in this notebook

## This notebook is an example workflow for formatting groundtruth data to train the SCRDet++ model with Nightingale. A properly formatted Nightingale groundtruth file is a CSV with the following column headers:

>> ## 'IMID','xLF','yLF','xRF','yRF','xRB', 'yRB','xLB', 'yLB','class'

## Where 
> ### 'IMID' is the image name without file extension
> ### 'xLF' & 'yLF' are the x,y coordinates of the front-left corner of the object
> ### 'xRF' & 'yRF' are the x,y coordinates of the front-right corner of the object
> ### 'xRB' & 'yRB' are the x,y coordinates of the back-right corner of the object
> ### 'xLB' & 'yLB' are the x,y coordinates of the back-left corner of the object
> ### 'class' is the class or category of the object label
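
A quick way to confirm that a groundtruth table matches this layout is to compare its header against the expected column list. The sketch below is only illustrative: `NIGHTINGALE_COLUMNS` and `has_nightingale_header` are hypothetical names that are not part of Nightingale, and the sample annotation is made up.

```python
import pandas as pd

# Column order described above; the constant name is an assumption for this sketch.
NIGHTINGALE_COLUMNS = ['IMID', 'xLF', 'yLF', 'xRF', 'yRF',
                       'xRB', 'yRB', 'xLB', 'yLB', 'class']

def has_nightingale_header(df):
    """Return True when the dataframe's columns match the expected header exactly."""
    return list(df.columns) == NIGHTINGALE_COLUMNS

# One made-up annotation: a 20 x 10 pixel box in an image called 'img_001'
example = pd.DataFrame(
    [['img_001', 10, 10, 30, 10, 30, 20, 10, 20, 'class1']],
    columns=NIGHTINGALE_COLUMNS,
)
print(has_nightingale_header(example))
```

The check is strict about both names and order, which seems the safer choice when a downstream reader may select columns by position.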


## We'll use a sample of the OMITTED dataset to work through this notebook. Sample NITFs and a groundtruth CSV are provided in the data/IMAGERY/ and data/CSVs/ folders. The primary steps in this notebook are:
> ## 1. Grouping the OMITTED sub-categories into higher-level classes and removing unwanted classes from the groundtruth
> ## 2. Converting OMITTED's *Front,Back,Left,Right* data object point annotations to a 4-corner-point format using the fblr2corners function
> ## 3. Formatting the groundtruth files for Nightingale
> ## 4. Partitioning the groundtruth data into Training and Test sets

<div class = "alert alert-block alert-success">

## <span style='color:Blue'> Imports </span>

In [1]:
import os
import numpy as np
import random
import pandas as pd
import glob
from libs.box_utils.coordinate_convert import fblr2corners

<div class = "alert alert-block alert-success">

## <span style='color:Blue'> First, we'll define the data and convert the OMITTED object labels to 3 classes, class1, class2, and class3 </span>

<div class = "alert alert-block alert-warning">

## <span style='color:black'> (Optional) Before you start, add your imagery to the Nightingale/Training/data/IMAGERY folder and your groundtruth csv to the Nightingale/Training/data/CSVs folder </span>

In [2]:
# 1) set the path to the image directory and groundtruth file
img_path = 'data/IMAGERY/'
gt_path = 'data/CSVs/sample_OMITTED.csv'

In [3]:
# 2) Read in the groundtruth csv as a pandas dataframe

df = pd.read_csv(gt_path)
print(df.columns)
df.head()

<div class = "alert alert-block alert-success">

## Notice that the raw OMITTED data has both an 'objType' column and an 'objStatus' column, because OMITTED recorded not only the type of object but also what the object was doing. For simplicity, we'll use 'objType' and 'objStatus' to filter out unwanted categories and rename the kept categories to 'class1', 'class2', or 'class3'.
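
Before choosing which categories to keep, it can help to see which 'objType' and 'objStatus' combinations actually occur. The following is a minimal sketch on a toy dataframe; every value in it is invented for illustration and is not taken from the OMITTED data.

```python
import pandas as pd

# Toy stand-in for the groundtruth table; the objType/objStatus values are invented.
toy = pd.DataFrame({
    'objType':   ['class1', 'class1', 'class2_super_cat', 'other'],
    'objStatus': ['moving', 'moving', 'class2_subcat', 'parked'],
})

# Count how many annotations fall under each (objType, objStatus) pair
pair_counts = toy.groupby(['objType', 'objStatus']).size()
print(pair_counts)
```

Running the same `groupby` on `df` would give the real combinations to base the filtering rules on.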

In [4]:
# 3) Make a list of the NITF files in the image directory and print the number of total NITFs
image_files = os.listdir(img_path)
print(len(image_files))

3


In [5]:
# 4) Make a numpy array of the image id's column in the groundtruth file 
# and add a column called IMID to match Nightingale header format for image id's. 
# Print the number of unique image id's.
# np.str was removed in NumPy 1.24; the builtin str is equivalent
IMID = df['ImageID'].to_numpy(dtype=str)
df['IMID'] = IMID
print(len(np.unique(IMID)), ' unique image ids in the groundtruth file')

3  unique image ids in the groundtruth file
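
Matching counts do not guarantee that every image id in the groundtruth has a corresponding file on disk. The sketch below compares the ids against file names with their extensions stripped. `missing_images` is a hypothetical helper, and the file names and the '.ntf' extension in the demo are assumptions made for illustration.

```python
import os
import tempfile

def missing_images(image_dir, image_ids):
    """Return the image ids that have no file (of any extension) in image_dir."""
    stems = {os.path.splitext(name)[0] for name in os.listdir(image_dir)}
    return sorted(set(image_ids) - stems)

# Demo with a throwaway directory; the file names and '.ntf' extension are made up.
with tempfile.TemporaryDirectory() as tmp:
    for name in ['img_001.ntf', 'img_002.ntf']:
        open(os.path.join(tmp, name), 'w').close()
    missing = missing_images(tmp, ['img_001', 'img_002', 'img_003'])

print(missing)  # ['img_003']
```

In this notebook the equivalent call would be `missing_images(img_path, IMID)`, where an empty list means every annotated image is present.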


In [6]:
## 5) Build a new list of labels that maps the class1 and class2 sub-categories onto 3 classes, labeling everything else "Junk"

count = 0
class3 = []
for obj_class_obj_status in df[['objType','objStatus']].to_numpy(dtype=str):
    obj_class_obj_status = obj_class_obj_status[0]+obj_class_obj_status[1]
    
    if 'class1' in obj_class_obj_status:
        class3.append('class1')
        
    elif 'class2_super_cat' in obj_class_obj_status:
        
        if 'class2_subcat' in obj_class_obj_status:
            class3.append('class2')
        else:
            class3.append('class3')
    else:
        class3.append('Junk')
    
    count+=1
print(count, 'total objects converted')
print('The new classes are ',np.unique(np.asarray(class3)))
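
For larger groundtruth files, the same if/elif mapping can be expressed without a Python loop. This is a sketch of one possible vectorized alternative using `np.select`, not the notebook's own method; the toy values are invented, and only the substring tests mirror the loop above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df; the objStatus values 'a', 'b', 'c' are placeholders.
toy = pd.DataFrame({
    'objType':   ['class1', 'class2_super_cat', 'class2_super_cat', 'other'],
    'objStatus': ['a', 'class2_subcat', 'b', 'c'],
})

# Same concatenation the loop performs: objType followed by objStatus
combined = toy['objType'].astype(str) + toy['objStatus'].astype(str)
is_class1 = combined.str.contains('class1', regex=False).to_numpy(dtype=bool)
is_super = combined.str.contains('class2_super_cat', regex=False).to_numpy(dtype=bool)
is_sub = combined.str.contains('class2_subcat', regex=False).to_numpy(dtype=bool)

# np.select takes the first matching condition, mirroring the if/elif order
labels = np.select(
    [is_class1, is_super & is_sub, is_super],
    ['class1', 'class2', 'class3'],
    default='Junk',
)
print(labels.tolist())  # ['class1', 'class2', 'class3', 'Junk']
```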

In [7]:
# 6) add the new class labels list to the dataframe under the 'class' header
df['class'] = class3

In [8]:
# 7) create a new dataframe that only contains the 3 classes we are interested in
df_3class = df[df['class'].isin(['class1','class2','class3'])]
df_3class = df_3class.reset_index(drop=True)
df_3class.head()

<div class = "alert alert-block alert-success">
    
## <span style='color:Blue'> Next we'll convert the *front, back, left, right* object pixel point format to 4-corner-point format and then create a new dataframe that contains only the columns required for Nightingale </span>

In [9]:
# 1) Run fblr2corners on each annotation in the groundtruth dataframe and then create a new column for each corner point
corners = []
for index,row in df_3class.iterrows():
    fblr = row[['firstX','firstY','secondX','secondY','thirdX', 'thirdY','fourthX', 'fourthY']].to_numpy(np.int64)
    #print(fblr)
    BBOX = fblr2corners(fblr)
    corners.append(BBOX)
corners = np.asarray(corners)
df_3class['xLF'] = corners[:,0]
df_3class['yLF'] = corners[:,1]
df_3class['xRF'] = corners[:,2]
df_3class['yRF'] = corners[:,3]
df_3class['xRB'] = corners[:,4]
df_3class['yRB'] = corners[:,5]
df_3class['xLB'] = corners[:,6]
df_3class['yLB'] = corners[:,7]

In [10]:
# 2) check new dataframe, now with xLF,yLF,xRF,yRF,xRB,yRB,xLB,yLB corner points for each annotation
df_3class.head()
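
Beyond eyeballing the first rows, one optional sanity check is to compute the area enclosed by each set of four corners, since an area of zero would point to a collapsed box. The sketch below uses the shoelace formula; `quad_area` is a hypothetical helper that is not part of Nightingale or `fblr2corners`, and the corner values are made up.

```python
import numpy as np

def quad_area(corners):
    """Shoelace area of a quadrilateral given as [x1, y1, x2, y2, x3, y3, x4, y4]."""
    pts = np.asarray(corners, dtype=float).reshape(4, 2)
    x, y = pts[:, 0], pts[:, 1]
    # Pair each vertex with the next one, wrapping around to the first
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Made-up corner points in LF, RF, RB, LB order: a 20 x 10 pixel rectangle
print(quad_area([10, 10, 30, 10, 30, 20, 10, 20]))  # 200.0
```

Applied to this notebook, `[quad_area(c) for c in corners]` would give one area per annotation, assuming the corners are listed in order around the box.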

In [11]:
# 3) Copy only the annotation fields Nightingale requires 
# (the image id, four corner points, and category name) 
# into a new dataframe
df_3class_nightingale = df_3class[['IMID','xLF','yLF','xRF','yRF','xRB', 'yRB','xLB', 'yLB','class']]
print(df_3class_nightingale.columns)
print(len(df_3class_nightingale), 'total annotations')

Index(['IMID', 'xLF', 'yLF', 'xRF', 'yRF', 'xRB', 'yRB', 'xLB', 'yLB',
       'class'],
      dtype='object')
36 total annotations


<div class = "alert alert-block alert-success">

## <span style='color:Blue'> Finally, we'll partition the data by randomly shuffling the image id's and writing separate training and test CSVs to disk </span>

In [12]:
# 1) randomly shuffle a list of unique image id's and break them up into a random 70/30 split using a fixed random seed
IMIDs = list(np.unique(df_3class_nightingale['IMID'].to_numpy()))
random.Random(4).shuffle(IMIDs)
train_im_list = IMIDs[0:int(len(IMIDs)*0.7)]
test_im_list = IMIDs[int(len(IMIDs)*0.7):int(len(IMIDs))]
print('There are ', len(train_im_list), 'training images')
print('There are ', len(test_im_list), 'test images')

There are  2 training images
There are  1 test images
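
The 2/1 result follows from `int()` truncating the cut point: with 3 images, `int(3 * 0.7)` is 2. The split can be wrapped in a small function, sketched below; `split_ids` is a hypothetical name, the image ids are invented, and the defaults simply repeat the 0.7 fraction and seed 4 used above.

```python
import random

def split_ids(ids, train_fraction=0.7, seed=4):
    """Shuffle unique ids with a fixed seed and cut them into train/test lists."""
    shuffled = sorted(set(ids))
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)  # int() truncates, e.g. int(3 * 0.7) == 2
    return shuffled[:cut], shuffled[cut:]

train_ids, test_ids = split_ids(['img_001', 'img_002', 'img_003'])
print(len(train_ids), len(test_ids))  # 2 1
```

Splitting by image id rather than by annotation keeps every annotation from a given image in the same set.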


In [13]:
# 2) Create the training-data dataframe
df_3class_nightingale_train = df_3class_nightingale[df_3class_nightingale.IMID.isin(train_im_list)]
df_3class_nightingale_train = df_3class_nightingale_train.reset_index(drop=True)
df_3class_nightingale_train.head()

In [14]:
# 3) create the test-data dataframe
df_3class_nightingale_test = df_3class_nightingale[df_3class_nightingale.IMID.isin(test_im_list)]
df_3class_nightingale_test = df_3class_nightingale_test.reset_index(drop=True)
df_3class_nightingale_test.head()

<div class = "alert alert-block alert-warning">

## <span style='color:red'> Before we write our training and testing annotations to their own separate csv files, let's check the numbers to make sure everything looks right </span>

In [15]:
# There were originally 36 total annotations in our simplified 3-category dataset. 
# Do our training and test annotation counts add up?
len(df_3class_nightingale_train), len(df_3class_nightingale_test)

(26, 10)

In [16]:
# Let's check the numbers of examples of each class in our training and test set
train_class_count = np.unique(df_3class_nightingale_train['class'].to_numpy(), return_counts=True)
test_class_count = np.unique(df_3class_nightingale_test['class'].to_numpy(), return_counts=True)
print(train_class_count)
print(test_class_count)

In [17]:
# And let's double check the number of images in the training and test set
train_im_count = np.unique(df_3class_nightingale_train['IMID'].to_numpy())
test_im_count = np.unique(df_3class_nightingale_test['IMID'].to_numpy())
print(len(train_im_count))
print(len(test_im_count))

2
1
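
These manual checks can also be written as assertions, so that a bad split fails loudly instead of relying on someone reading the printed numbers. The sketch below is one way to do that; `check_partition` is a hypothetical helper and the toy data is invented.

```python
import pandas as pd

def check_partition(full, train, test, id_col='IMID'):
    """Raise AssertionError if the train/test split loses rows or shares images."""
    assert len(train) + len(test) == len(full), 'annotation counts do not add up'
    shared = set(train[id_col]) & set(test[id_col])
    assert not shared, f'images in both sets: {sorted(shared)}'

# Toy data with invented ids: 3 annotations over 2 images
full = pd.DataFrame({'IMID': ['a', 'a', 'b'], 'class': ['class1', 'class2', 'class1']})
train = full[full['IMID'] == 'a']
test = full[full['IMID'] == 'b']
check_partition(full, train, test)
print('partition looks consistent')
```

With the dataframes in this notebook, the call would be `check_partition(df_3class_nightingale, df_3class_nightingale_train, df_3class_nightingale_test)`.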


<div class = "alert alert-block alert-success">

## <span style='color:Blue'> Everything looks good! Let's write our training and test dataframes to new csv files </span>

In [18]:
df_3class_nightingale_train.to_csv('data/CSVs/OMITTED_TRAIN_Data_Nightingale_Format.csv',index=False)
df_3class_nightingale_test.to_csv('data/CSVs/OMITTED_TEST_Data_Nightingale_Format.csv',index=False)
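
Passing `index=False` matters here because otherwise pandas writes the row index as an extra, unnamed first column, which would no longer match the header format described at the top of this notebook. The sketch below shows the difference on a throwaway file; the annotation values and file names are made up.

```python
import os
import tempfile
import pandas as pd

columns = ['IMID', 'xLF', 'yLF', 'xRF', 'yRF', 'xRB', 'yRB', 'xLB', 'yLB', 'class']
# One invented annotation standing in for the training dataframe
toy_train = pd.DataFrame([['img_001', 10, 10, 30, 10, 30, 20, 10, 20, 'class1']],
                         columns=columns)

with tempfile.TemporaryDirectory() as tmp:
    clean_path = os.path.join(tmp, 'toy_train.csv')
    indexed_path = os.path.join(tmp, 'toy_train_with_index.csv')
    toy_train.to_csv(clean_path, index=False)  # header matches `columns`
    toy_train.to_csv(indexed_path)             # default also writes the row index
    reloaded = pd.read_csv(clean_path)
    reloaded_with_index = pd.read_csv(indexed_path)

print(list(reloaded.columns) == columns)        # True
print(len(reloaded_with_index.columns) - len(columns))  # 1
```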

<div class = "alert alert-block alert-success">

# <span style='color:Green'> Great! Now we can move on to the next notebook, which adds our training data to a TensorFlow Record. Open Notebook-2, "2-Make_TF_Record.ipynb", to get started. </span>