# Converting Label CSV to Multi-Class & Multi-Bbox CSVs

WNixalo - 6/5/2018

---

Converting the labels CSV from separate `x1,y1,x2,y2` coordinate columns to a single `bbox` column of space-separated coordinates, to match the format the fastai method uses.

Also splitting up the single CSV containing coordinates and classes into one for multiple coordinates and one for multiple classes.

I'll convert the class to a numeric index. I'm not sure exactly how this interacts with having a standard 'background' class: the fastai pascal-multi code has class `0` as 'aeroplane' (take a look at `cat2id`). It looks like fastai assigns 'background' as an extra class at the end, i.e. from this line in [pascal-multi](https://github.com/WNoxchi/Aersu/blob/master/GLOC/model_dev/codealong-fastai-dl2-pascal-multi.ipynb):

```
pos = gt_overlap > 0.4
...
gt_clas[1 - pos] = len(id2cat)
```
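Here `pos` is a boolean mask over the anchor boxes: true where an anchor's best overlap with a ground-truth object exceeds 0.4. `1 - pos` inverts that mask, so **every** anchor that does *not* clear the threshold has its ground-truth class set to `len(id2cat)`, one past the last real class id. *This* is how 'background' is assigned.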

That's to say: if there are 12 class ids, then a 13th class is assigned if that detection is 'background'.
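A minimal NumPy sketch of the same rule, using made-up overlaps and classes rather than anything from this dataset. It inverts the mask with `~pos`, which is what NumPy boolean arrays (and `bool` tensors in newer PyTorch versions) need in place of the `1 - pos` written against older PyTorch:

```python
import numpy as np

# Hypothetical class list: 12 real classes with ids 0..11
id2cat = [f"class_{i}" for i in range(12)]

# Hypothetical per-anchor values: each anchor's best overlap (IoU) with a
# ground-truth object, and the class of the object it was matched to
gt_overlap = np.array([0.55, 0.10, 0.42, 0.38, 0.90])
gt_clas = np.array([3, 7, 0, 5, 11])

pos = gt_overlap > 0.4        # anchors that clear the overlap threshold
gt_clas[~pos] = len(id2cat)   # every other anchor becomes 'background' (id 12)

print(gt_clas.tolist())       # [3, 12, 0, 12, 11]
```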

## Imports

In [1]:
%matplotlib inline
%reload_ext autoreload
%autoreload 2

In [2]:
from fastai.conv_learner import *

sys.path.insert(1, '../')  # make the `utils` package one directory up importable
from utils import common
from utils import temp_utils
from utils.subfolder_val_idxs import set_val_idxs

from matplotlib import patches, patheffects

In [3]:
PATH = Path('../data')
PATH_TRAIN     = PATH/'train'
PATH_TRAIN_BBX = PATH/'interstage_train'
PATH_CSV     = PATH/'labels.csv'
PATH_CSV_BBX = PATH/'interstage_labels.csv'
CPU_PATH_CSV     = PATH/'cpu_labels.csv'
CPU_PATH_CSV_BBX = PATH/'cpu_interstage_labels.csv'

## Testing: convert coords to string in CSV

In [89]:
df_bbx = pd.read_csv(PATH_CSV_BBX)
df_bbx.columns = ['id','x1','y1','x2','y2','class']
df_bbx.to_csv(PATH_CSV_BBX, index=False)  # note: overwrites the source CSV with the renamed header
df_bbx.head()

|   | id | x1 | y1 | x2 | y2 | class |
|---|----|----|----|----|----|-------|
| 0 | interstage_train/000000-000412/000000.jpg | 83 | 72 | 191 | 380 | pilot |
| 1 | interstage_train/000000-000412/000001.jpg | 52 | 89 | 204 | 381 | pilot |
| 2 | interstage_train/000000-000412/000002.jpg | 58 | 89 | 208 | 390 | pilot |
| 3 | interstage_train/000000-000412/000003.jpg | 66 | 98 | 214 | 388 | pilot |
| 4 | interstage_train/000000-000412/000004.jpg | 65 | 90 | 209 | 389 | pilot |


In [90]:
cols = ['x1','y1','x2','y2']
bbxs = df_bbx[cols].values  # (n, 4) array; each row is already a NumPy array
bbxs = [' '.join(str(o) for o in row) for row in bbxs]  # -> 'x1 y1 x2 y2' strings
bbxs[:10]

['83 72 191 380',
 '52 89 204 381',
 '58 89 208 390',
 '66 98 214 388',
 '65 90 209 389',
 '50 73 208 386',
 '51 74 197 382',
 '49 75 200 381',
 '48 71 203 376',
 '91 97 201 376']
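The list comprehensions above can be collapsed into one pandas expression. This is a minimal sketch on a made-up two-row frame. One thing worth confirming before training: the fastai (0.7) Pascal notebooks build their `bbox` strings in row/column order (`y1 x1 y2 x2`, via their `hw_bb` helper), and `TfmType.COORD` transforms treat the four numbers that way. If that holds for this pipeline, the column order would need swapping; the `fastai_order` flag below is a hypothetical switch for that, not part of any library.

```python
import pandas as pd

def bbox_strings(df, fastai_order=False):
    """Join four coordinate columns into one space-separated string per row.

    fastai_order=True (a hypothetical switch) emits 'y1 x1 y2 x2', the
    row/column order used in the fastai 0.7 Pascal notebooks; the default
    keeps this notebook's 'x1 y1 x2 y2' order.
    """
    cols = ['y1', 'x1', 'y2', 'x2'] if fastai_order else ['x1', 'y1', 'x2', 'y2']
    # astype(str) turns each int into its decimal text; apply(' '.join, axis=1)
    # then joins the four strings of each row
    return df[cols].astype(str).apply(' '.join, axis=1)

df = pd.DataFrame({'id': ['a.jpg', 'b.jpg'],
                   'x1': [83, 52], 'y1': [72, 89],
                   'x2': [191, 204], 'y2': [380, 381],
                   'class': ['pilot', 'pilot']})

print(bbox_strings(df).tolist())                     # ['83 72 191 380', '52 89 204 381']
print(bbox_strings(df, fastai_order=True).tolist())  # ['72 83 380 191', '89 52 381 204']
```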

In [91]:
df_bbx.columns

Index(['id', 'x1', 'y1', 'x2', 'y2', 'class'], dtype='object')

In [92]:
new_df_bbx = df_bbx[['id','class']].copy()  # explicit copy, so the insert below clearly doesn't touch df_bbx
new_df_bbx.insert(1, 'bbox', bbxs)
new_df_bbx.head()

|   | id | bbox | class |
|---|----|------|-------|
| 0 | interstage_train/000000-000412/000000.jpg | 83 72 191 380 | pilot |
| 1 | interstage_train/000000-000412/000001.jpg | 52 89 204 381 | pilot |
| 2 | interstage_train/000000-000412/000002.jpg | 58 89 208 390 | pilot |
| 3 | interstage_train/000000-000412/000003.jpg | 66 98 214 388 | pilot |
| 4 | interstage_train/000000-000412/000004.jpg | 65 90 209 389 | pilot |


In [93]:
new_ids = new_df_bbx['id']
new_ids = [o.split('interstage_train/')[-1] for o in new_ids]
new_df_bbx = new_df_bbx.drop(columns='id')
new_df_bbx.insert(0, 'id', new_ids)
new_df_bbx.head()

|   | id | bbox | class |
|---|----|------|-------|
| 0 | 000000-000412/000000.jpg | 83 72 191 380 | pilot |
| 1 | 000000-000412/000001.jpg | 52 89 204 381 | pilot |
| 2 | 000000-000412/000002.jpg | 58 89 208 390 | pilot |
| 3 | 000000-000412/000003.jpg | 66 98 214 388 | pilot |
| 4 | 000000-000412/000004.jpg | 65 90 209 389 | pilot |


In [14]:
new_df_bbx.to_csv(PATH/'class_bbox_labels.csv', index=False)
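The prefix strip can also be done with pandas string methods instead of a list comprehension plus drop/insert. A minimal sketch on made-up paths (the third one is hypothetical, included only to show that an id without the prefix passes through unchanged); `str.split(...).str[-1]` mirrors the `split(...)[-1]` used in the cell above:

```python
import pandas as pd

df = pd.DataFrame({'id': ['interstage_train/000000-000412/000000.jpg',
                          'interstage_train/000000-000412/000001.jpg',
                          '000413-000900/000413.jpg'],   # hypothetical, no prefix
                   'bbox': ['83 72 191 380', '52 89 204 381', '10 20 30 40'],
                   'class': ['pilot', 'pilot', 'pilot']})

# Overwriting the column keeps it in first position, so no drop/insert is needed
df['id'] = df['id'].str.split('interstage_train/').str[-1]

print(df['id'].tolist())
# ['000000-000412/000000.jpg', '000000-000412/000001.jpg', '000413-000900/000413.jpg']
```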

## Multi-Class CSV

In [4]:
multi_class_df = pd.read_csv(PATH/'class_bbox_labels.csv')

In [5]:
multi_class_df = multi_class_df.drop(columns='bbox')

In [6]:
multi_class_df.head()

|   | id | class |
|---|----|-------|
| 0 | 000000-000412/000000.jpg | pilot |
| 1 | 000000-000412/000001.jpg | pilot |
| 2 | 000000-000412/000002.jpg | pilot |
| 3 | 000000-000412/000003.jpg | pilot |
| 4 | 000000-000412/000004.jpg | pilot |


I know the 'pilot' class will be the first id (0); I'll worry about the others later. At this point what I really care about is a 'pilot' / 'no pilot' detector. In this rewrite of GLoC I'm transitioning from a 2-stage to a 1-stage detector/classifier.

In [11]:
# 'pilot' is the only class for now, so every row gets id 0
multi_class_df['class'] = 0
multi_class_df.head()

|   | id | class |
|---|----|-------|
| 0 | 000000-000412/000000.jpg | 0 |
| 1 | 000000-000412/000001.jpg | 0 |
| 2 | 000000-000412/000002.jpg | 0 |
| 3 | 000000-000412/000003.jpg | 0 |
| 4 | 000000-000412/000004.jpg | 0 |


In [12]:
multi_class_df.to_csv(PATH/'class_labels.csv', index=False)
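Hard-coding `0` for every row works while 'pilot' is the only class, but it would silently mislabel anything else once more classes show up. A minimal sketch of building `cat2id` / `id2cat` from the class column instead, with 'pilot' placed first so it keeps id 0; the 'background' id is then `len(id2cat)`, as in the fastai line discussed at the top. The extra class names here are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'id': ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg'],
                   'class': ['pilot', 'pilot', 'helmet', 'mask']})  # hypothetical classes

# 'pilot' first so it stays id 0; any other classes follow in sorted order
others = sorted(set(df['class']) - {'pilot'})
id2cat = ['pilot'] + others
cat2id = {cat: i for i, cat in enumerate(id2cat)}
background_id = len(id2cat)   # the extra 'end' class used for background

# Unknown classes would map to NaN here rather than silently becoming 0
df['class'] = df['class'].map(cat2id)

print(id2cat)                # ['pilot', 'helmet', 'mask']
print(df['class'].tolist())  # [0, 0, 1, 2]
print(background_id)         # 3
```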

## Multi-Coordinate CSV

In [4]:
multi_coord_df = pd.read_csv(PATH/'class_bbox_labels.csv')

In [6]:
multi_coord_df = multi_coord_df.drop(columns='class')

In [7]:
multi_coord_df.head()

|   | id | bbox |
|---|----|------|
| 0 | 000000-000412/000000.jpg | 83 72 191 380 |
| 1 | 000000-000412/000001.jpg | 52 89 204 381 |
| 2 | 000000-000412/000002.jpg | 58 89 208 390 |
| 3 | 000000-000412/000003.jpg | 66 98 214 388 |
| 4 | 000000-000412/000004.jpg | 65 90 209 389 |


In [9]:
multi_coord_df.to_csv(PATH/'bbox_labels.csv', index=False)
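As I understand the fastai pascal-multi approach, the bounding-box and class labels are loaded separately and then paired up by index, so the two CSVs must list the same ids in the same order. A minimal sanity-check sketch; `check_aligned` is a hypothetical helper, and the in-memory frames stand in for `pd.read_csv(PATH/'bbox_labels.csv')` and `pd.read_csv(PATH/'class_labels.csv')`:

```python
import pandas as pd

def check_aligned(bbox_df, class_df):
    """Raise ValueError if the two label tables don't list the same ids in the same order."""
    if len(bbox_df) != len(class_df):
        raise ValueError(f'row counts differ: {len(bbox_df)} vs {len(class_df)}')
    mismatch = bbox_df['id'].values != class_df['id'].values
    if mismatch.any():
        first = int(mismatch.argmax())  # index of the first differing row
        raise ValueError(f'ids differ at row {first}: '
                         f'{bbox_df["id"].iloc[first]!r} vs {class_df["id"].iloc[first]!r}')
    return True

bbox_df = pd.DataFrame({'id': ['000000-000412/000000.jpg', '000000-000412/000001.jpg'],
                        'bbox': ['83 72 191 380', '52 89 204 381']})
class_df = pd.DataFrame({'id': ['000000-000412/000000.jpg', '000000-000412/000001.jpg'],
                         'class': [0, 0]})

print(check_aligned(bbox_df, class_df))  # True
```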

---

If the '.jpg' suffix becomes an issue I'll just remove it.
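Removing it would be a one-liner with pandas string methods. A minimal sketch, assuming every id ends in `.jpg`; fastai 0.7's `ImageClassifierData.from_csv` also takes a `suffix` argument that is appended to each filename, which is presumably how the extension would be added back at load time.

```python
import pandas as pd

df = pd.DataFrame({'id': ['000000-000412/000000.jpg', '000000-000412/000001.jpg'],
                   'class': [0, 0]})

# Anchored regex: only a trailing '.jpg' is removed
df['id'] = df['id'].str.replace(r'\.jpg$', '', regex=True)

print(df['id'].tolist())  # ['000000-000412/000000', '000000-000412/000001']
```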