# Data Model for Object Detection
The task here is to prepare a TFRecord dataset that can be fed into the [object detection API of tensorflow](https://github.com/tensorflow/models/tree/master/research/object_detection). This notebook uses a subset of the [GSSS](https://datadryad.org/resource/doi:10.5061/dryad.5pt92) dataset that was used in this [paper](https://datadryad.org/resource/doi:10.5061/dryad.5pt92) by Schneider. <br>
I broke down the data model into the following steps:<br>
1. Database creation: consolidate the input data, which arrives in various formats, into a single JSON file.
2. Use this JSON file to create a TensorFlow record (TFRecord).
3. Validate the pipeline.


The detailed steps that I follow are:
1. Data Export : CSV (from the Panoptes API) -> JSON file
2. Data Import : JSON file -> Dictionary object 
3. Write TFRecord : Dictionary Object -> TFRecord file
4. Validate data in the TFRecord
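
As a rough illustration of steps 1 and 2 above, the sketch below groups bounding-box rows by image, serialises them to JSON, and reads the JSON back into a dictionary. The JSON layout (`width`, `height`, `boxes`) and the helper names `boxes_csv_to_json` and `load_records` are assumptions made for this example, not the notebook's actual schema; the column names and sample rows are taken from the bounding-box CSV loaded later in this notebook.

```python
import csv
import io
import json
from collections import defaultdict


def boxes_csv_to_json(csv_text):
    """Step 1 (sketch): group bounding-box rows by image and return JSON text."""
    # Hypothetical layout: one entry per image filename, holding its size and boxes
    records = defaultdict(lambda: {"width": None, "height": None, "boxes": []})
    for row in csv.DictReader(io.StringIO(csv_text)):
        rec = records[row["filename"]]
        rec["width"] = int(row["width"])
        rec["height"] = int(row["height"])
        rec["boxes"].append({
            "class": row["class"],
            "xmin": int(row["xmin"]), "ymin": int(row["ymin"]),
            "xmax": int(row["xmax"]), "ymax": int(row["ymax"]),
        })
    return json.dumps(records, indent=2)


def load_records(json_text):
    """Step 2 (sketch): JSON text -> plain dictionary keyed by image filename."""
    return json.loads(json_text)


sample_csv = """filename,width,height,class,xmin,ymin,xmax,ymax
ASG000dz24.jpg,2048,1536,Impala,1141,883,1227,977
ASG000dz24.jpg,2048,1536,Impala,1340,876,1381,925
ASG000c7hr.jpg,2048,1536,Wildebeest,1987,680,2048,751
"""

records = load_records(boxes_csv_to_json(sample_csv))
print(len(records["ASG000dz24.jpg"]["boxes"]))  # number of boxes stored for this image
```

Keying the dictionary by image keeps all boxes of one image together, which is convenient when one TFRecord example is written per image.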

## Importing necessary packages

In [13]:
import csv, os, sys
import operator
import tensorflow as tf
import json
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches
sys.path.append('/home/ubuntu/data/tensorflow/my_workspace/camera-trap-detection/data/')
from utils import dataset_util
# Let PIL load truncated JPEG files instead of raising an error while decoding them
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

In [14]:
# Earlier locations: "/home/rai00007/Desktop/zooniverse/"; original data: "/data/lucifer1.2/users/rai00007/"
Project_filepath = "/home/ubuntu/data/tensorflow/my_workspace/training_demo/"

In [17]:
df_schneider_box = pd.read_csv(Project_filepath + 'Data/GoldStandardBoundBoxCoord.csv')
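# Unique image filenames that have gold-standard boxes
schneider_events = list(set(df_schneider_box['filename']))
# Drop the file extension to get the capture event ID (e.g. 'ASG000dz24.jpg' -> 'ASG000dz24')
schneider_events = [word.split('.')[0] for word in schneider_events]
len(schneider_events)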
df_schneider_box.head()

Unnamed: 0,filename,width,height,class,xmin,ymin,xmax,ymax
0,ASG000dz24.jpg,2048,1536,Impala,1141,883,1227,977
1,ASG000dz24.jpg,2048,1536,Impala,1340,876,1381,925
2,ASG000dz24.jpg,2048,1536,Impala,1448,803,1538,1042
3,ASG000dz24.jpg,2048,1536,Impala,1382,763,1485,1080
4,ASG000c7hr.jpg,2048,1536,Wildebeest,1987,680,2048,751
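
The boxes in this table are given in pixels, whereas the TFRecord examples consumed by the Object Detection API store box corners as fractions of the image width and height. A minimal sketch of that conversion is shown below; `normalize_box` is a hypothetical helper written for this example, and the clipping to [0, 1] is an added safeguard rather than something the source data is known to need.

```python
def normalize_box(xmin, ymin, xmax, ymax, width, height):
    """Convert pixel box corners to fractions of the image size, clipped to [0, 1]."""
    def clip(value):
        return min(max(value, 0.0), 1.0)

    return (clip(xmin / width), clip(ymin / height),
            clip(xmax / width), clip(ymax / height))


# Last row of the table above: a Wildebeest box that touches the right image edge
box = normalize_box(1987, 680, 2048, 751, width=2048, height=1536)
print(box)
```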


In [23]:
df_all_images = pd.read_csv(Project_filepath + 'Data/all_images.csv')
df_all_images = df_all_images[df_all_images['CaptureEventID'].isin(schneider_events)]
print(df_all_images.shape)
df_all_images.head()

(10597, 2)


Unnamed: 0,CaptureEventID,URL_Info
1377257,ASG000c6uw,S4/B03/B03_R1/S4_B03_R1_IMAG0137.JPG
1377258,ASG000c6uw,S4/B03/B03_R1/S4_B03_R1_IMAG0138.JPG
1377259,ASG000c6uw,S4/B03/B03_R1/S4_B03_R1_IMAG0139.JPG
1381262,ASG000c6x1,S4/B03/B03_R1/S4_B03_R1_IMAG4142.JPG
1381263,ASG000c6x1,S4/B03/B03_R1/S4_B03_R1_IMAG4143.JPG


In [41]:
# Keep one image per capture event; .copy() gives an independent frame, so adding a column does not raise SettingWithCopyWarning
df = df_all_images.drop_duplicates(subset='CaptureEventID', keep='first').copy()
df['URL_info_full'] = 'https://snapshotserengeti.s3.msi.umn.edu/' + df['URL_Info'].astype(str)
df.iloc[1]


CaptureEventID                                           ASG000c6x1
URL_Info                       S4/B03/B03_R1/S4_B03_R1_IMAG4142.JPG
URL_info_full     https://snapshotserengeti.s3.msi.umn.edu/S4/B0...
Name: 1381262, dtype: object

**Download the images**

In [42]:
import os, sys, random, ssl
import urllib, urllib.request

In [43]:
def get_images_from_url(dataset, image_name_index, url_col_index, outpath):
    """Download one image per row of `dataset` and save it as <outpath><image name>.jpg."""
    # Disable HTTPS certificate verification unless PYTHONHTTPSVERIFY is set
    if (not os.environ.get('PYTHONHTTPSVERIFY', '') and
        getattr(ssl, '_create_unverified_context', None)):
        ssl._create_default_https_context = ssl._create_unverified_context

    # The loop sits outside the SSL check so that downloads run in either case
    for i in range(dataset.shape[0]):
        image_name = dataset.iloc[i, image_name_index]
        url = dataset.iloc[i, url_col_index]

        print('Processing image: %d' % i)

        urllib.request.urlretrieve(url, outpath + '{0}.jpg'.format(image_name))
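
Downloading more than a thousand images in one pass can be interrupted by a single failed request. The sketch below shows one possible way to make the download resumable: it skips files that already exist and collects failures instead of stopping. `download_missing` and its `fetch` parameter are hypothetical names introduced for this example; `fetch` defaults to `urllib.request.urlretrieve` and can be swapped for a stub when testing without network access.

```python
import os
import urllib.request


def download_missing(items, outdir, fetch=urllib.request.urlretrieve):
    """Download (name, url) pairs to <outdir>/<name>.jpg, skipping existing files.

    Returns the names whose download failed so they can be retried later.
    """
    os.makedirs(outdir, exist_ok=True)
    failed = []
    for name, url in items:
        target = os.path.join(outdir, '{0}.jpg'.format(name))
        if os.path.exists(target):
            continue  # already downloaded in an earlier run
        try:
            fetch(url, target)
        except OSError as err:  # urllib.error.URLError is a subclass of OSError
            print('Failed to download %s: %s' % (name, err))
            failed.append(name)
    return failed
```

With the frame built earlier, a call could look like `failed = download_missing(zip(df['CaptureEventID'], df['URL_info_full']), outpath)`; rerunning the same call afterwards only fetches the images that are still missing.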

In [45]:
outpath = '../data/LILA/schneider_images/'

In [None]:
get_images_from_url(df, 0, 2, outpath)

Processing image: 0
Processing image: 1
Processing image: 2
...
Processing image: 1159
Processing image: 1160
Processing image: 1161
...