# Data Model for Object Detection
The task here is to prepare a TFRecord dataset that can be fed into the [object detection API of TensorFlow](https://github.com/tensorflow/models/tree/master/research/object_detection). This notebook uses a subset of the [GSSS](https://datadryad.org/resource/doi:10.5061/dryad.5pt92) dataset that was used in this [paper](https://datadryad.org/resource/doi:10.5061/dryad.5pt92) by Schneider. <br> 
I broke the data model down into the following steps:<br>
1. Database creation: consolidating the input data, which arrives in various formats, into a single JSON file.
2. Using this JSON file to create a TFRecord file.
3. Validating the pipeline.


The detailed steps that I follow are:
1. Data Export : CSV (from the panoptes API) -> JSON file
2. Data Import : JSON file -> Dictionary object 
3. Write TFRecord : Dictionary Object -> TFRecord file
4. Validate data in the TFRecord
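
Steps 1 and 2 above can be sketched roughly as follows. This is only a minimal illustration, not the notebook's actual schema: the helper name `csv_to_records` and the JSON layout (one entry per image filename holding its size and a list of boxes) are assumptions, and the sample rows mirror the bounding-box CSV loaded later in this notebook.

```python
import csv
import io
import json
from collections import defaultdict


def csv_to_records(csv_text):
    """Group bounding-box rows by image filename (hypothetical layout)."""
    records = defaultdict(lambda: {"boxes": []})
    for row in csv.DictReader(io.StringIO(csv_text)):
        rec = records[row["filename"]]
        rec["width"] = int(row["width"])
        rec["height"] = int(row["height"])
        rec["boxes"].append({
            "class": row["class"],
            "xmin": int(row["xmin"]), "ymin": int(row["ymin"]),
            "xmax": int(row["xmax"]), "ymax": int(row["ymax"]),
        })
    return dict(records)


sample = """filename,width,height,class,xmin,ymin,xmax,ymax
ASG000dz24.jpg,2048,1536,Impala,1141,883,1227,977
ASG000dz24.jpg,2048,1536,Impala,1340,876,1381,925
ASG000c7hr.jpg,2048,1536,Wildebeest,1987,680,2048,751
"""

# Step 1: CSV -> JSON text. Step 2: JSON text -> dictionary object.
json_text = json.dumps(csv_to_records(sample), indent=2)
database = json.loads(json_text)
print(len(database["ASG000dz24.jpg"]["boxes"]))  # prints 2
```

Keying the dictionary by filename keeps all boxes for one image together, which is convenient when each image later becomes a single TFRecord example.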

## Importing necessary packages

In [13]:
import csv, os, sys
import operator
import json
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
import matplotlib.patches as patches
# Make the project's local `utils` package importable
sys.path.append('/home/ubuntu/data/tensorflow/my_workspace/camera-trap-detection/data/')
from utils import dataset_util
# Let PIL load truncated JPEGs instead of raising an error while decoding
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

In [14]:
# Earlier locations: "/home/rai00007/Desktop/zooniverse/"; original data: "/data/lucifer1.2/users/rai00007/"
Project_filepath = "/home/ubuntu/data/tensorflow/my_workspace/training_demo/"

In [17]:
df_schneider_box = pd.read_csv(Project_filepath + 'Data/GoldStandardBoundBoxCoord.csv')
# Unique capture-event IDs: the image filenames with the extension stripped
schneider_events = list(set(df_schneider_box['filename']))
schneider_events = [word.split('.')[0] for word in schneider_events]
len(schneider_events)
df_schneider_box.head()

Unnamed: 0,filename,width,height,class,xmin,ymin,xmax,ymax
0,ASG000dz24.jpg,2048,1536,Impala,1141,883,1227,977
1,ASG000dz24.jpg,2048,1536,Impala,1340,876,1381,925
2,ASG000dz24.jpg,2048,1536,Impala,1448,803,1538,1042
3,ASG000dz24.jpg,2048,1536,Impala,1382,763,1485,1080
4,ASG000c7hr.jpg,2048,1536,Wildebeest,1987,680,2048,751
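
The boxes above are in pixel coordinates. As far as I know, the object detection API expects box coordinates in a TFRecord as fractions of the image width and height, so a conversion along these lines is needed before writing the records. The helper name `normalize_box` is hypothetical; the values come from the rows shown above.

```python
def normalize_box(box, width, height):
    """Convert pixel corners (xmin, ymin, xmax, ymax) to fractions of image size."""
    xmin, ymin, xmax, ymax = box
    return (xmin / width, ymin / height, xmax / width, ymax / height)


# First Impala row and the Wildebeest row from the table above
impala = normalize_box((1141, 883, 1227, 977), 2048, 1536)
wildebeest = normalize_box((1987, 680, 2048, 751), 2048, 1536)
print(impala)
print(wildebeest)
```

The Wildebeest box touches the right edge of the image (`xmax` equals the width), so its normalized `xmax` is exactly 1.0; any value outside the 0 to 1 range would indicate a bad annotation.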


In [23]:
df_all_images = pd.read_csv(Project_filepath + 'Data/all_images.csv')
df_all_images = df_all_images[df_all_images['CaptureEventID'].isin(schneider_events)]
print(df_all_images.shape)
df_all_images.head()

(10597, 2)


Unnamed: 0,CaptureEventID,URL_Info
1377257,ASG000c6uw,S4/B03/B03_R1/S4_B03_R1_IMAG0137.JPG
1377258,ASG000c6uw,S4/B03/B03_R1/S4_B03_R1_IMAG0138.JPG
1377259,ASG000c6uw,S4/B03/B03_R1/S4_B03_R1_IMAG0139.JPG
1381262,ASG000c6x1,S4/B03/B03_R1/S4_B03_R1_IMAG4142.JPG
1381263,ASG000c6x1,S4/B03/B03_R1/S4_B03_R1_IMAG4143.JPG


In [41]:
# Keep one image per capture event; .copy() avoids pandas' SettingWithCopyWarning
df = df_all_images.drop_duplicates(subset='CaptureEventID', keep='first').copy()
df['URL_info_full'] = 'https://snapshotserengeti.s3.msi.umn.edu/' + df['URL_Info'].astype(str)
df.iloc[1]


CaptureEventID                                           ASG000c6x1
URL_Info                       S4/B03/B03_R1/S4_B03_R1_IMAG4142.JPG
URL_info_full     https://snapshotserengeti.s3.msi.umn.edu/S4/B0...
Name: 1381262, dtype: object

**Download the images**

In [42]:
import os, sys, random, ssl
import urllib, urllib.request

In [43]:
def get_images_from_url(dataset, image_name_index, url_col_index, outpath):
    # Disable HTTPS certificate verification unless PYTHONHTTPSVERIFY is set.
    # This weakens security and is only a workaround for this image host.
    if (not os.environ.get('PYTHONHTTPSVERIFY', '') and
            getattr(ssl, '_create_unverified_context', None)):
        ssl._create_default_https_context = ssl._create_unverified_context

    # Download every row's URL to <outpath>/<image name>.jpg.
    # The loop sits outside the SSL check so it runs either way.
    for i in range(dataset.shape[0]):
        row = dataset.iloc[i]
        print('Processing image: %d' % i)
        target = os.path.join(outpath, '{0}.jpg'.format(row.iloc[image_name_index]))
        urllib.request.urlretrieve(row.iloc[url_col_index], target)
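
With roughly four thousand images, a single failed request would abort the whole loop above. A more forgiving variant might skip files that already exist and collect failures for a retry. The sketch below is an assumption-laden illustration rather than the code used in this notebook: `download_missing` and its `fetch` parameter are hypothetical names, with `fetch` defaulting to `urllib.request.urlretrieve`.

```python
import os
import urllib.error
import urllib.request


def download_missing(items, outdir, fetch=urllib.request.urlretrieve):
    """Download (name, url) pairs into outdir, skipping files that already exist.

    Returns the names that failed so they can be retried later.
    """
    os.makedirs(outdir, exist_ok=True)
    failed = []
    for name, url in items:
        target = os.path.join(outdir, '{0}.jpg'.format(name))
        if os.path.exists(target):
            continue  # already downloaded on a previous run
        try:
            fetch(url, target)
        except (urllib.error.URLError, OSError) as err:
            print('Failed to download %s: %s' % (name, err))
            failed.append(name)
    return failed
```

It could be called as `download_missing(zip(df['CaptureEventID'], df['URL_info_full']), outpath)`, and rerunning it after an interruption only fetches the images that are still missing.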

In [45]:
outpath = '../data/LILA/schneider_images/'

In [47]:
get_images_from_url(df, 0, 2, outpath)

Processing image: 0
Processing image: 1
Processing image: 2
...
Processing image: 4010


In [48]:
pwd

'/home/ubuntu/data/tensorflow/my_workspace/camera-trap-detection/data_prep'