## nested image classification
## Data processing part 1

This notebook documents the first part of the data processing for the image dataset provided by nested.

In [2]:
# import libraries and nested_utilities
import nested_utilities as nutil
import os
import shutil
import pandas as pd


# create a timestamped directory for this processing activity
# and for the image catalogues it produces
out_dir = os.path.join('./data_catalogues', 'processing_' + nutil.timestamp())
os.makedirs(out_dir)  # also creates ./data_catalogues/ if it does not exist yet
#out_dir
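`nutil.timestamp()` itself is not shown in this notebook. As a hypothetical sketch only, a helper that produces a sortable, filesystem-safe timestamp and creates the processing directory could look like this; the names `make_timestamp` and `make_processing_dir` and the `%Y%m%d_%H%M%S` format are assumptions, not taken from `nested_utilities.py`.

```python
import os
from datetime import datetime


def make_timestamp(now=None):
    """Hypothetical stand-in for nutil.timestamp(): 'YYYYMMDD_HHMMSS'.

    A zero-padded, largest-unit-first format sorts chronologically as text.
    """
    if now is None:
        now = datetime.now()
    return now.strftime('%Y%m%d_%H%M%S')


def make_processing_dir(root='./data_catalogues', now=None):
    """Create root/processing_<timestamp> (and root itself if needed)."""
    out_dir = os.path.join(root, 'processing_' + make_timestamp(now))
    os.makedirs(out_dir)  # raises FileExistsError rather than reusing a directory
    return out_dir
```

Calling `out_dir = make_processing_dir()` would then replace the two-step join-and-create above.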

Most of the code routines used in this notebook are in `nested_utilities.py`.

### 1. Initial 'source_data' transfer, directory rearrangement and non-image file deletion 

This step takes the raw data, as provided in:

`tagged-property-images-20180304/` (read here from `./archive/`)

and moves it into:

`data/base_data_NEW/`

The directory

`./data/`

needs to exist before this code is run.

In [2]:
# basic initial configuration, all done in ./data/base_data_NEW/:
# - moves the source directory and renames the Misc directories (to misc_int and misc_ext)
# - creates the 'uncertain' and 'graphic' directories
# - moves the room directories out of interior/ and exterior/, then deletes those two
# - removes non-.jpg files


source_path = './archive/tagged-property-images-20180304'
nutil.configure_base_directory(source_path)

Moving directory
Renaming misc directories
Creating 'uncertain' & 'graphic' directory
Move room directories out of /interior & /exterior
Deleting /interior and /exterior
Deleting non *.jpg files
Removed 20328 non .jpg files
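The last step reported above removes every file that is not a `.jpg`. A minimal sketch of that step, not the actual `nutil` code, is below; the helper name `remove_non_jpg` is hypothetical, and treating the extension case-insensitively (so `IMG.JPG` is kept) is an assumption.

```python
from pathlib import Path


def remove_non_jpg(root):
    """Delete every file under `root` whose suffix is not .jpg.

    Hypothetical sketch: the suffix check is case-insensitive.
    Returns the number of files removed.
    """
    removed = 0
    # materialise the listing first so nothing is deleted mid-scan
    for path in list(Path(root).rglob('*')):
        if path.is_file() and path.suffix.lower() != '.jpg':
            path.unlink()
            removed += 1
    return removed
```

Returning the count makes it easy to print a summary line like the one above.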


To avoid deleting data by mistake, any existing `base_data/` directory is renamed to `base_data_OLD/` rather than removed, and the newly created directory is then promoted to `base_data/`.

In [3]:
# promotes 'base_data_NEW' to 'base_data', and renames any old 'base_data' to 'base_data_OLD'

nutil.promote_new_base_dir('./data/base_data_NEW/')
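The promote step can be sketched as two renames. This is a hypothetical illustration, not `nutil.promote_new_base_dir` itself: the name `promote_dir` is invented, and refusing to run when `base_data_OLD` already exists is an assumed safety choice that keeps the "never delete" guarantee. Note the `normpath` call, which strips the trailing slash so that `'./data/base_data/'` becomes `./data/base_data_OLD` rather than `./data/base_data/_OLD`.

```python
import os


def promote_dir(new_dir, target_dir, old_suffix='_OLD'):
    """Hypothetical sketch of a 'promote' step.

    Renames an existing target_dir to target_dir + old_suffix, then renames
    new_dir to target_dir. Refuses to run if the backup name is already
    taken, so nothing is ever overwritten or deleted.
    """
    new_dir = os.path.normpath(new_dir)
    target_dir = os.path.normpath(target_dir)
    backup_dir = target_dir + old_suffix
    if not os.path.isdir(new_dir):
        raise FileNotFoundError(new_dir)
    if os.path.exists(target_dir):
        if os.path.exists(backup_dir):
            raise FileExistsError(backup_dir)
        os.rename(target_dir, backup_dir)
    os.rename(new_dir, target_dir)
    return target_dir
```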

### 2. Catalogue `./data/base_data/`, find any duplicates, remove them, and re-catalogue

The first task is to produce a 'starting point' catalogue of the data using
`nutil.build_catalogue`, and then to check it for duplicates.

In [4]:
# catalogues the base_data folders (one row per image)
base_data_cat = nutil.build_catalogue('./data/base_data/')
#base_data_cat.to_csv(os.path.join(out_dir, 'base_data_with_dupes.csv'))
print(base_data_cat.shape)
base_data_cat.info()  # info() prints its own summary and returns None

base_data_cat.sample(12)

|  | id | room | filename | image_path |
|--:|----|------|----------|------------|
| 4852 | 0890449e472ae76b7a96c825297d327e368acc7a | bedroom | 0890449e472ae76b7a96c825297d327e368acc7a.jpg | ./data/base_data/bedroom/0890449e472ae76b7a96c... |
| 7946 | 0836501e8728f574bd5a10a35b2458681f406672 | empty | 0836501e8728f574bd5a10a35b2458681f406672.jpg | ./data/base_data/empty/0836501e8728f574bd5a10a... |
| 1133 | 0760416ee502259aa4abb3c284e198f97e71225d | bathroom | 0760416ee502259aa4abb3c284e198f97e71225d.jpg | ./data/base_data/bathroom/0760416ee502259aa4ab... |
| 3617 | 06aaf072af5b92d42618178e5c8cadc54f50d750 | bedroom | 06aaf072af5b92d42618178e5c8cadc54f50d750.jpg | ./data/base_data/bedroom/06aaf072af5b92d426181... |
| 2418 | 01c58f202b165f2c9e493e1a7c0d35db334ec089 | bedroom | 01c58f202b165f2c9e493e1a7c0d35db334ec089.jpg | ./data/base_data/bedroom/01c58f202b165f2c9e493... |
| 10565 | 116455cf80090879359be9a55254bf3ac37b5059 | front | 116455cf80090879359be9a55254bf3ac37b5059.jpg | ./data/base_data/front/116455cf80090879359be9a... |
| 699 | 0652b6d568ff3422b6f9d73c9c795bba20b83030 | bathroom | 0652b6d568ff3422b6f9d73c9c795bba20b83030.jpg | ./data/base_data/bathroom/0652b6d568ff3422b6f9... |
| 1788 | 08ea1f59c0af861c5576323b6734455d9e99c53a | bathroom | 08ea1f59c0af861c5576323b6734455d9e99c53a.jpg | ./data/base_data/bathroom/08ea1f59c0af861c5576... |
| 5422 | 205acf42212711a77c682ba6541622ba932da0b9 | bedroom | 205acf42212711a77c682ba6541622ba932da0b9.jpg | ./data/base_data/bedroom/205acf42212711a77c682... |
| 12928 | 057755d7cbb36e42ec45438f955c04f1215d48e3 | kitchen | 057755d7cbb36e42ec45438f955c04f1215d48e3.jpg | ./data/base_data/kitchen/057755d7cbb36e42ec454... |

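The catalogue has one row per image, with the room taken from the folder the image sits in. A hypothetical re-implementation that reproduces the column layout shown above is sketched below; it is not the code in `nested_utilities.py`, and the sorted walk order and deriving `id` as the filename without its extension are assumptions inferred from the output.

```python
import os

import pandas as pd


def build_catalogue_sketch(base_dir):
    """Hypothetical re-implementation of the catalogue format shown above.

    One row per .jpg file in each immediate sub-directory of base_dir:
    id (filename without extension), room (sub-directory name),
    filename and image_path.
    """
    rows = []
    for room in sorted(os.listdir(base_dir)):
        room_dir = os.path.join(base_dir, room)
        if not os.path.isdir(room_dir):
            continue
        for filename in sorted(os.listdir(room_dir)):
            if filename.lower().endswith('.jpg'):
                rows.append({'id': os.path.splitext(filename)[0],
                             'room': room,
                             'filename': filename,
                             'image_path': os.path.join(base_dir, room, filename)})
    return pd.DataFrame(rows, columns=['id', 'room', 'filename', 'image_path'])
```

With `base_dir='./data/base_data/'`, `os.path.join` yields paths of the form `./data/base_data/bedroom/<id>.jpg`, matching the `image_path` column.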

In [8]:
# finds any duplicates
id_val_counts = base_data_cat['id'].value_counts()
duplicates = id_val_counts.loc[id_val_counts > 1]
duplicates

02a55bf3e3f81df999b6afde2c1cecdebeaca619    2
00b2c17a73354413bbaf825a929b1263d63000c9    2
Name: id, dtype: int64
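The `value_counts` approach returns the duplicated ids, which then need a second lookup to get the rows. `DataFrame.duplicated` with `keep=False` flags every copy directly, so one call yields the catalogue rows. A short sketch; the helper name `find_duplicate_ids` is hypothetical.

```python
def find_duplicate_ids(catalogue):
    """Return every catalogue row whose id appears more than once.

    keep=False marks *all* copies as duplicates (not just the later ones),
    so each duplicated image shows up once per folder it appears in.
    """
    mask = catalogue.duplicated(subset='id', keep=False)
    return catalogue.loc[mask].sort_values(['id', 'room']).reset_index(drop=True)
```

Applied to `base_data_cat`, this should give the same four rows as `duplicates_df` below, sorted by id and then room.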

In [10]:
duplicates_df = (base_data_cat.loc[base_data_cat['id'].isin(duplicates.index)]
                 .reset_index(drop=True))
duplicates_df.to_csv(os.path.join(out_dir, 'duplicate_data.csv'))

duplicates_df

|  | id | room | filename | image_path |
|--:|----|------|----------|------------|
| 0 | 00b2c17a73354413bbaf825a929b1263d63000c9 | conservatory | 00b2c17a73354413bbaf825a929b1263d63000c9.jpg | ./data/base_data/conservatory/00b2c17a73354413... |
| 1 | 00b2c17a73354413bbaf825a929b1263d63000c9 | diningroom | 00b2c17a73354413bbaf825a929b1263d63000c9.jpg | ./data/base_data/diningroom/00b2c17a73354413bb... |
| 2 | 02a55bf3e3f81df999b6afde2c1cecdebeaca619 | entrance | 02a55bf3e3f81df999b6afde2c1cecdebeaca619.jpg | ./data/base_data/entrance/02a55bf3e3f81df999b6... |
| 3 | 02a55bf3e3f81df999b6afde2c1cecdebeaca619 | front | 02a55bf3e3f81df999b6afde2c1cecdebeaca619.jpg | ./data/base_data/front/02a55bf3e3f81df999b6afd... |


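Each duplicated image sits in two room folders. After checking the images, keep the more appropriate label and delete the other copy: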

In [11]:
# '00b2c17a73354413bbaf825a929b1263d63000c9.jpg' is a conservatory
# '02a55bf3e3f81df999b6afde2c1cecdebeaca619.jpg' is a 'front' (though uncertain ...)
# so delete the copies filed under the other label (rows 1 and 2 of duplicates_df)

idx_to_delete = [1, 2]
path_to_delete = duplicates_df.loc[idx_to_delete, 'image_path']

for file in path_to_delete:
    try:
        os.remove(file)
        print('deleted file {}'.format(file))
    except OSError as err:
        # report rather than silently ignore (e.g. the file was already removed)
        print('could not delete {}: {}'.format(file, err))

deleted file ./data/base_data/diningroom/00b2c17a73354413bbaf825a929b1263d63000c9.jpg
deleted file ./data/base_data/entrance/02a55bf3e3f81df999b6afde2c1cecdebeaca619.jpg
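Before deleting one copy of a duplicated image, as above, it can be worth confirming that the two files really are the same image rather than two different photos that happen to share a name. A minimal sketch compares SHA-1 digests of the file contents using the standard-library `hashlib`; the helper names are hypothetical, and whether the 40-character ids in this dataset are themselves content hashes is not stated, so the sketch does not rely on it.

```python
import hashlib


def file_sha1(path, chunk_size=1 << 20):
    """SHA-1 hex digest of a file's contents, read in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(*paths):
    """True if all the given files have identical contents (by SHA-1)."""
    return len({file_sha1(p) for p in paths}) == 1
```

For example, `same_contents(*duplicates_df.loc[duplicates_df['id'] == some_id, 'image_path'])` would check one duplicated id (`some_id` is a placeholder).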


Now re-catalogue and re-check for duplicates to make sure they have all been removed.

In [19]:
# re-catalogue the base_data folders now that the duplicates are gone
base_data_cat = nutil.build_catalogue('./data/base_data/')
#base_data_cat.to_csv(os.path.join(out_dir, 'base_data_no_dupes.csv'))

base_data_cat.sample(12)


|  | id | room | filename | image_path |
|--:|----|------|----------|------------|
| 11458 | 0650295c6e572739cba1250c3507e5f1ff084b1d | garden | 0650295c6e572739cba1250c3507e5f1ff084b1d.jpg | ./data/base_data/garden/0650295c6e572739cba125... |
| 13953 | 078c93fe515a57bb57038456ef3b0a5d8e3f1306 | kitchen | 078c93fe515a57bb57038456ef3b0a5d8e3f1306.jpg | ./data/base_data/kitchen/078c93fe515a57bb57038... |
| 2974 | 05aedea91252c032233e89b398fe97263aeb46e6 | bedroom | 05aedea91252c032233e89b398fe97263aeb46e6.jpg | ./data/base_data/bedroom/05aedea91252c032233e8... |
| 3024 | 05bc8fb5610c1ef2b314bf7629be33bb0c04c146 | bedroom | 05bc8fb5610c1ef2b314bf7629be33bb0c04c146.jpg | ./data/base_data/bedroom/05bc8fb5610c1ef2b314b... |
| 2883 | 058ec08f35ca491d215e1bbbb0f8bdcdcba873e5 | bedroom | 058ec08f35ca491d215e1bbbb0f8bdcdcba873e5.jpg | ./data/base_data/bedroom/058ec08f35ca491d215e1... |
| 12035 | 082e735cacbe58155d55fa952a2b0df42f68ef79 | garden | 082e735cacbe58155d55fa952a2b0df42f68ef79.jpg | ./data/base_data/garden/082e735cacbe58155d55fa... |
| 3690 | 06cadd9f86fb6635deb1441f9c0b28ca8b623a79 | bedroom | 06cadd9f86fb6635deb1441f9c0b28ca8b623a79.jpg | ./data/base_data/bedroom/06cadd9f86fb6635deb14... |
| 17717 | 24566d3cbf900613fc6f0d3f512c5b51f6f60ae9 | livingroom | 24566d3cbf900613fc6f0d3f512c5b51f6f60ae9.jpg | ./data/base_data/livingroom/24566d3cbf900613fc... |
| 14519 | 08d14066227d4ff966d0799c46a2849e71ef4c53 | kitchen | 08d14066227d4ff966d0799c46a2849e71ef4c53.jpg | ./data/base_data/kitchen/08d14066227d4ff966d07... |
| 11648 | 06de16bd2385b549c698dca1dad3763519601c83 | garden | 06de16bd2385b549c698dca1dad3763519601c83.jpg | ./data/base_data/garden/06de16bd2385b549c698dc... |


In [24]:
# checks on duplicates remaining
# should return an empty series

id_val_counts_2 = base_data_cat['id'].value_counts()
id_val_counts_2.loc[id_val_counts_2 > 1]


Series([], Name: id, dtype: int64)
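Displaying an empty Series relies on someone reading the output. A hypothetical helper that raises instead would stop a re-run of the notebook at the first duplicate; the name `check_unique_ids` is an assumption, and pandas' `catalogue['id'].is_unique` gives the same yes/no answer without naming the offenders.

```python
def check_unique_ids(catalogue, column='id'):
    """Raise ValueError naming every value of `column` that occurs more than once."""
    counts = catalogue[column].value_counts()
    repeated = sorted(counts[counts > 1].index)
    if repeated:
        raise ValueError('duplicate {} values: {}'.format(column, repeated))
```

Usage: `check_unique_ids(base_data_cat)` passes silently on the de-duplicated catalogue.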

In [25]:
print(base_data_cat.shape)
base_data_cat.info()  # info() prints directly; wrapping it in print() also printed 'None'

(20529, 4)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20529 entries, 0 to 20528
Data columns (total 4 columns):
id            20529 non-null object
room          20529 non-null object
filename      20529 non-null object
image_path    20529 non-null object
dtypes: object(4)
memory usage: 641.6+ KB


### 3. MANUAL STEP

This is the first step of manual sorting: review each classification folder in turn and move any image that is wrongly labelled, inappropriate or questionable into a holding folder called `uncertain/`.


- bathroom: DONE
- bedroom: DONE
- carpark: DONE
- conservatory: DONE
- diningroom: DONE. Note: lots of ambiguous images in diningroom. Aimed for photos where the principal focus was an area for sitting and eating.
- empty: DONE
- entrance: DONE. Again, quite a lot of ambiguity, and a lot of variability.
- front: DONE. Note: not sure how to handle more 'detail' pictures of front doors (front or misc_ext); tbd.
- garden: DONE. Only a few changes.
- kitchen: DONE. Quite a few changes, mainly around kitchen / dining room ambiguity.
- livingroom: DONE. Some changes: some mislabelling (from bedrooms) and some ambiguity (typically from diningroom or bedroom).
- misc_ext: DONE. A lot of mislabelling: lots of 'graphic', lots of 'garden' and a few 'carpark'. Also quite a lot of ambiguity between some images and 'front', and quite a few misc_int.
- misc_int: DONE. Again, a lot of mislabelling: lots of graphics, lots of exterior images and a few mislabelled rooms.
- rear: DONE. A lot of ambiguity here; it is hard to distinguish between garden / misc_ext / carpark and rear. Tried to focus 'rear' on photos mainly of the rear of the property (commonly taken from the garden).
- study: DONE. Quite a lot of changes, since 'study' is open to interpretation. Tended to restrict it to images with a clear presence of a desk and chair (or equivalent), i.e. a room functioning as a working study.

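This sorting was done by hand. If it were scripted, a hypothetical helper such as `move_to_uncertain` (not part of `nutil`) could move one image into the holding folder while refusing to overwrite a file of the same name, so that an image already in `uncertain/` is never silently replaced or renamed.

```python
import os
import shutil


def move_to_uncertain(image_path, base_dir, holding='uncertain'):
    """Hypothetical helper: move one image into base_dir/<holding>/.

    Raises FileExistsError instead of overwriting an existing file.
    Returns the destination path.
    """
    dest_dir = os.path.join(base_dir, holding)
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(image_path))
    if os.path.exists(dest):
        raise FileExistsError(dest)
    shutil.move(image_path, dest)
    return dest
```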
### 4. WHEN MANUAL SORTING COMPLETE 

Re-catalogue with the images now in the `uncertain/` folder, filter the DataFrame down to **just** the uncertain images, and then merge in data from the original catalogue to show their **original** room classification.

Then handle a couple of resulting duplication issues (see cell comments).

In [64]:
uncertain_data_cat = nutil.build_catalogue('./data/base_data/')
#uncertain_data_cat.to_csv(os.path.join(out_dir, '090318_uncertain_sort_X.csv'))

In [65]:
# filter down to JUST uncertain rooms
uncertain_data_cat = uncertain_data_cat.loc[uncertain_data_cat['room'] == 'uncertain']
uncertain_data_cat.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 485 entries, 20154 to 20638
Data columns (total 4 columns):
id            485 non-null object
room          485 non-null object
filename      485 non-null object
image_path    485 non-null object
dtypes: object(4)
memory usage: 18.9+ KB


In [87]:
# then merge (on id and filename) with base_data_cat to get the 'original' room

uncert_with_base = pd.merge(uncertain_data_cat, base_data_cat, 
                            how='left', on=['id', 'filename'], 
                            suffixes=('_uncertain', '_base'))

uncert_with_base = uncert_with_base[['id', 'room_uncertain', 'room_base',
                                     'filename','image_path_uncertain',
                                     'image_path_base']]

uncert_with_base.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 485 entries, 0 to 484
Data columns (total 6 columns):
id                      485 non-null object
room_uncertain          485 non-null object
room_base               483 non-null object
filename                485 non-null object
image_path_uncertain    485 non-null object
image_path_base         483 non-null object
dtypes: object(6)
memory usage: 26.5+ KB


In [89]:
uncert_with_base.head(12)

|  | id | room_uncertain | room_base | filename | image_path_uncertain | image_path_base |
|--:|----|----------------|-----------|----------|----------------------|-----------------|
| 0 | 0004f3c83558a0033af8c4cb6fc421d50f383ae9 | uncertain | study | 0004f3c83558a0033af8c4cb6fc421d50f383ae9.jpg | ./data/base_data/uncertain/0004f3c83558a0033af... | ./data/base_data/study/0004f3c83558a0033af8c4c... |
| 1 | 00159cfba55720a60b93ad428b147b0c047302f9 | uncertain | diningroom | 00159cfba55720a60b93ad428b147b0c047302f9.jpg | ./data/base_data/uncertain/00159cfba55720a60b9... | ./data/base_data/diningroom/00159cfba55720a60b... |
| 2 | 01723a29323dd1e3eb2686dd8cef4a6bce343859 | uncertain | empty | 01723a29323dd1e3eb2686dd8cef4a6bce343859.jpg | ./data/base_data/uncertain/01723a29323dd1e3eb2... | ./data/base_data/empty/01723a29323dd1e3eb2686d... |
| 3 | 0232125f31767e2638c31b5837a73695f9a7c379 | uncertain | study | 0232125f31767e2638c31b5837a73695f9a7c379.jpg | ./data/base_data/uncertain/0232125f31767e2638c... | ./data/base_data/study/0232125f31767e2638c31b5... |
| 4 | 02651bc091644ade28bf797241506e5fd3004449 | uncertain | livingroom | 02651bc091644ade28bf797241506e5fd3004449.jpg | ./data/base_data/uncertain/02651bc091644ade28b... | ./data/base_data/livingroom/02651bc091644ade28... |
| 5 | 0274daeec5b54cfff4923b255071f37afe9468f9 | uncertain | study | 0274daeec5b54cfff4923b255071f37afe9468f9.jpg | ./data/base_data/uncertain/0274daeec5b54cfff49... | ./data/base_data/study/0274daeec5b54cfff4923b2... |
| 6 | 02a55bf3e3f81df999b6afde2c1cecdebeaca619 | uncertain | front | 02a55bf3e3f81df999b6afde2c1cecdebeaca619.jpg | ./data/base_data/uncertain/02a55bf3e3f81df999b... | ./data/base_data/front/02a55bf3e3f81df999b6afd... |
| 7 | 042b853e3d4a362a21870c13982bc181b63e11c9 | uncertain | kitchen | 042b853e3d4a362a21870c13982bc181b63e11c9.jpg | ./data/base_data/uncertain/042b853e3d4a362a218... | ./data/base_data/kitchen/042b853e3d4a362a21870... |
| 8 | 044f5d4ddc6cb1db54aabc970bd686f4409b38b9 | uncertain | kitchen | 044f5d4ddc6cb1db54aabc970bd686f4409b38b9.jpg | ./data/base_data/uncertain/044f5d4ddc6cb1db54a... | ./data/base_data/kitchen/044f5d4ddc6cb1db54aab... |
| 9 | 04a85d9ef447bbe372b7559d09ae52c3f0dce099 | uncertain | empty | 04a85d9ef447bbe372b7559d09ae52c3f0dce099.jpg | ./data/base_data/uncertain/04a85d9ef447bbe372b... | ./data/base_data/empty/04a85d9ef447bbe372b7559... |


In [90]:
#uncert_with_base.to_csv(os.path.join(out_dir, 'uncert_with_base_x.csv'))

In [92]:
# find the missing 'room_base' data
uncert_with_base.loc[uncert_with_base['room_base'].isnull()]

|  | id | room_uncertain | room_base | filename | image_path_uncertain | image_path_base |
|--:|----|----------------|-----------|----------|----------------------|-----------------|
| 439 | 09199e65fe2dd5c14dc1cc225e9fb602315dd751 (1) | uncertain | NaN | 09199e65fe2dd5c14dc1cc225e9fb602315dd751 (1).jpg | ./data/base_data/uncertain/09199e65fe2dd5c14dc... | NaN |
| 480 | 2a20626c9830eb9929c3497f399487e702b181b9 (1) | uncertain | NaN | 2a20626c9830eb9929c3497f399487e702b181b9 (1).jpg | ./data/base_data/uncertain/2a20626c9830eb9929c... | NaN |

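The `isnull()` filter above finds rows that had no partner in `base_data_cat`. `pd.merge` can report this directly: `indicator=True` adds a `_merge` column (`'both'` or `'left_only'`), and `validate='one_to_one'` raises a `MergeError` if either side has repeated `(id, filename)` keys. A sketch; the helper name `merge_with_origin` is hypothetical.

```python
import pandas as pd


def merge_with_origin(uncertain_cat, base_cat):
    """Left-merge on id and filename, also returning the unmatched rows."""
    merged = pd.merge(uncertain_cat, base_cat, how='left', on=['id', 'filename'],
                      suffixes=('_uncertain', '_base'),
                      indicator=True, validate='one_to_one')
    # rows that found no match in base_cat, e.g. renamed 'xxxx (1).jpg' copies
    unmatched = merged.loc[merged['_merge'] == 'left_only']
    return merged.drop(columns='_merge'), unmatched
```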

In [98]:
# On manual check, both images are in ./uncertain/ under two names, xxxx.jpg AND
# xxxx (1).jpg. They are duplicates, so the xxxx (1).jpg copies can be deleted.
# idx_to_delete was [439, 480] when this was first run; it is now empty so that
# re-running the cell deletes nothing.

idx_to_delete = []   # was [439, 480]
path_to_delete = uncert_with_base.loc[idx_to_delete, 'image_path_uncertain']

for file in path_to_delete:
    try:
        os.remove(file)
        print('deleted file {}'.format(file))
    except OSError as err:
        print('could not delete {}: {}'.format(file, err))

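Rather than spotting `xxxx (1).jpg` copies by hand, a hypothetical helper could list every numbered copy whose un-numbered original is present in the same folder. The name `find_numbered_copies` and the regular expression are assumptions; the sketch only lists candidates and deletes nothing.

```python
import os
import re

# Matches names such as 'abc123 (1).jpg' and captures the original stem 'abc123'.
COPY_PATTERN = re.compile(r'^(?P<stem>.+) \((?P<n>\d+)\)\.jpg$')


def find_numbered_copies(folder):
    """Return paths of 'name (N).jpg' files whose 'name.jpg' also exists in folder."""
    names = set(os.listdir(folder))
    copies = []
    for name in sorted(names):
        match = COPY_PATTERN.match(name)
        if match and match.group('stem') + '.jpg' in names:
            copies.append(os.path.join(folder, name))
    return copies
```

Usage: `find_numbered_copies('./data/base_data/uncertain/')`.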
In [99]:
# Now recatalogue, and remerge...
uncertain_data_cat = nutil.build_catalogue('./data/base_data/')
uncertain_data_cat = uncertain_data_cat.loc[uncertain_data_cat['room'] == 'uncertain']
uncertain_data_cat.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 483 entries, 20154 to 20636
Data columns (total 4 columns):
id            483 non-null object
room          483 non-null object
filename      483 non-null object
image_path    483 non-null object
dtypes: object(4)
memory usage: 18.9+ KB


In [100]:
uncert_with_base = pd.merge(uncertain_data_cat, base_data_cat, 
                            how='left', on=['id', 'filename'], 
                            suffixes=('_uncertain', '_base'))

uncert_with_base = uncert_with_base[['id', 'room_uncertain', 'room_base',
                                     'filename','image_path_uncertain',
                                     'image_path_base']]

uncert_with_base.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 483 entries, 0 to 482
Data columns (total 6 columns):
id                      483 non-null object
room_uncertain          483 non-null object
room_base               483 non-null object
filename                483 non-null object
image_path_uncertain    483 non-null object
image_path_base         483 non-null object
dtypes: object(6)
memory usage: 26.4+ KB


In [101]:
#uncert_with_base.to_csv(os.path.join(out_dir, 'uncert_with_base_x.csv'))



### 5. After re-classifying 'uncertain' images - catalogue again


After a pause (time to sleep on it), go to the `uncertain/` folder and reclassify all the images by placing them into the **correct** classification folder.

Then catalogue again, and merge with the `uncertain` DataFrame to get a catalogue of all the uncertain and reclassified images, with their 'before' and 'after' reclassification labels.

In [102]:
reclassified_data_cat = nutil.build_catalogue('./data/base_data/')
#reclassified_data_cat.to_csv(os.path.join(out_dir, '120318_reclassified_x.csv'))

In [103]:
reclassified_data_cat

|  | id | room | filename | image_path |
|--:|----|------|----------|------------|
| 0 | 0000cb639c369a1d5b333d8c804cbfb628c31289 | bathroom | 0000cb639c369a1d5b333d8c804cbfb628c31289.jpg | ./data/base_data/bathroom/0000cb639c369a1d5b33... |
| 1 | 000733430218ceea3108c989541c57fac2ba0ec9 | bathroom | 000733430218ceea3108c989541c57fac2ba0ec9.jpg | ./data/base_data/bathroom/000733430218ceea3108... |
| 2 | 00098584f6a4e17fd099b93c13788606e398a989 | bathroom | 00098584f6a4e17fd099b93c13788606e398a989.jpg | ./data/base_data/bathroom/00098584f6a4e17fd099... |
| 3 | 000edba64f83bce23027d2a4c213d354bb970439 | bathroom | 000edba64f83bce23027d2a4c213d354bb970439.jpg | ./data/base_data/bathroom/000edba64f83bce23027... |
| 4 | 002154511c2145b7a1f28e680ef70b93c71f3089 | bathroom | 002154511c2145b7a1f28e680ef70b93c71f3089.jpg | ./data/base_data/bathroom/002154511c2145b7a1f2... |
| 5 | 0024d3b7ded6abdcd8d2c9a91a963227dd47e4f9 | bathroom | 0024d3b7ded6abdcd8d2c9a91a963227dd47e4f9.jpg | ./data/base_data/bathroom/0024d3b7ded6abdcd8d2... |
| 6 | 003457d1fb62155b4531aba3e5b39f4d57bed9f9 | bathroom | 003457d1fb62155b4531aba3e5b39f4d57bed9f9.jpg | ./data/base_data/bathroom/003457d1fb62155b4531... |
| 7 | 0035742dfac3284819b412eb277cce498a52da79 | bathroom | 0035742dfac3284819b412eb277cce498a52da79.jpg | ./data/base_data/bathroom/0035742dfac3284819b4... |
| 8 | 0038cad4553a9922d24a70a4d77abb9d2c1261f9 | bathroom | 0038cad4553a9922d24a70a4d77abb9d2c1261f9.jpg | ./data/base_data/bathroom/0038cad4553a9922d24a... |
| 9 | 0039e85363197bca4b7b7188fc97e962894a13c9 | bathroom | 0039e85363197bca4b7b7188fc97e962894a13c9.jpg | ./data/base_data/bathroom/0039e85363197bca4b7b... |


In [131]:
# merge with uncertain to have records of all reclassified images.
# Only the join keys (id, filename) overlap, so no suffixes are applied;
# the new 'room' and 'image_path' columns are renamed explicitly instead.
reclass_with_base_and_uncertain = pd.merge(uncert_with_base, reclassified_data_cat,
                                           how='left', on=['id', 'filename'])

reclass_with_base_and_uncertain.rename(columns={'room': 'room_reclass',
                                                'image_path': 'image_path_reclass'},
                                       inplace=True)

reclass_with_base_and_uncertain = reclass_with_base_and_uncertain[
    ['id', 'room_reclass', 'room_base', 'room_uncertain', 'filename',
     'image_path_reclass', 'image_path_base', 'image_path_uncertain']]

In [132]:
#reclass_with_base_and_uncertain.to_csv(os.path.join(out_dir, '120318_reclass_and_base_x.csv'))

In [133]:
reclass_with_base_and_uncertain.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 483 entries, 0 to 482
Data columns (total 8 columns):
id                      483 non-null object
room_reclass            483 non-null object
room_base               483 non-null object
room_uncertain          483 non-null object
filename                483 non-null object
image_path_reclass      483 non-null object
image_path_base         483 non-null object
image_path_uncertain    483 non-null object
dtypes: object(8)
memory usage: 34.0+ KB


There are 483 images that were classed as 'uncertain' and then reclassified.

In [134]:
reclass_with_base_and_uncertain.columns

Index(['id', 'room_reclass', 'room_base', 'room_uncertain', 'filename',
       'image_path_reclass', 'image_path_base', 'image_path_uncertain'],
      dtype='object')

In [135]:
reclass_with_base_and_uncertain.loc[reclass_with_base_and_uncertain['room_reclass'] == 
                                    reclass_with_base_and_uncertain['room_base']].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 24 entries, 17 to 436
Data columns (total 8 columns):
id                      24 non-null object
room_reclass            24 non-null object
room_base               24 non-null object
room_uncertain          24 non-null object
filename                24 non-null object
image_path_reclass      24 non-null object
image_path_base         24 non-null object
image_path_uncertain    24 non-null object
dtypes: object(8)
memory usage: 1.7+ KB


24 of the 'uncertain' images were reclassified back to their original label; the remaining 459 were given a different label.

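To see *where* the relabelled images moved, rather than just how many stayed put, `pd.crosstab` can tabulate original labels against final labels. A sketch with a hypothetical helper name; applied to `reclass_with_base_and_uncertain`, the `unchanged` count should reproduce the 24 found above.

```python
import pandas as pd


def label_transitions(df, before='room_base', after='room_reclass'):
    """Cross-tabulate original labels (rows) against final labels (columns).

    Also returns how many images kept their original label, i.e. the sum of
    the cells where the row label equals the column label.
    """
    table = pd.crosstab(df[before], df[after])
    unchanged = int((df[before] == df[after]).sum())
    return table, unchanged
```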
In [136]:
reclass_with_base_and_uncertain.loc[reclass_with_base_and_uncertain['room_reclass'] == 'exclude']

|  | id | room_reclass | room_base | room_uncertain | filename | image_path_reclass | image_path_base | image_path_uncertain |
|--:|----|--------------|-----------|----------------|----------|--------------------|-----------------|----------------------|
| 31 | 0536c9b06e3cbdc9d774452b0cfe9b93c16862cd | exclude | study | uncertain | 0536c9b06e3cbdc9d774452b0cfe9b93c16862cd.jpg | ./data/base_data/exclude/0536c9b06e3cbdc9d7744... | ./data/base_data/study/0536c9b06e3cbdc9d774452... | ./data/base_data/uncertain/0536c9b06e3cbdc9d77... |
| 39 | 0551c5c22af07e8415df25737180a04e09174bf5 | exclude | misc_int | uncertain | 0551c5c22af07e8415df25737180a04e09174bf5.jpg | ./data/base_data/exclude/0551c5c22af07e8415df2... | ./data/base_data/misc_int/0551c5c22af07e8415df... | ./data/base_data/uncertain/0551c5c22af07e8415d... |
| 52 | 056db3637dcca6dd50591c5918ee09d99411037b | exclude | misc_int | uncertain | 056db3637dcca6dd50591c5918ee09d99411037b.jpg | ./data/base_data/exclude/056db3637dcca6dd50591... | ./data/base_data/misc_int/056db3637dcca6dd5059... | ./data/base_data/uncertain/056db3637dcca6dd505... |
| 73 | 05976d9d1875b56e2e1e35b0dbf261ac88f758e7 | exclude | misc_int | uncertain | 05976d9d1875b56e2e1e35b0dbf261ac88f758e7.jpg | ./data/base_data/exclude/05976d9d1875b56e2e1e3... | ./data/base_data/misc_int/05976d9d1875b56e2e1e... | ./data/base_data/uncertain/05976d9d1875b56e2e1... |
| 74 | 0597b8d45b7715cf78ce068a79c632dfabc3d5f5 | exclude | misc_int | uncertain | 0597b8d45b7715cf78ce068a79c632dfabc3d5f5.jpg | ./data/base_data/exclude/0597b8d45b7715cf78ce0... | ./data/base_data/misc_int/0597b8d45b7715cf78ce... | ./data/base_data/uncertain/0597b8d45b7715cf78c... |
| 91 | 05cc91249666961dddac850be5a4c68bacd586bc | exclude | misc_ext | uncertain | 05cc91249666961dddac850be5a4c68bacd586bc.jpg | ./data/base_data/exclude/05cc91249666961dddac8... | ./data/base_data/misc_ext/05cc91249666961dddac... | ./data/base_data/uncertain/05cc91249666961ddda... |
| 107 | 060598fb5174ff14cd8acd3bb7dc394070667f67 | exclude | misc_ext | uncertain | 060598fb5174ff14cd8acd3bb7dc394070667f67.jpg | ./data/base_data/exclude/060598fb5174ff14cd8ac... | ./data/base_data/misc_ext/060598fb5174ff14cd8a... | ./data/base_data/uncertain/060598fb5174ff14cd8... |
| 128 | 063cdec519be5b7fcfaeedec6c20e9973d9ef512 | exclude | misc_ext | uncertain | 063cdec519be5b7fcfaeedec6c20e9973d9ef512.jpg | ./data/base_data/exclude/063cdec519be5b7fcfaee... | ./data/base_data/misc_ext/063cdec519be5b7fcfae... | ./data/base_data/uncertain/063cdec519be5b7fcfa... |
| 161 | 06a0e94c11f29e6281aefeb35aab42b28dc617af | exclude | misc_int | uncertain | 06a0e94c11f29e6281aefeb35aab42b28dc617af.jpg | ./data/base_data/exclude/06a0e94c11f29e6281aef... | ./data/base_data/misc_int/06a0e94c11f29e6281ae... | ./data/base_data/uncertain/06a0e94c11f29e6281a... |
| 211 | 06f8f4294a8b14cc97107be282542d4c5edbd750 | exclude | diningroom | uncertain | 06f8f4294a8b14cc97107be282542d4c5edbd750.jpg | ./data/base_data/exclude/06f8f4294a8b14cc97107... | ./data/base_data/diningroom/06f8f4294a8b14cc97... | ./data/base_data/uncertain/06f8f4294a8b14cc971... |


Some of the images were chosen for exclusion and placed in an `exclude/` folder. We will look at these in a separate workbook.