# ImageNet Validation Sort

The ImageNet ILSVRC2012 validation set is not organized by class. Instead, all of its images sit in a single directory, and a text file in the development toolkit lists the class of each image, one class ID per line, in the order of the image filenames.

This notebook copies the original data into a class-based folder hierarchy.  
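Because the ground-truth file matches labels to images purely by position, the pairing step amounts to sorting the filenames and zipping them with the label lines. Below is a minimal sketch. It assumes the standard zero-padded `ILSVRC2012_val_00000001.JPEG` naming, so string sorting equals numeric sorting. `pair_files_with_labels` is a hypothetical helper, not part of the development toolkit, and the class IDs in the example are made up.

```python
from typing import Dict, Iterable, List


def pair_files_with_labels(filenames: Iterable[str], label_lines: List[str]) -> Dict[str, int]:
    """Map each validation filename to its class ID by position.

    Hypothetical helper: assumes line i of the ground-truth file belongs to
    the i-th filename in sorted order (the zero-padded names sort correctly).
    """
    names = sorted(filenames)
    # int() tolerates the trailing newline left by readlines().
    labels = [int(line) for line in label_lines if line.strip()]
    if len(names) != len(labels):
        raise ValueError(f"{len(names)} files but {len(labels)} labels")
    return dict(zip(names, labels))


# Tiny example: three files listed out of order, with made-up class IDs.
files = ["ILSVRC2012_val_00000003.JPEG",
         "ILSVRC2012_val_00000001.JPEG",
         "ILSVRC2012_val_00000002.JPEG"]
print(pair_files_with_labels(files, ["7\n", "42\n", "13\n"]))
```

Raising on a length mismatch is deliberate. If even one label is missing, every subsequent image would otherwise receive its neighbour's class.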

**Note:** In this version of the notebook, the directory hierarchy follows the ResNet class labels (the `resnet_class` column of the key table) rather than the original validation class IDs.
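The remapping is a lookup from the ground-truth `val_id` to `resnet_class`, read from `imagenet_resnet_key.csv`. The sketch below uses only the standard library and the first three rows of the key table as this notebook displays them. `load_val_to_resnet` is a hypothetical name, not something the original notebook defines.

```python
import csv
import io
from typing import Dict, TextIO

# First three rows of the key table, as displayed by the notebook.
KEY_CSV = """Unnamed: 0.1,Unnamed: 0,val_id,train_id,name,description,resnet_id,resnet_desc,resnet_class
0,0,1,n02119789,"kit fox, Vulpes macrotis",small grey fox of southwestern United States; ...,n02119789,kit_fox,279
1,1,2,n02100735,English setter,an English breed having a plumed tail and a so...,n02100735,English_setter,213
2,2,3,n02110185,Siberian husky,breed of sled dog developed in northeastern Si...,n02110185,Siberian_husky,251
"""


def load_val_to_resnet(fp: TextIO) -> Dict[int, int]:
    """Hypothetical helper: map ground-truth val_id to the ResNet class label."""
    # DictReader handles the quoted "kit fox, Vulpes macrotis" field correctly.
    reader = csv.DictReader(fp)
    return {int(row["val_id"]): int(row["resnet_class"]) for row in reader}


mapping = load_val_to_resnet(io.StringIO(KEY_CSV))
print(mapping)  # {1: 279, 2: 213, 3: 251}
```

Building the dictionary once keeps each per-image lookup constant-time. Filtering the whole table for every one of the 50,000 images does more work than necessary.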

In [1]:
```python
import numpy as np
import pandas as pd
from shutil import copy
import os
```

In [2]:
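```python
# Pull in class information from the ground-truth text file.
# Line i holds the class ID of the i-th validation image (in filename order).
filepath = './ImageNet/ILSVRC2012_validation_ground_truth.txt'

# Read the file as a list of class-ID strings, stripping newlines.
# (Slicing off the last character with x[:-1] would corrupt the final
# entry if the file does not end with a newline.)
with open(filepath) as fp:
    classes = [line.strip() for line in fp if line.strip()]

# Pull in class information from the key table:
df_key = pd.read_csv('./ImageNet/imagenet_resnet_key.csv')
display(df_key)
```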

| Unnamed: 0.1 | Unnamed: 0 | val_id | train_id | name | description | resnet_id | resnet_desc | resnet_class |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | n02119789 | kit fox, Vulpes macrotis | small grey fox of southwestern United States; ... | n02119789 | kit_fox | 279 |
| 1 | 1 | 2 | n02100735 | English setter | an English breed having a plumed tail and a so... | n02100735 | English_setter | 213 |
| 2 | 2 | 3 | n02110185 | Siberian husky | breed of sled dog developed in northeastern Si... | n02110185 | Siberian_husky | 251 |
| 3 | 3 | 4 | n02096294 | Australian terrier | small greyish wire-haired breed of terrier fro... | n02096294 | Australian_terrier | 194 |
| 4 | 4 | 5 | n02102040 | English springer, English springer spaniel | a breed having typically a black-and-white coat | n02102040 | English_springer | 218 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | 995 | 996 | n03063599 | coffee mug | a mug intended for serving coffee | n03063599 | coffee_mug | 505 |
| 996 | 996 | 997 | n04116512 | rubber eraser, rubber, pencil eraser | an eraser made of rubber (or of a synthetic ma... | n04116512 | rubber_eraser | 768 |
| 997 | 997 | 998 | n04325704 | stole | a wide scarf worn about their shoulders by women | n04325704 | stole | 825 |
| 998 | 998 | 999 | n07831146 | carbonara | sauce for pasta; contains eggs and bacon or ha... | n07831146 | carbonara | 960 |


In [3]:
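```python
# Generate one target directory per ResNet class (labels 1 to 1000).
# exist_ok=True makes this cell safe to re-run; without it, os.makedirs
# raises FileExistsError on the second run.
cwd = os.path.join(os.getcwd(), 'ImageNet', 'organized_validation_resnet')

# We have a folder for each class:
for i in range(1, 1001):
    os.makedirs(os.path.join(cwd, str(i)), exist_ok=True)
```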
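Two properties of this folder layout are worth checking. First, creating the folders with `exist_ok=True` is idempotent. Second, the folder names are plain integers, so any tool that sorts them as strings will not visit them in numeric order. For example, torchvision's `ImageFolder` assigns class indices by sorting the folder names. The sketch below uses a temporary directory, with 12 folders standing in for the real 1000.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Running the creation loop twice does not fail, thanks to exist_ok=True.
    for _ in range(2):
        for i in range(1, 13):
            os.makedirs(os.path.join(root, str(i)), exist_ok=True)

    names = sorted(os.listdir(root))   # plain string sort of the folder names
    numeric = sorted(names, key=int)   # numeric order of the labels

print(names)    # ['1', '10', '11', '12', '2', '3', '4', '5', '6', '7', '8', '9']
print(numeric)  # ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']
```

If these folders feed such a loader, the class index it assigns to a folder need not equal the folder's name. Map predictions back through the folder names rather than assuming the two match.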

In [4]:
```python
files = os.listdir('./ImageNet/ILSVRC2012_img_val')  # Get all the files in that directory
files.sort()
print(len(files))
```

50000
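The copy step relies on `files.sort()` putting the images in the same order as the ground-truth lines. A quick sanity check, assuming the `ILSVRC2012_val_<8-digit number>.JPEG` naming, is that the number in the i-th sorted filename equals i + 1. This also catches a stray file such as `.DS_Store`, which would sort first and shift every label by one position. `check_sorted_order` is a hypothetical helper.

```python
import re
from typing import List

# Assumed naming pattern of the validation images.
VAL_NAME = re.compile(r"ILSVRC2012_val_(\d{8})\.JPEG$")


def check_sorted_order(files: List[str]) -> bool:
    """Return True if the i-th file is ILSVRC2012_val_<i+1>.JPEG for every i."""
    for idx, name in enumerate(files):
        match = VAL_NAME.match(name)
        if match is None or int(match.group(1)) != idx + 1:
            return False
    return True


print(check_sorted_order(sorted(["ILSVRC2012_val_00000002.JPEG",
                                 "ILSVRC2012_val_00000001.JPEG"])))  # True
print(check_sorted_order([".DS_Store", "ILSVRC2012_val_00000001.JPEG"]))  # False
```

In the notebook itself, `assert check_sorted_order(files)` right after the listing cell would stop the run before any image is copied to the wrong folder.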


In [5]:
```python
# Build the source and destination root paths (again, to be safe):
src = os.path.join(os.getcwd(), 'ImageNet', 'ILSVRC2012_img_val')
dst = os.path.join(os.getcwd(), 'ImageNet', 'organized_validation_resnet')

# Look up val_id -> resnet_class once, instead of filtering df_key per image.
val_to_resnet = dict(zip(df_key['val_id'].tolist(), df_key['resnet_class'].tolist()))

# The i-th sorted file belongs to the class on line i of the ground truth.
assert len(files) == len(classes), 'file count and label count differ'

# Copy each image into the folder named by its ResNet class label:
for file, val_id in zip(files, classes):
    dest = os.path.join(dst, str(val_to_resnet[int(val_id)]))
    copy(os.path.join(src, file), os.path.join(dest, file))
```
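After copying, a quick check is to count the images in each class folder. ILSVRC2012 has 50 validation images per class, so each of the 1000 folders should hold 50 files. The sketch below exercises `count_per_class` (a hypothetical helper) on a small fake tree in a temporary directory.

```python
import os
import tempfile
from typing import Dict


def count_per_class(root: str) -> Dict[str, int]:
    """Count the regular files inside each immediate subdirectory of root."""
    counts = {}
    for entry in os.scandir(root):
        if entry.is_dir():
            counts[entry.name] = sum(1 for f in os.scandir(entry.path) if f.is_file())
    return counts


with tempfile.TemporaryDirectory() as root:
    # Fake layout: class folders "1" and "2" holding 3 and 2 empty image files.
    for label, n in (("1", 3), ("2", 2)):
        os.makedirs(os.path.join(root, label))
        for k in range(n):
            open(os.path.join(root, label, f"img_{k}.JPEG"), "w").close()
    counts = count_per_class(root)

print(counts)
```

On the real output, `set(count_per_class('./ImageNet/organized_validation_resnet').values()) == {50}` should hold if every image was copied and the key table covers all 1000 classes.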
