Skip to content

Latest commit

 

History

History
58 lines (44 loc) · 2.62 KB

How-to-properly-set-up-Imagenet-Dataset.md

File metadata and controls

58 lines (44 loc) · 2.62 KB

#How to properly set up Imagenet Dataset for training in Caffe !

Assuming that you have the training and testing compressed files downloaded, let's move ahead and untar them properly. Make folders for training, testing and validation images. Make these where you want to store the dataset. If you are making these directories in a seperate partition, make sure that it has enough space to accomodate the files.

mkdir path/to/new/train/images/folder
mkdir path/to/new/test/images/folder
mkdir path/to/new/val/images/folder

Once you have the directories ready, we will start extracting the files. Please note that extracting the ILSVRC2012_img_test.tar and ILSVRC2012_img_val.tar will give you images with .JPEG extension. However, extracting ILSVRC2012_img_train.tar and ILSVRC2012_img_train_t3.tar files will give you many more .tar files which you can extract according to your needs.

To extract the tar files, simply do these steps:

tar -xf ILSVRC2012_img_test.tar -C path/to/new/test/images/folder
tar -xf ILSVRC2012_img_val.tar -C path/to/new/val/images/folder
tar -xf ILSVRC2012_img_train.tar -C path/to/new/train/images/folder
tar -xf ILSVRC2012_img_train_t3.tar -C path/to/new/train_t3/images/folder

If you want to see what files are being extracted, use the -v flag along with -xf. So that would be tar -xvf file.tar -C <destination folder>.

Once you are done with the extraction, let us now discuss about the training images. Each .tar files in the training images folder corresponds to different set of images. You can train your network with any number of images from any number of these subsets. Here, however, I will explain how to extract every tar files in the train folder so that we can work with the entire dataset.

To extract all the .tar files in train folder, make a script file inside the train folder. The following script will extract all the .tar files into respective directories and will remove the .tar files after extraction completes. If you do not want to remove the compressed files, please comment the appropriate line of code.

Make sure you are in the train folder.

cd path/to/new/train/images/folder

Copy the following and paste it into a file (ex. tar_extract_script.sh).

#!/bin/bash
for f in *.tar; do
  d=`basename $f .tar`
  mkdir $d
  (cd $d && tar xf ../$f)
done
rm -r *.tar

Give it executable permissions.

chmod 700 tar_extract_script.sh

Run the script.

./tar_extract_script.sh

It will take a long time to extract all the files. If you want to see which files are being extracted, give the -v flag in the script.

Lets do some deep learning now!