# TFRecord maker: from `.jpg` to `.tfrecord`

This notebook shows how to convert image data from `.jpg` files to `.tfrecord` files. The main reference is [KWOT SIN's blog](https://kwotsin.github.io/tech/2017/01/29/tfrecords.html).

## Arranging data according to classes

To create TFRecord files, the dataset must be arranged into subdirectories, one per class. In our case, the dataset is already arranged this way:
```
drivers_data
└── train [22424 images]
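    ├── c0 safe driving [2489 images]
    ├── c1 texting, right hand [2267 images]
    ├── c2 talking on the phone, right hand [2317 images]
    ├── c3 texting, left hand [2346 images]
    ├── c4 talking on the phone, left hand [2326 images]
    ├── c5 adjusting the radio [2312 images]
    ├── c6 drinking [2325 images]
    ├── c7 reaching behind [2002 images]
    ├── c8 fixing hair and makeup [1911 images]
    └── c9 talking to a passenger [2129 images]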

```
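
Before converting, it can help to confirm that the layout on disk matches the tree above. The following is a minimal sketch and not part of `dataset_utils.py`: the helper name `count_images_per_class` is hypothetical, and it assumes images are identified by a `.jpg` or `.jpeg` extension.

```python
import os

def count_images_per_class(split_dir, extensions=('.jpg', '.jpeg')):
    """Return {class_name: image_count} for each subdirectory of split_dir."""
    counts = {}
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files sitting next to the class folders
        counts[class_name] = sum(
            1 for name in os.listdir(class_dir)
            if name.lower().endswith(extensions)
        )
    return counts

# Only runs if the dataset is present in the working directory.
split_dir = os.path.join('drivers_data', 'train')
if os.path.isdir(split_dir):
    counts = count_images_per_class(split_dir)
    for class_name, count in counts.items():
        print('%s: %d images' % (class_name, count))
    print('total: %d images' % sum(counts.values()))
```

If the per-class counts differ from the tree, some files may be missing or may use a different extension.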

## Writing the `.tfrecord` files

TF-slim provides several useful functions that help us create TFRecord files without digging into low-level TensorFlow. KWOT SIN has also compiled the necessary functions into a `dataset_utils.py` file.

First, import the required modules.

In [1]:
import random
import tensorflow as tf
from datasets.dataset_utils import _dataset_exists, _get_filenames_and_classes, write_label_file, _convert_dataset

Then define the required arguments:

In [2]:
flags = tf.app.flags

# State your dataset directory
flags.DEFINE_string('dataset_dir', 'drivers_data', 'String: Your dataset directory')

# Proportion of dataset to be used for evaluation
flags.DEFINE_float('validation_size', 0.3, 'Float: The proportion of examples in the dataset to be used for validation')

# The number of shards to split the dataset into.
flags.DEFINE_integer('num_shards', 5, 'Int: Number of shards to split the TFRecord files into')

# Seed for repeatability.
flags.DEFINE_integer('random_seed', 42, 'Int: Random seed to use for repeatability.')

# Output filename used to name the TFRecord files
flags.DEFINE_string('tfrecord_filename', 'drivers', 'String: The output filename to name your TFRecord file')

FLAGS = flags.FLAGS
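
`tf.app.flags` is part of the TensorFlow 1.x API. Where it is unavailable, the same settings can be held with the standard-library `argparse` module. This is a sketch of an alternative, not what the notebook itself uses: `build_parser` and `ARGS` are hypothetical names, and `parse_known_args` is chosen so that unrecognised arguments (for example, ones passed by a notebook kernel) are ignored rather than raising an error.

```python
import argparse

def build_parser():
    """Mirror the five flags above with argparse (illustrative alternative)."""
    parser = argparse.ArgumentParser(description='TFRecord conversion settings')
    parser.add_argument('--dataset_dir', type=str, default='drivers_data',
                        help='Your dataset directory')
    parser.add_argument('--validation_size', type=float, default=0.3,
                        help='Proportion of examples used for validation')
    parser.add_argument('--num_shards', type=int, default=5,
                        help='Number of shards to split the TFRecord files into')
    parser.add_argument('--random_seed', type=int, default=42,
                        help='Random seed to use for repeatability')
    parser.add_argument('--tfrecord_filename', type=str, default='drivers',
                        help='Output filename to name your TFRecord files')
    return parser

# An empty list keeps the defaults; pass None instead to read sys.argv.
ARGS, _unknown = build_parser().parse_known_args([])
print(ARGS.dataset_dir, ARGS.validation_size, ARGS.num_shards)
```

With this variant, later cells would read `ARGS.num_shards` where the notebook reads `FLAGS.num_shards`.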

Check the arguments and whether the TFRecord files already exist.

In [None]:
#=============CHECKS==============
#Check if there is a tfrecord_filename entered
if not FLAGS.tfrecord_filename:
    raise ValueError('tfrecord_filename is empty. Please state a tfrecord_filename argument.')

#Check if there is a dataset directory entered
if not FLAGS.dataset_dir:
    raise ValueError('dataset_dir is empty. Please state a dataset_dir argument.')

#If the TFRecord files already exist in the directory, there is no need to create them again.
#(`return` is only valid inside a function, so this cell just reports the result.)
if _dataset_exists(dataset_dir = FLAGS.dataset_dir, _NUM_SHARDS = FLAGS.num_shards, output_filename = FLAGS.tfrecord_filename):
    print('Dataset files already exist. Skip the remaining cells to avoid re-creating them.')
#==========END OF CHECKS============
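
A check like `_dataset_exists` typically looks for one output file per split and per shard. The sketch below illustrates the idea with the standard library only. The function names and the `name_split_00000-of-00005.tfrecord` naming pattern are assumptions made for illustration; consult `dataset_utils.py` for the scheme actually used.

```python
import os

def shard_filename(dataset_dir, tfrecord_filename, split_name, shard_id, num_shards):
    # Assumed pattern, e.g. drivers_train_00000-of-00005.tfrecord
    name = '%s_%s_%05d-of-%05d.tfrecord' % (
        tfrecord_filename, split_name, shard_id, num_shards)
    return os.path.join(dataset_dir, name)

def all_shards_exist(dataset_dir, tfrecord_filename, num_shards,
                     split_names=('train', 'validation')):
    """Return True only if every expected shard file is present."""
    for split_name in split_names:
        for shard_id in range(num_shards):
            path = shard_filename(dataset_dir, tfrecord_filename,
                                  split_name, shard_id, num_shards)
            if not os.path.isfile(path):
                return False
    return True

print(all_shards_exist('drivers_data', 'drivers', 5))
```

Requiring every shard to exist means a partially written dataset is treated as missing, so an interrupted conversion would be redone.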

Write the `.tfrecord` files.

Get the image filenames and class names.

In [3]:
photo_filenames, class_names = _get_filenames_and_classes(FLAGS.dataset_dir)  
class_names_to_ids = dict(zip(class_names, range(len(class_names))))
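
As a concrete illustration of this mapping, the sketch below assumes the class names are the folder names `c0` to `c9` and sorts them first, so that the ids stay the same from run to run. Whether `_get_filenames_and_classes` returns the names in this order is an assumption here.

```python
# Folder names in arbitrary order, as a directory listing might return them.
listed = ['c3', 'c0', 'c9', 'c1', 'c5', 'c2', 'c8', 'c4', 'c7', 'c6']

class_names = sorted(listed)  # sorting gives a stable name-to-id assignment
class_names_to_ids = dict(zip(class_names, range(len(class_names))))

print(class_names_to_ids['c0'], class_names_to_ids['c9'])  # 0 9
```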

In [4]:
#Find the number of validation examples we need
num_validation = int(FLAGS.validation_size * len(photo_filenames))

# Divide the dataset into training and validation sets:
random.seed(FLAGS.random_seed)
random.shuffle(photo_filenames)
training_filenames = photo_filenames[num_validation:]
validation_filenames = photo_filenames[:num_validation]
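
To make the split arithmetic concrete, here is a small standalone sketch that uses placeholder filenames in place of the real images. The total of 22424 images and the 0.3 validation proportion come from the sections above; the filenames themselves are invented for illustration.

```python
import random

total_images = 22424
validation_size = 0.3
# Placeholder names standing in for the real image paths.
filenames = ['img_%05d.jpg' % i for i in range(total_images)]

# int() truncates 6727.2 down to 6727 validation examples.
num_validation = int(validation_size * len(filenames))

random.seed(42)
random.shuffle(filenames)  # shuffle once, then slice, so the sets cannot overlap
training = filenames[num_validation:]
validation = filenames[:num_validation]

print(len(training), len(validation))  # 15697 6727
```

Because both sets are slices of the same shuffled list, every image lands in exactly one of them.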

In [5]:
# First, convert the training and validation sets.
_convert_dataset('train', training_filenames, class_names_to_ids,
                 dataset_dir = FLAGS.dataset_dir,
                 tfrecord_filename = FLAGS.tfrecord_filename,
                 _NUM_SHARDS = FLAGS.num_shards)
_convert_dataset('validation', validation_filenames, class_names_to_ids,
                 dataset_dir = FLAGS.dataset_dir,
                 tfrecord_filename = FLAGS.tfrecord_filename,
                 _NUM_SHARDS = FLAGS.num_shards)

>> Converting image 15697/15697 shard 4
>> Converting image 6727/6727 shard 4
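
The log ends at shard 4 for both splits, which is consistent with five shards numbered 0 to 4. One common way to divide files among shards is to give each shard a contiguous, near-equal slice of the list. The sketch below assumes that scheme; `shard_ranges` is a hypothetical helper, and `_convert_dataset` may differ in detail.

```python
import math

def shard_ranges(num_items, num_shards):
    """Return a (start, end) index pair for each shard, end exclusive."""
    num_per_shard = int(math.ceil(num_items / float(num_shards)))
    ranges = []
    for shard_id in range(num_shards):
        start = min(shard_id * num_per_shard, num_items)
        end = min((shard_id + 1) * num_per_shard, num_items)
        ranges.append((start, end))
    return ranges

for split_name, size in (('train', 15697), ('validation', 6727)):
    print(split_name, shard_ranges(size, 5))
```

Under this assumption the first four training shards would hold 3140 images each and the last one the remaining 3137.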


In [6]:
# Finally, write the labels file:
labels_to_class_names = dict(zip(range(len(class_names)), class_names))
write_label_file(labels_to_class_names, FLAGS.dataset_dir)
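
The label file lets later code turn a predicted integer id back into a class name. Assuming it stores one `id:class_name` pair per line (check the file written to the dataset directory to confirm), it can be read back without TensorFlow. The helpers `write_labels` and `read_labels` below are hypothetical and only illustrate that format.

```python
def write_labels(labels_to_class_names, path):
    """Write one 'id:class_name' line per label (assumed format)."""
    with open(path, 'w') as f:
        for label in sorted(labels_to_class_names):
            f.write('%d:%s\n' % (label, labels_to_class_names[label]))

def read_labels(path):
    """Parse 'id:class_name' lines back into a {id: class_name} dict."""
    labels_to_class_names = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            # Split on the first colon only, in case a class name contains one.
            label, class_name = line.split(':', 1)
            labels_to_class_names[int(label)] = class_name
    return labels_to_class_names
```

For example, `read_labels(path)[0]` would give the class name stored for id 0.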