# Creating Datasets #
## ImageConverter Method ##

In [None]:
! pip install --upgrade tensorflow-datasets > /dev/null

import io, os, csv
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

We are going to convert an image dataset to TFRecords using the [DatasetBuilder](https://www.tensorflow.org/datasets/api_docs/python/tfds/core/DatasetBuilder) utility in [TensorFlow Datasets](https://www.tensorflow.org/datasets) (`tfds`). TFDS is a high-level wrapper around `tf.data`.

First we'll fetch some extensions to TFDS. For more info, see [this Colab notebook](https://github.com/tensorflow/tpu/blob/master/tools/colab/image_classification_converter.ipynb).

In [None]:
os.chdir('/kaggle/working/')
! git clone -b image_classification_converter https://github.com/tensorflow/tpu.git
# os.chdir('/home/jovyan/work/kaggle/image_converter/tools/data_converter')
os.chdir('/kaggle/working/tpu/tools/data_converter/')
from image_classification.image_classification_data import ImageClassificationBuilder
from image_classification.image_classification_data import ImageClassificationConfig
os.chdir('/kaggle/working/')

With `ImageClassificationBuilder`, we write a generator that yields examples, each one a dict pairing an image file object (`image_fobj`) with its `label`, according to the layout specified in `ImageClassificationConfig`. We specify a layout with three fields:
- `num_labels` - the number of labels in the dataset
- `supported_modes` - a sequence (a tuple in the config below) containing one or more of `'train'`, `'validation'`, `'test'`
- `example_generator` - a generator that returns the set of image examples for a given mode
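
The generator has to walk whatever folder layout the images are stored in. Here is a minimal, standard-library-only sketch, assuming a `root/mode/class_name/image` layout (the layout the config further down relies on). The helper name `iter_examples` and the toy files are hypothetical, and it yields file paths rather than open file objects to stay self-contained:

```python
import os
import tempfile

def iter_examples(root_path, mode):
    """Yield one dict per image found under root_path/mode/class_name/."""
    mode_path = os.path.join(root_path, mode)
    for class_name in sorted(os.listdir(mode_path)):
        class_dir = os.path.join(mode_path, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            yield {
                'image_path': os.path.join(class_dir, file_name),
                'label': class_name,  # the folder name doubles as the label
            }

# Build a tiny throwaway tree to walk (hypothetical class and file names)
root = tempfile.mkdtemp()
toy_files = [('train', 'Audi', 'a.jpg'),
             ('train', 'BMW', 'b.jpg'),
             ('test', 'Audi', 'c.jpg')]
for mode, class_name, file_name in toy_files:
    os.makedirs(os.path.join(root, mode, class_name), exist_ok=True)
    with open(os.path.join(root, mode, class_name, file_name), 'wb'):
        pass  # empty placeholder file

train_examples = list(iter_examples(root, 'train'))
train_labels = [ex['label'] for ex in train_examples]
print(train_labels)
```

Sorting the directory listings is a design choice for reproducible ordering; `os.listdir` itself returns entries in arbitrary order.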

### Configuration ###

Let's first set up our environment.

In [None]:
# Where are we?
if os.getenv('PWD') == '/kaggle/working':
    HOME_PATH = '/kaggle/'
else:
    HOME_PATH = '/home/jovyan/work/kaggle/computer-vision/' # your local project directory here

TFRECORD_PATH = HOME_PATH + 'working/'
ROOT_PATH = HOME_PATH + 'input/stanford-car-dataset-by-classes-folder/car_data/car_data/'

And now we'll define the layout configuration.

In [None]:
with open(HOME_PATH + 'input/stanford-car-dataset-by-classes-folder/names.csv') as csvfile:
    LABELS = [r[0] for r in csv.reader(csvfile)]
# LABELS.sort()
NUM_LABELS = len(LABELS)

class StanfordCarsConfig(ImageClassificationConfig):
    def __init__(self, root_path, *args, **kwargs):
        super(StanfordCarsConfig, self).__init__(
            version=tfds.core.Version('0.1.0'),
            supported_versions=[],
            **kwargs)
        self.root_path = root_path

    @property
    def supported_modes(self):
        return ('train', 'test')

    @property
    def num_labels(self):
        return NUM_LABELS

    def example_generator(self, mode):
        data_path = self.root_path
        mode_path = os.path.join(data_path, mode)

        for class_name in os.listdir(mode_path):
            class_dir = os.path.join(mode_path, class_name)
            for img_path in os.listdir(class_dir):
                abs_path = os.path.abspath(os.path.join(class_dir, img_path))
                yield {
                    'image_fobj': tf.io.gfile.GFile(abs_path, 'rb'),
                    'label': class_name,
                }
                

### Build ###

In [None]:
config = StanfordCarsConfig(name='stanford-cars-tfrecords',
                            description='The Stanford Cars dataset',
                            root_path=ROOT_PATH)
dataset = ImageClassificationBuilder(data_dir=TFRECORD_PATH,
                                     config=config)
dataset.download_and_prepare()

In [None]:
dataset.info

### Inspect ###

In [None]:
def show_image_from_bytes(binary_image):
    im = Image.open(io.BytesIO(binary_image)).convert("RGB")
    plt.imshow(im)

def visualize_tfrecord(tf_record_path):
    # Read the first serialized record from the shard
    tf_raw = next(iter(tf.data.TFRecordDataset(tf_record_path))).numpy()
    tf_example = tf.train.Example()
    tf_example.ParseFromString(tf_raw)
    print(repr(tf_example)[:1000])
    show_image_from_bytes(tf_example.features.feature['image/encoded'].bytes_list.value[0])
  
tf_record_path = os.path.join(
    TFRECORD_PATH,
    'image_classification_builder',
    'stanford-cars-tfrecords',
    '0.1.0',
    'image_classification_builder-train.tfrecord-00002-of-00016')
visualize_tfrecord(tf_record_path)

### Clean Up ###

In [None]:
!rm -rf /kaggle/working/downloads/
!rm -rf /kaggle/working/tpu/

## Dataset Builder Method ##

We need to understand the directory structure that TFDS uses.
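
Judging from the shard path used in the inspection step above, data prepared by a builder with a named config ends up under `data_dir/builder_name/config_name/version/`, with shard files named `builder_name-split.tfrecord-XXXXX-of-YYYYY`. The sketch below only rebuilds that path. `shard_path` is a hypothetical helper, and the pattern is inferred from that single example rather than taken from the TFDS documentation, so other TFDS versions may lay files out differently:

```python
import os

def shard_path(data_dir, builder_name, config_name, version,
               split, index, num_shards):
    """Rebuild the path of one TFRecord shard (pattern assumed, see above)."""
    file_name = (f'{builder_name}-{split}'
                 f'.tfrecord-{index:05d}-of-{num_shards:05d}')
    return os.path.join(data_dir, builder_name, config_name, version, file_name)

path = shard_path('/kaggle/working/',
                  'image_classification_builder',
                  'stanford-cars-tfrecords',
                  '0.1.0',
                  split='train', index=2, num_shards=16)
print(path)
```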

In [None]:
! pip install --upgrade tensorflow-datasets > /dev/null

In [None]:
import os
import tensorflow as tf
import tensorflow_datasets as tfds

In [None]:
class StanfordCars(tfds.image.Cars196):
    """The Stanford Cars Dataset."""
    VERSION = tfds.core.Version('0.1.0')
    def __init__(self, **kwargs):
        super(StanfordCars, self).__init__(**kwargs)

In [None]:
_ROOT_DIR = '/home/jovyan/work/kaggle/computer-vision/'
_DATA_DIR = os.path.join(_ROOT_DIR, 'working/')
_INPUT_DIR = os.path.join(_ROOT_DIR, 'input/')
_MANUAL_DIR = os.path.join(_INPUT_DIR, 'stanford-cars-dataset/')

builder = tfds.builder(name = 'stanford_cars',
                       data_dir = _DATA_DIR)
download_config = tfds.download.DownloadConfig(extract_dir = _DATA_DIR,
                                               manual_dir = _MANUAL_DIR)
builder.download_and_prepare(download_config=download_config)

Let's look.

In [None]:
ds = builder.as_dataset(split='train')
fig = tfds.show_examples(ds_info=builder.info, ds=ds,
                         rows=3, cols=3, plot_scale=4.0)

In [None]:
for ex in ds.take(3):
    print(ex['bbox'])

# Training Models with TFDS Datasets #

Previously, we converted the dataset to TFRecords, ready to be ingested by a TensorFlow model. We then created a Kaggle dataset from the produced files and reloaded it, so the data now lives in the `../input` directory.

## Without Data Augmentation ##
### Pipeline ###

Let's try training a simple model without acceleration first.

In [None]:
! pip install --upgrade tensorflow-datasets > /dev/null

In [None]:
import os
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_datasets as tfds

In [None]:
class StanfordCars(tfds.image.Cars196):
    """The Stanford Cars Dataset."""
    VERSION = tfds.core.Version('0.1.0')
    def __init__(self, **kwargs):
        super(StanfordCars, self).__init__(**kwargs)

ROOT_DIR = '/kaggle/'
DATA_DIR = os.path.join(ROOT_DIR, 'input/stanford-cars-for-learn/')

ds, ds_info = tfds.load('stanford_cars', split='train', with_info=True, 
                        download=False, data_dir=DATA_DIR)

In [None]:
fig = tfds.show_examples(ds_info=ds_info, ds=ds)

In [None]:
(ds_train, ds_validation), ds_info = tfds.load('stanford_cars',
                                               download=False,
                                               data_dir=DATA_DIR,
                                               split=['train', 'test'],
                                               as_supervised=True,
                                               shuffle_files=True,
                                               with_info=True)

AUTOTUNE = tf.data.experimental.AUTOTUNE
BATCH_SIZE = 64
SIZE = 64
NUM_EXAMPLES = ds_info.splits['train'].num_examples

def preprocessor(image, label):
    dim = tf.constant([SIZE, SIZE], dtype=tf.dtypes.int32)
    return tf.image.resize(image, dim), label

ds_train = (
    ds_train.map(preprocessor, AUTOTUNE)
    .cache()
    .shuffle(NUM_EXAMPLES//4)
    .batch(BATCH_SIZE)
    .prefetch(AUTOTUNE)
)

ds_validation = (
    ds_validation.map(preprocessor, AUTOTUNE)
    .batch(BATCH_SIZE)
    .cache()
    .prefetch(AUTOTUNE)
)

### Model ###

In [None]:
NUM_LABELS = 196
EPOCHS = 30

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=[SIZE, SIZE, 3]),
    tf.keras.layers.Dense(4096, activation='relu'),
    tf.keras.layers.Dense(4096, activation='relu'),
    tf.keras.layers.Dense(NUM_LABELS)
])
model.compile(
    loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer='adam',
    metrics=['accuracy']
)

model.fit(
    ds_train,
    epochs=EPOCHS,
    validation_data=ds_validation
)

## With Data Augmentation ##

Now let's add online data augmentation and use TPU acceleration.

In [None]:
! pip install --upgrade tensorflow-datasets > /dev/null

In [None]:
import os
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_datasets as tfds
from kaggle_datasets import KaggleDatasets

In [None]:
class StanfordCars(tfds.image.Cars196):
    """The Stanford Cars Dataset."""
    VERSION = tfds.core.Version('0.1.0')
    def __init__(self, **kwargs):
        super(StanfordCars, self).__init__(**kwargs)

DATA_DIR = KaggleDatasets().get_gcs_path()
ds, ds_info = tfds.load('stanford_cars', split='train', with_info=True, 
                        download=False, data_dir=DATA_DIR)

In [None]:
fig = tfds.show_examples(ds_info=ds_info, ds=ds)

### Define Preprocessor and Augmentor ###

Random transformations should be done separately from deterministic transformations. Deterministic transformations should be cached. Random transformations should not be cached. Random transformations should also be applied after batching so that they may be vectorized.
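
The ordering described above can be sketched on toy data. This is only an illustration under stated assumptions: the eight mid-gray 4x4 images stand in for real photos, and `preprocess` and `augment` are simplified stand-ins for the functions defined in this section. The deterministic step sits before `cache()`, and the random step runs after `batch()`, so it is re-drawn on every pass and applied to a whole batch at once:

```python
import tensorflow as tf

# Toy data: eight 4x4 RGB images, all mid-gray (stand-ins for real photos)
images = tf.constant(128, dtype=tf.uint8, shape=[8, 4, 4, 3])

def preprocess(image):
    # Deterministic: uint8 in [0, 255] -> float32 in [0.0, 1.0]
    return tf.image.convert_image_dtype(image, tf.float32)

def augment(batch):
    # Random: applied to a whole batch, re-drawn on every pass over the data
    return tf.image.random_brightness(batch, max_delta=0.2)

ds = (
    tf.data.Dataset.from_tensor_slices(images)
    .map(preprocess)   # deterministic work ...
    .cache()           # ... is computed once and then served from the cache
    .batch(4)
    .map(augment)      # random work stays after cache() and batch()
)

epoch_1 = [batch.numpy() for batch in ds]
epoch_2 = [batch.numpy() for batch in ds]
print(len(epoch_1), epoch_1[0].shape)
```

Had `augment` been placed before `cache()`, the first epoch's random draws would be stored and replayed unchanged in every later epoch.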

In [None]:
def make_preprocessor(size):
    def preprocessor(image, label):
        # Convert Int to Float and scale from [0, 255] to [0.0, 1.0]
        image = tf.image.convert_image_dtype(image,
                                             dtype=tf.float32)
        # Resize to fit within size=[height, width]. Note: with
        # preserve_aspect_ratio=True, output shapes can differ between images.
        image = tf.image.resize(image,
                                size=size,
                                method="bicubic",
                                preserve_aspect_ratio=True)
        return image, label
    return preprocessor

def make_augmentor(# rotation_range=0,
                   # width_shift_range=0,
                   # height_shift_range=0,
                   brightness_delta=None,
                   contrast_range=None,
                   hue_delta=None,
                   saturation_range=None,
                   # shear_range=0.0,
                   # zoom_range=0.0,
                   # channel_shift_range=0.0,
                   # fill_mode='nearest',
                   horizontal_flip=False,
                   vertical_flip=False,
                   seed = 31415):
    def augmentor(image, label):
        if brightness_delta is not None:
            image = tf.image.random_brightness(image,
                                               max_delta=brightness_delta, seed=seed)
        if contrast_range is not None:
            image = tf.image.random_contrast(image,
                                             lower=contrast_range[0],
                                             upper=contrast_range[1], seed=seed)
        if hue_delta is not None:
            image = tf.image.random_hue(image, 
                                        max_delta=hue_delta, seed=seed)
        if saturation_range is not None:
            image = tf.image.random_saturation(image,
                                               lower=saturation_range[0],
                                               upper=saturation_range[1], seed=seed)
        if horizontal_flip:
            image = tf.image.random_flip_left_right(image, seed=seed)

        if vertical_flip:
            image = tf.image.random_flip_up_down(image, seed=seed)

        return image, label
    return augmentor

Let's try it out.

In [None]:
try_preprocessor = make_preprocessor(size=[192, 192])

try_augmentor = make_augmentor(hue_delta=0.25,
                               saturation_range=[0.1, 3.0],
                               horizontal_flip=True)

rows = 4; cols = 4
ds = tfds.load('stanford_cars', split='train',
               download=False, data_dir=DATA_DIR,
               as_supervised=True)
examples = list(tfds.as_numpy(ds.take(rows * cols)))

plt.figure(figsize=(15, int(15 * rows / cols)))
for i, (image, label) in enumerate(examples):
    image, _ = try_preprocessor(image, label)
    image, _ = try_augmentor(image, label)  # apply the random transformations too
    plt.subplot(rows, cols, i+1)
    plt.axis('off')
    plt.imshow(image)
plt.show()

### Build Model ###

In [None]:
# Detect hardware, return appropriate distribution strategy
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection. No parameters necessary if TPU_NAME environment variable is set. On Kaggle this is always the case.
    print('Running on TPU ', tpu.master())
except ValueError:
    tpu = None

if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
    strategy = tf.distribute.get_strategy() # default distribution strategy in Tensorflow. Works on CPU and single GPU.

print("REPLICAS: ", strategy.num_replicas_in_sync)

In [None]:
(ds_train, ds_validation), ds_info = tfds.load('stanford_cars',
                                               download=False,
                                               data_dir=DATA_DIR,
                                               split=['train', 'test'],
                                               as_supervised=True,
                                               shuffle_files=True,
                                               with_info=True)

NUM_LABELS = 196
SHUFFLE_BUFFER = ds_info.splits['train'].num_examples // 4
BATCH_SIZE = 16 * strategy.num_replicas_in_sync
EPOCHS = 10
SIZE = 192
AUTOTUNE = tf.data.experimental.AUTOTUNE


preprocess = make_preprocessor(size=[SIZE, SIZE])

augment = make_augmentor(brightness_delta=0.2,
                         horizontal_flip=True)

train_batches = (
    ds_train
    .map(preprocess, num_parallel_calls=AUTOTUNE)
    .cache()
    .repeat()  # loop indefinitely; fit() then needs steps_per_epoch
    .shuffle(SHUFFLE_BUFFER)
    .batch(BATCH_SIZE)
    .map(augment, num_parallel_calls=AUTOTUNE)
    .prefetch(AUTOTUNE)
)

validation_batches = (
    ds_validation
    .map(preprocess, num_parallel_calls=AUTOTUNE)
    .batch(BATCH_SIZE)
    .cache()
    .prefetch(AUTOTUNE)
)


In [None]:
with strategy.scope():
    model = tf.keras.models.Sequential([
        tf.keras.applications.ResNet50(weights='imagenet',
                                       include_top=False,
                                       input_shape=[SIZE, SIZE, 3]),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(NUM_LABELS, activation='softmax')
    ])
    # Compile inside the scope so optimizer and metric variables
    # are created under the distribution strategy
    model.compile(
        loss='sparse_categorical_crossentropy',
        optimizer='adam',
        metrics=['accuracy']
    )

model.summary()

In [None]:
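# train_batches repeats indefinitely, so tell fit() how long one epoch is
STEPS_PER_EPOCH = ds_info.splits['train'].num_examples // BATCH_SIZE

history = model.fit(
    train_batches,
    epochs=EPOCHS,
    steps_per_epoch=STEPS_PER_EPOCH,
    validation_data=validation_batches
)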

In [None]:
import tensorflow_docs as tfdocs
import tensorflow_docs.plots  # the plots submodule must be imported explicitly

plotter = tfdocs.plots.HistoryPlotter()
plotter.plot({"Augmented": history}, metric = "accuracy")
plt.title("Accuracy")
plt.ylim([0.0,1.0])

## tf.data.Dataset ##

- Distributed training strategies (tf.distribute)
  - Data parallelism
  - Model parallelism

Distributed training in TensorFlow typically occurs via *data parallelism*: the data is split up among identical copies of the model. These pieces are called **shards**. The number of shards should ideally be a multiple of the number of workers (or devices), so that none of them sits idle.

- TPUs
  - 8 cores
  - generally, data is arranged in memory either as 8x128 or as 128x128; to minimize the cost of padding the data to these dimensions:
    - arrange batch sizes to multiples of 128
    - arrange feature dimensions to multiples of either 8 or 128 (spatial dimensions are not padded)

- **Rule of Thumb:** 128 (ideal) data items per core (targeting the matrix unit)
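
The sizing rules above amount to rounding up to a multiple. Here is a small arithmetic sketch, assuming one 8-core TPU and treating each core as one device; `round_up` is a hypothetical helper, and the inputs 200 and 13 are made-up illustrative values:

```python
def round_up(n, multiple):
    """Return the smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple

NUM_CORES = 8          # cores per TPU, from the notes above
ITEMS_PER_CORE = 128   # rule-of-thumb number of data items per core

# Global batch size that gives every core its 128 items
global_batch_size = NUM_CORES * ITEMS_PER_CORE

# A feature dimension of 200 would be padded up to the next multiple of 128
padded_feature_dim = round_up(200, 128)

# 13 shards leave some cores idle on the last round; 16 divide evenly
num_shards = round_up(13, NUM_CORES)

print(global_batch_size, padded_feature_dim, num_shards)
```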

### TFRecords and Examples ###

#### Examples ####

In [None]:
FValue :: tf.train.BytesList | tf.train.FloatList | tf.train.Int64List

tf.train.Feature :: FValue

tf.train.Features :: feature {key: String,
                              value: tf.train.Feature}

tf.train.Example :: {features: tf.train.Features}

TFRecord :: stream of serialized records (e.g. tf.train.Example)

#### Tabular Data as Examples ####

| TFRecord  | "Feature 1" (BytesList) | "Feature 2" (FloatList) | "Feature 3" (Int64List) |
|-----------|-------------------------|-------------------------|-------------------------|
| Example 1 | value                   | value                   | value                   |
| Example 2 |                         |                         |                         |
| Example 3 |                         |                         |                         |


In [None]:
# Example 1
features {
    feature {
        "Feature 1": value (BytesList)
    }
    feature {
        "Feature 2": value (FloatList)
    }
    feature {
        "Feature 3": value (Int64List)
    }
}
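
For comparison, one row of the table can be built with the `tf.train` protocol buffer classes. This is a minimal sketch: `make_example` is a hypothetical helper, the feature names simply mirror the table, and the values are made up:

```python
import tensorflow as tf

def make_example(raw_bytes, score, count):
    """Pack one table row into a tf.train.Example (one value per feature)."""
    feature = {
        'Feature 1': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[raw_bytes])),
        'Feature 2': tf.train.Feature(
            float_list=tf.train.FloatList(value=[score])),
        'Feature 3': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[count])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

example = make_example(b'raw-bytes', 0.5, 7)

# Round trip: serialize to a byte string, then parse it back
serialized = example.SerializeToString()
restored = tf.train.Example.FromString(serialized)
print(restored.features.feature['Feature 3'].int64_list.value[0])
```

Each feature holds a list, so a single scalar is stored as a one-element list.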


#### TFRecords ####

A TFRecord is a file containing a stream of serialized records.

`tf.train.Example` (often written `tf.Example`) is a way of serializing records, like the rows of the above table.

So, a TFRecord file can contain a stream of serialized `tf.train.Example`s.
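
A short sketch of that round trip follows, writing three serialized examples to a TFRecord file and reading them back. The temporary file name, the single feature `n`, and the helper `int_example` are hypothetical choices for illustration:

```python
import os
import tempfile
import tensorflow as tf

def int_example(n):
    """Wrap a single integer in a tf.train.Example under the key 'n'."""
    return tf.train.Example(features=tf.train.Features(feature={
        'n': tf.train.Feature(int64_list=tf.train.Int64List(value=[n])),
    }))

# Write: one serialized Example per record
path = os.path.join(tempfile.mkdtemp(), 'toy.tfrecord')
with tf.io.TFRecordWriter(path) as writer:
    for n in range(3):
        writer.write(int_example(n).SerializeToString())

# Read: the file yields raw byte strings, which are parsed against a spec
spec = {'n': tf.io.FixedLenFeature([], tf.int64)}
parsed = tf.data.TFRecordDataset(path).map(
    lambda record: tf.io.parse_single_example(record, spec))

values = [int(ex['n'].numpy()) for ex in parsed]
print(values)
```

The file itself knows nothing about features; the parsing spec has to match whatever was written.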


### Finding the Optimal Shard/Batch/Image Size ###



# Optimizing Data Pipelines #

In [None]:
import os, time
import tensorflow as tf
import tensorflow_datasets as tfds

In [None]:
def benchmark(dataset, num_epochs=2):
    start_time = time.perf_counter()
    for epoch_num in range(num_epochs):
        for sample in dataset:
            # Performing a training step
            time.sleep(0.001)
    execution_time = time.perf_counter() - start_time
    tf.print("Execution time:", execution_time)
    tf.print("Average time per epoch:", execution_time / num_epochs)

AUTO = tf.data.experimental.AUTOTUNE

ds = tfds.load('stanford_cars',
               split='train',
               shuffle_files=True,
               as_supervised=True)

By default, when `shuffle_files=True`, TFDS sets `Options.experimental_deterministic=False`.

* Things to investigate: 
  - `tf.data.experimental.MapVectorizationOptions`: whether to vectorize map transformations
  - `tf.data.Options.experimental_optimization.autotune_buffers`: whether to autotune buffer sizes
  - `tfds.ReadConfig`: various data reading options; can be passed to `tfds.load`
  
## Naive ##


In [None]:
benchmark(ds)

# Hardware Architecture #

- Kaggle VM
  - Storage
  - CPUs
- Host VM
  - Storage (GCS)
  - CPUs
  - TPUs
    - 8 cores per TPU
      - MXU (matrix unit), 128x128, bfloat16/float32 to float32
      - VPU (vector unit), ?, float32/int32 to float32/int32

# Feature Format #
## Images ##
- Dimensions
  - MNIST
    - 28 x 28
  - Imagenet Resized
    - 8 x 8
    - 16 x 16
    - 32 x 32
    - 64 x 64
  - Imagenette
    - 160 x 160
    - 320 x 320
  - TFFlowers
    - 192 x 192
    - 224 x 224 
    - 331 x 331
    - 512 x 512

# References #

## Computer Vision ##

- [TensorFlow, Keras, and Deep Learning, without a PhD](https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/index.html?index=..%2F..index#0) (Codelabs)
- [Feature Visualization](https://distill.pub/2017/feature-visualization/) (Distill)
- [Visualizing and Understanding CNNs - Zeiler and Fergus (2013)](https://arxiv.org/abs/1311.2901) (arXiv)

### Online Courses ###

- [CNNs](https://github.com/udacity/aind2-cnn) (Udacity)
- [Deep Learning](https://www.coursera.org/specializations/deep-learning) (Coursera)
- [TensorFlow in Practice](https://www.coursera.org/specializations/tensorflow-in-practice) (Coursera)
- [Convolutional Neural Nets](http://cs231n.github.io/) (Stanford)
- [Practical Deep Learning for Coders](https://course.fast.ai/) (fast.ai)

## TPUs and TensorFlow ##

### General ###

- [Getting Started with 100+ Flowers on TPU](https://www.kaggle.com/mgornergoogle/getting-started-with-100-flowers-on-tpu) (Kaggle notebook)
- [Keras and modern convnets, on TPUs](https://codelabs.developers.google.com/codelabs/keras-flowers-tpu/) (Codelabs)
  - What are TPUs
  - Loading Data Fast
  - Transfer Learning
  - Modern Convnets
  - Xception Fine-Tuned
- [Cloud TPU Repo](https://github.com/tensorflow/tpu) (GCS GitHub) - "This repository is a collection of reference models and tools used with Cloud TPUs." Much is still TF1, but some is TF2. **Very Useful**
  - Benchmarks
    - ResNet50 - detailed discussion of how the benchmark was obtained, VM config, batch size, etc.
  - Models
    - ResNet50
    - Keras Applications
    - Inception
  - Tools
    - Image Data Converter
    - Data Profiler
- [When to Use CPUs vs GPUs vs TPUs in a Kaggle Competition?](https://towardsdatascience.com/when-to-use-cpus-vs-gpus-vs-tpus-in-a-kaggle-competition-9af708a8c3eb) (article)
- [Advanced Guide to Inception v3](https://cloud.google.com/tpu/docs/inception-v3-advanced) (Cloud TPU docs)
  - TF1, but a good overview of TPU data pipelines with preprocessing on host CPU
- [Training ResNet on Cloud TPU](https://cloud.google.com/tpu/docs/tutorials/resnet-2.x) (Cloud TPU docs)
  - much less material (no preprocessing), but TF2

### TFRecords and Data Preparation ###

- [Convert Kaggle Dataset to GCS Bucket of TFRecords](https://www.kaggle.com/paultimothymooney/convert-kaggle-dataset-to-gcs-bucket-of-tfrecords) (Kaggle notebook)
- [Flower pictures to TFRecords](https://colab.research.google.com/github/GoogleCloudPlatform/training-data-analyst/blob/master/courses/fast-and-lean-data-science/03_Flower_pictures_to_TFRecords.ipynb) (Colab notebook)

### Data Augmentation ###

- [Rotation](https://www.kaggle.com/cdeotte/rotation-augmentation-gpu-tpu-0-96/notebook) (Kaggle notebook)
- [CutMix and MixUp](https://www.kaggle.com/cdeotte/cutmix-and-mixup-on-gpu-tpu) (Kaggle notebook)
- [Gridmask](https://www.kaggle.com/xiejialun/gridmask-data-augmentation-with-tensorflow) (Kaggle notebook)
- [Faster Data Augmentation](https://www.kaggle.com/yihdarshieh/make-chris-deotte-s-data-augmentation-faster?scriptVersionId=29453906) (Kaggle notebook)
- [Perspective Transform](https://www.kaggle.com/yihdarshieh/perspective-transformation?scriptVersionId=29866403) (Kaggle notebook)

### Distributed TensorFlow ###

- [tf.distribute](https://www.tensorflow.org/api_docs/python/tf/distribute) (TF2 docs)
- [Distributed TensorFlow](https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/distributed.md) (GitHub community docs)

### Performance ###

- [Custom Training Loop on 100+ Flowers](https://www.kaggle.com/mgornergoogle/custom-training-loop-with-100-flowers-on-tpu/notebook) (Kaggle notebook)
- [Cloud TPU Performance Guide](https://cloud.google.com/tpu/docs/performance-guide) (Cloud TPU docs)
- [Training Performance (TensorFlow Dev Summit 2018)](https://www.youtube.com/watch?v=SxOsJPaxHME) (YouTube)
- [TensorFlow Profiler on TensorBoard](https://colab.research.google.com/github/tensorflow/tensorboard/blob/master/docs/tensorboard_profiling_keras.ipynb) (Colab notebook)
- [Examining the TensorFlow Graph with TensorBoard](https://colab.research.google.com/github/tensorflow/tensorboard/blob/master/docs/graphs.ipynb) (Colab notebook)
- [In-Datacenter Performance Analysis of a TPU](https://arxiv.org/abs/1704.04760) (arXiv)

### Hardware ###

- [Systolic Architectures](http://www.telesens.co/2018/07/30/systolic-architectures/) (article)
- [Why Systolic Architectures? - Kung (1982)](http://www.eecs.harvard.edu/~htk/publication/1982-kung-why-systolic-architecture.pdf) (pdf)
