### Introduction to Keras for Engineers

In [1]:
# Setup
import numpy as np
import tensorflow as tf
from tensorflow import keras

### Introduction

Are you a machine learning engineer looking to use Keras to ship deep-learning-powered features in real products? This guide will
serve as your first introduction to core Keras API concepts.

In this guide, you will learn how to:

* Prepare your data before training a model (by turning it into either NumPy arrays or tf.data.Dataset objects)
* Do data preprocessing, for instance feature normalization or vocabulary indexing.
* Build a model that turns your data into useful predictions, using the Keras Functional API.
* Train your model with the built-in Keras fit() method, while being mindful of checkpointing, metrics monitoring, and fault tolerance.
* Evaluate your model on a test dataset and use it for inference on new data.
* Customize what fit() does, for instance to build a GAN.
* Speed up training by leveraging multiple GPUs.
* Refine your model through hyperparameter tuning.

At the end of this guide, you will get pointers to end-to-end examples to solidify these concepts:

* Image classification
* Text classification
* Credit card fraud detection

### Data loading & preprocessing

Neural networks don't process raw data, like text files, encoded JPEG image files, or CSV files. They process vectorized & standardized representations.

* Text files need to be read into string tensors, then split into words. Finally, the words need to be indexed and turned into integer tensors.
* Images need to be read and decoded into integer tensors, then converted to floating point and normalized to small values (usually between 0 and 1).
* CSV data needs to be parsed, with numerical features converted to floating point tensors and categorical features indexed and converted to integer tensors. Then each feature typically needs to be normalized to zero-mean and unit-variance.
* Etc.
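To make the text case concrete, here is a minimal pure-Python sketch of splitting strings into words and indexing them as integers. The helper names build_vocabulary and encode are hypothetical and the regex-based splitting is an assumption; unlike Keras's own TextVectorization layer, which orders its vocabulary by token frequency, this sketch assigns indices in order of first appearance.

```python
import re

def build_vocabulary(texts):
    """Map each distinct word to an integer, reserving 0 for padding and 1 for unknown words."""
    vocab = {"": 0, "[UNK]": 1}
    for text in texts:
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            vocab.setdefault(word, len(vocab))  # new words get the next free index
    return vocab

def encode(text, vocab):
    """Split a string into words and replace each word by its index (1 if unseen)."""
    return [vocab.get(word, 1) for word in re.findall(r"[a-z0-9']+", text.lower())]

texts = ["The cat sat.", "The dog sat down."]
vocab = build_vocabulary(texts)
print(vocab)
print(encode("The cat ran.", vocab))  # [2, 3, 1]
```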

Let's start with data loading.

### Data Loading

Keras models accept three types of inputs:

* NumPy arrays, just like Scikit-learn and many other Python-based libraries. This is a good option if your data fits in memory.
* TensorFlow Dataset objects (tf.data.Dataset). This is a high-performance option that is better suited to datasets that do not fit in memory and that are streamed from disk or from a distributed file system.
* Python generators that yield batches of data (such as custom subclasses of the keras.utils.Sequence class).
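As a rough sketch of the third option, here is a keras.utils.Sequence subclass that serves batches from in-memory arrays (the class name ArrayBatches and the random data are hypothetical). A Sequence only needs __len__ (the number of batches) and __getitem__ (one batch), and an instance can be passed to fit() in place of arrays.

```python
import math

import numpy as np
from tensorflow import keras

class ArrayBatches(keras.utils.Sequence):
    """Hypothetical example: serves (x, y) batches from in-memory NumPy arrays."""

    def __init__(self, x, y, batch_size):
        super().__init__()
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        # Number of batches per epoch (the last batch may be smaller).
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        # Return the idx-th batch of inputs and targets.
        start = idx * self.batch_size
        end = start + self.batch_size
        return self.x[start:end], self.y[start:end]

x = np.random.rand(10, 4).astype("float32")
y = np.random.randint(0, 2, size=(10,))
batches = ArrayBatches(x, y, batch_size=4)
print(len(batches))         # 3
print(batches[2][0].shape)  # (2, 4)
```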

Before you start training a model, you will need to make your data available in one of these formats. If you have a large dataset and you are training on GPU(s), consider using Dataset objects, since they will take care of performance-critical details, such as:

* Asynchronously preprocessing your data on CPU while your GPU is busy, and buffering it into a queue.
* Prefetching data into GPU memory so that it's immediately available when the GPU has finished processing the previous batch, which lets you reach full GPU utilization.
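Here's a minimal sketch of a tf.data pipeline along these lines, assuming random in-memory arrays stand in for real files: map() with num_parallel_calls runs preprocessing in parallel on the CPU, and prefetch() lets the pipeline prepare upcoming batches while the current one is being consumed. The batch size of 32 and the [0, 1] scaling are illustrative choices.

```python
import numpy as np
import tensorflow as tf

# Hypothetical in-memory data standing in for image files on disk.
images = np.random.randint(0, 256, size=(100, 32, 32, 3)).astype("uint8")
labels = np.random.randint(0, 10, size=(100,))

def preprocess(image, label):
    # Runs inside the tf.data pipeline: cast to float32 and scale to [0, 1].
    return tf.cast(image, tf.float32) / 255.0, label

pipeline = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)  # parallel CPU preprocessing
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)  # overlap data preparation with model execution
)

for batch_images, batch_labels in pipeline.take(1):
    print(batch_images.shape, batch_images.dtype)
```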

Keras features a range of utilities to help you turn raw data on disk into a Dataset:

* tf.keras.preprocessing.image_dataset_from_directory turns image files sorted into class-specific folders into a labeled dataset of image tensors.
* tf.keras.preprocessing.text_dataset_from_directory does the same for text files.

In addition, the TensorFlow tf.data API includes other similar utilities, such as tf.data.experimental.make_csv_dataset to load structured data from CSV files.
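As an illustration, the following sketch writes a tiny hypothetical CSV file (the column names age, income, and label are made up) and loads it with make_csv_dataset. Each batch is a dictionary of feature tensors keyed by column name, plus a label tensor taken from the label_name column.

```python
import os
import tempfile

import tensorflow as tf

# Write a tiny hypothetical CSV file so the example is self-contained.
csv_path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(csv_path, "w") as f:
    f.write("age,income,label\n")
    f.write("25,40000,0\n")
    f.write("32,52000,1\n")
    f.write("47,61000,1\n")
    f.write("51,58000,0\n")

csv_dataset = tf.data.experimental.make_csv_dataset(
    csv_path,
    batch_size=2,
    label_name="label",  # this column is returned separately as the target
    num_epochs=1,        # read the file once instead of repeating forever
    shuffle=False,
)

for features, labels in csv_dataset.take(1):
    print({name: values.numpy() for name, values in features.items()})
    print(labels.numpy())
```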


In [None]:
"""
Example: obtaining a labeled dataset from image files on disk

Suppose you have image files sorted by class in different folders, like this:

main_directory/
...class_a/
......a_image_1.jpg
......a_image_2.jpg
...class_b/
......b_image_1.jpg
......b_image_2.jpg

Then you can do:
"""

dataset = keras.preprocessing.image_dataset_from_directory("PATH_TO_MAIN_DIRECTORY", batch_size=64, image_size=(200,200))

for data, labels in dataset:
    print(data.shape)   # (64, 200, 200, 3)
    print(data.dtype)   # float32
    print(labels.shape) # (64, )
    print(labels.dtype) # int32

The label of a sample is the rank of its folder in alphanumeric order. Naturally, this can also be configured explicitly
by passing, e.g., class_names=['class_a', 'class_b'], in which case label 0 will be class_a and label 1 will be class_b.
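The default label assignment amounts to sorting the folder names and numbering them, as in this plain-Python illustration (the folder names are hypothetical, and this is not Keras's internal code):

```python
# Hypothetical folder names, listed in arbitrary order.
class_dirs = ["class_b", "class_a", "class_c"]

# Default: sort the folder names alphanumerically and use each position as the label.
class_names = sorted(class_dirs)
label_of = {name: index for index, name in enumerate(class_names)}
print(label_of)  # {'class_a': 0, 'class_b': 1, 'class_c': 2}

# An explicit class_names list fixes the order instead.
explicit = ["class_b", "class_a", "class_c"]
explicit_label_of = {name: index for index, name in enumerate(explicit)}
print(explicit_label_of)  # {'class_b': 0, 'class_a': 1, 'class_c': 2}
```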

**Example: obtaining a labeled dataset from text files on disk**

Likewise for text: if you have .txt documents sorted by class in different folders, you can do:

In [None]:
dataset = keras.preprocessing.text_dataset_from_directory('path/to/main_directory', batch_size=64)

# For demonstration, iterate over the batches yielded by the dataset.
for data, labels in dataset:
    print(data.shape)   # (64, )
    print(data.dtype)   # string
    print(labels.shape) # (64, )
    print(labels.dtype) # int32

### Data preprocessing with Keras

Once your data is in the form of string/int/float NumPy arrays, or a Dataset object (or Python generator) that yields batches of
string/int/float tensors, it is time to **preprocess** the data. This can mean:

 * Tokenization of string data, followed by token indexing
 * Feature normalization
 * Rescaling the data to small values (in general, input values to a neural network should be close to zero --
  typically we expect either data with zero-mean and unit-variance, or data in the [0, 1] range.)
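Both conventions are simple to express in NumPy. The sketch below uses hypothetical age/income features and random 8-bit pixels: it standardizes each feature column with its own mean and standard deviation, and divides pixel values by 255.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical raw features on very different scales (e.g. age and income).
features = np.stack([rng.uniform(18, 90, 1000), rng.uniform(1e4, 2e5, 1000)], axis=1)

# Zero-mean, unit-variance scaling, computed per feature column.
mean = features.mean(axis=0)
std = features.std(axis=0)
standardized = (features - mean) / std

# Alternative: rescale 8-bit pixel values from [0, 255] into [0, 1].
pixels = rng.integers(0, 256, size=(4, 8, 8, 3))
rescaled = pixels.astype("float32") / 255.0

print(standardized.mean(axis=0).round(6), standardized.std(axis=0).round(6))
print(rescaled.min(), rescaled.max())
```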

#### The ideal machine learning model is end-to-end

In general, you should do data preprocessing **as part of your model** as much as possible, rather than in an external data
preprocessing pipeline. External preprocessing makes your models less portable when it's time to use them in production.
Consider a model that processes text: it uses a specific tokenization algorithm and a specific vocabulary index. When you want to ship your model
to a mobile app or a JavaScript app, you will need to recreate the exact same preprocessing setup in the target language. This can get very tricky:
any small discrepancy between the original pipeline and the one you recreate has the potential to completely invalidate your model, or at least
severely degrade its performance.

It would be much easier to simply export an end-to-end model that already includes preprocessing. **The ideal model should expect as input something
as close as possible to raw data: an image model should expect RGB pixel values in the [0, 255] range, and a text model should accept strings of UTF-8
characters.** That way, the consumer of the exported model doesn't have to know about the preprocessing pipeline.
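As a minimal sketch of such an end-to-end model, assuming a toy set of hypothetical movie-review strings, the model below takes raw strings as input and runs TextVectorization inside the graph, so callers never see the tokenizer or vocabulary. The sequence length of 6 and embedding size of 8 are arbitrary illustrative values.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical training texts, used only to build the vocabulary.
texts = np.array(["the movie was great", "the movie was terrible", "what a great film"])

vectorizer = layers.TextVectorization(output_mode="int", output_sequence_length=6)
vectorizer.adapt(texts)

# The model's input is a raw string; tokenization happens inside the model.
inputs = keras.Input(shape=(1,), dtype="string")
x = vectorizer(inputs)
x = layers.Embedding(input_dim=vectorizer.vocabulary_size(), output_dim=8)(x)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
end_to_end_model = keras.Model(inputs, outputs)

predictions = end_to_end_model(tf.constant([["a great movie"], ["a terrible film"]]))
print(predictions.shape)
```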

### Using Keras preprocessing layers

In Keras, you do in-model data preprocessing via **preprocessing layers**. This includes:
* Vectorizing raw strings of text via the TextVectorization layer
* Feature normalization via the Normalization layer
* Image rescaling, cropping, or image data augmentation

The key advantage of using Keras preprocessing layers is that **they can be included directly in your model**,
either during training or after training, which makes your models portable.

Some preprocessing layers have a state:
* TextVectorization holds an index mapping words or tokens to integer indices
* Normalization holds the mean and variance of your features

The state of a preprocessing layer is obtained by calling layer.adapt(data) on a sample of the training data
(or all of it).

**Example: turning strings into sequences of integer word indices**

In [None]:
from tensorflow.keras.layers import TextVectorization

# Example training data, of dtype `string`.
training_data = np.array([["This is the 1st sample."], ["And here's the 2nd sample."]])

# Create a TextVectorization layer instance. It can be configured to either
# return integer token indices, or a dense token representation (e.g. multi-hot
# or TF-IDF). The text standardization and text splitting algorithms are fully
# configurable.
vectorizer = TextVectorization(output_mode="int")

# Calling `adapt` on an array or dataset makes the layer generate a vocabulary
# index for the data, which can then be reused when seeing new data.
vectorizer.adapt(training_data)

# After calling adapt, the layer is able to encode any word it has seen before
# in the `adapt()` data. Unknown words are encoded via an "out-of-vocabulary"
# token.
integer_data = vectorizer(training_data)
print(integer_data)
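To inspect the state that adapt() produced, you can query the vocabulary and check how unseen words are handled. This sketch rebuilds the same layer so it runs on its own; "zebra" is simply a word that does not appear in the adapt() data. With output_mode="int" and default settings, index 0 is reserved for padding and index 1 for out-of-vocabulary tokens.

```python
import numpy as np
from tensorflow.keras.layers import TextVectorization

training_data = np.array([["This is the 1st sample."], ["And here's the 2nd sample."]])
vectorizer = TextVectorization(output_mode="int")
vectorizer.adapt(training_data)

# The adapted state: index 0 is the padding token, index 1 the OOV token.
vocabulary = vectorizer.get_vocabulary()
print(vocabulary[:2])

# "zebra" never appeared in the adapt() data, so it maps to the OOV index 1.
encoded = vectorizer(np.array([["the zebra"]])).numpy()
print(encoded)
```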

**Example: turning strings into multi-hot encoded n-grams (up to bigrams)**

In [None]:
from tensorflow.keras.layers import TextVectorization

# Example training data, of dtype `string`.
training_data = np.array([["This is the 1st sample."], ["And here's the 2nd sample."]])

# Create a TextVectorization layer instance. It can be configured to either
# return integer token indices, or a dense token representation (e.g. multi-hot
# or TF-IDF). The text standardization and text splitting algorithms are fully
# configurable.
vectorizer = TextVectorization(output_mode="multi_hot", ngrams=2)

# Calling `adapt` on an array or dataset makes the layer generate a vocabulary
# index for the data, which can then be reused when seeing new data.
vectorizer.adapt(training_data)

# After calling adapt, the layer is able to encode any n-gram it has seen before
# in the `adapt()` data. Unknown n-grams are encoded via an "out-of-vocabulary"
# token.
integer_data = vectorizer(training_data)
print(integer_data)

**Example: normalizing features**

In [None]:
from tensorflow.keras.layers import Normalization

# Example image data, with values in the [0, 255] range
training_data = np.random.randint(0, 256, size=(64, 200, 200, 3)).astype("float32")

normalizer = Normalization(axis=-1)
normalizer.adapt(training_data)

normalized_data = normalizer(training_data)
print("var: %.4f" % np.var(normalized_data))
print("mean: %.4f" % np.mean(normalized_data))
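To see what the adapted state does, the sketch below compares the layer's output with the same computation done by hand: with axis=-1, Normalization keeps one mean and one variance per channel (computed over all other axes) and applies (x - mean) / sqrt(variance). A smaller random batch is used to keep it quick, and small floating-point differences are expected.

```python
import numpy as np
from tensorflow.keras.layers import Normalization

# Smaller hypothetical image batch, values in [0, 255].
training_data = np.random.randint(0, 256, size=(8, 10, 10, 3)).astype("float32")

normalizer = Normalization(axis=-1)  # one mean/variance per channel (last axis)
normalizer.adapt(training_data)
layer_output = np.asarray(normalizer(training_data))

# Same computation by hand: statistics over every axis except the channel axis.
channel_mean = training_data.mean(axis=(0, 1, 2))
channel_std = training_data.std(axis=(0, 1, 2))
manual_output = (training_data - channel_mean) / channel_std

print(np.abs(layer_output - manual_output).max())
```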

**Example: rescaling & center-cropping images**

Both the Rescaling layer and the CenterCrop layer are stateless, so it isn't necessary to call adapt() in this case.

In [None]:
from tensorflow.keras.layers import CenterCrop
from tensorflow.keras.layers import Rescaling

# Example image data, with values in the [0, 255] range
training_data = np.random.randint(0, 256, size=(64, 200, 200, 3)).astype("float32")

cropper = CenterCrop(height=150, width=150)
scaler = Rescaling(scale=1.0/255)

output_data = scaler(cropper(training_data))
print("shape:", output_data.shape)
print("min:", np.min(output_data))
print("max:", np.max(output_data))
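The stateless layers can be checked the same way. For inputs larger than the target size, CenterCrop keeps the central 150x150 window, i.e. it drops (200 - 150) // 2 = 25 pixels on each side, and Rescaling multiplies by the scale. This sketch reproduces both with NumPy slicing and division on a hypothetical random batch:

```python
import numpy as np
from tensorflow.keras.layers import CenterCrop, Rescaling

images = np.random.randint(0, 256, size=(4, 200, 200, 3)).astype("float32")
layer_output = np.asarray(Rescaling(scale=1.0 / 255)(CenterCrop(height=150, width=150)(images)))

# By hand: drop 25 pixels on each side of both spatial axes, then scale to [0, 1].
offset = (200 - 150) // 2
manual_output = images[:, offset:offset + 150, offset:offset + 150, :] / 255.0

print(np.abs(layer_output - manual_output).max())
```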

### Building models with the Keras Functional API

A "layer" is a simple input-output transformation (such as the scaling & center-cropping transformations above).
For instance, here's a linear projection layer that maps its inputs to a 16-dimensional feature space:

In [None]:
dense = keras.layers.Dense(units=16)
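A Dense layer with no activation computes a linear projection, inputs @ kernel + bias. The sketch below creates the weights by calling the layer on a hypothetical batch of 8-dimensional inputs, then recomputes the output with NumPy from get_weights():

```python
import numpy as np
from tensorflow import keras

dense = keras.layers.Dense(units=16)
x = np.random.rand(4, 8).astype("float32")  # hypothetical batch: 4 samples, 8 features
y = dense(x)  # the first call creates the weights: kernel (8, 16) and bias (16,)

kernel, bias = dense.get_weights()
manual = x @ kernel + bias  # the same linear projection, computed with NumPy
print(y.shape, kernel.shape, bias.shape)
```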

A "model" is a directed acyclic graph of layers. You can think of a model as a "bigger layer" that encompasses multiple sublayers and that can be trained via exposure to data.

The most common and most powerful way to build Keras models is the Functional API. To build models with the Functional API, you start by specifying the shape (and optionally the dtype) of your inputs. If any dimension of your input can vary, you can specify it as None. For instance, an input for 200x200 RGB images would have shape (200, 200, 3), but an input for RGB images of any size would have shape (None, None, 3).

In [None]:
# Let's say we expect our inputs to be RGB images of arbitrary size
inputs = keras.Input(shape=(None, None, 3))
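To see the effect of None dimensions, here is a small, separate model (the flex_ names and layer sizes are hypothetical) that accepts images of two different sizes. GlobalAveragePooling2D collapses the variable spatial dimensions, so the output shape is the same either way.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Spatial dimensions are None, so images of any height/width are accepted.
flex_inputs = keras.Input(shape=(None, None, 3))
flex_x = layers.Conv2D(filters=8, kernel_size=3, activation="relu")(flex_inputs)
flex_x = layers.GlobalAveragePooling2D()(flex_x)  # collapses the variable spatial dims
flex_outputs = layers.Dense(4)(flex_x)
flex_model = keras.Model(flex_inputs, flex_outputs)

small = flex_model(np.zeros((2, 64, 64, 3), dtype="float32"))
large = flex_model(np.zeros((2, 120, 90, 3), dtype="float32"))
print(small.shape, large.shape)
```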

After defining your input(s), you can chain layer transformations on top of your inputs, until your final output:

In [None]:
from tensorflow.keras import layers
from tensorflow.keras.layers import CenterCrop, Rescaling

# Center-crop images to 150x150
x = CenterCrop(height=150, width=150)(inputs)
# Rescale images to [0,1]
x = Rescaling(scale=1.0/255)(x)

# Apply some convolution and pooling layers
x = layers.Conv2D(filters=32, kernel_size=(3, 3), activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(3, 3))(x)
x = layers.Conv2D(filters=32, kernel_size=(3, 3), activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(3, 3))(x)
x = layers.Conv2D(filters=32, kernel_size=(3, 3), activation="relu")(x)

# Apply global average pooling to get flat feature vectors
x = layers.GlobalAveragePooling2D()(x)

# Add a dense classifier on top
num_classes = 10
outputs = layers.Dense(num_classes, activation="softmax")(x)
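The spatial size at each stage follows from simple arithmetic. Assuming the defaults used above (stride 1 and "valid" padding for Conv2D; stride equal to the pool size and "valid" padding for MaxPooling2D), the 150x150 crop shrinks as computed below. The helper names are hypothetical.

```python
def conv_out(size, kernel):
    # "valid" convolution with stride 1: size - kernel + 1
    return size - kernel + 1

def pool_out(size, pool):
    # "valid" pooling with stride equal to the pool size
    return (size - pool) // pool + 1

size = 150  # spatial size after CenterCrop
sizes = []
for step in ["conv", "pool", "conv", "pool", "conv"]:
    size = conv_out(size, 3) if step == "conv" else pool_out(size, 3)
    sizes.append(size)
print(sizes)  # [148, 49, 47, 15, 13]
```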

Once you have defined the directed acyclic graph of layers that turns your input(s) into your outputs, instantiate a Model object:

In [None]:
model = keras.Model(inputs=inputs, outputs=outputs)

This model behaves basically like a bigger layer. You can call it on batches of data, like this:

In [None]:
data = np.random.randint(0, 256, size=(64, 200, 200, 3)).astype("float32")
processed_data = model(data)
print(processed_data.shape)

You can print a summary of how your data gets transformed at each stage of the model. This is useful for debugging.

Note that the output shape displayed for each layer includes the **batch size**. Here the batch size is None, which indicates that the model can process batches of any size.

In [None]:
model.summary()
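The parameter counts in the summary can be checked by hand: a Conv2D layer has (kernel_h * kernel_w * in_channels + 1) * filters parameters, a Dense layer has (in_features + 1) * units, and the cropping, rescaling, and pooling layers have none. This sketch applies those formulas to the model above (helper names are hypothetical):

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel_h x kernel_w x in_channels kernel plus one bias per filter
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_features, units):
    # Weight matrix plus one bias per unit
    return (in_features + 1) * units

counts = [
    conv2d_params(3, 3, 3, 32),   # first Conv2D on the RGB input
    conv2d_params(3, 3, 32, 32),  # second Conv2D
    conv2d_params(3, 3, 32, 32),  # third Conv2D
    dense_params(32, 10),         # Dense classifier after global pooling
]
print(counts, sum(counts))  # [896, 9248, 9248, 330] 19722
```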

The Functional API also makes it easy to build models that have multiple inputs (for instance, an image and its metadata) or multiple outputs (for instance, predicting the class of the image and the likelihood that a user will click on it). For a deeper dive into what you can do, see our guide to the Functional API.
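As a minimal sketch of that image-plus-metadata setup, assuming 64x64 images and 5 hypothetical metadata features, the model below merges both inputs and predicts a 10-way class distribution plus a click probability. Layer sizes and names are illustrative only.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical inputs: a small RGB image and a vector of 5 metadata features.
image_input = keras.Input(shape=(64, 64, 3), name="image")
meta_input = keras.Input(shape=(5,), name="metadata")

x = layers.Conv2D(16, 3, activation="relu")(image_input)
x = layers.GlobalAveragePooling2D()(x)
x = layers.concatenate([x, meta_input])  # merge image features with metadata

class_output = layers.Dense(10, activation="softmax", name="class")(x)
click_output = layers.Dense(1, activation="sigmoid", name="click")(x)

multi_model = keras.Model(inputs=[image_input, meta_input], outputs=[class_output, click_output])

images = np.random.rand(2, 64, 64, 3).astype("float32")
metadata = np.random.rand(2, 5).astype("float32")
class_probs, click_probs = multi_model([images, metadata])
print(class_probs.shape, click_probs.shape)
```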

### Training models with fit()

At this point, you know:
* How to prepare your data (e.g. as a NumPy array or a tf.data.Dataset object)
* How to build a model that will process your data

The next step is to train your model on your data. The Model class features a built-in training loop, the fit() method. It accepts Dataset objects, Python generators that yield batches of data, or NumPy arrays.

Before you can call fit(), you need to specify an optimizer and a loss function (we assume you are already familiar with these concepts). This is the compile() step:

In [None]:
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3), loss=keras.losses.CategoricalCrossentropy())
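Once a model is compiled, fit() runs the training loop. One detail worth checking: CategoricalCrossentropy expects one-hot targets, while integer labels (such as those yielded by image_dataset_from_directory with its default settings) go with SparseCategoricalCrossentropy. The sketch below uses a small hypothetical model and random integer-labeled data to show the sparse variant; the batch size and epoch count are arbitrary.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical tiny dataset: 32 feature vectors and integer class labels in [0, 10).
x_train = np.random.rand(32, 20).astype("float32")
y_train = np.random.randint(0, 10, size=(32,))

small_model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Integer labels call for the sparse variant of the cross-entropy loss.
small_model.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(),
    metrics=["accuracy"],
)
history = small_model.fit(x_train, y_train, batch_size=8, epochs=2, verbose=0)
print(history.history["loss"])
```

If your labels are one-hot encoded instead (for example via keras.utils.to_categorical), keep CategoricalCrossentropy as in the compile() call above.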