Alexander S. Lundervold, 20.04.22

# Introduction

Now we've come to the (small) part of the machine learning engineering pipeline where the actual machine learning takes place. 

<center>
<a href="https://dl.acm.org/doi/10.5555/2969442.2969519"><img width=60% src="assets/mlengineering.png"></a><br>
<span style="font-size:10px">Figure from <a href="https://dl.acm.org/doi/10.5555/2969442.2969519">Sculley et al., Hidden Technical Debt in Machine Learning Systems, 2015</a></span></center>

We've done the **data ingestion** (`ExampleGen`), the **data validation** (`StatisticsGen`, `SchemaGen`, `ExampleValidator`), and the **data preprocessing** (`Transform`), and are ready to move on to **model training**, then **model analysis and validation**, before, finally, **model deployment**.

In this notebook, we'll take a look at **hyperparameter tuning** and **model training**. 

## `Trainer` and `Tuner`

In TensorFlow Extended, we can use the `Tuner` and `Trainer` components for tuning and training models.

Our goal is to construct the following pipeline:

<img width=100% src="assets/pipeline_4.png">

The inputs to the `Trainer` component will be the preprocessing graph and the transformed example artifacts from the `Transform` component, the data schema (that we defined using `SchemaGen`), and a user-provided module file that specifies the model and training logic. 

The `Tuner` component takes as input the transformed examples (along with the transform graph) and a module file that specifies the model and the tuning logic, including the hyperparameter space to search and the objective to optimize. When executed, it outputs the best hyperparameters found during the search. These can then be consumed by the `Trainer`.

As always, you should consult the TFX guide for additional details: https://www.tensorflow.org/tfx/guide/tuner<br>
https://www.tensorflow.org/tfx/guide/trainer

# Setup

Import basic libraries:

In [None]:
%matplotlib inline
import os
from pathlib import Path

Check whether we're running on Colab:

In [None]:
try:
    # The google.colab package is only available inside Colab
    import google.colab
    colab = True
except ImportError:
    colab = False

Set up data directories:

In [None]:
if colab:
    from google.colab import drive
    drive.mount('./gdrive')
    DATA = Path('./gdrive/MyDrive/ColabData/petfinder-mini/csv')
else:
    NB_DIR = Path.cwd()
    DATA = NB_DIR/'..'/'data'/'petfinder-mini'/'csv'
    
SPLIT_DATA = DATA/'..'/'split_csv'

In [None]:
import os
# Select a specific GPU in a multi-GPU setup.
# Remove this line on a single-GPU system, where it would hide the only GPU.
os.environ["CUDA_VISIBLE_DEVICES"]="2"

Install TFX and import components:

In [None]:
if colab:
    !pip install -U tfx

> If on Colab, restart the runtime after running the above cell

In [None]:
import tensorflow as tf

In [None]:
import tfx

Set up the interactive context for running TFX components:

In [None]:
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

In [None]:
context = InteractiveContext()

# Recreate the previous pipeline

In [None]:
from tfx.components import CsvExampleGen
from tfx.components import StatisticsGen
from tfx.components import SchemaGen
from tfx.components import ExampleValidator
from tfx.components import Transform

In [None]:
# Generate examples
example_gen = CsvExampleGen(input_base=str(DATA)+'/')

# Generate statistics
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])

# Automatic data schema (in a more realistic setting we would have 
# used a manually modified schema saved to disk)
schema_gen = SchemaGen(statistics=statistics_gen.outputs['statistics'])

# Validate examples
example_validator = ExampleValidator(
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'])

# Preprocess
pets_transform_file = 'pets_transforms.py'

transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file=os.path.abspath(pets_transform_file))

## Run the components

In [None]:
for component in [example_gen, statistics_gen, schema_gen, example_validator, transform]:
    context.run(component)

# Set up the model, tuning and training

As we did in the previous notebook, we'll follow the example in Hapke & Nelson, Building Machine Learning Pipelines: [https://github.com/Building-ML-Pipelines/building-machine-learning-pipelines/blob/main/components/module.py](https://github.com/Building-ML-Pipelines/building-machine-learning-pipelines/blob/main/components/module.py)

The basic idea of our model is to pass the preprocessed features through a small fully connected network (whose depth we'll tune), except for the text feature (`Description`), which is passed to a pretrained NLP model that extracts embeddings. We'll use the Universal Sentence Encoder from TensorFlow Hub: https://tfhub.dev/google/universal-sentence-encoder/4. 

Here's an illustration of our model:

![](model.png)

> Note that in practice one would make an effort to find a good model setup. This can be used as a possible starting point.

## Hyperparameter tuning

We'll illustrate how to set up hyperparameter tuning with the `Tuner` component from TFX. The `Tuner` component needs a `tuner_fn` and the `Trainer` component a `run_fn`; we'll define both in a single module file.
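
To make the idea of a search concrete, here is a minimal pure-Python sketch of random search over the same two hyperparameters we'll tune in this notebook. It is only an illustration: `random_search` and `toy_score` are hypothetical names, and `toy_score` is a made-up stand-in for "train the model and return its validation accuracy". In the actual pipeline, KerasTuner does this work for us.

```python
import itertools
import random

# The search space used in this notebook's module file
SEARCH_SPACE = {
    "learning_rate": [1e-2, 1e-3],
    "num_nontext_layers": [1, 2, 3],
}


def random_search(score_fn, space, max_trials, seed=0):
    """Score up to max_trials distinct random configurations; return the best."""
    rng = random.Random(seed)
    keys = sorted(space)
    # Enumerate every combination, then visit them in random order
    candidates = [
        dict(zip(keys, values))
        for values in itertools.product(*(space[k] for k in keys))
    ]
    rng.shuffle(candidates)
    trials = [(score_fn(config), config) for config in candidates[:max_trials]]
    best_score, best_config = max(trials, key=lambda trial: trial[0])
    return best_config, best_score


def toy_score(config):
    # Hypothetical stand-in for a validation accuracy (higher is better)
    return (0.5
            - abs(config["learning_rate"] - 1e-3)
            - 0.01 * config["num_nontext_layers"])


best_config, best_score = random_search(toy_score, SEARCH_SPACE, max_trials=6)
print(best_config, best_score)
```

With two learning rates and three layer counts there are only six combinations, so six trials are enough to cover the whole space in this toy version.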

## Create a module file for training and tuning

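This module contains quite a few pieces: the hyperparameter search space, the model definition, the input and serving functions, and the `tuner_fn` and `run_fn` entry points. Go through it carefully and make sure you understand what each part does.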

In [None]:
%%writefile module.py

import os
# Select a specific GPU for training in a multi-GPU setup.
# Remove this line on a single-GPU system, where it would hide the only GPU.
os.environ["CUDA_VISIBLE_DEVICES"]="2"


import tensorflow as tf
import keras_tuner
from tfx import v1 as tfx
import tensorflow_transform as tft
import tensorflow_hub as hub


# We grab the features from our pets_transform module
import pets_transforms
_ONE_HOT_FEATURES = pets_transforms.ONE_HOT_FEATURES
_NUMERICAL_FEATURES = pets_transforms.NUMERICAL_FEATURES
_TEXT_FEATURES = pets_transforms.TEXT_FEATURES
_LABEL_KEY = pets_transforms.LABEL_KEY

_transformed_name = pets_transforms._transformed_name


############################################################
# Define the model and its hyperparameters
############################################################

def _get_hyperparameters() -> keras_tuner.HyperParameters:
    """Returns hyperparameters for building Keras model.
    Copied from 
    https://github.com/tensorflow/tfx/blob/master/tfx/examples/penguin/penguin_utils_keras.py
    """
    hp = keras_tuner.HyperParameters()
    # Defines search space.
    hp.Choice('learning_rate', [1e-2, 1e-3], default=1e-2)
    hp.Int('num_nontext_layers', 1, 3, default=2)
    return hp



def get_model(hparams: keras_tuner.HyperParameters) -> tf.keras.models.Model:
    """
    Creates a Keras model using the specified hyperparameters
    
    Returns:
        A model as a Keras object
    """
    
    # We'll store all the input features except the text feature here:
    input_features = []
    
    
    for key, dim in _ONE_HOT_FEATURES.items():
        input_features.append(
            tf.keras.Input(shape=(dim+1, ), name=_transformed_name(key))
        )
        
    for feature in _NUMERICAL_FEATURES:
        input_features.append(
            tf.keras.Input(shape=(1, ), name=_transformed_name(feature))
        )
        
    # Text feature
    input_texts = []
    for key in _TEXT_FEATURES.keys():
        input_texts.append(
            tf.keras.Input(shape=(1,), name=_transformed_name(key), dtype=tf.string)
        )

        
    # Embedding the text feature
    MODULE_URL = "https://tfhub.dev/google/universal-sentence-encoder/4"
    embed = hub.KerasLayer(MODULE_URL)
    reshaped_description = tf.reshape(input_texts[0], [-1])
    embed_description = embed(reshaped_description)
    
    # Construct the subgraph for the text features
    text_model = tf.keras.layers.Reshape((512,), input_shape=(1, 512))(embed_description)
    text_model = tf.keras.layers.Dense(16, activation="relu")(text_model)
    
    # Subgraph for the other features
    other_model = tf.keras.layers.concatenate(input_features)
    for _ in range(int(hparams.get('num_nontext_layers'))):
        other_model = tf.keras.layers.Dense(8, activation="relu")(other_model)
    
    # Stitch the two model parts together
    both = tf.keras.layers.concatenate([text_model, other_model])
    both = tf.keras.layers.Dropout(.7)(both)

    # Produce output predictions
    output = tf.keras.layers.Dense(5, activation="softmax")(both)
    
    # Define the inputs
    inputs = input_features + input_texts
    
    # Create the model
    keras_model = tf.keras.models.Model(inputs, output)
    
    keras_model.compile(
        optimizer=tf.keras.optimizers.Adam(hparams.get('learning_rate')),
        loss="sparse_categorical_crossentropy",
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()]
    )
    
    
    # Save a plot of the model to model.png in the working directory
    # (requires pydot and Graphviz to be installed)
    tf.keras.utils.plot_model(keras_model, show_shapes=True, rankdir="LR")
    
    return keras_model


############################################################
# Define an input function to generate features and label
############################################################

# This is taken from Hapke & Nelson and the TFX documentation (links below)

def _gzip_reader_fn(filenames):
    """Small utility returning a record reader that can read gzip'ed files."""
    return tf.data.TFRecordDataset(filenames, compression_type="GZIP")


def _get_serve_tf_examples_fn(model, tf_transform_output):
    """Returns a function that parses a serialized tf.Example.
    From 
    https://github.com/tensorflow/tfx/blob/master/tfx/examples/mnist/mnist_utils_native_keras.py
    """

    model.tft_layer = tf_transform_output.transform_features_layer()

    @tf.function
    def serve_tf_examples_fn(serialized_tf_examples):
        """Returns the output to be used in the serving signature."""
        feature_spec = tf_transform_output.raw_feature_spec()
        feature_spec.pop(_LABEL_KEY)
        parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)

        transformed_features = model.tft_layer(parsed_features)

        outputs = model(transformed_features)
        return {"outputs": outputs}

    return serve_tf_examples_fn


def _input_fn(file_pattern, tf_transform_output, batch_size=64):
    """Generates features and label for tuning/training.

    Args:
        file_pattern: input tfrecord file pattern.
        tf_transform_output: A TFTransformOutput.
        batch_size: the number of consecutive elements of the returned
            dataset to combine in a single batch.

    Returns:
        A dataset of (features, indices) tuples, where features is a
        dictionary of Tensors and indices is a single Tensor of
        label indices.

    See also
    https://github.com/tensorflow/tfx/blob/master/tfx/examples/mnist/mnist_utils_native_keras_base.py
    """
    transformed_feature_spec = tf_transform_output.transformed_feature_spec().copy()

    dataset = tf.data.experimental.make_batched_features_dataset(
        file_pattern=file_pattern,
        batch_size=batch_size,
        features=transformed_feature_spec,
        reader=_gzip_reader_fn,
        label_key=_transformed_name(_LABEL_KEY),
    )

    return dataset


############################################################
# Define the hyperparameter tuner
############################################################
# Tuner will call the following function.
# Based on https://github.com/tensorflow/tfx/blob/master/tfx/examples/penguin/penguin_utils_keras.py

def tuner_fn(fn_args: tfx.components.FnArgs) -> tfx.components.TunerFnResult:
    """Build the tuner using the KerasTuner API
    """
    
    tuner = keras_tuner.RandomSearch(
          get_model,
          max_trials=6,
          hyperparameters=_get_hyperparameters(),
          allow_new_entries=False,
          objective=keras_tuner.Objective('val_sparse_categorical_accuracy', 'max'),
          directory=fn_args.working_dir,
          project_name='petfinder_tuning')
    
    transform_graph = tft.TFTransformOutput(fn_args.transform_graph_path)

    train_dataset = _input_fn(
        fn_args.train_files,
        transform_graph,
        batch_size=64)

    eval_dataset = _input_fn(
        fn_args.eval_files,
        transform_graph,
        batch_size=64)

    return tfx.components.TunerFnResult(
        tuner=tuner,
        fit_kwargs={
            'x': train_dataset,
            'validation_data': eval_dataset,
            'steps_per_epoch': fn_args.train_steps,
            'validation_steps': fn_args.eval_steps
      })


############################################################
# Define the training function
############################################################
# Trainer will call this function.

def run_fn(fn_args: tfx.components.FnArgs):
    """Train the model based on given args.

    Args:
        fn_args: Holds args used to train the model as name/value pairs.
    """
    tf_transform_output = tft.TFTransformOutput(fn_args.transform_output)

    train_dataset = _input_fn(fn_args.train_files, tf_transform_output, batch_size=64)
    eval_dataset = _input_fn(fn_args.eval_files, tf_transform_output, batch_size=64)
    
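    # Grab hyperparameters: use the Tuner's best values if they were passed
    # to the Trainer, otherwise fall back to the defaults of the search space
    if fn_args.hyperparameters:
        hparams = keras_tuner.HyperParameters.from_config(fn_args.hyperparameters)
    else:
        hparams = _get_hyperparameters()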

    # Define the model
    model = get_model(hparams)

    # Log to TensorBoard
    log_dir = os.path.join(os.path.dirname(fn_args.serving_model_dir), "logs")
    tensorboard_callback = tf.keras.callbacks.TensorBoard(
        log_dir=log_dir, update_freq="batch"
    )
    callbacks = [tensorboard_callback]

    # Train the model
    model.fit(
        train_dataset,
        epochs=5,
        steps_per_epoch=fn_args.train_steps,
        validation_data=eval_dataset,
        validation_steps=fn_args.eval_steps,
        callbacks=callbacks,
    )
    
    # Save the model    
    signatures = {
        "serving_default": _get_serve_tf_examples_fn(
            model, tf_transform_output
        ).get_concrete_function(
            tf.TensorSpec(shape=[None], dtype=tf.string, name="examples")
        ),
    }
    model.save(fn_args.serving_model_dir, save_format="tf", signatures=signatures)    

## Search for hyperparameters

In [None]:
from tfx.components import Tuner

In [None]:
from tfx.proto import trainer_pb2

In [None]:
# As training takes some time we'll only use a few steps
train_steps = 200
eval_steps = 100
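
To get a feel for what these step counts mean, the short sketch below converts steps into the number of examples processed. It assumes the batch size of 64 that we pass to `_input_fn` in the module file and the 5 epochs used in `run_fn`; `examples_seen` is a hypothetical helper, not part of TFX.

```python
def examples_seen(num_steps, batch_size, epochs=1):
    """Number of examples processed: one batch per step, for each epoch."""
    return num_steps * batch_size * epochs


# One epoch of 200 training steps with batches of 64 examples
print(examples_seen(200, 64))

# The full 5-epoch training run in run_fn
print(examples_seen(200, 64, epochs=5))
```

Because `steps_per_epoch` is set, Keras ends each epoch after that many batches rather than after a full pass over the data.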

In [None]:
tuner = Tuner(
    module_file=os.path.abspath('module.py'),
    examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    train_args=trainer_pb2.TrainArgs(num_steps=train_steps),
    eval_args=trainer_pb2.EvalArgs(num_steps=eval_steps))

In [None]:
context.run(tuner)
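
Once the search has finished, it can be useful to look at what the `Tuner` actually found. The sketch below reads the hyperparameter configuration back from the output artifact's directory. It rests on an assumption worth checking against your TFX version: that the artifact directory holds exactly one file containing the configuration as JSON. `read_best_hyperparameters` is a hypothetical helper, not a TFX function.

```python
import json
import os


def read_best_hyperparameters(artifact_uri):
    """Load the JSON config from the single file in the artifact directory.

    Assumes the directory contains exactly one file, holding JSON.
    """
    files = [
        name for name in os.listdir(artifact_uri)
        if os.path.isfile(os.path.join(artifact_uri, name))
    ]
    if len(files) != 1:
        raise ValueError(
            f"Expected exactly one file in {artifact_uri}, found {files}")
    with open(os.path.join(artifact_uri, files[0])) as f:
        return json.load(f)
```

In this notebook you could call it as `read_best_hyperparameters(tuner.outputs['best_hyperparameters'].get()[0].uri)` and inspect the returned dictionary.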

## Train a model

In [None]:
from tfx.components import Trainer

In [None]:
# As training takes some time we'll only use a few steps
train_steps = 200
eval_steps = 100

In [None]:
trainer = Trainer(
    module_file=os.path.abspath('module.py'),
    examples=transform.outputs['transformed_examples'],
    schema=schema_gen.outputs['schema'],
    transform_graph=transform.outputs['transform_graph'],
    # Pass the best hyperparameters found by the Tuner on to the Trainer
    hyperparameters=tuner.outputs['best_hyperparameters'],
    train_args=trainer_pb2.TrainArgs(splits=['train'], num_steps=train_steps),
    eval_args=trainer_pb2.EvalArgs(splits=['eval'], num_steps=eval_steps)
)

In [None]:
context.run(trainer)

## Using TensorBoard to inspect and monitor the training

The logs from our training process were stored alongside the model in the output artifact:

In [None]:
trainer.outputs['model']

In [None]:
model_dir = trainer.outputs['model'].get()[0].uri
model_dir

We find the logs in the `logs` subdirectory:

In [None]:
os.listdir(model_dir)
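
If you want to confirm that log files were actually written before launching TensorBoard, a small helper like the one below can walk the directory for you. This is a sketch: `find_event_files` is a hypothetical helper, and it assumes that event file names contain `tfevents`, which is TensorBoard's usual naming.

```python
import os


def find_event_files(root_dir):
    """Return sorted paths of TensorBoard event files found under root_dir."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            # Assumption: event files are named like events.out.tfevents.*
            if "tfevents" in name:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling `find_event_files(model_dir)` should return at least one path if the TensorBoard callback in `run_fn` wrote its logs as expected.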

We can use TensorBoard directly in the notebook:

In [None]:
%load_ext tensorboard
%tensorboard --logdir {model_dir}

# What have we done so far?

Here's our current pipeline:

<img width=100% src="assets/pipeline_4.png">

# What's next?

The next step is to do **model analysis**. For this, we'll use the **[TensorFlow Model Analysis](https://www.tensorflow.org/tfx/tutorials/model_analysis/tfma_basic)** library (for manual inspection) and look at the TFX components **Evaluator**, **InfraValidator**, and **Pusher** (for automatic model analysis as part of our pipeline).