# TensorFlow Model Analysis in Beam
[TensorFlow Model Analysis (TFMA)](https://www.tensorflow.org/tfx/guide/tfma) is a library for performing model evaluation across different slices of data. TFMA performs its computations in a distributed manner over large amounts of data using Apache Beam.

This example Colab notebook illustrates how TFMA can be used to investigate and visualize the performance of a model as part of your Beam pipeline, which makes the evaluation scalable and flexible. For this we will use [**ExtractEvaluateAndWriteResults**](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/ExtractEvaluateAndWriteResults), a PTransform that performs extraction, evaluation, and writing of results in one step.

For additional information on TFMA, see the [TFMA basic notebook](https://www.tensorflow.org/tfx/tutorials/model_analysis/tfma_basic), which provides a more in-depth look at its capabilities.

## Install Jupyter extensions
Note: If running in a local Jupyter notebook, then these Jupyter extensions must be installed in the environment before running Jupyter.

```bash
jupyter nbextension enable --py widgetsnbextension --sys-prefix 
jupyter nbextension install --py --symlink tensorflow_model_analysis --sys-prefix 
jupyter nbextension enable --py tensorflow_model_analysis --sys-prefix 
```

## Install TensorFlow Model Analysis (TFMA)

This will pull in all the dependencies, and will take a minute.

In [None]:
# Upgrade pip to the latest, and install TFMA.
!pip install -U pip
!pip install tensorflow-model-analysis

# To use the newly installed version, restart the runtime.
exit() 

In [None]:
# This setup was tested with TF 2.11, TFMA 0.43 and Beam 2.44 (using colab),
# but it should also work with the latest release.
import sys

# Confirm that we're using Python 3
assert sys.version_info.major==3, 'This notebook must be run using Python 3.'

import tensorflow as tf
print('TF version: {}'.format(tf.__version__))
import apache_beam as beam
print('Beam version: {}'.format(beam.__version__))
import tensorflow_model_analysis as tfma
print('TFMA version: {}'.format(tfma.__version__))
import tensorflow_datasets as tfds
print('TFDS version: {}'.format(tfds.__version__))

**NOTE: The output above should be clear of errors before proceeding. Re-run the install and restart your kernel if you are still seeing errors.**

# Data preprocessing

## Diamonds price prediction

We will be using the [TFDS diamonds dataset](https://www.tensorflow.org/datasets/catalog/diamonds) to train a regression model that predicts the price of a diamond. This dataset contains the price of 53,940 diamonds together with physical attributes such as the weight of the diamond (carat), the cut quality, the color, and the clarity. The model's performance will be evaluated using metrics such as mean squared error and mean absolute error.


To simulate a scenario where a model's performance improves over time as new data is added to the dataset, we will first train a model called v1 using 40% of the diamonds dataset. Later on, we will train a second model called v2 using additional data. This lets us demonstrate how TFMA can compare the performance of two models on the same task.

In [None]:
# Load the data from TFDS and create a train, test and validation dataset by splitting the dataset into parts
(ds_train_v1, ds_test, ds_val), info = tfds.load('diamonds', split=['train[:40%]', 'train[80%:90%]', 'train[90%:]'], as_supervised=True, with_info=True)
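
To make the split percentages concrete, the sketch below works out how many examples each slice would contain. It assumes the `train` split holds 53,940 examples (the figure quoted above) and that percent boundaries are rounded to the nearest index; the helper `percent_slice` is a hypothetical name used only for this illustration, not a TFDS function.

```python
# Assumption: 53,940 examples in the 'train' split, with percent
# boundaries rounded to the nearest example index.
TOTAL_EXAMPLES = 53940


def percent_slice(start_pct, end_pct, total=TOTAL_EXAMPLES):
    """Return the (start, end) example indices for a percent-based slice."""
    start = round(total * start_pct / 100)
    end = round(total * end_pct / 100)
    return start, end


# The three slices requested in the tfds.load call above.
splits = {
    "train_v1": percent_slice(0, 40),
    "test": percent_slice(80, 90),
    "validation": percent_slice(90, 100),
}

# Number of examples per slice.
split_sizes = {name: end - start for name, (start, end) in splits.items()}
for name, size in split_sizes.items():
    print(f"{name}: {size} examples")
```

Note that the examples between 40% and 80% are not used by v1; they are the additional data that the second model is trained on later.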

In [None]:
import numpy as np

# Load in the numerical training data to use for normalization
def extract_numerical_features(item):
  carat = item['carat']
  depth = item['depth']
  table = item['table']
  x = item['x']
  y = item['y']
  z = item['z']
  
  return [carat, depth, table, x, y, z]

def get_train_data(ds_train):
  train_data = []
  for item, label in ds_train:
    features = extract_numerical_features(item)
    train_data.append(features)

  train_data = np.array(train_data)

  return train_data

In [None]:
train_data_v1 = get_train_data(ds_train_v1)

In [None]:
# Define the features length
NUMERICAL_FEATURES = 6
NUM_FEATURES = (NUMERICAL_FEATURES +
                info.features['features']['color'].num_classes +
                info.features['features']['cut'].num_classes +
                info.features['features']['clarity'].num_classes)

In [None]:
# Transform the input data into a feature vector and label by selecting the input and output for the model
def transform_data(item, label):
  numerical_features = extract_numerical_features(item)

  # Categorical features will be encoded using one-hot encoding
  color = tf.one_hot(item['color'], info.features['features']['color'].num_classes)
  cut = tf.one_hot(item['cut'], info.features['features']['cut'].num_classes)
  clarity = tf.one_hot(item['clarity'], info.features['features']['clarity'].num_classes)
  
  # Create output tensor
  output = tf.concat([tf.stack(numerical_features, axis=0), color, cut, clarity], 0)
  return output, [label]
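
The layout of the resulting feature vector can be sketched in plain NumPy. This is an illustration only: the class counts (7 colors, 5 cuts, 8 clarity grades) are assumptions about the diamonds dataset, whereas the notebook reads them from `info.features`, and the sample values are made up.

```python
import numpy as np

# Assumption: class counts for the categorical features. In the notebook
# these come from info.features['features'][...].num_classes.
NUM_CLASSES = {"color": 7, "cut": 5, "clarity": 8}


def one_hot(index, num_classes):
    """Return a float vector with a single 1.0 at position `index`."""
    vector = np.zeros(num_classes, dtype=np.float32)
    vector[index] = 1.0
    return vector


# Hypothetical example: six numerical values plus three category indices.
numerical = np.array([0.23, 61.5, 55.0, 3.95, 3.98, 2.43], dtype=np.float32)
categories = {"color": 1, "cut": 2, "clarity": 3}

# Same ordering as transform_data: numerical, color, cut, clarity.
feature_vector = np.concatenate(
    [numerical]
    + [one_hot(categories[key], NUM_CLASSES[key]) for key in ("color", "cut", "clarity")]
)
print(feature_vector.shape)
```

Under these assumptions the vector has 6 + 7 + 5 + 8 = 26 entries, which is the value `NUM_FEATURES` would take.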

In [None]:
ds_train_v1 = ds_train_v1.map(transform_data)
ds_test = ds_test.map(transform_data)
ds_val = ds_val.map(transform_data)

In [None]:
# Prepare the data for training by structuring it in batches
BATCH_SIZE = 32
ds_train_v1 = ds_train_v1.batch(BATCH_SIZE)
ds_test = ds_test.batch(BATCH_SIZE)

## TFRecords creation

TFMA and Beam need to read the dataset used during evaluation from a file. We will create a TFRecords file that contains our validation dataset.

In [None]:
# -p avoids an error if the directory already exists (for example, on a re-run)
!mkdir -p data

In [None]:
# Write the validation record to a file (used by TFMA)
tfrecord_file = 'data/val_data.tfrecord'

with tf.io.TFRecordWriter(tfrecord_file) as file_writer:
  for x, y in ds_val:
    record_bytes = tf.train.Example(features=tf.train.Features(feature={
        "inputs": tf.train.Feature(float_list=tf.train.FloatList(value=x)),
        "output": tf.train.Feature(float_list=tf.train.FloatList(value=[y])),
    })).SerializeToString()
    file_writer.write(record_bytes)

# Model definition and training

Now let's train a regression model that will predict the price of a diamond. The model is a neural network with one hidden layer. We also use a normalization layer, which standardizes each numerical feature to zero mean and unit variance based on the training data.

In [None]:
def construct_model(model_name, train_data):
  inputs = tf.keras.Input(shape=(NUM_FEATURES,), name='inputs')

  # Normalize numerical features
  normalization_layer = tf.keras.layers.Normalization()
  # Fit normalization layer on training data
  normalization_layer.adapt(train_data)
  # Split input between numerical and categorical input
  input_numerical = tf.gather(inputs, indices=[*range(NUMERICAL_FEATURES)], axis=1)
  input_normalized = normalization_layer(input_numerical)
  input_one_hot = tf.gather(inputs, indices=[*range(NUMERICAL_FEATURES, NUM_FEATURES)], axis=1)
  # Define one hidden layer with 8 neurons
  x = tf.keras.layers.Dense(8, activation='relu')(tf.concat([input_normalized, input_one_hot], 1))
  outputs = tf.keras.layers.Dense(1, name='output')(x)
  model = tf.keras.Model(inputs=inputs, outputs=outputs, name=model_name)

  model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
    loss='mean_absolute_error')
  
  return model
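
As a rough illustration of what the normalization layer does, the NumPy sketch below standardizes features using the mean and variance of the training data. It mirrors the idea behind `adapt` followed by a forward pass, but it is not the Keras implementation: the function names, the `epsilon` guard value, and the sample rows are assumptions made for this example.

```python
import numpy as np


def adapt(data):
    """Compute the per-feature mean and variance of the training data."""
    return data.mean(axis=0), data.var(axis=0)


def normalize(data, mean, variance, epsilon=1e-7):
    """Standardize each feature to zero mean and unit variance.

    epsilon is an assumed guard against division by zero for constant
    features.
    """
    return (data - mean) / np.maximum(np.sqrt(variance), epsilon)


# Hypothetical training rows with two numerical features.
train = np.array([[0.2, 55.0], [0.4, 60.0], [0.6, 65.0]])
mean, variance = adapt(train)
normalized = normalize(train, mean, variance)
print(normalized)
```

Because the statistics come from the training data only, the same `mean` and `variance` are reused unchanged when the model sees test or validation examples.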

In [None]:
model_v1 = construct_model('model_v1', train_data_v1)

In [None]:
# Train the model
history = model_v1.fit(
    ds_train_v1,
    validation_data=ds_test,
    epochs=5,
    verbose=1)

In [None]:
# Save the model to disk
model_path_v1 = 'saved_model_v1'
model_v1.save(model_path_v1)

# Evaluation

Now that we have trained a model, we can use TFMA to analyze its performance. The first thing we need to do is define our evaluation config. For our use case, we will use two of the most common metrics for regression models: mean absolute error (MAE) and mean squared error (MSE). See [TFMA metrics and plots](https://www.tensorflow.org/tfx/model_analysis/metrics) for more information about the supported evaluation parameters.
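
For intuition, the four metrics requested in this notebook's evaluation config (`ExampleCount`, `MeanAbsoluteError`, `MeanSquaredError`, `MeanPrediction`) can be written out in a few lines of NumPy. This is a sketch of the arithmetic only, not of how TFMA computes them internally (TFMA aggregates them inside a Beam pipeline). The function name `regression_metrics` and the sample numbers are made up for illustration.

```python
import numpy as np


def regression_metrics(labels, predictions):
    """Compute simple regression metrics over a batch of examples."""
    labels = np.asarray(labels, dtype=np.float64)
    predictions = np.asarray(predictions, dtype=np.float64)
    errors = predictions - labels
    return {
        "example_count": int(labels.size),
        "mean_absolute_error": float(np.mean(np.abs(errors))),
        "mean_squared_error": float(np.mean(errors ** 2)),
        "mean_prediction": float(np.mean(predictions)),
    }


# Hypothetical prices and predictions, for illustration only.
metrics = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
for name, value in metrics.items():
    print(f"{name}: {value}")
```

MSE penalizes large errors more heavily than MAE because each error is squared before averaging, so looking at both gives a fuller picture of how the model misses.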

In [None]:
from google.protobuf import text_format

# Define TFMA evaluation config
eval_config = text_format.Parse("""
  ## Model information
  model_specs {
    # For keras (and serving models) we need to add a `label_key`.
    label_key: "output"
  }

  ## Post training metric information. These will be merged with any built-in
  ## metrics from training.
  metrics_specs {
    metrics { class_name: "ExampleCount" }
    metrics { class_name: "MeanAbsoluteError" }
    metrics { class_name: "MeanSquaredError" }
    metrics { class_name: "MeanPrediction" }
  }

  slicing_specs {}
""", tfma.EvalConfig())

We will now use [ExtractEvaluateAndWriteResults](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/ExtractEvaluateAndWriteResults), which is a PTransform for performing extraction, evaluation, and writing results. This PTransform can be used directly in our Beam pipeline if we combine it with reading our TFRecords via [TFXIO](https://www.tensorflow.org/tfx/tfx_bsl/api_docs/python/tfx_bsl/public/tfxio).

In [None]:
from tfx_bsl.public import tfxio

output_path = 'evaluation_results'

eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path=model_path_v1, eval_config=eval_config)

tfx_io = tfxio.TFExampleRecord(
          file_pattern=tfrecord_file,
          raw_record_column_name=tfma.ARROW_INPUT_COLUMN)

# Run Evaluation
with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | 'ReadData' >> tfx_io.BeamSource()
        | 'EvalModel' >> tfma.ExtractEvaluateAndWriteResults(
           eval_shared_model=eval_shared_model,
           eval_config=eval_config,
           output_path=output_path))

In [None]:
# Visualize results
result = tfma.load_eval_result(output_path=output_path)
tfma.view.render_slicing_metrics(result)

# Comparing multiple models

An interesting and common use case is to compare the performance of multiple models in order to select the best candidate to put into production. We can also use Beam to evaluate and compare multiple models in one step.
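
The selection step itself can be sketched in plain Python: given one metric value per model, compute each candidate's change relative to a baseline and pick the best model. The helper `compare_to_baseline` is a hypothetical name, and the metric values are placeholders for illustration only; they are not results produced by this notebook.

```python
def compare_to_baseline(metrics_by_model, baseline, metric, lower_is_better=True):
    """Return each candidate's change versus the baseline and the best model name."""
    baseline_value = metrics_by_model[baseline][metric]
    # Difference of every non-baseline model relative to the baseline.
    diffs = {
        name: values[metric] - baseline_value
        for name, values in metrics_by_model.items()
        if name != baseline
    }
    # For error metrics such as MAE, the smallest value wins.
    pick = min if lower_is_better else max
    best = pick(metrics_by_model, key=lambda name: metrics_by_model[name][metric])
    return diffs, best


# Placeholder numbers for illustration only, not results from this notebook.
placeholder_metrics = {
    "model_v1": {"mean_absolute_error": 900.0},
    "model_v2": {"mean_absolute_error": 750.0},
}
diffs, best = compare_to_baseline(
    placeholder_metrics, baseline="model_v1", metric="mean_absolute_error"
)
print(diffs, best)
```

A negative difference for an error metric means the candidate improves on the baseline.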

## Model v2

To showcase this use case, we will now train a second model on a larger portion of the dataset (80% instead of 40%).

In [None]:
# Preprocess data
ds_train_v2 = tfds.load('diamonds', split=['train[:80%]'], as_supervised=True)[0]
train_data_v2 = get_train_data(ds_train_v2)
ds_train_v2 = ds_train_v2.map(transform_data)
ds_train_v2 = ds_train_v2.batch(BATCH_SIZE)

In [None]:
# Define and train model
model_v2 = construct_model('model_v2', train_data_v2)
history = model_v2.fit(
    ds_train_v2,
    validation_data=ds_test,
    epochs=5,
    verbose=1)

In [None]:
# Save model to file
model_path_v2 = 'saved_model_v2'
model_v2.save(model_path_v2)

## Evaluation

In [None]:
# Define TFMA evaluation config, including two model specs for the two models we want to compare
eval_config_compare = text_format.Parse("""
  ## Model information
  model_specs {
    name: "model_v1"
    # For keras (and serving models) we need to add a `label_key`.
    label_key: "output"
    is_baseline: true
  }
  model_specs {
    name: "model_v2"
    # For keras (and serving models) we need to add a `label_key`.
    label_key: "output"
  }

  ## Post training metric information. These will be merged with any built-in
  ## metrics from training.
  metrics_specs {
    metrics { class_name: "ExampleCount" }
    metrics { class_name: "MeanAbsoluteError" }
    metrics { class_name: "MeanSquaredError" }
    metrics { class_name: "MeanPrediction" }
  }

  slicing_specs {}
""", tfma.EvalConfig())

In [None]:
from tfx_bsl.public import tfxio

output_path_compare = 'evaluation_results_compare'

eval_shared_models = [
  tfma.default_eval_shared_model(
      model_name='model_v1',
      eval_saved_model_path=model_path_v1,
      eval_config=eval_config_compare),
  tfma.default_eval_shared_model(
      model_name='model_v2',
      eval_saved_model_path=model_path_v2,
      eval_config=eval_config_compare),
]

tfx_io = tfxio.TFExampleRecord(
          file_pattern=tfrecord_file,
          raw_record_column_name=tfma.ARROW_INPUT_COLUMN)

# Run Evaluation
with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | 'ReadData' >> tfx_io.BeamSource()
        | 'EvalModel' >> tfma.ExtractEvaluateAndWriteResults(
           eval_shared_model=eval_shared_models,
           eval_config=eval_config_compare,
           output_path=output_path_compare))

In [None]:
# Visualize results
results = tfma.load_eval_results(output_paths=output_path_compare)
tfma.view.render_time_series(results)