
# Chapter 19 – Training and Deploying TensorFlow Models at Scale (Summary)

## Overview
This chapter discusses how to **deploy, serve, scale, and optimize TensorFlow models in production**.
Training a high-performing model is only the first step: production systems also require reliable deployment,
version management, scalability, low latency, and efficient resource usage.

The chapter covers:
- Model serving using **TensorFlow Serving**
- Model versioning and A/B testing
- Deployment on **cloud platforms (Google Cloud AI Platform)**
- Secure and scalable prediction services
- Deployment to **mobile, embedded devices, and web browsers**
- Model optimization techniques such as **quantization**

---

## Why Model Serving Is Necessary
As systems grow, calling `model.predict()` directly inside applications becomes impractical.
Instead, models are wrapped inside **dedicated prediction services**, which provide:
- Centralized model access
- Independent scaling
- Easy model updates and rollbacks
- Consistent predictions across systems
- Support for **A/B experiments** and canary releases

This decoupling improves reliability, maintainability, and scalability.
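
To make the A/B and canary idea a bit more concrete, here is a minimal sketch of how a prediction service might send a small share of users to a new model. The function name `pick_model_version`, the `"stable"`/`"canary"` labels, and the 10% share are hypothetical and do not come from the chapter. Hashing the user ID simply keeps each user on the same version across requests.

```python
import hashlib

def pick_model_version(user_id: str, canary_share: float = 0.1) -> str:
    """Deterministically assign a user to the stable or the canary model."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the hash to one of 10,000 buckets (hypothetical granularity)
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "canary" if bucket < canary_share * 10_000 else "stable"

assignments = [pick_model_version(f"user-{i}") for i in range(10_000)]
canary_fraction = assignments.count("canary") / len(assignments)
print(f"canary fraction: {canary_fraction:.3f}")
```

In a real system the routing would usually live in the serving layer or a gateway rather than in application code.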

---

## SavedModel Format
Before deployment, TensorFlow models must be exported to the **SavedModel** format.

A SavedModel includes:
- The computation graph
- Model weights
- Optional assets (e.g., vocabularies, class names)

SavedModels:
- Support **model versioning**
- Are required by TensorFlow Serving
- Can include preprocessing layers to avoid training–serving mismatches

Each model version is stored in a separate directory, enabling smooth transitions between versions.

---

## TensorFlow Serving
**TensorFlow Serving** is a high-performance, production-ready model server written in C++.

Key features:
- Serves multiple models and multiple versions simultaneously
- Automatically loads the latest model version
- Supports **graceful model updates**
- Enables **automatic request batching** for higher throughput
- Exposes models via **REST** and **gRPC** APIs

TF Serving is commonly deployed using **Docker containers**, making installation and scaling straightforward.
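
The request batching feature listed above can be pictured with a small sketch. This is not how TF Serving implements batching internally; it only illustrates the idea of grouping individual requests so the model runs once per batch. The names `predict_batched` and `toy_model`, and the batch size of 4, are assumptions made for the example.

```python
from typing import Callable, List, Sequence

def predict_batched(requests: Sequence[List[float]],
                    model_fn: Callable[[List[List[float]]], List[float]],
                    max_batch_size: int = 4) -> List[float]:
    """Group individual requests into batches and call the model once per batch."""
    results: List[float] = []
    for start in range(0, len(requests), max_batch_size):
        batch = list(requests[start:start + max_batch_size])
        results.extend(model_fn(batch))  # one model call serves the whole batch
    return results

calls = []  # records the size of each batch the toy model receives

def toy_model(batch):
    # Hypothetical stand-in for a real model: sums each instance's features
    calls.append(len(batch))
    return [sum(instance) for instance in batch]

outputs = predict_batched([[i, i + 1] for i in range(10)], toy_model)
print(outputs)
print(calls)
```

Ten requests are served with three model calls instead of ten, which is where the throughput gain comes from on hardware that processes batches efficiently.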

---

## REST vs gRPC APIs
TensorFlow Serving supports two main APIs:

### REST API
- Simple and widely supported
- Uses JSON over HTTP
- Easy to debug and integrate
- Less efficient for large inputs due to text-based serialization

### gRPC API
- Binary protocol based on Protocol Buffers
- More compact on the wire and faster to serialize and parse than JSON
- Lower latency and bandwidth usage
- Preferred for high-throughput production systems
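
To get a feel for why text-based serialization is less efficient for large inputs, the sketch below compares the size of one 28×28 image encoded as JSON with the same values packed as 32-bit floats. It does not use gRPC or Protocol Buffers; `struct.pack` is only a stand-in for a binary tensor encoding, and the pixel values are made up.

```python
import json
import struct

# A fake 28x28 grayscale image with float pixel values (hypothetical data)
pixels = [((i * 37) % 256) / 255.0 for i in range(28 * 28)]

# Text encoding, as used in a REST request body
json_payload = json.dumps({"instances": [pixels]}).encode("utf-8")

# Raw little-endian 32-bit floats, similar in spirit to a binary tensor encoding
binary_payload = struct.pack(f"<{len(pixels)}f", *pixels)

print(f"JSON:   {len(json_payload)} bytes")
print(f"Binary: {len(binary_payload)} bytes")
```

The binary form needs exactly 4 bytes per value, while each JSON number takes as many bytes as it has characters, plus separators.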

---

## Model Versioning and Rollback
TensorFlow Serving continuously monitors model directories.
When a new version appears:
- It loads the new version automatically
- Handles pending requests gracefully
- Unloads the old version once no longer needed

Rolling back a model is as simple as removing the new version directory.
This makes experimentation and recovery from failures safe and fast.
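
The version-selection behavior described above can be mimicked with a few lines of standard-library Python. This is only a sketch of the "serve the highest-numbered version" policy, not TF Serving's own code, and the helper name `latest_version` is hypothetical.

```python
import os
import shutil
import tempfile
from typing import Optional

def latest_version(model_base_path: str) -> Optional[str]:
    """Return the numeric subdirectory with the highest version number, if any."""
    versions = [name for name in os.listdir(model_base_path)
                if name.isdigit()
                and os.path.isdir(os.path.join(model_base_path, name))]
    return max(versions, key=int) if versions else None

# Build a throwaway model directory with two versions
base = tempfile.mkdtemp()
for version in ("0001", "0002"):
    os.makedirs(os.path.join(base, version))

before_rollback = latest_version(base)
shutil.rmtree(os.path.join(base, "0002"))  # roll back by deleting the new version
after_rollback = latest_version(base)
print(before_rollback, "->", after_rollback)

shutil.rmtree(base)  # clean up the temporary directory
```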

---

## Scaling with Load Balancing
To handle high traffic:
- Multiple TF Serving instances can be deployed
- Requests are distributed using a **load balancer**
- Container orchestration tools like **Kubernetes** simplify management

Scaling ensures the system can handle high queries per second (QPS) reliably.
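
As a toy illustration of what a load balancer does, the sketch below spreads requests over several TF Serving replicas in round-robin order. The server addresses are hypothetical, and real load balancers typically add health checks and smarter policies.

```python
import itertools
from collections import Counter

# Hypothetical addresses of three TF Serving replicas
servers = ["http://tfs-0:8501", "http://tfs-1:8501", "http://tfs-2:8501"]
next_server = itertools.cycle(servers)

def route_request() -> str:
    """Return the server that should handle the next request."""
    return next(next_server)

# Route nine requests and count how many each replica receives
load = Counter(route_request() for _ in range(9))
print(load)
```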

---

## Cloud Deployment with Google Cloud AI Platform
The chapter demonstrates deploying models on **Google Cloud AI Platform**, which internally uses TensorFlow Serving.

Benefits:
- Automatic scaling based on traffic
- Integrated monitoring and logging
- Secure access via authentication
- Pay-as-you-go pricing model

Models are stored in **Google Cloud Storage (GCS)** and served through managed infrastructure.

---

## Authentication and Security
Production services require secure access.

Google Cloud uses:
- **Service accounts** instead of user credentials
- Token-based authentication
- Fine-grained access control

Client applications authenticate with a service account key, and the service account should be granted only the permissions it needs.

---

## Deploying to Mobile and Embedded Devices
Large models are often unsuitable for mobile or embedded environments due to:
- Limited memory
- Limited computation power
- Battery and latency constraints

**TensorFlow Lite (TFLite)** addresses these challenges by:
- Converting models to lightweight FlatBuffer format
- Removing unnecessary operations
- Optimizing computation graphs

---

## Model Optimization with Quantization
Quantization reduces model size and improves efficiency by using lower-precision numbers.

Techniques include:
- **Post-training quantization** (e.g., 8-bit integers)
- **Full integer quantization** (weights and activations)
- **Quantization-aware training** to reduce accuracy loss

These methods significantly reduce storage size, power consumption, and inference latency.
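
To show what "lower-precision numbers" means in practice, here is a minimal sketch of affine 8-bit quantization in plain Python. It is a generic textbook scheme, not TFLite's exact algorithm, and the function names and sample weights are made up for the example.

```python
from typing import List, Tuple

def quantize(weights: List[float]) -> Tuple[List[int], float, int]:
    """Map floats to unsigned 8-bit integers using a scale and a zero point."""
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / 255 or 1.0  # avoid a zero scale for constant weights
    zero_point = round(-w_min / scale)
    q = [min(255, max(0, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float values from the 8-bit representation."""
    return [(v - zero_point) * scale for v in q]

weights = [-1.0, -0.25, 0.0, 0.4, 1.5]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(f"max round-trip error: {max_error:.5f} (scale = {scale:.5f})")
```

Each weight now fits in one byte instead of four, and for these sample weights the round-trip error stays within half a quantization step.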

---

## TensorFlow in the Browser
Models can also run directly in web browsers using **TensorFlow.js**.

Advantages:
- No server required for inference
- Low latency
- Improved privacy (data stays on the client)
- Works offline or with poor connectivity

Models are converted to a web-friendly format and executed using JavaScript and WebGL.



# Setup
First, let's import a few common modules, ensure Matplotlib plots figures inline, and prepare a function to save the figures. We also check that Python 3.5 or later is installed (although Python 2.x may work, it is deprecated, so we strongly recommend you use Python 3 instead), as well as Scikit-Learn ≥0.20 and TensorFlow ≥2.0.


In [17]:
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Is this notebook running on Colab or Kaggle?
IS_COLAB = "google.colab" in sys.modules
IS_KAGGLE = "kaggle_secrets" in sys.modules

if IS_COLAB or IS_KAGGLE:
    !echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" > /etc/apt/sources.list.d/tensorflow-serving.list
    !curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
    !apt update && apt-get install -y tensorflow-model-server
    %pip install -q -U tensorflow-serving-api

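# Scikit-Learn ≥0.20 is required
# (compare parsed versions: plain string comparison would rank "0.9" above "0.20")
import sklearn
from packaging import version
assert version.parse(sklearn.__version__) >= version.parse("0.20")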

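# TensorFlow ≥2.0 is required
# (compare parsed versions rather than raw strings)
import tensorflow as tf
from tensorflow import keras
from packaging import version
assert version.parse(tf.__version__) >= version.parse("2.0")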

if not tf.config.list_physical_devices('GPU'):
    print("No GPU was detected. CNNs can be very slow without a GPU.")
    if IS_COLAB:
        print("Go to Runtime > Change runtime and select a GPU hardware accelerator.")
    if IS_KAGGLE:
        print("Go to Settings > Accelerator and select GPU.")

# Common imports
import numpy as np
import os

# to make this notebook's output stable across runs
np.random.seed(42)
tf.random.set_seed(42)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "deploy"
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID)
os.makedirs(IMAGES_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)


# Deploying TensorFlow models to TensorFlow Serving (TFS)
We will use the REST API or the gRPC API.

## Save/Load a `SavedModel`

In [18]:
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train_full = X_train_full[..., np.newaxis].astype(np.float32) / 255.
X_test = X_test[..., np.newaxis].astype(np.float32) / 255.
X_valid, X_train = X_train_full[:5000], X_train_full[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_new = X_test[:3]

In [19]:
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28, 1]),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(learning_rate=1e-2),
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))

Epoch 1/10
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 16s 9ms/step - accuracy: 0.7269 - loss: 1.0513 - val_accuracy: 0.9044 - val_loss: 0.3649
Epoch 2/10
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 11s 6ms/step - accuracy: 0.8982 - loss: 0.3667 - val_accuracy: 0.9188 - val_loss: 0.2954
Epoch 3/10
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 6s 4ms/step - accuracy: 0.9148 - loss: 0.3059 - val_accuracy: 0.9302 - val_loss: 0.2623
Epoch 4/10
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - accuracy: 0.9233 - loss: 0.2721 - val_accuracy: 0.9352 - val_loss: 0.2395
Epoch 5/10
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 9s 4ms/step - accuracy: 0.9305 - loss: 0.2479 - val_accuracy: 0.9384 - val_loss: 0.2219
Epoch 6/10
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 10s 4ms/step - accuracy: 0.9360 - loss: 0.2288 - val_accuracy: 0.9416 - val_loss: 0.2078
Epoch 7/10

<keras.src.callbacks.history.History at 0x7d70f1e8a450>

In [20]:
np.round(model.predict(X_new), 2)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 95ms/step


array([[0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 1.  , 0.  , 0.  ],
       [0.  , 0.  , 0.99, 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.99, 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ]],
      dtype=float32)

In [21]:
model_version = "0001"
model_name = "my_mnist_model"
model_path = os.path.join(model_name, model_version)
model_path

'my_mnist_model/0001'

In [22]:
import shutil
import os

if os.path.exists(model_name):
    shutil.rmtree(model_name)

In [23]:
model.export(model_path)

Saved artifact at 'my_mnist_model/0001'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='keras_tensor_8')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  137924100800336: TensorSpec(shape=(), dtype=tf.resource, name=None)
  137924100791312: TensorSpec(shape=(), dtype=tf.resource, name=None)
  137924100795152: TensorSpec(shape=(), dtype=tf.resource, name=None)
  137924100792080: TensorSpec(shape=(), dtype=tf.resource, name=None)


In [24]:
for root, dirs, files in os.walk(model_name):
    indent = '    ' * root.count(os.sep)
    print('{}{}/'.format(indent, os.path.basename(root)))
    for filename in files:
        print('{}{}'.format(indent + '    ', filename))

my_mnist_model/
    0001/
        fingerprint.pb
        saved_model.pb
        variables/
            variables.index
            variables.data-00000-of-00001
        assets/


In [25]:
!saved_model_cli show --dir {model_path}

The given SavedModel contains the following tag-sets:
'serve'


In [26]:
!saved_model_cli show --dir {model_path} --tag_set serve

The given SavedModel MetaGraphDef contains SignatureDefs with the following keys:
SignatureDef key: "__saved_model_init_op"
SignatureDef key: "serve"
SignatureDef key: "serving_default"


In [27]:
!saved_model_cli show --dir {model_path} --tag_set serve \
                      --signature_def serving_default

The given SavedModel SignatureDef contains the following input(s):
  inputs['keras_tensor_8'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 28, 28, 1)
      name: serving_default_keras_tensor_8:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['output_0'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 10)
      name: StatefulPartitionedCall_1:0
Method name is: tensorflow/serving/predict


In [28]:
!saved_model_cli show --dir {model_path} --all


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serve']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['keras_tensor_8'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 28, 28, 1)
        name: serve_keras_tensor_8:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output_0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: StatefulPartitionedCall:

Let's write the new instances to a `.npy` file so we can easily pass them to our model:

In [29]:
np.save("my_mnist_tests.npy", X_new)

In [31]:
input_name = 'keras_tensor_8'
input_name

'keras_tensor_8'

And now let's use `saved_model_cli` to make predictions for the instances we just saved:

In [32]:
!saved_model_cli run --dir {model_path} --tag_set serve \
                     --signature_def serving_default    \
                     --inputs {input_name}=my_mnist_tests.npy

Result for output key output_0:
[[3.3691311e-05 4.6527626e-07 5.0736702e-04 3.6078931e-03 2.8341301e-06
  2.1192979e-

In [33]:
np.round([[1.1347984e-04, 1.5187356e-07, 9.7032893e-04, 2.7640699e-03, 3.7826971e-06,
           7.6876910e-05, 3.9140293e-08, 9.9559116e-01, 5.3502394e-05, 4.2665208e-04],
          [8.2443521e-04, 3.5493889e-05, 9.8826385e-01, 7.0466995e-03, 1.2957400e-07,
           2.3389691e-04, 2.5639210e-03, 9.5886099e-10, 1.0314899e-03, 8.7952529e-08],
          [4.4693781e-05, 9.7028232e-01, 9.0526715e-03, 2.2641101e-03, 4.8766597e-04,
           2.8800720e-03, 2.2714981e-03, 8.3753867e-03, 4.0439744e-03, 2.9759688e-04]], 2)

array([[0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 1.  , 0.  , 0.  ],
       [0.  , 0.  , 0.99, 0.01, 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.97, 0.01, 0.  , 0.  , 0.  , 0.  , 0.01, 0.  , 0.  ]])

## TensorFlow Serving

Install [Docker](https://docs.docker.com/install/) if you don't have it already. Then run:

```bash
docker pull tensorflow/serving

export ML_PATH=$HOME/ml # or wherever this project is
docker run -it --rm -p 8500:8500 -p 8501:8501 \
   -v "$ML_PATH/my_mnist_model:/models/my_mnist_model" \
   -e MODEL_NAME=my_mnist_model \
   tensorflow/serving
```
Once you are finished using it, press Ctrl-C to shut down the server.

Alternatively, if `tensorflow_model_server` is installed (e.g., if you are running this notebook in Colab), then the following 3 cells will start the server:

In [34]:
os.environ["MODEL_DIR"] = os.path.split(os.path.abspath(model_path))[0]

In [35]:
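%%bash --bg
# --port is the gRPC port (8500 is also the default); --rest_api_port enables the REST API
nohup tensorflow_model_server \
     --port=8500 \
     --rest_api_port=8501 \
     --model_name=my_mnist_model \
     --model_base_path="${MODEL_DIR}" >server.log 2>&1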

In [36]:
!tail server.log

In [37]:
import json

input_data_json = json.dumps({
    "signature_name": "serving_default",
    "instances": X_new.tolist(),
})

In [38]:
repr(input_data_json)[:1500] + "..."

'\'{"signature_name": "serving_default", "instances": [[[[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]], [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]], [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]], [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]], [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0

Now let's use TensorFlow Serving's REST API to make predictions:

In [39]:
import requests

SERVER_URL = 'http://localhost:8501/v1/models/my_mnist_model:predict'
response = requests.post(SERVER_URL, data=input_data_json)
response.raise_for_status() # raise an exception in case of error
response = response.json()

In [40]:
response.keys()

dict_keys(['predictions'])

In [41]:
y_proba = np.array(response["predictions"])
y_proba.round(2)

array([[0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 1.  , 0.  , 0.  ],
       [0.  , 0.  , 0.99, 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.99, 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ]])

### Using the gRPC API

In [43]:
from tensorflow_serving.apis.predict_pb2 import PredictRequest

request = PredictRequest()
request.model_spec.name = model_name
request.model_spec.signature_name = "serving_default"
request.inputs[input_name].CopyFrom(tf.make_tensor_proto(X_new))

In [44]:
import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel('localhost:8500')
predict_service = prediction_service_pb2_grpc.PredictionServiceStub(channel)
response = predict_service.Predict(request, timeout=10.0)

In [45]:
response

outputs {
  key: "output_0"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 3
      }
      dim {
        size: 10
      }
    }
    float_val: 3.36913108e-05
    float_val: 4.65276258e-07
    float_val: 0.000507367
    float_val: 0.00360789313
    float_val: 2.83413e-06
    float_val: 0.000211929786
    float_val: 3.42504549e-08
    float_val: 0.995358407
    float_val: 4.07686275e-05
    float_val: 0.000236621214
    float_val: 0.000148913474
    float_val: 2.91367051e-05
    float_val: 0.992796361
    float_val: 0.00390521856
    float_val: 1.66162106e-09
    float_val: 0.00113858096
    float_val: 0.00182021945
    float_val: 1.12476757e-10
    float_val: 0.000161501826
    float_val: 1.30862943e-09
    float_val: 1.44608221e-05
    float_val: 0.988683641
    float_val: 0.00401918683
    float_val: 0.00115726853
    float_val: 0.000558520725
    float_val: 0.000610187592
    float_val: 0.000721733551
    float_val: 0.00233021332
    float_val: 0.001676343

Convert the response to a NumPy array using `tf.make_ndarray()`:

In [47]:
output_name = "output_0"
outputs_proto = response.outputs[output_name]
y_proba = tf.make_ndarray(outputs_proto)
y_proba.round(2)

array([[0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 1.  , 0.  , 0.  ],
       [0.  , 0.  , 0.99, 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.99, 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ]],
      dtype=float32)

Or build the NumPy array directly from the protobuf fields if your client does not include the TensorFlow library:

In [48]:
output_name = "output_0"
outputs_proto = response.outputs[output_name]
shape = [dim.size for dim in outputs_proto.tensor_shape.dim]
y_proba = np.array(outputs_proto.float_val).reshape(shape)
y_proba.round(2)

array([[0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 1.  , 0.  , 0.  ],
       [0.  , 0.  , 0.99, 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.99, 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ]])

## Deploying a new model version

In [49]:
np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.Input(shape=[28, 28, 1]),
    keras.layers.Flatten(),
    keras.layers.Dense(50, activation="relu"),
    keras.layers.Dense(50, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(learning_rate=1e-2),
              metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))

Epoch 1/10
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.6617 - loss: 1.1876 - val_accuracy: 0.9046 - val_loss: 0.3526
Epoch 2/10
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 9s 5ms/step - accuracy: 0.9004 - loss: 0.3504 - val_accuracy: 0.9254 - val_loss: 0.2773
Epoch 3/10
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 6s 4ms/step - accuracy: 0.9169 - loss: 0.2880 - val_accuracy: 0.9304 - val_loss: 0.2415
Epoch 4/10
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9265 - loss: 0.2521 - val_accuracy: 0.9370 - val_loss: 0.2166
Epoch 5/10
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 16s 9ms/step - accuracy: 0.9348 - loss: 0.2243 - val_accuracy: 0.9434 - val_loss: 0.1972
Epoch 6/10
1719/1719 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - accuracy: 0.9413 - loss: 0.2020 - val_accuracy: 0.9474 - val_loss: 0.1817
Epoch 7/10

In [50]:
model_version = "0002"
model_name = "my_mnist_model"
model_path = os.path.join(model_name, model_version)
model_path

'my_mnist_model/0002'

In [53]:
model.export(model_path)

Saved artifact at 'my_mnist_model/0002'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='keras_tensor_12')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  137924661670928: TensorSpec(shape=(), dtype=tf.resource, name=None)
  137924661671696: TensorSpec(shape=(), dtype=tf.resource, name=None)
  137924661671504: TensorSpec(shape=(), dtype=tf.resource, name=None)
  137924661668816: TensorSpec(shape=(), dtype=tf.resource, name=None)
  137924661672272: TensorSpec(shape=(), dtype=tf.resource, name=None)
  137924661671888: TensorSpec(shape=(), dtype=tf.resource, name=None)


In [51]:
for root, dirs, files in os.walk(model_name):
    indent = '    ' * root.count(os.sep)
    print('{}{}/'.format(indent, os.path.basename(root)))
    for filename in files:
        print('{}{}'.format(indent + '    ', filename))

my_mnist_model/
    0001/
        fingerprint.pb
        saved_model.pb
        variables/
            variables.index
            variables.data-00000-of-00001
        assets/


**Warning**: You may need to wait a minute before the new model is loaded by TensorFlow Serving.
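
Rather than waiting a fixed amount of time, a client could poll the server until the new version is reported as available. The sketch below assumes the model status endpoint (`GET /v1/models/my_mnist_model`) returns JSON with a `model_version_status` list whose entries have `version` and `state` fields; treat that layout as an assumption and check it against your server. The helper names are hypothetical, and the responses here are simulated so the example runs without a server.

```python
import time
from typing import Callable, Dict

def is_version_available(status: Dict, version: str) -> bool:
    """Check a model-status response for an AVAILABLE entry of the given version."""
    return any(entry.get("version") == version and entry.get("state") == "AVAILABLE"
               for entry in status.get("model_version_status", []))

def wait_for_version(fetch_status: Callable[[], Dict], version: str,
                     timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll fetch_status() until the version is available or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_version_available(fetch_status(), version):
            return True
        time.sleep(interval)
    return False

# Simulated responses (hypothetical): version 2 becomes available on the third poll
responses = iter([
    {"model_version_status": [{"version": "1", "state": "AVAILABLE"}]},
    {"model_version_status": [{"version": "2", "state": "LOADING"},
                              {"version": "1", "state": "AVAILABLE"}]},
    {"model_version_status": [{"version": "2", "state": "AVAILABLE"}]},
])
ready = wait_for_version(lambda: next(responses), "2", timeout=5.0, interval=0.01)
print(ready)
```

Against a live server, `fetch_status` could be something like `lambda: requests.get("http://localhost:8501/v1/models/my_mnist_model").json()`.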

In [54]:
import requests

SERVER_URL = 'http://localhost:8501/v1/models/my_mnist_model:predict'

response = requests.post(SERVER_URL, data=input_data_json)
response.raise_for_status()
response = response.json()

In [55]:
response.keys()

dict_keys(['predictions'])

In [56]:
y_proba = np.array(response["predictions"])
y_proba.round(2)

array([[0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.99, 0.  , 0.  ],
       [0.  , 0.  , 0.99, 0.01, 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.99, 0.  , 0.  , 0.  , 0.  , 0.  , 0.01, 0.  , 0.  ]])
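
The URL used above lets the server pick the version. TF Serving's REST API also accepts an optional `/versions/<n>` segment to target a specific version, which can be handy when comparing two versions. The helper below is a small sketch; the name `predict_url` and its defaults are hypothetical, and whether an older version is still loaded depends on the server's version policy.

```python
from typing import Optional

def predict_url(model_name: str, version: Optional[int] = None,
                host: str = "localhost", port: int = 8501) -> str:
    """Build a TF Serving REST predict URL, optionally pinned to one version."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        path += f"/versions/{version}"
    return f"http://{host}:{port}{path}:predict"

print(predict_url("my_mnist_model"))
print(predict_url("my_mnist_model", version=2))
```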

# Deploy the model to Google Cloud AI Platform

Follow the instructions in the book to deploy the model to Google Cloud AI Platform, then download the service account's private key and save it as `my_service_account_private_key.json` in the project directory. Also, update the `project_id`:

In [57]:
project_id = "onyx-smoke-242003"

In [58]:
import googleapiclient.discovery

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "my_service_account_private_key.json"
model_id = "my_mnist_model"
model_path = "projects/{}/models/{}".format(project_id, model_id)
model_path += "/versions/v0001/" # if you want to run a specific version
ml_resource = googleapiclient.discovery.build("ml", "v1").projects()

DefaultCredentialsError: File my_service_account_private_key.json was not found.

In [None]:
def predict(X):
    input_data_json = {"signature_name": "serving_default",
                       "instances": X.tolist()}
    request = ml_resource.predict(name=model_path, body=input_data_json)
    response = request.execute()
    if "error" in response:
        raise RuntimeError(response["error"])
    return np.array([pred[output_name] for pred in response["predictions"]])

In [None]:
Y_probas = predict(X_new)
np.round(Y_probas, 2)

# Using GPUs

**Note**: `tf.test.is_gpu_available()` is deprecated. Instead, please use `tf.config.list_physical_devices('GPU')`.

In [59]:
#tf.test.is_gpu_available() # deprecated
tf.config.list_physical_devices('GPU')

[]

In [60]:
tf.test.gpu_device_name()

''

In [61]:
tf.test.is_built_with_cuda()

True

In [62]:
from tensorflow.python.client.device_lib import list_local_devices

devices = list_local_devices()
devices

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 2443272670311274920
 xla_global_id: -1]

# Distributed Training

In [63]:
keras.backend.clear_session()
tf.random.set_seed(42)
np.random.seed(42)

In [64]:
def create_model():
    return keras.models.Sequential([
        keras.layers.Conv2D(filters=64, kernel_size=7, activation="relu",
                            padding="same", input_shape=[28, 28, 1]),
        keras.layers.MaxPooling2D(pool_size=2),
        keras.layers.Conv2D(filters=128, kernel_size=3, activation="relu",
                            padding="same"),
        keras.layers.Conv2D(filters=128, kernel_size=3, activation="relu",
                            padding="same"),
        keras.layers.MaxPooling2D(pool_size=2),
        keras.layers.Flatten(),
        keras.layers.Dense(units=64, activation='relu'),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(units=10, activation='softmax'),
    ])

In [65]:
batch_size = 100
model = create_model()
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(learning_rate=1e-2),
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10,
          validation_data=(X_valid, y_valid), batch_size=batch_size)

Epoch 1/10
550/550 ━━━━━━━━━━━━━━━━━━━━ 493s 894ms/step - accuracy: 0.3701 - loss: 1.8643 - val_accuracy: 0.9030 - val_loss: 0.3484
Epoch 2/10
550/550 ━━━━━━━━━━━━━━━━━━━━ 502s 895ms/step - accuracy: 0.8364 - loss: 0.5218 - val_accuracy: 0.9448 - val_loss: 0.1879
Epoch 3/10
550/550 ━━━━━━━━━━━━━━━━━━━━ 503s 897ms/step - accuracy: 0.9016 - loss: 0.3353 - val_accuracy: 0.9638 - val_loss: 0.1330
Epoch 4/10
550/550 ━━━━━━━━━━━━━━━━━━━━ 488s 888ms/step - accuracy: 0.9222 - loss: 0.2616 - val_accuracy: 0.9686 - val_loss: 0.1040
Epoch 5/10
550/550 ━━━━━━━━━━━━━━━━━━━━ 513s 908ms/step - accuracy: 0.9394 - loss: 0.2096 - val_accuracy: 0.9726 - val_loss: 0.0916
Epoch 6/10
550/550 ━━━━━━━━━━━━━━━━━━━━ 492s 894ms/step - accuracy: 0.9463 - loss: 0.1855 - val_accuracy: 0.9754 - val_loss: 0.0806

<keras.src.callbacks.history.History at 0x7d711695baa0>

In [66]:
keras.backend.clear_session()
tf.random.set_seed(42)
np.random.seed(42)

distribution = tf.distribute.MirroredStrategy()

# Change the default all-reduce algorithm:
#distribution = tf.distribute.MirroredStrategy(
#    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

# Specify the list of GPUs to use:
#distribution = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])

# Use the central storage strategy instead:
#distribution = tf.distribute.experimental.CentralStorageStrategy()

#if IS_COLAB and "COLAB_TPU_ADDR" in os.environ:
#  tpu_address = "grpc://" + os.environ["COLAB_TPU_ADDR"]
#else:
#  tpu_address = ""
#resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu_address)
#tf.config.experimental_connect_to_cluster(resolver)
#tf.tpu.experimental.initialize_tpu_system(resolver)
#distribution = tf.distribute.experimental.TPUStrategy(resolver)

with distribution.scope():
    model = create_model()
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer=keras.optimizers.SGD(learning_rate=1e-2),
                  metrics=["accuracy"])

In [67]:
batch_size = 100 # must be divisible by the number of workers
model.fit(X_train, y_train, epochs=10,
          validation_data=(X_valid, y_valid), batch_size=batch_size)

Epoch 1/10
550/550 ━━━━━━━━━━━━━━━━━━━━ 492s 894ms/step - accuracy: 0.3542 - loss: 1.9191 - val_accuracy: 0.9020 - val_loss: 0.3596
Epoch 2/10
550/550 ━━━━━━━━━━━━━━━━━━━━ 500s 910ms/step - accuracy: 0.8286 - loss: 0.5573 - val_accuracy: 0.9446 - val_loss: 0.2039
Epoch 3/10
550/550 ━━━━━━━━━━━━━━━━━━━━ 500s 909ms/step - accuracy: 0.8960 - loss: 0.3527 - val_accuracy: 0.9600 - val_loss: 0.1459
Epoch 4/10
550/550 ━━━━━━━━━━━━━━━━━━━━ 495s 900ms/step - accuracy: 0.9202 - loss: 0.2661 - val_accuracy: 0.9686 - val_loss: 0.1128
Epoch 5/10
550/550 ━━━━━━━━━━━━━━━━━━━━ 501s 912ms/step - accuracy: 0.9352 - loss: 0.2202 - val_accuracy: 0.9740 - val_loss: 0.0897
Epoch 6/10
550/550 ━━━━━━━━━━━━━━━━━━━━ 493s 897ms/step - accuracy: 0.9462 - loss: 0.1854 - val_accuracy: 0.9768 - val_loss: 0.0812

<keras.src.callbacks.history.History at 0x7d70f2289010>

In [68]:
model.predict(X_new)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 243ms/step


array([[1.5360947e-10, 6.7660409e-08, 1.6691367e-06, 1.5719796e-06,
        1.4532838e-09, 5.0937308e-09, 6.3870675e-13, 9.9997663e-01,
        1.3030483e-09, 2.0081137e-05],
       [3.4333658e-07, 3.8386035e-05, 9.9995732e-01, 3.9597112e-06,
        2.2954702e-11, 2.4577980e-09, 3.9073328e-08, 1.4761659e-10,
        4.8106042e-08, 5.0022001e-15],
       [1.1794024e-06, 9.9952281e-01, 2.6604683e-05, 1.8608032e-06,
        1.5183409e-04, 1.1377758e-06, 9.9319812e-05, 1.7172418e-04,
        2.0697747e-05, 2.7802221e-06]], dtype=float32)

Custom training loop:

In [69]:
keras.backend.clear_session()
tf.random.set_seed(42)
np.random.seed(42)

distribution = tf.distribute.MirroredStrategy()

with distribution.scope():
    model = create_model()
    optimizer = keras.optimizers.SGD()

# Let the strategy split each global batch across the replicas
# (replaces the deprecated make_dataset_iterator())
dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train)).repeat().batch(batch_size)
input_iterator = iter(distribution.experimental_distribute_dataset(dataset))

@tf.function
def train_step(inputs):
    def step_fn(inputs):
        X, y = inputs
        with tf.GradientTape() as tape:
            Y_proba = model(X)
            # Divide by the global batch size so that summing over replicas gives the mean
            loss = tf.reduce_sum(keras.losses.sparse_categorical_crossentropy(y, Y_proba)) / batch_size

        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss

    # run() replaces the deprecated experimental_run()
    per_replica_losses = distribution.run(step_fn, args=(inputs,))
    mean_loss = distribution.reduce(tf.distribute.ReduceOp.SUM,
                                    per_replica_losses, axis=None)
    return mean_loss

n_epochs = 10
for epoch in range(n_epochs):
    print("Epoch {}/{}".format(epoch + 1, n_epochs))
    for iteration in range(len(X_train) // batch_size):
        loss = train_step(next(input_iterator))
        print("\rLoss: {:.3f}".format(loss.numpy()), end="")
    print()

Epoch 1/10
Loss: 0.428
Epoch 2/10
Loss: 0.333
Epoch 3/10
Loss: 0.305
Epoch 4/10
Loss: 0.290
Epoch 5/10
Loss: 0.286
Epoch 6/10
Loss: 0.282
Epoch 7/10
Loss: 0.282
Epoch 8/10
Loss: 0.282
Epoch 9/10
Loss: 0.283
Epoch 10/10
Loss: 0.281
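
The custom training loop divides each replica's summed loss by `batch_size` (the global batch size) and then combines the replicas with `ReduceOp.SUM`. The plain-Python sketch below shows why that combination yields the mean loss over the whole batch. The function name and the sample loss values are made up for illustration.

```python
from typing import List

def mean_loss_across_replicas(per_example_losses: List[float], n_replicas: int) -> float:
    """Split a global batch across replicas, scale by the global batch size, then sum."""
    global_batch_size = len(per_example_losses)
    shard_size = global_batch_size // n_replicas  # batch size must divide evenly
    per_replica = []
    for r in range(n_replicas):
        shard = per_example_losses[r * shard_size:(r + 1) * shard_size]
        # Each replica divides by the GLOBAL batch size, not its own shard size
        per_replica.append(sum(shard) / global_batch_size)
    # A SUM reduction over the replicas gives the mean over the whole batch
    return sum(per_replica)

losses = [0.5, 1.5, 2.0, 4.0]
distributed = mean_loss_across_replicas(losses, n_replicas=2)
plain_mean = sum(losses) / len(losses)
print(distributed, plain_mean)
```

Dividing by the per-replica shard size instead would make the summed loss grow with the number of replicas.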
