# Chapter 19: Training and Deploying TensorFlow Models at Scale

**Based on "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition" by Aurélien Géron**

This notebook reproduces the code from Chapter 19 and provides theoretical explanations for each concept, as required by the individual task.

## Chapter Summary

This chapter covers the crucial final steps in a machine learning project: deploying a trained model into a production environment and scaling up training for large, complex models.

1.  **Serving a TensorFlow Model:** We learn how to export a model to TensorFlow's **SavedModel** format, which is a universal, language-neutral format that includes the computation graph and weights. We then deploy this model using **TensorFlow Serving (TF Serving)**, a high-performance, production-ready serving system. We learn how to:
    * Install TF Serving using Docker.
    * Serve a model and query it using both the **REST API** (simple, JSON-based) and the **gRPC API** (high-performance, binary-based).
    * Deploy new model versions seamlessly for graceful transitions.

2.  **Deploying on the Cloud (GCP AI Platform):** We explore how to use a managed cloud service like **Google Cloud AI Platform** to serve our models. This handles all the infrastructure, scaling, and version management for us. We learn how to set up a project, create a model, and deploy a model version from a SavedModel stored in Google Cloud Storage (GCS).

3.  **Deploying to Other Platforms:** We briefly cover other deployment targets:
    * **TensorFlow Lite (TFLite):** For deploying models on **mobile and embedded devices**. This involves converting the model to a lightweight `.tflite` (FlatBuffer) format and using optimizations like **quantization** to reduce model size and latency.
    * **TensorFlow.js:** For running models directly in a **web browser**, enabling client-side ML, user privacy, and low latency.

4.  **Using GPUs:** We cover how to accelerate training using GPUs. This includes managing GPU memory (e.g., enabling memory growth to avoid grabbing all RAM at once) and how to explicitly place operations on a CPU or GPU using `tf.device()`.

5.  **Distributed Training at Scale:** For very large models or datasets, we explore how to train a model across multiple devices and servers using TensorFlow's **Distribution Strategies API**.
    * **Data Parallelism:** The main strategy, where the model is replicated on each device, and each replica processes a different batch of data.
    * **`MirroredStrategy`:** For synchronous training on all GPUs on a single machine.
    * **`MultiWorkerMirroredStrategy`:** For synchronous training across multiple machines (a TF Cluster).
    * **`ParameterServerStrategy`:** An asynchronous strategy where parameters are stored on dedicated parameter servers.

6.  **Hyperparameter Tuning on GCP:** Finally, we learn how to use **GCP AI Platform's hyperparameter tuning service** (based on Google Vizier) to run a black-box Bayesian optimization search to find the best hyperparameters for our model automatically.

## Setup

First, let's import the necessary libraries and set up our environment. We'll also train a basic MNIST model to use for deployment.

In [2]:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import os
import json

# Common imports
import pandas as pd
import matplotlib.pyplot as plt

# to make this notebook's output stable across runs
np.random.seed(42)
tf.random.set_seed(42)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# Prepare MNIST dataset
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train_full = X_train_full[..., np.newaxis].astype(np.float32) / 255.
X_test = X_test[..., np.newaxis].astype(np.float32) / 255.
X_valid, X_train = X_train_full[:5000], X_train_full[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

# A simple MNIST model to be saved and served
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28, 1]),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(learning_rate=1e-2),
              metrics=["accuracy"])

# Train the model (for demonstration, we'll just train for 1 epoch)
model.fit(X_train, y_train, epochs=1, validation_data=(X_valid, y_valid))

# Set up paths
model_name = "my_mnist_model"
model_version = "0001"
model_path = os.path.join(model_name, model_version)

# Get a few test instances
X_new = X_test[:3]

  super().__init__(**kwargs)


[1m1719/1719[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 5ms/step - accuracy: 0.7179 - loss: 1.0888 - val_accuracy: 0.9008 - val_loss: 0.3724


## 1. Serving a TensorFlow Model

> **Theoretical Deep-Dive: Why Use a Model Server?**
>
> While you *can* just call `model.predict()` in any Python app, a dedicated model server like **TF Serving** is much better for production.
>
> 1.  **Decoupling:** Your main application (e.g., a web backend) is decoupled from the ML model. You can update the model without redeploying your entire application.
> 2.  **Performance:** TF Serving is written in C++ and highly optimized for performance. It can handle many requests per second (QPS) and leverage hardware like GPUs.
> 3.  **Scalability:** You can easily scale the number of TF Serving instances up or down to meet demand, independently of your main application.
> 4.  **Versioning:** TF Serving can manage multiple versions of a model. You can deploy a new version, test it (a "canary" release), and roll back to a previous version if it doesn't perform well.
> 5.  **Batching:** It can automatically batch requests together to get much higher throughput on a GPU.

### Exporting to SavedModel Format

To deploy a model, we first need to export it to TensorFlow's **SavedModel** format. This is a language-neutral, portable format that contains the full computation graph and all the learned weights.

In [5]:
# Saving the model to SavedModel format for TensorFlow Serving
# You can also use the model's save() method for native Keras format, but for TF Serving, export() is preferred.
model.export(model_path)

Saved artifact at 'my_mnist_model/0001'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='keras_tensor_4')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  134709011884816: TensorSpec(shape=(), dtype=tf.resource, name=None)
  134709011887888: TensorSpec(shape=(), dtype=tf.resource, name=None)
  134709011888080: TensorSpec(shape=(), dtype=tf.resource, name=None)
  134709011886544: TensorSpec(shape=(), dtype=tf.resource, name=None)


This creates a directory `my_mnist_model/0001/` containing:
* `saved_model.pb`: The computation graph (as a protocol buffer).
* `variables/`: A subdirectory with all the learned weights (variables).
* `assets/`: A subdirectory for any extra files (like vocabularies, not used here).

### Inspecting the SavedModel with `saved_model_cli`

We can use the `saved_model_cli` command-line tool (which comes with TensorFlow) to inspect our model's **signatures**. A signature defines the inputs and outputs of a function in the model. Keras models default to a `serving_default` signature.

In [6]:
!saved_model_cli show --dir {model_path} --all

2025-11-16 08:47:51.748177: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1763282871.773324    1067 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1763282871.779962    1067 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1763282871.796446    1067 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1763282871.796494    1067 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1763282871.796499    1067 computation_placer.cc:177] computation placer alr

### Installing TensorFlow Serving

The easiest way to install TF Serving is with Docker. These commands are for your terminal (not this notebook).

```bash
# Download the official TF Serving image
docker pull tensorflow/serving

# Start the TF Serving container
docker run -it --rm -p 8500:8500 -p 8501:8501 \
   -v "$(pwd)/my_mnist_model:/models/my_mnist_model" \
   -e MODEL_NAME=my_mnist_model \
   tensorflow/serving
```

**Explanation of the command:**
* `-it --rm`: Makes the container interactive and cleans it up after it stops.
* `-p 8500:8500`: Forwards the host's port 8500 to the container's port 8500 (for gRPC).
* `-p 8501:8501`: Forwards the host's port 8501 to the container's port 8501 (for the REST API).
* `-v "..."`: Mounts our local model directory (`my_mnist_model`) into the container's `/models/` directory.
* `-e MODEL_NAME=...`: Tells TF Serving which model to serve from the `/models/` directory.
* `tensorflow/serving`: The name of the Docker image to run.

### Querying TF Serving via the REST API

The REST API is simple and uses JSON. It's great for general-purpose use.

In [7]:
# !pip install -q requests
import requests

# Create the JSON payload
input_data_json = json.dumps({
    "signature_name": "serving_default",
    "instances": X_new.tolist(),
})

# Define the server URL
SERVER_URL = 'http://localhost:8501/v1/models/my_mnist_model:predict'

try:
    response = requests.post(SERVER_URL, data=input_data_json)
    response.raise_for_status() # Raise an exception in case of error
    response = response.json()

    y_proba_rest = np.array(response["predictions"])
    print(y_proba_rest.round(2))

except Exception as e:
    print("Could not connect to TF Serving. Is the Docker container running?")
    print(e)

Could not connect to TF Serving. Is the Docker container running?
HTTPConnectionPool(host='localhost', port=8501): Max retries exceeded with url: /v1/models/my_mnist_model:predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7a845f95be60>: Failed to establish a new connection: [Errno 111] Connection refused'))


### Querying TF Serving via the gRPC API

The gRPC API is much more efficient. It uses Protocol Buffers (protobufs) to send binary, low-latency requests. This is preferred for high-performance, internal services.

In [9]:
!pip install -q grpcio tensorflow-serving-api
import grpc
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

try:
    channel = grpc.insecure_channel('localhost:8500')
    predict_service = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    # Create the gRPC request
    request = predict_pb2.PredictRequest()
    request.model_spec.name = model_name
    request.model_spec.signature_name = "serving_default"

    # Convert the NumPy array to a TensorProto
    input_name = model.input_names[0]
    request.inputs[input_name].CopyFrom(tf.make_tensor_proto(X_new))

    # Send the request
    response_grpc = predict_service.Predict(request, timeout=10.0)

    # Convert the response (a TensorProto) back to a NumPy array
    output_name = model.output_names[0]
    outputs_proto = response_grpc.outputs[output_name]
    y_proba_grpc = tf.make_ndarray(outputs_proto)

    print(y_proba_grpc.round(2))

except Exception as e:
    print("Could not connect to TF Serving. Is the Docker container running?")
    print(e)

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m620.7/620.7 MB[0m [31m3.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m5.5/5.5 MB[0m [31m88.7 MB/s[0m eta [36m0:00:00[0m
[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-text 2.19.0 requires tensorflow<2.20,>=2.19.0, but you have tensorflow 2.20.0 which is incompatible.
tf-keras 2.19.0 requires tensorflow<2.20,>=2.19, but you have tensorflow 2.20.0 which is incompatible.
tensorflow-decision-forests 1.12.0 requires tensorflow==2.19.0, but you have tensorflow 2.20.0 which is incompatible.[0m[31m
[0mCould not connect to TF Serving. Is the Docker container running?
'Sequential' object has no attribute 'input_names'


## 2. Deploying a Model on Google Cloud AI Platform

Instead of managing our own TF Serving containers, we can use a managed service like Google Cloud AI Platform. It handles setup, versioning, and scaling for us.

> **Theoretical Deep-Dive: Cloud AI Platform Setup**
>
> 1.  **Create a GCP Project:** All resources (models, storage) live inside a project.
> 2.  **Enable Billing:** You must have an active billing account.
> 3.  **Enable APIs:** You must enable the "AI Platform Training & Prediction API" and "Google Cloud Storage API".
> 4.  **Create a GCS Bucket:** Google Cloud Storage (GCS) is where you will store your SavedModel files. You need to create a unique "bucket" name.
> 5.  **Upload Model:** You upload your `my_mnist_model/0001` directory to the GCS bucket.
> 6.  **Create AI Platform Model:** In the AI Platform console, you create a "model" resource (e.g., `my_mnist_model`).
> 7.  **Create Model Version:** Inside that model, you create a "version" (e.g., `v0001`) and point it to the GCS path of your SavedModel (e.g., `gs://your-bucket/my_mnist_model/0001`).
> 8.  **Create Service Account:** For security, you create a **service account** (an account for an application, not a person) and give it the "AI Platform Developer" role. You download its private key (a JSON file) to authenticate your client code.

### Querying the GCP AI Platform Prediction Service

Once the model is deployed on GCP, we query it using Google's client libraries. This is similar to the REST API, but it handles authentication for us.

In [10]:
# !pip install -q google-api-python-client
import googleapiclient.discovery

# --- CONFIGURATION ---
# 1. Set this environment variable to point to your downloaded service account JSON key
# os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "my_service_account_key.json"

# 2. Replace with your actual project and model IDs
project_id = "your-gcp-project-id"
model_id = "my_mnist_model"
# ---------------------

model_path = f"projects/{project_id}/models/{model_id}"
# You can also query a specific version:
# model_path = f"projects/{project_id}/models/{model_id}/versions/{model_version}"

# Create a resource object to interact with the service
try:
    ml_resource = googleapiclient.discovery.build("ml", "v1").projects()
except Exception as e:
    print("Error building Google API client. Is the library installed and GOOGLE_APPLICATION_CREDENTIALS set?")
    print(e)

def predict_gcp(X):
    # Format the request body
    input_data_json = {"signature_name": "serving_default",
                       "instances": X.tolist()}

    # Prepare and send the request
    request = ml_resource.predict(name=model_path, body=input_data_json)
    response = request.execute()

    if "error" in response:
        raise RuntimeError(response["error"])

    # Parse the response
    output_name = model.output_names[0]
    return np.array([pred[output_name] for pred in response["predictions"]])

# try:
#     Y_probas_gcp = predict_gcp(X_new)
#     print(Y_probas_gcp.round(2))
# except Exception as e:
#     print("Error querying GCP. Please check your project_id, model_id, and authentication.")
#     print(e)

*(Note: The cell above is commented out to prevent errors, as it requires a live GCP project and a valid service account key.)*

## 3. Deploying to Mobile, Embedded, and Web

### TensorFlow Lite (TFLite)

> **Theoretical Deep-Dive: TFLite and Quantization**
>
> **TFLite** is a framework for running TF models on devices with low compute power and memory, like mobile phones and microcontrollers.
>
> It uses a **TFLite Converter** to convert a SavedModel into a highly optimized `.tflite` file (a FlatBuffer). This conversion reduces the model size and latency.
>
> A key optimization technique is **quantization**. This converts the model's 32-bit floating-point weights (and optionally, activations) into 8-bit integers (or 16-bit floats).
> -   **Pros:** 4x reduction in model size, 2-4x speedup, and less power consumption.
> -   **Cons:** A small (usually acceptable) drop in accuracy.
>
> **Post-training quantization** is the simplest method: you quantize the model after training. For full-integer quantization (the fastest), you need to provide a small sample of data for calibration.

In [20]:
# Convert the SavedModel to a TFLite FlatBuffer
converter = tf.lite.TFLiteConverter.from_saved_model(model_path)
tflite_model = converter.convert()

with open("converted_model.tflite", "wb") as f:
    f.write(tflite_model)

# We can also convert a Keras model directly
# converter = tf.lite.TFLiteConverter.from_keras_model(model)
# ...

# To use post-training quantization:
converter.optimizations = [tf.lite.Optimize.DEFAULT] # or OPTIMIZE_FOR_SIZE
tflite_quantized_model = converter.convert()

print(f"Original model size: {os.path.getsize(model_path + '/saved_model.pb')/1024:.2f} KB (graph only)")
print(f"TFLite model size: {len(tflite_model)/1024:.2f} KB")
print(f"Quantized TFLite model size: {len(tflite_quantized_model)/1024:.2f} KB")

OSError: SavedModel file does not exist at: projects/your-gcp-project-id/models/my_mnist_model/{saved_model.pbtxt|saved_model.pb}

### TensorFlow.js (TF.js)

> **Theoretical Deep-Dive:**
>
> **TensorFlow.js** is a JavaScript library for training and deploying models in the browser and on Node.js.
>
> **Why use it?**
> -   **Client-Side Inference:** The model runs on the user's machine, so no server is needed. This scales infinitely and works offline.
> -   **Privacy:** Sensitive data (e.g., from a webcam) never leaves the user's browser.
> -   **Interactivity:** You can build highly interactive web applications (e.g., in-browser pose estimation).
>
> You use the `tensorflowjs_converter` tool to convert a SavedModel or Keras model into a `model.json` file (for the architecture) and a set of binary `groupX-shardXofX.bin` files (for the weights).

#### TF.js Converter (Terminal Command)

```bash
# First, install the converter
pip install tensorflowjs

# Run the converter
tensorflowjs_converter --input_format=tf_saved_model \
                       my_mnist_model/0001 \
                       my_tfjs_model
```
This creates a `my_tfjs_model` directory with `model.json` and the binary weight files.

#### TF.js Example Usage (JavaScript)

```html

<script src="[https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@latest](https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@latest)"></script>

<script>
    async function runModel() {
        // Load the model
        const model = await tf.loadLayersModel('my_tfjs_model/model.json');
        
        // Create a dummy input tensor (e.g., a 1x28x28x1 image)
        const image = tf.zeros([1, 28, 28, 1]);
        
        // Make a prediction
        const prediction = model.predict(image);
        prediction.print();
    }
    runModel();
</script>
```

## 4. Using GPUs to Speed Up Computations

### Checking for GPU Access

If you have a compatible NVIDIA GPU and the correct CUDA and cuDNN libraries installed, TensorFlow will automatically use the GPU for most operations.

In [21]:
print("Is GPU available:", tf.test.is_gpu_available())
print("GPU device name:", tf.test.gpu_device_name())
print("List of physical GPUs:", tf.config.experimental.list_physical_devices(device_type='GPU'))

Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.


Is GPU available: False
GPU device name: 
List of physical GPUs: []


### Managing GPU RAM

> **Theoretical Deep-Dive: GPU Memory Management**
>
> By default, TensorFlow tries to grab *all* available GPU memory when it starts. This is efficient, but it prevents you from running a second TF process on the same GPU.
>
> A better approach is to enable **memory growth**. This tells TensorFlow to only grab the memory it needs, *as* it needs it. This must be done right at the start of your script.

In [22]:
physical_gpus = tf.config.experimental.list_physical_devices("GPU")
if physical_gpus:
    try:
        # Set memory growth to True for all GPUs
        for gpu in physical_gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        print(f"{len(physical_gpus)} Physical GPUs found, Memory Growth set to True.")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
else:
    print("No GPU found.")

No GPU found.


### Placing Operations on Devices

TensorFlow automatically places operations on the GPU if one is available and a GPU-kernel (implementation) exists for that op. You can manually override this using a `tf.device()` context.

In [23]:
print("Default placement:")
a = tf.Variable(42.0)
print(a.device)

print("Forced CPU placement:")
with tf.device("/cpu:0"):
    b = tf.Variable(42.0)

print(b.device)

Default placement:
/job:localhost/replica:0/task:0/device:CPU:0
Forced CPU placement:
/job:localhost/replica:0/task:0/device:CPU:0


## 5. Training Models Across Multiple Devices

For very large models, we need to distribute training across multiple GPUs, and even multiple machines.

> **Theoretical Deep-Dive: Model vs. Data Parallelism**
>
> -   **Model Parallelism:** You split *one* model across multiple devices. For example, layer 1 goes on GPU 0, layer 2 goes on GPU 1. This is complex and often inefficient due to communication bottlenecks (layer 2 must wait for layer 1).
> -   **Data Parallelism (Recommended):** You replicate the *entire* model on each device. Each replica processes a different mini-batch of data. The gradients from all replicas are then aggregated (e.g., averaged), and the model parameters on all replicas are updated. This is much easier to implement and scales well.

### The Distribution Strategies API

TensorFlow's `tf.distribute` API makes data parallelism easy. You just create a **strategy** object and build/compile your model inside its `scope()`.

#### `MirroredStrategy`: Training on one machine with multiple GPUs

In [25]:
# Create a strategy to use all available GPUs
distribution = tf.distribute.MirroredStrategy()

# You can also specify which GPUs to use:
# distribution = tf.distribute.MirroredStrategy(["/gpu:0", "/gpu:1"])

with distribution.scope():
    mirrored_model = keras.models.Sequential([
        keras.layers.Dense(100, activation="relu", input_shape=[28*28]),
        keras.layers.Dense(10, activation="softmax")
    ])
    mirrored_model.compile(loss="sparse_categorical_crossentropy",
                             optimizer=keras.optimizers.SGD(learning_rate=1e-2),
                             metrics=["accuracy"])

# Flatten the data for this simple model
X_train_flat = X_train.reshape(-1, 28*28)

# The batch size should be divisible by the number of replicas (GPUs)
# e.g., if you have 2 GPUs, use a batch_size of 32, 64, etc.
batch_size = 64
history = mirrored_model.fit(X_train_flat, y_train, epochs=5, batch_size=batch_size)

# Saving the model saves a single, non-distributed version
# mirrored_model.save("my_mirrored_model.h5")

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Epoch 1/5
[1m860/860[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 9ms/step - accuracy: 0.6191 - loss: 1.3878
Epoch 2/5
[1m860/860[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 6ms/step - accuracy: 0.8806 - loss: 0.4599
Epoch 3/5
[1m860/860[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 5ms/step - accuracy: 0.8984 - loss: 0.3709
Epoch 4/5
[1m860/860[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 3ms/step - accuracy: 0.9078 - loss: 0.3304
Epoch 5/5
[1m860/860[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 5ms/step - accuracy: 0.9139 - loss: 0.3044


#### `MultiWorkerMirroredStrategy`: Training on a Multi-Machine Cluster

> **Theoretical Deep-Dive: TF Cluster**
>
> A TensorFlow Cluster is a group of TF processes (tasks) running on different machines.
> -   `worker`: A task that performs computations (training).
> -   `chief`: A special worker (usually `worker:0`) that also handles tasks like saving checkpoints and writing TensorBoard logs.
> -   `ps`: A **Parameter Server** task that only stores and updates variables (used by the `ParameterServerStrategy`).
>
> To configure a cluster, you must set the `TF_CONFIG` environment variable on *each machine* before it starts. This JSON variable tells the task what the cluster looks like (all task addresses) and what its own role is (e.g., `{"type": "worker", "index": 1}`).
>
> `MultiWorkerMirroredStrategy` implements synchronous data parallelism across all workers. It's the multi-machine equivalent of `MirroredStrategy`.

In [26]:
# This code would be run on every machine in the cluster.
# TF_CONFIG would be set as an environment variable before running the script.

# Example TF_CONFIG for Worker 0:
# os.environ["TF_CONFIG"] = json.dumps({
#     "cluster": {
#         "worker": ["machine-a.example.com:2222", "machine-b.example.com:2222"]
#     },
#     "task": {"type": "worker", "index": 0}
# })

# Example TF_CONFIG for Worker 1:
# os.environ["TF_CONFIG"] = json.dumps({
#     "cluster": {
#         "worker": ["machine-a.example.com:2222", "machine-b.example.com:2222"]
#     },
#     "task": {"type": "worker", "index": 1}
# })

# --- The Python script (run on all workers) ---

# distribution_multi = tf.distribute.experimental.MultiWorkerMirroredStrategy()

# with distribution_multi.scope():
#     multi_worker_model = keras.models.Sequential([...]) # same model as before
#     multi_worker_model.compile([...])

# history = multi_worker_model.fit(X_train_flat, y_train, epochs=10, batch_size=64)

print("MultiWorkerMirroredStrategy setup (conceptual)")

MultiWorkerMirroredStrategy setup (conceptual)


## 6. Running Training Jobs on GCP AI Platform

AI Platform can manage the entire cluster for you. You package your code and use the `gcloud` command-line tool to submit a training job.

#### Submitting a Training Job (Terminal Command)

```bash
gcloud ai-platform jobs submit training my_job_20251116_150000 \
    --region us-central1 \
    --scale-tier PREMIUM_1 \
    --runtime-version 2.0 \
    --python-version 3.7 \
    --package-path ./my_project/src/trainer \
    --module-name trainer.task \
    --staging-bucket gs://my-staging-bucket \
    --job-dir gs://my-model-bucket/trained_model \
    -- \
    --my-extra-argument1 foo
```

This command packages the code from `my_project/src/trainer`, uploads it to a `staging-bucket`, and runs the `trainer.task` module on a pre-configured cluster (`PREMIUM_1`).

### Black Box Hyperparameter Tuning on AI Platform

> **Theoretical Deep-Dive: Bayesian Hyperparameter Optimization**
>
> Instead of grid search or random search, GCP's **Hyperparameter Tuning Service (Google Vizier)** uses a more intelligent, black-box Bayesian optimization approach.
>
> It works like this:
> 1.  It tries a few random combinations of hyperparameters.
> 2.  It uses the results to build a probabilistic model of how the hyperparameters relate to the model's performance (the `hyperparameterMetricTag`).
> 3.  It uses this model to intelligently choose the *next* set of hyperparameters that are most likely to yield an improvement.
> 4.  It repeats this process, getting smarter with each trial.
>
> You define the search in a YAML config file.

#### Example `tuning.yaml` file

```yaml
trainingInput:
  hyperparameters:
    goal: MAXIMIZE
    hyperparameterMetricTag: accuracy # Must match the name in your TensorBoard logs
    maxTrials: 10
    maxParallelTrials: 2
    params:
      - parameterName: n_layers
        type: INTEGER
        minValue: 1
        maxValue: 5
      - parameterName: learning_rate
        type: DOUBLE
        minValue: 1e-4
        maxValue: 1e-2
        scaleType: UNIT_LOG_SCALE
```

Your training script just needs to:
1.  Accept arguments like `--n_layers` and `--learning_rate`.
2.  Log the target metric (`accuracy`) using a `TensorBoard` callback. The service will automatically read these logs.

# Task
```python
import grpc
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

try:
    channel = grpc.insecure_channel('localhost:8500')
    predict_service = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    # Create the gRPC request
    request = predict_pb2.PredictRequest()
    request.model_spec.name = model_name
    request.model_spec.signature_name = "serving_default"

    # Convert the NumPy array to a TensorProto
    # Fix: Use explicit input name 'keras_tensor_4' as identified by saved_model_cli
    input_name = 'keras_tensor_4'
    request.inputs[input_name].CopyFrom(tf.make_tensor_proto(X_new))

    # Send the request
    response_grpc = predict_service.Predict(request, timeout=10.0)

    # Convert the response (a TensorProto) back to a NumPy array
    # Fix: Use explicit output name 'output_0' as identified by saved_model_cli
    output_name = 'output_0'
    outputs_proto = response_grpc.outputs[output_name]
    y_proba_grpc = tf.make_ndarray(outputs_proto)

    print(y_proba_grpc.round(2))

except Exception as e:
    print("Could not connect to TF Serving. Is the Docker container running?")
    print(e)
```