# 10. Kubernetes and TensorFlow Serving

We'll deploy the clothes classification model we trained previously using Kubernetes and TensorFlow Serving.

## 10.1 Overview

- What we'll cover this week
- Two-tier architecture

![Kubernetes overview](./images/kubernetes_overview.png)

## 10.2 TensorFlow Serving

- The saved_model format
- Running TF-Serving locally with Docker
- Invoking the model from Jupyter

In [1]:
# Import necessary libraries
import tensorflow as tf
from tensorflow import keras

In [2]:
# Load the model
model = keras.models.load_model("./models/xception_v4_large_08_0.894.h5")


In [3]:
tf.saved_model.save(model, "./models/clothing-model")

INFO:tensorflow:Assets written to: ./models/clothing-model/assets


In [4]:
!lsd --tree models/clothing-model

clothing-model
├── assets
├── variables
│   ├── variables.data-00000-of-00001
│   └── variables.index
├── fingerprint.pb
└── saved_model.pb


In [30]:
!saved_model_cli show --dir clothing-model --all


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_8'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 299, 299, 3)
        name: serving_default_input_8:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['dense_7'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict
The MetaGraph with tag set ['serve'] contains the following ops: {'StringJoin', 'AssignVariableOp', 'AddV2', 'Mean', 'VarHandleOp

We are interested in the second signature defined in the model, `signature_def['serving_default']`. It takes a batch of 299×299 RGB images as input (`input_8`) and returns ten raw class scores per image (`dense_7`). See the [model description](./model-description.txt).

Now we run TF-Serving in a Docker container, mounting the saved model as version `1` and passing the model name as an environment variable:

```bash
docker run -it --rm \
    --platform=linux/amd64 \
    -p 8500:8500 \
    -v "$(pwd)/models/clothing-model:/models/clothing-model/1" \
    -e MODEL_NAME="clothing-model" \
    tensorflow/serving:2.7.0
```

>[!BUG]
>
>```plaintext
> /usr/bin/tf_serving_entrypoint.sh: line 3:     7 Illegal instruction     tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"
>```

I found this [hint](https://github.com/tensorflow/serving/issues/1816#issuecomment-2445056791):

Docker Desktop release 4.35.0 (172550) for Mac introduces Docker VMM (beta), a replacement for the Apple Virtualization framework with Rosetta. The good news is that the native TF Serving image now runs with it.


<!-- ! HELP: Does not run under Apple Silicon -->

In [15]:
# !pip install grpcio==1.42.0 tensorflow-serving-api==2.7.0

In [16]:
# !pip install keras-image-helper

In [17]:
import grpc
import tensorflow as tf

from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

In [18]:
host = 'localhost:8500'

channel = grpc.insecure_channel(host)

stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

In [19]:
from keras_image_helper import create_preprocessor

In [20]:
preprocessor = create_preprocessor('xception', target_size=(299, 299))

In [21]:
url = 'http://bit.ly/mlbookcamp-pants'
X = preprocessor.from_url(url)

In [22]:
def np_to_protobuf(data):
    return tf.make_tensor_proto(data, shape=data.shape)

In [25]:
pb_request = predict_pb2.PredictRequest()

pb_request.model_spec.name = 'clothing-model'
pb_request.model_spec.signature_name = 'serving_default'

pb_request.inputs["input_8"].CopyFrom(np_to_protobuf(X))

In [26]:
pb_response = stub.Predict(pb_request, timeout=20.0)

In [31]:
preds = pb_response.outputs['dense_7'].float_val

In [32]:
classes = [
    'dress',
    'hat',
    'longsleeve',
    'outwear',
    'pants',
    'shirt',
    'shoes',
    'shorts',
    'skirt',
    't-shirt'
]

In [33]:
dict(zip(classes, preds))

{'dress': -1.8798644542694092,
 'hat': -4.756311416625977,
 'longsleeve': -2.35953426361084,
 'outwear': -1.0892632007598877,
 'pants': 9.903782844543457,
 'shirt': -2.8261797428131104,
 'shoes': -3.6483113765716553,
 'shorts': 3.2411556243896484,
 'skirt': -2.6120963096618652,
 't-shirt': -4.852035999298096}
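
The values above are raw scores (logits), not probabilities: several are negative and they do not sum to 1. If probabilities are needed, a softmax can be applied on the client side. The following is a minimal sketch in plain Python; the `softmax` helper is illustrative and not part of the original notebook, and the scores are copied from the output above (rounded to two decimals).

```python
import math

def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

classes = [
    'dress', 'hat', 'longsleeve', 'outwear', 'pants',
    'shirt', 'shoes', 'shorts', 'skirt', 't-shirt',
]

# Rounded logits from the prediction above.
logits = [-1.88, -4.76, -2.36, -1.09, 9.90, -2.83, -3.65, 3.24, -2.61, -4.85]

probs = dict(zip(classes, softmax(logits)))
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

For ranking classes the softmax is optional, since it preserves the order of the scores.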

## 10.3 Creating a pre-processing service

- Converting the notebook to a Python script

```bash
jupyter nbconvert --to script 10_kubernetes.ipynb
```
The converted script is saved as [gateway.py](./gateway.py).


- Wrapping the script into a Flask app
- Testing the service with [`test.py`](./test.py)

```bash
python test.py
```
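
`test.py` is linked but not reproduced in these notes. A minimal sketch of what such a client could look like, using only the standard library, is shown below. The `/predict` endpoint, port 9696 on `localhost`, and the `{"url": ...}` payload are assumptions about the gateway, not confirmed by the source; `build_request` and `predict` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed gateway address and endpoint (hypothetical).
GATEWAY_URL = "http://localhost:9696/predict"

def build_request(image_url, gateway_url=GATEWAY_URL):
    """Build a JSON POST request asking the gateway to classify an image."""
    payload = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        gateway_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(image_url, timeout=20.0):
    """Send the request and return the decoded JSON response."""
    request = build_request(image_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the gateway running, `predict("http://bit.ly/mlbookcamp-pants")` would return whatever JSON the gateway produces, presumably a mapping from class names to scores.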

- Putting everything into Pipenv

```bash
pip install pipenv
pipenv install grpcio==1.42.0 flask gunicorn keras-image-helper
pipenv install tensorflow-protobuf==2.7.0
pipenv shell
python gateway.py
```

This did not work out of the box. See the [new environment with Python 3.8](./environment_py38.yml); Pipenv still fails to lock.

To avoid dragging the full 1.2 GB TensorFlow package into the Docker image, we only need `tensorflow-protobuf`. The conversion code moves to a new file, `proto.py`.
- [https://github.com/alexeygrigorev/tensorflow-protobuf](https://github.com/alexeygrigorev/tensorflow-protobuf)

## 10.4 Running everything locally with Docker-compose

- Preparing tensorflow-serving image
```bash
docker build -t zoomcamp-10-model:xception-v4-001 -f image-model.dockerfile .
```

- On macOS with Apple Silicon (arm64), build the image for the `linux/amd64` platform instead

```bash
docker build --platform=linux/amd64 -t zoomcamp-10-model:xception-v4-001 -f image-model.dockerfile .
```


- Running tensorflow-serving image
```bash
docker run -it --rm \
    -p 8500:8500 \
    zoomcamp-10-model:xception-v4-001
```

- Again, on macOS with Apple Silicon (arm64), run the image with the matching platform

```bash
docker run -it --rm \
    --platform=linux/amd64 \
    -p 8500:8500 \
    zoomcamp-10-model:xception-v4-001
```


- Running the service (switch the `__main__` code in `gateway.py` first)
```bash
pipenv run python gateway.py
```

- Building the gateway image
```bash
docker build -t zoomcamp-10-gateway:001 -f image-gateway.dockerfile .
```

- Running the gateway
```bash
docker run -it --rm \
    -p 9696:9696 \
    zoomcamp-10-gateway:001
```

Communication between the two containers goes through the bridge network that Docker Compose sets up.

- Installing docker-compose on macOS with Homebrew (for other operating systems, see [https://docs.docker.com/compose/install/](https://docs.docker.com/compose/install/))

<img src="./images/docker_compose.png" alt="docker-compose" style="width:600px;height:auto;">

```bash
brew install docker-compose
```

- Manual installation: create a `bin` folder in the home directory
- Move into the folder and download docker-compose
- Make it executable
```bash
mkdir ~/bin
cd ~/bin
curl -L <LINK> -o docker-compose
chmod +x docker-compose
```

- Add to PATH
```bash
echo 'export PATH="${HOME}/bin:${PATH}"' >> ~/.bashrc
source ~/.bashrc
```

- Create `docker-compose.yml` and rebuild the gateway image under a new tag
```bash
docker build -t zoomcamp-10-gateway:002 -f image-gateway.dockerfile .
```
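
The notes do not show the contents of `docker-compose.yml`. The sketch below builds an equivalent Compose configuration in Python and prints it as JSON, which Compose can read because JSON is valid YAML (for example `docker-compose -f docker-compose.json up`). The image tags come from the build commands in this section; the service names and the `TF_SERVING_HOST` environment variable are assumptions, and the gateway code would have to read that variable to find the model container.

```python
import json

def build_compose_config():
    """Return a Compose configuration with a model service and a gateway."""
    return {
        # Needed by legacy docker-compose v1; newer releases ignore it.
        "version": "3.9",
        "services": {
            # TF-Serving container. Other services on the Compose network
            # can reach it under the service name "clothing-model".
            "clothing-model": {
                "image": "zoomcamp-10-model:xception-v4-001",
            },
            # Flask gateway. Only this service is published to the host.
            "gateway": {
                "image": "zoomcamp-10-gateway:002",
                # Hypothetical variable name; must match what gateway.py reads.
                "environment": ["TF_SERVING_HOST=clothing-model:8500"],
                "ports": ["9696:9696"],
            },
        },
    }

print(json.dumps(build_compose_config(), indent=2))
```

Only the gateway publishes a port: the model container is addressed by its service name over the Compose network, so it does not need to be exposed to the host.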

- Running the service
```bash
docker-compose up
```

- Testing the service
```bash
python test.py
```

- Press Ctrl+C to stop the service
- Running the service in detached mode
```bash
docker-compose up -d
```

- Stopping the service
```bash
docker-compose down
```

## 10.5 Introduction to Kubernetes

- The anatomy of a Kubernetes cluster

<img src="./images/intro_kubernetes.png" alt="intro kubernetes" style="width:600px;height:auto;">

<img src="./images/glossar.png" alt="glossar" style="width:600px;height:auto;">


## 10.6 Deploying a simple service to Kubernetes

- Create simple ping application in Flask
```bash
pipenv install flask gunicorn
```

On macOS, this failed with `✘ Locking Failed! Failed to lock Pipfile.lock!`

- Installing kubectl (it comes with Docker Desktop)

- Setting up a local Kubernetes cluster with Kind
```bash
brew install kind
kind create cluster
kubectl cluster-info --context kind-kind
kubectl get service
kubectl get pods
kubectl get deployments
```

- Creating a deployment
    - [deployment.yaml](./deployment.yaml)
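```bash
kubectl apply -f deployment.yaml
kubectl get deployment
kubectl get pod
# If the pod is not ready, inspect its events (e.g. image pull errors)
kubectl describe pod <pod-name>
# Kind cannot see local Docker images until they are loaded into the cluster
kind load docker-image ping:v001
kubectl get pod
# Forward local port 9696 to the pod, then test from a second terminal
kubectl port-forward <pod-name> 9696:9696
curl localhost:9696/ping
```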
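
`deployment.yaml` is linked but not shown. Since `kubectl apply -f` accepts JSON manifests as well as YAML, a Deployment for the ping app can be sketched as a Python dict. The image `ping:v001` and port 9696 come from the commands above; the names `ping-deployment` and `ping-pod`, the `app: ping` label, and the resource limits are assumptions.

```python
import json

def ping_deployment(image="ping:v001", port=9696, replicas=1):
    """Return a Kubernetes Deployment manifest for the ping app."""
    # The selector and the pod template must carry the same labels,
    # otherwise the Deployment cannot find the pods it manages.
    labels = {"app": "ping"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "ping-deployment"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "ping-pod",
                        "image": image,
                        # Assumed limits; tune for the real workload.
                        "resources": {
                            "limits": {"memory": "128Mi", "cpu": "500m"},
                        },
                        "ports": [{"containerPort": port}],
                    }],
                },
            },
        },
    }

print(json.dumps(ping_deployment(), indent=2))
```

Redirecting the output to a file such as `deployment.json` (a hypothetical name) gives something that can be passed to `kubectl apply -f`.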

- Creating a service
  - [service.yaml](./service.yaml)
```bash
kubectl apply -f service.yaml
kubectl get service
kubectl get svc
```

- Change the service type to `LoadBalancer` in `service.yaml`
```bash
kubectl apply -f service.yaml
kubectl get service
kubectl port-forward service/ping 8080:80
curl localhost:8080/ping
```
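
`service.yaml` is linked but not shown. As a sketch, the manifest can be written as a Python dict and dumped to JSON, which `kubectl apply -f` accepts as well as YAML. The service name `ping` and port 80 follow from `kubectl port-forward service/ping 8080:80` above; the `app: ping` selector and target port 9696 are assumptions that have to match the labels and container port used in the deployment.

```python
import json

def ping_service(service_type="LoadBalancer"):
    """Return a Kubernetes Service manifest that routes port 80 to the ping pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "ping"},
        "spec": {
            "type": service_type,
            # Pods with this label receive the traffic (assumed label).
            "selector": {"app": "ping"},
            # Service port 80 forwards to container port 9696.
            "ports": [{"port": 80, "targetPort": 9696}],
        },
    }

print(json.dumps(ping_service(), indent=2))
```

Internal services, such as the one in front of TF-Serving, would typically use `ping_service("ClusterIP")`-style manifests instead, since they do not need an external address.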

<img src="./images/port_forwarding.png" alt="port forwarding" style="width:600px;height:auto;">


## 10.7 Deploying TensorFlow models to Kubernetes

- Deploying the TF-Serving model

```bash
kind load docker-image zoomcamp-10-model:xception-v4-001
cd kube-config
kubectl apply -f model-deployment.yaml
kubectl get pod
kubectl port-forward <pod-name> 8500:8500
kubectl apply -f model-service.yaml
kubectl get service
kubectl port-forward service/tf-serving-clothing-model 8500:8500
```

- Deploying the Gateway

```bash
kind load docker-image zoomcamp-10-gateway:002
kubectl apply -f gateway-deployment.yaml
```
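
Inside the cluster, the gateway reaches TF-Serving through the service's DNS name rather than `localhost`. Kubernetes services resolve as `<service>.<namespace>.svc.cluster.local`. The sketch below shows how the gateway could pick its target, assuming a `TF_SERVING_HOST` environment variable (a hypothetical name that would be set in `gateway-deployment.yaml`) and the `default` namespace; the service name `tf-serving-clothing-model` and port 8500 come from the commands above.

```python
import os

def service_address(service, port, namespace="default"):
    """Build the in-cluster DNS address of a Kubernetes service."""
    return f"{service}.{namespace}.svc.cluster.local:{port}"

def tf_serving_host(environ=os.environ):
    """Return the TF-Serving address: environment override, else localhost."""
    return environ.get("TF_SERVING_HOST", "localhost:8500")

# Value one might set for TF_SERVING_HOST in the gateway deployment.
print(service_address("tf-serving-clothing-model", 8500))
# → tf-serving-clothing-model.default.svc.cluster.local:8500
```

Falling back to `localhost:8500` keeps the same gateway code usable for local runs against a port-forwarded or Dockerized TF-Serving.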

- Testing the service
```bash
kubectl get pod
kubectl exec -it <pod-name> -- bash
```

- Inside the pod
```bash
curl localhost:9696/ping
```

- Exposing the service
```bash
kubectl apply -f gateway-service.yaml
kubectl port-forward service/gateway 8080:80
```

## 10.8 Deploying to EKS

- Creating an EKS cluster on AWS

- Installing eksctl, then creating the cluster and an ECR repository
```bash
eksctl create cluster -f ./kube-config/eks-config.yaml
aws ecr create-repository --repository-name mlzoomcamp-images
```

- Publishing the image to ECR
- Configuring kubectl
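
Publishing to ECR means tagging the local image with the repository URI and pushing it. ECR image URIs have the form `<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. The helper below builds the remote name and the matching Docker commands. The account ID and region are placeholders, and folding the image name into the tag of the single `mlzoomcamp-images` repository is an assumed naming scheme.

```python
def ecr_image_uri(account_id, region, repository, tag):
    """Build the full ECR image URI for a repository and tag."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"

def push_commands(local_image, remote_uri):
    """Return the Docker commands that publish a local image under remote_uri."""
    return [
        f"docker tag {local_image} {remote_uri}",
        f"docker push {remote_uri}",
    ]

# Placeholder account ID and region; replace with real values.
remote = ecr_image_uri(
    "123456789012",
    "eu-west-1",
    "mlzoomcamp-images",
    "zoomcamp-10-model-xception-v4-001",
)

for command in push_commands("zoomcamp-10-model:xception-v4-001", remote):
    print(command)
```

Docker has to be logged in to the registry before pushing. After the push, the `image` fields in the Kubernetes deployment manifests need to point at the remote URI instead of the local tag.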

## 10.9 Summary

- TF-Serving is a system for deploying TensorFlow models
- When using TF-Serving, we need a component for pre-processing
- Kubernetes is a container orchestration platform
- To deploy something on Kubernetes, we need to specify a deployment and a service
- You can use Docker Compose and Kind for local experiments

## 10.10 Explore more

- Other local Kubernetes distributions: minikube, k3d, k3s, microk8s, EKS Anywhere
- [Rancher Desktop](https://rancherdesktop.io/)
- Docker Desktop
- [Lens](https://k8slens.dev/)
- Many cloud providers have Kubernetes: GCP, Azure, Digital Ocean and others. Look for "Managed Kubernetes" in your favorite search engine
- Deploy the model from previous modules and from your project with Kubernetes