# Pipeline example with the OpenVINO inference execution engine

This notebook illustrates how you can serve an ensemble of models using the [OpenVINO prediction model](https://github.com/SeldonIO/seldon-core/tree/master/examples/models/openvino_imagenet_ensemble/resources/model).
The demo includes ResNet50 and DenseNet169 models optimized with the OpenVINO Model Optimizer.
Their graph operations have been [reduced in precision](https://www.edge-ai-vision.com/2019/02/introducing-int8-quantization-for-fast-cpu-inference-using-openvino/) from FP32 to INT8, which significantly improves execution performance with minimal impact on accuracy. The gain is particularly visible on Cascade Lake CPUs with the [VNNI](https://www.intel.com/content/www/us/en/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html#inpage-nav-1) extension.

![pipeline](pipeline1.png)

## Setup Seldon Core

Use the setup notebook to [Setup Cluster](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html#Setup-Cluster) with [Ambassador Ingress](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html#Ambassador) and [Install Seldon Core](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html#Install-Seldon-Core). Instructions [also online](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html).

In [1]:
!kubectl create namespace seldon

Error from server (AlreadyExists): namespaces "seldon" already exists


## Deploy a Seldon pipeline with an ensemble of Intel OpenVINO models

 * Ingest a compressed JPEG binary and transform it into a TensorFlow Proto payload
 * Ensemble two OpenVINO-optimized models for ImageNet classification: ResNet50 and DenseNet169
 * Return the result as human-readable text
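
To make the ensembling step concrete, here is a minimal sketch of one common way to combine two classifiers: average their per-class scores and pick the top class. It is an illustration under assumptions, not necessarily the combiner used by this example. The function names, the three-label list, and the scores are hypothetical.

```python
from typing import List, Sequence


def average_scores(model_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise mean of per-class scores from several models."""
    if not model_outputs:
        raise ValueError("need at least one model output")
    n_classes = len(model_outputs[0])
    if any(len(out) != n_classes for out in model_outputs):
        raise ValueError("all model outputs must have the same length")
    n_models = len(model_outputs)
    return [sum(column) / n_models for column in zip(*model_outputs)]


def top_label(scores: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label whose score is highest."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return labels[best_index]


# Hypothetical scores for a toy three-class problem (not real model output).
labels = ["zebra", "pelican", "Eskimo dog, husky"]
resnet_scores = [0.70, 0.20, 0.10]
densenet_scores = [0.40, 0.50, 0.10]

combined = average_scores([resnet_scores, densenet_scores])
prediction = top_label(combined, labels)
print(prediction)
```

Averaging lets a confident model outvote an uncertain one, which is why the two models can disagree on their own top class and the ensemble still returns a single answer.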



In [6]:
!pygmentize seldon_ov_predict_ensemble.yaml

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  annotations:
  creationTimestamp: "2022-06-06T17:06:06Z"
  generation: 1
  labels:
    app: seldon
  name: openvino-model
  namespace: seldon
  resourceVersion: "8442"
spec:

In [8]:
!kubectl apply -f seldon_ov_predict_ensemble.yaml -n seldon

seldondeployment.machinelearning.seldon.io/openvino-model created


In [9]:
!kubectl wait sdep/openvino-model \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/openvino-model condition met


### Using the example gRPC client

Install the client dependencies, the `seldon-core` and `grpcio` packages:

In [10]:
!pip install -q seldon-core grpcio

In [11]:
!python seldon_grpc_client.py --debug --ambassador localhost:8003

2025-11-18 17:10:12.104370: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1763485812.121468 3801804 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1763485812.126530 3801804 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-11-18 17:10:12.144078: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
strData: "Eskimo dog, husky"

Duration 79.081 ms
strData: "zebra"

Duration 47.917 ms
strData: "pelican"

Duration 42.088 ms


For more extensive tests, see the client help.

You can replace the default test-input file with a labeled list of images to calculate accuracy on the complete ImageNet dataset. Follow the format of `input_images.txt`: each line contains the path to an image and its ImageNet class.
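
As a rough illustration of that file format, the sketch below parses a labeled list and computes top-1 accuracy against a set of predictions. It assumes the path and the class are separated by whitespace and that the class is a numeric index; check `input_images.txt` for the exact layout. The sample paths, class numbers, and helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_labeled_list(text: str) -> List[Tuple[str, int]]:
    """Parse lines of the form '<image path> <class index>'."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.rsplit(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"expected '<path> <class>', got: {line!r}")
        path, class_index = parts
        entries.append((path, int(class_index)))
    return entries


def accuracy(entries: List[Tuple[str, int]], predictions: Dict[str, int]) -> float:
    """Fraction of entries whose predicted class equals the labeled class."""
    if not entries:
        return 0.0
    correct = sum(1 for path, label in entries if predictions.get(path) == label)
    return correct / len(entries)


# Hypothetical file content and predictions, for illustration only.
sample = """images/dog.jpeg 248
images/zebra.jpeg 340
images/pelican.jpeg 144
"""
entries = parse_labeled_list(sample)
predictions = {
    "images/dog.jpeg": 248,
    "images/zebra.jpeg": 340,
    "images/pelican.jpeg": 0,  # deliberately wrong
}
acc = accuracy(entries, predictions)
print(f"accuracy: {acc:.3f}")
```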

In [12]:
!python seldon_grpc_client.py --help

2025-11-18 17:10:21.403986: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1763485821.420783 3802566 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1763485821.426170 3802566 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-11-18 17:10:21.444000: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
usage: seldon_grpc_client.py [-h] [--repeats REPEATS] [--debug]
                             [--test-input TEST_INPUT]
      

## Examining the logs

You can use the Seldon container logs to get additional details about the execution:


In [13]:
!kubectl logs $(kubectl get pods -l seldon-app=openvino-model-openvino -o jsonpath='{.items[0].metadata.name}') prediction1 --tail=10

Trying to download  gs://public-artifacts/intelai_public_models/densenet_169/1/densenet_169_i8.bin
path object /tmp/densenet_169_i8.xml
E1118 17:09:44.818908574      14 socket_utils_common_posix.cc:197] check for SO_REUSEPORT: {"created":"@1763485784.818892864","description":"SO_REUSEPORT unavailable on compiling system","file":"src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":165}
2025-11-18 17:09:44,820 - seldon_core.microservice:grpc_prediction_server:246 - INFO:  GRPC microservice Running on port 9503
2025-11-18 17:10:14,450 - Prediction - DEBUG - Processing time: 29.69 ms
2025-11-18 17:10:14,450 - Prediction:predict:104 - DEBUG:  Processing time: 29.69 ms
2025-11-18 17:10:14,498 - Prediction - DEBUG - Processing time: 19.57 ms
2025-11-18 17:10:14,498 - Prediction:predict:104 - DEBUG:  Processing time: 19.57 ms
2025-11-18 17:10:14,541 - Prediction - DEBUG - Processing time: 18.25 ms
2025-11-18 17:10:14,541 - Prediction:predict:104 - DEBUG:  Processing time: 18.25 ms


In [14]:
!kubectl logs $(kubectl get pods -l seldon-app=openvino-model-openvino -o jsonpath='{.items[0].metadata.name}') prediction2 --tail=10

Trying to download  gs://public-artifacts/intelai_public_models/resnet_50_i8/1/resnet_50_i8.bin
path object /tmp/resnet_50_i8.xml
E1118 17:09:45.491524993      14 socket_utils_common_posix.cc:197] check for SO_REUSEPORT: {"created":"@1763485785.491512553","description":"SO_REUSEPORT unavailable on compiling system","file":"src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":165}
2025-11-18 17:09:45,492 - seldon_core.microservice:grpc_prediction_server:246 - INFO:  GRPC microservice Running on port 9504
2025-11-18 17:10:14,440 - Prediction - DEBUG - Processing time: 20.77 ms
2025-11-18 17:10:14,440 - Prediction:predict:104 - DEBUG:  Processing time: 20.77 ms
2025-11-18 17:10:14,493 - Prediction - DEBUG - Processing time: 12.97 ms
2025-11-18 17:10:14,493 - Prediction:predict:104 - DEBUG:  Processing time: 12.97 ms
2025-11-18 17:10:14,533 - Prediction - DEBUG - Processing time: 11.07 ms
2025-11-18 17:10:14,533 - Prediction:predict:104 - DEBUG:  Processing time: 11.07 ms


In [15]:
!kubectl logs $(kubectl get pods -l seldon-app=openvino-model-openvino -o jsonpath='{.items[0].metadata.name}') imagenet-itransformer --tail=10

2025-11-18 17:10:14,388 - ImageNetTransformer:transform_input_grpc:43 - INFO:  jpeg preprocessing: 1.905 ms
2025-11-18 17:10:14,392 - ImageNetTransformer:transform_input_grpc:50 - INFO:  Total transformation: 5.119000000000001 ms
2025-11-18 17:10:14,461 - ImageNetTransformer:transform_input_grpc:33 - INFO:  Transform called
2025-11-18 17:10:14,463 - ImageNetTransformer:transform_input_grpc:40 - INFO:  Shape: (1, 3, 224, 224); Dtype: float64; Min: 0.0; Max: 255.0
2025-11-18 17:10:14,463 - ImageNetTransformer:transform_input_grpc:43 - INFO:  jpeg preprocessing: 1.277 ms
2025-11-18 17:10:14,466 - ImageNetTransformer:transform_input_grpc:50 - INFO:  Total transformation: 4.213 ms
2025-11-18 17:10:14,508 - ImageNetTransformer:transform_input_grpc:33 - INFO:  Transform called
2025-11-18 17:10:14,510 - ImageNetTransformer:transform_input_grpc:40 - INFO:  Shape: (1, 3, 224, 224); Dtype: float64; Min: 0.0; Max: 255.0
2025-11-18 17:10:14,510 - ImageNetTransformer:transform_input_grpc:43 - INFO: 
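
If you want a quick summary of the timings in these logs, a small script can extract them. The sketch below is one possible approach, assuming the log text has been captured into a string (for example from the `kubectl logs` commands above). Each timing appears twice in the prediction logs, in two formats, so the pattern matches only one of them to avoid double counting. The helper name is hypothetical.

```python
import re
from statistics import mean
from typing import List

# Match only the "Prediction - DEBUG" format; the same timing is also
# logged as "Prediction:predict:104 - DEBUG", which we skip on purpose.
PROCESSING_TIME = re.compile(r"Prediction - DEBUG - Processing time: ([0-9.]+) ms")


def processing_times_ms(log_text: str) -> List[float]:
    """Extract per-request processing times (in ms) from prediction logs."""
    return [float(value) for value in PROCESSING_TIME.findall(log_text)]


# Lines copied from the prediction1 log output shown above.
log_text = """\
2025-11-18 17:10:14,450 - Prediction - DEBUG - Processing time: 29.69 ms
2025-11-18 17:10:14,450 - Prediction:predict:104 - DEBUG:  Processing time: 29.69 ms
2025-11-18 17:10:14,498 - Prediction - DEBUG - Processing time: 19.57 ms
2025-11-18 17:10:14,498 - Prediction:predict:104 - DEBUG:  Processing time: 19.57 ms
2025-11-18 17:10:14,541 - Prediction - DEBUG - Processing time: 18.25 ms
2025-11-18 17:10:14,541 - Prediction:predict:104 - DEBUG:  Processing time: 18.25 ms
"""

times = processing_times_ms(log_text)
average_ms = mean(times)
print(f"{len(times)} requests, mean {average_ms:.2f} ms, max {max(times):.2f} ms")
```

Note that the first request is noticeably slower than the following ones in both prediction containers, so a mean over very few requests is dominated by that first call.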

## Performance considerations

In a production environment with shared workloads, you might consider constraining the CPU resources of individual pipeline components. You can restrict the assigned capacity using [Kubernetes resource requests and limits](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/). This configuration can be added to the Seldon pipeline definition.

Another option for tuning resource allocation is the `OMP_NUM_THREADS` environment variable. It sets how many threads the OpenVINO execution engine uses and therefore how many CPU cores it can consume. The recommended value is the number of allocated physical CPU cores.

In tests on the GKE service in Google Cloud, on nodes with 32 Skylake vCPUs assigned, the following configuration was set on the prediction components and achieved the best latency and throughput:
```
"resources": {
  "requests": {
     "cpu": "1"
  },
  "limits": {
     "cpu": "32"
  }
}

"env": [
  {
    "name": "KMP_AFFINITY",
    "value": "granularity=fine,verbose,compact,1,0"
  },
  {
    "name": "KMP_BLOCKTIME",
    "value": "1"
  },
  {
    "name": "OMP_NUM_THREADS",
    "value": "8"
  }
]
```
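
If you generate the deployment definition programmatically, the same settings can be built from a few parameters. The following is a minimal sketch that reproduces the fragment above as a Python dictionary. The function name and its parameters are hypothetical, and where the resulting keys belong in the pipeline definition depends on your deployment file.

```python
import json
from typing import Any, Dict


def prediction_container_settings(
    cpu_request: int, cpu_limit: int, omp_threads: int
) -> Dict[str, Any]:
    """Build the resources and env settings for one prediction component."""
    if omp_threads < 1:
        raise ValueError("omp_threads must be at least 1")
    return {
        "resources": {
            "requests": {"cpu": str(cpu_request)},
            "limits": {"cpu": str(cpu_limit)},
        },
        "env": [
            {"name": "KMP_AFFINITY", "value": "granularity=fine,verbose,compact,1,0"},
            {"name": "KMP_BLOCKTIME", "value": "1"},
            # Environment variable values must be strings.
            {"name": "OMP_NUM_THREADS", "value": str(omp_threads)},
        ],
    }


# Same values as the tested configuration shown above.
settings = prediction_container_settings(cpu_request=1, cpu_limit=32, omp_threads=8)
print(json.dumps(settings, indent=2))
```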

In [17]:
!kubectl delete -f seldon_ov_predict_ensemble.yaml -n seldon

seldondeployment.machinelearning.seldon.io "openvino-model" deleted
