diff --git a/source/_figures/stacks/Locust_charts.png b/source/_figures/stacks/Locust_charts.png
new file mode 100644
index 000000000..27e52ecac
Binary files /dev/null and b/source/_figures/stacks/Locust_charts.png differ
diff --git a/source/_figures/stacks/Locust_exception.png b/source/_figures/stacks/Locust_exception.png
new file mode 100644
index 000000000..07303c485
Binary files /dev/null and b/source/_figures/stacks/Locust_exception.png differ
diff --git a/source/_figures/stacks/Locust_failures.png b/source/_figures/stacks/Locust_failures.png
new file mode 100644
index 000000000..388af55d4
Binary files /dev/null and b/source/_figures/stacks/Locust_failures.png differ
diff --git a/source/_figures/stacks/Locust_statistics.png b/source/_figures/stacks/Locust_statistics.png
new file mode 100644
index 000000000..0073dfcc8
Binary files /dev/null and b/source/_figures/stacks/Locust_statistics.png differ
diff --git a/source/_figures/stacks/htop.png b/source/_figures/stacks/htop.png
new file mode 100755
index 000000000..89e509406
Binary files /dev/null and b/source/_figures/stacks/htop.png differ
diff --git a/source/_figures/stacks/kubeflow-seldon-dlrs-example-diagram.png b/source/_figures/stacks/kubeflow-seldon-dlrs-example-diagram.png
new file mode 100644
index 000000000..64d602193
Binary files /dev/null and b/source/_figures/stacks/kubeflow-seldon-dlrs-example-diagram.png differ
diff --git a/source/guides/stacks/dlrs-inference.rst b/source/guides/stacks/dlrs-inference.rst
new file mode 100644
index 000000000..fdcb9febe
--- /dev/null
+++ b/source/guides/stacks/dlrs-inference.rst
@@ -0,0 +1,1047 @@
+.. _dlrs-inference:
+
+AI Inference with the Deep Learning Reference Stack
+###################################################
+
+In this guide, we walk through a solution for using the Deep Learning Reference Stack with a Seldon Core\* platform deployed on Kubernetes\*. Seldon Core simplifies deployment of the models we create and use with the Deep Learning Reference Stack. Use this guide to set up your infrastructure and deploy a benchmarking workload on your Kubernetes cluster.
+
+.. contents::
+   :local:
+   :depth: 1
+
+Overview
+********
+
+.. figure:: /_figures/stacks/kubeflow-seldon-dlrs-example-diagram.png
+   :alt: Example diagram with DLRS deployed by Seldon
+   :width: 800
+
+
+The solution covered here requires the following software components:
+
+* The `Deep Learning Reference Stack`_ (DLRS) is a |CL-ATTR| based Docker\* container that provides deep learning frameworks and is optimized for Intel Xeon Scalable platforms.
+* `Kubeflow`_ is the machine learning toolkit for Kubernetes that helps with deployment of the Seldon Core and Istio components.
+* `Seldon Core`_ is a software platform for deploying machine learning models. We use the DLRS container to serve the OpenVino\* framework for inference with Seldon Core.
+* The OpenVino Model Server is included in DLRS and provides the OpenVino framework for inference. The `OpenVino Toolkit`_ provides improved neural network performance on a variety of Intel processors. For this guide, we converted a pre-trained Caffe ResNet50 model into the `Intermediate Representation (IR)`_ with the OpenVino toolkit.
+* `Istio`_ is a traffic manager and performs the load balancing for service requests in the cluster.
+* The pre-processing container converts JPEG content into a NumPy array.
+* The post-processing container converts an array of classification probabilities to a human-readable class name.
+* The pre- and post-processing containers are created with the `Source-to-Image`_ (S2I) toolkit, which builds reproducible container images from source code.
+* `Min.io`_ is used as the distributed object storage for models.
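+
+To make the role of the pre- and post-processing containers more concrete, the sketch below shows the general shape of such a transformer component. This is an illustration only, assuming the seldon-core Python wrapper conventions (``transform_input``/``transform_output``); the class name and the class-mapping file are hypothetical, and the actual transformer code lives in the `ai-inferencing` repository used later in this guide.
+
+.. code-block:: python
+
+   import io
+   import json
+
+   import numpy as np
+   from PIL import Image
+
+
+   class ImagenetTransformer(object):
+       """Illustrative transformer: JPEG bytes in, class name out."""
+
+       def __init__(self):
+           # Hypothetical mapping of class index -> human-readable label
+           with open("imagenet_classes.json") as f:
+               self.classes = json.load(f)
+
+       def transform_input(self, X, feature_names):
+           # Decode the JPEG payload and reshape it to the NCHW layout
+           # expected by the ResNet50 model served from DLRS
+           image = Image.open(io.BytesIO(X)).resize((224, 224))
+           array = np.asarray(image, dtype=np.float32)
+           return array.transpose(2, 0, 1).reshape(1, 3, 224, 224)
+
+       def transform_output(self, X, feature_names):
+           # Map the highest classification probability to its class name
+           return self.classes[str(np.argmax(X))]
+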
+Prerequisites
+*************
+
+Although this guide assumes a |CL| host system, it has also been validated with the following software component versions.
+
+.. list-table:: **Table 1. Software Component Versions**
+   :widths: 16,16
+   :header-rows: 1
+
+   * - Component
+     - Version
+
+   * - DLRS
+     - 0.4.0
+
+   * - Docker
+     - 18.09
+
+   * - Kubernetes
+     - 1.15.3
+
+   * - Source-to-Image
+     - 1.1.14
+
+   * - Helm
+     - 2.14.3
+
+   * - Kubeflow
+     - 0.6.1
+
+   * - Seldon
+     - 0.3.2
+
+   * - Rook
+     - 1.0.5
+
+   * - Ceph
+     - 14.2.1-20190430
+
+   * - Minio
+     - RELEASE.2019-04-23T23-50-36Z
+
+   * - CentOS
+     - 7.6
+
+   * - OpenVINO Toolkit
+     - 2019 R1.0.1
+
+   * - MKL-DNN
+     - 0.19
+
+Recommended Hardware
+====================
+
+We validated this guide on an `Intel Cascade Lake`_ server, which is recommended for optimal performance and to take advantage of the built-in Intel® Deep Learning Boost functionality.
+
+Required Software
+=================
+
+#. :ref:`Install ` |CL| on your host system
+
+
+#. Install the :command:`containers-basic` and :command:`cloud-native-basic` bundles:
+
+   .. code-block:: bash
+
+      sudo swupd bundle-add containers-basic cloud-native-basic
+
+
+#. Start Docker
+
+   Docker is not started upon installation of the :command:`containers-basic` bundle. To start Docker, enter:
+
+   .. code-block:: bash
+
+      sudo systemctl start docker
+
+
+#. Install and configure :ref:`kubernetes`.
+
+
+.. note::
+
+   The Deep Learning Reference Stack was developed to provide the best user experience when executed on a |CL| host. However, as the stack runs in a container environment, you should be able to complete the following sections of this guide on other Linux* distributions, provided they comply with the Docker\* and Kubernetes\* package versions listed above. Refer to your distribution's documentation for how to update packages and manage Docker services.
+
+   For other systems, please install the following software:
+
+   * `Docker 18.09`_
+   * `Kubernetes 1.15.3`_
+
+
+Infrastructure Set-Up
+*********************
+
+Environment
+===========
+
+Throughout this guide we refer to the DEPLOY_DIR environment variable, which points to the directory that holds all of the resources used for this installation. Set it as follows:
+
+.. code-block:: bash
+
+   DEPLOY_DIR=`pwd`
+
+Deployment Tools
+================
+
+Source-to-Image (S2i)
+---------------------
+
+S2i is a tool for building artifacts from source and injecting them into Docker images. We use S2i to build the ImageNet transformer. Install it:
+
+.. code-block:: bash
+
+   wget https://github.com/openshift/source-to-image/releases/download/v1.1.14/source-to-image-v1.1.14-874754de-linux-amd64.tar.gz
+   tar -zxvf source-to-image-v1.1.14-874754de-linux-amd64.tar.gz
+   mv -f -t /usr/local/bin/ sti s2i
+   rm -f source-to-image-v1.1.14-874754de-linux-amd64.tar.gz
+   chmod +x /usr/local/bin/sti
+   chmod +x /usr/local/bin/s2i
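+
+Later in this guide, S2i is used to produce the transformer image that is pushed to the local Docker registry. As a quick orientation, the general command shape is shown below. This is a hedged example only: the source directory and the Seldon Python builder image are assumptions, and ``REGISTRY_URL`` is the registry variable defined later in this guide.
+
+.. code-block:: bash
+
+   # General form: s2i build <source> <builder image> <resulting image tag>
+   # The source path and builder image below are placeholders.
+   s2i build ./transformer-src seldonio/seldon-core-s2i-python3 ${REGISTRY_URL}/imagenet_transformer:0.1
+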
+kfctl
+-----
+
+`kfctl` is the client used to control and deploy the Kubeflow platform. Install it with:
+
+.. code-block:: bash
+
+   wget https://github.com/kubeflow/kubeflow/releases/download/v0.6.1/kfctl_v0.6.1_linux.tar.gz
+   tar -zxvf kfctl_v0.6.1_linux.tar.gz
+   rm -f kfctl_v0.6.1_linux.tar.gz
+   mv -f kfctl /usr/local/bin/
+   chmod +x /usr/local/bin/kfctl
+
+Minio
+-----
+
+The Minio client is compatible with cloud object storage services. We use it to manage buckets and files stored in Minio storage. Install it with:
+
+.. code-block:: bash
+
+   wget https://dl.min.io/client/mc/release/linux-amd64/mc
+   mv mc /usr/local/bin/
+   chmod +x /usr/local/bin/mc
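+
+Later steps store the served models in Minio buckets. As a quick reference, the snippet below shows how the client is typically pointed at a Minio endpoint and used to create a bucket and upload a model. The alias, endpoint, credentials, and paths are placeholders, not values taken from this guide.
+
+.. code-block:: bash
+
+   # Register the Minio endpoint under an alias (placeholder endpoint and credentials)
+   mc config host add minio http://minio-endpoint:9000 ACCESS_KEY SECRET_KEY
+
+   # Create a bucket, upload a local model directory, and list the result
+   mc mb minio/models
+   mc cp --recursive ./resnet_v1_50/ minio/models/resnet_v1_50/
+   mc ls minio/models
+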
+Helm
+----
+
+Helm is used to deploy components on Kubernetes clusters. Helm is included in the :file:`cloud-native-basic` bundle in |CL| and can be installed with:
+
+.. code-block:: bash
+
+   sudo swupd bundle-add cloud-native-basic
+
+If you are not using a |CL| host, install it with:
+
+.. code-block:: bash
+
+   wget https://get.helm.sh/helm-v2.14.3-linux-amd64.tar.gz
+   tar -zxvf helm-v2.14.3-linux-amd64.tar.gz
+   rm -f helm-v2.14.3-linux-amd64.tar.gz
+   mv linux-amd64/helm /usr/local/bin/helm
+
+Regardless of your host OS, initialize Helm as follows:
+
+.. code-block:: bash
+
+   helm init
+   kubectl create serviceaccount --namespace kube-system tiller
+   kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
+   kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
+
+
+gsutil
+------
+
+:file:`gsutil` is a client utility for working with Google Cloud\* storage. Follow the instructions to install `gsutil`_. With the initialized Google Cloud storage command line interface, we can download the ResNet50 models used for model serving.
+
+
+Rook
+----
+
+Rook.io is used to deploy Minio and Ceph. Clone the GitHub\* repository:
+
+.. code-block:: bash
+
+   git clone -b release-1.0 https://github.com/rook/rook.git
+
+
+.. todo: ADD CORRECT GITHUB LINK FOR ai-inferencing REPO
+
+AI Inferencing
+--------------
+
+This guide is based on the code in the IntelSolutionDev AI Inferencing repository. Clone the repository:
+
+.. code-block:: bash
+
+   git clone https://
+
+
+Platform Backends
+=================
+
+Ceph
+----
+
+#. Deploy the Ceph Rook Operator
+
+   The Rook Operator is used to deploy the remaining Rook Ceph components. Deploy it:
+
+   .. code-block:: bash
+
+      cd $DEPLOY_DIR
+      cd rook/cluster/examples/kubernetes/ceph
+      kubectl create -f common.yaml
+      kubectl create -f operator.yaml
+      kubectl -n rook-ceph get pods # wait for rook-ceph-operator pod
+
+#. Deploy the Rook Ceph Cluster
+
+   The Rook Ceph cluster is used as block storage for all platform components. You will need to modify :file:`cluster.yaml` for your requirements. For this guide, we prepare a cluster with 3 mons and store data in :file:`/var/lib/rook` on all nodes. Modify the file:
+
+   .. code-block:: yaml
+
+      apiVersion: ceph.rook.io/v1
+      kind: CephCluster
+      metadata:
+        name: rook-ceph
+        namespace: rook-ceph
+      spec:
+        cephVersion:
+          image: ceph/ceph:v14.2.1-20190430
+          allowUnsupported: false
+        dataDirHostPath: /var/lib/rook
+        mon:
+          count: 3
+          allowMultiplePerNode: false
+        dashboard:
+          enabled: true
+        network:
+          hostNetwork: false
+        rbdMirroring:
+          workers: 0
+        annotations:
+        resources:
+        storage:
+          useAllNodes: true
+          useAllDevices: false
+          deviceFilter:
+          location:
+          config:
+          directories:
+          - path: /var/lib/rook
+
+   After modifying :file:`cluster.yaml`, run:
+
+   .. code-block:: bash
+
+      kubectl create -f cluster.yaml
+      kubectl -n rook-ceph get pods #wait for osd pods
+      kubectl create -f toolbox.yaml
+      kubectl -n rook-ceph get pod -l "app=rook-ceph-tools"
+      kubectl create -f storageclass.yaml
+      kubectl patch storageclass rook-ceph-block -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
+
+   To verify the setup is correct, run:
+
+   .. code-block:: bash
+
+      kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') ceph status
+
+   The command should return:
+
+   .. code-block:: console
+
+      HEALTH_OK
+
+#. Troubleshooting
+
+   If you see a warning related to undersized PGs, increase the number of PGs using the following commands.
+
+   First, get the current number of PGs:
+
+   .. code-block:: bash
+
+      ceph osd pool get replicapool pg_num
+
+   Then double the number of PGs (for example, from 300 to 600):
+
+   .. code-block:: bash
+
+      ceph osd pool set replicapool pg_num 600
+      ceph osd pool set replicapool pgp_num 600
+
+Minio
+-----
+
+The Minio cluster is used as object storage for all components in the platform. Deploy it:
+
+.. code-block:: bash
+
+   cd $DEPLOY_DIR
+   cd rook/cluster/examples/kubernetes/minio
+   kubectl create -f operator.yaml
+   kubectl -n rook-minio-system get pods # wait for rook-minio-operator pod
+   kubectl create -f object-store.yaml
+
+.. note::
+
+   Minio pods will not start if you are using a proxy in your environment. Check the proxy settings in :file:`/etc/kubernetes/manifests/kube-apiserver.yaml` and make sure `.local,.svc,.nip.io` is included in the `no_proxy` setting.
+
+Docker registry
+---------------
+
+This Docker registry will be used for all platform components. We use Helm to set up the registry as shown:
+
+.. code-block:: bash
+
+   cd $DEPLOY_DIR
+   cd ai-inferencing/infra
+   helm install --namespace registry --name registry stable/docker-registry -f registry-values.yaml
+
+Verify the registry setup and set the `REGISTRY_URL` variable used throughout the rest of this guide:
+
+.. code-block:: bash
+
+   REGISTRY_URL=`kubectl get svc -n registry | grep NodePort | awk '{ print $3; }'`.nip.io:5000
+
+
+Create the Machine Learning Platform
+====================================
+
+The machine learning platform for this guide is built using the Kubeflow toolkit, from which we use the Seldon Core and Istio components.
+
+#. Prepare the definition files.
+
+   First, get the configuration file for Istio:
+
+   .. code-block:: bash
+
+      cd $DEPLOY_DIR
+      wget https://raw.githubusercontent.com/kubeflow/kubeflow/v0.6.1/bootstrap/config/kfctl_k8s_istio.yaml
+      sed -i 's/master.tar.gz/v0.6.1.tar.gz/g' kfctl_k8s_istio.yaml
+      kfctl init kubeflow --config=$(pwd)/kfctl_k8s_istio.yaml -V
+      cd kubeflow
+      kfctl generate all -V
+
+#. Edit :file:`kustomize/seldon-core-operator/base/statefulset.yaml` to change the version to `0.3.2-SNAPSHOT`.
+
+#. Edit :file:`kustomize/istio-install/base/istio-noauth.yaml` to change the limits for the istio-pilot deployment as shown:
+
+   .. code-block:: yaml
+
+      resources:
+        limits:
+          cpu: 1000m
+          memory: 1000Mi
+
+   This corrects a performance issue in which istio-pilot crashes when multiple Seldon deployments start simultaneously.
+
+   .. note::
+
+      If Istio cannot start because of an OOM (Out of Memory) error, change the limits of all istio-system deployments. The default settings should be enough for a small cluster (32 GB RAM or less).
+
+
+#. Install the Kubeflow components and wait for all pods in the kubeflow and istio-system namespaces to start.
+
+   .. 
code-block:: bash + + kfctl apply all -V + +#. Run + + .. code-block:: bash + + kubectl label namespace kubeflow istio-injection=enabled + + kubectl apply -f - < Dockerfile + FROM clearlinux/stacks-dlrs-mkl:v0.4.0 + COPY serve.sh /workspace/scripts/serve.sh + EOF + +#. Create the :file:`serve.sh` file + + .. code-block:: bash + + cat < serve.sh + #!/bin/bash + # temporary workaround + PY_PATH="/usr/local/lib/openvino/inference_engine/:/usr/local/lib" + echo "export PYTHONPATH=\${PY_PATH}" >>/.bashrc + source ~/.bashrc + + # start the model server + cd /ie_serving_py + exec "\$@" + EOF + +#. Make :file:`serve.sh` executable + + .. code-block:: bash + + chmod +x serve.sh + +#. Build the new docker image + + .. code-block:: bash + + REGISTRY_URL=`kubectl get svc -n registry | grep NodePort | awk '{ print $3; }'`.nip.io:5000 + sudo docker build -t ${REGISTRY_URL}/dlrs-mkl-fixed:v0.4.0 . + +#. Upload the image to the registry + + .. code-block:: bash + + sudo docker push ${REGISTRY_URL}/dlrs-mkl-fixed:v0.4.0 + + +Deploy Using Helm with Seldon +============================= + +At this point you are ready to go. Use the Helm chart with Seldon for deployment: + +.. code-block:: bash + + helm install \ + --namespace kubeflow \ + --name seldonovms-server-res \ + --set transformer.image=$REGISTRY_URL/imagenet_transformer:0.1 \ + --set openvino.image=$REGISTRY_URL/dlrs-mkl-fixed:v0.4.0 \ + ai-inferencing/seldon + +Verify that all pods are in the `Running` state: + +.. code-block:: bash + + kubectl -n kubeflow get pods -l version=openvino + +You have now created the inference infrastructure! + + + +Secure Communication +==================== + +You can optionally set up secure communication between the clients and the server. This is not required for completing this guide, but we will walk through it for completeness. + +For this example we will use `10.0.0.1.nip.io` for our domain name. + +#. Clone the repository + + .. code-block:: bash + + git clone https://github.com/nicholasjackson/mtls-go-example + +#. Generate the certificates. + + This script will generate four directories: 1_root, 2_intermediate, 3_application, and 4_client containing the client and server certificates that will be used in the following procedures. When prompted, select `y` for all questions. + + .. code-block:: bash + + cd mtls-go-example + ./generate.sh 10.0.0.1.nip.io password + mkdir 10.0.0.1.nip.io && mv 1_root 2_intermediate 3_application 4_client 10.0.0.1.nip.io + +#. Create a Kubernetes secret to hold the server's certificate and private key. + + We'll use :command:`kubectl` to create the secret istio-ingressgateway-certs in namespace istio-system. The Istio gateway will load the secret automatically. + + .. code-block:: bash + + kubectl create -n istio-system secret tls istio-ingressgateway-certs --key 10.0.0.1.nip.io/3_application/private/10.0.0.1.nip.io.key.pem --cert 10.0.0.1.nip.io/3_application/certs/10.0.0.1.nip.io.cert.pem + +#. Verify that :file:`tls.crt` and :file:`tls.key` have been mounted in the ingress gateway pod + + .. code-block:: bash + + kubectl exec -it -n istio-system $(kubectl -n istio-system get pods -l istio=ingressgateway -o jsonpath='{.items[0].metadata.name}') -- ls -al /etc/istio/ingressgateway-certs + +#. Edit the default kubeflow gateway + + .. code-block:: bash + + kubectl apply -f - < + export INGRESS_ADDRESS= + +#. Build the Docker image: + + .. code-block:: bash + + docker build -t ${REGISTRY_URL}/seldon-ovms-locust-client:0.1 --network=host . + +#. 
Push the image to the Docker registry:
+
+   .. code-block:: bash
+
+      docker push ${REGISTRY_URL}/seldon-ovms-locust-client:0.1
+
+#. Change to the :file:`ai-inferencing/clients/locust/helm` directory and modify the number of Locust slave nodes by editing the :file:`values.yaml` file. Change `slaves_replicas` to the desired number of slave nodes.
+
+#. Run Locust, modifying this command as your environment requires:
+
+   .. code-block:: bash
+
+      helm install --name locust --namespace kubeflow \
+        --set client.image=${REGISTRY_URL}/seldon-ovms-locust-client:0.1 \
+        --set client.ingress=${INGRESS_ADDRESS} \
+        --set client.mount_images_volume.enabled=false \
+        --set client.images_path=./ \
+        ../helm
+
+   Values can be adjusted in the helm command using `--set`, as shown in this sample command. Note that the `.nip.io` suffix may be necessary when using the ingress address.
+
+#. Find the UI port in the output from the helm command:
+
+   .. code-block:: console
+
+      NAME           TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
+      locust         NodePort   10.110.167.232   <none>        8089:XXXXX/TCP      0s
+      locust-master  ClusterIP  10.107.78.16     <none>        5557/TCP,5558/TCP   0s
+
+
+#. On the system running the Kubernetes cluster, open a browser and go to `localhost:XXXXX`, where `XXXXX` is the port found above.
+
+#. Run tests using the UI.
+
+   * On Locust's landing page you will see two fields: "Number of users to simulate" and "Hatch rate". Fill them in and press "Start swarming".
+   * Locust should start the test. You can track the number of requests and failures in the "Statistics" tab.
+
+     .. figure:: /_figures/stacks/Locust_statistics.png
+        :alt: Locust statistics
+        :width: 600
+
+   * In the "Failures" section you should see the type of errors; while running this test, the only failures should be classification errors, meaning the sent image was classified incorrectly. That is normal behavior - we expect <100% accuracy for this model.
+
+     .. figure:: /_figures/stacks/Locust_failures.png
+        :alt: Locust failures
+        :width: 600
+
+   * You can see some simple charts in the "Charts" tab. In the "Response Times (ms)" chart, the green line is the median response time and the yellow line is the 95th percentile.
+
+     .. figure:: /_figures/stacks/Locust_charts.png
+        :alt: Locust charts
+        :width: 600
+
+   * The "Exceptions" tab may show some exceptions. This can happen when the tested environment reaches its response limit and some requests start to fail.
+
+     .. figure:: /_figures/stacks/Locust_exception.png
+        :alt: Locust exception
+        :width: 600
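+
+If you prefer to capture the same numbers programmatically, for example to archive a benchmark run, the Locust web UI of this release also exposes its statistics over HTTP. A minimal sketch, assuming the NodePort discovered above and the Locust version listed in Table 1:
+
+.. code-block:: bash
+
+   # Read the NodePort assigned to the Locust UI service
+   LOCUST_PORT=$(kubectl -n kubeflow get svc locust -o jsonpath='{.spec.ports[0].nodePort}')
+
+   # Aggregated request statistics as JSON, and the same data as CSV
+   curl -s http://localhost:${LOCUST_PORT}/stats/requests
+   curl -s http://localhost:${LOCUST_PORT}/stats/requests/csv -o locust_stats.csv
+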
+Performance Tuning
+==================
+
+If you need to maximize the usage of available resources, it is worth adjusting the threading parameters of the inference serving instances. It is not enough to set the OMP_NUM_THREADS environment variable, which defines the number of threads used for inference on the CPU. In that case, the instances will scale across the nodes, but will not scale properly across the available cores of one node. Using the :command:`numactl` program is the solution here: :command:`numactl` allows you to run an instance on a defined set of cores and use memory from the same socket.
+
+To find out how to assign the cores and memory properly, run :command:`numactl -H`, which produces output like this:
+
+.. code-block:: console
+
+   available: 2 nodes (0-1)
+   node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
+   node 0 size: 195279 MB
+   node 0 free: 128270 MB
+   node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
+   node 1 size: 196608 MB
+   node 1 free: 119445 MB
+   node distances:
+   node   0   1
+     0:  10  21
+     1:  21  10
+
+
+In this case, the tests are run on an Intel(R) Xeon(R) Platinum 6260L with 2 sockets (nodes) and 24 cores (CPUs) on each socket.
+Running the inference serving application with :command:`numactl --membind=0 --cpubind=0-3` forces the system to use cores 0-3 and memory located on the same socket (0). To use all available cores, create additional service deployments assigned to the remaining cores.
+
+The `ai-inferencing` repository contains an example deployment script that assigns 2 cores per instance.
+
+Automatic CPU and memory binds in Seldon deployment
+===================================================
+
+By default, only one Seldon deployment is spawned on one cluster node. When there is only one instance of the deployment, it is not necessary to use :command:`numactl`, because all resources can be used by this single deployment.
+
+In most cases that is far more resources than a single deployment needs, so this setting is not optimal. Instead, use a mechanism that creates more than one deployment per node and splits the CPUs and memory banks equally between them using :command:`numactl`.
+
+First, it is necessary to set the following Helm values in the :file:`ai-inferencing/seldon/values.yaml` file:
+
+* `instances` describes how many Seldon deployments, each with its own CPU bind range, should be prepared per socket; each range is used by only one deployment through :command:`numactl`. When this variable is set to 1, :command:`numactl` is not used.
+* `cpus` should be set to the number of physical CPUs on a single node (without HyperThreading).
+* `sockets` should be equal to the number of sockets on a single node, which is also the number of memory banks.
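+
+As an illustration only, for the 2-socket, 24-cores-per-socket nodes used in this guide, these three keys might be set as sketched below. Only the keys described above are shown; the rest of :file:`values.yaml` is omitted, and the `instances` value is just one possible choice (12 instances per socket gives 2 cores per instance).
+
+.. code-block:: yaml
+
+   # Illustrative values for a node with 2 sockets and 24 physical cores per socket
+   instances: 12   # Seldon deployments (CPU bind ranges) per socket
+   cpus: 48        # physical cores on one node, without HyperThreading
+   sockets: 2      # sockets per node, equal to the number of memory banks
+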
+Run the benchmark
+=================
+
+Two scripts are provided to automate finding the best configuration by customizing the number of clients and Seldon instances.
+
+#. :file:`clients/standalone/scale.sh`
+
+   This script automatically scales and adjusts the Seldon instances to the selected configuration (2 or 24 cores per instance).
+
+   It takes the following arguments:
+
+   * the number of replicas (pods) each Seldon deployment should contain; this should be equal to the number of nodes in the cluster
+   * the number of deployments to create (each node divides its resources between the deployments)
+
+   This script is called by the :file:`clients/standalone/benchmark.sh` script.
+
+#. :file:`clients/standalone/benchmark.sh`
+
+   This script is used to run benchmarks with the selected configuration.
+   There are 3 benchmark options to set:
+
+   * the number of `nodes` - how many nodes are in the cluster; this scales the Seldon deployments so that there is one pod replica for each resource slice on each node
+   * a list of `instances` values - how many Seldon instances should be started for a particular benchmark run
+   * a list of `clients` values - the number of clients to be used in a particular benchmark run
+
+   It is necessary to customize the file itself for the selected setup by setting the environment variables listed below:
+
+   * `SSH_PASSWORD` - password for the Kubernetes master host
+   * `SSH_USER` - user used to connect to the Kubernetes master host
+   * `SSH_IP` - IP address of the Kubernetes master host
+   * `SCALE_FILE_PATH` - path to this repository as downloaded on the Kubernetes master host, for example :file:`/path/to/this/repository/clients/standalone`
+   * `INGRESS_ADDRESS` - server IP or domain name and port where Istio is exposed
+
+   The SSH settings should point to the Kubernetes master host, where :command:`kubectl` is usable.
+
+.. note:: Before starting the :file:`benchmark.sh` script, make sure all standalone client requirements are fulfilled, including installing the Python requirements and downloading the small sample image set if it is used.
+
+The output from this script is shown on stdout and saved to a file named
+:file:`log_n<# nodes>_i<# instances per node>_c<# clients>_.txt`.
+
+The simplest way to monitor core usage is to run the :command:`htop` program on each tested node.
+
+.. figure:: /_figures/stacks/htop.png
+   :alt: htop output
+   :width: 600
+
+Results
+=======
+
+The tests performed on a 2-node cluster with 48 cores per node showed that there are 2 optimal scenarios:
+
+#. Low latency
+
+   2 instances with 24 cores per instance on each node (4 instances on 2 nodes):
+
+   .. code-block:: console
+
+      1 (Node 1, socket 0): numactl --membind=0 --cpubind=0-23
+      2 (Node 1, socket 1): numactl --membind=1 --cpubind=24-47
+      3 (Node 2, socket 0): numactl --membind=0 --cpubind=0-23
+      4 (Node 2, socket 1): numactl --membind=1 --cpubind=24-47
+
+
+   Inference engine configuration for this case:
+
+   .. code-block:: console
+
+      OMP_NUM_THREADS=24
+      KMP_SETTINGS=1
+      KMP_AFFINITY=granularity=fine,verbose,compact,1,0
+      KMP_BLOCKTIME=1
+
+
+#. High throughput
+
+   24 instances with 2 cores per instance on each node (48 instances on 2 nodes):
+
+   .. code-block:: console
+
+      1 (Node 1, socket 0): numactl --membind=0 --cpubind=0-1
+      2 (Node 1, socket 0): numactl --membind=0 --cpubind=2-3
+      ...
+      48 (Node 2, socket 1): numactl --membind=1 --cpubind=46-47
+
+
+   Inference engine configuration:
+
+   .. code-block:: console
+
+      OMP_NUM_THREADS=2
+      KMP_SETTINGS=1
+      KMP_AFFINITY=granularity=fine,verbose,compact,1,0
+      KMP_BLOCKTIME=1
+
+
+
+
+
+.. _Deep Learning Reference Stack: https://clearlinux.org/stacks/deep-learning
+.. _Kubeflow: https://www.kubeflow.org/
+.. _Seldon Core: https://docs.seldon.io/projects/seldon-core/en/latest/
+.. _OpenVino Toolkit: https://software.intel.com/en-us/openvino-toolkit
+.. _Intermediate Representation (IR): https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_Converting_Model.html
+.. _Istio: https://istio.io/
+.. _Source-to-Image: https://github.com/openshift/source-to-image
+.. _Min.io: https://min.io/
+.. _Intel Cascade Lake: https://www.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/cascade-lake/2nd-gen-intel-xeon-scalable-processors.html
+.. _Docker 18.09: https://kubernetes.io/docs/setup/production-environment/container-runtimes/
+.. _Kubernetes 1.15.3: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
+.. _gsutil: https://cloud.google.com/storage/docs/gsutil_install#linux
+.. _Locust.io: https://locust.io