# Inference using the Model Zoo for Intel® Architecture

* https://github.com/IntelAI/models
* https://aihub.cloud.google.com/u/0/p/products%2Fc8019607-bf98-4870-bc32-2d19f6ab8766

This notebook walks through creating a Kubeflow Pipelines component that runs a simple TensorFlow inference example using code from the [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models).

We start by defining variables for the experiment and the parameters used to run the TensorFlow model, such as the model name, precision, and batch size.

In [1]:
EXPERIMENT_NAME = 'Model Zoo pipeline experiments'
KFP_PACKAGE = 'https://storage.googleapis.com/ml-pipeline/release/0.1.20/kfp.tar.gz'

MODEL_NAME = "gnmt"
PRECISION = "fp32"
MODE = "inference"
BATCH_SIZE = "32"
SOCKET_ID = "0"
DATA_LOCATION = "gs://BUCKET_NAME/wmt16/"
PERFORMANCE_OR_ACCURACY = "performance"
DOCKER_IMAGE = "gcr.io/my-registry/intel-model-zoo:language-translation"
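
Since every setting above is a string that is handed to the launch script unchanged, a quick sanity check before building the pipeline can catch simple mistakes. The sketch below is only an illustration: `check_settings` is a hypothetical helper that is not part of the Model Zoo or the Kubeflow Pipelines SDK, and it checks only two things that follow from the values shown above (integer-like strings and the unreplaced `BUCKET_NAME` placeholder).

```python
def check_settings(batch_size, socket_id, data_location):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper for illustration only.
    """
    problems = []
    for label, value in (("BATCH_SIZE", batch_size), ("SOCKET_ID", socket_id)):
        # The notebook stores these as strings, so confirm they parse as integers.
        try:
            int(value)
        except ValueError:
            problems.append("{} is not an integer string: {!r}".format(label, value))
    # The sample data location contains a placeholder that must be replaced.
    if "BUCKET_NAME" in data_location:
        problems.append("DATA_LOCATION still contains the BUCKET_NAME placeholder")
    return problems


# With the sample values from the cell above, only the placeholder is reported.
print(check_settings("32", "0", "gs://BUCKET_NAME/wmt16/"))
```

In the notebook, this could be called as `check_settings(BATCH_SIZE, SOCKET_ID, DATA_LOCATION)` once the bucket name has been filled in.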

## Install dependencies

The next step installs the Kubeflow Pipelines SDK from the release archive specified by the `KFP_PACKAGE` variable defined above.

In [4]:
!pip3 install $KFP_PACKAGE --upgrade --user

Collecting https://storage.googleapis.com/ml-pipeline/release/0.1.20/kfp.tar.gz
  Using cached https://storage.googleapis.com/ml-pipeline/release/0.1.20/kfp.tar.gz
Collecting urllib3<1.25,>=1.15
  Using cached https://files.pythonhosted.org/packages/01/11/525b02e4acc0c747de8b6ccdab376331597c569c42ea66ab0a1dbd36eca2/urllib3-1.24.3-py2.py3-none-any.whl
Collecting kubernetes<=9.0.0,>=8.0.0
  Using cached https://files.pythonhosted.org/packages/00/f7/4f196c55f1c2713d3edc8252c4b45326306eef4dc10048f13916fe446e2b/kubernetes-9.0.0-py2.py3-none-any.whl
Collecting PyJWT>=1.6.4
  Using cached https://files.pythonhosted.org/packages/87/8b/6a9f14b5f781697e51259d81657e6048fd31a113229cf346880bb7545565/PyJWT-1.7.1-py2.py3-none-any.whl
Collecting cryptography>=2.4.2
  Using cached https://files.pythonhosted.org/packages/ca/9a/7cece52c46546e214e10811b36b2da52ce1ea7fa203203a629b8dfadad53/cryptography-2.8-cp34-abi3-manylinux2010_x86_64.whl
Collecting requests_toolbelt>=0.8.0
  Using cached https://files.p
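
Because the install uses `--user`, an already running kernel may not see the new package until it is restarted. The following sketch checks whether the current interpreter can locate `kfp`, using only the standard library. `is_installed` is a hypothetical helper name, not something provided by the SDK.

```python
import importlib.util
import site


def is_installed(package_name):
    """Return True if the current interpreter can locate the top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Report whether kfp is visible and where user-level packages are installed.
print("kfp importable:", is_installed("kfp"))
print("user site-packages:", site.getusersitepackages())
```

If this reports `False` immediately after a successful install, restarting the notebook kernel is a common remedy. A `ModuleNotFoundError` for `kfp` in a later cell usually points to the same cause.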

## Define the pipeline

The Kubeflow Pipelines SDK is then used to define the pipeline that runs inference with the Model Zoo. The pipeline function declares the runtime parameters and passes them to the launch script inside the Docker container.

In [5]:
import kfp.dsl as dsl
import kfp.gcp as gcp

@dsl.pipeline(
  name='Model Zoo Pipeline',
  description='A pipeline that runs TensorFlow benchmarking using the Model Zoo for Intel Architecture.'
)
def model_zoo_for_intel_architecture(
        data_location='',
        model_name='gnmt',
        precision='fp32',
        mode='inference',
        batch_size='32',
        socket_id='0',
        verbose='true',
        performance_or_accuracy='performance',
        extra_model_args='',
        docker_image=''):
  """
  This is a one-step pipeline that runs benchmarking using the specified parameters
  """

  model_zoo_component = dsl.ContainerOp(
      name='model_zoo_component',
      image=docker_image,
      arguments=["src/launch_inference.py",
                 "--model-name", model_name,
                 "--framework", "tensorflow",
                 "--precision", precision,
                 "--mode", mode,
                 "--performance-or-accuracy", performance_or_accuracy,
                 "--batch-size", batch_size,
                 "--socket-id", socket_id,
                 "--verbose", verbose,
                 "--data-location", data_location,
                 "--extra-model-args", extra_model_args]
  ).apply(gcp.use_gcp_secret('user-gcp-sa'))
  model_zoo_component.set_image_pull_policy("Always")

ModuleNotFoundError: No module named 'kfp'
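
To preview what the container receives, the argument list from the pipeline definition can be reproduced as plain Python, without the SDK. This is a sketch only: `build_launch_arguments` is a hypothetical helper that mirrors the flags listed in the `ContainerOp` above. Inside the real pipeline function the values are placeholders that the SDK resolves at run time, so this helper is useful for inspection rather than as a replacement.

```python
def build_launch_arguments(model_name, precision, mode, batch_size, socket_id,
                           data_location, performance_or_accuracy="performance",
                           verbose="true", extra_model_args=""):
    """Assemble the same flag/value list that the pipeline passes to the container.

    Hypothetical helper; the flags are copied from the pipeline definition.
    """
    return ["src/launch_inference.py",
            "--model-name", model_name,
            "--framework", "tensorflow",
            "--precision", precision,
            "--mode", mode,
            "--performance-or-accuracy", performance_or_accuracy,
            "--batch-size", batch_size,
            "--socket-id", socket_id,
            "--verbose", verbose,
            "--data-location", data_location,
            "--extra-model-args", extra_model_args]


# Preview the arguments for the sample settings used in this notebook.
args = build_launch_arguments("gnmt", "fp32", "inference", "32", "0",
                              "gs://BUCKET_NAME/wmt16/")
print(" ".join(args))
```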

## Compile the pipeline

Next, the pipeline is compiled into a file called `pipeline.tar.gz`.

In [None]:
import kfp.compiler as compiler

pipeline_filename = 'pipeline.tar.gz'
compiler.Compiler().compile(model_zoo_for_intel_architecture, pipeline_filename)
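
After compiling, it can be useful to confirm that the package was written and to see what it contains. The sketch below lists the members of a `.tar.gz` archive with the standard `tarfile` module. `list_package_files` is a hypothetical helper, and the exact contents of the archive depend on the SDK release, so no particular file name is assumed here.

```python
import tarfile


def list_package_files(package_path):
    """Return the names of the files stored in a .tar.gz pipeline package."""
    with tarfile.open(package_path, "r:gz") as archive:
        return archive.getnames()
```

In the notebook, `list_package_files(pipeline_filename)` would show the files in the package produced by the compile step.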

## Run the pipeline

In this last step, a dictionary of runtime arguments is set up to specify which model to run, an experiment is created, and the compiled pipeline is submitted as a run.

In [None]:
# Set up the arguments used to run the model
arguments = {
    "model_name": MODEL_NAME,
    "precision": PRECISION,
    "mode": MODE,
    "batch_size": BATCH_SIZE,
    "socket_id": SOCKET_ID,
    "performance_or_accuracy": PERFORMANCE_OR_ACCURACY,
    "data_location": DATA_LOCATION,
    "docker_image": DOCKER_IMAGE
}

# Create an experiment
import kfp
client = kfp.Client()
experiment = client.create_experiment(EXPERIMENT_NAME)

# Run the pipeline
run_name = model_zoo_for_intel_architecture.__name__ + ' {} {} {}'.format(MODEL_NAME, PRECISION, MODE)
run_result = client.run_pipeline(experiment.id, run_name, pipeline_filename, arguments)
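
A misspelled key in the arguments dictionary is easy to miss until the run is submitted. The sketch below compares the dictionary keys with the parameter names of a pipeline function using `inspect.signature`. `unknown_arguments` and `example_pipeline` are hypothetical names used for illustration, and the approach assumes the decorated pipeline function still exposes its original signature.

```python
import inspect


def unknown_arguments(pipeline_func, arguments):
    """Return the argument names that the pipeline function does not accept."""
    accepted = set(inspect.signature(pipeline_func).parameters)
    return sorted(set(arguments) - accepted)


# Stand-in function with a subset of the notebook's pipeline parameters.
def example_pipeline(model_name='gnmt', precision='fp32', batch_size='32'):
    pass


# "batchsize" is a deliberate typo, so it is reported as unknown.
print(unknown_arguments(example_pipeline, {"model_name": "gnmt", "batchsize": "32"}))
```

In the notebook, the same check could be applied as `unknown_arguments(model_zoo_for_intel_architecture, arguments)` before calling `client.run_pipeline`.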