## GPU Pipeline Deployment Tutorial

This tutorial demonstrates how to allocate GPU resources in a cluster to a Wallaroo pipeline for models that require GPUs.  It is based on the Wallaroo SDK 2023.2.1 guide [Wallaroo SDK Essentials Guide: Pipeline Deployment Configuration](https://staging.docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipelines/wallaroo-sdk-essentials-pipeline-deployment-config/).

The sample model is the [AlexNet PyTorch model](https://pytorch.org/hub/pytorch_vision_alexnet/), which performs large-scale image recognition.  The data used in this sample is "toy" data and only serves to demonstrate a sample inference with this model.

## Tutorial Goals

This tutorial will demonstrate:

1. Allocating GPU resources for a containerized model in a Wallaroo pipeline.  The AlexNet model used in this demonstration requires a GPU for execution.
1. Deploying the pipeline and running a sample inference.

## Prerequisites

The following is required for this tutorial:

* A Wallaroo Enterprise instance, version 2023.2.1 or greater, installed into a GPU-enabled Kubernetes cluster as described in the [Wallaroo Create GPU Nodepools Kubernetes Clusters guide](https://staging.docs.wallaroo.ai/wallaroo-operations-guide/wallaroo-install-guides/wallaroo-install-configurations/wallaroo-gpu-nodepools/).
* The Wallaroo SDK version 2023.2.1 or greater.

## References

* [Wallaroo SDK Essentials Guide: Pipeline Deployment Configuration](https://staging.docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipelines/wallaroo-sdk-essentials-pipeline-deployment-config/)
* [Wallaroo SDK Reference wallaroo.deployment_config](https://staging.docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-reference-guide/deployment_config/)


## Tutorial Steps

### Import Library

The first step is to import the libraries used in this tutorial.

In [None]:
import uuid
import json
import os
import pandas

import wallaroo
from wallaroo.pipeline import Pipeline
from wallaroo.deployment_config import DeploymentConfigBuilder
import pyarrow as pa

### Connect to the Wallaroo Instance through the User Interface

The next step is to connect to Wallaroo through the Wallaroo client.  The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.

This is accomplished using the `wallaroo.Client()` command, which displays a URL used to grant the SDK permission to your specific Wallaroo environment.  When the URL is displayed, open it in a browser and confirm permissions.  Store the connection in a variable so it can be referenced later.

If logging into the Wallaroo instance through the internal JupyterHub service, use `wl = wallaroo.Client()`.  For more information on Wallaroo Client settings, see the [Client Connection guide](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-client/).

In [None]:
wl = wallaroo.Client()

### Set Tutorial Variables

The following variables are used throughout the tutorial to create the workspace and pipeline.  The helper methods below will either create the workspace and pipeline, or use existing ones if they have already been created.

Workspace names must be unique across a Wallaroo instance.  The following code appends a random 4 character suffix to the workspace name so that multiple users can run this tutorial on the same Wallaroo instance without attempting to create workspaces with the same name.  Users who want to reuse the same workspace for other purposes are encouraged to set their own workspace names.

In [None]:
import string
import random

# make a random 4 character suffix to avoid colliding with other users' workspaces
suffix = ''.join(random.choice(string.ascii_lowercase) for i in range(4))
workspace_name = f'gpudemonstrationworkspace{suffix}'
pipeline_name = 'gpudemonstrationpipeline'
model_name = 'alexnetmodel'
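
The suffix logic can also be wrapped in a small function, which makes the length adjustable and lets a seeded generator reproduce the same name between runs. This is a minimal sketch, not part of the Wallaroo SDK: `make_suffix` and its `rng` parameter are hypothetical names introduced here for illustration.

```python
import random
import string


def make_suffix(length=4, rng=random):
    """Return `length` random lowercase ASCII letters."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


# A seeded, private generator gives the same suffix on every run, which can
# help when you want to find the same workspace again later.
seeded = random.Random(42)
suffix = make_suffix(rng=seeded)
workspace_name = f"gpudemonstrationworkspace{suffix}"
print(workspace_name)
```

With four lowercase letters there are 26^4 = 456,976 possible suffixes, so collisions between a handful of users are unlikely but not impossible.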


In [None]:
def get_workspace(name):
    # reuse the workspace if it already exists, otherwise create it
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name, workspace):
    pipelines = workspace.pipelines()
    pipe_filter = filter(lambda x: x.name() == name, pipelines)
    pipes = list(pipe_filter)
    # we can't have a pipe in the workspace with the same name, so it's always the first
    if pipes:
        pipeline = pipes[0]
    else:
        pipeline = wl.build_pipeline(name)
    return pipeline

### Create or Retrieve the Workspace

We will create the workspace for this tutorial and set it as the current workspace.  All pipelines and models added in the code below will be assigned to this workspace.

In [None]:
workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

### Set Input and Output Schemas

The AlexNet ML model used in this example is containerized in [MLFlow 1.3.0 format](https://mlflow.org/docs/1.3.0/python_api/mlflow.html).  Wallaroo requires MLFlow models registered in a Wallaroo instance to include input and output schemas in the Apache Arrow `pyarrow.lib.Schema` format.

Reference:  [Wallaroo SDK Essentials Guide: Model Uploads and Registrations: MLFlow](https://staging.docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-model-uploads/wallaroo-sdk-model-registration-mlflow/)

In [None]:
input_schema = pa.schema([
    pa.field('inputs', pa.list_(
        pa.list_(
            pa.list_(
                pa.int64(),
                list_size=224
            ),
            list_size=224
        ),
        list_size=3
    )),
])

output_schema = pa.schema([
    pa.field('prediction', pa.list_(pa.float64(), list_size=1000)),
])
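
Because the input schema uses fixed-size lists, an input whose nesting does not match 3 x 224 x 224 will not fit it. The sketch below shows one way to check a nested Python list against that shape before sending it for inference. It uses only the standard library; `nested_shape`, `is_regular`, and `EXPECTED_INPUT_SHAPE` are hypothetical helpers introduced for illustration, not part of pyarrow or the Wallaroo SDK.

```python
def nested_shape(value):
    """Return the shape of a nested list by following the first element."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


def is_regular(value, shape):
    """Check that every sub-list has exactly the length the shape calls for."""
    if not shape:
        return not isinstance(value, list)
    if not isinstance(value, list) or len(value) != shape[0]:
        return False
    return all(is_regular(item, shape[1:]) for item in value)


# Mirrors the list_size values in input_schema: channels, height, width.
EXPECTED_INPUT_SHAPE = (3, 224, 224)

# Toy image of all zeros, channels first.
image = [[[0] * 224 for _ in range(224)] for _ in range(3)]
# Deliberately wrong input with only 2 channels.
ragged = [[[0] * 224 for _ in range(224)] for _ in range(2)]

print(nested_shape(image))
print(is_regular(image, EXPECTED_INPUT_SHAPE))
print(is_regular(ragged, EXPECTED_INPUT_SHAPE))
```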

### Register the Model

The sample AlexNet model is stored in a containerized model registry.  It is made publicly available through the Wallaroo Tutorials GitHub Repository.

In [None]:

model = wl.register_model_image(
    name="alexnet-cuda",
    image="ghcr.io/wallaroolabs/wallaroo_tutorials/alexnet-gpu:1.30"
).configure("mlflow", input_schema=input_schema, output_schema=output_schema)

### Create the Pipeline and Allocate GPU Resources

The pipeline will now be created with the registered model added as a model step.

A pipeline deployment configuration is created with the following attributes:

* Native Runtimes:  These are models that run directly in the Wallaroo engine.  The following resources are allocated to this pipeline for native runtimes:
  * CPU:  `0.25`
  * RAM:  `1Gi`
  * GPUs: `0`.  As no native runtimes that require a GPU are added to this pipeline, no GPU needs to be allocated here.
* Containerized Runtimes:  These are models that are containerized, such as MLFlow models packaged as Docker containers.
  * AlexNet Model:
    * GPUs: 1

GPUs can only be allocated in whole integer units from the GPU-enabled nodepools.  Organizations should be aware of how many GPUs are available in the cluster.  If all GPUs are already allocated to other pipelines, or if there are not enough GPUs to fulfill the request, the pipeline deployment will fail and return an error message.
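
The two constraints above (whole units only, and no more than what is still free) can be checked before building a deployment configuration. The following is a minimal sketch under stated assumptions: `check_gpu_request` is a hypothetical helper, and `total_gpus` and `allocated_gpus` are numbers you track yourself. Nothing here queries the cluster or the Wallaroo SDK.

```python
def check_gpu_request(requested, total_gpus, allocated_gpus):
    """Return the GPUs left after the request, or raise ValueError.

    All three arguments are hypothetical bookkeeping values supplied by the
    caller; they are not read from the cluster.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(requested, bool) or not isinstance(requested, int):
        raise ValueError(f"GPUs must be requested in whole units, got {requested!r}")
    if requested < 0:
        raise ValueError("GPU request cannot be negative")
    available = total_gpus - allocated_gpus
    if requested > available:
        raise ValueError(
            f"requested {requested} GPU(s), but only {available} available"
        )
    return available - requested


# Example: a 4 GPU nodepool with 2 GPUs already in use by other pipelines.
remaining = check_gpu_request(1, total_gpus=4, allocated_gpus=2)
print(remaining)
```

A check like this only catches obvious mistakes early; the cluster remains the source of truth, and the deployment can still fail if the bookkeeping is out of date.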

In [None]:
pipeline = wl.build_pipeline(pipeline_name)
pipeline.add_model_step(model)

deployment_config = DeploymentConfigBuilder() \
    .cpus(0.25).memory('1Gi').gpus(0) \
    .sidekick_gpus(model, 1) \
    .sidekick_env(model, {"GUNICORN_CMD_ARGS": "--timeout=180 --workers=1"}) \
    .build()

deployment_config

### Deploy Pipeline

With the configuration set, we will deploy the pipeline and allocate resources from the cluster to the pipeline.

In [None]:
pipeline.deploy(deployment_config=deployment_config)

In [None]:
pipeline.status()
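
Deployment of a GPU-backed container can take a while, so it can be useful to poll the status until the pipeline is ready rather than checking it once. The sketch below assumes that `pipeline.status()` returns a dictionary with a `"status"` key whose value is `"Running"` once the deployment is up; verify that against the output in your own environment. `wait_until_running` is a hypothetical helper, and a stand-in callable replaces `pipeline.status` so the example runs without a cluster.

```python
import time


def wait_until_running(get_status, timeout=300.0, interval=5.0, sleep=time.sleep):
    """Poll get_status() until it reports "Running" or the timeout expires.

    get_status is any zero-argument callable returning a dict; in this
    tutorial it would be pipeline.status. The "status" key and the
    "Running" value are assumptions about that dict's layout.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("status") == "Running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pipeline not running after {timeout} seconds: {status}")
        sleep(interval)


# Stand-in for pipeline.status so the sketch runs without a cluster.
_replies = iter([{"status": "Starting"}, {"status": "Starting"}, {"status": "Running"}])
result = wait_until_running(lambda: next(_replies), sleep=lambda seconds: None)
print(result)
```

Against a real deployment the call would look like `wait_until_running(pipeline.status)`.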

### Sample Inference

A sample inference will be run.  This is toy data, and is only used to verify that the inference is performed through the deployed pipeline.

In [None]:
import numpy as np
import pandas as pd

In [None]:
# input_data = {
#         "inputs": [np.random.randint(0, 256, (3, 224, 224), dtype=np.uint8)] * 2, # required
# }
# dataframe = pd.DataFrame(input_data)
# dataframe

In [None]:
pipeline.infer_from_file('./data/test_alexnet_gpu.json')

### Undeploy Pipeline

With the tutorial complete, the pipeline is undeployed to return its resources to the cluster.

In [None]:
pipeline.undeploy()
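
Since GPUs are a scarce cluster resource, it helps to make sure the pipeline is undeployed even when an inference raises an exception partway through. One way to do that is a context manager around the `deploy` and `undeploy` calls used in this tutorial. This is a sketch, not a Wallaroo SDK feature: `deployed` is a hypothetical helper, and `FakePipeline` is a stand-in that records calls so the example runs without a cluster.

```python
from contextlib import contextmanager


@contextmanager
def deployed(pipeline, deployment_config):
    """Deploy on entry and always undeploy on exit, even after an exception."""
    pipeline.deploy(deployment_config=deployment_config)
    try:
        yield pipeline
    finally:
        pipeline.undeploy()


class FakePipeline:
    """Hypothetical stand-in that records calls instead of touching a cluster."""

    def __init__(self):
        self.calls = []

    def deploy(self, deployment_config=None):
        self.calls.append("deploy")

    def undeploy(self):
        self.calls.append("undeploy")


fake = FakePipeline()
try:
    with deployed(fake, deployment_config=None) as p:
        raise RuntimeError("simulated inference failure")
except RuntimeError:
    pass

print(fake.calls)  # ['deploy', 'undeploy']
```

Note that if `deploy` itself raises, the `finally` block is never entered, so this pattern only covers failures that happen after a successful deployment.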