This tutorial and the assets can be downloaded as part of the [Wallaroo Tutorials repository](https://github.com/WallarooLabs/Wallaroo_Tutorials/blob/2023.2.1_prerelease/wallaroo-features/parallel-inference-aloha-tutorial).

## Aloha Parallel Inference Demonstration

This tutorial focuses on the Pipeline method `parallel_infer`, which submits a List of data to a Wallaroo instance as parallel inference requests.  This can substantially reduce total inference time when data has to be broken up for size or memory reasons, when data is gathered from multiple sources and submitted in a single request, and in similar use cases.

For this example we will be using an open source [Aloha CNN LSTM model](https://www.researchgate.net/publication/348920204_Using_Auxiliary_Inputs_in_Deep_Learning_Models_for_Detecting_DGA-based_Domain_Names) that classifies domain names as either legitimate or used for nefarious purposes such as malware distribution.

## Tutorial Goals

* Create a workspace for our work.
* Upload the Aloha TensorFlow model.
* Create a pipeline that can ingest our submitted data, submit it to the model, and export the results.
* Run a sample inference through our pipeline by loading a file.
* Run a batch inference to show submitting a set of data to an inference request.
* Split a DataFrame into a List of 1,000 separate DataFrames to simulate separate inference requests.
* Submit the List of DataFrames sequentially and display how long this takes.
* Submit the same List of DataFrames with `parallel_infer` and compare how long it takes.

## Prerequisites

* A Wallaroo instance, version 2023.2.1 or later.

## Reference

[Wallaroo SDK Essentials Guide: Inference Management](https://staging.docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-inferences/#parallel-inferences)

## Open a Connection to Wallaroo

The first step is to connect to Wallaroo through the Wallaroo client.  The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.

This is accomplished using the `wallaroo.Client()` command, which provides a URL to grant the SDK permission to your specific Wallaroo environment.  When displayed, enter the URL into a browser and confirm permissions.  Store the connection into a variable that can be referenced later.

In [1]:
import wallaroo
import asyncio 
import datetime
from wallaroo.object import EntityNotFoundError
from wallaroo.framework import Framework

# used to display dataframe information without truncating
import pandas as pd
pd.set_option('display.max_colwidth', None)

# to display dataframe tables
from IPython.display import display

import warnings
warnings.filterwarnings('ignore')

In [None]:
# Client connection from local Wallaroo instance

wl = wallaroo.Client()

## Create the Workspace

We will create a workspace to work in and call it the "alohaworkspace", then set it as current workspace environment.  We'll also create our pipeline in advance as `alohapipeline`.  The model name and the model file will be specified for use in later steps.

To allow this tutorial to be run multiple times or by multiple users in the same Wallaroo instance, a random four-character prefix will be added to the workspace, pipeline, and model names.

In [3]:
import string
import random

# make a random 4 character prefix
prefix= ''.join(random.choice(string.ascii_lowercase) for i in range(4))
workspace_name = f"{prefix}alohaworkspace"
pipeline_name = f"{prefix}alohapipeline"
model_name = f"{prefix}alohamodel"
model_file_name = './models/alohacnnlstm.zip'

In [4]:
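def get_workspace(name):
    # Reuse the workspace if it already exists; otherwise create it.
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    # Reuse the pipeline if it already exists; otherwise build it.
    # Uses the `name` argument rather than the global `pipeline_name`.
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline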

In [5]:
workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

aloha_pipeline = get_pipeline(pipeline_name)
aloha_pipeline

0,1
name,ttmhalohapipeline
created,2023-07-06 14:58:27.829573+00:00
last_updated,2023-07-06 14:58:27.829573+00:00
deployed,(none)
tags,
versions,305bd468-1231-40c2-b7dc-409a676b1ba7
steps,


## Upload the Models

Now we will upload our model.  Note that for this example we are uploading the model from a .ZIP file.  The Aloha model is a [protobuf](https://developers.google.com/protocol-buffers) file that has been defined for evaluating web pages, and we will configure it to use data in the `tensorflow` format.

In [6]:
model = wl.upload_model(model_name, model_file_name).configure("tensorflow")

## Deploy a model

Now that we have a model that we want to use we will create a deployment for it. 

We will add the model to the pipeline as a step, set the number of replicas in the deployment configuration, and then deploy the pipeline with that configuration.

In [7]:
aloha_pipeline.add_model_step(model)

0,1
name,ttmhalohapipeline
created,2023-07-06 14:58:27.829573+00:00
last_updated,2023-07-06 14:58:27.829573+00:00
deployed,(none)
tags,
versions,305bd468-1231-40c2-b7dc-409a676b1ba7
steps,


In [8]:
REPLICAS = 4
deployment_config = (wallaroo.DeploymentConfigBuilder()
    .replica_count(REPLICAS)
    .build())

In [9]:
aloha_pipeline = aloha_pipeline.deploy(deployment_config=deployment_config)

We can verify that the pipeline is running and list what models are associated with it.

In [10]:
aloha_pipeline.status()

{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.244.0.119',
   'name': 'engine-f679478fb-6k9rw',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'ttmhalohapipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'ttmhalohamodel',
      'version': 'aebc527e-4c11-4229-bc25-0a7cd32cd63f',
      'sha': 'd71d9ffc61aaac58c2b1ed70a2db13d1416fb9d3f5b891e5e4e2e97180fe22f8',
      'status': 'Running'}]}},
  {'ip': '10.244.1.140',
   'name': 'engine-f679478fb-ffhmp',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'ttmhalohapipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'ttmhalohamodel',
      'version': 'aebc527e-4c11-4229-bc25-0a7cd32cd63f',
      'sha': 'd71d9ffc61aaac58c2b1ed70a2db13d1416fb9d3f5b891e5e4e2e97180fe22f8',
      'status': 'Running'}]}},
  {'ip': '10.244.3.110',
   'name': 'engine-f679478fb

## Inferences

### Infer 1 row

Now that the pipeline is deployed and our Aloha model is in place, we'll perform a smoke test to verify the pipeline is up and running properly.  We'll use the `infer_from_file` command to load a single encoded URL into the inference engine and print the results back out.

The result should tell us that the tokenized URL is legitimate (0) or fraud (1).  This sample data should return close to 0.

In [11]:
result = aloha_pipeline.infer_from_file('./data/data_1.df.json')

display(result)

Unnamed: 0,time,in.text_input,out.banjori,out.corebot,out.cryptolocker,out.dircrypt,out.gozi,out.kraken,out.locky,out.main,out.matsnu,out.pykspa,out.qakbot,out.ramdo,out.ramnit,out.simda,out.suppobox,check_failures
0,2023-07-06 14:58:43.560,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 28, 16, 32, 23, 29, 32, 30, 19, 26, 17]",[0.0015195857],[0.9829148],[0.012099565],[4.7591297e-05],[2.0289372e-05],[0.0003197726],[0.011029283],[0.997564],[0.010341615],[0.008038961],[0.016155062],[0.006236233],[0.0009985751],[1.793378e-26],[1.3889951e-27],0


### Batch Inference Example

Now we'll perform a batch inference.  We have the file `./data/data_25k.df.json`, a pandas DataFrame in JSON format with 25,000 records to analyze.  We'll submit the first 1,000 rows to the pipeline as a single inference request and display the first 20 rows of the results.

In [12]:
%time

test_data = pd.read_json("./data/data_25k.df.json")


batch_result = aloha_pipeline.infer(test_data.head(1000))
display(batch_result.head(20))

CPU times: user 2 µs, sys: 1e+03 ns, total: 3 µs
Wall time: 8.11 µs


Unnamed: 0,time,in.text_input,out.banjori,out.corebot,out.cryptolocker,out.dircrypt,out.gozi,out.kraken,out.locky,out.main,out.matsnu,out.pykspa,out.qakbot,out.ramdo,out.ramnit,out.simda,out.suppobox,check_failures
0,2023-07-06 14:58:44.588,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 28, 16, 32, 23, 29, 32, 30, 19, 26, 17]",[0.0015195871],[0.9829148],[0.012099565],[4.7591344e-05],[2.0289392e-05],[0.0003197726],[0.011029272],[0.997564],[0.010341625],[0.008038965],[0.016155062],[0.006236233],[0.0009985756],[1.793378e-26],[1.3889898e-27],0
1,2023-07-06 14:58:44.588,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30, 20, 19, 27, 14, 17, 24, 13, 23, 20, 18, 35, 18, 22, 23]",[7.447225e-18],[6.7359245e-08],[0.17081991],[1.3220147e-09],[1.2758853e-24],[0.22559536],[0.34209844],[0.99999994],[0.30801848],[0.18282163],[3.8022554e-11],[0.20622534],[0.15215826],[1.17020745e-30],[3.1514465e-38],0
2,2023-07-06 14:58:44.588,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 33, 25, 36, 25, 31, 14, 32, 36, 25, 12, 35, 34, 30, 28, 27, 24, 29, 27]",[2.8599305e-21],[9.302005e-08],[0.04445295],[6.163758e-09],[8.34974e-23],[0.4823448],[0.2633289],[1.0],[0.29800323],[0.22361766],[1.5238921e-06],[0.3282038],[0.029332466],[1.1995533e-31],[0.0],0
3,2023-07-06 14:58:44.588,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 17, 23, 22, 15, 12, 35, 34, 36, 12, 18, 24, 34, 32, 36, 12, 14, 16, 27, 22, 23]",[2.1386805e-15],[3.8817485e-10],[0.045599725],[1.9090386e-07],[1.3139924e-25],[0.59542614],[0.17374131],[0.9999997],[0.2315157],[0.17591687],[1.087611e-09],[0.21832284],[0.012869288],[6.158882e-28],[1.438591e-35],0
4,2023-07-06 14:58:44.588,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 13, 14, 12, 33, 16, 23, 15, 22, 30, 28, 26, 12, 16, 32, 37, 29, 22, 28, 22, 16, 27, 32]",[9.453381e-15],[7.091152e-10],[0.049815107],[5.2914135e-09],[7.4132087e-19],[1.5504637e-13],[1.079181e-15],[0.9999989],[1.5003076e-15],[0.3307571],[2.6258948e-07],[0.50362796],[0.020393757],[0.0],[0.0],0
5,2023-07-06 14:58:44.588,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 17, 29, 20, 33, 13, 36, 35, 30, 21, 29, 17, 26, 19, 25, 36, 14, 23, 16, 18, 15, 21, 18, 28, 35, 19]",[1.7247417e-17],[8.13542e-08],[0.013697103],[5.608618e-11],[1.4033129e-17],[0.49469134],[0.11978851],[0.99999994],[0.19000009],[0.105966896],[5.524472e-06],[0.2421006],[0.0069435015],[1.2804916e-34],[9.482612e-35],0
6,2023-07-06 14:58:44.588,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22, 36, 14, 12, 23, 14, 13, 20, 20, 23, 27, 36, 29, 35, 19, 33, 22, 25, 26, 32, 21]",[5.5500374e-18],[3.3608643e-07],[0.023452949],[1.13188126e-10],[1.0496918e-22],[0.23692927],[0.064456955],[0.99999183],[0.07306596],[0.0649943],[1.430274e-08],[0.11925246],[0.0011031039],[1.5206227e-32],[0.0],0
7,2023-07-06 14:58:44.588,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 36, 22, 28, 23, 20, 25, 21, 20, 16, 12, 33, 21, 14, 34, 34, 32, 19, 36, 17, 29, 26, 14, 29]",[3.922242e-18],[1.4074376e-10],[0.010946894],[8.202797e-11],[2.454975e-24],[0.4210731],[0.07124003],[0.9982491],[0.11818306],[0.08340973],[1.9207815e-09],[0.16958177],[0.00051990483],[0.0],[0.0],0
8,2023-07-06 14:58:44.588,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 30, 33, 29, 37, 24, 33, 16, 20, 24]",[4.0574273e-11],[1.0878828e-09],[0.1791685],[1.7313055e-06],[8.697295e-18],[9.197088e-16],[3.852137e-17],[0.9999977],[3.265452e-17],[0.32568428],[6.834289e-09],[0.37007844],[0.44918326],[0.0],[2.0823953e-26],0
9,2023-07-06 14:58:44.588,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 35, 29, 19, 35, 31, 15, 14, 21, 26, 31, 34, 27, 22]",[2.2576288e-09],[2.0812616e-09],[0.17788413],[1.1887528e-08],[1.0785784e-11],[0.041252833],[0.21430452],[0.9999988],[0.17853755],[0.13382341],[0.000114089744],[0.14033848],[0.011299981],[3.5758114e-24],[7.164665e-24],0
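
The `%time` output for the batch cell reports only microseconds because `%time` is a line magic: it times the statement on its own line, which in that cell is empty. The cell magic `%%time`, placed on the first line of a cell, times the whole cell instead. Outside of a notebook, the same measurement can be sketched with `time.perf_counter`. In the sketch below, `fake_batch_infer` is a hypothetical stand-in for the pipeline call and is not part of the Wallaroo SDK.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def fake_batch_infer(rows):
    # Hypothetical stand-in for a batch inference call:
    # returns one "output" per input row.
    return [len(row) for row in rows]

# 1,000 toy rows of 50 tokens each, mirroring the batch size used above.
rows = [[0] * 50 for _ in range(1000)]
outputs, elapsed = timed(fake_batch_infer, rows)
print(f"{len(outputs)} rows in {elapsed:.6f} s")
```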


### Parallel Inference Example

This time, let's take the same file and split it into 1,000 **separate** DataFrames, with each individual row as a single DataFrame.  This is toy data; we're just providing it as an example of how to submit an inference request with parallel infer.

In [13]:
test_data = pd.read_json("./data/data_25k.df.json")
test_list = []

for index, row in test_data.head(1000).iterrows():
    test_list.append(row.to_frame('text_input').reset_index())
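
To see what each list item looks like, the same split can be run on a tiny DataFrame. This is a minimal sketch on made-up data: the three short token lists below are illustrative only and are not taken from `data_25k.df.json`.

```python
import pandas as pd

# Toy stand-in for the tutorial data: one column of token lists.
toy = pd.DataFrame(
    {"text_input": [[0, 0, 28, 16], [0, 30, 20, 19], [25, 33, 25, 36]]}
)

# Same technique as above: each row becomes its own one-row DataFrame.
frames = [row.to_frame("text_input").reset_index() for _, row in toy.iterrows()]

print(len(frames))          # number of separate DataFrames
print(frames[0].columns)    # columns after reset_index()
print(frames[0].shape)      # one row per DataFrame
```

Note that `reset_index()` moves the original row label into a column named `index`, so each resulting DataFrame has the columns `index` and `text_input`.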

Now we'll perform an inference with Parallel Infer through the pipeline.

The pipeline `parallel_infer(tensor_list, timeout, num_parallel, retries)` **asynchronous** method performs an inference as defined by the pipeline steps and takes the following arguments:

* **tensor_list** (*REQUIRED List*): The data submitted to the pipeline for inference as a List of the supported data types:
  * [pandas.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html):  Data submitted as a pandas DataFrame is returned as a pandas DataFrame, with output columns based on the model's outputs.
  * [Apache Arrow](https://arrow.apache.org/) (**Preferred**): Data submitted as an Apache Arrow table is returned as an Apache Arrow table.
* **timeout** (*OPTIONAL int*): A timeout in seconds before the inference throws an exception.  The default is 15 seconds per call to accommodate large, complex models.  Note that for a list of inference requests, this is **per list item** - with 10 inference requests, each would have a default timeout of 15 seconds.
* **num_parallel** (*OPTIONAL int*):  The number of parallel threads used for the submission.  **This should be no more than four times the number of pipeline replicas**.
* **retries** (*OPTIONAL int*):  The number of retries per inference request submitted.
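
The `num_parallel` guidance in the list above can be expressed as a small check. This is a minimal sketch: `clamp_num_parallel` is a hypothetical helper written for illustration and is not part of the Wallaroo SDK.

```python
def clamp_num_parallel(requested: int, replicas: int, factor: int = 4) -> int:
    """Cap the requested parallelism at factor * replicas (hypothetical helper)."""
    if requested < 1 or replicas < 1:
        raise ValueError("requested and replicas must both be at least 1")
    return min(requested, factor * replicas)

REPLICAS = 4
print(clamp_num_parallel(2 * REPLICAS, REPLICAS))  # 8: within the 4x guideline
print(clamp_num_parallel(64, REPLICAS))            # 16: capped at 4 * REPLICAS
```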

`parallel_infer` is an asynchronous method, so it must be called with the `await` keyword to retrieve the list of results.
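
The general pattern behind a bounded, awaitable parallel submission can be sketched with the standard library alone. This is an illustration of the idea under stated assumptions, not Wallaroo's implementation: `fake_infer` and `bounded_parallel` are hypothetical names, the per-item timeout mirrors the description above, and the retry behavior shown (retry only on timeout) is an assumption.

```python
import asyncio

async def fake_infer(item, delay=0.01):
    """Hypothetical stand-in for one inference request; not a Wallaroo call."""
    await asyncio.sleep(delay)
    return {"input": item, "output": item * 2}

async def bounded_parallel(items, num_parallel=4, timeout=15, retries=1):
    """Run fake_infer over items with at most num_parallel requests in flight."""
    semaphore = asyncio.Semaphore(num_parallel)

    async def run_one(item):
        async with semaphore:
            last_error = None
            # One initial attempt plus `retries` further attempts (assumed semantics).
            for _attempt in range(retries + 1):
                try:
                    # The timeout applies to each item, not to the whole list.
                    return await asyncio.wait_for(fake_infer(item), timeout=timeout)
                except asyncio.TimeoutError as err:
                    last_error = err
            raise last_error

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(run_one(item) for item in items))

results = asyncio.run(bounded_parallel(list(range(20)), num_parallel=4))
print(len(results), results[3])
```

In a notebook, where an event loop is already running, the coroutine would be awaited directly (`results = await bounded_parallel(...)`) instead of being passed to `asyncio.run`.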

First we'll process the 1,000 rows serially and clock how long this takes.  This may take 3 to 10 minutes depending on the speed of the connection between the client and the Wallaroo instance.

In [14]:
#
# Run the inference sequentially to establish a baseline
#
now = datetime.datetime.now()

results = []
for df in test_list:
    results.append(aloha_pipeline.infer(tensor=df, timeout=10))

total_sequential = datetime.datetime.now() - now

print(f"Elapsed = {total_sequential.total_seconds()} : {len(results)}")

Elapsed = 441.705702 : 1000


Now we'll compare that to using the `parallel_infer` method.  The data is the same, but now several DataFrames from the list are submitted at a time.

In [15]:
timeout_secs=1200
now = datetime.datetime.now()
##########
parallel_results = await aloha_pipeline.parallel_infer(tensor_list=test_list, timeout=timeout_secs, num_parallel=2*REPLICAS, retries=3)
##########
total_parallel = datetime.datetime.now() - now
print(f"Elapsed_in_parallel = {total_parallel.total_seconds()} : {len(parallel_results)}")

Elapsed_in_parallel = 13.701955 : 1000


In [16]:
print(f"Comparison:\nTotal Time Sequentially: {total_sequential.total_seconds()}\nTotal Time Paralleled: {total_parallel.total_seconds()}")

Comparison:
Total Time Sequentially: 441.705702
Total Time Paralleled: 13.701955


Depending on the connection and other requirements, the difference in time can be immense.  For a local connection, processing the List sequentially took about 7 minutes 22 seconds (441.7 seconds), versus under 14 seconds with the `parallel_infer` method: roughly 32 times faster.
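
The comparison can be worked out directly from the two elapsed times reported in this run. The values below are copied from the output above; a different run or connection will produce different numbers.

```python
# Elapsed times in seconds, copied from the run above.
total_sequential_s = 441.705702
total_parallel_s = 13.701955
num_requests = 1000

speedup = total_sequential_s / total_parallel_s
minutes, seconds = divmod(total_sequential_s, 60)

print(f"Sequential: {int(minutes)} min {seconds:.1f} s")   # Sequential: 7 min 21.7 s
print(f"Speedup: {speedup:.1f}x")                          # Speedup: 32.2x
print(f"Per request, sequential: {1000 * total_sequential_s / num_requests:.1f} ms")
print(f"Per request, parallel:   {1000 * total_parallel_s / num_requests:.1f} ms")
```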

## Undeploy Pipeline

When finished with our tests, we will undeploy the pipeline so the Kubernetes resources are available for other tasks.  Note that if the deployment variable is unchanged, `aloha_pipeline.deploy()` will restart the inference engine in the same configuration as before.

In [17]:
aloha_pipeline.undeploy()

0,1
name,ttmhalohapipeline
created,2023-07-06 14:58:27.829573+00:00
last_updated,2023-07-06 14:58:31.913295+00:00
deployed,False
tags,
versions,"d78d9794-6e8d-444a-9d17-75f320de1b8b, 305bd468-1231-40c2-b7dc-409a676b1ba7"
steps,ttmhalohamodel
