This tutorial and the assets can be downloaded as part of the [Wallaroo Tutorials repository](https://github.com/WallarooLabs/Wallaroo_Tutorials/tree/main/wallaroo-model-cookbooks/aloha).

## Aloha Demo

This notebook walks through a simple pipeline deployment and runs inferences against a model. The example uses an open source [Aloha CNN LSTM model](https://www.researchgate.net/publication/348920204_Using_Auxiliary_Inputs_in_Deep_Learning_Models_for_Detecting_DGA-based_Domain_Names) that classifies domain names as either legitimate or used for nefarious purposes such as malware distribution.

For our example, we will perform the following:

* Create a workspace for our work.
* Upload the Aloha model.
* Create a pipeline that can ingest our submitted data, submit it to the model, and export the results.
* Run a sample inference through our pipeline by loading a file.
* Run a sample inference through our pipeline's URL and store the results in a file.

All sample data and models are available through the [Wallaroo Quick Start Guide Samples repository](https://github.com/WallarooLabs/quickstartguide_samples).

## Open a Connection to Wallaroo

The first step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and is available through the JupyterHub interface provided with your Wallaroo environment.

This is accomplished using the `wallaroo.Client()` command, which provides a URL to grant the SDK permission to your specific Wallaroo environment.  When displayed, enter the URL into a browser and confirm permissions.  Store the connection into a variable that can be referenced later.

In [1]:
import wallaroo
import asyncio 
import datetime
from wallaroo.object import EntityNotFoundError

# used to display dataframe information without truncating
import pandas as pd
pd.set_option('display.max_colwidth', None)

# to display dataframe tables
from IPython.display import display

In [2]:
# Client connection from local Wallaroo instance

wl = wallaroo.Client()

# SSO login through keycloak

# wallarooPrefix = "YOUR PREFIX"
# wallarooSuffix = "YOUR SUFFIX"

# wl = wallaroo.Client(api_endpoint=f"https://{wallarooPrefix}.api.{wallarooSuffix}", 
#                     auth_endpoint=f"https://{wallarooPrefix}.keycloak.{wallarooSuffix}", 
#                     auth_type="sso")

Please log into the following URL in a web browser:

	https://sparkly-apple-3026.keycloak.wallaroo.community/auth/realms/master/device?user_code=GHMU-YILH

Login successful!


### Arrow Support

As of the 2023.1 release, Wallaroo provides support for dataframe and Arrow for inference inputs.  This tutorial allows users to adjust their experience based on whether they have enabled Arrow support in their Wallaroo instance or not.

If Arrow support has been enabled, set `arrowEnabled=True`. If it is disabled or you're not sure, set `arrowEnabled=False`.

The examples below will be shown in an arrow enabled environment.

In [3]:
import os
# Only set the below to make the OS environment ARROW_ENABLED to TRUE.  Otherwise, leave as is.
# os.environ["ARROW_ENABLED"]="True"

# Arrow counts as enabled only when ARROW_ENABLED is set to something other than "false".
arrowEnabled = os.environ.get("ARROW_ENABLED", "False").casefold() != "false"
print(arrowEnabled)

True


## Create the Workspace

We will create a workspace called "alohaworkspace" and set it as the current workspace. We'll also create our pipeline in advance as `alohapipeline`. The model name and the model file are specified here for use in later steps.

To allow this tutorial to be run multiple times or by multiple users in the same Wallaroo instance, a random 4 character prefix will be added to the workspace, pipeline, and model.

In [4]:
import string
import random

# make a random 4 character prefix
prefix= ''.join(random.choice(string.ascii_lowercase) for i in range(4))
workspace_name = f"{prefix}alohaworkspace"
pipeline_name = f"{prefix}alohapipeline"
model_name = f"{prefix}alohamodel"
model_file_name = './alohacnnlstm.zip'

In [5]:
def get_workspace(name):
    # Reuse the workspace if it already exists; otherwise create it.
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    # Look up the pipeline by the `name` argument (not the global pipeline_name);
    # build it if it does not exist yet.
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline

In [None]:
# aloha_pipeline = wl.list_pipelines()[0]

In [None]:
# aloha_pipeline

In [6]:
workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

aloha_pipeline = get_pipeline(pipeline_name)
aloha_pipeline

0,1
name,voomalohapipeline
created,2023-06-26 19:33:33.632845+00:00
last_updated,2023-06-26 19:33:33.632845+00:00
deployed,(none)
tags,
versions,c963b069-8fc8-4562-98c0-b05980b3b10e
steps,


We can verify that the workspace was created and set as the current default workspace with the `get_current_workspace()` command.

In [None]:
# wl.get_current_workspace()

## Upload the Models

Now we will upload our models.  Note that for this example we are applying the model from a .ZIP file.  The Aloha model is a [protobuf](https://developers.google.com/protocol-buffers) file that has been defined for evaluating web pages, and we will configure it to use data in the `tensorflow` format.

In [7]:
model = wl.upload_model(model_name, model_file_name).configure("tensorflow")

## Deploy a model

Now that we have a model that we want to use we will create a deployment for it. 

We will tell the deployment we are using a tensorflow model and give the deployment name and the configuration we want for the deployment.

To do this, we'll add the model to the pipeline created earlier (`alohapipeline`, with the random prefix) so it can ingest the data, pass the data to our Aloha model, and give us a final output. We'll then deploy it so it's ready to receive data. The deployment process usually takes about 45 seconds.

* **Note**:  If you receive an error that the pipeline could not be deployed because there are not enough resources, undeploy any other pipelines and deploy this one again.  This command can quickly undeploy all pipelines to regain resources.  We recommend **not** running this command in a production environment since it will cancel any running pipelines:

```python
for p in wl.list_pipelines(): p.undeploy()
```

In [8]:
aloha_pipeline.add_model_step(model)

0,1
name,voomalohapipeline
created,2023-06-26 19:33:33.632845+00:00
last_updated,2023-06-26 19:33:33.632845+00:00
deployed,(none)
tags,
versions,c963b069-8fc8-4562-98c0-b05980b3b10e
steps,


In [9]:
REPLICAS = 8
deployment_config = (wallaroo.DeploymentConfigBuilder()
    .cpus(1)
    .memory("0.5Gi")
    .replica_count(REPLICAS)
    .build())

In [10]:
aloha_pipeline = aloha_pipeline.deploy(deployment_config=deployment_config)

Waiting for deployment - this will take up to 45s .............. ok


We can verify that the pipeline is running and list what models are associated with it.

In [12]:
# aloha_pipeline.status()

## Inferences

### Infer 1 row

Now that the pipeline is deployed and our Aloha model is in place, we'll perform a smoke test to verify the pipeline is up and running properly.  We'll use the `infer_from_file` command to load a single encoded URL into the inference engine and print the results back out.

The result should tell us whether the tokenized URL is legitimate (0) or fraudulent (1). This sample data should return a value close to 0.

In [None]:
if arrowEnabled is True:
    result = aloha_pipeline.infer_from_file('./data/data_1.df.json')
else:
    result = aloha_pipeline.infer_from_file("./data/data_1.json")
display(result)
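
The 0/1 reading described above can be turned into a label with a simple threshold. The sketch below is illustrative only: the `classify_score` helper and the 0.5 cutoff are assumptions, not part of the Wallaroo SDK or the model, and the name of the output column that holds the score depends on your Wallaroo version.

```python
def classify_score(score: float, threshold: float = 0.5) -> str:
    """Map a model score in [0, 1] to a label.

    Hypothetical helper: the 0.5 threshold is an assumption for illustration.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return "malicious" if score >= threshold else "legitimate"


# A score close to 0 reads as legitimate, a score close to 1 as malicious.
print(classify_score(0.02))
print(classify_score(0.97))
```

In practice the cutoff would be tuned against labeled data rather than fixed at 0.5.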

### Prepare data for parallel inference

In [25]:
input_df = pd.read_json("./data/data_1.df.json")
input_df

Unnamed: 0,text_input
0,"[0, ABC, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 28, 16, 32, 23, 29, 32, 30, 19, 26, 17]"


In [14]:
dataset = []
for i in range(200):
    dataset.append(input_df)

In [None]:
dataset = []
for index, row in input_df.head(200).iterrows():
    dataset.append(row.to_frame('text_input').reset_index())

In [26]:
dataset.append(input_df)

### Sequential inference - how it is done today

In [15]:
timeout_secs=1200
now = datetime.datetime.now()
##########
results = aloha_pipeline.infer(tensor=input_df, timeout=10)
total = datetime.datetime.now() - now
print(f"Elapsed_in_parallel = {total.total_seconds()} : {len(results)}")

Elapsed_in_parallel = 0.313088 : 1


In [16]:
#
# Run the inference sequentially to establish a baseline
#
now = datetime.datetime.now()

results = []
for df in dataset:
    results.append(aloha_pipeline.infer(tensor=df, timeout=10))

total = datetime.datetime.now() - now

print(f"Elapsed = {total.total_seconds()} : {len(results)}")

Elapsed = 12.768011 : 200


In [None]:
# results

### Parallel inference using SDK's new pipeline.parallel_infer()

In [27]:
timeout_secs=1200
now = datetime.datetime.now()
##########
parallel_results = await aloha_pipeline.parallel_infer(tensor_list=dataset, timeout=timeout_secs, num_parallel=2*8, retries=3)
##########
total = datetime.datetime.now() - now
print(f"Elapsed_in_parallel = {total.total_seconds()} : {len(parallel_results)}")

Elapsed_in_parallel = 1.72374 : 201
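
To compare the two runs, the recorded timings can be converted to throughput. This is a rough sketch based on the single runs shown above, so the numbers are indicative only. Note that the parallel count of 201 includes every entry in the result list, whether or not the request succeeded.

```python
# Timings copied from the recorded runs above (single runs, so indicative only).
sequential_secs, sequential_count = 12.768011, 200
parallel_secs, parallel_count = 1.72374, 201


def throughput(count: int, seconds: float) -> float:
    """Requests completed per second."""
    return count / seconds


sequential_rate = throughput(sequential_count, sequential_secs)
parallel_rate = throughput(parallel_count, parallel_secs)
speedup = parallel_rate / sequential_rate

print(f"sequential: {sequential_rate:.1f} req/s")
print(f"parallel:   {parallel_rate:.1f} req/s")
print(f"speedup:    {speedup:.1f}x")
```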


In [30]:
parallel_results[200]

httpx.HTTPStatusError("Server error '500 Internal Server Error' for url 'http://engine-lb.voomalohapipeline-51:29502/pipelines/voomalohapipeline?dataset%5B%5D=%2A&dataset.exclude%5B%5D=metadata&dataset.separator=.'\nFor more information check: https://httpstatuses.com/500")
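
As the entry above shows, a failed request comes back as an exception object in the result list rather than being raised. A minimal sketch of separating the two, using stand-in values instead of real inference results; the `split_results` helper is hypothetical and not part of the Wallaroo SDK:

```python
from typing import Any, List, Tuple


def split_results(
    results: List[Any],
) -> Tuple[List[Tuple[int, Any]], List[Tuple[int, Exception]]]:
    """Split a mixed result list into (index, value) successes and (index, error) failures."""
    successes, failures = [], []
    for idx, item in enumerate(results):
        if isinstance(item, Exception):
            failures.append((idx, item))
        else:
            successes.append((idx, item))
    return successes, failures


# Stand-in data: two successful results and one error, mimicking the list shape above.
sample = ["row-0", "row-1", RuntimeError("500 Internal Server Error")]
ok, failed = split_results(sample)
print(f"{len(ok)} succeeded, {len(failed)} failed at indexes {[i for i, _ in failed]}")
```

Keeping the original index alongside each failure makes it possible to look up and resubmit the matching input from `dataset`.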

In [None]:
def wrapper_infer():
    timeout_secs=1200
    now = datetime.datetime.now()
    ##########
    parallel_results = asyncio.run(aloha_pipeline.parallel_infer(tensor_list=dataset, timeout=timeout_secs, num_parallel=8, retries=3))
    ##########
    total = datetime.datetime.now() - now
    print(f"Elapsed_in_parallel = {total.total_seconds()} : {len(parallel_results)}")

In [None]:
import threading
threads = []
for i in range(3):
    thread = threading.Thread(target=wrapper_infer)
    threads.append(thread)
    print("starting new thread...............")
    thread.start()
    

# Wait for all threads to complete
for thread in threads:
    thread.join()

In [None]:
# now = datetime.datetime.now()
# results_df = pd.concat(r for r in parallel_results if isinstance(r, pd.DataFrame))
# total = datetime.datetime.now() - now
# print(f"Elapsed_in_parallel = {total.total_seconds()} : {len(results_df)}")

In [None]:
# results_df

### DIY parallel infer using pipeline.async_infer()

In [23]:
#
# PYTHON PARALLEL VERSION
#

import asyncio
import httpx

async def parallel_infer(pipe, dataset, timeout_secs, num_parallel):
    # Note: `num_parallel` is currently unused; every request is launched at once.
    results = []
    async with httpx.AsyncClient(timeout=timeout_secs) as client:
        # Build one inference coroutine per dataframe, sharing a single HTTP client.
        tasks = [pipe.async_infer(async_client=client, tensor=df, timeout=timeout_secs) for df in dataset]
        # Run them concurrently; failures come back as exception objects in the list.
        results = await asyncio.gather(*tasks, return_exceptions=True)

    return results

In [24]:
timeout_secs=1200
now = datetime.datetime.now()
diy_results = await parallel_infer(aloha_pipeline, dataset, timeout_secs, 8)
total = datetime.datetime.now() - now
print(f"Elapsed_in_parallel = {total.total_seconds()} : {len(diy_results)}")

Elapsed_in_parallel = 1.885742 : 200
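
The DIY version above accepts a `num_parallel` argument but launches every request at once. One way to cap the number of in-flight requests is an `asyncio.Semaphore`. The sketch below is self-contained and uses a stand-in coroutine (`fake_infer`) in place of `pipe.async_infer(...)`; `bounded_gather` is a hypothetical helper, not an SDK function.

```python
import asyncio


async def bounded_gather(coro_factories, num_parallel):
    """Run coroutines with at most `num_parallel` in flight; results keep input order."""
    semaphore = asyncio.Semaphore(num_parallel)

    async def run_one(factory):
        async with semaphore:
            return await factory()

    return await asyncio.gather(
        *(run_one(f) for f in coro_factories), return_exceptions=True
    )


async def demo():
    in_flight = 0
    peak = 0

    async def fake_infer(i):
        # Stand-in for pipe.async_infer(...): sleeps instead of calling a server.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return i * 2

    # Factories delay coroutine creation until a semaphore slot is free.
    factories = [lambda i=i: fake_infer(i) for i in range(20)]
    results = await bounded_gather(factories, num_parallel=4)
    return results, peak


results, peak = asyncio.run(demo())
print(f"{len(results)} results, peak concurrency {peak}")
```

In a notebook cell, where an event loop is already running, replace `asyncio.run(demo())` with `await demo()`.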


### I don't like using await - what are my options?

In [None]:
import nest_asyncio
nest_asyncio.apply()

In [None]:
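timeout_secs=10
def sync_parallel_infer():
    # Keyword arguments match the earlier parallel_infer call and avoid relying on positional order.
    return asyncio.run(aloha_pipeline.parallel_infer(tensor_list=dataset, timeout=timeout_secs, num_parallel=8))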

In [None]:
now = datetime.datetime.now()
sync_results = sync_parallel_infer()
total = datetime.datetime.now() - now
print(f"Elapsed_in_parallel = {total.total_seconds()} : {len(sync_results)}")

In [None]:
sync_results[199]
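
`nest_asyncio` patches asyncio so that `asyncio.run()` can be called from inside a notebook, where an event loop is already running. One alternative, sketched below with the standard library only, is to run the coroutine on its own event loop in a worker thread. `run_in_thread` and `fake_parallel_infer` are hypothetical names used for illustration; the latter stands in for `aloha_pipeline.parallel_infer(...)`.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_in_thread(coro_factory):
    """Run `coro_factory()` to completion on a fresh event loop in a worker thread.

    Blocks the caller until the coroutine finishes, so it works from plain
    synchronous code and from a cell where an event loop is already running.
    """
    with ThreadPoolExecutor(max_workers=1) as executor:
        return executor.submit(lambda: asyncio.run(coro_factory())).result()


async def fake_parallel_infer(items):
    # Stand-in for the real call: pretend the batch takes a little time.
    await asyncio.sleep(0.01)
    return [item.upper() for item in items]


sync_results = run_in_thread(lambda: fake_parallel_infer(["a", "b", "c"]))
print(sync_results)
```

The trade-off is that the calling thread is blocked while the worker runs, and any objects tied to the caller's event loop should not be shared with the coroutine.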

### Gevent parallel inferencing

In [None]:
import gevent.pool
import json

from geventhttpclient import HTTPClient
from geventhttpclient.url import URL

In [None]:
url = aloha_pipeline._deployment._url()
inference_url = URL(url)

In [None]:
def execute_async_infer():
    greenlets = []
    # Create a gevent pool with a maximum concurrency level
    pool = gevent.pool.Pool(8)
    for data in dataset:
        greenlet = pool.spawn(aloha_pipeline.infer, data)
        greenlets.append(greenlet)
    gevent.joinall(greenlets)
    
    results = [greenlet.value for greenlet in greenlets]
    # combined_df = pd.concat(results, ignore_index=True)
    return results

In [None]:
now = datetime.datetime.now()
# Run the gevent-based helper defined above.
results = execute_async_infer()
total = datetime.datetime.now() - now
print(f"Elapsed_in_parallel = {total.total_seconds()} : {len(results)}")

## Undeploy Pipeline

When finished with our tests, we will undeploy the pipeline so the Kubernetes resources are freed for other tasks. Note that if the deployment configuration is unchanged, `aloha_pipeline.deploy()` will restart the inference engine in the same configuration as before.

In [None]:
aloha_pipeline.undeploy()
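
If an inference cell raises before the undeploy step runs, the pipeline stays deployed and keeps its resources. A minimal sketch of guarding against that with a context manager follows. The `deployed` helper and the `FakePipeline` stand-in are hypothetical; only the `deploy(deployment_config=...)` and `undeploy()` calls mirror the ones used in this notebook.

```python
from contextlib import contextmanager


@contextmanager
def deployed(pipeline, deployment_config=None):
    """Deploy `pipeline` for the duration of a `with` block, then undeploy it."""
    pipeline.deploy(deployment_config=deployment_config)
    try:
        yield pipeline
    finally:
        # Runs even if the body raised, so resources are always released.
        pipeline.undeploy()


class FakePipeline:
    """Stand-in that records calls; replace with a real pipeline object."""

    def __init__(self):
        self.calls = []

    def deploy(self, deployment_config=None):
        self.calls.append("deploy")
        return self

    def undeploy(self):
        self.calls.append("undeploy")
        return self


fake = FakePipeline()
try:
    with deployed(fake) as p:
        raise RuntimeError("simulated inference failure")
except RuntimeError:
    pass
print(fake.calls)
```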

In [None]:
import time

time.sleep(1)

In [None]:
import asyncio

async def async_task(i, lock):
    print(f"Async {i} task starts")
    await sync_task_wrapper(i, lock)
    await asyncio.sleep(1)
    print(f"Async {i} task finishes")

async def sync_task_wrapper(i, lock):
    # Serialize the blocking calls so only one runs in the executor at a time.
    async with lock:
        # run_in_executor takes (executor, func, *args); None selects the default thread pool.
        await asyncio.get_running_loop().run_in_executor(None, sync_task, i)

def sync_task(n):
    print(f"Sync task {n} starts")
    # Simulating a blocking operation
    for i in range(100):
        pass
    print(f"Sync task {n} finishes")

async def main():
    print("Main task starts")
    # Create the lock inside the running loop so each event loop gets its own.
    lock = asyncio.Lock()
    tasks = [
        async_task(i, lock)
        for i in range(5)
    ]
    await asyncio.gather(*tasks, return_exceptions=True)
    await asyncio.sleep(2)  # Simulating other async operations
    print("Main task finishes")


In [None]:
await main()

In [None]:
def wrapper_main():
    # Each thread runs main() on its own event loop.
    asyncio.run(main())

In [None]:
import threading
threads = []
for i in range(3):
    thread = threading.Thread(target=wrapper_main)
    threads.append(thread)
    print("starting new thread...............")
    thread.start()
    

# Wait for all threads to complete
for thread in threads:
    thread.join()
