# Edge Demo

This notebook will walk through building a computer vision (CV) pipeline in Wallaroo, deploying it to the local cluster for testing, and then publishing it for edge deployment.

# Package imports
Here we import the libraries needed for this notebook.

In [1]:
import wallaroo
import os
import pandas as pd
import sys
 
# helper utilities for this demo
from wallaroo_demo_utils import WallarooDemoUtils 

w_demo = WallarooDemoUtils()

## Securely connect this notebook to the Wallaroo Cluster
We can now use the wallaroo library to set up a connection to the Wallaroo cluster.

In [2]:
wl = wallaroo.Client()

# Upload and package the model

When a model is uploaded to a Wallaroo cluster, it is optimized and packaged to make it ready to run as part of a pipeline. In many cases, the Wallaroo Server can natively run a model without any Python overhead. In other cases, such as a Python script, a custom Python environment is generated automatically. This is comparable to the process of "containerizing" a model by adding a small HTTP server and other wrapping around it.

In [4]:
model = wl.upload_model('resnet-50', "./models/resnet50_v1.onnx", 
                        framework = wallaroo.framework.Framework.ONNX).configure(tensor_fields=["tensor"])


### Reserve resources needed for this pipeline 

Before deploying an inference engine, we need to tell Wallaroo what resources it will need.
To do this, we use the Wallaroo `DeploymentConfigBuilder()` and fill in the options listed below to set the properties of our inference engine.

We will be testing this deployment for an edge scenario, so the resource specifications are kept small: the minimum needed to meet the expected load on the planned hardware.

- cpus - 4 => allow the engine to use 4 CPU cores when running the neural net
- memory - 0.5Gi => each inference engine will have 512 MB of memory, which is plenty for processing a single image at a time.
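
The equivalence between `0.5Gi` and 512 MB in the list above can be checked with a few lines of Python. This is a minimal sketch, assuming the memory string follows Kubernetes-style binary suffixes (`Ki`, `Mi`, `Gi`, `Ti`); `memory_to_bytes` is a hypothetical helper written for this illustration and is not part of the wallaroo library.

```python
# Binary (power-of-two) suffixes, assumed to follow Kubernetes-style quantities.
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def memory_to_bytes(quantity: str) -> int:
    """Convert a quantity such as "0.5Gi" or "512Mi" to a byte count.

    Hypothetical helper for illustration; only binary suffixes are handled.
    """
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    raise ValueError(f"unsupported memory quantity: {quantity!r}")


half_gib = memory_to_bytes("0.5Gi")
print(half_gib)                              # 536870912
print(half_gib == memory_to_bytes("512Mi"))  # True
```

Both spellings describe the same amount of memory, so either can be used when reasoning about how much headroom a single-image inference needs.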



In [5]:
# Resource values match the options described above: 4 CPU cores, 0.5Gi of memory.
deployment_config = wallaroo.DeploymentConfigBuilder() \
    .cpus(4)\
    .memory("0.5Gi")\
    .arch(wallaroo.engine_config.Architecture.X86)\
    .build()

### Set up the Pipeline Steps

Here we set up the pipeline steps with our sample model.

In [7]:
pipeline = wl.build_pipeline('cv-demo').add_model_step(model)

## Publish the pipeline for edge deployment

It worked! For a demo, we'll take working once as "testing". Now that we've tested our pipeline, we are ready to publish it for edge deployment. Publishing means assembling all of the configuration files and model assets and pushing them to an OCI (aka Docker) repository. This same repository can hold the Wallaroo Server container image, any other container images needed for the edge system, and any of the model pipelines.

The Wallaroo Server is available on the same repo at:
   oci.wallaroo.io/wallaroo-dev/wallaroo-server:latest
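
An OCI image address like the one above combines a registry host, a repository path, and a tag. The sketch below splits such a reference into those parts. It is a simplified illustration: `split_oci_reference` and `OciReference` are hypothetical names, the first path component is assumed to be the registry host, and digests (`@sha256:...`) are not handled.

```python
from typing import NamedTuple, Optional


class OciReference(NamedTuple):
    registry: str
    repository: str
    tag: Optional[str]


def split_oci_reference(ref: str) -> OciReference:
    """Split "registry/repo/path:tag" into registry, repository, and tag.

    Simplified: assumes the first path component is the registry host
    and ignores digests.
    """
    registry, _, remainder = ref.partition("/")
    # Only a colon in the final path component marks a tag.
    last_component = remainder.rsplit("/", 1)[-1]
    if ":" in last_component:
        repository, _, tag = remainder.rpartition(":")
        return OciReference(registry, repository, tag)
    return OciReference(registry, remainder, None)


server = split_oci_reference("oci.wallaroo.io/wallaroo-dev/wallaroo-server:latest")
print(server.registry)    # oci.wallaroo.io
print(server.repository)  # wallaroo-dev/wallaroo-server
print(server.tag)         # latest
```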

In [8]:
pub = pipeline.publish(deployment_config)
pub

Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is Publishing...Published.


| Field | Value |
|---|---|
| ID | 27 |
| Pipeline Version | a677d6e9-d42f-4862-a16e-13412dd366a5 |
| Status | Published |
| Engine URL | ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/standalone-mini:v2023.3.0-3798 |
| Pipeline URL | ghcr.io/wallaroolabs/doc-samples/pipelines/cv-demo:a677d6e9-d42f-4862-a16e-13412dd366a5 |
| Helm Chart URL | ghcr.io/wallaroolabs/doc-samples/charts/cv-demo |
| Helm Chart Reference | ghcr.io/wallaroolabs/doc-samples/charts@sha256:08259abbf988fc3ed7989906e661de5ceedee4d077af31080c724e33fd77e886 |
| Helm Chart Version | 0.0.1-a677d6e9-d42f-4862-a16e-13412dd366a5 |
| Engine Config | {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}}}, 'engineAux': {'images': {}}, 'enginelb': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}}}} |
| User Images | [] |
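
The Engine Config entry in the output above is displayed as the text of a Python dictionary. As a small sketch, that text can be parsed with the standard library to read the resource limits recorded for the engine. The string below is copied from the publish output; whether the publish object exposes this value through an attribute is not assumed here.

```python
import ast

# Engine Config value copied from the publish output above.
engine_config_text = (
    "{'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, "
    "'requests': {'cpu': 1.0, 'memory': '512Mi'}}}, "
    "'engineAux': {'images': {}}, "
    "'enginelb': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, "
    "'requests': {'cpu': 1.0, 'memory': '512Mi'}}}}"
)

# literal_eval evaluates Python literals only; it does not run arbitrary code.
engine_config = ast.literal_eval(engine_config_text)
limits = engine_config["engine"]["resources"]["limits"]
print(limits["cpu"], limits["memory"])  # 1.0 512Mi
```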


## Deploy on Edge Device

With our pipeline published, we will deploy it onto an edge device via Docker. The following code generates the Docker deployment command for port 8080. The registry URL and login credentials are supplied by the user through the `$REGISTRYURL`, `$REGISTRYUSERNAME`, and `$REGISTRYPASSWORD` shell variables.

In [9]:
docker_deploy = f'''
docker run -p 8080:8080 \\
    -e DEBUG=true -e OCI_REGISTRY=$REGISTRYURL \\
    -e CONFIG_CPUS=4 \\
    -e OCI_USERNAME=$REGISTRYUSERNAME \\
    -e OCI_PASSWORD=$REGISTRYPASSWORD \\
    -e PIPELINE_URL={pub.pipeline_url} \\
    {pub.engine_url}
'''

print(docker_deploy)


docker run -p 8080:8080 \
    -e DEBUG=true -e OCI_REGISTRY=$REGISTRYURL \
    -e CONFIG_CPUS=4 \
    -e OCI_USERNAME=$REGISTRYUSERNAME \
    -e OCI_PASSWORD=$REGISTRYPASSWORD \
    -e PIPELINE_URL=ghcr.io/wallaroolabs/doc-samples/pipelines/cv-demo:a677d6e9-d42f-4862-a16e-13412dd366a5 \
    ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/standalone-mini:v2023.3.0-3798
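
If the command needs to be generated for different ports or CPU counts, the same string can be assembled by a small function. The sketch below is one possible way to do that: `build_docker_command` is a hypothetical helper, the environment variable names are taken from the command above, and the credentials stay as shell variable references so that no secret is written into the string.

```python
import shlex


def build_docker_command(pipeline_url: str, engine_url: str,
                         port: int = 8080, cpus: int = 4) -> str:
    """Return a single-line `docker run` command for a published pipeline.

    Hypothetical helper for illustration only.
    """
    env = {
        "DEBUG": "true",
        "OCI_REGISTRY": "$REGISTRYURL",
        "CONFIG_CPUS": str(cpus),
        "OCI_USERNAME": "$REGISTRYUSERNAME",
        "OCI_PASSWORD": "$REGISTRYPASSWORD",
    }
    parts = ["docker run", f"-p {port}:{port}"]
    # Double quotes keep the $VARIABLE references expandable by the shell.
    parts += [f'-e {name}="{value}"' for name, value in env.items()]
    # The URLs come from the publish result, so quote them defensively.
    parts.append(f"-e PIPELINE_URL={shlex.quote(pipeline_url)}")
    parts.append(shlex.quote(engine_url))
    return " ".join(parts)


command = build_docker_command(
    "ghcr.io/wallaroolabs/doc-samples/pipelines/cv-demo:a677d6e9-d42f-4862-a16e-13412dd366a5",
    "ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/standalone-mini:v2023.3.0-3798",
)
print(command)
```

In the notebook, the two URL arguments would come from `pub.pipeline_url` and `pub.engine_url`, as in the cell above.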



### Edge Inference

Now we can perform an inference through the edge device.
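
The request can also be built from Python with the standard library instead of `curl`. The sketch below constructs the same POST request used in the next cell (host `testboy.local`, port 8080, pipeline `cv-demo`, Arrow file content type) without sending it. `build_inference_request` is a hypothetical helper, and plain `http` is assumed because the `curl` command gives no scheme.

```python
import urllib.request


def build_inference_request(host: str, pipeline_name: str, payload: bytes,
                            port: int = 8080) -> urllib.request.Request:
    """Build (but do not send) a POST request for the edge inference endpoint."""
    url = f"http://{host}:{port}/pipelines/{pipeline_name}"
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/vnd.apache.arrow.file"},
    )


# Placeholder bytes; in practice, read them from the Arrow file, for example:
# payload = open("image_224x224.arrow", "rb").read()
request = build_inference_request("testboy.local", "cv-demo", b"placeholder")
print(request.get_method(), request.full_url)
# POST http://testboy.local:8080/pipelines/cv-demo

# Sending requires the edge device to be reachable:
# with urllib.request.urlopen(request, timeout=30) as response:
#     body = response.read()
```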

In [10]:
!curl -X POST testboy.local:8080/pipelines/cv-demo \
    -H "Content-Type: application/vnd.apache.arrow.file" \
        --data-binary @image_224x224.arrow

[{"check_failures":[],"elapsed":[1011390,23659534],"model_name":"resnet-50","model_version":"b10b1e7a-48ef-4680-bde8-a67e6613adc6","original_data":null,"outputs":[{"Int64":{"data":[535],"dim":[1],"v":1}},{"Float":{"data":[0.00009498586587142199,0.00009141524787992239,0.0004606838047038764,0.00007667174941161647,0.00008047101437114179,0.00006355856748996302,0.0001758082798914984,0.000014166356777423061,0.00004344096305430867,0.00004225136217428371,0.0002540049026720226,0.005299815908074379,0.00016666953160893172,0.0001903128286357969,0.00020846890402026474,0.00014618519344367087,0.00034408149076625705,0.0008281364571303129,0.000119782991532702,0.00020627757476177067,0.00014886556891724467,0.0002607095520943403,0.000900866580195725,0.0014754909789189696,0.0008267511730082333,0.00030276484903879464,0.0001936637272592634,0.0005283929640427232,0.0001492276060162112,0.0002412181202089414,0.00041593582136556506,0.00003615637979237363,0.00024112094251904637,0.0001600235846126452,0.000126323997