This tutorial and the assets can be downloaded as part of the [Wallaroo Tutorials repository](https://github.com/WallarooLabs/Wallaroo_Tutorials/blob/wallaroo2024.1_tutorials/wallaroo-run-anywhere/edge-architecture-publish-linear-regression-houseprice-model).

## Run Anywhere for ARM Architecture Tutorial: Hugging Face Summarization Model

Wallaroo Run Anywhere provides model deployment on any device, in any cloud, and on any architecture.  Models uploaded to Wallaroo are set to their targeted architecture.

Organizations can deploy uploaded models to clusters that have nodes with the provisioned architecture.  The following architectures are supported:

* `X86`:  The standard X86 architecture.
* `ARM`:  For more details on cloud providers and their ARM offerings, see [Create ARM Nodepools for Kubernetes Clusters](https://staging.docs.wallaroo.ai/wallaroo-platform-operations/wallaroo-platform-operations-install/wallaroo-install-enterprise-environment/wallaroo-arm-nodepools/).

### Model Architecture Inheritance

The model's deployment configuration inherits its architecture.  Models automatically deploy in the target architecture provided nodepools with the architecture are available.  For information on setting up nodepools with specific architectures, see [Infrastructure Configuration Guides](https://staging.docs.wallaroo.ai/wallaroo-platform-operations/wallaroo-platform-operations-install/wallaroo-install-enterprise-environment/).

That deployment configuration is carried over to the model's publication to an Open Container Initiative (OCI) Registry, which allows edge model deployments on `X86` and `ARM` architectures.  More details on deploying models on edge devices are available in the [Wallaroo Run Anywhere Guides](https://staging.docs.wallaroo.ai/wallaroo-model-operations/wallaroo-model-operations-run-anywhere/).

The deployment configuration **can** be overridden for either model deployment in the Wallaroo Ops instance, or in the Edge devices.
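The inheritance rule described above can be pictured with a small toy model. This is only an illustrative sketch: `Arch` and `resolve_arch` are hypothetical names invented here and are not part of the Wallaroo SDK. It shows the model's architecture being used unless an explicit override is supplied.

```python
from enum import Enum
from typing import Optional


class Arch(Enum):
    """Hypothetical stand-in for the two architectures named in this tutorial."""
    X86 = "x86"
    ARM = "arm"


def resolve_arch(model_arch: Arch, override: Optional[Arch] = None) -> Arch:
    """Return the override when one is given, otherwise the model's own architecture."""
    return override if override is not None else model_arch


# A model uploaded for ARM deploys to ARM unless the configuration overrides it.
inherited = resolve_arch(Arch.ARM)
overridden = resolve_arch(Arch.ARM, override=Arch.X86)
print(inherited.value, overridden.value)
```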

This tutorial demonstrates deploying an ML model trained to summarize text to ARM edge locations through the following steps.

* Upload a model with the architecture set to `ARM`.
* Create a pipeline with the uploaded model as a model step.
* Publish the pipeline to an Open Container Initiative (OCI) Registry for ARM deployments.

In this notebook, we use a pre-trained Hugging Face summarization model for our examples.

## Goal

Demonstrate publishing a pipeline with model steps to various architectures.

### Resources

This tutorial provides the following:

* Models:
  * `models/model-auto-conversion_hugging-face_complex-pipelines_hf-summarisation-bart-large-samsun.zip`: **This model should be downloaded and placed into the `./models` folder before beginning this demonstration.** [model-auto-conversion_hugging-face_complex-pipelines_hf-summarisation-bart-large-samsun.zip (1.4 GB)](https://storage.googleapis.com/wallaroo-public-data/llm-models/model-auto-conversion_hugging-face_complex-pipelines_hf-summarisation-bart-large-samsun.zip)

### Prerequisites

* A deployed Wallaroo instance with [Edge Registry Services](https://docs.wallaroo.ai/wallaroo-operations-guide/wallaroo-configuration/wallaroo-edge-deployment/#enable-wallaroo-edge-deployment-registry) and [Edge Observability enabled](https://docs.wallaroo.ai/wallaroo-operations-guide/wallaroo-configuration/wallaroo-edge-deployment/#set-edge-observability-service).
* The following Python libraries installed:
  * [`wallaroo`](https://pypi.org/project/wallaroo/): The Wallaroo SDK. Included with the Wallaroo JupyterHub service by default.
  * [`pandas`](https://pypi.org/project/pandas/): Pandas, mainly used for Pandas DataFrame
  * `json`: Used to format input data for inference requests.
* An ARM Docker deployment to deploy the model on an edge location.


## Steps

* Upload the model with the targeted architecture set to `ARM`.
* Create the pipeline and add the model as a model step.
* Deploy the model in the targeted architecture and perform sample inferences.
* Publish the pipeline to an OCI registry.
* Deploy the model from the pipeline publish to the edge deployment with ARM architecture.
* Perform sample inferences on the ARM edge model deployment.

### Import Libraries

The first step is to import our libraries and set the variables used throughout this tutorial.

In [1]:
import wallaroo
from wallaroo.object import EntityNotFoundError
from wallaroo.framework import Framework
from wallaroo.engine_config import Architecture
import pyarrow as pa

# used to display DataFrame information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)

import datetime
import time

workspace_name = f'run-anywhere-architecture-hf-summarizer-demonstration-tutorial'
arm_pipeline_name = f'architecture-demonstration-arm'
model_name_arm = f'hf-summarizer-arm'
model_file_name = './models/hf_summarization.zip'

# ignoring warnings for demonstration
import warnings
warnings.filterwarnings('ignore')

### Connect to the Wallaroo Instance

The next step is to connect to Wallaroo through the Wallaroo client.  The Python library is included in the Wallaroo install and available through the JupyterHub interface provided with your Wallaroo environment.

This is accomplished using the `wallaroo.Client()` command, which provides a URL to grant the SDK permission to your specific Wallaroo environment.  When displayed, enter the URL into a browser and confirm permissions.  Store the connection into a variable that can be referenced later.

If logging into the Wallaroo instance through the internal JupyterHub service, use `wl = wallaroo.Client()`.  For more information on Wallaroo Client settings, see the [Client Connection guide](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-client/).

In [4]:
# Login through local Wallaroo instance

wl = wallaroo.Client()

### Create Workspace

We will create a workspace to manage our pipeline and models.  The following variables will set the name of our sample workspace then set it as the current workspace.

Workspace, pipeline, and model names should be unique to each user.  This tutorial uses the fixed names set in the import step; change them if multiple people run this tutorial in the same Wallaroo instance, so they do not affect each other.
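One way to keep names unique per user is to append a short random suffix. The following is a minimal sketch using only the Python standard library; `with_suffix` and the `unique_*` variables are hypothetical names introduced for illustration and are not used elsewhere in this tutorial.

```python
import random
import string


def with_suffix(base: str, length: int = 4) -> str:
    """Append a random lowercase suffix so names do not collide between users."""
    suffix = "".join(random.choice(string.ascii_lowercase) for _ in range(length))
    return f"{base}-{suffix}"


# Hypothetical usage with the names defined earlier in this tutorial.
unique_workspace_name = with_suffix("run-anywhere-architecture-hf-summarizer-demonstration-tutorial")
unique_pipeline_name = with_suffix("architecture-demonstration-arm")
print(unique_workspace_name, unique_pipeline_name)
```

If names are generated this way, the same generated values would need to be reused in every later cell that refers to the workspace or pipeline.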

In [5]:
workspace = wl.get_workspace(name=workspace_name, create_if_not_exist=True)

wl.set_current_workspace(workspace)

{'name': 'run-anywhere-architecture-hf-summarizer-demonstration-tutorial', 'id': 64, 'archived': False, 'created_by': 'b4a9aa3d-83fc-407a-b4eb-37796e96f1ef', 'created_at': '2024-03-05T16:09:41.98492+00:00', 'models': [{'name': 'house-price-estimator-arm', 'versions': 1, 'owner_id': '""', 'last_update_time': datetime.datetime(2024, 3, 5, 16, 12, 11, 996618, tzinfo=tzutc()), 'created_at': datetime.datetime(2024, 3, 5, 16, 12, 11, 996618, tzinfo=tzutc())}, {'name': 'hf-summarizer-arm', 'versions': 1, 'owner_id': '""', 'last_update_time': datetime.datetime(2024, 4, 1, 18, 12, 40, 404327, tzinfo=tzutc()), 'created_at': datetime.datetime(2024, 4, 1, 18, 12, 40, 404327, tzinfo=tzutc())}], 'pipelines': [{'name': 'architecture-demonstration-arm', 'create_time': datetime.datetime(2024, 3, 5, 16, 18, 38, 768602, tzinfo=tzutc()), 'definition': '[]'}]}

### Upload Models and Set Target Architecture to ARM

For our example, we will upload the Hugging Face Summarizer model.  The model file is downloaded as `model-auto-conversion_hugging-face_complex-pipelines_hf-summarisation-bart-large-samsun.zip`; the code below reads it from `./models/hf_summarization.zip` and uploads it with the name `hf-summarizer-arm`.

Models are uploaded to Wallaroo via the [`wallaroo.client.upload_model`](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-reference-guide/client/#Client.upload_model) method which takes the following arguments:

| Parameter | Type | Description |
|---|---|---|
| **path** | *String* (*Required*) | The file path to the model. |
| **framework** | *wallaroo.framework.Framework* (*Required*) | The model's framework.  See [Wallaroo SDK Essentials Guide: Model Uploads and Registrations](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-model-uploads/) for supported model frameworks. |
| **input_schema** | *pyarrow.lib.Schema* (*Optional*)  | The model's input schema.  **Only required for non-Native Wallaroo frameworks.**  See [Wallaroo SDK Essentials Guide: Model Uploads and Registrations](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-model-uploads/) for more details. |
| **output_schema** | *pyarrow.lib.Schema* (*Optional*)  | The model's output schema.  **Only required for non-Native Wallaroo frameworks.**  See [Wallaroo SDK Essentials Guide: Model Uploads and Registrations](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-model-uploads/) for more details. |
| **convert_wait** | *bool* (*Optional*)  | Whether to wait in the SDK session to complete the auto-packaging process for non-native Wallaroo frameworks. |
| **arch** | *wallaroo.engine_config.Architecture* (*Optional*)  | The targeted architecture for the model.  Options are <ol><li>`X86` (*Default*)</li><li>`ARM`</li></ol> |

Verify the ML model is downloaded from [model-auto-conversion_hugging-face_complex-pipelines_hf-summarisation-bart-large-samsun.zip (1.4 GB)](https://storage.googleapis.com/wallaroo-public-data/llm-models/model-auto-conversion_hugging-face_complex-pipelines_hf-summarisation-bart-large-samsun.zip) and placed into the `./models` directory.
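The verification step can be scripted before attempting the upload. The sketch below uses only the standard library; `check_model_file` is a hypothetical helper, not part of the Wallaroo SDK. It confirms the archive exists and is a readable zip, then returns its SHA-256 digest.

```python
import hashlib
import zipfile
from pathlib import Path


def check_model_file(path: str) -> str:
    """Confirm the archive exists and is a readable zip; return its SHA-256 hex digest."""
    model_path = Path(path)
    if not model_path.is_file():
        raise FileNotFoundError(f"Model archive not found: {model_path}")
    if not zipfile.is_zipfile(model_path):
        raise ValueError(f"Not a valid zip archive: {model_path}")
    digest = hashlib.sha256()
    with model_path.open("rb") as handle:
        # Read in 1 MiB chunks so the large archive is never held fully in memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For this tutorial the call would be `check_model_file('./models/hf_summarization.zip')`. The returned digest might be compared with the `SHA` value displayed after upload, assuming that value is the SHA-256 of the uploaded file.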

In [7]:
input_schema = pa.schema([
    pa.field('inputs', pa.string()),
    pa.field('return_text', pa.bool_()),
    pa.field('return_tensors', pa.bool_()),
    pa.field('clean_up_tokenization_spaces', pa.bool_()),
    # pa.field('generate_kwargs', pa.map_(pa.string(), pa.null())), # dictionaries are not currently supported by the engine
])

output_schema = pa.schema([
    pa.field('summary_text', pa.string()),
])


model_name_arm = f'hf-summarizer-arm'
model_file_name = './models/hf_summarization.zip'

model_arm = wl.upload_model(model_name_arm, 
                        model_file_name, 
                        framework=wallaroo.framework.Framework.HUGGING_FACE_SUMMARIZATION, 
                        input_schema=input_schema, 
                        output_schema=output_schema,
                        arch=Architecture.ARM
                        )

Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a container runtime..
Model is attempting loading to a container runtime......................successful

Ready


In [8]:
display(model_arm)

| Field | Value |
|---|---|
| Name | hf-summarizer-arm |
| Version | 712b3023-afba-4b8b-ac63-fc2c1a59c903 |
| File Name | hf_summarization.zip |
| SHA | ee71d066a83708e7ca4a3c07caf33fdc528bb000039b6ca2ef77fa2428dc6268 |
| Status | ready |
| Image Path | proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs/mac-deploy:v2024.1.0-main-4870 |
| Architecture | arm |
| Acceleration | none |
| Updated At | 2024-03-Apr 21:42:17 |
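The input schema used for the upload declares one string field and three boolean fields. As a rough illustration of what a conforming input record looks like, the sketch below checks a plain Python dictionary against those field names and types. `EXPECTED_INPUT_FIELDS` and `validate_record` are hypothetical helpers written for this example, and the sample text is a placeholder.

```python
# Field names and types mirror the input_schema defined for the upload.
EXPECTED_INPUT_FIELDS = {
    "inputs": str,
    "return_text": bool,
    "return_tensors": bool,
    "clean_up_tokenization_spaces": bool,
}


def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record matches the schema."""
    problems = []
    for name, expected_type in EXPECTED_INPUT_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
    for name in record:
        if name not in EXPECTED_INPUT_FIELDS:
            problems.append(f"unexpected field: {name}")
    return problems


# Placeholder record; replace the text with the content to summarize.
sample = {
    "inputs": "Text to summarize goes here.",
    "return_text": True,
    "return_tensors": False,
    "clean_up_tokenization_spaces": False,
}
print(validate_record(sample))
```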


### Build Pipeline

We build the pipeline with the `wallaroo.client.build_pipeline(pipeline_name)` command, and set the model as a model step in the pipeline.

In [9]:
pipeline_arm = wl.build_pipeline('architecture-demonstration-arm')
pipeline_arm.add_model_step(model_arm)
pipeline_arm

| Field | Value |
|---|---|
| name | architecture-demonstration-arm |
| created | 2024-03-05 16:18:38.768602+00:00 |
| last_updated | 2024-04-03 21:42:44.091069+00:00 |
| deployed | False |
| arch | x86 |
| accel | none |
| tags | |
| versions | 77dd7f95-42b9-422d-a40e-6b678a00e7a8, 47258923-c616-471a-af49-f6504d3c0d22, 4e942b31-d34e-4764-a7fb-6dc27ac00a64, 88801051-5e25-4dda-a3bd-6e64b154f81e, 80c2e1fb-57ba-4ee8-a47b-b09494158769, bbdbc69d-7cc5-4f9b-a70f-6ebaef441075, 07b5ee82-95df-4f30-9128-f344a8df0625, d033152c-494c-44a6-8981-627c6b6ad72e |
| steps | house-price-estimator-arm |
| published | True |


In [16]:
pipeline_arm_version = pipeline_arm.create_version()

pipeline_arm

| Field | Value |
|---|---|
| name | architecture-demonstration-arm |
| created | 2024-03-05 16:18:38.768602+00:00 |
| last_updated | 2024-04-03 21:46:21.865211+00:00 |
| deployed | True |
| arch | arm |
| accel | none |
| tags | |
| versions | ae54ae3f-6c26-4584-b424-4c0207d95f3e, 77dd7f95-42b9-422d-a40e-6b678a00e7a8, 47258923-c616-471a-af49-f6504d3c0d22, 4e942b31-d34e-4764-a7fb-6dc27ac00a64, 88801051-5e25-4dda-a3bd-6e64b154f81e, 80c2e1fb-57ba-4ee8-a47b-b09494158769, bbdbc69d-7cc5-4f9b-a70f-6ebaef441075, 07b5ee82-95df-4f30-9128-f344a8df0625, d033152c-494c-44a6-8981-627c6b6ad72e |
| steps | hf-summarizer-arm |
| published | True |


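### Deploy Pipeline

We deploy the pipeline with a deployment configuration that allocates 0.25 CPU and 1 Gi of memory to the Wallaroo engine, and 4 CPUs and 8 Gi of memory to the container that runs the model.  The architecture is not set here; it is inherited from the model.
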
In [None]:
from wallaroo.deployment_config import DeploymentConfigBuilder

deployment_config = DeploymentConfigBuilder() \
    .cpus(0.25).memory('1Gi') \
    .sidekick_cpus(model_arm, 4) \
    .sidekick_memory(model_arm, "8Gi") \
    .build()

pipeline_arm.deploy(deployment_config=deployment_config)
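Before deploying, it can help to total the resources the configuration asks for and compare them against what the ARM nodepool offers. The sketch below only sums the two settings shown in the deployment configuration; `memory_to_bytes` is a hypothetical helper, and any additional overhead the cluster adds is not accounted for.

```python
# Binary unit suffixes used by Kubernetes-style memory strings.
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def memory_to_bytes(value: str) -> int:
    """Convert a memory string such as '8Gi' or '512Mi' to bytes."""
    for suffix, factor in UNITS.items():
        if value.endswith(suffix):
            return int(float(value[: -len(suffix)]) * factor)
    return int(value)  # no suffix: already a byte count


# Values copied from the deployment configuration in this tutorial.
engine = {"cpus": 0.25, "memory": "1Gi"}
model_container = {"cpus": 4, "memory": "8Gi"}

total_cpus = engine["cpus"] + model_container["cpus"]
total_memory_gib = (
    memory_to_bytes(engine["memory"]) + memory_to_bytes(model_container["memory"])
) / 2**30
print(total_cpus, total_memory_gib)  # 4.25 9.0
```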

In [17]:
display(pipeline_arm)

| Field | Value |
|---|---|
| name | architecture-demonstration-arm |
| created | 2024-03-05 16:18:38.768602+00:00 |
| last_updated | 2024-04-03 21:46:21.865211+00:00 |
| deployed | True |
| arch | arm |
| accel | none |
| tags | |
| versions | ae54ae3f-6c26-4584-b424-4c0207d95f3e, 77dd7f95-42b9-422d-a40e-6b678a00e7a8, 47258923-c616-471a-af49-f6504d3c0d22, 4e942b31-d34e-4764-a7fb-6dc27ac00a64, 88801051-5e25-4dda-a3bd-6e64b154f81e, 80c2e1fb-57ba-4ee8-a47b-b09494158769, bbdbc69d-7cc5-4f9b-a70f-6ebaef441075, 07b5ee82-95df-4f30-9128-f344a8df0625, d033152c-494c-44a6-8981-627c6b6ad72e |
| steps | hf-summarizer-arm |
| published | True |


## Pipeline Publish for ARM Architecture via the Wallaroo SDK

We now publish our pipeline to the OCI registry.

### Publish Pipeline for ARM

Publishing the pipeline uses the pipeline `wallaroo.pipeline.Pipeline.publish()` command.  This requires that the Wallaroo Ops instance have [Edge Registry Services](https://docs.wallaroo.ai/wallaroo-operations-guide/wallaroo-configuration/wallaroo-edge-deployment/#enable-wallaroo-edge-deployment-registry) enabled.

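When publishing, we specify the pipeline deployment configuration through the `wallaroo.DeploymentConfigBuilder`.  The example below uses the default deployment configuration; the architecture is inherited from the model, which was uploaded with `wallaroo.engine_config.Architecture.ARM`, and appears as `'arch': 'arm'` in the published engine configuration.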

The following publishes the pipeline to the OCI registry and displays the container details.  For more information, see [Wallaroo SDK Essentials Guide: Pipeline Edge Publication](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipelines/wallaroo-sdk-essentials-pipeline-publication/).

In [18]:
# default deployment configuration
assay_pub_arm = pipeline_arm.publish(deployment_config=wallaroo.DeploymentConfigBuilder().build())
assay_pub_arm

Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing................ Published.


| Field | Value |
|---|---|
| ID | 86 |
| Pipeline Name | architecture-demonstration-arm |
| Pipeline Version | fd5e3d64-9eea-492d-92b2-8bdb5b20ec83 |
| Status | Published |
| Engine URL | us-central1-docker.pkg.dev/wallaroo-dev-253816/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.1.0-main-4870 |
| Pipeline URL | us-central1-docker.pkg.dev/wallaroo-dev-253816/uat/pipelines/architecture-demonstration-arm:fd5e3d64-9eea-492d-92b2-8bdb5b20ec83 |
| Helm Chart URL | oci://us-central1-docker.pkg.dev/wallaroo-dev-253816/uat/charts/architecture-demonstration-arm |
| Helm Chart Reference | us-central1-docker.pkg.dev/wallaroo-dev-253816/uat/charts@sha256:7e2a314d9024cc2529be3e902eb24ac241f1e0819fc07e47bf26dd2e6e64f183 |
| Helm Chart Version | 0.0.1-fd5e3d64-9eea-492d-92b2-8bdb5b20ec83 |
| Engine Config | `{'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'none', 'arch': 'arm', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}}` |

Docker run command:

docker run \
  -p $EDGE_PORT:8080 \
  -e OCI_USERNAME=$OCI_USERNAME \
  -e OCI_PASSWORD=$OCI_PASSWORD \
  -e PIPELINE_URL=us-central1-docker.pkg.dev/wallaroo-dev-253816/uat/pipelines/architecture-demonstration-arm:fd5e3d64-9eea-492d-92b2-8bdb5b20ec83 \
  -e CONFIG_CPUS=1 us-central1-docker.pkg.dev/wallaroo-dev-253816/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.1.0-main-4870

Helm install command:

helm install --atomic $HELM_INSTALL_NAME \
  oci://us-central1-docker.pkg.dev/wallaroo-dev-253816/uat/charts/architecture-demonstration-arm \
  --namespace $HELM_INSTALL_NAMESPACE \
  --version 0.0.1-fd5e3d64-9eea-492d-92b2-8bdb5b20ec83 \
  --set ociRegistry.username=$OCI_USERNAME \
  --set ociRegistry.password=$OCI_PASSWORD
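The `docker run` command in the publish output relies on the `$EDGE_PORT`, `$OCI_USERNAME`, and `$OCI_PASSWORD` variables. As a sketch of how those values slot in, the following assembles the same command as an argument list in Python without running it. `build_docker_run` is a hypothetical helper, and the port and credential defaults are placeholders; the image and pipeline references are copied from the publish output.

```python
import os
import shlex

# Image and pipeline references copied from the publish output.
ENGINE_URL = (
    "us-central1-docker.pkg.dev/wallaroo-dev-253816/uat/engines/proxy/wallaroo/"
    "ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.1.0-main-4870"
)
PIPELINE_URL = (
    "us-central1-docker.pkg.dev/wallaroo-dev-253816/uat/pipelines/"
    "architecture-demonstration-arm:fd5e3d64-9eea-492d-92b2-8bdb5b20ec83"
)


def build_docker_run(edge_port: int, oci_username: str, oci_password: str) -> list:
    """Return the docker run argument list with the variables filled in."""
    return [
        "docker", "run",
        "-p", f"{edge_port}:8080",
        "-e", f"OCI_USERNAME={oci_username}",
        "-e", f"OCI_PASSWORD={oci_password}",
        "-e", f"PIPELINE_URL={PIPELINE_URL}",
        "-e", "CONFIG_CPUS=1",
        ENGINE_URL,
    ]


command = build_docker_run(
    edge_port=8080,  # placeholder host port
    oci_username=os.environ.get("OCI_USERNAME", "example-user"),
    oci_password=os.environ.get("OCI_PASSWORD", "example-password"),
)
# Printing shows the credentials in clear text; avoid this outside of a demo.
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` on the ARM edge device once real registry credentials are set in the environment.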


For details on performing inference requests through an edge deployed model, see [Edge Deployment Endpoints](https://staging.docs.wallaroo.ai/wallaroo-model-operations/wallaroo-model-operations-run-anywhere/wallaroo-model-operations-run-anywhere-deploy/#edge-deployment-endpoints).
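As a rough sketch of what such a request might look like, the following builds (but does not send) an HTTP POST with one input record shaped like the model's input schema. The endpoint path, host, port, and `Content-Type` value are assumptions made for this example and should be checked against the Edge Deployment Endpoints guide; the sample text is a placeholder.

```python
import json
import urllib.request

# Assumptions: the edge container is reachable on localhost at the host port
# mapped by `docker run`, and accepts pandas-record JSON at this path.
EDGE_URL = "http://localhost:8080/pipelines/architecture-demonstration-arm"

# One record with the four fields declared in the model's input schema.
records = [
    {
        "inputs": "Text to summarize goes here.",
        "return_text": True,
        "return_tensors": False,
        "clean_up_tokenization_spaces": False,
    }
]

body = json.dumps(records).encode("utf-8")
request = urllib.request.Request(
    EDGE_URL,
    data=body,
    headers={"Content-Type": "application/json; format=pandas-records"},
    method="POST",
)

# Sending is left commented out so the sketch runs without a live deployment.
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
print(request.get_method(), request.full_url)
```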