This tutorial can be downloaded as part of the [Wallaroo Tutorials repository](https://github.com/WallarooLabs/Wallaroo_Tutorials/blob/wallaroo2025.2_tutorials/wallaroo-model-operations-tutorials/deploy/by-framework/python-models).

## Python Model Upload to Wallaroo

Python scripts can be deployed to Wallaroo as Python Models.  These are treated like other models, and are used for:

* ML Models: Models written entirely in Python script.
* Data Formatting: Typically pre-processing or post-processing modules that either shape incoming data into what an ML model expects, or take the data output by an ML model and reshape it for other processes to accept.

Models are added to Wallaroo pipelines as pipeline steps, with the data from the previous step submitted to the next one.  Python steps require the entry method `wallaroo_json`.  These methods should be structured to receive and send pandas DataFrames as the inputs and outputs.

This allows a Wallaroo pipeline to accept inference requests as pandas DataFrames or Apache Arrow tables, and to return results in the same format for consistency.
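
The idea of steps handing data to one another can be sketched in plain Python. This is only a conceptual illustration, not the Wallaroo API: `fake_model_step`, `rename_step`, and `run_steps` are hypothetical names, and plain dictionaries of lists stand in for DataFrames.

```python
from typing import Callable, Dict, List

# A "step" here is any function that takes column data and returns column data.
Step = Callable[[Dict[str, list]], Dict[str, list]]


def fake_model_step(data: Dict[str, list]) -> Dict[str, list]:
    # Hypothetical stand-in for an ML model: sums each input row
    # and writes the result to a 'dense_2' column.
    return {"dense_2": [[sum(row)] for row in data["tensor"]]}


def rename_step(data: Dict[str, list]) -> Dict[str, list]:
    # Hypothetical post-processing step: expose 'dense_2' as 'output'.
    return {"output": data["dense_2"]}


def run_steps(steps: List[Step], data: Dict[str, list]) -> Dict[str, list]:
    # The output of each step becomes the input of the next one.
    for step in steps:
        data = step(data)
    return data


result = run_steps([fake_model_step, rename_step], {"tensor": [[1.0, 2.0, 3.0]]})
print(result)
```

The order of the list passed to `run_steps` matters in the same way the order of pipeline steps does: the second step only works because the first one produces the column it expects.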

This tutorial will:

* Create a Wallaroo workspace and pipeline.
* Upload the sample Python model and ONNX model.
* Demonstrate the outputs of the ONNX model to an inference request.
* Demonstrate the functionality of the Python model in reshaping data after an inference request.
* Use both the ONNX model and the Python model together as pipeline steps to perform an inference request and export the data for use.

### Prerequisites

* A Wallaroo instance, version 2023.2.1 or above.

### References

* [Wallaroo SDK Essentials Guide: Pipeline Management](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipelines/wallaroo-sdk-essentials-pipeline/)
* [Python Models](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-model-uploads/wallaroo-sdk-model-upload-python/)

## Tutorial Steps

### Import Libraries

We'll start with importing the libraries we need for the tutorial.  The main libraries used are:

* Wallaroo: To connect with the Wallaroo instance and perform the MLOps commands.
* `pyarrow`: Used for formatting the data.
* `pandas`: Used for pandas DataFrame tables.

In [161]:
import wallaroo
from wallaroo.object import EntityNotFoundError
from wallaroo.framework import Framework
from wallaroo.deployment_config import DeploymentConfigBuilder

import datetime

import pandas as pd

import pyarrow as pa

### Connect to the Wallaroo Instance through the User Interface

The next step is to connect to Wallaroo through the Wallaroo client.  The Python library is included in the Wallaroo install and available through the JupyterHub interface provided with your Wallaroo environment.

This is accomplished using the `wallaroo.Client()` command, which provides a URL to grant the SDK permission to your specific Wallaroo environment.  When displayed, enter the URL into a browser and confirm permissions.  Store the connection into a variable that can be referenced later.

If logging into the Wallaroo instance through the internal JupyterHub service, use `wl = wallaroo.Client()`.  For more information on Wallaroo Client settings, see the [Client Connection guide](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-client/).

In [162]:
# Login through local Wallaroo instance

wl = wallaroo.Client(request_timeout=1000)

### Set Variables

We'll set the names of our workspace, pipeline, models and files.  Workspace names must be unique across the Wallaroo instance.  One way to prevent collisions with other users' workspaces is to add a few randomly generated characters to the workspace name.  This tutorial instead hard codes a dated suffix, which we recommend so the tutorial functions in the same workspace each time it's run.



In [None]:
workspace_name = 'python-demo-20250303'
pipeline_name = 'python-step-demo-20250303'

onnx_model_name = 'house-price-sample'
onnx_model_file_name = './models/house_price_keras.onnx'
python_model_name = 'python-step-20250303'
python_model_file_name = './models/step.zip'
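
If a random suffix is preferred over a hard-coded name, a minimal sketch using only the standard library might look like the following. The helper `random_suffix` and the variable `example_workspace_name` are hypothetical names and not part of the Wallaroo SDK; the sketch also assumes that lowercase letters are acceptable in a workspace name.

```python
import random
import string


def random_suffix(length: int = 4) -> str:
    # Build a short string of random lowercase letters.
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


suffix = random_suffix()
# Hypothetical workspace name; a new suffix means a new workspace on every run.
example_workspace_name = f"python-demo-{suffix}"
print(example_workspace_name)
```

Because the suffix changes on every run, this approach creates a fresh workspace each time, which is the trade-off against the hard-coded name used in this tutorial.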

### Create a New Workspace

For our tutorial, we'll create the workspace, set it as the current workspace, then build the pipeline we'll add our models to.

#### Create New Workspace References

* [Wallaroo SDK Essentials Guide: Workspace Management](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-workspace/)
* [Wallaroo SDK Essentials Guide: Pipeline Management](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-essentials-pipelines/wallaroo-sdk-essentials-pipeline/)

In [164]:
workspace = wl.get_workspace(name=workspace_name, create_if_not_exist=True)

wl.set_current_workspace(workspace)

pipeline = wl.build_pipeline(pipeline_name)
pipeline

| Field | Value |
|---|---|
| name | python-step-demo-20250303 |
| created | 2026-02-23 15:26:42.351665+00:00 |
| last_updated | 2026-02-23 15:26:42.351665+00:00 |
| deployed | (none) |
| workspace_id | 1859 |
| workspace_name | python-demo-20250303 |
| arch | |
| accel | none |
| tags | |
| versions | 2d533a86-e6d6-412f-aa0f-579913025e1c |


### Model Descriptions

We have two models we'll be using.

* `./models/house_price_keras.onnx`:  An ML model trained to forecast house prices based on inputs.  This forecast is stored in the column `dense_2`.
* `./models/step.py`: A Python script that accepts the data from the house price model, and reformats the output. We'll be using it as a post-processing step.

The Python step defines an entry-point function that Wallaroo calls when the script is deployed as a pipeline step.  In our sample script this function is `process_data`:

```python
def process_data(input_data: InferenceData) -> InferenceData:
    # just changing the output data field to 'output'
    return {
        'output': np.float32(input_data.pop('dense_2'))
    }
```

As the code shows, all this function does is take the output of the house price model and return the values from the column `dense_2`, cast to `float32`, under the new field name `output`.
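
To try the same idea outside of Wallaroo, here is a self-contained sketch under stated assumptions. The `InferenceData` alias below is a local stand-in defined only for this example (the sample script gets its own type from elsewhere), and `np.asarray(..., dtype=np.float32)` is used for the cast as a variation on the sample's `np.float32(...)` call.

```python
from typing import Dict

import numpy as np

# Assumption: a local alias standing in for the InferenceData type
# used by the sample script.
InferenceData = Dict[str, np.ndarray]


def process_data(input_data: InferenceData) -> InferenceData:
    # Remove 'dense_2' from the input and return its values,
    # cast to float32, under the field name 'output'.
    return {"output": np.asarray(input_data.pop("dense_2"), dtype=np.float32)}


# Sample value taken from the house price model output shown in this tutorial.
sample_input = {"dense_2": np.array([[12.886651]])}
result = process_data(sample_input)
print(result["output"].dtype, result["output"].shape)
```

Note that `pop` mutates the dictionary passed in, so `sample_input` no longer contains `dense_2` after the call.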

### Upload Models

Both of these models will be uploaded to our current workspace using the method `upload_model(name, path, framework).configure(framework, input_schema, output_schema)`.

* For `./models/house_price_keras.onnx`, we will specify it as `Framework.ONNX`.  We do not need to specify the input and output schemas.
* For `./models/step.py`, we will set the input and output schemas in the required `pyarrow.lib.Schema` format.

#### Upload Model References

* [Wallaroo SDK Essentials Guide: Model Uploads and Registrations: ONNX](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-model-uploads/wallaroo-sdk-model-upload-onnx/)
* [Wallaroo SDK Essentials Guide: Model Uploads and Registrations: Python Models](https://docs.wallaroo.ai/wallaroo-developer-guides/wallaroo-sdk-guides/wallaroo-sdk-essentials-guide/wallaroo-sdk-model-uploads/wallaroo-sdk-model-upload-python/)

In [165]:
house_price_model = (
    wl.upload_model(onnx_model_name,
                    onnx_model_file_name,
                    framework=Framework.ONNX)
      .configure('onnx', tensor_fields=["tensor"])
)

### Pipeline Steps

With our models uploaded, we'll perform different configurations of the pipeline steps.

First we'll add just the house price model to the pipeline, deploy it, and submit a sample inference.

In [166]:
# used to restrict the resources needed for this demonstration
deployment_config = DeploymentConfigBuilder() \
    .cpus(0.25).memory('1Gi') \
    .build()

In [167]:
# clear the pipeline if this tutorial was run before
pipeline.undeploy()
pipeline.clear()

| Field | Value |
|---|---|
| name | python-step-demo-20250303 |
| created | 2026-02-23 15:26:42.351665+00:00 |
| last_updated | 2026-02-23 15:26:42.351665+00:00 |
| deployed | (none) |
| workspace_id | 1859 |
| workspace_name | python-demo-20250303 |
| arch | |
| accel | none |
| tags | |
| versions | 2d533a86-e6d6-412f-aa0f-579913025e1c |


In [168]:
pipeline.add_model_step(house_price_model).deploy(deployment_config=deployment_config)

| Field | Value |
|---|---|
| name | python-step-demo-20250303 |
| created | 2026-02-23 15:26:42.351665+00:00 |
| last_updated | 2026-02-23 15:26:45.894300+00:00 |
| deployed | True |
| workspace_id | 1859 |
| workspace_name | python-demo-20250303 |
| arch | x86 |
| accel | none |
| tags | |
| versions | 22e9e1d4-f441-4a13-a3df-ff87235288c0, 2d533a86-e6d6-412f-aa0f-579913025e1c |


In [169]:
pipeline.steps()

[{'ModelInference': {'models': [{'name': 'house-price-sample', 'version': '17478831-2a2e-4152-892e-4a8ae0e53b5b', 'sha': '809c9f9a3016e5ab2190900d5fcfa476ee7411aa7a9ac5d4041d1cbe874cf8b9'}]}}]

In [170]:
pipeline.status()

{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.4.2.12',
   'name': 'engine-96f7d9b67-qtd7n',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'python-step-demo-20250303',
      'status': 'Running',
      'version': '22e9e1d4-f441-4a13-a3df-ff87235288c0'}]},
   'model_statuses': {'models': [{'model_version_id': 1453,
      'name': 'house-price-sample',
      'sha': '809c9f9a3016e5ab2190900d5fcfa476ee7411aa7a9ac5d4041d1cbe874cf8b9',
      'status': 'Running',
      'version': '17478831-2a2e-4152-892e-4a8ae0e53b5b'}]}}],
 'engine_lbs': [{'ip': '10.4.2.13',
   'name': 'engine-lb-54b6db469f-zlh52',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': []}

In [171]:
## sample inference data

data = pd.DataFrame.from_dict({"tensor": [[0.6878518042239091,
                                            0.17607340208535074,
                                            -0.8695140830357148,
                                            0.34638762962802144,
                                            -0.0916270832672289,
                                            -0.022063226781124278,
                                            -0.13969884765926363,
                                            1.002792335666138,
                                            -0.3067449033633758,
                                            0.9272000630461978,
                                            0.28326687982544635,
                                            0.35935375728372815,
                                            -0.682562654045523,
                                            0.532642794275658,
                                            -0.22705189652659302,
                                            0.5743846356405602,
                                            -0.18805086358065454
                                            ]]})

results = pipeline.infer(data)
display(results)

| | time | in.tensor | out.dense_2 | anomaly.count |
|---|---|---|---|---|
| 0 | 2026-02-23 15:27:03.868 | [0.6878518042, 0.1760734021, -0.869514083, 0.3...] | [12.886651] | 0 |
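
The prediction sits inside a list in the `out.dense_2` column, so reading the number itself takes one extra indexing step. The sketch below rebuilds a small DataFrame by hand from the values displayed above (leaving out the truncated `in.tensor` column) purely to show that access pattern; it does not call Wallaroo.

```python
import pandas as pd

# Hand-built stand-in for the inference result shown above.
results = pd.DataFrame({
    "time": [pd.Timestamp("2026-02-23 15:27:03.868")],
    "out.dense_2": [[12.886651]],
    "anomaly.count": [0],
})

# Each cell of 'out.dense_2' is a list; take the first element of row 0.
prediction = results.loc[0, "out.dense_2"][0]
print(prediction)
```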


In [172]:
pipeline.undeploy()

| Field | Value |
|---|---|
| name | python-step-demo-20250303 |
| created | 2026-02-23 15:26:42.351665+00:00 |
| last_updated | 2026-02-23 15:26:45.894300+00:00 |
| deployed | False |
| workspace_id | 1859 |
| workspace_name | python-demo-20250303 |
| arch | x86 |
| accel | none |
| tags | |
| versions | 22e9e1d4-f441-4a13-a3df-ff87235288c0, 2d533a86-e6d6-412f-aa0f-579913025e1c |


### Inference with Pipeline Step

Our inference result had the results in the `out.dense_2` column.  We'll clear the pipeline, then add only the Python post-processing step we've created as the pipeline step.  For our inference request, we'll submit just the output of the house price model.  Our result should be the first element in the array returned in the `out.output` column.

In [173]:
input_schema = pa.schema([
    pa.field('dense_2', pa.list_(pa.float32()))
])
output_schema = pa.schema([
    pa.field('output', pa.list_(pa.float32()))
])

step = (wl.upload_model(python_model_name, 
                        python_model_file_name, 
                        framework=Framework.PYTHON,
                        input_schema=input_schema, 
                        output_schema=output_schema
                       )

       )

Waiting for model loading - this will take up to 10min.
Model is pending loading to a container runtime.................................................................................................................


ERROR!
There was an error during model conversion: Model failed to convert: None
You can use model.get_upload_logs() to get more details.


In [180]:
step.get_upload_logs()

### Putting Both Models Together

Now we'll do one last pipeline deployment with 2 steps:

* First the house price model that outputs the inference result into `dense_2`.
* Second the Python step, which accepts the output of the house price model and reshapes it into `output`.

In [174]:
import time

inference_start = datetime.datetime.now()
# give enough time to differentiate between inferences
time.sleep(20)
pipeline.clear()
pipeline.add_model_step(house_price_model)
pipeline.add_model_step(step)

| Field | Value |
|---|---|
| name | python-step-demo-20250303 |
| created | 2026-02-23 15:26:42.351665+00:00 |
| last_updated | 2026-02-23 15:26:45.894300+00:00 |
| deployed | False |
| workspace_id | 1859 |
| workspace_name | python-demo-20250303 |
| arch | x86 |
| accel | none |
| tags | |
| versions | 22e9e1d4-f441-4a13-a3df-ff87235288c0, 2d533a86-e6d6-412f-aa0f-579913025e1c |


In [175]:
pipeline.undeploy()
pipeline.deploy(deployment_config=deployment_config)

| Field | Value |
|---|---|
| name | python-step-demo-20250303 |
| created | 2026-02-23 15:26:42.351665+00:00 |
| last_updated | 2026-02-23 15:38:21.939118+00:00 |
| deployed | True |
| workspace_id | 1859 |
| workspace_name | python-demo-20250303 |
| arch | x86 |
| accel | none |
| tags | |
| versions | 507d12e6-e7c3-41f5-a032-f9fd4737e5f3, 22e9e1d4-f441-4a13-a3df-ff87235288c0, 2d533a86-e6d6-412f-aa0f-579913025e1c |


In [176]:
pipeline.status()

{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.4.2.14',
   'name': 'engine-6f4b487d6c-f74hk',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'python-step-demo-20250303',
      'status': 'Running',
      'version': '507d12e6-e7c3-41f5-a032-f9fd4737e5f3'}]},
   'model_statuses': {'models': [{'model_version_id': 1453,
      'name': 'house-price-sample',
      'sha': '809c9f9a3016e5ab2190900d5fcfa476ee7411aa7a9ac5d4041d1cbe874cf8b9',
      'status': 'Running',
      'version': '17478831-2a2e-4152-892e-4a8ae0e53b5b'},
     {'model_version_id': 1454,
      'name': 'python-step',
      'sha': 'fd58a900c0d34b07a9bfd9e78fd1343198a757f14bede6a20212eeef149840a3',
      'status': 'Running',
      'version': 'e303e278-d00a-4931-b6a8-6e453cba2c93'}]}}],
 'engine_lbs': [{'ip': '10.4.2.15',
   'name': 'engine-lb-54b6db469f-vgdlb',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': []}

In [179]:
data = pd.DataFrame.from_dict({"tensor": [[0.6878518042239091,
                                            0.17607340208535074,
                                            -0.8695140830357148,
                                            0.34638762962802144,
                                            -0.0916270832672289,
                                            -0.022063226781124278,
                                            -0.13969884765926363,
                                            1.002792335666138,
                                            -0.3067449033633758,
                                            0.9272000630461978,
                                            0.28326687982544635,
                                            0.35935375728372815,
                                            -0.682562654045523,
                                            0.532642794275658,
                                            -0.22705189652659302,
                                            0.5743846356405602,
                                            -0.18805086358065454
                                        ]]})

results = pipeline.infer(data)
display(results)

InferenceTimeoutError: Inference failed: Inference did not return within 15s. Adjust the timeout if necessary.

### Pipeline Logs

As the data was exported by the pipeline step as a pandas DataFrame, it will be reflected in the pipeline logs.  We'll retrieve the most recent log from our most recent inference.

In [None]:
inference_end = datetime.datetime.now()

pipeline.logs(start_datetime=inference_start, end_datetime=inference_end)
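
The start and end timestamps define a window, and only records that fall inside it are of interest. The sketch below shows that windowing idea on hand-made records using the standard library; the records, their values, and the helper `in_window` are all hypothetical and do not reflect how Wallaroo stores or filters logs internally.

```python
import datetime

# Hypothetical stand-in records; real pipeline logs come from Wallaroo.
base = datetime.datetime(2026, 2, 23, 15, 27, 0)
records = [
    {"time": base + datetime.timedelta(seconds=0), "label": "earlier run"},
    {"time": base + datetime.timedelta(seconds=30), "label": "this run"},
    {"time": base + datetime.timedelta(seconds=90), "label": "later run"},
]


def in_window(records, start, end):
    # Keep only the records whose timestamp lies between start and end.
    return [r for r in records if start <= r["time"] <= end]


window_start = base + datetime.timedelta(seconds=20)
window_end = base + datetime.timedelta(seconds=60)
selected = in_window(records, window_start, window_end)
print([r["label"] for r in selected])
```

This is why the tutorial pauses before the final inference: the gap makes it easier to pick a window that contains only the records from that run.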

### Undeploy the Pipeline

With our tutorial complete, we'll undeploy the pipeline and return the resources back to the cluster.

This process demonstrated how to structure a post-processing Python script as a Wallaroo pipeline step.  This can be used for pre- or post-processing, Python-based models, and other use cases.

In [None]:
pipeline.undeploy()