# Stage 3: Deploy the model pipeline in Wallaroo
 
Here we upload the trained model and the processing steps to Wallaroo, then set up and deploy the inference pipeline. 
We'll then feed the newest batch of data to the pipeline, run the inferences, and write the results to a results table.

We'd expect to run this notebook in conjunction with the Stage 2 notebook, `training_notebook_final_1b`; for clarity in this demo, we have split the training/upload task into two notebooks. Assuming no changes are made to the structure of the model, these two notebooks, or a script based on them, can then be scheduled to run regularly to refresh the model with more recent training data and update the inference pipeline.

## Open a Connection to Wallaroo


In [1]:
import json
import pickle
import wallaroo
import pandas as pd
import numpy as np

import simdb # demo-only module that simulates pulling data from a database

from wallaroo.ModelConversion import ConvertXGBoostArgs, ModelConversionSource, ModelConversionInputType
from wallaroo.object import EntityNotFoundError

from wallaroo_client import get_workspace

In [2]:
wl = wallaroo.Client()

Please log into the following URL in a web browser:

	https://sparkly-apple-3026.keycloak.wallaroo.community/auth/realms/master/device?user_code=HMRC-LUIO

Login successful!


In [4]:
new_workspace = get_workspace("housepricing", create_if_none=True)
_ = wl.set_current_workspace(new_workspace)
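
The `get_workspace` helper comes from the local `wallaroo_client` module, whose source is not shown in this notebook. As a rough idea of what a find-or-create helper like it might do, here is a minimal sketch. It assumes the client exposes `list_workspaces()` and `create_workspace(name)` and that workspace objects have a `name()` accessor. The function name `find_or_create_workspace`, its explicit `client` argument, the `LookupError`, and both stub classes are hypothetical and exist only so the sketch runs without a Wallaroo connection.

```python
class StubWorkspace:
    """Hypothetical stand-in for a Wallaroo workspace object."""

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name


class StubClient:
    """Hypothetical stand-in for wallaroo.Client, for offline illustration only."""

    def __init__(self):
        self._workspaces = []

    def list_workspaces(self):
        return list(self._workspaces)

    def create_workspace(self, name):
        ws = StubWorkspace(name)
        self._workspaces.append(ws)
        return ws


def find_or_create_workspace(client, name, create_if_none=False):
    """Return the workspace called `name`, optionally creating it if absent."""
    for ws in client.list_workspaces():
        if ws.name() == name:
            return ws
    if create_if_none:
        return client.create_workspace(name)
    # Assumption: the real helper may raise a different error type here.
    raise LookupError(f"workspace {name!r} not found")


stub = StubClient()
first = find_or_create_workspace(stub, "housepricing", create_if_none=True)
second = find_or_create_workspace(stub, "housepricing")
print(first is second)
```

The second call finds the workspace created by the first, so re-running the notebook with this kind of helper would reuse the existing workspace rather than create a duplicate.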

## Upload model to Wallaroo (with autoconversion)

In [5]:
import preprocess  # imported only to read the number of input variables

# number of feature columns passed to the model conversion
NF = len(preprocess._vars)

model_name = "houseprice-model"

model_conversion_args = ConvertXGBoostArgs(
    name=model_name,
    comment="house price prediction",
    number_of_columns=NF,
    input_type=ModelConversionInputType.Float32
)
model_conversion_type = ModelConversionSource.XGBOOST

In [6]:
# convert the pickled XGBoost model and upload it to Wallaroo
modelfile = "./housing_model_xgb.pkl"
hpmodel = wl.convert_model(modelfile, model_conversion_type, model_conversion_args)

## Upload processing modules

In [7]:
# upload the preprocess module and configure it as a Python step
module_pre = wl.upload_model("preprocess", "./preprocess.py").configure('python')

In [8]:
# upload the postprocess module and configure it as a Python step
module_post = wl.upload_model("postprocess", "./postprocess.py").configure('python')

## Create pipeline

In [9]:
pipeline = (wl.build_pipeline('housing-pipe')
              .add_model_step(module_pre)
              .add_model_step(hpmodel)
              .add_model_step(module_post)
              .deploy()
           )
pipeline

Waiting for deployment - this will take up to 45s ................ ok


| Field | Value |
|---|---|
| name | housing-pipe |
| created | 2022-09-27 18:00:24.186536+00:00 |
| last_updated | 2022-09-27 18:00:24.396459+00:00 |
| deployed | True |
| tags | |
| steps | preprocess |


## Test pipeline

We'll grab a single row from the `house_listings` table and run it through the pipeline.

In [10]:
conn = simdb.simulate_db_connection()

# create the query
query = f"select * from {simdb.tablename} limit 1"
print(query)

# read in the data
singleton = pd.read_sql_query(query, conn)
conn.close()

singleton

select * from house_listings limit 1


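| | id | date | list_price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | ... | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 | sale_price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7129300520 | 2022-02-13 | 221900.0 | 3 | 1.0 | 1180 | 5650 | 1.0 | 0 | 0 | ... | 1180 | 0 | 1955 | 0 | 98178 | 47.5112 | -122.257 | 1340 | 5650 | 221900.0 |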


In [11]:
result = pipeline.infer({'query': singleton.to_json()})
result

[InferenceResult({'check_failures': [],
  'elapsed': 516786,
  'model_name': 'postprocess',
  'model_version': 'ddde0d16-7a2d-48ba-94eb-22447d4c05fe',
  'original_data': {'query': '{"id":{"0":7129300520},"date":{"0":"2022-02-13"},"list_price":{"0":221900.0},"bedrooms":{"0":3},"bathrooms":{"0":1.0},"sqft_living":{"0":1180},"sqft_lot":{"0":5650},"floors":{"0":1.0},"waterfront":{"0":0},"view":{"0":0},"condition":{"0":3},"grade":{"0":7},"sqft_above":{"0":1180},"sqft_basement":{"0":0},"yr_built":{"0":1955},"yr_renovated":{"0":0},"zipcode":{"0":98178},"lat":{"0":47.5112},"long":{"0":-122.257},"sqft_living15":{"0":1340},"sqft_lot15":{"0":5650},"sale_price":{"0":221900.0}}'},
  'outputs': [{'Json': {'data': [{'prediction': [224852.0]}],
                        'dim': [1],
                        'v': 1}}],
  'pipeline_name': 'housing-pipe',
  'shadow_data': {},
  'time': 1664302139117})]
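
The `query` field carries the string produced by `singleton.to_json()`. With pandas' default orientation, a DataFrame is serialized column-first: each column name maps to an `{index: value}` dictionary, which is the shape visible in `original_data` above. The sketch below uses only the standard `json` module to turn such a payload back into row records. The shortened payload and the function name `rows_from_column_json` are illustrative; the actual `preprocess.py` is not shown here and may read the payload differently (for example with pandas).

```python
import json

# Shortened copy of the payload shown in 'original_data' (a few columns only).
query = (
    '{"id":{"0":7129300520},"bedrooms":{"0":3},'
    '"bathrooms":{"0":1.0},"sqft_living":{"0":1180}}'
)


def rows_from_column_json(payload):
    """Convert column-oriented JSON ({column: {index: value}}) to a list of row dicts."""
    columns = json.loads(payload)
    # Collect row labels in first-seen order across all columns.
    labels = []
    for values in columns.values():
        for label in values:
            if label not in labels:
                labels.append(label)
    return [
        {col: values.get(label) for col, values in columns.items()}
        for label in labels
    ]


rows = rows_from_column_json(query)
print(rows)
```

A single-row query yields a one-element list, with one key per column of the original row.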

In [12]:
# extract just the prediction from the first result
result[0].data()

[array([224852.])]
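
The same number appears in the raw result printed earlier, nested under `outputs`, `Json`, `data`, and `prediction`. Purely to illustrate that layout, the sketch below walks a trimmed, hand-copied dict. `predictions_from_raw` is a hypothetical helper, and nothing here is a claim about how `InferenceResult.data()` works internally.

```python
# Trimmed copy of the InferenceResult contents printed earlier, as a plain dict.
raw = {
    "check_failures": [],
    "model_name": "postprocess",
    "outputs": [
        {"Json": {"data": [{"prediction": [224852.0]}], "dim": [1], "v": 1}}
    ],
    "pipeline_name": "housing-pipe",
}


def predictions_from_raw(result):
    """Flatten every 'prediction' list found under outputs -> Json -> data."""
    values = []
    for output in result["outputs"]:
        for record in output["Json"]["data"]:
            values.extend(record["prediction"])
    return values


print(predictions_from_raw(raw))
```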

In [13]:
pipeline.undeploy()

Waiting for undeployment - this will take up to 45s ..................................... ok


| Field | Value |
|---|---|
| name | housing-pipe |
| created | 2022-09-27 18:00:24.186536+00:00 |
| last_updated | 2022-09-27 18:00:24.396459+00:00 |
| deployed | False |
| tags | |
| steps | preprocess |
