# Simple deployment and inference
In this notebook we will walk through a simple deployment and inference on a model. For this example we will use an open source Aloha CNN LSTM model for classifying domain names.
In this notebook we will take the following steps:
<ol>
    <li>Open a connection to Wallaroo</li>
    <li>Upload a model</li>
    <li>Create a deployment for inferencing on a model</li>
    <li>Run inferences</li>
</ol>

# Open a connection to Wallaroo
Here we will import the libraries needed for this notebook.

```python
import wallaroo
import os
```

```python
# Point the SDK at the credentials file, then log in with a username and password
os.environ["WALLAROO_SDK_CREDENTIALS"] = 'creds.json'
wl = wallaroo.Client(auth_type="user_password")
```

## Start client to cluster
With the client connected, we can now use the Wallaroo library to query the cluster, starting with a list of the existing workspaces.

```python
wl.list_workspaces()
```

| Name | Created At | Users | Models | Pipelines |
|---|---|---|---|---|
| demandcurve-workspace | 2022-03-28 16:22:42 | ['steve@ex.co'] | 3 | 1 |
| demandcurve-workspace | 2022-03-28 16:28:21 | ['steve@ex.co'] | 3 | 1 |
| imdb-workspace | 2022-03-28 17:44:32 | ['steve@ex.co'] | 0 | 0 |
| imdb-workspace | 2022-03-28 18:59:43 | ['steve@ex.co'] | 2 | 1 |
| imdb-workspace | 2022-03-28 19:23:04 | ['steve@ex.co'] | 2 | 1 |


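## Workspace demo
We'll create a new workspace named `aloha-workspace` and make it the current workspace, so that the model and pipeline created in the following steps are placed in it.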

```python
# Create a workspace and make it the current one
new_workspace = wl.create_workspace("aloha-workspace")
_ = wl.set_current_workspace(new_workspace)
```

We can verify that the workspace was created and is now the current workspace with the `get_current_workspace()` command.

```python
wl.get_current_workspace()
```

```
{'name': 'aloha-workspace', 'id': 6, 'archived': False, 'created_by': '7dbb3754-4c14-4730-8b77-33caeea7a2a0', 'created_at': '2022-03-29T16:14:08.85824+00:00', 'models': [], 'pipelines': []}
```

## Config
Before deploying an inference engine, we will set the engine's configuration.
To do this we will use the Wallaroo `DeploymentConfigBuilder()` and fill in the options listed below to determine the properties of our inference engine.

Note: this will not start building anything in the Kubernetes cluster yet; we are only setting the deployment configuration we will use later.
- `replica_count` - 1 => when deployed, this will have a single inference engine
- `cpus` - 4 => each inference engine will have 4 CPUs
- `memory` - 8Gi => each inference engine will have 8 GiB of memory
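To make the point about builders concrete, here is a minimal sketch of the fluent-builder pattern that the chained `DeploymentConfigBuilder()` calls follow: each setter records a value and returns the builder so calls can be chained, and `build()` only packages the values into a plain record without contacting anything. `SketchDeploymentConfigBuilder` and `SketchDeploymentConfig` are hypothetical stand-ins written for illustration, not Wallaroo's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchDeploymentConfig:
    """Hypothetical immutable record of the requested engine resources."""
    replica_count: Optional[int]
    cpus: Optional[int]
    memory: Optional[str]


class SketchDeploymentConfigBuilder:
    """Hypothetical stand-in for a fluent deployment-config builder."""

    def __init__(self):
        # Unset until a setter is called (not Wallaroo's real defaults)
        self._replica_count = None
        self._cpus = None
        self._memory = None

    def replica_count(self, n):
        self._replica_count = n
        return self  # returning self is what allows chaining

    def cpus(self, n):
        self._cpus = n
        return self

    def memory(self, quantity):
        self._memory = quantity
        return self

    def build(self):
        # Only packages the recorded values; nothing is deployed here
        return SketchDeploymentConfig(self._replica_count, self._cpus, self._memory)


config = SketchDeploymentConfigBuilder().replica_count(1).cpus(4).memory("8Gi").build()
print(config)
```

Because the result is a frozen dataclass, the configuration cannot be changed after `build()`; to change it you build a new one.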

# Recommendations
For this example we are going to create one deployment configuration:
 - `deployment_config`: uses a single replica with 4 CPUs and 8 GiB of memory, and will be used for our deployment later

```python
# One engine replica with 4 CPUs and 8Gi of memory; nothing is deployed yet
deployment_config = wallaroo.DeploymentConfigBuilder().replica_count(1).cpus(4).memory("8Gi").build()
```
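The `8Gi` memory value is written in Kubernetes resource-quantity notation, assuming Wallaroo passes it through to the cluster unchanged: `Gi` is a binary suffix (2^30 bytes), while `G` is decimal (10^9 bytes). The hypothetical helper `quantity_to_bytes()` below converts simple quantities to bytes to show the difference; it handles only integer values and a subset of the Kubernetes suffixes.

```python
# Binary (power-of-two) and decimal suffixes used in Kubernetes resource quantities
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a simple quantity such as '8Gi' or '512M' to a byte count.

    Only integer values with one of the suffixes above (or no suffix) are handled.
    """
    # Check two-letter suffixes first so 'Gi' is not mistaken for 'G'
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return int(quantity)


print(quantity_to_bytes("8Gi"))  # 8589934592
print(quantity_to_bytes("8G"))   # 8000000000
```

The gap between the two values (about 7%) is why `8Gi` is described as 8 GiB rather than 8 GB.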

# Upload models
To get started we will upload a model. In this example we upload a single model, `aloha-2`; once models are uploaded, we can select any of them and create a deployment with it.

Note that for this example we are uploading the model from a .zip file. The archive packages a TensorFlow model, which TensorFlow serializes in the [protobuf](https://developers.google.com/protocol-buffers) format, and `.configure("tensorflow")` tells Wallaroo which framework the model uses.
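If you want to check what the archive contains before uploading it, Python's standard `zipfile` module can list its members. The sketch below assumes the model is packaged as a TensorFlow SavedModel (a `saved_model.pb` file plus a `variables/` directory); the member names in the in-memory stand-in archive are hypothetical, and `looks_like_saved_model()` is only a heuristic.

```python
import io
import zipfile


def list_archive(path_or_file):
    """Return the member names stored in a .zip archive."""
    with zipfile.ZipFile(path_or_file) as archive:
        return archive.namelist()


def looks_like_saved_model(names):
    """Heuristic: a SavedModel directory holds saved_model.pb plus a variables/ folder."""
    has_pb = any(name.endswith("saved_model.pb") for name in names)
    has_vars = any("variables/" in name for name in names)
    return has_pb and has_vars


# Build a small in-memory archive with hypothetical member names so the sketch runs
# without the real file; pass "./aloha-cnn-lstm.zip" to list_archive() instead.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("aloha-cnn-lstm/saved_model.pb", b"")
    archive.writestr("aloha-cnn-lstm/variables/variables.index", b"")
buffer.seek(0)

names = list_archive(buffer)
print(names)
print(looks_like_saved_model(names))
```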

```python
# Upload the zipped model under the name "aloha-2" and mark it as a TensorFlow model
model = wl.upload_model("aloha-2", "./aloha-cnn-lstm.zip").configure("tensorflow")
```

# Viewing and selecting a model
Now that we have uploaded a model, we can list the models with the Wallaroo `list_models()` method, then view each model's name with `name()` and its last update time with `last_update_time()`. The output also includes the models uploaded to the other workspaces listed earlier.

```python
for m in wl.list_models():
    print("model name: " + m.name())
    print(str(m.last_update_time()))
    print("----------------------")
```

```
model name: aloha-2
2022-03-29 16:14:22.076073+00:00
----------------------
model name: smodel-o
2022-03-28 19:23:04.901949+00:00
----------------------
model name: embedder-o
2022-03-28 19:23:04.657393+00:00
----------------------
model name: smodel-o
2022-03-28 18:59:48.734772+00:00
----------------------
model name: embedder-o
2022-03-28 18:59:48.489813+00:00
----------------------
model name: postprocess
2022-03-28 16:28:22.167192+00:00
----------------------
model name: preprocess
2022-03-28 16:28:22.019745+00:00
----------------------
model name: demandcurve
2022-03-28 16:28:21.723933+00:00
----------------------
model name: postprocess
2022-03-28 16:23:30.958720+00:00
----------------------
model name: preprocess
2022-03-28 16:23:30.124241+00:00
----------------------
model name: demandcurve
2022-03-28 16:23:29.344038+00:00
----------------------
```
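Because names such as `smodel-o` and `demandcurve` appear more than once, it can help to pick the most recent upload for each name. The sketch below works on a few `(name, last_update_time)` pairs copied from the listing above; in a live session you could collect the same pairs from `m.name()` and `m.last_update_time()` in the loop above, skipping the string parsing if those values are already `datetime` objects. `newest_per_name()` is a hypothetical helper, not part of the Wallaroo SDK.

```python
from datetime import datetime

# (name, last_update_time) pairs copied from a few rows of the listing above
listing = [
    ("aloha-2", "2022-03-29 16:14:22.076073+00:00"),
    ("smodel-o", "2022-03-28 19:23:04.901949+00:00"),
    ("smodel-o", "2022-03-28 18:59:48.734772+00:00"),
    ("demandcurve", "2022-03-28 16:28:21.723933+00:00"),
    ("demandcurve", "2022-03-28 16:23:29.344038+00:00"),
]


def newest_per_name(rows):
    """Map each model name to its most recent last_update_time."""
    newest = {}
    for name, stamp in rows:
        updated = datetime.fromisoformat(stamp)  # parses the "+00:00" offset too
        if name not in newest or updated > newest[name]:
            newest[name] = updated
    return newest


for name, updated in newest_per_name(listing).items():
    print(f"{name}: {updated.isoformat()}")
```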


# Deploy a model
Now that we have a model we want to use, we will create a deployment for it.

The model was already marked as a TensorFlow model when we uploaded it with `.configure("tensorflow")`; here we give the deployment a name and the configuration we want it to use.

# Example

Now that our model is uploaded, we'll create a pipeline that can ingest the data, pass it through each of the pipeline's steps, and give us a final output. In this case, the pipeline has only one step: running the input data through the `aloha-2` model.
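Conceptually, a pipeline is an ordered list of steps where each step's output becomes the next step's input. The toy class below is a plain-Python illustration of that idea only; `ToyPipeline` and `fake_model` are hypothetical and do not reflect how Wallaroo runs pipeline steps internally.

```python
from typing import Any, Callable, List


class ToyPipeline:
    """Conceptual stand-in for a pipeline: an ordered list of steps."""

    def __init__(self, name: str):
        self.name = name
        self.steps: List[Callable[[Any], Any]] = []

    def add_step(self, step: Callable[[Any], Any]) -> "ToyPipeline":
        self.steps.append(step)
        return self

    def run(self, data: Any) -> Any:
        # Feed each step's output into the next step, in the order they were added
        for step in self.steps:
            data = step(data)
        return data


def fake_model(batch):
    """Hypothetical model step: returns one placeholder score per input row."""
    return [0.5 for _ in batch]


pipeline = ToyPipeline("aloha-test-demo").add_step(fake_model)
print(pipeline.run([[0, 0, 0], [0, 0, 1]]))  # [0.5, 0.5]
```

With more than one step, `run()` applies them in the order they were added, which is why a single-step pipeline is simply "run the model".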

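```python
aloha_pipeline = wl.build_pipeline('aloha-test-demo')
# Single step: run the input through the uploaded aloha-2 model
aloha_pipeline.add_model_step(model)
# Deploy with the configuration built earlier (1 replica, 4 CPUs, 8Gi memory)
aloha_pipeline.deploy(deployment_config=deployment_config)
```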

```
Waiting for deployment - this will take up to 45s ....... ok

{'name': 'aloha-test-demo', 'create_time': datetime.datetime(2022, 3, 29, 16, 15, 31, 638290, tzinfo=tzutc()), 'definition': "[{'ModelInference': {'models': [{'name': 'aloha-2', 'version': '496e6860-a658-4d35-8b55-0f8cc6ad6fde', 'sha': 'fd998cd5e4964bbbb4f8d29d245a8ac67df81b62be767afbceb96a03d1a01520'}]}}]"}
```

We can verify that the pipeline is running and list what models are associated with it.

```python
aloha_pipeline.status()
```

```
{'status': 'Running',
 'details': None,
 'engines': [{'ip': '10.12.1.236',
   'name': 'engine-864d86d898-k26hv',
   'status': 'Running',
   'reason': None,
   'pipeline_statuses': {'pipelines': [{'id': 'aloha-test-demo',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'aloha-2',
      'version': '496e6860-a658-4d35-8b55-0f8cc6ad6fde',
      'sha': 'fd998cd5e4964bbbb4f8d29d245a8ac67df81b62be767afbceb96a03d1a01520',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.12.1.235',
   'name': 'engine-lb-85846c64f8-dcj4f',
   'status': 'Running',
   'reason': None}]}
```
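When scripting deployments, you may want to check this status programmatically instead of reading it. The sketch below assumes `status()` returns a dict shaped like the output above; `all_running()` is a hypothetical helper, and `example_status` is a trimmed copy of that output. In a live session you could call `all_running(aloha_pipeline.status())` instead.

```python
def all_running(status: dict) -> bool:
    """True when the pipeline, every engine, every model and every load balancer report 'Running'."""
    if status.get("status") != "Running":
        return False
    for engine in status.get("engines", []):
        if engine.get("status") != "Running":
            return False
        for model in engine.get("model_statuses", {}).get("models", []):
            if model.get("status") != "Running":
                return False
    return all(lb.get("status") == "Running" for lb in status.get("engine_lbs", []))


# Trimmed copy of the status() output shown above
example_status = {
    "status": "Running",
    "engines": [
        {
            "name": "engine-864d86d898-k26hv",
            "status": "Running",
            "model_statuses": {"models": [{"name": "aloha-2", "status": "Running"}]},
        }
    ],
    "engine_lbs": [{"name": "engine-lb-85846c64f8-dcj4f", "status": "Running"}],
}

print(all_running(example_status))  # True
```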

# Successful deployment
Now that we have a deployment running, we can start running inferences.

* **Note**: If you receive an error about running out of resources, undeploy any other pipelines. The command below quickly undeploys every pipeline to regain resources; because it affects all pipelines, it should not be run in a production environment:

```python
# Undeploy every pipeline visible to this client (not for production use)
for p in wl.list_pipelines():
    p.undeploy()
```

## Infer 1 row

Now that the pipeline is deployed and our Aloha model is in place, we'll perform a test to verify that everything is running. We'll use the `infer_from_file` command to load a single encoded URL into the inference engine and print the results back out:
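The `original_data` echoed in the result of the cell below shows the input layout: a `text_input` key holding a list of integer rows, which begin with zero padding. The sketch below writes a file in that layout. It assumes the domain has already been encoded to integers (the encoding is not shown in this notebook), and `SEQUENCE_LENGTH`, `left_pad()`, `write_inference_file()` and the example codes are all hypothetical; use the length and encoding your model actually expects.

```python
import json
import os
import tempfile

SEQUENCE_LENGTH = 30  # hypothetical; use the length your model actually expects


def left_pad(codes, length=SEQUENCE_LENGTH):
    """Left-pad a list of integer codes with zeros up to `length`."""
    codes = list(codes)
    if len(codes) > length:
        raise ValueError(f"sequence of {len(codes)} codes exceeds length {length}")
    return [0] * (length - len(codes)) + codes


def write_inference_file(path, rows):
    """Write rows in the {"text_input": [[...], ...]} layout seen in original_data."""
    with open(path, "w") as f:
        json.dump({"text_input": [left_pad(row) for row in rows]}, f)


path = os.path.join(tempfile.mkdtemp(), "example-input.json")
write_inference_file(path, [[12, 5, 33]])  # hypothetical, already-encoded domain
with open(path) as f:
    print(json.load(f)["text_input"][0][-5:])  # [0, 0, 12, 5, 33]
```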

```python
aloha_pipeline.infer_from_file("data-1.json")
```

```
[InferenceResult({'check_failures': [],
  'elapsed': 631348351,
  'model_name': 'aloha-2',
  'model_version': '496e6860-a658-4d35-8b55-0f8cc6ad6fde',
  'original_data': {'text_input': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
...
```

## Run a larger batch

Now that our smoke test is successful, let's really give it some data. We have two inference files we can use:

* `data-1k.json`: Contains 1,000 inferences
* `data-25k.json`: Contains 25,000 inferences
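To confirm how many rows a batch file holds before sending it, you can load it with the standard `json` module. This sketch assumes the batch files use the same `text_input` layout as the single-row input; `count_inference_rows()` and the tiny stand-in file are hypothetical.

```python
import json
import os
import tempfile


def count_inference_rows(path, key="text_input"):
    """Return how many input rows a JSON inference file holds under `key`."""
    with open(path) as f:
        return len(json.load(f)[key])


# Tiny stand-in file; point count_inference_rows() at data-1k.json or data-25k.json instead
path = os.path.join(tempfile.mkdtemp(), "tiny.json")
with open(path, "w") as f:
    json.dump({"text_input": [[0, 1], [0, 2], [0, 3]]}, f)

print(count_inference_rows(path))  # 3
```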

We'll send one of these files, `data-25k.json`, to the `aloha_pipeline` deployment URL with `curl` and place the results in a file named `curl_response.txt`; `curl`'s progress meter also shows how long the transfer takes. Note that larger batches of 50,000 inferences or more can be difficult to view in Jupyter Hub because of their size.

When running this example, substitute the URL returned by the `_deployment._url()` command into the `curl` command below.

```python
aloha_pipeline._deployment._url()
```

```
'http://engine-lb.aloha-test-demo-5:29502/pipelines/aloha-test-demo'
```

```python
!curl -X POST http://engine-lb.aloha-test-demo-5:29502/pipelines/aloha-test-demo -H "Content-Type:application/json" --data @data-25k.json > curl_response.txt
```

```
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.9M  100 10.1M  100 2886k   539k   149k  0:00:19  0:00:19 --:--:-- 2570k
```
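If you would rather stay in Python than shell out to `curl`, the standard `urllib.request` module can send the same POST. This is a sketch, not a Wallaroo API: `build_inference_request()` and `send_batch()` are hypothetical helpers, and the URL is the one returned by `_deployment._url()` above, which is likely only resolvable from inside the cluster's network.

```python
import time
import urllib.request


def build_inference_request(url: str, payload: bytes) -> urllib.request.Request:
    """Build a POST request with a JSON body, mirroring the curl flags above."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_batch(url: str, input_path: str, output_path: str) -> float:
    """POST the file at input_path, save the raw response, and return elapsed seconds."""
    with open(input_path, "rb") as f:
        request = build_inference_request(url, f.read())
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        body = response.read()
    elapsed = time.perf_counter() - start
    with open(output_path, "wb") as out:
        out.write(body)
    return elapsed


# Building the request does not contact the server
url = "http://engine-lb.aloha-test-demo-5:29502/pipelines/aloha-test-demo"
req = build_inference_request(url, b"{}")
print(req.get_method(), req.get_header("Content-type"))  # POST application/json
```

For example, `send_batch(url, "data-25k.json", "curl_response.txt")` would write the raw response to the same file the `curl` command uses and return the elapsed time in seconds.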


## Undeploy model
This will take down our inference engine and free up its resources in Kubernetes.
- Note that as long as the `aloha_pipeline` and `deployment_config` variables are unchanged, `aloha_pipeline.deploy(deployment_config=deployment_config)` will restart the inference engine in the same configuration.

```python
aloha_pipeline.undeploy()
```

```
{'name': 'aloha-test-demo', 'create_time': datetime.datetime(2022, 3, 29, 16, 15, 31, 638290, tzinfo=tzutc()), 'definition': "[{'ModelInference': {'models': [{'name': 'aloha-2', 'version': '496e6860-a658-4d35-8b55-0f8cc6ad6fde', 'sha': 'fd998cd5e4964bbbb4f8d29d245a8ac67df81b62be767afbceb96a03d1a01520'}]}}]"}
```