Welcome to the Wallaroo `ccfraud` model example!  This example will demonstrate how to use Wallaroo to detect credit card fraud through a trained model and sample data.  By the end of this example, you'll be able to:

* Start the Wallaroo client.
* Create a workspace.
* Upload the credit card fraud detection model to the workspace.
* Create a new pipeline and add our credit card fraud detection model to it as a step.
* Run a smoke test to verify the pipeline and model are working properly.
* Perform a bulk inference and display the results.
* Undeploy the pipeline to get back the resources from our Kubernetes cluster.

This example and sample data come from the Machine Learning Group's demonstration on [Credit Card Fraud detection](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud).

## Open a Connection to Wallaroo

The first step is to connect to Wallaroo through the Wallaroo client.  The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.

This is accomplished using the `wallaroo.Client()` command, which displays a URL used to grant the SDK access to your specific Wallaroo environment.  When the URL is displayed, open it in a browser and confirm the permissions.  Store the connection in a variable so it can be referenced later.

In [15]:
import wallaroo

In [16]:
wl = wallaroo.Client()

Next we're going to create a new workspace called `ccfraud-workspace` for our model, then set it as our current workspace context.

In [18]:
new_workspace = wl.create_workspace("ccfraud-workspace")
wl.set_current_workspace(new_workspace)

Just to make sure, let's list our current workspace.  If everything is going right, it will show us we're in the `ccfraud-workspace`.

In [19]:
wl.get_current_workspace()

{'name': 'ccfraud-workspace', 'id': 7, 'archived': False, 'created_by': '24eebcf0-9db0-461d-b3f4-bbf77d64f9fd', 'created_at': '2022-03-25T21:22:25.874829+00:00', 'models': [], 'pipelines': []}
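
Re-running this notebook calls `create_workspace` again with a name that is already in use, which may fail or create a duplicate depending on the SDK version.  A minimal sketch of a get-or-create helper follows.  It assumes the client exposes a `list_workspaces()` method and that each workspace object has a `name()` method; neither appears in this example, so check them against your installed SDK.  `get_or_create_workspace`, `_StubClient` and `_StubWorkspace` are hypothetical names, and the stubs exist only so the sketch runs without a Wallaroo connection.

```python
def get_or_create_workspace(client, name):
    """Return the workspace called `name`, creating it only if it is missing.

    Assumption: `client.list_workspaces()` and `workspace.name()` exist in
    your SDK version; only `create_workspace` is shown in this example.
    """
    for workspace in client.list_workspaces():
        if workspace.name() == name:
            return workspace
    return client.create_workspace(name)


# Hypothetical stand-ins so the sketch can run offline.
class _StubWorkspace:
    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name


class _StubClient:
    def __init__(self):
        self._workspaces = []

    def list_workspaces(self):
        return list(self._workspaces)

    def create_workspace(self, name):
        workspace = _StubWorkspace(name)
        self._workspaces.append(workspace)
        return workspace


stub = _StubClient()
first = get_or_create_workspace(stub, "ccfraud-workspace")
second = get_or_create_workspace(stub, "ccfraud-workspace")
print(first is second)  # the second call reuses the existing workspace
```

With a real client, the same helper could be used as `wl.set_current_workspace(get_or_create_workspace(wl, "ccfraud-workspace"))`, provided the assumed methods are available.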

Our workspace is created.  Let's upload our credit card fraud model to it.  The model file is named `ccfraud.onnx`, and we'll upload it as `ccfraud-model`.  The model is trained to score transactions on a scale from 0 to 1:  the closer the score is to 0, the less likely the transactions indicate fraud, and the closer it is to 1, the more likely they do.


Since `ccfraud-workspace` is already our current workspace, the model will be uploaded directly to it.  Once the upload finishes, the model is listed among the workspace's models.

In [6]:
# Upload the model to our workspace
model = wl.upload_model("ccfraud-model", "./ccfraud.onnx").configure()
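
The 0-to-1 score described above can be turned into a yes/no decision by comparing it against a cutoff.  The sketch below is one way to do that in plain Python.  The `flag_fraud` helper, the 0.5 threshold, and the sample scores are all hypothetical and chosen only for illustration; a real cutoff would depend on how costly false alarms and missed fraud are for your use case.

```python
def flag_fraud(scores, threshold=0.5):
    """Return True for each score at or above `threshold`.

    The 0.5 cutoff is a hypothetical default, not a value from the model.
    """
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {score}")
    return [score >= threshold for score in scores]


# Made-up scores for illustration only; these are not model outputs.
example_scores = [0.001, 0.42, 0.97]
print(flag_fraud(example_scores))  # [False, False, True]
```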

With our model uploaded, it's time to create our pipeline and deploy it so it can accept data and process it through our `ccfraud-model`.  We'll call our pipeline `ccfraud-pipeline`.  Deployment takes up to 45 seconds, and when finished it will display `ok` at the end.

In [8]:
# Create the pipeline and deploy it
ccfraud_pipeline = wl.build_pipeline('ccfraud-pipeline')
ccfraud_pipeline.add_model_step(model)
ccfraud_pipeline.deploy()

Waiting for deployment - this will take up to 45s ....... ok


We can see our new pipeline with the `status()` command.

In [9]:
ccfraud_pipeline.status()

{'status': 'Running',
 'details': None,
 'engines': [{'ip': '10.12.1.179',
   'name': 'engine-6c45c5cb6b-chq8c',
   'status': 'Running',
   'reason': None,
   'pipeline_statuses': {'pipelines': [{'id': 'ccfraud-pipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'ccfraud-model',
      'version': '31d8c19a-e01f-4f04-abc6-17a3543a95a7',
      'sha': 'bc85ce596945f876256f41515c7501c399fd97ebcb9ab3dd41bf03f8937b4507',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.12.1.180',
   'name': 'engine-lb-85846c64f8-7clrb',
   'status': 'Running',
   'reason': None}]}
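
Reading the nested status by eye gets tedious, so a small check can confirm that every component reports `Running`.  The sketch below works on a plain dictionary shaped like the output above.  The `pipeline_is_running` helper is hypothetical, and it assumes that `Running` is the only healthy state and that the keys match the ones shown here; other SDK versions may return a different structure.

```python
def pipeline_is_running(status):
    """Return True when the pipeline, every engine, every model, and every
    load balancer in `status` report 'Running'."""
    if status.get("status") != "Running":
        return False
    for engine in status.get("engines") or []:
        if engine.get("status") != "Running":
            return False
        pipelines = (engine.get("pipeline_statuses") or {}).get("pipelines") or []
        models = (engine.get("model_statuses") or {}).get("models") or []
        if any(item.get("status") != "Running" for item in pipelines + models):
            return False
    return all(
        lb.get("status") == "Running" for lb in status.get("engine_lbs") or []
    )


# Abbreviated copy of the status output shown above.
sample_status = {
    "status": "Running",
    "details": None,
    "engines": [{
        "name": "engine-6c45c5cb6b-chq8c",
        "status": "Running",
        "pipeline_statuses": {"pipelines": [{"id": "ccfraud-pipeline",
                                             "status": "Running"}]},
        "model_statuses": {"models": [{"name": "ccfraud-model",
                                       "status": "Running"}]},
    }],
    "engine_lbs": [{"name": "engine-lb-85846c64f8-7clrb", "status": "Running"}],
}

print(pipeline_is_running(sample_status))  # True
```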

With our pipeline deployed, let's run a smoke test to make sure it's working right.  We'll run an inference through our pipeline from the file `smoke_test.json` and see the results.

In [10]:
ccfraud_pipeline.infer_from_file('./smoke_test.json')

Waiting for inference response - this will take up to 45s ... ok


[InferenceResult({'check_failures': [],
  'elapsed': 185745,
  'model_name': 'ccfraud-model',
  'model_version': 'f4024eed-d75c-402e-9d4a-f48b48ec0070',
  'original_data': {'tensor': [[1.0678324729342086,
                                0.21778102664937624,
                                -1.7115145261843976,
                                0.6822857209662413,
                                1.0138553066742804,
                                -0.43350000129006655,
                                0.7395859436561657,
                                -0.28828395953577357,
                                -0.44726268795990787,
                                0.5146124987725894,
                                0.3791316964287545,
                                0.5190619748123175,
                                -0.4904593221655364,
                                1.1656456468728569,
                                -0.9776307444180006,
                                -0.6322198962519854,
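
The `original_data` field echoed in the result suggests that the input file holds a JSON object with a `tensor` key containing a list of numeric rows.  Assuming that layout (it is inferred from the output above, not from a documented file format), the sketch below checks a payload before it is sent to the pipeline.  `check_tensor_payload` is a hypothetical helper, and the sample values are made up and much shorter than a real input row.

```python
import json
import numbers


def check_tensor_payload(payload):
    """Validate a {'tensor': [[...], ...]} payload and return (rows, columns)."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    rows = payload.get("tensor")
    if not isinstance(rows, list) or not rows:
        raise ValueError("'tensor' must be a non-empty list of rows")
    for index, row in enumerate(rows):
        if not isinstance(row, list) or not row:
            raise ValueError(f"row {index} must be a non-empty list")
        for value in row:
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, numbers.Real):
                raise ValueError(f"row {index} contains a non-numeric value")
    width = len(rows[0])
    for index, row in enumerate(rows):
        if len(row) != width:
            raise ValueError(f"row {index} has {len(row)} values, expected {width}")
    return len(rows), width


# Made-up payload for illustration; real rows have more values.
sample_payload = json.loads('{"tensor": [[1.0, 0.2, -1.7], [0.5, 0.1, 0.3]]}')
print(check_tensor_payload(sample_payload))  # (2, 3)
```

To check a real file, load it with `json.load` and pass the result to the helper before calling `infer_from_file`.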
    

Looks good!  Time to run the real test on some real data.  Run another inference, this time from the file `high_fraud.json`, and let's see the results:

In [11]:
ccfraud_pipeline.infer_from_file('./high_fraud.json')

[InferenceResult({'check_failures': [],
  'elapsed': 161122,
  'model_name': 'ccfraud-model',
  'model_version': 'f4024eed-d75c-402e-9d4a-f48b48ec0070',
  'original_data': {'tensor': [[1.0678324729342086,
                                18.155556397512136,
                                -1.658955105843852,
                                5.2111788045436445,
                                2.345247064454334,
                                10.467083577773014,
                                5.0925820522419745,
                                12.82951536371218,
                                4.953677046849403,
                                2.3934736228338225,
                                23.912131817957253,
                                1.7599568310350209,
                                0.8561037518143335,
                                1.1656456468728569,
                                0.5395988813934498,
                                0.7784221343010385,
                  

With our work in the pipeline done, we'll undeploy it to get back our resources from the Kubernetes cluster.  If we keep the same settings, we can redeploy the pipeline with the same configuration in the future.

In [None]:
ccfraud_pipeline.undeploy()

And there we have it!  Feel free to use this as a template for other models, inferences and pipelines that you want to deploy with Wallaroo!