
📊 IBM Cloud Pak for Data Tutorial: Part II

In this hands-on tutorial you will build and evaluate machine learning models by using the AutoAI feature in Watson Studio.

Prerequisites

  1. Sign up for an IBM Cloud account.
  2. Fill in the required information and press the "Create Account" button.
  3. After you submit your registration, you will receive an e-mail from the IBM Cloud team with details about your account. In this e-mail, you will need to click the link provided to confirm your registration.
  4. Now you should be able to log in to your new IBM Cloud account ;-)

Cloud Pak for Data Tutorials Part I to VI

This tutorial series consists of six parts. You can start with Part I or with any other part; however, the necessary environment is set up in Part I.
Part I - data visualization, preparation, and transformation
Part II - build and evaluate machine learning models by using AutoAI
Part III - graphically build and evaluate machine learning models by using SPSS Modeler flow
Part IV - set up and run Jupyter Notebooks to develop a machine learning model
Part V - deploy a local Python app to test your model
Part VI - monitor your model with OpenScale

The first 4 parts of this tutorial are based on the Learning path: Getting started with Watson Studio.

1) CRISP-DM

The CRoss Industry Standard Process for Data Mining (CRISP-DM) is a process model that describes the life cycle of a data science project. It consists of six phases:
1. Business Understanding - What does the business need?
2. Data Understanding - What data do we have, and how is it structured?
3. Data Preparation - How can we structure the data for the modeling?
4. Modeling - Which modeling techniques could apply?
5. Evaluation - Which model is the most accurate?
6. Deployment - How to implement the model?

CRISP DM

In this case we use AutoAI to cover nearly all phases of the CRISP-DM model. The business understanding is given (predict customer churn), and the data understanding, preparation, modeling, and evaluation are all performed by AutoAI.

Create a new AutoAI model

  1. Select the Assets tab for your Watson Studio project.
  2. In the Assets tab, click the Add to Project button.

Auto AI Project

  3. Select the AutoAI Experiment asset type.
  4. In the Create an AutoAI experiment window:
  • New is automatically selected as the experiment type (not the Gallery sample).
  • Enter an Asset Name, such as ‘customer-churn-manual’.
  • For the Machine Learning Service, select the Watson Machine Learning service that you previously created for the project. If you have not created one yet, please do so now; it is available in the IBM Cloud Catalog under the AI category.
  • Then click Create.

Create Experiment

  5. In the Add data source window:
  • Click Select from project.
  • Select the customer churn data asset previously added to the project (e.g. customer-churn-analysis-V2, don't select any shaped data assets).
  • Click Select Asset.

Select from Project

Run and train the model

From the Configure details window:

  1. Under Select prediction column, select churn. (If you are asked whether to create a time series forecast, select No.)

Select Prediction Column

  2. Keep the default prediction type of Binary Classification and the optimized metric of Accuracy.
  3. Click Run experiment.

As the experiment runs, you will see the different pipelines appear in the relationship map. After it finishes, the completed pipelines are listed at the bottom of the panel, ranked by accuracy. You can also look at the progress map by clicking Swap view.

Pipelines

For our data, Pipeline 3 was ranked highest, based on our Accuracy metric. After the AutoAI experiment completes, it is saved in the Watson Studio project; you can view it from the Assets tab under AutoAI experiments.
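To make the ranking concrete: conceptually, AutoAI does what you would otherwise do by hand, namely train several candidate pipelines and rank them by the optimized metric. The sketch below illustrates that idea with scikit-learn on synthetic data; the two estimators and all parameters are illustrative assumptions, not the pipelines AutoAI actually generated for the churn data.

```python
# Illustrative only: a hand-rolled version of what AutoAI automates,
# training several candidate pipelines and ranking them by accuracy.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "random_forest": make_pipeline(RandomForestClassifier(random_state=42)),
}

# Rank candidates by mean cross-validated accuracy, highest first,
# just as the AutoAI leaderboard does with its optimized metric.
ranking = sorted(
    ((name, cross_val_score(pipe, X, y, scoring="accuracy").mean())
     for name, pipe in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, acc in ranking:
    print(f"{name}: {acc:.3f}")
```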

Evaluate the model performance

On the AutoAI Experiment page, there are a number of options available to get more details on how each pipeline performed.

Pipeline Performance

  1. The Pipeline comparison shows different metrics for each pipeline.
  2. Clicking the pipeline name opens the Model Evaluation window for the pipeline.

Inside the Model Evaluation window, a menu on the left provides more metrics for the pipeline, such as the Confusion Matrix table or the Feature Importance summary graph.

Model Evaluation Window

The AutoAI experiment might not provide exactly the same set of classification approaches and evaluation metrics that you could get with a Jupyter Notebook, but it arrives at a result significantly faster, and with no programming required.
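For comparison, this is roughly what the same evaluation looks like when coded by hand in a notebook: a minimal sketch on synthetic data, assuming a scikit-learn classifier, that prints an accuracy score, a confusion matrix, and feature importances. It is not the code behind the Model Evaluation window.

```python
# Minimal notebook-style evaluation: accuracy, confusion matrix,
# and feature importances. Model and data here are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))

# Feature importances, largest first: the notebook analogue of the
# Feature Importance summary graph.
for idx in model.feature_importances_.argsort()[::-1][:5]:
    print(f"feature_{idx}: {model.feature_importances_[idx]:.3f}")
```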

Deploy and test the model using Watson Machine Learning service

We can save the pipeline either as a Watson Machine Learning model asset, which we can test with new data and deploy to generate predictions, or as a Notebook, if we want to view the code that created the model pipeline or interact with the model programmatically.

Now, you must save the model.

  1. For the highest rated pipeline, click Save as.
  2. Keep the default choice to save the model, keep the default name, and click Create.

Save Model

The model should then appear in the Models section of the Assets tab for the project.

Model in Assets Tab

To deploy the model, click the model name to open it.

Note: In the new Watson Studio version you have to promote your created model to a deployment space by clicking the Promote to deployment space button. If you haven't created a deployment space yet, you can do that here. You can access your deployment spaces from your Cloud Pak for Data homepage. Inside your deployment space you will see your promoted model under Assets. Click Deploy and create a new online deployment. You can then skip the next steps and go directly to the Deployments tab, where you can test your model.

  1. Click Promote to deployment space.

Add Deployment1

  2. Choose your deployment space or set up a New space.
  3. Click Promote.

Add Deployment2

  4. Go to your deployment space under Assets and click the deploy button next to your asset:

Deployment1

Deployment2

  • Enter a Name for the deployment (for example, ‘customer-churn-manual’).
  • Choose Online as Deployment Type.
  • Enter an optional Description.
  • Click Create to save the deployment.
  • Wait until Watson Studio sets the STATUS field to ‘Deployed’.

Status Ready

The model is now deployed and can be used for prediction. However, before using it in a production environment, it is worthwhile to test it with real data. For this we will use a JSON object: it is the most convenient option when a test has to be run more than once (which is usually the case), and when a large set of feature values is needed.

To make this easier, you can cut and paste the following sample JSON object (or use the code in the test-model.json file) in the following steps:

{"input_data":[{"fields": ["state", "account length", "area code", "phone number", "international plan", "voice mail plan", "number vmail messages", "total day minutes", "total day calls", "total day charge", "total eve minutes", "total eve calls", "total eve charge", "total night minutes", "total night calls", "total night charge", "total intl minutes", "total intl calls", "total intl charge", "customer service calls"], "values": [["NY",161,415,"351-7269","no","no",0,332.9,67,56.59,317.8,97,27.01,160.6,128,7.23,5.4,9,1.46,4]]}]}

To test the model at run time:

  1. Select the deployment that you just created by clicking the deployment name (for example, ‘customer-churn-manual’).

Select Deployment

  2. This opens a new page showing you an overview of the properties of the deployment (for example, name, creation date, and status).
  3. Select the Test tab.
  4. Select the file icon, which allows you to enter the values using JSON.
  5. Paste the sample JSON object into the Enter input data field.
  6. Click Predict to view the results.

Test Model

The result of the prediction is given as the probability that the customer will churn (True) or not (False). You can try it with other values, for example by substituting values taken from the ‘customer-churn-kaggle.csv’ file. Another test is to change the phone number to something like “XYZ” and run the prediction again. The result should be the same, which indicates that this feature is not a factor in the prediction.
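If you would rather run this test from code instead of the Test tab, the online deployment can also be scored with the ibm-watson-machine-learning Python client. The following is a sketch only: the region URL, API key, space ID, and deployment ID are placeholders that you must replace with your own values.

```python
# Sketch: score the online deployment with the ibm-watson-machine-learning
# client (pip install ibm-watson-machine-learning). All credentials and
# IDs below are placeholders.
from ibm_watson_machine_learning import APIClient

client = APIClient({
    "url": "https://eu-de.ml.cloud.ibm.com",  # use your service region's URL
    "apikey": "YOUR_IBM_CLOUD_API_KEY",
})
client.set.default_space("YOUR_DEPLOYMENT_SPACE_ID")

# Same payload as the sample JSON object above.
payload = {"input_data": [{
    "fields": ["state", "account length", "area code", "phone number",
               "international plan", "voice mail plan", "number vmail messages",
               "total day minutes", "total day calls", "total day charge",
               "total eve minutes", "total eve calls", "total eve charge",
               "total night minutes", "total night calls", "total night charge",
               "total intl minutes", "total intl calls", "total intl charge",
               "customer service calls"],
    "values": [["NY", 161, 415, "351-7269", "no", "no", 0, 332.9, 67, 56.59,
                317.8, 97, 27.01, 160.6, 128, 7.23, 5.4, 9, 1.46, 4]],
}]}

# Returns the predicted class and class probabilities for each input row.
print(client.deployments.score("YOUR_DEPLOYMENT_ID", payload))
```

The response has the same shape as the Test tab output: a predictions list containing the predicted class and its probabilities for each input row.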
