![image](https://github.com/IBM/watson-machine-learning-samples/raw/master/cloud/notebooks/headers/AutoAI-Banner_Experiment-Notebook.png)
# Experiment Notebook - AutoAI Notebook v1.15.0




## Contents

This notebook contains the following parts:

**[Setup](#setup)**<br>
&nbsp;&nbsp;[Package installation](#install)<br>
&nbsp;&nbsp;[Watson Machine Learning connection](#connection)<br>
**[Experiment configuration](#configuration)**<br>
&nbsp;&nbsp;[Experiment metadata](#metadata)<br>
**[Working with completed AutoAI experiment](#work)**<br>
&nbsp;&nbsp;[Get fitted AutoAI optimizer](#get)<br>
&nbsp;&nbsp;[Pipelines comparison](#comparison)<br>
&nbsp;&nbsp;[Get pipeline as scikit-learn pipeline model](#get_pipeline)<br>
&nbsp;&nbsp;[Inspect pipeline](#inspect_pipeline)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[Visualize pipeline model](#visualize)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[Preview pipeline model as python code](#preview)<br>
**[Deploy and Score](#scoring)**<br>
&nbsp;&nbsp;[Working with spaces](#working_spaces)<br>
**[Running AutoAI experiment with Python SDK](#run)**<br>
**[Clean up](#cleanup)**<br>
**[Next steps](#next_steps)**<br>
**[Copyrights](#copyrights)**

<a id="setup"></a>
# Setup

<a id="install"></a>
## Package installation
Before you use the sample code in this notebook, install the following packages:
 - ibm-watson-machine-learning,
 - autoai-libs,
 - lale,
 - scikit-learn,
 - xgboost,
 - lightgbm.


In [5]:
!pip install ibm-watson-machine-learning 
!pip install -U autoai-libs==1.12.7 
!pip install -U 'lale>=0.5.1,<0.6' 
!pip install -U scikit-learn==0.23.2 
!pip install -U xgboost==1.3.3 
!pip install -U lightgbm==3.1.1

Collecting ibm-watson-machine-learning
  Using cached ibm_watson_machine_learning-1.0.253-py3-none-any.whl (1.8 MB)
Collecting ibm-cos-sdk==2.12.*
  Using cached ibm_cos_sdk-2.12.0-py3-none-any.whl
Collecting importlib-metadata
  Using cached importlib_metadata-5.0.0-py3-none-any.whl (21 kB)
Collecting certifi
  Using cached certifi-2022.9.24-py3-none-any.whl (161 kB)
Collecting pandas<1.5.0,>=0.24.2
  Using cached pandas-1.4.4-cp310-cp310-win_amd64.whl (10.0 MB)
Collecting requests
  Using cached requests-2.28.1-py3-none-any.whl (62 kB)
Collecting ibm-cos-sdk-core==2.12.0
  Using cached ibm_cos_sdk_core-2.12.0-py3-none-any.whl
Collecting ibm-cos-sdk-s3transfer==2.12.0
  Using cached ibm_cos_sdk_s3transfer-2.12.0-py3-none-any.whl
Collecting idna<4,>=2.5
  Using cached idna-3.4-py3-none-any.whl (61 kB)
Collecting charset-normalizer<3,>=2
  Using cached charset_normalizer-2.1.1-py3-none-any.whl (39 kB)
Installing collected packages: importlib-metadata, idna, charset-normalizer, certifi, 

ERROR: Could not install packages due to an OSError: [WinError 2] The system cannot find the file specified: 'C:\\Python310\\Scripts\\normalizer.exe' -> 'C:\\Python310\\Scripts\\normalizer.exe.deleteme'

ERROR: Could not find a version that satisfies the requirement autoai-libs==1.12.7 (from versions: 1.14.0, 1.14.2, 1.14.3, 1.14.4, 1.14.6)
ERROR: No matching distribution found for autoai-libs==1.12.7
The system cannot find the file specified.


Collecting scikit-learn==0.23.2
  Using cached scikit-learn-0.23.2.tar.gz (7.2 MB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'error'


  error: subprocess-exited-with-error
  
  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [620 lines of output]
      Ignoring numpy: markers 'python_version == "3.6" and platform_system != "AIX" and platform_python_implementation == "CPython"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.6" and platform_system != "AIX" and platform_python_implementation != "CPython"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.7" and platform_system != "AIX"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.6" and platform_system == "AIX"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.7" and platform_system == "AIX"' don't match your environment
      Ignoring numpy: markers 'python_version >= "3.8" and platform_system == "AIX"' don't match your environment
      Collecting setuptools
        Using cached setuptools
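
The output above shows that some pinned versions may not be installable in every environment. As a minimal sketch (not part of the generated notebook), the following cell uses the standard-library `importlib.metadata` module to report which of the listed packages are actually installed. The helper name `installed_versions` is an illustrative assumption.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they are passed to pip in the install cell.
REQUIRED_PACKAGES = [
    "ibm-watson-machine-learning",
    "autoai-libs",
    "lale",
    "scikit-learn",
    "xgboost",
    "lightgbm",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED_PACKAGES).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` will cause a `ModuleNotFoundError` in the cells that import it.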









<a id="configuration"></a>
# Experiment configuration

<a id="metadata"></a>
## Experiment metadata
This cell defines the metadata for the experiment, including `training_data_reference`, `training_result_reference`, and `experiment_metadata`.

In [4]:
from ibm_watson_machine_learning.helpers import DataConnection
from ibm_watson_machine_learning.helpers import S3Connection, S3Location

training_data_reference = [DataConnection(
    connection=S3Connection(
        api_key='8MGQXLn_Vv9er3Ww4RuBIzZJd-7Ir8Xz4IKFgcc-E2X2',
        auth_endpoint='https://iam.bluemix.net/oidc/token/',
        endpoint_url='https://s3.eu-geo.objectstorage.softlayer.net'
    ),
    location=S3Location(
        bucket='telcoconsumerchurn-donotdelete-pr-z6aanmrxdbpqcg',
        path='Telco-Customer-Churn.csv'
    )),
]
training_result_reference = DataConnection(
    connection=S3Connection(
        api_key='8MGQXLn_Vv9er3Ww4RuBIzZJd-7Ir8Xz4IKFgcc-E2X2',
        auth_endpoint='https://iam.bluemix.net/oidc/token/',
        endpoint_url='https://s3.eu-geo.objectstorage.softlayer.net'
    ),
    location=S3Location(
        bucket='telcoconsumerchurn-donotdelete-pr-z6aanmrxdbpqcg',
        path='auto_ml/8333b8d1-d61f-4781-a97c-289ffcefaa1c/wml_data/4ec64c50-6f28-4e7e-9448-f3e42734625a/data/automl',
        model_location='auto_ml/8333b8d1-d61f-4781-a97c-289ffcefaa1c/wml_data/4ec64c50-6f28-4e7e-9448-f3e42734625a/data/automl/pre_hpo_d_output/Pipeline1/model.pickle',
        training_status='auto_ml/8333b8d1-d61f-4781-a97c-289ffcefaa1c/wml_data/4ec64c50-6f28-4e7e-9448-f3e42734625a/training-status.json'
    ))

ModuleNotFoundError: No module named 'ibm_watson_machine_learning'

In [None]:
experiment_metadata = dict(
   prediction_type='classification', 
   prediction_column='Churn',
   holdout_size=0.1,
   scoring='accuracy',
   csv_separator=',',
   random_state=33,
   max_number_of_estimators=2,
   training_data_reference=training_data_reference,
   training_result_reference=training_result_reference,
   deployment_url='https://eu-gb.ml.cloud.ibm.com',
   project_id='992cbd02-a9bd-4ae0-a700-7eef966a4da2',
   positive_label='Yes',
   drop_duplicates=True
)
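
Later cells read several keys from `experiment_metadata`, so a quick sanity check can catch a missing entry early. The sketch below is an illustration only: `check_metadata` and `REQUIRED_KEYS` are hypothetical names, the key list is a subset chosen from the keys used later in this notebook, and the rule that `holdout_size` is a fraction between 0 and 1 is an assumption.

```python
# A subset of the keys that later cells of this notebook read from experiment_metadata.
REQUIRED_KEYS = (
    "prediction_type",
    "prediction_column",
    "scoring",
    "holdout_size",
    "deployment_url",
    "project_id",
)


def check_metadata(metadata):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in metadata]
    holdout = metadata.get("holdout_size")
    # Assumption: holdout_size is expressed as a fraction of the data set.
    if holdout is not None and not 0 < holdout < 1:
        problems.append(f"holdout_size should be between 0 and 1, got {holdout}")
    return problems


example = dict(
    prediction_type="classification",
    prediction_column="Churn",
    holdout_size=0.1,
    scoring="accuracy",
    deployment_url="https://eu-gb.ml.cloud.ibm.com",
    project_id="992cbd02-a9bd-4ae0-a700-7eef966a4da2",
)
print(check_metadata(example))
```

In the notebook itself you would call `check_metadata(experiment_metadata)` instead of using the `example` dictionary.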

<a id="connection"></a>
## Watson Machine Learning connection

This cell defines the credentials required to work with the Watson Machine Learning service.

**Action:** Provide your IBM Cloud API key, following the [docs](https://cloud.ibm.com/docs/account?topic=account-userapikey).

In [None]:
api_key = 'PUT_YOUR_KEY_HERE'

In [None]:
wml_credentials = {
    "apikey": api_key,
    "url": experiment_metadata['deployment_url']
}
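
If you prefer not to paste the key into the notebook, one option is to read it from an environment variable. This is a minimal sketch under stated assumptions: the variable name `IBM_CLOUD_API_KEY` and the helpers `load_api_key` and `build_credentials` are hypothetical and not part of the Watson Machine Learning SDK.

```python
import os


def load_api_key(env_var="IBM_CLOUD_API_KEY"):
    """Read the API key from an environment variable (the variable name is a hypothetical choice)."""
    key = os.environ.get(env_var, "").strip()
    # Reject an unset variable and the notebook's own placeholder value.
    if not key or key == "PUT_YOUR_KEY_HERE":
        raise RuntimeError(f"Set the {env_var} environment variable to your IBM Cloud API key")
    return key


def build_credentials(api_key, url):
    """Build the same two-field dictionary as the wml_credentials cell."""
    return {"apikey": api_key, "url": url}
```

With these helpers, the credentials could be created as `wml_credentials = build_credentials(load_api_key(), experiment_metadata['deployment_url'])`.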

<a id="work"></a>


# Working with completed AutoAI experiment

This cell imports the pipelines generated for the experiment so they can be compared to find the optimal pipeline to save as a model.

<a id="get"></a>


## Get fitted AutoAI optimizer

In [None]:
from ibm_watson_machine_learning.experiment import AutoAI

pipeline_optimizer = AutoAI(wml_credentials, project_id=experiment_metadata['project_id']).runs.get_optimizer(metadata=experiment_metadata)

DEPRECATED!! Python 3.6 framework is deprecated and will be removed on Jan 20th, 2021. It will be read-only mode starting Nov 20th, 2020. i.e you won't be able to create new assets using this client. Use Python 3.7 instead. For details, see https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/pm_service_supported_frameworks.html


WMLClientError: Error during getting IAM Token.
Reason: <Response [400]>

Use `get_params()` to retrieve the configuration parameters.

In [6]:
pipeline_optimizer.get_params()

NameError: name 'pipeline_optimizer' is not defined

<a id="comparison"></a>
## Pipelines comparison

Use the `summary()` method to list trained pipelines and evaluation metrics information in
the form of a Pandas DataFrame. You can use the DataFrame to compare all discovered pipelines and select the one you like for further testing.

In [None]:
summary = pipeline_optimizer.summary()
best_pipeline_name = list(summary.index)[0]
summary

<a id="get_pipeline"></a>
### Get pipeline as scikit-learn pipeline model

After you compare the pipelines, download and save a scikit-learn pipeline model object from the
AutoAI training job.

**Tip:** To get a specific pipeline, pass its name as the `pipeline_name` argument:
```
pipeline_optimizer.get_pipeline(pipeline_name=pipeline_name)
```

In [None]:
pipeline_model = pipeline_optimizer.get_pipeline()

Next, check the feature importance for the selected pipeline.

In [None]:
pipeline_optimizer.get_pipeline_details()['features_importance']

**Tip:** To check the details of all model evaluation metrics, use:
```
pipeline_optimizer.get_pipeline_details()
```

<a id="inspect_pipeline"></a>
## Inspect pipeline

<a id="visualize"></a>
### Visualize pipeline model

Preview pipeline model stages as a graph. Each node's name links to a detailed description of the stage.


In [None]:
pipeline_model.visualize()

<a id="preview"></a>
### Preview pipeline model as python code
In the next cell, you can preview the saved pipeline model as Python code.  
You will be able to review the exact steps used to create the model.

**Note:** To get the sklearn representation, add the following parameter to the `pretty_print` call: `astype='sklearn'`.

In [None]:
pipeline_model.pretty_print(combinators=False, ipython_display=True)

<a id="scoring"></a>
## Deploy and Score

In this section you will learn how to deploy and score the model as a web service.

<a id="working_spaces"></a>
### Working with spaces

In this section you will specify a deployment space for organizing the assets for deploying and scoring the model. If you do not have an existing space, you can use [Deployment Spaces Dashboard](https://dataplatform.cloud.ibm.com/ml-runtime/spaces?context=cpdaas) to create a new space, following these steps:

- Click **New Deployment Space**.
- Create an empty space.
- Select Cloud Object Storage.
- Select Watson Machine Learning instance and press **Create**.
- Copy `space_id` and paste it below.

**Tip**: You can also use the SDK to prepare the space for your work. Learn more [here](https://github.com/IBM/watson-machine-learning-samples/blob/master/notebooks/python_sdk/instance-management/Space%20management.ipynb).

**Action**: Assign or update the space ID below.

### Deployment creation

In [None]:
target_space_id = "PUT_YOUR_TARGET_SPACE_ID_HERE"

from ibm_watson_machine_learning.deployment import WebService
service = WebService(source_wml_credentials=wml_credentials,
                     target_wml_credentials=wml_credentials,
                     source_project_id=experiment_metadata['project_id'],
                     target_space_id=target_space_id)
service.create(
    model=best_pipeline_name,
    metadata=experiment_metadata,
    deployment_name='Best_pipeline_webservice'
)

Use `print` on the deployment object to show basic information about the service:

In [None]:
print(service)

To show all available information about the deployment use the `.get_params()` method:

In [None]:
service.get_params()

### Scoring of webservice
You can make a scoring request by calling `score()` on the deployed pipeline.

If you want to work with the web service in an external Python application, follow these steps to retrieve the service object:

 - Initialize the service with `service = WebService(wml_credentials)`.
 - Get the `deployment_id` with the `service.list()` method.
 - Get the web service object with the `service.get('deployment_id')` method.

After that, you can call the `service.score()` method.
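
When the deployment is called over REST from an external application rather than through the SDK, the request body is usually organized as a list of field names plus rows of values. The sketch below builds such a body from plain dictionaries. The helper `build_scoring_payload` is hypothetical, the `input_data` / `fields` / `values` layout should be checked against the scoring documentation for your deployment, and the three columns shown are illustrative only.

```python
import json


def build_scoring_payload(rows):
    """Convert a list of row dictionaries into a fields/values request body."""
    if not rows:
        raise ValueError("at least one row is required")
    fields = list(rows[0].keys())
    values = []
    for row in rows:
        if set(row) != set(fields):
            raise ValueError("all rows must have the same columns")
        # Emit the values in the same order as the field names.
        values.append([row[name] for name in fields])
    return {"input_data": [{"fields": fields, "values": values}]}


# Illustrative rows only; a real request needs every feature column of the training data.
rows = [
    {"gender": "Female", "tenure": 1, "MonthlyCharges": 29.85},
    {"gender": "Male", "tenure": 34, "MonthlyCharges": 56.95},
]
payload = build_scoring_payload(rows)
print(json.dumps(payload, indent=2))
```

Keeping the field names in a single list and the rows as value lists avoids repeating the column names for every record.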

<a id="cleanup"></a>
### Deleting deployment
You can delete the existing deployment by calling the `service.delete()` method.
To list the existing web services, use `service.list()`.

<a id="run"></a>

## Running AutoAI experiment with Python SDK

If you want to run an AutoAI experiment using the Python API, follow the steps described below. The experiment settings were generated from the parameters set in the UI.
 - Go to your COS dashboard.
 - In the **Service credentials** tab, click **New Credential**.
 - Add the inline configuration parameter `{"HMAC":true}` and click **Add**.

This configuration parameter adds the following section to the instance credentials (for use later in this notebook):
```
"cos_hmac_keys": {
    "access_key_id": "***",
    "secret_access_key": "***"
}
```

**Action:** Provide your COS credentials in the following cells.

- To run the code, copy it from the markdown cells below into code cells.



```
from ibm_watson_machine_learning.experiment import AutoAI

experiment = AutoAI(wml_credentials, project_id=experiment_metadata['project_id'])
```

```
#@hidden_cell
cos_hmac_keys = {
    "access_key_id": "PLACE_YOUR_ACCESS_KEY_ID_HERE",
    "secret_access_key": "PLACE_YOUR_SECRET_ACCESS_KEY_HERE"
  }
  
cos_api_key = "PLACE_YOUR_API_KEY_HERE"
OPTIMIZER_NAME = 'custom_name'
```
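
Instead of copying each value by hand, the key pair can be read from the service credentials JSON that the COS dashboard displays. This is a sketch, not the notebook's own method: `extract_cos_credentials` is a hypothetical helper, and the top-level `apikey` field name is an assumption about the credentials document.

```python
import json


def extract_cos_credentials(credentials_json):
    """Pull the HMAC key pair and the API key out of a COS service-credentials JSON string."""
    creds = json.loads(credentials_json)
    hmac_keys = creds.get("cos_hmac_keys")
    if not hmac_keys:
        raise KeyError("cos_hmac_keys is missing; was the credential created with HMAC enabled?")
    key_pair = {
        "access_key_id": hmac_keys["access_key_id"],
        "secret_access_key": hmac_keys["secret_access_key"],
    }
    # Assumption: the API key is stored under a top-level "apikey" field.
    return key_pair, creds.get("apikey")


# Placeholder document standing in for the JSON copied from the dashboard.
sample = json.dumps({
    "apikey": "PLACE_YOUR_API_KEY_HERE",
    "cos_hmac_keys": {
        "access_key_id": "PLACE_YOUR_ACCESS_KEY_ID_HERE",
        "secret_access_key": "PLACE_YOUR_SECRET_ACCESS_KEY_HERE",
    },
})
cos_hmac_keys, cos_api_key = extract_cos_credentials(sample)
```

The explicit check for `cos_hmac_keys` gives a clearer error when the credential was created without the `{"HMAC":true}` parameter.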

The experiment settings below were generated from the parameters set in the UI.

```
from ibm_watson_machine_learning.helpers import DataConnection
from ibm_watson_machine_learning.helpers import S3Connection, S3Location

training_data_reference = [DataConnection(
    connection=S3Connection(
        api_key=cos_api_key,
        auth_endpoint='https://iam.bluemix.net/oidc/token/',
        endpoint_url='https://s3.eu-geo.objectstorage.softlayer.net',
        access_key_id=cos_hmac_keys['access_key_id'],
        secret_access_key=cos_hmac_keys['secret_access_key']
    ),
    location=S3Location(
        bucket='telcoconsumerchurn-donotdelete-pr-z6aanmrxdbpqcg',
        path='Telco-Customer-Churn.csv'
    )),
]
training_result_reference = DataConnection(
    connection=S3Connection(
        api_key=cos_api_key,
        auth_endpoint='https://iam.bluemix.net/oidc/token/',
        endpoint_url='https://s3.eu-geo.objectstorage.softlayer.net',
        access_key_id = cos_hmac_keys['access_key_id'],
        secret_access_key = cos_hmac_keys['secret_access_key']
    ),
    location=S3Location(
        bucket='telcoconsumerchurn-donotdelete-pr-z6aanmrxdbpqcg',
        path='auto_ml/8333b8d1-d61f-4781-a97c-289ffcefaa1c/wml_data/4ec64c50-6f28-4e7e-9448-f3e42734625a/data/automl',
        model_location='auto_ml/8333b8d1-d61f-4781-a97c-289ffcefaa1c/wml_data/4ec64c50-6f28-4e7e-9448-f3e42734625a/data/automl/pre_hpo_d_output/Pipeline1/model.pickle',
        training_status='auto_ml/8333b8d1-d61f-4781-a97c-289ffcefaa1c/wml_data/4ec64c50-6f28-4e7e-9448-f3e42734625a/training-status.json'
    ))
```
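
The three paths in `training_result_reference` share a common prefix, which is easy to mistype when adapting the cell by hand. The sketch below rebuilds them from the two identifiers that appear in the paths. The helper `result_paths` and the parameter names `outer_id` and `inner_id` are hypothetical labels; the notebook does not state what each identifier represents.

```python
import posixpath


def result_paths(outer_id, inner_id, pipeline="Pipeline1"):
    """Rebuild the three object-storage paths from their shared prefix."""
    base = posixpath.join("auto_ml", outer_id, "wml_data", inner_id)
    data_path = posixpath.join(base, "data", "automl")
    return {
        "path": data_path,
        "model_location": posixpath.join(
            data_path, "pre_hpo_d_output", pipeline, "model.pickle"
        ),
        "training_status": posixpath.join(base, "training-status.json"),
    }


# The two identifiers are copied from the generated settings in this notebook.
paths = result_paths(
    "8333b8d1-d61f-4781-a97c-289ffcefaa1c",
    "4ec64c50-6f28-4e7e-9448-f3e42734625a",
)
for key, value in paths.items():
    print(f"{key}: {value}")
```

The resulting dictionary could be unpacked into the location object as `S3Location(bucket=..., **paths)`, since its keys match the keyword arguments used in the cell above.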

```
pipeline_optimizer = experiment.optimizer(
    name=OPTIMIZER_NAME,
    prediction_type=experiment_metadata['prediction_type'],
    prediction_column=experiment_metadata['prediction_column'],
    scoring=experiment_metadata['scoring'],
    holdout_size=experiment_metadata['holdout_size'],
    csv_separator=experiment_metadata['csv_separator'],
    positive_label=experiment_metadata['positive_label'],
    drop_duplicates=experiment_metadata['drop_duplicates'])
```

```
pipeline_optimizer.fit(training_data_reference=training_data_reference,
                       training_results_reference=training_result_reference,
                       background_mode=False)
```