# Automated ML

In [2]:
from azureml.core import Workspace, Experiment

## Dataset

### Overview
I use the Default of Credit Card Clients dataset from the UCI Machine Learning Repository. The dataset can be found here: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients#

As mentioned on the site, the dataset can be used to model customer default behaviour. My task is to use it for classification. We first clean the dataset by one-hot encoding certain columns, and then use it for training with AutoML and HyperDrive.
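As a rough illustration of the one-hot encoding step, the sketch below encodes two categorical columns with pandas. It is not the `clean_dataset` function from `train.py`, which is not shown in this notebook. The raw column names `EDUCATION` and `MARRIAGE`, the category ranges, and the helper name `one_hot_encode` are assumptions, chosen so that the output columns match the `edu_*` and `marriage_*` fields used in the request payload later in this notebook.

```python
import pandas as pd


def one_hot_encode(df: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode the (assumed) EDUCATION and MARRIAGE columns."""
    out = df.copy()
    # Declare the full category range up front so every indicator column
    # is created, even when a category is absent from this sample.
    out["EDUCATION"] = pd.Categorical(out["EDUCATION"], categories=list(range(7)))
    out["MARRIAGE"] = pd.Categorical(out["MARRIAGE"], categories=list(range(4)))
    return pd.get_dummies(
        out,
        columns=["EDUCATION", "MARRIAGE"],
        prefix=["edu", "marriage"],
        dtype=int,
    )


# Tiny made-up sample, for illustration only
sample = pd.DataFrame(
    {"LIMIT_BAL": [20000, 50000], "EDUCATION": [1, 2], "MARRIAGE": [2, 1], "Y": [0, 1]}
)
encoded = one_hot_encode(sample)
print(list(encoded.columns))
```

Fixing the category list keeps the column layout stable between training and inference, which matters because the deployed model expects the same set of input columns on every request.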

I have downloaded the .csv file and created a Dataset in my Azure workspace. The cells below load the dataset from the workspace.

In [2]:
ws = Workspace.from_config()
# Choose a name for the experiment
experiment_name = 'capstone-project'
experiment = Experiment(ws, experiment_name)

In [3]:
name = "credit-card-default"

# Load the registered dataset; fail early with a clear message if it is missing
if name in ws.datasets:
    print("Found Dataset")
    dataset = ws.datasets[name]
else:
    raise ValueError(f"Dataset '{name}' not found. Create the dataset first...")

df = dataset.to_pandas_dataframe()

Found Dataset


In [10]:
import os

from train import clean_dataset

# Clean the dataset, then upload the cleaned data to the datastore so we can use it later
dataset = clean_dataset(df)
os.makedirs("data", exist_ok=True)
dataset.to_csv("data/cleaned_dataset.csv", index=False)
ds = ws.get_default_datastore()
ds.upload(src_dir='./data', target_path='train_data', overwrite=True, show_progress=True)

Uploading an estimated of 1 files
Uploading ./data/cleaned_dataset.csv
Uploaded ./data/cleaned_dataset.csv, 1 files out of an estimated total of 1
Uploaded 1 files


$AZUREML_DATAREFERENCE_d765c4b9edbc4ca0b9473b9318c1f0b5

In [11]:
from azureml.core.dataset import Dataset
dataset = Dataset.Tabular.from_delimited_files(path=ds.path("train_data/cleaned_dataset.csv"))

We now look up the compute cluster and create it if it does not exist.

In [15]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cluster_name = "compute-train"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', 
                                                           max_nodes=4)
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    
compute_target.wait_for_completion(show_output=True)

Found existing compute target
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## AutoML Configuration

To use Azure AutoML, we first import `AutoMLConfig`. I set the experiment timeout to 20 minutes, which I expect to be enough time to train some good models. I also select the compute cluster that AutoML should use and set the number of cross-validation folds to 4. Cross-validation is important here because we don't have a separate validation dataset. The task is classification with accuracy as the primary metric, and `Y` is the label column.
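To make the cross-validation setting concrete, here is a minimal sketch of how k-fold splitting partitions row indices so that every row is used for validation exactly once. The helper `k_fold_indices` is hypothetical and written for illustration only; AutoML performs its own splitting internally and may shuffle or stratify the rows, which this sketch does not do.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


# With 4 folds, each model is trained on roughly 3/4 of the rows
# and scored on the remaining 1/4; the 4 scores are then averaged.
folds = list(k_fold_indices(10, 4))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```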

In [16]:
from azureml.train.automl import AutoMLConfig

automl_settings = {
    "experiment_timeout_minutes": 20,
    "compute_target": compute_target, 
    "n_cross_validations": 4
}

automl_config = AutoMLConfig(
    task='classification',
    primary_metric='accuracy',
    training_data=dataset,
    label_column_name="Y", 
    **automl_settings)

In [17]:
remote_run = experiment.submit(automl_config)

Running on remote.


## Run Details

The cell below uses the `RunDetails` widget to show the progress of the different AutoML runs, and then waits for the experiment to complete.

In [19]:
from azureml.widgets import RunDetails
RunDetails(remote_run).show()
remote_run.wait_for_completion(show_output=True)

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…


Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturization. Beginning to fit featurizers and featurize the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.m

{'runId': 'AutoML_ccde6c0d-a9c9-4a73-9f7a-57fb995e324c',
 'target': 'compute-train',
 'status': 'Completed',
 'startTimeUtc': '2021-02-13T22:39:00.916053Z',
 'endTimeUtc': '2021-02-13T23:11:41.836484Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '4',
  'target': 'compute-train',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"3521da30-64a7-4b6a-827b-27947f3c4b59\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"train_data/cleaned_dataset.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"grouprisk\\\\\\", \\\\\\"subscription\\\\\\": \\\\\\"f5878af0-ca26-411c-9906-acf91f5420e2\\\\\\", \\\\\\"wo

## Best Model

In [21]:
best_run, best_model = remote_run.get_output()

Package:azureml-automl-runtime, training version:1.21.0, current version:1.20.0
Package:azureml-core, training version:1.21.0.post1, current version:1.20.0
Package:azureml-dataprep, training version:2.8.2, current version:2.7.3
Package:azureml-dataprep-native, training version:28.0.0, current version:27.0.0
Package:azureml-dataprep-rslex, training version:1.6.0, current version:1.5.0
Package:azureml-dataset-runtime, training version:1.21.0, current version:1.20.0
Package:azureml-defaults, training version:1.21.0, current version:1.20.0
Package:azureml-interpret, training version:1.21.0, current version:1.20.0
Package:azureml-pipeline-core, training version:1.21.0, current version:1.20.0
Package:azureml-telemetry, training version:1.21.0, current version:1.20.0
Package:azureml-train-automl-client, training version:1.21.0, current version:1.20.0
Package:azureml-train-automl-runtime, training version:1.21.0, current version:1.20.0


In [5]:
# Best Run Details
print(best_run)

# Best Model Parameters
print(best_model.steps[1][1].estimators)

Run(Experiment: capstone-project,
Id: AutoML_ccde6c0d-a9c9-4a73-9f7a-57fb995e324c_12,
Type: azureml.scriptrun,
Status: Completed)
[('1', Pipeline(memory=None,
         steps=[('maxabsscaler', MaxAbsScaler(copy=True)),
                ('xgboostclassifier',
                 XGBoostClassifier(base_score=0.5, booster='gbtree',
                                   colsample_bylevel=1, colsample_bynode=1,
                                   colsample_bytree=1, gamma=0,
                                   learning_rate=0.1, max_delta_step=0,
                                   max_depth=3, min_child_weight=1, missing=nan,
                                   n_estimators=100, n_jobs=1, nthread=None,
                                   objective='binary:logistic', random_state=0,
                                   reg_alpha=0, reg_lambda=1,
                                   scale_pos_weight=1, seed=None, silent=None,
                                   subsample=1, tree_method='auto', verbose=-10,
   

In [6]:
print(best_run)
print(best_run.properties["score"])

Run(Experiment: capstone-project,
Id: AutoML_ccde6c0d-a9c9-4a73-9f7a-57fb995e324c_12,
Type: azureml.scriptrun,
Status: Completed)
0.8230333333333333


In [25]:
# Save the model by registering it with the workspace
model_auto = best_run.register_model(model_name="AutoML_model-1",
                                     description="Best Auto ML Model",
                                     model_path='outputs/model.pkl')

## Model Deployment

Comparing the models trained by HyperDrive and AutoML, I decided to deploy the AutoML model because it performs better on the dataset (accuracy of about 0.823 with 4-fold cross-validation).

The cells below retrieve the completed AutoML run and its best model, create an inference config, and deploy the registered model as a web service.

In [3]:
from azureml.train.automl.run import AutoMLRun
ws = Workspace.from_config()
experiment = ws.experiments['capstone-project']
automl_run = AutoMLRun(experiment=experiment, run_id= "AutoML_ccde6c0d-a9c9-4a73-9f7a-57fb995e324c")

In [4]:
best_run, best_model = automl_run.get_output()



In [71]:
best_model.steps

[('datatransformer',
  DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                  feature_sweeping_config=None, feature_sweeping_timeout=None,
                  featurization_config=None, force_text_dnn=None,
                  is_cross_validation=None, is_onnx_compatible=None, logger=None,
                  observer=None, task=None, working_dir=None)),
 ('prefittedsoftvotingclassifier',
  PreFittedSoftVotingClassifier(classification_labels=None,
                                estimators=[('1',
                                             Pipeline(memory=None,
                                                      steps=[('maxabsscaler',
                                                              MaxAbsScaler(copy=True)),
                                                             ('xgboostclassifier',
                                                              XGBoostClassifier(base_score=0.5,
                                                                  

### NOTE:
Because the model was trained with AutoML, Azure ML also provides the scoring file and the conda environment file that we can use for deployment. Using these generated files is the most reliable way to avoid errors and load the correct environment.
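For orientation, an Azure ML entry script exposes two functions: `init()`, which runs once when the service starts, and `run()`, which handles each request. The sketch below mimics that contract under stated assumptions: `_StubModel` is a hypothetical stand-in for the real model, and the error handling is illustrative. The generated `scoring_file_v_1_0_0.py` is more elaborate and is the file that should be used for the actual deployment.

```python
import json


class _StubModel:
    """Hypothetical stand-in for the unpickled AutoML model."""

    def predict(self, rows):
        # Always predicts class 0; a real model would use the feature values
        return [0 for _ in rows]


model = None


def init():
    """Called once at service start-up; prepares the model for scoring."""
    global model
    # A real entry script would load the registered model file here
    model = _StubModel()


def run(raw_data):
    """Called per request; `raw_data` is the JSON request body as a string."""
    try:
        rows = json.loads(raw_data)["data"]
        return json.dumps({"result": list(model.predict(rows))})
    except Exception as exc:
        # Return the error as JSON instead of crashing the service
        return json.dumps({"error": str(exc)})


init()
print(run(json.dumps({"data": [{"LIMIT_BAL": 20000}]})))
```

Note that `run()` in this sketch returns a JSON string rather than a dict, so a client may need to decode the response body a second time.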

In [46]:
script_file = 'score.py'
best_run.download_file('outputs/scoring_file_v_1_0_0.py', script_file)
best_run.download_file('outputs/conda_env_v_1_0_0.yml', 'env.yml')

Deploying the model using an `InferenceConfig` and an ACI deployment configuration:

In [57]:
from azureml.core import Workspace, Environment
from azureml.core.webservice import AciWebservice
from azureml.core.model import Model, InferenceConfig

ws = Workspace.from_config()
run_env = Environment.from_conda_specification(name = 'run-env', file_path = './env.yml')
    
inference_config = InferenceConfig(entry_script=script_file, environment=run_env)
deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1, auth_enabled = True, enable_app_insights = True)
model1 = Model(workspace=ws, name="AutoML_model-1")

In [58]:
service = Model.deploy(ws, "deployed-automl-model2", [model1], inference_config, deployment_config)
service.wait_for_deployment(show_output = True)
print(service.state)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running.............................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy


In [75]:
import requests
import json

data = {"data":
        [
          {
            "LIMIT_BAL": 20000, 
            "SEX": 1, 
            "AGE": 25, 
            "PAY_0": -1, 
            "PAY_2": 2, 
            "PAY_3": 0, 
            "PAY_4": 2, 
            "PAY_5": 2, 
            "PAY_6": -1, 
            "BILL_AMT1": 4000, 
            "BILL_AMT2": 3000, 
            "BILL_AMT3": 2000, 
            "BILL_AMT4": 1000, 
            "BILL_AMT5": 0, 
            "BILL_AMT6": 0, 
            "PAY_AMT1": 700, 
            "PAY_AMT2": 0,
            "PAY_AMT3": 0, 
            "PAY_AMT4": 0, 
            "PAY_AMT5": 0, 
            "PAY_AMT6": 0, 
            "edu_0": 0, 
            "edu_1": 1, 
            "edu_2": 0, 
            "edu_3": 0, 
            "edu_4": 0, 
            "edu_5": 0, 
            "edu_6": 0, 
            "marriage_0": 0, 
            "marriage_1": 0, 
            "marriage_2": 1, 
            "marriage_3": 0
        }
      ]
    }


Send the data to the endpoint and check that it returns a prediction.

In [76]:
scoring_uri = service.scoring_uri
key = service.get_keys()[0]

# Convert the payload to a JSON string and keep a copy on disk
input_data = json.dumps(data)
with open("data.json", "w") as _f:
    _f.write(input_data)

# Set the content type
headers = {'Content-Type': 'application/json'}

# Authentication is enabled on the service, so set the authorization header
headers['Authorization'] = f'Bearer {key}'

# Make the request and display the response
resp = requests.post(scoring_uri, data=input_data, headers=headers)
print(resp.json())

{"result": [0]}
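The response above is printed with double quotes, which suggests that `resp.json()` may have returned a JSON string rather than a Python dict. The sketch below shows one way to handle both cases before reading the predictions. The helper `parse_scoring_response` and the `LABELS` mapping are hypothetical; the mapping assumes that `Y = 1` means the customer defaults, as described on the dataset page.

```python
import json

# Assumed meaning of the label column Y (not confirmed by this notebook)
LABELS = {0: "no default", 1: "default"}


def parse_scoring_response(body):
    """Return the list of predictions from a decoded scoring response.

    `body` is whatever `resp.json()` returned: either a dict, or a string
    that itself contains JSON and needs a second decoding step.
    """
    if isinstance(body, str):
        body = json.loads(body)
    if "error" in body:
        raise RuntimeError(f"Scoring failed: {body['error']}")
    return body["result"]


predictions = parse_scoring_response('{"result": [0]}')
print([LABELS[p] for p in predictions])
```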


In [77]:
print(service.get_logs())

2021-02-15T15:14:58,188769753+00:00 - rsyslog/run 
2021-02-15T15:14:58,189409654+00:00 - iot-server/run 
2021-02-15T15:14:58,189784455+00:00 - gunicorn/run 
2021-02-15T15:14:58,198474180+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_7785023fceb74e4facc1b1a577b1faf9/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7785023fceb74e4facc1b1a577b1faf9/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7785023fceb74e4facc1b1a577b1faf9/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7785023fceb74e4facc1b1a577b1faf9/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7785023fceb74e4facc1b1a577b1faf9/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
rsyslogd

In [None]:
service.delete()