# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [19]:
from azureml.core import Workspace, Experiment
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails
import joblib
from azureml.core.webservice import AciWebservice
from azureml.core import Environment
from azureml.core.model import InferenceConfig, Model
import requests
import json

## Dataset

### Overview
This dataset was obtained from a credit recovery consultancy a few years ago and contains real records on which collection actions must be taken. The main task is to identify the records most likely to be paid and to use machine learning models to predict future payment behavior.


TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [2]:
# get the current workspace

ws = Workspace.from_config()

In [3]:
# create the dataset if it does not already exist

key = "Debts"
description_text = "Debts dataset from Brazilian Telefony Company"

if key in ws.datasets:
    # Reuse the dataset already registered in the workspace
    dataset = ws.datasets[key]
else:
    # Create the AML dataset and register it in the workspace
    example_data = 'https://www.jlnsoftware.com.br/azure_ml/Debts.csv'
    dataset = TabularDatasetFactory.from_delimited_files(example_data, separator=";")
    dataset = dataset.register(workspace=ws,
                               name=key,
                               description=description_text)
dataset

{
  "source": [
    "https://www.jlnsoftware.com.br/azure_ml/Debts.csv"
  ],
  "definition": [
    "GetFiles",
    "ParseDelimited",
    "DropColumns",
    "SetColumnTypes"
  ],
  "registration": {
    "id": "a586f540-3962-433d-89ce-9c0f53553fd0",
    "name": "Debts",
    "version": 1,
    "description": "Debts dataset from Brazilian Telefony Company",
    "workspace": "Workspace.create(name='quick-starts-ws-259246', subscription_id='aa7cf8e8-d23f-4bce-a7b9-1f0b4e0ac8ee', resource_group='aml-quickstarts-259246')"
  }
}

In [4]:
# create the cluster if it does not already exist

amlcompute_cluster_name = "cluster-6-ds3-v2"

# Reuse the cluster if it already exists; otherwise provision a new one
try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D3_V2',
                                                           vm_priority = 'dedicated',
                                                           min_nodes=1,
                                                           max_nodes=6)
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True, min_node_count = 1, timeout_in_minutes = 10)

InProgress..
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded...................
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


In [5]:
# choose a name for experiment
experiment_name = 'Debts-AutoML'
experiment=Experiment(ws, experiment_name)

## AutoML Configuration

TODO: Explain why you chose the automl settings and configuration you used below.
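
One way to reason about these settings is to check them against the cluster. Azure's guidance for AmlCompute is that `max_concurrent_iterations` should not exceed the cluster's maximum node count, since each node runs one iteration at a time. The sketch below encodes that rule for the values used in this notebook (5 concurrent iterations on a cluster with `max_nodes=6`). `check_automl_settings` is a hypothetical helper written for illustration and is not part of the Azure ML SDK.

```python
def check_automl_settings(settings, max_nodes):
    """Return a list of problems found when comparing settings to the cluster size."""
    problems = []
    concurrency = settings.get("max_concurrent_iterations", 1)
    if concurrency > max_nodes:
        problems.append(
            f"max_concurrent_iterations={concurrency} exceeds max_nodes={max_nodes}"
        )
    if settings.get("experiment_timeout_minutes", 0) <= 0:
        problems.append("experiment_timeout_minutes should be positive")
    return problems


# Values copied from this notebook's AutoML configuration
settings = {
    "experiment_timeout_minutes": 60,
    "max_concurrent_iterations": 5,
    "primary_metric": "AUC_weighted",
    "enable_onnx_compatible_models": True,
}

# An empty list means no inconsistency was detected
print(check_automl_settings(settings, max_nodes=6))
```

Keeping concurrency one below the node count, as here, leaves a node free for the parent run's own bookkeeping; that is a design preference rather than a requirement.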

In [6]:
# create AutoML config

automl_settings = {
    "experiment_timeout_minutes": 60,
    "max_concurrent_iterations": 5,
    "primary_metric" : 'AUC_weighted',
    "enable_onnx_compatible_models" : True
}
automl_config = AutoMLConfig(compute_target=compute_target,
                             task = "classification",
                             training_data=dataset,
                             label_column_name="RESULTADO",   
                             path = './debts-project',
                             enable_early_stopping= True,
                             featurization= 'auto',
                             debug_log = "automl_errors.log",
                             **automl_settings
                            )

In [7]:
# TODO: Submit your experiment

remote_run = experiment.submit(automl_config, show_output = True)

Submitting remote run.
No run_configuration provided, running on cluster-6-ds3-v2 with default configuration
Running on remote compute: cluster-6-ds3-v2


Experiment,Id,Type,Status,Details Page,Docs Page
Debts-AutoML,AutoML_1381d957-e54f-4dfe-a306-e06771ae17b0,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation



Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturization. Beginning to fit featurizers and featurize the dataset.
Current status: DatasetBalancing. Performing class balancing sweeping
Current status: ModelSelection. Beginning model selection.

********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Train-Test data split
STATUS:       DONE
DESCRIPTION:  In order to accurately evaluate the model(s) trained by AutoML, we leverage a dataset that the model is not trained on. Hence, if the user doesn't provide an explicit validation dataset, a part of the training dataset is used to achieve this. For smaller datasets (fewer than 20,000 samples), cross-validation is leveraged, else a single hold-out set is split from the training data to serve as the validation dataset. Hence, your input data has been split
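
The guardrail message above describes a simple size-based rule for choosing how validation data is produced when none is supplied. A minimal sketch of that rule, assuming only what the message states (the 20,000-sample threshold), could look like this. `validation_strategy` is a hypothetical name and does not reproduce AutoML's internal logic.

```python
def validation_strategy(n_samples, threshold=20_000):
    """Pick a validation approach from the dataset size, per the guardrail message."""
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    # Fewer than `threshold` samples: cross-validation; otherwise a single hold-out split
    return "cross-validation" if n_samples < threshold else "hold-out"


for size in (5_000, 20_000, 100_000):
    print(size, validation_strategy(size))
```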

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [10]:
from azureml.widgets import RunDetails
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [None]:
# make sure that the experiment is finished

remote_run.wait_for_completion()

In [11]:
# print best metric

automl_run_metrics = remote_run.get_metrics()
print('AUC_weighted', automl_run_metrics['AUC_weighted'])

AUC_weighted 0.9984605420115348
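
For intuition about the metric, `AUC_weighted` averages the per-class AUC, weighted by the number of true instances in each class. The sketch below only illustrates the underlying binary AUC as the fraction of (positive, negative) pairs that the scores rank correctly. The labels and scores are toy values invented for the example, not data from this experiment, and `binary_auc` is a hypothetical helper rather than an SDK function.

```python
def binary_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(binary_auc(y_true, scores))  # prints 0.75
```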


## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [12]:
# get best model and print all properties

best_run, fitted_model = remote_run.get_output()
fitted_model

Package:azureml-automl-runtime, training version:1.52.0.post1, current version:1.51.0.post1
Package:azureml-core, training version:1.52.0, current version:1.51.0
Package:azureml-dataprep, training version:4.11.4, current version:4.10.8
Package:azureml-dataprep-rslex, training version:2.18.4, current version:2.17.12
Package:azureml-dataset-runtime, training version:1.52.0, current version:1.51.0
Package:azureml-defaults, training version:1.52.0, current version:1.51.0
Package:azureml-interpret, training version:1.52.0, current version:1.51.0
Package:azureml-mlflow, training version:1.52.0, current version:1.51.0
Package:azureml-pipeline-core, training version:1.52.0, current version:1.51.0
Package:azureml-responsibleai, training version:1.52.0, current version:1.51.0
Package:azureml-telemetry, training version:1.52.0, current version:1.51.0
Package:azureml-train-automl-client, training version:1.52.0, current version:1.51.0.post1
Package:azureml-train-automl-runtime, training version:1.

Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=False, enable_feature_sweeping=False, feature_sweeping_config={}, feature_sweeping_timeout=86400, featurization_config=None, force_text_dnn=False, is_cross_validation=False, is_onnx_compatible=True, observer=None, task='classification', working_dir='/mnt/batch/tasks/shared/LS_root/moun...
                 PreFittedSoftVotingClassifier(classification_labels=array([0, 1]), estimators=[('14', Pipeline(memory=None, steps=[('standardscalerwrapper', StandardScalerWrapper(copy=True, with_mean=False, with_std=False)), ('randomforestclassifier', RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None, criterion='gini', max_depth=None, max_features='auto', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=1, oob_score=False, random_state=None, v

In [15]:
# Save the best model on AzureML

model = remote_run.register_model(model_name="Debts-AutoML-bestmodel", description="The best model for Debts dataset using the AutoML")

# Save the best model on disk

joblib.dump(fitted_model, 'Debts-AutoML-bestmodel.pkl')

['Debts-AutoML-bestmodel.pkl']

In [41]:
# Save the best model in ONNX format

best_run, onnx_mdl = remote_run.get_output(return_onnx_model=True)

from azureml.automl.runtime.onnx_convert import OnnxConverter

onnx_fl_path = "./Debts-AutoML-bestmodel.onnx"
OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)

## Model Deployment

Remember you have to deploy only one of the two models you trained but you still need to register both the models. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [None]:
# based on https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/ml-frameworks/keras/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb

# create score.py

In [17]:
%%writefile score.py
import json
import logging
import os
import pickle
import numpy as np
import pandas as pd
import joblib

import azureml.automl.core
from azureml.automl.core.shared import logging_utilities, log_server
from azureml.telemetry import INSTRUMENTATION_KEY

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType
from inference_schema.parameter_types.standard_py_parameter_type import StandardPythonParameterType

data_sample = PandasParameterType(pd.DataFrame({"TIPO_PES": pd.Series(["example_value"], dtype="object"), "SEXO_PES": pd.Series(["example_value"], dtype="object"), "ESTADO_CIVIL_PES": pd.Series(["example_value"], dtype="object"), "IDADE": pd.Series([0.0], dtype="float32"), "VALOR_TOTAL": pd.Series([0.0], dtype="float32"), "ATRASO": pd.Series([0], dtype="int16"), "NOME_TIPE": pd.Series(["example_value"], dtype="object"), "TEVE_DEVOL": pd.Series([0], dtype="int8"), "CIDADES": pd.Series(["example_value"], dtype="object"), "UF": pd.Series(["example_value"], dtype="object")}))
input_sample = StandardPythonParameterType({'data': data_sample})
method_sample = StandardPythonParameterType("predict")
sample_global_params = StandardPythonParameterType({"method": method_sample})

result_sample = NumpyParameterType(np.array([0]))
output_sample = StandardPythonParameterType({'Results':result_sample})

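# Create the logger first so it exists even if telemetry setup fails
logger = logging.getLogger('azureml.automl.core.scoring_script_v2')
try:
    log_server.enable_telemetry(INSTRUMENTATION_KEY)
    log_server.set_verbosity('INFO')
except Exception:
    # Telemetry is optional; continue without it
    pass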


def init():
    global model
    # Locate the registered model file under AZUREML_MODEL_DIR and deserialize it
    # back into a scikit-learn model
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model.pkl')
    path = os.path.normpath(model_path)
    path_split = path.split(os.sep)
    log_server.update_custom_dimensions({'model_name': path_split[-3], 'model_version': path_split[-2]})
    try:
        logger.info("Loading model from path.")
        model = joblib.load(model_path)
        logger.info("Loading successful.")
    except Exception as e:
        logging_utilities.log_traceback(e, logger)
        raise

@input_schema('GlobalParameters', sample_global_params, convert_to_provided_type=False)
@input_schema('Inputs', input_sample)
@output_schema(output_sample)
def run(Inputs, GlobalParameters={"method": "predict"}):
    data = Inputs['data']
    if GlobalParameters.get("method", None) == "predict_proba":
        result = model.predict_proba(data)
    elif GlobalParameters.get("method", None) == "predict":
        result = model.predict(data)
    else:
        raise Exception(f"Invalid predict method argument received. GlobalParameters: {GlobalParameters}")
    if isinstance(result, pd.DataFrame):
        result = result.values
    return {'Results':result.tolist()}

Writing score.py
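
Because `run()` in `score.py` expects an `Inputs`/`GlobalParameters` body whose records carry the ten columns declared in `data_sample`, it can help to validate a request on the client before sending it. The following is a minimal sketch under that assumption. `build_payload`, `EXPECTED_COLUMNS`, and `ALLOWED_METHODS` are hypothetical names introduced here; the column list is copied from `data_sample` and the sample record from the request cell of this notebook.

```python
# Column names copied from data_sample in score.py
EXPECTED_COLUMNS = [
    "TIPO_PES", "SEXO_PES", "ESTADO_CIVIL_PES", "IDADE", "VALOR_TOTAL",
    "ATRASO", "NOME_TIPE", "TEVE_DEVOL", "CIDADES", "UF",
]
# The two methods that run() in score.py accepts
ALLOWED_METHODS = ("predict", "predict_proba")


def build_payload(records, method="predict"):
    """Build a request body in the shape that run() in score.py reads."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"Unsupported method: {method!r}")
    for index, record in enumerate(records):
        missing = sorted(set(EXPECTED_COLUMNS) - set(record))
        extra = sorted(set(record) - set(EXPECTED_COLUMNS))
        if missing or extra:
            raise ValueError(f"Record {index}: missing={missing}, unexpected={extra}")
    return {"Inputs": {"data": list(records)}, "GlobalParameters": {"method": method}}


sample = {
    "TIPO_PES": "F", "SEXO_PES": "F", "ESTADO_CIVIL_PES": "2", "IDADE": 46.0,
    "VALOR_TOTAL": 288.79, "ATRASO": 175, "NOME_TIPE": "RESIDENCIAL",
    "TEVE_DEVOL": 0, "CIDADES": "MANAUS", "UF": "AM",
}
payload = build_payload([sample])
print(payload["GlobalParameters"])
```

Failing fast on the client gives a clearer message than the HTTP error the endpoint would return for a malformed record.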


In [42]:
# deploy the web service to ACI with key authentication and Application Insights enabled

service_name="debts-automl-endpoint"

env = Environment.from_conda_specification(name="env", file_path="env.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1.0,
                                               auth_enabled=True,
                                               memory_gb=4.0,
                                               enable_app_insights=True,
                                               description='Debts Dataset AutoML best model endpoint')

service = Model.deploy(workspace=ws, 
                           name=service_name, 
                           models=[model], 
                           inference_config=inference_config, 
                           deployment_config=aciconfig)

service.wait_for_deployment(True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2024-05-16 23:20:00+00:00 Creating Container Registry if not exists.
2024-05-16 23:20:01+00:00 Registering the environment.
2024-05-16 23:20:04+00:00 Use the existing image.
2024-05-16 23:20:04+00:00 Generating deployment configuration.
2024-05-16 23:20:05+00:00 Submitting deployment to compute.
2024-05-16 23:20:12+00:00 Checking the status of deployment debts-automl-endpoint..
2024-05-16 23:25:02+00:00 Checking the status of inference endpoint debts-automl-endpoint.
Succeeded
ACI service creation operation finished, operation "Succeeded"


In [43]:
print("Service State: " + service.state)
print("Service REST URL: " + service.scoring_uri)
print("Service Keys:" + str(service.get_keys()))

Service State: Healthy
Service REST URL: http://1166ccdb-59a1-4bb5-b907-9f243f1bed36.westus2.azurecontainer.io/score
Service Keys:('onAfNC6TLsPx05QtMtcGlTGU88aaRYxD', 'bK2C0d6AROudThHJZ1XPVhBhjKYBmjSj')


TODO: In the cell below, send a request to the web service you deployed to test it.

In [47]:
# Send one sample record to the endpoint (adapted from the endpoint's "Consume" sample code)

import urllib.request
import json
import os
import ssl

def allowSelfSignedHttps(allowed):
    # bypass the server certificate verification on client side
    if allowed and not os.environ.get('PYTHONHTTPSVERIFY', '') and getattr(ssl, '_create_unverified_context', None):
        ssl._create_default_https_context = ssl._create_unverified_context

allowSelfSignedHttps(True) # this line is needed if you use self-signed certificate in your scoring service.

# Request data goes here
# The example below assumes JSON formatting which may be updated
# depending on the format your endpoint expects.
# More information can be found here:
# https://docs.microsoft.com/azure/machine-learning/how-to-deploy-advanced-entry-script
data =  {
  "Inputs": {
    "data": [
      {
            "TIPO_PES": "F",
            "SEXO_PES": "F",
            "ESTADO_CIVIL_PES": "2",
            "IDADE": 46.0,
            "VALOR_TOTAL": 288.79,
            "ATRASO": 175,
            "NOME_TIPE": "RESIDENCIAL",
            "TEVE_DEVOL": 0,
            "CIDADES": "MANAUS",
            "UF": "AM"
      }
    ]
  },
  "GlobalParameters": {
    "method": "predict"
  }
}

body = str.encode(json.dumps(data))

url = service.scoring_uri
# Replace this with the primary/secondary key, AMLToken, or Microsoft Entra ID token for the endpoint
api_key = service.get_keys()[0]
if not api_key:
    raise Exception("A key should be provided to invoke the endpoint")


headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)}

req = urllib.request.Request(url, body, headers)

try:
    response = urllib.request.urlopen(req)

    result = response.read()
    print(result)
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))

    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(error.read().decode("utf8", 'ignore'))

b'{"Results": [1]}'
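
The endpoint returns raw bytes, so the predictions still need to be decoded before they can be used. A minimal sketch, assuming the response has the `{"Results": [...]}` shape that `score.py` returns and that is shown above (`parse_results` is a hypothetical helper name):

```python
import json


def parse_results(raw):
    """Decode the endpoint's response bytes and return the list of predictions."""
    payload = json.loads(raw.decode("utf-8"))
    if "Results" not in payload:
        raise KeyError("Response does not contain a 'Results' field")
    return payload["Results"]


# The bytes below are the response printed by the request cell
print(parse_results(b'{"Results": [1]}'))  # prints [1]
```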


TODO: In the cell below, print the logs of the web service and delete the service

In [38]:
logs = service.get_logs()
logs



In [40]:
# Deletion of the webservice

service.delete()

# Deletion of compute cluster

compute_target.delete()

Running
2024-05-16 23:10:04+00:00 Check and wait for operation (ddf4c6d5-e205-42f9-8274-4b5a1d0c3575) to finish.
2024-05-16 23:10:07+00:00 Deleting service entity.
Succeeded


**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shut down all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
