# Automated ML

The cell below imports all the dependencies needed to complete the project.

In [1]:
import logging
import os
import csv
import joblib
from azureml.core import Workspace, Experiment
from azureml.train.automl import AutoMLConfig
from azureml.core.dataset import Dataset
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.widgets import RunDetails
from azureml.core.model import InferenceConfig 
from azureml.core.webservice import AciWebservice, Webservice
from azureml.core.model import Model

In [2]:
ws = Workspace.from_config()
ws.write_config(path='.azureml')
experiment_name = 'camels-exp'
exp = Experiment(workspace=ws, name=experiment_name)

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

run = exp.start_logging()

Workspace name: final-prj
Azure region: westus2
Subscription id: 0c66ad45-500d-48af-80d3-0039ebf1975e
Resource group: rgp


## Dataset

### Overview
The primary objective was to develop an early warning system for U.S. banks: a binary classifier that separates failed (`'Target'==1`) from surviving (`'Target'==0`) banks using their quarterly filings with the regulator. Overall, 137 failed banks and 6,877 surviving banks were used in this machine learning exercise. Historical observations from the four quarters ending in 2010Q3 (stored in `./data`) were used to tune the model, and out-of-sample testing was performed on quarterly data starting from 2010Q4 (stored in `./oos`). For more information on the methodology, please refer to the supplemental `CAMELS.md` file included in the repository.
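The time-based split described above can be sketched in plain Python. This is only an illustration under assumptions: quarters are labelled like `2010Q3`, and the helpers `quarters_ending`, `parse_quarter` and `split_for` are hypothetical names, not part of the project, which splits the data by folder (`./data` vs. `./oos`).

```python
from typing import List, Tuple


def quarters_ending(year: int, quarter: int, n: int = 4) -> List[str]:
    """Return the n calendar quarters ending at (year, quarter), oldest first."""
    labels = []
    for _ in range(n):
        labels.append(f"{year}Q{quarter}")
        quarter -= 1
        if quarter == 0:  # roll back into Q4 of the previous year
            year, quarter = year - 1, 4
    return labels[::-1]


def parse_quarter(label: str) -> Tuple[int, int]:
    """Turn a label such as '2010Q3' into a sortable (year, quarter) tuple."""
    year, quarter = label.split("Q")
    return int(year), int(quarter)


def split_for(label: str, cutoff: str = "2010Q3") -> str:
    """'data' (in-sample) up to and including the cutoff quarter, 'oos' afterwards."""
    return "data" if parse_quarter(label) <= parse_quarter(cutoff) else "oos"


print(quarters_ending(2010, 3))                 # ['2009Q4', '2010Q1', '2010Q2', '2010Q3']
print(split_for("2010Q3"), split_for("2010Q4"))  # data oos
```

Comparing `(year, quarter)` tuples keeps the cutoff check correct across year boundaries without any date library.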

In [3]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

final-prj
rgp
westus2
0c66ad45-500d-48af-80d3-0039ebf1975e


In [4]:
experiment_name = 'camels-exp'
project_folder = './dmik'
experiment = Experiment(ws, experiment_name)
experiment

| Name | Workspace | Report Page | Docs Page |
|---|---|---|---|
| camels-exp | final-prj | Link to Azure Machine Learning studio | Link to Documentation |


In [8]:
dataset = ws.datasets['camels'] 
df = dataset.to_pandas_dataframe()
df.describe()

| | Target | EQTA | EQTL | LLRTA | LLRGL | OEXTA | INCEMP | ROA | ROE | TDTL | TDTA | TATA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 7020.0 | 7020.0 | 7020.0 | 7020.0 | 7020.0 | 7020.0 | 7014.0 | 7020.0 | 7020.0 | 7020.0 | 7020.0 | 7020.0 |
| mean | 0.019516 | 0.107825 | 8.02595 | 0.01232 | 0.021934 | 0.02402 | 33.65851 | 0.00202 | -0.234058 | 44.756417 | 0.835683 | 0.176412 |
| std | 0.138338 | 0.048877 | 573.594468 | 0.009366 | 0.16089 | 0.030903 | 1156.779875 | 0.015031 | 11.39799 | 3147.677966 | 0.080119 | 0.142363 |
| min | 0.0 | -0.160659 | -0.195857 | 0.0 | 0.0 | -0.012004 | -3639.467742 | -0.29575 | -887.458333 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 0.087487 | 0.125263 | 0.007216 | 0.012119 | 0.018253 | 3.084559 | 0.000907 | 0.009412 | 1.126635 | 0.805493 | 0.066298 |
| 50% | 0.0 | 0.101018 | 0.156656 | 0.01004 | 0.015915 | 0.022036 | 18.162698 | 0.004832 | 0.045176 | 1.273882 | 0.850135 | 0.148018 |
| 75% | 0.0 | 0.121013 | 0.212105 | 0.014293 | 0.022124 | 0.0264 | 34.348039 | 0.008417 | 0.078245 | 1.527407 | 0.883593 | 0.258563 |
| max | 1.0 | 0.968116 | 47829.25 | 0.161906 | 12.25 | 2.164806 | 73600.0 | 0.173673 | 21.9631 | 260238.5 | 1.151905 | 0.868327 |


In [9]:
cpu_cluster_name = 'cmp'

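try:
    # Reuse the cluster if it already exists in the workspace
    compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Existing compute target.')

except ComputeTargetException:
    # Otherwise provision a new CPU cluster with up to 4 nodes
    print('Creating compute target.')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True)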

print(compute_target.get_status())

Existing compute target.
{
  "errors": [],
  "creationTime": "2021-03-12T16:36:50.492879+00:00",
  "createdBy": {
    "userObjectId": "49e75006-b9ac-415c-9176-f83c59d4bf26",
    "userTenantId": "d689239e-c492-40c6-b391-2c5951d31d14",
    "userName": "Mikhaylov, Dmitry"
  },
  "modifiedTime": "2021-03-12T16:39:52.341446+00:00",
  "state": "Running",
  "vmSize": "STANDARD_DS3_V2"
}


## AutoML Configuration

Financial metrics recorded in the last reports filed by failed banks should carry the predictive power needed to forecast future failures. Because of the severe class imbalance (failures make up only about 2% of observations) and the costs associated with financial distress, the model should aim to maximize recall. In other words, accuracy is probably not the best metric here, since Type II errors (failed banks classified as survivors, i.e. false negatives) need to be minimized.
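A quick worked example of why accuracy misleads here, using the class counts from the dataset overview (137 failed, 6,877 surviving banks). The "model" is a hypothetical naive baseline that predicts every bank survives; it is not anything AutoML produced.

```python
# Class counts taken from the dataset overview: 137 failed, 6,877 surviving banks
n_failed, n_survived = 137, 6877
y_true = [1] * n_failed + [0] * n_survived
y_pred = [0] * len(y_true)  # naive baseline: predict that every bank survives

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # Type II errors
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.3f} recall={recall:.3f} missed failures={fn}")
# accuracy=0.980 recall=0.000 missed failures=137
```

The baseline is about 98% accurate while missing every single failure, which is exactly the error an early warning system cannot afford.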

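The main goal of this classification is good separation of the two classes (a high AUC), ideally achieved through a high recall score. This is why `'norm_macro_recall'` was chosen as the primary metric: it averages recall over both classes and rescales it so that random guessing scores 0 and a perfect classifier scores 1. The timeout and the number of concurrent iterations were set conservatively to control costs.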
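A minimal pure-Python sketch of the metric, assuming the definition given in the Azure ML metrics documentation: normalized macro recall = (macro recall - R) / (1 - R), where R = 1/C is the expected macro recall of random predictions for C classes (0.5 for binary). For a binary problem this reduces to TPR + TNR - 1. The function names are hypothetical; this is not AutoML's internal implementation.

```python
from typing import Sequence


def class_recall(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """Share of observations with true class `label` that were predicted as `label`."""
    support = sum(1 for t in y_true if t == label)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    return hits / support if support else 0.0


def norm_macro_recall(y_true, y_pred, labels=(0, 1)) -> float:
    """Macro-averaged recall rescaled so that random guessing scores 0 and a perfect model 1."""
    macro = sum(class_recall(y_true, y_pred, c) for c in labels) / len(labels)
    chance = 1 / len(labels)  # expected macro recall of random predictions (0.5 for binary)
    return (macro - chance) / (1 - chance)


y_true = [1, 1, 0, 0, 0, 0]
print(norm_macro_recall(y_true, [0] * 6))             # all "survived": 0.0
print(norm_macro_recall(y_true, y_true))              # perfect: 1.0
print(norm_macro_recall(y_true, [1, 0, 0, 0, 0, 1]))  # 0.25
```

Unlike accuracy, the all-survive baseline scores 0 here, so the metric only rewards models that actually catch failures.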

In [10]:
# AutoML settings: short timeout and limited concurrency to control costs
automl_settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 4,
    "primary_metric" : 'norm_macro_recall'
    }

# AutoML configuration for the bank-failure classification task
automl_config = AutoMLConfig(
    compute_target=compute_target, 
    task = "classification",
    training_data=dataset, 
    label_column_name="Target", 
    path = project_folder,
    enable_early_stopping= True, 
    featurization= 'auto', 
    debug_log = "automl_errors.log",
    **automl_settings
    )

## Run Details

The cell below submits the AutoML run and uses the `RunDetails` widget to track the progress of its child runs.

In [11]:
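# Submit the AutoML run without blocking, so the widget can render right away
remote_run = experiment.submit(config=automl_config, show_output=False)
RunDetails(remote_run).show()
# Block until the run finishes, streaming its log output
remote_run.wait_for_completion(show_output=True)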

Running on remote.
No run_configuration provided, running on cmp with default configuration
Running on remote compute: cmp
Parent Run ID: AutoML_1357bf3c-d371-4641-9522-bfcb8d5290a0

Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetBalancing. Performing class balancing sweeping
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Cross validation
STATUS:       DONE
DESCRIPTION:  Each iteration of the trained model was validated through cross-validation.
              
DETAILS:      
+---------------------------------+
|Number of folds                  |
|3                                |
+---------------------------------+

****************************************************************************************************

TYPE:         Class balancing detection
STATUS:       ALERTED
DESCRIPTION:  T

In [13]:
# Fetch the latest status of the run. It should show 'Completed'
print("Run Status: ",remote_run.get_status())

Run Status:  Completed


## Best Model

In [14]:
# Retrieve the best AutoML child run and its fitted model
best_run, fitted_model = remote_run.get_output()

print('Best run:', best_run)
print('Best model:', fitted_model)

# Print every metric logged for the best run
best_run_metrics = best_run.get_metrics()

for metric_name, metric in best_run_metrics.items():
    print(metric_name, metric)

Package:azureml-automl-runtime, training version:1.23.0, current version:1.22.0
Package:azureml-core, training version:1.23.0, current version:1.22.0
Package:azureml-dataprep, training version:2.10.1, current version:2.9.1
Package:azureml-dataprep-native, training version:30.0.0, current version:29.0.0
Package:azureml-dataprep-rslex, training version:1.8.0, current version:1.7.0
Package:azureml-dataset-runtime, training version:1.23.0, current version:1.22.0
Package:azureml-defaults, training version:1.23.0, current version:1.22.0
Package:azureml-interpret, training version:1.23.0, current version:1.22.0
Package:azureml-mlflow, training version:1.23.0, current version:1.22.0
Package:azureml-pipeline-core, training version:1.23.0, current version:1.22.0
Package:azureml-telemetry, training version:1.23.0, current version:1.22.0
Package:azureml-train-automl-client, training version:1.23.0, current version:1.22.0
Package:azureml-train-automl-runtime, training version:1.23.0, current versio

Best run: Run(Experiment: camels-exp,
Id: AutoML_1357bf3c-d371-4641-9522-bfcb8d5290a0_36,
Type: azureml.scriptrun,
Status: Completed)
Best model: Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('prefittedsoftvotingclassifier',...
                                                                                                  min_samples_leaf=0.035789473684210524,
                                                                                                  min_samples

In [17]:
# Save the best model
joblib.dump(value=fitted_model, filename="fitted_automl_model.joblib")

['fitted_automl_model.joblib']

## Model Deployment


In [21]:
# Register the best model produced by the AutoML run in the workspace
automl_model = remote_run.register_model(model_name='automl_model.pkl')

print(remote_run.model_id)

automl_model.pkl


In [26]:
# Reuse the environment the best run was trained with
environment = best_run.get_environment()

# Download the scoring script that AutoML generated for the best run
entry_script = 'inference/scoring.py'
os.makedirs('inference', exist_ok=True)
best_run.download_file('outputs/scoring_file_v_1_0_0.py', entry_script)

inference_config = InferenceConfig(entry_script=entry_script, environment=environment)

# Single-core, 1 GB ACI container with key-based auth and Application Insights enabled
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1,
                                                       memory_gb=1,
                                                       auth_enabled=True,
                                                       enable_app_insights=True)

service = Model.deploy(ws, "aciservice", [automl_model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running...............................
Succeeded
ACI service creation operation finished, operation "Succeeded"


Send a request to the web service you deployed to test it.

In [27]:
# Authentication is enabled, so retrieve the primary and secondary keys with get_keys()
primary, secondary = service.get_keys()

print('Service state: ' + service.state)
print('Service scoring URI: ' + service.scoring_uri)
print('Service Swagger URI: ' + service.swagger_uri)
print('Service primary authentication key: ' + primary)

Service state: Healthy
Service scoring URI: http://9a8e86e1-3449-480d-bb35-a6cd6731753a.westus2.azurecontainer.io/score
Service Swagger URI: http://9a8e86e1-3449-480d-bb35-a6cd6731753a.westus2.azurecontainer.io/swagger.json
Service primary authentication key: KdVguq2bEEr4ug7K20EesDrdmpZkLYY2
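The cell above only retrieves the endpoint details. As a hedged sketch of an actual test request, the helper below builds a POST with Python's standard `urllib`. Several things are assumptions rather than facts from this project: the name `build_scoring_request` is hypothetical, the payload shape `{"data": [record, ...]}` is the one AutoML-generated scoring scripts typically accept, and the model is assumed to expect exactly the feature columns from `df.describe()` (the sample record simply reuses their medians).

```python
import json
import urllib.request

# Feature columns seen in df.describe(); medians used as an illustrative record.
# ASSUMPTION: the deployed model expects exactly these columns.
sample_record = {
    "EQTA": 0.101018, "EQTL": 0.156656, "LLRTA": 0.01004, "LLRGL": 0.015915,
    "OEXTA": 0.022036, "INCEMP": 18.162698, "ROA": 0.004832, "ROE": 0.045176,
    "TDTL": 1.273882, "TDTA": 0.850135, "TATA": 0.148018,
}


def build_scoring_request(scoring_uri: str, api_key: str, records: list) -> urllib.request.Request:
    """Build (without sending) a POST request for a key-authenticated ACI endpoint.

    ASSUMPTION: the AutoML-generated scoring script accepts {"data": [record, ...]}.
    """
    body = json.dumps({"data": records}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # key returned by service.get_keys()
    }
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Usage in the notebook (performs a real network call, so it is left commented out):
# req = build_scoring_request(service.scoring_uri, primary, [sample_record])
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect before any call is made against the live endpoint.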


Print the logs of the web service, then delete the service and the compute cluster once they are no longer needed.

In [28]:
# Printing the logs
print(service.get_logs())

2021-03-13T18:44:14,032572600+00:00 - gunicorn/run 
2021-03-13T18:44:14,034772100+00:00 - rsyslog/run 
2021-03-13T18:44:14,041727100+00:00 - iot-server/run 
2021-03-13T18:44:14,069706800+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_661474bbe74e96b5d8added5888dfc85/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_661474bbe74e96b5d8added5888dfc85/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_661474bbe74e96b5d8added5888dfc85/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_661474bbe74e96b5d8added5888dfc85/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_661474bbe74e96b5d8added5888dfc85/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
rsyslogd

In [None]:
# service.delete()  # uncomment to tear down the ACI web service

In [None]:
# Delete the cluster instance
# compute_target.delete()  # uncomment to delete the compute cluster