# Automated ML

In [21]:
from azureml.core import Workspace, Experiment, Dataset
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.train.automl.utilities import get_primary_metrics
from azureml.train.automl import AutoMLConfig
from azureml.pipeline.steps import AutoMLStep
from azureml.widgets import RunDetails
import joblib 
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice.aci import AciWebservice
from azureml.core.webservice import Webservice
import os

## Dataset

### Overview

The dataset contains data about employees within companies. The task is to predict whether an employee will leave their current employer.

Features:

* enrollee_id: Unique ID for the candidate
* city: City code
* city_development_index: Development index of the city (scaled)
* gender: Gender of the candidate
* relevent_experience: Relevant experience of the candidate
* enrolled_university: Type of university course enrolled in, if any
* education_level: Education level of the candidate
* major_discipline: Major discipline of the candidate's education
* experience: Candidate's total experience in years
* company_size: Number of employees in the current employer's company
* company_type: Type of current employer
* last_new_job: Difference in years between previous job and current job
* training_hours: Training hours completed
* target: 0 – Not looking for a job change, 1 – Looking for a job change

In [2]:
# exist_ok=True avoids a FileExistsError when the cell is re-run
os.makedirs('results', exist_ok=True)

In [3]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'hr-analytics-automl'

# Check workspace details
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

experiment = Experiment(ws, experiment_name)

Workspace name: quick-starts-ws-134223
Azure region: southcentralus
Subscription id: 81cefad3-d2c9-4f77-a466-99a7f541c7bb
Resource group: aml-quickstarts-134223


In [4]:
# Reuse the cluster if it already exists; otherwise create a new one
try:
    cluster = ComputeTarget(workspace=ws, name="project-cluster")
    print("Cluster exists")
except ComputeTargetException:
    # Only the "cluster not found" case is handled; other errors should surface
    config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    cluster = ComputeTarget.create(ws, "project-cluster", config)

cluster.wait_for_completion()

Cluster exists


In [5]:
train_data = Dataset.get_by_name(ws, name="hr-analytics")

In [6]:
train_data = train_data.drop_columns(["enrollee_id", "city"])

In [7]:
train_data.take(5).to_pandas_dataframe()

| Unnamed: 0 | city_development_index | gender | relevent_experience | enrolled_university | education_level | major_discipline | experience | company_size | company_type | last_new_job | training_hours | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.92 | Male | Has relevent experience | no_enrollment | Graduate | STEM | >20 | | | 1 | 36 | 1.0 |
| 1 | 0.776 | Male | No relevent experience | no_enrollment | Graduate | STEM | 15 | 50-99 | Pvt Ltd | >4 | 47 | 0.0 |
| 2 | 0.624 | | No relevent experience | Full time course | Graduate | STEM | 5 | | | never | 83 | 0.0 |
| 3 | 0.789 | | No relevent experience | | Graduate | Business Degree | <1 | | Pvt Ltd | never | 52 | 1.0 |
| 4 | 0.767 | Male | Has relevent experience | no_enrollment | Masters | STEM | >20 | 50-99 | Funded Startup | 4 | 8 | 0.0 |
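
Several cells in the sample above are empty, which is what the "Missing feature values imputation" guardrail later reports on. As a rough illustration only, the sketch below counts empty cells per column for the five sample rows, re-typed here as CSV without the `Unnamed: 0` index column. The helper name `count_missing` is hypothetical, and the counts say nothing about the full dataset.

```python
import csv
import io

# The five sample rows shown above, re-typed as CSV (assumption: an empty
# field means the value is missing). The "Unnamed: 0" index column is omitted.
SAMPLE_CSV = """\
city_development_index,gender,relevent_experience,enrolled_university,education_level,major_discipline,experience,company_size,company_type,last_new_job,training_hours,target
0.92,Male,Has relevent experience,no_enrollment,Graduate,STEM,>20,,,1,36,1.0
0.776,Male,No relevent experience,no_enrollment,Graduate,STEM,15,50-99,Pvt Ltd,>4,47,0.0
0.624,,No relevent experience,Full time course,Graduate,STEM,5,,,never,83,0.0
0.789,,No relevent experience,,Graduate,Business Degree,<1,,Pvt Ltd,never,52,1.0
0.767,Male,Has relevent experience,no_enrollment,Masters,STEM,>20,50-99,Funded Startup,4,8,0.0
"""


def count_missing(csv_text):
    """Return {column: number of empty cells} for columns with at least one gap."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {}
    for row in reader:
        for column, value in row.items():
            if value == "":
                missing[column] = missing.get(column, 0) + 1
    return missing


missing_counts = count_missing(SAMPLE_CSV)
print(missing_counts)
```

On the full dataset the same check could be run on the pandas DataFrame returned by `to_pandas_dataframe()`.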


## AutoML Configuration

The task is binary classification. We use accuracy as the primary metric with 5-fold cross-validation. Up to four iterations run concurrently on the cluster to shorten training time, and the experiment is capped at 30 minutes. Early stopping is enabled so that the run terminates once the score stops improving.

Featurization is set to "auto", which lets AutoML handle preprocessing steps such as feature scaling and missing-value imputation automatically.
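
To make the 5-fold cross-validation setting concrete, here is a minimal pure-Python sketch of how rows can be partitioned into folds. It is an illustration under simplifying assumptions (contiguous, unshuffled, unstratified folds); how AutoML actually builds its splits is not shown in this notebook, and `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_indices, validation_indices) for n_folds contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // n_folds] * n_folds
    for i in range(n_samples % n_folds):
        fold_sizes[i] += 1

    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


folds = list(k_fold_indices(n_samples=10, n_folds=5))
for fold_number, (train, validation) in enumerate(folds, start=1):
    print(f"fold {fold_number}: train on {len(train)} rows, validate on rows {validation}")
```

Each row is used for validation exactly once, and the reported metric is typically the average over the five validation folds.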

In [8]:
get_primary_metrics("classification")

['precision_score_weighted',
 'accuracy',
 'norm_macro_recall',
 'average_precision_score_weighted',
 'AUC_weighted']
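
Accuracy is only one of the metrics listed above. The sketch below contrasts it with `norm_macro_recall` on a toy example. It assumes the normalization (macro recall - R) / (1 - R) with R = 1 / number of classes, which is our understanding of the metric rather than code taken from the SDK; both function bodies are illustrative re-implementations.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def norm_macro_recall(y_true, y_pred):
    """Macro-averaged recall rescaled so that chance level is 0 and perfect is 1.

    Assumes at least two classes are present in y_true.
    """
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Predictions for the rows whose true label is c
        predictions_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
        recalls.append(sum(1 for p in predictions_for_c if p == c) / len(predictions_for_c))
    macro_recall = sum(recalls) / len(recalls)
    chance = 1 / len(classes)
    return (macro_recall - chance) / (1 - chance)


# Toy labels: a model that always predicts class 0
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]

acc = accuracy(y_true, y_pred)
nmr = norm_macro_recall(y_true, y_pred)
print(acc, nmr)
```

A model that never predicts class 1 still scores 0.75 accuracy here while its normalized macro recall is 0, which is worth keeping in mind if the classes turn out to be less balanced than expected.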

In [9]:
automl_settings = {
    "experiment_timeout_minutes": 30,
    "task": "classification", 
    "primary_metric": "accuracy",
    "training_data": train_data,
    "label_column_name": "target",
    "n_cross_validations": 5,
    "enable_early_stopping": True,
    "featurization": "auto",
    "max_cores_per_iteration": -1,
    "max_concurrent_iterations": 4,
    "compute_target": cluster
}


automl_config = AutoMLConfig(**automl_settings)

In [10]:
experiment = Experiment(workspace=ws, name="auto_exp")
remote_run = experiment.submit(automl_config, show_output=True)

Running on remote.
No run_configuration provided, running on project-cluster with default configuration
Running on remote compute: project-cluster
Parent Run ID: AutoML_1a933934-0ea3-4fc1-b8da-5644fb2b70f1

Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       DONE
DESCRIPTION:  If the missing values 

## Run Details

In [11]:
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [12]:
remote_run.wait_for_completion(show_output=True)



****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       DONE
DESCRIPTION:  If the missing values are expected, let the run complete. Otherwise cancel the current run and use a script to customize the handling of missing feature values that may be more appropriate based on the data type and business requirement.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization
DETAILS:      
+---------------------------------+---------------------------------+
|Column name                   

{'runId': 'AutoML_1a933934-0ea3-4fc1-b8da-5644fb2b70f1',
 'target': 'project-cluster',
 'status': 'Completed',
 'startTimeUtc': '2021-01-10T14:43:44.561949Z',
 'endTimeUtc': '2021-01-10T15:05:16.092778Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'project-cluster',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"0723e4a9-b2a8-4784-a80b-d4bf2763040b\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"UI/01-10-2021_020103_UTC/train.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"aml-quickstarts-134223\\\\\\", \\\\\\"subscription\\\\\\": \\\\\\"81cefad3-d2c9-4f77-a466-99a7f541c
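
The `startTimeUtc` and `endTimeUtc` values in the run summary above can be used to check how much of the 30-minute experiment budget was consumed. A small sketch, assuming the timestamps always follow the format shown (the format string is our assumption):

```python
from datetime import datetime

# Timestamps copied from the run summary above
start_time_utc = '2021-01-10T14:43:44.561949Z'
end_time_utc = '2021-01-10T15:05:16.092778Z'

# Assumed timestamp layout: ISO date and time with microseconds and a literal 'Z'
TIMESTAMP_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'

duration = (datetime.strptime(end_time_utc, TIMESTAMP_FORMAT)
            - datetime.strptime(start_time_utc, TIMESTAMP_FORMAT))
print(f"Run duration: {duration.total_seconds() / 60:.1f} minutes")
```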

## Best Model

In [13]:
best_auto_run, best_auto_model = remote_run.get_output()
best_auto_model._final_estimator

PreFittedSoftVotingClassifier(classification_labels=None,
                              estimators=[('0',
                                           Pipeline(memory=None,
                                                    steps=[('maxabsscaler',
                                                            MaxAbsScaler(copy=True)),
                                                           ('lightgbmclassifier',
                                                            LightGBMClassifier(boosting_type='gbdt',
                                                                               class_weight=None,
                                                                               colsample_bytree=1.0,
                                                                               importance_type='split',
                                                                               learning_rate=0.1,
                                                                               max_

In [14]:
joblib.dump(best_auto_model, filename="results/automl_best_model.joblib")

['results/automl_best_model.joblib']

## Model Deployment

In [27]:
env = best_auto_run.get_environment()

entry_script='score.py'

best_auto_run.download_file('outputs/scoring_file_v_1_0_0.py', entry_script)

In [28]:
model = remote_run.register_model(model_name=best_auto_run.properties['model_name'],
                                  description='AutoML model')
inference_config = InferenceConfig(entry_script = entry_script, environment = env)
deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)

In [29]:
service = Model.deploy(ws, 'employee-churn-api', [model], inference_config, deployment_config)
service.wait_for_deployment(True)
print("State: " + service.state)
print("Scoring URI: " + service.scoring_uri)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running......................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
State: Healthy
Scoring URI: http://d86c8f6d-4026-41c5-88ac-8bcd72752825.southcentralus.azurecontainer.io/score


Send a request to the deployed web service to test it.

In [30]:
%run endpoint.py

{"result": [0.0, 0.0]}
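
The contents of `endpoint.py` are not shown in this notebook. The sketch below is one way such a script might be written using only the standard library. It assumes the generated scoring script accepts a JSON body of the form `{"data": [ {column: value, ...}, ... ]}` with the same columns used for training (including `Unnamed: 0`, which was not dropped), and that the response may arrive either as a JSON object or as a JSON-encoded string. The helper names `build_payload`, `parse_result`, and `score` are hypothetical.

```python
import json
import urllib.request


def build_payload(records):
    """Serialize a list of row dicts into the assumed {"data": [...]} request body."""
    return json.dumps({"data": records})


def parse_result(body):
    """Extract the predictions list from a response body.

    Handles both a plain JSON object and a JSON string that itself contains JSON.
    """
    parsed = json.loads(body)
    if isinstance(parsed, str):
        parsed = json.loads(parsed)
    return parsed["result"]


def score(scoring_uri, records, timeout=30):
    """POST the records to the scoring URI and return the parsed predictions."""
    request = urllib.request.Request(
        scoring_uri,
        data=build_payload(records).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_result(response.read().decode("utf-8"))


# First sample row from the dataset preview; missing values are sent as null
example_record = {
    "Unnamed: 0": 0,
    "city_development_index": 0.92,
    "gender": "Male",
    "relevent_experience": "Has relevent experience",
    "enrolled_university": "no_enrollment",
    "education_level": "Graduate",
    "major_discipline": "STEM",
    "experience": ">20",
    "company_size": None,
    "company_type": None,
    "last_new_job": "1",
    "training_hours": 36,
}
payload = build_payload([example_record])
print(payload)
```

With a live service, `score(service.scoring_uri, [example_record])` would return the list of predictions; no network call is made until `score` is invoked.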


Print the logs of the web service, then delete the service.

In [31]:
service.get_logs()

'2021-01-10T16:06:08,083749400+00:00 - iot-server/run \n2021-01-10T16:06:08,090281200+00:00 - nginx/run \n2021-01-10T16:06:08,092419800+00:00 - rsyslog/run \n2021-01-10T16:06:08,094255100+00:00 - gunicorn/run \n/usr/sbin/nginx: /azureml-envs/azureml_8eff28b157f42edcd2424a5aae6c8074/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)\n/usr/sbin/nginx: /azureml-envs/azureml_8eff28b157f42edcd2424a5aae6c8074/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)\n/usr/sbin/nginx: /azureml-envs/azureml_8eff28b157f42edcd2424a5aae6c8074/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)\n/usr/sbin/nginx: /azureml-envs/azureml_8eff28b157f42edcd2424a5aae6c8074/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)\n/usr/sbin/nginx: /azureml-envs/azureml_8eff28b157f42edcd2424a5aae6c8074/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)

In [22]:
service.update(enable_app_insights=True)

In [26]:
service.delete()