# Automated ML


```python
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.dataset import Dataset
from azureml.core.compute import ComputeTarget
from azureml.pipeline.steps import AutoMLStep
from azureml.widgets import RunDetails
from azureml.core.model import Model, InferenceConfig
from azureml.core import Environment
```

## Dataset

### Overview
In this project we use a dataset from a company's HR department. The dataset contains entries for employees, including personal information, current position, and work-performance metrics.
The objective is to predict whether a given employee will receive a promotion. The dataset is highly imbalanced, with only around 5% of employees having received a promotion.

The dataset is available on Kaggle: [https://www.kaggle.com/shivan118/hranalysis](https://www.kaggle.com/shivan118/hranalysis). We downloaded it manually and registered it in our workspace's default datastore under the name "hr-data".

```python
ws = Workspace.from_config()

# choose a name for the experiment
experiment_name = 'hr-automl'

experiment = Experiment(ws, experiment_name)
```

```python
# get the registered dataset by name
dataset = Dataset.get_by_name(ws, 'hr-data', version='latest')
```

```python
# view the first rows
dataset.take(5).to_pandas_dataframe()
```

| Unnamed: 0 | department | region | education | gender | recruitment_channel | no_of_trainings | age | previous_year_rating | length_of_service | KPIs_met >80% | awards_won? | avg_training_score | is_promoted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Sales & Marketing | region_7 | Master's & above | f | sourcing | 1 | 35 | 5 | 8 | 1 | 0 | 49 | 0 |
| 1 | Operations | region_22 | Bachelor's | m | other | 1 | 30 | 5 | 4 | 0 | 0 | 60 | 0 |
| 2 | Sales & Marketing | region_19 | Bachelor's | m | sourcing | 1 | 34 | 3 | 7 | 0 | 0 | 50 | 0 |
| 3 | Sales & Marketing | region_23 | Bachelor's | m | other | 2 | 39 | 1 | 10 | 0 | 0 | 50 | 0 |
| 4 | Technology | region_26 | Bachelor's | m | other | 1 | 45 | 3 | 2 | 0 | 0 | 73 | 0 |
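The overview states that only about 5% of employees were promoted. Below is a minimal standard-library sketch for checking that ratio on the label column. The helper name `positive_rate` is hypothetical; in the notebook it would be applied to `dataset.to_pandas_dataframe()['is_promoted']`.

```python
from collections import Counter
from typing import Iterable


def positive_rate(labels: Iterable[int]) -> float:
    """Return the fraction of labels equal to 1 (hypothetical helper).

    Accepts any iterable of 0/1 values, e.g. a pandas Series such as
    dataset.to_pandas_dataframe()["is_promoted"].
    """
    counts = Counter(int(label) for label in labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels to count")
    return counts[1] / total


# Toy labels for illustration only (not taken from the HR dataset).
toy_labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(f"promoted: {positive_rate(toy_labels):.0%}")  # promoted: 10%
```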


## AutoML Configuration

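The target column (`label_column_name`) must match its name in the dataset, `is_promoted`.
The task is classification, and as the primary metric we use weighted AUC (`AUC_weighted`), since it is a good metric for imbalanced problems: unlike accuracy, which a model could push to about 95% by never predicting a promotion, AUC measures how well the model ranks promoted employees above the rest.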
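Weighted AUC is the per-class one-vs-rest ROC AUC, averaged with weights equal to each class's support (its number of true instances), which is how the Azure ML metric documentation describes `AUC_weighted`. The sketch below is our own illustration of that definition, not Azure ML's implementation; `binary_auc` and `weighted_auc` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Sequence


def binary_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win
    (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def weighted_auc(y_true: Sequence[int], proba: List[Dict[int, float]]) -> float:
    """One-vs-rest AUC for each class, averaged with weights = class support."""
    support = Counter(y_true)
    total = len(y_true)
    result = 0.0
    for cls, count in support.items():
        one_vs_rest = [1 if y == cls else 0 for y in y_true]
        cls_scores = [p[cls] for p in proba]
        result += binary_auc(one_vs_rest, cls_scores) * count / total
    return result


# Toy labels and predicted class probabilities, for illustration only.
y_toy = [0, 0, 1, 1]
proba_toy = [{0: 0.9, 1: 0.1}, {0: 0.6, 1: 0.4}, {0: 0.65, 1: 0.35}, {0: 0.2, 1: 0.8}]
print(weighted_auc(y_toy, proba_toy))  # 0.75
```

For a binary target whose two class probabilities sum to one, both one-vs-rest AUCs are equal, so the weighted value coincides with the plain ROC AUC.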


We previously created a compute cluster with 4 nodes. To shorten the running time we allowed up to 3 concurrent iterations, one fewer than the number of nodes.
Additionally, we let AutoML stop early when the score is no longer improving.
We also capped the experiment at 30 minutes because of the lab environment's time limits.
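The exact early-stopping rule AutoML applies is internal to the service. The sketch below only illustrates the general idea with a simple patience rule; the helper `stop_iteration` and the `patience` parameter are hypothetical and are not AutoML settings.

```python
from typing import Sequence


def stop_iteration(scores: Sequence[float], patience: int = 3) -> int:
    """Return the index at which a patience-based rule would stop.

    Hypothetical illustration: stop once `patience` consecutive iterations
    fail to beat the best score so far (higher is better, as with
    AUC_weighted). Returns len(scores) - 1 if the rule never triggers.
    """
    best = float("-inf")
    since_improvement = 0
    for i, score in enumerate(scores):
        if score > best:
            best = score
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                return i
    return len(scores) - 1


# Toy score sequence for illustration only.
toy_scores = [0.80, 0.86, 0.89, 0.88, 0.89, 0.87, 0.90]
print(stop_iteration(toy_scores, patience=3))  # 5
```

With this rule the late 0.90 score would never be reached, which is the trade-off early stopping makes in exchange for a shorter run.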




```python
# get the previously created compute target
cluster = ComputeTarget(workspace=ws, name='cluster-1')

# AutoML settings
automl_settings = {
    "experiment_timeout_minutes": 30,
    "max_concurrent_iterations": 3,  # one fewer than the 4 cluster nodes
    "primary_metric": "AUC_weighted"
}

# AutoML configuration
automl_config = AutoMLConfig(compute_target=cluster,
                             training_data=dataset,
                             task="classification",
                             label_column_name='is_promoted',
                             enable_early_stopping=True,
                             featurization='auto',
                             debug_log='automl_errors.log',
                             **automl_settings)
```

```python
# submit the experiment to the remote cluster
remote_run = experiment.submit(automl_config)
```

Submitting remote run.


| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| hr-automl | AutoML_5af789a9-4d2b-4ffe-9175-43d8c6294b72 | automl | NotStarted | Link to Azure Machine Learning studio | Link to Documentation |


## Run Details

```python
RunDetails(remote_run).show()
```


## Best Model

In this section we retrieve the best child run, check its weighted AUC, and download the fitted model to our local file share. The model is then registered so that it can be deployed.


```python
best_run = remote_run.get_best_child()
best_run
```

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| hr-automl | AutoML_5af789a9-4d2b-4ffe-9175-43d8c6294b72_34 | azureml.scriptrun | Completed | Link to Azure Machine Learning studio | Link to Documentation |


```python
best_run.get_metrics()['AUC_weighted']
```

0.9054698236799915

```python
# download the serialized model to the local file share
best_run.download_file('outputs/model.pkl', output_file_path='./hr-auto-ml-model.pkl')
```

## Model Deployment

The best model found by AutoML had the better performance, so we register it and proceed to deploy it.

```python
# register the model from the best run
model = best_run.register_model('hr-auto-ml-model',
                                description='best model found by automl for HR data',
                                model_path='outputs/model.pkl')
```

Model(workspace=Workspace.create(name='quick-starts-ws-143540', subscription_id='f9d5a085-54dc-4215-9ba6-dad5d86e60a0', resource_group='aml-quickstarts-143540'), name=hr-auto-ml-model, id=hr-auto-ml-model:2, version=2, tags={}, properties={})

```python
# create an environment for the deployment
env = Environment(name='project_environment')
```

TODO: In the cell below, send a request to the web service you deployed to test it.
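A minimal sketch for this step, assuming the model has been deployed as a web service whose scoring script accepts a JSON body of the form `{"data": [rows]}`. That is the shape used by AutoML-generated scoring files, but verify it against the `score.py` actually deployed. The helpers `build_payload` and `score` are hypothetical; the endpoint URL would come from the deployed service's `scoring_uri` attribute, and if key authentication is enabled an `Authorization: Bearer <key>` header is also required.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_payload(records: List[Dict[str, Any]]) -> bytes:
    """Serialize rows as {"data": [...]}, the body shape assumed for the service."""
    return json.dumps({"data": records}).encode("utf-8")


def score(scoring_uri: str, records: List[Dict[str, Any]], timeout: float = 30.0) -> Any:
    """POST rows to the scoring endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        scoring_uri,
        data=build_payload(records),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Row 4 of the preview above, without the is_promoted label. Whether the
# service expects the "Unnamed: 0" index column depends on the trained features.
sample = {
    "Unnamed: 0": 4,
    "department": "Technology",
    "region": "region_26",
    "education": "Bachelor's",
    "gender": "m",
    "recruitment_channel": "other",
    "no_of_trainings": 1,
    "age": 45,
    "previous_year_rating": 3,
    "length_of_service": 2,
    "KPIs_met >80%": 0,
    "awards_won?": 0,
    "avg_training_score": 73,
}

# Usage, once a service exists (`service` is the deployed web service):
# result = score(service.scoring_uri, [sample])
```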

TODO: In the cell below, print the logs of the web service and delete the service