# Hyperparameter Tuning using HyperDrive

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
from azureml.train.hyperdrive import RandomParameterSampling
from azureml.train.hyperdrive import normal, uniform, choice
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.dataset import Dataset
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
import os

## Overview
Employee attrition affects every organization. The IBM HR Attrition Case Study aims to identify the factors that lead to employee attrition and to predict which employees are at risk of leaving the company.


The [dataset](https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset) has 35 columns, which we use to predict employee attrition. We will tune a Decision Tree model with the HyperDrive feature.

**Import Workspace**

In [2]:
ws = Workspace.from_config()

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id,
      'Resource group: ' + ws.resource_group, sep = '\n')

Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code RHHTTCHB9 to authenticate.
You have logged in. Now let us find all the subscriptions to which you have access...
Interactive authentication successfully completed.
Workspace name: quick-starts-ws-143105
Azure region: southcentralus
Subscription id: 1b944a9b-fdae-4f97-aeb1-b7eea0beac53
Resource group: aml-quickstarts-143105


**Create an Experiment**

In [3]:
experiment_name = 'capstone-hyperdrive'
experiment=Experiment(ws, experiment_name)
 
run = experiment.start_logging()

In [4]:
cluster_name = "notebook143105"

try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target, using it!')
except ComputeTargetException:
    print('Creating a new compute target!')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    
    # create the cluster
    cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
    
cpu_cluster.wait_for_completion(show_output=True)
 
# Using get_status() to get a detailed status for the current cluster.
print(cpu_cluster.get_status().serialize())

Found existing compute target, using it!

Running
{'errors': [], 'creationTime': '2021-04-19T03:10:01.054453+00:00', 'createdBy': {'userObjectId': '695a9b50-dd79-4e6b-b760-e29d07a0e1fd', 'userTenantId': '660b3398-b80e-49d2-bc5b-ac1dc93b5254', 'userName': 'ODL_User 143105'}, 'modifiedTime': '2021-04-19T03:12:32.756932+00:00', 'state': 'Running', 'vmSize': 'STANDARD_DS3_V2'}


## Dataset

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [6]:
# Load the dataset from the workspace if it is already registered;
# otherwise create it from the CSV file and register it.
key = "Employee Attrition"
description_text = "IBM HR Analytics Employee Attrition & Performance"

if key in ws.datasets:
    dataset = ws.datasets[key]
else:
    # Create the AML dataset from the external CSV file
    data = 'https://raw.githubusercontent.com/ObinnaIheanachor/Capstone-Project-Udacity-Machine-Learning-Engineer/main/data/WA_Fn-UseC_-HR-Employee-Attrition.csv'
    dataset = Dataset.Tabular.from_delimited_files(data)
    # Register the dataset in the workspace
    dataset = dataset.register(workspace=ws,
                               name=key, description=description_text)


df = dataset.to_pandas_dataframe()
df.describe()

| | Age | DailyRate | DistanceFromHome | Education | EmployeeCount | EmployeeNumber | EnvironmentSatisfaction | HourlyRate | JobInvolvement | JobLevel | ... | RelationshipSatisfaction | StandardHours | StockOptionLevel | TotalWorkingYears | TrainingTimesLastYear | WorkLifeBalance | YearsAtCompany | YearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | ... | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 | 1470.0 |
| mean | 36.92381 | 802.485714 | 9.192517 | 2.912925 | 1.0 | 1024.865306 | 2.721769 | 65.891156 | 2.729932 | 2.063946 | ... | 2.712245 | 80.0 | 0.793878 | 11.279592 | 2.79932 | 2.761224 | 7.008163 | 4.229252 | 2.187755 | 4.123129 |
| std | 9.135373 | 403.5091 | 8.106864 | 1.024165 | 0.0 | 602.024335 | 1.093082 | 20.329428 | 0.711561 | 1.10694 | ... | 1.081209 | 0.0 | 0.852077 | 7.780782 | 1.289271 | 0.706476 | 6.126525 | 3.623137 | 3.22243 | 3.568136 |
| min | 18.0 | 102.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 30.0 | 1.0 | 1.0 | ... | 1.0 | 80.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 30.0 | 465.0 | 2.0 | 2.0 | 1.0 | 491.25 | 2.0 | 48.0 | 2.0 | 1.0 | ... | 2.0 | 80.0 | 0.0 | 6.0 | 2.0 | 2.0 | 3.0 | 2.0 | 0.0 | 2.0 |
| 50% | 36.0 | 802.0 | 7.0 | 3.0 | 1.0 | 1020.5 | 3.0 | 66.0 | 3.0 | 2.0 | ... | 3.0 | 80.0 | 1.0 | 10.0 | 3.0 | 3.0 | 5.0 | 3.0 | 1.0 | 3.0 |
| 75% | 43.0 | 1157.0 | 14.0 | 4.0 | 1.0 | 1555.75 | 4.0 | 83.75 | 3.0 | 3.0 | ... | 4.0 | 80.0 | 1.0 | 15.0 | 3.0 | 3.0 | 9.0 | 7.0 | 3.0 | 7.0 |
| max | 60.0 | 1499.0 | 29.0 | 5.0 | 1.0 | 2068.0 | 4.0 | 100.0 | 4.0 | 5.0 | ... | 4.0 | 80.0 | 3.0 | 40.0 | 6.0 | 4.0 | 40.0 | 18.0 | 15.0 | 17.0 |


## Hyperdrive Configuration

Here we configure the HyperDrive run.

We use `RandomParameterSampling` as the sampling method, `BanditPolicy` as the early termination policy, and an `SKLearn` estimator, with `AUC_weighted` as the primary metric to maximize. The search space covers the Decision Tree's `criterion`, `splitter`, and `max_depth` hyperparameters.
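To get a feel for how much of the search space this run can cover, the sketch below enumerates the three `choice` lists used in this notebook and draws eight configurations from them, matching `max_total_runs=8`. It is a plain-Python illustration of uniform random sampling over discrete choices, not HyperDrive's own implementation; the helper `sample_configuration` and the fixed seed are assumptions made for the example.

```python
import itertools
import random

# Search space copied from this notebook's RandomParameterSampling definition.
search_space = {
    "--criterion": ["gini", "entropy"],
    "--splitter": ["best", "random"],
    "--max_depth": [3, 4, 5, 6, 7, 8, 9, 10],
}

# Every possible combination, listed only to show the size of the space.
all_combinations = list(itertools.product(*search_space.values()))
print("Total combinations:", len(all_combinations))  # 2 * 2 * 8 = 32


def sample_configuration(space, rng):
    """Hypothetical helper: pick one value per hyperparameter, independently."""
    return {name: rng.choice(values) for name, values in space.items()}


rng = random.Random(0)  # fixed seed so the illustration is repeatable
samples = [sample_configuration(search_space, rng) for _ in range(8)]
for config in samples:
    print(config)
```

With 32 possible combinations and only 8 runs, at most a quarter of the space is tried, so the best run reported later is the best of a small sample rather than a guaranteed optimum.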

In [11]:
# Creating an early termination policy.
early_termination_policy = BanditPolicy(slack_factor=0.1,evaluation_interval=3)

# Define the hyperparameter search space sampled during training.
param_sampling = RandomParameterSampling({
    "--criterion": choice("gini", "entropy"),
    "--splitter": choice("best", "random"),
    "--max_depth": choice(3, 4, 5, 6, 7, 8, 9, 10),
})

#Creating estimator and hyperdrive config
estimator = SKLearn(source_directory=".", compute_target=cpu_cluster, entry_script="train.py")

hyperdrive_run_config = HyperDriveConfig(hyperparameter_sampling=param_sampling,
                                         policy=early_termination_policy, 
                                         primary_metric_name="AUC_weighted", 
                                         primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, 
                                         max_total_runs=8, 
                                         max_concurrent_runs=4, 
                                         estimator=estimator)
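The effect of `slack_factor=0.1` can be sketched with a few lines of arithmetic. For a metric that is maximized, the Bandit policy as described in the Azure ML documentation terminates a run whose best metric falls below `best_metric / (1 + slack_factor)`. The function name and the metric values below are hypothetical and only illustrate the rule; this is not the service's code.

```python
def bandit_should_terminate(run_metric, best_metric, slack_factor):
    """Illustrative check for a maximized metric (hypothetical helper).

    A run is cut when its metric is below best_metric / (1 + slack_factor).
    """
    threshold = best_metric / (1.0 + slack_factor)
    return run_metric < threshold


# Hypothetical AUC values reported at one evaluation interval.
best_so_far = 0.80
slack = 0.1  # same slack_factor as in this notebook

print("Cut-off:", best_so_far / (1.0 + slack))
print(bandit_should_terminate(0.70, best_so_far, slack))  # below the cut-off
print(bandit_should_terminate(0.75, best_so_far, slack))  # within the slack
```

With a slack factor of 0.1, a run has to stay within roughly 91% of the best run's metric to keep going. `evaluation_interval=3` means this check is applied at every third interval at which the metric is reported.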



In [12]:
#TODO: Submit your experiment
hyperdrive_run = experiment.submit(hyperdrive_run_config)



## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [13]:

RunDetails(hyperdrive_run).show()


## Best Model

TODO: In the cell below, get the best model from the hyperdrive experiments and display all the properties of the model.

In [14]:
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
 
print('Best Run Id: ', best_run.id)
print('AUC_weighted of Best Run is:', best_run_metrics['AUC_weighted'])
print('Parameter Values are:',best_run.get_details()['runDefinition']['arguments'])

Best Run Id:  HD_74b634a9-aec6-4f39-84d0-78a9a5c31d26_0
AUC_weighted of Best Run is: 0.7302746931618935
Parameter Values are: ['--criterion', 'gini', '--max_depth', '8', '--splitter', 'random']
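The arguments come back as a flat list of alternating flags and values. If a dictionary is more convenient, for example to rebuild the model locally, a small helper along these lines could convert it. The function `arguments_to_dict` is a hypothetical name, and the sketch assumes every flag is followed by exactly one value, as in the output above.

```python
def arguments_to_dict(arguments):
    """Pair up ['--name', 'value', ...] into {'name': 'value', ...}.

    Assumes strictly alternating flag/value entries (hypothetical helper).
    """
    names = arguments[0::2]
    values = arguments[1::2]
    return {name.lstrip("-"): value for name, value in zip(names, values)}


# Argument list as printed for the best run above.
best_args = ['--criterion', 'gini', '--max_depth', '8', '--splitter', 'random']

params = arguments_to_dict(best_args)
params["max_depth"] = int(params["max_depth"])  # values arrive as strings
print(params)  # {'criterion': 'gini', 'max_depth': 8, 'splitter': 'random'}
```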


In [15]:
#TODO: Save the best model
model = best_run.register_model(model_name='hyperdrive-model', 
                                model_path='outputs/model.joblib', 
                                tags={'Method':'Hyperdrive'}, 
                                properties={'AUC_weighted': best_run_metrics['AUC_weighted']})

## Model Deployment

Remember that you only have to deploy one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service