# Hyperparameter Tuning using HyperDrive
TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [32]:
import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.dataset import Dataset
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException
from azureml.core.model import Model
from azureml.core import Environment, ScriptRunConfig
from azureml.widgets import RunDetails
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy, MedianStoppingPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import uniform, choice
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import os
import shutil

# Dataset
TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

## Source:

The data set is taken from https://archive.ics.uci.edu/ml/datasets/bank+marketing

[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014


## Data Set Information:

The data is related to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (bank term deposit) would be subscribed ('yes') or not ('no').

There are four datasets:
1) bank-additional-full.csv with all examples (41188) and 20 inputs, ordered by date (from May 2008 to November 2010), very close to the data analyzed in [Moro et al., 2014]
2) bank-additional.csv with 10% of the examples (4119), randomly selected from 1), and 20 inputs.
3) bank-full.csv with all examples and 17 inputs, ordered by date (older version of this dataset with fewer inputs).
4) bank.csv with 10% of the examples and 17 inputs, randomly selected from 3) (older version of this dataset with fewer inputs).

The smallest datasets are provided to test more computationally demanding machine learning algorithms (e.g., SVM).

The classification goal is to predict whether the client will subscribe (yes/no) to a term deposit (variable y).


## Attribute Information:

### Input variables:

#### Bank client data:
1 - age (numeric)
2 - job : type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')
3 - marital : marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)
4 - education (categorical: 'basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown')
5 - default: has credit in default? (categorical: 'no','yes','unknown')
6 - housing: has housing loan? (categorical: 'no','yes','unknown')
7 - loan: has personal loan? (categorical: 'no','yes','unknown')

#### Related to the last contact of the current campaign:
8 - contact: contact communication type (categorical: 'cellular','telephone')
9 - month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
10 - day_of_week: last contact day of the week (categorical: 'mon','tue','wed','thu','fri')
11 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.

#### Other attributes:
12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
14 - previous: number of contacts performed before this campaign and for this client (numeric)
15 - poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')

#### Social and economic context attributes:
16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)
17 - cons.price.idx: consumer price index - monthly indicator (numeric)
18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)
19 - euribor3m: euribor 3 month rate - daily indicator (numeric)
20 - nr.employed: number of employees - quarterly indicator (numeric)

### Output variable (desired target):
21 - y - has the client subscribed a term deposit? (binary: 'yes','no')
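
The note on attribute 11 recommends discarding `duration` for a realistic model, and the target `y` is stored as a 'yes'/'no' string. The following is a minimal sketch of that preparation step, assuming pandas and a DataFrame that uses the column names listed above. The helper name `prepare_features` and the two-row sample are hypothetical; this is not the notebook's `train.py`, which is not shown.

```python
import pandas as pd


def prepare_features(df: pd.DataFrame):
    """Split a bank-marketing frame into features X and a 0/1 target y."""
    # 'duration' is only known after the call ends, so it leaks the outcome.
    X = df.drop(columns=["duration", "y"])
    # Map the 'yes'/'no' target to 1/0.
    y = (df["y"] == "yes").astype(int)
    # One-hot encode the string columns; numeric columns pass through unchanged.
    X = pd.get_dummies(X)
    return X, y


# Hypothetical two-row sample using a subset of the documented columns.
sample = pd.DataFrame({
    "age": [56, 37],
    "marital": ["married", "single"],
    "duration": [261, 149],
    "y": ["no", "yes"],
})
X, y = prepare_features(sample)
print(list(X.columns))
```

Whether to keep `duration` depends on the goal: it is reasonable for a benchmark, but a model meant to predict before the call is placed should not see it.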


In [8]:
# The dataset was uploaded to the workspace from the external link above

from azureml.core import Workspace, Experiment

ws = Workspace.from_config()
ws.write_config(path='.azureml')
exp = Experiment(workspace=ws, name="udacity-AzureML1")

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

run = exp.start_logging()

Workspace name: DSStudio
Azure region: eastus2
Subscription id: baa67dbf-45d0-4d84-b662-527186361068
Resource group: dwtr-t332-20210421


In [9]:
from azureml.core import Workspace, Dataset

subscription_id = 'baa67dbf-45d0-4d84-b662-527186361068'
resource_group = 'dwtr-t332-20210421'
workspace_name = 'DSStudio'

workspace = Workspace(subscription_id, resource_group, workspace_name)

dataset = Dataset.get_by_name(workspace, name='BankMarketing')
df=dataset.to_pandas_dataframe()

In [10]:
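from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# TODO: Create compute cluster
# Use vm_size = "Standard_D2_V2" in your provisioning configuration.
# max_nodes should be no greater than 4.

### YOUR CODE HERE ###
cpu_cluster_name = "cluster-jupyter"
try:
    # Reuse the cluster if it already exists in the workspace
    compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
except ComputeTargetException:
    # Otherwise provision a new cluster
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
compute_target.wait_for_completion(show_output=True)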

SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


# Hyperdrive Configuration
TODO: Explain the model you are using and the reason for choosing the different hyperparameters, termination policy and config settings.

In [12]:
from azureml.widgets import RunDetails
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import uniform, choice
from azureml.core import Environment, ScriptRunConfig
import os
import shutil

# Specify parameter sampler
#ps = ### YOUR CODE HERE ###
ps = RandomParameterSampling(
    {
        "--C" :        choice(0.001,0.01,0.1,1,10,20,50,100,200,500,1000),
        "--max_iter" : choice(50,100,200,300)
    }
)
# TODO: Create an early termination policy. This is not required if you are using Bayesian sampling.
# Specify a Policy
#Your Code Here
policy = BanditPolicy(slack_factor = 0.1, evaluation_interval = 1)

# Copy the training script into a dedicated folder
script_folder = './training'
os.makedirs(script_folder, exist_ok=True)
shutil.copy('./train.py', script_folder)

# Build the run environment from the conda specification
myenv = Environment.from_conda_specification(name='sklearn-env', file_path='conda_dependencies.yml')

# Create a ScriptRunConfig (used here instead of the deprecated SKLearn estimator) for train.py
est = ScriptRunConfig(source_directory='.',
                      script='train.py',
                      compute_target=compute_target,
                      environment=myenv)
# Create a HyperDriveConfig using the run configuration, hyperparameter sampler, and policy.
### YOUR CODE HERE ###
hyperdrive_config = HyperDriveConfig(
    run_config=est,
    hyperparameter_sampling = ps, 
    primary_metric_name = "Accuracy",
    primary_metric_goal = PrimaryMetricGoal.MAXIMIZE, 
    max_total_runs = 15,
    max_concurrent_runs = 3,
    policy = policy
)
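
For a metric that is maximized, the Azure ML documentation describes `BanditPolicy` with a `slack_factor` as terminating any run whose best metric falls below the best metric so far divided by `(1 + slack_factor)`. The following is a minimal local sketch of that rule under this ratio-based reading. The function name `bandit_should_terminate` is hypothetical and not part of the SDK; the real policy is evaluated by the HyperDrive service at each `evaluation_interval`.

```python
def bandit_should_terminate(run_metric: float, best_metric: float,
                            slack_factor: float = 0.1) -> bool:
    """Return True if a run falls outside the allowed slack (maximize goal)."""
    # Smallest metric value that is still tolerated relative to the best run.
    threshold = best_metric / (1.0 + slack_factor)
    return run_metric < threshold


best_so_far = 0.9174506828528073  # best Accuracy reported by this experiment
for accuracy in (0.80, 0.90):
    print(accuracy, bandit_should_terminate(accuracy, best_so_far))
```

With `slack_factor = 0.1`, a run has to stay within roughly 91% of the best accuracy seen so far to keep running.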

# Run Details
OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the RunDetails widget to show the different experiments.

In [13]:
# Submit your hyperdrive run to the experiment and show run details with the widget.

### YOUR CODE HERE ###
hyperdrive_run = exp.submit(hyperdrive_config)


In [14]:
RunDetails(hyperdrive_run).show()

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

In [15]:
hyperdrive_run.wait_for_completion(show_output = True)

RunId: HD_fd4947e4-5b87-414b-a592-6c5d6cfe4496
Web View: https://ml.azure.com/runs/HD_fd4947e4-5b87-414b-a592-6c5d6cfe4496?wsid=/subscriptions/baa67dbf-45d0-4d84-b662-527186361068/resourcegroups/dwtr-t332-20210421/workspaces/DSStudio&tid=fd799da1-bfc1-4234-a91c-72b3a1cb9e26

Streaming azureml-logs/hyperdrive.txt

"<START>[2022-05-21T19:30:08.891069][API][INFO]Experiment created<END>\n""<START>[2022-05-21T19:30:09.587521][GENERATOR][INFO]Trying to sample '3' jobs from the hyperparameter space<END>\n"<START>[2022-05-21T19:30:10.2563337Z][SCHEDULER][INFO]Scheduling job, id='HD_fd4947e4-5b87-414b-a592-6c5d6cfe4496_0'<END><START>[2022-05-21T19:30:10.3868939Z][SCHEDULER][INFO]Scheduling job, id='HD_fd4947e4-5b87-414b-a592-6c5d6cfe4496_1'<END>"<START>[2022-05-21T19:30:10.412360][GENERATOR][INFO]Successfully sampled '3' jobs, they will soon be submitted to the execution target.<END>\n"<START>[2022-05-21T19:30:10.4680065Z][SCHEDULER][INFO]Scheduling job, id='HD_fd4947e4-5b87-414b-a592-6c5d6cfe4

{'runId': 'HD_fd4947e4-5b87-414b-a592-6c5d6cfe4496',
 'target': 'cluster-jupyter',
 'status': 'Completed',
 'startTimeUtc': '2022-05-21T19:30:08.660032Z',
 'endTimeUtc': '2022-05-21T19:44:12.42257Z',
 'services': {},
 'properties': {'primary_metric_config': '{"name": "Accuracy", "goal": "maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': 'd0f168b8-f9f4-41c8-9ece-037fa5d50a0c',
  'user_agent': 'python/3.8.5 (Linux-5.4.0-1074-azure-x86_64-with-glibc2.10) msrest/0.6.21 Hyperdrive.Service/1.0.0 Hyperdrive.SDK/core.1.40.0',
  'space_size': '44',
  'score': '0.9174506828528073',
  'best_child_run_id': 'HD_fd4947e4-5b87-414b-a592-6c5d6cfe4496_10',
  'best_metric_status': 'Succeeded',
  'best_data_container_id': 'dcid.HD_fd4947e4-5b87-414b-a592-6c5d6cfe4496_10'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://dsstudio5486078760.blob.core.windows
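
The `'space_size': '44'` property in the output above matches the number of combinations in the two `choice` lists passed to `RandomParameterSampling`: 11 values of `--C` times 4 values of `--max_iter`. The sketch below enumerates that grid locally with `itertools.product`. The variable names are illustrative only, and the local random draw is an analogy, not HyperDrive's internal sampler.

```python
import itertools
import random

# Candidate values copied from the sampler definition in this notebook.
C_VALUES = [0.001, 0.01, 0.1, 1, 10, 20, 50, 100, 200, 500, 1000]
MAX_ITER_VALUES = [50, 100, 200, 300]

# Every (C, max_iter) pair the sampler could draw.
search_space = list(itertools.product(C_VALUES, MAX_ITER_VALUES))
space_size = len(search_space)

# Draw one configuration per run, as a stand-in for random sampling.
max_total_runs = 15
rng = random.Random(0)
sampled = [(rng.choice(C_VALUES), rng.choice(MAX_ITER_VALUES))
           for _ in range(max_total_runs)]

coverage = max_total_runs / space_size
print(f"{space_size} combinations, at most {coverage:.0%} explored in {max_total_runs} runs")
```

Because only 15 of the 44 combinations can be tried, the reported best run is the best of a sample, not necessarily the best point in the whole space.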

# Best Model
TODO: In the cell below, get the best model from the hyperdrive experiments and display all the properties of the model.

In [16]:
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
parameter_values = best_run.get_details()['runDefinition']['arguments']
print('Best Run Id: ', best_run.id)
print('\n Accuracy:', best_run_metrics['Accuracy'])
print('\n C:', best_run_metrics['Regularization Strength:'])
print('\n max_iter:', best_run_metrics['Max iterations:'])

Best Run Id:  HD_fd4947e4-5b87-414b-a592-6c5d6cfe4496_10

 Accuracy: 0.9174506828528073

 C: 0.001

 max_iter: 100


In [17]:
best_run.get_details()['runDefinition']['arguments']


['--C', '0.001', '--max_iter', '100']
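
The argument list above is what HyperDrive passed to the entry script for the best run. `train.py` itself is not shown in this notebook, so the following is only a sketch of how such a script might read these two flags with `argparse`. The function name, defaults, and help strings are assumptions.

```python
import argparse


def parse_training_args(argv=None):
    """Parse the two hyperparameters that HyperDrive passes on the command line."""
    parser = argparse.ArgumentParser(description="Hypothetical train.py argument parsing")
    parser.add_argument("--C", type=float, default=1.0,
                        help="Regularization parameter C (assumed default)")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="Maximum number of iterations (assumed default)")
    return parser.parse_args(argv)


# Same argument list as reported for the best run.
args = parse_training_args(["--C", "0.001", "--max_iter", "100"])
print(args.C, args.max_iter)
```

The flag names must match the keys used in the sampler (`"--C"`, `"--max_iter"`) exactly, since HyperDrive forwards them verbatim.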

In [18]:
best_run.get_metrics(name='Accuracy')

{'Accuracy': 0.9174506828528073}

In [19]:
print(best_run.get_file_names())

['logs/azureml/dataprep/0/backgroundProcess.log', 'logs/azureml/dataprep/0/backgroundProcess_Telemetry.log', 'logs/azureml/dataprep/0/rslex.log', 'logs/azureml/dataprep/0/rslex.log.2022-05-21-19', 'outputs/model.joblib', 'system_logs/cs_capability/cs-capability.log', 'system_logs/hosttools_capability/hosttools-capability.log', 'system_logs/lifecycler/execution-wrapper.log', 'system_logs/lifecycler/lifecycler.log', 'user_logs/std_log.txt']


In [20]:
print(best_run)

Run(Experiment: udacity-AzureML1,
Id: HD_fd4947e4-5b87-414b-a592-6c5d6cfe4496_10,
Type: azureml.scriptrun,
Status: Completed)


# Model Deployment
Remember that you have to deploy only one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [37]:
model = best_run.register_model('BankMarketingModel', model_path='outputs/model.joblib')

In [43]:
best_run.download_file('outputs/model.joblib', 'models/model.joblib')

In [44]:
model

Model(workspace=Workspace.create(name='DSStudio', subscription_id='baa67dbf-45d0-4d84-b662-527186361068', resource_group='dwtr-t332-20210421'), name=BankMarketingModel, id=BankMarketingModel:7, version=7, tags={}, properties={})

In [22]:
from azureml.core.environment import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.webservice import AciWebservice

In [23]:
#myenv = Environment.from_conda_specification(name='sklearn-env', file_path='conda_dependencies.yml')

from azureml.core.model import InferenceConfig
from azureml.core.webservice import Webservice, AciWebservice

inference_config = InferenceConfig(entry_script="score.py", environment = myenv)

deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, enable_app_insights=True)

In [25]:

from azureml.core.model import Model

service = Model.deploy(workspace=ws,
                       name="bank-service-final",
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=deployment_config)
service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-05-21 19:50:40+00:00 Creating Container Registry if not exists.
2022-05-21 19:50:40+00:00 Registering the environment.
2022-05-21 19:50:40+00:00 Use the existing image.
2022-05-21 19:50:41+00:00 Submitting deployment to compute.
2022-05-21 19:50:48+00:00 Checking the status of deployment bank-service-final..
2022-05-21 19:52:39+00:00 Checking the status of inference endpoint bank-service-final.
Succeeded
ACI service creation operation finished, operation "Succeeded"


In [26]:
scoring_uri = service.scoring_uri

print(f'\nservice state: {service.state}\n')
print(f'scoring URI: \n{service.scoring_uri}\n')
print(f'swagger URI: \n{service.swagger_uri}\n')

print(service.scoring_uri)
print(service.swagger_uri)


service state: Healthy

scoring URI: 
http://0a3e5630-746b-4a28-a874-d6a4e74801ae.eastus2.azurecontainer.io/score

swagger URI: 
http://0a3e5630-746b-4a28-a874-d6a4e74801ae.eastus2.azurecontainer.io/swagger.json

http://0a3e5630-746b-4a28-a874-d6a4e74801ae.eastus2.azurecontainer.io/score
http://0a3e5630-746b-4a28-a874-d6a4e74801ae.eastus2.azurecontainer.io/swagger.json
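
Once the service reports `Healthy`, the scoring URI accepts HTTP POST requests. `score.py` is not shown in this notebook, so the request schema is unknown. The sketch below assumes a JSON body of the form `{"data": [...]}` and a JSON response, both of which must be adjusted to whatever `score.py` actually parses and returns. The helper names and the sample record are hypothetical, and only the standard library is used.

```python
import json
import urllib.request


def build_payload(records):
    """Serialize input records into the assumed {"data": [...]} request body."""
    return json.dumps({"data": records}).encode("utf-8")


def call_service(scoring_uri, records, timeout=30):
    """POST the records to the scoring URI and decode the (assumed JSON) reply."""
    request = urllib.request.Request(
        scoring_uri,
        data=build_payload(records),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical record with a subset of the documented input columns.
payload = build_payload([{"age": 40, "job": "admin."}])
print(payload.decode("utf-8"))
```

In the notebook, `call_service(service.scoring_uri, records)` would send the request; the records need to carry every feature the deployed model expects.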
