# Hyperparameter Tuning using HyperDrive

In [1]:
!pip install opendatasets
import logging
import os

import opendatasets
import pandas as pd

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.dataset import Dataset
from azureml.core.datastore import Datastore
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.core import Environment, ScriptRunConfig
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import choice
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.widgets import RunDetails

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.59.0


## Dataset

### Overview
In this project I use the Heart Failure Prediction dataset from Kaggle. According to the dataset description on Kaggle, its goal is to support early detection and management of mortality caused by heart failure.
A properly trained machine learning model can predict death from heart failure using clinical features such as age, anaemia, ejection fraction, high blood pressure, and smoking.

Dataset features:

- age: Age of the patient
- anaemia: Decrease of red blood cells or hemoglobin
- creatinine_phosphokinase: Level of the CPK enzyme in the blood
- diabetes: Whether the patient has diabetes
- ejection_fraction: Percentage of blood leaving the heart at each contraction
- high_blood_pressure: Whether the patient has hypertension
- platelets: Platelets in the blood
- serum_creatinine: Level of creatinine in the blood
- serum_sodium: Level of sodium in the blood
- sex: Female or male (stored as a binary 0/1 column)
- smoking: Whether the patient smokes
- time: Follow-up period
- DEATH_EVENT: Whether the patient died during the follow-up period


The dataset was downloaded from Kaggle with the opendatasets package, using a Kaggle account I created, and registered in the Workspace datastore.


In [2]:
ws = Workspace.from_config()
experiment_name = 'awnanocapstoneexperiment02'

experiment=Experiment(ws, experiment_name)

opendatasets.download('https://www.kaggle.com/datasets/andrewmvd/heart-failure-clinical-data/heart_failure_clinical_records_dataset.csv', force = True)
df = pd.read_csv('./heart-failure-clinical-data/heart_failure_clinical_records_dataset.csv')

df.head()

Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username:
Your Kaggle Key:
Dataset URL: https://www.kaggle.com/datasets/andrewmvd/heart-failure-clinical-data





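| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 | 1 |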


In [3]:
df.describe()

| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 |
| mean | 60.833893 | 0.431438 | 581.839465 | 0.41806 | 38.083612 | 0.351171 | 263358.029264 | 1.39388 | 136.625418 | 0.648829 | 0.32107 | 130.26087 | 0.32107 |
| std | 11.894809 | 0.496107 | 970.287881 | 0.494067 | 11.834841 | 0.478136 | 97804.236869 | 1.03451 | 4.412477 | 0.478136 | 0.46767 | 77.614208 | 0.46767 |
| min | 40.0 | 0.0 | 23.0 | 0.0 | 14.0 | 0.0 | 25100.0 | 0.5 | 113.0 | 0.0 | 0.0 | 4.0 | 0.0 |
| 25% | 51.0 | 0.0 | 116.5 | 0.0 | 30.0 | 0.0 | 212500.0 | 0.9 | 134.0 | 0.0 | 0.0 | 73.0 | 0.0 |
| 50% | 60.0 | 0.0 | 250.0 | 0.0 | 38.0 | 0.0 | 262000.0 | 1.1 | 137.0 | 1.0 | 0.0 | 115.0 | 0.0 |
| 75% | 70.0 | 1.0 | 582.0 | 1.0 | 45.0 | 1.0 | 303500.0 | 1.4 | 140.0 | 1.0 | 1.0 | 203.0 | 1.0 |
| max | 95.0 | 1.0 | 7861.0 | 1.0 | 80.0 | 1.0 | 850000.0 | 9.4 | 148.0 | 1.0 | 1.0 | 285.0 | 1.0 |


In [4]:
# Creating or attaching to a compute cluster
cluster_name = "awnanocapstonecomputecluster2"

# Reuse the cluster if it already exists; otherwise create a new one
try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    print('Creating a new compute cluster...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_D4s_v3', max_nodes=4)
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

Creating a new compute cluster...
InProgress..
Succeeded
Provisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Hyperdrive Configuration

I use a Logistic Regression model and tune its hyperparameters with HyperDrive.

### Early stopping policy (BanditPolicy)
This policy automatically terminates poorly performing runs, which improves efficiency:
- slack_factor: The amount of slack allowed with respect to the best-performing training run.
- evaluation_interval: How often the policy is applied; each time the training script logs the primary metric counts as one interval.

Any run whose primary metric does not fall within the slack factor of the best-performing run is terminated.
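
To make the slack-factor rule concrete, the sketch below reproduces the cut-off arithmetic in plain Python. It assumes the rule documented for the Bandit policy with a maximization goal, where a run is cancelled when its metric is below best / (1 + slack_factor). The function name `should_terminate` and the sample accuracy values are hypothetical, and this is not the service's actual implementation.

```python
def should_terminate(run_metric: float, best_metric: float, slack_factor: float) -> bool:
    """Return True if a run falls outside the slack band (maximization goal).

    Assumption: the cut-off is best_metric / (1 + slack_factor).
    """
    threshold = best_metric / (1.0 + slack_factor)
    return run_metric < threshold


best = 0.88   # hypothetical best accuracy reported so far
slack = 0.1   # same slack_factor as used in this notebook
threshold = best / (1.0 + slack)

print(f"cut-off: {threshold:.3f}")          # cut-off: 0.800
print(should_terminate(0.75, best, slack))  # True
print(should_terminate(0.85, best, slack))  # False
```

Under this assumption, a slack factor of 0.1 means a run has to stay within roughly 91% of the best accuracy seen so far to keep running.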

### HyperDriveConfig
- hyperparameter_sampling - the search space, defined with RandomParameterSampling over the inverse regularization strength (--C) and the maximum number of iterations (--max_iter)
- primary_metric_name - the name of the primary metric reported by the experiment: 'Accuracy'
- primary_metric_goal - set to PrimaryMetricGoal.MAXIMIZE, so the primary metric is maximized when evaluating runs
- max_total_runs=32 - the maximum total number of runs to create
- max_concurrent_runs=4 - the maximum number of runs to execute concurrently
- policy - set to the early termination policy described above
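
As an illustration of what random sampling over a discrete search space means, the following plain-Python sketch draws configurations from the same `--C` and `--max_iter` choices used in this notebook. It only approximates the idea: the helper `sample_configs` and the fixed seed are assumptions made for illustration, and HyperDrive's own sampler may behave differently, for example in how it handles repeated combinations.

```python
import itertools
import random

# Same discrete choices as the search space in this notebook
search_space = {
    "--C": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0],
    "--max_iter": [10, 50, 100, 500, 1000],
}

# Every possible combination of the two hyperparameters
grid = list(itertools.product(*search_space.values()))
print(len(grid))  # 35


def sample_configs(space, n, seed=0):
    """Draw n configurations, picking each value uniformly at random (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n)
    ]


# Four configurations, matching max_concurrent_runs=4
configs = sample_configs(search_space, n=4)
for config in configs:
    print(config)
```

The grid has 7 × 5 = 35 combinations, so a budget of 32 runs is close to the size of the full search space.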



In [5]:
# Create an early termination policy
early_termination_policy = BanditPolicy(
    slack_factor=0.1,
    evaluation_interval=1
)

# Build the training environment from the conda specification file
sklearn_env = Environment.from_conda_specification(name='sklearn-env', file_path='conda_dependencies.yml')

# Define the hyperparameter search space (random sampling over discrete choices)
param_sampling = RandomParameterSampling({
    "--C": choice(0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0),
    "--max_iter": choice(10, 50, 100, 500, 1000)
})

# Configure how the training script is run
src = ScriptRunConfig(source_directory='.',
                      script='train.py',
                      compute_target=compute_target,
                      environment=sklearn_env)

# Create the HyperDrive config
hyperdrive_run_config = HyperDriveConfig(
    hyperparameter_sampling=param_sampling,
    primary_metric_name='Accuracy',
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=32,
    max_concurrent_runs=4,
    policy=early_termination_policy,
    run_config=src)
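
The training script `train.py` is not shown in this notebook. HyperDrive passes each sampled value to the script as a command-line argument (the best-run details further down list `['--C', '1000', '--max_iter', '100']`), so the script has to parse `--C` and `--max_iter`. The sketch below shows one way that parsing could look with `argparse`. The function name `parse_args`, the defaults, and the help strings are assumptions; the real script would go on to train the Logistic Regression model and log the `Accuracy` metric.

```python
import argparse


def parse_args(argv=None):
    """Parse the hyperparameters that HyperDrive passes on the command line.

    Hypothetical sketch: defaults and help texts are assumptions, not taken
    from the actual train.py.
    """
    parser = argparse.ArgumentParser(description="Hypothetical argument parsing for train.py")
    parser.add_argument("--C", type=float, default=1.0,
                        help="Inverse of regularization strength")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="Maximum number of solver iterations")
    return parser.parse_args(argv)


# Simulate the arguments of one sampled configuration
args = parse_args(["--C", "1000", "--max_iter", "100"])
print(args.C, args.max_iter)  # 1000.0 100
```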

In [6]:
# Submit your experiment
run_instance = experiment.submit(config=hyperdrive_run_config)

## Run Details
Below, I use the RunDetails widget to show the different jobs.

In [7]:
RunDetails(run_instance).show()
run_instance.wait_for_completion(show_output = True)


RunId: HD_ced672ad-af13-4ef1-9332-2f134f406316
Web View: https://ml.azure.com/runs/HD_ced672ad-af13-4ef1-9332-2f134f406316?wsid=/subscriptions/dabe5329-2380-4e80-a0cb-c9b370668176/resourcegroups/nanocapstone/workspaces/awnanocapstone01&tid=6c3b75bb-be53-4e19-9a7d-f523c7e10636

Streaming azureml-logs/hyperdrive.txt

[2024-12-27T10:56:43.2829102Z][GENERATOR][DEBUG]Sampled 4 jobs from search space 
[2024-12-27T10:56:43.7984573Z][SCHEDULER][INFO]Scheduling job, id='HD_ced672ad-af13-4ef1-9332-2f134f406316_0' 
[2024-12-27T10:56:43.9010631Z][SCHEDULER][INFO]Scheduling job, id='HD_ced672ad-af13-4ef1-9332-2f134f406316_2' 
[2024-12-27T10:56:43.9020017Z][SCHEDULER][INFO]Scheduling job, id='HD_ced672ad-af13-4ef1-9332-2f134f406316_1' 
[2024-12-27T10:56:43.9028554Z][SCHEDULER][INFO]Scheduling job, id='HD_ced672ad-af13-4ef1-9332-2f134f406316_3' 
[2024-12-27T10:56:44.4468954Z][SCHEDULER][INFO]Successfully scheduled a job. Id='HD_ced672ad-af13-4ef1-9332-2f134f406316_0' 
[2024-12-27T10:56:44.4805625Z][S

{'runId': 'HD_ced672ad-af13-4ef1-9332-2f134f406316',
 'target': 'awnanocapstonecomputecluster2',
 'status': 'Completed',
 'startTimeUtc': '2024-12-27T10:56:41.835857Z',
 'endTimeUtc': '2024-12-27T11:17:54.771047Z',
 'services': {},
 'properties': {'primary_metric_config': '{"name":"Accuracy","goal":"maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '88d2bab3-e99e-4fb8-8982-d17d4ffb7984',
  'user_agent': 'python/3.10.11 (Linux-5.15.0-1073-azure-x86_64-with-glibc2.31) msrest/0.7.1 Hyperdrive.Service/1.0.0 Hyperdrive.SDK/core.1.59.0',
  'best_child_run_id': 'HD_ced672ad-af13-4ef1-9332-2f134f406316_3',
  'score': '0.8833333333333333',
  'best_metric_status': 'Succeeded',
  'best_data_container_id': 'dcid.HD_ced672ad-af13-4ef1-9332-2f134f406316_3'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'configuration': None,
  'attribution': None,
  'telemetryValues': {'amlClientTyp

## Best Model
Below, I get the best model and display its properties.

In [8]:
best_run = run_instance.get_best_run_by_primary_metric()

print(f'Best Run details: {best_run.get_details()}')
print('-----------------------------------')
print(f'Best Run ID: {best_run.id}')
print('-----------------------------------')
print(f'Metrics: {best_run.get_metrics()}')

Best Run details: {'runId': 'HD_ced672ad-af13-4ef1-9332-2f134f406316_3', 'target': 'awnanocapstonecomputecluster2', 'status': 'Completed', 'startTimeUtc': '2024-12-27T11:02:42.488839Z', 'endTimeUtc': '2024-12-27T11:02:59.520781Z', 'services': {}, 'properties': {'_azureml.ComputeTargetType': 'amlctrain', '_azureml.ClusterName': 'awnanocapstonecomputecluster2', 'ContentSnapshotId': '88d2bab3-e99e-4fb8-8982-d17d4ffb7984', 'ProcessInfoFile': 'azureml-logs/process_info.json', 'ProcessStatusFile': 'azureml-logs/process_status.json'}, 'inputDatasets': [], 'outputDatasets': [], 'runDefinition': {'script': 'train.py', 'command': '', 'useAbsolutePath': False, 'arguments': ['--C', '1000', '--max_iter', '100'], 'sourceDirectoryDataStore': None, 'framework': 'Python', 'communicator': 'None', 'target': 'awnanocapstonecomputecluster2', 'dataReferences': {}, 'data': {}, 'outputData': {}, 'datacaches': [], 'jobName': None, 'maxRunDurationSeconds': 2592000, 'nodeCount': 1, 'instanceTypes': [], 'priority

In [13]:
# Register the best model and download a local copy of it
best_run.register_model(model_name="hyperdrive_model.pkl", model_path='./outputs/')
best_run.download_file('outputs/model.pkl', './hyper-outputs/model.pkl')

## Model Deployment

Remember you have to deploy only one of the two models you trained but you still need to register both the models. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shut down all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.

