# Hyperparameter Tuning using HyperDrive

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
from azureml.core import Workspace, Experiment

ws = Workspace.from_config()
experiment_name = 'hyperdrive'

experiment = Experiment(ws, experiment_name)

In [9]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Create compute cluster
cpu_cluster_name = "cpucluster"

try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Dataset

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [2]:
from azureml.data.dataset_factory import TabularDatasetFactory

csv_file = 'https://raw.githubusercontent.com/SwapnaKategaru/Project3/main/heart_failure_clinical_records_dataset.csv'

data = TabularDatasetFactory.from_delimited_files(csv_file)

## Hyperdrive Configuration

TODO: Explain the model you are using and the reason for choosing the different hyperparameters, termination policy, and config settings.
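
The configuration in this section samples `--C` from `uniform(0.0001, 1)` and `--max_iter` from `choice(80, 100, 120, 140, 180)`. As a rough illustration of what random sampling over that space means, the sketch below draws configurations with Python's standard `random` module. It is not the HyperDrive implementation, and `sample_configuration` and `sample_runs` are hypothetical helper names introduced only for this example.

```python
import random

# Stand-in for the search space: C is drawn uniformly from [0.0001, 1]
# and max_iter is picked from a fixed list of options.
MAX_ITER_OPTIONS = [80, 100, 120, 140, 180]


def sample_configuration(rng):
    """Draw one (C, max_iter) pair, as a random sampler conceptually would."""
    return {
        "--C": rng.uniform(0.0001, 1),
        "--max_iter": rng.choice(MAX_ITER_OPTIONS),
    }


def sample_runs(n_runs, seed=0):
    """Draw n_runs independent configurations from a seeded generator."""
    rng = random.Random(seed)
    return [sample_configuration(rng) for _ in range(n_runs)]


if __name__ == "__main__":
    # Four draws, matching max_total_runs=4 in the notebook.
    for config in sample_runs(4):
        print(config)
```

Because each draw is independent, four runs cover only a small part of the continuous range for `--C`; a larger `max_total_runs` would explore it more thoroughly.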

In [3]:
from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import uniform
from azureml.core.script_run_config import ScriptRunConfig
from azureml.train.hyperdrive import choice
import os

In [14]:
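# Create the local working directory if it does not already exist
os.makedirs("training", exist_ok=True)

# Cancel runs whose primary metric falls outside a 20% slack of the best run;
# the policy is first applied at interval 200 and then every 100 intervals
early_termination_policy = BanditPolicy(evaluation_interval=100, slack_factor=0.2, delay_evaluation=200)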

param_sampling = RandomParameterSampling({
    "--C": uniform(0.0001, 1),                     # inverse of regularization strength
    "--max_iter": choice(80, 100, 120, 140, 180),  # cap on solver iterations
})

# Run training on the cpu_cluster provisioned earlier in this notebook
estimator = SKLearn(source_directory=".", compute_target=cpu_cluster,
                    vm_size="STANDARD_DS2_V2", vm_priority='lowpriority', entry_script='train.py')

hyperdrive_run_config = HyperDriveConfig(
    hyperparameter_sampling=param_sampling,
    primary_metric_name='Accuracy',
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=4,
    max_concurrent_runs=4,
    policy=early_termination_policy,
    estimator=estimator,
)
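
To make the `BanditPolicy` settings above more concrete, here is a minimal sketch of the rule as the Azure ML documentation describes it for a maximize goal: a run is cancelled when its metric is below `best / (1 + slack_factor)`, and the policy is applied at multiples of `evaluation_interval` that are at least `delay_evaluation`. The function names are hypothetical, and this is a simplified model of the behavior, not the service's code.

```python
def bandit_cutoff(best_metric, slack_factor=0.2):
    """Lowest metric a run may report without being cancelled (maximize goal)."""
    return best_metric / (1 + slack_factor)


def should_terminate(run_metric, best_metric, slack_factor=0.2):
    """True when the run falls outside the slack allowed around the best run."""
    return run_metric < bandit_cutoff(best_metric, slack_factor)


def policy_applies(interval, evaluation_interval=100, delay_evaluation=200):
    """True at multiples of evaluation_interval that are >= delay_evaluation.

    One interval corresponds to one logged value of the primary metric.
    """
    return interval >= delay_evaluation and interval % evaluation_interval == 0


if __name__ == "__main__":
    # With a best Accuracy of 0.8, a run reporting 0.6 falls below the cutoff.
    print(bandit_cutoff(0.8))
    print(should_terminate(0.6, 0.8))
    print(policy_applies(100), policy_applies(200))
```

Under this reading, the policy is not evaluated until the primary metric has been logged 200 times. If `train.py` logs `Accuracy` only once per run, which the notebook does not show either way, the policy would never cancel a run.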



## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [15]:
remote_run = experiment.submit(hyperdrive_run_config)



In [37]:
from azureml.widgets import RunDetails
RunDetails(remote_run).show()

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

## Best Model

TODO: In the cell below, get the best model from the hyperdrive experiments and display all the properties of the model.

In [21]:
# Select the child run with the highest Accuracy and download its serialized model
best_run = remote_run.get_best_run_by_primary_metric()
best_run.download_file("outputs/hyperdrivemodel.joblib", "./outputs/hyperdrivemodel.joblib")

In [22]:
best_run

| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| hyperdrive | HD_356ba787-3b34-4ca6-9826-4c01d88f9b4c_0 | azureml.scriptrun | Completed | Link to Azure Machine Learning studio | Link to Documentation |


In [25]:
best_run.get_file_names()[-2]

'outputs/conda_dependencies.yml'

In [27]:
best_run.download_file('outputs/conda_dependencies.yml', './outputs/hyperdrive_conda_dependencies.yml')

In [36]:
best_run.get_details()['runDefinition']['arguments']

['--C', '0.8948076408416373', '--max_iter', '140']
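
The argument list above is what HyperDrive passed to the entry script for the best run. The notebook does not show `train.py`, so the following is only a hypothetical sketch of how such a script might read these arguments with `argparse`; the defaults and help strings are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the two hyperparameters that HyperDrive passes on the command line."""
    parser = argparse.ArgumentParser(description="Hypothetical train.py argument parsing")
    # Defaults are assumptions; the real script may use different ones.
    parser.add_argument("--C", type=float, default=1.0,
                        help="Inverse of regularization strength")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="Maximum number of solver iterations")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # The same arguments recorded in the best run's definition.
    args = parse_args(["--C", "0.8948076408416373", "--max_iter", "140"])
    print(args.C, args.max_iter)
```

Parsed this way, the values arrive as a `float` and an `int` and could be passed straight to `LogisticRegression(C=args.C, max_iter=args.max_iter)`.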

In [30]:
best_run.get_metrics()

{'Regularization Strength:': 0.8948076408416373,
 'Max iterations:': 140,
 'Accuracy': 0.7272727272727273}

In [31]:
import joblib
joblib.load(filename='outputs/hyperdrivemodel.joblib')

The sklearn.linear_model.logistic module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.linear_model. Anything that cannot be imported from sklearn.linear_model is now part of the private API.
Trying to unpickle estimator LogisticRegression from version 0.20.3 when using version 0.22.2.post1. This might lead to breaking code or invalid results. Use at your own risk.
From version 0.24, get_params will raise an AttributeError if a parameter cannot be retrieved as an instance attribute. Previously it would return None.


LogisticRegression(C=0.8948076408416373, class_weight=None, dual=False,
                   fit_intercept=True, intercept_scaling=1, l1_ratio=None,
                   max_iter=140, multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)
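
The warnings above report that the model was pickled with scikit-learn 0.20.3 and loaded with 0.22.2.post1. One way to catch such a mismatch before loading is to compare the two version strings. The sketch below does this with a hypothetical helper that compares only the major and minor components; that threshold is an assumption made for illustration, not a compatibility guarantee from scikit-learn.

```python
import re


def major_minor(version):
    """Extract (major, minor) from a version string such as '0.22.2.post1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def versions_compatible(trained_with, loaded_with):
    """Heuristic check: treat versions as compatible when major.minor match."""
    return major_minor(trained_with) == major_minor(loaded_with)


if __name__ == "__main__":
    # Versions taken from the warning printed when the model was loaded.
    print(versions_compatible("0.20.3", "0.22.2.post1"))
```

In practice, pinning scikit-learn in the loading environment to the version listed in the downloaded `hyperdrive_conda_dependencies.yml` is likely the more reliable fix.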

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service

In [33]:
from azureml.core.model import Model
model = Model.register(model_name='hyperdrivemodel-register', model_path='outputs/hyperdrivemodel.joblib', workspace=ws)

Registering model hyperdrivemodel-register


In [34]:
Model(ws, 'hyperdrivemodel-register')

Model(workspace=Workspace.create(name='quick-starts-ws-139939', subscription_id='9a7511b8-150f-4a58-8528-3e7d50216c31', resource_group='aml-quickstarts-139939'), name=hyperdrivemodel-register, id=hyperdrivemodel-register:1, version=1, tags={}, properties={})

In [None]:
cpu_cluster.delete()