# Hyperparameter Tuning using HyperDrive

This Jupyter Notebook is one part of a larger project and focuses specifically on hyperparameter tuning with HyperDrive for an experiment. Other parts include an AutoML experiment and project reference information.

In [None]:
# Import dependencies
import azureml.core
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.dataset import Dataset
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive import RandomParameterSampling
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import uniform, choice
from azureml.widgets import RunDetails

import joblib 
import numpy as np
import pandas as pd
import os

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

## Initialize Workspace & Create Experiment

Initialize the workspace from the config file, then create an experiment within the workspace.

In [None]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

In [None]:
experiment_name = 'hyperdrive-heart-failure'
project_folder = './heartfailure'

experiment = Experiment(ws, experiment_name)

## Create or Attach an AmlCompute Cluster
Search for an existing compute cluster; if none is found, create one.

In [None]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException

# If using a different existing cluster, enter the name in place of 'mlecscompute' below:
amlcompute_cluster_name = 'mlecscompute'

# Verify cluster existence: 
try: 
    compute_target = ComputeTarget(workspace = ws, name = amlcompute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    print('Not found; creating new cluster.')
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_D2_v2', max_nodes=4, min_nodes=1)
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)

compute_target.wait_for_completion(show_output = True)

## Dataset

TODO: In this markdown cell, give an overview of the dataset you are using. Also mention the task you will be performing.


Search for the uploaded dataset in AzureML Studio; if it is not found, load it from the URL.

In [None]:
# Attempt to load the dataset from the Workspace, but otherwise, create from file
# dataset file located at 'https://raw.githubusercontent.com/RachelAnnDrury/MLECapstone/main/heart_failure_clinical_records_dataset.csv'
found = False
# key should be set to 'heartfailuredataset' 
key = 'heartfailuredataset'
# description_text should be set to 'heartfailuredataset' 
description_text = 'heartfailuredataset'

if key in ws.datasets.keys():
    found = True
    dataset = ws.datasets[key]

if not found:
    datasetfile = 'https://raw.githubusercontent.com/RachelAnnDrury/MLECapstone/main/heart_failure_clinical_records_dataset.csv'
    dataset = Dataset.Tabular.from_delimited_files(datasetfile)
    dataset = dataset.register(workspace = ws, name = key, description = description_text)

df = dataset.to_pandas_dataframe()
df.describe()

## HyperDrive Configuration

TODO: Explain the model you are using and the reason for choosing the different hyperparameters, termination policy and config settings.
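
As a rough illustration of what the random search space in the next cell describes, the sketch below draws a few hyperparameter combinations using only the Python standard library. It is not the HyperDrive implementation: `sample_configuration` and `MAX_ITER_CHOICES` are hypothetical names introduced here, and the seed is fixed only to make the illustration repeatable.

```python
import random

# Hypothetical stand-in for the search space used in this notebook:
# '--C' is drawn uniformly from [0, 1]; '--max_iter' is picked from a fixed list.
MAX_ITER_CHOICES = [10, 50, 100, 120, 200]


def sample_configuration(rng):
    """Draw one hyperparameter combination, as random sampling would."""
    return {
        "--C": rng.uniform(0.0, 1.0),
        "--max_iter": rng.choice(MAX_ITER_CHOICES),
    }


rng = random.Random(0)  # seeded only so the illustration is repeatable
samples = [sample_configuration(rng) for _ in range(5)]
for sample in samples:
    print(sample)
```

Each printed dictionary corresponds to one child run's arguments. Because the draws are independent, some regions of the space may be sampled more densely than others.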

In [None]:
# HyperDrive Early Stopping Policy
policy = BanditPolicy(slack_factor = 0.1, evaluation_interval = 1)

# HyperDrive Parameter Sampler
ps = RandomParameterSampling(
    {
        '--C' : uniform(0, 1),
        '--max_iter' : choice(10, 50, 100, 120, 200)
    }
)

# Create the training folder if it does not already exist
os.makedirs("./training", exist_ok = True)

# Pass parameters to training script and create HyperDriveConfig
est = SKLearn(source_directory = '.', 
    entry_script = 'train.py',
    compute_target = compute_target)

hyperdrive_config = HyperDriveConfig(
    hyperparameter_sampling = ps, 
    primary_metric_name = 'Accuracy',
    primary_metric_goal = PrimaryMetricGoal.MAXIMIZE, 
    max_total_runs = 100,
    max_concurrent_runs = 4,
    policy = policy,
    estimator = est
)
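
The keys `'--C'` and `'--max_iter'` in the sampler are passed to the entry script as command-line arguments. The contents of `train.py` are not shown in this notebook, so the following is only a minimal sketch of how such a script might read them with `argparse`. The function name `parse_hyperparameters`, the default values, and the help text are assumptions made for illustration.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse the two hyperparameters that the sampler in this notebook supplies."""
    parser = argparse.ArgumentParser()
    # Defaults are assumptions; the real train.py may use different ones.
    parser.add_argument("--C", type=float, default=1.0,
                        help="Value sampled from uniform(0, 1)")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="Value sampled from choice(10, 50, 100, 120, 200)")
    return parser.parse_args(argv)


# Simulate the arguments one child run might receive.
args = parse_hyperparameters(["--C", "0.35", "--max_iter", "120"])
print(args.C, args.max_iter)
```

Passing `argv` explicitly keeps the function testable from a notebook; when called with no arguments it falls back to the real command line.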

In [None]:
# Submit HyperDrive experiment (output is streamed later by wait_for_completion)
hyperdrive_run = experiment.submit(config=hyperdrive_config)
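
Once runs start reporting `Accuracy`, the `BanditPolicy` configured above decides which ones to stop early. The sketch below illustrates the slack-factor rule for a maximized metric as the Azure ML documentation describes it: a run is cancelled when its best metric is below the best run's metric divided by `1 + slack_factor`. The helper `bandit_should_cancel` and the accuracy values are hypothetical and are not taken from this experiment.

```python
def bandit_should_cancel(run_metric, best_metric, slack_factor):
    """Return True if a run falls outside the allowed slack for a maximized metric."""
    threshold = best_metric / (1.0 + slack_factor)
    return run_metric < threshold


# Made-up accuracies for illustration only.
best_accuracy = 0.80
decisions = {}
for accuracy in (0.80, 0.75, 0.70):
    cancel = bandit_should_cancel(accuracy, best_accuracy, slack_factor=0.1)
    decisions[accuracy] = cancel
    print(accuracy, "cancel" if cancel else "keep")
```

With `slack_factor = 0.1` and a best accuracy of 0.80, the cutoff is roughly 0.727, so only the 0.70 run in this made-up example would be stopped.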

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

In [None]:
# Run Details in Azure Widget
RunDetails(hyperdrive_run).show()

In [None]:
hyperdrive_run.wait_for_completion(show_output = True)

## Best Model

TODO: In the cell below, get the best model from the hyperdrive experiments and display all the properties of the model.

In [None]:
# Retrieve the best HyperDrive Run
best_hyperdrive_run = hyperdrive_run.get_best_run_by_primary_metric()
print(best_hyperdrive_run.get_file_names(), best_hyperdrive_run.id, best_hyperdrive_run.get_metrics())

#Get details???

In [None]:
# Save the best model
hyperdrive_model = best_hyperdrive_run.register_model(model_name = 'hyperdrivemodel', model_path = 'outputs/model.joblib')

## Model Deployment

AutoML generated a more accurate model than this HyperDrive configuration, so only the AutoML model is deployed. See the automl.ipynb notebook for the deployment steps.