# Hyperparameter Tuning in Azure ML

Lab objectives:

- __find the optimal hyperparameters__ of a machine learning model (hyperparameter tuning)
- monitor the search process.

## Environment setup

Import the required modules and check the Azure ML SDK version:

In [4]:
import os  # needed for os.makedirs in the next cell

import azureml.core
from azureml.core import Workspace, Model, Environment, Experiment, ComputeTarget, Dataset
from azureml.widgets import RunDetails

from azureml.train.sklearn import SKLearn


# Check core SDK version number
print(f'Azure ML SDK version: {azureml.core.VERSION}')

Azure ML SDK version: 1.12.0


Set the experiment parameters:

In [5]:
experiment_name = 'hyperparams_tuning_demo'

experiment_dir = 'hyperparams-tuning-demo'
os.makedirs(experiment_dir, exist_ok=True)

## Connecting to the Azure ML Workspace

Connect to the workspace in Azure ML:

In [6]:
ws = Workspace.from_config()
print(f'Successfully connected to Workspace: {ws.name}.')

Successfully connected to Workspace: ai-in-cloud-workspace.


## Preparing to train the model with a hyperparameter search

### Get the data

In [30]:
# List the names of the datasets registered in the workspace
print([ds for ds in Dataset.get_all(ws)])

data_ds = Dataset.get_by_name(ws, 'diabetes_db')

['diabetes-batch-data', 'diabetes_db', 'credit-card-fraud', 'covid19-spread-russia', 'covid19-spread', 'mnist-dataset', 'Pima Indians Diabetes Database']


### Get the run environment

In [29]:
print([e for e in Environment.list(ws)])

env = Environment.get(ws, 'diabetes-experiment-env')

['diabetes-experiment-env', 'AzureML-AutoML', 'AzureML-PyTorch-1.0-GPU', 'AzureML-Scikit-learn-0.20.3', 'AzureML-TensorFlow-1.12-CPU', 'AzureML-PyTorch-1.2-GPU', 'AzureML-TensorFlow-2.0-GPU', 'AzureML-TensorFlow-2.0-CPU', 'AzureML-Chainer-5.1.0-GPU', 'AzureML-TensorFlow-1.13-CPU', 'AzureML-Minimal', 'AzureML-Chainer-5.1.0-CPU', 'AzureML-PyTorch-1.4-GPU', 'AzureML-PySpark-MmlSpark-0.15', 'AzureML-PyTorch-1.3-CPU', 'AzureML-PyTorch-1.1-GPU', 'AzureML-TensorFlow-1.10-GPU', 'AzureML-PyTorch-1.2-CPU', 'AzureML-TensorFlow-1.13-GPU', 'AzureML-Hyperdrive-ForecastDNN', 'AzureML-TensorFlow-1.10-CPU', 'AzureML-PyTorch-1.3-GPU', 'AzureML-PyTorch-1.4-CPU', 'AzureML-Tutorial', 'AzureML-PyTorch-1.0-CPU', 'AzureML-PyTorch-1.1-CPU', 'AzureML-TensorFlow-1.12-GPU', 'AzureML-VowpalWabbit-8.8.0', 'AzureML-AutoML-GPU', 'AzureML-Designer-VowpalWabbit', 'AzureML-TensorFlow-2.2-GPU', 'AzureML-TensorFlow-2.2-CPU', 'AzureML-PyTorch-1.6-CPU', 'AzureML-PyTorch-1.6-GPU', 'AzureML-Triton', 'AzureML-Sidecar', 'AzureM

### Get the ML cluster

In [31]:
print([comp.name for comp in ComputeTarget.list(ws)])

cluster = ComputeTarget(workspace=ws, name='ml-cluster')

['x-compute-vm', 'ml-cluster']


### Copy the model training script

In [23]:
!cp scripts/train_model.py $experiment_dir
!ls $experiment_dir

train_model.py
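The contents of `train_model.py` are not shown in this lab. As a rough sketch of the interface the tuning run relies on, the script presumably reads the `--reg_rate` argument from the command line. The helper name `parse_args` and the default value below are hypothetical and only illustrate the idea:

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line arguments passed to the training script."""
    parser = argparse.ArgumentParser()
    # The argument name must match the key used in the sampling space.
    # The default of 0.01 is an assumption made for this sketch.
    parser.add_argument('--reg_rate', type=float, default=0.01,
                        help='regularization rate')
    # Tolerate any extra arguments instead of failing on them.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the call made for one sampled value of the hyperparameter.
args = parse_args(['--reg_rate', '0.1'])
print(f'Regularization rate: {args.reg_rate}')

# A real script would now train the model with args.reg_rate and log the
# primary metric under the configured name, e.g. run.log('AUC', auc),
# so that the child runs can be compared.
```

The metric name logged by the script has to match `primary_metric_name` in the HyperDrive configuration, otherwise the runs cannot be ranked.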


## Run the hyperparameter optimization experiment

In [None]:
from azureml.train.hyperdrive import BayesianParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal, uniform

params = BayesianParameterSampling(
    {
        '--reg_rate': uniform(0.01, 1.0)
    }
)

# Create an estimator
estimator = SKLearn(source_directory=experiment_dir,
                    inputs=[data_ds.as_named_input('data')], 
                    entry_script='train_model.py',
                    compute_target=cluster,
                    environment_definition=env)

# Configure hyperdrive settings
config = HyperDriveConfig(estimator=estimator, 
                          hyperparameter_sampling=params, 
                          policy=None,  # Bayesian sampling does not support early termination policies
                          primary_metric_name='AUC', 
                          primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, 
                          max_total_runs=64,
                          max_concurrent_runs=4)

# Run the experiment
experiment = Experiment(workspace=ws, name=experiment_name)
run = experiment.submit(config=config)

# Show the status in the notebook as the experiment runs
RunDetails(run).show()
run.wait_for_completion()

For best results with Bayesian Sampling we recommend using a maximum number of runs greater than or equal to 20 times the number of hyperparameters being tuned. Recommendend value:20.
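The message above states a rule of thumb: with Bayesian sampling, the run budget should be at least 20 times the number of tuned hyperparameters. A minimal sketch of that check, in plain Python and independent of the SDK (the helper name `min_recommended_runs` and the tuple representation of the search space are hypothetical):

```python
def min_recommended_runs(num_hyperparameters, runs_per_hyperparameter=20):
    """Return the minimum run budget suggested by the rule of thumb."""
    return runs_per_hyperparameter * num_hyperparameters


# One tuned hyperparameter, as in this lab (range shown only for context).
search_space = {'--reg_rate': (0.01, 1.0)}
max_total_runs = 64

recommended = min_recommended_runs(len(search_space))
print(f'Recommended minimum: {recommended}, configured: {max_total_runs}')

if max_total_runs < recommended:
    print('Consider increasing max_total_runs.')
```

With a single hyperparameter the recommended minimum is 20, so the budget of 64 runs configured here satisfies it.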



## Identifying and registering the 'best' model

In [36]:
for child_run in run.get_children_sorted_by_primary_metric():
    print(child_run)

best_run = run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
parameter_values = best_run.get_details()['runDefinition']['arguments']

print(f'Best Run Id: {best_run.id}')
print(f'\t AUC: {best_run_metrics["AUC"]}')
print(f'\t Accuracy: {best_run_metrics["Accuracy"]}')
print(f'\t Regularization Rate: {parameter_values}')

{'run_id': 'HD_331d3726-034b-4b91-a312-24bbaebacd73_4', 'hyperparameters': '{"--reg_rate": 0.1}', 'best_primary_metric': 0.8577280341477337, 'status': 'Completed'}
{'run_id': 'HD_331d3726-034b-4b91-a312-24bbaebacd73_2', 'hyperparameters': '{"--reg_rate": 0.01}', 'best_primary_metric': 0.8576183444801528, 'status': 'Completed'}
{'run_id': 'HD_331d3726-034b-4b91-a312-24bbaebacd73_0', 'hyperparameters': '{"--reg_rate": 0.001}', 'best_primary_metric': 0.857617351813478, 'status': 'Completed'}
{'run_id': 'HD_331d3726-034b-4b91-a312-24bbaebacd73_1', 'hyperparameters': '{"--reg_rate": 0.005}', 'best_primary_metric': 0.8576163591468029, 'status': 'Completed'}
{'run_id': 'HD_331d3726-034b-4b91-a312-24bbaebacd73_3', 'hyperparameters': '{"--reg_rate": 0.05}', 'best_primary_metric': 0.8576148701467907, 'status': 'Completed'}
{'run_id': 'HD_331d3726-034b-4b91-a312-24bbaebacd73_5', 'hyperparameters': '{"--reg_rate": 1.0}', 'best_primary_metric': 0.8575687111464059, 'status': 'Completed'}
{'run_id': 
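In the output above, each child run is a dictionary whose `hyperparameters` field is a JSON string rather than a dictionary. The following sketch, which uses two of the printed records as sample data, shows one way to decode that field into a compact summary. The helper name `summarize` is hypothetical, and skipping records without hyperparameters is a defensive assumption:

```python
import json

# Sample records copied from the output above.
children = [
    {'run_id': 'HD_331d3726-034b-4b91-a312-24bbaebacd73_4',
     'hyperparameters': '{"--reg_rate": 0.1}',
     'best_primary_metric': 0.8577280341477337, 'status': 'Completed'},
    {'run_id': 'HD_331d3726-034b-4b91-a312-24bbaebacd73_2',
     'hyperparameters': '{"--reg_rate": 0.01}',
     'best_primary_metric': 0.8576183444801528, 'status': 'Completed'},
]


def summarize(records):
    """Return (run_id, reg_rate, metric) tuples for completed runs."""
    rows = []
    for record in records:
        raw_params = record.get('hyperparameters')
        # Skip records that did not complete or carry no hyperparameters.
        if record.get('status') != 'Completed' or raw_params is None:
            continue
        params = json.loads(raw_params)
        rows.append((record['run_id'], params['--reg_rate'],
                     record['best_primary_metric']))
    return rows


for run_id, reg_rate, metric in summarize(children):
    print(f'{run_id}: reg_rate={reg_rate}, AUC={metric:.6f}')
```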

In [37]:
# Register model
best_run.register_model(model_path='outputs/model.pkl', model_name='diabetes_predict_model',
                        tags={'Lab':'Hyperparameters Tuning', 'Lab #':'8A'},
                        properties={'AUC': best_run_metrics['AUC'], 'Accuracy': best_run_metrics['Accuracy']})

# List registered models
for model in Model.list(ws):
    print(f'{model.name} v{model.version}')
    for tag_name, tag in model.tags.items():
        print('\t', tag_name, ':', tag)
    for prop_name, prop in model.properties.items():
        print('\t', prop_name, ':', prop)
    print('\n')

diabetes_predict_model v3
	 Lab : Hyperparameters Tuning
	 Lab # : 8A
	 AUC : 0.8577280341477337
	 Accuracy : 0.7886666666666666


diabetes_predict_model v2
	 Demo : ML Pipeline


diabetes_predict_model v1
	 Demo : Target compute
	 AUC : 0.846851712258014
	 Accuracy : 0.7788888888888889


diabetes_model v4
	 Dataset : Diabetes
	 AUC : 0.846851712258014
	 Accuracy : 0.7788888888888889


diabetes_model v3
	 Dataset : Diabetes
	 AUC : 0.846851712258014
	 Accuracy : 0.7788888888888889


diabetes_model v2
	 Dataset : Diabetes
	 AUC : 0.8468519356081545
	 Accuracy : 0.7788888888888889


diabetes_model v1
	 Training context : Estimator
	 AUC : 0.8468519356081545
	 Accuracy : 0.7788888888888889


amlstudio-covid19-service v1
	 CreatedByAMLStudio : true


amlstudio-covid19-service-pipe v1
	 CreatedByAMLStudio : true


amlstudio-covid19-spread-servi v1
	 CreatedByAMLStudio : true


amlstudio-pima-diabets-service v2
	 CreatedByAMLStudio : true


amlstudio-letter-recognition-s v1
	 CreatedByAMLStu
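The listing above shows that versions of the same model can differ in their recorded metrics, and that some versions have none at all. As a minimal sketch, the comparison can be done on plain dictionaries that mirror the printed values for `diabetes_predict_model`. The helper name `best_version_by` is hypothetical, and the property values are written as strings on the assumption that the registry may return them in that form:

```python
# Sample data mirroring the listing above.
registered = [
    {'name': 'diabetes_predict_model', 'version': 3,
     'properties': {'AUC': '0.8577280341477337',
                    'Accuracy': '0.7886666666666666'}},
    {'name': 'diabetes_predict_model', 'version': 2, 'properties': {}},
    {'name': 'diabetes_predict_model', 'version': 1,
     'properties': {'AUC': '0.846851712258014',
                    'Accuracy': '0.7788888888888889'}},
]


def best_version_by(models, metric='AUC'):
    """Return the model record with the highest value of the given metric."""
    # Versions without the metric cannot be compared, so leave them out.
    scored = [m for m in models if metric in m['properties']]
    if not scored:
        return None
    # Convert to float in case the values are stored as strings.
    return max(scored, key=lambda m: float(m['properties'][metric]))


best = best_version_by(registered)
print(f"Best version by AUC: v{best['version']}")
```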

## Conclusion

## Useful links

1. https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters
2. https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.hyperdriveconfig?view=azure-ml-py
