Hyperparameter optimization (HPO) is the task of finding well-performing hyperparameters within a given search space. The best-known HPO method is grid search, but it only works well on tiny search spaces. Many other algorithms have been developed to handle large search spaces. For example, Bayesian optimization is designed for expensive, black-box functions, which makes it a good fit for HPO. Cost-frugal optimization, on the other hand, takes training cost into account and aims to find a better solution within a limited budget.
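
To make the scaling problem with grid search concrete: a full grid that tries k values for each of d hyperparameters needs k^d trials. The sketch below only does that arithmetic. `GridSize` and its parameter names are illustrative and are not part of AutoML.Net.

```csharp
using System;

// Number of configurations a full grid visits: pointsPerDimension ^ dimensions.
// Both parameter names are illustrative, not AutoML.Net settings.
long GridSize(int pointsPerDimension, int dimensions)
{
    long size = 1;
    for (int i = 0; i < dimensions; i++)
    {
        // Throw on overflow instead of silently wrapping around.
        size = checked(size * pointsPerDimension);
    }
    return size;
}

// With 10 points per hyperparameter, every extra hyperparameter
// multiplies the number of trials by 10.
foreach (var dimensions in new[] { 1, 2, 4, 8 })
{
    Console.WriteLine($"{dimensions} hyperparameter(s) -> {GridSize(10, dimensions)} trials");
}
```

If each trial means training a model, the trial count quickly exceeds any realistic time budget, which is why the other tuners choose where to spend trials instead of enumerating everything.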

Note that although HPO is a very active research field and many algorithms have been proposed in the last few years, there is still no general, all-in-one HPO algorithm that performs well on every dataset. The most reliable way to pick an HPO algorithm is therefore to try several of them on your own dataset.

AutoML.Net provides several HPO algorithms to try out, and you can swap them in `AutoMLExperiment` by setting a different tuner. This notebook covers the following topics:
- The tuners available in AutoML.Net and how to use them.
- A comparison of the performance of those tuners.

In [1]:
// using nightly-build
#i "nuget:https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-libraries/nuget/v3/index.json"
#r "nuget: Microsoft.ML.AutoML, 0.21.0-preview.23504.1"
#r "nuget: Microsoft.Data.Analysis, 0.21.0-preview.23504.1"

#r "nuget: Plotly.NET.Interactive, 3.0.2"
#r "nuget: Plotly.NET.CSharp, 0.0.1"
// Import usings.
using System;
using System.IO;
using System.Net;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;
using Microsoft.ML.SearchSpace;
using Newtonsoft.Json;
using Microsoft.ML.AutoML.CodeGen;
using Microsoft.Data.Analysis;
using Microsoft.ML.SearchSpace.Option;

### Available Tuners in AutoML.Net
The following tuners are currently available in AutoML.Net:
- CostFrugalTuner: a low-cost HPO algorithm. It is an implementation of [Frugal Optimization for Cost-related Hyperparameters](https://arxiv.org/abs/2005.01571).
- SMAC: Bayesian optimization that uses a random forest as the regression model.
- EciCostFrugalTuner: CostFrugalTuner for hierarchical search spaces. It is used as the default tuner when `AutoMLExperiment.SetPipeline` is called.
- GridSearch
- RandomSearch

The following section shows how to select each tuner in `AutoMLExperiment`.

In [2]:
var context = new MLContext(1);
var experiment = context.Auto().CreateExperiment();

// use EciCostFrugalTuner
// Note: EciCostFrugalTuner will be set as default tuner if you call 
// experiment.SetPipeline()
experiment.SetEciCostFrugalTuner();

// use CostFrugalTuner
experiment.SetCostFrugalTuner();

// use SMAC
experiment.SetSmacTuner();

// use GridSearch
experiment.SetGridSearchTuner(step: 10);

// use RandomSearch
experiment.SetRandomSearchTuner(seed: 1);
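
For intuition about what a random-search tuner does with its seed, here is a self-contained sketch on a toy problem. It is not AutoML.Net's implementation: `Evaluate`, `RandomSearch`, the two hyperparameters, and their ranges are all hypothetical, and `Evaluate` stands in for training a model and measuring its loss.

```csharp
using System;

// Hypothetical objective standing in for "train a model and return its loss".
// Its minimum (loss 0) is at learningRate = 0.1 and depth = 6.
double Evaluate(double learningRate, int depth) =>
    Math.Pow(Math.Log10(learningRate) + 1, 2) + Math.Pow((depth - 6) / 10.0, 2);

// Random search: sample every hyperparameter independently and keep the best trial.
(double LearningRate, int Depth, double Loss) RandomSearch(int trials, int seed)
{
    var rng = new Random(seed);
    var best = (LearningRate: 0.0, Depth: 0, Loss: double.PositiveInfinity);
    for (int i = 0; i < trials; i++)
    {
        // Learning rate is sampled log-uniformly between 1e-4 and 1,
        // depth uniformly from the integers 1..16.
        double learningRate = Math.Pow(10, -4 * rng.NextDouble());
        int depth = rng.Next(1, 17);
        double loss = Evaluate(learningRate, depth);
        if (loss < best.Loss)
        {
            best = (learningRate, depth, loss);
        }
    }
    return best;
}

var result = RandomSearch(trials: 50, seed: 1);
Console.WriteLine($"best loss {result.Loss:F4} at learningRate={result.LearningRate:F4}, depth={result.Depth}");
```

Fixing the seed makes the sequence of sampled configurations repeatable, which helps when comparing tuners across runs.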

### Compare GridSearch and EciCostFrugal on the Titanic dataset

The following section shows how the choice of HPO algorithm affects AutoML performance by comparing the metric trends of GridSearch and EciCostFrugal on the Titanic dataset.

### Download the Titanic dataset if necessary

In [3]:
string EnsureDataSetDownloaded(string fileName)
{

	// This is the path if the repo has been checked out.
	var filePath = Path.Combine(Directory.GetCurrentDirectory(),"data", fileName);

	if (!File.Exists(filePath))
	{
		// This is the path if the file has already been downloaded.
		filePath = Path.Combine(Directory.GetCurrentDirectory(), fileName);
	}

	if (!File.Exists(filePath))
	{
		using (var client = new WebClient())
		{
			client.DownloadFile($"https://raw.githubusercontent.com/dotnet/csharp-notebooks/main/machine-learning/data/{fileName}", filePath);
		}
		Console.WriteLine($"Downloaded {fileName} to: {filePath}");
	}
	else
	{
		Console.WriteLine($"{fileName} found here: {filePath}");
	}

	return filePath;
}

### Load Dataset

In [4]:
var trainDataPath = EnsureDataSetDownloaded("titanic-train.csv");
var df = DataFrame.LoadCsv(trainDataPath);

// Hold out 10% of the rows as the test set.
var trainTestSplit = context.Data.TrainTestSplit(df, 0.1);
df.Head(10)

titanic-train.csv found here: c:\Users\xiaoyuz\source\repos\csharp-notebooks\machine-learning\data\titanic-train.csv


| index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.25 | | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.05 | | S |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | `<null>` | 0 | 0 | 330877 | 8.4583 | | Q |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2 | 3 | 1 | 349909 | 21.075 | | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27 | 0 | 2 | 347742 | 11.1333 | | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14 | 1 | 0 | 237736 | 30.0708 | | C |


### Construct pipeline and AutoMLExperiment

In [5]:
var pipeline = context.Auto().Featurizer(df, excludeColumns: new[] { "Survived" })
                    .Append(context.Transforms.Conversion.ConvertType("Survived", "Survived", DataKind.Boolean))
                    .Append(context.Auto().BinaryClassification(labelColumnName: "Survived"));
// Configure AutoML
var monitor = new NotebookMonitor(pipeline);

var experiment = context.Auto().CreateExperiment()
                    .SetPipeline(pipeline)
                    .SetTrainingTimeInSeconds(10)
                    .SetDataset(trainTestSplit.TrainSet, trainTestSplit.TestSet)
                    .SetBinaryClassificationMetric(BinaryClassificationMetric.Accuracy, "Survived", "PredictedLabel")
                    .SetMonitor(monitor);


### Run HPO using GridSearch

In [6]:
experiment.SetGridSearchTuner(step: 10);
await experiment.RunAsync();
var gridSearchTrial = monitor.CompletedTrials.ToArray();
monitor.CompletedTrials.Clear();

### Run HPO using EciCostFrugal

In [7]:
experiment.SetEciCostFrugalTuner();
await experiment.RunAsync();
var eciSearchTrials = monitor.CompletedTrials.ToArray();
monitor.CompletedTrials.Clear();

### Compare HPO performance between GridSearch and EciCostFrugal

In [8]:
using Plotly.NET;

var gridSearchChart = Chart2D.Chart.Line<int, float, string>(gridSearchTrial.Select(t => t.TrialSettings.TrialId), gridSearchTrial.Select(t => (float)t.Metric), Name: "grid_search");
var eciCfoSearchChart = Chart2D.Chart.Line<int, float, string>(eciSearchTrials.Select(t => t.TrialSettings.TrialId), eciSearchTrials.Select(t => (float)t.Metric), Name: "eci_cfo");
var combineChart = Chart.Combine(new[]{ gridSearchChart, eciCfoSearchChart});
combineChart.Display()
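
A per-trial metric line can be noisy, because a tuner may explore weak configurations after it has already found a good one. A complementary view is the best-so-far curve, which is the running maximum of the metric. The sketch below is a minimal, self-contained version that assumes a higher metric is better, as with accuracy. `BestSoFar` is a hypothetical helper, and the sample values are made up rather than taken from the runs in this notebook.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Running maximum: element i is the best metric seen in trials 0..i.
// Assumes a higher metric is better, as with accuracy.
List<double> BestSoFar(IEnumerable<double> metrics)
{
    var result = new List<double>();
    double best = double.NegativeInfinity;
    foreach (var metric in metrics)
    {
        best = Math.Max(best, metric);
        result.Add(best);
    }
    return result;
}

// Made-up accuracies for illustration only.
var sampleMetrics = new[] { 0.70, 0.65, 0.80, 0.78 };
Console.WriteLine(string.Join(", ", BestSoFar(sampleMetrics)));
```

Applying the same transformation to each tuner's metric sequence before plotting makes it easier to see which tuner reaches a given accuracy in fewer trials.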

## Continue learning

> [⏪ Last Module - AutoML Tuner](./06-AutoML%20HPO%20and%20tuner.ipynb)

## See also

- [AutoML SearchSpace](./Parameter%20and%20SearchSpace.ipynb)