### Iris flower
In this example, we will use [ML.NET](https://dotnet.microsoft.com/apps/machinelearning-ai/ml-dotnet) and [MLNet.AutoPipeline](https://github.com/LittleLittleCloud/machinelearning-auto-pipeline) to create a sweepable pipeline and use it to predict the species of an iris flower. You will learn:

- How to create and train a sweepable pipeline.
- How to run hyperparameter tuning with a sweepable pipeline.
- How to create an ML.NET estimator chain from a sweepable pipeline with tuned hyperparameters.

### Let's start!

### Iris dataset and task

The Iris flower dataset is a multivariate dataset introduced by Ronald Fisher. It consists of 50 samples from each of three iris species (setosa, virginica and versicolor). For each sample, four features are measured:

- sepal_width
- sepal_length
- petal_width
- petal_length

The task is to determine, given the four features, which species of iris a sample belongs to. In machine learning, this is known as **multiclass classification**.

### Install dependency

In [1]:
#i "nuget:https://pkgs.dev.azure.com/xiaoyuz0315/BigMiao/_packaging/MLNet-Auto-Pipeline/nuget/v3/index.json"
#r "nuget:MLNet.AutoPipeline,0.9.2"

Installed package MLNet.AutoPipeline version 0.9.2

### Include namespace

In [3]:
using Microsoft.ML;
using Microsoft.ML.Data;
using MLNet.AutoPipeline;
using System;
using System.Threading.Tasks;

### Load dataset
We are going to use the ML.NET data API to load the Iris dataset. First, we define an `Iris` class that describes one row, then we call [`LoadFromTextFile`](https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.textloadersavercatalog.loadfromtextfile?view=ml-dotnet#Microsoft_ML_TextLoaderSaverCatalog_LoadFromTextFile_Microsoft_ML_DataOperationsCatalog_System_String_Microsoft_ML_Data_TextLoader_Column___System_Char_System_Boolean_System_Boolean_System_Boolean_System_Boolean_) to load the dataset from a file. The data is exposed as an [`IDataView`](https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.idataview?view=ml-dotnet), which can be used for further pipeline operations.

In [4]:
// Class that describes one row of the Iris dataset
class Iris
{
    [LoadColumn(0)]
    public float sepal_length;

    [LoadColumn(1)]
    public float sepal_width;

    [LoadColumn(2)]
    public float petal_length;

    [LoadColumn(3)]
    public float petal_width;

    [LoadColumn(4)]
    public string species;
}
        
var context = new MLContext(seed:0);
var dataset = context.Data.LoadFromTextFile<Iris>(@"iris.csv", separatorChar: ',', hasHeader: true);

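// Create a train-test split, holding out 30% of the rows for testing.
// Note: Preview() shows at most 100 rows by default, so the row count printed for the train split may be capped.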
var split = context.Data.TrainTestSplit(dataset, 0.3);
Console.WriteLine($"train split: {split.TrainSet.Preview()}");
Console.WriteLine($"test split: {split.TestSet.Preview()}");

train split: 6 columns, 100 rows
test split: 6 columns, 49 rows


### Create sweepable pipeline
Creating a sweepable pipeline is very similar to creating an ML.NET EstimatorChain. MLNet.AutoPipeline provides two extension methods that convert an ML.NET EstimatorChain into a sweepable pipeline.

- [`Append<TLastTran>(TLastTran, INode)`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.EstimatorChainExtension.html#MLNet_AutoPipeline_EstimatorChainExtension_Append__1___0_MLNet_AutoPipeline_INode_)
- [`Append<TLastTrain>(EstimatorChain<TLastTrain>, INode)`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.EstimatorChainExtension.html#MLNet_AutoPipeline_EstimatorChainExtension_Append__1___0_MLNet_AutoPipeline_INode_)

The `INode` interface is a thin wrapper over `IEstimator` in ML.NET. It provides a uniform way for `SweepablePipeline` to sweep over hyperparameters. All sweepable trainers in MLNet.AutoPipeline implement `INode` and its derived interface `ISweepableNode`. The available trainers for multiclass classification can be found in [`SweepableMultiClassificationTrainerExtension`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.SweepableMultiClassificationTrainerExtension.html), which should cover all standard trainers in ML.NET. Besides the predefined trainers, MLNet.AutoPipeline also provides an [API](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.AutoPipelineCatalog.html#MLNet_AutoPipeline_AutoPipelineCatalog_SweepableTrainer__2_System_Func_Microsoft_ML_MLContext___1___0__MLNet_AutoPipeline_OptionBuilder___1__System_String___System_String_System_String_) for creating custom sweepable trainers.

Below, a [`SdcaMaximumEntropy`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.SweepableMultiClassificationTrainerExtension.html#MLNet_AutoPipeline_SweepableMultiClassificationTrainerExtension_SdcaMaximumEntropy_MLNet_AutoPipeline_SweepableMultiClassificationTrainers_System_String_System_String_MLNet_AutoPipeline_OptionBuilder_Microsoft_ML_Trainers_SdcaMaximumEntropyMulticlassTrainer_Options__Microsoft_ML_Trainers_SdcaMaximumEntropyMulticlassTrainer_Options_) sweepable trainer is used. The sweeping range for each hyperparameter is set up through an [`OptionBuilder`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.OptionBuilder-1.html). Because the `optionBuilder` argument is left as null here, a default [`SdcaMaximumEntropyOptionBuilder`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.API.OptionBuilder.SdcaMaximumEntropyOptionBuilder.html) is used.

In [5]:
var sweepablePipeline = context.Transforms.Conversion.MapValueToKey("species", "species")
              .Append(context.Transforms.Concatenate("features", new string[] { "sepal_length", "sepal_width", "petal_length", "petal_width" }))
              // We use SdcaMaximumEntropy as trainer.
              .Append(context.AutoML().MultiClassification.SdcaMaximumEntropy("species", "features"));

### Train and tune hyperparameters
MLNet.AutoPipeline provides the [`Experiment`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.Experiment.html) class to help train a sweepable pipeline. It lets you create an experiment from a sweepable pipeline, train it on a dataset and get the result. During training, you can pass an `IProgress` reporter to observe the sweep over hyperparameters. After training, the experiment result contains the information for every iteration, with which you can re-create an ML.NET estimator chain with exactly the same hyperparameters and reproduce the training result.

In [20]:
class Reporter : IProgress<IterationInfo>
{
    public void Report(IterationInfo value)
    {
        Console.WriteLine(value.ParameterSet);
        Console.WriteLine($"validate score: {value.EvaluateScore}");
        Console.WriteLine($"training time: {value.TrainingTime}");
    }
}

var experimentOption = new Experiment.Option()
{
    EvaluateFunction = (MLContext context, IDataView data)=>
    {
        return context.MulticlassClassification.Evaluate(data, "species").MicroAccuracy;
    },
};
var experiment = context.AutoML().CreateExperiment(sweepablePipeline, experimentOption);
var reporter = new Reporter();
var result = await experiment.TrainAsync(split.TrainSet, validateFraction: 0.1f, reporter: reporter);

L2Regularization=5.623413 L1Regularization=1.7782794 HistorySize=2 ExampleWeightColumnName= OptimizationTolerance=1E-07 EnforceNonNegativity=False
validate score: 0.25
training time: 0.0477345
L2Regularization=0.31622776 L1Regularization=10 HistorySize=1 ExampleWeightColumnName= OptimizationTolerance=1E-07 EnforceNonNegativity=False
validate score: 0.25
training time: 0.0434859
L2Regularization=0.031622775 L1Regularization=0.00031622776 HistorySize=126 ExampleWeightColumnName= OptimizationTolerance=1E-07 EnforceNonNegativity=False
validate score: 1
training time: 0.0770788
L2Regularization=0.0056234132 L1Regularization=0.0005623413 HistorySize=8 ExampleWeightColumnName= OptimizationTolerance=1E-07 EnforceNonNegativity=False
validate score: 1
training time: 0.1373503
L2Regularization=0.0017782794 L1Regularization=0.00017782794 HistorySize=355 ExampleWeightColumnName= OptimizationTolerance=1E-07 EnforceNonNegativity=False
validate score: 1
training time: 0.3291365
L2Regularization=0.0017

validate score: 0.4166666666666667
training time: 0.0439989
L2Regularization=0.0001 L1Regularization=0.00017782794 HistorySize=1 ExampleWeightColumnName= OptimizationTolerance=1E-07 EnforceNonNegativity=False
validate score: 1
training time: 6.7494736
L2Regularization=1.7782794 L1Regularization=0.0005623413 HistorySize=16 ExampleWeightColumnName= OptimizationTolerance=1E-07 EnforceNonNegativity=False
validate score: 0.6666666666666666
training time: 0.0401672
L2Regularization=0.01 L1Regularization=0.0009999999 HistorySize=178 ExampleWeightColumnName= OptimizationTolerance=1E-07 EnforceNonNegativity=False
validate score: 1
training time: 0.1255509
L2Regularization=0.01 L1Regularization=0.0009999999 HistorySize=178 ExampleWeightColumnName= OptimizationTolerance=1E-07 EnforceNonNegativity=False
validate score: 1
training time: 0.0944606
L2Regularization=0.0031622776 L1Regularization=0.00031622776 HistorySize=63 ExampleWeightColumnName= OptimizationTolerance=1E-07 EnforceNonNegativity=False
validate score: 1
training time: 0.239507
L2Regularization=0.0009999999 L1Regularization=0.17782794 HistorySize=22 ExampleWeightColumnName= OptimizationTolerance=1E-07 EnforceNonNegativity=False
validate score: 1
training time: 0.6126209
L2Regularization=0.0001 L1Regularization=0.0031622776 HistorySize=11 ExampleWeightColumnName= OptimizationTolerance=1E-07 EnforceNonNegativity=False
validate score: 1
training time: 6.2083166
L2Regularization=0.17782794 L1Regularization=1 HistorySize=22 ExampleWeightColumnName= OptimizationTolerance=1E-07 EnforceNonNegativity=False
validate score: 0.5833333333333334
training time: 0.0387744
L2Regularization=0.0001 L1Regularization=1 HistorySize=1000 ExampleWeightColumnName= OptimizationTolerance=1E-07 EnforceNonNegativity=False
validate score: 1
training time: 7.0137637
L2Regularization=0.31622776 L1Regularization=0.0005623413 HistorySize=63 ExampleWeightColumnName= OptimizationTolerance=1E-07 EnforceNonNegativity=False
validate score: 1
training time:
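
Several of the sampled values printed above are consistent with a logarithmic grid: 5.623413 ≈ 10^0.75, 0.31622776 ≈ 10^-0.5 and 0.0001 = 10^-4. The following is a minimal sketch of such a grid in plain C#. It is only an illustration and not MLNet.AutoPipeline's actual sweeper; the function name `LogGrid` and the exponent range are assumptions chosen to match those values.

```csharp
using System;
using System.Linq;

// Illustrative only: NOT the sweeper used by MLNet.AutoPipeline.
// Builds `steps` values evenly spaced on a log10 scale between
// 10^minExponent and 10^maxExponent (both included).
float[] LogGrid(double minExponent, double maxExponent, int steps)
{
    return Enumerable.Range(0, steps)
        // Evenly spaced exponents...
        .Select(i => minExponent + i * (maxExponent - minExponent) / (steps - 1))
        // ...turned into values 10^exponent.
        .Select(exponent => (float)Math.Pow(10, exponent))
        .ToArray();
}

// Hypothetical range: 10^-4 to 10^1 in quarter-decade steps (21 points).
var grid = LogGrid(-4, 1, 21);
Console.WriteLine(string.Join(", ", grid));
```

Sampling on a log scale is a common choice for regularization strengths, because the useful values span several orders of magnitude.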

### Evaluate model
After training, the best model is saved in [`ExperimentResult.BestModel`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.ExperimentResult.html#MLNet_AutoPipeline_ExperimentResult_BestModel). The corresponding iteration is saved in [`ExperimentResult.BestIteration`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.ExperimentResult.html#MLNet_AutoPipeline_ExperimentResult_BestIteration).

In [21]:
var bestModel = result.BestModel;
var eval = bestModel.Transform(split.TestSet);
var metric = context.MulticlassClassification.Evaluate(eval, "species");
Console.WriteLine($"best model train score: {result.BestIteration.EvaluateScore}");
Console.WriteLine($"best model test score: {metric.MicroAccuracy}");

best model train score: 1
best model test score: 0.9387755102040817
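
`MicroAccuracy` is the fraction of all samples that are classified correctly, so with 49 test rows the score above corresponds to 46 correct predictions (46/49 ≈ 0.9388). `MacroAccuracy`, by contrast, averages the accuracy of each class. The sketch below computes both in plain C# on a handful of made-up labels; the label arrays are hypothetical and not taken from the Iris test set.

```csharp
using System;
using System.Linq;

// Hypothetical labels for six samples (not real Iris predictions).
string[] actual    = { "setosa", "setosa", "setosa", "setosa", "versicolor", "virginica" };
string[] predicted = { "setosa", "setosa", "setosa", "setosa", "virginica",  "virginica" };

// Micro accuracy: correct predictions divided by all predictions.
double micro = actual.Zip(predicted, (a, p) => a == p ? 1.0 : 0.0).Average();

// Macro accuracy: accuracy per actual class, then averaged over the classes.
double macro = actual.Zip(predicted, (a, p) => (Label: a, Hit: a == p ? 1.0 : 0.0))
    .GroupBy(x => x.Label)
    .Average(g => g.Average(x => x.Hit));

Console.WriteLine($"micro accuracy: {micro}");
Console.WriteLine($"macro accuracy: {macro}");
```

Here micro accuracy is 5/6, while macro accuracy is only 2/3, because the single versicolor sample is misclassified and that class counts as much as the other two. On a balanced dataset such as Iris the two metrics stay close to each other.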


### Compare with ML.NET EstimatorChain
In this section, we create the corresponding ML.NET estimator chain with default hyperparameters (no sweeping) and compare the results. (On a dataset this small, the plain ML.NET estimator may well score higher on the test set, because the tuned model can overfit the small validation split. In most situations, the sweepable pipeline should achieve a higher score.)

In [22]:
var estimatorChain = context.Transforms.Conversion.MapValueToKey("species", "species")
              .Append(context.Transforms.Concatenate("features", new string[] { "sepal_length", "sepal_width", "petal_length", "petal_width" }))
              .Append(context.MulticlassClassification.Trainers.SdcaMaximumEntropy("species", "features"));
              
var mlModel = estimatorChain.Fit(split.TrainSet);
var mlModel_eval_train = mlModel.Transform(split.TrainSet);
var mlModel_eval_test = mlModel.Transform(split.TestSet);
var mlModel_train_metric = context.MulticlassClassification.Evaluate(mlModel_eval_train, "species");
var mlModel_test_metric = context.MulticlassClassification.Evaluate(mlModel_eval_test, "species");
Console.WriteLine($"mlnet estimator chain train score: {mlModel_train_metric.MicroAccuracy}");
Console.WriteLine($"mlnet estimator chain test score: {mlModel_test_metric.MicroAccuracy}");

mlnet estimator chain train score: 1
mlnet estimator chain test score: 0.9591836734693877
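
With only 49 test rows, the two test scores differ by a single sample (46/49 versus 47/49), so one train-test split is a noisy basis for comparison. As a rough sketch, the cell below runs ML.NET's built-in cross-validation on the default estimator chain to get a more stable estimate. It assumes the `context`, `dataset` and `estimatorChain` variables from the cells above, and the choice of 5 folds is arbitrary.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

// Assumes `context`, `dataset` and `estimatorChain` from the earlier cells.
// Each fold trains on 4/5 of the data and evaluates on the remaining 1/5.
var cvResults = context.MulticlassClassification.CrossValidate(
    dataset, estimatorChain, numberOfFolds: 5, labelColumnName: "species");

foreach (var fold in cvResults)
{
    Console.WriteLine($"fold {fold.Fold}: micro accuracy = {fold.Metrics.MicroAccuracy}");
}

// Averaging over the folds smooths out the effect of any single split.
Console.WriteLine($"mean micro accuracy: {cvResults.Average(r => r.Metrics.MicroAccuracy)}");
```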
