### Iris flower
In this example, we use [ML.Net](https://dotnet.microsoft.com/apps/machinelearning-ai/ml-dotnet) and [MLNet.AutoPipeline](https://github.com/LittleLittleCloud/machinelearning-auto-pipeline) to build a sweepable pipeline and use it to predict the species of an iris flower. You will learn how to:

- Create and train a sweepable pipeline.
- Run hyperparameter tuning with a sweepable pipeline.
- Create an ML.Net estimator chain from a sweepable pipeline with the tuned hyperparameters.

### Let's start!

### Iris dataset and task

The Iris flower dataset is a multivariate dataset introduced by Ronald Fisher. It consists of 50 samples from each of three iris species (setosa, virginica and versicolor). For each sample, four features are measured:

- sepal_length
- sepal_width
- petal_length
- petal_width

The task is to determine, given these four features, which species a sample belongs to. In machine learning, this is known as **multiclass classification**.

### Install dependency

In [32]:
#i "nuget:https://pkgs.dev.azure.com/xiaoyuz0315/BigMiao/_packaging/MLNet-Auto-Pipeline/nuget/v3/index.json"
#r "nuget:MLNet.AutoPipeline,0.9.0-v202007298"

### Include namespace

In [33]:
using Microsoft.ML;
using Microsoft.ML.Data;
using MLNet.AutoPipeline;
using MLNet.AutoPipeline.Metric;
using System;
using System.Threading.Tasks;

### Load dataset
We use the ML.Net data API to load the Iris dataset. First we define an `Iris` class that describes one row, then we call [`LoadFromTextFile`](https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.textloadersavercatalog.loadfromtextfile?view=ml-dotnet#Microsoft_ML_TextLoaderSaverCatalog_LoadFromTextFile_Microsoft_ML_DataOperationsCatalog_System_String_Microsoft_ML_Data_TextLoader_Column___System_Char_System_Boolean_System_Boolean_System_Boolean_System_Boolean_) to load the dataset from a file. The data is exposed as an [`IDataView`](https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.idataview?view=ml-dotnet), which the rest of the pipeline consumes.
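
`LoadFromTextFile` throws if no file exists at the given path, so it can help to resolve and check the path before loading. The following is a minimal sketch that uses only `System.IO`. `ResolveDataPath` is a hypothetical helper introduced here for illustration; it is not part of ML.Net or MLNet.AutoPipeline.

```csharp
using System;
using System.IO;

// Hypothetical helper (an assumption for illustration, not a library API):
// resolves a data file relative to a base directory and fails early
// with a clear message if the file is missing.
string ResolveDataPath(string baseDirectory, string fileName)
{
    // Path.Combine picks the right directory separator for the current OS.
    var fullPath = Path.GetFullPath(Path.Combine(baseDirectory, fileName));
    if (!File.Exists(fullPath))
    {
        throw new FileNotFoundException($"Dataset not found at '{fullPath}'.", fullPath);
    }
    return fullPath;
}

// Example call (throws if iris.csv is not in the working directory):
// var irisPath = ResolveDataPath(Directory.GetCurrentDirectory(), "iris.csv");
```

The returned path could then be passed to `LoadFromTextFile<Iris>` in place of a hard-coded relative path.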

In [35]:
// Class that holds one row of the Iris dataset
class Iris
{
    [LoadColumn(0)]
    public float sepal_length;

    [LoadColumn(1)]
    public float sepal_width;

    [LoadColumn(2)]
    public float petal_length;

    [LoadColumn(3)]
    public float petal_width;

    [LoadColumn(4)]
    public string species;
}
        
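var context = new MLContext(seed: 0);
// A forward slash keeps the relative path valid on Linux and macOS as well as Windows.
var dataset = context.Data.LoadFromTextFile<Iris>("./iris.csv", separatorChar: ',', hasHeader: true);

// Create a train-test split, holding out 30% of the rows for testing
var split = context.Data.TrainTestSplit(dataset, testFraction: 0.3);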
Console.WriteLine($"train split: {split.TrainSet.Preview()}");
Console.WriteLine($"test split: {split.TestSet.Preview()}");


### Create sweepable pipeline
Creating a sweepable pipeline is very similar to creating an ML.Net `EstimatorChain`. MLNet.AutoPipeline provides two extension methods that turn an ML.Net `EstimatorChain` into a sweepable pipeline:

- [`Append<TLastTran>(TLastTran, INode)`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.EstimatorChainExtension.html#MLNet_AutoPipeline_EstimatorChainExtension_Append__1___0_MLNet_AutoPipeline_INode_)
- [`Append<TLastTrain>(EstimatorChain<TLastTrain>, INode)`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.EstimatorChainExtension.html#MLNet_AutoPipeline_EstimatorChainExtension_Append__1___0_MLNet_AutoPipeline_INode_)

The `INode` interface is a thin wrapper over `IEstimator` in ML.Net. It provides a uniform way for `SweepablePipeline` to sweep over hyperparameters. All sweepable trainers in MLNet.AutoPipeline implement `INode` and its derived interface `ISweepableNode`. The available trainers for multiclass classification are listed in [`SweepableMultiClassificationTrainerExtension`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.SweepableMultiClassificationTrainerExtension.html), which should cover the standard multiclass trainers in ML.Net. Beyond the predefined trainers, MLNet.AutoPipeline also provides an [API](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.AutoPipelineCatalog.html#MLNet_AutoPipeline_AutoPipelineCatalog_SweepableTrainer__2_System_Func_Microsoft_ML_MLContext___1___0__MLNet_AutoPipeline_OptionBuilder___1__System_String___System_String_System_String_) for creating custom sweepable trainers.

The pipeline below uses the [`LightGbm`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.SweepableMultiClassificationTrainerExtension.html#MLNet_AutoPipeline_SweepableMultiClassificationTrainerExtension_LightGbm_MLNet_AutoPipeline_SweepableMultiClassificationTrainers_System_String_System_String_MLNet_AutoPipeline_OptionBuilder_Microsoft_ML_Trainers_LightGbm_LightGbmMulticlassTrainer_Options__Microsoft_ML_Trainers_LightGbm_LightGbmMulticlassTrainer_Options_) sweepable trainer. The sweeping range of each hyperparameter is configured through an [`OptionBuilder`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.OptionBuilder-1.html). Because the `optionBuilder` argument is left as null here, the default [`LightGbmOptionBuilder`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.LightGbmOptionBuilder.html) is used.

In [26]:
var sweepablePipeline = context.Transforms.Conversion.MapValueToKey("species", "species")
              .Append(context.Transforms.Concatenate("features", new string[] { "sepal_length", "sepal_width", "petal_length", "petal_width" }))
              // We use LightGbm as trainer.
              .Append(context.AutoML().MultiClassification.LightGbm("species", "features"));
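
The first two steps of the pipeline above prepare the data for the trainer: `MapValueToKey` turns the string label into a key, and `Concatenate` packs the four measurements into a single feature vector. The sketch below imitates both steps in plain C# to show the idea. `BuildKeyMap` and `ConcatenateFeatures` are hypothetical stand-ins written for this illustration, not ML.Net APIs, and the 1-based, order-of-first-appearance key assignment reflects ML.Net's default behavior as an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Conceptual stand-in for MapValueToKey (hypothetical, not the ML.Net implementation):
// assign each distinct label a 1-based key in order of first appearance.
Dictionary<string, uint> BuildKeyMap(IEnumerable<string> labels)
{
    var map = new Dictionary<string, uint>();
    foreach (var label in labels)
    {
        if (!map.ContainsKey(label))
        {
            map[label] = (uint)(map.Count + 1);
        }
    }
    return map;
}

// Conceptual stand-in for Concatenate: pack four scalar columns into one vector.
float[] ConcatenateFeatures(float sepalLength, float sepalWidth, float petalLength, float petalWidth)
    => new[] { sepalLength, sepalWidth, petalLength, petalWidth };

var keyMap = BuildKeyMap(new[] { "setosa", "versicolor", "setosa", "virginica" });
var featureVector = ConcatenateFeatures(5.1f, 3.5f, 1.4f, 0.2f);
Console.WriteLine(string.Join(", ", keyMap.Select(kv => $"{kv.Key}={kv.Value}")));
Console.WriteLine($"feature vector length: {featureVector.Length}");
```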

### Train and tune hyperparameters
MLNet.AutoPipeline provides the [`Experiment`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.Experiment.html) class to train a sweepable pipeline. You create an experiment from a sweepable pipeline, train it on a dataset, and get back a result. During training, you can pass an `IProgress<IterationInfo>` reporter to observe each sweep over the hyperparameters. After training, the experiment result contains the information for every iteration, which lets you re-create an ML.Net estimator chain with exactly the same hyperparameters and reproduce the training result.
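
To make the idea of sweeping concrete, here is a conceptual sketch of random search in plain C#: sample a set of hyperparameters, score it, report the iteration, and keep the best candidate. This is not how MLNet.AutoPipeline implements its sweepers. `ToyValidationScore`, `RandomSearch`, the two hyperparameters, and their ranges are all assumptions chosen for illustration, and the objective is a toy function rather than a trained model.

```csharp
using System;

// Toy objective standing in for "train a model and return its validation score".
// Purely illustrative: it peaks at learningRate = 0.1 and numLeaves = 31.
double ToyValidationScore(double learningRate, int numLeaves)
    => 1.0 - Math.Abs(learningRate - 0.1) - Math.Abs(numLeaves - 31) / 100.0;

// Random search: sample hyperparameters, score each candidate,
// report it through an optional IProgress, and keep the best one.
(double LearningRate, int NumLeaves, double Score) RandomSearch(
    int iterations, int seed, IProgress<(int Iteration, double Score)> reporter = null)
{
    var random = new Random(seed);
    var best = (LearningRate: 0.0, NumLeaves: 0, Score: double.NegativeInfinity);
    for (var i = 0; i < iterations; i++)
    {
        // Learning rate is sampled log-uniformly in [0.001, 1),
        // number of leaves uniformly in [2, 128].
        var learningRate = Math.Pow(10, -3 + 3 * random.NextDouble());
        var numLeaves = random.Next(2, 129);
        var score = ToyValidationScore(learningRate, numLeaves);
        reporter?.Report((i, score));
        if (score > best.Score)
        {
            best = (learningRate, numLeaves, score);
        }
    }
    return best;
}

var bestCandidate = RandomSearch(iterations: 30, seed: 0);
Console.WriteLine($"best: lr={bestCandidate.LearningRate:F4}, leaves={bestCandidate.NumLeaves}, score={bestCandidate.Score:F4}");
```

In the experiment that follows, the `Iteration` option plays the role of `iterations` here, and the `Reporter` class plays the role of the optional `reporter` argument.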

In [27]:
class Reporter : IProgress<IterationInfo>
{
    public void Report(IterationInfo value)
    {
        Console.WriteLine(value.ParameterSet);
        Console.WriteLine(value.SweepablePipeline.Summary());
        Console.WriteLine($"validation score: {value.ScoreMetric.Name}: {value.ScoreMetric.Score}");
        Console.WriteLine($"training time: {value.TrainingTime}");
    }
}

var experimentOption = new Experiment.Option()
{
    ScoreMetric = new MicroAccuracyMetric(),
    Label = "species",
    Iteration = 30,
};
var experiment = context.AutoML().CreateExperiment(sweepablePipeline, experimentOption);
var reporter = new Reporter();
var result = await experiment.TrainAsync(split.TrainSet, reporter: reporter);


### Evaluate model
After training, the best model is saved in [`ExperimentResult.BestModel`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.ExperimentResult.html#MLNet_AutoPipeline_ExperimentResult_BestModel). The corresponding iteration is saved in [`ExperimentResult.BestIteration`](https://littlelittlecloud.github.io/machinelearning-auto-pipeline-site/api/MLNet.AutoPipeline.ExperimentResult.html#MLNet_AutoPipeline_ExperimentResult_BestIteration).
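
Both the experiment's score metric and the evaluation in this section use micro-accuracy. As a minimal sketch of what that number means, the plain C# below computes it by hand from true and predicted labels, with macro-accuracy added only for contrast. `MicroAccuracy` and `MacroAccuracy` are hypothetical helpers written for this illustration, and the labels are made-up values, not results from this notebook.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Micro-accuracy: fraction of all samples whose predicted label equals the true label.
double MicroAccuracy(IReadOnlyList<string> actual, IReadOnlyList<string> predicted)
{
    if (actual.Count != predicted.Count || actual.Count == 0)
    {
        throw new ArgumentException("Inputs must be non-empty and of equal length.");
    }
    var correct = 0;
    for (var i = 0; i < actual.Count; i++)
    {
        if (actual[i] == predicted[i]) correct++;
    }
    return (double)correct / actual.Count;
}

// Macro-accuracy: accuracy computed per true class, then averaged with equal weight per class.
double MacroAccuracy(IReadOnlyList<string> actual, IReadOnlyList<string> predicted)
{
    if (actual.Count != predicted.Count || actual.Count == 0)
    {
        throw new ArgumentException("Inputs must be non-empty and of equal length.");
    }
    return Enumerable.Range(0, actual.Count)
        .GroupBy(i => actual[i])
        .Average(g => g.Count(i => actual[i] == predicted[i]) / (double)g.Count());
}

// Made-up labels for illustration only.
var actualLabels = new[] { "setosa", "setosa", "setosa", "versicolor", "virginica" };
var predictedLabels = new[] { "setosa", "setosa", "setosa", "virginica", "virginica" };
Console.WriteLine(MicroAccuracy(actualLabels, predictedLabels)); // 4 of 5 correct
Console.WriteLine(MacroAccuracy(actualLabels, predictedLabels)); // mean of 1, 0 and 1
```

With balanced classes, as in the Iris dataset, the two metrics tend to be close; they diverge when some classes are much rarer than others.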

In [28]:
var bestModel = result.BestModel;
var eval = bestModel.Transform(split.TestSet);
var metric = context.MulticlassClassification.Evaluate(eval, "species");
Console.WriteLine($"best model test score: {metric.MicroAccuracy}");

// re-create ML.Net model using hyper parameters from best iteration
var bestIteration = result.BestIteration;
var estimator = bestIteration.BuildPipeline();

// retrain
var model = estimator.Fit(split.TrainSet);
var eval2 = model.Transform(split.TestSet);
var metric2 = context.MulticlassClassification.Evaluate(eval2, "species");
Console.WriteLine($"retrain model test score: {metric2.MicroAccuracy}"); // should be close to the best model test score


### Compare with ML.Net EstimatorChain
In this section, we create the corresponding ML.Net estimator chain with LightGbm's default hyperparameters (no sweeping) and compare its test score with that of the tuned model.

In [29]:
var estimatorChain = context.Transforms.Conversion.MapValueToKey("species", "species")
              .Append(context.Transforms.Concatenate("features", new string[] { "sepal_length", "sepal_width", "petal_length", "petal_width" }))
              .Append(context.MulticlassClassification.Trainers.LightGbm("species", "features"));
              
var mlModel = estimatorChain.Fit(split.TrainSet);
var mlModel_eval = mlModel.Transform(split.TestSet);
var mlModel_metric = context.MulticlassClassification.Evaluate(mlModel_eval, "species");
Console.WriteLine($"ML.Net default model test score: {mlModel_metric.MicroAccuracy}");
