# Introduction to Automated Machine Learning (AutoML)

This sample shows how you can use AutoML to automate the process of training custom ML models.

In this case, we want to train a multiclass classification model that predicts the Area label for a GitHub issue.

## Install NuGet packages

In [2]:
#r "nuget: Microsoft.ML.AutoML, 0.21.0-preview.23266.6"

## Add using statements

In [3]:
using System;
using System.Collections.Generic;
using System.Threading;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.AutoML;

## Initialize MLContext

In [4]:
// Initialize MLContext
MLContext ctx = new MLContext();

## Use AutoML to infer column information

In [6]:
// Define data path
var dataPath = Path.GetFullPath(@"../Data/issues_train.tsv");

// Infer column information
ColumnInferenceResults columnInference =
    ctx.Auto().InferColumns(dataPath, separatorChar: '\t', labelColumnName: "Area", groupColumns: false);

In [17]:
columnInference
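
Column inference expects every row of the file to split into the same number of fields for the chosen separator. The following is a minimal pre-check sketch in plain .NET, not part of ML.NET: the helper name FindInconsistentRows and the sample rows are made up for illustration, and the naive split does not handle quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns (1-based line number, field count) for every row
// whose field count differs from the header row. Quoted separators are not handled.
List<(int LineNumber, int FieldCount)> FindInconsistentRows(IEnumerable<string> lines, char separator)
{
    var rows = lines.ToList();
    var mismatches = new List<(int LineNumber, int FieldCount)>();
    if (rows.Count == 0) return mismatches;

    int expected = rows[0].Split(separator).Length;
    for (int i = 1; i < rows.Count; i++)
    {
        int actual = rows[i].Split(separator).Length;
        if (actual != expected) mismatches.Add((i + 1, actual));
    }
    return mismatches;
}

// Made-up sample rows; the third line is missing its last field.
var sampleRows = new[]
{
    "ID\tArea\tDescription",
    "1\tarea-System.IO\tFile copy fails",
    "2\tarea-System.Net",
};

foreach (var (lineNumber, fieldCount) in FindInconsistentRows(sampleRows, '\t'))
{
    Console.WriteLine($"Line {lineNumber} has {fieldCount} fields");
}
```

In the notebook, the same helper could be called with File.ReadLines(dataPath) to locate malformed rows before running InferColumns.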

## Load data into IDataView

In [49]:
// Create text loader
TextLoader loader = ctx.Data.CreateTextLoader(columnInference.TextLoaderOptions);

// Load data into IDataView
IDataView data = loader.Load(dataPath);

## Remove columns

In [19]:
// Columns that should not be used as features
var columnsToExclude = new[] { "ID", "Description" };

// Drop the excluded columns from the dataset
data = ctx.Transforms.DropColumns(columnsToExclude)
    .Fit(data)
    .Transform(data);

## Split data into train / validation

80% of the dataset is used for training and the remaining 20% is used for validation (tuning).

In [20]:
var trainValidationData = ctx.Data.TrainTestSplit(data, testFraction: 0.2);
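
To make the 80/20 arithmetic concrete, here is a minimal sketch in plain .NET that splits row indices with a fixed seed. It illustrates the idea only and is not ML.NET's splitting algorithm; the helper name SplitIndices and the row count are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative helper (not ML.NET's algorithm): shuffle row indices with a fixed seed,
// then reserve the first testFraction of them for validation.
(List<int> Train, List<int> Test) SplitIndices(int rowCount, double testFraction, int seed)
{
    var rng = new Random(seed);
    var shuffled = Enumerable.Range(0, rowCount).OrderBy(_ => rng.Next()).ToList();
    int testCount = (int)Math.Round(rowCount * testFraction);
    return (shuffled.Skip(testCount).ToList(), shuffled.Take(testCount).ToList());
}

var (trainRows, testRows) = SplitIndices(rowCount: 10, testFraction: 0.2, seed: 42);
Console.WriteLine($"train: {trainRows.Count} rows, test: {testRows.Count} rows");
```

With 10 rows and a test fraction of 0.2, the sketch puts 8 rows in the training set and 2 in the validation set, and no row appears in both.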

## Define the training pipeline

In [21]:
SweepablePipeline pipeline =
    ctx.Auto().Featurizer(data, columnInformation: columnInference.ColumnInformation)
        .Append(ctx.Transforms.Conversion.MapValueToKey(columnInference.ColumnInformation.LabelColumnName))
        .Append(ctx.Auto().MultiClassification(labelColumnName: columnInference.ColumnInformation.LabelColumnName))
        .Append(ctx.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
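
The pipeline places the trainer between MapValueToKey and MapKeyToValue because multiclass trainers work on numeric key labels rather than on the original text labels. The sketch below shows that round trip with plain dictionaries. It is an illustration of the idea under made-up label values, not ML.NET's implementation.

```csharp
using System;
using System.Collections.Generic;

// Made-up label values for illustration.
var labels = new[] { "area-System.IO", "area-System.Net", "area-System.IO" };

// Value -> key: give each distinct label the next integer key, starting at 1.
var valueToKey = new Dictionary<string, uint>();
foreach (var label in labels)
{
    if (!valueToKey.ContainsKey(label)) valueToKey[label] = (uint)valueToKey.Count + 1;
}

// Key -> value: invert the dictionary to turn a predicted key back into text.
var keyToValue = new Dictionary<uint, string>();
foreach (var pair in valueToKey) keyToValue[pair.Value] = pair.Key;

uint predictedKey = valueToKey["area-System.Net"];
Console.WriteLine($"{predictedKey} -> {keyToValue[predictedKey]}");
```

Mapping the prediction back to its text value is what makes the PredictedLabel column readable in the prediction output.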

## Create AutoML experiment

In [22]:
AutoMLExperiment experiment = ctx.Auto().CreateExperiment();

## Configure experiment settings

In [23]:
experiment
    .SetPipeline(pipeline)
    .SetMulticlassClassificationMetric(MulticlassClassificationMetric.MacroAccuracy, labelColumn: columnInference.ColumnInformation.LabelColumnName)
    .SetTrainingTimeInSeconds(120)
    .SetDataset(trainValidationData);
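
The experiment optimizes macro accuracy, which averages the accuracy of each class so that rare labels count as much as common ones. The sketch below contrasts it with micro accuracy on a tiny made-up example. The helper names and the data are hypothetical, and this is not the ML.NET evaluator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Macro accuracy: average of per-class accuracy (correct predictions / rows of that class).
double MacroAccuracy(IReadOnlyList<string> actual, IReadOnlyList<string> predicted)
{
    return Enumerable.Range(0, actual.Count)
        .GroupBy(i => actual[i])
        .Select(g => g.Count(i => predicted[i] == actual[i]) / (double)g.Count())
        .Average();
}

// Micro accuracy: fraction of all rows predicted correctly.
double MicroAccuracy(IReadOnlyList<string> actual, IReadOnlyList<string> predicted)
{
    return Enumerable.Range(0, actual.Count).Count(i => predicted[i] == actual[i]) / (double)actual.Count;
}

// Made-up labels: the model always predicts the majority class "A".
var actualLabels = new[] { "A", "A", "A", "B" };
var predictedLabels = new[] { "A", "A", "A", "A" };

Console.WriteLine($"macro = {MacroAccuracy(actualLabels, predictedLabels)}");
Console.WriteLine($"micro = {MicroAccuracy(actualLabels, predictedLabels)}");
```

Here class A scores 3/3 and class B scores 0/1, so macro accuracy is 0.5 while micro accuracy is 0.75. A model that ignores a rare area label is penalized more heavily under the macro metric.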

## Configure experiment monitor

In [24]:
public class AutoMLMonitor : IMonitor
{
    private readonly List<TrialResult> _completedTrials;
    private readonly SweepablePipeline _pipeline;

    public AutoMLMonitor(SweepablePipeline pipeline)
    {
        _completedTrials = new List<TrialResult>();
        _pipeline = pipeline;
    }

    public IEnumerable<TrialResult> GetCompletedTrials() => _completedTrials;

    public void ReportBestTrial(TrialResult result)
    {
        return;
    }

    public void ReportCompletedTrial(TrialResult result)
    {
        var trialId = result.TrialSettings.TrialId;
        var timeToTrain = result.DurationInMilliseconds;
        var pipeline = _pipeline.ToString(result.TrialSettings.Parameter);
        Console.WriteLine($"Trial {trialId} finished training in {timeToTrain}ms with pipeline {pipeline}");
        _completedTrials.Add(result);
    }

    public void ReportFailTrial(TrialSettings settings, Exception exception = null)
    {
        // The exception is optional, so guard against null before reading its message
        var message = exception?.Message ?? "unknown error";
        if (message.Contains("Operation was canceled."))
        {
            Console.WriteLine($"{settings.TrialId} cancelled. Time budget exceeded.");
            return;
        }
        Console.WriteLine($"{settings.TrialId} failed with exception {message}");
    }

    public void ReportRunningTrial(TrialSettings setting)
    {
        return;
    }
}

In [25]:
var monitor = new AutoMLMonitor(pipeline);
experiment.SetMonitor(monitor);
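
Once trials have been collected, they can be compared after the run. The sketch below ranks a list of trial summaries by metric using plain tuples. The values are made up for illustration and the tuple shape is hypothetical; in the notebook, the equivalent data would come from monitor.GetCompletedTrials().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical trial summaries: (trial id, validation metric, training time in ms).
// All numbers are invented for this example.
var trialLog = new List<(int TrialId, double Metric, double DurationMs)>
{
    (0, 0.61, 8200),
    (1, 0.74, 15300),
    (2, 0.69, 4100),
};

// Higher is better for accuracy-style metrics, so sort descending and take the first.
var bestTrial = trialLog.OrderByDescending(t => t.Metric).First();
Console.WriteLine($"Best trial: {bestTrial.TrialId} (metric {bestTrial.Metric}, {bestTrial.DurationMs} ms)");
```

Keeping the duration next to the metric makes it easier to spot trials that reach a similar score in much less time.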

## Train the model

In [26]:
var cts = new CancellationTokenSource();
TrialResult experimentResults = await experiment.RunAsync(cts.Token);
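
RunAsync accepts a CancellationToken, so the caller can request an early stop through the CancellationTokenSource. The sketch below shows cooperative cancellation in plain .NET. RunTrialsAsync is a hypothetical stand-in for a long-running experiment and makes no claim about how AutoMLExperiment handles cancellation internally.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a long-running experiment: runs "trials" until cancelled.
async Task<List<int>> RunTrialsAsync(int maxTrials, Action<int> onTrialCompleted, CancellationToken ct)
{
    var completed = new List<int>();
    for (int trialId = 0; trialId < maxTrials; trialId++)
    {
        if (ct.IsCancellationRequested) break;   // stop cooperatively, keep finished trials
        await Task.Yield();                       // placeholder for real training work
        completed.Add(trialId);
        onTrialCompleted(trialId);
    }
    return completed;
}

var demoCts = new CancellationTokenSource();

// Request cancellation once the third trial (id 2) has completed.
var finishedTrials = await RunTrialsAsync(
    maxTrials: 10,
    onTrialCompleted: id => { if (id == 2) demoCts.Cancel(); },
    ct: demoCts.Token);

Console.WriteLine($"Completed {finishedTrials.Count} trials before cancellation.");
```

For a wall-clock limit instead of a manual trigger, CancellationTokenSource also offers CancelAfter and a constructor that takes a TimeSpan.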

## Get the best model

In [27]:
var bestModel = experimentResults.Model;
bestModel

## Display the metric for the best model

In [28]:
experimentResults.Metric

## Try out the model

### Make predictions

In [29]:
var predictions = bestModel.Transform(trainValidationData.TestSet);

### Display prediction results

In [30]:
predictions.Preview().RowView

## Save the best model

In [31]:
ctx.Model.Save(bestModel, data.Schema, "model.mlnet");
