# Model Training & Evaluation - .NET

This is a Polyglot Notebook intended to run on the .NET Interactive kernel, currently running on .NET 8.

This notebook's role is to build and evaluate model training pipelines, perform hyperparameter tuning, and find a series of best models for commit classification using ML.NET. Other model training efforts will be performed using Python in a separate notebook, but those efforts will focus on models that support ONNX export and can be imported into ML.NET. This is because the model ultimately selected will be integrated into GitStractor, which runs on .NET, and ML.NET is the best available way to do that.

## Dependencies
Download and install NuGet packages and set up common imports

In [1]:
#r "nuget:Microsoft.Data.Analysis"
#r "nuget:Microsoft.ML"
#r "nuget:Microsoft.ML.AutoML"
#r "nuget:Newtonsoft.Json"
#r "nuget:Plotly.NET"
#r "nuget:Plotly.NET.Interactive"

using Microsoft.DotNet.Interactive.Formatting;
using Microsoft.Data.Analysis;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.AutoML.CodeGen;
using Microsoft.ML.SearchSpace;
using Microsoft.ML.SearchSpace.Option;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using Microsoft.ML.Transforms.Text;
using Newtonsoft.Json;
using System.Reflection;

Loading extensions from `/home/matteland/.nuget/packages/microsoft.ml.automl/0.21.1/interactive-extensions/dotnet/Microsoft.ML.AutoML.Interactive.dll`

Loading extensions from `/home/matteland/.nuget/packages/plotly.net.interactive/5.0.0/lib/netstandard2.1/Plotly.NET.Interactive.dll`

Loading extensions from `/home/matteland/.nuget/packages/microsoft.data.analysis/0.21.1/interactive-extensions/dotnet/Microsoft.Data.Analysis.Interactive.dll`

Loading extensions from `/home/matteland/.nuget/packages/skiasharp/2.88.6/interactive-extensions/dotnet/SkiaSharp.DotNet.Interactive.dll`

I actually wound up creating some .NET libraries to make some of the code here simpler. 

These were non-core to the analysis and were intended to benefit the community as a whole.

You can find these libraries and their code [on GitHub](https://github.com/IntegerMan/MattEland.ML)

In [2]:
//#r "nuget:MattEland.ML"
//#r "nuget:MattEland.ML.Charts"
//#r "nuget:MattEland.ML.DataFrames"
//#r "nuget:MattEland.ML.Interactive"
#r "/home/matteland/Documents/MattEland.ML/MattEland.ML/MattEland.ML/bin/Debug/net8.0/MattEland.ML.dll"
#r "/home/matteland/Documents/MattEland.ML/MattEland.ML/MattEland.ML.DataFrames/bin/Debug/net8.0/MattEland.ML.DataFrames.dll"
#r "/home/matteland/Documents/MattEland.ML/MattEland.ML/MattEland.ML.Charts/bin/Debug/net8.0/MattEland.ML.Charts.dll"
#r "/home/matteland/Documents/MattEland.ML/MattEland.ML/MattEland.ML.Interactive/bin/Debug/net8.0/MattEland.ML.Interactive.dll"

using MattEland.ML;
using MattEland.ML.Charts;
using MattEland.ML.DataFrames;
using MattEland.ML.Interactive;

await MattEland.ML.Interactive.InteractiveExtensions.Load(Microsoft.DotNet.Interactive.KernelInvocationContext.Current.HandlingKernel.RootKernel);

## Data Loading

In [3]:
var df = DataFrame.LoadCsv("data/Training.csv", separator: ',', header: true);
df.Sample(5)

index,PredictedLabel,ActualLabel,Message,Reasoning,Sha,Source,ParentSha,Parent2Sha,IsMerge,AuthorId,AuthorDateUtc,CommitterId,CommitterDateUtc,WorkItems,TotalFiles,ModifiedFiles,AddedFiles,DeletedFiles,TotalLines,NetLines,AddedLines,DeletedLines,HasAddedFiles,HasDeletedFiles,DayOfWeek,Month,Quarter,Year,Hour,TimeOfDay,IsWeekend,MessageLength,WordCount
0,True,False,improve parsing and diagnostics for HTTP version node (#3153),The commit directly mentions improving HTTP versions which indicates a fix related to known issues,4788158819fc5a8a1f98145887d78c3089fd3e5b,dotnetinteractive,2ebbb6ab46869970cba4928480a901e0e150d281,,False,6,2023-08-25 17:19:12Z,2,2023-08-25 21:19:12Z,1,5,5,0,0,1281,20,67,47,False,False,Friday,August,3,2023,17,Afternoon,False,61,9
1,False,False,Add Xunit.Combinatorial for test projects (#4545),XUnit.Combinatorial is a testing framework extension and not directly related to bug fixing.,0e7e25c7fc04065a2cb8b2001f2a18164b0db6a0,mlnet,a4b4660f28277f48a3455f769721453beefc532d,,False,142,2019-12-08 12:42:36Z,7,2019-12-08 17:42:36Z,1,2,2,0,0,110,2,2,0,False,False,Sunday,December,4,2019,12,Morning,True,49,6
2,False,False,rename functions,"Renaming is often for code clarity or refactoring, which doesn't directly indicate fixing a bug",ebc7b5f027331ba1b0483a1da481c5f43a92a7d0,dotnetinteractive,89142c346f010d456fc5d75f3d4cc8770a699683,,False,10,2021-08-10 17:28:06Z,14,2021-08-18 16:48:46Z,0,1,1,0,0,138,0,4,4,False,False,Tuesday,August,3,2021,17,Afternoon,False,16,2
3,False,False,adding html fotmatting (#2544),The commit is related to code styling or enhancement and not explicitly a bug fix.,ffdfe0472eb092844d780bdec9ec7a5bf36ee921,dotnetinteractive,9c08865d4fca112a6f38e47816e24380f0d68eb8,,False,10,2022-12-02 17:45:06Z,2,2022-12-02 22:45:06Z,1,3,3,0,0,529,106,113,7,False,False,Friday,December,4,2022,17,Afternoon,False,30,4
4,True,False,lock version for tests,Locking the codebase could be related to resolve issues with test stability or integration bugs caused by continuous changes.,537f46e89a1690a55315d805eb90a5877aff9de2,dotnetinteractive,d4b512ba7d6e957710fef4a80dd7aa36500ce494,,False,10,2020-11-11 10:19:59Z,14,2020-11-11 16:44:49Z,0,1,1,0,0,952,-1,4,5,False,False,Wednesday,November,4,2020,10,Morning,False,22,4


In [4]:
df.Columns.Select(c => c.Name + Environment.NewLine)

In [5]:
// Let's drop columns we don't want the model to learn from
df.Columns.Remove("PredictedLabel", "Reasoning", "AuthorId", "AuthorDateUtc", "CommitterId", "CommitterDateUtc", "ParentSha", "Parent2Sha", "DayOfWeek", "Month", "Quarter", "Year", "Hour", "TimeOfDay", "IsWeekend", "Sha", "Source");

df.Info()

index,Info,ActualLabel,Message,IsMerge,WorkItems,TotalFiles,ModifiedFiles,AddedFiles,DeletedFiles,TotalLines,NetLines,AddedLines,DeletedLines,HasAddedFiles,HasDeletedFiles,MessageLength,WordCount
0,DataType,System.Boolean,System.String,System.Boolean,System.Single,System.Single,System.Single,System.Single,System.Single,System.Single,System.Single,System.Single,System.Single,System.Boolean,System.Boolean,System.Single,System.Single
1,Length (excluding null values),499,499,499,499,499,499,499,499,499,499,499,499,499,499,499,499


Okay. That's the expected type for each and no missing rows. Looks like I was off somewhere and lost a row in my training data, but 500 -> 499 isn't a huge issue.

We don't need to do any additional feature engineering here since that was all done as part of EDA in `LabelledEDA.ipynb`. We could one-hot encode Source if we were going to include it in the data given to the model, and we will be extracting text features from Message.

Let's do some final name cleanup for ActualLabel since ML.NET looks for the Label column by default.

In [6]:
df["ActualLabel"].SetName("Label");

And finally some fun descriptive statistics.

In [7]:
df.Description()

index,Description,WorkItems,TotalFiles,ModifiedFiles,AddedFiles,DeletedFiles,TotalLines,NetLines,AddedLines,DeletedLines,MessageLength,WordCount
0,Length (excluding null values),499.0,499.0,499.0,499.0,499.0,499.0,499.0,499.0,499.0,499.0,499.0
1,Max,2.0,588.0,587.0,8.0,4.0,98927.0,241.0,3997.0,3980.0,124.0,17.0
2,Min,0.0,1.0,0.0,0.0,0.0,1.0,-8.0,0.0,0.0,3.0,1.0
3,Mean,0.46292585,7.188377,6.747495,0.40480962,0.036072146,2628.0942,32.54108,73.951904,41.41082,41.216434,5.6152306


*Note*: The DataFrame only offers a handful of descriptive statistics, but you could integrate MathNet.Numerics for many additional statistical measurements. This isn't an EDA notebook, so we won't do that here.

# Model Training

We're now going to start determining what types of models work best with our data. We'll start broadly to get a good general sense, then dial in on a key model trainer or two to see how we can tune and optimize it.

In [8]:
// Create a custom model tracker to record the various experiments we run
BinaryClassificationModelTracker modelTracker = new();

// Although the metric we probably care the most about is the Precision, we're going to focus on F1 Score during model training in order to encourage discovering the most balanced models between precision and recall
modelTracker.DefaultMetric = BinaryClassificationMetric.F1Score;
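
As a quick illustration of why F1 rewards balance, here is a minimal sketch of the standard formula (the harmonic mean of precision and recall). The precision and recall pairs are hypothetical values picked for the comparison, not results from this notebook.

```csharp
using System;

// F1 is the harmonic mean of precision and recall, so it is only high when both are high.
double F1(double precision, double recall) =>
    precision + recall == 0 ? 0 : 2 * precision * recall / (precision + recall);

// Hypothetical precision/recall pairs, chosen only to show how imbalance drags F1 down.
double balanced = F1(0.75, 0.75);   // both moderate
double lopsided = F1(1.00, 0.50);   // perfect precision, weak recall

Console.WriteLine($"Balanced: {balanced:F3}, Lopsided: {lopsided:F3}");
// prints "Balanced: 0.750, Lopsided: 0.667"
```

Both pairs have the same arithmetic mean, but the lopsided one scores lower, which is the behavior we want the optimizer to see.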

## Early ML.NET AutoML Experiments
Let's start with a simple AutoML experiment without any pipelines to see what kinds of models are performing best without transformations or manual tuning.

We just want to see what kind of "out of the box" baseline model performance we can get from AutoML without any customization and what model trainers and transforms tend to get selected.

In [9]:
// Everything flows from our MLContext object
MLContext context = new MLContext(seed: 42);
var monitor = context.Monitor();

In [10]:
var split = context.Data.TrainTestSplit(df, testFraction: 0.1, seed: 42);
IDataView train = split.TrainSet;
IDataView test = split.TestSet;

In [11]:
// Run the experiment - simplest one we'll do here, but let's just look at the simple options first
BinaryExperimentSettings settings = new() {
    MaxModels = 10,
    OptimizingMetric = BinaryClassificationMetric.F1Score,
};

var results = context.Auto().CreateBinaryClassificationExperiment(settings)
                            .Execute(train, test);

// Let's see what types of model trainers and transforms it considered and their F1 scores
results.RunDetails.OrderByDescending(r => r.ValidationMetrics.F1Score)
                  .Select(r => r.TrainerName + ": " + r.ValidationMetrics.F1Score + Environment.NewLine)

Error: Command cancelled.

In [None]:
monitor.Results

In [None]:
MLCharts.RenderConfusionMatrix(results.BestRun)

In [None]:
MLCharts.RenderClassificationMetrics(results.BestRun.ValidationMetrics)

In [None]:
// Record the best run in our model tracker so we can compare it to future models
modelTracker.Register("Simple AutoML - 10% Test Split", results.BestRun).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6666666666666667,0.7857142857142857,0.9,0.5294117647058824,0.75,0.96,0.7847058823529411,0.7916167381810545


Interesting. Overall decent metrics, but that's a very small quantity of rows in the test set.

We can get more confidence by choosing a larger chunk of test data.
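
To put a rough number on that intuition, the sketch below uses the binomial standard error of a proportion. The test set sizes are assumptions (roughly 10% and 30% of 499 rows; the real split sizes may differ), and the 0.8 accuracy is a hypothetical value used for both cases.

```csharp
using System;

// Binomial standard error of a proportion such as accuracy: sqrt(p * (1 - p) / n).
double StandardError(double p, int n) => Math.Sqrt(p * (1 - p) / n);

// Assumed test-set sizes: roughly 10% and 30% of the 499 rows (actual split sizes may differ).
int smallTest = 50;
int largeTest = 150;

// The same hypothetical accuracy of 0.8 is less certain on the smaller test set.
double seSmall = StandardError(0.8, smallTest);
double seLarge = StandardError(0.8, largeTest);

Console.WriteLine($"n={smallTest}: 0.80 +/- {1.96 * seSmall:F3} (approx. 95% interval)");
Console.WriteLine($"n={largeTest}: 0.80 +/- {1.96 * seLarge:F3} (approx. 95% interval)");
```

Tripling the test rows shrinks the interval by a factor of about the square root of three, at the cost of less training data.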

In [None]:
var split = context.Data.TrainTestSplit(df, testFraction: 0.3, seed: 42);
IDataView train = split.TrainSet;
IDataView test = split.TestSet;

In [None]:
var results = context.Auto().CreateBinaryClassificationExperiment(settings)
                            .Execute(train, test);

// Let's see what types of models and transforms it considered and their F1 scores
results.RunDetails.OrderByDescending(r => r.ValidationMetrics.F1Score)
                  .Select(r => r.TrainerName + ": " + r.ValidationMetrics.F1Score + Environment.NewLine)

That's a lot more models in the same amount of time since it didn't need to split every time. The overall metrics are worse, which likely indicates that either the training suffered from having less data or the metrics were artificially high early on from having a small test set.

Let's see the confusion matrix with more test data.

In [None]:
MLCharts.ClassificationReport(results.BestRun)

In [None]:
modelTracker.Register("Simple AutoML - 30% Test Split", results.BestRun).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6666666666666667,0.7857142857142857,0.9,0.5294117647058824,0.75,0.96,0.7847058823529411,0.7916167381810545
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557


Now let's look at how our pipeline works for this model.

In [None]:
var model = results.BestRun.Model;
model

In [None]:
#!transformer-vis model -n -d 1

These steps generally make sense. Let's get a deeper picture of what that text featurizer is doing by rendering a deeper view without note annotations:

In [None]:
#!transformer-vis model -d 3

Looks like tokenization, bagging, n-gram extraction, and normalization. We can get more details by drilling into just the TextFeaturizingEstimator.Transformer and its direct children and annotating those.

In [None]:
var chain = model as TransformerChain<ITransformer>;
var textTransformer = chain.ToList()[2];
#!transformer-vis textTransformer -n -d 1

That's a lot, and I'm still working on improving this visualization and the quality and layout of its notes, but it looks like it does unigram and bigram extraction at the word level, then trigram extraction at the character level. It also uses L2 normalization, which scales each feature vector to unit length so that long and short messages are comparable, and handles case sensitivity.

Notably absent from this is removal of punctuation, stop word removal, stemming, or removal of numbers.

Still, it did all of this automatically, which isn't bad.
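
To make those steps concrete, here is a simplified sketch of word n-grams, character n-grams, and L2 normalization in plain C#. It is not how ML.NET implements its text featurizer: the helper names are hypothetical, the message is made up, and the sketch normalizes one combined vector for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Word n-grams: contiguous runs of n lower-cased tokens.
List<string> WordNGrams(string text, int n)
{
    string[] tokens = text.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);
    var grams = new List<string>();
    for (int i = 0; i + n <= tokens.Length; i++)
        grams.Add(string.Join(" ", tokens.Skip(i).Take(n)));
    return grams;
}

// Character n-grams: contiguous runs of n characters, including spaces.
List<string> CharNGrams(string text, int n)
{
    string lower = text.ToLowerInvariant();
    var grams = new List<string>();
    for (int i = 0; i + n <= lower.Length; i++)
        grams.Add(lower.Substring(i, n));
    return grams;
}

// L2 normalization: divide each count by the vector's Euclidean length.
Dictionary<string, double> L2Normalize(IEnumerable<string> grams)
{
    var counts = grams.GroupBy(g => g).ToDictionary(g => g.Key, g => (double)g.Count());
    double norm = Math.Sqrt(counts.Values.Sum(v => v * v));
    return counts.ToDictionary(kv => kv.Key, kv => norm == 0 ? 0 : kv.Value / norm);
}

// Hypothetical commit message; prefixes keep word and character features in separate slots.
string message = "Fix null check";
var features = L2Normalize(
    WordNGrams(message, 1).Select(g => "word:" + g)
        .Concat(WordNGrams(message, 2).Select(g => "word:" + g))
        .Concat(CharNGrams(message, 3).Select(g => "char:" + g)));

Console.WriteLine(string.Join(", ", WordNGrams(message, 2)));   // fix null, null check
Console.WriteLine($"{features.Count} distinct features");
```

Even a three-word message produces 17 feature slots here, which helps explain why the trained model ends up with thousands of features.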

Now that we've seen how the pipeline works, let's drill into the `BinaryPredictionTransformer` to try to understand its model.

In [None]:
var predictor = chain.Last();
#!reflect predictor

Property,Type,Value
ThresholdColumn,String,Score
Threshold,Single,0
LabelColumnName,String,
Host,IHost,Microsoft.ML.Data.LocalEnvironment+Host
BindableMapper,ISchemaBindableMapper,Microsoft.ML.Data.SchemaBindablePredictorWrapper
TrainSchema,DataViewSchema,32 columns


In [None]:
using Microsoft.ML.Trainers.FastTree;

var randomForest = predictor as BinaryPredictionTransformer<FastForestBinaryModelParameters>;
var forestModel = randomForest.Model;

#!reflect forestModel
forestModel

Property,Type,Value
InnerOptions,String,ps=2 nl=4 iter=4 ff=1
NumFeatures,Int32,5861
MaxSplitFeatIdx,Int32,5857
InputType,DataViewType,Vector
OutputType,DataViewType,Single
Host,IHost,Microsoft.ML.Data.LocalEnvironment+Host


TrainedTreeEnsemble (Microsoft.ML.Trainers.FastTree.QuantileRegressionTreeEnsemble)

Property,Value
Bias,0
TreeWeights,"[ 1, 1, 1, 1 ]"
Trees,4 Microsoft.ML.Trainers.FastTree.QuantileRegressionTree instances (detailed in the tables that follow)

Property,Value
LeftChild,"[ 1, 2, -1 ]"
RightChild,"[ -2, -3, -4 ]"
NumericalSplitFeatureIndexes,"[ 1353, 103, 5857 ]"
NumericalSplitThresholds,"[ 0.057353932, 0.10190575, 9.5 ]"
CategoricalSplitFlags,"[ False, False, False ]"
LeafValues,"[ -0.5764705882352941, 0.9047619047619048, -1, 0.3333333333333333 ]"
SplitGains,"[ 43.45517488043252, 8.790652661395029, 11.409432962374133 ]"
NumberOfLeaves,4
NumberOfNodes,3

Property,Value
LeftChild,"[ 1, 2, -1 ]"
RightChild,"[ -2, -3, -4 ]"
NumericalSplitFeatureIndexes,"[ 1353, 106, 5850 ]"
NumericalSplitThresholds,"[ 0.057353932, 0.1075328, 0.5 ]"
CategoricalSplitFlags,"[ False, False, False ]"
LeafValues,"[ -0.24031007751937986, 0.9047619047619048, -0.9245283018867925, -0.8333333333333334 ]"
SplitGains,"[ 39.1577375797019, 11.173417128979487, 12.302719747733544 ]"
NumberOfLeaves,4
NumberOfNodes,3

Property,Value
LeftChild,"[ 1, 2, -1 ]"
RightChild,"[ -2, -3, -4 ]"
NumericalSplitFeatureIndexes,"[ 50, 566, 628 ]"
NumericalSplitThresholds,"[ 0.11360833, 0.124133974, 0.052999895 ]"
CategoricalSplitFlags,"[ False, False, False ]"
LeafValues,"[ -0.7551020408163265, 0.9428571428571428, 0.2, 0.4 ]"
SplitGains,"[ 77.13147621155588, 7.7083423229054375, 12.69490786605904 ]"
NumberOfLeaves,4
NumberOfNodes,3

Property,Value
LeftChild,"[ 1, 2, -1 ]"
RightChild,"[ -2, -3, -4 ]"
NumericalSplitFeatureIndexes,"[ 50, 3873, 103 ]"
NumericalSplitThresholds,"[ 0.11360833, 0.1042572, 0.10190575 ]"
CategoricalSplitFlags,"[ False, False, False ]"
LeafValues,"[ -0.5632183908045977, 0.9259259259259259, 0.2727272727272727, -1 ]"
SplitGains,"[ 56.28925136966692, 8.842590132649804, 6.330286019780814 ]"
NumberOfLeaves,4
NumberOfNodes,3


Looks like a random forest with 4 shallow trees. Let's see a bit more detail.

In [None]:
Console.WriteLine("Random Forest with the following trees:");

int index = 0;
foreach (var tree in forestModel.TrainedTreeEnsemble.Trees) {
    Console.WriteLine($"\tTree {index} with {tree.NumberOfLeaves} leaves and {tree.NumberOfNodes} nodes and a weight of {forestModel.TrainedTreeEnsemble.TreeWeights[index++]}.");
    Console.WriteLine($"\t\tMost important feature indexes: {string.Join(", ", tree.NumericalSplitFeatureIndexes)}");
    Console.WriteLine($"\t\tMost important feature thresholds: {string.Join(", ", tree.NumericalSplitThresholds)}");
}

Random Forest with the following trees:
	Tree 0 with 4 leaves and 3 nodes and a weight of 1.
		Most important feature indexes: 1353, 103, 5857
		Most important feature thresholds: 0.057353932, 0.10190575, 9.5
	Tree 1 with 4 leaves and 3 nodes and a weight of 1.
		Most important feature indexes: 1353, 106, 5850
		Most important feature thresholds: 0.057353932, 0.1075328, 0.5
	Tree 2 with 4 leaves and 3 nodes and a weight of 1.
		Most important feature indexes: 50, 566, 628
		Most important feature thresholds: 0.11360833, 0.124133974, 0.052999895
	Tree 3 with 4 leaves and 3 nodes and a weight of 1.
		Most important feature indexes: 50, 3873, 103
		Most important feature thresholds: 0.11360833, 0.1042572, 0.10190575


Unfortunately, the n-gram extraction makes it hard to determine what each index relates to, but at least we can see where there's feature overlap and similar thresholds.
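
To show how those arrays describe a tree, here is a small sketch that walks the first tree by hand. It assumes the convention I understand ML.NET's regression trees to use: go left when the feature value is less than or equal to the threshold, and treat a negative child value `c` as leaf index `~c`. The input rows are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Arrays copied from the first tree in the output above.
int[] leftChild = { 1, 2, -1 };
int[] rightChild = { -2, -3, -4 };
int[] splitFeatures = { 1353, 103, 5857 };
float[] thresholds = { 0.057353932f, 0.10190575f, 9.5f };
double[] leafValues = { -0.5764705882352941, 0.9047619047619048, -1, 0.3333333333333333 };

// Walks from the root (node 0) to a leaf. Assumed convention: go left when the feature value
// is <= the threshold, and a negative child value c refers to leaf index ~c.
double Score(IReadOnlyDictionary<int, float> sparseFeatures)
{
    int node = 0;
    while (node >= 0)
    {
        sparseFeatures.TryGetValue(splitFeatures[node], out float value); // missing slots read as 0
        node = value <= thresholds[node] ? leftChild[node] : rightChild[node];
    }
    return leafValues[~node];
}

// Hypothetical inputs: an all-zero row, and a row where slot 1353 exceeds its threshold.
double allZero = Score(new Dictionary<int, float>());
double slotSet = Score(new Dictionary<int, float> { [1353] = 0.1f });

Console.WriteLine($"All zero: {allZero}, slot 1353 set: {slotSet}");
```

Under that convention, a row with none of the split features present lands in the first leaf, while a high value in slot 1353 exits at the root's right child.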

In [None]:
modelTracker.Register("Simple AutoML - 30% Test Split", results.BestRun).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6666666666666667,0.7857142857142857,0.9,0.5294117647058824,0.75,0.96,0.7847058823529411,0.7916167381810545
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557


I trust these metrics more than the earlier ones because they come from a larger test set, and the best model oddly has near-perfect positive precision despite optimizing for F1 score. The trainers list is dominated by trees and forests, which makes me think that either we're overfitting and preferring those models, or we're not giving the other trainers enough time to converge on good solutions.

Let's evaluate the time aspect by seeing how the best metric improves over many trials.

In [None]:
// Run the experiment - simplest one we'll do here, but let's just look at the simple options first
BinaryExperimentSettings settings = new() {
    MaxModels = 100,
    OptimizingMetric = BinaryClassificationMetric.F1Score,
};

var results = context.Auto().CreateBinaryClassificationExperiment(settings).Execute(train, test);

// ML.NET didn't have any sort of learning rate chart built-in so I built something to collect and chart the metrics myself.
MLCharts.MetricImprovement(monitor)

In [None]:
MLCharts.MetricImprovementWithTrials(monitor)

Here it looks like things are pretty stable at the observed metric. We can probably get bits of additional performance from expanding our trials significantly, but progress looks to be more or less random and we can get better tuning with more manual control over hyperparameter tuning later. For now, this does illustrate that 10 to 20 trials is sufficient to get a good ballpark impression of a pipeline's basic performance.
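
For reference, the "best so far" curve behind an improvement chart like this is just a running maximum over trial scores. The sketch below shows that calculation. The scores are hypothetical and this is not the code used inside `MLCharts`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical F1 scores for successive trials (not taken from the experiment above).
double[] trialScores = { 0.52, 0.61, 0.58, 0.64, 0.63, 0.64, 0.66, 0.60 };

// Best-so-far curve: the value at each trial is the maximum score seen up to that trial.
var bestSoFar = new List<double>();
double best = double.MinValue;
foreach (double score in trialScores)
{
    best = Math.Max(best, score);
    bestSoFar.Add(best);
}

Console.WriteLine(string.Join(", ", bestSoFar));
```

A curve that flattens early, as it does here, suggests that additional trials are mostly re-sampling similar models.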


Next, let's examine cross validation of the same experiment to see what it tells us about the level of confidence we can have in our test metrics.

In [None]:
BinaryExperimentSettings settings = new BinaryExperimentSettings() {
    MaxModels = 10,
    OptimizingMetric = BinaryClassificationMetric.F1Score,
};

var results = context.Auto().CreateBinaryClassificationExperiment(settings)
                            .Execute(df, numberOfCVFolds: 5, labelColumnName: "Label");

// Cross Validation results are a bit different since they carry metrics per fold per result
results

In [None]:
// Let's start by averaging the overall F1 scores of each model considered against all of its folds
results.RunDetails.OrderByDescending(r => r.Results.Average(v => v.ValidationMetrics.F1Score))
                  .Select(r => r.TrainerName + ": " + r.Results.Average(v => v.ValidationMetrics.F1Score) + Environment.NewLine)

In [None]:
// Now let's see the confusion matrix for the best model
MLCharts.ClassificationReport(results.BestRun.Results.First().ValidationMetrics)

In [None]:
modelTracker.Register("Simple AutoML - 5 Fold CV", results.BestRun).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6666666666666667,0.7857142857142857,0.9,0.5294117647058824,0.75,0.96,0.7847058823529411,0.7916167381810545
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557
2,Simple AutoML - 5 Fold CV,0.7142857142857143,0.8297872340425532,0.8333333333333334,0.625,0.8285714285714286,0.935483870967742,0.8220766129032258,0.7981657871596612


Okay, so this is probably a more reliable picture of our precision and overall F1 score.
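
As a reminder of what 5-fold cross validation does with our rows, here is a minimal sketch of the splitting idea. It is not AutoML's own splitting code: the shuffle, the round-robin assignment, and the per-fold F1 scores are all illustrative assumptions.

```csharp
using System;
using System.Linq;

int rowCount = 499;   // rows in the training DataFrame
int foldCount = 5;

// Shuffle row indices once, then deal them into folds round-robin.
var random = new Random(42);
int[] shuffled = Enumerable.Range(0, rowCount).OrderBy(_ => random.Next()).ToArray();
int[][] folds = Enumerable.Range(0, foldCount)
    .Select(fold => shuffled.Where((_, position) => position % foldCount == fold).ToArray())
    .ToArray();

// Each fold takes one turn as the validation set; the other folds form the training set.
for (int f = 0; f < foldCount; f++)
{
    int validationRows = folds[f].Length;
    Console.WriteLine($"Fold {f}: train on {rowCount - validationRows} rows, validate on {validationRows}");
}

// Hypothetical per-fold F1 scores; a cross-validated score is typically their mean.
double[] foldF1 = { 0.70, 0.66, 0.73, 0.64, 0.69 };
Console.WriteLine($"Mean F1: {foldF1.Average():F3}");
```

Every row is used for validation exactly once, which is why the averaged score is less sensitive to one lucky or unlucky split.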

It looks like we're seeing a lot of trees, forests, and LightGBM, and all of them share a common set of pipeline transformation steps.

Let's take a look at what's important to this model:

Sometimes this non-calibrated method can fail, generating blank charts / data due to no impacts. Unfortunately, this does not seem to respect random seeds or may be dependent on the CPU load for its results. This can result in some failures. In that case, it can be helpful to look at the table of columns considered.

During development I saw mixed results, including some feature importances. The model appears to care primarily about the message length and the presence of character n-grams that are part of words like "fix" and "update".

Note: PFI (permutation feature importance) takes a while to run and has some randomness associated with it. You can reduce the duration by dropping the permutation count, but this increases the impact of that randomness on feature importance.

Generally speaking the PFI columns this cell identifies make sense, but there are also a few things that show up that look unusual or seem to be oriented to a few specific cases and not broad generalizations.
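
For readers unfamiliar with PFI, the sketch below shows the core idea on toy data: shuffle one column, re-score, and measure how much the metric drops. The data, the stand-in model, and the helper names are all hypothetical. ML.NET's own implementation works against a trained transformer and reports richer statistics.

```csharp
using System;
using System.Linq;

// Toy data: feature 0 determines the label, feature 1 is pure noise.
var random = new Random(42);
double[][] rows = Enumerable.Range(0, 200)
    .Select(_ => new[] { random.NextDouble(), random.NextDouble() })
    .ToArray();
bool[] labels = rows.Select(r => r[0] > 0.5).ToArray();

// Stand-in for a trained model: it only ever looks at feature 0.
bool Predict(double[] row) => row[0] > 0.5;

double Accuracy(double[][] data) =>
    data.Select((row, i) => Predict(row) == labels[i] ? 1.0 : 0.0).Average();

// Importance = average drop in accuracy after shuffling one column across rows.
double Importance(int feature, int permutations)
{
    double baseline = Accuracy(rows);
    double totalDrop = 0;
    for (int p = 0; p < permutations; p++)
    {
        double[] shuffledColumn = rows.Select(r => r[feature]).OrderBy(_ => random.Next()).ToArray();
        double[][] permuted = rows
            .Select((r, i) => { var copy = (double[])r.Clone(); copy[feature] = shuffledColumn[i]; return copy; })
            .ToArray();
        totalDrop += baseline - Accuracy(permuted);
    }
    return totalDrop / permutations;
}

Console.WriteLine($"Feature 0 importance: {Importance(0, 5):F3}");
Console.WriteLine($"Feature 1 importance: {Importance(1, 5):F3}");
```

Because the importance is an average over random shuffles, fewer permutations run faster but give noisier estimates, which matches the trade-off described above.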

Now that we've seen the base level of accuracy from simple AutoML and how its transformations work, let's get some more control over the model training pipeline and see if we can get results for some non-tree or forest model trainers.

## Building a Custom Pipeline

In ML.NET everything flows through a pipeline, much like a scikit-learn pipeline, that progressively transforms data from one state to another. You can use pipelines with a specific model trainer or you can use them with AutoML. When you involve AutoML, it still selects the model it expects to perform best, but it uses the pipeline you give it. A custom pipeline also exposes more options for configuring AutoML's behavior, including model selection and hyperparameter tuning.

In this section we'll create a pipeline and use AutoML to determine the best models from it. We'll also see how it compares to our simple AutoML models from earlier.

We'll start by creating a simple AutoML featurizer and feed it schema information from our DataFrame. This will help it know how to handle the columns it works with.

In [None]:
var colTypes = df.GetColumnTypes(excludedColumns: new[] { "Label" });
colTypes

Kind,Columns
Text,[ Message ]
Numeric,"[ WorkItems, TotalFiles, ModifiedFiles, AddedFiles, DeletedFiles, TotalLines, NetLines, AddedLines, DeletedLines, MessageLength, WordCount ]"
Categorical,"[ IsMerge, HasAddedFiles, HasDeletedFiles ]"
Excluded,[ Label ]


In [None]:
// This featurizer will trigger one-hot encoding and text featurization and handle column concatenation down to a single features column for us
SweepablePipeline featurizer = context.Auto().Featurizer(df, 
                                           catelogicalColumns: colTypes.Categorical.ToArray(), 
                                           numericColumns: colTypes.Numeric.ToArray(),
                                           textColumns: colTypes.Text.ToArray(), 
                                           excludeColumns: colTypes.Excluded.ToArray());
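
As a rough picture of what the featurizer produces for the categorical and numeric columns, here is a hand-rolled sketch of one-hot encoding followed by concatenation. The `OneHot` helper and the row values are hypothetical, text featurization is left out, and the real featurizer's slot layout may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One-hot encoding: one slot per known category, with a 1 in the slot that matches.
double[] OneHot(string value, IReadOnlyList<string> categories) =>
    categories.Select(c => c == value ? 1.0 : 0.0).ToArray();

// Hypothetical row using column names from the DataFrame above.
string[] boolCategories = { "False", "True" };
string isMerge = "False";
string hasAddedFiles = "True";
double[] numeric = { 5, 1281, 61 };   // e.g. TotalFiles, TotalLines, MessageLength

// Concatenate the encoded categorical slots and numeric values into one feature vector.
double[] features = OneHot(isMerge, boolCategories)
    .Concat(OneHot(hasAddedFiles, boolCategories))
    .Concat(numeric)
    .ToArray();

Console.WriteLine(string.Join(", ", features));   // 1, 0, 0, 1, 5, 1281, 61
```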

In [None]:
#!pipeline-vis featurizer -n

In [None]:
// The classifier step tells AutoML what model trainers are enabled. We'll focus on those that don't require scaled data for simplicity at the moment
var classifier = context.Auto().BinaryClassification(
    useFastForest: true, 
    useLgbm: true, 
    useFastTree: true, 
    useLbfgsLogisticRegression: true, 
    useSdcaLogisticRegression: true);

In [None]:
// Let's assemble these into a simple pipeline (note, we'll add scaling later and assume more granular control)
var pipeline = featurizer
    .Append(classifier);

In [None]:
pipeline.Estimators

In [43]:
#!pipeline-vis pipeline -n -d 2

In [38]:
// Now let's run our experiment using our custom pipeline
var experiment = context.Auto().CreateExperiment()
    .SetPipeline(pipeline)
    .SetDataset(split)
    .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label")
    .SetMaxModelToExplore(10);

TrialResult result = await experiment.RunAsync();
ITransformer model = result.Model;

// Score the held-out test set once and reuse the result for evaluation
var scorer = model.Transform(split.TestSet);

// If the model supports calibration, we could use Evaluate instead
var evalResults = context.BinaryClassification.EvaluateNonCalibrated(scorer, labelColumnName: "Label");

// Let's see how it performed
MLCharts.ClassificationReport(evalResults)

In [39]:
// Record it for comparison
modelTracker.Register("Custom AutoML - Simple Pipeline", evalResults).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1.0,0.4705882352941176,0.7352941176470589,1.0,0.8211764705882353,0.8290542115376778
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557
2,Simple AutoML - 5 Fold CV,0.7142857142857143,0.8297872340425532,0.8333333333333334,0.625,0.8285714285714286,0.935483870967742,0.8220766129032258,0.7981657871596612
3,Custom AutoML - Simple Pipeline,0.6376811594202899,0.8287671232876712,1.0,0.4680851063829787,0.7983870967741935,1.0,0.7762733720180529,0.7265291524725294


Clearly we could do better overall, but we're in the same territory as our models from earlier. Let's visualize the pipeline for comparison.

In [40]:
#!transformer-vis model -d 3

In [41]:
var textTransformer = (model as TransformerChain<ITransformer>).ToList()[3];

#!transformer-vis textTransformer -d 2 -n

This looks remarkably similar, but we see we do have a one hot encoding transformer now.

Also, the model here looks like a calibrated decision tree, not a random forest.

In [69]:
using Microsoft.ML.Calibrators;
using Microsoft.ML.Trainers.FastTree; // needed for FastTreeBinaryModelParameters

var predictor = (BinaryPredictionTransformer<CalibratedModelParametersBase<FastTreeBinaryModelParameters, PlattCalibrator>>)((model as TransformerChain<ITransformer>).LastTransformer);
predictor

Interesting. It's difficult to poke into the structure of that model, but it does appear to be tree or potentially forest-based. For now we're just exploring the impacts of choices on our pipelines, so this is fine, but we'd want to look deeper into the model before seriously considering deploying it.

We're not in bad territory here with our model, but I'd like to consider some non-tree / forest-based models while still using AutoML.

Next, let's see what we can do to improve the accuracy of our model by taking greater control over the text featurization aspect. To do this, we'll replace our Featurizer with manual steps.

In [None]:
// We'll start with a missing value replacer. We shouldn't have any missing values in training, but perhaps the data we're predicting will.
MissingValueReplacingEstimator imputer = context.Transforms.ReplaceMissingValues(
    columns: colTypes.Numeric.Select(c => new InputOutputColumnPair(c, c)).ToArray(), 
    replacementMode: MissingValueReplacingEstimator.ReplacementMode.DefaultValue);

In [71]:
// The one-hot encoder wasn't bad, I guess. Let's keep it in there.
OneHotEncodingEstimator oneHot = context.Transforms.Categorical.OneHotEncoding(columns: colTypes.Categorical.Select(c => new InputOutputColumnPair(c,c)).ToArray());

In [78]:
// Having a scaler is a good idea for many models, so let's add that in
NormalizingEstimator scaler = context.Transforms.NormalizeMinMax(columns: colTypes.Numeric.Select(c => new InputOutputColumnPair(c, c)).ToArray());

In [79]:
// Let's add a text normalizer in there too and have it prune out junk punctuation
TextNormalizingEstimator textNorm = context.Transforms.Text.NormalizeText("Message", "Message", TextNormalizingEstimator.CaseMode.Lower, keepDiacritics: false, keepPunctuations: false, keepNumbers: false);

In [76]:
var options = new TextFeaturizingEstimator.Options() {
    OutputTokensColumnName = "MessageTokens",
    StopWordsRemoverOptions = new StopWordsRemovingEstimator.Options() {
        Language = TextFeaturizingEstimator.Language.English,
    },
    WordFeatureExtractor = new WordBagEstimator.Options() {
        NgramLength = 2,
        UseAllLengths = true,
    },
    CharFeatureExtractor = new WordBagEstimator.Options() {
        NgramLength = 3,
        UseAllLengths = true,
    },
    KeepDiacritics = false,
    KeepPunctuations = false,
    KeepNumbers = false,
    CaseMode = TextNormalizingEstimator.CaseMode.Lower,
    Norm = TextFeaturizingEstimator.NormFunction.L2,
};
TextFeaturizingEstimator textFeaturizer = context.Transforms.Text.FeaturizeText("Message", options, "Message");

In [83]:
// Concatenate down to a Features column
ColumnConcatenatingEstimator concat = context.Transforms.Concatenate("Features", inputColumnNames: colTypes.Numeric.Concat(new[] { "MessageTokens" }).ToArray());

In [None]:
// Let's create our pipeline
SweepablePipeline pipeline = imputer
    .Append(oneHot)
    .Append(scaler)
    .Append(textFeaturizer)
    .Append(concat)
    .Append(classifier);


In [84]:

// Now let's run our experiment using our custom pipeline
var experiment = context.Auto().CreateExperiment()
    .SetPipeline(pipeline)
    .SetDataset(split)
    .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label")
    .SetMaxModelToExplore(10);

TrialResult result = await experiment.RunAsync();
ITransformer model = result.Model;

#!transformer-vis model -d 3
MLCharts.ClassificationReport(context.BinaryClassification.EvaluateNonCalibrated(model.Transform(split.TestSet), labelColumnName: "Label")) 

Error: System.InvalidOperationException: Concatenated columns should have the same type. Column 'MessageTokens' has type of String, but expected column type is Single.
   at Microsoft.ML.Transforms.ColumnConcatenatingEstimator.CheckInputsAndMakeColumn(SchemaShape inputSchema, String name, String[] sources)
   at Microsoft.ML.Transforms.ColumnConcatenatingEstimator.GetOutputSchema(SchemaShape inputSchema)
   at Microsoft.ML.Data.EstimatorChain`1.GetOutputSchema(SchemaShape inputSchema)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.AutoML.SweepablePipelineRunner.Run(TrialSettings settings)
   at Microsoft.ML.AutoML.SweepablePipelineRunner.RunAsync(TrialSettings settings, CancellationToken ct)
   at Microsoft.ML.AutoML.AutoMLExperiment.RunAsync(CancellationToken ct)
   at Submission#80.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)
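
The error above comes from the concatenation step: `OutputTokensColumnName` asks the featurizer to also emit the raw string tokens, while the numeric feature vector lands in the featurizer's own output column. Below is a minimal sketch of that distinction on a toy dataset. The `Commit` class, the sample rows, and the `MessageFeatures` column name are made up for illustration and are not part of this notebook's data.

```csharp
#r "nuget:Microsoft.ML"
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

// Hypothetical row type, standing in for the real commit data
public class Commit
{
    public string Message { get; set; }
    public float TotalFiles { get; set; }
}

var ml = new MLContext(seed: 0);
IDataView data = ml.Data.LoadFromEnumerable(new[]
{
    new Commit { Message = "Fix null reference bug", TotalFiles = 2f },
    new Commit { Message = "Add new feature", TotalFiles = 5f },
});

var options = new TextFeaturizingEstimator.Options
{
    OutputTokensColumnName = "MessageTokens" // string tokens, useful for inspection only
};

// "MessageFeatures" receives the float vector; "MessageTokens" receives the strings
var pipeline = ml.Transforms.Text.FeaturizeText("MessageFeatures", options, "Message")
    // Concatenate only columns of type Single: the numeric column and the featurized vector
    .Append(ml.Transforms.Concatenate("Features", "TotalFiles", "MessageFeatures"));

IDataView transformed = pipeline.Fit(data).Transform(data);

// Print each column with its type to see which ones are text and which are numeric
foreach (var column in transformed.Schema)
{
    Console.WriteLine($"{column.Name}: {column.Type}");
}
```

Applied to the pipeline above, this would suggest concatenating the featurizer's output column rather than `MessageTokens`, though I haven't re-run the experiment with that change here.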

Logistic regression this time. Plus, this is a calibrated model, so we can get probabilities at inference time, which is nice.

In [49]:
var scorer = model.Transform(split.TestSet);

var evalResults = context.BinaryClassification.Evaluate(scorer, labelColumnName: "Label");

MLCharts.ClassificationReport(evalResults)

In [50]:
modelTracker.Register("Custom AutoML Pipeline - No Trees / Random Forest", evalResults).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6666666666666667,0.7857142857142857,0.9,0.5294117647058824,0.75,0.96,0.7847058823529411,0.7916167381810545
1,Simple AutoML - 30% Test Split,0.6571428571428571,0.8356164383561644,1.0,0.4893617021276595,0.8048780487804879,1.0,0.8233397807865893,0.7960885789039365
2,Simple AutoML - 5 Fold CV,0.7142857142857143,0.8297872340425532,0.8333333333333334,0.625,0.8285714285714286,0.935483870967742,0.8220766129032258,0.7981657871596612
3,Custom AutoML Pipeline - Baseline,0.6575342465753424,0.8287671232876712,0.9230769230769232,0.5106382978723404,0.8083333333333333,0.9797979797979798,0.852675693101225,0.8108735504522139
4,Custom AutoML Pipeline - No Trees / Random Forest,0.6835443037974684,0.8287671232876712,0.84375,0.574468085106383,0.8245614035087719,0.9494949494949496,0.8140984311197077,0.7955669691994453


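It looks like by disabling the random forest and tree algorithms, we got it to pay more attention to logistic regression, and it gave us a better result than the baseline custom pipeline in terms of F1 (0.68 vs. 0.66) and positive recall (0.57 vs. 0.51), among other metrics I care less about. Positive precision did drop, though (0.84 vs. 0.92).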

I also am less scared that logistic regression will overfit to this sample of train / test data.

In [51]:
MLCharts.FeatureImportances(context, model, split.TestSet, permutationCount: 3, numFeatures: 10)

I feel this model is much more generalized, and I like how it's able to incorporate both text and numeric features. The features largely make sense, though "on" is a stop word and is apparently not being removed by the featurizer. It'd be good to look for a pre-processing step to remove stop words, because I do believe stemming and case normalization are already taking place.
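
A quick way to check what the default English stop word list actually removes is to run the tokenizer and stop word remover on a single message and look at the surviving tokens. This is only a sketch: the `Msg` and `MsgTokens` classes, the `Normalized` and `Tokens` column names, and the sample message are hypothetical.

```csharp
#r "nuget:Microsoft.ML"
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

// Hypothetical input and output row types for this check
public class Msg { public string Message { get; set; } }
public class MsgTokens { public string[] Tokens { get; set; } }

var ml = new MLContext();
IDataView data = ml.Data.LoadFromEnumerable(new[]
{
    new Msg { Message = "Fix crash on startup" }
});

// Lowercase, split into words, then drop the built-in English stop words
var pipeline = ml.Transforms.Text.NormalizeText("Normalized", "Message", TextNormalizingEstimator.CaseMode.Lower)
    .Append(ml.Transforms.Text.TokenizeIntoWords("Tokens", "Normalized"))
    .Append(ml.Transforms.Text.RemoveDefaultStopWords("Tokens", "Tokens", StopWordsRemovingEstimator.Language.English));

IDataView transformed = pipeline.Fit(data).Transform(data);

// Pull the surviving tokens back out so they can be inspected directly
string[] tokens = ml.Data.CreateEnumerable<MsgTokens>(transformed, reuseRowObject: false).First().Tokens;
Console.WriteLine(string.Join(" | ", tokens));
```

If "on" still shows up in the printed tokens, the stop word list is the problem; if it doesn't, the word is more likely sneaking in through a different part of the featurizer, such as the character n-grams.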

Let's try reworking our pipeline to manually handle stop words.

In [52]:
TextFeaturizingEstimator.Options textOptions = new TextFeaturizingEstimator.Options()
{
    CaseMode = TextNormalizingEstimator.CaseMode.Lower,
    KeepDiacritics = false,
    KeepPunctuations = true, // This is helpful for issue identifier prefixes as well as syntax discussions
    KeepNumbers = false,

    // To support non-English languages (if training data also expands in diversity) this could switch to an aggregated stop words set from all languages.
    // Alternatively, we could use a language-specific model with a multi-class language classifier in front of it.
    StopWordsRemoverOptions = new StopWordsRemovingEstimator.Options()
    {
        Language = TextFeaturizingEstimator.Language.English
    },

    // Configure ngram extraction
    WordFeatureExtractor = new WordBagEstimator.Options()
    {
        NgramLength = 3,
        UseAllLengths = true // include unigrams and bigrams as well, not just trigrams
    },

    // Configure char ngram extraction - very useful for important chunks of words such as "fix"
    CharFeatureExtractor = new WordBagEstimator.Options()
    {
        NgramLength = 4,
        UseAllLengths = true // allow shorter blocks of characters to be considered
    },

    Norm = TextFeaturizingEstimator.NormFunction.L2,
};

TextFeaturizingEstimator textFeaturizer = context.Transforms.Text.FeaturizeText("Message", textOptions, "Message");

In [53]:
// Let's create a base pipeline from our common steps
var basePipeline = imputer
    .Append(boolConverter)
    .Append(textFeaturizer)
    .Append(concatenator);

// And a specific pipeline with the classifier added to the end
SweepablePipeline pipeline = basePipeline
    .Append(classifier);

In [54]:
// Disable our tree-based models and run the experiment again
var classifier = context.Auto().BinaryClassification(
    useFastTree: false,
    useFastForest: false, 
    useLgbm: false);

// Let's create our pipeline
SweepablePipeline pipeline = basePipeline.Append(classifier);

// Now let's run our experiment using our custom pipeline
var experiment = context.Auto().CreateExperiment()
    .SetPipeline(pipeline)
    .SetDataset(split)
    .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label")
    .SetTrainingTimeInSeconds(10);

TrialResult result = await experiment.RunAsync();
ITransformer model = result.Model;

var scorer = model.Transform(split.TestSet);

var evalResults = context.BinaryClassification.Evaluate(scorer, labelColumnName: "Label");

MLCharts.ClassificationReport(evalResults)

In [55]:
modelTracker.Register("Custom AutoML Pipeline - Advanced Text Featurization", evalResults).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6666666666666667,0.7857142857142857,0.9,0.5294117647058824,0.75,0.96,0.7847058823529411,0.7916167381810545
1,Simple AutoML - 30% Test Split,0.6571428571428571,0.8356164383561644,1.0,0.4893617021276595,0.8048780487804879,1.0,0.8233397807865893,0.7960885789039365
2,Simple AutoML - 5 Fold CV,0.7142857142857143,0.8297872340425532,0.8333333333333334,0.625,0.8285714285714286,0.935483870967742,0.8220766129032258,0.7981657871596612
3,Custom AutoML Pipeline - Baseline,0.6575342465753424,0.8287671232876712,0.9230769230769232,0.5106382978723404,0.8083333333333333,0.9797979797979798,0.852675693101225,0.8108735504522139
4,Custom AutoML Pipeline - No Trees / Random Forest,0.6835443037974684,0.8287671232876712,0.84375,0.574468085106383,0.8245614035087719,0.9494949494949496,0.8140984311197077,0.7955669691994453
5,Custom AutoML Pipeline - Advanced Text Featurization,0.5555555555555557,0.7808219178082192,0.8,0.425531914893617,0.7768595041322314,0.9494949494949496,0.7375886524822695,0.6880814384901448


In [49]:
NormalizingEstimator normalizer = context.Transforms.NormalizeRobustScaling(numericColumns);

TextNormalizingEstimator textNormalizer = context.Transforms.Text.NormalizeText(
    inputColumnName: "Message", 
    outputColumnName: "Message", 
    caseMode: TextNormalizingEstimator.CaseMode.Lower, 
    keepDiacritics: false, 
    keepPunctuations: false, 
    keepNumbers: false);

WordTokenizingEstimator tokenizer = context.Transforms.Text.TokenizeIntoWords(inputColumnName: "Message", outputColumnName: "Message");

StopWordsRemovingEstimator stopWordsRemover = context.Transforms.Text.RemoveDefaultStopWords(inputColumnName: "Message", outputColumnName: "Message", language: StopWordsRemovingEstimator.Language.English);

### Logistic Regression Focus

In [122]:
// We're going to define our own search space for the Lbfgs trainer so it can be customized to find better hyperparameters
SearchSpace<LbfgsOption> searchSpace = new();
searchSpace["L1Regularization"] = new UniformSingleOption(min: 0.0f, max: 0.5f, logBase: false, defaultValue: 0.05f);
searchSpace["L2Regularization"] = new UniformSingleOption(min: 0.0f, max: 0.5f, logBase: false, defaultValue: 0.05f);
searchSpace

key,value
L1Regularization,Microsoft.ML.SearchSpace.Option.UniformSingleOption

Property,Value
Min,0
Max,0.5
LogBase,False
FeatureSpaceDim,1
Step,[ <null> ]
Default,[ 0.10000000149011612 ]


In [123]:
// The last leg of our pipeline will be the custom trainer using LbfgsLogisticRegression with our custom search space
SweepableEstimator estimator = context.Auto().CreateSweepableEstimator(
    (context, options) => context.BinaryClassification.Trainers.LbfgsLogisticRegression(
        labelColumnName: "Label",
        l1Regularization: options.L1Regularization,
        l2Regularization: options.L2Regularization),
    searchSpace);

var pipeline = basePipeline.Append(estimator);

In [124]:
// These classes mirror the nesting of the JSON we parse out of the AutoML log messages below
public class TrialParameters {
    public Dictionary<string, float> e1 { get; set; }
}

public class TrialParameterPipeline {
    public TrialParameters _pipeline_ { get; set; }
}

public class TrialValues {
    public TrialParameterPipeline Parameter { get; set; }
}

string lastJson = null;
TrialValues bestHyperparameters = null;

EventHandler<LoggingEventArgs> logBest = (s, e) => {
    // Remember the most recent hyperparameter JSON that appeared in the log
    if (e.Message.Contains("L2Regularization")) {
        string json = e.Message.Substring(e.Message.IndexOf("{"));
        json = json.Substring(0, json.LastIndexOf("}") + 1);
        lastJson = json;
    }

    // When a new best trial is reported, parse the last JSON we captured (if any)
    if (e.Message.Contains("Update Best Trial") && lastJson != null)
    {
        bestHyperparameters = JsonConvert.DeserializeObject<TrialValues>(lastJson);
    }
};

Formatter.Register<TrialValues>((values, writer) => {
    // Write a title above the table with the type of object it was
    writer.Write("<h4>Best Hyperparameter Values</h4>");
    writer.Write("<table>");
    writer.Write("<thead><tr><th>Hyperparameter</th><th>Value</th></tr></thead>");
    foreach (var kvp in values.Parameter._pipeline_.e1) {
        writer.Write("<tr><th><strong>");
        writer.Write(kvp.Key);
        writer.Write("</strong></th><td>");
        writer.Write(kvp.Value);
        writer.Write("</td></tr>");
    }
    writer.Write("</table>");
}, "text/html");

In [125]:
// Now let's run our experiment using our custom pipeline, specifying the tuner this time around
var experiment = context.Auto().CreateExperiment()
    .SetPipeline(pipeline)
    .SetDataset(split)
    .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label")
    .SetTrainingTimeInSeconds(30)
    .SetEciCostFrugalTuner(); // This was the default tuner before, but let's make it official

context.Log += logBest;
TrialResult result;
try {
    result = await experiment.RunAsync();
} finally {
    context.Log -= logBest; // always detach the handler, even if the run fails
}

bestHyperparameters

Hyperparameter,Value
L1Regularization,0.0065167905
L2Regularization,0.138471


In [126]:
ITransformer model = result.Model;

var scorer = model.Transform(split.TestSet);

var evalResults = context.BinaryClassification.Evaluate(scorer, labelColumnName: "Label");
(evalResults, evalResults.ConfusionMatrix.GetFormattedConfusionTable())

index,value
LogLoss,0.7099515597352173
LogLossReduction,0.2167821918925049
Entropy,0.9064548231489877
AreaUnderRocCurve,0.7908875993982377
Accuracy,0.8424657534246576
PositivePrecision,0.9
PositiveRecall,0.574468085106383
NegativePrecision,0.8275862068965517
NegativeRecall,0.9696969696969697
F1Score,0.7012987012987012
AreaUnderPrecisionRecallCurve,0.7692839911293207

ConfusionMatrix,value
PerClassPrecision,"[ 0.9, 0.8275862068965517 ]"
PerClassRecall,"[ 0.574468085106383, 0.9696969696969697 ]"
Counts row 0,"[ 27, 20 ]"
Counts row 1,"[ 3, 96 ]"
NumberOfClasses,2
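
As a sanity check, the headline metrics can be re-derived by hand from the confusion matrix counts. I'm assuming row 0 holds the actual positives (27 true positives, 20 false negatives) and row 1 the actual negatives (3 false positives, 96 true negatives). That reading is an assumption, but it reproduces the reported values:

```latex
\begin{aligned}
\text{Precision}_{+} &= \frac{TP}{TP + FP} = \frac{27}{27 + 3} = 0.9 \\
\text{Recall}_{+}    &= \frac{TP}{TP + FN} = \frac{27}{27 + 20} \approx 0.5745 \\
F_1 &= \frac{2\,TP}{2\,TP + FP + FN} = \frac{54}{54 + 3 + 20} = \frac{54}{77} \approx 0.7013 \\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} = \frac{27 + 96}{146} \approx 0.8425
\end{aligned}
```

The 20 false negatives are what hold F1 back: the model is rarely wrong when it predicts the positive class, but it misses a large share of the actual positives.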
