# Model Training & Evaluation - .NET

This is a Polyglot Notebook intended to run on the .NET Interactive kernel, currently running on .NET 8.

This notebook's role is to build and evaluate model training pipelines, perform hyperparameter tuning, and find a series of best models for commit classification using ML.NET. Other model training efforts will be performed using Python in a separate notebook, but those efforts will focus on models that support ONNX export so they can be imported into ML.NET. This is because the ultimately selected model will be integrated into GitStractor, which runs on .NET, and ML.NET is the best available vector to do that.

## Dependencies
Download and install NuGet packages and set up common imports

In [1]:
#r "nuget:Microsoft.Data.Analysis"
#r "nuget:Microsoft.ML"
#r "nuget:Microsoft.ML.AutoML"
#r "nuget:Newtonsoft.Json"
#r "nuget:Plotly.NET"
#r "nuget:Plotly.NET.Interactive"

using Microsoft.DotNet.Interactive.Formatting;
using Microsoft.Data.Analysis;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.AutoML.CodeGen;
using Microsoft.ML.SearchSpace;
using Microsoft.ML.SearchSpace.Option;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;
using Microsoft.ML.Trainers.LightGbm;
using Microsoft.ML.Transforms;
using Microsoft.ML.Transforms.Text;
using Newtonsoft.Json;
using System.Reflection;

Loading extensions from `/home/matteland/.nuget/packages/plotly.net.interactive/5.0.0/lib/netstandard2.1/Plotly.NET.Interactive.dll`

Loading extensions from `/home/matteland/.nuget/packages/microsoft.data.analysis/0.21.1/interactive-extensions/dotnet/Microsoft.Data.Analysis.Interactive.dll`

Loading extensions from `/home/matteland/.nuget/packages/microsoft.ml.automl/0.21.1/interactive-extensions/dotnet/Microsoft.ML.AutoML.Interactive.dll`

Loading extensions from `/home/matteland/.nuget/packages/skiasharp/2.88.6/interactive-extensions/dotnet/SkiaSharp.DotNet.Interactive.dll`

I actually wound up creating some .NET libraries to make some of the code here simpler. 

These were non-core to the analysis and were intended to benefit the community as a whole.

You can find these libraries and their code [on GitHub](https://github.com/IntegerMan/MattEland.ML)

In [2]:
//#r "nuget:MattEland.ML"
//#r "nuget:MattEland.ML.Charts"
//#r "nuget:MattEland.ML.DataFrames"
//#r "nuget:MattEland.ML.Interactive"
#r "/home/matteland/Documents/MattEland.ML/MattEland.ML/MattEland.ML/bin/Debug/net8.0/MattEland.ML.dll"
#r "/home/matteland/Documents/MattEland.ML/MattEland.ML/MattEland.ML.DataFrames/bin/Debug/net8.0/MattEland.ML.DataFrames.dll"
#r "/home/matteland/Documents/MattEland.ML/MattEland.ML/MattEland.ML.Charts/bin/Debug/net8.0/MattEland.ML.Charts.dll"
#r "/home/matteland/Documents/MattEland.ML/MattEland.ML/MattEland.ML.Interactive/bin/Debug/net8.0/MattEland.ML.Interactive.dll"

using MattEland.ML;
using MattEland.ML.Charts;
using MattEland.ML.DataFrames;
using MattEland.ML.Interactive;

await MattEland.ML.Interactive.InteractiveExtensions.Load(Microsoft.DotNet.Interactive.KernelInvocationContext.Current.HandlingKernel.RootKernel);

## Data Loading

In [3]:
var df = DataFrame.LoadCsv("data/Training.csv", separator: ',', header: true);
df.Sample(5)

index,PredictedLabel,ActualLabel,Message,Reasoning,Sha,Source,ParentSha,Parent2Sha,IsMerge,AuthorId,AuthorDateUtc,CommitterId,CommitterDateUtc,WorkItems,TotalFiles,ModifiedFiles,AddedFiles,DeletedFiles,TotalLines,NetLines,AddedLines,DeletedLines,HasAddedFiles,HasDeletedFiles,DayOfWeek,Month,Quarter,Year,Hour,TimeOfDay,IsWeekend,MessageLength,WordCount
0,False,False,Update TorchSharp to 0.98.3 (#6436),This commit likely corresponds to an update or release of a library rather than fixing a bug.,17c061acd1cc87a84b9a821ff95f235d836a3737,mlnet,8481c2a00e73e0c0ac97e112483b2d3a35a4a7f8,,False,141,2022-11-07 17:19:39Z,7,2022-11-07 22:19:39Z,1,1,1,0,0,83,0,1,1,False,False,Monday,November,4,2022,17,Afternoon,False,35,5
1,True,False,Return distinct array of ParameterSet when ProposeSweep is called (#368),"The commit addresses an issue related to the functionality by ensuring uniqueness in return values, likely a bug",7f8caf7e08ab23b5b2117fb788af4a846276eb36,mlnet,fc7286c7d9aa9218c4c8da3357b1c58361c7e8f5,,False,41,2018-06-19 12:36:30Z,15,2018-06-19 16:36:30Z,1,5,3,2,0,861,93,97,4,True,False,Tuesday,June,2,2018,12,Morning,False,72,10
2,True,False,always use file extension when opening a notebook,Prompts for correct usage which might fix user errors,08613b6a6da27f0c2b71f05324501e0d92004285,dotnetinteractive,d0fcf581bd92657aa3a4ac5bc7ade5d98f882b46,,False,8,2021-04-27 12:41:45Z,14,2021-04-27 19:01:05Z,0,4,4,0,0,4190,8,29,21,False,False,Tuesday,April,2,2021,12,Morning,False,49,8
3,False,False,Using the latest Numeric vector apis (#2082),Adopting an API does not necessarily fix a bug.,bdc9a9e1348abfdd2287d43e2633089e9eba36c7,mlnet,da973babd77475093ef6633a7d0ca2fb45d65e46,,False,19,2019-01-09 13:04:37Z,7,2019-01-09 18:04:37Z,1,5,5,0,0,4296,0,134,134,False,False,Wednesday,January,1,2019,13,Afternoon,False,44,7
4,True,False,add support for clearing variables from HttpKernel (#3467),The commit addresses an issue identified in ticket #3467,3ba37fe05153ad2d2e7750c4e72ed8b094506146,dotnetinteractive,0e90328dd36e15ea60f8c3959f93369fde9db670,,False,6,2024-02-26 19:52:19Z,2,2024-02-27 00:52:19Z,1,7,5,2,0,3201,71,80,9,True,False,Monday,February,1,2024,19,Evening,False,58,8


In [4]:
df.Columns.Select(c => c.Name + Environment.NewLine)

In [5]:
// Let's drop columns we don't want the model to learn from
df.Columns.Remove("PredictedLabel", "Reasoning", "AuthorId", "AuthorDateUtc", "CommitterId", "CommitterDateUtc", "ParentSha", "Parent2Sha", "DayOfWeek", "Month", "Quarter", "Year", "Hour", "TimeOfDay", "IsWeekend", "Sha");

// NOTE: We're keeping Source for fairness comparison later on

df.Info()

index,Info,ActualLabel,Message,Source,IsMerge,WorkItems,TotalFiles,ModifiedFiles,AddedFiles,DeletedFiles,TotalLines,NetLines,AddedLines,DeletedLines,HasAddedFiles,HasDeletedFiles,MessageLength,WordCount
0,DataType,System.Boolean,System.String,System.String,System.Boolean,System.Single,System.Single,System.Single,System.Single,System.Single,System.Single,System.Single,System.Single,System.Single,System.Boolean,System.Boolean,System.Single,System.Single
1,Length (excluding null values),499,499,499,499,499,499,499,499,499,499,499,499,499,499,499,499,499


Okay. Each column has the expected type and there are no missing values. Looks like I was off somewhere and lost a row in my training data, but 500 -> 499 isn't a huge issue.

We don't need to do any additional feature engineering here since that was all done as part of EDA in `LabelledEDA.ipynb`. We could one-hot encode Source if we were going to include it in the data given to the model, and we will be extracting text features from Message.
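As a rough illustration of what one-hot encoding Source would mean, here is a minimal sketch in plain C#. The `sources` array is a hypothetical sample (the two names match values visible in the data preview), and this is not how ML.NET implements its one-hot transform; it only shows the shape of the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample of Source values; not loaded from the training data
string[] sources = { "mlnet", "dotnetinteractive", "mlnet" };

// Build a stable, sorted category list so each category maps to a fixed slot
List<string> categories = sources.Distinct().OrderBy(s => s, StringComparer.Ordinal).ToList();

// Emit one indicator value per category for every row
float[][] encoded = sources
    .Select(s => categories.Select(c => c == s ? 1f : 0f).ToArray())
    .ToArray();

for (int i = 0; i < sources.Length; i++) {
    Console.WriteLine($"{sources[i]}: [{string.Join(", ", encoded[i])}]");
}
```

Each row ends up with exactly one slot set to 1, so no artificial ordering is implied between sources.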

Let's do some final name cleanup for ActualLabel since ML.NET looks for the Label column by default.

In [6]:
//df["Label"] = new PrimitiveDataFrameColumn<int>("Label", df["ActualLabel"].Cast<bool>().Select(c => c ? 1 : 0));
//df.Columns.Remove("ActualLabel");
df["ActualLabel"].SetName("Label");

And finally some fun descriptive statistics.

In [7]:
df.Description()

index,Description,WorkItems,TotalFiles,ModifiedFiles,AddedFiles,DeletedFiles,TotalLines,NetLines,AddedLines,DeletedLines,MessageLength,WordCount
0,Length (excluding null values),499.0,499.0,499.0,499.0,499.0,499.0,499.0,499.0,499.0,499.0,499.0
1,Max,2.0,588.0,587.0,8.0,4.0,98927.0,241.0,3997.0,3980.0,124.0,17.0
2,Min,0.0,1.0,0.0,0.0,0.0,1.0,-8.0,0.0,0.0,3.0,1.0
3,Mean,0.46292585,7.188377,6.747495,0.40480962,0.036072146,2628.0942,32.54108,73.951904,41.41082,41.216434,5.6152306


*Note*: The DataFrame doesn't provide many descriptive statistics out of the box, but you could integrate MathNet.Numerics for many additional statistical measurements. This isn't an EDA notebook, so we won't do that here.

# Model Training

We're now going to run a series of experiments to determine what types of models work best with our data. We'll start broadly to get a good general sense and then dial in on a key model trainer or two to see how we can tune and optimize it.

In [8]:
// Create a custom model tracker to record the various experiments we run
BinaryClassificationModelTracker modelTracker = new();

// Although the metric we probably care the most about is the Precision, we're going to focus on F1 Score during model training in order to encourage discovering the most balanced models between precision and recall
modelTracker.DefaultMetric = BinaryClassificationMetric.F1Score;

## Early ML.NET AutoML Experiments
Let's start with a simple AutoML experiment without any pipelines to see what kinds of models are performing best without transformations or manual tuning.

We just want to see what kind of "out of the box" baseline model performance we can get from AutoML without any customization and what model trainers and transforms tend to get selected.

In [9]:
// Everything flows from our MLContext object
int seed = 42;
MLContext context = new(seed: seed) {
    GpuDeviceId = 0,
    FallbackToCpu = true,
};

In [10]:
// Early initial experiments will use train / test splits. Subsequent experiments will use cross-validation
DataFrame dfNoSource = df.Clone();
dfNoSource.Columns.Remove("Source");

var split = context.Data.TrainTestSplit(dfNoSource, testFraction: 0.1, seed: seed);

In [11]:
// Run the experiment - simplest one we'll do here, but let's just look at the simple options first
BinaryExperimentSettings settings = new() {
    MaxModels = 10,
    OptimizingMetric = BinaryClassificationMetric.F1Score,
};

var results = context.Auto().CreateBinaryClassificationExperiment(settings)
                            .Execute(split.TrainSet, split.TestSet);

// Let's see what types of model trainers and transforms it considered and their F1 scores
results.RunDetails.OrderByDescending(r => r.ValidationMetrics.F1Score)
                  .Select(r => r.TrainerName + ": " + r.ValidationMetrics.F1Score + Environment.NewLine)

In [12]:
MLCharts.RenderConfusionMatrix(results.BestRun)

In [13]:
MLCharts.RenderClassificationMetrics(results.BestRun.ValidationMetrics)

In [14]:
// Record the best run in our model tracker so we can compare it to future models
modelTracker.Register("Simple AutoML - 10% Test Split", results.BestRun.ValidationMetrics).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1,0.4705882352941176,0.7352941176470589,1,0.8211764705882353,0.8290542115376778


Interesting. Overall decent metrics, but that's a very small quantity of rows in the test set.
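To make the "very small" point concrete, the sketch below recomputes the headline metrics from confusion matrix counts. The counts (8 true positives, 0 false positives, 9 false negatives, 25 true negatives) are an assumption: they are inferred from the ratios in the table above and are not printed by the notebook itself.

```csharp
using System;

// Counts inferred from the reported ratios for the 10% split (an assumption, not notebook output)
int tp = 8, fp = 0, fn = 9, tn = 25;

// Standard binary classification metrics
double precision = (double)tp / (tp + fp);
double recall = (double)tp / (tp + fn);
double f1 = 2 * precision * recall / (precision + recall);
double accuracy = (double)(tp + tn) / (tp + fp + fn + tn);

Console.WriteLine($"Test rows: {tp + fp + fn + tn}");
Console.WriteLine($"Precision: {precision:F4}, Recall: {recall:F4}");
Console.WriteLine($"F1: {f1:F4}, Accuracy: {accuracy:F4}");
```

Under those assumed counts the test set holds only 42 rows, and the computed F1 (0.64) and accuracy (about 0.7857) match the registered values, so a single changed prediction would move these metrics noticeably.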

We can get more confidence by choosing a larger chunk of test data.

In [15]:
var split = context.Data.TrainTestSplit(dfNoSource, testFraction: 0.3, seed: seed);

In [16]:
var results = context.Auto().CreateBinaryClassificationExperiment(settings)
                            .Execute(split.TrainSet, split.TestSet);

// Let's see what types of models and transforms it considered and their F1 scores
results.RunDetails.OrderByDescending(r => r.ValidationMetrics.F1Score)
                  .Select(r => r.TrainerName + ": " + r.ValidationMetrics.F1Score + Environment.NewLine)

That's a lot more models in the same amount of time since it didn't need to split every time. The overall metrics are worse, which likely indicates that either the training suffered from having less data or the metrics were artificially high early on from having a small test set.

Let's see the confusion matrix with more test data.

In [17]:
MLCharts.ClassificationReport(results.BestRun)

Now let's look at how our pipeline works for this model.

In [18]:
var model = results.BestRun.Model;
model

In [19]:
#!transformer-vis model -n -d 1

These steps generally make sense. Let's get a deeper picture of what that text featurizer is doing by rendering a deeper view without note annotations:

In [20]:
#!transformer-vis model -d 3

Looks like tokenization, bagging, NGram extraction and normalization. We can get more details by drilling in just to the children of the TextFeaturizingEstimator.Transformer and its direct children and annotating those.

In [21]:
var chain = model as TransformerChain<ITransformer>;
var textTransformer = chain.ToList()[2];
#!transformer-vis textTransformer -n -d 1

That's a lot, and I'm still working on improving this visualization and the quality and layout of its notes, but it looks like it does unigram and bigram extraction at the word level, then trigram extraction at the character level. It also applies L2 normalization, which scales each feature vector to unit length so that longer messages don't dominate, and it handles case sensitivity.

Notably absent from this is removal of punctuation, stop word removal, stemming, or removal of numbers.

Still, it did all of this automatically, which isn't bad.
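For readers unfamiliar with these steps, here is a simplified sketch of word n-grams, character n-grams, and L2 normalization in plain C#. The helper names and the sample message are hypothetical, and ML.NET's actual text featurizer does considerably more (for example, mapping each n-gram to a slot in the feature vector), so treat this as an illustration of the ideas only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Word-level n-grams over a lowercased, whitespace-split message
List<string> WordNgrams(string text, int n) {
    string[] words = text.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);
    var grams = new List<string>();
    for (int i = 0; i + n <= words.Length; i++) {
        grams.Add(string.Join(" ", words.Skip(i).Take(n)));
    }
    return grams;
}

// Character-level n-grams over the lowercased message, spaces included
List<string> CharNgrams(string text, int n) {
    string lower = text.ToLowerInvariant();
    var grams = new List<string>();
    for (int i = 0; i + n <= lower.Length; i++) {
        grams.Add(lower.Substring(i, n));
    }
    return grams;
}

// Scale a vector to unit Euclidean length (zero vectors are returned unchanged)
double[] L2Normalize(double[] v) {
    double norm = Math.Sqrt(v.Sum(x => x * x));
    return norm == 0 ? v : v.Select(x => x / norm).ToArray();
}

string message = "Fix null reference bug"; // hypothetical commit message
Console.WriteLine("Unigrams: " + string.Join(" | ", WordNgrams(message, 1)));
Console.WriteLine("Bigrams: " + string.Join(" | ", WordNgrams(message, 2)));
Console.WriteLine("Char trigrams: " + string.Join(" | ", CharNgrams(message, 3)));
Console.WriteLine("Normalized: " + string.Join(", ", L2Normalize(new[] { 3.0, 4.0 })));
```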

Now that we've seen how the pipeline works, let's drill into the `BinaryPredictionTransformer` to try to understand its model.

In [23]:
var predictor = chain.Last();
#!reflect predictor

Property,Type,Value
ThresholdColumn,String,Score
Threshold,Single,0
LabelColumnName,String,
Host,IHost,Microsoft.ML.Data.LocalEnvironment+Host
BindableMapper,ISchemaBindableMapper,Microsoft.ML.Data.SchemaBindablePredictorWrapper
TrainSchema,DataViewSchema,32 columns


In [24]:
using Microsoft.ML.Trainers.FastTree;

var randomForest = predictor as BinaryPredictionTransformer<FastForestBinaryModelParameters>;
var forestModel = randomForest.Model;

#!reflect forestModel
forestModel

Property,Type,Value
InnerOptions,String,ps=2 nl=4 iter=4 ff=1
NumFeatures,Int32,5861
MaxSplitFeatIdx,Int32,5857
InputType,DataViewType,Vector
OutputType,DataViewType,Single
Host,IHost,Microsoft.ML.Data.LocalEnvironment+Host


TrainedTreeEnsemble:

Property,Value
Bias,0
TreeWeights,"[ 1, 1, 1, 1 ]"
Trees,(4 trees, detailed below)

Tree 0:

Property,Value
LeftChild,"[ 1, 2, -1 ]"
RightChild,"[ -2, -3, -4 ]"
NumericalSplitFeatureIndexes,"[ 1353, 103, 5857 ]"
NumericalSplitThresholds,"[ 0.057353932, 0.10190575, 9.5 ]"
CategoricalSplitFlags,"[ False, False, False ]"
LeafValues,"[ -0.5764705882352941, 0.9047619047619048, -1, 0.3333333333333333 ]"
SplitGains,"[ 43.45517488043252, 8.790652661395029, 11.409432962374133 ]"
NumberOfLeaves,4
NumberOfNodes,3

Tree 1:

Property,Value
LeftChild,"[ 1, 2, -1 ]"
RightChild,"[ -2, -3, -4 ]"
NumericalSplitFeatureIndexes,"[ 1353, 106, 5850 ]"
NumericalSplitThresholds,"[ 0.057353932, 0.1075328, 0.5 ]"
CategoricalSplitFlags,"[ False, False, False ]"
LeafValues,"[ -0.24031007751937986, 0.9047619047619048, -0.9245283018867925, -0.8333333333333334 ]"
SplitGains,"[ 39.1577375797019, 11.173417128979487, 12.302719747733544 ]"
NumberOfLeaves,4
NumberOfNodes,3

Tree 2:

Property,Value
LeftChild,"[ 1, 2, -1 ]"
RightChild,"[ -2, -3, -4 ]"
NumericalSplitFeatureIndexes,"[ 50, 566, 628 ]"
NumericalSplitThresholds,"[ 0.11360833, 0.124133974, 0.052999895 ]"
CategoricalSplitFlags,"[ False, False, False ]"
LeafValues,"[ -0.7551020408163265, 0.9428571428571428, 0.2, 0.4 ]"
SplitGains,"[ 77.13147621155588, 7.7083423229054375, 12.69490786605904 ]"
NumberOfLeaves,4
NumberOfNodes,3

Tree 3:

Property,Value
LeftChild,"[ 1, 2, -1 ]"
RightChild,"[ -2, -3, -4 ]"
NumericalSplitFeatureIndexes,"[ 50, 3873, 103 ]"
NumericalSplitThresholds,"[ 0.11360833, 0.1042572, 0.10190575 ]"
CategoricalSplitFlags,"[ False, False, False ]"
LeafValues,"[ -0.5632183908045977, 0.9259259259259259, 0.2727272727272727, -1 ]"
SplitGains,"[ 56.28925136966692, 8.842590132649804, 6.330286019780814 ]"
NumberOfLeaves,4
NumberOfNodes,3


Looks like a random forest with 4 shallow trees. Let's see a bit more detail.

In [25]:
Console.WriteLine("Random Forest with the following trees:");

int index = 0;
foreach (var tree in forestModel.TrainedTreeEnsemble.Trees) {
    Console.WriteLine($"\tTree {index} with {tree.NumberOfLeaves} leaves and {tree.NumberOfNodes} nodes and a weight of {forestModel.TrainedTreeEnsemble.TreeWeights[index++]}.");
    Console.WriteLine($"\t\tMost important feature indexes: {string.Join(", ", tree.NumericalSplitFeatureIndexes)}");
    Console.WriteLine($"\t\tMost important feature thresholds: {string.Join(", ", tree.NumericalSplitThresholds)}");
}

Random Forest with the following trees:
	Tree 0 with 4 leaves and 3 nodes and a weight of 1.
		Most important feature indexes: 1353, 103, 5857
		Most important feature thresholds: 0.057353932, 0.10190575, 9.5
	Tree 1 with 4 leaves and 3 nodes and a weight of 1.
		Most important feature indexes: 1353, 106, 5850
		Most important feature thresholds: 0.057353932, 0.1075328, 0.5
	Tree 2 with 4 leaves and 3 nodes and a weight of 1.
		Most important feature indexes: 50, 566, 628
		Most important feature thresholds: 0.11360833, 0.124133974, 0.052999895
	Tree 3 with 4 leaves and 3 nodes and a weight of 1.
		Most important feature indexes: 50, 3873, 103
		Most important feature thresholds: 0.11360833, 0.1042572, 0.10190575


Unfortunately, the n-gram extraction makes it hard to determine what each feature index relates to, but at least we can see where there's feature overlap and similar thresholds.
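To show how those arrays describe a tree, here is a small sketch that walks Tree 0 by hand. It follows the convention in the ML.NET tree documentation: go to `LeftChild` when the feature value is less than or equal to the threshold, otherwise to `RightChild`, and treat a negative child as a leaf whose index is the bitwise complement of the value. The `Evaluate` helper and the sample feature values are hypothetical, and this evaluates a single tree only, not the whole forest.

```csharp
using System;
using System.Collections.Generic;

// Arrays copied from Tree 0 in the output above
int[] leftChild = { 1, 2, -1 };
int[] rightChild = { -2, -3, -4 };
int[] splitFeatures = { 1353, 103, 5857 };
float[] thresholds = { 0.057353932f, 0.10190575f, 9.5f };
double[] leafValues = { -0.5764705882352941, 0.9047619047619048, -1, 0.3333333333333333 };

// Walk from the root node until a negative (leaf) child is reached.
// Features missing from the sparse dictionary are treated as 0.
double Evaluate(IReadOnlyDictionary<int, float> features) {
    int node = 0;
    while (node >= 0) {
        float value = features.TryGetValue(splitFeatures[node], out float v) ? v : 0f;
        node = value <= thresholds[node] ? leftChild[node] : rightChild[node];
    }
    return leafValues[~node]; // ~(-1) = 0, ~(-2) = 1, and so on
}

// Hypothetical inputs: an all-zero vector, and one with feature 1353 above its threshold
double allZero = Evaluate(new Dictionary<int, float>());
double highFirstFeature = Evaluate(new Dictionary<int, float> { [1353] = 0.1f });

Console.WriteLine($"All zeros: {allZero}");
Console.WriteLine($"Feature 1353 = 0.1: {highFirstFeature}");
```

With all features at zero, every split goes left and the walk ends at leaf 0; raising feature 1353 above its threshold exits at the first split into leaf 1.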

In [26]:
modelTracker.Register("Simple AutoML - 30% Test Split", results.BestRun.ValidationMetrics).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1.0,0.4705882352941176,0.7352941176470589,1.0,0.8211764705882353,0.8290542115376778
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557


## Exploring the Effect of Multiple Trial Runs on AutoML

I trust these metrics more than those from the 10% split, and precision remains very high (about 0.96) even though we're still optimizing for F1 score. There are so many trees and forests in the trainers list that I suspect we're either overfitting and preferring those models, or not giving the other trainers enough time to converge on good solutions.

Let's evaluate the time aspect by seeing how the best metric improves over many trials.

In [28]:
MLContext context = new(seed: seed) {
    GpuDeviceId = 0,
    FallbackToCpu = true,
};
ContextMonitor monitor = context.Monitor();

// Run the same simple experiment again, but allow 100 models instead of 10 so we can watch the metric over more trials
BinaryExperimentSettings settings = new() {
    MaxModels = 100,
    OptimizingMetric = BinaryClassificationMetric.F1Score,
};

var results = context.Auto().CreateBinaryClassificationExperiment(settings).Execute(split.TrainSet, split.TestSet);

// ML.NET didn't have any sort of learning rate chart built-in so I built something to collect and chart the metrics myself.
MLCharts.MetricImprovement(monitor)

In [29]:
MLCharts.MetricImprovementWithTrials(monitor)

In [30]:
modelTracker.Register("Simple AutoML - Additional Training Time", results.BestRun.ValidationMetrics).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1.0,0.4705882352941176,0.7352941176470589,1.0,0.8211764705882353,0.8290542115376778
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557
2,Simple AutoML - Additional Training Time,0.6571428571428571,0.8356164383561644,1.0,0.4893617021276595,0.8048780487804879,1.0,0.8233397807865893,0.7960885789039365


Here it looks like things are pretty stable at the observed metric. We can probably squeeze out a bit of additional performance by expanding our trials significantly, but progress looks to be more or less random, and we can get better results with more manual control over hyperparameter tuning later. For now, this does illustrate that 10 - 20 trials is sufficient to get a good ballpark impression of a pipeline's basic performance.
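One way to think about a "best metric over trials" curve is as a running maximum of the per-trial scores. The sketch below assumes that reading; the scores are hypothetical values for illustration only, and the charting helpers in MattEland.ML may compute their series differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-trial F1 scores, for illustration only (not results from this notebook)
double[] trialScores = { 0.52, 0.61, 0.58, 0.64, 0.63, 0.64, 0.66, 0.60 };

// Track the best score seen so far after each trial
var bestSoFar = new List<double>();
double best = double.MinValue;
foreach (double score in trialScores) {
    best = Math.Max(best, score);
    bestSoFar.Add(best);
}

Console.WriteLine(string.Join(", ", bestSoFar));
```

A curve like this flattens quickly once most trials stop beating the current best, which is the pattern described above.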

## Cross-validation


Next, let's examine cross validation of the same experiment to see what it tells us about the level of confidence we can have in our test metrics.
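As a reminder of what k-fold cross-validation does, here is a minimal sketch that partitions row indices into 5 folds, with each fold serving once as the validation set. The row count and the round-robin assignment are assumptions made for illustration; ML.NET handles fold assignment internally and is not expected to split rows this way.

```csharp
using System;
using System.Linq;

int rowCount = 10; // hypothetical tiny dataset
int folds = 5;

// Assign each row to a fold round-robin
int[] foldOfRow = Enumerable.Range(0, rowCount).Select(i => i % folds).ToArray();

for (int f = 0; f < folds; f++) {
    // Rows in fold f are held out; every other row is used for training
    int[] validation = Enumerable.Range(0, rowCount).Where(i => foldOfRow[i] == f).ToArray();
    int[] training = Enumerable.Range(0, rowCount).Where(i => foldOfRow[i] != f).ToArray();
    Console.WriteLine($"Fold {f}: train on [{string.Join(", ", training)}], validate on [{string.Join(", ", validation)}]");
}
```

Because every row is validated exactly once, the spread of per-fold metrics gives a sense of how much a single train/test split can mislead.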

In [31]:
MLContext context = new(seed: seed) {
    GpuDeviceId = 0,
    FallbackToCpu = true,
};

BinaryExperimentSettings settings = new BinaryExperimentSettings() {
    MaxModels = 10,
    OptimizingMetric = BinaryClassificationMetric.F1Score,
};

var results = context.Auto().CreateBinaryClassificationExperiment(settings)
                            .Execute(split.TrainSet, numberOfCVFolds: 5, labelColumnName: "Label");

// Cross Validation results are a bit different since they carry metrics per fold per result
results

In [32]:
// Let's start by averaging the F1 scores of each model considered across all of its folds
results.RunDetails.OrderByDescending(r => r.Results.Average(v => v.ValidationMetrics.F1Score))
                  .Select(r => r.TrainerName + ": " + r.Results.Average(v => v.ValidationMetrics.F1Score) + Environment.NewLine)

In [33]:
var model = results.BestRun.Results.MaxBy(r => r.ValidationMetrics.F1Score).Model;
model

In [34]:
// Now let's see the confusion matrix against our validation data
var evalMetrics = context.BinaryClassification.EvaluateNonCalibrated(model.Transform(split.TestSet), labelColumnName: "Label");

MLCharts.ClassificationReport(evalMetrics)

In [35]:
modelTracker.Register("Simple AutoML - Cross Validation", results.BestRun.Results.First().ValidationMetrics).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1.0,0.4705882352941176,0.7352941176470589,1.0,0.8211764705882353,0.8290542115376778
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557
2,Simple AutoML - Additional Training Time,0.6571428571428571,0.8356164383561644,1.0,0.4893617021276595,0.8048780487804879,1.0,0.8233397807865893,0.7960885789039365
3,Simple AutoML - Cross Validation,0.25,0.7096774193548387,0.5,0.1666666666666666,0.7321428571428571,0.9318181818181818,0.7436868686868687,0.5096252184860618


Okay, so this is probably a more reliable picture of our precision and overall F1 score, though the row above reflects only the first fold of the best run rather than an average across folds.

It looks like we're seeing a lot of trees, forests, and LightGBM, and all of them share a common set of pipeline transformation steps.

Now that we've seen the base level of accuracy from simple AutoML and how its transformations work, let's get some more control over the model training pipeline and see if we can get results for some non-tree or forest model trainers.

## Building a Custom Pipeline with the AutoML Featurizer

In ML.NET everything flows through a pipeline, much like a scikit-learn pipeline, that progressively transforms data from one state to another. You can use pipelines with a specific model trainer or you can use them with AutoML. When you involve AutoML, it still selects the model it expects to perform best, but it uses the pipeline you give it. This approach also exposes more options for configuring AutoML's behavior, including model selection and hyperparameter tuning.

In this section we'll create a pipeline and use AutoML to determine the best models from it. We'll also see how it compares to our simple AutoML models from earlier.

We'll start by creating a simple AutoML featurizer and feed it schema information from our DataFrame. This will help it know how to handle the columns it works with.

In [27]:
var colTypes = df.GetColumnTypes(excludedColumns: new[] { "Label", "Source" });
colTypes

Unnamed: 0,Unnamed: 1
Text,[ Message ]
Numeric,"[ WorkItems, TotalFiles, ModifiedFiles, AddedFiles, DeletedFiles, TotalLines, NetLines, AddedLines, DeletedLines, MessageLength, WordCount ]"
Categorical,"[ IsMerge, HasAddedFiles, HasDeletedFiles ]"
Excluded,"[ Label, Source ]"


In [28]:
// This featurizer will trigger one-hot encoding and text featurization and handle column concatenation down to a single features column for us
SweepablePipeline featurizer = context.Auto().Featurizer(df, 
                                           catelogicalColumns: colTypes.Categorical.ToArray(), 
                                           numericColumns: colTypes.Numeric.ToArray(),
                                           textColumns: colTypes.Text.ToArray(), 
                                           excludeColumns: colTypes.Excluded.ToArray());

In [29]:
// The classifier step tells AutoML what model trainers are enabled. We'll focus on those that don't require scaled data for simplicity at the moment
var classifier = context.Auto().BinaryClassification(
    useFastForest: true, 
    useLgbm: true, 
    useFastTree: true, 
    useLbfgsLogisticRegression: false, 
    useSdcaLogisticRegression: false);

In [30]:
// From here on out we'll be using cross-validation on 80% of the data with 20% held out for final validation metrics
var split = context.Data.TrainTestSplit(dfNoSource, testFraction: 0.2, seed: seed);

In [31]:
MLContext context = new(seed: seed) {
    GpuDeviceId = 0,
    FallbackToCpu = true,
};

// Now let's run our experiment using our custom pipeline
var experiment = context.Auto().CreateExperiment()
    .SetPipeline(featurizer.Append(classifier))
    .SetDataset(split.TrainSet, fold: 5) // Cross validation on the training split
    .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label")
    .SetMaxModelToExplore(10);

TrialResult result = await experiment.RunAsync();

Console.WriteLine($"F1 Score during training: {result.Metric}");

// Generate metrics using our held-out test set
ITransformer model = result.Model;
var evalResults = context.BinaryClassification.EvaluateNonCalibrated(model.Transform(split.TestSet), labelColumnName: "Label");

// Let's see how it performed
MLCharts.ClassificationReport(evalResults)

F1 Score during training: 0.580711900958139


In [32]:
#!transformer-vis model -d 1 -n

model

In [33]:
var enumTransformer = ((IEnumerable<Microsoft.ML.ITransformer>) model);
var textTransformer = enumTransformer.ToList()[2]; 
#!transformer-vis textTransformer -d 2

In [34]:
var parameter = result.TrialSettings.Parameter;

foreach (var key in parameter.Keys.Where(k => k[0] != '_')) {
    Console.WriteLine($"{key}: {parameter[key]}");
}

e0: {"OutputColumnNames":["WorkItems","TotalFiles","ModifiedFiles","AddedFiles","DeletedFiles","TotalLines","NetLines","AddedLines","DeletedLines","MessageLength","WordCount"],"InputColumnNames":["WorkItems","TotalFiles","ModifiedFiles","AddedFiles","DeletedFiles","TotalLines","NetLines","AddedLines","DeletedLines","MessageLength","WordCount"]}
e1: {"OutputColumnNames":["IsMerge","HasAddedFiles","HasDeletedFiles"],"InputColumnNames":["IsMerge","HasAddedFiles","HasDeletedFiles"]}
e2: {"OutputColumnNames":["IsMerge","HasAddedFiles","HasDeletedFiles"],"InputColumnNames":["IsMerge","HasAddedFiles","HasDeletedFiles"]}
e3: {"InputColumnName":"Message","OutputColumnName":"Message"}
e4: {"InputColumnNames":["Message","WorkItems","TotalFiles","ModifiedFiles","AddedFiles","DeletedFiles","TotalLines","NetLines","AddedLines","DeletedLines","MessageLength","WordCount","IsMerge","HasAddedFiles","HasDeletedFiles"],"OutputColumnName":"Features"}
e5: {"NumberOfLeaves":4,"MinimumExampleCountPerLeaf":20,

In [35]:
// Save the model
context.Model.Save(model, ((IDataView)df).Schema, "models/TextFeaturizerAuto.zip");

// Record the model
modelTracker.Register("TextFeaturizerAuto", evalResults).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1.0,0.4705882352941176,0.7352941176470589,1.0,0.8211764705882353,0.8290542115376778
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557
2,TextFeaturizerAuto,0.6530612244897959,0.8191489361702128,0.9411764705882352,0.5,0.7922077922077922,0.9838709677419356,0.828125,0.8101232634508131


## Custom Pipeline with Custom Text Processing

In [47]:
// Standardize our numeric columns by imputing missing values and scaling
MissingValueReplacingEstimator imputer = context.Transforms.ReplaceMissingValues(columns: colTypes.Numeric.Select(c => new InputOutputColumnPair(c, c)).ToArray(), replacementMode: MissingValueReplacingEstimator.ReplacementMode.DefaultValue);
NormalizingEstimator scaler = context.Transforms.NormalizeRobustScaling(columns: colTypes.Numeric.Select(c => new InputOutputColumnPair(c, c)).ToArray());

// Standardize our boolean columns as singles
TypeConvertingEstimator boolConverter = context.Transforms.Conversion.ConvertType(columns: colTypes.Categorical.Select(c => new InputOutputColumnPair(c, c)).ToArray(), outputKind: DataKind.Single);
    
// Text pre-processing
TextNormalizingEstimator textNormalizer = context.Transforms.Text.NormalizeText(inputColumnName: "Message", outputColumnName: "Message", caseMode: TextNormalizingEstimator.CaseMode.Lower, keepDiacritics: false, keepPunctuations: false, keepNumbers: false);

// Word trigrams / bigrams / unigrams
WordTokenizingEstimator wordTokenizer = context.Transforms.Text.TokenizeIntoWords(inputColumnName: "Message", outputColumnName: "MessageWords");
StopWordsRemovingEstimator stopRemover = context.Transforms.Text.RemoveDefaultStopWords(inputColumnName: "MessageWords", outputColumnName: "MessageWords", language: StopWordsRemovingEstimator.Language.English);
ValueToKeyMappingEstimator labelConverter = context.Transforms.Conversion.MapValueToKey(inputColumnName: "MessageWords", outputColumnName: "MessageWords");
NgramExtractingEstimator ngramExtractor = context.Transforms.Text.ProduceNgrams(inputColumnName: "MessageWords", outputColumnName: "MessageWords", ngramLength: 3, useAllLengths: true, weighting: NgramExtractingEstimator.WeightingCriteria.TfIdf);
LpNormNormalizingEstimator wordNorm = context.Transforms.NormalizeLpNorm(inputColumnName: "MessageWords", outputColumnName: "MessageWords", norm: LpNormNormalizingEstimator.NormFunction.L2);

// Character ngrams
TokenizingByCharactersEstimator charTokenizer = context.Transforms.Text.TokenizeIntoCharactersAsKeys(inputColumnName: "Message", outputColumnName: "MessageChars");
NgramExtractingEstimator charNgram = context.Transforms.Text.ProduceNgrams(inputColumnName: "MessageChars", outputColumnName: "MessageChars", ngramLength: 3, useAllLengths: true, skipLength: 1);
LpNormNormalizingEstimator charNorm = context.Transforms.NormalizeLpNorm(inputColumnName: "MessageChars", outputColumnName: "MessageChars", norm: LpNormNormalizingEstimator.NormFunction.L2);

// We'll concatenate the word and character n-gram features together, along with all of our numeric and boolean columns. (Note: this does not concatenate the Source column. That column is for model accuracy/fairness comparison only.)
ColumnConcatenatingEstimator concat = context.Transforms.Concatenate("Features", inputColumnNames: colTypes.Numeric.Concat(colTypes.Categorical).Concat(new[] { "MessageWords", "MessageChars"}).ToArray());

// Build a common base pipeline without the classifier
var basePipeline = imputer
    .Append(boolConverter)
    .Append(scaler)
    .Append(textNormalizer)
    .Append(wordTokenizer)
    .Append(stopRemover)
    .Append(labelConverter)
    .Append(ngramExtractor)
    .Append(wordNorm)
    .Append(charTokenizer)
    .Append(charNgram)
    .Append(charNorm)
    .Append(concat);
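
For readers unfamiliar with robust scaling: `NormalizeRobustScaling` centers each column on its median and, by default, divides by the interquartile range, so a single huge commit does not squash every other value toward zero. Here is a rough sketch of that idea in plain C#. The sample values and the `Percentile` helper are made up for illustration, and ML.NET's own quantile estimation may give slightly different numbers.

```csharp
using System;
using System.Linq;

// Hypothetical helper: percentile with linear interpolation between neighbors
double Percentile(double[] sortedValues, double p)
{
    double position = p * (sortedValues.Length - 1);
    int lower = (int)Math.Floor(position);
    int upper = (int)Math.Ceiling(position);
    double fraction = position - lower;
    return sortedValues[lower] + fraction * (sortedValues[upper] - sortedValues[lower]);
}

double[] totalLines = { 1, 2, 3, 4, 100 }; // made-up values with one outlier
double[] sorted = totalLines.OrderBy(v => v).ToArray();

double median = Percentile(sorted, 0.50);
double iqr = Percentile(sorted, 0.75) - Percentile(sorted, 0.25);

// Center on the median, then scale by the interquartile range
double[] scaled = totalLines.Select(v => (v - median) / iqr).ToArray();
Console.WriteLine(string.Join(", ", scaled));
```

The four typical values land between -1 and 0.5 while the outlier stays far away from them, which is the behavior we want for skewed columns like line and file counts.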

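The `ngramLength: 3, useAllLengths: true` settings on `ProduceNgrams` ask for unigrams, bigrams, and trigrams together rather than trigrams alone. A small sketch of what that expansion looks like on a tokenized message follows. The `AllNgrams` helper and the sample tokens are hypothetical, and ML.NET works on key-encoded tokens and emits a weighted count vector rather than strings, so this shows the concept only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: every contiguous n-gram of length 1..maxLength
List<string> AllNgrams(IReadOnlyList<string> tokens, int maxLength)
{
    var result = new List<string>();
    for (int n = 1; n <= maxLength; n++)
    {
        for (int start = 0; start + n <= tokens.Count; start++)
        {
            result.Add(string.Join(" ", tokens.Skip(start).Take(n)));
        }
    }
    return result;
}

string[] tokens = { "fix", "null", "reference", "bug" }; // made-up commit message
List<string> ngrams = AllNgrams(tokens, maxLength: 3);

Console.WriteLine($"{ngrams.Count} n-grams:");
Console.WriteLine(string.Join(" | ", ngrams));
```

Four tokens yield four unigrams, three bigrams, and two trigrams, which is why the feature vector grows quickly as messages get longer.
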
In [37]:
// Now that we have scaling in place, let's use all available classifiers and see what we get
var classifier = context.Auto().BinaryClassification(
    useFastForest: true, 
    useLgbm: true, 
    useFastTree: true, 
    useLbfgsLogisticRegression: true, 
    useSdcaLogisticRegression: true);

// Build a pipeline with the classifier appended
SweepablePipeline pipeline = basePipeline
    .Append(classifier);

// Now let's run our experiment using our custom pipeline
MLContext context = new(seed: seed) {
    GpuDeviceId = 0,
    FallbackToCpu = true,
};
var experiment = context.Auto().CreateExperiment()
    .SetPipeline(pipeline)
    .SetDataset(split.TrainSet, fold: 5) // Cross-validation on the 80% training split
    .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label")
    .SetMaxModelToExplore(10);

var result = await experiment.RunAsync();

Console.WriteLine($"F1 Score during training: {result.Metric}");

// Let's see how it performed
ITransformer model = result.Model;
var evalResults = context.BinaryClassification.EvaluateNonCalibrated(model.Transform(split.TestSet), labelColumnName: "Label");
MLCharts.ClassificationReport(evalResults)

F1 Score during training: 0.637655674855002


In [38]:
#!transformer-vis model -d 1

model

In [40]:
var parameter = result.TrialSettings.Parameter;

foreach (var key in parameter.Keys.Where(k => k[0] != '_')) {
    Console.WriteLine($"{key}: {parameter[key]}");
}

e0: {}
e1: {"NumberOfLeaves":4,"MinimumExampleCountPerLeaf":20,"NumberOfTrees":4,"MaximumBinCountPerFeature":255,"FeatureFraction":1,"LearningRate":0.09999999999999998,"LabelColumnName":"Label","FeatureColumnName":"Features","DiskTranspose":false}
e2: {"NumberOfTrees":4,"NumberOfLeaves":4,"FeatureFraction":1,"LabelColumnName":"Label","FeatureColumnName":"Features"}
e3: {"NumberOfLeaves":4,"MinimumExampleCountPerLeaf":20,"LearningRate":1,"NumberOfTrees":4,"SubsampleFraction":1,"MaximumBinCountPerFeature":255,"FeatureFraction":1,"L1Regularization":2E-10,"L2Regularization":1,"LabelColumnName":"Label","FeatureColumnName":"Features"}
e4: {"L1Regularization":1,"L2Regularization":1,"LabelColumnName":"Label","FeatureColumnName":"Features"}
e5: {"L1Regularization":1,"L2Regularization":0.1,"LabelColumnName":"Label","FeatureColumnName":"Features"}


In [39]:
// Record the model
modelTracker.Register("CustomPipelineAuto", evalResults).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1.0,0.4705882352941176,0.7352941176470589,1.0,0.8211764705882353,0.8290542115376778
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557
2,TextFeaturizerAuto,0.6530612244897959,0.8191489361702128,0.9411764705882352,0.5,0.7922077922077922,0.9838709677419356,0.828125,0.8101232634508131
3,CustomPipelineAuto,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.8001512096774194,0.7660228021913676
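
Every metric column in the tracker table can be reproduced from four confusion-matrix counts. As a sanity check, the CustomPipelineAuto row is consistent with 19 true positives, 7 false positives, 13 false negatives, and 55 true negatives on a 94-row test split. Those counts are inferred from the reported fractions (for example, 19/26 for positive precision). The notebook never prints them, so treat them as an assumption. A minimal sketch in plain C#, independent of ML.NET:

```csharp
using System;

// Counts inferred from the CustomPipelineAuto row (an assumption, not notebook output)
int tp = 19, fp = 7, fn = 13, tn = 55;

double positivePrecision = (double)tp / (tp + fp); // of predicted positives, how many were right
double positiveRecall = (double)tp / (tp + fn);    // of actual positives, how many were found
double negativePrecision = (double)tn / (tn + fn);
double negativeRecall = (double)tn / (tn + fp);
double accuracy = (double)(tp + tn) / (tp + fp + fn + tn);

// F1 is the harmonic mean of positive precision and positive recall
double f1 = 2 * positivePrecision * positiveRecall / (positivePrecision + positiveRecall);

Console.WriteLine($"F1: {f1:F4}, Accuracy: {accuracy:F4}");
Console.WriteLine($"Positive precision/recall: {positivePrecision:F4} / {positiveRecall:F4}");
Console.WriteLine($"Negative precision/recall: {negativePrecision:F4} / {negativeRecall:F4}");
```

In ML.NET the same values come back on `BinaryClassificationMetrics` as `F1Score`, `Accuracy`, `PositivePrecision`, `PositiveRecall`, `NegativePrecision`, and `NegativeRecall`. Compared with TextFeaturizerAuto, this model trades positive precision for positive recall, which is why the F1 score barely moves even though accuracy drops.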


## Using the Pipeline for Specific Model Trainers

In [43]:
var split = context.Data.TrainTestSplit(df, testFraction: 0.2, seed: seed);

### Fast Tree (Gradient Boosted Decision Trees)

In [45]:
// Restrict AutoML to the FastTree trainer so we can evaluate it on its own
var classifier = context.Auto().BinaryClassification(
    useFastForest: false, 
    useLgbm: false, 
    useFastTree: true, 
    useLbfgsLogisticRegression: false, 
    useSdcaLogisticRegression: false);

// Build a pipeline with the classifier appended
SweepablePipeline pipeline = basePipeline
    .Append(classifier);

// Now let's run our experiment using our custom pipeline
MLContext context = new(seed: seed) {
    GpuDeviceId = 0,
    FallbackToCpu = true,
};
var experiment = context.Auto().CreateExperiment()
    .SetPipeline(pipeline)
    .SetDataset(split.TrainSet, fold: 5) // Cross-validation on the 80% training split
    .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label")
    .SetMaxModelToExplore(10);

var result = await experiment.RunAsync();

// Let's see how it performed
ITransformer model = result.Model;
var evalResults = context.BinaryClassification.EvaluateNonCalibrated(model.Transform(split.TestSet), labelColumnName: "Label");
MLCharts.ClassificationReport(evalResults)

In [51]:
// Record the model
modelTracker.Register("CustomPipeline - Fast Tree", evalResults).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1.0,0.4705882352941176,0.7352941176470589,1.0,0.8211764705882353,0.8290542115376778
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557
2,Simple AutoML - Additional Training Time,0.6571428571428571,0.8356164383561644,1.0,0.4893617021276595,0.8048780487804879,1.0,0.8233397807865893,0.7960885789039365
3,Simple AutoML - Cross Validation,0.25,0.7096774193548387,0.5,0.1666666666666666,0.7321428571428571,0.9318181818181818,0.7436868686868687,0.5096252184860618
4,TextFeaturizerAuto,0.6530612244897959,0.8191489361702128,0.9411764705882352,0.5,0.7922077922077922,0.9838709677419356,0.828125,0.8101232634508131
5,CustomPipelineAuto,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.8001512096774194,0.7660228021913676
6,CustomPipeline - Fast Tree,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653


### Fast Forest (Random Forest)

In [52]:
MLContext context = new(seed: seed) {
    GpuDeviceId = 0,
    FallbackToCpu = true,
};

using (ContextMonitor monitor = context.Monitor()) {
    // Restrict AutoML to the FastForest trainer so we can evaluate it on its own
    var classifier = context.Auto().BinaryClassification(
        useFastForest: true, 
        useLgbm: false, 
        useFastTree: false, 
        useLbfgsLogisticRegression: false, 
        useSdcaLogisticRegression: false);

    // Build a pipeline with the classifier appended
    SweepablePipeline pipeline = basePipeline
        .Append(classifier);

    var experiment = context.Auto().CreateExperiment()
        .SetPipeline(pipeline)
        .SetDataset(split.TrainSet, fold: 5) // Cross-validation on the 80% training split
        .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label")
        .SetMaxModelToExplore(10);

    var result = await experiment.RunAsync();

    Console.WriteLine($"F1 Score during training: {result.Metric}");

    // Let's see how it performed
    ITransformer model = result.Model;
    var evalResults = context.BinaryClassification.EvaluateNonCalibrated(model.Transform(split.TestSet), labelColumnName: "Label");
    MLCharts.ClassificationReport(evalResults).Display();

    Console.WriteLine("Hyperparameters:");
    foreach (var kvp in monitor.BestTrial.Hyperparameters) {
        Console.WriteLine($"{kvp.Key}: {kvp.Value}");
    }
}

F1 Score during training: 0.5963530778164924


Hyperparameters:
NumberOfTrees: 4
NumberOfLeaves: 10
FeatureFraction: 0.81887907
LabelColumnName: Label
FeatureColumnName: Features


In [53]:
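// Record the model.
// Caution: `evalResults` in the Fast Forest cell above is declared inside the `using` block, so it is out of scope here.
// This call picks up the `evalResults` left over from the Fast Tree cell, which is why row 7 below repeats row 6 exactly.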
modelTracker.Register("CustomPipeline - Fast Forest", evalResults).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1.0,0.4705882352941176,0.7352941176470589,1.0,0.8211764705882353,0.8290542115376778
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557
2,Simple AutoML - Additional Training Time,0.6571428571428571,0.8356164383561644,1.0,0.4893617021276595,0.8048780487804879,1.0,0.8233397807865893,0.7960885789039365
3,Simple AutoML - Cross Validation,0.25,0.7096774193548387,0.5,0.1666666666666666,0.7321428571428571,0.9318181818181818,0.7436868686868687,0.5096252184860618
4,TextFeaturizerAuto,0.6530612244897959,0.8191489361702128,0.9411764705882352,0.5,0.7922077922077922,0.9838709677419356,0.828125,0.8101232634508131
5,CustomPipelineAuto,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.8001512096774194,0.7660228021913676
6,CustomPipeline - Fast Tree,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
7,CustomPipeline - Fast Forest,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653


### LGBM (LightGBM Gradient Boosted Decision Trees)

In [54]:
// Restrict AutoML to the LightGBM trainer so we can evaluate it on its own
var classifier = context.Auto().BinaryClassification(
    useFastForest: false, 
    useLgbm: true, 
    useFastTree: false, 
    useLbfgsLogisticRegression: false, 
    useSdcaLogisticRegression: false);

// Build a pipeline with the classifier appended
SweepablePipeline pipeline = basePipeline
    .Append(classifier);

// Now let's run our experiment using our custom pipeline
MLContext context = new(seed: seed) {
    GpuDeviceId = 0,
    FallbackToCpu = true,
};
var experiment = context.Auto().CreateExperiment()
    .SetPipeline(pipeline)
    .SetDataset(split.TrainSet, fold: 5) // Cross-validation on the 80% training split
    .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label")
    .SetMaxModelToExplore(10);

var result = await experiment.RunAsync();

Console.WriteLine($"F1 Score during training: {result.Metric}");

// Let's see how it performed
ITransformer model = result.Model;
var evalResults = context.BinaryClassification.EvaluateNonCalibrated(model.Transform(split.TestSet), labelColumnName: "Label");
MLCharts.ClassificationReport(evalResults)

F1 Score during training: 0.6066192445547285


This model took significantly longer to train than the other models.

In [55]:
// Record the model
modelTracker.Register("CustomPipeline - LGBM", evalResults).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1.0,0.4705882352941176,0.7352941176470589,1.0,0.8211764705882353,0.8290542115376778
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557
2,Simple AutoML - Additional Training Time,0.6571428571428571,0.8356164383561644,1.0,0.4893617021276595,0.8048780487804879,1.0,0.8233397807865893,0.7960885789039365
3,Simple AutoML - Cross Validation,0.25,0.7096774193548387,0.5,0.1666666666666666,0.7321428571428571,0.9318181818181818,0.7436868686868687,0.5096252184860618
4,TextFeaturizerAuto,0.6530612244897959,0.8191489361702128,0.9411764705882352,0.5,0.7922077922077922,0.9838709677419356,0.828125,0.8101232634508131
5,CustomPipelineAuto,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.8001512096774194,0.7660228021913676
6,CustomPipeline - Fast Tree,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
7,CustomPipeline - Fast Forest,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
8,CustomPipeline - LGBM,0.6451612903225806,0.7659574468085106,0.6666666666666666,0.625,0.8125,0.8387096774193549,0.7429435483870968,0.7034998598523355


### LBFGS Logistic Regression

In [56]:
// Restrict AutoML to the L-BFGS logistic regression trainer so we can evaluate it on its own
var classifier = context.Auto().BinaryClassification(
    useFastForest: false, 
    useLgbm: false, 
    useFastTree: false, 
    useLbfgsLogisticRegression: true, 
    useSdcaLogisticRegression: false);

// Build a pipeline with the classifier appended
SweepablePipeline pipeline = basePipeline
    .Append(classifier);

// Now let's run our experiment using our custom pipeline
MLContext context = new(seed: seed) {
    GpuDeviceId = 0,
    FallbackToCpu = true,
};
var experiment = context.Auto().CreateExperiment()
    .SetPipeline(pipeline)
    .SetDataset(split.TrainSet, fold: 5) // Cross-validation on the 80% training split
    .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label")
    .SetMaxModelToExplore(10);

var result = await experiment.RunAsync();

Console.WriteLine($"F1 Score during training: {result.Metric}");

// Let's see how it performed
ITransformer model = result.Model;
var evalResults = context.BinaryClassification.EvaluateNonCalibrated(model.Transform(split.TestSet), labelColumnName: "Label");
MLCharts.ClassificationReport(evalResults)

F1 Score during training: 0.5902606525020319


In [57]:
// Record the model
modelTracker.Register("CustomPipeline - LBFGS Logistic Regression", evalResults).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1.0,0.4705882352941176,0.7352941176470589,1.0,0.8211764705882353,0.8290542115376778
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557
2,Simple AutoML - Additional Training Time,0.6571428571428571,0.8356164383561644,1.0,0.4893617021276595,0.8048780487804879,1.0,0.8233397807865893,0.7960885789039365
3,Simple AutoML - Cross Validation,0.25,0.7096774193548387,0.5,0.1666666666666666,0.7321428571428571,0.9318181818181818,0.7436868686868687,0.5096252184860618
4,TextFeaturizerAuto,0.6530612244897959,0.8191489361702128,0.9411764705882352,0.5,0.7922077922077922,0.9838709677419356,0.828125,0.8101232634508131
5,CustomPipelineAuto,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.8001512096774194,0.7660228021913676
6,CustomPipeline - Fast Tree,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
7,CustomPipeline - Fast Forest,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
8,CustomPipeline - LGBM,0.6451612903225806,0.7659574468085106,0.6666666666666666,0.625,0.8125,0.8387096774193549,0.7429435483870968,0.7034998598523355
9,CustomPipeline - LBFGS Logistic Regression,0.6296296296296297,0.7872340425531915,0.7727272727272727,0.53125,0.7916666666666666,0.9193548387096774,0.7772177419354839,0.7458687196208998


### Stochastic Dual Coordinate Ascent Logistic Regression

In [58]:
// Restrict AutoML to the SDCA logistic regression trainer so we can evaluate it on its own
var classifier = context.Auto().BinaryClassification(
    useFastForest: false, 
    useLgbm: false, 
    useFastTree: false, 
    useLbfgsLogisticRegression: false, 
    useSdcaLogisticRegression: true);

// Build a pipeline with the classifier appended
SweepablePipeline pipeline = basePipeline
    .Append(classifier);

// Now let's run our experiment using our custom pipeline
MLContext context = new(seed: seed) {
    GpuDeviceId = 0,
    FallbackToCpu = true,
};
var experiment = context.Auto().CreateExperiment()
    .SetPipeline(pipeline)
    .SetDataset(split.TrainSet, fold: 5) // Cross-validation on the 80% training split
    .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label")
    .SetMaxModelToExplore(10);

var result = await experiment.RunAsync();

Console.WriteLine($"F1 Score during training: {result.Metric}");

// Let's see how it performed
ITransformer model = result.Model;
var evalResults = context.BinaryClassification.EvaluateNonCalibrated(model.Transform(split.TestSet), labelColumnName: "Label");
MLCharts.ClassificationReport(evalResults)

F1 Score during training: 0


In [59]:
// Record the model
modelTracker.Register("CustomPipeline - SDCA Logistic Regression", evalResults).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1.0,0.4705882352941176,0.7352941176470589,1.0,0.8211764705882353,0.8290542115376778
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557
2,Simple AutoML - Additional Training Time,0.6571428571428571,0.8356164383561644,1.0,0.4893617021276595,0.8048780487804879,1.0,0.8233397807865893,0.7960885789039365
3,Simple AutoML - Cross Validation,0.25,0.7096774193548387,0.5,0.1666666666666666,0.7321428571428571,0.9318181818181818,0.7436868686868687,0.5096252184860618
4,TextFeaturizerAuto,0.6530612244897959,0.8191489361702128,0.9411764705882352,0.5,0.7922077922077922,0.9838709677419356,0.828125,0.8101232634508131
5,CustomPipelineAuto,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.8001512096774194,0.7660228021913676
6,CustomPipeline - Fast Tree,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
7,CustomPipeline - Fast Forest,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
8,CustomPipeline - LGBM,0.6451612903225806,0.7659574468085106,0.6666666666666666,0.625,0.8125,0.8387096774193549,0.7429435483870968,0.7034998598523355
9,CustomPipeline - LBFGS Logistic Regression,0.6296296296296297,0.7872340425531915,0.7727272727272727,0.53125,0.7916666666666666,0.9193548387096774,0.7772177419354839,0.7458687196208998


## Other Models Considered

### Initial Phi-3 Classifier

In [60]:
modelTracker.Register("Phi-3 LLM Classifier", 140, 5, 106, 249).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1.0,0.4705882352941176,0.7352941176470589,1.0,0.8211764705882353,0.8290542115376778
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557
2,Simple AutoML - Additional Training Time,0.6571428571428571,0.8356164383561644,1.0,0.4893617021276595,0.8048780487804879,1.0,0.8233397807865893,0.7960885789039365
3,Simple AutoML - Cross Validation,0.25,0.7096774193548387,0.5,0.1666666666666666,0.7321428571428571,0.9318181818181818,0.7436868686868687,0.5096252184860618
4,TextFeaturizerAuto,0.6530612244897959,0.8191489361702128,0.9411764705882352,0.5,0.7922077922077922,0.9838709677419356,0.828125,0.8101232634508131
5,CustomPipelineAuto,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.8001512096774194,0.7660228021913676
6,CustomPipeline - Fast Tree,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
7,CustomPipeline - Fast Forest,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
8,CustomPipeline - LGBM,0.6451612903225806,0.7659574468085106,0.6666666666666666,0.625,0.8125,0.8387096774193549,0.7429435483870968,0.7034998598523355
9,CustomPipeline - LBFGS Logistic Regression,0.6296296296296297,0.7872340425531915,0.7727272727272727,0.53125,0.7916666666666666,0.9193548387096774,0.7772177419354839,0.7458687196208998


### Fine-Tuned RoBERTa Model

In [61]:
modelTracker.Register("Fine-Tuned Roberta", 8, 24, 3, 59).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1.0,0.4705882352941176,0.7352941176470589,1.0,0.8211764705882353,0.8290542115376778
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557
2,Simple AutoML - Additional Training Time,0.6571428571428571,0.8356164383561644,1.0,0.4893617021276595,0.8048780487804879,1.0,0.8233397807865893,0.7960885789039365
3,Simple AutoML - Cross Validation,0.25,0.7096774193548387,0.5,0.1666666666666666,0.7321428571428571,0.9318181818181818,0.7436868686868687,0.5096252184860618
4,TextFeaturizerAuto,0.6530612244897959,0.8191489361702128,0.9411764705882352,0.5,0.7922077922077922,0.9838709677419356,0.828125,0.8101232634508131
5,CustomPipelineAuto,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.8001512096774194,0.7660228021913676
6,CustomPipeline - Fast Tree,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
7,CustomPipeline - Fast Forest,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
8,CustomPipeline - LGBM,0.6451612903225806,0.7659574468085106,0.6666666666666666,0.625,0.8125,0.8387096774193549,0.7429435483870968,0.7034998598523355
9,CustomPipeline - LBFGS Logistic Regression,0.6296296296296297,0.7872340425531915,0.7727272727272727,0.53125,0.7916666666666666,0.9193548387096774,0.7772177419354839,0.7458687196208998


### Azure Machine Learning Automated ML

In [62]:
modelTracker.Register("Azure ML Studio Automated ML", 63, 82, 4, 350).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1.0,0.4705882352941176,0.7352941176470589,1.0,0.8211764705882353,0.8290542115376778
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557
2,Simple AutoML - Additional Training Time,0.6571428571428571,0.8356164383561644,1.0,0.4893617021276595,0.8048780487804879,1.0,0.8233397807865893,0.7960885789039365
3,Simple AutoML - Cross Validation,0.25,0.7096774193548387,0.5,0.1666666666666666,0.7321428571428571,0.9318181818181818,0.7436868686868687,0.5096252184860618
4,TextFeaturizerAuto,0.6530612244897959,0.8191489361702128,0.9411764705882352,0.5,0.7922077922077922,0.9838709677419356,0.828125,0.8101232634508131
5,CustomPipelineAuto,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.8001512096774194,0.7660228021913676
6,CustomPipeline - Fast Tree,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
7,CustomPipeline - Fast Forest,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
8,CustomPipeline - LGBM,0.6451612903225806,0.7659574468085106,0.6666666666666666,0.625,0.8125,0.8387096774193549,0.7429435483870968,0.7034998598523355
9,CustomPipeline - LBFGS Logistic Regression,0.6296296296296297,0.7872340425531915,0.7727272727272727,0.53125,0.7916666666666666,0.9193548387096774,0.7772177419354839,0.7458687196208998


In [63]:
modelTracker.Register("Azure ML Studio Automated ML w. Deep Learning", 18, 26, 1, 105).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1.0,0.4705882352941176,0.7352941176470589,1.0,0.8211764705882353,0.8290542115376778
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557
2,Simple AutoML - Additional Training Time,0.6571428571428571,0.8356164383561644,1.0,0.4893617021276595,0.8048780487804879,1.0,0.8233397807865893,0.7960885789039365
3,Simple AutoML - Cross Validation,0.25,0.7096774193548387,0.5,0.1666666666666666,0.7321428571428571,0.9318181818181818,0.7436868686868687,0.5096252184860618
4,TextFeaturizerAuto,0.6530612244897959,0.8191489361702128,0.9411764705882352,0.5,0.7922077922077922,0.9838709677419356,0.828125,0.8101232634508131
5,CustomPipelineAuto,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.8001512096774194,0.7660228021913676
6,CustomPipeline - Fast Tree,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
7,CustomPipeline - Fast Forest,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
8,CustomPipeline - LGBM,0.6451612903225806,0.7659574468085106,0.6666666666666666,0.625,0.8125,0.8387096774193549,0.7429435483870968,0.7034998598523355
9,CustomPipeline - LBFGS Logistic Regression,0.6296296296296297,0.7872340425531915,0.7727272727272727,0.53125,0.7916666666666666,0.9193548387096774,0.7772177419354839,0.7458687196208998


### SciKit-Learn Models

In [64]:
var dfSkLearn = DataFrame.LoadCsv("data/sklearn_models.csv", separator: ',', header: true);
dfSkLearn

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,SciKit-Learn RandomForest,0.11655914,0.72743434,1.0,0.062068965,0.722449,1.0,0.79256225,0.72843164
1,SciKit-Learn SVC,0.5151479,0.80357575,0.89830506,0.36551723,0.7909091,0.9830508,0.8224089,0.7444055
2,SciKit-Learn LogisticRegression,0.21262464,0.74343437,0.9047619,0.13103448,0.7364017,0.99435025,0.79452163,0.6942291
3,SciKit-Learn MultinomialNB,0.42165935,0.7434949,0.60240966,0.3448276,0.77163464,0.90677965,0.7541789,0.61116856
4,SciKit-Learn KNeighbors,0.052473117,0.71741414,1.0,0.027586207,0.7151515,1.0,0.6781059,0.46135667
5,SciKit-Learn MLP,0.33737585,0.74949497,0.6923077,0.24827586,0.75615215,0.9548023,0.76464856,0.6409752


In [65]:
modelTracker.Merge(dfSkLearn).ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1.0,0.4705882352941176,0.7352941176470589,1.0,0.8211764705882353,0.8290542115376778
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557
2,Simple AutoML - Additional Training Time,0.6571428571428571,0.8356164383561644,1.0,0.4893617021276595,0.8048780487804879,1.0,0.8233397807865893,0.7960885789039365
3,Simple AutoML - Cross Validation,0.25,0.7096774193548387,0.5,0.1666666666666666,0.7321428571428571,0.9318181818181818,0.7436868686868687,0.5096252184860618
4,TextFeaturizerAuto,0.6530612244897959,0.8191489361702128,0.9411764705882352,0.5,0.7922077922077922,0.9838709677419356,0.828125,0.8101232634508131
5,CustomPipelineAuto,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.8001512096774194,0.7660228021913676
6,CustomPipeline - Fast Tree,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
7,CustomPipeline - Fast Forest,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
8,CustomPipeline - LGBM,0.6451612903225806,0.7659574468085106,0.6666666666666666,0.625,0.8125,0.8387096774193549,0.7429435483870968,0.7034998598523355
9,CustomPipeline - LBFGS Logistic Regression,0.6296296296296297,0.7872340425531915,0.7727272727272727,0.53125,0.7916666666666666,0.9193548387096774,0.7772177419354839,0.7458687196208998


## Model Decision-Making

In [66]:
var dfModels = modelTracker.ToDataFrame();

DataFrame.SaveCsv(dfModels, "data/InitialModelsEvaluated.csv", separator: ',', header: true);

dfModels

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
0,Simple AutoML - 10% Test Split,0.6399999999999999,0.7857142857142857,1.0,0.4705882352941176,0.7352941176470589,1.0,0.8211764705882353,0.8290542115376778
1,Simple AutoML - 30% Test Split,0.6478873239436619,0.8287671232876712,0.9583333333333334,0.4893617021276595,0.8032786885245902,0.98989898989899,0.8457984096281969,0.8049616280288557
2,Simple AutoML - Additional Training Time,0.6571428571428571,0.8356164383561644,1.0,0.4893617021276595,0.8048780487804879,1.0,0.8233397807865893,0.7960885789039365
3,Simple AutoML - Cross Validation,0.25,0.7096774193548387,0.5,0.1666666666666666,0.7321428571428571,0.9318181818181818,0.7436868686868687,0.5096252184860618
4,TextFeaturizerAuto,0.6530612244897959,0.8191489361702128,0.9411764705882352,0.5,0.7922077922077922,0.9838709677419356,0.828125,0.8101232634508131
5,CustomPipelineAuto,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.8001512096774194,0.7660228021913676
6,CustomPipeline - Fast Tree,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
7,CustomPipeline - Fast Forest,0.6551724137931033,0.7872340425531915,0.7307692307692307,0.59375,0.8088235294117647,0.8870967741935484,0.7920866935483871,0.7585989334375653
8,CustomPipeline - LGBM,0.6451612903225806,0.7659574468085106,0.6666666666666666,0.625,0.8125,0.8387096774193549,0.7429435483870968,0.7034998598523355
9,CustomPipeline - LBFGS Logistic Regression,0.6296296296296297,0.7872340425531915,0.7727272727272727,0.53125,0.7916666666666666,0.9193548387096774,0.7772177419354839,0.7458687196208998


Looks like my best models in terms of F1 Score are:

1. Phi-3 LLM
2. Fast Forest
3. Fast Tree
4. LGBM
5. LBFGS Logistic Regression

_Note: the Simple AutoML models are excluded from this list because they are just other instances of Fast Forest and Fast Tree models._

F1 Score is good for building a shortlist of models to consider, but precision (specifically Positive Precision) is critical for identifying models that will generate reliable positive predictions. I care less about Positive Recall, partly because it already feeds into the F1 Score and partly because false negatives are less costly than false positives in this analysis scenario.
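
The relationship between these metrics can be made concrete with a short calculation. The sketch below is not part of the original pipeline: `FBeta` is a hypothetical helper, and the precision and recall values are copied from the CustomPipeline - Fast Forest row above. With `beta` set to 1 it should reproduce that row's F1 Score (about 0.655). A `beta` below 1 (0.5 here, an arbitrary choice) weights precision more heavily than recall, which is one possible way to encode the preference for reliable positive predictions.

```csharp
using System;

// F-beta score: a weighted harmonic mean of precision and recall.
// beta = 1 gives the usual F1 Score; beta < 1 favors precision over recall.
double FBeta(double precision, double recall, double beta = 1.0)
{
    double b2 = beta * beta;
    double denominator = b2 * precision + recall;
    return denominator == 0 ? 0 : (1 + b2) * precision * recall / denominator;
}

// Positive Precision and Positive Recall from the "CustomPipeline - Fast Forest" row
double fastForestPrecision = 0.7307692307692307;
double fastForestRecall = 0.59375;

Console.WriteLine($"F1:   {FBeta(fastForestPrecision, fastForestRecall):F4}");
Console.WriteLine($"F0.5: {FBeta(fastForestPrecision, fastForestRecall, beta: 0.5):F4}");
```

Because this model's precision is higher than its recall, the precision-weighted score comes out above its F1 Score.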

I don't want to use the Simple AutoML or TextFeaturizerAuto models because they are likely to fixate on stop words or repository-specific strings such as common URLs or maintainer names. The custom pipeline addresses this by obscuring that information, at the cost of slightly lower overall metrics.

Looking at Positive Precision, and limiting the candidates to custom pipeline models, the best high-precision models that also have high F1 Scores are:

1. Fast Forest
2. LBFGS Logistic Regression
3. Fast Tree
4. LGBM

I suspect that some of these models need additional training time and hyperparameter tuning to reach their peak performance, so the remainder of this notebook focuses on developing final versions of these four models for comparison.

## Developing Final Models

### Fast Forest

In [67]:
SearchSpace<FastForestOption> searchSpace = new();
searchSpace["NumberOfTrees"] = new UniformIntOption(3, 12, defaultValue: 4);
searchSpace["NumberOfLeaves"] = new UniformIntOption(10, 100, defaultValue: 10);
searchSpace["FeatureFraction"] = new UniformDoubleOption(0.5, 0.9, defaultValue: 0.78);

using (ContextMonitor monitor = context.Monitor())
{ 
    SweepableEstimator estimator = context.Auto().CreateSweepableEstimator((context, options) => 
        context.BinaryClassification.Trainers.FastForest(new FastForestBinaryTrainer.Options() {
            NumberOfTrees = options.NumberOfTrees,
            NumberOfLeaves = options.NumberOfLeaves,
            FeatureFraction = options.FeatureFraction
        }), searchSpace);

    var result = await context.Auto().CreateExperiment()
        .SetPipeline(basePipeline.Append(estimator))
        .SetDataset(split.TrainSet, fold: 5) // Cross-validation using 90% of the data
        .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label", predictedColumn: "Label")
        .SetMaxModelToExplore(10)
        .SetEciCostFrugalTuner()
        .RunAsync();

    // Let's see how it performed
    ITransformer model = result.Model;
    var evalResults = context.BinaryClassification.EvaluateNonCalibrated(model.Transform(split.TestSet), labelColumnName: "Label", predictedLabelColumnName: "Label");

    modelTracker.Register("Hyperparameter-Tuned FastForest", evalResults);
    var chart = MLCharts.ClassificationReport(evalResults);
    chart.Display();

    // Display the Hyperparameters for our best trial
    Console.WriteLine("Hyperparameters:\r\n");
    foreach (var kvp in monitor.BestTrial.Hyperparameters) {
        Console.WriteLine($"{kvp.Key}: {kvp.Value}");
    }

    MLCharts.MetricImprovementWithTrials(monitor).Display();

    // Calculate the permutation feature importance and list the top 10 features by their autogenerated names
    context.BinaryClassification.PermutationFeatureImportanceNonCalibrated(model, model.Transform(df.Sample(50)), permutationCount:1)
            .OrderByDescending(f => Math.Abs(f.Value.F1Score.Mean))
            .Where(f => Math.Abs(f.Value.F1Score.Mean) > 0)
            .Take(10)
            .Select(f => new {Feature=f.Key, Impact=f.Value.F1Score.Mean}).Display();    
}

Hyperparameters:

FeatureFraction: 0.7630911693650682
NumberOfLeaves: 10
NumberOfTrees: 3


index,Feature,Impact
0,MessageChars.i|x,-0.16521739130434782
1,MessageChars.b,0.06666666666666665
2,MessageChars.<␠>|n|<␠>,-0.02857142857142847


### LBFGS Logistic Regression

In [68]:
SearchSpace<LbfgsOption> searchSpace = new();
searchSpace["L1Regularization"] = new UniformDoubleOption(0, 1, defaultValue: 0.07);
searchSpace["L2Regularization"] = new UniformDoubleOption(0, 1, defaultValue: 0.005);

using (ContextMonitor monitor = context.Monitor())
{ 
    SweepableEstimator estimator = context.Auto().CreateSweepableEstimator((context, options) => 
        context.BinaryClassification.Trainers.LbfgsLogisticRegression(new LbfgsLogisticRegressionBinaryTrainer.Options() {
            L1Regularization = options.L1Regularization,
            L2Regularization = options.L2Regularization
        }), searchSpace);        

    var result = await context.Auto().CreateExperiment()
        .SetPipeline(basePipeline.Append(estimator))
        .SetDataset(split.TrainSet, fold: 5) // Cross-validation using 90% of the data
        .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label", predictedColumn: "Label")
        .SetMaxModelToExplore(50)
        .SetEciCostFrugalTuner()
        .RunAsync();

    // Let's see how it performed
    ITransformer model = result.Model;
    var evalResults = context.BinaryClassification.EvaluateNonCalibrated(model.Transform(split.TestSet), labelColumnName: "Label", predictedLabelColumnName: "Label");

    modelTracker.Register("Hyperparameter-Tuned Logistic Regression", evalResults).ToDataFrame();
    var chart = MLCharts.ClassificationReport(evalResults);
    chart.Display();

    // Display the Hyperparameters for our best trial
    Console.WriteLine("Hyperparameters:\r\n");
    foreach (var kvp in monitor.BestTrial.Hyperparameters) {
        Console.WriteLine($"{kvp.Key}: {kvp.Value}");
    }

    MLCharts.MetricImprovementWithTrials(monitor).Display();

    // Calculate the permutation feature importance and list the top 10 features by their autogenerated names
    context.BinaryClassification.PermutationFeatureImportanceNonCalibrated(model, model.Transform(df.Sample(50)), permutationCount:1)
            .OrderByDescending(f => Math.Abs(f.Value.F1Score.Mean))
            .Where(f => Math.Abs(f.Value.F1Score.Mean) > 0)
            .Take(10)
            .Select(f => new {Feature=f.Key, Impact=f.Value.F1Score.Mean}).Display();     
}

Hyperparameters:

L1Regularization: 0.03702414667604828
L2Regularization: 0


index,Feature,Impact
0,MessageWords.fix,-0.0942857142857143
1,MessageWords.bug,-0.07999999999999996
2,WordCount,-0.046666666666666745
3,MessageChars.r|o,-0.03384615384615386
4,MessageChars.s|s,-0.03384615384615386
5,MessageWords.polish|getcolumn,-0.03384615384615386


### LightGBM

In [69]:
public class HypertuningParameters {
    public int NumberOfLeaves { get; set; }
    public int NumberOfTrees { get; set; }
    public double LearningRate { get; set; }
    public int MinimumExampleCountPerLeaf { get; set; }
}

SearchSpace<HypertuningParameters> searchSpace = new();
searchSpace["NumberOfLeaves"] = new UniformIntOption(5, 30, defaultValue: 10);
searchSpace["NumberOfTrees"] = new UniformIntOption(1, 12, defaultValue: 3);
searchSpace["LearningRate"] = new UniformDoubleOption(0, 1, defaultValue: 0.5);
searchSpace["MinimumExampleCountPerLeaf"] = new UniformIntOption(0, 10, defaultValue: 5);

using (ContextMonitor monitor = context.Monitor())
{ 
    SweepableEstimator estimator = context.Auto().CreateSweepableEstimator((context, options) => 
        context.BinaryClassification.Trainers.LightGbm(new Microsoft.ML.Trainers.LightGbm.LightGbmBinaryTrainer.Options() {
            NumberOfLeaves = options.NumberOfLeaves,
            MinimumExampleCountPerLeaf = options.MinimumExampleCountPerLeaf,
            LearningRate = options.LearningRate,
            NumberOfIterations = options.NumberOfTrees
        }), searchSpace);        

    var result = await context.Auto().CreateExperiment()
        .SetPipeline(basePipeline.Append(estimator))
        .SetDataset(split.TrainSet, fold: 5) // Cross-validation using 90% of the data
        .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label", predictedColumn: "Label")
        .SetMaxModelToExplore(10)
        .SetEciCostFrugalTuner()
        .RunAsync();

    // Let's see how it performed
    ITransformer model = result.Model;
    var evalResults = context.BinaryClassification.EvaluateNonCalibrated(model.Transform(split.TestSet), labelColumnName: "Label", predictedLabelColumnName: "Label");

    modelTracker.Register("Hyperparameter-Tuned Light GBM", evalResults).ToDataFrame();
    var chart = MLCharts.ClassificationReport(evalResults);
    chart.Display();

    // Display the Hyperparameters for our best trial
    Console.WriteLine("Hyperparameters:\r\n");
    foreach (var kvp in monitor.BestTrial.Hyperparameters) {
        Console.WriteLine($"{kvp.Key}: {kvp.Value}");
    }

    MLCharts.MetricImprovementWithTrials(monitor).Display();

    // Calculate the permutation feature importance and list the top 10 features by their autogenerated names
    context.BinaryClassification.PermutationFeatureImportanceNonCalibrated(model, model.Transform(df.Sample(50)), permutationCount:1)
            .OrderByDescending(f => Math.Abs(f.Value.F1Score.Mean))
            .Where(f => Math.Abs(f.Value.F1Score.Mean) > 0)
            .Take(10)
            .Select(f => new {Feature=f.Key, Impact=f.Value.F1Score.Mean}).Display();    
}

Hyperparameters:

LearningRate: 0.5
MinimumExampleCountPerLeaf: 5
NumberOfLeaves: 10
NumberOfTrees: 3


index,Feature,Impact
0,MessageChars.i|x,-0.24086021505376354
1,MessageChars.<␠>|n|<␠>,-0.10752688172043023
2,MessageChars.r|e|o,-0.08453837597330371
3,MessageChars.<␠>|t,-0.06451612903225823
4,MessageChars.i|m|e,0.06451612903225801
5,MessageChars.e|<␠>|b,-0.04086021505376347
6,MessageChars.t|a,-0.04086021505376347
7,MessageChars.r|a,0.02580645161290307
8,MessageChars.r|r|r,-0.024193548387096864
9,MessageChars.r|o|r,-0.024193548387096864


### Linear SVM

In [70]:
public class HypertuningParameters {
    public float Lambda {get; set;}
    public int BatchSize { get; set; }
    public bool NoBias {get; set;}
    public bool Shuffle {get; set;}
    public bool PerformProjection {get; set;}
}

SearchSpace<HypertuningParameters> searchSpace = new();
searchSpace["Lambda"] = new UniformDoubleOption(1E-06f, 1, defaultValue: 0.0001f, logBase: true);
searchSpace["BatchSize"] = new UniformIntOption(1, 128, defaultValue: 1);
searchSpace["NoBias"] = new ChoiceOption(true, false);
searchSpace["Shuffle"] = new ChoiceOption(true, false);
searchSpace["PerformProjection"] = new ChoiceOption(true, false);

using (ContextMonitor monitor = context.Monitor())
{ 
    SweepableEstimator estimator = context.Auto().CreateSweepableEstimator((context, options) => 
        context.BinaryClassification.Trainers.LinearSvm(new Microsoft.ML.Trainers.LinearSvmTrainer.Options() {
            Lambda = options.Lambda,
            BatchSize = options.BatchSize,
            NoBias = options.NoBias,
            Shuffle = options.Shuffle,
            PerformProjection = options.PerformProjection
        }), searchSpace);        

    var result = await context.Auto().CreateExperiment()
        .SetPipeline(basePipeline.Append(estimator))
        .SetDataset(split.TrainSet, fold: 5) // Cross-validation using 90% of the data
        .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label", predictedColumn: "Label")
        .SetMaxModelToExplore(10)
        .SetEciCostFrugalTuner()
        .RunAsync();

    // Let's see how it performed
    ITransformer model = result.Model;
    var evalResults = context.BinaryClassification.EvaluateNonCalibrated(model.Transform(split.TestSet), labelColumnName: "Label", predictedLabelColumnName: "Label");

    modelTracker.Register("Hyperparameter-Tuned Linear SVM", evalResults).ToDataFrame();
    var chart = MLCharts.ClassificationReport(evalResults);
    chart.Display();

    // Display the Hyperparameters for our best trial
    Console.WriteLine("Hyperparameters:\r\n");
    foreach (var kvp in monitor.BestTrial.Hyperparameters) {
        Console.WriteLine($"{kvp.Key}: {kvp.Value}");
    }

    MLCharts.MetricImprovementWithTrials(monitor).Display();

    // Calculate the permutation feature importance and list the top 10 features by their autogenerated names
    context.BinaryClassification.PermutationFeatureImportanceNonCalibrated(model, model.Transform(df.Sample(50)), permutationCount:1)
            .OrderByDescending(f => Math.Abs(f.Value.F1Score.Mean))
            .Where(f => Math.Abs(f.Value.F1Score.Mean) > 0)
            .Take(10)
            .Select(f => new {Feature=f.Key, Impact=f.Value.F1Score.Mean}).Display();    
}

Hyperparameters:

BatchSize: 1
Lambda: 0.00014314041558061833
NoBias: True
PerformProjection: True
Shuffle: True
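
The `Lambda` option above passes `logBase: true`. My understanding is that this samples the range on a logarithmic scale rather than a linear one, so that very small values get as much attention as large ones. The sketch below illustrates that idea in plain C#. `LogUniform` and `Uniform` are hypothetical helpers written for illustration and are not the `Microsoft.ML.SearchSpace` implementation.

```csharp
using System;

// Maps a unit-interval sample u in [0, 1] onto [min, max] on a logarithmic scale.
double LogUniform(double u, double min, double max) =>
    Math.Exp(Math.Log(min) + u * (Math.Log(max) - Math.Log(min)));

// The same mapping on a linear scale, for comparison.
double Uniform(double u, double min, double max) => min + u * (max - min);

// Walk across the unit interval and compare both mappings over the Lambda range [1E-06, 1]
foreach (double fraction in new[] { 0.0, 0.25, 0.5, 0.75, 1.0 })
{
    double linear = Uniform(fraction, 1e-6, 1);
    double logarithmic = LogUniform(fraction, 1e-6, 1);
    Console.WriteLine($"u={fraction:F2}  linear={linear:E2}  log={logarithmic:E2}");
}
```

On a linear scale, half of all samples would land above 0.5. On the log scale, the midpoint of the range is 0.001, so the tuner spends far more trials on small regularization values.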


index,Feature,Impact
0,WordCount,-0.10833333333333339
1,NetLines,0.06666666666666676
2,WorkItems,-0.04367816091954024
3,MessageLength,-0.04367816091954024
4,MessageWords.update,-0.04367816091954024
5,MessageChars.o,0.04086021505376336
6,MessageChars.m,0.04086021505376336
7,AddedFiles,0.04086021505376336
8,MessageChars.r,0.04086021505376336
9,MessageChars.a,0.04086021505376336


### Averaged Perceptron

In [71]:
public class HypertuningParameters {
    public float LearningRate {get; set;}
    public bool DecreaseLearningRate {get; set;}
    public float L2Regularization {get; set;}
}

SearchSpace<HypertuningParameters> searchSpace = new();
searchSpace["LearningRate"] = new UniformDoubleOption(0.0001f, 1f, defaultValue: 0.98f, logBase: true);
searchSpace["L2Regularization"] = new UniformDoubleOption(0, 5, defaultValue: 0f);
searchSpace["DecreaseLearningRate"] = new ChoiceOption(true, false);

using (ContextMonitor monitor = context.Monitor())
{ 
    SweepableEstimator estimator = context.Auto().CreateSweepableEstimator((context, options) => 
        context.BinaryClassification.Trainers.AveragedPerceptron(new Microsoft.ML.Trainers.AveragedPerceptronTrainer.Options() {
            LearningRate = options.LearningRate,
            L2Regularization = options.L2Regularization,
            DecreaseLearningRate = options.DecreaseLearningRate
        }), searchSpace);        

    var result = await context.Auto().CreateExperiment()
        .SetPipeline(basePipeline.Append(estimator))
        .SetDataset(split.TrainSet, fold: 5) // Cross-validation using 90% of the data
        .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label", predictedColumn: "Label")
        .SetMaxModelToExplore(50)
        .SetEciCostFrugalTuner()
        .RunAsync();

    // Let's see how it performed
    ITransformer model = result.Model;
    var evalResults = context.BinaryClassification.EvaluateNonCalibrated(model.Transform(split.TestSet), labelColumnName: "Label", predictedLabelColumnName: "Label");

    modelTracker.Register("Hyperparameter-Tuned Averaged Perceptron", evalResults).ToDataFrame();
    var chart = MLCharts.ClassificationReport(evalResults);
    chart.Display();

    // Display the Hyperparameters for our best trial
    Console.WriteLine("Hyperparameters:\r\n");
    foreach (var kvp in monitor.BestTrial.Hyperparameters) {
        Console.WriteLine($"{kvp.Key}: {kvp.Value}");
    }

    MLCharts.MetricImprovementWithTrials(monitor).Display();

    // Calculate the permutation feature importance and list the top 10 features by their autogenerated names
    context.BinaryClassification.PermutationFeatureImportanceNonCalibrated(model, model.Transform(df.Sample(50)), permutationCount:1)
            .OrderByDescending(f => Math.Abs(f.Value.F1Score.Mean))
            .Where(f => Math.Abs(f.Value.F1Score.Mean) > 0)
            .Take(10)
            .Select(f => new {Feature=f.Key, Impact=f.Value.F1Score.Mean}).Display();    
}

Hyperparameters:

DecreaseLearningRate: True
L2Regularization: 0
LearningRate: 0.9909187220538312


index,Feature,Impact
0,NetLines,-0.13250517598343692
1,WordCount,-0.0869565217391306
2,MessageLength,-0.0869565217391306
3,MessageWords.fix,-0.0869565217391306
4,MessageWords.dependencies,-0.06324110671936767
5,MessageWords.add,-0.06324110671936767
6,MessageWords.readonly,-0.06324110671936767
7,HasAddedFiles,-0.06324110671936767
8,MessageWords.fixes,0.05797101449275355
9,MessageChars.i|x,0.05797101449275355


### Field-Aware Factorization Machine

In [72]:
public class HypertuningParameters {
    public float LearningRate {get; set;}
    public int NumberOfIterations {get; set;}
    public int LatentDimension {get; set;}
    public float LambdaLinear {get; set;}
    public float LambdaLatent {get; set;}
    public float Radius {get; set;}
}

SearchSpace<HypertuningParameters> searchSpace = new();
searchSpace["LearningRate"] = new UniformDoubleOption(0.0001f, 1f, defaultValue: 0.98f, logBase: true);
searchSpace["NumberOfIterations"] = new UniformIntOption(0, 20, defaultValue: 5);
searchSpace["LatentDimension"] = new UniformIntOption(4, 100, defaultValue: 20);
searchSpace["LambdaLinear"] = new UniformDoubleOption(1E-08f, 1, defaultValue: 0.0001f);
searchSpace["LambdaLatent"] = new UniformDoubleOption(1E-08f, 1, defaultValue: 0.0001f);
searchSpace["Radius"] = new UniformDoubleOption(0.1, 1, defaultValue: 0.5f);

using (ContextMonitor monitor = context.Monitor())
{ 
    SweepableEstimator estimator = context.Auto().CreateSweepableEstimator((context, options) => 
        context.BinaryClassification.Trainers.FieldAwareFactorizationMachine(new Microsoft.ML.Trainers.FieldAwareFactorizationMachineTrainer.Options() {
            LearningRate = options.LearningRate,
            NumberOfIterations = options.NumberOfIterations,
            LatentDimension = options.LatentDimension,
            LambdaLinear = options.LambdaLinear,
            LambdaLatent = options.LambdaLatent,
            Radius = options.Radius
        }), searchSpace);        

    var result = await context.Auto().CreateExperiment()
        .SetPipeline(basePipeline.Append(estimator))
        .SetDataset(split.TrainSet, fold: 5) // Cross-validation using 90% of the data
        .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label", predictedColumn: "Label")
        .SetMaxModelToExplore(10)
        .SetEciCostFrugalTuner()
        .RunAsync();

    // Let's see how it performed
    ITransformer model = result.Model;
    var evalResults = context.BinaryClassification.EvaluateNonCalibrated(model.Transform(split.TestSet), labelColumnName: "Label", predictedLabelColumnName: "Label");

    modelTracker.Register("Hyperparameter-Tuned Field-Aware Factorization Machine", evalResults).ToDataFrame();
    var chart = MLCharts.ClassificationReport(evalResults);
    chart.Display();

    // Display the Hyperparameters for our best trial
    Console.WriteLine("Hyperparameters:\r\n");
    foreach (var kvp in monitor.BestTrial.Hyperparameters) {
        Console.WriteLine($"{kvp.Key}: {kvp.Value}");
    }

    MLCharts.MetricImprovementWithTrials(monitor).Display();

    // Note: Field-Aware Factorization Machine does not support permutation feature importance
}

Hyperparameters:

LambdaLatent: 9.99999993922529E-09
LambdaLinear: 0.02503582141143403
LatentDimension: 10
LearningRate: 0.9999999078965999
NumberOfIterations: 7
Radius: 0.48538832029952605


### Local Deep Support Vector Machine (LD-SVM)

[Research talk: Local Deep Kernel Learning for Efficient Non-linear SVM Prediction](https://www.microsoft.com/en-us/research/video/local-deep-kernel-learning-for-efficient-non-linear-svm-prediction-2/)

In [73]:
/*
public class HypertuningParameters {
    public int TreeDepth {get; set;}
    public float LambdaW {get; set;}
    public float LambdaTheta {get; set;}
    public float LambdaThetaprime {get; set;}
    public float Sigma {get; set;}
    public bool UseBias {get; set;}
    public int NumberOfIterations {get; set;}
}

SearchSpace<HypertuningParameters> searchSpace = new();
searchSpace["TreeDepth"] = new UniformIntOption(1, 128, defaultValue: 3);
searchSpace["LambdaW"] = new UniformDoubleOption(0.0001f, 1f, defaultValue: 0.1f, logBase: true);
searchSpace["LambdaTheta"] = new UniformDoubleOption(0.0001f, 1f, defaultValue: 0.01f, logBase: true);
searchSpace["LambdaThetaprime"] = new UniformDoubleOption(0.0001f, 1f, defaultValue: 0.01f, logBase: true);
searchSpace["Sigma"] = new UniformDoubleOption(0.0001f, 1f, defaultValue: 1f, logBase: true);
searchSpace["UseBias"] = new ChoiceOption(true, false);
searchSpace["LearningRate"] = new UniformDoubleOption(0.0001f, 1f, defaultValue: 1f, logBase: true);
searchSpace["NumberOfIterations"] = new UniformIntOption(1, 1000, defaultValue: 150);
*/

using (ContextMonitor monitor = context.Monitor())
{ 
    /*
    SweepableEstimator estimator = context.Auto().CreateSweepableEstimator((context, options) => 
        context.BinaryClassification.Trainers.LdSvm(new LdSvmTrainer.Options() {
            TreeDepth = options.TreeDepth,
            LambdaW = options.LambdaW,
            LambdaTheta = options.LambdaTheta,
            LambdaThetaprime = options.LambdaThetaprime,
            Sigma = options.Sigma,
            UseBias = options.UseBias,
            NumberOfIterations = options.NumberOfIterations,
        }), searchSpace);        

    var result = await context.Auto().CreateExperiment()
        .SetPipeline(basePipeline.Append(estimator))
        .SetDataset(split.TrainSet, fold: 5) // Cross-validation using 90% of the data
        .SetBinaryClassificationMetric(BinaryClassificationMetric.F1Score, labelColumn: "Label", predictedColumn: "Label")
        .SetMaxModelToExplore(1)
        .SetEciCostFrugalTuner()
        .RunAsync();
        */

    var pipeline = basePipeline.Append(context.BinaryClassification.Trainers.LdSvm(new LdSvmTrainer.Options() {
        TreeDepth = 5,
        LambdaW = 0.1f,
        LambdaTheta = 0.01f,
        LambdaThetaprime = 0.01f,
        Sigma = 1f,
        UseBias = true,
        NumberOfIterations = 1500,
    }));

    var model = pipeline.Fit(split.TrainSet);

    // Let's see how it performed
    var evalResults = context.BinaryClassification.EvaluateNonCalibrated(model.Transform(split.TestSet), labelColumnName: "Label", predictedLabelColumnName: "Label");

    modelTracker.Register("Local Deep Learning Support Vector Machine", evalResults).ToDataFrame();
    var chart = MLCharts.ClassificationReport(evalResults);
    chart.Display();

    // Display the Hyperparameters for our best trial
    /*
    Console.WriteLine("Hyperparameters:\r\n");
    foreach (var kvp in monitor.BestTrial.Hyperparameters) {
        Console.WriteLine($"{kvp.Key}: {kvp.Value}");
    }
    */

    //MLCharts.MetricImprovementWithTrials(monitor).Display();
    /*
    context.BinaryClassification.PermutationFeatureImportanceNonCalibrated(model, model.Transform(df.Sample(50)), permutationCount:1)
        .OrderByDescending(f => Math.Abs(f.Value.F1Score.Mean))
        .Where(f => Math.Abs(f.Value.F1Score.Mean) > 0)
        .Take(10)
        .Select(f => new {Feature=f.Key, Impact=f.Value.F1Score.Mean}).Display();    
        */
}

In [74]:
    context.BinaryClassification.PermutationFeatureImportanceNonCalibrated(model, model.Transform(df), permutationCount:3)
        .OrderByDescending(f => Math.Abs(f.Value.F1Score.Mean))
        .Where(f => Math.Abs(f.Value.F1Score.Mean) > 0)
        .Take(10)
        .Select(f => new {Feature=f.Key, Impact=f.Value.F1Score.Mean}).Display();    
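
The `Impact` values reported in this section come from permutation feature importance: one feature column is shuffled, the model is re-scored, and the change in the metric is recorded. A negative value means the metric got worse once the feature was scrambled, which suggests the model relies on it. The sketch below shows the idea with a toy rule-based model and accuracy as the metric. Everything in it (`Predict`, `Accuracy`, `PermutationImpact`, and the random data) is a hypothetical stand-in, not ML.NET's implementation.

```csharp
using System;
using System.Linq;

// Toy "model": predicts true when feature 0 exceeds 0.5 and ignores feature 1 entirely.
bool Predict(double[] row) => row[0] > 0.5;

// Fraction of rows where the toy model agrees with the true label.
double Accuracy(double[][] rows, bool[] truth) =>
    rows.Select((row, i) => Predict(row) == truth[i] ? 1.0 : 0.0).Average();

// Change in accuracy after shuffling one feature column (negative = the model relied on it).
double PermutationImpact(double[][] rows, bool[] truth, int featureIndex, Random random)
{
    double baseline = Accuracy(rows, truth);

    // Copy the rows so the original data is left untouched
    double[][] shuffled = rows.Select(r => (double[])r.Clone()).ToArray();

    // Fisher-Yates shuffle applied to a single column
    for (int i = shuffled.Length - 1; i > 0; i--)
    {
        int j = random.Next(i + 1);
        (shuffled[i][featureIndex], shuffled[j][featureIndex]) =
            (shuffled[j][featureIndex], shuffled[i][featureIndex]);
    }

    return Accuracy(shuffled, truth) - baseline;
}

// Made-up data: two random features, with the label determined by feature 0 only
var rng = new Random(42);
double[][] data = Enumerable.Range(0, 200)
    .Select(_ => new[] { rng.NextDouble(), rng.NextDouble() })
    .ToArray();
bool[] labels = data.Select(r => r[0] > 0.5).ToArray();

Console.WriteLine($"Feature 0 impact: {PermutationImpact(data, labels, 0, rng):F3}");
Console.WriteLine($"Feature 1 impact: {PermutationImpact(data, labels, 1, rng):F3}");
```

Because the toy model ignores feature 1, shuffling it leaves accuracy unchanged, while shuffling feature 0 can only hurt. With only 50 sampled rows and a single permutation, as in the tuning cells above, individual impact values are noisy, so the repeated identical values in those tables likely reflect the small sample rather than genuinely equal importance.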


## Testing our Model

In [55]:
public class CommitInfo {
    public string Message { get; set; }
    public bool IsMerge {get; set;}
    public float WorkItems {get; set;}
    public float TotalFiles {get; set;}
    public float ModifiedFiles {get; set;}
    public float AddedFiles {get; set;}
    public float DeletedFiles {get; set;}
    public float TotalLines {get; set;}
    public float NetLines {get; set;}
    public float AddedLines {get; set;}
    public float DeletedLines {get; set;}
    public bool HasAddedFiles {get; set;}
    public bool HasDeletedFiles {get; set;}
    public float MessageLength {get; set;}
    public float WordCount {get; set;}
}

public class CommitClassification {
    public bool PredictedLabel { get; set; }
    public float Probability { get; set; }
}

var engine = context.Model.CreatePredictionEngine<CommitInfo, CommitClassification>(model);

In [56]:
var commit = new CommitInfo {
    Message = "Fixes issue #1234",
    IsMerge = false,
    WorkItems = 1,
    TotalFiles = 1,
    ModifiedFiles = 1,
    AddedFiles = 0,
    DeletedFiles = 0,
    TotalLines = 10,
    NetLines = 10,
    AddedLines = 0,
    DeletedLines = 0,
    HasAddedFiles = false,
    HasDeletedFiles = false,
    MessageLength = 16,
    WordCount = 3
};

var prediction = engine.Predict(commit);
prediction

Property,Value
PredictedLabel,True
Probability,0.565935


In [57]:
var commit = new CommitInfo {
    Message = "Added a Dark Theme",
    IsMerge = false,
    WorkItems = 1,
    TotalFiles = 8,
    ModifiedFiles = 2,
    AddedFiles = 6,
    DeletedFiles = 0,
    TotalLines = 196,
    NetLines = 55,
    AddedLines = 88,
    DeletedLines = 37,
    HasAddedFiles = true,
    HasDeletedFiles = false,
    MessageLength = 20,
    WordCount = 4
};

var prediction = engine.Predict(commit);
prediction

Property,Value
PredictedLabel,False
Probability,0.44746587


In [59]:
df["Source"].ValueCounts()

index,Values,Counts
0,dotnetinteractive,289
1,emergence,34
2,gitstractor,16
3,mlnet,146
4,wherewolf,14


In [66]:
Dictionary<string, DataFrame> dfsBySource = new();

foreach (var source in df["Source"].Cast<string>().Distinct()) {
    dfsBySource[source] = df.Filter(df["Source"].ElementwiseEquals(source));
}

// Display value counts by key
dfsBySource.Select(kvp => kvp.Key + ": " + kvp.Value.Rows.Count)

In [67]:
foreach (var kvp in dfsBySource) {
    string source = kvp.Key;
    DataFrame dfSource = kvp.Value;

    var sourceEval = context.BinaryClassification.EvaluateNonCalibrated(model.Transform(dfSource), labelColumnName: "Label", predictedLabelColumnName: "Label");
    Console.WriteLine($"Source: {source}");
    MLCharts.ClassificationReport(sourceEval).Display();
}


Source: dotnetinteractive


Source: emergence


Source: gitstractor


Source: mlnet


Source: wherewolf
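
The per-source charts are not preserved in this export. As a rough illustration of what a per-source breakdown computes, the sketch below groups labeled predictions by source and tabulates accuracy and Positive Precision with LINQ. The rows are made-up placeholder values, not results from this dataset, and the tuple field names are assumptions chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical (source, actual, predicted) triples; real values would come from the scored data.
var rows = new List<(string Source, bool Actual, bool Predicted)>
{
    ("mlnet", true, true),
    ("mlnet", false, true),
    ("mlnet", false, false),
    ("gitstractor", true, true),
    ("gitstractor", true, false),
};

var summary = rows
    .GroupBy(r => r.Source)
    .Select(g =>
    {
        int truePositives = g.Count(r => r.Actual && r.Predicted);
        int falsePositives = g.Count(r => !r.Actual && r.Predicted);
        int predictedPositives = truePositives + falsePositives;
        return new
        {
            Source = g.Key,
            Count = g.Count(),
            Accuracy = g.Count(r => r.Actual == r.Predicted) / (double)g.Count(),
            // Precision is undefined when nothing was predicted positive
            PositivePrecision = predictedPositives == 0
                ? double.NaN
                : truePositives / (double)predictedPositives
        };
    })
    .OrderBy(s => s.Source)
    .ToList();

foreach (var entry in summary)
{
    Console.WriteLine($"{entry.Source}: n={entry.Count}, accuracy={entry.Accuracy:F2}, precision={entry.PositivePrecision:F2}");
}
```

Reporting the row count next to each metric matters here, because the smaller sources (16 rows for gitstractor and 14 for wherewolf) can swing widely on a handful of predictions.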


### Accuracy by Source

In [52]:
prediction

Property,Value
Label,False
Prediction,False
Probability,0.565935


## Remaining Work

- [x] Accuracy by Source Calculation
- [ ] Number Replacement
- [ ] URL Replacement
- [ ] Username Replacement
- [x] SciKit-Learn Models
- [x] [Feature Contributions](https://learn.microsoft.com/en-us/dotnet/api/microsoft.ml.explainabilitycatalog.calculatefeaturecontribution?view=ml-dotnet)
- [x] Model Hyperparameter Tuning
- [ ] Better Hyperparameter Visualizations

In [76]:
modelTracker.ToDataFrame()

index,Model,F1 Score,Accuracy,Positive Precision,Positive Recall,Negative Precision,Negative Recall,AUC,AUCPR
