# ML.NET Advanced (Simple Exercise) — .NET Interactive Notebook

This notebook gives you a **minimal, clean C# setup** in Jupyter for ML.NET:

1) Create a tiny Vietnamese sentiment dataset
2) Train/Test split
3) Build a pipeline (FeaturizeText → NormalizeMinMax → Concatenate → SDCA)
4) Train & Evaluate (Accuracy, AUC, F1, etc.)
5) Save/Load model
6) Predict (single & batch)

> **Prerequisite**: install the .NET Interactive Jupyter kernel (one-time setup):
```bash
dotnet tool install -g Microsoft.dotnet-interactive
dotnet interactive jupyter install
```

Then open this notebook and select the **.NET (C#)** kernel.

In [1]:
#r "nuget: Microsoft.ML, 3.0.1"
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

## Define data schema

In [2]:
public class ReviewInput
{
    [LoadColumn(0)]
    public string Text { get; set; } = string.Empty;

    [LoadColumn(1), ColumnName("Label")]
    public bool Label { get; set; }
}

public class ReviewPrediction
{
    [ColumnName("PredictedLabel")] public bool Prediction { get; set; }
    public float Probability { get; set; }
    public float Score { get; set; }
}

## Create a tiny sample dataset (Vietnamese reviews)

In [3]:
var samples = new List<ReviewInput>
{
    new() { Text = "Dịch vụ rất tốt", Label = true },
    new() { Text = "Tôi rất hài lòng", Label = true },
    new() { Text = "Nhanh và chuyên nghiệp", Label = true },
    new() { Text = "Thái độ nhân viên thân thiện", Label = true },
    new() { Text = "Sản phẩm chất lượng", Label = true },
    new() { Text = "Tuyệt vời", Label = true },
    new() { Text = "Quá tệ, tôi thất vọng", Label = false },
    new() { Text = "Chậm chạp và không hỗ trợ", Label = false },
    new() { Text = "Thái độ không tốt", Label = false },
    new() { Text = "Sẽ không quay lại nữa", Label = false },
    new() { Text = "Bình thường, không có gì đặc biệt", Label = false },
    new() { Text = "Giá cao và không xứng đáng", Label = false },
};

Console.WriteLine($"Total samples: {samples.Count}");

Total samples: 12


## MLContext and Train/Test Split

In [4]:
var ml = new MLContext(seed: 42);
var allData = ml.Data.LoadFromEnumerable(samples);

// Hold out roughly 20% of the rows for testing (random split, so only approximate on 12 rows)
var split = ml.Data.TrainTestSplit(allData, testFraction: 0.2);
var train = split.TrainSet;
var test  = split.TestSet;
Console.WriteLine($"Train rows: {ml.Data.CreateEnumerable<ReviewInput>(train, false).Count()} | Test rows: {ml.Data.CreateEnumerable<ReviewInput>(test, false).Count()}");

Train rows: 10 | Test rows: 2
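Because `TrainTestSplit` samples rows at random, a 12-row dataset can easily end up with a lopsided test set. The sketch below counts positive and negative labels in each split. It assumes the `ml`, `train`, and `test` variables and the `ReviewInput` class from the cells above; `CountLabels` is a hypothetical helper introduced here for illustration.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

// Hypothetical helper: counts positive and negative labels in an IDataView.
(int Positive, int Negative) CountLabels(MLContext ctx, IDataView data)
{
    var rows = ctx.Data.CreateEnumerable<ReviewInput>(data, reuseRowObject: false).ToList();
    int positive = rows.Count(r => r.Label);
    return (positive, rows.Count - positive);
}

var trainCounts = CountLabels(ml, train);
var testCounts  = CountLabels(ml, test);
Console.WriteLine($"Train: {trainCounts.Positive} positive / {trainCounts.Negative} negative");
Console.WriteLine($"Test:  {testCounts.Positive} positive / {testCounts.Negative} negative");
```

If either split contains only one class, metrics such as AUC are not meaningful for that split.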


## Build a simple advanced pipeline
Pipeline: **FeaturizeText → NormalizeMinMax → Concatenate → SDCA (Logistic Regression)**

In [5]:
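// Turn the raw text into a numeric feature vector.
var pipeline = ml.Transforms.Text.FeaturizeText("TextFeaturized", nameof(ReviewInput.Text))
    // Rescale each feature slot using the min/max observed in the training data.
    .Append(ml.Transforms.NormalizeMinMax("TextFeaturized"))
    // With a single input column, Concatenate just copies it into "Features".
    .Append(ml.Transforms.Concatenate("Features", "TextFeaturized"))
    // Train a calibrated linear binary classifier with SDCA.
    .Append(ml.BinaryClassification.Trainers.SdcaLogisticRegression(
        labelColumnName: "Label",
        featureColumnName: "Features"));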

var model = pipeline.Fit(train);
Console.WriteLine("Model trained.");

Model trained.


## Evaluate

In [6]:
var preds = model.Transform(test);
var m = ml.BinaryClassification.Evaluate(preds, labelColumnName: "Label", scoreColumnName: "Score");
Console.WriteLine($"Accuracy: {m.Accuracy:P2}\nAUC: {m.AreaUnderRocCurve:P2}\nF1: {m.F1Score:P2}");

Accuracy: 50.00%
AUC: 0.00%
F1: 66.67%
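With only two test rows, each metric moves in large steps, so these numbers say little about real model quality. As a worked example, the sketch below recomputes accuracy and F1 from one set of confusion counts that is consistent with the output above: one true positive and one false positive. These counts are inferred from the reported numbers, not printed by the notebook, and the variable names are illustrative. Under the same reading, an AUC of 0 means the single negative row received a higher score than the single positive row.

```csharp
using System;

// Assumed confusion counts for the 2-row test set (inferred, not printed by the notebook).
int tp = 1, fp = 1, tn = 0, fn = 0;

double accuracy  = (double)(tp + tn) / (tp + fp + tn + fn);      // 0.5
double precision = (double)tp / (tp + fp);                       // 0.5
double recall    = (double)tp / (tp + fn);                       // 1.0
double f1        = 2 * precision * recall / (precision + recall); // 2/3, about 0.6667

Console.WriteLine($"Accuracy: {accuracy:P2} | Precision: {precision:P2} | Recall: {recall:P2} | F1: {f1:P2}");
```

The actual counts for a run are available on the metrics object through `m.ConfusionMatrix`.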


## Save model to .zip

In [8]:
using System.IO;

var modelPath = Path.Combine(Directory.GetCurrentDirectory(), "reviewModel.zip");
ml.Model.Save(model, train.Schema, modelPath);
Console.WriteLine($"Saved to: {modelPath}");

Saved to: c:\AI .NET\Notebook\reviewModel.zip


## Load model & single prediction

In [10]:
// Load directly from the file path so no file stream is left open.
var loaded = ml.Model.Load(modelPath, out _);
var engine = ml.Model.CreatePredictionEngine<ReviewInput, ReviewPrediction>(loaded);

var input = new ReviewInput { Text = "Chất lượng rất tuyệt vời" };
var pred = engine.Predict(input);
Console.WriteLine($"Text: {input.Text}\nPrediction: {pred.Prediction} | Prob: {pred.Probability:F4} | Score: {pred.Score:F4}");

Text: Chất lượng rất tuyệt vời
Prediction: True | Prob: 0.9994 | Score: 7.4323
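`Score` is the raw linear output of the model and `Probability` is its calibrated counterpart. For this logistic-regression trainer the two appear to be related by the standard sigmoid, which matches the numbers above. The sketch below checks that relationship with plain arithmetic; the `Sigmoid` helper is introduced here for illustration and is not an ML.NET API.

```csharp
using System;

// Illustrative helper (not part of ML.NET): the standard logistic function.
double Sigmoid(double score) => 1.0 / (1.0 + Math.Exp(-score));

// Score taken from the single-prediction output above.
double probability = Sigmoid(7.4323);
Console.WriteLine($"{probability:F4}");   // about 0.9994

// A score of 0 sits exactly on the decision boundary: probability 0.5.
// Positive scores map above 0.5 (predicted True), negative scores below 0.5 (predicted False).
double boundary = Sigmoid(0.0);
Console.WriteLine(boundary);
```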


## Batch prediction

In [11]:
var batch = new []
{
    new ReviewInput { Text = "Quá tệ và phục vụ chậm" },
    new ReviewInput { Text = "Tôi hài lòng về trải nghiệm" },
    new ReviewInput { Text = "Bình thường" }
};

var dv = ml.Data.LoadFromEnumerable(batch);
var scored = loaded.Transform(dv);
var results = ml.Data.CreateEnumerable<ReviewPrediction>(scored, reuseRowObject: false).ToList();

for (int i = 0; i < batch.Length; i++)
{
    Console.WriteLine($"{i+1}. '{batch[i].Text}' => Pred: {results[i].Prediction}, Prob: {results[i].Probability:F4}");
}

1. 'Quá tệ và phục vụ chậm' => Pred: True, Prob: 0.8483
2. 'Tôi hài lòng về trải nghiệm' => Pred: True, Prob: 0.9833
3. 'Bình thường' => Pred: False, Prob: 0.3601


### Try other transforms/trainers (optional)
You can experiment by replacing SDCA with L-BFGS Logistic Regression:

```csharp
var pipeline = ml.Transforms.Text.FeaturizeText("TextFeaturized", nameof(ReviewInput.Text))
    .Append(ml.Transforms.NormalizeMinMax("TextFeaturized"))
    .Append(ml.Transforms.Concatenate("Features", "TextFeaturized"))
    .Append(ml.BinaryClassification.Trainers.LbfgsLogisticRegression(labelColumnName: "Label", featureColumnName: "Features"));
```

Or try `NormalizeMeanVariance` instead of `NormalizeMinMax`.
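A sketch of that second variation, assuming the `ml` context, the `ReviewInput` class, and the `train`/`test` splits from the earlier cells. The names `altPipeline`, `altModel`, and `altMetrics` are chosen here for illustration; only the normalizer changes.

```csharp
using System;
using Microsoft.ML;

// Same pipeline as before, with NormalizeMeanVariance swapped in for NormalizeMinMax.
var altPipeline = ml.Transforms.Text.FeaturizeText("TextFeaturized", nameof(ReviewInput.Text))
    .Append(ml.Transforms.NormalizeMeanVariance("TextFeaturized"))
    .Append(ml.Transforms.Concatenate("Features", "TextFeaturized"))
    .Append(ml.BinaryClassification.Trainers.SdcaLogisticRegression(
        labelColumnName: "Label",
        featureColumnName: "Features"));

// Retrain and evaluate on the same split so the two runs are comparable.
var altModel = altPipeline.Fit(train);
var altMetrics = ml.BinaryClassification.Evaluate(altModel.Transform(test), labelColumnName: "Label");
Console.WriteLine($"Accuracy: {altMetrics.Accuracy:P2} | F1: {altMetrics.F1Score:P2}");
```

With a two-row test set, any difference between the two normalizers is only illustrative.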