# ML.NET Samples - CreditCardFraudDetection (Anomaly Detection)

# Spam Detection for Text Messages

| ML.NET version | API type          | Status                        | App Type    | Data type | Scenario            | ML Task                   | Algorithms                  |
|----------------|-------------------|-------------------------------|-------------|-----------|---------------------|---------------------------|-----------------------------|
| v1.4           | Dynamic API | Might need to update project structure to match template | Jupyter Notebook | .tsv files | Spam detection | Two-class classification | Averaged Perceptron (linear learner) |

In this sample, you'll see how to use [ML.NET](https://www.microsoft.com/net/learn/apps/machine-learning-and-ai/ml-dotnet) to predict whether a text message is spam. In the world of machine learning, this type of prediction is known as **binary classification**.

## Problem

Our goal here is to predict whether a text message is spam (an irrelevant/unwanted message). We will use the [SMS Spam Collection Data Set](https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection) from UCI, which contains close to 6000 messages that have been classified as being "spam" or "ham" (not spam). We will use this dataset to train a model that can take in new messages and predict whether they are spam or not.

This is an example of binary classification, as we are classifying the text messages into one of two categories.


## Solution

To solve this problem, first we will build an estimator to define the ML pipeline we want to use. Then we will train this estimator on existing data, evaluate how good it is, and lastly we'll consume the model to predict whether a few example messages are spam.

![Build -> Train -> Evaluate -> Consume](../shared_content/modelpipeline.png)

### 1. Build Model

To build the model we will:

* Define how to read the spam dataset that will be downloaded from https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection. 

* Apply several data transformations:

    * Convert the label ("spam" or "ham") to a boolean ("true" represents spam) so we can use it with a binary classifier. 
    * Featurize the text message into a numeric vector so a machine learning trainer can use it.

* Add a trainer (such as `StochasticDualCoordinateAscent`).

The initial code is similar to the following:


In [1]:
// ML.NET Nuget packages 
#r "nuget:Microsoft.ML"     

// ML.NET TimeSeries Nuget packages 
#r "nuget:Microsoft.ML.TimeSeries" 

Installed package Microsoft.ML.TimeSeries version 1.5.0

Installed package Microsoft.ML version 1.5.0

## Using C# Class

In [3]:
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;
using System.Collections.Generic;
using static Microsoft.ML.TrainCatalogBase;
using static Microsoft.ML.DataOperationsCatalog;
using System.Diagnostics;

## Declare data-classes for input data and predictions

In [6]:
public interface IModelEntity {
    void PrintToConsole();
}

public class TransactionObservation : IModelEntity
{
    // Note: the 'Time' column is loaded here, but it is excluded from the features later since we don't need it
    [LoadColumn(0)]
    public float Time;

    [LoadColumn(1)]
    public float V1;

    [LoadColumn(2)]
    public float V2;

    [LoadColumn(3)]
    public float V3;

    [LoadColumn(4)]
    public float V4;

    [LoadColumn(5)]
    public float V5;

    [LoadColumn(6)]
    public float V6;

    [LoadColumn(7)]
    public float V7;

    [LoadColumn(8)]
    public float V8;

    [LoadColumn(9)]
    public float V9;

    [LoadColumn(10)]
    public float V10;

    [LoadColumn(11)]
    public float V11;

    [LoadColumn(12)]
    public float V12;

    [LoadColumn(13)]
    public float V13;

    [LoadColumn(14)]
    public float V14;

    [LoadColumn(15)]
    public float V15;

    [LoadColumn(16)]
    public float V16;

    [LoadColumn(17)]
    public float V17;

    [LoadColumn(18)]
    public float V18;

    [LoadColumn(19)]
    public float V19;

    [LoadColumn(20)]
    public float V20;

    [LoadColumn(21)]
    public float V21;

    [LoadColumn(22)]
    public float V22;

    [LoadColumn(23)]
    public float V23;

    [LoadColumn(24)]
    public float V24;

    [LoadColumn(25)]
    public float V25;

    [LoadColumn(26)]
    public float V26;

    [LoadColumn(27)]
    public float V27;

    [LoadColumn(28)]
    public float V28;

    [LoadColumn(29)]
    public float Amount;

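    // Loaded as float (0 = normal, 1 = fraud): FilterRowsByColumn, used in TrainModel,
    // only works on numeric columns, and the label comparisons below are numeric as well.
    [LoadColumn(30)]
    public float Label;

    public void PrintToConsole() {
        Console.WriteLine($"Label: {Label}");
        Console.WriteLine($"Features: [V1] {V1} [V2] {V2} [V3] {V3} ... [V28] {V28} Amount: {Amount}");
    }       
}

public class TransactionFraudPrediction : IModelEntity
{
    // Must have the same type as the Label column of the input data
    public float Label;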
    public bool PredictedLabel;
    public float Score;
    public float Probability;

    public void PrintToConsole()
    {
        Console.WriteLine($"Predicted Label: {PredictedLabel}");
        Console.WriteLine($"Probability: {Probability}  ({Score})");
    }
}

In [11]:
public class Predictor
{
    private readonly string _modelfile;
    private readonly string _datasetFile;

    public Predictor(string modelfile, string datasetFile)
    {
        _modelfile = modelfile ?? throw new ArgumentNullException(nameof(modelfile));
        _datasetFile = datasetFile ?? throw new ArgumentNullException(nameof(datasetFile));
    }

    public void RunMultiplePredictions(int numberOfPredictions)
    {
        var mlContext = new MLContext();

        // Load data as input for predictions
        IDataView inputDataForPredictions = mlContext.Data.LoadFromTextFile<TransactionObservation>(_datasetFile, separatorChar: ',', hasHeader: true);

        Console.WriteLine($"Predictions from saved model:");

        ITransformer model = mlContext.Model.Load(_modelfile, out var inputSchema);

        var predictionEngine = mlContext.Model.CreatePredictionEngine<TransactionObservation, TransactionFraudPrediction>(model);

        Console.WriteLine($"\n \n Test {numberOfPredictions} transactions, from the test datasource, that should be predicted as fraud (true):");

        // Convert.ToBoolean treats any non-zero label as fraud, so the filter
        // compiles for both numeric and boolean label fields.
        mlContext.Data.CreateEnumerable<TransactionObservation>(inputDataForPredictions, reuseRowObject: false)
                    .Where(x => Convert.ToBoolean(x.Label))
                    .Take(numberOfPredictions)
                    .ToList()
                    .ForEach(testData =>
                                {
                                    Console.WriteLine($"--- Transaction ---");
                                    testData.PrintToConsole();
                                    predictionEngine.Predict(testData).PrintToConsole();
                                    Console.WriteLine($"-------------------");
                                });


        Console.WriteLine($"\n \n Test {numberOfPredictions} transactions, from the test datasource, that should NOT be predicted as fraud (false):");

        mlContext.Data.CreateEnumerable<TransactionObservation>(inputDataForPredictions, reuseRowObject: false)
                   .Where(x => !Convert.ToBoolean(x.Label))
                   .Take(numberOfPredictions)
                   .ToList()
                   .ForEach(testData =>
                               {
                                   Console.WriteLine($"--- Transaction ---");
                                   testData.PrintToConsole();
                                   predictionEngine.Predict(testData).PrintToConsole();
                                   Console.WriteLine($"-------------------");
                               });
    }
}

### Constants

In [12]:
//File paths
string assetsPath = @"./datasets/CreditCardFraudDetection";
string zipDataSet = Path.Combine(assetsPath, "input", "creditcardfraud-dataset.zip");
string fullDataSetFilePath = Path.Combine(assetsPath, "input", "creditcard.csv");
string trainDataSetFilePath = Path.Combine(assetsPath, "output", "trainData.csv"); 
string testDataSetFilePath = Path.Combine(assetsPath, "output", "testData.csv");
string modelFilePath = Path.Combine(assetsPath, "output", "randomizedPca.zip");
string trainOutput = "./datasets/CreditCardFraudDetection/output";

### ConsoleHelper

In [13]:
public static class ConsoleHelper
{
    public static void PrintPrediction(string prediction)
    {
        Console.WriteLine($"*************************************************");
        Console.WriteLine($"Predicted : {prediction}");
        Console.WriteLine($"*************************************************");
    }

    public static void PrintRegressionPredictionVersusObserved(string predictionCount, string observedCount)
    {
        Console.WriteLine($"-------------------------------------------------");
        Console.WriteLine($"Predicted : {predictionCount}");
        Console.WriteLine($"Actual:     {observedCount}");
        Console.WriteLine($"-------------------------------------------------");
    }

    public static void PrintRegressionMetrics(string name, RegressionMetrics metrics)
    {
        Console.WriteLine($"*************************************************");
        Console.WriteLine($"*       Metrics for {name} regression model      ");
        Console.WriteLine($"*------------------------------------------------");
        Console.WriteLine($"*       LossFn:        {metrics.LossFunction:0.##}");
        Console.WriteLine($"*       R2 Score:      {metrics.RSquared:0.##}");
        Console.WriteLine($"*       Absolute loss: {metrics.MeanAbsoluteError:#.##}");
        Console.WriteLine($"*       Squared loss:  {metrics.MeanSquaredError:#.##}");
        Console.WriteLine($"*       RMS loss:      {metrics.RootMeanSquaredError:#.##}");
        Console.WriteLine($"*************************************************");
    }

    public static void PrintBinaryClassificationMetrics(string name, CalibratedBinaryClassificationMetrics metrics)
    {
        Console.WriteLine($"************************************************************");
        Console.WriteLine($"*       Metrics for {name} binary classification model      ");
        Console.WriteLine($"*-----------------------------------------------------------");
        Console.WriteLine($"*       Accuracy: {metrics.Accuracy:P2}");
        Console.WriteLine($"*       Area Under Curve:      {metrics.AreaUnderRocCurve:P2}");
        Console.WriteLine($"*       Area under Precision recall Curve:  {metrics.AreaUnderPrecisionRecallCurve:P2}");
        Console.WriteLine($"*       F1Score:  {metrics.F1Score:P2}");
        Console.WriteLine($"*       LogLoss:  {metrics.LogLoss:#.##}");
        Console.WriteLine($"*       LogLossReduction:  {metrics.LogLossReduction:#.##}");
        Console.WriteLine($"*       PositivePrecision:  {metrics.PositivePrecision:#.##}");
        Console.WriteLine($"*       PositiveRecall:  {metrics.PositiveRecall:#.##}");
        Console.WriteLine($"*       NegativePrecision:  {metrics.NegativePrecision:#.##}");
        Console.WriteLine($"*       NegativeRecall:  {metrics.NegativeRecall:P2}");
        Console.WriteLine($"************************************************************");
    }

    public static void PrintMultiClassClassificationMetrics(string name, MulticlassClassificationMetrics metrics)
    {
        Console.WriteLine($"************************************************************");
        Console.WriteLine($"*    Metrics for {name} multi-class classification model   ");
        Console.WriteLine($"*-----------------------------------------------------------");
        Console.WriteLine($"    AccuracyMacro = {metrics.MacroAccuracy:0.####}, a value between 0 and 1, the closer to 1, the better");
        Console.WriteLine($"    AccuracyMicro = {metrics.MicroAccuracy:0.####}, a value between 0 and 1, the closer to 1, the better");
        Console.WriteLine($"    LogLoss = {metrics.LogLoss:0.####}, the closer to 0, the better");
        Console.WriteLine($"    LogLoss for class 1 = {metrics.PerClassLogLoss[0]:0.####}, the closer to 0, the better");
        Console.WriteLine($"    LogLoss for class 2 = {metrics.PerClassLogLoss[1]:0.####}, the closer to 0, the better");
        Console.WriteLine($"    LogLoss for class 3 = {metrics.PerClassLogLoss[2]:0.####}, the closer to 0, the better");
        Console.WriteLine($"************************************************************");
    }
    
    public static void PrintRegressionFoldsAverageMetrics(string algorithmName, IReadOnlyList<CrossValidationResult<RegressionMetrics>> crossValidationResults)
    {
        var L1 = crossValidationResults.Select(r => r.Metrics.MeanAbsoluteError);
        var L2 = crossValidationResults.Select(r => r.Metrics.MeanSquaredError);
        var RMS = crossValidationResults.Select(r => r.Metrics.RootMeanSquaredError);
        var lossFunction = crossValidationResults.Select(r => r.Metrics.LossFunction);
        var R2 = crossValidationResults.Select(r => r.Metrics.RSquared);

        Console.WriteLine($"*************************************************************************************************************");
        Console.WriteLine($"*       Metrics for {algorithmName} Regression model      ");
        Console.WriteLine($"*------------------------------------------------------------------------------------------------------------");
        Console.WriteLine($"*       Average L1 Loss:    {L1.Average():0.###} ");
        Console.WriteLine($"*       Average L2 Loss:    {L2.Average():0.###}  ");
        Console.WriteLine($"*       Average RMS:          {RMS.Average():0.###}  ");
        Console.WriteLine($"*       Average Loss Function: {lossFunction.Average():0.###}  ");
        Console.WriteLine($"*       Average R-squared: {R2.Average():0.###}  ");
        Console.WriteLine($"*************************************************************************************************************");
    }
    
    public static void PrintMulticlassClassificationFoldsAverageMetrics(
                                     string algorithmName,
                                   IReadOnlyList<CrossValidationResult<MulticlassClassificationMetrics>> crossValResults
                                                                       )
    {
        var metricsInMultipleFolds = crossValResults.Select(r => r.Metrics);

        var microAccuracyValues = metricsInMultipleFolds.Select(m => m.MicroAccuracy);
        var microAccuracyAverage = microAccuracyValues.Average();
        var microAccuraciesStdDeviation = CalculateStandardDeviation(microAccuracyValues);
        var microAccuraciesConfidenceInterval95 = CalculateConfidenceInterval95(microAccuracyValues);

        var macroAccuracyValues = metricsInMultipleFolds.Select(m => m.MacroAccuracy);
        var macroAccuracyAverage = macroAccuracyValues.Average();
        var macroAccuraciesStdDeviation = CalculateStandardDeviation(macroAccuracyValues);
        var macroAccuraciesConfidenceInterval95 = CalculateConfidenceInterval95(macroAccuracyValues);

        var logLossValues = metricsInMultipleFolds.Select(m => m.LogLoss);
        var logLossAverage = logLossValues.Average();
        var logLossStdDeviation = CalculateStandardDeviation(logLossValues);
        var logLossConfidenceInterval95 = CalculateConfidenceInterval95(logLossValues);

        var logLossReductionValues = metricsInMultipleFolds.Select(m => m.LogLossReduction);
        var logLossReductionAverage = logLossReductionValues.Average();
        var logLossReductionStdDeviation = CalculateStandardDeviation(logLossReductionValues);
        var logLossReductionConfidenceInterval95 = CalculateConfidenceInterval95(logLossReductionValues);

        Console.WriteLine($"*************************************************************************************************************");
        Console.WriteLine($"*       Metrics for {algorithmName} Multi-class Classification model      ");
        Console.WriteLine($"*------------------------------------------------------------------------------------------------------------");
        Console.WriteLine($"*       Average MicroAccuracy:    {microAccuracyAverage:0.###}  - Standard deviation: ({microAccuraciesStdDeviation:#.###})  - Confidence Interval 95%: ({microAccuraciesConfidenceInterval95:#.###})");
        Console.WriteLine($"*       Average MacroAccuracy:    {macroAccuracyAverage:0.###}  - Standard deviation: ({macroAccuraciesStdDeviation:#.###})  - Confidence Interval 95%: ({macroAccuraciesConfidenceInterval95:#.###})");
        Console.WriteLine($"*       Average LogLoss:          {logLossAverage:#.###}  - Standard deviation: ({logLossStdDeviation:#.###})  - Confidence Interval 95%: ({logLossConfidenceInterval95:#.###})");
        Console.WriteLine($"*       Average LogLossReduction: {logLossReductionAverage:#.###}  - Standard deviation: ({logLossReductionStdDeviation:#.###})  - Confidence Interval 95%: ({logLossReductionConfidenceInterval95:#.###})");
        Console.WriteLine($"*************************************************************************************************************");
    }    

    public static double CalculateStandardDeviation (IEnumerable<double> values)
    {
        double average = values.Average();
        double sumOfSquaresOfDifferences = values.Select(val => (val - average) * (val - average)).Sum();
        double standardDeviation = Math.Sqrt(sumOfSquaresOfDifferences / (values.Count()-1));
        return standardDeviation;
    }

    public static double CalculateConfidenceInterval95(IEnumerable<double> values)
    {
        double confidenceInterval95 = 1.96 * CalculateStandardDeviation(values) / Math.Sqrt((values.Count()-1));
        return confidenceInterval95;
    }

    public static void PrintClusteringMetrics(string name, ClusteringMetrics metrics)
    {
        Console.WriteLine($"*************************************************");
        Console.WriteLine($"*       Metrics for {name} clustering model      ");
        Console.WriteLine($"*------------------------------------------------");
        Console.WriteLine($"*       Average Distance: {metrics.AverageDistance}");
        Console.WriteLine($"*       Davies Bouldin Index is: {metrics.DaviesBouldinIndex}");
        Console.WriteLine($"*************************************************");
    }    
    
    public static void PeekDataViewInConsole(MLContext mlContext, IDataView dataView, IEstimator<ITransformer> pipeline, int numberOfRows = 4)
    {
        string msg = string.Format("Peek data in DataView: Showing {0} rows with the columns", numberOfRows.ToString());
        ConsoleWriteHeader(msg);

        //https://github.com/dotnet/machinelearning/blob/master/docs/code/MlNetCookBook.md#how-do-i-look-at-the-intermediate-data
        var transformer = pipeline.Fit(dataView);
        var transformedData = transformer.Transform(dataView);

        // 'transformedData' is a 'promise' of data, lazy-loading. call Preview  
        //and iterate through the returned collection from preview.

        var preViewTransformedData = transformedData.Preview(maxRows: numberOfRows);

        foreach (var row in preViewTransformedData.RowView)
        {
            var ColumnCollection = row.Values;
            string lineToPrint = "Row--> ";
            foreach (KeyValuePair<string, object> column in ColumnCollection)
            {
                lineToPrint += $"| {column.Key}:{column.Value}";
            }
            Console.WriteLine(lineToPrint + "\n");
        }
    }
    
    public static void PeekVectorColumnDataInConsole(MLContext mlContext, string columnName, IDataView dataView, IEstimator<ITransformer> pipeline, int numberOfRows = 4)
    {
        string msg = string.Format("Peek data in DataView: : Show {0} rows with just the '{1}' column", numberOfRows, columnName );
        ConsoleWriteHeader(msg);

        var transformer = pipeline.Fit(dataView);
        var transformedData = transformer.Transform(dataView);

        // Extract the 'Features' column.
        var someColumnData = transformedData.GetColumn<float[]>(columnName)
                                                    .Take(numberOfRows).ToList();

        // print to console the peeked rows

        int currentRow = 0;
        someColumnData.ForEach(row => {
                                        currentRow++;
                                        // Separate the values with spaces so individual floats stay readable
                                        string concatColumn = string.Join(" ", row);

                                        Console.WriteLine();
                                        string rowMsg = string.Format("**** Row {0} with '{1}' field value ****", currentRow, columnName);
                                        Console.WriteLine(rowMsg);
                                        Console.WriteLine(concatColumn);
                                        Console.WriteLine();
                                      });
    }
    
    public static void ConsoleWriteHeader(params string[] lines)
    {
        var defaultColor = Console.ForegroundColor;
        Console.ForegroundColor = ConsoleColor.Yellow;
        Console.WriteLine(" ");
        foreach (var line in lines)
        {
            Console.WriteLine(line);
        }
        var maxLength = lines.Select(x => x.Length).Max();
        Console.WriteLine(new string('#', maxLength));
        Console.ForegroundColor = defaultColor;
    }

    public static void ConsoleWriterSection(params string[] lines)
    {
        var defaultColor = Console.ForegroundColor;
        Console.ForegroundColor = ConsoleColor.Blue;
        Console.WriteLine(" ");
        foreach (var line in lines)
        {
            Console.WriteLine(line);
        }
        var maxLength = lines.Select(x => x.Length).Max();
        Console.WriteLine(new string('-', maxLength));
        Console.ForegroundColor = defaultColor;
    }
    
}
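The two statistics helpers in `ConsoleHelper` are easier to check with a small worked example. The following is a minimal sketch, not part of the sample: `FoldStatistics` and its method names are hypothetical, and the formulas simply mirror `CalculateStandardDeviation` (sample standard deviation, dividing by n - 1) and `CalculateConfidenceInterval95` (which divides by the square root of n - 1, whereas the textbook standard error divides by the square root of n).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical standalone copy of the two statistics helpers, for illustration only.
public static class FoldStatistics
{
    // Sample standard deviation (divides by n - 1), as in CalculateStandardDeviation.
    public static double StandardDeviation(IReadOnlyCollection<double> values)
    {
        double average = values.Average();
        double sumOfSquares = values.Sum(v => (v - average) * (v - average));
        return Math.Sqrt(sumOfSquares / (values.Count - 1));
    }

    // Half-width of the 95% interval, using the same sqrt(n - 1) denominator
    // as CalculateConfidenceInterval95.
    public static double ConfidenceInterval95(IReadOnlyCollection<double> values)
    {
        return 1.96 * StandardDeviation(values) / Math.Sqrt(values.Count - 1);
    }
}
```

For five made-up fold accuracies 0.90, 0.92, 0.94, 0.96 and 0.98, `StandardDeviation` returns about 0.0316 and `ConfidenceInterval95` about 0.0310.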

## Methods 

In [14]:
public static void PrepDatasets(MLContext mlContext, string fullDataSetFilePath, string trainDataSetFilePath, string testDataSetFilePath)
{
    // Only prep-datasets if train and test datasets don't exist yet
    if (!File.Exists(trainDataSetFilePath) &&
        !File.Exists(testDataSetFilePath))
    {
        Console.WriteLine("===== Preparing train/test datasets =====");

        // Load the original single dataset
        IDataView originalFullData = mlContext.Data.LoadFromTextFile<TransactionObservation>(fullDataSetFilePath, separatorChar: ',', hasHeader: true);

        // Split the data 80:20 into train and test sets, train and evaluate.
        TrainTestData trainTestData = mlContext.Data.TrainTestSplit(originalFullData, testFraction: 0.2, seed: 1);

        // 80% of original dataset
        IDataView trainData = trainTestData.TrainSet;

        // 20% of original dataset
        IDataView testData = trainTestData.TestSet;

        // Inspect TestDataView to make sure there are true and false observations in the test dataset after splitting
        InspectData(mlContext, testData, 4);

        // Save train split
        using (var fileStream = File.Create(trainDataSetFilePath))
        {
            mlContext.Data.SaveAsText(trainData, fileStream, separatorChar: ',', headerRow: true, schema: true);
        }

        // Save test split 
        using (var fileStream = File.Create(testDataSetFilePath))
        {
            mlContext.Data.SaveAsText(testData, fileStream, separatorChar: ',', headerRow: true, schema: true);
        }
    }
}


public static ITransformer TrainModel(MLContext mlContext, IDataView trainDataView)
{

    // Get all the feature column names (All except the Label and the IdPreservationColumn)
    string[] featureColumnNames = trainDataView.Schema.AsQueryable()
        .Select(column => column.Name)                               // Get all the column names
        .Where(name => name != nameof(TransactionObservation.Label)) // Do not include the Label column
        .Where(name => name != "IdPreservationColumn")               // Do not include the IdPreservationColumn/StratificationColumn
        .Where(name => name != nameof(TransactionObservation.Time))  // Do not include the Time column. Not needed as feature column
     .ToArray();


    // Create the data process pipeline
    IEstimator<ITransformer> dataProcessPipeline = mlContext.Transforms.Concatenate("Features", featureColumnNames)
                                                                       .Append(mlContext.Transforms.DropColumns(new string[] { nameof(TransactionObservation.Time) }))
                                                                       .Append(mlContext.Transforms.NormalizeLpNorm(outputColumnName: "NormalizedFeatures", inputColumnName: "Features"));

    // In Anomaly Detection, the learner assumes all training examples have label 0, as it only learns from normal examples.
    // If any of the training examples has label 1, it is recommended to use a Filter transform to filter them out before training:
    IDataView normalTrainDataView = mlContext.Data.FilterRowsByColumn(trainDataView, columnName: nameof(TransactionObservation.Label), lowerBound: 0, upperBound: 1);


    // (OPTIONAL) Peek data (such as 2 records) in training DataView after applying the ProcessPipeline's transformations into "Features" 
    ConsoleHelper.PeekDataViewInConsole(mlContext, normalTrainDataView, dataProcessPipeline, 2);
    ConsoleHelper.PeekVectorColumnDataInConsole(mlContext, "NormalizedFeatures", normalTrainDataView, dataProcessPipeline, 2);


    var options = new RandomizedPcaTrainer.Options
    {
        FeatureColumnName = "NormalizedFeatures",   // The name of the feature column. The column data must be a known-sized vector of Single.
        ExampleWeightColumnName = null,             // The name of the example weight column (optional). To use the weight column, the column data must be of type Single.
        Rank = 28,                                  // The number of components in the PCA.
        Oversampling = 20,                          // Oversampling parameter for randomized PCA training.
        EnsureZeroMean = true,                      // If enabled, data is centered to be zero mean.
        Seed = 1                                    // The seed for random number generation.
    };


    // Create an anomaly detector. Its underlying algorithm is randomized PCA.
    IEstimator<ITransformer> trainer = mlContext.AnomalyDetection.Trainers.RandomizedPca(options: options);

    EstimatorChain<ITransformer> trainingPipeline = dataProcessPipeline.Append(trainer);

    ConsoleHelper.ConsoleWriteHeader("=============== Training model ===============");

    TransformerChain<ITransformer> model = trainingPipeline.Fit(normalTrainDataView);

    ConsoleHelper.ConsoleWriteHeader("=============== End of training process ===============");

    return model;
}

private static void EvaluateModel(MLContext mlContext, ITransformer model, IDataView testDataView)
{
    // Evaluate the model and show accuracy stats
    Console.WriteLine("===== Evaluating Model's accuracy with Test data =====");

    var predictions = model.Transform(testDataView);

    AnomalyDetectionMetrics metrics = mlContext.AnomalyDetection.Evaluate(predictions);

    // ConsoleHelper has no printer for anomaly detection metrics, so print them here
    Console.WriteLine($"************************************************************");
    Console.WriteLine($"*       Metrics for RandomizedPca anomaly detection model      ");
    Console.WriteLine($"*-----------------------------------------------------------");
    Console.WriteLine($"*       Area Under ROC Curve:                   {metrics.AreaUnderRocCurve:P2}");
    Console.WriteLine($"*       Detection rate at false positive count: {metrics.DetectionRateAtFalsePositiveCount}");
    Console.WriteLine($"************************************************************");
}

public static void InspectData(MLContext mlContext, IDataView data, int records)
{
    // We want to make sure we have both True and False observations
    Console.WriteLine($"Show {records} fraud transactions (true)");
    ShowObservationsFilteredByLabel(mlContext, data, label: true, count: records);

    Console.WriteLine($"Show {records} NOT-fraud transactions (false)");
    ShowObservationsFilteredByLabel(mlContext, data, label: false, count: records);
}

public static void ShowObservationsFilteredByLabel(MLContext mlContext, IDataView dataView, bool label = true, int count = 2)
{
    // Convert to an enumerable of user-defined type. 
    // Convert.ToBoolean treats any non-zero label as true, so the comparison
    // compiles for both numeric and boolean label fields.
    var data = mlContext.Data.CreateEnumerable<TransactionObservation>(dataView, reuseRowObject: false)
                                    .Where(x => Convert.ToBoolean(x.Label) == label)
                                    // Take a couple values as an array.
                                    .Take(count)
                                    .ToList();

    // Print to console
    data.ForEach(row => { row.PrintToConsole(); });
}

public static void UnZipDataSet(string zipDataSet, string destinationFile)
{
    if (!File.Exists(destinationFile))
    {
        var destinationDirectory = Path.GetDirectoryName(destinationFile);
        ZipFile.ExtractToDirectory(zipDataSet, destinationDirectory);
    }
}

private static void SaveModel(MLContext mlContext, ITransformer model, string modelFilePath, DataViewSchema trainingDataSchema)
{
    mlContext.Model.Save(model, trainingDataSchema, modelFilePath);

    Console.WriteLine("Saved model to " + modelFilePath);
}

public static void CopyModelAndDatasetFromTrainingProject(string trainOutput, string assetsPath)
{
     if (!File.Exists(Path.Combine(trainOutput, "testData.csv")) ||
         !File.Exists(Path.Combine(trainOutput, "randomizedPca.zip")))
     {
         Console.WriteLine("***** YOU NEED TO RUN THE TRAINING PROJECT IN THE FIRST PLACE *****");
         Console.WriteLine("=============== Continue ===============");
         Environment.Exit(0);
     }

     // Copy files from the train output into the input folder read by the predictor.
     // Path.Combine keeps the path portable instead of hard-coding Windows separators.
     var inputDirectory = Path.Combine(assetsPath, "input");
     foreach (var file in Directory.GetFiles(trainOutput))
     {
         var fileDestination = Path.Combine(inputDirectory, Path.GetFileName(file));

         // Overwrite any stale copy left by a previous run
         File.Copy(file, fileDestination, overwrite: true);
     }
}

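The `FilterRowsByColumn` call in `TrainModel` keeps rows whose column value lies between `lowerBound` (inclusive) and `upperBound` (exclusive), so the bounds 0 and 1 retain only rows labeled 0, that is, the normal transactions the anomaly detector should learn from. A plain LINQ sketch of the same range test, assuming a numeric label; `LabeledRow` and `RangeFilterSketch` are hypothetical names for illustration and are not part of ML.NET or of this sample:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a labeled row; not a type from the notebook.
public sealed class LabeledRow
{
    public float Label { get; set; }
    public float Amount { get; set; }
}

public static class RangeFilterSketch
{
    // Keeps rows whose label lies in [lowerBound, upperBound):
    // the lower bound is inclusive, the upper bound is exclusive.
    public static List<LabeledRow> KeepLabelsInRange(
        IEnumerable<LabeledRow> rows, double lowerBound, double upperBound)
    {
        return rows
            .Where(r => r.Label >= lowerBound && r.Label < upperBound)
            .ToList();
    }
}
```

Calling `KeepLabelsInRange(rows, 0, 1)` on rows labeled 0, 1 and 0 returns the two rows labeled 0 and drops the fraud row, because a label of 1 is not below the exclusive upper bound.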
### Trainer

In [15]:
// Unzip the original dataset as it is too large for GitHub repo if not zipped
UnZipDataSet(zipDataSet, fullDataSetFilePath);

// Create a common ML.NET context.
// Seed set to any number so you have a deterministic environment for repeatable results
MLContext mlContext = new MLContext(seed: 1);

// Prepare data and create Train/Test split datasets
PrepDatasets(mlContext, fullDataSetFilePath, trainDataSetFilePath, testDataSetFilePath);

// Load Datasets
IDataView trainingDataView = mlContext.Data.LoadFromTextFile<TransactionObservation>(trainDataSetFilePath, separatorChar: ',', hasHeader: true);
IDataView testDataView = mlContext.Data.LoadFromTextFile<TransactionObservation>(testDataSetFilePath, separatorChar: ',', hasHeader: true);

// Train Model
ITransformer model = TrainModel(mlContext, trainingDataView);

// Evaluate quality of Model
EvaluateModel(mlContext, model, testDataView);

// Save model
SaveModel(mlContext, model, modelFilePath, trainingDataView.Schema);

Console.WriteLine("=============== Press any key ===============");


## Predictor

In [9]:
CopyModelAndDatasetFromTrainingProject(trainOutput, assetsPath);

var inputDatasetForPredictions = Path.Combine(assetsPath,"input", "testData.csv");
var modelFilePath = Path.Combine(assetsPath, "input", "fastTree.zip");

// Create model predictor to perform a few predictions
var modelPredictor = new Predictor(modelFilePath,inputDatasetForPredictions);

modelPredictor.RunMultiplePredictions(numberOfPredictions:5);

Console.WriteLine("=============== The End ===============");

Predictions from saved model:

 
 Test 5 transactions, from the test datasource, that should be predicted as fraud (true):
--- Transaction ---
Label: True
Features: [V1] 0,008430365 [V2] 4,137837 [V3] -6,2406964 ... [V28] 0,51357377 Amount: 1
Predicted Label: True
Probability: 0,98571074  (10,5846405)
-------------------
--- Transaction ---
Label: True
Features: [V1] 0,026779227 [V2] 4,132464 [V3] -6,5606 ... [V28] 0,4966991 Amount: 1
Predicted Label: True
Probability: 0,99932486  (18,249823)
-------------------
--- Transaction ---
Label: True
Features: [V1] 1,0238739 [V2] 2,0014853 [V3] -4,769752 ... [V28] 0,020205539 Amount: 1
Predicted Label: True
Probability: 0,99317914  (12,452319)
-------------------
--- Transaction ---
Label: True
Features: [V1] -2,5358522 [V2] 5,793644 [V3] -7,618463 ... [V28] 0,30820465 Amount: 1
Predicted Label: True
Probability: 0,99453425  (13,009439)
-------------------
--- Transaction ---
Label: True
Features: [V1] 0,3782745 [V2] 3,9147968 [V3] -5,726872 