# ML.NET - Samples - Power Anomaly Detection

# Spam Detection for Text Messages

| ML.NET version | API type          | Status                        | App Type    | Data type | Scenario            | ML Task                   | Algorithms                  |
|----------------|-------------------|-------------------------------|-------------|-----------|---------------------|---------------------------|-----------------------------|
| v1.4           | Dynamic API | Might need to update project structure to match template | Jupyter Notebook | .tsv files | Spam detection | Two-class classification | Averaged Perceptron (linear learner) |

In this sample, you'll see how to use [ML.NET](https://www.microsoft.com/net/learn/apps/machine-learning-and-ai/ml-dotnet) to predict whether a text message is spam. In the world of machine learning, this type of prediction is known as **binary classification**.

## Problem

Our goal here is to predict whether a text message is spam (an irrelevant or unwanted message). We will use the [SMS Spam Collection Data Set](https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection) from UCI, which contains close to 6000 messages that have been classified as "spam" or "ham" (not spam). We will use this dataset to train a model that can take in new messages and predict whether they are spam.

This is an example of binary classification, as we are classifying the text messages into one of two categories.


## Solution

To solve this problem, we will first build an estimator that defines the ML pipeline we want to use. We will then train this estimator on existing data, evaluate how good it is, and finally consume the model to predict whether a few example messages are spam.

![Build -> Train -> Evaluate -> Consume](../shared_content/modelpipeline.png)

### 1. Build Model

To build the model we will:

* Define how to read the spam dataset that will be downloaded from https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection. 

* Apply several data transformations:

    * Convert the label ("spam" or "ham") to a boolean ("true" represents spam) so we can use it with a binary classifier. 
    * Featurize the text message into a numeric vector so a machine learning trainer can use it.

* Add a trainer (such as `AveragedPerceptron`).

The initial code is similar to the following:


In [3]:
// ML.NET Nuget packages installation 
#r "nuget:Microsoft.ML" 
#r "nuget:Microsoft.ML.TimeSeries" 
#r "nuget:Microsoft.ML.Mkl.Redist"

Installing package Microsoft.ML.Mkl.Redist...

## Using C# Class

In [11]:
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using Microsoft.ML;
using Microsoft.ML.Data;
using System.Collections.Generic;
using static Microsoft.ML.TrainCatalogBase;
using static Microsoft.ML.DataOperationsCatalog;
using System.Diagnostics;
using Microsoft.ML.TimeSeries;
using Microsoft.ML.Transforms.TimeSeries;

## Declare data-classes for input data and predictions

In [12]:
// One row of the power meter export.
class MeterData
{
    [LoadColumn(0)]
    public string name { get; set; }

    [LoadColumn(1)]
    public DateTime time { get; set; }

    [LoadColumn(2)]
    public float ConsumptionDiffNormalized { get; set; }
}

// Output of the spike detector: [0] alert flag, [1] raw score, [2] p-value.
class SpikePrediction
{
    [VectorType(3)]
    public double[] Prediction { get; set; }
}
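
The `Prediction` vector above carries three values in a fixed order, which is easy to mix up when reading it by index. The following is a minimal sketch of a helper that gives those positions names. `DecodePrediction` is a hypothetical helper, not part of ML.NET, and the layout (alert flag, score, p-value) is assumed from the column headers this notebook prints later on.

```csharp
using System;

// Hypothetical helper: splits the raw 3-element output vector into named values.
// Assumed layout: [0] = alert flag, [1] = raw score, [2] = p-value.
static (bool IsSpike, double Score, double PValue) DecodePrediction(double[] prediction)
{
    if (prediction == null || prediction.Length != 3)
        throw new ArgumentException("Expected a vector of exactly 3 values.", nameof(prediction));

    // The alert flag is stored as a double; any non-zero value is treated as an alert.
    return (prediction[0] != 0, prediction[1], prediction[2]);
}

// Made-up values, for illustration only.
var decoded = DecodePrediction(new[] { 1.0, 0.42, 0.01 });
Console.WriteLine($"Spike: {decoded.IsSpike}, score: {decoded.Score}, p-value: {decoded.PValue}");
```

Returning a named tuple keeps the sketch free of extra types, so it can be pasted into a notebook cell as-is.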

### Constants

In [13]:
private static string TrainingDataPath = @"./datasets/PowerAnomalyDetection/power-export_min.csv";
private static string ModelPath =  @"./datasets/PowerAnomalyDetection/MLModels/PowerAnomalyDetectionModel.zip";

### Methods

In [14]:
public static void DetectAnomalies(MLContext mlContext, IDataView dataView)
{
    ITransformer trainedModel = mlContext.Model.Load(ModelPath, out var modelInputSchema);

    var transformedData = trainedModel.Transform(dataView);

    // Getting the data of the newly created column as an IEnumerable
    IEnumerable<SpikePrediction> predictions =
        mlContext.Data.CreateEnumerable<SpikePrediction>(transformedData, false);
    
    var colCDN = dataView.GetColumn<float>("ConsumptionDiffNormalized").ToArray();
    var colTime = dataView.GetColumn<DateTime>("time").ToArray();

    // Output the input data and predictions
    Console.WriteLine("======Displaying anomalies in the Power meter data=========");
    Console.WriteLine("Date              \tReadingDiff\tAlert\tScore\tP-Value");

    int i = 0;
    foreach (var p in predictions)
    {
        Console.WriteLine("{0}\t{1:0.0000}\t{2:0.00}\t{3:0.00}\t{4:0.00}", 
            colTime[i], colCDN[i], 
            p.Prediction[0], p.Prediction[1], p.Prediction[2]);
        i++;
    }
}
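
`DetectAnomalies` prints every reading, flagged or not. When only the spikes are of interest, the rows can be filtered on the alert flag first. The sketch below works on plain arrays so it runs without ML.NET. `FlaggedReadings` is a hypothetical helper, and the sample values are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: pairs each timestamp with its prediction vector
// and keeps only the rows whose alert flag (index 0) is non-zero.
static List<(DateTime Time, double Score, double PValue)> FlaggedReadings(
    IReadOnlyList<DateTime> times, IReadOnlyList<double[]> predictions)
{
    if (times.Count != predictions.Count)
        throw new ArgumentException("times and predictions must have the same length.");

    return times
        .Zip(predictions, (time, p) => (Time: time, Prediction: p))
        .Where(row => row.Prediction[0] != 0)
        .Select(row => (row.Time, Score: row.Prediction[1], PValue: row.Prediction[2]))
        .ToList();
}

// Made-up data: only the second reading is flagged.
var sampleTimes = new[]
{
    new DateTime(2018, 1, 1), new DateTime(2018, 1, 2), new DateTime(2018, 1, 3)
};
var samplePredictions = new[]
{
    new[] { 0.0, 0.10, 0.50 },
    new[] { 1.0, 0.95, 0.01 },
    new[] { 0.0, 0.20, 0.40 },
};

foreach (var r in FlaggedReadings(sampleTimes, samplePredictions))
    Console.WriteLine($"{r.Time:yyyy-MM-dd}\t{r.Score:0.00}\t{r.PValue:0.00}");
```

In the notebook, the same filter could be applied to `colTime` and the `Prediction` arrays collected from `predictions`.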

public static void BuildTrainModel(MLContext mlContext, IDataView dataView)
{
    // Configure the estimator
    const int PValueSize = 30;         // length of the sliding window used to compute the p-value
    const int SeasonalitySize = 30;    // upper bound on the largest relevant seasonality in the series
    const int TrainingSize = 90;       // number of points from the start of the series used for training
    const int ConfidenceInterval = 98; // spike detection confidence, in the range [0, 100]

    string outputColumnName = nameof(SpikePrediction.Prediction);
    string inputColumnName = nameof(MeterData.ConsumptionDiffNormalized);

    var trainingPipeline = mlContext.Transforms.DetectSpikeBySsa(
        outputColumnName,
        inputColumnName,
        confidence: ConfidenceInterval,
        pvalueHistoryLength: PValueSize,
        trainingWindowSize: TrainingSize,
        seasonalityWindowSize: SeasonalitySize);

    // Fit the SSA spike detector on the loaded readings
    ITransformer trainedModel = trainingPipeline.Fit(dataView);

    // Save/persist the trained model to a .zip file
    mlContext.Model.Save(trainedModel, dataView.Schema, ModelPath);

    Console.WriteLine("The model is saved to {0}", ModelPath);
    Console.WriteLine("");
}
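
`BuildTrainModel` writes the model under an `MLModels` subfolder. If that folder does not exist yet, saving may fail with a missing-directory error, so it can help to create it beforehand. This is a minimal sketch. `EnsureParentDirectory` is a hypothetical helper, and the demo path under the temp folder is only for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper: creates the parent folder of a file path if it is missing.
static void EnsureParentDirectory(string filePath)
{
    var directory = Path.GetDirectoryName(Path.GetFullPath(filePath));
    if (!string.IsNullOrEmpty(directory))
        Directory.CreateDirectory(directory); // does nothing if the folder already exists
}

// Illustration with a throwaway path under the temp folder.
string demoPath = Path.Combine(Path.GetTempPath(), "PowerAnomalyDemo", "MLModels", "model.zip");
EnsureParentDirectory(demoPath);
Console.WriteLine(Directory.Exists(Path.GetDirectoryName(demoPath)));
```

In the notebook, the call would go just before `mlContext.Model.Save`, with `ModelPath` as the argument.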


## Evaluate

In [15]:
var mlContext = new MLContext(seed:0);

// load data
var dataView = mlContext.Data.LoadFromTextFile<MeterData>(
   TrainingDataPath,
   separatorChar: ',',
   hasHeader: true);

// Build, train, and save the spike detection model
BuildTrainModel(mlContext, dataView);  // using SsaSpikeEstimator

//DetectAnomalies(mlContext, dataView);

//Console.WriteLine("\nPress any key to exit");

Unhandled exception: System.TypeInitializationException: The type initializer for 'Microsoft.ML.Transforms.TimeSeries.FftUtils' threw an exception.
 ---> System.DllNotFoundException: Unable to load DLL 'MklImports' or one of its dependencies: The specified module could not be found. (0x8007007E)
   at Microsoft.ML.Transforms.TimeSeries.FftUtils.ErrorMessage(Int32 status)
   at Microsoft.ML.Transforms.TimeSeries.FftUtils..cctor()
   --- End of inner exception stack trace ---
   at Microsoft.ML.Transforms.TimeSeries.FftUtils.ComputeForwardFft(Double[] inputRe, Double[] inputIm, Double[] outputRe, Double[] outputIm, Int32 length)
   at Microsoft.ML.Transforms.TimeSeries.TrajectoryMatrix.CacheInputSeriesFft()
   at Microsoft.ML.Transforms.TimeSeries.TrajectoryMatrix.FftMultiply(Single[] vector, Single[] result, Boolean add, Int32 srcIndex, Int32 dstIndex)
   at Microsoft.ML.Transforms.TimeSeries.TrajectoryMatrix.ComputeUnnormalizedTrajectoryCovarianceMat(Single[] cov)
   at Microsoft.ML.Transforms.TimeSeries.TrajectoryMatrix.ComputeSvd(Single[]& singularValues, Single[]& leftSingularvectors)
   at Microsoft.ML.Transforms.TimeSeries.AdaptiveSingularSpectrumSequenceModelerInternal.TrainCore(Single[] dataArray, Int32 originalSeriesLength)
   at Microsoft.ML.Transforms.TimeSeries.AdaptiveSingularSpectrumSequenceModelerInternal.Train(RoleMappedData data)
   at Microsoft.ML.Transforms.TimeSeries.SsaSpikeDetector..ctor(IHostEnvironment env, Options options, IDataView input)
   at Microsoft.ML.Transforms.TimeSeries.SsaSpikeEstimator.Fit(IDataView input)
   at Submission#24.BuildTrainModel(MLContext mlContext, IDataView dataView)
   at Submission#25.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)
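
The exception above says the native `MklImports` library, or one of its dependencies, could not be loaded. One way to narrow this down is to ask the runtime directly whether it can load the library. The sketch below uses `NativeLibrary.TryLoad`, which is available from .NET Core 3.0 onward. `CanLoadNativeLibrary` is a hypothetical helper. With a bare name it relies on the operating system's default search, which is narrower than what the runtime tries when resolving a P/Invoke, so a `False` result is a hint rather than a verdict. Passing the full path of the DLL gives a more direct check.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: returns true if the runtime can load a native library
// from the given name or full path, and releases the handle again afterwards.
static bool CanLoadNativeLibrary(string nameOrPath)
{
    if (NativeLibrary.TryLoad(nameOrPath, out IntPtr handle))
    {
        NativeLibrary.Free(handle);
        return true;
    }
    return false;
}

Console.WriteLine($"MklImports loadable: {CanLoadNativeLibrary("MklImports")}");
```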