# ML.NET - Samples - Anomaly Detection Sales

## Spike Detection and Change Point Detection of Product Sales

| ML.NET version | API type          | Status                        | App Type    | Data type | Scenario            | ML Task                   | Algorithms                  |
|----------------|-------------------|-------------------------------|-------------|-----------|---------------------|---------------------------|-----------------------------|
| v1.5         | Dynamic API | Up-to-date | Jupyter Notebook | .csv files | Product Sales Spike Detection| Time Series - Anomaly Detection | IID Spike Detection and IID Change point Detection |

In this introductory sample, you'll see how to use [ML.NET](https://www.microsoft.com/net/learn/apps/machine-learning-and-ai/ml-dotnet) to detect **spikes** and **change points** in product sales. In the world of machine learning, this type of task is called **time series anomaly detection**.

## Problem

We have data on product sales over a three-month period, during which sales are sometimes normal and sometimes unusually high. We want to identify sudden spikes in product sales so that we can use them to analyze sales trends.

To solve this problem, we will build an ML model that takes as inputs: 
* Date-Month
* ProductSales over 3 months period

and predicts the spikes and change points in product sales.

## Dataset

We have created a sample dataset for product sales. The dataset `product-sales.csv` can be found [here](./SpikeDetection/Data/product-sales.csv).

The **Product Sales dataset** has the following format.

| Month  | ProductSales |
|--------|--------------|
| 1-Jan  | 271          |
| 2-Jan  | 150.9        |
| .....  | .....        |
| 1-Feb  | 199.3        |
| ...    | ....         |

The data format of the Product Sales dataset is based on the **shampoo-sales dataset**. The license for the shampoo-sales dataset is available [here](./SpikeDetection/Data/SHAMPOO-SALES-LICENSE.txt).


The **IID Spike Detection** and **IID Change point Detection** algorithms are suited to datasets that are **independent and identically distributed** (IID). In probability theory and statistics, a collection of random variables is IID if each random variable has the same probability distribution as the others and all are mutually independent. More information is available on Wikipedia [here](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables).

## ML task - Time Series Anomaly Detection

Anomaly detection is the process of detecting outliers in the data. Anomaly detection in time series refers to detecting time stamps, or points on a given input time series, at which the series behaves differently from what was expected. These deviations typically indicate events of interest in the problem domain: a cyber-attack on user accounts, a power outage, a burst in requests per second (RPS) on a server, a memory leak, etc.

## Spike Detection

Spikes are sudden yet temporary bursts in the values of the input time series. In practice, they can happen for a variety of reasons depending on the application: outages, cyber-attacks, viral web content, etc. Therefore, in many applications, it is important to detect spikes.

![spikeDetection](../shared_content/SpikeDetection.png)
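
To make the idea of a "sudden yet temporary burst" concrete, here is a deliberately simplified sketch. It is not the algorithm behind `DetectIidSpike`; it swaps in a plain sliding-window z-score test, and the names `SpikeSketch`, `FindSpikes`, `window` and `threshold` are hypothetical. A point is flagged when it lies more than `threshold` standard deviations away from the mean of the `window` values that precede it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: a sliding-window z-score test, NOT the ML.NET implementation.
public static class SpikeSketch
{
    // Returns the indices whose value deviates from the mean of the previous
    // `window` values by more than `threshold` standard deviations.
    public static List<int> FindSpikes(IReadOnlyList<double> series, int window, double threshold)
    {
        var spikes = new List<int>();
        for (int i = window; i < series.Count; i++)
        {
            // History = the `window` values immediately before index i.
            double[] history = series.Skip(i - window).Take(window).ToArray();
            double mean = history.Average();
            double std = Math.Sqrt(history.Sum(v => (v - mean) * (v - mean)) / window);

            // A flat history (std == 0) gives no scale to compare against, so skip it.
            if (std > 0 && Math.Abs(series[i] - mean) > threshold * std)
            {
                spikes.Add(i);
            }
        }
        return spikes;
    }
}
```

For example, `SpikeSketch.FindSpikes(new[] { 10.0, 11, 9, 10, 50, 10, 11 }, window: 4, threshold: 3)` flags only index 4 (the value 50); the values after it are not flagged because the series returns to its earlier level.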

## Change point Detection

Change points mark the beginning of more persistent deviations in the behavior of a time series from what was expected. In practice, these types of changes are usually triggered by some fundamental change in the dynamics of the system. For example, in system telemetry monitoring, the introduction of a memory leak can cause a (slow) trend in the time series of memory usage after a certain point in time.

![ChangepointDetection](../shared_content/ChangePointDetection.png)
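
The contrast with a spike can be illustrated with another simplified sketch. Again, this is not what `DetectIidChangePoint` does internally; it is a basic comparison of window means, and `ChangePointSketch`, `FindMeanShift`, `window` and `minShift` are hypothetical names. The function looks for the index where the mean of the following `window` values differs most from the mean of the preceding `window` values, and reports it only if that difference exceeds `minShift`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: a two-window mean comparison, NOT the ML.NET implementation.
public static class ChangePointSketch
{
    // Returns the index with the largest mean shift between the `window` values
    // before it and the `window` values starting at it, or -1 when no shift
    // exceeds `minShift`.
    public static int FindMeanShift(IReadOnlyList<double> series, int window, double minShift)
    {
        int best = -1;
        double bestShift = minShift;
        for (int i = window; i + window <= series.Count; i++)
        {
            double before = series.Skip(i - window).Take(window).Average();
            double after = series.Skip(i).Take(window).Average();
            double shift = Math.Abs(after - before);
            if (shift > bestShift)
            {
                bestShift = shift;
                best = i;
            }
        }
        return best;
    }
}
```

With `window: 3` and `minShift: 5`, the series `{ 10, 11, 9, 10, 20, 21, 19, 20 }` yields index 4, where the level moves from about 10 to about 20 and stays there. A single spike such as `{ 10, 10, 10, 50, 10, 10, 10 }` with `minShift: 15` yields -1, because one outlier does not move a window mean far enough.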


## Solution

To solve this problem, you would typically build and train an ML model on existing training data, evaluate how good it is (by analyzing the obtained metrics), and finally consume/test the model to detect anomalies in the input data.

![Build -> Train -> Evaluate -> Consume](../shared_content/modelpipeline.png)

However, in this example we only build and train the model to demonstrate the time series anomaly detection library: the detection runs directly on the actual data, and the library does not provide an evaluate method. We then review the detected anomalies in the `Prediction` output column.

The process of building and training models is the same for spike detection and change point detection; the main difference is the algorithm that you use (`DetectIidSpike` vs. `DetectIidChangePoint`).

In [1]:
// Install the ML.NET NuGet packages
#r "nuget:Microsoft.ML" 
#r "nuget:Microsoft.ML.TimeSeries" 

Installed package Microsoft.ML.TimeSeries version 1.5.0

Installed package Microsoft.ML version 1.5.0

## Using C# Class

In [12]:
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using Microsoft.ML;
using Microsoft.ML.Data;
using System.Collections.Generic;
using static Microsoft.ML.TrainCatalogBase;
using static Microsoft.ML.DataOperationsCatalog;
using System.Diagnostics;

## Declare data-classes for input data and predictions

In [13]:
public class ProductSalesData
{
    [LoadColumn(0)]
    public string Month;

    [LoadColumn(1)]
    public float numSales;
}

public class ProductSalesPrediction
{
    //Vector holding alert, score and p-value (change point detection adds a fourth value, the martingale value)
    [VectorType(3)]
    public double[] Prediction { get; set; }
}
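
Because the `Prediction` vector is positional, the consuming code has to remember which index means what. As a minimal sketch, the hypothetical helper below (`AnomalyResult` and `FromVector` are illustrative names, not part of ML.NET) copies the slots into named properties. It assumes the layout used in this sample: alert, score, p-value and, for change point detection only, a fourth martingale value.

```csharp
using System;

// Hypothetical helper, not part of ML.NET: names the positional slots
// of the Prediction vector as they are used in this sample.
public sealed class AnomalyResult
{
    public bool IsAlert { get; private set; }
    public double Score { get; private set; }
    public double PValue { get; private set; }

    // Present only when the vector has a fourth slot (change point output).
    public double? Martingale { get; private set; }

    public static AnomalyResult FromVector(double[] prediction)
    {
        if (prediction == null || prediction.Length < 3)
        {
            throw new ArgumentException("Expected at least alert, score and p-value.", nameof(prediction));
        }

        return new AnomalyResult
        {
            IsAlert = prediction[0] == 1,
            Score = prediction[1],
            PValue = prediction[2],
            Martingale = prediction.Length > 3 ? prediction[3] : (double?)null
        };
    }
}
```

In the detection loops, `AnomalyResult.FromVector(p.Prediction)` could then stand in for the index-based access such as `p.Prediction[0] == 1`.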

### Constants

In [14]:
private static string DatasetPath = @"./datasets/AnomalyDetection_Sales/product-sales.csv";
private static string ModelPath =  @"./datasets/AnomalyDetection_Sales/MLModels/ProductSalesModel.zip";
private static MLContext mlContext;

### Methods

In [15]:
static void DetectSpike(int size,IDataView dataView)
{
    Console.WriteLine("===============Detect temporary changes in pattern===============");

    //STEP 1: Create the estimator
    var estimator = mlContext.Transforms.DetectIidSpike(outputColumnName: nameof(ProductSalesPrediction.Prediction), inputColumnName: nameof(ProductSalesData.numSales), confidence: 95, pvalueHistoryLength: size / 4);

    //STEP 2: Create the transformer.
    //IID spike detection needs no training; it only applies a transformation.
    //Because nothing is trained, Fit() only needs the schema of the data, not the data itself,
    //so we pass it an empty data view.
    ITransformer transformedModel = estimator.Fit(CreateEmptyDataView());

    //STEP 3: Use/test model
    //Apply data transformation to create predictions.
    IDataView transformedData = transformedModel.Transform(dataView);
    var predictions = mlContext.Data.CreateEnumerable<ProductSalesPrediction>(transformedData, reuseRowObject: false);
              
    Console.WriteLine("Alert\tScore\tP-Value");
    foreach (var p in predictions)
    {
        if (p.Prediction[0] == 1)
        {
            Console.BackgroundColor = ConsoleColor.DarkYellow;
            Console.ForegroundColor = ConsoleColor.Black;
        }
        Console.WriteLine("{0}\t{1:0.00}\t{2:0.00}", p.Prediction[0], p.Prediction[1], p.Prediction[2]);
        Console.ResetColor();
    }
    Console.WriteLine("");
}

static void DetectChangepoint(int size, IDataView dataView)
{
  Console.WriteLine("===============Detect persistent changes in pattern===============");

  //STEP 1: Create the estimator using DetectIidChangePoint
  var estimator = mlContext.Transforms.DetectIidChangePoint(outputColumnName: nameof(ProductSalesPrediction.Prediction), inputColumnName: nameof(ProductSalesData.numSales), confidence: 95, changeHistoryLength: size / 4);

  //STEP 2: Create the transformer.
  //IID change point detection needs no training; it only applies a transformation.
  //Because nothing is trained, Fit() only needs the schema of the data, not the data itself,
  //so we pass it an empty data view.
  ITransformer transformedModel = estimator.Fit(CreateEmptyDataView());

  //STEP 3: Use/test model
  //Apply data transformation to create predictions.
  IDataView transformedData = transformedModel.Transform(dataView);
  var predictions = mlContext.Data.CreateEnumerable<ProductSalesPrediction>(transformedData, reuseRowObject: false);
               
  Console.WriteLine($"{nameof(ProductSalesPrediction.Prediction)} column obtained post-transformation.");
  Console.WriteLine("Alert\tScore\tP-Value\tMartingale value");
    
  foreach(var p in predictions)
  {
     if (p.Prediction[0] == 1)
     {
         Console.WriteLine("{0}\t{1:0.00}\t{2:0.00}\t{3:0.00}  <-- alert is on, predicted changepoint", p.Prediction[0], p.Prediction[1], p.Prediction[2], p.Prediction[3]);
     }
     else
     { 
         Console.WriteLine("{0}\t{1:0.00}\t{2:0.00}\t{3:0.00}",  p.Prediction[0], p.Prediction[1], p.Prediction[2], p.Prediction[3]);                  
     }            
  }
  Console.WriteLine("");
}

private static IDataView CreateEmptyDataView()
{
    //Create an empty data view. Only the schema is needed to call Fit().
    IEnumerable<ProductSalesData> enumerableData = new List<ProductSalesData>();
    var dv = mlContext.Data.LoadFromEnumerable(enumerableData);
    return dv;
}

## Evaluate

In [16]:
// Create the MLContext to be shared across the model creation workflow objects
mlContext = new MLContext();

//Number of records in the dataset file
const int size = 36;

//Load the data into an IDataView.
//This dataset is used when detecting spikes or change points.
IDataView dataView = mlContext.Data.LoadFromTextFile<ProductSalesData>(path: DatasetPath, hasHeader: true, separatorChar: ',');

//To detect temporary changes in the pattern
DetectSpike(size, dataView);

//To detect persistent change in the pattern
DetectChangepoint(size, dataView);

Console.WriteLine("=============== End of process ===============");

Alert	Score	P-Value
0	271,00	0,50
0	150,90	0,00
0	188,10	0,41
0	124,30	0,13
0	185,30	0,47
0	173,50	0,47
0	236,80	0,19
0	229,50	0,27
0	197,80	0,48
0	127,90	0,13
1	341,50	0,00
0	190,90	0,48
0	199,30	0,48
0	154,50	0,24
0	215,10	0,42
0	278,30	0,19
0	196,40	0,43
0	292,00	0,17
0	231,00	0,45
0	308,60	0,18
0	294,90	0,19
1	426,60	0,00
0	269,50	0,47
0	347,30	0,21
0	344,70	0,27
0	445,40	0,06
0	320,90	0,49
0	444,30	0,12
0	406,30	0,29
0	442,40	0,21
1	580,50	0,00
0	412,60	0,45
1	687,00	0,01
0	480,30	0,40
0	586,30	0,20
0	651,90	0,14

Prediction column obtained post-transformation.
Alert	Score	P-Value	Martingale value
0	271,00	0,50	0,00
0	150,90	0,00	2,33
0	188,10	0,41	2,80
0	124,30	0,13	9,16
0	185,30	0,47	9,77
0	173,50	0,47	10,41
0	236,80	0,19	24,46
0	229,50	0,27	42,38
1	197,80	0,48	44,23  <-- alert is on, predicted changepoint
0	127,90	0,13	145,25
0	341,50	0,00	0,01
0	190,90	0,48	0,01
0	199,30	0,48	0,00
0	154,50	0,24	0,00
0	215,10	0,42	0,00
0	278,30	0,19	0,00
0	196,40	0,43	0,00
0	292,00	0,17	0,01
0	