# Anomaly Detection | Embeddings & Cosine Distance

## Embeddings

Embeddings are numerical representations of data, typically in a lower-dimensional vector space, that capture and preserve the semantic meaning of the input. By mapping words, phrases, or other forms of data to vectors, embeddings enable machines to understand and process the data based on their contextual relationships and similarities. 

In this sample, the [text-embedding-ada-002](https://openai.com/index/new-and-improved-embedding-model/) model is used to detect anomalies in time series data in a simplified use case.

## Azure Environment

To run the sample code, Azure service-specific information such as the endpoint and API key is required. [Details and instructions can be found here](../../CreateEnv/CreateEnv.azcli).


## Step 1: Create OpenAIClient

The `OpenAIClient` class from the Azure.AI.OpenAI .NET client library is the central entry point for .NET code that interacts with a deployed Azure OpenAI Large Language Model. It provides methods to access the OpenAI REST APIs for tasks such as text completion, text embedding, and chat completion.

In [13]:
#r "nuget: Azure.AI.OpenAI, 1.0.0-beta.17"
#r "nuget: DotNetEnv, 2.5.0"
#r "nuget: MathNet.Numerics, 5.0.0"

using System;
using Azure;
using Azure.AI.OpenAI;
using DotNetEnv;
using System.IO;
using System.Text.Json;

// The configuration file is created during environment creation.
static string _configurationFile = @"../../Configuration/application.env";
Env.Load(_configurationFile);

// Read the Azure OpenAI settings; fall back to empty strings if a value is missing.
string openAIApiKey = Environment.GetEnvironmentVariable("AOAI_APIKEY") ?? "";
string openAIEndpoint = Environment.GetEnvironmentVariable("AOAI_ENDPOINT") ?? "";
string embeddingDeploymentName = Environment.GetEnvironmentVariable("EMBEDDING_DEPLOYMENTNAME") ?? "";

// Create the client using key-based authentication.
AzureKeyCredential azureKeyCredential = new AzureKeyCredential(openAIApiKey);
OpenAIClient openAIClient = new OpenAIClient(new Uri(openAIEndpoint), azureKeyCredential);

Console.WriteLine("OpenAI Client created...");

OpenAI Client created...


## Step 2: Prepare Time Series Data

### Reference Time Series Data

For this simplified sample, we use [time series data](../../TestData/TestData_Reference.txt) from a fictitious device called Compressor1, which records a timestamp, the current pressure, and the current energy consumption. Over the first three seconds, both the energy consumption and the pressure produced by the device increase. This is followed by four data points where both energy consumption and pressure remain stable. In the final three data points, energy consumption drops to zero and the pressure gradually decreases. This pattern reflects the operational dynamics of Compressor1 over the observed time period.

![TimeSeriesTestData](../../media/img/TestData_Reference.png)

### Time Series with Anomalies

A compressor that requires maintenance, such as cleaning of a filter, may consume the same amount of energy but fail to produce the same level of pressure. This is evident in the test data, where despite consistent energy consumption, the pressure fails to reach its highest level over four consecutive data points, indicating a drop in efficiency and the need for maintenance.

![TimeSeriesDegradation](../../media/img/TestData_Degradation.png)


The time series data from Compressor1 is quite chatty, containing numerous additional values that are not essential for the analysis. Since the primary focus is on monitoring pressure and energy consumption, the data is streamlined by removing all extraneous values, leaving only the information relevant for the embeddings. Three streamlined files are available: the [reference data](../../TestData/TestData_ReferenceRaw.txt), the [data with anomalies](../../TestData/TestData_DegradationRaw.txt) described above, and, as a cross-check, time series data that [**isn't equal** to the reference data but is quite similar](../../TestData/TestData_OkRaw.txt).



## Step 3: Create Embeddings



In [14]:
// Reference data: the expected behavior of Compressor1.
string referenceData = File.ReadAllText("../../TestData/TestData_ReferenceRaw.txt");
EmbeddingsOptions embeddingsOptions = new EmbeddingsOptions(embeddingDeploymentName, new List<string> { referenceData });
var response = await openAIClient.GetEmbeddingsAsync(embeddingsOptions);
float[] embeddingReferenceData = response.Value.Data[0].Embedding.ToArray();

// Degradation data: same energy consumption, reduced pressure.
string degradationData = File.ReadAllText("../../TestData/TestData_DegradationRaw.txt");
embeddingsOptions = new EmbeddingsOptions(embeddingDeploymentName, new List<string> { degradationData });
response = await openAIClient.GetEmbeddingsAsync(embeddingsOptions);
float[] embeddingDegradationData = response.Value.Data[0].Embedding.ToArray();

// OK data: not equal to the reference data, but quite similar.
string okData = File.ReadAllText("../../TestData/TestData_OkRaw.txt");
embeddingsOptions = new EmbeddingsOptions(embeddingDeploymentName, new List<string> { okData });
response = await openAIClient.GetEmbeddingsAsync(embeddingsOptions);
float[] embeddingOkData = response.Value.Data[0].Embedding.ToArray();

Console.WriteLine("Embeddings created...");


Embeddings created...


## Step 4: Calculate Cosine Distance
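
Cosine distance compares the direction of two vectors rather than their magnitude. For reference, the standard definition for two embedding vectors $a$ and $b$ of dimension $n$ is given here (the symbols $a$, $b$, $n$, and $d_{\cos}$ are introduced for illustration and do not appear in the sample code):

$$
d_{\cos}(a, b) = 1 - \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
= 1 - \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}
$$

A distance of 0 means both vectors point in the same direction, and larger values indicate less similar vectors. As a small worked example, for $a = (1, 0)$ and $b = (1, 1)$:

$$
d_{\cos}(a, b) = 1 - \frac{1 \cdot 1 + 0 \cdot 1}{\sqrt{1} \, \sqrt{2}} = 1 - \frac{1}{\sqrt{2}} \approx 0.293
$$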



In [15]:
using MathNet.Numerics;

float distanceDegradation = Distance.Cosine(embeddingReferenceData, embeddingDegradationData);
float distanceOk = Distance.Cosine(embeddingReferenceData, embeddingOkData);

Console.WriteLine($"Distance data with anomaly: {distanceDegradation}");
Console.WriteLine($"Distance data without anomaly: {distanceOk}");
Console.WriteLine("...");


Distance data with anomaly: 0.0023622494
Distance data without anomaly: 0.00046974447
...
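
In the output above, the distance for the data with the anomaly (0.0023622494) is roughly five times the distance for the similar data (0.00046974447). One way to turn such distances into a decision is to compare them against a threshold. The following is a minimal sketch under stated assumptions, not part of the original sample: the threshold value `0.001` is hypothetical and was picked only because it lies between the two printed distances, and the names `AnomalyThreshold`, `CosineDistance`, and `IsAnomaly` are illustrative. `CosineDistance` is a plain re-implementation of the standard definition so the block runs without additional packages; the sample itself uses `Distance.Cosine` from MathNet.Numerics.

```csharp
using System;

// Hypothetical threshold for illustration only; it lies between the two
// distances printed above and would need tuning on real data.
const float AnomalyThreshold = 0.001f;

// Illustrative re-implementation of cosine distance: 1 - (a . b) / (|a| * |b|).
float CosineDistance(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return (float)(1.0 - dot / Math.Sqrt(normA * normB));
}

// Flags a series whose distance to the reference exceeds the threshold.
bool IsAnomaly(float distance) => distance > AnomalyThreshold;

// Distances copied from the sample output above.
Console.WriteLine($"Data with anomaly flagged: {IsAnomaly(0.0023622494f)}");
Console.WriteLine($"Data without anomaly flagged: {IsAnomaly(0.00046974447f)}");
```

With the assumed threshold, the first line reports `True` and the second `False`. In practice, a suitable threshold would likely have to be derived from a larger set of known-good and known-degraded recordings.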
