# 02 RAG | 03 Embeddings | 01 Vector Distance

## Intro

Cosine similarity measures the cosine of the angle between two non-zero vectors in a multi-dimensional space. Its value ranges from -1 (the vectors point in opposite directions) through 0 (the vectors are orthogonal) to 1 (the vectors point in the same direction).

This metric is widely used in text analysis, recommendation systems, and machine learning. Its appeal lies in its effectiveness in high-dimensional spaces and in its independence from vector magnitude, which is particularly useful in text analysis, where document lengths can vary significantly.
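
As a compact reference, the definition can be written out as a formula. The symbols $A$, $B$, and $n$ (the number of dimensions) are notation chosen here for illustration. Cosine distance, which the rest of this notebook computes, is taken to be one minus the similarity:

$$
\text{similarity}(A, B) = \cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}, \qquad \text{distance}(A, B) = 1 - \text{similarity}(A, B)
$$

For example, with $A = (1, 0)$ and $B = (1, 1)$:

$$
\text{similarity}(A, B) = \frac{1 \cdot 1 + 0 \cdot 1}{\sqrt{1} \, \sqrt{2}} = \frac{1}{\sqrt{2}} \approx 0.707, \qquad \text{distance}(A, B) \approx 0.293
$$

Scaling either vector by a positive factor leaves both values unchanged, which is the magnitude independence described above.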

## Azure Environment

To run the sample code, Azure service-specific information such as the endpoint and API key is needed ([details and instructions can be found here](../01_DemoEnvironment/01_Environment.ipynb)).

## Step 1: Create OpenAIClient

The `OpenAIClient` class from the Azure.AI.OpenAI .NET client library is the central entry point for .NET code that interacts with a deployed Azure OpenAI large language model. It provides methods that call the OpenAI REST APIs for tasks such as text completion, chat completion, and text embedding. It also lets developers specify the model deployment and per-request options such as temperature, frequency penalty, presence penalty, and stop sequences.

The OpenAIClient can connect to any Azure OpenAI resource or to the non-Azure OpenAI inference endpoint, making it a versatile and powerful tool for .NET development with OpenAI.

In [1]:
#r "nuget: Azure.AI.OpenAI, 1.0.0-beta.16"
#r "nuget: DotNetEnv, 2.5.0"

using Azure; 
using Azure.AI.OpenAI;
using DotNetEnv;
using System.IO;

//configuration file is created during environment creation
static string _configurationFile = @"../Configuration/application.env";
Env.Load(_configurationFile);

string oAiApiKey = Environment.GetEnvironmentVariable("WS_AOAI_APIKEY") ?? "WS_AOAI_APIKEY not found";
string oAiEndpoint = Environment.GetEnvironmentVariable("WS_AOAI_ENDPOINT") ?? "WS_AOAI_ENDPOINT not found";
string embeddingDeploymentName = Environment.GetEnvironmentVariable("WS_EMBEDDING_DEPLOYMENTNAME") ?? "WS_EMBEDDING_DEPLOYMENTNAME not found";

string assetsFolder = Path.Combine(Directory.GetCurrentDirectory(), "..", "assets");

AzureKeyCredential azureKeyCredential = new AzureKeyCredential(oAiApiKey);
OpenAIClient openAIClient = new OpenAIClient(new Uri(oAiEndpoint), azureKeyCredential);

Console.WriteLine($"OpenAI Client created...");


OpenAI Client created...


## Step 2: Cosine similarity

The method in the next cell computes the cosine distance (1 minus the cosine similarity) between two given vectors.

Azure offers several services that rely on similarity metrics such as cosine similarity. For instance, Azure Machine Learning and [Azure AI Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) support text analytics and recommendation scenarios in which cosine similarity can be used to measure how similar two items are.

An alternative to a hand-written method is the ```MathNet.Numerics``` package. Its static `Distance` class provides a variety of distance metrics, including `Distance.Cosine`, which returns the cosine distance. Step 4 uses that method; the cell below shows the manual calculation.


In [2]:
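// Returns the cosine distance (1 - cosine similarity) between two vectors.
static double CalculateCosineDistance(double[] vectorA, double[] vectorB)
{
    if (vectorA.Length != vectorB.Length)
    {
        throw new ArgumentException("Vectors must be the same length");
    }

    double dotProduct = 0;
    double squaredMagnitudeA = 0;
    double squaredMagnitudeB = 0;

    // Accumulate the dot product and the squared magnitudes in a single pass
    for (int i = 0; i < vectorA.Length; i++)
    {
        dotProduct += vectorA[i] * vectorB[i];
        squaredMagnitudeA += vectorA[i] * vectorA[i];
        squaredMagnitudeB += vectorB[i] * vectorB[i];
    }

    // The angle is undefined if either vector has zero length
    if (squaredMagnitudeA == 0 || squaredMagnitudeB == 0)
    {
        throw new ArgumentException("Cosine distance is undefined for zero vectors");
    }

    double cosineSimilarity = dotProduct / (Math.Sqrt(squaredMagnitudeA) * Math.Sqrt(squaredMagnitudeB));
    double cosineDistance = 1 - cosineSimilarity;

    return cosineDistance;
}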
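
The embeddings created in the following steps are `float[]` arrays, while the method above expects `double[]`. The sketch below is a hypothetical `float[]` variant (the name `CosineDistance` and the toy vectors are made up for illustration, not part of the original notebook), together with a few sanity checks on vectors whose distances are easy to verify by hand:

```csharp
using System;

// Hypothetical float[] variant for illustration; not part of the original notebook.
// Sums are accumulated in double to limit rounding error.
static double CosineDistance(float[] a, float[] b)
{
    if (a.Length != b.Length)
    {
        throw new ArgumentException("Vectors must be the same length");
    }

    double dot = 0, squaredA = 0, squaredB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        squaredA += (double)a[i] * a[i];
        squaredB += (double)b[i] * b[i];
    }

    return 1.0 - dot / (Math.Sqrt(squaredA) * Math.Sqrt(squaredB));
}

// Toy 2-dimensional vectors (made up for this example)
float[] east = { 1f, 0f };
float[] north = { 0f, 1f };
float[] west = { -1f, 0f };
float[] farEast = { 5f, 0f };

Console.WriteLine(CosineDistance(east, east));    // 0: same direction
Console.WriteLine(CosineDistance(east, north));   // 1: orthogonal
Console.WriteLine(CosineDistance(east, west));    // 2: opposite direction
Console.WriteLine(CosineDistance(east, farEast)); // 0: magnitude does not matter
```

Cosine distance therefore ranges from 0 (same direction) to 2 (opposite direction), mirroring the similarity range of 1 to -1.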

## Step 3: Calculate Embeddings

The following inputs are embedded:
- The content of `Path.Combine(assetsFolder, "Embedding", "WikiSuperBowl2024.txt")`, which contains information about the Super Bowl 2024.
- The content of `Path.Combine(assetsFolder, "Embedding", "WikiAKS.txt")`, which contains information about Azure Kubernetes Service (AKS).
- The phrase `The Kansas City Chiefs won the Super Bowl 2024`.
- The question `Who won the Super Bowl 2024?`.



In [3]:
// Vectorize input text from file (Super Bowl 2024)
string documentationPage = Path.Combine(assetsFolder, "Embedding", "WikiSuperBowl2024.txt");
string textToBeVectorized = File.ReadAllText(documentationPage);

EmbeddingsOptions embeddingsOptions = new EmbeddingsOptions(embeddingDeploymentName, new List<string> { textToBeVectorized });
var modelResponse = await openAIClient.GetEmbeddingsAsync(embeddingsOptions);
float[] vectorDocumentationSuperBowl = modelResponse.Value.Data[0].Embedding.ToArray();

Console.WriteLine($"Vector from {documentationPage} created...");

// Vectorize input text from file (Azure Kubernetes Service)
documentationPage = Path.Combine(assetsFolder, "Embedding", "WikiAKS.txt");
textToBeVectorized = File.ReadAllText(documentationPage);

embeddingsOptions = new EmbeddingsOptions(embeddingDeploymentName, new List<string> { textToBeVectorized });
modelResponse = await openAIClient.GetEmbeddingsAsync(embeddingsOptions);
float[] vectorDocumentationAKS = modelResponse.Value.Data[0].Embedding.ToArray();

Console.WriteLine($"Vector from {documentationPage} created...");

// Vectorize "top answer"
textToBeVectorized = "The Kansas City Chiefs won the Super Bowl 2024";
embeddingsOptions = new EmbeddingsOptions(embeddingDeploymentName, new List<string> { textToBeVectorized });
modelResponse = await openAIClient.GetEmbeddingsAsync(embeddingsOptions);
float[] vectorTopAnswer = modelResponse.Value.Data[0].Embedding.ToArray();

Console.WriteLine($"Vector from {textToBeVectorized} created...");


// Vectorize question
string question = "Who won the Super Bowl 2024?";
embeddingsOptions = new EmbeddingsOptions(embeddingDeploymentName, new List<string> { question });
modelResponse = await openAIClient.GetEmbeddingsAsync(embeddingsOptions);
float[] vectorQuery = modelResponse.Value.Data[0].Embedding.ToArray();

// Also report the dimensionality of the embedding vector
Console.WriteLine($"Vector from '{question}' created... Vector Length: {vectorQuery.Length}");

Vector from c:\Sourcen\GitHubProjects\OpenAI.Workshop\03_Embedding\..\assets\Embedding\WikiSuperBowl2024.txt created...
Vector from c:\Sourcen\GitHubProjects\OpenAI.Workshop\03_Embedding\..\assets\Embedding\WikiAKS.txt created...
Vector from The Kansas City Chiefs won the Super Bowl 2024 created...
Vector from 'Who won the Super Bowl 2024?' created... Vector Length: 1536
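
The four embedding requests in Step 3 repeat the same three calls. One possible refactoring, shown here only as a sketch, is to fold them into a helper. The name `EmbedAsync` and its parameter list are hypothetical and not part of the original notebook; the calls inside mirror the ones used in the cell above.

```csharp
#r "nuget: Azure.AI.OpenAI, 1.0.0-beta.16"

using Azure.AI.OpenAI;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper (not part of the original notebook):
// requests an embedding for a single text and returns it as float[].
static async Task<float[]> EmbedAsync(OpenAIClient client, string deploymentName, string text)
{
    var options = new EmbeddingsOptions(deploymentName, new List<string> { text });
    var response = await client.GetEmbeddingsAsync(options);

    // One input text was sent, so the first (and only) item holds its embedding
    return response.Value.Data[0].Embedding.ToArray();
}
```

Assuming the `openAIClient` and `embeddingDeploymentName` variables from Step 1 are in scope, the question could then be embedded with `float[] vectorQuery = await EmbedAsync(openAIClient, embeddingDeploymentName, "Who won the Super Bowl 2024?");`.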


## Step 4: Calculate Cosine Distance

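The cosine distance between the question and each of the three other vectors ("Super Bowl Information", "AKS Information", and the phrase "The Kansas City Chiefs won the Super Bowl 2024") is calculated. The smaller the distance, the more similar the text is to the question.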

In [None]:
#r "nuget: MathNet.Numerics, 5.0.0"
using MathNet.Numerics;
float distanceSuperBowl = Distance.Cosine(vectorDocumentationSuperBowl, vectorQuery);
float distanceAKS = Distance.Cosine(vectorDocumentationAKS, vectorQuery);
float distanceTop = Distance.Cosine(vectorTopAnswer, vectorQuery);

Console.WriteLine($"Vector distance Super Bowl: {distanceSuperBowl}");
Console.WriteLine($"Vector distance AKS: {distanceAKS}");
Console.WriteLine($"Vector distance Top Answer: {distanceTop}");
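
In a retrieval scenario, the distances are typically used to rank candidate texts against the query. The sketch below illustrates that idea with made-up 3-dimensional stand-ins for the real embeddings. The vectors, labels, and variable names are assumptions chosen for illustration, and real embeddings will produce different numbers.

```csharp
#r "nuget: MathNet.Numerics, 5.0.0"

using System;
using System.Collections.Generic;
using System.Linq;
using MathNet.Numerics;

// Made-up 3-dimensional stand-ins for the real embedding vectors
float[] toyQuery = { 1f, 0f, 0f };
var toyCandidates = new Dictionary<string, float[]>
{
    ["Super Bowl document"] = new[] { 0.8f, 0.6f, 0f },
    ["AKS document"] = new[] { 0f, 1f, 0f },
    ["Top answer"] = new[] { 1f, 0.1f, 0f },
};

// A smaller cosine distance means a closer match, so sort ascending
var ranked = toyCandidates
    .Select(c => (Name: c.Key, Dist: Distance.Cosine(c.Value, toyQuery)))
    .OrderBy(c => c.Dist)
    .ToList();

foreach (var (name, dist) in ranked)
{
    Console.WriteLine($"{name}: {dist:F3}");
}
```

The toy values are chosen so that the top answer ranks first and the AKS document last. With the vectors from Step 3, the same pattern would apply to `vectorTopAnswer`, `vectorDocumentationSuperBowl`, and `vectorDocumentationAKS`.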
