# 02 RAG | 02 Embeddings | 03 Cosine Similarity

## Intro

Cosine similarity measures how similar two non-zero vectors are by computing the cosine of the angle between them in a multi-dimensional space. Its value ranges from -1 (the vectors point in opposite directions) through 0 (the vectors are orthogonal) to 1 (the vectors point in the same direction).
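
For two vectors $\mathbf{A}$ and $\mathbf{B}$ with $n$ components each (symbol names chosen here for illustration), the cosine similarity is their dot product divided by the product of their Euclidean lengths:

$$
\cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\lVert \mathbf{A} \rVert \, \lVert \mathbf{B} \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}
$$

For example, with $\mathbf{A} = (1, 0)$ and $\mathbf{B} = (1, 1)$: $\cos(\theta) = \frac{1 \cdot 1 + 0 \cdot 1}{\sqrt{1^2 + 0^2}\,\sqrt{1^2 + 1^2}} = \frac{1}{\sqrt{2}} \approx 0.707$, the cosine of a 45° angle.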

The metric is widely used in text analysis, recommendation systems, and machine learning. Because it depends only on the direction of the vectors and not on their magnitude, it remains effective in high-dimensional spaces and is particularly useful in text analysis, where the length of documents can vary significantly.
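
To make the magnitude independence concrete, here is a minimal hand-written sketch of the formula in C#. The helper `CosineSimilarity` and the toy vectors are illustrative only (the notebook itself uses `TensorPrimitives.CosineSimilarity`). Scaling a vector leaves the result unchanged, reversing it flips the sign, and an orthogonal vector yields 0.

```csharp
using System;

// Illustrative helper (not part of the notebook): cos(theta) = (A . B) / (|A| * |B|).
// Sums are accumulated in double to limit rounding error on long embedding vectors.
float CosineSimilarity(float[] a, float[] b)
{
    if (a.Length == 0 || a.Length != b.Length)
        throw new ArgumentException("Vectors must be non-empty and have the same length.");

    double dot = 0, sumSquaresA = 0, sumSquaresB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        sumSquaresA += (double)a[i] * a[i];
        sumSquaresB += (double)b[i] * b[i];
    }

    // Undefined for a zero vector: the division 0 / 0 yields NaN in that case.
    return (float)(dot / Math.Sqrt(sumSquaresA * sumSquaresB));
}

float[] v = { 1f, 2f, 3f };
float[] scaled = { 10f, 20f, 30f };   // same direction, 10x the magnitude
float[] opposite = { -1f, -2f, -3f }; // opposite direction
float[] orthogonal = { 3f, 0f, -1f }; // 1*3 + 2*0 + 3*(-1) = 0

Console.WriteLine(CosineSimilarity(v, scaled));     // 1
Console.WriteLine(CosineSimilarity(v, opposite));   // -1
Console.WriteLine(CosineSimilarity(v, orthogonal)); // 0
```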

## Step 1: Load existing embeddings

The following embeddings are loaded:
- The vector created from `Path.Combine(assetsFolder, "Embedding", "WikiSuperBowl2024.txt")`, which contains information about the Super Bowl 2024.
- The vector created from `Path.Combine(assetsFolder, "Embedding", "WikiAKS.txt")`, which contains information about Azure Kubernetes Service (AKS).
- The vector created from the statement `The Kansas City Chiefs won the Super Bowl 2024`.
- The vector created from the question `Who won the Super Bowl 2024`.

These embeddings were created in the [TextEmbeddings Notebook](./01_TextEmbeddings.ipynb) and stored as text files with one value per line.
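
The loading code in this step assumes each file holds one float per line. The sketch below shows, under that assumption, how such a file could be written and read back without depending on the machine's locale. `SaveEmbedding`, `LoadEmbedding`, and the sample values are hypothetical and not taken from the TextEmbeddings notebook.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical helper: one float per line, "R" (round-trip) format, '.' as decimal separator.
void SaveEmbedding(string path, float[] vector) =>
    File.WriteAllLines(path, vector.Select(x => x.ToString("R", CultureInfo.InvariantCulture)));

// Hypothetical helper: skips blank lines and parses with the invariant culture.
float[] LoadEmbedding(string path) =>
    File.ReadLines(path)
        .Where(line => !string.IsNullOrWhiteSpace(line))
        .Select(line => float.Parse(line, CultureInfo.InvariantCulture))
        .ToArray();

string tempFile = Path.GetTempFileName();
float[] original = { 0.0123f, -0.5f, 1.25e-4f };

SaveEmbedding(tempFile, original);
float[] roundTripped = LoadEmbedding(tempFile);
File.Delete(tempFile);

Console.WriteLine(original.SequenceEqual(roundTripped)); // True
```

Using the invariant culture on both sides matters because `float.Parse` without a culture argument uses the current culture, where `,` may be the decimal separator.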

```csharp
#r "nuget: DotNetEnv, 2.5.0"

using DotNetEnv;
using System;
using System.Globalization;
using System.IO;
using System.Linq;

// The configuration file is created during environment creation
static string configurationFile = @"../../Configuration/application.env";
Env.Load(configurationFile);

// Folder that contains the pre-computed embeddings
string assetsFolder = Environment.GetEnvironmentVariable("WS_ASSETS_FOLDER")
    ?? throw new InvalidOperationException("WS_ASSETS_FOLDER not found");

string wikiAKSFileName = Path.Combine(assetsFolder, "Embedding", "TextEmbedding_WikiAks.txt");
string wikiSuperBowlFileName = Path.Combine(assetsFolder, "Embedding", "TextEmbedding_WikiSuperBowl.txt");
string queryVectorFileName = Path.Combine(assetsFolder, "Embedding", "TextEmbedding_Query.txt");
string statementVectorFileName = Path.Combine(assetsFolder, "Embedding", "TextEmbedding_Statement.txt");

// Each file stores one float per line. Parsing with the invariant culture keeps '.'
// as the decimal separator, independent of the machine's regional settings.
float[] LoadVector(string path) =>
    File.ReadAllLines(path).Select(line => float.Parse(line, CultureInfo.InvariantCulture)).ToArray();

float[] wikiAKSVector = LoadVector(wikiAKSFileName);
float[] wikiSuperBowlVector = LoadVector(wikiSuperBowlFileName);
float[] queryVector = LoadVector(queryVectorFileName);
float[] statementVector = LoadVector(statementVectorFileName);

Console.WriteLine("Embeddings loaded successfully...");
```


```
Embeddings loaded successfully...
```


## Step 2: Calculate Cosine Similarity

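The question vector is compared with each of the other three vectors using `TensorPrimitives.CosineSimilarity` from the `System.Numerics.Tensors` NuGet package.
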
```csharp
#r "nuget: System.Numerics.Tensors, 9.0.0-preview.5.24306.7"

using System.Numerics.Tensors;

// Thin wrapper around TensorPrimitives.CosineSimilarity.
// Both vectors must be non-empty and have the same length.
public float CalculateCosineSimilarity(float[] vector1, float[] vector2)
{
    ReadOnlySpan<float> span1 = new ReadOnlySpan<float>(vector1);
    ReadOnlySpan<float> span2 = new ReadOnlySpan<float>(vector2);

    return TensorPrimitives.CosineSimilarity(span1, span2);
}

// Compare the question with each of the other three vectors
float similarityWikiAKSQuery = CalculateCosineSimilarity(wikiAKSVector, queryVector);
float similarityWikiSuperBowlQuery = CalculateCosineSimilarity(wikiSuperBowlVector, queryVector);
float similarityStatementQuery = CalculateCosineSimilarity(statementVector, queryVector);

Console.WriteLine($"Similarity between WikiAKS and Query: {similarityWikiAKSQuery}");
Console.WriteLine($"Similarity between WikiSuperBowl and Query: {similarityWikiSuperBowlQuery}");
Console.WriteLine($"Similarity between Statement and Query: {similarityStatementQuery}");
```


```
Similarity between WikiAKS and Query: 0.632726
Similarity between WikiSuperBowl and Query: 0.8209593
Similarity between Statement and Query: 0.88918716
```
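
An equivalent way to obtain the same value is to scale both vectors to unit length first; the cosine similarity is then simply their dot product. This standard identity can save work when many stored document vectors are compared against a query, since each one only needs to be normalized once. The sketch below checks the identity on small toy vectors in plain C#; `Dot` and `Normalize` are illustrative helpers written by hand here.

```csharp
using System;
using System.Linq;

// Illustrative helpers written by hand for this sketch.
double Dot(double[] a, double[] b) => a.Zip(b, (x, y) => x * y).Sum();

double[] Normalize(double[] v)
{
    double norm = Math.Sqrt(Dot(v, v)); // Euclidean length |v|
    return v.Select(x => x / norm).ToArray();
}

double[] query = { 0.2, 0.9, 0.1 };    // toy vectors, not real embeddings
double[] document = { 0.3, 0.8, 0.4 };

// Direct formula: (q . d) / (|q| * |d|)
double direct = Dot(query, document) / (Math.Sqrt(Dot(query, query)) * Math.Sqrt(Dot(document, document)));

// Normalize once, then a plain dot product gives the same result
double viaUnitVectors = Dot(Normalize(query), Normalize(document));

Console.WriteLine($"{direct:F6} {viaUnitVectors:F6}");
```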


## Step 3: Understand Results

- The highest cosine similarity (0.889) is between the statement *The Kansas City Chiefs won the Super Bowl 2024* and the question *Who won the Super Bowl 2024*.
- The second highest (0.821) is between the [Wikipedia information](../../assets/Embedding/WikiSuperBowl2024.txt) about the Super Bowl and the question *Who won the Super Bowl 2024*.
- The lowest (0.633) is between the [Azure Kubernetes documentation](../../assets/Embedding/WikiAksVector.txt) and the question *Who won the Super Bowl 2024*.

This shows how close the statement and the question are in semantic meaning, even though one is phrased as a statement and the other as a question. The AKS text, although it still scores above 0.6, is clearly the least related of the three.
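
The ordering above is what the retrieval step of a RAG pipeline relies on: compute the similarity of the question to every candidate and keep the closest ones. Below is a minimal sketch of that ranking, assuming the candidate vectors are already in memory. The `TopK` helper, the candidate names, and the toy 3-dimensional vectors are hypothetical stand-ins for the real embeddings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity, written out by hand for this sketch (sums accumulated in double).
double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, sumA = 0, sumB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        sumA += (double)a[i] * a[i];
        sumB += (double)b[i] * b[i];
    }
    return dot / Math.Sqrt(sumA * sumB);
}

// Hypothetical helper: returns the k candidates most similar to the query, best first.
List<(string Name, double Score)> TopK(float[] query, Dictionary<string, float[]> candidates, int k) =>
    candidates
        .Select(c => (Name: c.Key, Score: CosineSimilarity(query, c.Value)))
        .OrderByDescending(r => r.Score)
        .Take(k)
        .ToList();

float[] question = { 0.9f, 0.1f, 0.0f };
var corpus = new Dictionary<string, float[]>
{
    ["statement"] = new[] { 0.8f, 0.2f, 0.0f },
    ["superBowlArticle"] = new[] { 0.6f, 0.3f, 0.3f },
    ["aksArticle"] = new[] { 0.1f, 0.2f, 0.9f },
};

foreach (var (name, score) in TopK(question, corpus, 2))
    Console.WriteLine($"{name}: {score:F3}");
```

Only the relative order of the scores is used here, which matches how the results in this step are interpreted.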