# 03 Embeddings

## Intro 

Embeddings are dense, low-dimensional vector representations of words, phrases, or facts that capture their semantic meaning. They can be used to preserve semantic content by mapping similar phrases or facts to nearby points in the embedding space. 

OpenAI offers embeddings models that can transform text into vectors, allowing to represent textual information in a numerical format. These vectors can then be used to calculate e.g. Cosine Distance between vectors to identify similar semantic meaning of different phrases.

## Step 1 - Read Environment / Create OpenAIClient instance

In [1]:
#r "nuget: Azure.AI.OpenAI, 1.0.0-beta.7"
#r "nuget: DotNetEnv, 2.5.0"

using Azure; 
using Azure.AI.OpenAI;
using DotNetEnv;
using System.IO;
using System.Text.Json; 

//configuration file is created during environment creation
//if you skipped the deployment just remove the code and provide values from your deployment
static string _configurationFile = @"../01/application.env";
Env.Load(_configurationFile);

string oAiApiKey = Environment.GetEnvironmentVariable("AOAI_APIKEY");
string oAiEndpoint = Environment.GetEnvironmentVariable("AOAI_ENDPOINT");
string embeddingDeploymentName = Environment.GetEnvironmentVariable("EMBEDDING_DEPLOYMENTNAME");

AzureKeyCredential azureKeyCredential = new AzureKeyCredential(oAiApiKey);
OpenAIClient openAIClient = new OpenAIClient(new Uri(oAiEndpoint), azureKeyCredential);

Console.WriteLine($"OpenAI Client created...");

OpenAI Client created...


## Step 2 - Define facts


In [2]:
//Some arbitrary facts
Dictionary<string, string> informations = new Dictionary<string, string>();

informations.Add(
    "ID: 1; Company Music", 
    @"Firma Musik is one of the world's leading record labels. 
      It has signed famous singers and is very profitable! 
      The flagship of Contoso Music is a group that performs under the name 'Contoso Only'!"
);
informations.Add(
    "ID: 2; Company Maritim", 
    @"Company Heavy Industry Maritime products. 
      The current bestseller is the transporter 'Contoso XL Heavy 2000'."
);
informations.Add(
    "ID: 3; Company Agriculture", 
    @"Company Agriculture is a German start-up that focuses on the production of milk and grain. 
    Since this is a start-up, no further information is available!"
);

Console.WriteLine("Information collected / created!");

Information collected / created!


## Step 3 - Calculate Embeddings

In [3]:
//Calculate Embeddings
EmbeddingsOptions embeddingsOptions; 
Dictionary<string, float[]> vectors = new Dictionary<string, float[]>(); 
foreach (var information in informations) {
    embeddingsOptions = new EmbeddingsOptions(information.Value);
    Response<Embeddings> embedding = await openAIClient.GetEmbeddingsAsync(embeddingDeploymentName, embeddingsOptions); 
    vectors.Add(information.Key, embedding.Value.Data[0].Embedding.ToArray<float>()); 
    Console.WriteLine($"Embedding for '{information.Key}' created!");
}

Embedding for 'ID: 1; Company Music' created!
Embedding for 'ID: 2; Company Maritim' created!
Embedding for 'ID: 3; Company Agriculture' created!


In [4]:
#r "nuget: MathNet.Numerics, 5.0.0"
using MathNet.Numerics;

//Perform semantic search
string query = "Who produces Container Ships?"; 
embeddingsOptions = new EmbeddingsOptions(query);
Response<Embeddings> embeddings = await openAIClient.GetEmbeddingsAsync(embeddingDeploymentName, embeddingsOptions); 
float[] searchVector = embeddings.Value.Data[0].Embedding.ToArray<float>();

foreach(var fact in vectors) {
    float distance = Distance.Cosine(searchVector, fact.Value);
    Console.WriteLine($"Cosine distance {fact.Key}: {distance}");
}

Cosine distance ID: 1; Company Music: 0.25061116
Cosine distance ID: 2; Company Maritim: 0.15340991
Cosine distance ID: 3; Company Agriculture: 0.2441843
