# OpenAI.Clustering with .NET

## Intro

A simplified sample to cluster data using Azure OpenAI and C#.

### Step 1: Azure environment

This [Azure CLI script](../CreateEnv/CreateEnv.azcli) creates:

- an Azure OpenAI instance
- a deployment of text-embedding-ada-002 to calculate embeddings

The script provides the credentials needed to connect to Azure OpenAI (e.g. API key and endpoint information) and stores them in environment variables:

```powershell
$ENV:AZURE_OPENAI_ENDPOINT = $csEndpoint
$ENV:AZURE_OPENAI_API_KEY = $csApiKey
$ENV:AZURE_OPENAI_DEPLOYMENTNAME = $modelDeploymentName
```

### Step 2: Housekeeping 

- Import NuGet packages
- Define arbitrary facts
- Create an instance of `OpenAIClient`

Replace `apiEndpoint`, `apiKey` and `embeddingModelDeploymentName` with values from your Azure OpenAI instance.

```csharp
#r "nuget: Azure.AI.OpenAI, 1.0.0-beta.6"
#r "nuget: ScottPlot, 4.1.67"
#r "nuget: umap, 1.0.34015"

using Azure.AI.OpenAI;
using Azure;
using UMAP;
using ScottPlot;
using System.Drawing;

//Define Azure OpenAI information
Uri apiEndpoint = new Uri("https://Your_Azure_OpenAI_API_endpoint");
string apiKey = "<<Your Azure OpenAI API key>>";
string embeddingModelDeploymentName = "<<your Azure OpenAI embedding deployment name>>";

AzureKeyCredential azureKeyCredential = new AzureKeyCredential(apiKey);
OpenAIClient openAIClient = new OpenAIClient(apiEndpoint, azureKeyCredential);

//Define arbitrary facts
string[] facts = {
    "The operating system Windows 11 was released October 5th, 2021.",
    "Windows 10 as an operating system was introduced to the market in 2015.",
    "Discrete manufacturing is a process that produces products in individual, separate pieces.",
};

#!share --from c# facts --as facts
#!share --from c# openAIClient --as openAIClient
#!share --from c# embeddingModelDeploymentName --as embeddingModelDeploymentName
```
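
Because the Step 1 script already stores the endpoint, key and deployment name in environment variables, the placeholders can also be filled from there instead of being edited by hand. The following is a minimal sketch, assuming the three variable names from Step 1 are visible to the process that runs the notebook. `ReadSetting` is a hypothetical helper for illustration; it is not part of the original sample or of the Azure SDK.

```csharp
using System;

// Hypothetical helper (not part of the original sample): returns the value of
// an environment variable, or the given fallback when it is missing or empty.
string ReadSetting(string name, string fallback) {
    var value = Environment.GetEnvironmentVariable(name);
    return string.IsNullOrWhiteSpace(value) ? fallback : value;
}

// Variable names as set by the Step 1 script; the fallbacks are the same
// placeholders used in the Step 2 cell.
string endpointText = ReadSetting("AZURE_OPENAI_ENDPOINT", "https://Your_Azure_OpenAI_API_endpoint");
string apiKeyText = ReadSetting("AZURE_OPENAI_API_KEY", "<<Your Azure OpenAI API key>>");
string deploymentText = ReadSetting("AZURE_OPENAI_DEPLOYMENTNAME", "<<your Azure OpenAI embedding deployment name>>");

// Do not print the key itself; only report whether a real value was found.
bool apiKeyFound = !apiKeyText.StartsWith("<<");
Console.WriteLine($"Endpoint: {endpointText}");
Console.WriteLine($"API key found: {apiKeyFound}");
Console.WriteLine($"Deployment: {deploymentText}");
```

The resulting strings can then be passed to `new Uri(...)` and `new AzureKeyCredential(...)` in the same way as the hard-coded placeholders.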


### Step 3: Create Embeddings

For each fact, a vector (embedding) with 1536 dimensions is created by calling `GetEmbeddingsAsync()` on `openAIClient`.

```csharp
List<float[]> vectors = new List<float[]>();
foreach(string fact in facts) {
    EmbeddingsOptions embeddingsOptions = new EmbeddingsOptions(fact);
    var modelResponse = await openAIClient.GetEmbeddingsAsync(embeddingModelDeploymentName, embeddingsOptions);
    float[] vector = modelResponse.Value.Data[0].Embedding.ToArray<float>();
    Console.WriteLine($"Vector for fact '{fact}' created!");
    vectors.Add(vector);
}

#!share --from c# vectors --as vectors
```

```
Vector for fact 'The operating system Windows 11 was released October 5th, 2021.' created!
Vector for fact 'Windows 10 as an operating system was introduced to the market in 2015.' created!
Vector for fact 'Discrete manufacturing is a process that produces products in individual, separate pieces.' created!
```
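
Embeddings lend themselves to clustering because texts with similar meaning tend to produce vectors that point in similar directions. One common way to quantify this is cosine similarity. The following is a minimal sketch that uses short toy vectors in place of real 1536-dimensional embeddings. `CosineSimilarity` is a hypothetical helper written for illustration and is not part of the Azure SDK.

```csharp
using System;

// Hypothetical helper: cosine similarity of two non-zero vectors of equal length.
// Values near 1 mean same direction, near 0 orthogonal, near -1 opposite direction.
double CosineSimilarity(float[] a, float[] b) {
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same length.");
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Toy vectors standing in for real embeddings (assumption for illustration).
float[] first = { 1f, 2f, 3f };
float[] second = { 2f, 4f, 6f };   // same direction as first
float[] third = { -3f, 0f, 1f };   // orthogonal to first

Console.WriteLine(CosineSimilarity(first, second));
Console.WriteLine(CosineSimilarity(first, third));
```

With the `vectors` list from Step 3, a call such as `CosineSimilarity(vectors[0], vectors[1])` would compare the two Windows facts in the same way.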


### Step 4: Reduce dimensionality

The UMAP NuGet package is used to reduce each 1536-dimensional vector to a 2-dimensional representation.

```csharp
//Reduce dimensionality of vectors
Umap umap = new Umap(numberOfNeighbors: 2);
var numberOfEpochs = umap.InitializeFit(vectors.ToArray());
for (int i = 0; i < numberOfEpochs; i++) {
    umap.Step();
}
float[][] embeddings = umap.GetEmbedding();
foreach (float[] reducedEmbedding in embeddings) {
    Console.WriteLine(string.Join(", ", reducedEmbedding));
}
```

```
-4.066317, 9.107991
-1.9780208, 19.144762
-0.029582918, 15.228906
```


### Step 5: Visualize 

The 2-dimensional vectors are visualized using the ScottPlot NuGet package:

```csharp
string diagramName = "Vectors.png";

Plot plot = new Plot(400, 300);

// Draw each reduced vector as a black filled circle
foreach(float[] embedding in embeddings) {
    plot.AddPoint(embedding[0], embedding[1], Color.Black, 5, ScottPlot.MarkerShape.filledCircle);
}

// Save under the same file name that is reported below
plot.SaveFig(diagramName);

Console.WriteLine($"Saved diagram: {diagramName}");
```

```
Saved diagram: Vectors.png
```


![Scatter plot of the reduced vectors](Vectors.png)