## K-means Clustering in C# using OpenAI
We use a simple k-means algorithm to demonstrate how clustering can be done. Clustering can help discover valuable, hidden groupings within the data. The dataset is created in the [Get_embeddings_from_dataset notebook](Get_embeddings_from_dataset.ipynb).

In [1]:
#r "nuget: Azure.AI.OpenAI, *-*"

In [3]:
#r "nuget:Microsoft.DotNet.Interactive.AIUtilities, 1.0.0-beta.23552.1"

In [4]:
using Microsoft.DotNet.Interactive;
using Microsoft.DotNet.Interactive.AIUtilities;
using Azure;
using Azure.AI.OpenAI;

## Run this cell; it will prompt you for the API key, endpoint, and chat deployment name

In [14]:
var azureOpenAIKey = await Kernel.GetPasswordAsync("Provide your OPEN_AI_KEY");

// Your endpoint should look like the following https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
var azureOpenAIEndpoint = await Kernel.GetInputAsync("Provide the OPEN_AI_ENDPOINT");

// Enter the deployment name you chose when you deployed the model.
var chatDeployment = await Kernel.GetInputAsync("Provide chat deployment name");

In [15]:
OpenAIClient client = new (new Uri(azureOpenAIEndpoint), new AzureKeyCredential(azureOpenAIKey.GetClearTextPassword()));

In [7]:
#r "nuget: Microsoft.ML, 3.0.0-preview.23511.1"

In [8]:

using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

In [9]:
public class DataRow{
    public string ProductId {get;set;}
    public string UserId {get;set;} 
    public int Score {get;set;} 
    public string Summary {get;set;} 
    public string Text {get;set;} 
    public int TokenCount {get;set;} 
    [VectorType(1536)]
    public float[] Embedding {get;set;} 
};

In [10]:
using System.Text.Json;
using System.Text.Json.Serialization;
using System.IO;

var filePath = Path.Combine("..","..","..","Data","fine_food_reviews_with_embeddings_1k.json");

var foodReviewsData = JsonSerializer.Deserialize<DataRow[]>(File.ReadAllText(filePath));
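
As a rough illustration of how the deserialization step maps JSON keys onto properties, the sketch below parses a tiny inline document. The JSON content and the `ToyReview` class are made up for this example and are not taken from the real data file. It relies on the default `System.Text.Json` behavior: property names are matched case-sensitively, and a key that does not match any property is ignored, leaving that property at its default value.

```csharp
using System;
using System.Text.Json;

// Hypothetical three-row document; the third row misspells "Summary" on purpose.
var json = """
[
  { "Score": 5, "Summary": "Yum",  "Embedding": [0.1, 0.2, 0.3] },
  { "Score": 4, "Summary": "Good", "Embedding": [0.4, 0.5, 0.6] },
  { "Score": 3, "Sumary":  "Meh",  "Embedding": [0.7, 0.8, 0.9] }
]
""";

var rows = JsonSerializer.Deserialize<ToyReview[]>(json);

foreach (var row in rows)
{
    // The misspelled key is skipped without an error, so Summary stays null for that row.
    Console.WriteLine($"{row.Score}: {row.Summary ?? "(no summary)"} [{row.Embedding.Length} dims]");
}

// Hypothetical row type used only in this sketch.
public class ToyReview
{
    public int Score { get; set; }
    public string Summary { get; set; }
    public float[] Embedding { get; set; }
}
```

A property name that does not match the JSON key fails silently in the same way, so it is worth spot-checking a few deserialized rows before training on them.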

### 1. Find the clusters using K-means
We show the simplest use of K-means. You can pick the number of clusters that fits your use case best.

First, a new instance of the `MLContext` class is created. 

Next, the `LoadFromEnumerable` method of the `Data` property of the `context` object is called to load the `foodReviewsData` into an `IDataView` object, which is a flexible, efficient way of describing tabular data (numeric and text).

A pipeline is then defined using the `Clustering.Trainers.KMeans` method of the `context` object. This method creates a new K-Means clustering trainer. The first argument is the name of the feature column (in this case, "Embedding"), and the `numberOfClusters` parameter is set to 4, indicating that the algorithm should group the data into 4 clusters.

The `Fit` method is then called on the pipeline, passing in the `idv` object. This trains the model on the loaded data and returns the trained model.

The `Transform` method is then called on the `model` object, passing in the `idv` object. This applies the trained model to the loaded data, assigning each data point to a cluster.

Finally, the `GetClusterCentroids` method is called on the `Model` property of the `model` object. This method retrieves the centroids of the clusters identified by the model. The centroids are stored in the `centroids` variable.


In [11]:
var context = new MLContext();
var idv = context.Data.LoadFromEnumerable(foodReviewsData);
var pipeline = context.Clustering.Trainers.KMeans("Embedding", numberOfClusters: 4);
var model = pipeline.Fit(idv);
var clusteredData = model.Transform(idv);

VBuffer<float>[] centroids = default;
model.Model.GetClusterCentroids(ref centroids, out var _);
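
The trainer hides the algorithm itself. To make the idea concrete, here is a minimal sketch of the textbook k-means loop (Lloyd's algorithm) in plain C#: assign each point to its nearest centroid, then move each centroid to the mean of its members. It is not ML.NET's implementation, which chooses initial centroids and iterates differently. The toy 2-D points, the starting centroids, and the helper names `SquaredDistance` and `NearestCentroid` are all assumptions made for this example.

```csharp
using System;
using System.Linq;

// Toy 2-D points standing in for the 1536-dimensional embeddings (made-up values).
float[][] toyPoints =
{
    new[] { 0f, 0f }, new[] { 0f, 1f }, new[] { 1f, 0f },
    new[] { 10f, 10f }, new[] { 10f, 11f }, new[] { 11f, 10f },
};

// Hypothetical starting centroids; real trainers pick these more carefully.
float[][] toyCentroids = { new[] { 0f, 0f }, new[] { 10f, 10f } };

// Squared Euclidean distance between two vectors of equal length.
float SquaredDistance(float[] a, float[] b) =>
    a.Zip(b, (x, y) => (x - y) * (x - y)).Sum();

// Index of the centroid closest to the given point.
int NearestCentroid(float[] point, float[][] candidates) =>
    Enumerable.Range(0, candidates.Length)
        .OrderBy(k => SquaredDistance(point, candidates[k]))
        .First();

var toyAssignments = new int[toyPoints.Length];
for (var iteration = 0; iteration < 10; iteration++)
{
    // Assignment step: label every point with its nearest centroid.
    toyAssignments = toyPoints.Select(p => NearestCentroid(p, toyCentroids)).ToArray();

    // Update step: move each centroid to the mean of the points assigned to it.
    toyCentroids = Enumerable.Range(0, toyCentroids.Length)
        .Select(k =>
        {
            var members = toyPoints.Where((p, i) => toyAssignments[i] == k).ToArray();
            if (members.Length == 0) return toyCentroids[k]; // leave an empty cluster where it is
            return Enumerable.Range(0, members[0].Length)
                .Select(d => members.Average(m => m[d]))
                .ToArray();
        })
        .ToArray();
}

Console.WriteLine(string.Join(",", toyAssignments)); // prints 0,0,0,1,1,1
```

The sketch runs a fixed number of iterations for simplicity; a fuller version would stop once the assignments no longer change.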

### 2. Text samples in the clusters & naming the clusters
Let's show samples from each cluster. We'll use GPT to name the clusters, based on a random sample of 5 reviews from that cluster.
Iterating over the clusters' centroids, we find the most relevant reviews using `CosineSimilarityComparer`: we keep the 200 reviews most similar to each centroid, then randomly pick 5 of them.

In [12]:
var rnd = new Random(42);

var examples = centroids.Select(c => {
    var embedding = c.GetValues().ToArray();
    var samples = foodReviewsData
        .ScoreBySimilarityTo(embedding, new CosineSimilarityComparer<float[]>(v => v), r => r.Embedding )
        .OrderByDescending(e => e.Value)
        .Select(e => e.Key)
        .Take(200)
        .Shuffle()
        .Take(5)
        .ToArray(); // materialize once so the prompt and the display enumerate the same 5 reviews

    return new {
            CentroidEmbedding = embedding,
            Reviews = samples
            };
    }
).ToArray();
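
The selection logic can be illustrated without the helper library. The sketch below ranks a few made-up items by cosine similarity to a made-up centroid, keeps the closest ones, and draws a reproducible random sample with a seeded Fisher-Yates shuffle. It shows the idea only; it is not how `ScoreBySimilarityTo`, `CosineSimilarityComparer`, or `Shuffle` are implemented, and the names `CosineSimilarity`, `toyItems`, `toyCentroid`, and `seededRandom` are assumptions introduced for this example.

```csharp
using System;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|). Assumes non-zero vectors of equal length.
float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0f, normA = 0f, normB = 0f;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

// Made-up items with 2-D "embeddings".
var toyItems = new[]
{
    (Name: "a", Embedding: new[] { 1f, 0f }),
    (Name: "b", Embedding: new[] { 0.9f, 0.1f }),
    (Name: "c", Embedding: new[] { 0f, 1f }),
    (Name: "d", Embedding: new[] { -1f, 0f }),
};
var toyCentroid = new[] { 1f, 0f };

// Keep the 3 items most similar to the centroid (the notebook keeps 200).
var closest = toyItems
    .OrderByDescending(item => CosineSimilarity(item.Embedding, toyCentroid))
    .Take(3)
    .ToArray();

// Fisher-Yates shuffle on a copy, driven by a seeded Random.
var seededRandom = new Random(42);
var pool = closest.ToArray();
for (var i = pool.Length - 1; i > 0; i--)
{
    var j = seededRandom.Next(i + 1);
    (pool[i], pool[j]) = (pool[j], pool[i]);
}

// Take 2 of the shuffled items (the notebook takes 5) and materialize the result.
var sample = pool.Take(2).ToArray();

Console.WriteLine("Closest: " + string.Join(",", closest.Select(item => item.Name))); // prints Closest: a,b,c
Console.WriteLine("Sample: " + string.Join(",", sample.Select(item => item.Name)));
```

Restricting the random draw to the items nearest the centroid keeps the sample representative of the cluster while still varying which reviews are shown.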

Using the 5 random samples from each cluster, we ask GPT for the common theme.

In [16]:
foreach (var example in examples)
{
    var prompt =
$"""
What do the following customer reviews have in common?
Customer reviews:
{string.Join("\n", example.Reviews.Select(r => $"{r.Score}, {r.Summary}: {r.Text}"))}
Theme:
""";
    var options = new ChatCompletionsOptions {
        Messages = { new ChatMessage(ChatRole.User, prompt) },
        Temperature = 0f,
    };

    var response = await client.GetChatCompletionsAsync(chatDeployment, options);
    var theme = response.Value.Choices.FirstOrDefault()?.Message?.Content;
    var text = new StringBuilder($"Cluster theme : {theme}");
    foreach (var review in example.Reviews)
    {
        text.AppendLine();
        text.AppendLine($"{review.Score}, {review.Summary}: {review.Text}");
    }
    text.ToString().Display();
}

Cluster theme : The common theme among these customer reviews is that they are all discussing food or beverage products.
4, Good, but not Wolfgang Puck good: Honestly, I have to admit that I expected a little better. That's not to say that this is bad coffee - in fact it's quite bold without being too acidic, and pretty satisfying overall. I think my main problem is that Wolfgang Puck's name is attached to it, so perhaps it set my expectations a little high. I have a Wolfgang Puck knife set that I adore, and is very high quality for what I paid for it. This coffee was on sale, so it was well worth it also, I just hoped for something that would knock my socks off - which it didn't. I also purchased the Breakfast blend, and Jamaica me crazy at the same time. The breakfast blend was the best, in my opinion, and the jamaican coffee smelled the best, but was the least successful.

5, YUMMY: Very good, still have a little left. I like the salty taste (that's why I buy them)  Not too salty

Cluster theme : The common theme among these customer reviews is that they all express positive opinions and satisfaction with the products being reviewed.
4, Great flavor: These are so delicious. Great flavor for autumn or any time. We all love them. But not too sweet, way more crunchy than chewy, this makes it a bit messy. Its like a crunchy cookie. Everyone loves crunchy cookies!

5, Yum: My kids love these Earnest Eats snacks. I like that they are more nutritional and healthier that traditional snack bars in the grocery store. I also admire the use non-oil alternative products such as almond butter and non-processed sugar such as dried fruit for sweetness. I have already ordered these bars 3 times and will continue!

5, Outstanding product!.....: Great flavor.....lotsa &#34;heat&#34;....I use Cayenne almost everyday, and this organic product is so much better than the store bought brands...

5, Yum: My kids love these Earnest Eats snacks. I like that they are more nutritiona

Cluster theme : The common theme among these customer reviews is positive feedback and satisfaction with the products or services mentioned.
5, Loved these gluten free healthy bars, saved $$ ordering on Amazon: These Kind Bars are so good and healthy & gluten free.  My daughter came across them and loves them for a quick snack between her hectic schedule of classes & work. Most times she won't have time to eat a full meal and these are such a great alternative to fast food.  I will order again & this time I'll get a few for moi! Really loved the coconut too..

4, Fruitables Crunchy Dog Treats: My lab goes wild for these and I am almost tempted to have a go at some myself, as they smell so great. So far I have resisted the temptation despite the fact that they are filled with healthy ingredients.

5, Healthy Dog Food: This is a very healthy dog food. Good for their digestion. Also good for small puppies. My dog eats her required amount at every feeding.

5, Fantastic for energeti

Cluster theme : The common theme among these customer reviews is that they all express positive opinions about the products being reviewed.
5, Deeeee-lish!: For far too long I was a devotee of the Starbucks roast.  This is so much better.  Rich, tasty, strong coffee without the BITTER bite.  Lovely crema.  Works great in my automatic espresso machine, too.  Love, love. love Lavazza.

5, Love this coffee...: This is the best coffee ever! Wish I could order a box of 100 at a time as we go thru a box of 80 in about a month and a half. Buying it online is soooo much cheaper than buying at the grocery store.

5, Delicious!!!!: A coffee treat. Now that my husband and I drink this coffee, there is no going back to the plain stuff ;).

5, Jamaican Blue beans: Excellent coffee bean for roasting. Our family just purchased another 5 pounds for more roasting. Plenty of flavor and mild on acidity when roasted to a dark brown bean and before any oil appears on the bean itself (455F @ 17 minut