## K-means Clustering in C# using OpenAI
We use a simple k-means algorithm to demonstrate how clustering can be done. Clustering can help discover valuable, hidden groupings within the data. The dataset is created in the [Get_embeddings_from_dataset notebook](Get_embeddings_from_dataset.ipynb).

In [2]:
#r "nuget: Azure.AI.OpenAI, 1.0.0-beta.9"

In [3]:
#r "nuget:Microsoft.DotNet.Interactive.AIUtilities, 1.0.0-beta.23557.4"

In [3]:
using Microsoft.DotNet.Interactive;
using Microsoft.DotNet.Interactive.AIUtilities;
using Azure;
using Azure.AI.OpenAI;

## Run this cell; it will prompt you for the API key, endpoint, and chat deployment name

In [4]:
var azureOpenAIKey = await Kernel.GetPasswordAsync("Provide your OPEN_AI_KEY");

// Your endpoint should look like the following https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
var azureOpenAIEndpoint = await Kernel.GetInputAsync("Provide the OPEN_AI_ENDPOINT");

// Enter the deployment name you chose when you deployed the model.
var chatDeployment = await Kernel.GetInputAsync("Provide chat deployment name");

In [5]:
OpenAIClient client = new (new Uri(azureOpenAIEndpoint), new AzureKeyCredential(azureOpenAIKey.GetClearTextPassword()));

In [6]:
#r "nuget: Microsoft.ML,  3.0.0-preview.23511.1"

In [7]:

using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

In [8]:
public class DataRow{
    public string ProductId {get;set;} 
    public string UserId {get;set;} 
    public int Score {get;set;} 
    public string Summary {get;set;} 
    public string Text {get;set;} 
    public int TokenCount {get;set;} 
    [VectorType(1536)]
    public float[] Embedding {get;set;} 
};

In [9]:
using System.Text.Json;
using System.Text.Json.Serialization;
using System.IO;

var filePath = Path.Combine("..","..","..","Data","fine_food_reviews_with_embeddings_1k.json");

var foodReviewsData = JsonSerializer.Deserialize<DataRow[]>(File.ReadAllText(filePath));

### 1. Find the clusters using K-means
We show the simplest use of K-means. You can pick the number of clusters that fits your use case best.

First, a new instance of the `MLContext` class is created. 

Next, the `LoadFromEnumerable` method of the `Data` property of the `context` object is called to load the `foodReviewsData` into an `IDataView` object, which is a flexible, efficient way of describing tabular data (numeric and text).

A pipeline is then defined using the `Clustering.Trainers.KMeans` method of the `context` object. This method creates a new K-Means clustering trainer. The first argument is the name of the feature column (in this case, "Embedding"), and the `numberOfClusters` parameter is set to 4, indicating that the algorithm should group the data into 4 clusters.

The `Fit` method is then called on the pipeline, passing in the `idv` object. This trains the model on the loaded data and returns the trained model.

The `Transform` method is then called on the `model` object, passing in the `idv` object. This applies the trained model to the loaded data, assigning each data point to a cluster.

Finally, the `GetClusterCentroids` method is called on the `Model` property of the `model` object. This method retrieves the centroids of the clusters identified by the model. The centroids are stored in the `centroids` variable.


In [10]:
var context = new MLContext();
var idv = context.Data.LoadFromEnumerable(foodReviewsData);
var pipeline =  context.Clustering.Trainers.KMeans("Embedding", numberOfClusters: 4);
var model = pipeline.Fit(idv);
var clusteredData = model.Transform(idv);

VBuffer<float>[] centroids = default;
model.Model.GetClusterCentroids(ref centroids, out var _);
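
To make the "assigning each data point to a cluster" step more concrete, here is a minimal sketch of a nearest-centroid assignment written in plain C#. It is an illustration only, not ML.NET's implementation: the helper names `SquaredDistance` and `NearestCentroid` and the 2-D toy data are assumptions made for this example, standing in for the 1536-dimensional embeddings.

```csharp
using System;
using System.Linq;

// Squared Euclidean distance between two vectors of equal length.
double SquaredDistance(float[] a, float[] b) =>
    a.Zip(b, (x, y) => (double)(x - y) * (x - y)).Sum();

// Index of the centroid closest to the given point (hypothetical helper).
int NearestCentroid(float[] point, float[][] centroids) =>
    Enumerable.Range(0, centroids.Length)
        .OrderBy(i => SquaredDistance(point, centroids[i]))
        .First();

// Toy 2-D data standing in for the real embeddings.
var toyCentroids = new[] { new[] { 0f, 0f }, new[] { 10f, 10f } };
var toyPoints = new[] { new[] { 1f, 2f }, new[] { 9f, 8f }, new[] { -1f, 0f } };

// Assign every point to its nearest centroid.
var assignments = toyPoints.Select(p => NearestCentroid(p, toyCentroids)).ToArray();
Console.WriteLine(string.Join(", ", assignments)); // prints "0, 1, 0"
```

K-means alternates this assignment step with recomputing each centroid as the mean of its assigned points until the assignments stop changing.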

### 2. Text samples in the clusters & naming the clusters
Let's show samples from each cluster. We'll use GPT to name the clusters, based on a random sample of 5 reviews from that cluster.
Iterating over the clusters' centroids, we rank the reviews by cosine similarity to each centroid using `CosineSimilarityComparer` and keep the 200 closest. Then we randomly pick 5 of them for each cluster.

In [11]:
var rnd = new Random(42);

var examples = centroids.Select(c => {
    var embedding = c.GetValues().ToArray();
    var samples = foodReviewsData
        .ScoreBySimilarityTo(embedding, new CosineSimilarityComparer<float[]>(v => v), r => r.Embedding )
        .OrderByDescending(e => e.Value)
        .Select(e => e.Key)
        .Take(200)
        .Shuffle()
        .Take(5);

    return new {
            CentroidEmbedding = embedding,
            Reviews = samples
            };
    }
).ToArray();
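
For readers unfamiliar with the ranking metric used above, the following is a rough sketch of cosine similarity written by hand. It assumes the standard definition (dot product divided by the product of the vector lengths) and is not taken from the `CosineSimilarityComparer` source; the function name `CosineSimilarity` is hypothetical.

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|).
// Values near 1 mean the vectors point in the same direction; 0 means orthogonal.
// Note: a zero-length vector yields NaN because of the division by zero.
float CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
}

// Parallel vectors score close to 1, orthogonal vectors score 0.
var sameDirection = CosineSimilarity(new[] { 1f, 2f }, new[] { 2f, 4f });
var orthogonal = CosineSimilarity(new[] { 1f, 0f }, new[] { 0f, 1f });
Console.WriteLine($"{sameDirection}, {orthogonal}");
```

Because the score ignores vector length, a review is ranked by how closely its embedding's direction matches the centroid's, not by its magnitude.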

Using the 5 random samples from each cluster, we ask GPT for the common theme.

In [12]:
foreach (var example in examples)
{
    var prompt =
$"""
What do the following customer reviews have in common?
Customer reviews:
{string.Join("\n", example.Reviews.Select(r => $"{r.Score}, {r.Summary}: {r.Text}"))}
Theme:
""";
    var options= new ChatCompletionsOptions{
        Messages ={ new ChatMessage(ChatRole.User, prompt)},
        Temperature = 0f,
        DeploymentName = chatDeployment
    };

    var response = await client.GetChatCompletionsAsync(options);
    var theme = response.Value.Choices.FirstOrDefault()?.Message?.Content;
    var text = new StringBuilder($"Cluster theme : {theme}");
    foreach (var review in example.Reviews)
    {
        text.AppendLine();
        text.AppendLine($"{review.Score}, {review.Summary}: {review.Text}");
    }
    text.ToString().Display();
}

Cluster theme : The common theme among these customer reviews is that the products being reviewed are all highly enjoyable and have good flavor.
5, 5 Star Tea at a super price!: This tea is very good and perfect for the price....you can't go wrong with this product. A must buy for the tea lover.

3, Hot Apple Cider: This was very good for the most part. I like to drink it in the chilly mornings we've had lately (in Vermont). The only bad thing is sometimes if I don't drink it fast enough, I end up with the residue which isn't that good.<br />I appreciate it being sent to me in a timely manner.

2, Super healthy but tastes like ass: I have no idea what these people are thinking saying this stuff tastes great. They must work for Nutiva or have a friend who does.<br />This stuff is super healthy, I'm sure - but tastes friggin' awful. It's like cardboard pulp. I make smoothies with it which my mom said look like diarrhea.<br /><br />I'm stuck with this huge 3 lb bag - which, in its de

Cluster theme : The common theme among these customer reviews is that they are all discussing different beverages (tea and coffee) and expressing their opinions about the taste and quality of the products.
5, Great taste: This product is great.  My husband and I both love it.  It is good as a supplement, and for cooking.  It has a wonderful scent and a soft taste.  It is a very healthy product.  We are pleased.

5, got what I expected: This is a black tea with french vanilla essence, so obviously it's not going to taste like most vanilla people are used to. Black tea has a strong flavor so the vanilla doesn't stand out as much and also natural unsweetened vanilla tastes a lot different than commercial vanilla products or artificial vanilla.<br /><br />I'm guessing a lot of people are used to artificial vanilla since it's a lot sweeter. The difference between real vanilla and artificial vanilla is like the difference between a watermelon and watermelon flavor.<br /><br />I do have on

Cluster theme : The common theme among these customer reviews is that they all express positive opinions about the products being reviewed.
5, my dogs love the peanut butter!: First off, read the ingredients, no crazy words I can't pronounce, which means it's all natural! I got the peanut butter treats for my two children: husky/shepherd and a lab/shepherd mix, and they can't get enough of them. Plus, on Amazon, this is such a great value for 16oz. Beats petsmart for sure! I'll be looking for more Zuke's products now that I've discovered them. Thanks Zuke's!

5, 5 Hour Energy works very well...: Great price if you buy in bulk.  Product works very well.  No side effects.  I have not tried a flavor that I didn't like.

5, South Beach energy bars dark choc: Want a party in your mouth and a healthy bar at the same time??  My South Beach dark chocolate bars came from Amazon just as my husband was working to increase his fiber for health sake.  This dark chocolate, chewy taste is fantab

Cluster theme : The common theme among these customer reviews is positive feedback and satisfaction with the products.
5, Favorite Coffee Pods!: I had been buying Senseo coffee pods to use in my Hamilton Beach pod brewer.  The Senseo pods became impossible to find locally, so I looked online.  I drink decaf coffee, and it's often difficult to find decaf in different variaties.  When I found these pods from Baronet, I decided to take a chance on them.  They are so much better than the ones from Senseo!  The flavor is rich and the price is quite a bit less than the old pods I was buying. The Baronet pods are also stronger in flavor, which is great if you want to make a larger mug of coffee.  They come in flavors, too!  Highly recommend!!

5, Very Nice Medium Roast: This coffee has the best aroma!  Great medium taste.  I have replaced our Starbucks Pike's Peak Roast because my husband and I like the taste better.  I store it in a sealed container, don't mind that I can't store it in my