## K-means Clustering in C# using OpenAI
We use a simple k-means algorithm to demonstrate how clustering can be done. Clustering can help uncover hidden groupings within the data. The dataset is created in the [Get_embeddings_from_dataset](Get_embeddings_from_dataset.ipynb) notebook.

In [7]:
#r "nuget: Azure.AI.OpenAI, *-*"

## Run this cell; it will prompt you for the API key, endpoint, and chat deployment name

In [8]:
var azureOpenAIKey = await Kernel.GetPasswordAsync("Provide your OPEN_AI_KEY");

// Your endpoint should look like the following https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
var azureOpenAIEndpoint = await Kernel.GetInputAsync("Provide the OPEN_AI_ENDPOINT");

// Enter the deployment name you chose when you deployed the model.
var chatDeployment = await Kernel.GetInputAsync("Provide chat deployment name");

In [9]:
using Azure;
using Azure.AI.OpenAI;

In [10]:
OpenAIClient client = new (new Uri(azureOpenAIEndpoint), new AzureKeyCredential(azureOpenAIKey.GetClearTextPassword()));

In [1]:
#r "nuget:Microsoft.DotNet.Interactive.AIUtilities, 1.0.0-beta.23517.1"

In [2]:
#r "nuget: Microsoft.ML,  3.0.0-preview.23511.1"

In [4]:
using Microsoft.DotNet.Interactive;
using Microsoft.DotNet.Interactive.AIUtilities;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

In [12]:
public class DataRow{
    public string ProductId {get;set;} 
    public string UserId {get;set;} 
    public int Score {get;set;} 
    public string Summary {get;set;} 
    public string Text {get;set;} 
    public int TokenCount {get;set;} 
    [VectorType(1536)]
    public float[] Embedding {get;set;} 
};

In [13]:
using System.Text.Json;
using System.Text.Json.Serialization;
using System.IO;

var filePath = Path.Combine("..","..","..","Data","fine_food_reviews_with_embeddings_1k.json");

var foodReviewsData = JsonSerializer.Deserialize<DataRow[]>(File.ReadAllText(filePath));

### 1. Find the clusters using K-means
We show the simplest use of K-means. You can pick the number of clusters that fits your use case best.

First, a new instance of the `MLContext` class is created. 

Next, the `LoadFromEnumerable` method of the `Data` property of the `context` object is called to load the `foodReviewsData` into an `IDataView` object, which is a flexible, efficient way of describing tabular data (numeric and text).

A pipeline is then defined using the `Clustering.Trainers.KMeans` method of the `context` object. This method creates a new K-Means clustering trainer. The first argument is the name of the feature column (in this case, "Embedding"), and the `numberOfClusters` parameter is set to 4, indicating that the algorithm should group the data into 4 clusters.

The `Fit` method is then called on the pipeline, passing in the `idv` object. This trains the model on the loaded data and returns the trained model.

The `Transform` method is then called on the `model` object, passing in the `idv` object. This applies the trained model to the loaded data, assigning each data point to a cluster.

Finally, the `GetClusterCentroids` method is called on the `Model` property of the `model` object. This method retrieves the centroids of the clusters identified by the model. The centroids are stored in the `centroids` variable.


In [15]:
var context = new MLContext();
var idv = context.Data.LoadFromEnumerable(foodReviewsData);
var pipeline =  context.Clustering.Trainers.KMeans("Embedding", numberOfClusters: 4);
var model = pipeline.Fit(idv);
var clusteredData = model.Transform(idv);

VBuffer<float>[] centroids = default;
model.Model.GetClusterCentroids(ref centroids, out var _);
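
To make the fit/transform steps above more concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) in dependency-free C#. It is not ML.NET's implementation: the trainer is more elaborate, and the names `FitCentroids`, `NearestCentroid`, and `SquaredDistance`, as well as the choice of the first `k` points as initial centroids, are assumptions made only for illustration.

```csharp
using System;
using System.Linq;

// Squared Euclidean distance between two vectors of equal length.
float SquaredDistance(float[] a, float[] b)
{
    var sum = 0f;
    for (var i = 0; i < a.Length; i++)
    {
        var diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Index of the centroid closest to the given point (the "transform" step).
int NearestCentroid(float[] point, float[][] centroids)
{
    var best = 0;
    for (var c = 1; c < centroids.Length; c++)
    {
        if (SquaredDistance(point, centroids[c]) < SquaredDistance(point, centroids[best]))
        {
            best = c;
        }
    }
    return best;
}

// Lloyd's algorithm (the "fit" step). Illustrative initialization: the first k points.
float[][] FitCentroids(float[][] data, int k, int maxIterations = 100)
{
    var centroids = data.Take(k).Select(p => (float[])p.Clone()).ToArray();
    var assignments = new int[data.Length];

    for (var iteration = 0; iteration < maxIterations; iteration++)
    {
        // Assignment step: attach every point to its nearest centroid.
        var next = data.Select(p => NearestCentroid(p, centroids)).ToArray();
        if (iteration > 0 && next.SequenceEqual(assignments)) break; // converged
        assignments = next;

        // Update step: move each centroid to the mean of its members.
        for (var c = 0; c < k; c++)
        {
            var members = data.Where((p, i) => assignments[i] == c).ToArray();
            if (members.Length == 0) continue; // leave an empty cluster's centroid unchanged
            centroids[c] = Enumerable.Range(0, members[0].Length)
                .Select(d => members.Average(m => m[d]))
                .ToArray();
        }
    }
    return centroids;
}

// Toy 2-D data with two obvious groups.
var toyPoints = new[]
{
    new[] { 0f, 0f }, new[] { 10f, 10f }, new[] { 0f, 1f }, new[] { 10f, 11f },
};
var toyCentroids = FitCentroids(toyPoints, k: 2);
var toyLabels = toyPoints.Select(p => NearestCentroid(p, toyCentroids)).ToArray();
Console.WriteLine(string.Join(", ", toyLabels));
```

On this toy data the points near the origin share one label and the points near (10, 10) share the other. With 1536-dimensional embeddings the logic is the same; only the vector length changes.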

### 2. Text samples in the clusters & naming the clusters
Let's show samples from each cluster. For each cluster centroid, we rank the reviews by cosine similarity to the centroid using `CosineSimilarityComparer` and keep the 200 most similar ones.
Then we randomly pick 5 of those reviews per cluster and use GPT to name the cluster based on that sample.
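
As a rough illustration of the ranking idea, the sketch below computes cosine similarity by hand and orders a few toy vectors by their similarity to a reference vector. It is independent of the AIUtilities package and does not claim to reproduce how `CosineSimilarityComparer` is implemented; the function name `CosineSimilarity` and the toy vectors are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|). Ranges from -1 (opposite) to 1 (same direction).
// Note: a zero-length vector would produce NaN here; real embeddings are not expected to be zero.
float CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same length.");
    float dot = 0f, normA = 0f, normB = 0f;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

// Toy stand-ins for a centroid and review embeddings.
var toyCentroid = new[] { 1f, 0f };
var toyEmbeddings = new Dictionary<string, float[]>
{
    ["orthogonal"] = new[] { 0f, 3f },
    ["aligned"] = new[] { 2f, 0f },
    ["opposite"] = new[] { -1f, 0f },
    ["diagonal"] = new[] { 1f, 1f },
};

// Most similar first, mirroring the OrderByDescending step in the next cell.
var ranked = toyEmbeddings
    .OrderByDescending(e => CosineSimilarity(e.Value, toyCentroid))
    .Select(e => e.Key)
    .ToArray();
Console.WriteLine(string.Join(" > ", ranked));
```

Cosine similarity ignores vector length, so `aligned` scores highest even though it is twice as long as the reference vector.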

In [25]:
var rnd = new Random(42);

var examples = centroids.Select(c => {
    var embedding = c.GetValues().ToArray();
    var reviews = new LinkedList<DataRow>(foodReviewsData
        .ScoreBySimilarityTo(embedding, new CosineSimilarityComparer<float[]>(v => v), r => r.Embedding )
        .OrderByDescending(e => e.Value)
        .Select(e => e.Key)
        .Take(200));

    var samples = new List<DataRow>();
    for(var i = 0; i < 5; i++)
    {
        var sample = reviews.ElementAt(rnd.Next(reviews.Count));
        samples.Add(sample);
        reviews.Remove(sample);
    }
    return new {
        CentroidEmbedding = embedding,
        Reviews = samples
    };
}).ToArray();
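
The cell above draws 5 distinct reviews by repeatedly picking a random element and removing it from a `LinkedList`. An alternative way to sample without replacement is a partial Fisher-Yates shuffle, sketched below. The helper name `SampleWithoutReplacement` is hypothetical, and the integers stand in for the 200 candidate reviews.

```csharp
using System;
using System.Linq;

// Partial Fisher-Yates shuffle: after `count` swaps, the first `count` slots
// hold a uniformly random sample with no duplicates. The input is not modified.
T[] SampleWithoutReplacement<T>(T[] items, int count, Random random)
{
    if (count > items.Length) throw new ArgumentException("Cannot sample more items than available.");
    var pool = (T[])items.Clone();
    for (var i = 0; i < count; i++)
    {
        var j = random.Next(i, pool.Length); // pick from the not-yet-chosen tail
        (pool[i], pool[j]) = (pool[j], pool[i]);
    }
    return pool.Take(count).ToArray();
}

var candidateIds = Enumerable.Range(1, 200).ToArray();
var picked = SampleWithoutReplacement(candidateIds, 5, new Random(42));
Console.WriteLine(string.Join(", ", picked));
```

For 200 candidates either approach is fast enough; the shuffle mainly avoids the repeated linear scans that `ElementAt` and `Remove` perform on a linked list.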

Using the 5 random samples from each cluster, we ask GPT for the common theme.

In [26]:
foreach (var example in examples)
{
    var prompt =
$"""
What do the following customer reviews have in common?
Customer reviews:
{string.Join("\n", example.Reviews.Select(r => $"{r.Score}, {r.Summary}: {r.Text}"))}
Theme:
""";
    var options = new ChatCompletionsOptions {
        Messages = { new ChatMessage(ChatRole.User, prompt) },
        Temperature = 0f,
    };

    var response = await client.GetChatCompletionsAsync(chatDeployment, options);
    var theme = response.Value.Choices.FirstOrDefault()?.Message?.Content;
    var text = new StringBuilder($"Cluster theme : {theme}");
    foreach (var review in example.Reviews)
    {
        text.AppendLine();
        text.AppendLine($"{review.Score}, {review.Summary}: {review.Text}");
    }
    text.ToString().Display();
}

Cluster theme : The common theme among these customer reviews is that they are all reviewing different beverages (tea, coffee, cappuccino) and expressing their opinions and experiences with them.
5, Hoping it's all it promises to be!: The tea leaves arrived in perfect condition with clear instructions for use.  I've made and drank two batches of the tea ( each makes four 8-oz cups) and I assume all the health-giving properties are doing their thing!  The tea itself is bland and pretty tasteless.  I drink it icy cold and it's like drinking water - no better, no worse.  People can add sweetener and drink it hot or warm also, but I just drink it in place of one of my usual water beverages because that is what I prefer.  It's a health drink and I can't imagine anyone disliking it, and that's the main thing.  You don't have to hold your nose and choke it down.  It's fine.

5, Enjoyable, quick cups of coffee with no waste: My mother loves this coffee and the pods fit her coffee maker. It 

Cluster theme : The common theme among these customer reviews is that they all discuss the positive aspects of the products they purchased.
5, Great product, but too much of it.: This pearls were exactly what I was looking for and made delicious tapioca pudding. My only complaint is that you have to order so many boxes. What am I going to do with all that tapioca?? Two or three would've lasted me a long time.

5, Found what I been looking for: I have found what I was looking for in a popcorn seasoning. I have been looking for at least three years. I plan to order more soon.

4, Not bad.: These are small and very salty. The taste is good, but very strong, so it's a good thing the package contains a small amount. It only takes a few little crisps to cure my salty/crunchy craving. I can snack on one package for an entire day. Of course these would not be a good snack if you're very hungry, because there isn't enough there to fill you up. For less than $1 per pack it's an okay deal.


Cluster theme : The common theme among these customer reviews is that the products mentioned (Good Girl Treats, liver treats, cat food, Nylabone chew-toy stick, Greenies) are highly praised and loved by the customers' pets. The customers also mention positive effects on their pets' behavior, health, and overall well-being. Additionally, some customers mention the difficulty of finding the products in stores and express gratitude for being able to purchase them online.
5, Good Girl Treats (GGTs): I purchased these treats on a whim for my dog a couple of years ago to assist with training.  She loved them and is well behaved! The grandkids sing a song and our dog will dance for a treat. My dog is eight years old and thanks to Dingo Treats, the vet says she has teeth like a two year old dog!  The only problem with the treats is that its extremely difficult to purchase in the store.  Thank goodness they are available on Amazon.

5, Dogs love it.: This is the "all gone" treat after dinner

Cluster theme : All of the customer reviews have a positive rating of 5 stars.
5, Loved these gluten free healthy bars, saved $$ ordering on Amazon: These Kind Bars are so good and healthy & gluten free.  My daughter came across them and loves them for a quick snack between her hectic schedule of classes & work. Most times she won't have time to eat a full meal and these are such a great alternative to fast food.  I will order again & this time I'll get a few for moi! Really loved the coconut too..

5, Great product, but too much of it.: This pearls were exactly what I was looking for and made delicious tapioca pudding. My only complaint is that you have to order so many boxes. What am I going to do with all that tapioca?? Two or three would've lasted me a long time.

5, Party Peanuts: Great product for the price. Mix with the Asian rice crackers for an excellent snack.  Big container lasts a long time. Only lightly slighted. Peanuts are whole and large.

5, These fresh berries 