## K-means Clustering in C# using OpenAI
We use a simple k-means algorithm to demonstrate how clustering can be done. Clustering can help discover valuable, hidden groupings within the data. The dataset is created in the [Get_embeddings_from_dataset](Get_embeddings_from_dataset.ipynb) notebook.

In [1]:
#r "nuget: Azure.AI.OpenAI, 1.0.0-beta.9"

In [2]:
#r "nuget:Microsoft.DotNet.Interactive.AIUtilities, 1.0.0-beta.23606.2"


In [3]:
using Microsoft.DotNet.Interactive;
using Microsoft.DotNet.Interactive.AIUtilities;
using Azure;
using Azure.AI.OpenAI;

## Run this cell. It will prompt you for the API key, endpoint, and chat deployment name

In [4]:
var azureOpenAIKey = await Kernel.GetPasswordAsync("Provide your OPEN_AI_KEY");

// Your endpoint should look like the following https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
var azureOpenAIEndpoint = await Kernel.GetInputAsync("Provide the OPEN_AI_ENDPOINT");

// Enter the deployment name you chose when you deployed the model.
var chatDeployment = await Kernel.GetInputAsync("Provide chat deployment name");

In [5]:
OpenAIClient client = new (new Uri(azureOpenAIEndpoint), new AzureKeyCredential(azureOpenAIKey.GetClearTextPassword()));

In [6]:
#r "nuget: Microsoft.ML, 3.0.0-preview.23511.1"

In [7]:

using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

In [8]:
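// Shape of one record in the reviews JSON file.
// Assumes the JSON key is "ProductId"; System.Text.Json matches property names case-sensitively by default.
public class DataRow
{
    public string ProductId { get; set; }
    public string UserId { get; set; }
    public int Score { get; set; }
    public string Summary { get; set; }
    public string Text { get; set; }
    public int TokenCount { get; set; }

    // Fixed vector length so ML.NET can use the column as a feature vector.
    [VectorType(1536)]
    public float[] Embedding { get; set; }
}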

In [9]:
using System.Text.Json;
using System.Text.Json.Serialization;
using System.IO;

var filePath = Path.Combine("..","..","..","Data","fine_food_reviews_with_embeddings_1k.json");

var foodReviewsData = JsonSerializer.Deserialize<DataRow[]>(File.ReadAllText(filePath));

### 1. Find the clusters using K-means
We show the simplest use of K-means. You can pick the number of clusters that fits your use case best.

First, a new instance of the `MLContext` class is created. 

Next, the `LoadFromEnumerable` method of the `Data` property of the `context` object is called to load the `foodReviewsData` into an `IDataView` object, which is a flexible, efficient way of describing tabular data (numeric and text).

A pipeline is then defined using the `Clustering.Trainers.KMeans` method of the `context` object. This method creates a new K-Means clustering trainer. The first argument is the name of the feature column (in this case, "Embedding"), and the `numberOfClusters` parameter is set to 4, indicating that the algorithm should group the data into 4 clusters.

The `Fit` method is then called on the pipeline, passing in the `idv` object. This trains the model on the loaded data and returns the trained model.

The `Transform` method is then called on the `model` object, passing in the `idv` object. This applies the trained model to the loaded data, assigning each data point to a cluster.

Finally, the `GetClusterCentroids` method is called on the `Model` property of the `model` object. This method retrieves the centroids of the clusters identified by the model. The centroids are stored in the `centroids` variable.


In [10]:
var context = new MLContext();
var idv = context.Data.LoadFromEnumerable(foodReviewsData);
var pipeline =  context.Clustering.Trainers.KMeans("Embedding", numberOfClusters: 4);
var model = pipeline.Fit(idv);
var clusteredData = model.Transform(idv);

VBuffer<float>[] centroids = default;
model.Model.GetClusterCentroids(ref centroids, out var _);
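
To make the assignment and update steps behind k-means concrete, the following is a minimal sketch of Lloyd's algorithm in plain C# on toy 2-D points. It is not ML.NET's implementation, which this notebook uses as-is through the trainer. The helper names (`SquaredDistance`, `Nearest`, `KMeansSketch`), the toy data, and the naive "first k points" initialization are all assumptions made for illustration.

```csharp
using System;
using System.Linq;

// Squared Euclidean distance between two points of equal dimension.
double SquaredDistance(double[] a, double[] b) =>
    a.Zip(b, (x, y) => (x - y) * (x - y)).Sum();

// Index of the centroid closest to the given point (ties go to the lower index).
int Nearest(double[] point, double[][] centroids) =>
    Enumerable.Range(0, centroids.Length)
        .OrderBy(i => SquaredDistance(point, centroids[i]))
        .First();

// Lloyd's algorithm: alternate between assigning points and updating centroids.
(int[] Labels, double[][] Centroids) KMeansSketch(double[][] points, int k, int iterations)
{
    // Naive initialization (an assumption of this sketch): the first k points.
    var centroids = points.Take(k).Select(p => (double[])p.Clone()).ToArray();
    var labels = new int[points.Length];

    for (var iter = 0; iter < iterations; iter++)
    {
        // Assignment step: each point joins its nearest centroid.
        for (var i = 0; i < points.Length; i++)
            labels[i] = Nearest(points[i], centroids);

        // Update step: each centroid moves to the mean of its members.
        for (var c = 0; c < k; c++)
        {
            var members = points.Where((p, i) => labels[i] == c).ToArray();
            if (members.Length == 0) continue; // empty cluster: keep the old centroid
            centroids[c] = Enumerable.Range(0, members[0].Length)
                .Select(d => members.Average(m => m[d]))
                .ToArray();
        }
    }
    return (labels, centroids);
}

// Hypothetical toy data: two well-separated groups, interleaved.
var toyPoints = new[]
{
    new[] { 0.0, 0.0 }, new[] { 10.0, 10.0 }, new[] { 0.0, 1.0 },
    new[] { 10.0, 11.0 }, new[] { 1.0, 0.0 }, new[] { 11.0, 10.0 },
};

var (toyLabels, toyCentroids) = KMeansSketch(toyPoints, k: 2, iterations: 10);
Console.WriteLine(string.Join(",", toyLabels));
```

On this toy data the two groups separate on the first pass, so `toyLabels` alternates between cluster 0 and cluster 1. With 1536-dimensional embeddings the same two steps apply, only over longer vectors.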

### 2. Text samples in the clusters & naming the clusters
Let's show samples from each cluster. We'll use GPT to name the clusters, based on a random sample of 5 reviews from each cluster.
Iterating over the clusters' centroids, we rank the reviews by similarity to each centroid using `CosineSimilarityComparer` and keep the 200 closest. Then we randomly pick 5 of those for each cluster.

In [11]:
var rnd = new Random(42);

var examples = centroids.Select(c => {
    var embedding = c.GetValues().ToArray();
    var samples = foodReviewsData
        .ScoreBySimilarityTo(embedding, new CosineSimilarityComparer<float[]>(v => v), r => r.Embedding )
        .OrderByDescending(e => e.Score)
        .Select(e => e.Value)
        .Take(200)
        .Shuffle()
        .Take(5);

    return new {
        CentroidEmbedding = embedding,
        Reviews = samples
    };
    }
).ToArray();
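
The ranking step in the cell above relies on cosine similarity followed by a random pick. As a minimal sketch of those two ideas in plain C#, assuming hand-written helpers rather than the AIUtilities extension methods: `CosineSimilarity` and `SeededShuffle` are hypothetical names, and the 2-dimensional "embeddings" are made up. The cell above declares `rnd` but does not pass it to `Shuffle()`, so this sketch also shows one way a seeded `Random` could drive the shuffle if repeatable samples are wanted.

```csharp
using System;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|).
double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Fisher-Yates shuffle driven by a caller-supplied Random,
// so the order is repeatable for a fixed seed. Returns a shuffled copy.
T[] SeededShuffle<T>(T[] items, Random rng)
{
    var copy = (T[])items.Clone();
    for (var i = copy.Length - 1; i > 0; i--)
    {
        var j = rng.Next(i + 1);
        (copy[i], copy[j]) = (copy[j], copy[i]);
    }
    return copy;
}

// Hypothetical centroid and items with 2-dimensional embeddings.
var toyCentroid = new[] { 1f, 0f };
var toyItems = new[]
{
    (Name: "a", Embedding: new[] { 1f, 0f }),   // same direction as the centroid
    (Name: "b", Embedding: new[] { 0f, 1f }),   // orthogonal
    (Name: "c", Embedding: new[] { 1f, 1f }),   // 45 degrees away
    (Name: "d", Embedding: new[] { -1f, 0f }),  // opposite direction
};

// Rank by similarity to the centroid, most similar first.
var ranked = toyItems
    .OrderByDescending(x => CosineSimilarity(x.Embedding, toyCentroid))
    .Select(x => x.Name)
    .ToArray();

// Keep the 3 closest, shuffle them with a fixed seed, then sample 2.
var sample = SeededShuffle(ranked.Take(3).ToArray(), new Random(42)).Take(2).ToArray();

Console.WriteLine(string.Join(",", ranked));
```

The ranking puts `a` first and `d` last. Restricting the shuffle to the closest items, as the notebook does with its top 200, keeps the random sample representative of the cluster's core rather than its fringe.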

Using the 5 random samples from each cluster, we ask GPT for the common theme.

In [12]:
foreach (var example in examples)
{
    var prompt =
$"""
What do the following customer reviews have in common?
Customer reviews:
{string.Join("\n", example.Reviews.Select(r => $"{r.Score}, {r.Summary}: {r.Text}"))}
Theme:
""";
    var options = new ChatCompletionsOptions
    {
        Messages = { new ChatMessage(ChatRole.User, prompt) },
        Temperature = 0f,
        DeploymentName = chatDeployment
    };

    var response = await client.GetChatCompletionsAsync(options);
    var theme = response.Value.Choices.FirstOrDefault()?.Message?.Content;
    var text = new StringBuilder($"Cluster theme : {theme}");
    foreach (var review in example.Reviews)
    {
        text.AppendLine();
        text.AppendLine($"{review.Score}, {review.Summary}: {review.Text}");
    }
    text.ToString().Display();
}

Cluster theme : The common theme in these customer reviews is that they are all positive reviews about different beverages (coffee and tea).
5, Yummy!: I love this tea.  When I made it, it was a purplish color, and kind of strong.  If you wanted it weaker you might want to not let it steep as long as I did.  You can definitely make out both the blackberry and vanilla, and they blend very nicely.  There is a slight aftertaste, but it isn't unpleasant.  I made it hot, but didn't finish it and the air conditioning was on, so it got cold.  It was quite cold when I tried it for the second time, and it tasted great!  This is the type of tea I like drinking in the afternoon; I won't be switching my morning Lady Grey tea for this.<br /><br />I love the tea bag design.  With the pyramid tea bag I can see what the tea is, rather than look through an opaque square tea bag.  The tea tasted very fresh, too.

1, NOT HAPPY!!: If I can learn anything from this I have learned~You get what you pay fo

Cluster theme : The common theme among these customer reviews is that they are all positive and express satisfaction with the product.
5, Yummy and Healthy: Loved the cranberry-like flavor and slightly crunchy texture.  Worked well with wheat bread. A little on the expensive side but my kids like it too.

4, Worth the money, but nothing to write home about.: First of all, shipping and packaging went very well. I'm impressed in that regard. After reading the previous review, I was kind of expecting to be blown away by the wonderful smell and taste. I'm not saying it tastes or smells bad, but it's not as amazing as the other reviewer puts it. The fragrance and taste is hard to describe, but good. I brewed it at a relatively hot temperature for about 6 minutes, and added sugar with it. The tea tastes fruity but with a cinnamon kick to it as well as floral undertones. I would recommend this product to another person, but I don't think I'll be able to use all 16oz of it. Buying a smaller

Cluster theme : The common theme among these customer reviews is that they all discuss the taste and quality of the products they purchased.
1, KNOCK OFF: This is not the product that should come in this box.  I have had this tea from a local store which was excellent, very flavorful pellets.  This was a powdered product which was tasteless I can only assume it is another Chinese knock off.  Don't bother spending money for it.

4, yummy cookies!: We've ordered for them every month and only once the package inside the seal had a small hole and wasn't as fresh.

5, So convenient, for so little!: I needed two vanilla beans for the Love Goddess cake that my husbands adores. So you can spend exorbitant amounts of money at the grocery store for more vanilla beans than you need, or you can order the amount you need with free shipping. Each bean was in its own plastic vacuum package that looked like it could keep for awhile if you needed it to. The cake was, of course, delicious, and my t

Cluster theme : The common theme among these customer reviews is that they are all discussing different products and their experiences with them.
5, Delicious!: Love these packs. I have made pretzel dogs, bites and sticks. I  have even used the pretzel mix to make calzones and pizza. Both came out great.

2, dingos: The red part of the dingos was missing from 3 of the balls in one bag and two of them in the other. The others have not been opened yet and I found this very upsetting since this is the part the the puppies go for first. I don't mind paying that price if it is as shown. The bag show no white only balls.<br />Theresa Miller

5, my dogs love the peanut butter!: First off, read the ingredients, no crazy words I can't pronounce, which means it's all natural! I got the peanut butter treats for my two children: husky/shepherd and a lab/shepherd mix, and they can't get enough of them. Plus, on Amazon, this is such a great value for 16oz. Beats petsmart for sure! I'll be looki