## Get embeddings
This notebook contains helpful snippets for embedding text with the `text-embedding-ada-002` model via the Azure OpenAI API.

## Installation
Install the Azure OpenAI SDK with the command below.

In [1]:
#r "nuget: Azure.AI.OpenAI, 1.0.0-beta.14"

In [None]:
#r "nuget:Microsoft.DotNet.Interactive.AIUtilities, 1.0.0-beta.24129.1"

In [3]:
using Microsoft.DotNet.Interactive;
using Microsoft.DotNet.Interactive.AIUtilities;

## Run this cell; it will prompt you for the API key, endpoint, and embedding deployment name

In [4]:
var azureOpenAIKey = await Kernel.GetPasswordAsync("Provide your OPEN_AI_KEY");

// Your endpoint should look like the following https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
var azureOpenAIEndpoint = await Kernel.GetInputAsync("Provide the OPEN_AI_ENDPOINT");

// Enter the deployment name you chose when you deployed the model.
var deployment = await Kernel.GetInputAsync("Provide embedding deployment name");

### Import namespaces and create an instance of `OpenAIClient` using the `azureOpenAIEndpoint` and the `azureOpenAIKey`

In [5]:
using Azure;
using Azure.AI.OpenAI;

In [6]:
OpenAIClient client = new (new Uri(azureOpenAIEndpoint), new AzureKeyCredential(azureOpenAIKey.GetClearTextPassword()));

### 1. Load the dataset
The dataset used in this example is [fine-food reviews](https://www.kaggle.com/snap/amazon-fine-food-reviews) from Amazon. It contains a total of 568,454 food reviews that Amazon users left up to October 2012. For illustration purposes, we will use a subset consisting of the 1,000 most recent reviews. The reviews are in English and tend to be positive or negative. Each review has a ProductId, UserId, Score, review title (Summary) and review body (Text).

We will combine the review summary and review text into a single string. The model encodes this combined text and outputs a single vector embedding.
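
The embedding cell later in this notebook passes the `Text` column to the model as-is. If you want the combined text described above, one way to build it is sketched below. The `CombineReview` helper and the `Title: ...; Content: ...` layout are assumptions made for illustration and are not part of the notebook.

```csharp
using System;

// Hypothetical helper (not part of the original notebook): merge a review's
// title and body into one string before sending it to the embedding model.
string CombineReview(string summary, string text) =>
    $"Title: {summary.Trim()}; Content: {text.Trim()}";

var combined = CombineReview(" Arrived in pieces ", "Not pleased at all. ");
Console.WriteLine(combined);
// prints: Title: Arrived in pieces; Content: Not pleased at all.
```

If you use a helper like this, the token count should be computed on the combined string rather than on `Text` alone, since the title adds tokens.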

Let's load the `fine_food_reviews_1k.csv` dataset using the `value` kernel.

In [7]:
#!value --name dataSet --from-url https://raw.githubusercontent.com/openai/openai-cookbook/main/examples/data/fine_food_reviews_1k.csv

### Load the latest `Microsoft.Data.Analysis` package

In [8]:
#i "nuget:https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-libraries/nuget/v3/index.json"

In [1]:
#r "nuget: Microsoft.Data.Analysis, 0.21.0"

In [10]:
using Microsoft.Data.Analysis;

In [11]:
#!set --name dataSet --value @value:dataSet

var dataFrame = DataFrame.LoadCsvFromString(dataSet);
dataFrame.Head(3).Display();

index,Column0,Time,ProductId,UserId,Score,Summary,Text
0,0,1351123200.0,B003XPF9BO,A3R7JR3FMEBXQB,5,where does one start...and stop... with a treat like this,Wanted to save some to bring to my Chicago family but my North Carolina family ate all 4 boxes before I could pack. These are excellent...could serve to anyone
1,1,1351123200.0,B003JK537S,A3JBPC3WFUT5ZP,1,Arrived in pieces,"Not pleased at all. When I opened the box, most of the rings were broken in pieces. A total waste of money."
2,2,1351123200.0,B000JMBE7M,AQX1N6A51QOKG,4,"It isn't blanc mange, but isn't bad . . .","I'm not sure that custard is really custard without eggs. But this comes close. I got it for use in a ""Vegan pancake"" recipe. We were having houseguests who were Vegan and I wanted to make some special breakfasts while they were here. One of the cooking/recipe sites had a recipe using this and there were lots of great reviews. I tried the recipe and it turned out like wallpaper paste -- yuck!<br />However, the so-called custard isn't so bad. I think it's probably just cornstarch and annatto (yellow coloring with a slight flavor). It's fun playing with it. You could dress it up with fruit. Seems to come out on the thin side when you make it as directed, so I use less milk because I like my custards to set firm. As a custard sauce it's fine. I would say it tastes something between a pudding and a custard.<br /><br />If you want a really good egg-free ""custard"" get an original recipe for ""blanc mange."" It takes a lot longer to make, but it's certainly worth the difference."


### Use the tokenizer to calculate the token count

In [12]:
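// Create the tokenizer for the ada2 model.
var tokenizer = await Tokenizer.CreateAsync(TokenizerModel.ada2);
// Reviews longer than this many tokens are filtered out in the next cell.
var maxTokens = 200;
var subset = dataFrame.Clone();
// Count the tokens in each review body and store the counts in a new "tokens" column.
var tokenCount = ((IEnumerable<string>)subset["Text"]).Select(x => tokenizer.GetTokenCount(x));
subset.Columns.Add(new Int32DataFrameColumn("tokens", tokenCount));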


In [13]:
subset = subset.Filter(subset["tokens"].ElementwiseLessThanOrEqual(maxTokens));

In [14]:
subset.Head(6).Display();

index,Column0,Time,ProductId,UserId,Score,Summary,Text,tokens
0,0,1351123200.0,B003XPF9BO,A3R7JR3FMEBXQB,5,where does one start...and stop... with a treat like this,Wanted to save some to bring to my Chicago family but my North Carolina family ate all 4 boxes before I could pack. These are excellent...could serve to anyone,34
1,1,1351123200.0,B003JK537S,A3JBPC3WFUT5ZP,1,Arrived in pieces,"Not pleased at all. When I opened the box, most of the rings were broken in pieces. A total waste of money.",26
2,4,1351123200.0,B001BORBHO,A1AFOYZ9HSM2CZ,5,Happy with the product,My dog was suffering with itchy skin. He had been eating Natural Choice brand (cheaper) since he was a puppy. I was nervous to change foods. The vet suggested to change foods sand see if the skin issues cleared up. Wellness brand did the job. My dog seems to love the food and the skin issues cleared up within a few weeks.,77
3,5,1351123200.0,B008PSM0BQ,A3OUFIMGL2K6RS,4,Good Sauce,"This is a good all purpose sauce. Has good flavor that the heat doesn't overpower. Not really that spicy unless you use a whole bunch. 10 good drops is about enough to add a little heat to a pot of soup, but a lot more is needed if you want a lingering burn. Heat isn't quite up to par with other products out there, (such as Spontaneous Combustion) but this has the true aged cayenne hot sauce flavor.",100
4,6,1351123200.0,B008YA1LQK,A9YEAAQVHFUTX,5,Blackcat,Great coffee! Love all Green Mountain coffee and all the wonderful flavors. Would and do recommend this coffee to all my friends.,27
5,7,1351123200.0,B001KP6B98,ABWCUS3HBDZRS,5,Excellent product,After scouring every store in town for orange peels and not finding anything satisfactory I turned to the online options.<br /><br /> I received the candied orange peels today and I found exactly what I was looking for. The peels are perfect for the fruit cake I plan to bake. The peels are not crystallized with sugar which is great I like the texture and the taste of the peels and I am gonna order another box soon.,93


### 2. Get embeddings and save them for future reuse

In [15]:
using Microsoft.ML.Data;

Use the batch approach when calculating a lot of embeddings: the cell below sends the texts in chunks of 16 per request instead of one request per review.
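
To see what the batching step produces before any request is made, here is a minimal sketch that uses only `Enumerable.Chunk` (available from .NET 6). The `sampleTexts` array is placeholder data invented for this example and is not taken from the dataset.

```csharp
using System;
using System.Linq;

// Illustrative inputs only: 40 placeholder strings standing in for review texts.
var sampleTexts = Enumerable.Range(0, 40).Select(i => $"review {i}").ToArray();

// Chunk splits a sequence into arrays of at most the requested size;
// the final array holds whatever is left over.
var batches = sampleTexts.Chunk(16).ToArray();

Console.WriteLine(string.Join(", ", batches.Select(b => b.Length)));
// prints: 16, 16, 8
```

Each of these arrays would then become the input of one embeddings request, so 40 texts need 3 calls rather than 40.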

In [16]:
var texts = ((IEnumerable<string>)subset["Text"]).ToArray();
// Send the texts in batches of 16 instead of one request per review.
var chunks = texts.Chunk(16).ToArray();
var embeddings = new List<VBuffer<float>>();

foreach (var chunk in chunks)
{
    var response = await client.GetEmbeddingsAsync(new EmbeddingsOptions(deployment, chunk));
    // text-embedding-ada-002 returns 1536-dimensional vectors.
    embeddings.AddRange(response.Value.Data.Select(e => new VBuffer<float>(1536, e.Embedding.ToArray())));
}
// Attach the embeddings to the data frame as a new column.
var embeddingsColumn = new VBufferDataFrameColumn<float>("embeddings", embeddings);
subset.Columns.Add(embeddingsColumn);
subset.Head(1).Display();

index,Column0,Time,ProductId,UserId,Score,Summary,Text,tokens,embeddings
0,0,1351123200.0,B003XPF9BO,A3R7JR3FMEBXQB,5,where does one start...and stop... with a treat like this,Wanted to save some to bring to my Chicago family but my North Carolina family ate all 4 boxes before I could pack. These are excellent...could serve to anyone,34,"IsDense: True, Length: 1536, (values): [ 0.0068575335, -0.028527338, 0.0065081255, -0.017594472, -0.0020066448, 0.013636695, -0.007001215, -0.03471871, -0.004424742, -0.037722964, 0.011899453, 0.0034026427, -0.017999392, 0.002989558, 0.008568651, 0.017163426, 0.025928007, -0.03450972, -0.006083612, -0.024412818 ... (more) ]"


### Save the data for later use

In [17]:
record DataRow(string ProductId, string UserId, int Score, string Summary, string Text, int TokenCount, float[] Embedding);

var data = subset.Rows.Select(r => new DataRow(
    r["ProductId"].ToString(),
    r["UserId"].ToString(),
    // Treat a missing score as 0.
    r["Score"] is null ? 0 : Convert.ToInt32(r["Score"]),
    r["Summary"].ToString(),
    r["Text"].ToString(),
    (int)r["tokens"],
    ((VBuffer<float>)r["embeddings"]).DenseValues().ToArray())
    ).ToArray();


In [18]:
using System.Text.Json;
using System.Text.Json.Serialization;
using System.IO;


var filePath = Path.Combine("..","..","..","Data","fine_food_reviews_with_embeddings_1k.json");

var options = new JsonSerializerOptions
{
    WriteIndented = true,
};

var jsonString = JsonSerializer.Serialize(data, options);
await System.IO.File.WriteAllTextAsync(filePath, jsonString);
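
To reuse the saved embeddings later, the JSON file can be read back with `JsonSerializer.Deserialize`. The sketch below round-trips a single made-up row through a temporary file. The `SavedRow` record, the file name, and the sample values are assumptions for illustration; in practice you would deserialize into the same record type that was used when saving.

```csharp
using System;
using System.IO;
using System.Text.Json;

// One made-up row; real data would come from the file written above.
var rows = new[] { new SavedRow("B003XPF9BO", 5, new[] { 0.25f, -0.5f }) };

// Write to a temporary file, then read it back.
var tempPath = Path.Combine(Path.GetTempPath(), "embeddings_roundtrip.json");
await File.WriteAllTextAsync(tempPath, JsonSerializer.Serialize(rows));

var json = await File.ReadAllTextAsync(tempPath);
var loaded = JsonSerializer.Deserialize<SavedRow[]>(json) ?? Array.Empty<SavedRow>();

Console.WriteLine($"{loaded.Length} row(s), first embedding has {loaded[0].Embedding.Length} values");
// prints: 1 row(s), first embedding has 2 values

// Hypothetical, trimmed-down row type used only for this sketch. It is declared
// last so the snippet also compiles as a top-level-statements program.
record SavedRow(string ProductId, int Score, float[] Embedding);
```

Positional records deserialize through their constructor, so the JSON property names need to match the record's parameter names.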
