# Semantic text search using embeddings

We can search all our reviews semantically, efficiently and at low cost, by embedding the search query and then finding the reviews whose embeddings are most similar to it. The dataset is created in the [Get_embeddings_from_dataset notebook](Get_embeddings_from_dataset.ipynb).

## Installation
Install the Azure OpenAI SDK with the following command.

In [1]:
#r "nuget: Azure.AI.OpenAI, 1.0.0-beta.14"

In [None]:
#r "nuget:Microsoft.DotNet.Interactive.AIUtilities, 1.0.0-beta.24129.1"

using Microsoft.DotNet.Interactive;
using Microsoft.DotNet.Interactive.AIUtilities;

## Run this cell to be prompted for the API key, endpoint, and embedding deployment name

In [3]:
var azureOpenAIKey = await Kernel.GetPasswordAsync("Provide your OPEN_AI_KEY");

// Your endpoint should look like the following https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
var azureOpenAIEndpoint = await Kernel.GetInputAsync("Provide the OPEN_AI_ENDPOINT");

// Enter the deployment name you chose when you deployed the model.
var deployment = await Kernel.GetInputAsync("Provide embedding deployment name");

### Import namespaces and create an instance of `OpenAIClient` using the `azureOpenAIEndpoint` and the `azureOpenAIKey`

In [4]:
using Azure;
using Azure.AI.OpenAI;

In [5]:
OpenAIClient client = new (new Uri(azureOpenAIEndpoint), new AzureKeyCredential(azureOpenAIKey.GetClearTextPassword()));

In [6]:
record DataRow(string ProducIt, string UserId, float Score, string Summary, string Text, int TokenCount, float[] Embedding);

In [7]:
using System.Text.Json;
using System.Text.Json.Serialization;
using System.IO;

var filePath = Path.Combine("..","..","..","Data","fine_food_reviews_with_embeddings_1k.json");

var data = JsonSerializer.Deserialize<DataRow[]>(File.ReadAllText(filePath));
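
Before searching, it can help to confirm that the file deserialized as expected. The following is a minimal sketch, not part of the original notebook: `SampleRow` is a hypothetical, trimmed-down stand-in for `DataRow`, and the inline JSON string is invented for illustration. It relies on the fact that `System.Text.Json` matches JSON keys to property names case-sensitively by default, so the record's property names must match the keys in the file exactly.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Inline sample standing in for the contents of the JSON file (invented for illustration).
var sampleJson = "[{\"Text\":\"Great pasta\",\"TokenCount\":2,\"Embedding\":[0.1,0.2]}]";

// JSON keys are matched to property names case-sensitively by default.
var rows = JsonSerializer.Deserialize<SampleRow[]>(sampleJson);

// Fail early if nothing was loaded or the embeddings have inconsistent lengths.
if (rows is null || rows.Length == 0)
    throw new InvalidOperationException("No rows were loaded.");

var dimensions = rows[0].Embedding.Length;
if (rows.Any(r => r.Embedding.Length != dimensions))
    throw new InvalidOperationException("Embeddings have inconsistent lengths.");

Console.WriteLine($"{rows.Length} row(s), {dimensions} dimensions each");

// Hypothetical, trimmed-down version of DataRow used only in this sketch.
record SampleRow(string Text, int TokenCount, float[] Embedding);
```

The same two checks could be applied to `data` right after it is loaded, since cosine similarity is only meaningful between vectors of equal length.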

In [8]:
data.Take(2).Display();

| Index | Field | Value |
|---|---|---|
| 0 | ProducIt | B003XPF9BO |
| 0 | UserId | A3R7JR3FMEBXQB |
| 0 | Score | 5 |
| 0 | Summary | where does one start...and stop... with a treat like this |
| 0 | Text | Wanted to save some to bring to my Chicago family but my North Carolina family ate all 4 boxes before I could pack. These are excellent...could serve to anyone |
| 0 | TokenCount | 34 |
| 0 | Embedding | [ 0.0068575335, -0.028527338, 0.0065081255, -0.017594472, -0.0020066448, 0.013636695, -0.007001215, -0.03471871, -0.004424742, -0.037722964, 0.011899453, 0.0034026427, -0.017999392, 0.002989558, 0.008568651, 0.017163426, 0.025928007, -0.03450972, -0.006083612, -0.024412818 ... (1516 more) ] |
| 1 | ProducIt | B003JK537S |
| 1 | UserId | A3JBPC3WFUT5ZP |
| 1 | Score | 1 |
| 1 | Summary | Arrived in pieces |
| 1 | Text | Not pleased at all. When I opened the box, most of the rings were broken in pieces. A total waste of money. |
| 1 | TokenCount | 26 |
| 1 | Embedding | [ -0.030615676, -0.014484274, -0.008392513, -0.012360001, -0.022380034, 0.010150757, -0.014013666, -0.024419336, 0.013425405, -0.020366875, 0.011961292, 0.027530579, 0.015843809, -0.01980476, -0.0050753783, 0.0320275, 0.01729485, 0.014431984, -0.007300963, -0.029099273 ... (1516 more) ] |


Let's define the function `SearchReviews`, which searches a dataset of reviews using embeddings. It takes an array of data rows, a query string, and an optional result count (defaulting to 5), and returns an array containing the text of the top matching reviews.

The code starts by making an asynchronous request to the Azure OpenAI service to generate an embedding for the query. The request goes through the `GetEmbeddingsAsync` method of the `client` object, which takes an `EmbeddingsOptions` instance specifying the embedding model deployment and the text to embed (in this case, the query). The query's embedding is then extracted from the first item in the response data.

Next, it scores every data row against the query using the `ScoreBySimilarityTo` method. The `CosineSimilarityComparer<float[]>(t => t)` argument selects cosine similarity as the measure: the cosine of the angle between two non-zero vectors, here the query's embedding and each row's embedding.
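
To make the measure concrete, here is a minimal sketch of cosine similarity written by hand. The `CosineSimilarity` helper is a hypothetical illustration, not the implementation used by `CosineSimilarityComparer`, and the two-dimensional vectors are invented for the example.

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|).
// Illustrative helper only; assumes both vectors are non-zero.
float CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

var query = new float[] { 1f, 0f };
Console.WriteLine(CosineSimilarity(query, new float[] { 2f, 0f }));  // same direction: 1
Console.WriteLine(CosineSimilarity(query, new float[] { 0f, 3f }));  // orthogonal: 0
Console.WriteLine(CosineSimilarity(query, new float[] { -1f, 0f })); // opposite direction: -1
```

Because the result depends only on direction and not on vector length, the first comparison scores 1 even though the second vector is twice as long.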

The scored rows are then sorted in descending order of score and the first `resultCount` are kept, so the function returns the rows whose embeddings are most similar to the query's embedding.

Finally, it extracts the text of each selected row using the `Select(r => r.Value.Text)` expression and converts the resulting collection to an array. This array of review texts is returned as the search results.
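
The score, sort, take, and select steps can be sketched with plain LINQ and no service call. This is an illustration under stated assumptions rather than the notebook's method: `Cosine` and `TopMatches` are hypothetical helpers, value tuples stand in for `DataRow`, and the two-dimensional vectors are toy data (the real embeddings are far longer).

```csharp
using System;
using System.Linq;

// Hypothetical cosine similarity helper; assumes non-zero vectors of equal length.
float Cosine(float[] a, float[] b)
{
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

// Score every candidate, sort by score descending, keep the top matches, return their text.
string[] TopMatches((string Text, float[] Embedding)[] candidates, float[] queryEmbedding, int resultCount = 5) =>
    candidates
        .Select(c => (c.Text, Score: Cosine(queryEmbedding, c.Embedding)))
        .OrderByDescending(x => x.Score)
        .Take(resultCount)
        .Select(x => x.Text)
        .ToArray();

// Toy vectors invented for illustration.
var sample = new (string Text, float[] Embedding)[]
{
    ("pasta review", new float[] { 0.9f, 0.1f }),
    ("delivery review", new float[] { 0.1f, 0.9f }),
    ("mixed review", new float[] { 0.6f, 0.6f }),
};

var top = TopMatches(sample, new float[] { 1f, 0f }, 2);
Console.WriteLine(string.Join(", ", top)); // pasta review, mixed review
```

Note that `Take` always returns up to `resultCount` rows, however low their scores, so a query with no good match in the dataset still produces results.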

In [9]:
async Task<string[]> SearchReviews(DataRow[] data, string query, int resultCount = 5)
{
    // Embed the query with the deployed embedding model.
    var queryEmbeddingResponse = await client.GetEmbeddingsAsync(new EmbeddingsOptions(deployment, new[] { query }));
    var queryEmbedding = queryEmbeddingResponse.Value.Data[0].Embedding.ToArray();

    // Score each row by cosine similarity to the query, then keep the text of the best matches.
    var result = data
        .ScoreBySimilarityTo(queryEmbedding, new CosineSimilarityComparer<float[]>(t => t), e => e.Embedding.ToArray())
        .OrderByDescending(s => s.Score)
        .Take(resultCount)
        .Select(r => r.Value.Text);

    return result.ToArray();
}

In [10]:
var results = await SearchReviews(data, "whole wheat pasta", 3);
foreach(var result in results)
{
    result.Display();
}

Barilla Whole Grain Fusilli with Vegetable Marinara is tasty and has an excellent chunky vegetable marinara.  I just wish there was more of it.  If you aren't starving or on a diet, the 9oz serving is enough for lunch although you might want to add a piece of fruit to feel full.  The whole grain fusilli cooked to al dente tenderness following the instructions and the chunky marinara sauce is so good that I wished there was more of it.  Rarely do I eat sauce alone but this sauce is good enough to.

tastes so good. Worth the money. My boyfriend hates wheat pasta and LOVES this. cooks fast tastes great.I love this brand and started buying more of their pastas. Bulk is best.

Anything this company makes is worthwhile eating! My favorite is their Trenne.<br />Their whole wheat pasta is the best I have ever had.

In [11]:
var results = await SearchReviews(data, "bad delivery", 1);
foreach(var result in results)
{
    result.Display();
}

The bag came broken. Product was leaking out of the box, due to poor packing standards.<br />Hope next items arrive unscathed. Quinoa tasted good.

In [12]:
var results = await SearchReviews(data, "spoilt", 1);
foreach(var result in results)
{
    result.Display();
}

The bag came broken. Product was leaking out of the box, due to poor packing standards.<br />Hope next items arrive unscathed. Quinoa tasted good.

In [13]:
var results = await SearchReviews(data, "pet food", 1);
foreach(var result in results)
{
    result.Display();
}

The only dry food my queen cat will eat. Helps prevent hair balls. Good packaging. Arrives promptly. Recommended by a friend who sells pet food.