# Question answering using embeddings-based search
GPT excels at answering questions, but only on topics it remembers from its training data.
What should you do if you want GPT to answer questions about unfamiliar topics? E.g.,
- Recent events after Sep 2021
- Your non-public documents
- Information from past conversations
- etc.

This notebook demonstrates a two-step Search-Ask method for enabling GPT to answer questions using a library of reference text.

 1. **Search:** search your library of text for relevant text sections
 2. **Ask:** insert the retrieved text sections into a message to GPT and ask it the question

## Why search is better than fine-tuning

GPT can learn knowledge in two ways:

 - Via model weights (i.e., fine-tune the model on a training set)
 - Via model inputs (i.e., insert the knowledge into an input message)

Although fine-tuning can feel like the more natural option—training on data is how GPT learned all of its other knowledge, after all—we generally do not recommend it as a way to teach the model knowledge. Fine-tuning is better suited to teaching specialized tasks or styles, and is less reliable for factual recall.

As an analogy, model weights are like long-term memory. When you fine-tune a model, it's like studying for an exam a week away. When the exam arrives, the model may forget details, or misremember facts it never read.

In contrast, message inputs are like short-term memory. When you insert knowledge into a message, it's like taking an exam with open notes. With notes in hand, the model is more likely to arrive at correct answers.

One downside of text search relative to fine-tuning is that each model is limited by a maximum amount of text it can read at once:

| Model           | Maximum text length       |
|-----------------|---------------------------|
| `gpt-3.5-turbo` | 4,096 tokens (~5 pages)   |
| `gpt-4`         | 8,192 tokens (~10 pages)  |
| `gpt-4-32k`     | 32,768 tokens (~40 pages) |
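To check roughly whether some reference text will fit, you can estimate its token count from its length. The sketch below relies on OpenAI's rule of thumb that common English text averages about four characters per token. This is a heuristic, not the tokenizer the models actually use, so an exact count requires a real tokenizer. The prompt and the answer share the context window, so the sketch also reserves room for the answer. The helper names `EstimateTokens` and `FitsInContext` are hypothetical.

```csharp
using System;

// Heuristic (assumption): ~4 characters of English text per token.
// Real counts come from the model's tokenizer and will differ somewhat.
int EstimateTokens(string text) => (int)Math.Ceiling(text.Length / 4.0);

// True if the prompt plus the tokens reserved for the answer fit within the context limit.
bool FitsInContext(string prompt, int contextLimit, int reservedForAnswer) =>
    EstimateTokens(prompt) + reservedForAnswer <= contextLimit;

var notes = new string('x', 12_000); // stand-in for ~3,000 tokens of reference text
Console.WriteLine(EstimateTokens(notes));                    // 3000
Console.WriteLine(FitsInContext(notes, 4_096, 500));         // True  (gpt-3.5-turbo limit)
Console.WriteLine(FitsInContext(notes + notes, 4_096, 500)); // False (needs a larger model or less text)
```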

Continuing the analogy, you can think of the model like a student who can only look at a few pages of notes at a time, despite potentially having shelves of textbooks to draw upon.

Therefore, to build a system capable of drawing upon large quantities of text to answer questions, we recommend using a Search-Ask approach.

## Search
Text can be searched in many ways. E.g.,
- Lexical-based search
- Graph-based search
- Embedding-based search

This example notebook uses embedding-based search. [Embeddings](https://platform.openai.com/docs/guides/embeddings) are simple to implement and work especially well with questions, as questions often don't lexically overlap with their answers.

Consider embeddings-only search as a starting point for your own system. Better search systems might combine multiple search methods, along with features like popularity, recency, user history, redundancy with prior search results, click rate data, etc. Q&A retrieval performance may also be improved with techniques like [HyDE](https://arxiv.org/abs/2212.10496), in which questions are first transformed into hypothetical answers before being embedded. Similarly, GPT can also potentially improve search results by automatically transforming questions into sets of keywords or search terms.
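As a hypothetical illustration of combining search methods, the sketch below blends an embedding-similarity score with a crude keyword-overlap score. The word splitting, the weight `alpha`, and the helper names are assumptions chosen for illustration. Production systems more often use a dedicated lexical ranker such as BM25 and tune the weighting on their own data.

```csharp
using System;
using System.Linq;

// Crude lexical signal: the fraction of distinct query words that also appear in the document.
double KeywordOverlap(string query, string document)
{
    char[] separators = { ' ', ',', '.', '?', '!', ';', ':' };
    var queryWords = query.ToLowerInvariant()
        .Split(separators, StringSplitOptions.RemoveEmptyEntries)
        .Distinct()
        .ToArray();
    var documentWords = document.ToLowerInvariant()
        .Split(separators, StringSplitOptions.RemoveEmptyEntries)
        .ToHashSet();

    if (queryWords.Length == 0) return 0;
    return queryWords.Count(w => documentWords.Contains(w)) / (double)queryWords.Length;
}

// Weighted blend; alpha is a hypothetical tuning knob (1.0 = embeddings only, 0.0 = keywords only).
double HybridScore(double embeddingScore, double lexicalScore, double alpha = 0.7) =>
    alpha * embeddingScore + (1 - alpha) * lexicalScore;

var lexical = KeywordOverlap("curling gold medal", "Sweden won the men's curling gold.");
Console.WriteLine(lexical);                    // 2 of the 3 query words match
Console.WriteLine(HybridScore(0.84, lexical)); // 0.84 is an example embedding score
```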

## Full procedure
Specifically, this notebook demonstrates the following procedure:
1. Prepare search data (once per document)
    1. Collect: We'll download a few hundred Wikipedia articles about the 2022 Olympics
    2. Chunk: Documents are split into short, mostly self-contained sections to be embedded
    3. Embed: Each section is embedded with the OpenAI API
    4. Store: Embeddings are saved (for large datasets, use a vector database)
2. Search (once per query)
    1. Given a user question, generate an embedding for the query from the OpenAI API
    2. Using the embeddings, rank the text sections by relevance to the query
3. Ask (once per query)
    1. Insert the question and the most relevant sections into a message to GPT
    2. Return GPT's answer

### Costs
Because GPT is more expensive than embeddings search, a system with a decent volume of queries will have its costs dominated by step 3.

- For `gpt-3.5-turbo` using ~1,000 tokens per query, it costs ~$0.002 per query, or ~500 queries per dollar (as of Apr 2023)
- For `gpt-4`, again assuming ~1,000 tokens per query, it costs ~$0.03 per query, or ~30 queries per dollar (as of Apr 2023)

Of course, exact costs will depend on the system specifics and usage patterns.
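The per-query figures above come from simple arithmetic: cost per query = (tokens per query ÷ 1,000) × price per 1K tokens, and queries per dollar = 1 ÷ cost per query. The sketch below reproduces them from the per-1K-token rates the bullets imply ($0.002 for `gpt-3.5-turbo`, $0.03 for `gpt-4`). It applies a single blended rate to prompt and completion tokens, which is a simplification. These are the April 2023 rates quoted above, not current prices. The helper names are hypothetical. For `gpt-4` the computed figure is about 33 queries per dollar, which the bullet rounds down to ~30.

```csharp
using System;

// Simplification: one blended price for prompt and completion tokens.
double CostPerQuery(int tokensPerQuery, double pricePer1KTokens) =>
    tokensPerQuery / 1000.0 * pricePer1KTokens;

double QueriesPerDollar(double costPerQuery) => 1.0 / costPerQuery;

// Rates implied by the bullets above (as of Apr 2023); check current pricing before relying on them.
var gpt35Cost = CostPerQuery(1_000, 0.002);
var gpt4Cost = CostPerQuery(1_000, 0.03);
Console.WriteLine($"gpt-3.5-turbo: ${gpt35Cost:F3} per query, ~{QueriesPerDollar(gpt35Cost):F0} queries per dollar");
Console.WriteLine($"gpt-4: ${gpt4Cost:F3} per query, ~{QueriesPerDollar(gpt4Cost):F0} queries per dollar");
```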

## Preamble
We'll begin by:
- Importing the necessary libraries
- Selecting models for embeddings search and question answering

## Installation
Install the Azure OpenAI SDK and the .NET Interactive AI utilities package using the commands below.

In [1]:
#r "nuget: Azure.AI.OpenAI, 1.0.0-beta.14"

In [None]:
#r "nuget:Microsoft.DotNet.Interactive.AIUtilities, 1.0.0-beta.24129.1"

using Microsoft.DotNet.Interactive;
using Microsoft.DotNet.Interactive.AIUtilities;

## Run this cell to enter your API key, endpoint, embedding deployment name, and chat deployment name

In [3]:
var azureOpenAIKey = await Kernel.GetPasswordAsync("Provide your OPEN_AI_KEY");

// Your endpoint should look like the following https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
var azureOpenAIEndpoint = await Kernel.GetInputAsync("Provide the OPEN_AI_ENDPOINT");

// Enter the deployment name you chose when you deployed the model.
var embeddingDeployment = await Kernel.GetInputAsync("Provide embedding deployment name");
var chatDeployment = await Kernel.GetInputAsync("Provide chat deployment name");

### Import namespaces and create an instance of `OpenAIClient` using `azureOpenAIEndpoint` and `azureOpenAIKey`

In [4]:
using Azure;
using Azure.AI.OpenAI;

In [5]:
OpenAIClient client = new (new Uri(azureOpenAIEndpoint), new AzureKeyCredential(azureOpenAIKey.GetClearTextPassword()));

### Motivating example: GPT cannot answer questions about current events
Because the training data for gpt-3.5-turbo and gpt-4 mostly ends in September 2021, the models cannot answer questions about more recent events, such as the 2022 Winter Olympics.

For example, let's try asking 'Which athletes won the gold medal in curling at the 2022 Winter Olympics?':

In [6]:
var options = new ChatCompletionsOptions
{
    Messages =
    {
        new ChatRequestSystemMessage("You answer questions about the 2022 Winter Olympics."),
        new ChatRequestUserMessage("Which athletes won the gold medal in curling at the 2022 Winter Olympics?")
    },
    Temperature = 0f,
    DeploymentName = chatDeployment
};

var response = await client.GetChatCompletionsAsync(options);

response.Value.Choices.FirstOrDefault()?.Message?.Content?.Display();

As an AI language model, I don't have real-time data. However, I can provide you with general information. The gold medalists in curling at the 2022 Winter Olympics will be determined during the event. The winners will be the team that finishes in first place in the men's and women's curling competitions. To find out the specific winners, you can check the official website of the International Olympic Committee or reliable sports news sources.

### You can give GPT knowledge about a topic by inserting it into an input message
To help give the model knowledge of curling at the 2022 Winter Olympics, we can copy and paste the top half of a relevant Wikipedia article into our message:

In [7]:
#!value --name wikipediaArticle 
Curling at the 2022 Winter Olympics

From Wikipedia, the free encyclopedia
Curling
at the XXIV Olympic Winter Games
Venue	Beijing National Aquatics Centre
Dates	2–20 February 2022
No. of events	3 (1 men, 1 women, 1 mixed)
Competitors	114 from 14 nations
Men's curling
at the XXIV Olympic Winter Games
Medalists
1st place, gold medalist(s)		 Sweden
2nd place, silver medalist(s)		 Great Britain
3rd place, bronze medalist(s)		 Canada
Women's curling
at the XXIV Olympic Winter Games
Medalists
1st place, gold medalist(s)		 Great Britain
2nd place, silver medalist(s)		 Japan
3rd place, bronze medalist(s)		 Sweden
Mixed doubles's curling
at the XXIV Olympic Winter Games
Medalists
1st place, gold medalist(s)		 Italy
2nd place, silver medalist(s)		 Norway
3rd place, bronze medalist(s)		 Sweden
The curling competitions of the 2022 Winter Olympics were held at the Beijing National Aquatics Centre, one of the Olympic Green venues. Curling competitions were scheduled for every day of the games, from February 2 to February 20.[1] This was the eighth time that curling was part of the Olympic program.

In each of the men's, women's, and mixed doubles competitions, 10 nations competed. The mixed doubles competition was expanded for its second appearance in the Olympics.[2] A total of 120 quota spots (60 per sex) were distributed to the sport of curling, an increase of four from the 2018 Winter Olympics.[3] A total of 3 events were contested, one for men, one for women, and one mixed.[4]

Qualification
Main article: Curling at the 2022 Winter Olympics – Qualification
Qualification to the Men's and Women's curling tournaments at the Winter Olympics was determined through two methods (in addition to the host nation). Nations qualified teams by placing in the top six at the 2021 World Curling Championships. Teams could also qualify through Olympic qualification events which were held in 2021. Six nations qualified via World Championship qualification placement, while three nations qualified through qualification events. In men's and women's play, a host will be selected for the Olympic Qualification Event (OQE). They would be joined by the teams which competed at the 2021 World Championships but did not qualify for the Olympics, and two qualifiers from the Pre-Olympic Qualification Event (Pre-OQE). The Pre-OQE was open to all member associations.[5]

For the mixed doubles competition in 2022, the tournament field was expanded from eight competitor nations to ten.[2] The top seven ranked teams at the 2021 World Mixed Doubles Curling Championship qualified, along with two teams from the Olympic Qualification Event (OQE) – Mixed Doubles. This OQE was open to a nominated host and the fifteen nations with the highest qualification points not already qualified to the Olympics. As the host nation, China qualified teams automatically, thus making a total of ten teams per event in the curling tournaments.[6]

Summary
Nations	Men	Women	Mixed doubles	Athletes
 Australia			Yes	2
 Canada	Yes	Yes	Yes	12
 China	Yes	Yes	Yes	12
 Czech Republic			Yes	2
 Denmark	Yes	Yes		10
 Great Britain	Yes	Yes	Yes	10
 Italy	Yes		Yes	6
 Japan		Yes		5
 Norway	Yes		Yes	6
 ROC	Yes	Yes		10
 South Korea		Yes		5
 Sweden	Yes	Yes	Yes	11
 Switzerland	Yes	Yes	Yes	12
 United States	Yes	Yes	Yes	11
Total: 14 NOCs	10	10	10	114
Competition schedule

The Beijing National Aquatics Centre served as the venue of the curling competitions.
Curling competitions started two days before the Opening Ceremony and finished on the last day of the games, meaning the sport was the only one to have had a competition every day of the games. The following was the competition schedule for the curling competitions:

RR	Round robin	SF	Semifinals	B	3rd place play-off	F	Final
Date
Event
Wed 2	Thu 3	Fri 4	Sat 5	Sun 6	Mon 7	Tue 8	Wed 9	Thu 10	Fri 11	Sat 12	Sun 13	Mon 14	Tue 15	Wed 16	Thu 17	Fri 18	Sat 19	Sun 20
Men's tournament								RR	RR	RR	RR	RR	RR	RR	RR	RR	SF	B	F	
Women's tournament									RR	RR	RR	RR	RR	RR	RR	RR	SF	B	F
Mixed doubles	RR	RR	RR	RR	RR	RR	SF	B	F												
Medal summary
Medal table
Rank	Nation	Gold	Silver	Bronze	Total
1	 Great Britain	1	1	0	2
2	 Sweden	1	0	2	3
3	 Italy	1	0	0	1
4	 Japan	0	1	0	1
 Norway	0	1	0	1
6	 Canada	0	0	1	1
Totals (6 entries)	3	3	3	9
Medalists
Event	Gold	Silver	Bronze
Men
details	 Sweden
Niklas Edin
Oskar Eriksson
Rasmus Wranå
Christoffer Sundgren
Daniel Magnusson	 Great Britain
Bruce Mouat
Grant Hardie
Bobby Lammie
Hammy McMillan Jr.
Ross Whyte	 Canada
Brad Gushue
Mark Nichols
Brett Gallant
Geoff Walker
Marc Kennedy
Women
details	 Great Britain
Eve Muirhead
Vicky Wright
Jennifer Dodds
Hailey Duff
Mili Smith	 Japan
Satsuki Fujisawa
Chinami Yoshida
Yumi Suzuki
Yurika Yoshida
Kotomi Ishizaki	 Sweden
Anna Hasselborg
Sara McManus
Agnes Knochenhauer
Sofia Mabergs
Johanna Heldin
Mixed doubles
details	 Italy
Stefania Constantini
Amos Mosaner	 Norway
Kristin Skaslien
Magnus Nedregotten	 Sweden
Almida de Val
Oskar Eriksson
Teams
Men
 Canada	 China	 Denmark	 Great Britain	 Italy
Skip: Brad Gushue
Third: Mark Nichols
Second: Brett Gallant
Lead: Geoff Walker
Alternate: Marc Kennedy

Skip: Ma Xiuyue
Third: Zou Qiang
Second: Wang Zhiyu
Lead: Xu Jingtao
Alternate: Jiang Dongxu

Skip: Mikkel Krause
Third: Mads Nørgård
Second: Henrik Holtermann
Lead: Kasper Wiksten
Alternate: Tobias Thune

Skip: Bruce Mouat
Third: Grant Hardie
Second: Bobby Lammie
Lead: Hammy McMillan Jr.
Alternate: Ross Whyte

Skip: Joël Retornaz
Third: Amos Mosaner
Second: Sebastiano Arman
Lead: Simone Gonin
Alternate: Mattia Giovanella

 Norway	 ROC	 Sweden	 Switzerland	 United States
Skip: Steffen Walstad
Third: Torger Nergård
Second: Markus Høiberg
Lead: Magnus Vågberg
Alternate: Magnus Nedregotten

Skip: Sergey Glukhov
Third: Evgeny Klimov
Second: Dmitry Mironov
Lead: Anton Kalalb
Alternate: Daniil Goriachev

Skip: Niklas Edin
Third: Oskar Eriksson
Second: Rasmus Wranå
Lead: Christoffer Sundgren
Alternate: Daniel Magnusson

Fourth: Benoît Schwarz
Third: Sven Michel
Skip: Peter de Cruz
Lead: Valentin Tanner
Alternate: Pablo Lachat

Skip: John Shuster
Third: Chris Plys
Second: Matt Hamilton
Lead: John Landsteiner
Alternate: Colin Hufman

Women
 Canada	 China	 Denmark	 Great Britain	 Japan
Skip: Jennifer Jones
Third: Kaitlyn Lawes
Second: Jocelyn Peterman
Lead: Dawn McEwen
Alternate: Lisa Weagle

Skip: Han Yu
Third: Wang Rui
Second: Dong Ziqi
Lead: Zhang Lijun
Alternate: Jiang Xindi

Skip: Madeleine Dupont
Third: Mathilde Halse
Second: Denise Dupont
Lead: My Larsen
Alternate: Jasmin Lander

Skip: Eve Muirhead
Third: Vicky Wright
Second: Jennifer Dodds
Lead: Hailey Duff
Alternate: Mili Smith

Skip: Satsuki Fujisawa
Third: Chinami Yoshida
Second: Yumi Suzuki
Lead: Yurika Yoshida
Alternate: Kotomi Ishizaki

 ROC	 South Korea	 Sweden	 Switzerland	 United States
Skip: Alina Kovaleva
Third: Yulia Portunova
Second: Galina Arsenkina
Lead: Ekaterina Kuzmina
Alternate: Maria Komarova

Skip: Kim Eun-jung
Third: Kim Kyeong-ae
Second: Kim Cho-hi
Lead: Kim Seon-yeong
Alternate: Kim Yeong-mi

Skip: Anna Hasselborg
Third: Sara McManus
Second: Agnes Knochenhauer
Lead: Sofia Mabergs
Alternate: Johanna Heldin

Fourth: Alina Pätz
Skip: Silvana Tirinzoni
Second: Esther Neuenschwander
Lead: Melanie Barbezat
Alternate: Carole Howald

Skip: Tabitha Peterson
Third: Nina Roth
Second: Becca Hamilton
Lead: Tara Peterson
Alternate: Aileen Geving

Mixed doubles
 Australia	 Canada	 China	 Czech Republic	 Great Britain
Female: Tahli Gill
Male: Dean Hewitt

Female: Rachel Homan
Male: John Morris

Female: Fan Suyuan
Male: Ling Zhi

Female: Zuzana Paulová
Male: Tomáš Paul

Female: Jennifer Dodds
Male: Bruce Mouat

 Italy	 Norway	 Sweden	 Switzerland	 United States
Female: Stefania Constantini
Male: Amos Mosaner

Female: Kristin Skaslien
Male: Magnus Nedregotten

Female: Almida de Val
Male: Oskar Eriksson

Female: Jenny Perret
Male: Martin Rios

Female: Vicky Persinger
Male: Chris Plys

In [9]:
#!set --name wikipedia_article_on_curling --value @value:wikipediaArticle

var options = new ChatCompletionsOptions
{
    Messages =
    {
        new ChatRequestSystemMessage("You answer questions about the 2022 Winter Olympics."),
        new ChatRequestUserMessage($"""""
            Use the below article on the 2022 Winter Olympics to answer the subsequent question. If the answer cannot be found, write "I don't know."
            Article:
            """
            {wikipedia_article_on_curling}
            """
            Question: Which athletes won the gold medal in curling at the 2022 Winter Olympics?
            """"")
    },
    Temperature = 0f,
    DeploymentName = chatDeployment
};

var response = await client.GetChatCompletionsAsync(options);

response.Value.Choices.FirstOrDefault()?.Message?.Content?.Display();

The athletes who won the gold medal in curling at the 2022 Winter Olympics are as follows:

Men's Curling: Sweden (Niklas Edin, Oskar Eriksson, Rasmus Wranå, Christoffer Sundgren, Daniel Magnusson)

Women's Curling: Great Britain (Eve Muirhead, Vicky Wright, Jennifer Dodds, Hailey Duff, Mili Smith)

Mixed Doubles Curling: Italy (Stefania Constantini, Amos Mosaner)

Thanks to the Wikipedia article included in the input message, GPT answers correctly.

In this particular case, GPT was intelligent enough to realize that the original question was underspecified, as there were three curling gold medal events, not just one.

Of course, this example partly relied on human intelligence. We knew the question was about curling, so we inserted a Wikipedia article on curling.

The rest of this notebook shows how to automate this knowledge insertion with embeddings-based search.

## 1. Prepare search data
To save you the time and expense, we've prepared a pre-embedded dataset of a few hundred Wikipedia articles about the 2022 Winter Olympics.
To see how we constructed this dataset, or to modify it yourself, see [Embedding Wikipedia articles for search](Embedding_Wikipedia_articles_for_search.ipynb).

In [8]:
public record PageBlockWithEmbeddings(string PageTitle, string Block, float[] Embedding);

In [9]:
using System.Text.Json;
using System.Text.Json.Serialization;
using System.IO;

var filePath = Path.Combine("..","..","..","Data","wikipedia_embeddings.json");

var olympicsData = JsonSerializer.Deserialize<PageBlockWithEmbeddings[]>(File.ReadAllText(filePath));

In [10]:
olympicsData.Take(4).DisplayTable();

PageTitle,Block,Embedding
2022 Winter Olympics,"The 2022 Winter Olympics, officially called the XXIV Olympic Winter Games () and commonly known as Beijing 2022 (2022), was an international winter multi-sport event held from 4 to 20 February 2022 in Beijing, China, and surrounding areas with competition in selected events beginning 2 February 2022.{{cite web|title=SuperSport|url=https://supersport.com/news/cd6663a2-8236-44d8-a8b9-4fa192190da7/%7B%7B%20url()-%3Ecurrent()%20%7D%7D|access-date=25 February 2022|website=supersport.com|language=ZA|archive-date=25 February 2022|archive-url=https://web.archive.org/web/20220225173447/https://supersport.com/news/cd6663a2-8236-44d8-a8b9-4fa192190da7/%7B%7B%20url()-%3Ecurrent()%20%7D%7D|url-status=live}} It was the 24th edition of the Winter Olympic Games.","[ -0.007000031, -0.025182178, -0.010695381, -0.005645496, -0.026770474, 0.010612124, -0.01401287, -0.00023596284, -0.0060201543, -0.022453895, 0.03729934, 0.0129561415, -0.030408185, -0.023824442, -0.014550841, -0.027718328, 0.016702726, -0.0068463245, 0.0015754872, -0.015831726 ... (1516 more) ]"
2022 Winter Olympics,"Beijing was selected as host city on 31 July 2015 at the 128th IOC Session in Kuala Lumpur, Malaysia, marking its second time hosting the Olympics, and the last of three consecutive Olympics hosted in East Asia following the 2018 Winter Olympics in Pyeongchang County, South Korea, and the 2020 Summer Olympics in Tokyo, Japan. Having previously hosted the 2008 Summer Olympics, Beijing became the first city to have hosted both the Summer and Winter Olympics. The venues for the Games were concentrated around Beijing, its suburb Yanqing District, and Zhangjiakou, with some events (including the ceremonies and curling) repurposing venues originally built for Beijing 2008 (such as Beijing National Stadium and the Beijing National Aquatics Centre).","[ 0.008893254, -0.012699212, -0.007250349, -0.0007829965, -0.020767841, 0.024421562, -0.03750137, -0.0052427067, -0.0037869278, -0.019334264, 0.016835019, 0.010022355, -0.0055186385, -0.0069966186, -0.02300067, -0.008538032, 0.014373833, -0.011024591, -0.0111577995, 0.0054932656 ... (1516 more) ]"
2022 Winter Olympics,"The Games featured a record 109 events across 15 disciplines, with big air freestyle skiing and women's monobob making their Olympic debuts as medal events, as well as several new mixed competitions. A total of 2,871 athletes representing 91 teams competed in the Games, with Haiti and Saudi Arabia making their Winter Olympic debut.","[ -0.009414442, 0.0101670865, -0.0019868554, -0.023944318, -0.0073287217, 0.01984942, -0.017017433, 0.0011855755, -0.00017719848, -0.03722404, 0.017514944, 0.006266727, -0.013177667, -0.025602689, -0.0065027256, -0.001007779, 0.029697588, -0.017132243, -0.01583106, -0.0075710993 ... (1516 more) ]"
2022 Winter Olympics,"Beijing's hosting of the Games was subject to various concerns and controversies including those related to human rights violations in China, such as the Uyghur genocide, which led to calls for a boycott of the games.{{Cite news|last=Reyes|first=Yacob|date=8 December 2021|title=Beijing Olympics: These countries have announced diplomatic boycotts|work=[[Axios (website)|Axios]]|url=https://www.axios.com/diplomatic-boycott-beijing-olympics-list-countries-73e1240f-b925-40bf-ae67-648e774971c8.html|access-date=5 February 2022|archive-date=4 February 2022|archive-url=https://web.archive.org/web/20220204210817/https://www.axios.com/diplomatic-boycott-beijing-olympics-list-countries-73e1240f-b925-40bf-ae67-648e774971c8.html|url-status=live}}{{Cite news|last1=Allen-Ebrahimian|first1=Bethany|last2=Baker|first2=Kendall|date=1 February 2022|title=The IOC stays silent on human rights in China|work=[[Axios (website)|Axios]]|url=https://www.axios.com/winter-olympics-beijing-ioc-silence-human-rights-31ec1273-d894-4a67-993b-4b4156d42d44.html|access-date=5 February 2022|archive-date=5 February 2022|archive-url=https://web.archive.org/web/20220205020342/https://www.axios.com/winter-olympics-beijing-ioc-silence-human-rights-31ec1273-d894-4a67-993b-4b4156d42d44.html|url-status=live}} Like the Summer Olympics held six months earlier in Tokyo, the COVID-19 pandemic resulted in the implementation of health and safety protocols, and, for the second Games in a row, the Games being closed to the public (with selected events open to invited guests at a reduced capacity).","[ 0.0005398922, -0.037917137, 0.008794742, -0.0010528916, -0.01572492, -0.006063091, -0.02190536, -0.005261198, -0.014212407, -0.021031754, 0.020875288, 0.009987801, -0.016402943, -0.004302839, -0.018997686, 0.0067476337, 0.031136906, 0.0013633806, 0.00018142414, -0.012680336 ... (1516 more) ]"


## 2. Search

Now we'll define a search function that:
- Takes a user query and a collection of knowledge-base entries, each with text and an embedding
- Embeds the user query with the Azure OpenAI embedding deployment
- Uses the cosine similarity between the query embedding and each text embedding to rank the texts
- Returns up to N texts whose similarity exceeds a threshold, ranked by relevance, each paired with its relevance score

Let's define an asynchronous method named `SearchAsync` that takes a query, a collection of knowledge base entries, and an optional result count (defaulting to 5), and returns a collection of search results.

The method starts by making an asynchronous request to the Azure OpenAI service to generate an embedding for the query, using the `GetEmbeddingsAsync` method of the `client` object. It passes an instance of `EmbeddingsOptions`, which specifies the embedding deployment (`embeddingDeployment`) and the text to embed (here, just the query). The query's embedding is then read from the first item in the response.

Next, the method scores every knowledge-base entry against the query using the `ScoreBySimilarityTo` method from `Microsoft.DotNet.Interactive.AIUtilities`. The comparer passed to it, `new CosineSimilarityComparer<float[]>(t => t)`, measures cosine similarity, that is, how closely two non-zero vectors point in the same direction. Its lambda `t => t` tells the comparer that each value is already the `float[]` vector to compare. The final argument selects the embedding from each entry.
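For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths: cos(a, b) = (a · b) / (‖a‖ ‖b‖). The sketch below computes it directly over `float[]` embeddings. It illustrates the formula only and is not the code inside `CosineSimilarityComparer`. OpenAI's documentation notes that its embeddings are normalized to length 1, so for those vectors the dot product alone gives the same value.

```csharp
using System;

// cos(a, b) = (a · b) / (|a| * |b|); the result lies in [-1, 1].
float CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same dimension.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    // Cosine similarity is undefined for zero vectors; treat them as unrelated.
    if (normA == 0 || normB == 0) return 0f;
    return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
}

Console.WriteLine(CosineSimilarity(new[] { 1f, 0f }, new[] { 1f, 0f })); // 1 (same direction)
Console.WriteLine(CosineSimilarity(new[] { 1f, 0f }, new[] { 0f, 1f })); // 0 (orthogonal)
```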

The scored entries are then sorted from most to least similar, filtered to keep only scores greater than 0.8, and limited to the first `resultCount`. In other words, the method returns at most `resultCount` entries whose similarity to the query exceeds 0.8. If no entry clears the threshold, the result is empty. The 0.8 cutoff is a tuning choice, not a universal constant.
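The ranking step in `SearchAsync` can be sketched in plain LINQ without `Microsoft.DotNet.Interactive.AIUtilities`, which may help when adapting the idea to another environment. The `Cosine` and `TopMatches` helpers and the toy three-dimensional vectors are hypothetical. The 0.8 default mirrors the cutoff in `SearchAsync`, but a suitable threshold depends on the embedding model and the data. This sketch filters before sorting, which yields the same entries while sorting fewer items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity; zero vectors yield NaN, which never passes the threshold below.
double Cosine(float[] a, float[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
    return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

// Score every entry, keep those above the threshold, and return the best `count`, highest first.
List<(string Text, double Score)> TopMatches(
    float[] query, IEnumerable<(string Text, float[] Embedding)> entries, int count = 5, double threshold = 0.8) =>
    entries
        .Select(e => (e.Text, Score: Cosine(query, e.Embedding)))
        .Where(r => r.Score > threshold)
        .OrderByDescending(r => r.Score)
        .Take(count)
        .ToList();

// Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
var corpus = new List<(string Text, float[] Embedding)>
{
    ("curling", new[] { 1f, 0f, 0f }),
    ("ice hockey", new[] { 0.9f, 0.1f, 0f }),
    ("ski jumping", new[] { 0f, 1f, 0f }),
};

foreach (var (text, score) in TopMatches(new[] { 1f, 0f, 0f }, corpus, count: 2))
    Console.WriteLine($"{text}: {score:F3}");
```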

Finally, the method creates a new instance of `SearchResult` for each selected entry, associating each entry with its similarity score. These instances are returned as the search results.

In [11]:
public record SearchResult(string Text, float Score);
public async Task<IEnumerable<SearchResult>> SearchAsync(string query, IEnumerable<PageBlockWithEmbeddings> knowledge, int resultCount = 5)
{
    // Embed the query with the same embedding deployment used to build the dataset.
    var response = await client.GetEmbeddingsAsync(new EmbeddingsOptions(embeddingDeployment, new[] { query }));
    var queryEmbedding = response.Value.Data[0].Embedding.ToArray();

    // Score each entry by cosine similarity to the query, keep strong matches, and take the best ones.
    var result = knowledge
        .ScoreBySimilarityTo(queryEmbedding, new CosineSimilarityComparer<float[]>(t => t), e => e.Embedding)
        .OrderByDescending(s => s.Score)
        .Where(s => s.Score > 0.8)
        .Take(resultCount)
        .Select(r => new SearchResult(r.Value.Block, r.Score))
        .ToList(); // materialize once so the scores are not recomputed on every enumeration

    return result;
}

In [12]:
var search = await SearchAsync("curling gold medal", olympicsData);

search.DisplayTable();

Text,Score
Two bronze medals were awarded to Daniela Maier and Fanny Smith for a third-place tie in the freestyle women's ski cross event following a decision by the Court of Arbitration for Sport.{{cite web|url=https://www.tas-cas.org/fileadmin/user_upload/CAS_Media_Release_8741.pdf|title=Court of Arbitration for Sport Media Release|access-date=13 December 2022|publisher=[[Court of Arbitration for Sport]]|date=13 December 2022|archive-date=13 December 2022|archive-url=https://web.archive.org/web/20221213171856/https://www.tas-cas.org/fileadmin/user_upload/CAS_Media_Release_8741.pdf|url-status=live}},0.84077555
"Biathletes Johannes Thingnes Bø, Quentin Fillon Maillet, and Marte Olsbu Røiseland, and cross-country skier Alexander Bolshunov won the most total medals at the games with five each.{{cite web |title=Beijing 2022 |url=https://www.teamgb.com/competitions/beijing-2022/6dWdXrzU85Vn1jF6ZC9Onl |publisher=[[British Olympic Association]] |access-date=26 February 2022 |archive-date=18 March 2022 |archive-url=https://web.archive.org/web/20220318145104/https://www.teamgb.com/competitions/beijing-2022/6dWdXrzU85Vn1jF6ZC9Onl |url-status=live }} Bø also earned the most gold medals with four.{{cite news |author=[[Agence France-Presse]] |title=Norwegian Biathlete Boe Gets Fourth Beijing Olympics Gold Medal |url=https://www.barrons.com/news/norwegian-biathlete-boe-gets-fourth-beijing-olympics-gold-medal-01645189808 |access-date=27 March 2022 |work=[[Barron's (newspaper)|Barron's]] |date=18 February 2022 |archive-date=22 February 2023 |archive-url=https://web.archive.org/web/20230222200651/https://www.barrons.com/news/norwegian-biathlete-boe-gets-fourth-beijing-olympics-gold-medal-01645189808 |url-status=live }} Snowboarder Zoi Sadowski-Synnott of New Zealand won the first Winter Olympic gold medal for that nation.{{cite news |first1=Bryan Armen |last1=Graham |title=Zoi Sadowski-Synnott Wins New Zealand's First Ever Winter Olympic Gold |url=https://www.theguardian.com/sport/2022/feb/05/zoi-sadowski-synnott-new-zealand-first-winter-olympic-gold-snowboard-beijing-2022-tess-coady |access-date=12 July 2022 |work=[[The Guardian]] |date=5 February 2022 |archive-date=26 February 2022 |archive-url=https://web.archive.org/web/20220226175310/https://www.theguardian.com/sport/2022/feb/05/zoi-sadowski-synnott-new-zealand-first-winter-olympic-gold-snowboard-beijing-2022-tess-coady |url-status=live }} Germany achieved a podium sweep in the men's two-man bobsleigh competition with Francesco Friedrich and Thorsten Margis	winning gold, Johannes Lochner and Florian Bauer earning silver, and 
Christoph Hafer and Matthias Sommer attaining bronze.{{cite news |last1=Levinsohn |first1=Dan |title=Germany Sweeps Two-Man Bobsled Podium with Friedrich, Lochner, Hafer |url=https://www.nbcolympics.com/news/recap-two-man-final-heats |access-date=19 February 2022 |agency=[[NBC Sports]] |date=15 February 2022 |archive-date=19 March 2023 |archive-url=https://web.archive.org/web/20230319113128/https://www.nbcolympics.com/news/recap-two-man-final-heats |url-status=live }}",0.83861834
"thumb|Medals of 2022 Winter Olympics Norway finished at the top of the medal table for the second successive Winter Olympics, winning a total of 37 medals, of which 16 were gold, setting a new record for the largest number of gold medals won at a single Winter Olympics. Germany finished second with 12 golds and 27 medals overall, and the host nation China finished third with nine gold medals, marking their most successful performance in Winter Olympics history. The team representing the ROC ended up with the second largest number of medals won at the Games, with 32, but finished ninth on the medal table, as only six gold medals were won by the delegation. Traditional Winter powerhouse Canada; despite having won 26 medals, only four of them were gold, resulting in a finish outside the top ten in the medal table for the first time since 1988 (34 years).{{cite news|first=Spencer|last=Donna|url=https://www.cbc.ca/sports/olympics/winter/beijing-2022-team-canada-wrap-1.6358707|title=Canada caps COVID Olympic Winter Games in Beijing with 26 medals, including 4 gold|agency=[[Canadian Press]]|date=20 February 2022|website=www.cbc.ca/|publisher=[[CBC Sports]]|access-date=22 February 2022|archive-date=19 March 2023|archive-url=https://web.archive.org/web/20230319113124/https://www.cbc.ca/sports/olympics/winter/beijing-2022-team-canada-wrap-1.6358707|url-status=live}}{{cite news|date=20 February 2022|title=Canada finish outside medals top 10 for first time in 34 years|url=https://en.as.com/en/2022/02/20/olympic_games/1645368256_325990.html|work=[[Diario AS]]|location=Madrid, Spain|access-date=22 February 2022|archive-date=19 March 2023|archive-url=https://web.archive.org/web/20230319113119/https://en.as.com/en/2022/02/20/olympic_games/1645368256_325990.html|url-status=live}}",0.8366928
"Norway finished at the top of the medal table for the second successive Winter Olympics, winning a total of 37 medals, of which 16 were gold, setting a new record for the largest number of gold medals won at a single Winter Olympics. The host nation China finished third with nine gold medals and also eleventh place by total medals won, marking its most successful performance in Winter Olympics history.{{cite news|last=Church|first=Ben|date=20 February 2022|title=Norway tops Beijing 2022 medal table after record-breaking performance|url=https://www.cnn.com/2022/02/20/sport/norway-beijing-2022-medal-table-spt-intl/index.html|work=[[CNN]]|location=Atlanta, Georgia, U.S.|access-date=22 February 2022|archive-date=21 February 2022|archive-url=https://web.archive.org/web/20220221232935/https://www.cnn.com/2022/02/20/sport/norway-beijing-2022-medal-table-spt-intl/index.html|url-status=live}}",0.833906
"Overall 29 nations received at least one medal, and 23 of them won at least one gold medal. Athletes from Norway won the most medals overall, with 37, and the most gold medals, with 16. The latter record was the highest gold medal tally at a single Winter Games.{{cite news |last1=Stuhlbarg |first1=Nate |title=Norway Retains Title with most Medals at 2022 Winter Olympics |url=https://www.nbcolympics.com/news/norway-retains-title-most-medals-2022-winter-olympics |access-date=27 March 2022 |agency=[[NBC Sports]] |date=20 February 2022 |archive-date=20 February 2022 |archive-url=https://web.archive.org/web/20220220195122/https://www.nbcolympics.com/news/norway-retains-title-most-medals-2022-winter-olympics |url-status=live }} Host nation China won nine gold medals surpassing its gold medal tally of five during the 2010 winter edition.{{cite web |title=China, Japan Set New Medal Marks in Winter Olympics |url=https://ocasia.org/news/2816-china-japan-set-new-medal-marks-in-winter-olympics.html |publisher=[[Olympic Council of Asia]] |access-date=12 July 2022 |archive-date=17 February 2023 |archive-url=https://web.archive.org/web/20230217005811/https://ocasia.org/news/2816-china-japan-set-new-medal-marks-in-winter-olympics.html |url-status=live }} Athletes from that nation also won 15 medals overall, which eclipsed its record of 11 at both the 2006 and 2010 winter editions.{{cite news |last1=Stuhlbarg |first1=Nate |title=Norway Retains Title with Most Medals at 2022 Winter Olympics |url=https://www.nbcolympics.com/news/norway-retains-title-most-medals-2022-winter-olympics |access-date=27 March 2022 |agency=[[NBC Sports]] |date=20 February 2022 |archive-date=20 February 2022 |archive-url=https://web.archive.org/web/20220220195122/https://www.nbcolympics.com/news/norway-retains-title-most-medals-2022-winter-olympics |url-status=live }}",0.8283325


## 3. Ask
With the search function above, we can now automatically retrieve relevant knowledge and insert it into messages to GPT.

Below, we define a function `AskAsync` that:

 - Takes a user query
 - Searches for text relevant to the query
 - Stuffs that text into a message for GPT
 - Sends the message to GPT
 - Returns GPT's answer

The `AskAsync` method starts by calling the `SearchAsync` method with the user's question and a dataset about the 2022 Winter Olympics (`olympicsData`). The `SearchAsync` method searches the dataset for relevant information and returns a list of search results.

Next, the method builds an `articles` string from the search results. Each result's text is prefixed with the label `Wikipedia article section:`, and the labeled sections are joined with newline characters.

The method then constructs a `userQuestion` string that instructs the AI to use the articles to answer the question. If the answer cannot be found in the articles, the AI is instructed to respond with "I could not find an answer."
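Building the prompt with a pure function makes it easy to inspect exactly what text the model will receive, without calling the API. The sketch below assumes the search results have already been reduced to a list of section texts. The helper name `BuildUserQuestion` and the sample sections are hypothetical and not part of the notebook. It follows the same structure as the prompt in `AskAsync` (exact blank-line spacing aside), using plain string concatenation instead of raw string literals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not from the notebook): builds a prompt with the same
// structure as the one AskAsync sends, so it can be checked offline.
string BuildUserQuestion(string question, IEnumerable<string> sections)
{
    // Label each retrieved section, then separate the sections with newlines.
    var articles = string.Join("\n", sections.Select(text =>
        $"Wikipedia article section:\n{text}\n"));

    return "Use the below articles on the 2022 Winter Olympics to answer the subsequent question. " +
           "If the answer cannot be found in the articles, write \"I could not find an answer.\"\n\n" +
           articles + "\n\n" +
           $"Question: {question}";
}

// Illustrative sections only; in the notebook these would come from SearchAsync.
var prompt = BuildUserQuestion(
    "Which country topped the medal table?",
    new[] { "Norway finished at the top of the medal table.", "China won nine gold medals." });

Console.WriteLine(prompt);
```

Printing the prompt this way is a quick check that every retrieved section made it in and that the fallback instruction and question appear where expected.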

The `userQuestion` string is then used to build a `ChatCompletionsOptions` instance, which holds the parameters of the chat completion request:

 - `Messages` contains a system message, which tells the model that it answers questions about the 2022 Winter Olympics, followed by a user message containing `userQuestion`.
 - `Temperature` is set to 0, which makes the model's responses close to deterministic.
 - `MaxTokens` is set to 3500, which caps the length of the model's response. Because the prompt and the response share the model's context window, this also limits the prompt: with a 4,096-token model such as `gpt-3.5-turbo`, fewer than 600 tokens remain for the instructions, the retrieved articles, and the question.
 - `DeploymentName` is set to `chatDeployment`, the name of the chat model deployment to send the request to (when calling the non-Azure OpenAI endpoint, this is the model name instead).
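One common way to keep the prompt plus `MaxTokens` inside the context window is to add search results one at a time, most relevant first, and stop before a token budget is exceeded. The sketch below takes the token counter as a `Func<string, int>` so that it does not depend on any particular tokenizer library's API; the `tokenizer` object created at the top of the next cell is a natural source for that count, but its counting method is not shown in this notebook, so it is left abstract here. The helper name `SelectSectionsWithinBudget`, the word-count stand-in, and the small budget numbers are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: keeps the highest-ranked sections whose combined token
// count fits in what is left after reserving room for the response and for
// the fixed instructions plus question.
List<string> SelectSectionsWithinBudget(
    IEnumerable<string> rankedSections, // assumed sorted most relevant first
    Func<string, int> countTokens,      // token counter from any tokenizer
    int contextWindow,                  // e.g. 4096 for gpt-3.5-turbo
    int maxResponseTokens,              // the value passed as MaxTokens
    int reservedPromptTokens)           // instructions + question overhead
{
    int budget = contextWindow - maxResponseTokens - reservedPromptTokens;
    var selected = new List<string>();
    int used = 0;

    foreach (var section in rankedSections)
    {
        int cost = countTokens(section);
        if (used + cost > budget)
            break; // stop at the first section that would overflow the budget
        selected.Add(section);
        used += cost;
    }
    return selected;
}

// Crude stand-in counter for illustration only: counts whitespace-separated words.
int CountWords(string text) =>
    text.Split(' ', StringSplitOptions.RemoveEmptyEntries).Length;

var sections = new[] { "one two three", "four five", "six seven eight nine" };
var kept = SelectSectionsWithinBudget(sections, CountWords,
    contextWindow: 20, maxResponseTokens: 10, reservedPromptTokens: 4);

Console.WriteLine(string.Join(" | ", kept)); // prints "one two three | four five"
```

Stopping at the first section that does not fit keeps the selection a strict prefix of the relevance ranking. Skipping an oversized section and continuing would pack in more text, at the cost of sometimes including a less relevant section in place of a more relevant one.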

The method then sends the request asynchronously by passing the `ChatCompletionsOptions` instance to the `GetChatCompletionsAsync` method of the `client` object.

Finally, the method extracts the answer from the response. The expression `Value.Choices.FirstOrDefault()?.Message?.Content` returns the content of the first choice, or `null` if the response contains no choices, and the method returns that value.

In [13]:
var tokenizer = await Tokenizer.CreateAsync(TokenizerModel.gpt35);

public async Task<string> AskAsync(string question){

    var searchResults = await SearchAsync(question, olympicsData);

    var articles = string.Join("\n", searchResults.Select(s => $"""
    Wikipedia article section:
    {s.Text}

    """));

    var userQuestion = $"""""
                Use the below articles on the 2022 Winter Olympics to answer the subsequent question. If the answer cannot be found in the articles, write "I could not find an answer."
                
                
                {articles}
                

                Question: {question}
                """"";

    var options = new ChatCompletionsOptions{
        Messages =
            {
                new ChatRequestSystemMessage(@"You answer questions about the 2022 Winter Olympics."),
                new ChatRequestUserMessage(userQuestion)
            },
        Temperature = 0f,
        MaxTokens = 3500,
        DeploymentName = chatDeployment
    };

    var response = await client.GetChatCompletionsAsync(options);

    var answer = response.Value.Choices.FirstOrDefault()?.Message?.Content;  
    return answer;
}

In [14]:
await AskAsync("How many gold medals in total?")

The athletes from Norway won a total of 16 gold medals at the 2022 Winter Olympics.

In [15]:
await AskAsync("Where did the 2022 Winter Olympics take place?")

The 2022 Winter Olympics took place in Beijing, China.