# Question answering using fusion retriever architecture
This notebook builds on [Question answering using embeddings-based search](Question_answering_using_embeddings.ipynb), but loads the data from Wikipedia and indexes and retrieves it with [LlamaIndex](https://www.llamaindex.ai/).

## Installation
Install the Azure OpenAI SDK with the following command.

In [1]:
#r "nuget: Azure.AI.OpenAI, 1.0.0-beta.12"

In [None]:
#r "nuget:Microsoft.DotNet.Interactive.AIUtilities, 1.0.0-beta.24054.2"

using Microsoft.DotNet.Interactive;
using Microsoft.DotNet.Interactive.AIUtilities;

## Run this cell; it prompts you for the API key, endpoint, and chat deployment name

In [3]:
var azureOpenAIKey = await Kernel.GetPasswordAsync("Provide your OPEN_AI_KEY");

// Your endpoint should look like the following https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
var azureOpenAIEndpoint = await Kernel.GetInputAsync("Provide the OPEN_AI_ENDPOINT");

// Enter the deployment name you chose when you deployed the model.
var chatDeployment = await Kernel.GetInputAsync("Provide chat deployment name");

### Import namespaces and create an instance of `OpenAIClient` using `azureOpenAIEndpoint` and `azureOpenAIKey`

In [4]:
using Azure;
using Azure.AI.OpenAI;

In [5]:
OpenAIClient client = new (new Uri(azureOpenAIEndpoint), new AzureKeyCredential(azureOpenAIKey.GetClearTextPassword()));

## 1. Prepare search data
We use Python to load and index the data. Read the [guide](https://github.com/dotnet/interactive/blob/main/docs/jupyter-in-polyglot-notebooks.md) to get started with Python in Polyglot Notebooks, and this [doc](https://github.com/dotnet/interactive/blob/main/docs/adding-jupyter-kernels.md) on how to connect a kernel.
First, connect a Python kernel. This example uses an [Anaconda](https://www.anaconda.com/)-based deployment and a conda environment called `AI`.
The environment requires the following packages:
- [LlamaIndex](https://docs.llamaindex.ai/en/stable/getting_started/installation.html)
- [HuggingFace](https://huggingface.co/docs/huggingface_hub/installation)
- [wikipedia](https://github.com/goldsmith/Wikipedia)
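Before connecting the kernel, it can help to confirm that the environment actually contains these packages. A minimal sketch using only the standard library; the import names in `REQUIRED_MODULES` are assumptions (for example, `transformers` is assumed to be what `HuggingFaceEmbedding` needs locally, and `nest_asyncio` is used later in this notebook), so adjust them to your setup:

```python
import importlib.util
from typing import Iterable, List

# Import names assumed for the packages above; `transformers` is assumed to be
# the local backend of HuggingFaceEmbedding and `nest_asyncio` is used later.
REQUIRED_MODULES = ["llama_index", "transformers", "wikipedia", "nest_asyncio"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates the modules without importing them, so the check is fast even for heavy packages.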

In [6]:
#!connect jupyter --kernel-name python3 --kernel-spec python3 --conda-env AI

The `#!connect jupyter` feature is in preview. Please report any feedback or issues at https://github.com/dotnet/interactive/issues/new/choose.

Kernel added: #!python3

In [17]:
using System.Linq;
using System.Text.Json;
using Microsoft.DotNet.Interactive;
using Microsoft.DotNet.Interactive.Commands;
using Microsoft.DotNet.Interactive.Events;
using Microsoft.DotNet.Interactive.Formatting;

In [18]:
var pythonKernel = Kernel.Root.FindKernelByName("python3");

Export the values to the `python3` kernel

In [7]:
var azureOpenAIKeyAsString = azureOpenAIKey.GetClearTextPassword();

In [8]:
#!set --value @csharp:azureOpenAIKeyAsString --name azureOpenAIKey
#!set --value @csharp:chatDeployment --name chatDeployment
#!set --value @csharp:azureOpenAIEndpoint --name azureOpenAIEndpoint
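Once the values are in the Python kernel, a quick sanity check on the endpoint can catch a URL pasted in the wrong form before the LLM client is created. A sketch, assuming the `https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/` form from the prompt cell is the only one you use; `looks_like_azure_openai_endpoint` is a hypothetical helper, not part of any SDK:

```python
from urllib.parse import urlparse

SUFFIX = ".openai.azure.com"


def looks_like_azure_openai_endpoint(url: str) -> bool:
    """True for URLs of the form https://<resource-name>.openai.azure.com/.

    Assumption: only this default host form is accepted; custom domains would
    need a looser rule.
    """
    parsed = urlparse(url.strip())
    host = parsed.hostname or ""  # urlparse lower-cases the hostname
    return parsed.scheme == "https" and host.endswith(SUFFIX) and len(host) > len(SUFFIX)


print(looks_like_azure_openai_endpoint("https://my-resource.openai.azure.com/"))
print(looks_like_azure_openai_endpoint("my-resource.openai.azure.com"))
```

In the `python3` kernel this could be called as `looks_like_azure_openai_endpoint(azureOpenAIEndpoint)`.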


Now set up the Python kernel:
1. Import `llama_index`.
2. Use [llamahub](https://llamahub.ai/) to load:
    1. [Wikipedia Reader](https://llamahub.ai/l/wikipedia?from=loaders)
    2. [QueryRewritingRetrieverPack](https://llamahub.ai/l/llama_packs-fusion_retriever-query_rewrite)

In [9]:
from llama_index import Document, ServiceContext, VectorStoreIndex
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.ingestion import IngestionPipeline
from llama_index.llama_pack import download_llama_pack
from llama_index.llms import AzureOpenAI
from llama_index.node_parser import SemanticSplitterNodeParser
from llama_index.retrievers import QueryFusionRetriever
import wikipedia
import nest_asyncio

# Allow nested event loops so LlamaIndex's async retrieval can run inside the notebook.
nest_asyncio.apply()

# azureOpenAIKey, azureOpenAIEndpoint and chatDeployment were set from C# with #!set.
llm = AzureOpenAI(
    engine=chatDeployment,
    # `model` is expected to be the underlying model name (e.g. gpt-35-turbo);
    # this assumes the deployment is named after its model.
    model=chatDeployment,
    temperature=0.0,
    azure_endpoint=azureOpenAIEndpoint,
    api_key=azureOpenAIKey,
    api_version="2023-07-01-preview",
)
# Local embedding model, used both for semantic splitting and for the vector index.
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
# Downloaded from llamahub; the cells below configure a QueryFusionRetriever directly.
QueryRewritingRetrieverPack = download_llama_pack("QueryRewritingRetrieverPack", "./query_rewriting_pack")


Load documents from Wikipedia with the `wikipedia` package and wrap each page in a LlamaIndex `Document` (see [LlamaIndex loading](https://docs.llamaindex.ai/en/stable/understanding/loading/loading.html)).

In [10]:
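wikipedia.set_lang("en")
pages = wikipedia.search("2022 winter olympics")
documents = []
loaded_pages = []
for page in pages:
    try:
        # auto_suggest=False fetches the exact title returned by search
        page_content = wikipedia.page(page, auto_suggest=False).content
    except wikipedia.exceptions.WikipediaException:
        # skip disambiguation pages and titles that cannot be loaded
        continue
    documents.append(Document(text=page_content))
    loaded_pages.append(page)
# build a new list instead of removing items from `pages` while iterating over it
pages = loaded_pages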

Next, an `IngestionPipeline` splits the documents into nodes with `SemanticSplitterNodeParser` and computes an embedding for each node.

In [11]:
splitter = SemanticSplitterNodeParser(buffer_size=1, breakpoint_percentile_threshold=95, embed_model=embed_model)       
# create the pipeline with transformations
pipeline = IngestionPipeline( transformations=[ splitter, embed_model ])
# run the pipeline
nodes = pipeline.run(documents=documents)
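`SemanticSplitterNodeParser` places chunk boundaries where the meaning shifts rather than at a fixed size. As we understand it, it embeds each sentence together with `buffer_size` neighbours, measures the cosine distance between consecutive embeddings, and starts a new chunk wherever that distance exceeds the `breakpoint_percentile_threshold` percentile (95 here). The sketch below illustrates the idea with hand-made vectors instead of a real model and skips the neighbour buffering; it is not LlamaIndex's implementation:

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def percentile(values: Sequence[float], pct: float) -> float:
    """Percentile with linear interpolation between closest ranks (numpy's default rule)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def semantic_chunks(sentences: List[str], embeddings: List[Sequence[float]],
                    threshold_pct: float = 95.0) -> List[str]:
    """Group consecutive sentences, breaking where the embedding distance is unusually large."""
    if len(sentences) < 2:
        return [" ".join(sentences)] if sentences else []
    # distance between sentence i and sentence i + 1
    distances = [cosine_distance(embeddings[i], embeddings[i + 1])
                 for i in range(len(sentences) - 1)]
    cutoff = percentile(distances, threshold_pct)
    chunks, start = [], 0
    for i, distance in enumerate(distances):
        if distance > cutoff:  # topic shift: close the current chunk after sentence i
            chunks.append(" ".join(sentences[start:i + 1]))
            start = i + 1
    chunks.append(" ".join(sentences[start:]))
    return chunks


sentences = ["Beijing hosted the Games.", "Venues were in three zones.",
             "Norway topped the medal table.", "Germany was second."]
# Toy 2-D "embeddings": the first two sentences point one way, the last two another.
embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
print(semantic_chunks(sentences, embeddings))
```

Because the cutoff is a percentile of the document's own distances, the number of breaks adapts to each document instead of depending on an absolute similarity value.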

In [12]:
index = VectorStoreIndex(nodes, service_context=service_context)
vector_retriever = index.as_retriever(similarity_top_k=10)
fusion_retriever = QueryFusionRetriever(
    [vector_retriever],
    llm=service_context.llm,
    similarity_top_k=10,
    # num_queries counts the original question, so the LLM is asked for 15 rewrites;
    # set this to 1 to disable query generation
    num_queries=16,
    # merge the per-query result lists with reciprocal rank fusion
    mode="reciprocal_rerank",
    # query_gen_prompt="...",  # we could override the query generation prompt here
    verbose=True,  # print the generated queries
)
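With `mode="reciprocal_rerank"`, the retriever runs the original question plus the LLM-generated rewrites through `vector_retriever`, which yields one ranked list per query, and merges those lists with reciprocal rank fusion (RRF). A self-contained sketch of textbook RRF over document IDs; `k=60` is the constant commonly used with RRF, and `reciprocal_rank_fusion` is an illustrative name rather than LlamaIndex's API:

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(ranked_lists: Sequence[Sequence[str]], k: float = 60.0) -> List[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Every list contributes 1 / (k + rank) to each document it contains, with a
    1-based rank; documents are returned by total score, highest first.
    """
    scores: Dict[str, float] = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # sorting is stable, so documents with equal scores keep their first-seen order
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# One ranked list per query (original question + rewrites), e.g. node IDs.
per_query_results = [
    ["opening-ceremony", "venues"],
    ["venues", "medal-table"],
    ["venues", "opening-ceremony"],
]
print(reciprocal_rank_fusion(per_query_results))
```

A passage that ranks well for many of the rewrites can outrank one that ranks first for a single rewrite, which is what generating many queries is meant to exploit.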

In [19]:
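public async Task<string[]> Search(string query){
    // Embed the question as a JSON string literal so that quotes, backslashes
    // or newlines in the question cannot break the submitted Python code.
    var submitResult = await pythonKernel.SendAsync(new SubmitCode(
        $"""
        retrievedNodes = fusion_retriever.retrieve({JsonSerializer.Serialize(query)})
        articles = [node.text for node in retrievedNodes]
        """));

    // Surface Python errors instead of failing later on a missing value.
    var failure = submitResult.Events.OfType<CommandFailed>().FirstOrDefault();
    if (failure is not null)
    {
        throw new InvalidOperationException(failure.Message);
    }

    // Read the `articles` list back from the python3 kernel as JSON.
    var getValue = new RequestValue("articles", JsonFormatter.MimeType);
    var result = await pythonKernel.SendAsync(getValue);
    var returnValueProduced = result.Events.OfType<ValueProduced>().LastOrDefault()
        ?? throw new InvalidOperationException("The python3 kernel did not return `articles`.");
    var json = returnValueProduced.FormattedValue.Value;

    return JsonSerializer.Deserialize<string[]>(json);
}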
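Because `Search` generates Python source text from the question, the question has to become a valid Python string literal; a question containing a double quote or a backslash would otherwise break the submitted code. A Python illustration of one safe approach, emitting the question as a JSON string literal (`build_retrieve_code` is a hypothetical helper):

```python
import ast
import json


def build_retrieve_code(query: str) -> str:
    """Return the Python snippet to submit for `query`, with the question
    embedded as a JSON string literal that Python parses back unchanged."""
    literal = json.dumps(query, ensure_ascii=False)
    return (
        f"retrievedNodes = fusion_retriever.retrieve({literal})\n"
        "articles = [node.text for node in retrievedNodes]\n"
    )


question = 'What does "Bing Dwen Dwen" mean?'
code = build_retrieve_code(question)
print(code)
# Parse (without running) the snippet to confirm the literal round-trips.
call = ast.parse(code).body[0].value
assert call.args[0].value == question
```

`ensure_ascii=False` keeps characters outside the Basic Multilingual Plane intact; with the default, `json.dumps` emits surrogate-pair escapes that Python reads back as two separate characters.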

In [20]:
var tokenizer = await Tokenizer.CreateAsync(TokenizerModel.gpt35);

public async Task<string> AskAsync(string question){

    var searchResults = await Search(question);

    var articles = string.Join("\n", searchResults.Select(s => $"""
    Wikipedia article section:
    {s}

    """));

    var userQuestion = $"""""
                Use the below articles on the 2022 Winter Olympics to answer the subsequent question. If the answer cannot be found in the articles, write "I could not find an answer."
                                
                {articles}
                
                Question: {question}
                """"";

    var options= new ChatCompletionsOptions{
        Messages =
            {
                new ChatRequestSystemMessage(@"You answer questions about the 2022 Winter Olympics."),
                new ChatRequestUserMessage(userQuestion)
            },
        Temperature = 0f,
        MaxTokens = 3500,
        DeploymentName = chatDeployment
    };

    var response = await client.GetChatCompletionsAsync(options);

    var answer = response.Value.Choices.FirstOrDefault()?.Message?.Content;  
    return answer;
}
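`AskAsync` puts every retrieved section into the prompt, and the `tokenizer` created at the top of the cell is not used. If the retrieved text grows too large for the model's context window, one option is to add sections in ranked order until a token budget is reached. A Python sketch of that idea; the whitespace word count is an assumption standing in for a real tokenizer, and `build_user_prompt` is a hypothetical helper that reuses the prompt wording from `AskAsync`:

```python
from typing import Callable, Iterable

INTRO = (
    "Use the below articles on the 2022 Winter Olympics to answer the subsequent question. "
    'If the answer cannot be found in the articles, write "I could not find an answer."'
)


def word_count(text: str) -> int:
    """Crude stand-in for a real tokenizer, used only for illustration."""
    return len(text.split())


def build_user_prompt(question: str, sections: Iterable[str], token_budget: int,
                      count_tokens: Callable[[str], int] = word_count) -> str:
    """Add retrieved sections in ranked order until the next one would exceed the budget."""
    tail = f"\n\nQuestion: {question}"
    prompt = INTRO + "\n\n"
    for section in sections:
        candidate = prompt + f"Wikipedia article section:\n{section}\n\n"
        if count_tokens(candidate + tail) > token_budget:
            break  # keep the higher-ranked sections that already fit
        prompt = candidate
    return prompt.rstrip("\n") + tail


print(build_user_prompt("Where were the 2022 Winter Olympics held?",
                        ["Beijing hosted the Games.", "Venues were in three zones."],
                        token_budget=3000))
```

Stopping at the first section that does not fit, rather than skipping it, keeps the prompt a strict prefix of the ranked results.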

In [21]:
await AskAsync("Where did the 2022 winter Olympics take place?")

Generated queries:
1. Location of 2022 Winter Olympics
2. City that hosted 2022 Winter Olympics
3. Country where 2022 Winter Olympics were held
4. Venues of 2022 Winter Olympics
5. 2022 Winter Olympics host city details
6. Information about the place where 2022 Winter Olympics took place
7. 2022 Winter Olympics location history
8. Details about the 2022 Winter Olympics location
9. Where were the 2022 Winter Olympics held?
10. Host city of the 2022 Winter Olympics
11. 2022 Winter Olympics host country
12. Location and venues of 2022 Winter Olympics
13. Information on 2022 Winter Olympics host city
14. 2022 Winter Olympics location and details
15. Which city hosted the 2022 Winter Olympics?


The 2022 Winter Olympics took place in Beijing, China.

In [22]:
await AskAsync("What countries did take part in the 2022 winter Olympics? Write me the complete list of the countries.")

Generated queries:
1. List of all countries that participated in the 2022 Winter Olympics
2. Which nations competed in the 2022 Winter Olympics?
3. Full list of countries in the 2022 Winter Olympics
4. Names of countries that took part in the 2022 Winter Olympics
5. How many countries participated in the 2022 Winter Olympics?
6. Participating nations in the 2022 Winter Olympics
7. Countries that competed in the 2022 Winter Olympics
8. Complete list of 2022 Winter Olympics participating countries
9. All countries that were in the 2022 Winter Olympics
10. 2022 Winter Olympics participants by country
11. Countries that sent athletes to the 2022 Winter Olympics
12. Which countries were represented in the 2022 Winter Olympics?
13. List of nations that competed in the 2022 Winter Olympics
14. Countries that took part in the 2022 Winter Olympics
15. Full roster of countries in the 2022 Winter Olympics.


I could not find an answer.

In [147]:
await AskAsync("What countries did take part in the 2022 winter Olympics, what months were they held?")

Generated queries:
1. List of countries participating in the 2022 winter Olympics
2. Winter Olympic countries in 2022
3. Which nations competed in the 2022 winter Olympics?
4. Countries involved in the 2022 winter Olympics
5. 2022 winter Olympics participants by country
6. What countries were represented in the 2022 winter Olympics?
7. Nations that took part in the 2022 winter Olympics
8. 2022 winter Olympics: participating countries
9. Countries that competed in the 2022 winter Olympics
10. Winter Olympic nations in 2022
11. List of countries and months for the 2022 winter Olympics


The countries that took part in the 2022 Winter Olympics were not mentioned in the provided articles. However, it is mentioned that Norway led the total medal standings with 39 medals, Germany had 31 medals, Canada had 29 medals, and South Korea won 17 medals. The Winter Olympics were held between 4 and 20 February 2022.