# lab3-RAG-KernelMemory-Embedding


You have probably seen many applications like "Chat with your own data". They are built on the most common LLM application architecture, known as RAG (Retrieval-Augmented Generation). A simple RAG flow looks like this:

1. Retrieval:
   Retrieve relevant information from a pre-existing database, knowledge graph, or corpus. This retrieved information serves as contextual knowledge for the generative model.
   The retrieval step queries the data source using techniques such as keyword search, semantic similarity search, or more advanced methods like dense retrieval.

2. Augmentation:
   The retrieved information is added to the prompt sent to the generative model, alongside the user's question.
   This gives the model relevant background information it would otherwise lack, such as private or recent data that was not part of its training.

3. Generation:
   Using the augmented context, the generative model produces its output.
   This output is usually generated text, such as an answer to a question, a completed sentence, or a full document.

4. Refinement (optional):
   The generated output can go through refinement or post-processing steps to improve coherence, correctness, and fluency.
   Refinement techniques may include re-prompting the model to review its answer, paraphrasing, or other forms of text improvement.

5. Output:
   The final output is delivered to the user or a downstream application.
   It benefits from both the generative capabilities of the model and the contextual knowledge retrieved during the process.
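The retrieval, augmentation, and generation steps can be sketched in plain C# before bringing in any library. This is a toy illustration, not how Kernel Memory works internally: the three documents are made up, retrieval is a naive keyword-overlap score, and `FakeLlm` is a hypothetical placeholder where a real application would call a chat model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical knowledge base; in a real app these would be chunks of your own documents.
var documents = new List<string>
{
    "Semantic Kernel is an SDK for integrating LLMs into C#, Python and Java apps.",
    "Kernel Memory indexes documents and answers questions about them.",
    "Qdrant is a vector database."
};

// Lowercase the text and split it into words, dropping basic punctuation.
IEnumerable<string> Tokenize(string text) =>
    text.ToLowerInvariant()
        .Split(new[] { ' ', '.', ',', '?', '!', ':' }, StringSplitOptions.RemoveEmptyEntries);

// 1. Retrieval: naive keyword search, scoring each document by how many query words it contains.
List<string> Retrieve(string query, int topK)
{
    var queryWords = Tokenize(query).ToHashSet();
    return documents
        .Select(doc => (Doc: doc, Score: Tokenize(doc).Count(w => queryWords.Contains(w))))
        .Where(x => x.Score > 0)
        .OrderByDescending(x => x.Score)
        .Take(topK)
        .Select(x => x.Doc)
        .ToList();
}

// 2. Augmentation: insert the retrieved facts into the prompt as context for the model.
string Augment(string query, List<string> context) =>
    "Answer the question using only these facts:\n" +
    string.Join("\n", context.Select(f => "- " + f)) +
    $"\n\nQuestion: {query}\nAnswer:";

// 3. Generation: hypothetical placeholder; a real app would send the prompt to a chat model here.
string FakeLlm(string input) => $"[model answer for a {input.Length}-character prompt]";

var question = "What is Semantic Kernel?";
var facts = Retrieve(question, topK: 2);
var prompt = Augment(question, facts);
Console.WriteLine(prompt);
Console.WriteLine(FakeLlm(prompt));
```

Replacing the keyword-based `Retrieve` with a search over embeddings is what turns this into the semantic retrieval used in the rest of the lab.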




## Implement a simple RAG 


### Embeddings
The first step is to load our data into a vector store.

Text is stored as a long vector of numbers, known as an "embedding".
The semantic similarity of two pieces of text can be measured by the distance between their vectors in a high-dimensional space.
When a query is made, it is converted into an embedding vector and compared with the stored vectors to find similar matches.
Semantic memory therefore returns matches ranked by similarity rather than exact matches.
[Ref](https://learn.microsoft.com/en-us/semantic-kernel/memories/embeddings)
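Cosine similarity is a common way to measure how close two embeddings are. The sketch below assumes toy three-dimensional vectors with made-up values (real embeddings are much longer; `text-embedding-ada-002`, used later in this lab, returns 1,536 dimensions) and ranks the stored texts against a query vector.

```csharp
using System;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means same direction, 0 means orthogonal.
double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same length");
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Made-up 3-dimensional "embeddings" for three stored texts (real ones are far longer).
var stored = new (string Text, float[] Vector)[]
{
    ("cats are pets",     new[] { 0.9f, 0.1f, 0.0f }),
    ("dogs are pets",     new[] { 0.8f, 0.2f, 0.1f }),
    ("stocks fell today", new[] { 0.0f, 0.1f, 0.9f }),
};

// Made-up embedding for a query such as "which animals make good pets?"
var query = new[] { 0.85f, 0.15f, 0.05f };

// Rank every stored text by similarity to the query, best match first.
var ranked = stored
    .Select(s => (s.Text, Score: CosineSimilarity(query, s.Vector)))
    .OrderByDescending(x => x.Score)
    .ToList();

foreach (var (text, score) in ranked)
    Console.WriteLine($"{score:F3}  {text}");
```

Both pet sentences score close to 1.0 and the finance sentence scores close to 0, even though none of them contains the exact query words.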


### Kernel Memory
We are going to use Kernel Memory to perform all the RAG tasks behind the scenes. When a document is imported, Kernel Memory runs a pipeline that:

1. Extracts text: recognizes the file format and extracts its content
2. Partitions the text into small chunks to optimize search
3. Generates an embedding for each chunk using an embedding model
4. Saves the embeddings into a vector index such as Azure AI Search, Qdrant, or other databases

[Ref](https://github.com/microsoft/kernel-memory?tab=readme-ov-file)
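Partitioning matters because each chunk gets its own embedding, so a query can match one relevant passage instead of a whole file. Kernel Memory has its own token-based partitioner; the sketch below is only a simplified, word-based stand-in, and `Partition`, `maxWords`, and `overlap` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical word-based partitioner: chunks of at most maxWords words,
// where each chunk repeats the last `overlap` words of the previous one.
List<string> Partition(string text, int maxWords, int overlap)
{
    if (maxWords <= 0 || overlap < 0 || overlap >= maxWords)
        throw new ArgumentException("Require maxWords > 0 and 0 <= overlap < maxWords");

    var words = text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    var result = new List<string>();
    int step = maxWords - overlap;

    for (int start = 0; start < words.Length; start += step)
    {
        result.Add(string.Join(" ", words.Skip(start).Take(maxWords)));
        if (start + maxWords >= words.Length) break; // this chunk already reaches the end of the text
    }
    return result;
}

var sample = "one two three four five six seven eight nine ten";
var chunks = Partition(sample, maxWords: 4, overlap: 1);
foreach (var chunk in chunks)
    Console.WriteLine(chunk);
// one two three four
// four five six seven
// seven eight nine ten
```

The overlap repeats a little context around each boundary in both neighboring chunks, at the cost of storing a few extra words per chunk.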

Note: Kernel Memory is an independent project that originated from Semantic Kernel and was later separated out.


```csharp
#r "nuget: Microsoft.KernelMemory.Core, 0.29.240219.2"
#!import config/Settings.cs
```


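```csharp
using Microsoft.KernelMemory;

// Load the model deployment, endpoint and API key from config/Settings.cs
var (useAzureOpenAI, model, azureEndpoint, apiKey, orgId) = Settings.LoadFromFile();

// Embedding generator: assumes an Azure OpenAI deployment named "text-embedding-ada-002"
var embeddingConfig = new AzureOpenAIConfig
{
    APIKey = apiKey,
    Deployment = "text-embedding-ada-002",
    Endpoint = azureEndpoint,
    APIType = AzureOpenAIConfig.APITypes.EmbeddingGeneration,
    Auth = AzureOpenAIConfig.AuthTypes.APIKey
};

// Text generator: the chat model deployment that composes the final answer
var chatConfig = new AzureOpenAIConfig
{
    APIKey = apiKey,
    Deployment = model,
    Endpoint = azureEndpoint,
    APIType = AzureOpenAIConfig.APITypes.ChatCompletion,
    Auth = AzureOpenAIConfig.AuthTypes.APIKey
};

// MemoryServerless runs the ingestion pipeline in-process;
// SimpleVectorDb is a basic vector store suitable for demos
var memory = new KernelMemoryBuilder()
    // .WithOpenAIDefaults(env["OPENAI_API_KEY"])  // alternative when using OpenAI instead of Azure OpenAI
    .WithAzureOpenAITextGeneration(chatConfig)
    .WithAzureOpenAITextEmbeddingGeneration(embeddingConfig)
    .WithSimpleVectorDb()
    .Build<MemoryServerless>();
```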



```text
info: Microsoft.KernelMemory.Handlers.TextExtractionHandler[0]
      Handler 'extract' ready
info: Microsoft.KernelMemory.Handlers.TextPartitioningHandler[0]
      Handler 'partition' ready
info: Microsoft.KernelMemory.Handlers.SummarizationHandler[0]
      Handler 'summarize' ready
info: Microsoft.KernelMemory.Handlers.GenerateEmbeddingsHandler[0]
      Handler 'gen_embeddings' ready, 1 embedding generators
info: Microsoft.KernelMemory.Handlers.SaveRecordsHandler[0]
      Handler save_records ready, 1 vector storages
info: Microsoft.KernelMemory.Handlers.DeleteDocumentHandler[0]
      Handler 'private_delete_document' ready
info: Microsoft.KernelMemory.Handlers.DeleteIndexHandler[0]
      Handler 'private_delete_index' ready
info: Microsoft.KernelMemory.Handlers.DeleteGeneratedFilesHandler[0]
      Handler 'delete_generated_files' ready
```


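```csharp
// Ingest the PDF (expected in the working directory): extract text, partition it,
// generate embeddings and save them in the vector store
await memory.ImportDocumentAsync("sample-SK-Readme.pdf", documentId: "doc001");

var question = "What's Semantic Kernel?";

// Find the most relevant chunks and ask the chat model to answer using them as context
var answer = await memory.AskAsync(question);

Console.WriteLine($"Question: {question}\n\nAnswer: {answer.Result}");
```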
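`AskAsync` hides the retrieval step, but conceptually the memory embeds the question, looks up the nearest chunks in the vector store, and passes them to the chat model as context. The sketch below illustrates only that lookup, using a hypothetical in-memory index; `Upsert`, `Search`, and the chunk ids are names invented for this sketch, and hand-made vectors stand in for real embeddings. It normalizes vectors on insert so that a plain dot product equals cosine similarity (OpenAI embeddings already come normalized to length 1) and assumes non-zero vectors.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory vector index: chunk id -> unit-length vector.
var index = new Dictionary<string, float[]>();

// Scale a (non-zero) vector to length 1, so a dot product between two such vectors equals their cosine similarity.
float[] Normalize(float[] v)
{
    var norm = (float)Math.Sqrt(v.Sum(x => x * x));
    return v.Select(x => x / norm).ToArray();
}

// Insert or overwrite a chunk's vector.
void Upsert(string id, float[] vector) => index[id] = Normalize(vector);

// Return up to topK chunks whose similarity to the query is at least minRelevance, best first.
List<(string Id, double Score)> Search(float[] query, int topK, double minRelevance)
{
    var q = Normalize(query);
    return index
        .Select(kv => (Id: kv.Key, Score: (double)kv.Value.Zip(q, (a, b) => a * b).Sum()))
        .Where(r => r.Score >= minRelevance)
        .OrderByDescending(r => r.Score)
        .Take(topK)
        .ToList();
}

// Toy vectors standing in for real chunk embeddings of "doc001".
Upsert("doc001/chunk0", new[] { 1f, 0f, 0f });
Upsert("doc001/chunk1", new[] { 0.7f, 0.7f, 0f });
Upsert("doc001/chunk2", new[] { 0f, 0f, 1f });

// chunk2 is orthogonal to the query (score 0), so minRelevance filters it out.
var results = Search(new[] { 2f, 1f, 0f }, topK: 2, minRelevance: 0.5);
foreach (var (chunkId, score) in results)
    Console.WriteLine($"{chunkId}: {score:F2}");
```

A relevance threshold keeps unrelated chunks out of the prompt even when fewer than `topK` good matches exist, which reduces the chance of the model answering from irrelevant context.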