# lab3-KernelMemory-embedding-RAG


You might have seen many applications along the lines of "Chat with your own data". Most of them are built on the same LLM architecture pattern: Retrieval-Augmented Generation (RAG). A simple RAG flow looks like this:

1. Retrieval: The system retrieves relevant information from a pre-existing database, knowledge graph, or corpus. This retrieved information serves as contextual knowledge for the generative model. The retrieval step queries the data source using techniques such as keyword search, semantic similarity, or more advanced methods like dense retrieval.

2. Augmentation: The retrieved information is then added to the generative model's input, typically by inserting it into the prompt. This gives the model relevant background information and improves its understanding of the context.

3. Generation: With the augmented context, the generative model produces its output, for example an answer to a question, a completed sentence, or a full document.

4. Refinement (optional): The generated output can undergo refinement or post-processing steps to improve coherence, correctness, and fluency. Refinement techniques may include language model fine-tuning, paraphrasing, or other forms of text improvement.

5. Output: The final output is delivered to the user or to a downstream application. It benefits both from the generative capabilities of the model and from the contextual knowledge retrieved during the process.
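To make steps 1 to 3 concrete, here is a minimal, self-contained C# sketch. It is only an illustration and not how Kernel Memory works internally: retrieval is a naive keyword-overlap score instead of embeddings, the corpus is made up, and `Generate` is a hypothetical placeholder where a real LLM call would go.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up corpus standing in for a document store.
string[] corpus =
{
    "The Happy Prince is a statue covered with gold leaf.",
    "The Swallow rests between the feet of the statue.",
    "Steel toe boots protect workers from falling objects."
};

// Lower-cased set of words, ignoring simple punctuation.
HashSet<string> Words(string text) =>
    text.ToLowerInvariant()
        .Split(new[] { ' ', '.', ',', '?', '!' }, StringSplitOptions.RemoveEmptyEntries)
        .ToHashSet();

// 1. Retrieval: rank passages by how many words they share with the query (naive keyword match).
List<string> Retrieve(string query, int topK)
{
    var queryWords = Words(query);
    return corpus
        .Select(doc => (doc, score: Words(doc).Count(w => queryWords.Contains(w))))
        .Where(x => x.score > 0)
        .OrderByDescending(x => x.score)
        .Take(topK)
        .Select(x => x.doc)
        .ToList();
}

// 2. Augmentation: put the retrieved passages into the prompt as context.
string Augment(string userQuestion, IEnumerable<string> passages) =>
    "Answer using only this context:\n" +
    string.Join("\n", passages.Select(p => "- " + p)) +
    $"\n\nQuestion: {userQuestion}";

// 3. Generation: hypothetical placeholder; a real pipeline would send the prompt to an LLM here.
string Generate(string augmentedPrompt) => "[LLM answer based on]\n" + augmentedPrompt;

var question = "What is the Happy Prince statue covered with?";
var prompt = Augment(question, Retrieve(question, topK: 1));
Console.WriteLine(Generate(prompt));
```

Swapping the keyword score for embedding similarity turns this toy into the semantic retrieval that Kernel Memory performs.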




## Implement a simple RAG using KernelMemory - the easy path 


### Embeddings
The first step is to load our data into a vector store.

Text is encoded as long vectors of numbers, called "embeddings". The semantic similarity of two pieces of text is measured by the distance between their vectors in a high-dimensional space. When you query, the input is also converted into an embedding vector and compared against the stored vectors to find the closest matches. Semantic memory therefore returns results ranked by similarity rather than only exact matches.
[Read More](https://learn.microsoft.com/en-us/semantic-kernel/memories/embeddings)
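A minimal sketch of the similarity ranking described above, assuming cosine similarity as the measure (a common choice for OpenAI embeddings; the vector store you use may apply a different metric). The three-dimensional vectors are made up for illustration.

```csharp
using System;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means the vectors point in the same direction.
double Cosine(float[] a, float[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same dimension.");
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Made-up 3-dimensional "embeddings"; text-embedding-ada-002 produces 1536 dimensions.
var stored = new (string Text, float[] Vector)[]
{
    ("gold statue", new[] { 0.9f, 0.1f, 0.0f }),
    ("bird on a roof", new[] { 0.1f, 0.9f, 0.1f }),
    ("safety boots", new[] { 0.0f, 0.1f, 0.9f }),
};
float[] query = { 0.8f, 0.2f, 0.0f };

// Rank stored texts by similarity to the query instead of requiring an exact match.
var ranked = stored
    .Select(s => (s.Text, Score: Cosine(query, s.Vector)))
    .OrderByDescending(r => r.Score)
    .ToList();

foreach (var (text, score) in ranked)
    Console.WriteLine($"{score:F3}  {text}");
```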


### Kernel Memory
In this example we use Kernel Memory to perform all the RAG tasks behind the scenes. When a document is imported, Kernel Memory:

1. Extracts text: recognizes the file format and extracts the content
2. Partitions the text into small chunks, to optimize search
3. Generates embeddings for each chunk using an embedding model
4. Saves the embeddings into a vector index such as Azure AI Search, Qdrant, or another database.

[Read More](https://github.com/microsoft/kernel-memory?tab=readme-ov-file)

Note: Kernel Memory is an independent project that originated from Semantic Kernel and was later separated out.
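Step 2 of the ingestion pipeline (partitioning) can be illustrated with a sliding window that cuts text into overlapping chunks. This is a simplified, word-based sketch with a hypothetical `Partition` helper; Kernel Memory's own partitioner counts tokens and has its own defaults, and the `TextChunker.SplitPlainTextParagraphs(lines, 120, 20)` call in the Semantic Memory section of this lab applies the same max-size-plus-overlap idea using token counts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cut text into chunks of at most maxWords words. Consecutive chunks share `overlap`
// words, so text near a boundary keeps some of its surrounding context in both chunks.
List<string> Partition(string text, int maxWords, int overlap)
{
    if (maxWords <= 0 || overlap < 0 || overlap >= maxWords)
        throw new ArgumentException("Require maxWords > 0 and 0 <= overlap < maxWords.");

    var words = text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    var result = new List<string>();
    for (int start = 0; start < words.Length; start += maxWords - overlap)
    {
        result.Add(string.Join(" ", words.Skip(start).Take(maxWords)));
        if (start + maxWords >= words.Length)
            break; // this window already reached the last word
    }
    return result;
}

var chunks = Partition("one two three four five six seven eight nine ten", maxWords: 4, overlap: 1);
foreach (var chunk in chunks)
    Console.WriteLine(chunk);
```

Smaller chunks make each embedding more focused on one topic; the overlap reduces the chance that an answer is split across two chunks with neither one matching well.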


In [1]:
#r "nuget: Microsoft.KernelMemory.core,  0.29.240219.2"
#!import config/Settings.cs 



In [2]:
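using Microsoft.KernelMemory;

// Load the Azure OpenAI settings (endpoint, key, chat deployment name) from config/Settings.cs
var (useAzureOpenAI, model, azureEndpoint, apiKey, orgId) = Settings.LoadFromFile();

// Embedding deployment: turns text chunks and questions into vectors
var embeddingConfig = new AzureOpenAIConfig
{
    APIKey = apiKey,
    Deployment = "text-embedding-ada-002",
    Endpoint = azureEndpoint,
    APIType = AzureOpenAIConfig.APITypes.EmbeddingGeneration,
    Auth = AzureOpenAIConfig.AuthTypes.APIKey
};

// Chat deployment: generates the final answer from the retrieved chunks
var chatConfig = new AzureOpenAIConfig
{
    APIKey = apiKey,
    Deployment = model,
    Endpoint = azureEndpoint,
    APIType = AzureOpenAIConfig.APITypes.ChatCompletion,
    Auth = AzureOpenAIConfig.AuthTypes.APIKey
};

// Serverless memory: the ingestion pipeline runs in-process,
// and SimpleVectorDb is a simple vector store intended for tests and demos
var memory = new KernelMemoryBuilder()
    // For plain OpenAI instead of Azure OpenAI:
    // .WithOpenAIDefaults(env["OPENAI_API_KEY"])
    .WithAzureOpenAITextGeneration(chatConfig)
    .WithAzureOpenAITextEmbeddingGeneration(embeddingConfig)
    .WithSimpleVectorDb()
    .Build<MemoryServerless>();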



info: Microsoft.KernelMemory.Handlers.TextExtractionHandler[0]
      Handler 'extract' ready
info: Microsoft.KernelMemory.Handlers.TextPartitioningHandler[0]
      Handler 'partition' ready
info: Microsoft.KernelMemory.Handlers.SummarizationHandler[0]
      Handler 'summarize' ready
info: Microsoft.KernelMemory.Handlers.GenerateEmbeddingsHandler[0]
      Handler 'gen_embeddings' ready, 1 embedding generators
info: Microsoft.KernelMemory.Handlers.SaveRecordsHandler[0]
      Handler save_records ready, 1 vector storages
info: Microsoft.KernelMemory.Handlers.DeleteDocumentHandler[0]
      Handler 'private_delete_document' ready
info: Microsoft.KernelMemory.Handlers.DeleteIndexHandler[0]
      Handler 'private_delete_index' ready
info: Microsoft.KernelMemory.Handlers.DeleteGeneratedFilesHandler[0]
      Handler 'delete_generated_files' ready


In [3]:
await memory.ImportDocumentAsync("./pdf/TheHappyPrince.pdf", documentId: "doc001");

var question = "What is the name of the Prince?";

var answer = await memory.AskAsync(question);

Console.WriteLine($"Question: {question}\n\nAnswer: {answer.Result}");

Question: What is the name of the Prince?

Answer: The name of the Prince is the Happy Prince.


In [4]:
question = "Who the Prince is talking to in the story?";
answer = await memory.AskAsync(question);

Console.WriteLine($"Question: {question}\n\nAnswer: {answer.Result}");

Question: Who the Prince is talking to in the story?

Answer: In the story, the Prince is not directly conversing with anyone. The narrative describes the interactions between a Swallow and various elements of the story, such as the Reed and the statue of the Happy Prince. The Swallow, upon deciding to rest between the feet of the statue of the Happy Prince, is the character who experiences the events, such as feeling drops of water fall on him, which he initially mistakes for rain. The confusion arises from the fact that the Swallow is the one engaging with the environment and the statue, not the Prince himself engaging in a conversation.


## Implement a simple RAG using Semantic Memory


In [4]:
#r "nuget: Microsoft.SemanticKernel, 1.3.1"
#r "nuget: Microsoft.SemanticKernel.Plugins.Memory, 1.3.1-alpha"
#r "nuget: System.Linq.Async, 6.0.1"
#r "nuget: Microsoft.SemanticKernel.Plugins.Core, 1.3.1-alpha"
#r "nuget: pdfpig, 0.1.8"
#!import config/Settings.cs
#!import lib/Usings.cs
#!import plugins/PdfFilesPlugin.cs

Instead of Kernel Memory, here we use Semantic Memory. For the differences between Kernel Memory and Semantic Memory, see the [Kernel Memory documentation](https://microsoft.github.io/kernel-memory/#kernel-memory-km-and-semantic-memory-sm).


In [5]:
using Microsoft.SemanticKernel.Memory;
var (useAzureOpenAI, model, azureEndpoint, apiKey, orgId) = Settings.LoadFromFile();
#pragma warning disable SKEXP0003, SKEXP0011, SKEXP0052

// Semantic Memory: VolatileMemoryStore keeps the vectors in process memory
// (they are lost when the kernel restarts)
var memoryBuilder = new MemoryBuilder();

var memory = memoryBuilder
    .WithMemoryStore(new VolatileMemoryStore())
    // Arguments: embedding deployment name, endpoint, API key
    .WithAzureOpenAITextEmbeddingGeneration("text-embedding-ada-002", azureEndpoint, apiKey)
    .Build();

In [17]:
#pragma warning disable SKEXP0055, SKEXP0003
using System.ComponentModel;          // [Description]
using System.Linq;                    // Select/ToArrayAsync on IAsyncEnumerable (System.Linq.Async)
using Microsoft.SemanticKernel;       // [KernelFunction]
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Text;  // TextChunker

public sealed class RAGplugin
{
  const string CollectionName = "KnowledgeBase";
  private readonly ISemanticTextMemory memory;

  public RAGplugin(ISemanticTextMemory memory)
  {
    this.memory = memory;
  }

  // Split raw text into lines of at most 40 tokens, then pack the lines into
  // paragraphs of at most 120 tokens with 20 overlapping tokens between neighbours
  private IEnumerable<string> SplitText(string result)
  {
    var lines = TextChunker.SplitPlainTextLines(result, 40);
    var paragraphs = TextChunker.SplitPlainTextParagraphs(lines, 120, 20);
    return paragraphs;
  }

  // Returns Task (not void) so the caller waits until every paragraph is saved
  // and exceptions are surfaced instead of being lost
  [KernelFunction, Description("Save Knowledge into KnowledgeBase.")]
  public async Task Memorize(
    [Description("The content to memorize")]
    string content)
  {
    Console.WriteLine($"\t reading in documentation...");
    var paragraphs = SplitText(content);
    foreach (var paragraph in paragraphs)
    {
      // Each paragraph is embedded and stored under a fresh id
      await memory.SaveInformationAsync(
        CollectionName,
        paragraph,
        id: Guid.NewGuid().ToString()
      );
    }
  }

  [KernelFunction, Description("Search KnowledgeBase for data related to the question.")]
  public async Task<string[]> SearchKnowledgeBase([Description("question to be answered")] string question)
  {
    Console.WriteLine($"\t thinking ...");
    // Return at most 2 paragraphs whose relevance score is at least 0.5
    var memoryResult = memory.SearchAsync(CollectionName, question, limit: 2, minRelevanceScore: 0.5);
    var data = await memoryResult.Select(m => m.Metadata.Text).ToArrayAsync();
    return data;
  }
}
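`SearchAsync(..., limit: 2, minRelevanceScore: 0.5)` combines a relevance threshold with a top-k cut. The sketch below shows that selection rule on hypothetical records whose relevance scores are already computed; it is a conceptual illustration, not the memory store's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stored paragraphs with relevance scores already computed for one question.
var scored = new List<(string Text, double Relevance)>
{
    ("Employers must provide foot protection at no cost.", 0.82),
    ("Hard hats are required where objects may fall.", 0.61),
    ("Eye protection is required when grinding.", 0.55),
    ("The cafeteria opens at 8 am.", 0.12),
};

// Drop records below minRelevanceScore, sort best first, keep at most `limit`.
List<string> Search(IEnumerable<(string Text, double Relevance)> records, int limit, double minRelevanceScore) =>
    records
        .Where(r => r.Relevance >= minRelevanceScore)
        .OrderByDescending(r => r.Relevance)
        .Take(limit)
        .Select(r => r.Text)
        .ToList();

var hits = Search(scored, limit: 2, minRelevanceScore: 0.5);
foreach (var hit in hits)
    Console.WriteLine(hit);
```

Raising `minRelevanceScore` trades recall for precision: with a threshold of 0.9 on this data nothing is returned, and the model would have no context to answer from.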

In [8]:
using Microsoft.SemanticKernel;
using Kernel = Microsoft.SemanticKernel.Kernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;
var builder = Kernel.CreateBuilder();

// Configure AI service credentials used by the kernel
var (useAzureOpenAI, model, azureEndpoint, apiKey, orgId) = Settings.LoadFromFile();

builder.AddAzureOpenAIChatCompletion(model, azureEndpoint, apiKey);
builder.Plugins.AddFromType<PdfFilesPlugin>();
builder.Plugins.AddFromObject(new RAGplugin(memory));

var kernel = builder.Build();

In [18]:
var systemMessage = @"
  You are a professional assistant that helps workers answer questions about work safety and protective equipment.
  When a user asks a work safety or health related question, search your Knowledge Base and find relevant information.
  Then generate a final answer based on what you have learned from the Knowledge Base.
  If the user's question is not related to work safety, health, or required equipment, politely reject the question.
  If you don't know the answer to a legitimate question, apologize and offer to contact your manager for help.
  Before you attempt to answer any question, make sure you have memorized all the work safety and health related regulations in the ./data folder.
  Also make sure you understand the question and have enough information to answer it.
  ";

ChatHistory history = [];
history.AddSystemMessage(systemMessage);

// Get chat completion service
var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();

// Enable auto function calling
OpenAIPromptExecutionSettings openAIPromptExecutionSettings = new()
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};

var input = "";
while(true) {
  input = await InteractiveKernel.GetInputAsync("Please type in your question here, type exit to stop");
  Console.WriteLine("[User] > " + input);
  if (input == "exit") {
    break;
  }
  history.AddUserMessage(input);
  // Get the response from the AI
  var result = await chatCompletionService.GetChatMessageContentAsync(
        history,
        executionSettings: openAIPromptExecutionSettings,
        kernel: kernel);
  
  // Add the message from the agent to the chat history
  history.AddMessage(result.Role, result.Content);

  Console.WriteLine("[Assistant] > " + result);
}


[User] > Do I need a steel toe boots while working in construction site?
Search result: 2 items found.
[Assistant] > Yes, steel toe boots are generally required while working on a construction site. According to OSHA regulations, specifically reference number 136 on Foot protection, employers are required to provide personal protective equipment (PPE) such as foot protection, which includes steel toe boots, to the employees at no cost. This safety equipment is essential in protecting workers from recognized hazards that are likely to cause serious physical harm or death. The General Duty Clause, Section 5(a)(1) of the OSHA Act, also mandates that employers ensure a safe working environment for their workers, which would include adequate foot protection when working in hazardous settings like construction sites.
[User] > how about at office
Search result: 2 items found.
[Assistant] > Steel toe boots are not typically required while working in an office environment. The need for personal

Error: Command cancelled.
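With `ToolCallBehavior.AutoInvokeKernelFunctions`, the connector runs a loop on your behalf: the model either requests a function call (for example `SearchKnowledgeBase`) or returns a final answer; requested functions are invoked and their results are added to the conversation before the model is called again. The sketch below simulates that loop with a fake model and a hypothetical function table. It does not call Semantic Kernel and is not its implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical function table standing in for the registered plugin functions.
var functions = new Dictionary<string, Func<string, string>>
{
    ["SearchKnowledgeBase"] = query => "Employers must provide foot protection at no cost.",
};

// Chat history as (role, content) pairs.
var history = new List<(string Role, string Content)>
{
    ("system", "Answer work safety questions using the knowledge base."),
    ("user", "Do I need steel toe boots on a construction site?"),
};

// Fake model: requests SearchKnowledgeBase until a tool result is in the history, then answers.
// A real model makes this decision itself; this stub only mimics the shape of the exchange.
(string ToolName, string ToolArg, string Answer) FakeModel(List<(string Role, string Content)> h)
{
    var lastTool = h.LastOrDefault(m => m.Role == "tool");
    if (string.IsNullOrEmpty(lastTool.Content))
        return ("SearchKnowledgeBase", h.Last(m => m.Role == "user").Content, "");
    return ("", "", "Yes. According to the knowledge base: " + lastTool.Content);
}

var answer = "";
// Cap the number of rounds so a model that keeps requesting tools cannot loop forever.
for (int round = 0; round < 5 && answer == ""; round++)
{
    var (toolName, toolArg, reply) = FakeModel(history);
    if (toolName != "")
    {
        // Auto-invoke: run the requested function and append its result for the next round.
        history.Add(("tool", functions[toolName](toolArg)));
    }
    else
    {
        answer = reply;
        history.Add(("assistant", answer));
    }
}
Console.WriteLine(answer);
```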

In [16]:
history.Display();
