## Introduction
This notebook is an example of a RAG pipeline backed by a volatile, in-memory vector database. It lets us walk through the whole process of setting up RAG before deploying it to Azure in a more industrialized way.

We will use [Kernel Memory (KM)](https://microsoft.github.io/kernel-memory/), a multi-modal AI service specialized in the efficient indexing of documents and information through custom continuous data pipelines, with support for Retrieval Augmented Generation (RAG), synthetic memory, prompt engineering, and custom semantic memory processing.

KM supports PDF and Word documents, PowerPoint presentations, images, spreadsheets and more. It extracts information and generates memories by leveraging Large Language Models (LLMs), embeddings and vector storage.
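
To make the retrieval part of RAG concrete: each document chunk is stored with an embedding vector, and a question is answered by first finding the chunks whose vectors are closest to the question's embedding. The sketch below is a toy illustration of that idea, not Kernel Memory's code. The chunk texts, the 3-dimensional vectors and the helper names `CosineSimilarity` and `Search` are assumptions made up for the example; real embedding models return much longer vectors.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy in-memory "vector store": each chunk of text is kept next to its embedding.
// The vectors are invented for illustration only.
var store = new List<(string Text, float[] Vector)>
{
    ("Northwind Standard: employee-only cost per paycheck", new float[] { 0.9f, 0.1f, 0.0f }),
    ("Vacation policy: days off per year",                  new float[] { 0.1f, 0.9f, 0.1f }),
    ("Role library: job descriptions",                      new float[] { 0.0f, 0.2f, 0.9f }),
};

// Cosine similarity = dot(a, b) / (|a| * |b|); 1.0 means the vectors point the same way.
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Brute-force nearest-neighbour search: score every chunk and keep the k best.
static List<string> Search(List<(string Text, float[] Vector)> items, float[] query, int k) =>
    items.OrderByDescending(item => CosineSimilarity(item.Vector, query))
         .Take(k)
         .Select(item => item.Text)
         .ToList();

// Stand-in for the embedding of "what's the employee's cost per paycheck?" (hypothetical).
var queryVector = new float[] { 0.8f, 0.2f, 0.1f };
var top = Search(store, queryVector, k: 1);
Console.WriteLine(top[0]);
```

This brute-force scan scores every stored chunk, which is fine for a handful of documents held in memory; larger vector databases generally rely on approximate nearest-neighbour indexes instead.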

### Login to Azure
First, you need to log in to your Azure account. Run the following command and follow the instructions it displays:
```bash
az login
```

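This is needed because the notebook authenticates to Azure services with Azure identity instead of API keys: when running locally, `DefaultAzureCredential` picks up this Azure CLI session, and once deployed to Azure the same code can rely on a Managed Identity.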

In [None]:
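# Sign in with the Azure CLI; the Azure credentials used later in the notebook reuse this session
az login | Out-Null
if ($LASTEXITCODE -ne 0) { throw "Azure login failed." }
Write-Host "Successfully logged in to Azure!"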

### Install the required NuGet packages

In [None]:
#r "nuget: Microsoft.KernelMemory.Core"
#r "nuget: dotenv.net"
#r "nuget: Azure.AI.DocumentIntelligence, 1.0.0-beta.2"
#r "nuget: Azure.Identity, 1.11.0"

### Add the necessary using statements

In [None]:
using Azure;
using Azure.Identity;
using Azure.AI.DocumentIntelligence;
using Microsoft.KernelMemory;
using dotenv.net;
using System;
using System.IO;

### Load the environment variables from the .env file
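- Rename the file `.env.sample` to `.env`
- Fill in the values the notebook reads: `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_CHAT_DEPLOYMENT`, `AZURE_OPENAI_EMBEDDING_DEPLOYMENT` and `AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT`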

In [None]:
DotEnv.Load();
var env = DotEnv.Read();
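
The cells below read four keys from `.env`. As an illustration of what a filled-in file can look like, and of a quick way to catch a missing key before an Azure call fails, here is a self-contained sketch. `ParseDotEnv` is a hypothetical, deliberately simplified parser (it is not how dotenv.net works internally), and the values are placeholders for your own resources.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical .env content with placeholder values.
// AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT is intentionally missing to show the check.
var lines = new[]
{
    "# Azure OpenAI",
    "AZURE_OPENAI_ENDPOINT=https://<your-openai-resource>.openai.azure.com/",
    "AZURE_OPENAI_CHAT_DEPLOYMENT=<chat-deployment-name>",
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT=<embedding-deployment-name>",
};

// Simplified KEY=VALUE parser: skips blank lines and '#' comments, splits on the first '='.
static Dictionary<string, string> ParseDotEnv(IEnumerable<string> source)
{
    var result = new Dictionary<string, string>();
    foreach (var raw in source)
    {
        var line = raw.Trim();
        if (line.Length == 0 || line.StartsWith("#")) continue;
        int eq = line.IndexOf('=');
        if (eq <= 0) continue; // not a KEY=VALUE line
        result[line[..eq].Trim()] = line[(eq + 1)..].Trim();
    }
    return result;
}

// Keys read by the cells of this notebook.
string[] requiredKeys =
{
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_CHAT_DEPLOYMENT",
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT",
    "AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT",
};

var parsed = ParseDotEnv(lines);
var missing = requiredKeys
    .Where(key => !parsed.TryGetValue(key, out var value) || string.IsNullOrWhiteSpace(value))
    .ToList();
Console.WriteLine(missing.Count == 0 ? "All required keys are set." : "Missing: " + string.Join(", ", missing));
```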

### Set up Kernel Memory in serverless (in-process) mode with a volatile in-memory vector database

In [None]:
var endpoint = env["AZURE_OPENAI_ENDPOINT"];

// Chat model used to generate answers; authenticates with Azure identity (no API key)
var chatConfig = new AzureOpenAIConfig(){
    APIType = AzureOpenAIConfig.APITypes.ChatCompletion,
    Auth = AzureOpenAIConfig.AuthTypes.AzureIdentity,
    Endpoint = endpoint,
    Deployment = env["AZURE_OPENAI_CHAT_DEPLOYMENT"],
};

// Embedding model used to vectorize document partitions and questions
var embeddingConfig = new AzureOpenAIConfig(){
    APIType = AzureOpenAIConfig.APITypes.EmbeddingGeneration,
    Auth = AzureOpenAIConfig.AuthTypes.AzureIdentity,
    Endpoint = endpoint,
    Deployment = env["AZURE_OPENAI_EMBEDDING_DEPLOYMENT"],
};

// No vector database or storage is configured explicitly, so KM uses its volatile
// in-memory defaults; everything is lost when the notebook kernel restarts
var memory = new KernelMemoryBuilder()
    .WithAzureOpenAITextGeneration(chatConfig)
    .WithAzureOpenAITextEmbeddingGeneration(embeddingConfig)
    .Build<MemoryServerless>();

### Initialize the Azure AI Document Intelligence client used to run OCR on the PDF files

In [None]:
var docIntelEndpoint = env["AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT"];
// DefaultAzureCredential reuses the `az login` session when running locally
var credential = new DefaultAzureCredential();
var docIntelClient = new DocumentIntelligenceClient(new Uri(docIntelEndpoint), credential);

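### Perform OCR on the PDF files
- Find every PDF under the data folder, including subfolders
- Skip PDFs that already have a Markdown file, and run OCR on the others with the `prebuilt-layout` model
- Save each OCR result as a Markdown file next to its PDF (`<name>.pdf.md`)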

In [None]:
string folderPath = "../data";
// Find every PDF under the data folder, including subfolders
string[] pdfFiles = Directory.GetFiles(folderPath, "*.pdf", SearchOption.AllDirectories);

foreach (var pdfFile in pdfFiles)
{
    // The Markdown output is stored next to the PDF, e.g. Benefit_Options.pdf -> Benefit_Options.pdf.md
    var markdownFilePath = $"{pdfFile}.md";
    if (File.Exists(markdownFilePath))
    {
        Console.WriteLine($"Skipping {pdfFile} because it already has a markdown file");
        continue; // move on to the next PDF instead of leaving the loop
    }

    try
    {
        // Send the PDF bytes inline (base64) to the prebuilt layout model
        using var fileStream = File.OpenRead(pdfFile);
        var binaryData = BinaryData.FromStream(fileStream);
        var analyzeRequest = new AnalyzeDocumentContent
        {
            Base64Source = binaryData
        };
        // Wait for the long-running analysis to complete and request Markdown output,
        // which keeps the document structure (headings, tables) as text
        var result = await docIntelClient.AnalyzeDocumentAsync(
            WaitUntil.Completed,
            "prebuilt-layout",
            analyzeRequest: analyzeRequest,
            outputContentFormat: ContentFormat.Markdown);
        var markdownContent = result.Value.Content;
        await File.WriteAllTextAsync(markdownFilePath, markdownContent);
        Console.WriteLine($"Created: {markdownFilePath}");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"OCR failed for {pdfFile}: {ex.Message}");
    }
}
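
The skip check makes this cell cheap to re-run: only PDFs without a sibling `<name>.pdf.md` file are sent to Document Intelligence. Pulled out as a pure function, the selection logic can be checked without calling Azure. `PdfsNeedingOcr` is a hypothetical helper, and the paths and the set of existing Markdown files are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns the PDFs that still need OCR, i.e. those without a sibling "<name>.pdf.md" file.
// The existence check is passed in so the logic can run without touching the disk.
static List<string> PdfsNeedingOcr(IEnumerable<string> pdfFiles, Func<string, bool> markdownExists) =>
    pdfFiles.Where(pdf => !markdownExists($"{pdf}.md")).ToList();

// Illustrative paths; only the first PDF already has its Markdown output.
var pdfs = new[] { "../data/Benefit_Options.pdf", "../data/employee_handbook.pdf", "../data/role_library.pdf" };
var existing = new HashSet<string> { "../data/Benefit_Options.pdf.md" };

var todo = PdfsNeedingOcr(pdfs, path => existing.Contains(path));
foreach (var pdf in todo)
{
    Console.WriteLine($"Needs OCR: {pdf}");
}
```

In the notebook itself, the predicate would simply be `File.Exists`.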

### Load the Markdown files into Kernel Memory

In [None]:
string[] markdownFiles = Directory.GetFiles(folderPath, "*.md", SearchOption.AllDirectories);
foreach (string filePath in markdownFiles)
{
    string fileName = Path.GetFileName(filePath);
    string fullPath = Path.GetFullPath(filePath);

    // Run KM's ingestion pipeline on the file; the file name doubles as the document ID,
    // which is why the sources in the answer below look like "Benefit_Options.pdf.md"
    await memory.ImportDocumentAsync(fullPath, documentId: fileName);

    Console.WriteLine($"Successfully imported: {fileName}");
}
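
`ImportDocumentAsync` runs Kernel Memory's ingestion pipeline on each file: broadly, it extracts the text, splits it into partitions (chunks), generates an embedding for each partition and saves the records in the vector database. To picture the partitioning step, here is a simplified word-based chunker with overlap. `ChunkByWords` is a hypothetical helper, not KM's partitioner, which counts tokens rather than words and is configurable.

```csharp
using System;
using System.Collections.Generic;

// Splits text into chunks of at most `maxWords` words. The last `overlap` words of each
// chunk are repeated at the start of the next one, so context spanning a boundary is kept.
static List<string> ChunkByWords(string text, int maxWords, int overlap)
{
    if (maxWords <= 0 || overlap < 0 || overlap >= maxWords)
        throw new ArgumentException("Require maxWords > 0 and 0 <= overlap < maxWords.");

    var words = text.Split(new[] { ' ', '\n', '\r', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    var chunks = new List<string>();
    int step = maxWords - overlap;
    for (int start = 0; start < words.Length; start += step)
    {
        int count = Math.Min(maxWords, words.Length - start);
        chunks.Add(string.Join(" ", words, start, count));
        if (start + count >= words.Length) break; // this chunk already reaches the end
    }
    return chunks;
}

var sample = "one two three four five six seven eight nine ten";
foreach (var chunk in ChunkByWords(sample, maxWords: 4, overlap: 1))
{
    Console.WriteLine(chunk);
}
```

Larger chunks carry more context per embedding but make retrieval less precise; the overlap trades a little duplicated storage for fewer facts split across two chunks.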

### Utility function that asks a question to the memory and prints the answer generated by the Azure OpenAI model, along with its sources

In [None]:
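async Task AskQuestionAsync(string question)
{
    // Retrieve the relevant partitions from memory and let the chat model answer from them
    var answer = await memory.AskAsync(question);
    Console.WriteLine($"Question: {question}\n\nAnswer: {answer.Result}");

    Console.WriteLine("\nSources:");

    // Each citation points to an imported document: show its name, link and last update date
    foreach (var source in answer.RelevantSources)
    {
        Console.WriteLine($"  - {source.SourceName}  - {source.Link} [{source.Partitions.First().LastUpdate:D}]");
    }
}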
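
`AskAsync` covers both halves of RAG: it retrieves the partitions most relevant to the question and asks the chat model to answer from them, returning the answer together with the citations that `AskQuestionAsync` prints. The generation half hinges on a grounded prompt. The sketch below shows the general idea with a hypothetical `BuildGroundedPrompt` helper; Kernel Memory uses its own prompt template, whose wording differs, and the sample fact is taken from the output further down.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Builds a prompt that asks the model to answer only from the retrieved facts.
// Each fact keeps its source name so the answer can be traced back to a document.
static string BuildGroundedPrompt(string question, IEnumerable<(string Source, string Text)> facts)
{
    var sb = new StringBuilder();
    sb.AppendLine("Answer the question using only the facts below.");
    sb.AppendLine("If the facts do not contain the answer, reply: I don't know.");
    sb.AppendLine();
    sb.AppendLine("Facts:");
    foreach (var (source, text) in facts)
    {
        sb.AppendLine($"- [{source}] {text}");
    }
    sb.AppendLine();
    sb.AppendLine($"Question: {question}");
    sb.Append("Answer:");
    return sb.ToString();
}

// One retrieved fact, standing in for the partitions a vector search would return.
var facts = new List<(string Source, string Text)>
{
    ("Benefit_Options.pdf.md", "Northwind Standard, Employee Only: $45.00 per paycheck."),
};
var prompt = BuildGroundedPrompt("what's the employee's cost per paycheck?", facts);
Console.WriteLine(prompt);
```

Telling the model to admit when the facts are insufficient is what keeps it from answering from its general knowledge, which matters for the question below: before OCR, the relevant numbers were only in an image and never reached the prompt.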

### The question that previously could not be answered because the answer is in an image
- Now that the PDFs have gone through OCR with Markdown output, the content of the image is indexed as text, so asking the question again returns the answer
![Employee's cost per paycheck](../docs/images/00_cost_per_employee.png)


In [None]:
await AskQuestionAsync("what's the employee's cost per paycheck?");

Question: what's the employee's cost per paycheck?

Answer: The employee's cost per paycheck for the Northwind Standard and Northwind Health Plus plans are as follows:

- Northwind Standard:
  - Employee Only: $45.00
  - Employee +1: $65.00
  - Employee +2 or more: $78.00

- Northwind Health Plus:
  - Employee Only: $55.00
  - Employee +1: $71.00
  - Employee +2 or more: $89.00

Sources:
  - Benefit_Options.pdf.md  - default/Benefit_Options.pdf.md/a0b4db647ee34e4d8c8c3e3c65acaea8 [Tuesday, April 9, 2024]
  - employee_handbook.pdf.md  - default/employee_handbook.pdf.md/a18412aa77dc436d86b5455adcd2c8aa [Tuesday, April 9, 2024]
  - role_library.pdf.md  - default/role_library.pdf.md/f686ba0e3def43c8843685e31efd9c74 [Tuesday, April 9, 2024]
