## Advanced Text Processing and Retrieval-Augmented Generation with Azure AI Search and C# in .NET

This Jupyter notebook offers a concise exploration into advanced text processing and retrieval-augmented generation (RAG) using C# and Azure AI Search. Key highlights include:

- **Text Chunking**: Explains breaking down large texts into smaller chunks for efficient language model processing.
- **Text Embeddings Creation**: Shows how to generate text embeddings with the Azure OpenAI service.
- **Vector Index Setup on Azure AI Search**: Shows how to create a vector index on Azure AI Search for enhanced search capabilities.
- **Uploading Embeddings**: Covers the process of uploading text embeddings to Azure AI Search.
- **Vector Similarity Searches**: Showcases conducting searches using vector similarity in Azure AI Search.
- **Retrieval-Augmented Generation with GPT**: Demonstrates enhancing GPT model responses using external data and RAG techniques.

This notebook acts as a hands-on tutorial for individuals interested in implementing Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs) in a .NET environment.

### Setup

Create a .env file with the below structure. The values used in the sample below are for demonstration purposes only. Please replace them with your own values.

```
OPENAI_API_KEY="9f2b47e4c8a5461db2e4f3a1b517f2cd"
OPENAI_API_BASE="https://openai-custom-url.openai.azure.com"
OPENAI_API_VERSION="2022-12-01"
OPENAI_API_TYPE="azure"
OPENAI_CHAT_DEPLOYMENT_NAME="chat-deployment-example"
OPENAI_CHAT_API_VERSION="2023-03-15-preview"
AZURE_SEARCH_SERVICE_ENDPOINT="https://examplesearchservice.search.windows.net"
AZURE_SEARCH_ADMIN_KEY="3pR4x7q9Yt0HlZ5m8nB2UaX1wQ6cD8eFgHiJkLmNoPqRsTcUvWZg"
```

Install and configure the Polyglot Notebooks extension for VS Code. This extension allows you to run C# code in Jupyter notebooks. For more information, please visit [Polyglot Notebooks](https://marketplace.visualstudio.com/items?itemName=donjayamanne.polyglot).

In [None]:
using System.IO;
using System.Collections.Generic;

// Function to read environment variables from a .env file.
Dictionary<string, string> ReadEnvFile(string filePath)
{
    var dict = new Dictionary<string, string>();
    foreach (var line in File.ReadAllLines(filePath))
    {
        var parts = line.Split('=', 2);
        if (parts.Length == 2)
        {
            var key = parts[0].Trim();
            var value = parts[1].Trim().Trim('"'); // Remove any double quotes
            dict[key] = value;
        }
    }
    return dict;
}

// Read the environment variables from the .env file
var envVars = ReadEnvFile(".env");

// Retrieve the OpenAI API base URL and key from the environment variables
string endpoint = envVars["OPENAI_API_BASE"];
string apiKey = envVars["OPENAI_API_KEY"];
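
One possible refinement, sketched under assumptions: `ReadEnvFile` above treats a comment line that happens to contain `=` as a regular entry, and a missing key only surfaces as a `KeyNotFoundException` when it is first looked up. The helpers below, `ParseEnvLines` and `FindMissingKeys`, are hypothetical names that are not part of the original notebook. They skip blank lines and `#` comments and report missing keys up front.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not in the original notebook): parses .env-style lines,
// skipping blank lines and lines that start with '#'.
Dictionary<string, string> ParseEnvLines(IEnumerable<string> lines)
{
    var dict = new Dictionary<string, string>();
    foreach (var rawLine in lines)
    {
        var line = rawLine.Trim();
        if (line.Length == 0 || line.StartsWith("#")) continue;

        var parts = line.Split('=', 2);
        if (parts.Length != 2) continue;

        // Trim whitespace around key and value, and strip surrounding double quotes.
        dict[parts[0].Trim()] = parts[1].Trim().Trim('"');
    }
    return dict;
}

// Hypothetical helper: returns the required keys that are absent or empty.
List<string> FindMissingKeys(Dictionary<string, string> vars, params string[] requiredKeys)
{
    return requiredKeys
        .Where(k => !vars.TryGetValue(k, out var v) || string.IsNullOrWhiteSpace(v))
        .ToList();
}

// Demo input only; in the notebook the lines would come from File.ReadAllLines(".env").
var sampleLines = new[]
{
    "# demo values only",
    "OPENAI_API_BASE=\"https://openai-custom-url.openai.azure.com\"",
    "AZURE_SEARCH_SERVICE_ENDPOINT = \"https://examplesearchservice.search.windows.net\"",
};
var sampleVars = ParseEnvLines(sampleLines);
var missingKeys = FindMissingKeys(sampleVars, "OPENAI_API_BASE", "OPENAI_API_KEY");
Console.WriteLine($"Missing keys: {string.Join(", ", missingKeys)}");
```

Checking the required keys once, right after loading, makes a misconfigured `.env` file easier to diagnose than a lookup failure several cells later.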

### Text Chunking with Semantic Kernel

Text chunking, in the context of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG), is the process of breaking a large piece of text into smaller, manageable parts, or "chunks," so that the model can process it efficiently. Chunk size is tied to the model's context window, which defines how much text the model can consider at one time. Dividing text into chunks that fit within this window lets the model process each segment without losing information that would be cut off if the text exceeded the window. It is worth exploring different chunking strategies, because chunks that preserve the meaning and coherence of the text help the model generate more accurate, contextually rich responses.
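
To make the idea concrete, here is a minimal sketch of one simple strategy: fixed-size chunks with a small overlap, so that text near a chunk boundary appears in both neighbours. It counts whitespace-separated words rather than model tokens, and `ChunkByWords` is a hypothetical helper for illustration. It is not how Semantic Kernel's `TextChunker` works internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: splits text into chunks of at most maxWords words,
// repeating the last overlapWords words of each chunk at the start of the next.
List<string> ChunkByWords(string text, int maxWords, int overlapWords)
{
    if (maxWords <= 0) throw new ArgumentOutOfRangeException(nameof(maxWords));
    if (overlapWords < 0 || overlapWords >= maxWords)
        throw new ArgumentOutOfRangeException(nameof(overlapWords));

    var words = text.Split(new[] { ' ', '\n', '\r', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    var chunks = new List<string>();
    int step = maxWords - overlapWords; // how far the window advances each time

    for (int start = 0; start < words.Length; start += step)
    {
        chunks.Add(string.Join(" ", words.Skip(start).Take(maxWords)));
        if (start + maxWords >= words.Length) break; // this chunk reached the end of the text
    }
    return chunks;
}

// Demo: ten words, chunks of four words, one word of overlap.
var sampleText = "one two three four five six seven eight nine ten";
foreach (var chunk in ChunkByWords(sampleText, maxWords: 4, overlapWords: 1))
{
    Console.WriteLine(chunk);
}
```

Word counts are only a rough proxy for tokens. A production pipeline would typically measure chunk size with the tokenizer of the target model and prefer splitting at sentence or paragraph boundaries.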

In [None]:
#r "nuget: Microsoft.SemanticKernel, 1.0.1" 

In this section, we're loading a local text file named "azure-functions-June-2023-Updates.txt," which serves as an example of an external data source. We're dividing the content of this file into segments, first into lines of at most 128 tokens and then into paragraphs of at most 250 tokens. It's advisable to experiment with these parameters to determine the optimal settings for your specific use cases and data sources.

In [None]:
using Microsoft.SemanticKernel.Text;

// Read the entire content of the RAG update sample file 
string filePath = "azure-functions-June-2023-Updates.txt";
string updateText = await File.ReadAllTextAsync(filePath);

// Disable warning SKEXP0055 
// 'Microsoft.SemanticKernel.Text.TextChunker' is for evaluation purposes only 
// and is subject to change or removal in future updates.
#pragma warning disable SKEXP0055 

// Split the update text into paragraphs
// MaxTokensPerLine is set to 128 and MaxTokensPerParagraph is set to 250
List<string> paragraphs = TextChunker.SplitPlainTextParagraphs(
    TextChunker.SplitPlainTextLines(updateText, 128), //MaxTokensPerLine
    250 //MaxTokensPerParagraph
);

// Re-enable warning SKEXP0055
#pragma warning restore SKEXP0055 

Console.WriteLine($"Number of chunks: {paragraphs.Count}");

### Create Embedding

Embeddings are numerical vectors that capture the meaning and context of a piece of text. They are produced by an embedding model, in this notebook one deployed on the Azure OpenAI service. Once created, they are stored in a vector database, which enables semantic and vector searches that retrieve information closely related to a given prompt.

In [None]:
// Import required packages.
#r "nuget: Azure.AI.OpenAI, 1.0.0-beta.12" 
#r "nuget: Azure"

In this step, we are creating embeddings for each text chunk we have previously segmented. Both the embeddings and their corresponding original text segments are being compiled into an in-memory collection of documents. This compiled list will subsequently be stored in a vector database in the following step.

In [None]:
using Azure;
using Azure.AI.OpenAI;
using System.Linq;

AzureKeyCredential credentials = new (apiKey);
OpenAIClient openAIClient = new (new Uri(endpoint), credentials);

// Initialize a list to hold the embedding documents
List<Dictionary<string, object>> inputDocuments = new();

// Iterate over each paragraph in the chunks collection
foreach (var paragraph in paragraphs)
{
    // Initialize a new dictionary to hold the current embedding document
    Dictionary<string, object> currentDocument = new();

    EmbeddingsOptions embeddingOptions = new()
    {
        // Specify the deployment name for the embedding model
        DeploymentName = "text-embedding-ada-002",
        Input = { paragraph },
    };

    // Get the embeddings for the current paragraph
    var returnValue = openAIClient.GetEmbeddings(embeddingOptions);
    float[] embeddingVector = returnValue.Value.Data[0].Embedding.ToArray();
    
    // Add the paragraph and its corresponding embeddings to the current document
    currentDocument["id"] = Guid.NewGuid().ToString();
    currentDocument["content"] = paragraph;
    currentDocument["contentVector"] = embeddingVector;
    inputDocuments.Add(currentDocument);
}

// Get the embedding of the first document in the list and print it.
float[] firstDocumentVector = (float[])inputDocuments.First()["contentVector"];
string embeddingString = String.Join(", ", firstDocumentVector);
Console.WriteLine(embeddingString);

### Create Vector Index on Azure AI Search

Vector search is a method in information retrieval that utilizes numerical representations of content for search applications. In this approach, the content is represented in numeric form, allowing the search engine to identify and match vectors that most closely resemble the query. This method does not rely on exact term matching, as it operates on the principle of similarity between vectors.
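
As an illustration of "similarity between vectors," the sketch below computes cosine similarity, one commonly used metric, and ranks two toy documents against a query. The three-dimensional vectors and the `CosineSimilarity` helper are made up for the example. Real embeddings from `text-embedding-ada-002` have 1,536 dimensions, and the search service uses an approximate nearest-neighbor index (HNSW) rather than comparing against every document.

```csharp
using System;

// Hypothetical helper: cosine similarity between two vectors of equal length.
// Returns a value in [-1, 1]; higher means the vectors point in more similar directions.
double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    // Treat a zero vector as having no similarity to anything.
    if (normA == 0 || normB == 0) return 0;
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Toy vectors for illustration only.
float[] queryVector = { 1f, 0f, 0f };
float[] docCloseVector = { 0.9f, 0.1f, 0f };
float[] docFarVector = { 0f, 1f, 0f };

Console.WriteLine($"close doc: {CosineSimilarity(queryVector, docCloseVector):F3}");
Console.WriteLine($"far doc:   {CosineSimilarity(queryVector, docFarVector):F3}");
```

The document whose vector points in nearly the same direction as the query scores close to 1, while the orthogonal one scores 0, regardless of whether the two texts share any exact words.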

Recently, Azure AI Search (formerly known as Azure Cognitive Search) has introduced vector search as a new feature. This capability enhances Azure AI Search by enabling the indexing, storage, and retrieval of vector embeddings directly from a search index.

In [None]:
#r "nuget: Azure.Search.Documents, 11.5.1"
#r "nuget: Azure.Identity, 1.10.4"

In this step, we're setting up a vector index named 'vectorindex'. This index will consist of three fields:

1. **ID:** Serves as a unique identifier for each document.
2. **Content:** Contains the original text of each document. It's crucial to store the original text because we cannot reconstruct the original text from its embedding.
3. **ContentVector:** Stores the embedding vector generated for each text chunk.

We designate the 'Content' field as a SearchableField in anticipation of performing a hybrid search. Hybrid search is an advanced technique that combines traditional text search with vector search in a single query. Text search operates on plain text in 'searchable' and 'filterable' fields, while vector search applies to the content in vector fields.

Real-world and benchmark dataset [tests have shown](https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-cognitive-search-outperforming-vector-search-with-hybrid/ba-p/3929167) that hybrid retrieval, which incorporates semantic ranking, significantly enhances search relevance, offering a robust approach to information retrieval.

In [None]:
using Azure.Search.Documents;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;
using Azure.Search.Documents.Models;
using Azure;

// Define the Azure Search service endpoint and admin key
string serviceEndpoint = envVars["AZURE_SEARCH_SERVICE_ENDPOINT"];
string searchAdminKey = envVars["AZURE_SEARCH_ADMIN_KEY"];

string indexName = "vectorindex";
var searchCredential = new AzureKeyCredential(searchAdminKey);
var indexClient = new SearchIndexClient(new Uri(serviceEndpoint), searchCredential);
var searchClient = indexClient.GetSearchClient(indexName);

// Define the vector search profile and HNSW configuration. We will use the default values.
string vectorSearchProfile = "my-vector-profile";
string vectorSearchHnswConfig = "my-hnsw-vector-config";

// Create a new SearchIndex Definition
SearchIndex searchIndex = new(indexName)
{
    VectorSearch = new()
    {
        Profiles =
        {
            new VectorSearchProfile(vectorSearchProfile, vectorSearchHnswConfig)
        },
        Algorithms =
        {
            new HnswAlgorithmConfiguration(vectorSearchHnswConfig)
        }
    },
    Fields =
    {
        new SimpleField("id", SearchFieldDataType.String) 
        { 
            IsKey = true, 
            IsFilterable = true, 
            IsSortable = true
        },
        new SearchableField("content") 
        { 
            IsFilterable = true 
        },
        new SearchField("contentVector", SearchFieldDataType.Collection(SearchFieldDataType.Single))
        {
            IsSearchable = true,
            // Azure OpenAI model, text-embedding-ada-002 with 1,536 dimensions means one document would consume 1,536 floats.
            VectorSearchDimensions = 1536,
            VectorSearchProfileName = vectorSearchProfile
        }
    }
};

indexClient.CreateOrUpdateIndex(searchIndex);

### Upload Embeddings to Azure AI Search

In this phase, we are uploading both the embeddings and the original text to our index. This operation is flexible and can be repeated as often as necessary, allowing for the continuous integration of new information and data into the index. This adaptability ensures that the index remains up-to-date and reflective of the latest content and insights.

In [None]:
await searchClient.IndexDocumentsAsync(IndexDocumentsBatch.Upload(inputDocuments));
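
The sample file produces only a handful of chunks, so a single upload call is enough here. For larger corpora, uploads are usually split into batches. The sketch below shows one way to do that. `InBatches` is a hypothetical helper, and the batch size of 1,000 is an assumption based on the documented per-request limit at the time of writing, so verify it against the current Azure AI Search service limits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: lazily yields the items in lists of at most batchSize elements.
IEnumerable<List<T>> InBatches<T>(IEnumerable<T> items, int batchSize)
{
    var batch = new List<T>();
    foreach (var item in items)
    {
        batch.Add(item);
        if (batch.Count == batchSize)
        {
            yield return batch;
            batch = new List<T>();
        }
    }
    if (batch.Count > 0) yield return batch; // final, partially filled batch
}

// Demo with plain integers standing in for documents.
const int assumedMaxBatchSize = 1000; // assumption: check the current service limits
var demoDocuments = Enumerable.Range(0, 2500);
foreach (var batch in InBatches(demoDocuments, assumedMaxBatchSize))
{
    // In the notebook, this is where each batch would be uploaded, for example:
    // await searchClient.IndexDocumentsAsync(IndexDocumentsBatch.Upload(batch));
    Console.WriteLine($"Batch of {batch.Count} documents");
}
```

With 2,500 stand-in documents, the loop produces two full batches and one partial batch.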

### Perform a Vector Similarity Search

Now we're moving on to the testing phase. We'll use a sample query to search within our vector database, aiming to find information relevant to the query. This process is a key part of implementing the Retrieval-Augmented Generation (RAG) pipeline. By doing so, we'll be able to enhance our prompt with contextually relevant data, thereby leveraging the full potential of the RAG model to generate more accurate and context-aware responses.

In [None]:
var query = "Can you provide the timestamp for the most recent information you have on Azure Functions? Please specify the date and time up to your last update. Give me only the date.";

Embeddings transform the prompt into a numerical representation, a process known as feature extraction. This numerical form enables easy comparison and retrieval from a vector database. Crucially, embeddings capture the semantic essence of the prompt, allowing the system to identify relevant information that shares a similar meaning, even if the exact wording of the prompt isn't present in the database.

In the following step, we create an embedding for our original prompt to facilitate the retrieval of pertinent information from the vector database. We aim to fetch the top 3 most relevant results, storing them in memory for subsequent use.

It's important to note that our approach here is not limited to vector search alone; we are employing a **hybrid search** method. This involves passing the prompt text as the search text to `SearchAsync` and supplying the prompt's embedding as a `VectorizedQuery` in `SearchOptions`. Such a setup conducts a vector search on the 'contentVector' field and a text search on the 'content' field, thereby leveraging the strengths of both search methodologies for [more effective and comprehensive retrieval](https://learn.microsoft.com/en-us/azure/search/hybrid-search-overview#how-does-hybrid-search-work).
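
According to the hybrid search overview linked above, the service merges the text and vector result lists with Reciprocal Rank Fusion (RRF). The sketch below illustrates the idea on two made-up rankings. `FuseRankings` is a hypothetical helper, the document ids are placeholders, and the constant `k = 60` is a commonly cited default that is assumed here, not a value taken from the service.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: fuses several ranked lists of document ids with RRF.
// Each document scores the sum of 1 / (k + rank) over the lists that contain it,
// where rank starts at 1 for the top result of a list.
List<(string Id, double Score)> FuseRankings(IEnumerable<IReadOnlyList<string>> rankings, int k = 60)
{
    var scores = new Dictionary<string, double>();
    foreach (var ranking in rankings)
    {
        for (int i = 0; i < ranking.Count; i++)
        {
            scores.TryGetValue(ranking[i], out var current); // current is 0 if the id is new
            scores[ranking[i]] = current + 1.0 / (k + i + 1);
        }
    }
    return scores
        .OrderByDescending(pair => pair.Value)
        .Select(pair => (Id: pair.Key, Score: pair.Value))
        .ToList();
}

// Placeholder rankings: one from a text query, one from a vector query.
var textRanking = new[] { "doc-a", "doc-b", "doc-c" };
var vectorRanking = new[] { "doc-c", "doc-a", "doc-d" };
var fused = FuseRankings(new List<IReadOnlyList<string>> { textRanking, vectorRanking });

foreach (var (id, score) in fused)
{
    Console.WriteLine($"{id}: {score:F4}");
}
```

Documents that appear near the top of both lists (`doc-a` and `doc-c` here) rise above documents that only one of the two searches found, which is the intuition behind hybrid retrieval.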

In [None]:
// Generate the embedding for the query  
EmbeddingsOptions embeddingOptions = new()
{
    DeploymentName = "text-embedding-ada-002",
    Input = { query },
};
var returnValue = openAIClient.GetEmbeddings(embeddingOptions);
float[] queryEmbeddings = returnValue.Value.Data[0].Embedding.ToArray();

// Configure the vector part of the hybrid query (the text part is passed to SearchAsync below)
var searchOptions = new SearchOptions
{
    VectorSearch = new()
    {
        Queries = { new VectorizedQuery(queryEmbeddings.ToArray()) { KNearestNeighborsCount = 3, Fields = { "contentVector" } } }
    },
    Size = 3,
    Select = { "content" },
};

// Initialize a list to store the search result documents for future RAG use.
List<SearchDocument> searchDocuments = new List<SearchDocument>();

// Perform the search and get the response
SearchResults<SearchDocument> response = await searchClient.SearchAsync<SearchDocument>(query, searchOptions);

await foreach (SearchResult<SearchDocument> result in response.GetResultsAsync())
{
    searchDocuments.Add(result.Document);
    Console.WriteLine($"Score: {result.Score}\n");
    Console.WriteLine($"Content: {result.Document["content"]}\n");
}

Console.WriteLine($"Total Results: {searchDocuments.Count}");

### Retrieval-Augmented Generation (RAG) - Standard GPT Output

In this step, we're conducting a test to assess the baseline performance of the Large Language Model (LLM) without incorporating additional relevant information into the system prompt. This test will provide a standard output from the LLM, serving as a reference point to evaluate the impact and effectiveness of adding contextually relevant data from our vector database in subsequent steps. 

In [None]:
string chatDeploymentName = envVars["OPENAI_CHAT_DEPLOYMENT_NAME"]; 

var chatCompletionsOptions = new ChatCompletionsOptions()
{
    DeploymentName = chatDeploymentName, 
    Messages =
    {
        new ChatRequestSystemMessage("You are a helpful assistant and always tell the truth. You dont talk much."),
        new ChatRequestUserMessage(query)
    },
    MaxTokens = 100
};

Response<ChatCompletions> response = openAIClient.GetChatCompletions(chatCompletionsOptions);

Console.WriteLine(response.Value.Choices[0].Message.Content);

### Retrieval-Augmented Generation (RAG) - Augmented GPT Output

At this final stage, we insert the original text of the top-ranked search result (not its embedding) into the system prompt. This enriches the Large Language Model (LLM) with pertinent and current information that it can draw on when responding to the user prompt, improving the accuracy and relevance of its response. This illustrates the practical application of the Retrieval-Augmented Generation (RAG) approach: external data retrieved at query time lets the model give informed, context-aware replies.

In [None]:
string firstDocumentContent = searchDocuments[0]["content"].ToString();

var chatCompletionsOptions = new ChatCompletionsOptions()
{
    DeploymentName = chatDeploymentName, 
    Messages =
    {
        new ChatRequestSystemMessage($"You are a helpful assistant and always tell the truth. You dont talk much. Here is what you know : {firstDocumentContent}"),
        new ChatRequestUserMessage(query)
    },
    MaxTokens = 100
};

Response<ChatCompletions> response = openAIClient.GetChatCompletions(chatCompletionsOptions);

Console.WriteLine(response.Value.Choices[0].Message.Content);
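
The cell above passes only the top-ranked chunk to the model, although the search can return up to three. A minimal sketch for combining several retrieved chunks into one system prompt follows. `BuildGroundedSystemPrompt`, the prompt wording, and the character budget are all assumptions for illustration, and the chunk strings are placeholders, not real search results.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: builds a system prompt that lists the retrieved chunks as
// numbered sources, and stops adding sources once the prompt would exceed maxChars.
string BuildGroundedSystemPrompt(IReadOnlyList<string> chunks, int maxChars)
{
    var sb = new StringBuilder();
    sb.AppendLine("You are a helpful assistant. Answer using only the sources below.");
    sb.AppendLine("If the sources do not contain the answer, say you don't know.");

    for (int i = 0; i < chunks.Count; i++)
    {
        var entry = $"[Source {i + 1}] {chunks[i]}";
        if (sb.Length + entry.Length > maxChars) break; // crude budget, in characters
        sb.AppendLine(entry);
    }
    return sb.ToString();
}

// Placeholder chunks; in the notebook they could come from searchDocuments, e.g.
// searchDocuments.Select(d => d["content"]?.ToString() ?? "").ToList()
var retrievedChunks = new List<string>
{
    "first retrieved chunk (placeholder)",
    "second retrieved chunk (placeholder)",
    "third retrieved chunk (placeholder)",
};

string groundedPrompt = BuildGroundedSystemPrompt(retrievedChunks, maxChars: 4000);
Console.WriteLine(groundedPrompt);
```

Numbering the sources makes it possible to ask the model to cite which chunk it used, and the budget keeps the combined prompt from growing past what the model's context window can hold. A character count is only a rough stand-in for a token count.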