# 02 RAG | 03 Vector DB | 01 Azure AI Search

## Azure Environment

The necessary parameters are imported from [./Configuration/application.env]. See [Create Environment](../../01_CreateEnvironment/01_Environment.ipynb) to set up the required demo environment.

## Step 1: Create Azure AI Search - Search Index Client

Azure AI Search is used as the vector database to store and query vectors. The Azure AI Search SDK provides a `SearchIndexClient`, which can be used to create search indexes.

In [1]:
#r "nuget: Azure.Search.Documents, 11.6.0-beta.4"
#r "nuget: DotNetEnv, 2.5.0"

using Azure; 
using DotNetEnv; 
using System.IO;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

//configuration file is created during environment creation
static string configurationFile = @"../../Configuration/application.env";
Env.Load(configurationFile);

string searchApiKey = Environment.GetEnvironmentVariable("WS_SEARCH_APIKEY") ?? "WS_SEARCH_APIKEY not found";
string searchEndpoint = Environment.GetEnvironmentVariable("WS_SEARCH_ENDPOINT") ?? "WS_SEARCH_ENDPOINT not found";
string assetFolder = Environment.GetEnvironmentVariable("WS_ASSETS_FOLDER") ?? "WS_ASSETS_FOLDER not found";

AzureKeyCredential azureKeyCredential = new AzureKeyCredential(searchApiKey);
SearchIndexClient searchIndexClient = new SearchIndexClient(new Uri(searchEndpoint), azureKeyCredential);

Console.WriteLine("SearchIndexClient created...");


SearchIndexClient created...


## Step 2: Create Search Index

The following code cell creates a search index that uses the [Hierarchical Navigable Small World (HNSW)](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world) algorithm, an approximate nearest-neighbor method that trades a small amount of recall for fast queries. Similarity between vectors is measured with the cosine metric, which compares the angle between two vectors and ignores their magnitudes.


In [2]:
string indexName = "documenttenant"; 
string vectorSearchProfileNameHNSW = "search-profile-HNSW";
string vectorSearchConfigHNSW = "vector-config-hnsw";

int modelDimensions = 1536;

SearchIndex index = new SearchIndex(indexName)
{
    Fields =
    {
        new SimpleField("DocumentId", SearchFieldDataType.String) { IsKey = true, IsFilterable = true, IsSortable = true },
        new SearchableField("DocumentName") { IsFilterable = true, IsSortable = true},
        new SearchableField("Description") { AnalyzerName = LexicalAnalyzerName.EnLucene },
        new ComplexField("MetaInfo")
        {
            Fields =
            {
                new SearchableField("Author") { IsFilterable = true, IsSortable = true, IsFacetable = true},
                new SearchableField("CreationDate") { IsFilterable = true, IsSortable = true, IsFacetable = true }
            }
        },
        new VectorSearchField("DocumentContentHNSW", modelDimensions, vectorSearchProfileNameHNSW)
    },
    VectorSearch = new() {
        Profiles =
        {
            new VectorSearchProfile(vectorSearchProfileNameHNSW, vectorSearchConfigHNSW)
        },
        Algorithms = {
            new HnswAlgorithmConfiguration(vectorSearchConfigHNSW){
                Parameters = new HnswParameters
                {
                    EfConstruction = 200,
                    EfSearch = 200,
                    M = 4,
                    Metric = VectorSearchAlgorithmMetric.Cosine
                }
            }
        }
    }
};

await searchIndexClient.CreateOrUpdateIndexAsync(index);

Console.WriteLine($"Search index '{indexName}' created...");

Search index 'documenttenant' created...
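
To make the cosine metric configured in the index more concrete, the following is a minimal sketch of how cosine similarity can be computed for two vectors. The helper `CosineSimilarity` and the three-dimensional sample vectors are hypothetical and only serve as illustration; Azure AI Search computes the metric internally on the 1536-dimensional embeddings.

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|).
// 1 means same direction, 0 means orthogonal, -1 means opposite direction.
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same number of dimensions.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        normA += (double)a[i] * a[i];
        normB += (double)b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical sample vectors (not taken from the notebook's assets).
float[] query = { 1f, 0f, 0f };
float[] sameDirection = { 2f, 0f, 0f };
float[] orthogonal = { 0f, 3f, 0f };

Console.WriteLine(CosineSimilarity(query, sameDirection)); // 1
Console.WriteLine(CosineSimilarity(query, orthogonal));    // 0
```

The second vector is twice as long as the query but points in the same direction, so the similarity is still 1: only the angle matters, not the magnitude.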


## Step 3: Define Data Structure

To populate the Azure AI Search index, a .NET POCO class is defined. Instances of this class are filled with information (description and embeddings) and uploaded to the search index.

In [3]:
using System.Text.Json.Serialization;

public class KnowledgeDocument {

    public string DocumentId { get; set; } = Guid.NewGuid().ToString();
    public string DocumentName { get; set; } = "";
    public string Description { get; set; } = "";
    public float[] DocumentContentHNSW { get; set; } = new float[1536];
    public MetaInfo MetaInfo { get; set; } = new MetaInfo();
}

public class MetaInfo {
    public string Author { get; set; } = "";
    public string CreationDate { get; set; } = "";
}
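
The property names of `KnowledgeDocument` match the field names of the index defined in Step 2, and the nested `MetaInfo` class corresponds to the `MetaInfo` complex field. As a rough illustration of the JSON shape such a document takes, the sketch below serializes an anonymous object with the same shape. It assumes default `System.Text.Json` settings; the serializer configuration used by the SDK may differ, and all values are hypothetical.

```csharp
using System;
using System.Text.Json;

// Anonymous object mirroring the shape of KnowledgeDocument (hypothetical values,
// and only 3 vector entries instead of 1536 to keep the output short).
var document = new
{
    DocumentId = "doc-1",
    DocumentName = "Sample",
    Description = "A sample document",
    DocumentContentHNSW = new[] { 0.1f, 0.2f, 0.3f },
    MetaInfo = new { Author = "Jane Doe", CreationDate = "2024-01-01" }
};

// Property names are emitted as declared, so they line up with the index field names.
string json = JsonSerializer.Serialize(document, new JsonSerializerOptions { WriteIndented = true });
Console.WriteLine(json);
```

If a property name differed from the index field name, for example through a different casing, the field would not be populated as intended, which is why the class mirrors the index definition exactly.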



## Step 4: Define Documents

Three documents are defined:

- [TextEmbedding_WikiAKS.txt](../../Assets/Embedding/TextEmbedding_WikiAks.txt), the embedding of an article about Azure Kubernetes Service (AKS) that was downloaded from the Microsoft documentation.
- [TextEmbedding_WikiSuperBowl.txt](../../Assets/Embedding/TextEmbedding_WikiSuperBowl.txt), the embedding of an article about the 2024 Super Bowl written in ***German***, to show the language-independent behavior of embedding models.
- A statement with information about the winner of the 2024 Super Bowl.

All three vectors were created in the [01_TextEmbeddings.ipynb](../02_02_Embedding/01_TextEmbeddings.ipynb) polyglot notebook.

In [4]:

// List of Knowledge Documents
List<KnowledgeDocument> knowledgeDocuments = new List<KnowledgeDocument>(); 

// Knowledge Document: Wiki AKS 
string embeddingFileName = Path.Combine(assetFolder, "Embedding", "TextEmbedding_WikiAKS.txt");
float[] embedding = (await File.ReadAllLinesAsync(embeddingFileName)).Select(float.Parse).ToArray();

knowledgeDocuments.Add(new KnowledgeDocument()
{
    DocumentName = "Wiki-AKS",
    Description = "What is Azure Kubernetes Service (AKS)? A download from MS Learn",
    DocumentContentHNSW = embedding,
    MetaInfo = new MetaInfo()
    {
        Author = "Microsoft",
        CreationDate = "2021-10-01"
    }
});

// Knowledge Document: Wiki Super Bowl 2024
embeddingFileName = Path.Combine(assetFolder, "Embedding", "TextEmbedding_WikiSuperBowl.txt");
embedding = (await File.ReadAllLinesAsync(embeddingFileName)).Select(float.Parse).ToArray();

knowledgeDocuments.Add(new KnowledgeDocument()
{
    DocumentName = "Wiki-Superbowl",
    Description = "A German Wiki page with information about the Super Bowl 2024",
    DocumentContentHNSW = embedding,
    MetaInfo = new MetaInfo()
    {
        Author = "A Wiki Contributor",
        CreationDate = "2024-04-01"
    }
});

// Knowledge Document: statement
embeddingFileName = Path.Combine(assetFolder, "Embedding", "TextEmbedding_Statement.txt");
embedding = (await File.ReadAllLinesAsync(embeddingFileName)).Select(float.Parse).ToArray();

knowledgeDocuments.Add(new KnowledgeDocument()
{
    DocumentName = "Statement",
    Description = "A statement with information who won the Super Bowl 2024",
    DocumentContentHNSW = embedding,
    MetaInfo = new MetaInfo()
    {
        Author = "Robert",
        CreationDate = "2024-07-07"
    }
});

Console.WriteLine($"Knowledge Documents created...");


Knowledge Documents created...
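
Each embedding file is read as one floating-point value per line, and the resulting vector has to contain exactly as many values as the vector field in the index expects (`modelDimensions`, 1536 in this notebook). The following sketch shows one possible way to make both assumptions explicit. The helper `ReadEmbedding` and the sample file are hypothetical, and the sketch assumes that the values use `.` as the decimal separator.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;

// Reads one float per line and checks the vector length.
// Assumption: values use '.' as the decimal separator; blank lines are skipped.
static float[] ReadEmbedding(string path, int expectedDimensions)
{
    float[] vector = File.ReadLines(path)
        .Where(line => !string.IsNullOrWhiteSpace(line))
        .Select(line => float.Parse(line, CultureInfo.InvariantCulture))
        .ToArray();

    if (vector.Length != expectedDimensions)
        throw new InvalidDataException(
            $"Expected {expectedDimensions} values in '{path}', found {vector.Length}.");

    return vector;
}

// Hypothetical three-value file; the notebook's files hold 1536 values each.
string samplePath = Path.Combine(Path.GetTempPath(), "sample_embedding.txt");
File.WriteAllLines(samplePath, new[] { "0.25", "-0.5", "1.0" });

float[] sample = ReadEmbedding(samplePath, expectedDimensions: 3);
Console.WriteLine(sample.Length); // 3
```

Parsing with `CultureInfo.InvariantCulture` keeps the result independent of the regional settings of the machine that runs the notebook, provided the files were written in that format.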


## Step 5: Index Documents

The objects defined in the previous code cell are uploaded to the Azure AI Search index created in Step 2.


In [5]:
SearchClient searchClient = searchIndexClient.GetSearchClient(indexName);
Response<IndexDocumentsResult> response = await searchClient.UploadDocumentsAsync<KnowledgeDocument>(knowledgeDocuments);

Console.WriteLine($"Knowledge Documents uploaded / indexed...");

Knowledge Documents uploaded / indexed...


## Step 6: Query Documents

The question `Who won the Super Bowl in 2024` was embedded in a [previous notebook](../02_02_Embedding/01_TextEmbeddings.ipynb). Its vector is loaded [from this file](../../Assets/Embedding/TextEmbedding_Query.txt) and used to perform a vector search.

In [6]:
string fileNameTextEmbeddingQuery = Path.Combine(assetFolder, "Embedding", "TextEmbedding_Query.txt");
float[] queryAsVector = File.ReadAllLines(fileNameTextEmbeddingQuery).Select(float.Parse).ToArray();

SearchResults<KnowledgeDocument> searchResults = await searchClient.SearchAsync<KnowledgeDocument>(
    new SearchOptions
    {
        VectorSearch = new()
        {
            Queries = { 
                new VectorizedQuery(queryAsVector) { 
                    KNearestNeighborsCount = 2, 
                    Fields = { "DocumentContentHNSW" } 
                }
            }
        }
    });

await foreach (SearchResult<KnowledgeDocument> searchResult in searchResults.GetResultsAsync())
{
    KnowledgeDocument knowledgeDocument = searchResult.Document;
    Console.WriteLine($"{knowledgeDocument.DocumentId}: {knowledgeDocument.Description}");
}


2aa2ccdc-a882-4cf6-8e87-47f4c868aef0: A statement with information who won the Super Bowl 2024
7c53041b-898f-4348-82a6-43ac71ceaa09: A German Wiki page with information about the Super Bowl 2024


## Step 7: Understand the results

The vector search executed by Azure AI Search returns the statement `The Kansas City Chiefs won the Super Bowl in 2024` as the closest match to the query, followed by the [Wikipedia information about the Super Bowl 2024](../../assets/Embedding/WikiSuperBowl2024.txt). Because `KNearestNeighborsCount` is set to 2, the AKS document is not part of the result.
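
Conceptually, a k-nearest-neighbor query ranks all documents by their similarity to the query vector and keeps the best `k`. The sketch below does this exhaustively with cosine similarity. It is not how Azure AI Search works internally, since HNSW approximates this ranking without comparing every vector, and the two-dimensional vectors are hypothetical stand-ins for the real embeddings.

```csharp
using System;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|).
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        normA += (double)a[i] * a[i];
        normB += (double)b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical 2-dimensional vectors; the names mirror the documents in this notebook.
var documents = new (string Name, float[] Vector)[]
{
    ("Statement",      new[] { 0.9f, 0.1f }),
    ("Wiki-Superbowl", new[] { 0.7f, 0.7f }),
    ("Wiki-AKS",       new[] { 0.0f, 1.0f }),
};
float[] queryVector = { 1.0f, 0.0f };

// Exhaustive search: score every document, sort by similarity, keep the top 2.
var nearest = documents
    .Select(d => (d.Name, Score: CosineSimilarity(queryVector, d.Vector)))
    .OrderByDescending(r => r.Score)
    .Take(2)
    .ToArray();

// Lists "Statement" first, then "Wiki-Superbowl".
foreach (var (name, score) in nearest)
    Console.WriteLine($"{name}: {score:F3}");
```

With these sample vectors the ranking has the same order as the search result above: the statement is closest to the query, the Super Bowl article follows, and the AKS document is cut off by the limit of two neighbors.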