# Retrieval Augmented Generation (RAG) with Azure AI Search and OpenAI

This notebook demonstrates Retrieval Augmented Generation (RAG): giving a language model (LLM or SLM) additional context so that it can produce more accurate answers. It uses Azure AI Search to index the documents and an Azure OpenAI embedding model to generate embeddings (vectors) for them.

+ Create an index schema
+ Load the sample data from a local folder
+ Embed the documents in-memory using an Azure OpenAI embedding model (`text-embedding-3-large` in this walkthrough)
+ Index the vector and non-vector fields on Azure AI Search
+ Run a vector similarity query against the index, then use the index to ground a chat completion

The code uses Azure OpenAI to generate an embedding for the content of each chunk. You'll need access to Azure OpenAI to run this demo.

## Create the resources

Refer to the `README.md` file in the root folder to create the resources.

## Install python packages

In [1]:
%pip install python-dotenv
%pip install tiktoken
%pip install azure-search-documents
%pip install azure-identity
%pip install openai

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.



[notice] A new release of pip is available: 24.3.1 -> 25.0
[notice] To update, run: python.exe -m pip install --upgrade pip



## Connect to Azure AI Search and Azure OpenAI

Load the environment variables from the `.env` file, create the Azure OpenAI client, and send a test chat request.

In [8]:
import os
import re
from openai import AzureOpenAI
from dotenv import load_dotenv

# Load settings from a local .env file if present; its values override existing environment variables
if os.path.exists(".env"):
    load_dotenv(override=True)

azure_openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
azure_openai_api_key = os.getenv("AZURE_OPENAI_API_KEY")
azure_openai_chat_completions_deployment_name = os.getenv("AZURE_OPENAI_CHAT_COMPLETIONS_DEPLOYMENT_NAME")

azure_openai_embedding_model = os.getenv("AZURE_OPENAI_EMBEDDING_MODEL")
embedding_vector_dimensions = os.getenv("EMBEDDING_VECTOR_DIMENSIONS")

azure_search_service_endpoint = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
azure_search_service_admin_key = os.getenv("AZURE_SEARCH_SERVICE_ADMIN_KEY")
search_index_name = os.getenv("SEARCH_INDEX_NAME")

openai_client = AzureOpenAI(
    azure_endpoint=azure_openai_endpoint,
    api_key=azure_openai_api_key,
    api_version="2024-06-01"
)

# Test the connection to the Azure OpenAI chat completions deployment
completion = openai_client.chat.completions.create(
    model=azure_openai_chat_completions_deployment_name,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you ?"}
    ])
print(completion.to_json())

{
  "id": "chatcmpl-AyK8L6Eks2IdCkqK6tQAsLfpHyYuV",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "I am an AI language model created by OpenAI, known as ChatGPT. I'm here to help answer your questions and provide information on a wide range of topics. How can I assist you today?",
        "refusal": null,
        "role": "assistant"
      },
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ],
  "created": 1738940437,
  "model": "gpt-4o-2024-08-06",
  "object": "chat.completion",
  "system_fingerprint": "fp_f3927aa00d",
  "usage": {
    "completion_t

## Count the number of tokens in a text

Like LLMs, embedding models define a maximum input size, measured in tokens. For `text-embedding-3-large` the maximum input is 8191 tokens, so the text must be split into chunks of 8191 tokens or fewer. To do that, we first need to count the tokens in a text string.

In [None]:
import tiktoken

def num_tokens_from_string(string: str) -> int:
    encoding = tiktoken.get_encoding(encoding_name="cl100k_base")
    num_tokens = len(encoding.encode(string, disallowed_special=()))
    return num_tokens

# Test the function
num_tokens_from_string("tiktoken is great!")

The `text-embedding-3-large` model accepts at most `8191` tokens per request, so any larger file must be split before it is sent to the model.
The following cell counts the tokens in each sample file and lists the files that exceed `8191` tokens.

In [23]:
input_directory = './data/azure-ai-docs/'

for filename in os.listdir(input_directory):
    if filename.endswith('.md'):
        with open(os.path.join(input_directory, filename), 'r', encoding='utf-8') as file:
            content = file.read()
            tokens = num_tokens_from_string(content)
            if tokens > 8191:
                print(f'File {filename} has {tokens} tokens which is more than 8191 (max) tokens')
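
Files that exceed the limit still have to be split before they can be embedded. The following is a minimal sketch of one possible approach, not the method used later in this lab: it greedily packs paragraphs into chunks that stay within a token budget. The function name `split_by_token_budget` and the word-based counter are assumptions made for illustration; in the notebook you would pass `num_tokens_from_string` as the counter.

```python
from typing import Callable, List


def split_by_token_budget(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Greedily pack paragraphs into chunks that stay within max_tokens."""
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if count_tokens(candidate) <= max_tokens:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single paragraph above the budget is kept as its own chunk;
            # it would need a finer split (for example by sentence).
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


# Stand-in counter (whitespace-separated words) so the sketch runs without tiktoken.
def word_count(s: str) -> int:
    return len(s.split())


sample = "one two three\n\nfour five\n\nsix seven eight nine"
print(split_by_token_budget(sample, max_tokens=5, count_tokens=word_count))
```

Splitting on paragraph boundaries keeps related sentences together, which tends to produce more meaningful embeddings than cutting at an arbitrary token offset.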

## Transforming/cleaning the documents

Later in this lab, we will work with Markdown (`.md`) files. Before embedding them, we strip links, images, bold markers, and line breaks from the text. The function `clean_markdown_content()` does this.

In [24]:
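def clean_markdown_content(content):
    # Remove images first; otherwise the link pattern below would match
    # the [alt](url) part of an image and leave a stray '!' behind
    image_pattern = r'\!\[([^\[]*)\]\(([^\)]+)\)'
    content = re.sub(image_pattern, '', content)

    # Replace links with their link text
    link_pattern = r'\[([^\[]+)\]\(([^\)]+)\)'
    content = re.sub(link_pattern, r'\1', content)

    # Remove bold markers, then join lines with a space so words don't run together
    content = content.replace('**', '')
    content = content.replace('\n', ' ')

    return content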
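
The order of the two substitutions matters, because the link pattern also matches the bracketed part of an image. The sketch below applies the same two patterns in both orders to a made-up sample string (the string and the variable names are assumptions for illustration only).

```python
import re

IMAGE_PATTERN = r'!\[([^\[]*)\]\(([^\)]+)\)'
LINK_PATTERN = r'\[([^\[]+)\]\(([^\)]+)\)'

sample = "See ![logo](img/logo.png) and the [docs](https://example.com/docs)."

# Links first: the link pattern consumes "[logo](img/logo.png)",
# so the image pattern no longer matches and a stray "!logo" remains.
links_first = re.sub(IMAGE_PATTERN, '', re.sub(LINK_PATTERN, r'\1', sample))

# Images first: the image is removed entirely, then the link keeps only its label.
images_first = re.sub(LINK_PATTERN, r'\1', re.sub(IMAGE_PATTERN, '', sample))

print(repr(links_first))
print(repr(images_first))
```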

## Get the vector embedding for an input text

In [None]:
def get_embeddings_vector(text):

    response = openai_client.embeddings.create(
        input=text,
        model=azure_openai_embedding_model,
    )

    embedding = response.data[0].embedding

    return embedding

# Test the function
vector = get_embeddings_vector("Sample text")
print(vector)

## Create file chunks

This is where we split the markdown files in folder `./data/azure-ai-docs` into chunks.

In [None]:
import uuid
import re
import json
import os

input_directory = './data/azure-ai-docs/'
output_directory = './data/chunks/'
# create output directory if it doesn't exist
if not os.path.exists(output_directory):
    os.makedirs(output_directory)

chunk_index=0
# Loop through each file in the directory
for filename in os.listdir(input_directory):
    # Check if the file is a markdown file
    if filename.endswith('.md'):
        # Open the file
        with open(os.path.join(input_directory, filename), 'r', encoding='utf-8') as file:
            print(filename)
            # Read the file content
            content = file.read()
            
            # Skip the file if it lacks a title, description, ms.date, or any '##' heading
            if 'title:' not in content or 'description:' not in content or 'ms.date:' not in content or '##' not in content:
                print(f'File {filename} does not contain title, description, ms.date or ##')
                continue

            # Extract the title, description, and date
            page_title = re.search(r'title: (.*)', content).group(1).replace('"', '')
            page_description = re.search(r'description: (.*)', content).group(1)
            page_date = re.search(r'ms\.date: (.*)', content).group(1)
            
            # Split the content into chunks based on '##'
            chunks = content.split('\n## ')[1:]  # Skip the first chunk as it contains the title, description, and date
            
            # Add the chunks to the list along with the title, description, and date
            for chunk in chunks:
                chunk_index=chunk_index + 1
                chunk_content = clean_markdown_content(chunk.strip())
                
                # Skip only this chunk if it exceeds the embedding model's input limit
                if num_tokens_from_string(chunk_content) > 8191:
                    print(f'Chunk {chunk_index} in file {filename} has more than 8191 tokens, skipping')
                    continue

                vector = get_embeddings_vector(chunk_content)
                
                chunk = {
                    "id": str(uuid.uuid4()),
                    'page_title': page_title,
                    'page_description': page_description,
                    'page_date': page_date,
                    'chunk_title': chunk.split('\n')[0],  # The first line after '##' is the title of the chunk
                    'chunk_content': chunk_content,  # Section text after clean_markdown_content()
                    'vector': vector
                }
                
                chunk_file_name = f'chunk_{chunk_index}_{page_title}.json'.replace('?', '').replace(':', '').replace("'", '').replace('|', '').replace('/', '').replace('\\', '')

                # write chunk into JSON file into output directory
                with open(os.path.join(output_directory, chunk_file_name), 'w', encoding='utf-8') as f:
                    json.dump(chunk, f)
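
To see what the front-matter extraction and the `'\n## '` split produce, here is a minimal sketch on a made-up document. The sample text is an assumption for illustration; it only mimics the `title`, `description`, and `ms.date` fields that the loop expects.

```python
import re

sample_doc = """---
title: "Sample page"
description: A made-up page used for illustration.
ms.date: 01/01/2025
---
# Sample page

Intro text.

## First section
Body of the first section.

## Second section
Body of the second section.
"""

# Same extraction as in the chunking loop, applied to the sample
title = re.search(r'title: (.*)', sample_doc).group(1).replace('"', '')
date = re.search(r'ms\.date: (.*)', sample_doc).group(1)

# Everything before the first '## ' heading (front matter and intro) is dropped
sections = sample_doc.split('\n## ')[1:]
for section in sections:
    heading, _, body = section.partition('\n')
    print(f'{title} | {date} | {heading} | {body.strip()}')
```

Each section becomes one chunk, and the page-level metadata is repeated on every chunk so that a search hit can be traced back to its page.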

By default, the embedding vector has `1536` dimensions for `text-embedding-3-small` and `3072` for `text-embedding-3-large`. You can reduce the number of dimensions by passing the `dimensions` parameter, without the embedding losing its concept-representing properties.
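
A minimal sketch of how the `dimensions` parameter could be passed, assuming a `text-embedding-3` deployment. The function name `get_embeddings_vector_with_dimensions` is hypothetical, and the stub client exists only so the sketch runs offline; in the notebook you would pass `openai_client` and `azure_openai_embedding_model` instead.

```python
from types import SimpleNamespace


def get_embeddings_vector_with_dimensions(client, text, model, dimensions):
    """Request an embedding shortened to `dimensions` values."""
    response = client.embeddings.create(
        input=text,
        model=model,
        dimensions=dimensions,  # accepted by the text-embedding-3 models
    )
    return response.data[0].embedding


# Hypothetical stub that mimics the shape of the embeddings response (offline only)
class _StubEmbeddings:
    def create(self, input, model, dimensions):
        return SimpleNamespace(data=[SimpleNamespace(embedding=[0.0] * dimensions)])


stub_client = SimpleNamespace(embeddings=_StubEmbeddings())
vector = get_embeddings_vector_with_dimensions(
    stub_client, "Sample text", "text-embedding-3-large", 1024
)
print(len(vector))
```

If you shorten the embeddings, the `vector_search_dimensions` value of the index's vector field has to match the chosen length.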

## Create an index in Azure AI Search

In [None]:
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticSearch,
    SemanticField
)

credential = AzureKeyCredential(azure_search_service_admin_key)

search_index_client = SearchIndexClient(
    endpoint=azure_search_service_endpoint, 
    index_name=search_index_name, 
    credential=credential
)

# create search index
fields = [
    SimpleField(
        name="id",
        type=SearchFieldDataType.String,
        key=True,
        sortable=True,
        filterable=True,
        facetable=True,
    ),
    SearchableField(name="page_title", type=SearchFieldDataType.String),
    SearchableField(name="page_description", type=SearchFieldDataType.String),
    SearchableField(name="page_date", type=SearchFieldDataType.String),
    SearchableField(name="chunk_title", type=SearchFieldDataType.String),
    SearchableField(name="chunk_content", type=SearchFieldDataType.String),
    SearchField(name="vector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=3072,  # must match the embedding length: 3072 for text-embedding-3-large, 1536 for text-embedding-3-small
        vector_search_profile_name="myHnswProfile",
    ),
]

# Configure vector search: an HNSW algorithm and a profile that references it
vector_search = VectorSearch(
    algorithms=[
        HnswAlgorithmConfiguration(
            name="myHnsw"
        )
    ],
    profiles=[
        VectorSearchProfile(
            name="myHnswProfile",
            algorithm_configuration_name="myHnsw",
        )
    ]
)

semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="page_title"),
        # keywords_fields=[SemanticField(field_name="category")],
        content_fields=[SemanticField(field_name="chunk_content")]
    )
)

# Create the semantic settings with the configuration
semantic_search = SemanticSearch(configurations=[semantic_config])
# Create the search index with the semantic settings
search_index = SearchIndex(name=search_index_name, fields=fields,
                    vector_search=vector_search, semantic_search=semantic_search)
result = search_index_client.create_or_update_index(search_index)
print(f' {result.name} created')

In case you need to delete an index, you can use the following code.

In [11]:
# delete index
search_index_client.delete_index(search_index_name)

## Upload chunks/documents to Azure AI Search

In [None]:
from azure.search.documents import SearchClient

search_client = SearchClient(endpoint=azure_search_service_endpoint, index_name=search_index_name, credential=credential)

# for each json file in ./data/chunks/ folder, load the json document and upload it to the search index

for filename in os.listdir(output_directory):
    if filename.endswith('.json'):
        with open(os.path.join(output_directory, filename), 'r') as file:
            document = json.load(file)

            # upload_documents expects a list of documents
            result = search_client.upload_documents(documents=[document])
            print(f"Upload of {filename} succeeded: { result[0].succeeded }")

## Perform a vector similarity search

This example shows a pure vector search. The query text is embedded on the client with `get_embeddings_vector()`, and the resulting vector is passed to Azure AI Search as a `VectorizedQuery` that returns the three nearest neighbors.

In [None]:
from azure.search.documents.models import VectorizedQuery

# Pure Vector Search
query = "How to use Azure AI ?"  

embedding = get_embeddings_vector(query)

vector_query = VectorizedQuery(vector=embedding, k_nearest_neighbors=3, fields="vector")
  
results = search_client.search(  
    search_text=None,  
    vector_queries= [vector_query],
    select=["page_title", "page_date", "chunk_title", "chunk_content"],
)  
  
for result in results:
    print(f"-------------------------------------------")
    print(f"Page Date: {result['page_date']}")  
    print(f"Page Title: {result['page_title']}")  
    print(f"Chunk Title: {result['chunk_title']}")  
    print(f"Chunk Content: {result['chunk_content']}")
    print(f"Score: {result['@search.score']}")  
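
Conceptually, a vector query ranks documents by how close their vectors are to the query vector. The sketch below does this with an exact, brute-force cosine similarity over a few made-up three-dimensional vectors. It is an illustration only: Azure AI Search uses an approximate HNSW index rather than a full scan, cosine similarity is assumed as the metric, and the helper names and sample data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, k):
    """Return the ids of the k documents most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in documents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Made-up vectors standing in for chunk embeddings
documents = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.8, 0.6, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.1, 0.0], documents, k=2))
```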


## Simulate a user query

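Here we ask the chat model a question and let Azure OpenAI retrieve relevant documents from the Azure AI Search index through the `data_sources` option, so that the answer is grounded in the indexed content.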

In [None]:
response = openai_client.chat.completions.create(
    model=azure_openai_chat_completions_deployment_name,
    messages=[
        {"role": "system", "content": "You are a helpful assistant for an AI learner."},
        {"role": "user", "content": "What are the supported llms in Azure ?"}
    ],
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": azure_search_service_endpoint,
                    "index_name": search_index_name,
                    "authentication": {
                        "type": "api_key",
                        "key": azure_search_service_admin_key,
                    }
                }
            }
        ]
    }
)

print(response.to_json())

In [None]:
print(response.choices[0].message.content)
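
With `data_sources`, the service performs the retrieval step. An alternative is to retrieve the chunks yourself and place them in the prompt. The following is a minimal sketch of that prompt assembly, not the approach used in the cell above: the function name `build_rag_messages`, the prompt wording, and the sample chunk are all assumptions for illustration. The chunk dictionaries use the `chunk_title` and `chunk_content` field names from the index.

```python
def build_rag_messages(question, chunks, max_chunks=3):
    """Build chat messages that ground the answer in retrieved chunks."""
    # Number each source so the model can cite it
    sources = "\n\n".join(
        f"[{i}] {chunk['chunk_title']}\n{chunk['chunk_content']}"
        for i, chunk in enumerate(chunks[:max_chunks], start=1)
    )
    system_prompt = (
        "You are a helpful assistant for an AI learner. "
        "Answer using only the sources below and cite them by number. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


# Made-up chunk standing in for a search result
retrieved = [
    {"chunk_title": "Made-up title", "chunk_content": "Made-up content."},
]
messages = build_rag_messages("What are the supported LLMs in Azure?", retrieved)
print(messages[0]["content"])
```

The returned list can be passed as the `messages` argument of `chat.completions.create`, without the `extra_body` option.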