# Retrieval Augmented Generation

[Retrieval Augmented Generation (RAG)](https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview) is an _architecture_ that augments the capabilities of a Large Language Model (LLM) by adding an _information retrieval system_ that provides grounding data.

### RAG Architecture

![RAG_pattern](rag-pattern.png)

#### Information retrieval system

Why is the information retrieval system important? Because it gives you control over the knowledge that the LLM is using to formulate a response. That means that you can constrain the LLM to your _own content_ sources from _vectorized_ documents, images, and other data formats.
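To make the grounding step concrete, here is a minimal, library-free sketch of how retrieved passages might be placed into the prompt sent to the LLM. The helper `build_grounded_prompt`, the prompt wording, and the `[doc1]`-style source markers are illustrative assumptions, not part of any Azure API.

```python
from typing import Dict, List


def build_grounded_prompt(question: str, passages: List[Dict[str, str]]) -> str:
    """Assemble a prompt that restricts the LLM to the retrieved passages.

    Hypothetical helper: each passage is assumed to be a dict with
    "title" and "content" keys, as returned by the retrieval step.
    """
    # Number the sources so the model can refer to them, e.g. [doc1]
    sources = "\n".join(
        f"[doc{i}] {p['title']}: {p['content']}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "If the answer is not in the sources, say you don't know.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )


retrieved = [
    {"title": "Azure DevOps", "content": "Azure DevOps is a suite of services that help you plan, build, and deploy applications."},
]
print(build_grounded_prompt("What is Azure DevOps?", retrieved))
```

The retrieval system decides *which* passages end up in `retrieved`; the LLM only sees what this step passes along.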

### Azure AI Search
[Azure AI Search](https://learn.microsoft.com/en-us/azure/search/) is a common choice for the information retrieval system in a RAG architecture. Architecturally, it sits between the external data stores (which hold your un-indexed data) and your client app. The client app sends query requests to a search index and handles the response:

![search_service](search-service.png)

For those familiar with relational databases, you can imagine that:

- A (search) index ~= A table
  - it describes the [schema of your data](https://learn.microsoft.com/en-us/azure/search/search-what-is-an-index#schema-of-a-search-index)
  - it consists of [`field definitions`](https://learn.microsoft.com/en-us/azure/search/search-what-is-an-index#field-definitions) described by [`field attributes`](https://learn.microsoft.com/en-us/azure/search/search-what-is-an-index#field-attributes) (searchable, filterable, sortable, etc.)
- A (search) document ~= A row in your table
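Continuing the analogy, a search document is a JSON object whose keys are the index's field names. The sketch below shows a hypothetical document for the index created later in this workshop (`id`, `title`, `content`, `category`, plus two vector fields), together with a simplified client-side check. The helper `validate_document` is an illustrative assumption; the service performs its own validation. The vector values are placeholders.

```python
from typing import Any, Dict, List

# Field names taken from the index schema defined in "1. Create Search Index"
INDEX_FIELDS = {"id", "title", "content", "category", "titleVector", "contentVector"}
KEY_FIELD = "id"


def validate_document(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a document (an empty list means OK).

    Hypothetical, simplified check: it only catches a missing key field
    and fields that the index schema does not define.
    """
    problems = []
    if not doc.get(KEY_FIELD):
        problems.append(f"missing key field '{KEY_FIELD}'")
    unknown = set(doc) - INDEX_FIELDS
    if unknown:
        problems.append(f"unknown fields: {sorted(unknown)}")
    return problems


# One "row" of the index: a search document
document = {
    "id": "1",
    "title": "Azure DevOps",
    "content": "Azure DevOps is a suite of services that help you plan, build, and deploy applications.",
    "category": "Developer Tools",
    "titleVector": [0.0] * 384,    # placeholder embedding
    "contentVector": [0.0] * 384,  # placeholder embedding
}
print(validate_document(document))  # → []
```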

#### Vector search in Azure AI Search - Overview
![vector_search](vector-search-architecture-diagram.png)

The [search options](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/use-your-data?tabs=ai-search#search-options) available with Azure AI Search are keyword search, semantic search, vector search, and hybrid search.
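Hybrid search runs a keyword query and a vector query side by side and merges the two ranked lists. Azure AI Search documents Reciprocal Rank Fusion (RRF) as the merging method for hybrid queries. The following is a simplified, standalone sketch of RRF, not the service's implementation; the constant `k=60` and the toy document ids are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. Higher fused scores come first.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by id for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_results = ["doc-1", "doc-2", "doc-3"]  # toy keyword ranking
vector_results = ["doc-1", "doc-4", "doc-2"]   # toy vector ranking
print(reciprocal_rank_fusion([keyword_results, vector_results]))
# → ['doc-1', 'doc-2', 'doc-4', 'doc-3']
```

Documents that rank well in both lists (`doc-1`, `doc-2`) rise to the top, even if neither list alone put them first.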

<!-- ```{seealso}
- Read [What's Azure AI Search?](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search)
- Read [Vector search in Azure AI Search
](https://learn.microsoft.com/en-us/azure/search/vector-search-overview)
- Read [Relevance and ranking in vector search](https://learn.microsoft.com/en-us/azure/search/vector-search-ranking)
``` -->

<!-- ### Approaches for RAG with Azure AI Search

Due to its increased popularity, Microsoft has several built-in implementations for using Azure AI Search in a RAG solution.

1. Azure AI Studio, [use a vector index and retrieval augmentation - Preview](https://learn.microsoft.com/en-us/azure/ai-studio/concepts/retrieval-augmented-generation).
2. Azure OpenAI Studio, [use a search index with or without vectors - Preview](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/use-your-data?tabs=ai-search).
3. Azure Machine Learning, [use a search index as a vector store in a prompt flow - Preview](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-vector-index?view=azureml-api-2).

```{note}
During this workshop, we will take a code-first approach.
``` -->

# Workshop
### Pre-requisites:
- [Request access to Azure OpenAI Service](https://aka.ms/oai/access)
- An Azure AI Search service (which can host one or more search indexes), created for example in the Azure portal
- An Azure OpenAI Service resource with a `text-embedding-ada-002` model deployment


## Overview
To build the index, we need the following objects:

- Data source: a connection to the data storage (for example, an Azure Blob Storage container)
- Search index: defines the data structure over which to search
  - Create an empty index based on an index schema
  - Fill in the data, for example with the search indexer described below
- Search indexer: reads data from the data source and writes it into the index


### Setup

First, we install the necessary dependencies. For a related example, see [`chat_with_your_own_data.ipynb`](https://github.com/openai/openai-cookbook/blob/main/examples/azure/chat_with_your_own_data.ipynb) in the OpenAI Cookbook.


```python
# %pip uninstall azure-search-documents

# %pip install azure-ai-ml
# %pip install azureml-core
# %pip install -U azureml-rag==0.1.11
# %pip install azureml-rag
# %pip install openai==0.27.10
%pip install python-dotenv
%pip install azure-search-documents==11.4.0
```

In this workshop, we use `python-dotenv`. To connect to Azure OpenAI and the search index, add the following variables to a `.env` file in `KEY=VALUE` format:
...


```python
import os
import dotenv

# Load the variables from the .env file into os.environ
%reload_ext dotenv
%dotenv
```
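Once the variables are loaded, it can help to fail fast if one is missing. The names below are the ones read with `os.environ[...]` in the next cell; the helper `check_env` is a hypothetical convenience, not part of `python-dotenv`.

```python
import os
from typing import Iterable, List

# Variable names read later in this notebook via os.environ[...]
REQUIRED_VARS = [
    "subscription_id",
    "resource_group_name",
    "workspace_name",
    "service_endpoint",
    "search_key",
    "aoi_api_key",
    "aoai_endpoint",
    "embeddingModelName",
]


def check_env(names: Iterable[str] = REQUIRED_VARS) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in names if not os.environ.get(name)]


missing = check_env()
if missing:
    print(f"Missing from .env: {', '.join(missing)}")
```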

### Import required libraries and environment variables


```python
import os
import json
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import (
    VectorizedQuery,
    VectorFilterMode,
)
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    ScoringProfile,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
    SearchField,
    SemanticConfiguration,
    SemanticField,
    VectorSearchProfile,
    HnswAlgorithmConfiguration,
    VectorSearch,
    HnswParameters,
    SemanticPrioritizedFields,
    SemanticSearch,
)

# Azure subscription and workspace details
subscription_id = os.environ['subscription_id']
resource_group_name = os.environ['resource_group_name']
workspace_name = os.environ['workspace_name']

# Azure AI Search (formerly Azure Cognitive Search)
service_endpoint = os.environ['service_endpoint']  # the endpoint of your Azure AI Search service
key = os.environ['search_key']  # admin key: needed to create indexes and upload documents

# Azure OpenAI
# aoai_connection_name = os.environ['aoai_connection_name']
aoi_api_key = os.environ['aoi_api_key']
aoai_endpoint = os.environ['aoai_endpoint']
embedding_model_name = os.environ['embeddingModelName']  # name of the embedding model deployment

search_index_name = "my_index"
search_index_key = os.getenv("AZURE_SEARCH_ADMIN_KEY")
credential = AzureKeyCredential(key)
```

### 1. Create Search Index

References used for this step:

- Azure SDK for Python: [index CRUD operations sample](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/search/azure-search-documents/samples/sample_index_crud_operations.py)
- RAG Experiment Accelerator: [`create_index.py`](https://github.com/microsoft/rag-experiment-accelerator/blob/development/rag_experiment_accelerator/init_Index/create_index.py)
- Fields and semantic settings: [Azure AI Search vector sample with a Hugging Face model](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/azure-search-vector-python-huggingface-model-sample.ipynb)
- `SearchField` definitions: [Azure SDK for Python vector search sample](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/search/azure-search-documents/samples/sample_vector_search.py)

```python
from typing import List

def create_index(search_index_name):
    client = SearchIndexClient(service_endpoint, AzureKeyCredential(key))

    # 1. Define the fields.
    # vector_search_dimensions=384 matches intfloat/e5-small-v2 (used below);
    # text-embedding-ada-002 vectors have 1536 dimensions and would need a different value.
    fields = [
        SimpleField(name="id", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True),
        SearchableField(name="title", type=SearchFieldDataType.String),
        SearchableField(name="content", type=SearchFieldDataType.String),
        SearchableField(name="category", type=SearchFieldDataType.String, filterable=True),
        SearchField(name="titleVector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                    searchable=True, vector_search_dimensions=384, vector_search_profile_name="my-vector-config"),
        SearchField(name="contentVector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                    searchable=True, vector_search_dimensions=384, vector_search_profile_name="my-vector-config"),
    ]

    # 2. Define the semantic settings.
    # Note: this requires the semantic ranker to be enabled on your search service.
    # https://learn.microsoft.com/en-us/azure/search/semantic-search-overview
    # https://learn.microsoft.com/en-us/azure/search/semantic-how-to-query-request?tabs=portal%2Cportal-query
    # https://learn.microsoft.com/en-us/azure/search/semantic-how-to-query-request?tabs=sdk%2Cportal-query
    semantic_config = SemanticConfiguration(
        name="my-semantic-config",
        prioritized_fields=SemanticPrioritizedFields(
            title_field=SemanticField(field_name="title"),
            keywords_fields=[SemanticField(field_name="category")],
            content_fields=[SemanticField(field_name="content")],
        ),
    )
    semantic_search = SemanticSearch(configurations=[semantic_config])

    # 3. Configure vector search: a profile that points to an HNSW algorithm configuration
    vector_search = VectorSearch(
        profiles=[VectorSearchProfile(name="my-vector-config", algorithm_configuration_name="my-algorithms-config")],
        algorithms=[
            # Configuration options specific to the HNSW approximate nearest neighbors algorithm,
            # used during indexing and querying
            HnswAlgorithmConfiguration(
                name="my-algorithms-config",
                kind="hnsw",
                # https://learn.microsoft.com/en-us/python/api/azure-search-documents/azure.search.documents.indexes.models.hnswparameters?view=azure-python-preview#variables
                parameters=HnswParameters(
                    # The number of bi-directional links created for every new element during construction.
                    m=4,
                    # The size of the dynamic list containing the nearest neighbors, which is used during index time.
                    # Increasing this parameter may improve index quality, at the expense of increased indexing time.
                    ef_construction=400,
                    # The size of the dynamic list containing the nearest neighbors, which is used during search time.
                    # Increasing this parameter may improve search results, at the expense of slower search.
                    ef_search=500,
                    # The similarity metric to use for vector comparisons.
                    # Known values are: "cosine", "euclidean", and "dotProduct"
                    metric="cosine",
                ),
            )
        ],
    )

    # CORS is used for apps that issue requests from different domains.
    # cors_options = CorsOptions(allowed_origins=["*"], max_age_in_seconds=60)

    # 4. Add scoring profiles when the default ranking behavior doesn't meet your business objectives.
    # https://learn.microsoft.com/en-us/azure/search/index-add-scoring-profiles
    scoring_profiles: List[ScoringProfile] = []
    index = SearchIndex(
        name=search_index_name,
        fields=fields,
        scoring_profiles=scoring_profiles,
        # cors_options=cors_options,
        # tokenizers=[],  # TODO: add tokenizers
        semantic_search=semantic_search,
        vector_search=vector_search,
    )

    result = client.create_or_update_index(index)
    print(f'{result.name} created or updated')
```

```python
create_index(search_index_name)
```

```text
my_index created or updated
```
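The index uses `metric="cosine"`, and the Hugging Face embeddings later in this notebook are created with `normalize_embeddings=True`. For unit-length vectors, cosine similarity equals the dot product, which this small pure-Python check illustrates. The helper names are illustrative, not part of the Azure SDK.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine_similarity(a, b))          # → 0.96
print(dot(normalize(a), normalize(b)))  # same value, up to floating-point rounding
```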


### 2. Create a Search Indexer

References used for this step:

- Indexer example: [Azure SDK for Python indexer operations sample](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/search/azure-search-documents/samples/sample_indexers_operations.py)
- Connecting to Azure Blob Storage, including a skillset example: [integrated vectorization sample](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/azure-search-integrated-vectorization-sample.ipynb)


```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexer,
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection,
)

def create_indexer(indexer_name, data_source_name, storage_account_connection_string, target_index_name,
                   container_name="searchcontainer"):
    indexers_client = SearchIndexerClient(service_endpoint, AzureKeyCredential(key))

    # 1. Create a data source connection that points at a Blob Storage container
    container = SearchIndexerDataContainer(name=container_name)
    data_source_connection = SearchIndexerDataSourceConnection(
        name=data_source_name,
        type="azureblob",
        connection_string=storage_account_connection_string,
        container=container,
    )
    indexers_client.create_data_source_connection(data_source_connection)

    # 2. Create the indexer, which reads from the data source and writes to the target index
    indexer = SearchIndexer(
        name=indexer_name,
        data_source_name=data_source_name,
        target_index_name=target_index_name,
    )
    indexers_client.create_indexer(indexer)
    print(f"Created a new indexer {indexer_name}")
```

### 3. Create Embeddings
#### Which Embeddings Model to use?

There are several embedding options:

- OpenAI models, such as: [`text-embedding-ada-002`](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings), `text-embedding-3-small`, `text-embedding-3-large`
- Hugging Face models: the Hugging Face Hub offers a wide range of models. The [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) ranks embedding models across several tasks, though not all models can be run locally.
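Whichever model you pick, its output dimension must match `vector_search_dimensions` on the vector fields (384 in the index created above, which fits `intfloat/e5-small-v2`). A small sketch follows; the dimensions listed are the models' default output sizes, and `text-embedding-3-*` models can also be shortened, so treat the table as an assumption to verify against your deployment. The helper `check_vector_dimensions` is hypothetical.

```python
from typing import Dict, Sequence

# Default output dimensions (verify against your model/deployment)
EMBEDDING_DIMENSIONS: Dict[str, int] = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "intfloat/e5-small-v2": 384,
}


def check_vector_dimensions(vector: Sequence[float], index_dimensions: int) -> None:
    """Raise ValueError if a vector cannot be stored in a field of the given size."""
    if len(vector) != index_dimensions:
        raise ValueError(
            f"vector has {len(vector)} dimensions, index field expects {index_dimensions}"
        )


# The index in this workshop uses vector_search_dimensions=384
print(EMBEDDING_DIMENSIONS["intfloat/e5-small-v2"] == 384)  # → True
check_vector_dimensions([0.0] * 384, 384)  # passes silently
```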


### a) Embed a query using an embedding model from OpenAI


```python
import requests

def get_query_embedding(query, endpoint=aoai_endpoint, api_key=aoi_api_key, api_version="2023-07-01-preview", embedding_model_deployment=embedding_model_name):
    """Call the Azure OpenAI embeddings REST API and return a list of vectors, one per input."""
    # Strip a trailing slash so the URL does not contain "//openai"
    request_url = f"{endpoint.rstrip('/')}/openai/deployments/{embedding_model_deployment}/embeddings?api-version={api_version}"
    headers = {
        "Content-Type": "application/json",
        "api-key": api_key,
    }
    request_payload = {
        "input": query,
    }
    embedding_response = requests.post(request_url, json=request_payload, headers=headers, timeout=None)
    if embedding_response.status_code == 200:
        data_values = embedding_response.json()["data"]
        embeddings_vectors = [data_value["embedding"] for data_value in data_values]
        return embeddings_vectors
    else:
        raise Exception(f"failed to get embedding: {embedding_response.json()}")
```
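Because the request body's `input` can also be a list of strings, `get_query_embedding` can embed several texts per request instead of one call per text. The helper below is a hypothetical sketch that splits texts into batches; the batch size of 16 is an assumption, so check the input limit of your deployment and API version. The embedding call is passed in as `embed_fn` so the batching logic can be tried without a live endpoint.

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 16,
) -> List[List[float]]:
    """Embed many texts with one request per batch instead of one per text."""
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        batch_vectors = embed_fn(batch)
        # The service returns one vector per input; guard against mismatches
        if len(batch_vectors) != len(batch):
            raise ValueError("embedding count does not match input count")
        vectors.extend(batch_vectors)
    return vectors


# For example, once input_data is loaded:
# title_vectors = embed_in_batches([item["title"] for item in input_data], get_query_embedding)
```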

```python
query = "Hello"

query_vectors = get_query_embedding(
    query,
    aoai_endpoint,
    aoi_api_key,
    "2023-07-01-preview",
    embedding_model_name,
)

print(f'The embedded vector is: {query_vectors}')
```

```text
The embedded vector is: [[-0.021819873, -0.0072516315, -0.02838273, -0.02452299, -0.023587296, ...
```

#### Create embeddings using OpenAI
Read your data and generate embeddings with the OpenAI model:

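```python
with open('./data/text-sample.json', 'r', encoding='utf-8') as file:
    input_data = json.load(file)

for item in input_data:
    title = item['title']
    content = item['content']
    # get_query_embedding returns one vector per input, so take the first (and only) one
    title_embeddings = get_query_embedding(title)[0]
    content_embeddings = get_query_embedding(content)[0]
    item['titleVector'] = title_embeddings
    item['contentVector'] = content_embeddings

# Note: text-embedding-ada-002 vectors have 1536 dimensions, so this file needs an index
# whose vector fields use vector_search_dimensions=1536 (the index above uses 384).
os.makedirs("./output", exist_ok=True)
with open("./output/docVectors-openai.json", "w") as f:
    json.dump(input_data, f)
```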

### b) Embed a query using an embedding model from Hugging Face
We will use [`intfloat/e5-small-v2`](https://huggingface.co/intfloat/e5-small-v2) from Hugging Face, which is about 0.13 GB in size.

```{code-cell} ipython3
:tags: [hide-output]
# This cell should have its output hidden!
%pip install sentence-transformers
```
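The `intfloat/e5-small-v2` model card recommends starting each input with `"query: "` or `"passage: "`; the cells in this notebook encode raw text, which works but may not give the best retrieval quality. A minimal sketch of the prefixing, with `add_e5_prefix` as a hypothetical helper. If you adopt it, apply it consistently to both documents and queries.

```python
from typing import List, Sequence


def add_e5_prefix(texts: Sequence[str], kind: str) -> List[str]:
    """Prefix texts as the e5 model card suggests: 'query: ' or 'passage: '."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {text}" for text in texts]


print(add_e5_prefix(["tools for software development"], "query"))
# → ['query: tools for software development']

# With a SentenceTransformer model loaded, documents and queries could be encoded as
# (contents is a hypothetical list of document texts):
# doc_vectors = model.encode(add_e5_prefix(contents, "passage"), normalize_embeddings=True)
# query_vector = model.encode(add_e5_prefix([query], "query"), normalize_embeddings=True)[0]
```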

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('intfloat/e5-small-v2')
query = "Hello"

embedded_query = model.encode(query, normalize_embeddings=True)
print(len(embedded_query))
```

```text
384
```


#### Create embeddings using Hugging Face model
Read your data and generate embeddings with the Hugging Face model:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('intfloat/e5-small-v2')

with open('./data/text-sample.json', 'r', encoding='utf-8') as file:
    input_data = json.load(file)

for item in input_data:
    title = item['title']
    content = item['content']
    title_embeddings = model.encode(title, normalize_embeddings=True)
    content_embeddings = model.encode(content, normalize_embeddings=True)
    # Convert NumPy arrays to plain lists so they can be serialized to JSON
    item['titleVector'] = title_embeddings.tolist()
    item['contentVector'] = content_embeddings.tolist()

os.makedirs("./output", exist_ok=True)
with open("./output/docVectors-e5.json", "w") as f:
    json.dump(input_data, f)
```

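### 4. Upload data

The commented-out `upload_data` function below is based on [`acs_ingest.py`](https://github.com/microsoft/rag-experiment-accelerator/blob/development/rag_experiment_accelerator/ingest_data/acs_ingest.py) from the RAG Experiment Accelerator and is kept for reference.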


```python
# Reference only: this function relies on helpers that are not defined in this notebook
# (EmbeddingModel, generate_title, generate_summary, my_hash, pre_process, logger).
# import pandas as pd
# from azure.core.credentials import AzureKeyCredential
# from azure.search.documents import SearchClient

# def upload_data(
#     chunks: list,
#     service_endpoint: str,
#     index_name: str,
#     search_key: str,
#     embedding_model: EmbeddingModel,
#     azure_oai_deployment_name: str,
# ):
#     """
#     Uploads data to an Azure AI Search index.

#     Args:
#         chunks (list): A list of data chunks to upload.
#         service_endpoint (str): The endpoint URL for the Azure AI Search service.
#         index_name (str): The name of the index to upload data to.
#         search_key (str): The search key for the Azure AI Search service.
#         embedding_model (EmbeddingModel): The embedding model to generate the embedding.
#         azure_oai_deployment_name (str): The name of the Azure OpenAI deployment to use for generating titles and summaries.

#     Returns:
#         None
#     """
#     credential = AzureKeyCredential(search_key)
#     search_client = SearchClient(
#         endpoint=service_endpoint, index_name=index_name, credential=credential
#     )
#     documents = []
#     for i, chunk in enumerate(chunks):
#         try:
#             title = generate_title(str(chunk["content"]), azure_oai_deployment_name)
#             summary = generate_summary(str(chunk["content"]), azure_oai_deployment_name)
#         except Exception as e:
#             logger.info(f"Could not generate title or summary for chunk {i}")
#             logger.info(e)
#             continue
#         input_data = {
#             "id": str(my_hash(chunk["content"])),
#             "title": title,
#             "summary": summary,
#             "content": str(chunk["content"]),
#             "filename": "test",
#             "contentVector": chunk["content_vector"],
#             "contentSummary": embedding_model.generate_embedding(
#                 chunk=str(pre_process.preprocess(summary))
#             ),
#             "contentTitle": embedding_model.generate_embedding(
#                 chunk=str(pre_process.preprocess(title))
#             ),
#         }

#         documents.append(input_data)

#         search_client.upload_documents([input_data])
#     logger.info(f"Uploaded {len(documents)} documents")
#     logger.info("all documents have been uploaded to the search index")
```

Upload the texts, metadata, and Hugging Face embeddings from the JSON file to the search index:

```python
# Upload some documents to the index
with open('./output/docVectors-e5.json', 'r') as file:
    documents = json.load(file)
search_client = SearchClient(endpoint=service_endpoint, index_name=search_index_name, credential=credential)
result = search_client.upload_documents(documents)
print(f"Uploaded {len(documents)} documents")
```

```text
Uploaded 108 documents
```
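For datasets larger than the 108 sample documents, uploads are usually split into batches, since Azure AI Search limits each indexing request (up to 1,000 documents, and a maximum payload size, so batches carrying large vectors may need to be smaller). A sketch of a hypothetical batching helper, where `upload_fn` stands in for `search_client.upload_documents`, which returns one result per document with a `succeeded` flag.

```python
from typing import Any, Callable, Dict, List, Sequence


def upload_in_batches(
    documents: Sequence[Dict[str, Any]],
    upload_fn: Callable[[List[Dict[str, Any]]], Sequence[Any]],
    batch_size: int = 1000,
) -> int:
    """Upload documents in batches and return how many were indexed successfully.

    upload_fn is expected to return one result object per document, each
    with a boolean `succeeded` attribute.
    """
    succeeded = 0
    for start in range(0, len(documents), batch_size):
        batch = list(documents[start:start + batch_size])
        results = upload_fn(batch)
        succeeded += sum(1 for r in results if r.succeeded)
    return succeeded


# Usage with the client created above:
# count = upload_in_batches(documents, search_client.upload_documents)
```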


```python
def print_results(results):
    """Print the title, score, content, and category of each search result."""
    for result in results:
        print(f"Title: {result['title']}")
        print(f"Score: {result['@search.score']}")
        print(f"Content: {result['content']}")
        print(f"Category: {result['category']}\n")

search_client = SearchClient(service_endpoint, search_index_name, credential=credential)
query_embeddings = model.encode(query, normalize_embeddings=True)
```

### Perform a vector similarity search

```python
query = "tools for software development"
# Encode the query with the same model used for the documents
query_embeddings = model.encode(query, normalize_embeddings=True)
vector_query = VectorizedQuery(vector=query_embeddings.tolist(), k_nearest_neighbors=3, fields="contentVector")

results = search_client.search(
    search_text=None,
    vector_queries=[vector_query],
    select=["title", "content", "category"],
)

print_results(results)
```

```text
Title: Azure DevOps
Score: 0.8424989
Content: Azure DevOps is a suite of services that help you plan, build, and deploy applications. It includes Azure Boards for work item tracking, Azure Repos for source code management, Azure Pipelines for continuous integration and continuous deployment, Azure Test Plans for manual and automated testing, and Azure Artifacts for package management. DevOps supports a wide range of programming languages, frameworks, and platforms, making it easy to integrate with your existing development tools and processes. It also integrates with other Azure services, such as Azure App Service and Azure Functions.
Category: Developer Tools

Title: Azure Mobile Apps
Score: 0.8417593
Content: Azure Mobile Apps is a mobile app development platform that enables you to build, test, deploy, and monitor your mobile applications. It provides features like offline data sync, push notifications, and user authentication. Mobile Apps supports various platforms, including iOS, 
```
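The index is configured with HNSW, an approximate nearest-neighbor algorithm. For comparison, the exact (exhaustive) k-nearest-neighbor search that it approximates fits in a few lines: score every document against the query vector and keep the top `k`. This standalone illustration uses toy 2-dimensional vectors and hypothetical ids; it is not how the service executes queries.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exact_knn(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k most similar (doc_id, score) pairs by brute force."""
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


toy_vectors = {
    "doc-1": [0.9, 0.1],
    "doc-2": [0.7, 0.3],
    "doc-3": [0.1, 0.9],
}
print(exact_knn([1.0, 0.0], toy_vectors, k=2))
```

Exhaustive search compares against every vector, so its cost grows linearly with the index size; HNSW trades a little recall for much faster queries on large indexes.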

### Perform a pure vector search with a filter


```python
query = "tools for software development"
query_embeddings = model.encode(query, normalize_embeddings=True)
vector_query = VectorizedQuery(vector=query_embeddings.tolist(), k_nearest_neighbors=3, fields="contentVector")

results = search_client.search(
    search_text=None,
    vector_queries=[vector_query],
    filter="category eq 'Developer Tools'",
    vector_filter_mode=VectorFilterMode.PRE_FILTER,
    select=["title", "content", "category"],
)

print_results(results)
```

```text
Title: Azure DevOps
Score: 0.84249896
Content: Azure DevOps is a suite of services that help you plan, build, and deploy applications. It includes Azure Boards for work item tracking, Azure Repos for source code management, Azure Pipelines for continuous integration and continuous deployment, Azure Test Plans for manual and automated testing, and Azure Artifacts for package management. DevOps supports a wide range of programming languages, frameworks, and platforms, making it easy to integrate with your existing development tools and processes. It also integrates with other Azure services, such as Azure App Service and Azure Functions.
Category: Developer Tools

Title: Azure DevTest Labs
Score: 0.8416291
Content: Azure DevTest Labs is a fully managed service that enables you to create, manage, and share development and test environments in Azure. It provides features like custom templates, cost management, and integration with Azure DevOps. DevTest Labs supports various platforms, such
```
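`VectorFilterMode.PRE_FILTER` applies the filter before the nearest-neighbor search, so the `k` results are drawn only from matching documents. With post-filtering, the top `k` are found first and then filtered, which can return fewer than `k` results. A toy, pure-Python sketch of the difference; the ids, scores, and categories are made up for illustration.

```python
from typing import Callable, List, Tuple

# (doc_id, similarity score, category): hypothetical data
Doc = Tuple[str, float, str]


def top_k(docs: List[Doc], k: int) -> List[Doc]:
    return sorted(docs, key=lambda d: d[1], reverse=True)[:k]


def pre_filter_search(docs: List[Doc], k: int, keep: Callable[[Doc], bool]) -> List[Doc]:
    """Filter first, then take the k nearest among the matching documents."""
    return top_k([d for d in docs if keep(d)], k)


def post_filter_search(docs: List[Doc], k: int, keep: Callable[[Doc], bool]) -> List[Doc]:
    """Take the k nearest first, then drop those that fail the filter."""
    return [d for d in top_k(docs, k) if keep(d)]


def is_dev_tool(d: Doc) -> bool:
    return d[2] == "Developer Tools"


docs = [
    ("doc-1", 0.84, "Developer Tools"),
    ("doc-2", 0.83, "Web"),
    ("doc-3", 0.82, "Compute"),
    ("doc-4", 0.80, "Developer Tools"),
]
print([d[0] for d in pre_filter_search(docs, 3, is_dev_tool)])   # → ['doc-1', 'doc-4']
print([d[0] for d in post_filter_search(docs, 3, is_dev_tool)])  # → ['doc-1']
```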

```{note}
RAG is not the only solution for incorporating domain knowledge, as illustrated below:
```

![domain-knowledge](domain-knowledge2.png)

```{seealso}
New to copilot and RAG concepts?
- Watch [Vector search and state of the art retrieval for Generative AI apps.](https://ignite.microsoft.com/en-US/sessions/18618ca9-0e4d-4f9d-9a28-0bc3ef5cf54e?source=sessions)
- Read [Retrieval Augmented Generation (RAG) in Azure AI Search](https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview)
```
