# Test searching, reranker and answer generation

This code demonstrates how to test the search with different options (uppercase or lowercase queries, combinations of vector fields, the number of documents retrieved in the search, and the number of documents used in answer generation), re-rank the retrieved documents, and generate and evaluate the answers. Each combination produces an Excel file with its results.

The tests are defined in the constant TESTS with the following fields:
+ test-name: it will be used as the Excel file name
+ embedding_fields: comma-separated list of vector fields to be used in the search
+ uppercase/lowercase: the query will be converted to uppercase or lowercase before the search is executed
+ embedding_model: ada or large-3
+ embedding_client: the Azure OpenAI client that matches the embedding model
+ index_name: the name of the index created with the notebook [create_index_and_index_documents.ipynb](../../4.-search-and-retrieval/4.1.-create-index-and-index-documents/create_index_and_index_documents.ipynb)
+ max_retrieve: maximum number of search results
+ max_generate: maximum number of documents (chunks) used to generate the answers
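
As a rough illustration of how the case, `max_retrieve`, and `max_generate` options could act on a query and its results, here is a minimal sketch. Both helpers are hypothetical and are not taken from `rag_utils`, whose implementation is not shown here; the sketch only assumes that results arrive as an ordered list.

```python
def apply_case(query, case):
    # Hypothetical helper: "upper" / "lower" mirror the values used in TESTS
    if case == "upper":
        return query.upper()
    if case == "lower":
        return query.lower()
    return query  # any other value leaves the query unchanged


def select_for_generation(ranked_docs, max_retrieve, max_generate):
    # Keep at most max_retrieve search results, then pass on the first max_generate
    return ranked_docs[:max_retrieve][:max_generate]


print(apply_case("What is project assurance?", "upper"))
print(len(select_for_generation(list(range(30)), max_retrieve=20, max_generate=10)))
```

With `max_generate` equal to `max_retrieve`, every retrieved chunk reaches the generation step; a smaller `max_generate` relies on the ordering to surface the most relevant chunks first.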

The output is one Excel file per test, containing the test results.

## Prerequisites

+ An Azure subscription, with [access to Azure OpenAI](https://aka.ms/oai/access).
+ An Azure OpenAI service with the service name and an API key.
+ A deployment of the text-embedding-ada-002 embedding model on the Azure OpenAI Service with the deployment name 'ada'.
+ A deployment of the text-embedding-3-large embedding model on the Azure OpenAI Service with its own deployment name (for example, 'large-3').
+ An Azure AI Search service with its endpoint, a query API key, and the name of the index to search.

We used Python 3.12.5, [Visual Studio Code with the Python extension](https://code.visualstudio.com/docs/python/python-tutorial), and the [Jupyter extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) to test this example.

### Set up a Python virtual environment in Visual Studio Code

1. Open the Command Palette (Ctrl+Shift+P).
1. Search for **Python: Create Environment**.
1. Select **Venv**.
1. Select a Python interpreter. Choose 3.10 or later.

It can take a minute to set up. If you run into problems, see [Python environments in VS Code](https://code.visualstudio.com/docs/python/environments).

### Install packages

In [None]:
! pip install openai
! pip install azure-search-documents
! pip install python-dotenv

## Import packages and create AOAI clients

In [4]:
import os
from dotenv import load_dotenv
from openai import AzureOpenAI
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
import sys
sys.path.append('../..')
from rag_utils import execute_test

# Load environment variables from .env
load_dotenv(override=True)

# AZURE AI SEARCH
ai_search_endpoint = os.environ["SEARCH_SERVICE_ENDPOINT"]
ai_search_apikey = os.environ["SEARCH_SERVICE_QUERY_KEY"]
ai_search_index_name = os.environ["SEARCH_INDEX_NAME"]
ai_search_credential = AzureKeyCredential(ai_search_apikey)

aoai_api_version = '2024-02-15-preview'

# AOAI FOR ANSWER GENERATION
aoai_answer_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
aoai_answer_apikey = os.environ["AZURE_OPENAI_API_KEY"]
aoai_answer_model_name = os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"]
# Create AOAI client for answer generation
aoai_answer_client = AzureOpenAI(
    azure_deployment=aoai_answer_model_name,
    api_version=aoai_api_version,
    azure_endpoint=aoai_answer_endpoint,
    api_key=aoai_answer_apikey
)

# AZURE OPENAI FOR RERANKING
aoai_rerank_endpoint = os.environ["AZURE_OPENAI_RERANK_ENDPOINT"]
azure_openai_rerank_key = os.environ["AZURE_OPENAI_RERANK_API_KEY"]
rerank_model_name = os.environ["AZURE_OPENAI_RERANK_DEPLOYMENT_NAME"]
# Create AOAI client for reranking
aoai_rerank_client = AzureOpenAI(
    azure_deployment=rerank_model_name,
    api_version=aoai_api_version,
    azure_endpoint=aoai_rerank_endpoint,
    api_key=azure_openai_rerank_key
)

# AZURE OPENAI FOR EMBEDDING
aoai_embedding_endpoint = os.environ["AZURE_OPENAI_EMBEDDING_ENDPOINT"]
azure_openai_embedding_key = os.environ["AZURE_OPENAI_EMBEDDING_API_KEY"]
embedding_model_name_ada = os.environ["AZURE_OPENAI_EMBEDDING_NAME_ADA"]
embedding_model_name_large_3 = os.environ["AZURE_OPENAI_EMBEDDING_NAME_LARGE_3"]
# Create AOAI client for embedding creation (ADA)
aoai_embedding_client_ada = AzureOpenAI(
    azure_deployment=embedding_model_name_ada,
    api_version=aoai_api_version,
    azure_endpoint=aoai_embedding_endpoint,
    api_key=azure_openai_embedding_key
)
# Create AOAI client for embedding creation (Large-3)
aoai_embedding_client_large_3 = AzureOpenAI(
    azure_deployment=embedding_model_name_large_3,
    api_version=aoai_api_version,
    azure_endpoint=aoai_embedding_endpoint,
    api_key=azure_openai_embedding_key
)

# CONSTANTS
SELECT_FIELDS=["id", "title", "content"] # Fields to retrieve in the search

# Test-name: embedding_fields | uppercase/lowercase | embedding_model | embedding_client | index_name | max_retrieve | max_generate
#TESTS = {
#        "title_content_large3_512_search_upper_20_10": ("embeddingTitle, embeddingContent", "upper", "large-3", aoai_embedding_client_large_3, "project_assurance_large_3", 20, 10),
#        "title_content_large3_512_search_upper_20_20": ("embeddingTitle, embeddingContent", "upper", "large-3", aoai_embedding_client_large_3, "project_assurance_large_3", 20, 20),
#        "title_content_large3_512_search_lower_20_10": ("embeddingTitle, embeddingContent", "lower", "large-3", aoai_embedding_client_large_3, "project_assurance_large_3", 20, 10),
#        "title_content_large3_512_search_lower_20_20": ("embeddingTitle, embeddingContent", "lower", "large-3", aoai_embedding_client_large_3, "project_assurance_large_3", 20, 20),
#}

TESTS = {
        "title_content_ada_512_search_upper_20_10": ("embeddingTitle, embeddingContent", "upper", "ada", aoai_embedding_client_ada, "project_assurance_ada", 20, 10),
        "title_content_ada_512_search_upper_20_20": ("embeddingTitle, embeddingContent", "upper", "ada", aoai_embedding_client_ada, "project_assurance_ada", 20, 20),
        "title_content_ada_512_search_lower_20_10": ("embeddingTitle, embeddingContent", "lower", "ada", aoai_embedding_client_ada, "project_assurance_ada", 20, 10),
        "title_content_ada_512_search_lower_20_20": ("embeddingTitle, embeddingContent", "lower", "ada", aoai_embedding_client_ada, "project_assurance_ada", 20, 20),
}

QA_WITH_ANSWERS_FILENAME = '../../data_out/qa_pairs.xlsx' #'QA_with_answers.xlsx'
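
The setup cell above reads every setting with `os.environ[...]`, so it stops with a `KeyError` at the first missing variable. A small check like the following, run beforehand, can report all missing names at once. This is a sketch: the helper `missing_env_vars` is hypothetical, and the list simply repeats the variable names used in the cell above.

```python
import os

# Variable names read by the setup cell
REQUIRED_ENV_VARS = [
    "SEARCH_SERVICE_ENDPOINT",
    "SEARCH_SERVICE_QUERY_KEY",
    "SEARCH_INDEX_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_RERANK_ENDPOINT",
    "AZURE_OPENAI_RERANK_API_KEY",
    "AZURE_OPENAI_RERANK_DEPLOYMENT_NAME",
    "AZURE_OPENAI_EMBEDDING_ENDPOINT",
    "AZURE_OPENAI_EMBEDDING_API_KEY",
    "AZURE_OPENAI_EMBEDDING_NAME_ADA",
    "AZURE_OPENAI_EMBEDDING_NAME_LARGE_3",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```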

In [2]:
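def check_answer_iddoc(text, ids_doc, all=False):
    """Return 1 if the answer text mentions the expected document IDs, else 0.

    ids_doc is a string of IDs separated by ', '. With all=True every ID must
    appear in text; otherwise one matching ID is enough.
    """
    ids = ids_doc.split(', ')
    if all:  # Every ID must appear for the result to be 1
        for doc_id in ids:
            if doc_id not in text:
                return 0
        return 1

    # A single matching ID is enough for the result to be 1
    for doc_id in ids:
        if doc_id in text:
            return 1
    return 0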
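
To make the two matching modes concrete, the sketch below restates the function compactly (same substring logic, written to run on its own) and calls it on a made-up answer. The answer text and document IDs are invented for illustration. Note that matching is by substring, so a short ID can match inside a longer one.

```python
def check_answer_iddoc(text, ids_doc, all=False):
    # Compact restatement of the notebook function (same substring logic)
    hits = [doc_id in text for doc_id in ids_doc.split(', ')]
    return int(sum(hits) == len(hits)) if all else int(sum(hits) > 0)


answer = "The policy is described in DOC-12 and DOC-7."  # made-up answer text

print(check_answer_iddoc(answer, "DOC-12, DOC-99"))            # 1: one ID is present
print(check_answer_iddoc(answer, "DOC-12, DOC-99", all=True))  # 0: DOC-99 is missing
print(check_answer_iddoc(answer, "DOC-1"))                     # 1: "DOC-1" is a substring of "DOC-12"
```

If IDs in the data can be prefixes of one another, a stricter comparison (for example, matching on word boundaries) may be worth considering.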

In [None]:
for test_name, (embedding_fields, case, embedding_model, embedding_client, index_name, max_retrieve, max_generate) in TESTS.items():
    execute_test(ai_search_endpoint, ai_search_credential, SELECT_FIELDS, aoai_rerank_client, rerank_model_name, aoai_answer_client, aoai_answer_model_name, test_name, embedding_fields, case, embedding_model, embedding_client, index_name, max_retrieve, max_generate, QA_WITH_ANSWERS_FILENAME)
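
The loop above runs whatever is in TESTS, and the entries there are written by hand. When more combinations are needed, the dictionary could instead be generated. The following is a minimal sketch under that assumption: `build_tests` is a hypothetical helper, and the embedding client is replaced by a `None` placeholder so the snippet runs without Azure credentials.

```python
from itertools import product


def build_tests(prefix, fields, model, client, index_name, cases, limits):
    """Hypothetical helper: one TESTS entry per (case, limits) combination."""
    tests = {}
    for case, (max_retrieve, max_generate) in product(cases, limits):
        name = f"{prefix}_{case}_{max_retrieve}_{max_generate}"
        tests[name] = (fields, case, model, client, index_name, max_retrieve, max_generate)
    return tests


generated = build_tests(
    prefix="title_content_ada_512_search",
    fields="embeddingTitle, embeddingContent",
    model="ada",
    client=None,  # placeholder; the notebook passes aoai_embedding_client_ada here
    index_name="project_assurance_ada",
    cases=["upper", "lower"],
    limits=[(20, 10), (20, 20)],
)
for name in generated:
    print(name)
```

With these arguments the generated keys match the four test names defined in TESTS, in the same order.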