## 📚 Prerequisites

Before executing this notebook, make sure you have properly set up your Azure Services, created your Conda environment, and configured your environment variables as per the instructions provided in the [README.md](README.md) file.

> `%pip install azure-search-documents==11.4.0b10`

## 📋 Table of Contents

Explore different retrieval methods in Azure AI Search:

1. [**Understanding Types of Search**](#define-field-types): Review the search methods available in Azure AI Search and what each one is suited for.
2. [**Keyword Search**](#keyword-search): Use direct query term matching with document content.
3. [**Vector Search**](#vector-search): Employ embeddings for semantic content understanding and relevance ranking.
4. [**Hybrid Search**](#hybrid-search): Combine keyword and vector search for comprehensive results.
5. [**Reranking Search**](#reranking-search): Reorder initial search results for improved top result relevance.

Additional resources:
- [Azure AI Search Documentation](https://learn.microsoft.com/en-us/azure/search/)

### 🧭 Understanding Types of Search  

- **Keyword Search**: Traditional search method relying on direct term matching. Efficient for exact matches but struggles with synonyms and context. [Learn More](https://learn.microsoft.com/en-us/azure/search/search-lucene-query-architecture)

- **Vector Search**: Converts text into high-dimensional vectors (embeddings) to capture semantic meaning. Finds relevant documents even without exact keyword matches. Effectiveness depends on the quality of the embedding model and the data it was trained on. [Learn More](https://learn.microsoft.com/en-us/azure/search/vector-search-overview)

- **Hybrid Search**: Combines Keyword and Vector Search for comprehensive, contextually relevant results. Effective for complex queries requiring nuanced understanding. [Learn More](https://learn.microsoft.com/en-us/azure/search/vector-search-ranking#hybrid-search)

- **Reranking Search**: Reorders the initial search results with a machine learning ranking model. Useful when initial retrieval returns relevant but not optimally ordered results. [Learn More](https://learn.microsoft.com/en-us/azure/search/semantic-search-overview)

### 🚧 Limitations

##### Keyword Search

- **Synonym Challenges**: Struggles with recognizing synonyms or different expressions of the same concept.
- **Context Understanding**: May not fully capture the broader context or the query's intent, especially in complex queries.

##### Embedding-Based Search

- **Keyword Precision**: May miss documents that contain exact terms if those terms don't semantically align with the query or document's overall content.
- **Contextual Misinterpretations**: May overgeneralize or incorrectly interpret context, missing specific nuances.
- **Training Data Dependency**: Performance heavily relies on the diversity and depth of the training data.

### 💡 Recommendations

To achieve higher relevance out of the box: 

1. **Hybrid Search**: Combines keyword and vector search methods to ensure comprehensive document retrieval across a range of queries, from highly specific to semantically complex.

2. **Re-Ranking (L2 Ranking) in Azure AI Search**: Reorders the initial search results with a second-stage ranking model, improving relevance and accuracy, especially for nuanced queries.

In [None]:
# %pip install azure-search-documents==11.4.0b10

In [1]:
import os
from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import RawVectorQuery

from src.aoai.azure_open_ai import AzureOpenAIManager

In [2]:
# Load environment variables from .env file
load_dotenv()

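# Set up Azure AI Search credentials
service_endpoint = os.getenv("AZURE_AI_SEARCH_SERVICE_ENDPOINT")
key = os.getenv("AZURE_SEARCH_ADMIN_KEY")
# Fail early with a clear message if the .env file is missing either value
if not service_endpoint or not key:
    raise ValueError(
        "Set AZURE_AI_SEARCH_SERVICE_ENDPOINT and AZURE_SEARCH_ADMIN_KEY in your .env file"
    )
credential = AzureKeyCredential(key)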

# Define the name of the Azure Search index
# This is the index where your data is stored in Azure Search
index_name = "index-churchofjesuschrist-web"

# Set up the Azure Search client with the specified index
# This prepares the client to interact with the Azure Search service
search_client = SearchClient(service_endpoint, index_name, credential=credential)

In [4]:
embedding_aoai_deployment_model = "foundational-ada"
aoai_client = AzureOpenAIManager(embedding_model_name=embedding_aoai_deployment_model)

In [5]:
search_query = "Who is Jesus Christ?"
search_vector = aoai_client.generate_embedding(search_query)

## Keyword Search 

**Full-text search**: Keyword queries report relevance in `@search.score`, computed with the BM25 algorithm. BM25 is a bag-of-words retrieval function that ranks documents by the query terms appearing in each document, regardless of where those terms sit within the document. There is no upper limit for the score in this method.

```json
"value": [
  {
    "@search.score": 5.1958685,
    "@search.features": {
      "description": {
        "uniqueTokenMatches": 1.0,
        "similarityScore": 0.29541412,
        "termFrequency": 2
      },
      "title": {
        "uniqueTokenMatches": 3.0,
        "similarityScore": 1.75451557,
        "termFrequency": 6
      }
    }
  }
]
```

- `uniqueTokenMatches`: This parameter indicates the number of unique query terms found in the document field. A higher value means more unique query terms were found, suggesting a stronger match.

- `similarityScore`: This parameter measures how similar the content of the document field is to the query terms, as computed by the lexical scoring algorithm (it is not an embedding-based semantic measure). A higher `similarityScore` means the field content matches the query terms more closely.

- `termFrequency`: This parameter shows how often the query terms appear within the document field. A higher `termFrequency` means the query terms appear more often, suggesting a stronger match.

These parameters contribute to the overall `@search.score`. The `@search.score` is a cumulative measure of a document's relevance to the search query. A higher `@search.score` indicates a stronger match between the document and the search query.

When interpreting search results, documents with higher scores are generally considered more relevant to the query than those with lower scores.
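
To make the BM25 scoring described above more concrete, here is a minimal sketch of a textbook BM25 formula in plain Python. It is an illustration only, not the service's implementation: the function name `bm25_scores`, the whitespace tokenizer, and the sample documents are assumptions, and `k1=1.2` and `b=0.75` are used as commonly cited default values.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with a textbook BM25 formula.

    Hypothetical helper for illustration; tokenization is a naive
    lowercase whitespace split, so punctuation is not stripped.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    query_terms = set(query.lower().split())

    # Document frequency: in how many documents each query term appears
    doc_freq = {
        term: sum(1 for tokens in tokenized if term in tokens)
        for term in query_terms
    }

    scores = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_counts[term]
            if tf == 0:
                continue
            # Rare terms get a larger inverse document frequency weight
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by document length (b)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


sample_docs = [
    "jesus christ is the savior",
    "the temple is the house of the lord",
    "faith in jesus christ",
]
for doc, score in zip(sample_docs, bm25_scores("jesus christ", sample_docs)):
    print(f"{score:.4f}  {doc}")
```

In this sketch the second document scores zero because it contains no query term, and the shorter third document outranks the first because of length normalization. Since the score is a sum over query terms, it has no fixed upper bound.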

In [6]:
# keyword search
r = search_client.search(search_query, top=5)
for doc in r:
    if "Jesus" in doc["content"]:
        content = doc["content"].replace("\n", " ")[:1000]
        print(f"score: {doc['@search.score']}. {content}")

score: 8.978319. 18.12.1. Who Performs the OrdinanceOrdinances and blessings are sacred acts performed by the authority of the priesthood and in the name of Jesus Christ. As priesthood holders perform ordinances and blessings, they follow the Savior’s example of blessing others.
score: 8.562346. 18.9.2. Who Performs the OrdinanceOrdinances and blessings are sacred acts performed by the authority of the priesthood and in the name of Jesus Christ. As priesthood holders perform ordinances and blessings, they follow the Savior’s example of blessing others.
score: 8.388591. 18.6.1. Who Gives the BlessingOrdinances and blessings are sacred acts performed by the authority of the priesthood and in the name of Jesus Christ. As priesthood holders perform ordinances and blessings, they follow the Savior’s example of blessing others.   18.6.2. InstructionsOrdinances and blessings are sacred acts performed by the authority of the priesthood and in the name of Jesus Christ. As priesthood holders per

## Vector Search 

Vector queries also report relevance in `@search.score`. The HNSW (Hierarchical Navigable Small World) algorithm performs an efficient approximate nearest neighbor search in high-dimensional space, and the score is derived from the similarity metric configured on the vector field. The scoring range is 0.333 - 1.00 for Cosine similarity, and 0 to 1 for Euclidean and DotProduct similarities.
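
The 0.333 - 1.00 range for Cosine similarity follows from how a cosine distance can be mapped to a score. The sketch below assumes the transformation `score = 1 / (1 + cosine_distance)` with `cosine_distance = 1 - cosine_similarity`; the helper names are hypothetical and the vectors are toy data, so treat it as an illustration of the range rather than the service's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_search_score(a, b):
    """Map cosine similarity in [-1, 1] to a score in [1/3, 1].

    Assumed transformation: score = 1 / (1 + cosine_distance).
    """
    cosine_distance = 1.0 - cosine_similarity(a, b)
    return 1.0 / (1.0 + cosine_distance)


print(cosine_search_score([1.0, 0.0], [1.0, 0.0]))   # identical direction
print(cosine_search_score([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(cosine_search_score([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Under this assumption, identical directions give the maximum score, opposite directions give the minimum of one third, and a higher score always means a smaller angle between the query and document vectors.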

In [9]:
# Pure vector Search
r = search_client.search(
    None,
    top=5,
    vector_queries=[
        RawVectorQuery(vector=search_vector, k=50, fields="content_vector")
    ],
)
for doc in r:
    content = doc["content"].replace("\n", " ")[:1000]
    print(f"score: {doc['@search.score']}. {content}")

score: 0.84436065. God’s Work of Salvation and Exaltation   Living the Gospel of Jesus Christ   16. Living the Gospel of Jesus ChristWe live the gospel as we exercise faith in Jesus Christ, repent daily, make covenants with God as we receive the ordinances of salvation and exaltation, and endure to the end by keeping those covenants.  17. Teaching the Gospel   17. Teaching the GospelEffective gospel teaching helps people grow in their testimonies and their faith in Heavenly Father and Jesus Christ.
score: 0.83588874. Isaiah 7Ephraim and Syria wage war against Judah—Christ will be born of a virgin—Compare 2 Nephi 17.   Isaiah 8Christ will be as a stone of stumbling and a rock of offense—Seek the Lord, not muttering wizards—Turn to the law and to the testimony for guidance—Compare 2 Nephi 18.   Isaiah 9Isaiah speaks about the Messiah—The people in darkness will see a great Light—Unto us a Child is born—He will be the Prince of Peace and reign on David’s throne—Compare 2 Nephi 19.
score: 

## Hybrid Search

Hybrid queries report relevance in `@search.score`, computed with the RRF (Reciprocal Rank Fusion) algorithm. RRF is a data fusion method that merges the ranked results of multiple queries (here, one keyword query and one vector query) by summing, for each document, a reciprocal-rank term from every result list in which it appears. The upper limit of the score is therefore bounded by the number of queries being fused: merging three queries can produce higher RRF scores than merging two. Because each term is the reciprocal of the rank plus a smoothing constant, individual scores are small, as the values around 0.03 in the output below show.
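
As a rough illustration of the fusion step, the sketch below implements the standard RRF formula, where each result list adds `1 / (k + rank)` to a document's score. The function name, the document ids, and the constant `k=60` are assumptions made for this example; the real service may differ in details such as tie-breaking.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one scored list.

    Hypothetical helper: each list contributes 1 / (k + rank) per document,
    with rank starting at 1 for the top result.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


keyword_ranking = ["a", "b", "c"]  # toy keyword (BM25) result order
vector_ranking = ["a", "c", "d"]   # toy vector result order

for doc_id, score in reciprocal_rank_fusion([keyword_ranking, vector_ranking]):
    print(f"{doc_id}: {score:.6f}")
```

With `k=60`, a document ranked first in both lists scores 2/61, which is in line with the magnitude of the hybrid scores printed by the query cell in this section. Documents that appear in only one list still receive a score, just a lower one.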

In [18]:
r = search_client.search(
    search_query,
    top=5,
    vector_queries=[
        RawVectorQuery(vector=search_vector, k=50, fields="content_vector")
    ],
)
for doc in r:
    content = doc["content"].replace("\n", " ")[:1000]
    print(
        f"score: {doc['@search.score']}, reranker: {doc['@search.reranker_score']}. {content}"
    )

score: 0.028370220214128494, reranker: None. 17.1. Principles of Christlike Teaching   17.1. Principles of Christlike TeachingEffective gospel teaching helps people grow in their testimonies and their faith in Heavenly Father and Jesus Christ.   17.1.1. Love Those You TeachEffective gospel teaching helps people grow in their testimonies and their faith in Heavenly Father and Jesus Christ.   17.1.2. Teach by the SpiritEffective gospel teaching helps people grow in their testimonies and their faith in Heavenly Father and Jesus Christ.
score: 0.026470590382814407, reranker: None. God’s Work of Salvation and Exaltation   Living the Gospel of Jesus Christ   16. Living the Gospel of Jesus ChristWe live the gospel as we exercise faith in Jesus Christ, repent daily, make covenants with God as we receive the ordinances of salvation and exaltation, and endure to the end by keeping those covenants.  17. Teaching the Gospel   17. Teaching the GospelEffective gospel teaching helps people grow in th

#### Enable Exhaustive KNN (`exhaustive=True`)
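
Exhaustive k-nearest-neighbor search compares the query vector against every stored vector instead of walking an approximate index such as HNSW, trading speed for exact results. The sketch below shows the idea with a brute-force scan over toy vectors; the function name and data are hypothetical, and it is not how the service stores or scans vectors.

```python
import heapq
import math


def exhaustive_knn(query_vector, vectors, k=3):
    """Return the k (similarity, doc_id) pairs closest to the query.

    Brute force: every vector in the `vectors` dict is scored by cosine similarity.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    scored = ((cosine(query_vector, vec), doc_id) for doc_id, vec in vectors.items())
    # nlargest keeps only the k best candidates while scanning all of them
    return heapq.nlargest(k, scored)


toy_vectors = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [1.0, 1.0]}
for similarity, doc_id in exhaustive_knn([1.0, 0.1], toy_vectors, k=2):
    print(f"{doc_id}: {similarity:.4f}")
```

On a small index the exhaustive and approximate results are often identical, which is consistent with the matching outputs of the two hybrid query cells in this notebook.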

In [19]:
r = search_client.search(
    search_query,
    top=5,
    vector_queries=[
        RawVectorQuery(
            vector=search_vector, k=50, fields="content_vector", exhaustive=True
        )
    ],
)
for doc in r:
    content = doc["content"].replace("\n", " ")[:1000]
    print(
        f"score: {doc['@search.score']}, reranker: {doc['@search.reranker_score']}. {content}"
    )

score: 0.028370220214128494, reranker: None. 17.1. Principles of Christlike Teaching   17.1. Principles of Christlike TeachingEffective gospel teaching helps people grow in their testimonies and their faith in Heavenly Father and Jesus Christ.   17.1.1. Love Those You TeachEffective gospel teaching helps people grow in their testimonies and their faith in Heavenly Father and Jesus Christ.   17.1.2. Teach by the SpiritEffective gospel teaching helps people grow in their testimonies and their faith in Heavenly Father and Jesus Christ.
score: 0.026470590382814407, reranker: None. God’s Work of Salvation and Exaltation   Living the Gospel of Jesus Christ   16. Living the Gospel of Jesus ChristWe live the gospel as we exercise faith in Jesus Christ, repent daily, make covenants with God as we receive the ordinances of salvation and exaltation, and endure to the end by keeping those covenants.  17. Teaching the Gospel   17. Teaching the GospelEffective gospel teaching helps people grow in th

## Semantic Ranking

This method reports relevance in `@search.rerankerScore` (read in the code below as `@search.reranker_score`) and uses a semantic ranking algorithm for scoring. Semantic ranking uses machine learning models to assess how well each document's content matches the meaning of the query, then reorders the initial results by that relevance. The scoring range is 0.00 - 4.00 in this method.

Remember, a higher score indicates a higher relevance of the document to the query.
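
Because the reranker score has a fixed range, it can be used to drop weak matches before passing results downstream. The sketch below works on plain dictionaries shaped like the search results in this notebook; the helper name `top_reranked`, the sample records, and the `min_score=2.0` threshold are assumptions chosen for illustration, not recommended values.

```python
def top_reranked(docs, min_score=2.0):
    """Keep documents whose reranker score meets the threshold, best first.

    Documents without a reranker score (for example, results from a query
    that did not request semantic ranking) are skipped.
    """
    scored = [d for d in docs if d.get("@search.reranker_score") is not None]
    relevant = [d for d in scored if d["@search.reranker_score"] >= min_score]
    return sorted(relevant, key=lambda d: d["@search.reranker_score"], reverse=True)


sample_results = [
    {"content": "doc one", "@search.score": 0.0154, "@search.reranker_score": 2.58},
    {"content": "doc two", "@search.score": 0.0283, "@search.reranker_score": None},
    {"content": "doc three", "@search.score": 0.0166, "@search.reranker_score": 1.20},
    {"content": "doc four", "@search.score": 0.0159, "@search.reranker_score": 2.51},
]
for doc in top_reranked(sample_results):
    print(f"reranker: {doc['@search.reranker_score']}. {doc['content']}")
```

Note that the final order follows the reranker score rather than `@search.score`, which matches the ordering of the semantic ranking output in this section.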

In [20]:
# hybrid retrieval + rerank
r = search_client.search(
    search_query,
    top=5,
    vector_queries=[
        RawVectorQuery(vector=search_vector, k=50, fields="content_vector")
    ],
    query_type="semantic",
    semantic_configuration_name="config",
    query_language="en-us",
)

for doc in r:
    content = doc["content"].replace("\n", " ")[:1000]
    print(
        f"score: {doc['@search.score']}, reranker: {doc['@search.reranker_score']}. {content}"
    )

score: 0.015384615398943424, reranker: 2.584941864013672. 27.1.3. Members Who Have Physical DisabilitiesThe temple is the house of the Lord. It points us to our Savior, Jesus Christ. In temples, we participate in sacred ordinances and make covenants with Heavenly Father that bind us to Him and to our Savior. These covenants and ordinances prepare us to return to Heavenly Father’s presence and to be sealed together as families for eternity.
score: 0.01587301678955555, reranker: 2.510263442993164. 18.10.4. Who Performs the OrdinanceOrdinances and blessings are sacred acts performed by the authority of the priesthood and in the name of Jesus Christ. As priesthood holders perform ordinances and blessings, they follow the Savior’s example of blessing others.
score: 0.01660528965294361, reranker: 2.4976255893707275. 27.3.1. Who May Be Sealed in a TempleThe temple is the house of the Lord. It points us to our Savior, Jesus Christ. In temples, we participate in sacred ordinances and make coven