## 🚀 Motivation

To get the most out of large language models such as OpenAI's GPT-4, I need to solve key challenges in data management and retrieval. The goal is to optimize the Retrieval Augmented Generation (RAG) process so that the model not only accesses the knowledge stored in a vector database but also interprets and uses it accurately.

## ❗ Problem Statement


When integrating OpenAI models such as GPT-4 with a vector store, the main challenge lies in Retrieval Augmented Generation (RAG): the model queries the vector store to retrieve the knowledge chunks needed to answer a user's question. Making this work well requires the developer to design an effective strategy for chunking, sorting, and retrieving data.

**🔍 The Challenges:**

**Data Chunking and Sorting:**

+ **Determining Optimal Chunk Size**: Deciding the appropriate size for document chunks is crucial. Too large, and the chunks may exceed the model's context window, leading to loss of information; too small, and they may lack sufficient context.

+ **Effective Sorting Strategies**: Sorting these chunks for efficient retrieval is another challenge. The sorting mechanism needs to ensure that the most relevant chunks are prioritized.

+ **Overlap Consideration**: Implementing overlapping chunks can be vital. It ensures continuity and context preservation, especially when dealing with long documents or complex topics.
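The overlap idea above can be sketched as a sliding window. This is a minimal illustration, not the splitter used later in this notebook: the function name `chunk_with_overlap` is hypothetical, and whitespace-separated words stand in for real model tokens.

```python
from typing import List, Sequence


def chunk_with_overlap(tokens: Sequence[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens with the previous window."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    stride = chunk_size - overlap  # how far each new window advances
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
        start += stride
    return chunks


# Assumption for illustration: words act as tokens.
words = "the tester must be powered down before the board is removed".split()
for chunk in chunk_with_overlap(words, chunk_size=5, overlap=2):
    print(" ".join(chunk))
```

Each window repeats the last `overlap` tokens of the previous one, so a phrase that straddles a boundary still appears whole in at least one chunk, as long as it is no longer than the overlap.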

**The Impact: Fragmented Information**

Poor chunking fragments information, which becomes particularly noticeable when similar terms appear in different sections of a document. The system may mix up data from unrelated contexts, producing confusing or incorrect answers. The relevance of retrieved information also varies significantly with how well the chunking and sorting strategy is implemented.

## 💡 Solution

Optimize the chunking strategy to improve retrieval relevance:

The primary goal in this integration is to retrieve the most relevant and contextually accurate information. This requires a nuanced understanding of both the content's nature and the query's intent.

**Importance of Chunking Strategy**: Effective chunking strategies, like creating overlapping chunks, play a critical role. They ensure that vital information isn't lost or misrepresented, and that each chunk contains a coherent piece of information that contributes meaningfully to answering the user's query.

**Overlapping Chunks for Context Preservation**: Overlapping chunks bridge the context gap between adjacent sections of a document, so the retrieved information is both relevant and contextually complete, which supports a more accurate response.

**A good starting point (common best practice):** 512-token chunks with 25% overlap (128 tokens).

## 📝 How-to

Addressing the challenges of retrieval-augmented generation (RAG) systems requires a thoughtful, experimental approach. The following elements can improve your chunking strategy and search:

**🚀 Implement LangChain for high-level orchestration**

Use Azure Search as the vector store, incorporating an overlapping chunking strategy.

**🔎 Exploring Azure Cognitive Search**

The query stack in Azure Cognitive Search is structured into two main layers: retrieval (L1) and ranking (L2).

+ Retrieval (L1): This layer aims to quickly identify documents from a potentially massive index that meet the search criteria. It operates in three modes:

    + Keyword Mode: Utilizes traditional full-text search methods, breaking content into terms through language-specific text analysis, creating inverted indexes for rapid retrieval, and scoring with the BM25 probabilistic model.
    + Vector Mode: Converts documents into vector representations using an embedding model (like Azure OpenAI's text-embedding-ada-002, or Ada-002) and performs retrieval by matching query embeddings with the closest document vectors.
    + Hybrid Mode: Combines keyword and vector retrieval, using Reciprocal Rank Fusion (RRF) to fuse and select the best results from each method.

+ Ranking (L2): This layer processes a subset of the top results from L1, applying more computational power to compute high-quality relevance scores and reorder them. However, L2 can only enhance the ranking of documents identified by L1; it cannot recover missed documents. The L2 ranking, critical in RAG applications, employs multi-lingual, deep learning models adapted from Microsoft Bing to semantically rank the top 50 L1 results.
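Reciprocal Rank Fusion, mentioned under Hybrid Mode, can be sketched in a few lines. This is the general technique rather than Azure's internal implementation: each document earns `1 / (k + rank)` from every ranked list it appears in, and the sums decide the fused order. The function name, the document IDs, and `k = 60` (a commonly used constant) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list contribute the most.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from the keyword and vector retrievers.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Because only ranks are used, the keyword and vector scores never need to be on the same scale, which is what makes this fusion convenient for hybrid retrieval.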

Azure Cognitive Search's sophisticated approach, combining keyword and vector search in retrieval and enhancing results with advanced semantic ranking, addresses the complexities of large-scale, nuanced search requirements.
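The two-layer flow can be illustrated with a toy pipeline. This is a sketch under stated assumptions: the scorers below (`shared_terms`, `jaccard`) are simple stand-ins for BM25 or vector similarity and for the semantic ranker, and all names and documents are hypothetical.

```python
from typing import Callable, Dict, List

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(query: str, corpus: Dict[str, str], l1_score: Scorer,
                         l2_score: Scorer, l1_top_k: int = 50, final_k: int = 3) -> List[str]:
    """Run a cheap L1 pass over the whole corpus, then an L2 pass over the L1 candidates only."""
    candidates = sorted(corpus, key=lambda d: l1_score(query, corpus[d]), reverse=True)[:l1_top_k]
    reranked = sorted(candidates, key=lambda d: l2_score(query, corpus[d]), reverse=True)
    return reranked[:final_k]


def shared_terms(query: str, text: str) -> float:
    """Toy L1 scorer: number of query words that appear in the text."""
    return float(len(set(query.split()) & set(text.split())))


def jaccard(query: str, text: str) -> float:
    """Toy L2 scorer: word overlap divided by the size of the combined vocabulary."""
    q, t = set(query.split()), set(text.split())
    return len(q & t) / len(q | t) if q | t else 0.0


corpus = {
    "manual": "reset the tester power supply before running any calibration routine on site",
    "quick_ref": "tester power limits",
    "faq": "how to restart the instrument",
}
print(retrieve_then_rerank("reset tester power", corpus, shared_terms, jaccard, l1_top_k=2))
```

With `l1_top_k=2`, the `faq` entry never reaches the second stage: L2 reorders the two surviving candidates but cannot bring back a document that L1 dropped.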


## Getting Started

Before you start, ensure you have a `.env` file in your project directory with the following keys:

```plaintext
OPENAI_API_KEY=****
OPENAI_ENDPOINT=****
AZURE_OPENAI_API_VERSION=****
AZURE_SEARCH_SERVICE_ENDPOINT=****
AZURE_SEARCH_ADMIN_KEY=****
```
Then install the Azure Search client library (this notebook pins a beta release):

%pip install azure-search-documents==11.4.0b8
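Before running the cells below, it can help to confirm that every key is set. The check below is a minimal sketch that assumes the `.env` values have already been loaded into the process environment (for example by a dotenv loader, not shown here); `missing_settings` is a hypothetical helper, not part of this repository.

```python
import os
from typing import List, Mapping

# Keys listed in the .env file described in this section.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_SEARCH_SERVICE_ENDPOINT",
    "AZURE_SEARCH_ADMIN_KEY",
]


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required keys that are absent or empty in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


missing = missing_settings()
if missing:
    print(f"Missing settings: {', '.join(missing)}")
else:
    print("All required settings are present.")
```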

In [7]:
import os

# Define the target directory
target_directory = r'C:\Users\pablosal\Desktop\azure-ai-gbb-solutions'  # change this to your local path

# Check if the directory exists
if os.path.exists(target_directory):
    # Change the current working directory
    os.chdir(target_directory)
    print(f"Directory changed to {os.getcwd()}")
else:
    print(f"Directory {target_directory} does not exist.")

Directory changed to C:\Users\pablosal\Desktop\azure-ai-gbb-solutions


## Chunking

In [8]:
# Import the TextChunkingIndexing class from the langchain_integration module
from src.gbb_ai.rag_utils.langchain_integration import TextChunkingIndexing

# Create an instance of the TextChunkingIndexing class
gbb_ai_client = TextChunkingIndexing()

# Set up the Azure OpenAI client
gbb_ai_client.setup_aoai()

# Define the name of the deployment
DEPLOYMENT_NAME = "foundational-ada"

# Load the embedding model associated with the specified deployment
gbb_ai_client.load_embedding_model(deployment=DEPLOYMENT_NAME)

2023-11-22 12:37:04,964 - micro - MainProcess - INFO     Loading OpenAIEmbeddings object with model text-embedding-ada-002, deployment foundational-ada, and chunk size 1 (langchain_integration.py:load_embedding_model:106)
2023-11-22 12:37:04,968 - micro - MainProcess - INFO     OpenAIEmbeddings object created successfully. (langchain_integration.py:load_embedding_model:119)


OpenAIEmbeddings(client=<class 'openai.api_resources.embedding.Embedding'>, async_client=None, model='text-embedding-ada-002', deployment='foundational-ada', openai_api_version='2023-05-15', openai_api_base='https://ml-workspace-dev-eastus-001-aoai.openai.azure.com/', openai_api_type='azure', openai_proxy='', embedding_ctx_length=8191, openai_api_key='****', openai_organization=None, allowed_special=set(), disallowed_special='all', chunk_size=16, max_retries=2, request_timeout=None, headers=None, tiktoken_model_name=None, show_progress_bar=True, model_kwargs={}, skip_empty=False, default_headers=None, default_query=None, http_client=None)

In [9]:
# Define the name of the Azure Search index
# This is the index where your data is stored in Azure Search
INDEX_NAME = "index-teradyne-web"

# Set up the Azure Search client with the specified index
# This prepares the client to interact with the Azure Search service
gbb_ai_client.setup_azure_search(index_name=INDEX_NAME)

100%|██████████| 1/1 [00:00<00:00, 16.72it/s]
2023-11-22 12:37:06,053 - micro - MainProcess - INFO     Azure Cognitive Search client configured successfully. (langchain_integration.py:setup_azure_search:188)


<langchain.vectorstores.azuresearch.AzureSearch at 0x22bb43d8610>

In [10]:
# Load a PDF and split its text into chunks
# Define the path of the PDF file to load
file_1 = "C:\\Users\\pablosal\\Desktop\\azure-ai-gbb-solutions\\workshop\\solution\\build_your_own_copilot_aoai\\rag_pattern\\pdf\\ultraflex_user_manual.pdf"

# Set the chunk size and overlap size for splitting the text
CHUNK_SIZE = 512
OVERLAP_SIZE = 128
SEPARATOR = r"(\n\w|\w\n)"  # raw string avoids invalid-escape warnings; not passed to the call below

# Load the PDF, split the text into chunks, and store the chunks
# The text is split into chunks of size CHUNK_SIZE, with an overlap of OVERLAP_SIZE between consecutive chunks
text_chuncked = gbb_ai_client.load_and_split_text_by_character_from_pdf(source=file_1, chunk_size=CHUNK_SIZE, chunk_overlap=OVERLAP_SIZE)

## Indexing
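Embedding and indexing call a remote service, so individual requests can fail transiently, for example when the service is overloaded. A common way to cope is to retry with exponential backoff. The sketch below is a generic illustration of that idea, not the code used by `embed_and_index` or by LangChain; `call_with_backoff` and its parameters are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 4, base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on failure with delays of base_delay, 2x, 4x, and so on."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:  # a real client would catch only transient error types
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Usage with a stand-in operation that fails twice before succeeding.
state = {"calls": 0}


def flaky() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("service overloaded")
    return "indexed"


# The sleep is stubbed out so the example finishes instantly.
print(call_with_backoff(flaky, sleep=lambda seconds: None))
```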

In [11]:
# Embed the chunks and index them in Azure Search
# This function converts the text chunks into vector embeddings and stores them in the Azure Search index
gbb_ai_client.embed_and_index(text_chuncked)

100%|██████████| 1/1 [00:00<00:00, 15.95it/s]
100%|██████████| 1/1 [00:00<00:00, 15.14it/s]
100%|██████████| 1/1 [00:00<00:00, 12.00it/s]
100%|██████████| 1/1 [00:00<00:00, 12.84it/s]
100%|██████████| 1/1 [00:00<00:00, 11.35it/s]
100%|██████████| 1/1 [00:00<00:00, 14.58it/s]
100%|██████████| 1/1 [00:00<00:00, 13.57it/s]
100%|██████████| 1/1 [00:00<00:00, 12.47it/s]
100%|██████████| 1/1 [00:05<00:00,  5.51s/it]
100%|██████████| 1/1 [00:00<00:00, 16.56it/s]
100%|██████████| 1/1 [00:00<00:00,  1.35it/s]
100%|██████████| 1/1 [00:00<00:00, 12.31it/s]
100%|██████████| 1/1 [00:00<00:00, 14.26it/s]
100%|██████████| 1/1 [00:00<00:00, 13.74it/s]
100%|██████████| 1/1 [00:00<00:00, 13.21it/s]
  0%|          | 0/1 [00:00<?, ?it/s]Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised ServiceUnavailableError: The server is overloaded or not ready yet..
100%|██████████| 1/1 [00:04<00:00,  4.12s/it]
100%|██████████| 1/1 [00:00<00:00, 12.86it/s]
100