# Phases 2, 3 and 4: Search, Augment and Generate the Answer
This notebook has the following parts:
- Import libraries, load the configuration variables and create the clients
- Hybrid search with the semantic ranker
- Filter the retrieved chunks, keeping only those relevant to the user's question
- Generate the answer to the question using the most relevant chunks as context
- Run the end-to-end process, also with conversation history

### Import libraries, load configuration variables and create clients

In [None]:
#%pip install python-dotenv
#%pip install openai
#%pip install tiktoken
#%pip install azure-search-documents
#%pip install pandas openpyxl  # openpyxl is needed by pd.read_excel for .xlsx files

In [1]:
# Import libraries
import os
import sys
import json
import time
import pandas as pd
from dotenv import load_dotenv, find_dotenv
from openai import AzureOpenAI

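# Make the parent folder importable so that common_utils can be found
sys.path.append(os.path.abspath('..'))
# Provides load_config, semantic_hybrid_search, get_filtered_chunks, generate_answer, ...
from common_utils import *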

# Load Azure OpenAI and AI Search variables and create clients
openai_config, ai_search_config = load_config()


aoai_endpoint: https://openai-asc-swit-north.openai.azure.com/
aoai_deployment_name: gpt-4o
oai_embedding_model: ada
aoai_rerank_model: gpt-4o-mini
ai_search_index_name_regs: rag-index-regs
ai_search_index_name_docs: rag-index-docs


### Hybrid search in Azure AI Search (keyword + vector) with the semantic ranker
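
In a hybrid query, Azure AI Search runs the keyword (BM25) and vector searches in parallel and merges the two ranked lists with Reciprocal Rank Fusion (RRF); the semantic ranker then re-scores the top of the merged list. The sketch below only illustrates the RRF idea on plain lists of document ids. It is not the service's implementation: the function name, the sample ids and the constant `k=60` (the value commonly used for RRF) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A higher fused score means a better rank.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by id for determinism
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical ids returned by the keyword query and by the vector query
keyword_hits = ["141", "17", "88"]
vector_hits = ["17", "203", "141"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print([doc_id for doc_id, _ in fused])  # → ['17', '141', '203', '88']
```

Documents found by both searches ("17" and "141") rise to the top even though neither list alone ranks them both first, which is why hybrid search tends to be more robust than either search on its own.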

In [4]:
# User question to answer
question = "What is included in my Northwind Health Plus plan?"

# Hybrid search (keyword + vector) with the semantic ranker, requesting the top 10 results
results, num_results = semantic_hybrid_search(ai_search_config["ai_search_client_docs"],
                                              openai_config["openai_client"],
                                              openai_config["aoai_embedding_model"],
                                              question, 10)
# num_results: total number of matches in the index; len(results): results actually returned
print(f"num results: {num_results}")
print(f"num len(results): {len(results)}")
show_results(results, question)


num results: 144
num len(results): 10
Hybrid Search Results: [
  {
    "id": "141",
    "title": "Northwind_Health_Plus_Benefits_Details.md",
    "content": " beyond what is\nprovided by Workers' Compensation Insurance. These benefits may include disability\nbenefits, unemployment benefits, or Social Security benefits. It is important for employees\nto research these options in order to determine if they are eligible for any additional\nbenefits.\n\nWhen an employee is injured or becomes ill, they should contact the Workers'\nCompensation Insurance provider immediately. The provider will provide the employee\nwith information on the process and how to file a claim. The provider may also provide\nadditional resources to help the employee understand their rights and responsibilities.\n\nIt is important for employees to remember that Workers' Compensation Insurance is a\nbenefit that is provided by the employer. It is the employer's responsibility to ensure that\nemployees are aware of th

#### Filter the chunks by relevance to the user's question and generate the answer using the relevant chunks as context
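
The filtering step passes the search results to a smaller "rerank" model (`aoai_rerank_model`, here gpt-4o-mini), which keeps only the chunks that help answer the question. The notebook's `get_filtered_chunks` is not shown, so the following is only a sketch of one way to do LLM-based relevance filtering. The prompt wording, the YES/NO protocol and the names `build_relevance_prompt`, `filter_chunks` and `keyword_judge` are hypothetical. The `judge` callable stands in for the model call; with the Azure OpenAI client it could be `lambda m: client.chat.completions.create(model=deployment, messages=m, temperature=0).choices[0].message.content`.

```python
from typing import Callable, Dict, List, Tuple


def build_relevance_prompt(question: str, chunk: str) -> List[Dict[str, str]]:
    """Chat messages asking a model to judge one chunk (hypothetical prompt)."""
    return [
        {"role": "system",
         "content": "You judge whether a text chunk helps answer a question. "
                    "Reply with exactly YES or NO."},
        {"role": "user",
         "content": f"Question: {question}\n\nChunk:\n{chunk}"},
    ]


def filter_chunks(results: List[Dict], question: str,
                  judge: Callable[[List[Dict[str, str]]], str]) -> Tuple[List[Dict], int]:
    """Keep the results whose 'content' the judge marks as relevant.

    `judge` receives chat messages and returns the model's reply text.
    """
    valid = []
    for result in results:
        reply = judge(build_relevance_prompt(question, result["content"]))
        if reply.strip().upper().startswith("YES"):
            valid.append(result)
    return valid, len(valid)


# Offline stand-in for the model: marks chunks mentioning "deductible" as relevant
def keyword_judge(messages: List[Dict[str, str]]) -> str:
    text = messages[-1]["content"].lower()
    return "YES" if "deductible" in text else "NO"


chunks = [{"id": "1", "content": "The deductible is $2,000 per year."},
          {"id": "2", "content": "Workers' Compensation Insurance is provided by the employer."}]
valid, n = filter_chunks(chunks, "What is included in my plan?", keyword_judge)
print(n, [c["id"] for c in valid])  # → 1 ['1']
```

Judging each chunk independently keeps every prompt small and lets a cheaper model do the filtering, at the cost of one model call per chunk.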

In [4]:
# Keep only the chunks relevant to the user's question (judged by the rerank model)
valid_chunks, num_chunks = get_filtered_chunks(openai_config["openai_client"],
                                               openai_config["aoai_rerank_model"],
                                               results,
                                               question)
print(f"num valid chunks: {num_chunks}")

# Generate the answer using the valid chunks as context
answer = generate_answer(openai_config["openai_client"],
                         openai_config["aoai_deployment_name"],
                         valid_chunks,
                         question)
print(f"\n>> Answer:\n{answer}")


num valid chunks: 2

Calling Azure OpenAI model gpt-4o...

>> Answer:
Your Northwind Health Plus plan includes the following key components:

- **Copayment**: A fixed amount you pay for a covered service at the time of service.
- **Deductible**: $2,000 per year, which is the amount you must pay out-of-pocket before the plan starts covering services.
- **Coinsurance**: 20% of the cost of a covered service after meeting the deductible.
- **Out-of-Pocket Maximum**: $4,000 per year, which includes the deductible, coinsurance, and copayments.

**Provider Network**:
- **In-Network Provider**: Lower copayments and coinsurance.
- **Out-of-Network Provider**: Higher copayments and coinsurance.

**Exceptions**:
- **Preventive Care**: Covered at 100% with no copayment, deductible, or coinsurance.
- **Prescription Drugs**: Subject to a copayment, varying by drug type.
- **Mental Health and Substance Abuse Services**: Subject to a copayment and deductible.
- **Emergency Services**: Subject to a cop
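
The "augment" step places the filtered chunks into the prompt so that the model answers from them rather than from its general knowledge. The notebook's `generate_answer` is not shown; the sketch below is a hypothetical way to assemble such grounded messages. The function name `build_answer_messages`, the prompt wording and the `[title]` labels are assumptions. The resulting list could be passed as `messages=` to `client.chat.completions.create(model=openai_config["aoai_deployment_name"], ...)`.

```python
from typing import Dict, List


def build_answer_messages(chunks: List[Dict], question: str) -> List[Dict[str, str]]:
    """Assemble chat messages that ground the answer in the retrieved chunks."""
    # Label each chunk with its source title so the model can refer to it
    sources = "\n\n".join(
        f"[{c.get('title', 'unknown')}]\n{c['content']}" for c in chunks
    )
    system = ("Answer the question using only the sources below. "
              "If the sources do not contain the answer, say you don't know.\n\n"
              f"Sources:\n{sources}")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


# Hypothetical chunk shaped like the search results shown earlier
chunks = [{"title": "Northwind_Health_Plus_Benefits_Details.md",
           "content": "Deductible: $2,000 per year."}]
messages = build_answer_messages(chunks, "What is my deductible?")
print(messages[0]["content"])
```

Telling the model to admit when the sources lack the answer is a common way to reduce hallucinated answers when filtering leaves few or no chunks.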

In [None]:
## End-to-end process:

question = "What is included in my Northwind Health Plus plan?"
print(f'Question: {question}')

# Hybrid search with the semantic ranker, requesting 50 results
results, num_results = semantic_hybrid_search(ai_search_config["ai_search_client_docs"],
                                              openai_config["openai_client"],
                                              openai_config["aoai_embedding_model"],
                                              question, 50)
print(f"num results: {num_results}")
show_results(results, question)

# Keep only the chunks relevant to the user's question
valid_chunks, num_chunks = get_filtered_chunks(openai_config["openai_client"],
                                               openai_config["aoai_rerank_model"],
                                               results, question)

# Generate the answer using the relevant chunks as context
answer = generate_answer(openai_config["openai_client"],
                         openai_config["aoai_deployment_name"],
                         valid_chunks, question)
print(f"\n>> Answer: {answer}")

## Using conversation history
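
With a conversation, a follow-up question such as "And what about dental?" is not a good search query on its own, so the question is first rewritten into a standalone query using the previous turns, and the history is also given to the answer model. The helpers used in the notebook (`generate_search_query`, `generate_answer_with_history`) are not shown; the sketch below illustrates three supporting pieces under stated assumptions. The names `history_to_messages`, `resolve_search_query` and `demo_history` are hypothetical. The fallback to the raw question assumes the rewrite model follows its system prompt, which asks it to return just the number 0 when it cannot generate a query. A `collections.deque` with `maxlen` is shown as an alternative to popping the oldest pair by hand.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List

MAX_TURNS = 3  # same limit as the notebook's history list


def history_to_messages(history: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Turn [{'question': ..., 'answer': ...}] into alternating chat turns."""
    messages = []
    for turn in history:
        messages.append({"role": "user", "content": turn["question"]})
        messages.append({"role": "assistant", "content": turn["answer"]})
    return messages


def resolve_search_query(rewritten: str, question: str) -> str:
    """Fall back to the raw question when the rewrite model returns '0' or nothing."""
    rewritten = rewritten.strip()
    return question if rewritten in ("", "0") else rewritten


# A deque with maxlen drops the oldest question/answer pair automatically
demo_history: Deque[Dict[str, str]] = deque(maxlen=MAX_TURNS)
for i in range(5):
    demo_history.append({"question": f"q{i}", "answer": f"a{i}"})

print([t["question"] for t in demo_history])  # → ['q2', 'q3', 'q4']
print(len(history_to_messages(demo_history)))  # → 6
print(resolve_search_query("0", "What is covered?"))  # → What is covered?
```

Note that `json.dumps` cannot serialize a `deque` directly; convert it with `list(demo_history)` before printing it as JSON, as the notebook does with its plain list.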

In [None]:
## End-to-end process using conversation history:

import pandas as pd

# Read the test questions from the ground-truth Excel file (requires openpyxl)
input_file = "../5_evaluation/ground_truth.xlsx"
df = pd.read_excel(input_file)
data_dict = df.to_dict(orient='records')

# Conversation history: at most the last 3 question/answer pairs
history = []
for i, row in enumerate(data_dict):

    question = row['QUESTION']

    print(f'[{i+1}] Question: {question}')
    # Rewrite the question as a standalone search query using the history
    query = generate_search_query(openai_config["openai_client"],
                                  openai_config["aoai_deployment_name"],
                                  question,
                                  history)
    print(f'Rewritten Question: {query}')

    # Hybrid search with the semantic ranker, using the rewritten query
    results, num_results = semantic_hybrid_search(ai_search_config["ai_search_client_docs"],
                                                  openai_config["openai_client"],
                                                  openai_config["aoai_embedding_model"],
                                                  query, 50)
    print(f"num results: {num_results}")
    #show_results(results, query)

    # Keep only the chunks relevant to the user's original question
    valid_chunks, num_chunks = get_filtered_chunks(openai_config["openai_client"],
                                                   openai_config["aoai_rerank_model"],
                                                   results, question)
    # Generate the answer with the relevant chunks as context and the conversation history
    answer = generate_answer_with_history(openai_config["openai_client"],
                                          openai_config["aoai_deployment_name"],
                                          valid_chunks,
                                          question,
                                          history)
    print(f"\n>> Answer: {answer}\n")

    # Keep at most 3 question/answer pairs: drop the oldest before adding the new one
    if len(history) >= 3:
        history.pop(0)
    history.append({"question": question, "answer": answer})
    print(f"\nhistory: {json.dumps(history, indent=2)}\n")
    print("--------------------------------------------------")

[1] Question: What is included in my Northwind Health Plus plan?

curr_messages: [
  {
    "role": "system",
    "content": "Below is a history of the conversation so far, and a new question asked by the user that needs to be answered by searching in a knowledge base.\nYou have access to Azure AI Search index with 100's of documents. Follow the steps below to generate a search query:\n1. Identify the previous questions and answers related to the new question.\n2. Generate a search query based on the conversation and the new question, including in the query the key topics in the previous related questions and their answers.\nRemarks:\n- Do not include cited source filenames and document names e.g info.txt or doc.pdf in the search query terms.\n- Do not include any text inside [] or <<>> in the search query terms.\n- Do not include any special characters like '+'.\n- If you cannot generate a search query, return just the number 0."
  },
  {
    "role": "user",
    "content": "How did cry