# Advanced RAG with LlamaParse

<a href="https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/demo_advanced.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook is a complete walkthrough for using LlamaParse with advanced indexing/retrieval techniques in LlamaIndex over the Apple 10K Filing.

This allows us to ask sophisticated questions that aren't possible with "naive" parsing/indexing techniques with existing models.

Note: this example uses `llama_index >= 0.10.4`.

## Iteration 2: Qdrant, without LlamaParse or LangGraph (goal: reduce latency)


In [105]:
!pip install ipython-autotime
%load_ext autotime

The autotime extension is already loaded. To reload it, use:
  %reload_ext autotime
time: 3.23 s (started: 2024-10-11 16:00:14 +00:00)


In [106]:
!pip install llama-index
!pip install llama-index-core==0.10.6.post1
!pip install llama-index-embeddings-openai
!pip install llama-index-postprocessor-flag-embedding-reranker
!pip install git+https://github.com/FlagOpen/FlagEmbedding.git



Collecting llama-index-core==0.10.6.post1
  Using cached llama_index_core-0.10.6.post1-py3-none-any.whl.metadata (3.6 kB)
Using cached llama_index_core-0.10.6.post1-py3-none-any.whl (15.4 MB)
Installing collected packages: llama-index-core
  Attempting uninstall: llama-index-core
    Found existing installation: llama-index-core 0.11.17
    Uninstalling llama-index-core-0.11.17:
      Successfully uninstalled llama-index-core-0.11.17
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llama-index 0.11.17 requires llama-index-core<0.12.0,>=0.11.17, but you have llama-index-core 0.10.6.post1 which is incompatible.
llama-index-agent-openai 0.3.4 requires llama-index-core<0.12.0,>=0.11.0, but you have llama-index-core 0.10.6.post1 which is incompatible.
llama-index-cli 0.3.1 requires llama-index-core<0.12.0,>=0.11.0, but you have llama-index-core 0.10.6.post1 whic

In [107]:
!pip install llama-index-llms-groq

time: 5.04 s (started: 2024-10-11 16:00:58 +00:00)


In [108]:
!pip install llama-index-embeddings-langchain

time: 6.66 s (started: 2024-10-11 16:01:03 +00:00)


In [109]:
!pip install llama-index-embeddings-huggingface

time: 10.1 s (started: 2024-10-11 16:01:10 +00:00)


In [110]:
!pip install langchain langchain_openai langchain_core

time: 5.55 s (started: 2024-10-11 16:01:20 +00:00)


In [111]:
!pip install llama-index-vector-stores-qdrant

time: 8.99 s (started: 2024-10-11 16:01:26 +00:00)


In [112]:
!pip install llama-index-embeddings-fastembed

time: 6.46 s (started: 2024-10-11 16:01:35 +00:00)


In [113]:
!pip install -U qdrant_client

time: 8.4 s (started: 2024-10-11 16:01:41 +00:00)


In [114]:
!pip install sentence_transformers

time: 4.14 s (started: 2024-10-11 16:01:50 +00:00)


In [115]:
# llama-parse is async-first, running the async code in a notebook requires the use of nest_asyncio
import nest_asyncio

nest_asyncio.apply()

import os

# API access to llama-cloud
#os.environ["LLAMA_CLOUD_API_KEY"] = "<YOUR_LLAMA_CLOUD_API_KEY>"

#os.environ["COHERE_API_KEY"] = "<YOUR_COHERE_API_KEY>"

# Using OpenAI API for embeddings/llms
# Never commit a real key; paste yours here or set it outside the notebook
os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY>"

# Llama 3.2 with Groq


#os.environ["GROQ_API_KEY"] = "<YOUR_GROQ_API_KEY>"

time: 987 µs (started: 2024-10-11 16:01:54 +00:00)


In [116]:
import logging
import sys
import os

import qdrant_client
from IPython.display import Markdown, display
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore
#from llama_index.embeddings.fastembed import FastEmbedEmbedding
from llama_index.core import Settings
from sentence_transformers import SentenceTransformer



#Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-base-en-v1.5")

time: 1.04 ms (started: 2024-10-11 16:01:54 +00:00)


In [117]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex
from llama_index.core import Settings
#from langchain_cohere import CohereEmbeddings
from llama_index.llms.groq import Groq
#from langchain_groq import ChatGroq
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# from sentence_transformers import SentenceTransformer


#embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
#embed_model=SentenceTransformer("sentence-transformers/multi-qa-mpnet-base-dot-v1")
#embed_model= CohereEmbeddings(model="embed-english-v3.0")
#LangchainEmbedding(HuggingFaceEmbeddings(model_name='sentence-transformers/multi-qa-mpnet-base-cos-v1'))
llm = OpenAI(model="gpt-4o",seed=0, max_tokens=16384)

#llm = ChatGroq(model="llama3-groq-70b-8192-tool-use-preview"
#llm = Groq(model="llama3-groq-70b-8192-tool-use-preview")
Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 1024
Settings.chunk_overlap = 800
#Settings.max_output_tokens = 128000

time: 12.5 ms (started: 2024-10-11 16:01:54 +00:00)
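
To make the effect of the `chunk_size` / `chunk_overlap` settings above concrete, here is a minimal sketch of fixed-size sliding-window chunking over a token list. It is not LlamaIndex's own splitter (which also respects sentence boundaries); `sliding_window_chunks` is a hypothetical helper used only for illustration. With a size of 1024 and an overlap of 800, each new chunk advances by only 224 tokens, so the same document produces several times more chunks than it would with a small overlap.

```python
def sliding_window_chunks(tokens, chunk_size, chunk_overlap):
    """Split tokens into fixed-size windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    stride = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
        start += stride
    return chunks


# Stand-in "document" of 10,000 tokens
tokens = list(range(10_000))
high_overlap = sliding_window_chunks(tokens, chunk_size=1024, chunk_overlap=800)
low_overlap = sliding_window_chunks(tokens, chunk_size=1024, chunk_overlap=100)
print("chunks with overlap 800:", len(high_overlap))
print("chunks with overlap 100:", len(low_overlap))
```

More chunks means more embedding calls at index time and more near-duplicate candidates at query time, which is worth keeping in mind when the goal is lower latency.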


In [118]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
time: 2.17 s (started: 2024-10-11 16:01:54 +00:00)


In [119]:
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

time: 1 ms (started: 2024-10-11 16:01:56 +00:00)


In [120]:
#load multiple pdfs with llama parse (markdown)
# from llama_parse import LlamaParse  # not used in this iteration; needs `pip install llama-parse`

#documents = LlamaParse(result_type="markdown").load_data(["/content/test_files/Anthem MediBlue Access (PPO) - Documents.pdf","/content/test_files/Anthem Full Dual Advantage (PPO D-SNP).pdf","/content/test_files/Anthem MediBlue Dual Advantage (HMO D-SNP) Document.pdf","/content/test_files/Anthem Medicare Advantage (HMO).pdf","/content/test_files/Anthem Medicare Advantage 2 (PPO).pdf"])

time: 637 µs (started: 2024-10-11 16:01:56 +00:00)


In [121]:
# load documents with llama index directory reader
documents = SimpleDirectoryReader("/content/test_files").load_data()

time: 12.5 s (started: 2024-10-11 16:01:56 +00:00)


In [122]:
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker

reranker = FlagEmbeddingReranker(
    top_n=8,
    model="BAAI/bge-reranker-large",
)

# recursive_query_engine = recursive_index.as_query_engine(
#     similarity_top_k=8, node_postprocessors=[reranker], verbose=True
# )

time: 3.91 s (started: 2024-10-11 16:02:09 +00:00)


In [123]:
client = qdrant_client.QdrantClient(
    # you can use :memory: mode for fast and light-weight experiments,
    # it does not require to have Qdrant deployed anywhere
    # but requires qdrant-client >= 1.1.1
    #location=":memory:"
    # otherwise set Qdrant instance address with:
    url="https://225f6124-ed28-4252-a49a-040e264c4d28.europe-west3-0.gcp.cloud.qdrant.io:6333",
    # otherwise set Qdrant instance with host and port:
    #host="localhost",
    #port=6333
    # set API KEY for Qdrant Cloud
    api_key="<YOUR_QDRANT_API_KEY>",
)

time: 703 ms (started: 2024-10-11 16:02:13 +00:00)
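
One way to keep the Qdrant URL and API key out of the notebook is to read them from environment variables. The sketch below assumes two variable names, `QDRANT_URL` and `QDRANT_API_KEY`, which are choices made for this example and not a Qdrant convention; `load_qdrant_settings` is likewise a hypothetical helper. It falls back to the `:memory:` mode mentioned in the comments above when no URL is set.

```python
import os


def load_qdrant_settings(env=os.environ):
    """Build keyword arguments for QdrantClient from environment variables.

    QDRANT_URL and QDRANT_API_KEY are assumed names for this sketch.
    """
    url = env.get("QDRANT_URL")
    api_key = env.get("QDRANT_API_KEY")
    if not url:
        # No remote instance configured: use the local in-memory mode
        return {"location": ":memory:"}
    settings = {"url": url}
    if api_key:
        settings["api_key"] = api_key
    return settings


# In the notebook this would be used as:
# client = qdrant_client.QdrantClient(**load_qdrant_settings())
print(load_qdrant_settings({"QDRANT_URL": "http://localhost:6333"}))
```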


In [124]:
vector_store = QdrantVectorStore(client=client, collection_name="healthcare-ins1")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)

time: 8.34 s (started: 2024-10-11 16:02:13 +00:00)


In [143]:
# node_postprocessors is the keyword that attaches the reranker to the query engine
qdrant_query_engine = index.as_query_engine(similarity_top_k=20, node_postprocessors=[reranker])

time: 1.01 ms (started: 2024-10-11 16:04:50 +00:00)
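
The query engine above follows a retrieve-then-rerank pattern: a cheap similarity search returns `similarity_top_k` candidates, and a slower, more precise model keeps only the best `top_n`. The sketch below illustrates that two-stage shape with toy scoring functions. Both scorers are stand-ins invented for this example; they are not how Qdrant or `FlagEmbeddingReranker` compute scores.

```python
def retrieve_then_rerank(query_terms, docs, similarity_top_k, top_n):
    """Two-stage retrieval: broad cheap filter, then precise reranking."""

    def cheap_score(doc):
        # Stand-in for vector similarity: number of distinct query terms present
        return len(query_terms & set(doc.lower().split()))

    def precise_score(doc):
        # Stand-in for a cross-encoder: density of query terms in the text
        words = doc.lower().split()
        return sum(words.count(term) for term in query_terms) / len(words)

    candidates = sorted(docs, key=cheap_score, reverse=True)[:similarity_top_k]
    return sorted(candidates, key=precise_score, reverse=True)[:top_n]


docs = [
    "copay for emergency care is listed here along with many other unrelated words",
    "emergency copay",
    "dental coverage details",
    "copay copay emergency",
]
top = retrieve_then_rerank({"copay", "emergency"}, docs, similarity_top_k=3, top_n=2)
print(top)
```

Only the `top_n` survivors reach the LLM, so raising `similarity_top_k` mainly costs reranker time, while raising `top_n` costs prompt tokens.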


In [146]:
response=qdrant_query_engine.query("what are all the plans mentioned in the documents?")

time: 2.69 s (started: 2024-10-11 16:10:47 +00:00)


In [147]:
print(response)

The plans mentioned in the documents are:

1. Anthem MediBlue Dual Advantage (HMO D-SNP)
2. Anthem Medicare Advantage 2 (PPO)
3. Anthem Medicare Advantage (HMO)
4. Anthem Full Dual Advantage (PPO D-SNP)
5. Anthem MediBlue Access Plus (PPO)
6. Anthem MediBlue Access (PPO)
time: 1.67 ms (started: 2024-10-11 16:10:52 +00:00)


## Setup Flow


In [128]:
def get_subquery_response_pairs(query, subqueries):
    """
    Query the Qdrant query engine once per sub-query and return a dictionary
    containing the indexed subquery/response pairs.

    :param query: str : The original query from the user
    :param subqueries: dict : Output of decompose_query or update_subqueries,
        with the list of sub-queries stored under the "sub_queries" key
    :return: dict : "original_query" plus subquery_N / response_N entries
    """

    # Initialize an empty dictionary to store indexed subquery-response pairs
    subquery_response_pairs = {
        "original_query": query
    }

    # Loop through the subqueries and fetch responses, adding indexed key-value pairs
    for idx, subquery in enumerate(subqueries["sub_queries"], start=1):
        # Get the response from the raw query engine for each subquery
        response = qdrant_query_engine.query(subquery)

        # Add subquery and response pair to the dictionary in the desired format
        subquery_response_pairs[f"subquery_{idx}"] = subquery
        subquery_response_pairs[f"response_{idx}"] = response

    # Return the final JSON with indexed subquery/response pairs
    return subquery_response_pairs

# Example usage
# query = "give all copay details for medicare access plus plan and Anthem full dual advantage PPO D-SNP plan"
# result = get_subquery_response_pairs(query, decompose_query(query))
# print(result)

time: 1.11 ms (started: 2024-10-11 16:02:24 +00:00)
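
The dictionary returned by `get_subquery_response_pairs` is flat: one `original_query` entry followed by `subquery_N` / `response_N` entries. The sketch below reproduces that layout with a stand-in engine so it can be run without Qdrant or an API key. `FakeQueryEngine`, `collect_pairs`, and `iter_pairs` are hypothetical names introduced only for this illustration.

```python
class FakeQueryEngine:
    """Stand-in for the real query engine; echoes the question back."""

    def query(self, text):
        return f"answer to: {text}"


def collect_pairs(query, subqueries, engine):
    """Same layout as get_subquery_response_pairs, with the engine injected."""
    pairs = {"original_query": query}
    for idx, subquery in enumerate(subqueries["sub_queries"], start=1):
        pairs[f"subquery_{idx}"] = subquery
        pairs[f"response_{idx}"] = engine.query(subquery)
    return pairs


def iter_pairs(pairs):
    """Yield (index, subquery, response) back out of the flat dictionary."""
    count = (len(pairs) - 1) // 2  # everything except "original_query"
    for idx in range(1, count + 1):
        yield idx, pairs[f"subquery_{idx}"], pairs[f"response_{idx}"]


subqueries = {"sub_queries": ["which plans exist?", "what are their copays?"]}
pairs = collect_pairs("copays for all plans", subqueries, FakeQueryEngine())
for idx, subquery, response in iter_pairs(pairs):
    print(idx, subquery, "->", response)
```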


In [129]:
def get_citations(subquery_response_pairs):
    """
    Collect "file (Page N)" citations from the source nodes of every response.

    :param subquery_response_pairs: dict : Output of get_subquery_response_pairs
    :return: list : One citation string per source node (duplicates are kept)
    """
    citations = []
    for key, response in subquery_response_pairs.items():
        if hasattr(response, 'source_nodes'):  # Checking if response has source_nodes
            source_nodes = response.source_nodes
            for node_with_score in source_nodes:
                node = node_with_score.node
                # Extract metadata, providing fallbacks if not present
                file_name = node.metadata.get('file_name', 'Unknown File')
                page_label = node.metadata.get('page_label', 'Unknown Page')

                # Only append citation if metadata exists
                if file_name != 'Unknown File' or page_label != 'Unknown Page':
                    citations.append(f"{file_name} (Page {page_label})")

    return citations

time: 1.5 ms (started: 2024-10-11 16:02:24 +00:00)


In [130]:
def get_citations_and_text(subquery_response_pairs):
    citations = []
    text_chunks = []
    for key, response in subquery_response_pairs.items():
        if hasattr(response, 'source_nodes'):  # Checking if response has source_nodes
            source_nodes = response.source_nodes
            for node_with_score in source_nodes:
                node = node_with_score.node
                # Extract metadata, providing fallbacks if not present
                file_name = node.metadata.get('file_name', 'Unknown File')
                page_label = node.metadata.get('page_label', 'Unknown Page')

                # Only append citation and text if metadata exists
                if file_name != 'Unknown File' or page_label != 'Unknown Page':
                    citations.append(f"{file_name} (Page {page_label})")
                    text_chunks.append(f"Text Chunk (Page {page_label}): {node.text[:200]}...")  # Limit text to first 200 characters
    return citations, text_chunks

time: 1.21 ms (started: 2024-10-11 16:02:24 +00:00)


In [131]:
def get_unique_citations_and_text(subquery_response_pairs):
    """Return de-duplicated citations and their text previews, in matching order."""
    seen = set()  # file-page pairs that have already been cited
    citations = []
    text_chunks = []

    for key, response in subquery_response_pairs.items():
        if hasattr(response, 'source_nodes'):  # Checking if response has source_nodes
            source_nodes = response.source_nodes
            for node_with_score in source_nodes:
                node = node_with_score.node
                # Extract metadata, providing fallbacks if not present
                file_name = node.metadata.get('file_name', 'Unknown File')
                page_label = node.metadata.get('page_label', 'Unknown Page')

                # Create a unique identifier for each file-page pair
                citation_key = (file_name, page_label)

                # Add only unique citations; appending to lists (rather than
                # reading the set back) keeps citations aligned with text_chunks
                if citation_key not in seen and (file_name != 'Unknown File' or page_label != 'Unknown Page'):
                    seen.add(citation_key)
                    citations.append(f"{file_name} (Page {page_label})")
                    text_chunks.append(f"Text Chunk (Page {page_label}): {node.text[:200]}...")  # Limit text to first 200 characters

    return citations, text_chunks

time: 1 ms (started: 2024-10-11 16:02:24 +00:00)
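
The citation helpers can be checked without running a real query by building stand-in response objects that expose the same attributes (`source_nodes`, `.node.metadata`, `.node.text`). The sketch below does that with `SimpleNamespace`. `make_response` and `unique_citations` are hypothetical names for this example, and the de-duplication shown is an order-preserving variant written for the illustration.

```python
from types import SimpleNamespace


def make_response(*entries):
    """Build a fake response from (file_name, page_label, text) tuples."""
    nodes = [
        SimpleNamespace(
            node=SimpleNamespace(
                metadata={"file_name": file_name, "page_label": page},
                text=text,
            )
        )
        for file_name, page, text in entries
    ]
    return SimpleNamespace(source_nodes=nodes)


def unique_citations(pairs):
    """Order-preserving de-duplication of (file, page) citations."""
    seen = set()
    citations = []
    for value in pairs.values():
        # Plain strings (queries) have no source_nodes and are skipped
        for node_with_score in getattr(value, "source_nodes", []):
            meta = node_with_score.node.metadata
            key = (meta.get("file_name"), meta.get("page_label"))
            if key == (None, None) or key in seen:
                continue
            seen.add(key)
            citations.append(f"{key[0]} (Page {key[1]})")
    return citations


pairs = {
    "original_query": "copays?",
    "subquery_1": "copays for plan A?",
    "response_1": make_response(("a.pdf", "15", "text"), ("b.pdf", "16", "text")),
    "subquery_2": "copays for plan B?",
    "response_2": make_response(("a.pdf", "15", "text"), ("c.pdf", "3", "text")),
}
print(unique_citations(pairs))
```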


In [132]:
from langchain.output_parsers import PydanticToolsParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.pydantic_v1 import BaseModel, Field

# Define the SubQuery class as per LangChain's documentation
class SubQuery(BaseModel):
    """Represents a specific sub-query extracted from an original query."""
    sub_query: str = Field(..., description="A specific sub-query to answer the original query.")

# Function to perform query decomposition and return sub-queries as JSON
def decompose_query(original_query: str):
    # Define the system message for decomposition
    # system = """You are an expert at converting user questions into specific sub-queries adapted for vectorstore retrieval.
    # Given a user question, break it down into distinct sub-queries that will help extract relevant context to answer the original question.
    # Each sub-query should focus on a single concept, idea, fact or entity.
    # If there are acronyms or words you are not familiar with, do not try to rephrase them."""
    system = """You have access to documents about healthcare insurance plans separately.
    You are an expert at converting user questions into database queries. \

    Perform query decomposition. Given a user question, break it down into distinct sub questions that \
    you need to answer in order to answer the original question.

    If there are acronyms or words you are not familiar with, do not try to rephrase them.
    example :
    original query : give all deductibles for every plan mentioned in the data.
    sub-query 1 : what are the plans mentioned in the documents?
    sub-query 2 : what are the deductibles for each of the plans mentioned in the documents?
    """

    # Create the prompt template
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system),
            ("human", "{question}"),
        ]
    )

    # Initialize the language model
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    #llm = Groq(model="llama3-groq-70b-8192-tool-use-preview")
    #llm = ChatGroq(model="llama3-groq-70b-8192-tool-use-preview")
    # Bind the SubQuery tool and parser
    llm_with_tools = llm.bind_tools([SubQuery])
    parser = PydanticToolsParser(tools=[SubQuery])

    # Create the query analyzer
    query_analyzer = prompt | llm_with_tools | parser

    # Run the query analyzer
    sub_queries = query_analyzer.invoke({"question": original_query})

    # Convert the result to JSON format
    sub_queries_json = {
        "original_query": original_query,
        "sub_queries": [sub_query.sub_query for sub_query in sub_queries]
    }

    return sub_queries_json

# Example usage
# query = "give a list of copay details related to each insurance plan mentioned in the documents you have"
# result = decompose_query(query)
# print(result)

time: 2.23 ms (started: 2024-10-11 16:02:24 +00:00)


In [133]:
from langchain.output_parsers import PydanticToolsParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.pydantic_v1 import BaseModel, Field

# Define the SubQuery class as per LangChain's documentation
class SubQuery(BaseModel):
    """Represents a specific sub-query extracted from an original query."""
    sub_query: str = Field(..., description="A specific sub-query to answer the original query.")

# Function to refine the list of sub-queries using the answer to the first sub-query
def update_subqueries(original_query, subqueries, llm_answer):

    # Define the system message for decomposition
    system = """You have access to documents about healthcare insurance plans separately.
    Given a user question, and a list of sub-queries in the form of steps, update the list of subqueries based on the context provided so that it will help extract relevant context to answer the original complex question.
    If there are acronyms or words you are not familiar with, do not try to rephrase them.
    Always return an updated list of sub-queries.
    """

    # Create the prompt template with named placeholders; the values are passed
    # at invoke time so that braces in the retrieved context cannot break it
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system),
            ("human", "Original Query: {original_query}\n\n Old list of sub-queries: {subqueries}\n\nSub-query 1 (treated): {subqueries_0}\n\nContext for Sub-query 1: {llm_answer}\n\nUpdate the old list of sub-queries based on the new context provided about subquery_1.\n\nUpdated sub-queries: ")
        ]
    )

    # Initialize the language model
    llm = ChatOpenAI(model="gpt-4o", temperature=0)

    # Bind the SubQuery tool and parser
    llm_with_tools = llm.bind_tools([SubQuery])
    #print( llm_with_tools)
    parser = PydanticToolsParser(tools=[SubQuery])

    # Create the query analyzer
    query_analyzer = prompt | llm_with_tools | parser

    # Run the query analyzer
    sub_queries = query_analyzer.invoke({
        "original_query": original_query,
        "subqueries": subqueries['sub_queries'],
        "subqueries_0": subqueries['sub_queries'][0],
        "llm_answer": str(llm_answer)
    })

    # Convert the result to JSON format
    sub_queries_json = {
        "original_query": original_query,

        "sub_queries": [sub_query.sub_query for sub_query in sub_queries]
    }
    #print(prompt)
    return sub_queries_json

# # Example usage
# query = "give a list of the copays for all plans"
# subqueries = {
#     'original_query': 'give all copays for all the plans mentioned in the document',
#     'sub_queries': [
#         'what are the plans mentioned in the documents?',
#         'what are the copays for each of the plans mentioned in the documents?'
#     ]
# }
# llm_answer = "All plans mentioned are AB, BC, and DC."
# result = update_subqueries(query, subqueries, llm_answer)
# print(result)

time: 2.68 ms (started: 2024-10-11 16:02:24 +00:00)
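
A note on building prompts from retrieved text: LangChain's `ChatPromptTemplate` treats message strings as f-string-style templates by default, so any literal `{` or `}` that ends up inside the template string is read as a placeholder. The sketch below uses plain `str.format` as a stand-in for that behaviour (an assumption made so the example runs without LangChain) and shows why passing retrieved text as a named variable is more robust than interpolating it into the template first.

```python
# Retrieved context that happens to contain braces
retrieved = "Plans: {'AB': 'HMO', 'BC': 'PPO'}"

# Risky: interpolate first, then treat the result as a template
risky_template = f"Context: {retrieved}\nUpdate the sub-queries."
try:
    risky_template.format()
    outcome = "ok"
except (KeyError, IndexError, ValueError) as exc:
    # The braces in the context are parsed as (missing) placeholders
    outcome = type(exc).__name__

# Safer: keep a named placeholder and supply the value when formatting
safe_template = "Context: {llm_answer}\nUpdate the sub-queries."
safe_text = safe_template.format(llm_answer=retrieved)

print("risky template:", outcome)
print(safe_text)
```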


In [134]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

def generate_final_answer(step1_result, subquery_response_pairs):
    """
    Function to generate a final answer using an LLM based on the original query, subqueries,
    and their retrieved responses.

    :param step1_result: The response to the first sub-query, added to the context as-is
    :param subquery_response_pairs: dict : The dictionary containing the original query, subqueries, and responses
    :return: AIMessage : The final answer generated by the LLM
    """

    # Extract original query
    original_query = subquery_response_pairs["original_query"]

    # Construct context from subqueries and responses
    context = ""
    context += f"step 1 result: {step1_result}\n\n"
    for idx in range(1, len(subquery_response_pairs) // 2 + 1):  # Dividing by 2 because we have subquery/response pairs
        subquery = subquery_response_pairs[f"subquery_{idx}"]
        response = subquery_response_pairs[f"response_{idx}"]
        context += f"Subquery {idx}: {subquery}\nRetrieved Context {idx}: {response}\n\n"

    # Define the system prompt
    system_message = """You are an expert in answering complex questions by decomposing them into subqueries and using retrieved information to construct a final answer.
    Given the original query, the subqueries, and their retrieved contexts, please provide a comprehensive answer to the original question.
    """

    # Create the chat template for the LLM; values are filled in at invoke time
    # so that braces in the retrieved context cannot break the template
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_message),
            ("human", "Original Query: {original_query}\n\n{context}\nAnswer the original question using the context provided.")
        ]
    )

    # Initialize the LLM
    llm = ChatOpenAI(model="gpt-4o", temperature=0, seed=0, max_tokens=16384)

    # Run the LLM to generate the final answer
    chain = prompt | llm
    final_answer = chain.invoke({
        "original_query": original_query,
        "context": context,
    })



    # Return the final answer
    return final_answer


time: 2.57 ms (started: 2024-10-11 16:02:24 +00:00)


In [135]:
def pretty_print_llm_output(llm_output, subquery_response_pairs):
    """
    Pretty printer function for LLM output, formatting the content, citations, and metadata.

    :param llm_output: AIMessage : An instance of AIMessage containing LLM output content and metadata
    :param subquery_response_pairs: dict : Subquery/response pairs used to extract citations and text chunks
    :return: str : Formatted string for pretty-printed output
    """

    # Extract content from the AIMessage object directly
    content = llm_output.content if hasattr(llm_output, 'content') else 'No content available'
    additional_kwargs = llm_output.additional_kwargs if hasattr(llm_output, 'additional_kwargs') else {}
    response_metadata = llm_output.response_metadata if hasattr(llm_output, 'response_metadata') else {}
    usage_metadata = llm_output.usage_metadata if hasattr(llm_output, 'usage_metadata') else {}

    # Pretty print the output content
    pretty_output = "===== LLM Response Content =====\n"
    pretty_output += content + "\n\n"

    # Extract citations and text chunks
    citations, text_chunks = get_unique_citations_and_text(subquery_response_pairs)


    # Format the final answer with citations and text chunks
    pretty_output += "===== Additional Information =====\n"

    if citations:
        pretty_output += "\n\nCitations:\n" + "\n".join(citations)

    if text_chunks:
        pretty_output += "\n\nRelevant Text Chunks:\n" + "\n".join(text_chunks)

    pretty_output += "\n\n"


    # Response Metadata
    if response_metadata:
        pretty_output += "===== Response Metadata =====\n"
        token_usage = response_metadata.get('token_usage', {})
        model_name = response_metadata.get('model_name', 'Unknown model')
        finish_reason = response_metadata.get('finish_reason', 'Unknown reason')

        pretty_output += f"Model Name: {model_name}\n"
        pretty_output += f"Finish Reason: {finish_reason}\n"
        pretty_output += f"Completion Tokens: {token_usage.get('completion_tokens', 'N/A')}\n"
        pretty_output += f"Prompt Tokens: {token_usage.get('prompt_tokens', 'N/A')}\n"
        pretty_output += f"Total Tokens: {token_usage.get('total_tokens', 'N/A')}\n"
        pretty_output += "\n"

    # Usage Metadata
    if usage_metadata:
        pretty_output += "===== Usage Metadata =====\n"
        pretty_output += f"Input Tokens: {usage_metadata.get('input_tokens', 'N/A')}\n"
        pretty_output += f"Output Tokens: {usage_metadata.get('output_tokens', 'N/A')}\n"
        pretty_output += f"Total Tokens: {usage_metadata.get('total_tokens', 'N/A')}\n"
        pretty_output += "\n"

    return pretty_output

# Example usage with the provided LLM output as an AIMessage object
# formatted_output = pretty_print_llm_output(final_answer, subquery_response_pairs)
# print(formatted_output)


time: 1.15 ms (started: 2024-10-11 16:02:24 +00:00)


In [148]:
# Example usage
query = "what are all copays for all plans mentioned "

#Decompose the original query into sub-queries
subqueries = decompose_query(original_query=query)
print("subqueries version 1 : ", subqueries)

#Get context for sub-query 1 (Step 1)
step1=qdrant_query_engine.query(subqueries['sub_queries'][0])

#print(step1)

#Update subqueries based on step1 results
updated_subqueries = update_subqueries(query,subqueries,step1)
print("subqueries version 2 : ", updated_subqueries)

#Get context for updated list of subqueries
subquery_response_pairs = get_subquery_response_pairs(query,updated_subqueries)
print("subquery_response_pairs: ",subquery_response_pairs)
# Extract citations
#citations = get_citations(subquery_response_pairs)
#print("citations: ",citations)

#Pass sub-queries and related context to a final llm call
final_answer = generate_final_answer(step1,subquery_response_pairs)

formatted_output = pretty_print_llm_output(final_answer,subquery_response_pairs)
print(formatted_output)

subqueries version 1 :  {'original_query': 'what are all copays for all plans mentioned ', 'sub_queries': ['what are the plans mentioned in the documents?', 'what are the copays for each of the plans mentioned in the documents?']}
subqueries version 2 :  {'original_query': 'what are all copays for all plans mentioned ', 'sub_queries': ['what are the copays for Anthem Medicare Advantage 2 (PPO)?', 'what are the copays for Anthem MediBlue Dual Advantage (HMO D-SNP)?', 'what are the copays for Anthem MediBlue Access Plus (PPO)?', 'what are the copays for Anthem MediBlue Access (PPO)?', 'what are the copays for Anthem Medicare Advantage (HMO)?']}
subquery_response_pairs:  {'original_query': 'what are all copays for all plans mentioned ', 'subquery_1': 'what are the copays for Anthem Medicare Advantage 2 (PPO)?', 'response_1': Response(response='The copays for Anthem Medicare Advantage 2 (PPO) are as follows:\n\n- Emergency Care: $90.00 copay\n- Urgently Needed Services: $30.00 copay\n- Dia
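
Since the goal of this iteration is lower latency, it can help to time each stage of the flow separately instead of only the whole cell. The sketch below is one possible way to do that with a small context manager. `timed` and `timings` are hypothetical names, and the `time.sleep` calls are placeholders for the real calls (`decompose_query`, `qdrant_query_engine.query`, and so on).

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> accumulated seconds


@contextmanager
def timed(stage):
    """Accumulate wall-clock time spent inside the with-block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start


# Placeholders for the real pipeline calls
with timed("decompose"):
    time.sleep(0.01)
for _ in range(2):  # e.g. one retrieval per sub-query
    with timed("retrieve"):
        time.sleep(0.02)

# Slowest stage first
for stage, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage:<10} {seconds:.3f} s")
```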

To do:

- Streamlit front end; deploy in Docker (GCP); Anushiya to add this to alphaAI (Tuesday)
- Time to return an answer
- Caching of Q/A pairs (exact question match; SQLite or Redis)
- Sentence-transformer model (embedding)
- Metadata filter
- Main LLM (`max_tokens` + `seed`)
- OpenAI: token caching
- Solve LLM `bind_tools` for open-source models
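
The Q/A caching item in the notes above could be prototyped with the standard-library `sqlite3` module. This is a minimal sketch under stated assumptions: `ExactMatchCache` and `cached_answer` are hypothetical names, the table layout is invented for the example, and questions are lower-cased and whitespace-normalized before matching (a small relaxation of a strict exact match).

```python
import sqlite3


class ExactMatchCache:
    """Question -> answer cache backed by SQLite (in-memory by default)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS qa_cache ("
            "question TEXT PRIMARY KEY, answer TEXT NOT NULL)"
        )

    @staticmethod
    def _normalize(question):
        # Assumption: case and extra whitespace should not cause a cache miss
        return " ".join(question.lower().split())

    def get(self, question):
        row = self.conn.execute(
            "SELECT answer FROM qa_cache WHERE question = ?",
            (self._normalize(question),),
        ).fetchone()
        return row[0] if row else None

    def put(self, question, answer):
        with self.conn:  # commits on success
            self.conn.execute(
                "INSERT OR REPLACE INTO qa_cache (question, answer) VALUES (?, ?)",
                (self._normalize(question), answer),
            )


def cached_answer(question, cache, compute):
    """Return the cached answer, or compute and store it on a miss."""
    hit = cache.get(question)
    if hit is not None:
        return hit
    answer = compute(question)
    cache.put(question, answer)
    return answer


calls = []

def slow_pipeline(question):
    # Placeholder for the full decompose / retrieve / generate flow
    calls.append(question)
    return f"answer for: {question}"

cache = ExactMatchCache()
first = cached_answer("What are all copays?", cache, slow_pipeline)
second = cached_answer("what are all  copays?", cache, slow_pipeline)
print(first == second, len(calls))
```

A cache like this stores only strings, so the formatted answer (rather than the raw response object) would be the natural thing to store.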

In [137]:
citations = get_citations(subquery_response_pairs)
print("citations: ",citations)

citations:  ['Anthem Full Dual Advantage (PPO D-SNP).pdf (Page 15)', 'Anthem MediBlue Dual Advantage (HMO D-SNP) Document.pdf (Page 16)', 'Anthem Medicare Advantage 2 (PPO).pdf (Page 18)', 'Anthem Medicare Advantage (HMO).pdf (Page 14)']
time: 2.19 ms (started: 2024-10-11 16:02:30 +00:00)


In [138]:
citations, text_chunks = get_citations_and_text(subquery_response_pairs)
print("citations: ",citations)
print("text_chunks: ",text_chunks)

citations:  ['Anthem Full Dual Advantage (PPO D-SNP).pdf (Page 15)', 'Anthem MediBlue Dual Advantage (HMO D-SNP) Document.pdf (Page 16)', 'Anthem Medicare Advantage 2 (PPO).pdf (Page 18)', 'Anthem Medicare Advantage (HMO).pdf (Page 14)']
text_chunks:  ['Text Chunk (Page 15): Summary of BenefitsAnthem Full Dual Advantage (PPO D-SNP)\nVision Services\nEyeglasses or contact lenses after cataract surgery\nDoctors in our plan: $0.00  copay\nDoctors not in our plan: $0.00  copay - ...', 'Text Chunk (Page 16): summary of benefitsAnthem MediBlue Dual Advantage (HMO D-SNP)\nVision Services\nMedicare-covered vision services:\nExam to diagnose and treat diseases and conditions of the eye\nDoctors in our plan:  $0.0...', 'Text Chunk (Page 18): Summary of BenefitsAnthem Medicare Advantage 2 \n(PPO)Anthem Medicare Advantage \n(PPO)\nVision Services3\nRoutine vision services:\nRoutine vision exam\nThis plan covers 1 routine eye \nexam(s) every year. ...', 'Text Chunk (Page 14): Summary of BenefitsAnt

In [139]:
subqueries_list=[]
for idx in range(1, len(subquery_response_pairs) // 2 + 1):  # Dividing by 2 because we have subquery/response pairs
        subquery = subquery_response_pairs[f"subquery_{idx}"]
        subqueries_list.append(subquery)
subqueries_list.append(str(step1))

print("subqueries: ",subqueries_list)

subqueries:  ['what is the copay for routine vision in the Full Dual Advantage plan?', 'The copay for routine vision services in the Anthem MediBlue Access Plus (PPO) plan is $0.00 for doctors in the plan.']
time: 1.09 ms (started: 2024-10-11 16:02:30 +00:00)


## Expansion (not used in this iteration)


In [140]:
from langchain.output_parsers import PydanticToolsParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.pydantic_v1 import BaseModel, Field



# Define the ParaphrasedQuery class as per LangChain's documentation
class ParaphrasedQuery(BaseModel):
    """Represents a paraphrased version of the original query."""
    paraphrased_query: str = Field(..., description="A unique paraphrasing of the original question.")

# Function to perform query expansion and return paraphrased queries as JSON
def expand_query(original_query: str):
    # Define the system message for query expansion
    system = """You are an AI language model assistant. Your task is to generate 3
    different versions of the given user question to retrieve relevant documents from a vector
    database. By generating multiple perspectives on the user question, your goal is to help
    the user overcome some of the limitations of the distance-based similarity search.
    Return at least 3 paraphrased versions of the query."""

    # Create the prompt template
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system),
            ("human", "{question}"),
        ]
    )

    # Initialize the language model
    llm = ChatOpenAI(model="gpt-4o", temperature=0)

    # Bind the ParaphrasedQuery tool and parser
    llm_with_tools = llm.bind_tools([ParaphrasedQuery])
    parser = PydanticToolsParser(tools=[ParaphrasedQuery])

    # Create the query analyzer
    query_analyzer = prompt | llm_with_tools | parser

    # Run the query analyzer
    paraphrased_queries = query_analyzer.invoke({"question": original_query})

    # Convert the result to JSON format
    expanded_queries_json = {
        "original_query": original_query,
        "paraphrased_queries": [pq.paraphrased_query for pq in paraphrased_queries]
    }

    return expanded_queries_json

# Example usage
# query = "give all copays for all plans mentioned in the documents provided "
# result = expand_query(query)
# print(result)

time: 4.2 ms (started: 2024-10-11 16:02:30 +00:00)
