<a href="https://colab.research.google.com/github/aishadvitya/llm_playground/blob/main/ELV_sub_question_query_engine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/docs/examples/query_engine/sub_question_query_engine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sub Question Query Engine
In this tutorial, we show how to use a **sub question query engine** to answer a complex query that draws on multiple data sources.  
It first breaks the complex query down into sub questions for each relevant data source,
then gathers the intermediate responses and synthesizes a final response.
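
The decompose, gather, synthesize flow described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not LlamaIndex's implementation: `SubQuestion`, `answer_complex_query`, and the `toy_*` stand-ins are hypothetical names, and a real engine would call an LLM and an index where these stubs return fixed strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubQuestion:
    tool_name: str  # name of the data source that should answer
    sub_question: str  # narrower question routed to that source


def answer_complex_query(
    query: str,
    decompose: Callable[[str], List[SubQuestion]],
    tools: Dict[str, Callable[[str], str]],
    synthesize: Callable[[str, List[str]], str],
) -> str:
    # Step 1: break the complex query into per-source sub questions.
    sub_questions = decompose(query)
    # Step 2: gather one intermediate response per sub question.
    intermediate = [
        f"{sq.sub_question} -> {tools[sq.tool_name](sq.sub_question)}"
        for sq in sub_questions
    ]
    # Step 3: synthesize a final response from the intermediate ones.
    return synthesize(query, intermediate)


# Hypothetical stand-ins; a real engine would call an LLM and an index here.
def toy_decompose(query: str) -> List[SubQuestion]:
    return [
        SubQuestion("source_a", "What does A report?"),
        SubQuestion("source_b", "What does B report?"),
    ]


toy_tools = {
    "source_a": lambda question: "A reports growth",
    "source_b": lambda question: "B reports decline",
}


def toy_synthesize(query: str, parts: List[str]) -> str:
    return "; ".join(parts)


final_answer = answer_complex_query(
    "Compare A and B", toy_decompose, toy_tools, toy_synthesize
)
print(final_answer)
```

Passing the three stages in as callables keeps the control flow visible: each stage can be swapped for an LLM-backed version without changing how the pieces fit together.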

### Preparation

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [1]:
!pip install llama-index



In [2]:
import os

os.environ["OPENAI_API_KEY"] = "sk-.."

import nest_asyncio

nest_asyncio.apply()

In [3]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.core import Settings

In [4]:
# Using the LlamaDebugHandler to print the trace of the sub questions
# captured by the SUB_QUESTION callback event type
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])

Settings.callback_manager = callback_manager

### Load Data

In [6]:
# load the call transcript PDF (the /content path assumes the file was uploaded to the Colab runtime)
documents = SimpleDirectoryReader(input_files=["/content/call_transcript_elv.pdf"]).load_data()

# build index and query engine
vector_query_engine = VectorStoreIndex.from_documents(
    documents,
    use_async=True,
).as_query_engine()

**********
Trace: index_construction
    |_node_parsing -> 0.036634 seconds
      |_chunking -> 0.00155 seconds
      |_chunking -> 0.001575 seconds
      |_chunking -> 0.001391 seconds
      |_chunking -> 0.001357 seconds
      |_chunking -> 0.001427 seconds
      |_chunking -> 0.001205 seconds
      |_chunking -> 0.00118 seconds
      |_chunking -> 0.001104 seconds
      |_chunking -> 0.001137 seconds
      |_chunking -> 0.00103 seconds
      |_chunking -> 0.001274 seconds
      |_chunking -> 0.001192 seconds
      |_chunking -> 0.001297 seconds
      |_chunking -> 0.001098 seconds
      |_chunking -> 0.001053 seconds
      |_chunking -> 0.001239 seconds
      |_chunking -> 0.000895 seconds
      |_chunking -> 0.001074 seconds
      |_chunking -> 0.001196 seconds
      |_chunking -> 0.00102 seconds
      |_chunking -> 0.00023 seconds
    |_embedding -> 1.099091 seconds
**********


### Setup sub question query engine

In [12]:
# setup base query engine as tool
query_engine_tools = [
    QueryEngineTool(
        query_engine=vector_query_engine,
        metadata=ToolMetadata(
            name="Elevance finance call",
            description="Elevance earnings finance call transcript",
        ),
    ),
]

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    use_async=True,
)

### Run queries

In [14]:
response = query_engine.query(
    "What was Gael and Mark's overview of first quarter performance?"
)

Generated 2 sub questions.
[Elevance finance call] Q: What was Gael's overview of first quarter performance?
[Elevance finance call] Q: What was Mark's overview of first quarter performance?
[Elevance finance call] A: Mark provided an overview of the first quarter performance, highlighting solid results under a dynamic operating environment. The quarter saw growth in commercial fee-based and individual ACA members, offsetting Medicaid attrition. Operating revenue was in line with expectations, with improved benefit expense and disciplined expense management. Operating cash flow was $2 billion, and the debt to capital ratio was maintained within the target range. The strategic partnership with Clayton, Dubilier & Rice was emphasized as a step towards advancing a local-oriented approach to care delivery.
[Elevance finance call] A: Gail provided an overview of the first quarter performance,

In [9]:
print(response)

In [10]:
# iterate through sub_question items captured in SUB_QUESTION event
from llama_index.core.callbacks import CBEventType, EventPayload

for i, (start_event, end_event) in enumerate(
    llama_debug.get_event_pairs(CBEventType.SUB_QUESTION)
):
    qa_pair = end_event.payload[EventPayload.SUB_QUESTION]
    print(f"Sub Question {i}: {qa_pair.sub_q.sub_question.strip()}")
    print(f"Answer: {qa_pair.answer.strip()}")
    print("====================================")
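
If the sub question results need to be kept for later inspection rather than only printed, they could be collected into plain dictionaries. The sketch below is a minimal example and not part of LlamaIndex: `qa_pairs_to_records` is a hypothetical helper, and it relies only on the `sub_q.sub_question` and `answer` attributes used in the loop above. The `SimpleNamespace` objects are stand-ins so the block runs without an API key.

```python
from types import SimpleNamespace


def qa_pairs_to_records(qa_pairs):
    """Convert sub question/answer pair objects into plain dicts."""
    records = []
    for i, qa_pair in enumerate(qa_pairs):
        records.append(
            {
                "index": i,
                "sub_question": qa_pair.sub_q.sub_question.strip(),
                # Treat a missing answer as an empty string (an assumption).
                "answer": (qa_pair.answer or "").strip(),
            }
        )
    return records


# Stand-in objects shaped like the pairs read from the SUB_QUESTION events.
fake_pairs = [
    SimpleNamespace(
        sub_q=SimpleNamespace(sub_question=" What was X? "),
        answer=" X was fine. ",
    ),
    SimpleNamespace(
        sub_q=SimpleNamespace(sub_question="What was Y?"),
        answer=None,
    ),
]

records = qa_pairs_to_records(fake_pairs)
print(records)
```

In the notebook, the input would be the `end_event.payload[EventPayload.SUB_QUESTION]` objects from each pair returned by `llama_debug.get_event_pairs(CBEventType.SUB_QUESTION)`.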