# SubQuestionQueryEngine

Often, we encounter queries that span multiple documents.

In this notebook, we handle complex queries that span several documents with the `SubQuestionQueryEngine`. It breaks each query into simpler sub-questions, answers every sub-question against the relevant document, and combines those answers into a final response.
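The decomposition idea can be sketched in plain Python. This is not LlamaIndex's implementation. `decompose`, `answer_sub_questions`, `synthesize`, and the `doc_a`/`doc_b` tools are hypothetical stand-ins, and a real engine uses an LLM for both the decomposition and the synthesis steps.

```python
from typing import Callable, Dict, List, Tuple

# A "tool" here is any function that answers a question from one document.
Tool = Callable[[str], str]


def decompose(question: str, tool_names: List[str]) -> List[Tuple[str, str]]:
    """Toy decomposition: produce one (tool_name, sub_question) pair per tool.

    A real engine asks an LLM to write a targeted sub-question for each
    relevant tool; this rule-based stand-in only shows the shape of the output.
    """
    return [(name, f"{question} (according to {name})") for name in tool_names]


def answer_sub_questions(question: str, tools: Dict[str, Tool]) -> List[Tuple[str, str, str]]:
    """Route every sub-question to its tool and collect (tool, sub_question, answer)."""
    return [
        (name, sub_q, tools[name](sub_q))
        for name, sub_q in decompose(question, list(tools))
    ]


def synthesize(question: str, results: List[Tuple[str, str, str]]) -> str:
    """Toy synthesis: list the sub-answers under the original question.

    A real engine passes these sub-answers to an LLM as context for a final answer.
    """
    lines = [f"Question: {question}"]
    for name, _sub_q, answer in results:
        lines.append(f"- {name}: {answer}")
    return "\n".join(lines)


# Hypothetical per-document tools with canned answers.
tools: Dict[str, Tool] = {
    "doc_a": lambda q: "revenue grew 10%",
    "doc_b": lambda q: "revenue grew 20%",
}
question = "How did revenue change?"
results = answer_sub_questions(question, tools)
print(synthesize(question, results))
```

The three stages stay separate in this sketch so that each one can be replaced by an LLM call independently.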

### Installation

```python
%pip install llama-index
%pip install llama-index-llms-bedrock
%pip install llama-index-embeddings-bedrock
```

### Setup and imports

```python
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)
from llama_index.core.settings import Settings
from llama_index.llms.bedrock import Bedrock
from llama_index.embeddings.bedrock import BedrockEmbedding, Models
```

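```python
# Assumes AWS credentials and a region with access to these Bedrock models
# are already configured (e.g., via environment variables or an AWS profile).
llm = Bedrock(model="anthropic.claude-v2")
embed_model = BedrockEmbedding(model="amazon.titan-embed-text-v1")
```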

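```python
from llama_index.core import Settings

# Register the Bedrock models and the chunk size as global defaults
# used when building indexes and answering queries.
Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 512
```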
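`Settings.chunk_size` sets roughly how large each indexed chunk (node) is; LlamaIndex's default splitter counts it in tokens. The sketch below shows fixed-size windowing with overlap. It is a simplification: `chunk_tokens` is a hypothetical helper that treats whitespace-separated words as tokens, while the real splitter uses a tokenizer and prefers sentence boundaries.

```python
from typing import List


def chunk_tokens(text: str, chunk_size: int = 512, overlap: int = 0) -> List[str]:
    """Split text into windows of at most `chunk_size` whitespace "tokens".

    Consecutive windows share `overlap` tokens so that context spanning a
    boundary appears in both chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


# A synthetic 1,200-word "page".
page = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_tokens(page, chunk_size=512, overlap=50)
print(len(chunks), [len(c.split()) for c in chunks])
```

Smaller chunks make retrieval more precise but give the LLM less surrounding context per hit, which is why the chunk size is usually tuned alongside `similarity_top_k`.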

### Set up async support and logging

```python
# NOTE: This is ONLY necessary in a Jupyter notebook.
# Details: Jupyter runs an event loop behind the scenes, so starting another
#          event loop to make async queries results in nested event loops.
#          Python does not normally allow this; nest_asyncio patches asyncio
#          to allow it for convenience.
import nest_asyncio

nest_asyncio.apply()

import logging
import sys

# Set up the root logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)  # Set logger level to INFO

# Clear out any existing handlers
logger.handlers = []

# Set up a StreamHandler that writes to sys.stdout (the notebook's cell output)
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)  # Set handler level to INFO

# Add the handler to the logger
logger.addHandler(handler)

# Used below to render responses as HTML
from IPython.display import display, HTML
```
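To see what `nest_asyncio` works around, the standalone script below tries to start a second event loop from inside a running one. Run it with plain `python`, not in a notebook where `nest_asyncio.apply()` has been called. Without the patch, `asyncio.run` refuses with a `RuntimeError`. The `inner`/`outer` coroutines are illustrative only.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    # We are already inside a running event loop here, as a Jupyter cell is.
    coro = inner()
    try:
        # Starting a second loop from inside a running one is rejected
        # unless nest_asyncio has patched asyncio.
        return asyncio.run(coro)
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"
    finally:
        coro.close()  # avoid a "coroutine was never awaited" warning


message = asyncio.run(outer())
print(message)
```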

### Download Data

We will use the Uber and Lyft 2021 10-K SEC filings.

```python
!mkdir -p './data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O './data/10k/lyft_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O './data/10k/uber_2021.pdf'
```

### Load Data

```python
from llama_index.core import SimpleDirectoryReader

# The default PDF reader returns one Document per page.
lyft_docs = SimpleDirectoryReader(input_files=["./data/10k/lyft_2021.pdf"]).load_data()
uber_docs = SimpleDirectoryReader(input_files=["./data/10k/uber_2021.pdf"]).load_data()
```

```python
print(f'Loaded Lyft 10-K with {len(lyft_docs)} pages')
print(f'Loaded Uber 10-K with {len(uber_docs)} pages')
```

```
Loaded Lyft 10-K with 238 pages
Loaded Uber 10-K with 307 pages
```


### Index Data

```python
from llama_index.core import VectorStoreIndex

# Index only the first 100 pages of each filing to keep embedding time and cost down.
lyft_index = VectorStoreIndex.from_documents(lyft_docs[:100])
uber_index = VectorStoreIndex.from_documents(uber_docs[:100])
```

### Create Query Engines

```python
# Each engine retrieves the 5 most similar chunks before asking the LLM to answer.
lyft_engine = lyft_index.as_query_engine(similarity_top_k=5)
uber_engine = uber_index.as_query_engine(similarity_top_k=5)
```
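`similarity_top_k=5` asks the retriever for the five chunks whose embeddings are closest to the query embedding. The sketch below mimics that top-k step with cosine similarity. The bag-of-words `embed` function is a hypothetical stand-in for the Titan embedding model, and `cosine`/`top_k` are illustrative helpers, not LlamaIndex APIs.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy bag-of-words "embedding" (stand-in for a real embedding model)."""
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: List[str], k: int = 5) -> List[Tuple[float, str]]:
    """Score every chunk against the query and keep the k most similar."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = [
    "revenue increased in 2021",
    "risk factors and legal proceedings",
    "total revenue for the year 2021",
]
result = top_k("revenue 2021", chunks, k=2)
for score, text in result:
    print(f"{score:.3f}  {text}")
```

A larger `k` gives the LLM more evidence per sub-question at the cost of a longer prompt.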


### Querying

```python
# Top-level `await` works here because the notebook already runs an event loop.
response = await lyft_engine.aquery('What is the revenue of Lyft in 2021? Answer in millions with page reference')
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
```

```python
response = await uber_engine.aquery('What is the revenue of Uber in 2021? Answer in millions, with page reference')
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
```
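The two queries above run one after the other. Because `aquery` returns an awaitable, independent queries can also be awaited together with `asyncio.gather`. In the notebook that would be something like `await asyncio.gather(lyft_engine.aquery(q1), uber_engine.aquery(q2))`. The standalone sketch below uses a hypothetical `FakeEngine` class in place of the real engines so that it runs without Bedrock.

```python
import asyncio
from typing import List


class FakeEngine:
    """Hypothetical stand-in for a query engine exposing an async `aquery`."""

    def __init__(self, name: str, delay: float) -> None:
        self.name = name
        self.delay = delay

    async def aquery(self, question: str) -> str:
        await asyncio.sleep(self.delay)  # simulate a slow retrieval + LLM call
        return f"{self.name}: answer to {question!r}"


async def ask_all(question: str, engines: List[FakeEngine]) -> List[str]:
    # gather() runs the coroutines concurrently and returns results in input order.
    return list(await asyncio.gather(*(e.aquery(question) for e in engines)))


engines = [FakeEngine("lyft", 0.2), FakeEngine("uber", 0.1)]
answers = asyncio.run(ask_all("What was 2021 revenue?", engines))
print(answers)
```

Although the `uber` stand-in finishes first, `gather` keeps the results in the order the coroutines were passed in.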

### Create Tools

```python
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine

# The tool names and descriptions are what the sub-question generator sees
# when deciding which tool should answer each sub-question.
query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name='lyft_10k',
            description='Provides information about Lyft financials for year 2021',
        ),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name='uber_10k',
            description='Provides information about Uber financials for year 2021',
        ),
    ),
]
```
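Each generated sub-question is paired with the name of the tool that should answer it. In LlamaIndex, the `SubQuestion` model carries a `sub_question` and a `tool_name` field. The sketch below assumes the generator's raw output is a JSON list of such objects and routes each one by tool name. The exact prompt and parsing are library internals. The local `SubQuestion` dataclass and `parse_sub_questions` are hypothetical mirrors for illustration, and dropping unknown tool names is this sketch's own design choice.

```python
import json
from dataclasses import dataclass
from typing import List, Set


@dataclass
class SubQuestion:
    # Mirrors the two fields a question generator is asked to emit.
    sub_question: str
    tool_name: str


def parse_sub_questions(raw: str, known_tools: Set[str]) -> List[SubQuestion]:
    """Parse a JSON list of sub-questions, dropping any that name an unknown tool."""
    parsed = [SubQuestion(**item) for item in json.loads(raw)]
    return [sq for sq in parsed if sq.tool_name in known_tools]


# Hypothetical generator output; the last item names a tool that does not exist.
raw_output = json.dumps([
    {"sub_question": "What was Uber's revenue growth from 2020 to 2021?", "tool_name": "uber_10k"},
    {"sub_question": "What was Lyft's revenue growth from 2020 to 2021?", "tool_name": "lyft_10k"},
    {"sub_question": "What was Tesla's revenue in 2021?", "tool_name": "tesla_10k"},
])
subs = parse_sub_questions(raw_output, known_tools={"uber_10k", "lyft_10k"})
for sq in subs:
    print(f"[{sq.tool_name}] Q: {sq.sub_question}")
```

Because routing happens by name, the `name` values in `ToolMetadata` should be short, unambiguous identifiers such as `lyft_10k`.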

### Create `SubQuestionQueryEngine`

```python
sub_question_query_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)
```

### Querying with `SubQuestionQueryEngine`

```python
response = await sub_question_query_engine.aquery('Compare revenue growth of Uber and Lyft from 2020 to 2021')
```

```
Generated 2 sub questions.
[uber_10k] Q: What was Uber's revenue growth from 2020 to 2021?
[uber_10k] A: Based on the context, Uber's revenue grew 57% from 2020 to 2021. The context states:

"Revenue was $17.5 billion, or up 57% year-over-year, reflecting the overall growth in our Delivery business and an increase in Freight revenue attributable to the acquisition of Transplace in the fourth quarter of 2021 as well as growth in the number of shippers and carriers on the network combined with an increase in volumes with our top shippers."

This indicates that Uber's revenue increased by 57% from 2020 to 2021.
[lyft_10k] Q: What was Lyft's revenue growth from 2020 to 2021?
[lyft_10k] A: Based on the context provided, Lyft's revenue grew 36% from 2020 to 2021. Specifically, the context states:

"Revenue increased $843.6 million, or 36%, in 2021 as compared to the prior year, driven primaril
```

```python
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
```
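The sub-answers quote enough figures to sanity-check the comparison by hand. Year-over-year growth is $g = (R_{2021} - R_{2020}) / R_{2020}$. Uber's answer gives $R_{2021}$ and $g$, so $R_{2020} = R_{2021} / (1 + g)$. Lyft's answer gives the increase $\Delta$ and $g$, so $R_{2020} = \Delta / g$. The sketch below works this out using only the numbers printed above. Because the quoted percentages are rounded, the implied 2020 bases are approximate, not reported figures.

```python
def growth_rate(previous: float, current: float) -> float:
    """Year-over-year growth: (current - previous) / previous."""
    return (current - previous) / previous


# Figures quoted in the sub-answers (percentages are rounded).
uber_revenue_2021_bn = 17.5   # "Revenue was $17.5 billion, or up 57%"
uber_growth = 0.57
lyft_increase_mn = 843.6      # "Revenue increased $843.6 million, or 36%"
lyft_growth = 0.36

# Back out the approximate 2020 bases implied by those figures.
uber_revenue_2020_bn = uber_revenue_2021_bn / (1 + uber_growth)
lyft_revenue_2020_mn = lyft_increase_mn / lyft_growth
lyft_revenue_2021_mn = lyft_revenue_2020_mn + lyft_increase_mn

print(f"Uber 2020 revenue ~ ${uber_revenue_2020_bn:.1f}B")
print(f"Lyft 2020 revenue ~ ${lyft_revenue_2020_mn:,.0f}M, 2021 ~ ${lyft_revenue_2021_mn:,.0f}M")
print(f"Check: Uber growth from implied base = {growth_rate(uber_revenue_2020_bn, uber_revenue_2021_bn):.0%}")
print(f"Growth gap ~ {(uber_growth - lyft_growth) * 100:.0f} percentage points")
```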

```python
response = await sub_question_query_engine.aquery('Compare the investments made by Uber and Lyft')
```

```
Generated 2 sub questions.
[uber_10k] Q: What investments were made by Uber in 2021?
[uber_10k] A: Based on the context provided, Uber made the following investments in 2021:

- In August 2021, Uber completed the acquisition of the remaining 45% ownership interest in Cornershop Cayman ("Cornershop"), or 47% on a fully-diluted basis, in an all-stock transaction.

- On October 12, 2021, Uber completed the acquisition of 100% ownership interest in The Drizly Group, Inc. ("Drizly"), an on-demand alcohol marketplace in North America. This allowed Uber to expand alcohol offerings in its Delivery business.

- Uber acquired Transplace, a logistics technology company, in the fourth quarter of 2021. This acquisition helped grow Uber's Freight business.
[lyft_10k] Q: What investments were made by Lyft in 2021?
[lyft_10k] A: Based on the context provided, it seems Lyft made several investments in 20
```

```python
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
```