# Compare Documents
# SubQuestionQueryEngine

In [None]:
!pip install llama-index pypdf

Collecting pypdf
  Downloading pypdf-3.15.3-py3-none-any.whl (271 kB)
Installing collected packages: pypdf
Successfully installed pypdf-3.15.3


In [None]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().handlers = []
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

import openai
openai.api_key = 'YOUR_OPENAI_API_KEY'

In [None]:
from llama_index import SimpleDirectoryReader, ServiceContext, VectorStoreIndex
from llama_index import set_global_service_context
from llama_index.response.pprint_utils import pprint_response
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.query_engine import SubQuestionQueryEngine

# Load Uber and Lyft documents

In [None]:
lyft_docs = SimpleDirectoryReader(input_files=["lyft_2021.pdf"]).load_data()
uber_docs = SimpleDirectoryReader(input_files=["uber_2021.pdf"]).load_data()

In [None]:
print(f'Loaded Lyft 10-K with {len(lyft_docs)} pages')
print(f'Loaded Uber 10-K with {len(uber_docs)} pages')

Loaded Lyft 10-K with 238 pages
Loaded Uber 10-K with 307 pages


# Build indices

In [None]:
lyft_index = VectorStoreIndex.from_documents(lyft_docs)
uber_index = VectorStoreIndex.from_documents(uber_docs)

[nltk_data] Downloading package punkt to /tmp/llama_index...
[nltk_data]   Unzipping tokenizers/punkt.zip.


# Basic QA

In [None]:
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)


In [None]:
uber_engine = uber_index.as_query_engine(similarity_top_k=3)


In [None]:
response = await lyft_engine.aquery('What is the revenue of Lyft in 2021? Answer in millions with page reference')


In [None]:
print(response)

The revenue of Lyft in 2021 was $3,208.3 million. (Page reference: 63)
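
The cells above use top-level `await`, which works in a notebook but not in a plain Python script. Below is a minimal sketch of the same call pattern driven by `asyncio.run`. `StubEngine` and `ask` are hypothetical names introduced only for illustration; the stub returns a canned string so the snippet runs without an index or an API key. With the real engine, `lyft_engine` would take the place of the stub.

```python
import asyncio


class StubEngine:
    """Hypothetical stand-in for a query engine; returns a canned string."""

    async def aquery(self, question: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"stub answer to: {question}"


async def ask(engine, question: str) -> str:
    # Same call shape as the notebook cell: await engine.aquery(...)
    return await engine.aquery(question)


# asyncio.run creates an event loop, runs the coroutine, and closes the loop
answer = asyncio.run(ask(StubEngine(), "What is the revenue of Lyft in 2021?"))
print(answer)
```

Inside a notebook an event loop is already running, so `asyncio.run` is not needed there and the plain `await` form used above is the simpler choice.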


In [None]:
response = await uber_engine.aquery('What is the revenue of Uber in 2021? Answer in millions, with page reference')


In [None]:
print(response)

The revenue of Uber in 2021 was $17,455 million. (Page reference: 98)


# Compare Uber and Lyft

In [None]:
# Wrap each per-company query engine as a tool. The name and description
# tell the sub-question engine what each tool covers.
query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name='lyft_10k',
            description='Provides information about Lyft financials for year 2021',
        ),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name='uber_10k',
            description='Provides information about Uber financials for year 2021',
        ),
    ),
]

# Breaks a query into sub-questions per tool, then combines the answers
s_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)

In [None]:
response = await s_engine.aquery('Compare and contrast the customer segments and geographies that grew the fastest')

Generated 4 sub questions.
[lyft_10k] Q: What were the customer segments that grew the fastest for Lyft in 2021?
[uber_10k] Q: What were the customer segments that grew the fastest for Uber in 2021?
[lyft_10k] Q: Which geographies experienced the fastest growth for Lyft in 2021?
[uber_10k] Q: Which geographies experienced the fastest growth for Uber in 2021?
[uber_10k] A: The customer segments that grew the fastest for Uber in 2021 were the Mobility and Delivery segments.
[lyft_10k] A: Lyft's 2021 report does not provide specific information about the geographies that experienced the fastest growth for the company.
[lyft_10k] A: Lyft experienced growth in the number of Active Riders in 2021 compared to the previous year. However, the context does not provide specific information about the customer segments that grew the fastest for Lyft in 2021.
[u

In [None]:
print(response)

The customer segments that grew the fastest for Lyft in 2021 are not specified in the given context. However, for Uber, the customer segments that grew the fastest in 2021 were the Mobility and Delivery segments. In terms of geographies, Lyft's 2021 report does not provide specific information about the geographies that experienced the fastest growth. On the other hand, Uber experienced the fastest growth in its Mobility Gross Bookings in five metropolitan areas in 2021, namely Chicago, Miami, New York City in the United States, Sao Paulo in Brazil, and London in the United Kingdom.
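
The trace above shows three stages: generate sub-questions, send each one to a named tool, then combine the answers. The toy sketch below mirrors only that control flow. It is not how `SubQuestionQueryEngine` is implemented: the real engine asks an LLM to write the sub-questions and to synthesize the final answer, while here `decompose`, `answer_all`, and the canned tool answers are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SubQuestion:
    tool_name: str
    question: str


def decompose(template: str, tool_names: List[str]) -> List[SubQuestion]:
    # Hypothetical rule: ask the same templated question of every tool.
    return [SubQuestion(name, template.format(tool=name)) for name in tool_names]


def answer_all(
    sub_questions: List[SubQuestion],
    tools: Dict[str, Callable[[str], str]],
) -> List[Tuple[SubQuestion, str]]:
    # Route each sub-question to the tool it names and collect the answers.
    return [(sq, tools[sq.tool_name](sq.question)) for sq in sub_questions]


# Canned answers stand in for the per-document query engines.
tools = {
    "lyft_10k": lambda q: "canned Lyft answer",
    "uber_10k": lambda q: "canned Uber answer",
}

subs = decompose("Which segments grew fastest according to {tool}?", list(tools))
answers = answer_all(subs, tools)

# Naive synthesis: concatenate the tagged answers in order.
summary = " ".join(f"[{sq.tool_name}] {a}" for sq, a in answers)
print(summary)
```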


In [None]:
response = await s_engine.aquery('Compare revenue growth of Uber and Lyft from 2020 to 2021')

Generated 4 sub questions.
[uber_10k] Q: What was the revenue of Uber in 2020?
[uber_10k] Q: What was the revenue of Uber in 2021?
[lyft_10k] Q: What was the revenue of Lyft in 2020?
[lyft_10k] Q: What was the revenue of Lyft in 2021?
[lyft_10k] A: The revenue of Lyft in 2020 was $2,364,681.
[lyft_10k] A: The revenue of Lyft in 2021 was $3,208,323,000.
[uber_10k] A: The revenue of Uber in 2021 was $17.5 billion.
[uber_10k] A: The revenue of Uber in 2020 was $11,139 million.

In [None]:
print(response)

The revenue growth of Uber from 2020 to 2021 was significantly higher compared to Lyft.
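
The synthesized answer gives no numbers, but the sub-answers in the trace are enough to work the growth out by hand. The sketch below does that arithmetic under two assumptions that should be checked against the filings: the Lyft 2020 figure of $2,364,681 is taken to be stated in thousands, and Uber's 2021 revenue uses the $17,455 million value from the earlier basic QA cell rather than the rounded $17.5 billion. `growth_pct` is a helper name introduced here.

```python
def growth_pct(old: float, new: float) -> float:
    """Year-over-year growth as a percentage of the earlier value."""
    return (new - old) / old * 100.0


# All values in $ millions, taken from the query outputs above.
uber_2020, uber_2021 = 11_139.0, 17_455.0
# Assumption: the trace's "$2,364,681" is in thousands, i.e. 2,364.681 million.
lyft_2020, lyft_2021 = 2_364.681, 3_208.323

uber_growth = growth_pct(uber_2020, uber_2021)
lyft_growth = growth_pct(lyft_2020, lyft_2021)

print(f"Uber revenue growth 2020 to 2021: {uber_growth:.1f}%")
print(f"Lyft revenue growth 2020 to 2021: {lyft_growth:.1f}%")
```

Under those assumptions Uber's growth comes out near 56.7% and Lyft's near 35.7%, which is consistent with the engine's qualitative answer.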
