# Multi-Step Query Engine

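We have a multi-step query engine that decomposes a complex query into sequential sub-questions, answering each one against the index before generating the next. This guide walks you through setting it up and compares GPT-4 and GPT-3 (`text-davinci-003`) as the underlying models.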
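The engine alternates between two steps. First, a query transform proposes the next sub-question, using the original query and the sub-answers gathered so far. Second, an underlying query engine answers that sub-question. The loop ends when the transform returns "None" or a step limit is reached (the logs below show at most three steps per query). A final answer is then synthesized from the collected pairs; that part is omitted here.

The sketch below shows only this control flow, using plain Python stand-ins. `run_multi_step`, `decompose`, and `answer` are hypothetical names, not LlamaIndex APIs. The default of 3 steps is an assumption chosen to match this notebook's logs.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures: decompose(original_query, history) -> next sub-question or "None";
# answer(sub_question) -> answer string.
Decomposer = Callable[[str, List[Tuple[str, str]]], str]
Answerer = Callable[[str], str]


def run_multi_step(
    query: str,
    decompose: Decomposer,
    answer: Answerer,
    num_steps: Optional[int] = 3,
) -> List[Tuple[str, str]]:
    """Collect (sub_question, sub_answer) pairs until the decomposer says stop."""
    history: List[Tuple[str, str]] = []
    # num_steps=None means "no cap": only the "None" signal ends the loop
    while num_steps is None or len(history) < num_steps:
        sub_question = decompose(query, history).strip()
        # Treat "None" (any case) as the signal that no further sub-question is needed
        if sub_question.lower() == "none":
            break
        history.append((sub_question, answer(sub_question)))
    return history


# Usage with toy stand-ins instead of real LLM calls
def toy_decompose(query: str, history: List[Tuple[str, str]]) -> str:
    plan = ["Who founded Viaweb?", "Where was Viaweb founded?"]
    return plan[len(history)] if len(history) < len(plan) else "None"


def toy_answer(sub_question: str) -> str:
    return f"answer to: {sub_question}"


pairs = run_multi_step("Where did the author found Viaweb?", toy_decompose, toy_answer)
print(pairs)
```

The toy decomposer stops itself after two sub-questions. A decomposer that never returns "None" is cut off by `num_steps` instead.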

#### Load documents, build the VectorStoreIndex

```python
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    LLMPredictor,
    ServiceContext,
)
from llama_index.llms import OpenAI
from IPython.display import Markdown, display
```

```python
import os

import openai

# Read the key from the environment instead of hard-coding it in the notebook
openai.api_key = os.environ["OPENAI_API_KEY"]

# LLM (GPT-3, text-davinci-003)
gpt3 = OpenAI(temperature=0, model="text-davinci-003")
service_context_gpt3 = ServiceContext.from_defaults(llm=gpt3)

# LLM (GPT-4)
gpt4 = OpenAI(temperature=0, model="gpt-4")
service_context_gpt4 = ServiceContext.from_defaults(llm=gpt4)
```
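A hard-coded API key in a notebook is easy to leak when the notebook is shared. The sketch below is a fail-fast helper that reads the key from the environment. It assumes the key is stored in the `OPENAI_API_KEY` environment variable. `require_env` is a hypothetical helper, not part of `openai` or LlamaIndex.

```python
import os


def require_env(name: str) -> str:
    """Return the (whitespace-trimmed) value of an environment variable, or fail clearly."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it in your shell before starting the notebook"
        )
    return value


# Usage in the notebook (hypothetical): openai.api_key = require_env("OPENAI_API_KEY")
```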

```python
# load documents
documents = SimpleDirectoryReader("../data_llama/paul_graham").load_data()
```

```python
index = VectorStoreIndex.from_documents(documents)
```

#### Query Index

```python
from llama_index.indices.query.query_transform.base import StepDecomposeQueryTransform

# Query transform that uses GPT-4 to generate each next sub-question
# (verbose=True prints every current/new query pair)
step_decompose_transform = StepDecomposeQueryTransform(
    LLMPredictor(llm=gpt4), verbose=True
)

# The same transform driven by GPT-3 (text-davinci-003)
step_decompose_transform_gpt3 = StepDecomposeQueryTransform(
    LLMPredictor(llm=gpt3), verbose=True
)
```

```python
# Short description of what the index covers; passed to the query transform as context
index_summary = "Used to answer questions about the author"
```
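The index summary tells the decomposition model what the index can answer. The exact prompt lives inside `StepDecomposeQueryTransform`. The sketch below is a hypothetical, simplified template, not the library's actual prompt. It shows how the summary, the original question, and the previous sub-question/answer pairs could be combined into one request for the next sub-question. `TEMPLATE` and `build_decompose_prompt` are illustrative names.

```python
from typing import List, Tuple

# Hypothetical template, loosely modeled on step-decomposition prompts; not LlamaIndex's exact wording
TEMPLATE = (
    "Knowledge source context: {index_summary}\n"
    "Question: {query}\n"
    "Previous reasoning:\n{prev_reasoning}\n"
    "Next sub-question (or None if the question is already answered):"
)


def build_decompose_prompt(
    query: str, index_summary: str, history: List[Tuple[str, str]]
) -> str:
    # Render each earlier step as an indented Q/A pair; use "None" when there is no history yet
    prev_reasoning = "\n".join(f"- Q: {q}\n  A: {a}" for q, a in history) or "None"
    return TEMPLATE.format(
        index_summary=index_summary, query=query, prev_reasoning=prev_reasoning
    )


prompt = build_decompose_prompt(
    "In which city did the author found his first company, Viaweb?",
    "Used to answer questions about the author",
    [("Who is the author who founded Viaweb?", "The author who founded Viaweb is Paul Graham.")],
)
print(prompt)
```

Feeding earlier answers back in explains why the second sub-question in the Viaweb run below already names Paul Graham.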

```python
# Set the logging level to DEBUG (in the first cell) for more detailed output
from llama_index.query_engine.multistep_query_engine import MultiStepQueryEngine

# Base query engine over the index, answering with GPT-4
query_engine = index.as_query_engine(service_context=service_context_gpt4)
# Wrap it so that each sub-question is generated by the GPT-4 decomposition transform
query_engine = MultiStepQueryEngine(
    query_engine=query_engine,
    query_transform=step_decompose_transform,
    index_summary=index_summary,
)
response_gpt4 = query_engine.query(
    "Who was in the first batch of the accelerator program the author started?",
)
```

[33;1m[1;3m> Current query: Who was in the first batch of the accelerator program the author started?
[0m[38;5;200m[1;3m> New query: Who is the author of the accelerator program?
[0m[33;1m[1;3m> Current query: Who was in the first batch of the accelerator program the author started?
[0m[38;5;200m[1;3m> New query: What is the name of the accelerator program the author started?
[0m[33;1m[1;3m> Current query: Who was in the first batch of the accelerator program the author started?
[0m[38;5;200m[1;3m> New query: Who was in the first batch of the Y Combinator program?
[0m

```python
display(Markdown(f"<b>{response_gpt4}</b>"))
```

<b>The first batch of the accelerator program started by the author included the founders of Reddit, Justin Kan and Emmett Shear who later founded Twitch, Aaron Swartz who had already helped write the RSS spec and would later become a martyr for open access, and Sam Altman, who would later become the second president of Y Combinator.</b>

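Note: in the original run this cell was executed after the Viaweb query below. The sub-question/answer pairs printed here therefore belong to that query, not to the accelerator question above.

```python
# Each entry in sub_qa is a (sub_question, sub_response) pair
sub_qa = response_gpt4.metadata["sub_qa"]
tuples = [(t[0], t[1].response) for t in sub_qa]
print(tuples)
```

```
[('Who is the author who founded Viaweb?', 'The author who founded Viaweb is Paul Graham.'), ('In which city did Paul Graham found his first company, Viaweb?', 'The context does not provide information on the city where Paul Graham founded his first company, Viaweb.')]
```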
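`print(tuples)` puts the whole trace on one line. The sketch below prints it step by step instead. It only assumes that each `sub_qa` entry is a `(question, response)` pair whose second element has a `.response` attribute, as in the cell above. `format_sub_qa` and the `FakeResponse` stand-in are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Sequence, Tuple


def format_sub_qa(sub_qa: Sequence[Tuple[str, Any]]) -> str:
    """Render (question, response) pairs as a numbered, whitespace-trimmed trace."""
    lines = []
    for step, (question, response) in enumerate(sub_qa, start=1):
        lines.append(f"Step {step}: {question.strip()}")
        lines.append(f"  -> {str(response.response).strip()}")
    return "\n".join(lines)


# Stand-in for the response objects stored in response.metadata["sub_qa"]
@dataclass
class FakeResponse:
    response: str


demo = [
    ("Who is the author who founded Viaweb?", FakeResponse("The author who founded Viaweb is Paul Graham.")),
]
print(format_sub_qa(demo))
# In the notebook: print(format_sub_qa(response_gpt4.metadata["sub_qa"]))
```

Trimming whitespace matters for the GPT-3 run below, where every sub-question and sub-answer begins with a space.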


```python
response_gpt4 = query_engine.query(
    "In which city did the author found his first company, Viaweb?",
)
```

[33;1m[1;3m> Current query: In which city did the author found his first company, Viaweb?
[0m[38;5;200m[1;3m> New query: Who is the author who founded Viaweb?
[0m[33;1m[1;3m> Current query: In which city did the author found his first company, Viaweb?
[0m[38;5;200m[1;3m> New query: In which city did Paul Graham found his first company, Viaweb?
[0m[33;1m[1;3m> Current query: In which city did the author found his first company, Viaweb?
[0m[38;5;200m[1;3m> New query: None
[0m
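Here the third step returned `None`. That is the transform's signal that no further sub-question is needed, so the engine stopped and answered from the two earlier steps. Model output is not always byte-exact: the GPT-3 transform below prefixes its queries with a space. A stop check should therefore normalize the text before comparing.

The sketch below is a minimal version of such a check, assuming "None" is the only stop token. `is_stop_signal` is a hypothetical helper, not the library's stop function.

```python
def is_stop_signal(new_query: str) -> bool:
    """Return True if a generated sub-question means "stop decomposing"."""
    # Trim whitespace and a trailing period, then compare case-insensitively
    normalized = new_query.strip().rstrip(".").strip().lower()
    return normalized == "none"


for candidate in ["None", " none", "None.", "Who is the author who founded Viaweb?"]:
    print(repr(candidate), is_stop_signal(candidate))
```

The check requires an exact match after normalization. Without that, a real question that merely starts with "None" would end the loop early.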

```python
print(response_gpt4)
```

```
The context does not provide information on the city where the author founded his first company, Viaweb.
```


For comparison, here is the same Viaweb question with GPT-3 (`text-davinci-003`) used both to answer sub-questions and to decompose the query:

```python
query_engine = index.as_query_engine(service_context=service_context_gpt3)
query_engine = MultiStepQueryEngine(
    query_engine=query_engine,
    query_transform=step_decompose_transform_gpt3,
    index_summary=index_summary,
)

response_gpt3 = query_engine.query(
    "In which city did the author found his first company, Viaweb?",
)
```

[33;1m[1;3m> Current query: In which city did the author found his first company, Viaweb?
[0m[38;5;200m[1;3m> New query:  Where was the author located when he founded his first company, Viaweb?
[0m[33;1m[1;3m> Current query: In which city did the author found his first company, Viaweb?
[0m[38;5;200m[1;3m> New query:  Was the author located in Cambridge, Massachusetts when he founded his first company, Viaweb?
[0m[33;1m[1;3m> Current query: In which city did the author found his first company, Viaweb?
[0m[38;5;200m[1;3m> New query:  In which city was the author located when he founded his first company, Viaweb?
[0m

```python
print(response_gpt3)
```

```
The author founded his first company, Viaweb, in Cambridge, Massachusetts.
```


```python
sub_qa = response_gpt3.metadata["sub_qa"]
tuples = [(t[0], t[1].response) for t in sub_qa]
print(tuples)
```

```
[(' Where was the author located when he founded his first company, Viaweb?', ' The author was located in Cambridge, Massachusetts when he founded his first company, Viaweb.'), (' Was the author located in Cambridge, Massachusetts when he founded his first company, Viaweb?', ' No, the author was located in New York when he founded his first company, Viaweb.'), (' In which city was the author located when he founded his first company, Viaweb?', ' The author was located in Cambridge, Massachusetts when he founded his first company, Viaweb.')]
```
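GPT-3's sub-answers contradict each other. The first and third place the author in Cambridge, Massachusetts, the second says New York, and the final answer reports only Cambridge. When comparing models, a crude check can flag disagreements like this.

The sketch below is a keyword-matching check that assumes you already know which candidate values to look for. `count_candidate_mentions` is a hypothetical helper, and real answers may need more robust matching.

```python
from collections import Counter
from typing import Iterable, Sequence


def count_candidate_mentions(answers: Iterable[str], candidates: Sequence[str]) -> Counter:
    """Count how many answers mention each candidate (case-insensitive substring match)."""
    counts: Counter = Counter()
    for answer in answers:
        text = answer.lower()
        for candidate in candidates:
            if candidate.lower() in text:
                counts[candidate] += 1
    return counts


# Sub-answers copied from the GPT-3 output above
# (in the notebook: [t[1].response for t in response_gpt3.metadata["sub_qa"]])
gpt3_answers = [
    " The author was located in Cambridge, Massachusetts when he founded his first company, Viaweb.",
    " No, the author was located in New York when he founded his first company, Viaweb.",
    " The author was located in Cambridge, Massachusetts when he founded his first company, Viaweb.",
]
mentions = count_candidate_mentions(gpt3_answers, ["Cambridge", "New York"])
print(mentions)
if len(mentions) > 1:
    print("Sub-answers disagree:", sorted(mentions))
```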
