# Query Optimization
Comparing raw queries, sub-queries, and HyDE queries:
* Sub-queries can retrieve more information, especially for comparison and summarization tasks. 😄
* HyDE queries miss stories that lie completely outside what the LLM can imagine, because retrieval is guided by the LLM's hypothetical answer rather than by the question alone. 😅



In [1]:
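import nest_asyncio

# Jupyter already runs an event loop; patch asyncio so nested event loops
# (needed for use_async=True below) can run inside this notebook.
nest_asyncio.apply()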
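Why the patch is needed: a Jupyter cell already executes inside a running event loop, and the standard-library `asyncio.run` refuses to start another loop from inside a running one. The minimal sketch below reproduces that error with the standard library only; `inner` and `outer` are illustrative names, and `nest_asyncio.apply()` is what relaxes this restriction in the notebook.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    coro = inner()
    try:
        # Inside an already-running loop (like a Jupyter cell), asyncio.run refuses to start.
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never started; close it to avoid a "never awaited" warning
        return f"RuntimeError: {exc}"
    return "nested asyncio.run succeeded"


message = asyncio.run(outer())
print(message)
```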

In [2]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.core import Settings
# Using the LlamaDebugHandler to print the trace of the sub questions
# captured by the SUB_QUESTION callback event type
llama_debug = LlamaDebugHandler(print_trace_on_end=False)
callback_manager = CallbackManager([llama_debug])

Settings.callback_manager = callback_manager

# Local helper modules (not shown in this notebook) that wrap the Azure OpenAI
# models and the vector store.
from azureresource import (
    get_llm,
    get_embed_model,
    get_vector_store
)
from index import build_index, get_index
llm = get_llm("gpt-35-turbo", "gpt-35-turbo-1106")
embed_model = get_embed_model("text-embedding-ada-002", "text-embedding-ada-002")
vector_store = get_vector_store("chunk-512")

# Register the models globally, then get the index backed by the vector store.
Settings.llm = llm
Settings.embed_model = embed_model
index = get_index(vector_store, llm, embed_model)

In [3]:
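# Set up the base query engine as a tool.
# The description is in Chinese, like the corpus. In English: "This book records
# the author's travels in Spain. Beyond describing the unmatched humanistic and
# artistic value of Spain's exotic, colorful history, art, people, palaces,
# churches, and castles, the author weaves a thousand years of stories from this
# land into the journey: kings and generals, wars, landscapes, and above all
# Spain's path to becoming a modern nation run through the whole book, letting
# readers enjoy Spain's charm, delve into its deep history, and understand its
# profound and instructive social evolution."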
query_engine_tools = [
    QueryEngineTool(
        query_engine=index.as_query_engine(),
        metadata=ToolMetadata(
            name="SpainTravelEssay",
            description="本书记述了作者在西班牙旅行的经历。面对西班牙既充满异域风情而又绚丽多姿的历史、艺术、人物和宫殿、教堂、城堡等文化精华，作者在描述其无与伦比的人文及艺术价值外，更把上千年来发生在这块土地上的故事，糅进漫游的行程，使帝王将相、战火烽烟、山川景物，尤其是它走向现代国家的进程贯穿于全书，不但能让读者领略西班牙迷人的风貌，更能深入西班牙幽深的历史，洞悉它深刻而富于启示的社会演化过程。",
        ),
    ),
]

sub_query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    use_async=True,
)
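The sub-question engine follows a decompose, answer, synthesize pattern. The sketch below is a simplified, pure-Python illustration of that flow, not LlamaIndex's implementation: `generate_sub_questions`, `answer_sub_question`, and `synthesize` are hypothetical stand-ins for the LLM and retrieval calls, and the hard-coded sub-questions are the two generated in the logged run. `asyncio.gather` plays the role of `use_async=True`, under which sub-questions are answered concurrently.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SubQuestion:
    tool_name: str
    sub_question: str


def generate_sub_questions(question: str) -> list[SubQuestion]:
    # Hypothetical stand-in for the LLM call that decomposes the question;
    # returns the two sub-questions from the logged run.
    return [
        SubQuestion("SpainTravelEssay", "What are the historical landmarks in Barcelona?"),
        SubQuestion("SpainTravelEssay", "What are the cultural attractions in Barcelona?"),
    ]


async def answer_sub_question(sq: SubQuestion) -> str:
    # Hypothetical stand-in for querying the named tool's query engine.
    await asyncio.sleep(0)
    return f"[{sq.tool_name}] answer to: {sq.sub_question}"


def synthesize(question: str, answers: list[str]) -> str:
    # Hypothetical stand-in for the final LLM synthesis over all answers.
    return question + "\n" + "\n".join(answers)


async def sub_question_query(question: str) -> tuple[list[str], str]:
    sub_questions = generate_sub_questions(question)
    # Answer all sub-questions concurrently (the analogue of use_async=True).
    answers = list(await asyncio.gather(*(answer_sub_question(sq) for sq in sub_questions)))
    return answers, synthesize(question, answers)


answers, final = asyncio.run(
    sub_question_query("What interesting things can you experience in Barcelona?")
)
print(final)
```

`asyncio.gather` returns results in the order the coroutines were passed, so each answer stays aligned with its sub-question.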

In [4]:
from testcase import question_list, bcolors
for q in question_list:
    print(f"{bcolors.BOLD}{bcolors.HEADER}Q: {q}{bcolors.ENDC}")
    results = sub_query_engine.query(q)
    print("response", results.response)
    for node in results.source_nodes:
        print(node)
    

[1m[95mQ: 在巴塞罗那可以体验哪些有趣的事情[0m
Generated 2 sub questions.
[1;3;38;2;237;90;200m[SpainTravelEssay] Q: What are the historical landmarks in Barcelona?
[0m[1;3;38;2;90;149;237m[SpainTravelEssay] Q: What are the cultural attractions in Barcelona?
[0m[1;3;38;2;90;149;237m[SpainTravelEssay] A: The cultural attractions in Barcelona include the Montjuic mountain, which hosted the 1929 World Expo, offering panoramic views of the city and the Mediterranean Sea. Additionally, the coastal town of Sitges, known for its nine beautiful beaches and a 17th-century church, provides a charming old town with narrow stone streets and traditional architecture.
[0m[1;3;38;2;237;90;200m[SpainTravelEssay] A: The historical landmarks in Barcelona include Montjuic Mountain, which hosted the 1929 World Exposition, offering panoramic views of the city and the Mediterranean Sea. Another notable landmark is the monument to Cervantes in the expanded new city area.
[0mresponse 您可以在巴塞罗那体验登上蒙特惠奇山，欣赏城市和地中海的全景，还

# DEFAULT_OPENAI_SUB_QUESTION_PROMPT_TMPL

````text
You are a world class state of the art agent.

You have access to multiple tools, each representing a different data source or API.
Each of the tools has a name and a description, formatted as a JSON dictionary.
The keys of the dictionary are the names of the tools and the values are the descriptions.
Your purpose is to help answer a complex user question by generating a list of sub questions that can be answered by the tools.

These are the guidelines you consider when completing your task:
* Be as specific as possible
* The sub questions should be relevant to the user question
* The sub questions should be answerable by the tools provided
* You can generate multiple sub questions for each tool
* Tools must be specified by their name, not their description
* You don't need to use a tool if you don't think it's relevant

Output the list of sub questions by calling the SubQuestionList function.

## Tools
```json
{tools_str}
```

## User Question
{query_str}
````
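The template expects `{tools_str}` to be a JSON dictionary mapping tool names to descriptions, and the model to reply by calling `SubQuestionList`. The sketch below shows both ends with the standard library only. Assumptions: the `{"items": [{"sub_question": ..., "tool_name": ...}]}` payload shape and the `parse_sub_questions` helper are illustrative, not a guaranteed LlamaIndex contract; the English tool description is a shortened paraphrase; the `WeatherAPI` entry is a hypothetical invalid tool name.

```python
import json

# Tool registry in the shape the prompt describes: {tool name: description}.
tools = {"SpainTravelEssay": "A book about the author's travels in Spain and its history."}
tools_str = json.dumps(tools, indent=4)

# Abbreviated version of the template's last two sections.
fence = "`" * 3  # avoids a literal triple backtick inside this Markdown code block
template = "## Tools\n" + fence + "json\n{tools_str}\n" + fence + "\n\n## User Question\n{query_str}\n"
prompt = template.format(
    tools_str=tools_str,
    query_str="What interesting things can you experience in Barcelona?",
)

# Hypothetical SubQuestionList function-call arguments (assumed shape).
raw_arguments = json.dumps({
    "items": [
        {"sub_question": "What are the historical landmarks in Barcelona?", "tool_name": "SpainTravelEssay"},
        {"sub_question": "What are the cultural attractions in Barcelona?", "tool_name": "SpainTravelEssay"},
        {"sub_question": "What is the weather like?", "tool_name": "WeatherAPI"},
    ]
})


def parse_sub_questions(arguments: str, known_tools: dict[str, str]) -> list[tuple[str, str]]:
    """Return (tool_name, sub_question) pairs, dropping tools that are not registered."""
    items = json.loads(arguments)["items"]
    # Tools must be named exactly; anything unrecognized is discarded here.
    return [(item["tool_name"], item["sub_question"]) for item in items if item["tool_name"] in known_tools]


sub_questions = parse_sub_questions(raw_arguments, tools)
print(prompt)
print(sub_questions)
```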


In [5]:
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine import TransformQueryEngine
hyde = HyDEQueryTransform(include_original=True)
hyde_query_engine = TransformQueryEngine(index.as_query_engine(), hyde)
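With `include_original=True`, the transform's `embedding_strs` hold the hypothetical document followed by the original query, and the retriever embeds both and aggregates them (by default, an average). The toy sketch below illustrates why this favors content the LLM "imagined" and demotes stories it did not. Assumptions: a bag-of-words vector stands in for `text-embedding-ada-002`, the hypothetical document is hard-coded instead of LLM-generated, and the two documents are invented English snippets loosely based on this corpus.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words vector standing in for a neural embedding model.
    return Counter(text.lower().split())


def mean_embedding(vectors: list[Counter]) -> dict[str, float]:
    # Average several vectors, like aggregating the embeddings of embedding_strs.
    total: Counter = Counter()
    for vector in vectors:
        total.update(vector)
    return {term: count / len(vectors) for term, count in total.items()}


def cosine(a: dict, b: dict) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented mini-corpus: one passage the LLM is likely to imagine, one it is not.
docs = {
    "gaudi": "gaudi designed sagrada familia and casa batllo in barcelona",
    "sitges": "host took us to sitges seaside town for a late lunch",
}
query = "interesting things to experience in barcelona"
# Hard-coded stand-in for the LLM-generated hypothetical document.
hypothetical_doc = "visit sagrada familia by gaudi and stroll la rambla in barcelona"

doc_vectors = {name: embed(text) for name, text in docs.items()}
raw_vector = embed(query)
# include_original=True: aggregate the hypothetical document with the original query.
hyde_vector = mean_embedding([embed(hypothetical_doc), embed(query)])

raw_scores = {name: cosine(raw_vector, v) for name, v in doc_vectors.items()}
hyde_scores = {name: cosine(hyde_vector, v) for name, v in doc_vectors.items()}
for name in docs:
    print(f"{name}: raw={raw_scores[name]:.3f}  hyde={hyde_scores[name]:.3f}")
```

The imagined passage gains similarity under HyDE while the Sitges anecdote, which the hypothetical document never mentions, loses it. Real embedding models behave far less crisply, so treat this only as intuition for the second observation at the top of the notebook.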

In [11]:
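# Run the HyDE transform once and reuse its output, so the printed hypothetical
# document is the one actually used for retrieval. (Calling hyde(q) separately
# after hyde_query_engine.query(q) triggers a second, possibly different, LLM
# generation.)
base_query_engine = index.as_query_engine()
for q in question_list:
    print(f"{bcolors.BOLD}{bcolors.HEADER}Q: {q}{bcolors.ENDC}")
    query_bundle = hyde(q)
    # embedding_strs[0] is the hypothetical document; with include_original=True
    # the original query string follows it.
    hyde_doc = query_bundle.embedding_strs[0]
    # Mirrors TransformQueryEngine: transform first, then query the wrapped engine.
    response = base_query_engine.query(query_bundle)
    print(f"{bcolors.BOLD}Hypothetical Doc{bcolors.ENDC}")
    print(hyde_doc)
    print(f"{bcolors.BOLD}Result{bcolors.ENDC}")
    print(response)
    for node in response.source_nodes:
        print(node)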


[1m[95mQ: 在巴塞罗那可以体验哪些有趣的事情[0m
[1mHypothetical Doc[0m
在巴塞罗那，你可以体验许多有趣的事情。你可以参观圣家堂，这是安东尼奥·高迪的杰作，是一座令人惊叹的哥特式建筑。此外，你还可以漫步在兰布拉斯大道上，欣赏街头艺人的表演和购物。如果你喜欢艺术，那么高迪的其他作品，如巴特罗之家和米拉之家也是值得一游的地方。此外，你还可以品尝正宗的西班牙美食，如帕埃利亚和塔帕斯。最后，不要忘记在巴塞罗那海滩上放松身心，享受地中海的阳光和海风。总之，在巴塞罗那有太多有趣的事情等待着你去体验。
[1mResult[0m
You can experience the local dining culture by trying traditional Spanish dishes, such as bread with garlic, olive oil, and tomatoes. Exploring the old city area, including the Gothic Quarter near the cathedral, can provide a glimpse into the historical and cultural aspects of Barcelona. Additionally, staying in a youth hostel can offer a budget-friendly and communal travel experience.
Node ID: 94196ab4-8b43-4a21-b16a-0fcb10d5288c
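Text: Travelers coming to Spain would naturally put the capital, Madrid, first: it is Spain's geographic and political center, with deep historical roots, and it is ringed by historic old towns. Yet many travelers make Barcelona, not Madrid, their first stop in Spain. Barcelona is an old city too; its advantage is that it is closer to the rest of Europe. Coming from France, it is right there once you cross the Pyrenees. It is also on the sea, and heading south from here the route follows the blue waters and white beaches of the Mediterranean. A friend in Barcelona once took us to Sitges, a small seaside town nearby. I remember that morning we were sightseeing in central Barcelona, and our host said he would take us to a local restaurant for lunch. By my habits, lunch at half past twelve is already late, but this time it was nearly two o'clock and there was still no sign of it. I was so hungry I started grabbing anything I could eat. In our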