Sources:
- https://docs.llamaindex.ai/en/stable/examples/

PDFs:
- https://media.lonelyplanet.com/shop/pdfs/3379-Portugal_travel_guide72194.pdf
- https://guides.tripomatic.com/download/tripomatic-free-city-guide-lisbon.pdf
- https://www.lonelyplanet.com/articles/best-places-to-visit-in-portugal

In [None]:
!pip install llama-index==0.11.9 docx2txt llama-index-llms-anthropic python-dotenv

In [7]:
from dotenv import load_dotenv
load_dotenv()

True

In [9]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data/portugal_guides").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
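Under the hood, `VectorStoreIndex.from_documents` splits the documents into chunks and embeds each one (with OpenAI embeddings by default), and the query engine retrieves the chunks most similar to the question (two by default, as far as we know) before asking the LLM to answer from them. The toy sketch below illustrates only the retrieval idea: it swaps the real embedding model for a hypothetical bag-of-words vector and ranks chunks by cosine similarity. `embed`, `cosine`, and `top_k` are illustrative names, not LlamaIndex APIs.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


chunks = [
    "The value-added tax (VAT) in Portugal ranges between 6% and 23%.",
    "You can find national parks in the Parque Natural da Serra da Estrela "
    "and the Parque Nacional da Peneda-Gerês.",
    "Lisbon is the capital of Portugal.",
]
# prints a one-element list containing the VAT chunk
print(top_k("What is the tax value in Portugal?", chunks, k=1))
```

A real embedding model captures meaning rather than shared words, but the pipeline shape (embed, rank, keep the top k, hand them to the LLM) is the same.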

In [12]:
response = query_engine.query("What is the tax value in Portugal?")  # tripomatic-free-city-guide-lisbon
print(response)  # yes, mentioned in text

The value-added tax (VAT) in Portugal ranges between 6% and 23%.


In [13]:
response = query_engine.query("Where can you find national parks?")  # Lonelyplanet best 11 places to visit in Portugal 
print(response)  # yes, mentioned in text

You can find national parks in the Parque Natural da Serra da Estrela and the Parque Nacional da Peneda-Gerês in Portugal.


In [18]:
response = query_engine.query("Does Portugal have any interesting libraries?")  # source: Invented facts.docx
print(response)  # yes, taken from the invented document

Portugal is home to the world's only underwater library, known as the "Biblioteca Submersa." This unique library is located off the coast of the Algarve and contains thousands of books preserved in waterproof cases. Divers can explore its shelves and borrow books by signing them out with a special underwater pen.


In [19]:
response = query_engine.query("Who is Portugal's most recognized chess player?")  # no source document covers this
print(response)  # answers anyway, but not based on the docs

José Custodio is Portugal's most recognized chess player.


## Chatbots

Chat engines built on top of the document index.

Chat modes:
- `ChatMode.BEST` (default): Chat engine that uses an agent (ReAct or OpenAI, depending on the LLM) with a query engine tool
- `ChatMode.CONTEXT`: Chat engine that uses a retriever to get context for each message
- `ChatMode.CONDENSE_QUESTION`: Chat engine that condenses the chat history and the latest message into a standalone question for the query engine
- `ChatMode.CONDENSE_PLUS_CONTEXT`: Chat engine that condenses the question as above and then uses a retriever to get context for it
- `ChatMode.SIMPLE`: Simple chat engine that uses the LLM directly, without retrieval
- `ChatMode.REACT`: Chat engine that uses a ReAct agent with a query engine tool
- `ChatMode.OPENAI`: Chat engine that uses an OpenAI function-calling agent with a query engine tool
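What the two condense modes add is a rewriting step: the chat history and the follow-up message are turned into a standalone question before anything is retrieved, so a follow-up such as "And when is the best time to go?" still retrieves beach-related chunks. The sketch below only builds that condense prompt; the template wording is a hypothetical paraphrase (LlamaIndex ships its own default), and the LLM call that would produce the standalone question is left out.

```python
# Hypothetical condense template; LlamaIndex uses its own default wording.
CONDENSE_TEMPLATE = (
    "Given the conversation below and a follow-up question, "
    "rephrase the follow-up question to be a standalone question.\n\n"
    "Chat history:\n{chat_history}\n"
    "Follow-up question: {question}\n"
    "Standalone question:"
)


def build_condense_prompt(history: list[tuple[str, str]], question: str) -> str:
    """Render (role, content) pairs and the follow-up into the condense prompt."""
    chat_history = "\n".join(f"{role}: {content}" for role, content in history)
    return CONDENSE_TEMPLATE.format(chat_history=chat_history, question=question)


history = [
    ("user", "Where can I find the best beaches in Portugal?"),
    ("assistant", "The Algarve and the Setúbal Peninsula are popular choices."),
]
prompt = build_condense_prompt(history, "And when is the best time to go?")
print(prompt)
# An LLM would reply with a standalone question such as
# "When is the best time to visit the beaches in Portugal?",
# and that question, not the raw follow-up, is what gets retrieved against.
```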
    

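How does the `"{context_str}"` placeholder work? It matters for hallucinations: it is where the retrieved document chunks are inserted into the prompt, so the answer can only be as grounded as what gets retrieved into it.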
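`{context_str}` behaves like an ordinary `str.format` placeholder: on each turn the chat engine retrieves chunks for the (condensed) question, joins their text, and substitutes it into `context_prompt`, which is then sent to the LLM as the system message (our reading of the 0.11 implementation). The sketch below mimics that substitution in plain Python, assuming the chunks are joined with blank lines; retrieval is stubbed with a fixed list, and `fill_context` is a hypothetical helper. Since the instruction allows answering from "the previous chat history, or the context above", an empty or irrelevant context leaves the model free to fall back on its own knowledge.

```python
CONTEXT_PROMPT = (
    "You are a chatbot that helps users with information about Portugal. "
    "Here are the relevant documents for the context:\n"
    "{context_str}"
    "\nInstruction: Use the previous chat history, or the context above, to interact and help the user."
)


def fill_context(template: str, chunks: list[str]) -> str:
    """Join retrieved chunk texts and substitute them for {context_str}."""
    context_str = "\n\n".join(chunk.strip() for chunk in chunks)
    return template.format(context_str=context_str)


# Stubbed retrieval result; in the notebook this comes from the vector index.
retrieved = ["The value-added tax (VAT) in Portugal ranges between 6% and 23%."]
system_message = fill_context(CONTEXT_PROMPT, retrieved)
print(system_message)

# With nothing retrieved, the placeholder collapses to an empty string and the
# model only has the chat history and its own knowledge to go on.
print(fill_context(CONTEXT_PROMPT, []))
```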

In [53]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4")
data = SimpleDirectoryReader(input_dir="./data/portugal_guides/").load_data()
index = VectorStoreIndex.from_documents(data)

In [54]:
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.chat_engine.types import ChatMode

memory = ChatMemoryBuffer.from_defaults(token_limit=3900)

chat_engine = index.as_chat_engine(
    chat_mode=ChatMode.CONDENSE_PLUS_CONTEXT,
    memory=memory,
    llm=llm,
    context_prompt=(
        "You are a chatbot that helps users with information about Portugal. "
        "Here are the relevant documents for the context:\n"
        "{context_str}"
        "\nInstruction: Use the previous chat history, or the context above, to interact and help the user."
    ),
    verbose=False,
)
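`ChatMemoryBuffer.from_defaults(token_limit=3900)` keeps the conversation inside a token budget by dropping the oldest messages once the history grows too long. The sketch below reproduces that sliding-window idea under a simplifying assumption: it counts whitespace-separated words instead of real tokenizer tokens. `count_tokens` and `trim_history` are hypothetical helpers, not the library's implementation.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def trim_history(messages: list[tuple[str, str]], token_limit: int) -> list[tuple[str, str]]:
    """Keep the most recent (role, content) messages that fit within token_limit."""
    kept: list[tuple[str, str]] = []
    used = 0
    for role, content in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(content)
        if used + cost > token_limit:
            break  # this message and everything older is dropped
        kept.append((role, content))
        used += cost
    return list(reversed(kept))  # restore chronological order


history = [
    ("user", "Hello! What do you know?"),
    ("assistant", "I can help with information about Portugal."),
    ("user", "What is the best time to visit Portugal?"),
]
print(trim_history(history, token_limit=10))  # only the last user message fits
```

The consequence for the chatbot is that, once the limit is hit, early turns silently disappear from what the LLM sees, so a larger `token_limit` trades cost for longer recall.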

In [None]:
chat_engine.chat(
    # "Hello! What do you know?"
    # "Where can I find the best beaches in Portugal?"
    # "What is the best time to visit Portugal?"
    "How many cities are there in Portugal?"  # 159 as of 2018, according to Wikipedia
)

In [None]:
%%markdown
chat_engine.chat_history
# [ChatMessage(role=<MessageRole.USER: 'user'>, content='Hello! What do you know?', additional_kwargs={}),
#  ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content="As an AI, I have access to a wide range of information. I can provide information on a variety of topics such as science, history, technology, and more. However, please note that while I strive for accuracy, I don't have the ability to independently verify information or predict the future. How can I assist you today?", additional_kwargs={}),
#  ChatMessage(role=<MessageRole.USER: 'user'>, content='Where can I find the best beaches in Portugal?', additional_kwargs={}),
#  ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content=None, additional_kwargs={'tool_calls': [ChatCompletionMessageToolCall(id='call_yyjyheMObNiebTIMYXwiiPy2', function=Function(arguments='{\n  "input": "best beaches in Portugal"\n}', name='query_engine_tool'), type='function')]}),
#  ChatMessage(role=<MessageRole.TOOL: 'tool'>, content='The Setúbal Peninsula, south of Lisbon, is known for its wild, cliff-backed beaches like Costa da Caparica. For more solitude, you can visit the Parque Natural da Arrábida at the southern end of the peninsula, which has picturesque coves and beaches such as Praia do Portinho da Arrábida. The Algarve, along the south coast, is also famed for its gorgeous and varied coastline, offering both crowded resorts and peaceful wild beaches.', additional_kwargs={'name': 'query_engine_tool', 'tool_call_id': 'call_yyjyheMObNiebTIMYXwiiPy2'}),
#  ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content="Portugal is renowned for its beautiful beaches. Here are some of the best ones:\n\n1. **Costa da Caparica**: Located on the Setúbal Peninsula, south of Lisbon, this beach is known for its wild, cliff-backed scenery.\n\n2. **Praia do Portinho da Arrábida**: This is a picturesque cove located in the Parque Natural da Arrábida at the southern end of the Setúbal Peninsula. It's a great place if you're looking for more solitude.\n\n3. **The Algarve**: This region along the south coast is famed for its gorgeous and varied coastline. It offers both crowded resorts and peaceful wild beaches.\n\nRemember to check local travel advisories and restrictions due to COVID-19 before planning your trip.", additional_kwargs={}),
#  ChatMessage(role=<MessageRole.USER: 'user'>, content='What is the best time to visit Portugal?', additional_kwargs={}),
#  ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content=None, additional_kwargs={'tool_calls': [ChatCompletionMessageToolCall(id='call_ZwNtrYRX5cne28G6QXcpHudU', function=Function(arguments='{\n  "input": "best time to visit Portugal"\n}', name='query_engine_tool'), type='function')]}),
#  ChatMessage(role=<MessageRole.TOOL: 'tool'>, content='The best time to visit Portugal depends on the region and your interests. For instance, the Algarve region is a great destination year-round, boasting about 300 days of sunshine each year. However, you might find the best prices and fewer crowds during the winter months.', additional_kwargs={'name': 'query_engine_tool', 'tool_call_id': 'call_ZwNtrYRX5cne28G6QXcpHudU'}),
#  ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content='The best time to visit Portugal can depend on the region and your interests. For instance, the Algarve region, known for its beautiful beaches, is a great destination year-round, boasting about 300 days of sunshine each year. However, if you prefer less crowded places and better prices, the winter months might be a good option for you. Always remember to check the weather and any local advisories before planning your trip.', additional_kwargs={}),
#  ChatMessage(role=<MessageRole.USER: 'user'>, content='How many cities are there in Portugal?', additional_kwargs={}),
#  ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content=None, additional_kwargs={'tool_calls': [ChatCompletionMessageToolCall(id='call_WL0ECAPUTYypljvOXkP2hOLd', function=Function(arguments='{\n  "input": "number of cities in Portugal"\n}', name='query_engine_tool'), type='function')]}),
#  ChatMessage(role=<MessageRole.TOOL: 'tool'>, content='The context does not provide information on the number of cities in Portugal.', additional_kwargs={'name': 'query_engine_tool', 'tool_call_id': 'call_WL0ECAPUTYypljvOXkP2hOLd'}),
#  ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content="I'm sorry, I currently don't have the exact number of cities in Portugal. However, Portugal is divided into 18 districts, each with its own capital, and 2 autonomous regions, the Azores and Madeira. The country has several cities and towns, with Lisbon being the capital and the largest city.", additional_kwargs={})]

In [31]:
chat_engine.reset()

In [None]:
# Streaming
import time

response = chat_engine.stream_chat("What is the tax value in Portugal?")
for token in response.response_gen:
    print(token, end="", flush=True)  # flush so each token appears immediately
    time.sleep(0.1)  # artificial delay to make the streaming visible
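`stream_chat` returns a response whose `response_gen` yields text deltas; printing them with `end=""` and `flush=True` shows the answer as it arrives, and concatenating them gives the full answer. The sketch below fakes the generator so the consuming loop can run offline; `fake_response_gen` and `consume` are hypothetical names standing in for the LLM stream and the notebook loop.

```python
import time
from typing import Iterator


def fake_response_gen(text: str) -> Iterator[str]:
    """Hypothetical stand-in for response.response_gen: yields one word at a time."""
    for word in text.split(" "):
        yield word + " "


def consume(stream: Iterator[str], delay: float = 0.0) -> str:
    """Print each delta as soon as it arrives and return the concatenated answer."""
    parts = []
    for token in stream:
        print(token, end="", flush=True)  # without flush, output may appear in bursts
        parts.append(token)
        time.sleep(delay)  # optional artificial delay, as in the cell above
    print()
    return "".join(parts)


answer = consume(fake_response_gen("The VAT in Portugal ranges between 6% and 23%."))
```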

# ReAct

ReAct:
- interacts with external tools and environments (here, the index exposed as a query engine tool)
- its trajectories consist of multiple thought-action-observation steps

Weakness: it is a prompting-based method.

Observations:
- chain-of-thought (CoT) prompting suffers from fact hallucination
- ReAct's structural constraint reduces its flexibility in formulating reasoning steps
- ReAct depends heavily on the information it retrieves; non-informative search results derail the model's reasoning and make it hard to recover and reformulate its thoughts
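A ReAct trajectory alternates Thought, Action, and Observation until the model emits a final answer, the same `Thought:` / `Action: query_engine_tool` / `Observation:` pattern visible in the verbose logs of this notebook. The toy loop below runs that cycle with a scripted stand-in for the LLM and a dictionary-backed stand-in for `query_engine_tool`; both are hypothetical, and the regex parsing is a simplification of what LlamaIndex's ReAct output parser does.

```python
import re
from typing import Callable

# Hypothetical tool: a lookup table standing in for query_engine_tool.
FACTS = {"vat in portugal": "VAT in Portugal ranges between 6% and 23%."}


def query_engine_tool(query: str) -> str:
    return FACTS.get(query.lower(), "No relevant information found.")


def scripted_llm(transcript: str) -> str:
    """Hypothetical LLM: calls the tool first, then answers from the observation."""
    if "Observation:" not in transcript:
        return "Thought: I need the VAT rates.\nAction: query_engine_tool\nAction Input: VAT in Portugal"
    return "Thought: I can answer now.\nAnswer: VAT in Portugal ranges between 6% and 23%."


def react(question: str, llm: Callable[[str], str],
          tools: dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(transcript)  # Thought followed by either an Action or an Answer
        transcript += "\n" + step
        answer = re.search(r"^Answer: (.*)$", step, re.MULTILINE)
        if answer:
            return answer.group(1)
        action = re.search(r"^Action: (\w+)\nAction Input: (.*)$", step, re.MULTILINE)
        if not action:
            return "Could not parse the model output."
        name, arg = action.groups()
        transcript += f"\nObservation: {tools[name](arg)}"  # act, then feed the result back
    return "Stopped after max_steps without an answer."


tools = {"query_engine_tool": query_engine_tool}
print(react("What is the tax value in Portugal?", scripted_llm, tools))
```

If the tool keeps returning uninformative results and the model keeps searching, the loop simply runs until `max_steps`, a small-scale version of the derailment described in the last observation.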

Sources:
- https://www.promptingguide.ai/techniques/react#react-prompting
- https://docs.llamaindex.ai/en/stable/examples/chat_engine/chat_engine_react/
- perplexity

In [36]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.anthropic import Anthropic

#  Known models are: claude-instant-1, claude-instant-1.2, claude-2, claude-2.0, claude-2.1, claude-3-opus-20240229, claude-3-opus@20240229, claude-3-sonnet-20240229, claude-3-sonnet@20240229, claude-3-haiku-20240307, claude-3-haiku@20240307, claude-3-5-sonnet-20240620, claude-3-5-sonnet@20240620
llm = Anthropic(
    model="claude-3-5-sonnet-20240620"
)
data = SimpleDirectoryReader(input_dir="./data/portugal_guides/").load_data()
index = VectorStoreIndex.from_documents(data)

In [37]:
chat_engine = index.as_chat_engine(
    chat_mode="react", 
    llm=llm, 
    verbose=True
)

In [38]:
response = chat_engine.chat(
    # "What is the sum of entrance cost for the three most recognized museums in the capital of Portugal?"
    "What is the total population of the 3 most populous cities in Portugal? Answer in English." # Switches to Portuguese
)

> Running step e0a0fc01-4446-48f3-b539-e8f6ba8a2577. Step input: What is the total population of the 3 most populous cities in Portugal? Answer in English.
Thought: The current language of the user is: English. To answer this question, I need to find out the three most populous cities in Portugal and their populations. I'll use the query engine tool to get this information.
Action: query_engine_tool
Action Input: {'input': 'What are the three most populous cities in Portugal and their populations?'}
Observation: Based on the information provided, the three most populous cities in Portugal and their populations are:

1. Lisbon: 567,131 inhabitants
2. Sintra: 395,528 inhabitants
3. Vila Nova de Gaia: 311,223 inhabitants

These figures are from the INE 2023 Estimate for the largest municipalities in Portugal.
> Running step 25068e16-e846-458c-a9c5-c568bec03f39. Step input: None
Thought: Now that I have the information about the three most pop
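The step left in this trace is a plain sum of the three figures from the observation (which, per the observation itself, are INE 2023 estimates for municipalities, and are taken here as reported by the tool rather than independently checked). A quick check in Python:

```python
# Figures copied from the Observation in the trace above.
populations = {
    "Lisbon": 567_131,
    "Sintra": 395_528,
    "Vila Nova de Gaia": 311_223,
}
total = sum(populations.values())
print(f"{total:,}")  # 1,273,882
```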

In [39]:
# source: Wikipedia
response = chat_engine.chat(
    "How many of the largest cities in Portugal constitute 25% of the total country population?"  # expected: total population 10.6M, 25% = 2,650,000, n = 10
    # Claude 2 failed to find the total population.
    # Claude 3 failed to notice that it didn't have enough info with 50% of the population.
    # Claude 3 gave a wrong answer without using any tools: According to the information provided, the top 20 largest cities in Portugal constitute 25% of the country's population. (Actually sum(top20)=4.2M)
    # Claude 3.5 gave a wrong answer without tools, but got the right answer with tools: Answer: It takes the 10 largest cities in Portugal to constitute approximately 25% of the total country population. These cities are Lisbon, Sintra, Vila Nova de Gaia, Porto, Cascais, Loures, Braga, Almada, Matosinhos, and Amadora. The combined population of these 10 cities slightly exceeds 25% of Portugal's total population.
)

> Running step 99f41d2e-d66f-4f19-bc90-abf18ba3f366. Step input: How many of the largest cities in Portugal constitute 25% of the total country population?
Thought: First, I need to find out the total population of Portugal.
Action: query_engine_tool
Action Input: {'input': 'What is the total population of Portugal according to the latest estimates?'}
Observation: According to the latest estimates from 2023, Portugal has a total population of 10,639,726 inhabitants.
> Running step d68a007a-413f-4a5b-a8a4-c05e16414ff6. Step input: None
Thought: Now that I have the total population of Portugal, I need to calculate 25% of this number and then find out how many of the largest cities it takes to reach this number.
Action: query_engine_tool
Action Input: {'input': 'List the largest cities in Portugal with their populations, in descending order of population size.'}
Observation: Here's a list of the largest cities in Portugal with th