# Document retrieval with custom embedding and agent

In document retrieval development, vector embeddings and agents play pivotal roles: embeddings capture the essence of textual information, and agents obtain more timely information for better accuracy. At its core, <b>vector embedding</b> refers to the process of representing words, sentences, or even entire documents as dense, low-dimensional vectors in a mathematical space. Unlike traditional methods that rely on sparse representations like one-hot encoding, vector embeddings encapsulate the semantic relationships between words and enable algorithms to comprehend their contextual meaning. The main idea of <b>agents</b> is to use an LLM to choose a sequence of actions to take. In chains, the sequence of actions is hardcoded (in code). In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.
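To make the contrast between sparse and dense representations concrete, here is a minimal sketch using only the standard library. The dense vectors are hand-picked, hypothetical values chosen for illustration; a real embedding model would produce vectors with hundreds of dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# One-hot encoding over a 3-word vocabulary: distinct words are always orthogonal.
one_hot = {
    "city": [1, 0, 0],
    "town": [0, 1, 0],
    "banana": [0, 0, 1],
}

# Hypothetical dense embeddings (hand-picked values, for illustration only).
dense = {
    "city": [0.9, 0.8],
    "town": [0.8, 0.9],
    "banana": [-0.7, 0.1],
}

print(cosine(one_hot["city"], one_hot["town"]))  # one-hot vectors carry no similarity signal
print(cosine(dense["city"], dense["town"]))      # related words score close to 1
print(cosine(dense["city"], dense["banana"]))    # unrelated words score much lower
```

With one-hot vectors, every pair of distinct words has similarity zero, so "city" is no closer to "town" than to "banana". Dense vectors can place related words near each other, which is what similarity search relies on.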

![image](./images/demo_paloalto.png)


<b>Vector embeddings</b> hold immense importance in large language model (LLM) applications. LLMs such as GPT-3, BERT, and other Transformer-based models have gained significant attention and popularity due to their remarkable ability to generate coherent and contextually appropriate responses.

The success of LLMs hinges on their understanding of the semantic intricacies of natural language. This is where vector embeddings come into play. By utilizing vector embeddings, LLMs can leverage the rich semantic information embedded within textual data, enabling them to generate more sophisticated and context-aware responses.

Vector embeddings serve as a bridge between the raw textual input and the language model’s neural network. Instead of feeding the model with discrete words or characters, the embeddings provide a continuous representation that captures the meaning and context of the input. This allows LLMs to operate at a higher level of language understanding and produce more coherent and contextually appropriate outputs.

The importance of vector embeddings for LLMs extends beyond language generation. These embeddings also facilitate a range of downstream tasks, such as sentiment analysis, named entity recognition, text classification, and more. By incorporating pre-trained vector embeddings, LLMs can leverage the knowledge captured during the embedding training process, leading to improved performance on these tasks.

Moreover, vector embeddings enable transfer learning and fine-tuning in LLMs. Pre-trained embeddings can be shared across different models or even different domains, providing a starting point for training models on specific tasks or datasets. This transfer of knowledge allows for faster training, improved generalization, and better performance on specialized tasks.

Meanwhile, <b>agents</b> are responsible for deciding what step to take next. This is powered by a language model and a prompt. This prompt can include things like:

* The personality of the agent (useful for having it respond in a certain way)
* Background context for the agent (useful for giving it more context on the types of tasks it's being asked to do)
* Prompting strategies to invoke better reasoning (the most famous/widely used being ReAct)
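The decide-act-observe loop described above can be sketched without any LLM at all. In this minimal sketch, `scripted_policy` is a hypothetical stand-in for the language model, and both tools are toy functions invented for illustration; the population figures are the ones quoted later in this notebook. It is not how LangChain implements agents, only the shape of the loop.

```python
def lookup_population(key):
    """Hypothetical tool: returns a figure from a tiny in-memory table."""
    table = {"San Francisco 2010": 805235, "San Francisco 2020": 873965}
    return str(table[key])

def calculator(expression):
    """Hypothetical tool: evaluates a simple arithmetic expression."""
    # Builtins are disabled, but eval should still never see untrusted input.
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"lookup": lookup_population, "calculator": calculator}

def scripted_policy(history):
    """Stand-in for the LLM: picks the next action from the observations so far."""
    observations = [step[2] for step in history]
    if len(observations) == 0:
        return ("lookup", "San Francisco 2010")
    if len(observations) == 1:
        return ("lookup", "San Francisco 2020")
    if len(observations) == 2:
        return ("calculator", f"{observations[1]} - {observations[0]}")
    return ("finish", observations[-1])

def run_agent(policy, tools, max_iterations=5):
    """Ask the policy for an action, run the tool, record the observation, repeat."""
    history = []
    for _ in range(max_iterations):
        action, action_input = policy(history)
        if action == "finish":
            return action_input, history
        observation = tools[action](action_input)
        history.append((action, action_input, observation))
    return "Agent stopped due to iteration limit.", history

answer, steps = run_agent(scripted_policy, TOOLS)
print(answer)
```

In a chain, the three tool calls would be written out in a fixed order. Here the order comes from the policy, and an iteration cap stops the loop if the policy never decides to finish.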



### Prepare environment

In [23]:
#!pip install "pydantic>=1.10.11" --upgrade
#!pip install llama-index chromadb --upgrade
#!pip install sentence-transformers --upgrade
#!pip install unstructured
#!pip install google-search-results
#!pip install replicate
#!pip install git+https://github.com/UKPLab/sentence-transformers.git
#!pip install git+https://github.com/Muennighoff/sentence-transformers.git@sgpt_poolings_specb
#!pip install --upgrade git+https://github.com/UKPLab/sentence-transformers.git
#!pip install -U sentence-transformers
#!pip install cryptography --upgrade

In [1]:
import logging
import sys
import os
import chromadb
import boto3
import json
from botocore.config import Config
#import openai
from llama_index import SimpleDirectoryReader, LLMPredictor, ServiceContext, StorageContext, LangchainEmbedding, VectorStoreIndex
from llama_index import GPTVectorStoreIndex
#from langchain.chat_models import ChatOpenAI
#from langchain.embeddings import HuggingFaceEmbeddings
#from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms.bedrock import Bedrock
#from llama_index import ResponseSynthesizer

from llama_index.vector_stores import ChromaVectorStore
from llama_index.storage.storage_context import StorageContext
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.embeddings import LangchainEmbedding
from IPython.display import Markdown, display

Add Bedrock API

In [2]:
# BedrockAPI setup
def parse_credentials(file_path):
    credentials = {}
    with open(file_path, 'r') as file:
        current_user = None
        for line in file:
            line = line.strip()
            if line.startswith('[') and line.endswith(']'):
                current_user = line[1:-1]
                credentials[current_user] = {}
            elif '=' in line and current_user is not None:
                # Strip whitespace so "key = value" and "key=value" both work
                key, value = line.split('=', 1)
                credentials[current_user][key.strip()] = value.strip()
    return credentials

def get_key_from_credential_file(user, key_name, credential_file_path):
    credentials = parse_credentials(credential_file_path)

    if user in credentials:
        user_credentials = credentials[user]
        if key_name in user_credentials:
            return user_credentials[key_name]
        else:
            raise KeyError(f"'{key_name}' not found for user '{user}'.")
    else:
        raise KeyError(f"User '{user}' not found in the credential file.")
        
aws_access_key_id = get_key_from_credential_file('default', 'aws_access_key_id', '/home/alfred/.aws/credentials')
aws_secret_access_key = get_key_from_credential_file('default', 'aws_secret_access_key', '/home/alfred/.aws/credentials')

config = Config(
   read_timeout=80,
   retries={
       'max_attempts': 3
   }
)
bedrock = boto3.client(service_name='bedrock',region_name='us-east-1',endpoint_url='https://bedrock.us-east-1.amazonaws.com', config=config,
                       aws_access_key_id=aws_access_key_id, 
                       aws_secret_access_key=aws_secret_access_key)
bedrock.list_foundation_models()

{'ResponseMetadata': {'RequestId': 'e4fdecac-2cd3-4e65-b295-34dfadff1955',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Tue, 26 Sep 2023 20:39:24 GMT',
   'content-type': 'application/json',
   'content-length': '3596',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'e4fdecac-2cd3-4e65-b295-34dfadff1955'},
  'RetryAttempts': 0},
 'modelSummaries': [{'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-tg1-large',
   'modelId': 'amazon.titan-tg1-large'},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-e1t-medium',
   'modelId': 'amazon.titan-e1t-medium'},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-g1-text-02',
   'modelId': 'amazon.titan-embed-g1-text-02'},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/stability.stable-diffusion-xl',
   'modelId': 'stability.stable-diffusion-xl'},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/ai21.j2-grande-instruct',
   'modelId': 'ai
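The credentials file read above uses an INI-style layout, so the standard library's `configparser` can do the parsing as well. The sketch below is an alternative under that assumption; `read_aws_key` is a hypothetical helper name, and the sample values are placeholders rather than real credentials.

```python
import configparser

def read_aws_key(profile, key_name, text):
    """Look up one key for one profile in INI-formatted credentials text."""
    # interpolation=None keeps '%' characters in values from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section(profile):
        raise KeyError(f"Profile '{profile}' not found.")
    if not parser.has_option(profile, key_name):
        raise KeyError(f"'{key_name}' not found for profile '{profile}'.")
    return parser.get(profile, key_name)

# Placeholder values, not real credentials.
sample = """
[default]
aws_access_key_id = EXAMPLEKEYID
aws_secret_access_key = exampleSecret
"""

print(read_aws_key("default", "aws_access_key_id", sample))
```

To read from disk, `parser.read(path)` can replace `read_string`. Note also that boto3 looks in the shared credentials file by default, so passing the keys explicitly is only needed when that default lookup is not wanted.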

## Get sample documents

Pull plain-text documents from Wikipedia containing city demographic data up to 2020.

In [3]:
from pathlib import Path
import requests

def get_wiki(wiki_titles, file_path='./data'):
    for title in wiki_titles:
        response = requests.get(
            'https://en.wikipedia.org/w/api.php',
            params={
                'action': 'query',
                'format': 'json',
                'titles': title,
                'prop': 'extracts',
                # 'exintro': True,
                'explaintext': True,
            }
        ).json()
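        page = next(iter(response['query']['pages'].values()))
        # Skip titles with no extract instead of writing stale or undefined text
        wiki_text = page.get('extract')
        if wiki_text is None:
            print(f"No extract found for '{title}', skipping.")
            continue

        # Create the output directory if it does not exist yet
        data_path = Path(file_path)
        data_path.mkdir(parents=True, exist_ok=True)

        with open(data_path / f"{title}.txt", 'w', encoding="utf-8") as fp:
            fp.write(wiki_text)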
    return True
wiki_titles = ["Toronto", "Seattle", "Chicago", "Boston", "Houston", "San Francisco"]
#wiki_titles = ["Antony Blinken", "Dominic Raab", "Sergey Lavrov", "Jean-Yves Le Drian", "Subrahmanyam Jaishankar", "Motegi Toshimitsu", "Heiko Maas"]
get_wiki(wiki_titles, file_path='./data')

True

## Sentence Embedding

Sentence embedding refers to a numeric representation of a sentence as a vector of real numbers that encodes the semantic information of the entire sentence.

<b>Instructor</b> by HKU is an instruction-finetuned text embedding model that can generate text embeddings tailored to any task (e.g., classification, retrieval, clustering, text evaluation, etc.) and domain (e.g., science, finance, etc.) by simply providing the task instruction, without any finetuning. It is open source, published on the Hugging Face Hub, and a top performer on the Massive Text Embedding Benchmark ([MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard)).


In [4]:
## Choose an embedding model
#embedding_model_id = "sentence-transformers/all-mpnet-base-v2"
embedding_model_path = "/home/alfred/models/instructor-large"

# create a persistent client (data stored under ./vectordb) and a new collection
#chroma_client = chromadb.EphemeralClient()  # in-memory alternative
#chroma_client = chromadb.Client()
chroma_client = chromadb.PersistentClient(path="./vectordb")
chroma_collection = chroma_client.create_collection("citydata_04")

# define embedding function
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name=embedding_model_path)
)



In [5]:
type(embed_model)

llama_index.embeddings.langchain.LangchainEmbedding

## LlamaIndex

LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models. It provides 3 key features to augment your LLM applications with data.

* <b>Data Ingestion:</b> Connect your existing data sources and data formats (API's, PDF's, documents, SQL, etc.) to use with a large language model application.
* <b>Data Indexing:</b> Store and index your data for different use cases. Integrate with downstream vector store and database providers.
* <b>Query Interface:</b> LlamaIndex provides a query interface that accepts any input prompt over your data and returns a knowledge-augmented response.

### Creating a Chroma Index

In [5]:
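# Peek at up to 20 records in the collection.
# Note: get() returns a dict, so len() counts its keys, not the stored records.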
doc_to_update = chroma_collection.get(limit=20)
print(len(doc_to_update))

4


In [10]:
# Read all the documents in ./data (one file per city).
from llama_index.node_parser import SimpleNodeParser
import shutil
parser = SimpleNodeParser()

if os.path.exists('./data/.ipynb_checkpoints'):
    shutil.rmtree("./data/.ipynb_checkpoints")
docs = os.listdir('./data')
#docs= ['New York City.txt','Houston.txt']
all_docs = {}
for d in docs:
    doc = SimpleDirectoryReader(input_files=[f"./data/{d}"]).load_data()
    nodes = parser.get_nodes_from_documents(doc)
    doc_id = d.replace(" ","_")
    doc[0].doc_id = d
    ## this can be used for metadata filtering if needed
    extra_info = {"id":d}
    doc[0].extra_info = extra_info
    all_docs[d] = doc

Create the index. This will create a GPTVectorStoreIndex. You can try other index types if you want; I have written a comprehensive article on what the other indexes do.

In [7]:
# set up ChromaVectorStore and load in data
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

## init storage context and service context
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(embed_model=embed_model)

In [8]:
import torch
# flush cuda cache memory
torch.cuda.empty_cache()

In [9]:
index_existed = False
for d in all_docs.keys():
    print(f"Creating/Updating index {d}")
    if index_existed:
        ## update index
        print(f"Updating index: {d}")
        # index_node.insert_nodes(all_nodes[d])
        index.insert(all_docs[d][0])
    else:  
        print(f"Creating new index: {d}")
        index = GPTVectorStoreIndex.from_documents(
                            all_docs[d],
                            service_context=service_context, 
                            storage_context=storage_context
        )
        index_existed = True

Creating/Updating index Toronto.txt
Creating new index: Toronto.txt
Creating/Updating index Seattle.txt
Updating index: Seattle.txt
Creating/Updating index Chicago.txt
Updating index: Chicago.txt
Creating/Updating index Boston.txt
Updating index: Boston.txt
Creating/Updating index Houston.txt
Updating index: Houston.txt
Creating/Updating index San Francisco.txt
Updating index: San Francisco.txt


In [15]:
# Now, let’s experiment with a basic query
index.as_query_engine().query("Which city has a larger population between Houston and Chicago?").response

'\nHouston has a larger population than Chicago.'

In [16]:
# Now, let’s experiment with a subjective query
index.as_query_engine().query("How does the male to female ratio vary across different age brackets in San Francisco?").response

"\nThe male to female ratio in San Francisco varies across different age brackets. According to the 2020 census, the city's population is composed of 50.2% females and 49.8% males. The age bracket with the highest male to female ratio is the 65 and over age group, with males making up 54.2% of the population and females making up 45.8%. The age bracket with the lowest male to female ratio is the 0-17 age group, with males making up 48.7% of the population and females making up 51.3%."

Too easy; now let’s do something harder. I will ask a question that compares two cities, Seattle and San Francisco, and asks about trends in household income over the past two decades.

In [20]:
index.as_query_engine().query("Compare the city population of Seattle and San Francisco. What trends can be observed in both cities' household income changes over the past two decades?").response

'\nIn San Francisco, the median household income increased from $65,519 in 2007 to $81,136 in 2020. This is an increase of 24%. In Seattle, the median household income increased from an unknown amount in 1990 to $65,519 in 2007. This is an increase of an unknown amount. Both cities have seen an increase in median household income over the past two decades, with San Francisco seeing a larger increase than Seattle.'

Not bad, but the answer is a little suboptimal, since it does not produce a knowledge-augmented response. So the question is: can we do better?


### Retriever and query engine

Retrievers are responsible for fetching the most relevant context given a user query (or chat message), while a query engine is a generic interface that allows you to ask questions over your data. A query engine takes in a natural language query and returns a rich response. It is most often (but not always) built on one or many indices via retrievers. You can compose multiple query engines to achieve more advanced capability.
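As a rough illustration of what a vector retriever does, the sketch below ranks chunks by cosine similarity to a query vector and keeps the top k. The 2-d vectors are hypothetical stand-ins for real embeddings, and `top_k` is an illustrative helper, not part of LlamaIndex.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top_k(query_vec, chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to the query vector."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Hypothetical 2-d vectors standing in for real embeddings.
chunks = [
    ("Seattle population 2020", [0.9, 0.1]),
    ("San Francisco population 2020", [0.8, 0.3]),
    ("Chicago architecture", [0.1, 0.9]),
]
query = [1.0, 0.2]

print(top_k(query, chunks, k=2))
```

A query engine then passes the retrieved chunks to the LLM as context. With a small `k`, a comparison across two cities only works if a relevant chunk for each city lands in the top results.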

In [19]:
# configure retriever
from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.response_synthesizers import get_response_synthesizer

# this will simply do the vector search and return the top 2 chunks
# most similar to the question being asked.
retriever = VectorIndexRetriever(
    index=index, 
    similarity_top_k=2,
)

# configure response synthesizer
#response_synthesizer = ResponseSynthesizer.from_args(verbose=True)
response_synthesizer = get_response_synthesizer(response_mode='compact')

## if you need to pass a response mode (legacy API, kept for reference)
# response_synthesizer = ResponseSynthesizer.from_args(
#    response_mode='tree_summarize',
#    verbose=True)

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

# query
query_engine.query("Compare the city population of Seattle and San Francisco. What trends can be observed in both cities' population growth over the past two decades?").response

'\nIn the past two decades, Seattle has experienced a 77% increase in its downtown population, while San Francisco has experienced a 45% increase in its foreign-born population. Seattle has also seen a notable increase in the number of adults living alone, with 40.8% of city residents identifying as single-person households in 2000. San Francisco has seen a decrease in the number of children living in the city, with the dog population exceeding the child population in 2018. Both cities have seen an increase in the number of same-sex households, with Seattle having the highest percentage of same-sex households in the United States at 2.6 percent and San Francisco having the highest estimated percentage of gay and lesbian individuals of any of the 50 largest U.S. cities at 15%.'

Not bad. Now let's try some basic math using LlamaIndex's query engine.

In [21]:
# query
query_engine.query("""What is the exact percentage population difference between Seattle and San Francisco in 2020?""").response

'\nIt is not possible to answer this question with the given context information.'

As expected, LlamaIndex failed on basic math. Now what?

## Langchain -- agent


Unlike LlamaIndex, which is focused on LLM applications over documents, Langchain offers a broader set of capabilities. It can help you build functionality such as internet search, result consolidation, API invocation, mathematical computation, and much more. We will use the following Langchain components:

* Vector Storage ( LLM Database ): similar to LlamaIndex vector storage
* Langchain’s Agent: this is what made Langchain popular
* Langchain’s chain: RetrievalQA is made for question answering only.
* Langchain’s chain: LLMMathChain is used when you need to answer questions about math.

In [6]:
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import HuggingFaceEmbeddings
#from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings import BedrockEmbeddings
from langchain.vectorstores import Chroma
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
from langchain.tools import BaseTool
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter

In [7]:
# Initiate a vector store
embed_model = HuggingFaceEmbeddings(model_name=embedding_model_path)
bedrock_embed_model = BedrockEmbeddings(client=bedrock, model_id="amazon.titan-e1t-medium")
vectorstore = Chroma("langchain_store", embed_model)

In [8]:
# Load the documents and add them to the vector store
text_splitter = CharacterTextSplitter(chunk_size=4096, chunk_overlap=100)
#text_splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=0, separators=[" ", ",", "\n"])
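To show what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window chunker. It is a sketch only: `chunk_text` is a hypothetical helper, and LangChain's `CharacterTextSplitter` works differently in detail (it splits on a separator first and then merges the pieces), so the boundaries it produces will not match this one.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one. The overlap keeps a sentence that straddles a boundary from being cut off from its context in both chunks.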

In [11]:
from langchain.document_loaders import UnstructuredFileLoader

#docs= ['Seattle.txt','Houston.txt', "Chicago.txt"]
all_docs = []
for d in docs:
    print(f"#### Loading data: {d}")
    doc = UnstructuredFileLoader(f"./data/{d}",  strategy="hi_res").load()
    doc = text_splitter.split_documents(doc)
    all_docs.extend(doc)

## add to vector store
vectorstore.add_documents(all_docs)

#### Loading data: Toronto.txt
#### Loading data: Seattle.txt
#### Loading data: Chicago.txt
#### Loading data: Boston.txt
#### Loading data: Houston.txt
#### Loading data: San Francisco.txt


OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 22.19 GiB total capacity; 1.19 GiB already allocated; 4.50 MiB free; 1.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

#### Try a few open source models

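1) Llama v2 local (GGML)
```
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Assumption: stream tokens to stdout; the original setup of callback_manager is not shown
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

llama_v2_ggml = LlamaCpp(
    model_path="/home/alfred/models/Llama-2-13B-chat-GGML/llama-2-13b-chat.ggmlv3.q4_1.bin",
    n_ctx=6000,
    n_gpu_layers=256, #512
    n_batch=1024, #30
    n_threads=16,
    callback_manager=callback_manager,
    temperature=0.9,
    max_tokens=4095,
    n_parts=1,
)
```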

3) Llama v2 local:
```
from langchain.llms import HuggingFacePipeline

llm_llama_v2 = HuggingFacePipeline.from_model_id(
    model_id="meta-llama/Llama-2-7b-chat-hf",
    task="text-generation",
    model_kwargs={"temperature": 0.1, "max_length": 512},
)
```
4) OpenAI: ChatOpenAI(temperature=0.01,model_name='gpt-4')
5) <b>Bedrock</b>

In [12]:
# Add the Bedrock LLM (Claude v2)
from langchain.chains import RetrievalQA
from langchain.llms import Bedrock
llm_g = Bedrock(model_id="anthropic.claude-v2", client=bedrock, model_kwargs={"temperature":0.1, "max_tokens_to_sample":512, "top_k":250,"top_p":0.75,"stop_sequences":[]})  

#### Create the question-answering chain using standard retrieval from the vector DB

In [13]:
qa = RetrievalQA.from_chain_type(llm=llm_g,
                                 chain_type="stuff", 
                                 retriever=vectorstore.as_retriever())
query_string_0 = "Which city has a larger population between Seattle and San Francisco in 2020? By what exact percentage difference?"
result = qa({"query": query_string_0})
result

OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 22.19 GiB total capacity; 1.19 GiB already allocated; 4.50 MiB free; 1.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Aided by the LLM, Langchain <b>can</b> do basic math. For more complex computations, try LLMMathChain.

In [30]:
# Add math function
from langchain import LLMMathChain
llm_math = LLMMathChain.from_llm(llm=llm_g, verbose=True)

In [31]:
# Add to-do and search functions
from langchain import SerpAPIWrapper, LLMChain
from langchain.prompts import PromptTemplate
from langchain.tools import DuckDuckGoSearchResults

todo_prompt = PromptTemplate.from_template(
    "You are a city planning expert. Using demographic data, provide a comprehensive analysis to determine the population statistics for a city."
)
todo_chain = LLMChain(llm=llm_g, prompt=todo_prompt)
#search = SerpAPIWrapper(serpapi_api_key=os.environ.get('serp_api_token'))
search_duck = DuckDuckGoSearchResults()

In [32]:
from langchain.memory import ConversationBufferWindowMemory

def _handle_error(error) -> str:
    return str(error)[:50]
    
tools = [
    Tool(
        name="Query knowledge",
        func=qa.run,
        description="useful for when you need to answer questions based on the data stored in the vectorstore"
    ),
    Tool(
        name="Do the math",
        func=llm_math.run,
        description="Useful for when you need to do math in order to get to the right answers."
    )
]
# Buffer the last k conversation turns in memory.
memory = ConversationBufferWindowMemory(
    memory_key="chat_history", k=5, return_messages=True, output_key="output"
)

# Define agent
agent = initialize_agent(tools, llm=llm_g, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True, max_iterations=3, max_iterations_per_tool=1, handle_parsing_errors=_handle_error, memory=memory)

# Prompt

prompt = """
Act like an experienced city planner.
You have to analyze data of a city demographic data to answer this question with best effort.

Since 2010, what has been the average yearly population growth rate for San Francisco?"""
# Run the agent
agent.run(prompt)



> Entering new AgentExecutor chain...
 To calculate the average yearly population growth rate, I first need to know the population numbers for San Francisco over the relevant time period. I can query the knowledge base to get this information.
Action: Query knowledge
Action Input: What was the population of San Francisco in 2010 and 2020?
Observation:  According to the passage, the population of San Francisco in 2010 was 805,235 and in 2020 it was 873,965. Specifically, the passage states "In 2020, San Francisco had a population of 873,965, an increase of nearly 70,000 residents from the 2010 census."
Thought: Now that I have the population numbers for 2010 and 2020, I can calculate the average yearly population growth rate as follows:
((Population in 2020 - Population in 2010) / Population in 2010) / Number of Years
Action: Do the math
Action Input: ((873,965 - 805,235) / 805,235) / 10

> Entering new LLMMathChain chain

'Agent stopped due to iteration limit or time limit.'
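The agent ran out of iterations before the math chain returned, but the calculation it set up can be checked directly. The sketch below uses the two population figures quoted in the trace above. The compound rate is an alternative definition that the agent did not use, added here only for comparison.

```python
pop_2010 = 805_235  # figure quoted in the agent trace
pop_2020 = 873_965  # figure quoted in the agent trace
years = 10

# Simple average: total relative growth divided by the number of years
# (the formula the agent wrote out).
simple_rate = ((pop_2020 - pop_2010) / pop_2010) / years

# Compound annual growth rate: the constant yearly rate that turns
# the 2010 figure into the 2020 figure.
compound_rate = (pop_2020 / pop_2010) ** (1 / years) - 1

print(f"simple average: {simple_rate:.3%}")
print(f"compound:       {compound_rate:.3%}")
```

Both come out a little under 1% per year, at roughly 0.85% for the simple average and roughly 0.82% compounded.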

With LLMMathChain, users can now request math tasks. The Bedrock/Langchain integration is still in progress, however, so errors might occur (the run above stopped at the iteration limit before the math chain returned). If so, simply repeat the execution.

Now let's add another tool that queries external sources for additional data points <b>and</b> more timely data.

In [33]:
tools = [
    Tool(
        name="Query knowledge",
        func=qa.run,
        description="useful for when you need to answer questions based on the data stored in the vectorDB"
    ),
    Tool(
        name="Search external sources",
        func=search_duck.run,
        description="Useful for when you need to answer questions about current events or info missing from vector DB by searching internet",
    )
]

# Define agent
agent = initialize_agent(tools, llm=llm_g, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True, max_iterations=3, max_iterations_per_tool=1, handle_parsing_errors=_handle_error, memory=memory)

# Run the agent
agent.run("""Which city will have a larger projected population between Tokyo and Shanghai in year 2025?""")



> Entering new AgentExecutor chain...
 To determine which city will have a larger population in 2025, I need to query the vectorDB knowledge base for population projections of Tokyo and Shanghai.
Action: Query knowledge
Action Input: What are the projected populations of Tokyo and Shanghai in 2025?
Observation:  Unfortunately I do not have enough context to know the projected populations of Tokyo and Shanghai in 2025. The provided information is about demographics and rankings of San Francisco, but does not mention the populations of Tokyo or Shanghai. Without more specific information about population projections for those cities, I do not know what their populations are expected to be in 2025.
Thought: Since I do not have the necessary information in my knowledge base, I will need to search external sources on the internet to find population projections for Tokyo and Shanghai in 2025.
Action: Search external sources
Action Inpu

'Agent stopped due to iteration limit or time limit.'