# Document retrieval with custom embeddings and agents

In document retrieval, vector embeddings and agents play pivotal roles: embeddings capture the meaning of text, and agents can fetch more timely information for better accuracy. At its core, <b>vector embedding</b> refers to representing words, sentences, or even entire documents as dense, low-dimensional vectors in a mathematical space. Unlike sparse representations such as one-hot encoding, vector embeddings encode the semantic relationships between words, so algorithms can work with their contextual meaning. The main idea of <b>agents</b> is to use an LLM to choose a sequence of actions to take. In chains, the sequence of actions is hardcoded. In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.
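To make the contrast with one-hot encoding concrete, here is a minimal sketch. The two-dimensional "embeddings" are hand-picked illustrative values, not the output of a trained model, and the word choices are assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# One-hot: every word gets its own axis, so distinct words are orthogonal.
one_hot = {
    "cat": [1, 0, 0],
    "kitten": [0, 1, 0],
    "car": [0, 0, 1],
}

# Hand-picked dense vectors (illustrative only, not from a real model).
dense = {
    "cat": [0.9, 0.1],
    "kitten": [0.8, 0.2],
    "car": [0.1, 0.9],
}

print(cosine(one_hot["cat"], one_hot["kitten"]))  # one-hot sees no relationship
print(cosine(dense["cat"], dense["kitten"]))      # related words score high
print(cosine(dense["cat"], dense["car"]))         # unrelated words score low
```

With one-hot vectors every pair of distinct words looks equally unrelated, while the dense vectors place "cat" much closer to "kitten" than to "car".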

![image](./images/demo_paloalto.png)


<b>Vector embeddings</b> hold immense importance in large language model (LLM) applications. Transformer-based language models such as GPT-3 and BERT have gained significant attention and popularity, with generative models like GPT-3 producing coherent and contextually appropriate responses.

The success of LLMs hinges on their understanding of the semantic intricacies of natural language. This is where vector embeddings come into play. By utilizing vector embeddings, LLMs can leverage the rich semantic information embedded within textual data, enabling them to generate more sophisticated and context-aware responses.

Vector embeddings serve as a bridge between the raw textual input and the language model’s neural network. Instead of feeding the model with discrete words or characters, the embeddings provide a continuous representation that captures the meaning and context of the input. This allows LLMs to operate at a higher level of language understanding and produce more coherent and contextually appropriate outputs.

The importance of vector embeddings for LLMs extends beyond language generation. These embeddings also facilitate a range of downstream tasks, such as sentiment analysis, named entity recognition, text classification, and more. By incorporating pre-trained vector embeddings, LLMs can leverage the knowledge captured during the embedding training process, leading to improved performance on these tasks.

Moreover, vector embeddings enable transfer learning and fine-tuning in LLMs. Pre-trained embeddings can be shared across different models or even different domains, providing a starting point for training models on specific tasks or datasets. This transfer of knowledge allows for faster training, improved generalization, and better performance on specialized tasks.

Meanwhile, <b>agents</b> are responsible for deciding what step to take next. This is powered by a language model and a prompt. This prompt can include things like:

* The personality of the agent (useful for having it respond in a certain way)
* Background context for the agent (useful for giving it more context on the types of tasks it's being asked to do)
* Prompting strategies to invoke better reasoning (the most famous/widely used being ReAct)
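The decide-act-observe loop described above can be sketched without any framework. In this sketch the "LLM" is a scripted stand-in function, and the tool names (`word_count`, `shout`) are hypothetical; a real agent would prompt a language model at each step instead.

```python
def word_count(text: str) -> str:
    return str(len(text.split()))

def shout(text: str) -> str:
    return text.upper()

# Hypothetical tool registry: name -> callable taking and returning a string.
TOOLS = {"word_count": word_count, "shout": shout}

def scripted_llm(question, scratchpad):
    """Stand-in for the LLM: picks the next step from what was observed so far."""
    if not scratchpad:
        return {"action": "word_count", "input": question}
    return {"final": f"The question has {scratchpad[-1][1]} words."}

def run_agent(question, llm, tools, max_steps=5):
    scratchpad = []  # (action, observation) pairs fed back to the model each turn
    for _ in range(max_steps):
        decision = llm(question, scratchpad)
        if "final" in decision:
            return decision["final"]
        observation = tools[decision["action"]](decision["input"])
        scratchpad.append((decision["action"], observation))
    raise RuntimeError("agent did not finish within max_steps")

answer = run_agent("How many words are here", scripted_llm, TOOLS)
print(answer)
```

The key difference from a chain is that the order of tool calls is not fixed in code: it is whatever the decision function returns at each turn.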


To compare embedding models, see the MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard

### Prepare environment

In [23]:
#!pip install "pydantic>=1.10.11" --upgrade
#!pip install llama-index chromadb --upgrade
#!pip install sentence-transformers --upgrade
#!pip install unstructured
#!pip install google-search-results
#!pip install replicate
#!pip install git+https://github.com/UKPLab/sentence-transformers.git
#!pip install git+https://github.com/Muennighoff/sentence-transformers.git@sgpt_poolings_specb
#!pip install --upgrade git+https://github.com/UKPLab/sentence-transformers.git
#!pip install -U sentence-transformers
#!pip install cryptography --upgrade

In [2]:
import logging
import sys
import os
import chromadb
import boto3
import json
#import openai
from llama_index import SimpleDirectoryReader, LLMPredictor, ServiceContext, StorageContext, LangchainEmbedding, VectorStoreIndex
from llama_index import GPTVectorStoreIndex
#from langchain.chat_models import ChatOpenAI
#from langchain.embeddings import HuggingFaceEmbeddings
#from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms.bedrock import Bedrock
#from llama_index import ResponseSynthesizer

from llama_index.vector_stores import ChromaVectorStore
from llama_index.storage.storage_context import StorageContext
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.embeddings import LangchainEmbedding
from IPython.display import Markdown, display

Add Bedrock API

In [3]:
# BedrockAPI setup
def parse_credentials(file_path):
    credentials = {}
    with open(file_path, 'r') as file:
        current_user = None
        for line in file:
            line = line.strip()
            if line.startswith('[') and line.endswith(']'):
                current_user = line[1:-1]
                credentials[current_user] = {}
            elif '=' in line and current_user is not None:
                # Strip whitespace so "key = value" lines parse correctly
                key, value = line.split('=', 1)
                credentials[current_user][key.strip()] = value.strip()
    return credentials

def get_key_from_credential_file(user, key_name, credential_file_path):
    credentials = parse_credentials(credential_file_path)

    if user in credentials:
        user_credentials = credentials[user]
        if key_name in user_credentials:
            return user_credentials[key_name]
        else:
            raise KeyError(f"'{key_name}' not found for user '{user}'.")
    else:
        raise KeyError(f"User '{user}' not found in the credential file.")
        
aws_access_key_id = get_key_from_credential_file('default', 'aws_access_key_id', '/home/alfred/.aws/credentials')
aws_secret_access_key = get_key_from_credential_file('default', 'aws_secret_access_key', '/home/alfred/.aws/credentials')

bedrock = boto3.client(service_name='bedrock',region_name='us-east-1',endpoint_url='https://bedrock.us-east-1.amazonaws.com', 
                       aws_access_key_id=aws_access_key_id, 
                       aws_secret_access_key=aws_secret_access_key)
bedrock.list_foundation_models()

{'ResponseMetadata': {'RequestId': '7db7e3d0-9673-42bf-bebe-431e0e1f86a5',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Thu, 17 Aug 2023 23:04:41 GMT',
   'content-type': 'application/json',
   'content-length': '1166',
   'connection': 'keep-alive',
   'x-amzn-requestid': '7db7e3d0-9673-42bf-bebe-431e0e1f86a5'},
  'RetryAttempts': 0},
 'modelSummaries': [{'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-tg1-large',
   'modelId': 'amazon.titan-tg1-large'},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-e1t-medium',
   'modelId': 'amazon.titan-e1t-medium'},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/stability.stable-diffusion-xl',
   'modelId': 'stability.stable-diffusion-xl'},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/ai21.j2-grande-instruct',
   'modelId': 'ai21.j2-grande-instruct'},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/ai21.j2-jumbo-instruct',
   'modelId': 'ai21.j2-jumbo-i
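Because the AWS credentials file uses an INI layout, the standard-library `configparser` module can read it as well. The sketch below is an alternative to the hand-written parser, shown under stated assumptions: `read_aws_profile` is a hypothetical helper name, and the file contents are dummy values written to a temporary directory.

```python
import configparser
import os
import tempfile

def read_aws_profile(path, profile="default"):
    """Return the key/value pairs of one profile from an INI-style credentials file."""
    parser = configparser.ConfigParser()
    with open(path, "r", encoding="utf-8") as handle:
        parser.read_file(handle)
    if not parser.has_section(profile):
        raise KeyError(f"Profile '{profile}' not found in {path}")
    return dict(parser[profile])

# Demo with a throwaway file and dummy values (not real credentials).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "credentials")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write(
            "[default]\n"
            "aws_access_key_id = EXAMPLEKEYID\n"
            "aws_secret_access_key = examplesecret\n"
        )
    creds = read_aws_profile(demo_path)

print(creds["aws_access_key_id"])
```

Note that boto3 also looks up `~/.aws/credentials` on its own when no keys are passed to `boto3.client`, so explicit parsing is mainly useful when the file lives in a non-standard location.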

import chromadb
from chromadb.config import Settings
from llama_index.vector_stores import ChromaVectorStore
'''
chroma_client = chromadb.Client(Settings(chroma_db_impl="duckdb+parquet",
                                persist_directory="./storage/vector_storage/chormadb/"
                            ))
'''
chroma_client = chromadb.Client()
chroma_collection = chroma_client.create_collection("general_knowledge_05b")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

## init llm model
from langchain.chat_models import ChatOpenAI
llm_predictor_chat = LLMPredictor(llm=ChatOpenAI(temperature=0.2, model_name="gpt-3.5-turbo"))

## load the HF embedding model
model_id = "hkunlp/instructor-large"
embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name=model_id))



## init storage context and service context
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor_chat, embed_model=embed_model)

## LlamaIndex

LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models. It provides 3 key features to augment your LLM applications with data.

* <b>Data Ingestion:</b> Connect your existing data sources and data formats (APIs, PDFs, documents, SQL, etc.) to use with a large language model application.
* <b>Data Indexing:</b> Store and index your data for different use cases. Integrate with downstream vector store and database providers.
* <b>Query Interface:</b> LlamaIndex provides a query interface that accepts any input prompt over your data and returns a knowledge-augmented response.

### Creating a Chroma Index

In [5]:
## Choose an embedding model
#embedding_model_id = "sentence-transformers/all-mpnet-base-v2"
embedding_model_path = "/home/alfred/models/instructor-large"

# create a persistent client and a collection (reused if it already exists)
#chroma_client = chromadb.EphemeralClient()
#chroma_client = chromadb.Client()
chroma_client = chromadb.PersistentClient(path="./vectordb")
chroma_collection = chroma_client.get_or_create_collection("quickstart_00")

# define embedding function
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name=embedding_model_path)
)

In [6]:
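# Inspect up to 20 stored entries; the commented block below deletes entries by id
doc_to_update = chroma_collection.get(limit=20)
# get() returns a dict, so len() counts its keys rather than the stored entries
print(len(doc_to_update))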
'''
chroma_collection.delete(ids=['4cffe9d5-fb79-42ab-8129-fcd32d4f0f58',
  '3ca1a443-aa01-4050-aee1-b1a7decb0e14',
  'c8d62cd3-f08f-4f79-b797-f26cb2dfffdc',
  '6fc7e00e-f2f3-4883-895f-3adc4381dbdc',
  '72619173-36bb-4ede-898f-1a400876d413',
  '111fb473-f7cb-4953-b25f-c49d0b506358',
  '463b3e7a-a9b5-4ac7-8265-7c757b7a11bc',
  '8c8313a5-f5df-40ae-8258-2c2c552edbdc',
  'b3efaee1-e1f0-4077-80b4-2c6a14f62003',
  '125f5950-ed01-445c-a230-9c56b6027273'])
'''

4

In [7]:
# Get some wiki data
from pathlib import Path
import requests

def get_wiki(wiki_titles, file_path='./data'):
    for title in wiki_titles:
        response = requests.get(
            'https://en.wikipedia.org/w/api.php',
            params={
                'action': 'query',
                'format': 'json',
                'titles': title,
                'prop': 'extracts',
                # 'exintro': True,
                'explaintext': True,
            }
        ).json()
        page = next(iter(response['query']['pages'].values()))
        # Skip titles the API returns without an extract (e.g. missing pages)
        wiki_text = page.get('extract')
        if wiki_text is None:
            print(f"No extract found for '{title}', skipping")
            continue

        data_path = Path(file_path)
        data_path.mkdir(parents=True, exist_ok=True)

        with open(data_path / f"{title}.txt", 'w', encoding="utf-8") as fp:
            fp.write(wiki_text)
    return True

In [8]:
wiki_titles = ["Toronto", "Seattle", "Chicago", "Boston", "Houston", "San Francisco"]
#wiki_titles = ["Antony Blinken", "Dominic Raab", "Sergey Lavrov", "Jean-Yves Le Drian", "Subrahmanyam Jaishankar", "Motegi Toshimitsu", "Heiko Maas"]
get_wiki(wiki_titles, file_path='./data')

True

In [9]:
# Read all the documents in ./data (one file per city).
from llama_index.node_parser import SimpleNodeParser
import shutil
parser = SimpleNodeParser()

if os.path.exists('./data/.ipynb_checkpoints'):
    shutil.rmtree("./data/.ipynb_checkpoints")
docs = os.listdir('./data')
#docs= ['New York City.txt','Houston.txt']
all_docs = {}
for d in docs:
    doc = SimpleDirectoryReader(input_files=[f"./data/{d}"]).load_data()
    nodes = parser.get_nodes_from_documents(doc)
    doc_id = d.replace(" ","_")
    doc[0].doc_id = d
    ## this can be used for metadata filtering if needed
    extra_info = {"id":d}
    doc[0].extra_info = extra_info
    all_docs[d] = doc

Create the index. This builds a GPTVectorStoreIndex; you can try other index types if you want.

In [10]:
# set up ChromaVectorStore and load in data
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

## init storage context and service context
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(embed_model=embed_model)

In [120]:
import torch
# flush cuda cache memory
torch.cuda.empty_cache()

In [12]:
index_existed = False
for d in all_docs.keys():
    print(f"Creating/Updating index {d}")
    if index_existed:
        ## update index
        print(f"Updating index: {d}")
        # index_node.insert_nodes(all_nodes[d])
        index.insert(all_docs[d][0])
    else:  
        print(f"Creating new index: {d}")
        index = GPTVectorStoreIndex.from_documents(
                            all_docs[d],
                            service_context=service_context, 
                            storage_context=storage_context
        )
        index_existed = True

Creating/Updating index Toronto.txt
Creating new index: Toronto.txt
Creating/Updating index Seattle.txt
Updating index: Seattle.txt
Creating/Updating index Chicago.txt
Updating index: Chicago.txt
Creating/Updating index Boston.txt
Updating index: Boston.txt
Creating/Updating index Houston.txt
Updating index: Houston.txt
Creating/Updating index San Francisco.txt
Updating index: San Francisco.txt


In [13]:
# Now, let’s experiment with a few queries
index.as_query_engine().query("What is population of San Francisco?")

Response(response='\nAs of the 2020 census, the population of San Francisco was 881,549.', source_nodes=[NodeWithScore(node=TextNode(id_='2c79fb7f-23d9-428a-b0b0-034ba4e443bb', embedding=None, metadata={'id': 'San Francisco.txt'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='San Francisco.txt', node_type=None, metadata={'id': 'San Francisco.txt'}, hash='babdb38d27536d717d1df1645020bf404613855e9743dee188f51729c453bf64'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='5c9e3afb-f6f9-4e63-9daa-cdcd423f93dd', node_type=None, metadata={'id': 'San Francisco.txt'}, hash='926721da26a2bba5bfe62f70c2602960c6deb14e58f09ab57d5f6ed86a6e603c'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='7a49ca7c-d23a-47f6-9b58-ffe12ebe1c89', node_type=None, metadata={'id': 'San Francisco.txt'}, hash='d0d7996fa1299df9889976d374d12f8f094e51769234e19029d71448256859fe')}, hash='748a215d8554b4f18cd7ac339e49d6

Too easy. Now let’s do something harder: I will ask it to compare the populations of two cities. We expect a result along the lines of San Francisco having a larger population than Seattle.

In [14]:
index.as_query_engine().query("Compare the population of Seattle and San Francisco?")

Response(response='\nThe population of Seattle is estimated to be around 753,675, while the population of San Francisco is estimated to be around 883,305. Seattle has a higher percentage of residents with a college degree (47%) than San Francisco (44%), but San Francisco has a higher median household income ($65,519) than Seattle ($63,470). San Francisco also has a higher percentage of same-sex households (15%) than Seattle (7%). San Francisco has a higher percentage of foreign-born residents (40%) than Seattle (27%). San Francisco also has a higher percentage of people of color (56%) than Seattle (45%).', source_nodes=[NodeWithScore(node=TextNode(id_='2c79fb7f-23d9-428a-b0b0-034ba4e443bb', embedding=None, metadata={'id': 'San Francisco.txt'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='San Francisco.txt', node_type=None, metadata={'id': 'San Francisco.txt'}, hash='babdb38d27536d717d1df1645020bf

Not bad at all! It lists demographic figures and how they differ between the two cities.

In [15]:
# query
response = index.as_query_engine().query("""
Compare the population of Seattle and San Francisco. 
What is the percentage difference between two populations?
""")
response

Response(response='\nIt is not possible to answer this question without prior knowledge of the population of Seattle and San Francisco.', source_nodes=[NodeWithScore(node=TextNode(id_='2c79fb7f-23d9-428a-b0b0-034ba4e443bb', embedding=None, metadata={'id': 'San Francisco.txt'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='San Francisco.txt', node_type=None, metadata={'id': 'San Francisco.txt'}, hash='babdb38d27536d717d1df1645020bf404613855e9743dee188f51729c453bf64'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='5c9e3afb-f6f9-4e63-9daa-cdcd423f93dd', node_type=None, metadata={'id': 'San Francisco.txt'}, hash='926721da26a2bba5bfe62f70c2602960c6deb14e58f09ab57d5f6ed86a6e603c'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='7a49ca7c-d23a-47f6-9b58-ffe12ebe1c89', node_type=None, metadata={'id': 'San Francisco.txt'}, hash='d0d7996fa1299df9889976d374d12f8f094e51769234e19029d7144825

Well, this answer is a little disappointing. So the question is: can we do better?


### Retriever and query engine

Retrievers are responsible for fetching the most relevant context given a user query (or chat message). A query engine, by contrast, is a generic interface that allows you to ask questions over your data. It takes in a natural language query and returns a rich response. It is most often (but not always) built on one or many indices via retrievers. You can compose multiple query engines to achieve more advanced capabilities.
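The division of labor between a retriever and a query engine can be illustrated with a toy version. This is a sketch, not LlamaIndex's implementation: `ToyRetriever` and `ToyQueryEngine` are hypothetical classes, the "embedding" is a plain word count rather than a neural model, and the synthesizer simply joins the retrieved text instead of calling an LLM.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a real system calls an embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyRetriever:
    """Ranks stored chunks by similarity to the query and keeps the top k."""
    def __init__(self, chunks, similarity_top_k=2):
        self.chunks = [(chunk, embed(chunk)) for chunk in chunks]
        self.similarity_top_k = similarity_top_k

    def retrieve(self, query):
        q = embed(query)
        ranked = sorted(self.chunks, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[: self.similarity_top_k]]

class ToyQueryEngine:
    """Composes a retriever with a response synthesizer."""
    def __init__(self, retriever, synthesize):
        self.retriever = retriever
        self.synthesize = synthesize

    def query(self, question):
        return self.synthesize(question, self.retriever.retrieve(question))

chunks = [
    "Seattle is a seaport city in Washington.",
    "San Francisco is a city in northern California.",
    "Houston is the most populous city in Texas.",
]
engine = ToyQueryEngine(
    ToyRetriever(chunks, similarity_top_k=1),
    synthesize=lambda question, context: " | ".join(context),
)
print(engine.query("Which state is Houston in?"))
```

Swapping the retriever (different `similarity_top_k`, different index) or the synthesizer changes the behavior without touching the other half, which is the point of separating the two.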

In [16]:
# configure retriever
from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.response_synthesizers import get_response_synthesizer

# this will simply do the vector search and return the top 2 chunks
# most similar to the question being asked.
retriever = VectorIndexRetriever(
    index=index, 
    similarity_top_k=2,
)

# configure response synthesizer
#response_synthesizer = ResponseSynthesizer.from_args(verbose=True)
response_synthesizer = get_response_synthesizer(response_mode='compact')

## if you need to pass a response mode
# response_synthesizer = ResponseSynthesizer.from_args(
#    response_mode='tree_summarize',
#    verbose=True)

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

# query
response = query_engine.query("Compare the population of Seattle and San Francisco.")
response

Response(response='\nThe population of Seattle is estimated to be around 753,675 people, while the population of San Francisco is estimated to be around 883,305 people. This means that San Francisco has a larger population than Seattle.', source_nodes=[NodeWithScore(node=TextNode(id_='2c79fb7f-23d9-428a-b0b0-034ba4e443bb', embedding=None, metadata={'id': 'San Francisco.txt'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='San Francisco.txt', node_type=None, metadata={'id': 'San Francisco.txt'}, hash='babdb38d27536d717d1df1645020bf404613855e9743dee188f51729c453bf64'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='5c9e3afb-f6f9-4e63-9daa-cdcd423f93dd', node_type=None, metadata={'id': 'San Francisco.txt'}, hash='926721da26a2bba5bfe62f70c2602960c6deb14e58f09ab57d5f6ed86a6e603c'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='7a49ca7c-d23a-47f6-9b58-ffe12ebe1c89', node_type=None, m

Not bad. Now let's try to do some basic math using LlamaIndex

In [17]:
# query
response = query_engine.query("""
Compare the population of Seattle and San Francisco. 
What is the percentage difference between two populations?
""")
response

Response(response='\nIt is not possible to answer this question without prior knowledge of the population of Seattle and San Francisco.', source_nodes=[NodeWithScore(node=TextNode(id_='2c79fb7f-23d9-428a-b0b0-034ba4e443bb', embedding=None, metadata={'id': 'San Francisco.txt'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='San Francisco.txt', node_type=None, metadata={'id': 'San Francisco.txt'}, hash='babdb38d27536d717d1df1645020bf404613855e9743dee188f51729c453bf64'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='5c9e3afb-f6f9-4e63-9daa-cdcd423f93dd', node_type=None, metadata={'id': 'San Francisco.txt'}, hash='926721da26a2bba5bfe62f70c2602960c6deb14e58f09ab57d5f6ed86a6e603c'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='7a49ca7c-d23a-47f6-9b58-ffe12ebe1c89', node_type=None, metadata={'id': 'San Francisco.txt'}, hash='d0d7996fa1299df9889976d374d12f8f094e51769234e19029d7144825

As expected, LlamaIndex failed on basic math. Now what?
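For reference, the arithmetic the query engine declined to do is short. The sketch below uses the population figures quoted in the earlier responses (753,675 for Seattle and 883,305 for San Francisco); the two function names are hypothetical, and which definition of "percentage difference" applies is an assumption, so both common ones are shown.

```python
def percent_larger(larger, smaller):
    """How much bigger `larger` is than `smaller`, as a percentage of `smaller`."""
    return (larger - smaller) / smaller * 100

def symmetric_percent_difference(a, b):
    """Absolute difference relative to the mean of the two values."""
    return abs(a - b) / ((a + b) / 2) * 100

seattle = 753_675        # figure quoted in the responses above
san_francisco = 883_305  # figure quoted in the responses above

print(f"{percent_larger(san_francisco, seattle):.1f}%")                # 17.2%
print(f"{symmetric_percent_difference(san_francisco, seattle):.1f}%")  # 15.8%
```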

## Langchain -- agent


Unlike LlamaIndex, which is solely focused on LLM applications for documents, Langchain offers a plethora of capabilities. It can help you build functionality such as internet search, result consolidation, API invocation, mathematical computation, and much more. We will use the following components of Langchain:

* Vector Storage ( LLM Database ): similar to LlamaIndex vector storage
* Langchain’s Agent: this is what made Langchain popular
* Langchain’s chain: RetrievalQA is made for question answering only.
* Langchain’s chain: LLMMathChain is used when you need to answer questions about math.

In [86]:
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
from langchain.tools import BaseTool
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter

In [87]:
# Initiate a vector store
embed_model = HuggingFaceEmbeddings(model_name=embedding_model_path)
vectorstore = Chroma("langchain_store", embed_model)

In [88]:
# Load the documents and add them to the vector store
text_splitter = CharacterTextSplitter(chunk_size=3000, chunk_overlap=100)
#text_splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=0, separators=[" ", ",", "\n"])
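To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window splitter. It is only a sketch of the idea: `chunk_text` is a hypothetical helper, and it cuts at fixed character offsets, whereas Langchain's splitters first split on separators and then merge the pieces up to the size limit.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks

chunks_demo = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks_demo)  # ['abcd', 'defg', 'ghij']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.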

In [89]:
from langchain.document_loaders import UnstructuredFileLoader

#docs= ['Seattle.txt','Houston.txt', "Chicago.txt"]
all_docs = []
for d in docs:
    print(f"#### Loading data: {d}")
    doc = UnstructuredFileLoader(f"./data/{d}",  strategy="hi_res").load()
    doc = text_splitter.split_documents(doc)
    all_docs.extend(doc)

## add to vector store
vectorstore.add_documents(all_docs)

#### Loading data: Toronto.txt
#### Loading data: Seattle.txt
#### Loading data: Chicago.txt
#### Loading data: Boston.txt
#### Loading data: Houston.txt
#### Loading data: San Francisco.txt


[]

#### Try a few models (local or hosted)

1) Llama v2 local (GGML)
2) Falcon local
3) <b>OpenAI</b>
4) <b>Bedrock</b>

In [110]:
# Adding openai and Bedrock LLMs
from langchain.chains import RetrievalQA
from langchain.llms import Bedrock
llm_openai = ChatOpenAI(temperature=0.2,model_name='gpt-3.5-turbo')
llm_bedrock = Bedrock(model_id="anthropic.claude-v2", client=bedrock)            

#### Create the question-answering chain using standard retrieval from the vector DB

In [111]:
llm_g = llm_bedrock # llm_bedrock or llm_openai

In [112]:
qa = RetrievalQA.from_chain_type(llm=llm_g,
                                 chain_type="stuff", 
                                 retriever=vectorstore.as_retriever())
query_string_0 = "Compare the population of Houston and New York, which city has a larger population? By what percentage roughly?"
result = qa({"query": query_string_0})
result

{'query': 'Compare the population of Houston and New York, which city has a larger population? By what percentage roughly?',
 'result': " Unfortunately I do not have enough information to compare the populations of Houston and New York City, as the passages only provide details about Houston's population, not New York City's. The passages state that Houston had an estimated population of 2.3 million in"}
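The "stuff" chain type places all retrieved chunks into a single prompt. A minimal sketch of that idea follows; `build_stuff_prompt` and the instruction wording are hypothetical and are not Langchain's actual template.

```python
def build_stuff_prompt(question, documents):
    """Concatenate every retrieved document into one context block,
    then append the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

# Placeholder passages standing in for retriever output.
retrieved = ["Houston passage A.", "Houston passage B."]
prompt_text = build_stuff_prompt("Which city is larger?", retrieved)
print(prompt_text)
```

Because the model only sees what the retriever returned, a question about a city that is absent from the vector DB leaves it with nothing to compare, which matches the response recorded above.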

So the plain retrieval chain cannot do the comparison: the vector DB holds no data on New York, so the answer is <b>incomplete</b>. Let’s fix it with the LLM-math chain and an agent.

In [113]:
# Add math function
from langchain import LLMMathChain
llm_math = LLMMathChain.from_llm(llm=llm_g, verbose=True)

In [114]:
# Add to-do and search functions
from langchain import SerpAPIWrapper, LLMChain, OpenAI
from langchain.prompts import PromptTemplate

todo_prompt = PromptTemplate.from_template(
    "You are a city planner expert. Using demographic data, provide a comprehensive analysis to determine the ideal population for a city: {objective}"
)
todo_chain = LLMChain(llm=llm_g, prompt=todo_prompt)
search = SerpAPIWrapper(serpapi_api_key=os.environ.get('serp_api_token'))

In [115]:
from langchain.memory import ConversationBufferWindowMemory

tools = [
    Tool(
        name="general knowledge",
        func=qa.run,
        description="useful for when you need to answer questions about the documents stored in the vectorDB"
    ),
    Tool(
        name="search",
        func=search.run,
        description="Useful for when you need to answer questions about current events or info missing from vector DB by searching internet",
    ),
    Tool(
        name="llm-math",
        func=llm_math.run,
        description="Useful for when you need to answer questions about math."
    )
]
# Buffer the last few conversation turns in memory.
memory = ConversationBufferWindowMemory(
    memory_key="chat_history", k=5, return_messages=True, output_key="output"
)

# Define agent
agent = initialize_agent(tools, llm=llm_g, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True, memory=memory)
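The window memory configured with `k=5` keeps only the most recent exchanges. The behavior can be sketched with a bounded deque; `WindowMemory` is a hypothetical class for illustration, not the Langchain implementation.

```python
from collections import deque

class WindowMemory:
    """Keeps only the k most recent (human, ai) exchanges."""
    def __init__(self, k):
        self.turns = deque(maxlen=k)  # older turns fall off automatically

    def save(self, human, ai):
        self.turns.append((human, ai))

    def history(self):
        return list(self.turns)

memory_sketch = WindowMemory(k=2)
for i in range(1, 5):
    memory_sketch.save(f"question {i}", f"answer {i}")
print(memory_sketch.history())
```

Bounding the history keeps the prompt from growing without limit over a long conversation, at the cost of forgetting earlier turns.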

With LLMMathChain, users can now request math tasks. The Bedrock/Langchain integration is still in progress, however, so errors might occur. If so, simply repeat the execution.

In [117]:
agent.run("""Compare the population of New York and Houston. 
What is the percentage difference between two cities?""")



> Entering new AgentExecutor chain...
 To compare the populations of New York and Houston, I need to first look up the population numbers for each city. I can search the internet to find this information.
Action: search  
Action Input: population of new york city, population
Observation: New York City comprises 5 boroughs sitting where the Hudson River meets the Atlantic Ocean. At its core is Manhattan, a densely populated borough that’s among the world’s major commercial, financial and cultural centers. Its iconic sites include skyscrapers such as the Empire State Building and sprawling Central Park. Broadway theater is staged in neon-lit Times Square. ― Google
Thought: I found the population of New York City is 8,804,190 people. I still need the population of Houston.
Action: search
Action Input: population of houston
Observation: Houston is a large metropolis in Texas, extending to Galveston Bay. It’s closely 

"The percentage difference between the population of New York City (8,804,190) and Houston (2,304,580) is approximately 282%. New York's population is"

Very nice! Now let's try something even harder by asking a question whose data is not in the vector DB.

In [118]:
agent.run("""Compare the population demography between Tokyo and Beijing. 
What is the percentage difference between two cities?""")



> Entering new AgentExecutor chain...
 To compare the populations and demographics between Tokyo and Beijing, I should first look up their populations and demographic information in the vectorDB using general knowledge.
Action: general knowledge
Action Input: What are the populations and demographic breakdowns of Tokyo and
Observation: Unfortunately I do not have enough context here to provide specific population and demographic information for Tokyo and other cities. The provided information focuses only on Toronto, Canada. Without comparable details on Tokyo or other cities, I cannot make a meaningful comparison. More context would
Thought: Since I don't have the necessary information in general knowledge, I will need to search the internet for population statistics on Tokyo and Beijing.
Action: search
Action Input: tokyo and beijing population demographics
Observation: The Tokyo megalopolis is the world's mos

'The population of Tokyo is about 1.7 times that of Beijing. So the percentage difference is about 70% (with Tokyo being larger).\n\nQuestion:'

## Tree of Thoughts
In the usual CoT (Chain of Thought) approach, LLMs reason linearly toward a solution, and if an error occurs along the way, they tend to keep building on that error.

In contrast, in the ToT (Tree of Thoughts) approach, LLMs evaluate themselves at each stage of thought and stop inefficient approaches early, switching to alternative methods.
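The evaluate-and-prune idea can be sketched as a small beam search. This is a toy under stated assumptions: in a real Tree of Thoughts setup both `propose` and `evaluate` would be LLM calls, while here they are deterministic stand-ins on a number puzzle (reach a target from a start value using three hypothetical operations).

```python
# Hypothetical "thought" generators: each operation is one possible next step.
OPS = {"+3": lambda x: x + 3, "*2": lambda x: x * 2, "-1": lambda x: x - 1}

def propose(state):
    """Generate candidate next thoughts by applying each operation once."""
    value, path = state
    return [(op(value), path + [name]) for name, op in OPS.items()]

def evaluate(state, target):
    """Score a partial solution; lower is better (distance to the target)."""
    return abs(target - state[0])

def tree_of_thoughts(start, target, beam_width=2, max_depth=4):
    frontier = [(start, [])]
    for _ in range(max_depth):
        candidates = [child for state in frontier for child in propose(state)]
        candidates.sort(key=lambda s: evaluate(s, target))
        if evaluate(candidates[0], target) == 0:
            return candidates[0][1]
        frontier = candidates[:beam_width]  # prune unpromising branches early
    return None

print(tree_of_thoughts(start=1, target=10))  # ['+3', '+3', '+3']
```

Unlike a single linear chain, several partial solutions are kept alive at each depth, and the weakest ones are dropped before more effort is spent on them.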

![image](https://miro.medium.com/v2/resize:fit:1400/0*cVI95ipP4yPsftYy)

In [51]:
from langchain.chains import LLMChain
#from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI

In [56]:
template ="""
Step 1:

I have a problem related to {input}. Could you brainstorm three distinct solutions? Please consider a variety of factors such as {perfect_factors}
A:
"""

prompt = PromptTemplate(
    input_variables=["input","perfect_factors"],
    template = template                      
)
llm_model = "anthropic.claude-v1"

chain1 = LLMChain(
    #llm=ChatOpenAI(temperature=0, model="gpt-4"),
    #llm=Bedrock(model_id="amazon.titan-tg1-large"),
    llm=Bedrock(model_id=llm_model, client=bedrock),
    prompt=prompt,
    output_key="solutions"
)

template ="""
Step 2:

For each of the three proposed solutions, evaluate their potential. Consider their pros and cons, initial effort needed, implementation difficulty, potential challenges, and the expected outcomes. Assign a probability of success and a confidence level to each option based on these factors

{solutions}

A:"""

prompt = PromptTemplate(
    input_variables=["solutions"],
    template = template                      
)

chain2 = LLMChain(
    #llm=ChatOpenAI(temperature=0, model="gpt-4"),
    #llm=Bedrock(model_id="ai21.j2-jumbo-instruct"),
    llm=Bedrock(model_id=llm_model,client=bedrock),
    prompt=prompt,
    output_key="review"
)

template ="""
Step 3:

For each solution, deepen the thought process. Generate potential scenarios, strategies for implementation, any necessary partnerships or resources, and how potential obstacles might be overcome. Also, consider any potential unexpected outcomes and how they might be handled.

{review}

A:"""

prompt = PromptTemplate(
    input_variables=["review"],
    template = template                      
)

chain3 = LLMChain(
    #llm=ChatOpenAI(temperature=0, model="gpt-4"),
    llm=Bedrock(model_id=llm_model,client=bedrock),
    prompt=prompt,
    output_key="deepen_thought_process"
)

template ="""
Step 4:

Based on the evaluations and scenarios, rank the solutions in order of promise. Provide a justification for each ranking and offer any final thoughts or considerations for each solution
{deepen_thought_process}

A:"""

prompt = PromptTemplate(
    input_variables=["deepen_thought_process"],
    template = template                      
)

chain4 = LLMChain(
    #llm=ChatOpenAI(temperature=0, model="gpt-4"),
    llm=Bedrock(model_id=llm_model,client=bedrock),
    prompt=prompt,
    output_key="ranked_solutions"
)

We connect the four chains using ‘SequentialChain’. The output of one chain becomes the input to the next chain.
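The way outputs flow from one chain to the next can be sketched with a shared dictionary: each step reads the keys it needs and writes one new key. `run_sequential` is a hypothetical helper, and the lambdas stand in for LLM calls.

```python
def run_sequential(steps, inputs):
    """Run steps in order; each step reads named keys from a shared dict
    and writes its result under one output key."""
    state = dict(inputs)
    for input_keys, output_key, fn in steps:
        state[output_key] = fn(*(state[key] for key in input_keys))
    return state

# Each tuple: (input keys, output key, stand-in for an LLM call).
steps = [
    (["input", "perfect_factors"], "solutions",
     lambda problem, factors: f"solutions for {problem} given {factors}"),
    (["solutions"], "review", lambda solutions: f"review of [{solutions}]"),
    (["review"], "ranked_solutions", lambda review: f"ranking from [{review}]"),
]
result_state = run_sequential(steps, {"input": "traffic", "perfect_factors": "cost"})
print(result_state["ranked_solutions"])
```

The output key of each step must match an input key of a later step, which is why the prompt templates reuse names such as `solutions` and `review`.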

In [57]:
from langchain.chains import SequentialChain

overall_chain = SequentialChain(
    chains=[chain1, chain2, chain3, chain4],
    input_variables=["input", "perfect_factors"],
    output_variables=["ranked_solutions"],
    verbose=True
)

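# The recorded run below passes the question through "perfect_factors" and leaves "input" empty.
# The template expects the problem statement in "input", which may explain the off-topic answer.
print(overall_chain({"input":"", "perfect_factors":"Compare the population of Seattle and Chicago today and project both cities' populations in 2030?"}))



> Entering new SequentialChain chain...

> Finished chain.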
{'input': '', 'perfect_factors': "Compare the population of Seattle and Chicago today and project both cities' populations in 2030?", 'ranked_solutions': ' A basic linear regression model:\nRanking: 2\nJustification: This is a simple and straightforward approach that provides a baseline set of projections based on historical performance. However, it assumes revenue growth will continue at the same linear rate and does not'}
