# CPI Q/A bot

This chatbot retrieves context from a proprietary data source and from the web to answer questions about changes in the Consumer Price Index (CPI) in the province of British Columbia (BC) in April 2024. The proprietary data source is a PDF report highlighting 12-month CPI changes in BC as of April 2024. Web results are retrieved with the You.com API. The chatbot is implemented in LangChain as a retrieval chain whose ensemble retriever queries both sources.

In [25]:
import openai
import langchain
import os

In [26]:
os.environ["YDC_API_KEY"] = "<Insert your YDC API key here>"
os.environ["OPENAI_API_KEY"] = "<Insert your OpenAI key here>"
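Pasting keys directly into a notebook makes them easy to leak when the notebook is shared. A minimal sketch, assuming the keys are instead exported in the shell before Jupyter starts, that checks both variables up front and treats the `<Insert ...>` placeholders above as unset. The helper name `missing_env_vars` is hypothetical.

```python
import os

REQUIRED_KEYS = ("YDC_API_KEY", "OPENAI_API_KEY")

def missing_env_vars(names=REQUIRED_KEYS, environ=None):
    """Return the names that are unset, empty, or still hold a '<...>' placeholder."""
    env = os.environ if environ is None else environ
    missing = []
    for name in names:
        value = env.get(name, "")
        # A value starting with "<" is assumed to be an unfilled placeholder.
        if not value or value.startswith("<"):
            missing.append(name)
    return missing

problems = missing_env_vars()
if problems:
    print("Set these environment variables first:", ", ".join(problems))
```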

## Instantiating the You.com Retriever in LangChain

LangChain provides a You.com retriever. For more information, see: https://python.langchain.com/v0.1/docs/integrations/retrievers/you-retriever/

In [27]:
from langchain_community.retrievers.you import YouRetriever

ydc_retriever = YouRetriever(num_web_results = 10)

In [28]:
# Let's test it out
response = ydc_retriever.invoke("British Columbia’s Consumer Price Index (CPI) in April 2024 was 2.9% higher (unadjusted) than in April 2023.  How does this compare to the Canadian CPI?")
response

[Document(page_content='Mail to: BC Stats, Box 9410 Stn Prov Govt, Victoria BC V8W 9V1', metadata={'url': 'https://www2.gov.bc.ca/gov/content/data/statistics/economy/consumer-price-index', 'thumbnail_url': None, 'title': 'Consumer Price Index (CPI) - Province of British Columbia', 'description': "Looking for more data? Explore the B.C. Government's extensive collection of datasets, applications and web services · Please send your questions and service requests to BC Stats here"}),
 Document(page_content='Consumer Price Index (CPI) data', metadata={'url': 'https://www2.gov.bc.ca/gov/content/data/statistics/economy/consumer-price-index', 'thumbnail_url': None, 'title': 'Consumer Price Index (CPI) - Province of British Columbia', 'description': "Looking for more data? Explore the B.C. Government's extensive collection of datasets, applications and web services · Please send your questions and service requests to BC Stats here"}),
 Document(page_content="Shelter inflation has been a thorn 

## Creating a Vector DB retriever based on data from a PDF File

We load the PDF file with LangChain's PyPDFLoader and then use the RecursiveCharacterTextSplitter to split the documents into chunks that can be embedded. The embedded chunks are stored in a Facebook AI Similarity Search (FAISS) vector store, which we then convert into a LangChain retriever.

In [31]:
from langchain_community.document_loaders import PyPDFLoader

# The PDF file we are using can be downloaded from: https://www2.gov.bc.ca/assets/gov/data/statistics/economy/cpi/cpi_highlights.pdf
# load the PDF file
loader = PyPDFLoader("bc_cpi_highlights.pdf")
docs = loader.load()

In [32]:
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

# split the document into chunks, and vectorize these chunks in a FAISS database
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 100)
notes = text_splitter.split_documents(docs)
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(documents=notes, embedding=embeddings)
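RecursiveCharacterTextSplitter tries to break on paragraph, line, and word boundaries (its default separators are `"\n\n"`, `"\n"`, `" "`, and `""`) before falling back to single characters, so its chunks are not cut at fixed offsets. The sketch below deliberately swaps in a simpler fixed-width sliding window, which is not LangChain's algorithm, only to show how `chunk_size` and `chunk_overlap` interact: each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so neighbouring chunks share `chunk_overlap` characters. The function name is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list:
    """Split text into fixed-width chunks whose neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks

sample = "".join(str(i % 10) for i in range(2000))
chunks = sliding_window_chunks(sample)
print(len(chunks))                          # 3 chunks: [0:1000], [900:1900], [1800:2000]
print(chunks[0][-100:] == chunks[1][:100])  # True: 100 shared characters
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of embedding some text twice.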

In [33]:
# test out the similarity search
query = "How much did food prices increase in April 2024?"
response = db.similarity_search(query, k=3)
response[0].page_content

'(excluding fish, seafood, and other marine products) \n(+2.1%). At the same time, fruit, fruit preparations, \nand nut s was the only major food category to \ndecrease in price (- 0.1%)  \nBritish Columbians paid more for both  health (+2.7%) \nand personal (+ 2.0%) care  when compared to \n12-months ago. Services, instead of items within \nthese categories, had the largest price increase. Personal services (such a hairdressing) cost 4.8% \nmore when compared to 12 -months ago, while the \ncost of health care services (such as eye and dental \ncare) increased by 4.3%.  Consumer Price \nIndex   \n \n \nReference date:  April  2024  Issue:  #24-04 Released:  May 21 , 2024 \n      \n-5.8-1.91.92.22.32.62.82.96.8\nClothing & FootwearHouseholdRecreationAlc., Tob., & CannabisHealth & PersonalFoodTransportationAll-itemsShelterInflation by Category\n% change, same month previous year'
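`similarity_search` embeds the query with the same OpenAIEmbeddings model and returns the `k` stored chunks whose vectors lie closest to it. By default, LangChain's FAISS wrapper builds an exact (flat) index that ranks by Euclidean (L2) distance; this is controlled by its `distance_strategy` argument. A minimal pure-Python sketch of that exact search, using hypothetical 3-dimensional toy vectors (real OpenAI embeddings have far more dimensions):

```python
import math

def knn_l2(query, vectors, k=3):
    """Return the indices of the k vectors closest to query by Euclidean distance."""
    distances = sorted((math.dist(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in distances[:k]]

# Hypothetical toy "embeddings" for four chunks.
chunk_vectors = [
    [0.9, 0.1, 0.0],  # chunk 0: food prices
    [0.1, 0.9, 0.0],  # chunk 1: shelter
    [0.8, 0.2, 0.1],  # chunk 2: food categories
    [0.0, 0.1, 0.9],  # chunk 3: clothing
]
query_vector = [1.0, 0.0, 0.0]
print(knn_l2(query_vector, chunk_vectors, k=2))  # [0, 2]
```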

In [34]:
# Create the retriever
faiss_retriever = db.as_retriever()

## Create an Ensemble Retriever using the You.com Retriever and the FAISS Retriever

The EnsembleRetriever in LangChain combines the results of multiple retrievers and reranks them using weighted Reciprocal Rank Fusion. We create an EnsembleRetriever whose constituent retrievers are the You.com retriever and the FAISS vector store retriever defined above, with equal weights.

In [35]:
from langchain.retrievers import EnsembleRetriever

ensemble_retriever = EnsembleRetriever(
    retrievers = [ydc_retriever, faiss_retriever], weights = [0.5, 0.5]
)
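Weighted Reciprocal Rank Fusion gives each document the sum, over the retrievers that returned it, of `weight / (rank + c)`, where `rank` starts at 1 and LangChain's default `c` is 60; LangChain also deduplicates documents by `page_content` before ranking. A minimal pure-Python sketch of that scoring, with hypothetical string ids standing in for Document objects (the function name is also hypothetical):

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of document ids with weighted Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            scores[doc] += weight / (rank + c)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

web_results = ["doc_a", "doc_b", "doc_c"]  # hypothetical You.com ranking
pdf_results = ["doc_b", "doc_d"]           # hypothetical FAISS ranking
print(weighted_rrf([web_results, pdf_results], weights=[0.5, 0.5]))
# ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

With equal weights, a document found by both retrievers roughly doubles its score, while documents found by only one retriever interleave by rank; a large `c` flattens the gap between neighbouring ranks.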

## Instantiate the LLM

In [36]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o", temperature=0.5)

## Create the Prompt Template

In [37]:
system_prompt = """
You are an assistant that answers questions pertaining to CPI (Consumer Price Index).  Please utilize the following retrieved context from the web and from a proprietary
datasource to provide an accurate answer to the question.  Please try to use numbers where applicable to substantiate your answer.  If you do not know the answer, simply say you do not 
know the answer.  Please keep the response concise.

{context}
"""

In [38]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

## Create a basic chain without chat history

We will test our chain first without chat history.

In [39]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

qa_chain = create_stuff_documents_chain(llm, qa_prompt)
rag_chain = create_retrieval_chain(ensemble_retriever, qa_chain)
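create_stuff_documents_chain "stuffs" every retrieved document into the `{context}` slot of the prompt, and create_retrieval_chain puts the retriever in front of it and returns a dict with `input`, `context`, and `answer` keys. By default LangChain renders each document as its `page_content` and joins them with two newlines. A minimal sketch of the stuffing step in plain Python, using a shortened, hypothetical stand-in for `system_prompt` and two snippets paraphrasing figures from the report:

```python
def stuff_documents(page_contents, separator="\n\n"):
    """Concatenate document texts the way a 'stuff' chain fills {context}."""
    return separator.join(page_contents)

# Shortened, hypothetical stand-in for system_prompt above.
system_template = "Answer questions about CPI using this context:\n\n{context}"

retrieved = [
    "BC CPI rose 2.9% year over year in April 2024.",
    "Food prices in BC rose 2.6% over the same period.",
]
filled = system_template.format(context=stuff_documents(retrieved))
print(filled)
```

Because every retrieved document is stuffed in, the web results and the FAISS chunks all count against the model's context window; lowering `num_web_results` or the FAISS retriever's `k` keeps the prompt smaller.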

In [40]:
response = rag_chain.invoke({"input": "How did the CPI in April 2024 in BC compare to the national CPI in Canada?"})

In [41]:
response["answer"]

"In April 2024, the Consumer Price Index (CPI) in British Columbia (BC) increased by 2.9% compared to April 2023. Nationally, Canada's CPI was up by 2.7% over the same period. Therefore, the CPI in BC rose slightly more than the national CPI."

## Add chat history to our chatbot

Chat history is an integral component of any chat application: a follow-up question such as "How does that compare?" can only be understood in light of the earlier turns. We add chat history to our chatbot and use it to contextualize each input question before retrieval.

In [42]:
# Create a prompt that utilizes the chat history as context to reformulate the most recent input, as a standalone question that the LLM can comprehend
from langchain.chains import create_history_aware_retriever

contextualize_q_system_prompt = """
Given a chat history and the latest question, which might reference context in the chat history, formulate a standalone question, which can be understood without chat history.
Do not answer the question; just reformulate it if necessary, and otherwise return it as is.
"""

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}")
    ]
)

In [43]:
# Create a chain that takes conversation history and contextualizes the prompt
history_aware_retriever = create_history_aware_retriever(llm, ensemble_retriever, contextualize_q_prompt)
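create_history_aware_retriever branches on the chat history: when `chat_history` is empty, the raw `input` goes straight to the retriever; otherwise the LLM first rewrites the question with `contextualize_q_prompt`, and the rewritten question is what gets retrieved. A minimal sketch of that control flow, with hypothetical stub functions standing in for the LLM and for `ensemble_retriever`:

```python
def history_aware_retrieve(inputs, rephrase, retrieve):
    """Retrieve with the raw question on the first turn, a rewritten one afterwards."""
    if not inputs.get("chat_history"):
        return retrieve(inputs["input"])
    standalone = rephrase(inputs["chat_history"], inputs["input"])
    return retrieve(standalone)

# Hypothetical stubs: the real chain calls the LLM and the ensemble retriever.
def fake_rephrase(history, question):
    return "How does the April 2024 food price increase in BC compare to Canada?"

def fake_retrieve(query):
    return [f"docs for: {query}"]

first = history_aware_retrieve(
    {"input": "How much did food prices rise in BC?", "chat_history": []},
    fake_rephrase, fake_retrieve)
follow_up = history_aware_retrieve(
    {"input": "How does that compare nationally?",
     "chat_history": ["How much did food prices rise in BC?", "They rose 2.6%."]},
    fake_rephrase, fake_retrieve)
print(first)
print(follow_up)
```

The retriever never sees the ambiguous "that", which is why a follow-up question can still pull in relevant documents.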

In [44]:
# Redefine the QA prompt to include the chat history
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}")
    ]
)

In [45]:
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

# statefully manage session history
store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
        
    return store[session_id]

In [46]:
# create chains that include message history
qa_chain = create_stuff_documents_chain(llm, qa_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever, qa_chain)

In [47]:
from langchain_core.runnables.history import RunnableWithMessageHistory

# create final chain that ties everything together

conversation_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key = "input",
    history_messages_key = "chat_history",
    output_messages_key = "answer"
)
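On each `invoke`, RunnableWithMessageHistory looks up the history for the `session_id` in `config`, passes its messages to the chain under `chat_history`, and, once the chain returns, appends the user's `input` and the chain's `answer` to that history. A minimal pure-Python sketch of that bookkeeping, with a hypothetical `MiniHistoryWrapper` class, a hypothetical echo chain, and plain `(role, text)` tuples instead of LangChain message objects. Like `store` above, it lives only in memory and is lost when the kernel restarts.

```python
from collections import defaultdict

class MiniHistoryWrapper:
    """Hypothetical stand-in for RunnableWithMessageHistory's per-session bookkeeping."""

    def __init__(self, chain):
        self.chain = chain
        self.store = defaultdict(list)  # session_id -> list of (role, text) tuples

    def invoke(self, inputs, session_id):
        history = self.store[session_id]
        # Give the chain a copy of the prior turns under "chat_history".
        output = self.chain({**inputs, "chat_history": list(history)})
        # Record this turn so the next call in the same session can see it.
        history.append(("human", inputs["input"]))
        history.append(("ai", output["answer"]))
        return output

def echo_chain(inputs):
    # Hypothetical chain: reports how many prior messages it received.
    return {"answer": f"seen {len(inputs['chat_history'])} prior messages"}

bot = MiniHistoryWrapper(echo_chain)
print(bot.invoke({"input": "Q1"}, session_id="xyz_789")["answer"])  # seen 0 prior messages
print(bot.invoke({"input": "Q2"}, session_id="xyz_789")["answer"])  # seen 2 prior messages
print(bot.invoke({"input": "Q1"}, session_id="abc_123")["answer"])  # seen 0 prior messages
```

Keying the store by session id is what keeps separate conversations from leaking into each other.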

## Let's try it out!

In [48]:
conversation_rag_chain.invoke({"input": "How much did food prices increase in April 2024 in BC compared to April 2023?"}, config = {"configurable": {"session_id": "xyz_789"}})["answer"]

'Food prices in British Columbia increased by 2.6% in April 2024 compared to April 2023.'

In [49]:
conversation_rag_chain.invoke({"input": "How does that compare to the increase in food prices across the nation?"}, config = {"configurable": {"session_id": "xyz_789"}})["answer"]

'Across Canada, food prices increased by 2.3% in April 2024 compared to April 2023. Therefore, the increase in food prices in British Columbia (2.6%) was slightly higher than the national average.'

In [50]:
conversation_rag_chain.invoke({"input": "What contributed to the rising food prices in BC in April 2024?"}, config = {"configurable": {"session_id": "xyz_789"}})["answer"]

'The rising food prices in British Columbia in April 2024 were influenced by several factors:\n\n1. **Beef and Veal Prices**: These saw significant increases, rising by 0.8% in April 2024 and being 7.0% higher than in April 2023 due to tight supplies and strong demand.\n\n2. **Pork Prices**: Wholesale pork prices increased by 2.9% in April 2024 and were 18.3% higher than in April 2023, driven by higher demand after declines in 2022 and 2023.\n\n3. **Poultry Prices**: These grew by 4.4% in April 2024, reversing the trend of declining prices in 2023.\n\n4. **General Food Inflation**: Overall food prices in Canada are predicted to rise by 2.5% to 4.5% in 2024, with specific categories such as meat, bakery items, and vegetables expected to see the biggest cost increases.\n\nThese factors, along with global events and supply chain issues, contributed to the rising food prices in BC in April 2024.'

In [51]:
conversation_rag_chain.invoke({"input": "How did the CPI in April 2024 in BC compare to the national CPI in Canada?"}, config = {"configurable": {"session_id": "xyz_789"}})["answer"]

'In April 2024, the Consumer Price Index (CPI) in British Columbia increased by 2.9% compared to April 2023. Nationally, the CPI in Canada was up by 2.7% over the same period. Thus, the CPI increase in British Columbia was slightly higher than the national average.'