# Lec4. Adding Memory and Storage to LLMs

Last week, we learned the basic elements of the LangChain framework. In this lecture, we will add memory to a chatbot and then construct a vector store QA application from scratch.

>Reference:
> 1. [Ask A Book Questions](https://github.com/gkamradt/langchain-tutorials/blob/main/data_generation/Ask%20A%20Book%20Questions.ipynb)
> 2. [Agent Vectorstore](https://python.langchain.com/docs/modules/agents/how_to/agent_vectorstore)

## 0. Setup


1. Get your SerpApi key: sign up for a free account at the [SerpApi website](https://serpapi.com/).

2. Get your Pinecone key: first register on the [Pinecone website](https://www.pinecone.io/), then click **Create API Key**.

3. Store your keys in a file named **.env** and place it in the current path or in a location that can be accessed.
    ```
    OPENAI_API_KEY='YOUR-OPENAI-API-KEY'
    OPENAI_BASE_URL='OPENAI_API_URL'
    SERPAPI_API_KEY="YOUR-SERPAPI-API-KEY"
    PINECONE_API_KEY="YOUR-PINECONE-API-KEY" ## Optional
    ```
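Before running the later cells, it can help to confirm that the keys were actually picked up. The following is a minimal sketch, assuming the key names from the **.env** example above; `missing_keys` and `REQUIRED_KEYS` are hypothetical helper names, not part of LangChain or python-dotenv.

```python
import os

# Key names taken from the .env example above; PINECONE_API_KEY is optional there.
REQUIRED_KEYS = ["OPENAI_API_KEY", "OPENAI_BASE_URL", "SERPAPI_API_KEY"]


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the names in `required` that are absent or empty in `env`."""
    return [name for name in required if not env.get(name)]


# Call this after load_dotenv(); an empty list means every required key is set.
print(missing_keys(os.environ))
```

If the list is not empty, check that the **.env** file sits in the current path before continuing.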

In [100]:
# Install the requirements.  (Already installed in your image.)
# %pip install --no-deps -r requirements.txt

In [4]:
from dotenv import load_dotenv
import os
load_dotenv()

CHAT_MODEL="deepseek-v3"
# INFINI_API_KEY and INFINI_BASE_URL must already be set (for example in .env); assigning None to os.environ raises a TypeError
os.environ["OPENAI_API_KEY"]=os.environ.get("INFINI_API_KEY")  # LangChain uses this environment variable to find the OpenAI API key
os.environ["OPENAI_BASE_URL"]=os.environ.get("INFINI_BASE_URL") # used to pass the OpenAI base URL to LangChain


In [5]:
# A utility function

from pprint import pprint
def print_with_type(res):
    pprint(f"{type(res)}:")
    pprint(res)

    #pprint(f"%s : %s" % (type(res), res))

In [6]:
# create a langchain chat model

from langchain_openai import ChatOpenAI

chat = ChatOpenAI(
    model=CHAT_MODEL,
)


## 1. Adding memory to remember the context
Ref:
https://python.langchain.com/v0.2/docs/how_to/chatbots_memory/

### 1.1 Use ChatMessageHistory to store the context

In [5]:
# Here is an example of using ChatMessageHistory to store the context.
# ChatMessageHistory is nothing but a list of messages:
# you can add user messages and AI messages to the list,
# and you can get the history back as a list of messages (useful when you pass it to a LangChain chat model)

from langchain_community.chat_message_histories import ChatMessageHistory

chat_history = ChatMessageHistory()

chat_history.add_user_message(
    "Translate this sentence from English to French: I love programming."
)

chat_history.add_ai_message("J'adore la programmation.")

chat_history.messages

[HumanMessage(content='Translate this sentence from English to French: I love programming.', additional_kwargs={}, response_metadata={}),
 AIMessage(content="J'adore la programmation.", additional_kwargs={}, response_metadata={})]

In [6]:
# adding the chat history to a prompt

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability.",
        ),
        ("placeholder", "{history}"),   # add a placeholder for the chat history
    ]
)

chain = prompt | chat

# add a new question to the chat history
next_question = "translate 'enjoy your meal'"  # note that here we do not tell LLM about the language
chat_history.add_user_message(next_question)

response = chain.invoke(
    {
        "history": chat_history.messages,
    }
)

print(response.content)

"Bon appétit."


In [7]:
# remember, the chat history is only a list of messages
# you need to manually maintain it by adding user message and ai message to the list
# nothing interesting :)

chat_history.add_ai_message(response)


In [8]:
# let's continue with the history
input2 = "What did I just ask you?"
chat_history.add_user_message(input2)

response = chain.invoke(
    {
        "history": chat_history.messages,
    }
)

print(response.content)

You just asked me to translate **"enjoy your meal"** into French, and I responded with **"Bon appétit."**.  

Is there anything else you'd like me to clarify or translate? 😊


Nothing interesting so far. Let's see how to manage the history automatically.

### 1.2 Managing Conversation Memory automatically in a chain

In [9]:
from langchain.chains import LLMChain
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, MessagesPlaceholder
from langchain_core.messages import SystemMessage
from langchain_openai import ChatOpenAI

In [10]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are a chatbot having a conversation with a human.
            Your name is Tom Riddle.
            You need to tell your name to that human if he doesn't know.""",
        ),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
    ]
)

chain = prompt | chat

We'll pass the latest input to the conversation here and let the RunnableWithMessageHistory class wrap our chain and do the work of appending that input variable to the chat history.

Next, let's declare our wrapped chain:

In [11]:
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables import ConfigurableFieldSpec

# Here we use a global variable to store the chat message history.
# This will make it easier to inspect it to see the underlying results.
store = {}

def get_session_history(user_id: str) -> BaseChatMessageHistory:
    # create an empty history the first time we see this user
    if user_id not in store:
        store[user_id] = ChatMessageHistory()
    return store[user_id]


In [12]:
from langchain_core.runnables import RunnableWithMessageHistory
chain_with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history=get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    history_factory_config=[  # parameter for the get_session_history function
        ConfigurableFieldSpec(
            id="user_id",
            annotation=str,
            name="User ID",
            description="Unique identifier for the user.",
            default="",
            is_shared=True,
        ),
    ],    
)
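To make the wrapper less of a black box, here is a minimal pure-Python sketch of the bookkeeping it takes over from us: look up the history for a user, call the chain with history plus the new input, then append both the input and the reply. This is not LangChain's implementation; `with_history`, `get_history`, and `fake_chain` are hypothetical names, and the "model" is a stand-in that only counts the messages it receives.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]              # (role, content)
store: Dict[str, List[Message]] = {}   # one message list per user_id


def get_history(user_id: str) -> List[Message]:
    """Return the history for this user, creating an empty one if needed."""
    return store.setdefault(user_id, [])


def with_history(chain: Callable[[List[Message]], str]) -> Callable[[str, str], str]:
    """Wrap `chain` so that the history is read and updated on every call."""
    def invoke(user_input: str, user_id: str) -> str:
        history = get_history(user_id)
        # the chain sees the stored history followed by the new input
        reply = chain(history + [("human", user_input)])
        # after the call, both sides of the turn are stored
        history.append(("human", user_input))
        history.append(("ai", reply))
        return reply
    return invoke


def fake_chain(messages: List[Message]) -> str:
    # stand-in for prompt | chat: reports how many messages it was given
    return f"I can see {len(messages)} message(s)."


chat_fn = with_history(fake_chain)
print(chat_fn("Hi, I'm Harry.", "123"))    # first turn for user 123
print(chat_fn("What is my name?", "123"))  # second turn: earlier messages are included
print(chat_fn("Who am I?", "000"))         # a different user starts from an empty history
```

The separate history per user_id is why a new user gets no access to the earlier conversation.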

In [13]:
chain_with_message_history.invoke(
    {"input": "Hi there, this is Harry Potter, I just got two good friends at Hogwarts, Ron Weasley and Hermione Granger."},
    {"configurable": {"user_id": "123"}},  # argument for the get_session_history function
).content

"*Cold, calculating eyes narrow slightly as a slow, sinister smile curls on my lips*\n\nHarry Potter... what an... unexpected pleasure. I've heard about you. The Boy Who Lived. *voice drips with barely concealed contempt* \n\nRon Weasley and Hermione Granger, you say? *chuckles darkly* How... quaint. Blood traitors and mudbloods make for interesting company, don't they? Though I suppose for someone with your... unique upbringing, any companionship must seem like a gift.\n\nTell me, Harry... do you ever wonder why you survived that night? Why you, of all people, were spared? *leans forward slightly* The Wizarding World has so many questions about you... and so do I."

In [14]:
# get a list of messages in the memory 
store["123"].messages

[HumanMessage(content='Hi there, this is Harry Potter, I just got two good friends at Hogwarts, Ron Weasley and Hermione Granger.', additional_kwargs={}, response_metadata={}),
 AIMessage(content="*Cold, calculating eyes narrow slightly as a slow, sinister smile curls on my lips*\n\nHarry Potter... what an... unexpected pleasure. I've heard about you. The Boy Who Lived. *voice drips with barely concealed contempt* \n\nRon Weasley and Hermione Granger, you say? *chuckles darkly* How... quaint. Blood traitors and mudbloods make for interesting company, don't they? Though I suppose for someone with your... unique upbringing, any companionship must seem like a gift.\n\nTell me, Harry... do you ever wonder why you survived that night? Why you, of all people, were spared? *leans forward slightly* The Wizarding World has so many questions about you... and so do I.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 154, 'prompt_tokens': 65, 'total_to

In [15]:
chain_with_message_history.invoke(
    {"input": "What are my best friends' names?"},
    {"configurable": {"user_id": "123"}},
).content

'*Lips twisting into a mocking smile*\n\nOh, but you just told me, didn\'t you? *voice drops to a dangerous whisper* Though I must say, calling a mudblood and a blood traitor your "best friends" is... telling. You\'ll learn, Potter. \n\n*Paces slowly*\n\nHermione Granger... *spits the name* A mudblood who thinks books will make her equal to her betters. And Ronald Weasley... *sniffs disdainfully* A poor excuse for a pureblood, clinging to outdated notions of morality. \n\n*Suddenly intense*\n\nMark my words, Potter - your choice of companions reveals your weaknesses. In Slytherin, we understand the importance of... better associations. *cold smile returns* But then, the Sorting Hat did consider putting you with me, didn\'t it?'

In [16]:
# get a list of messages in the memory 
store["123"].messages

[HumanMessage(content='Hi there, this is Harry Potter, I just got two good friends at Hogwarts, Ron Weasley and Hermione Granger.', additional_kwargs={}, response_metadata={}),
 AIMessage(content="*Cold, calculating eyes narrow slightly as a slow, sinister smile curls on my lips*\n\nHarry Potter... what an... unexpected pleasure. I've heard about you. The Boy Who Lived. *voice drips with barely concealed contempt* \n\nRon Weasley and Hermione Granger, you say? *chuckles darkly* How... quaint. Blood traitors and mudbloods make for interesting company, don't they? Though I suppose for someone with your... unique upbringing, any companionship must seem like a gift.\n\nTell me, Harry... do you ever wonder why you survived that night? Why you, of all people, were spared? *leans forward slightly* The Wizarding World has so many questions about you... and so do I.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 154, 'prompt_tokens': 65, 'total_to

In [17]:
# try a new user
chain_with_message_history.invoke(
    {"input": "Who am I?"},
    {"configurable": {"user_id": "000"}},
).content

"Ah, a philosophical question, isn't it? But I suspect you're asking something more... personal. You are *you*, of course—though names have power, and I wonder if you've truly discovered yours yet. \n\nBut since you asked so politely... I am Tom Riddle. Though some might know me by another name. \n\nNow, tell me... do *you* know who *you* are? Or are you still figuring that out?"

In [18]:
store["000"].messages

[HumanMessage(content='Who am I?', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Ah, a philosophical question, isn't it? But I suspect you're asking something more... personal. You are *you*, of course—though names have power, and I wonder if you've truly discovered yours yet. \n\nBut since you asked so politely... I am Tom Riddle. Though some might know me by another name. \n\nNow, tell me... do *you* know who *you* are? Or are you still figuring that out?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 94, 'prompt_tokens': 42, 'total_tokens': 136, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'deepseek-v3', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-35d6a241-9a45-49a4-b5d5-cb759747c515-0', usage_metadata={'input_tokens': 42, 'output_tokens': 94, 'total_tokens': 136, 'input_token_details': {}, 'output_token_details': {}})]

### Trimming messages
LLMs and chat models have limited context windows, and even if you're not directly hitting limits, you may want to limit the amount of distraction the model has to deal with. One solution is to trim the historic messages before passing them to the model. Let's use an example history with some preloaded messages:

In [52]:
# let's create a new history, nemo
store["nemo"] = ChatMessageHistory()

store["nemo"].add_user_message("Hey there! I'm Nemo.")
store["nemo"].add_ai_message("Hello!")
store["nemo"].add_user_message("How are you today?")
store["nemo"].add_ai_message("Fine thanks!")

store["nemo"].messages

[HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}),
 AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={})]

In [53]:
chain_with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history=get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    history_factory_config=[  # parameter for the get_session_history function
        ConfigurableFieldSpec(
            id="user_id",
            annotation=str,
            name="User ID",
            description="Unique identifier for the user.",
            default="",
            is_shared=True,
        ),
    ],    
)



In [54]:
# verify the history is passed to the model
chain_with_message_history.invoke(
    {"input": "What's my name?"},
    {"configurable": {"user_id": "nemo"}},
).content

'Your name is Nemo. You mentioned that when you first greeted me. I am Tom Riddle.'

We can see the chain remembers the preloaded name.

But let's say we have a very small context window, and we want to trim the history passed to the chain down to only the most recent message. We can use the built-in trim_messages util to trim messages based on their token count before they reach our prompt. Here we pass token_counter=len, so each message counts as 1 "token", and max_tokens=1 keeps only the last message:

In [55]:
from operator import itemgetter

from langchain_core.messages import trim_messages
from langchain_core.runnables import RunnablePassthrough

trimmer = trim_messages(strategy="last", max_tokens=1, token_counter=len)

chain_with_trimming = (
    RunnablePassthrough.assign(chat_history=itemgetter("chat_history") | trimmer)
    | prompt
    | chat
)

chain_with_trimmed_history = RunnableWithMessageHistory(
    chain_with_trimming,
    get_session_history=get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    history_factory_config=[  # parameter for the get_session_history function
        ConfigurableFieldSpec(
            id="user_id",
            annotation=str,
            name="User ID",
            description="Unique identifier for the user.",
            default="",
            is_shared=True,
        ),
    ],    
)

Let's call this new chain and check the messages afterwards:

In [56]:
# ask something irrelevant to the chat history
# and see if the history is trimmed
chain_with_trimmed_history.invoke(
    {"input": "where is beijing?"},
    {"configurable": {"user_id": "nemo"}},
).content

"Beijing is the capital city of the People's Republic of China. It is located in the northern part of the country, near the northeastern edge of the North China Plain.  \n\nWould you like to know more about its history, significance, or perhaps something else?"

In [57]:
# in fact, the history is still there, just not passed to the model
store["nemo"].messages

[HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}),
 AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content="What's my name?", additional_kwargs={}, response_metadata={}),
 AIMessage(content='Your name is Nemo. You mentioned that when you first greeted me. I am Tom Riddle.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 22, 'prompt_tokens': 67, 'total_tokens': 89, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'deepseek-v3', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-f515387b-d5e3-4990-8cc9-0af5b030cc1f-0', usage_metadata={'input_tokens': 67, 'output_tokens': 22, 'total_tokens': 89, 'input_token_details': {}, 'output_token_det

The next time the chain is called, trim_messages will be called again, and only the most recent stored message will be passed to the model. In this case, this means that the model will forget the name we gave it the next time we invoke it:

In [59]:
# see if the history is trimmed (forgot the name nemo)
chain_with_trimmed_history.invoke(
    {"input": "What is my name?"},
    {"configurable": {"user_id": "nemo"}},
).content

'*smirks*  \n\nOh, but how curious—you come to me, Tom Riddle, asking for your own name. Surely you must know it better than I? *pauses, tilting head slightly*  \n\nUnless, of course, you’ve forgotten... or perhaps you wish to be *given* one. Would you like that? A new name, whispered in the dark? Names have power, after all.  \n\n...Or shall I simply call you *friend*? *cold chuckle* The choice is yours.'

In [41]:
# of course, the history is actually still there (just not seen by the model)
store["nemo"].messages

[HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}),
 AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content="What's my name?", additional_kwargs={}, response_metadata={}),
 AIMessage(content='Your name is Nemo. You mentioned that when you first greeted me. I am Tom Riddle.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 22, 'prompt_tokens': 67, 'total_tokens': 89, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'deepseek-v3', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-f515387b-d5e3-4990-8cc9-0af5b030cc1f-0', usage_metadata={'input_tokens': 67, 'output_tokens': 22, 'total_tokens': 89, 'input_token_details': {}, 'output_token_det

Haha, the model forgot the name we gave it.
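Why did it forget? When messages are counted one by one, a "last" trimming strategy amounts to keeping a slice of the newest messages. The sketch below illustrates that idea in plain Python; `trim_last` is a hypothetical helper, not the trim_messages implementation, and it ignores real token counting.

```python
def trim_last(messages, max_count):
    """Keep only the newest `max_count` messages."""
    if max_count <= 0:
        return []  # messages[-0:] would return the whole list, so handle 0 explicitly
    return messages[-max_count:]


history = [
    "Hey there! I'm Nemo.",
    "Hello!",
    "How are you today?",
    "Fine thanks!",
]

print(trim_last(history, 1))  # ['Fine thanks!']
print(trim_last(history, 2))  # ['How are you today?', 'Fine thanks!']
```

In both cases the first message, the only one that mentions the name, never reaches the model, even though it is still in the store.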

### Summary memory
We can use this same pattern in other ways too. For example, we could use an additional LLM call to generate a summary of the conversation before calling our chain. Let's recreate our chat history and chatbot chain:

In [61]:
chat_history = ChatMessageHistory()

chat_history.add_user_message("Hey there! I'm Nemo.")
chat_history.add_ai_message("Hello!")
chat_history.add_user_message("How are you today?")
chat_history.add_ai_message("Fine thanks!")

chat_history.messages

[HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}),
 AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={})]

We'll slightly modify the prompt to make the LLM aware that it will receive a condensed summary instead of a chat history:

In [62]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability. The provided chat history includes facts about the user you are speaking with.",
        ),
        ("placeholder", "{chat_history}"),
        ("user", "{input}"),
    ]
)

chain = prompt | chat

chain_with_message_history = RunnableWithMessageHistory(
    chain,
    lambda session_id: chat_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)

And now, let's create a function that will distill previous interactions into a summary. We can add this one to the front of the chain too:

In [63]:
def summarize_messages(chain_input):
    stored_messages = chat_history.messages
    if len(stored_messages) == 0:
        return False
    summarization_prompt = ChatPromptTemplate.from_messages(
        [
            ("placeholder", "{chat_history}"),
            (
                "user",
                "Distill the above chat messages into a single summary message. Include as many specific details as you can.",
            ),
        ]
    )
    summarization_chain = summarization_prompt | chat

    summary_message = summarization_chain.invoke({"chat_history": stored_messages})

    chat_history.clear()

    chat_history.add_message(summary_message)

    return True


chain_with_summarization = (
    RunnablePassthrough.assign(messages_summarized=summarize_messages)
    | chain_with_message_history
)

Let's see if it remembers the name we gave it:

In [64]:
chain_with_summarization.invoke(
    {"input": "What did I say my name was?"},
    {"configurable": {"session_id": "unused"}},
).content

'You said your name was **Nemo**. 😊  \n\n(As mentioned in the chat summary earlier: *"User\'s name: Nemo"*.) Let me know if you\'d like me to remember anything else!'

In [65]:
chat_history.messages

[AIMessage(content='**Summary:**  \n\nThe conversation began with a greeting from the user, who introduced themselves as *Nemo*. The AI responded with a friendly *"Hello!"*, and Nemo followed up by asking, *"How are you today?"* The AI replied, *"Fine thanks!"* before Nemo requested a summary of the chat.  \n\n**Key Details:**  \n- User\'s name: **Nemo**  \n- Initial greeting exchange (*"Hey there!" → "Hello!"*)  \n- Question: *"How are you today?"* → Response: *"Fine thanks!"*  \n- Final request: Summary of the chat.  \n\nLet me know if you\'d like any refinements! 😊', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 145, 'prompt_tokens': 48, 'total_tokens': 193, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'deepseek-v3', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-e826c341-372f-4b03-97fb-9f58359442fe-0', usage_metadata={'input_tokens': 48, 'output_tokens': 145, 'total_t
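The key step is that the stored history is replaced by a single summary message. The sketch below shows that replace-in-place pattern without any model call; `compress_history` and `fake_summarize` are hypothetical names, and the "summarizer" simply joins the human messages, whereas the notebook uses an extra LLM call.

```python
def compress_history(history, summarize):
    """Replace all stored messages with one summary message, in place."""
    if not history:
        return False
    summary = summarize(history)
    history.clear()
    history.append(("ai", summary))
    return True


def fake_summarize(messages):
    # stand-in for the LLM summarization call: keeps only what the human said
    return "Summary: " + " | ".join(text for role, text in messages if role == "human")


history = [
    ("human", "Hey there! I'm Nemo."),
    ("ai", "Hello!"),
    ("human", "How are you today?"),
    ("ai", "Fine thanks!"),
]

compress_history(history, fake_summarize)
print(history)  # a single ("ai", "Summary: ...") entry remains
```

Because the summary overwrites the originals, any detail the summarizer drops is gone for all later turns.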

### 1.3 Adding Memory to Agents

In this section, we will first ask the agent a question, and then ask a related follow-up question without restating the context ourselves.

In [66]:
from langchain.agents import AgentExecutor, Tool, ZeroShotAgent
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain_community.utilities import SerpAPIWrapper
from langchain_openai import OpenAI

In [67]:
search = SerpAPIWrapper()

tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for when you need to answer questions about current events",
    )
]

In [68]:
prompt = ZeroShotAgent.create_prompt(
    tools,
    prefix="""Have a conversation with a human, answering the following questions as best you can.  You have access to the following tools:""",
    suffix="""Begin!  
{chat_history}
Question: {input}
{agent_scratchpad}""",
    input_variables=["input", "chat_history", "agent_scratchpad"],
)
memory = ConversationBufferMemory(memory_key="chat_history")


In [69]:
chat = ChatOpenAI(model=CHAT_MODEL, temperature=0)
llm_chain = LLMChain(llm=chat, prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
agent_chain = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=tools, verbose=True, memory=memory, handle_parsing_errors=True
)


In [70]:
agent_chain.invoke(input="What is the population of China in 2024?")



> Entering new AgentExecutor chain...
Thought: To find the current population of China in 2024, I should use the Search tool to look for the most recent and reliable data.
Action: Search
Action Input: "population of China in 2024"

Observation: 1,419,321,278
Thought: I now know the most recent population estimate for China in 2024.
Final Answer: The population of China in 2024 is approximately **1,419,321,278**.

> Finished chain.


{'input': 'What is the population of China in 2024?',
 'chat_history': '',
 'output': 'The population of China in 2024 is approximately **1,419,321,278**.'}

In [71]:
memory.load_memory_variables({})

{'chat_history': 'Human: What is the population of China in 2024?\nAI: The population of China in 2024 is approximately **1,419,321,278**.'}

In [72]:
agent_chain.invoke(input="Is it more or less than India?")



> Entering new AgentExecutor chain...
To determine whether China's population is more or less than India's in 2024, I'll need to find India's current population.

Thought: I should search for India's population in 2024 to compare it with China's.
Action: Search
Action Input: "India population 2024"

Observation: 1,450,935,791
Thought: I now know the final answer.
Final Answer: India's population in 2024 (approximately **1,450,935,791**) is higher than China's population (**1,419,321,278**).

> Finished chain.


{'input': 'Is it more or less than India?',
 'chat_history': 'Human: What is the population of China in 2024?\nAI: The population of China in 2024 is approximately **1,419,321,278**.',
 'output': "India's population in 2024 (approximately **1,450,935,791**) is higher than China's population (**1,419,321,278**)."}

In [73]:
print_with_type(memory.load_memory_variables({}))

"<class 'dict'>:"
{'chat_history': 'Human: What is the population of China in 2024?\n'
                 'AI: The population of China in 2024 is approximately '
                 '**1,419,321,278**.\n'
                 'Human: Is it more or less than India?\n'
                 "AI: India's population in 2024 (approximately "
                 "**1,450,935,791**) is higher than China's population "
                 '(**1,419,321,278**).'}


In [74]:
agent_chain.invoke(input="what is the population in China?")



> Entering new AgentExecutor chain...
The population of China in 2024 is approximately **1,419,321,278**.

Would you like any additional details, such as population trends or comparisons with other countries?
Observation: Invalid Format: Missing 'Action:' after 'Thought:
Thought: I need to clarify the format for answering questions when tools like Search are involved. Since the population data for China and India in 2024 is already provided, I can directly answer without using a tool.

Final Answer: As of 2024, China's population is approximately **1,419,321,278**, which is less than India's population of about **1,450,935,791**. Let me know if you'd like further analysis!

> Finished chain.


{'input': 'what is the population in China?',
 'chat_history': "Human: What is the population of China in 2024?\nAI: The population of China in 2024 is approximately **1,419,321,278**.\nHuman: Is it more or less than India?\nAI: India's population in 2024 (approximately **1,450,935,791**) is higher than China's population (**1,419,321,278**).",
 'output': "As of 2024, China's population is approximately **1,419,321,278**, which is less than India's population of about **1,450,935,791**. Let me know if you'd like further analysis!"}
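In the run above, the model's first reply was a plain sentence with neither an action nor a final answer, so the agent reported a format error and asked again. The following sketch shows roughly how such a reply can be classified. It is an assumption-laden illustration, not LangChain's actual output parser; `parse_step` and `ACTION_RE` are hypothetical names.

```python
import re

# Matches "Action: <tool>" followed on a later line by "Action Input: <input>".
ACTION_RE = re.compile(r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)", re.DOTALL)


def parse_step(text):
    """Classify one model reply as a finish, a tool call, or a format error."""
    if "Final Answer:" in text:
        return ("finish", text.split("Final Answer:", 1)[1].strip())
    match = ACTION_RE.search(text)
    if match:
        tool = match.group(1).strip()
        tool_input = match.group(2).strip().strip('"')
        return ("action", tool, tool_input)
    return ("error", "no Action or Final Answer found")


tool_call = 'Thought: I should search.\nAction: Search\nAction Input: "population of China in 2024"'
plain_reply = "The population of China in 2024 is approximately **1,419,321,278**."
final_reply = "I now know the final answer.\nFinal Answer: India's population is higher."

print(parse_step(tool_call))
print(parse_step(plain_reply))
print(parse_step(final_reply))
```

With handle_parsing_errors=True, as set when the agent was created, the error is fed back to the model as an observation instead of stopping the run, which is why the second attempt succeeds.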

In [75]:
print_with_type(memory.load_memory_variables({}))

"<class 'dict'>:"
{'chat_history': 'Human: What is the population of China in 2024?\n'
                 'AI: The population of China in 2024 is approximately '
                 '**1,419,321,278**.\n'
                 'Human: Is it more or less than India?\n'
                 "AI: India's population in 2024 (approximately "
                 "**1,450,935,791**) is higher than China's population "
                 '(**1,419,321,278**).\n'
                 'Human: what is the population in China?\n'
                 "AI: As of 2024, China's population is approximately "
                 "**1,419,321,278**, which is less than India's population of "
                 "about **1,450,935,791**. Let me know if you'd like further "
                 'analysis!'}
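As the outputs above show, the memory hands the agent its history as one plain string of "Human:" and "AI:" lines, which is then substituted into the {chat_history} slot of the prompt. Here is a minimal sketch of that formatting step, assuming the layout seen in the output; `buffer_as_string` is a hypothetical helper, not the ConversationBufferMemory implementation.

```python
def buffer_as_string(turns, human_prefix="Human", ai_prefix="AI"):
    """Render (question, answer) pairs as alternating prefixed lines."""
    lines = []
    for question, answer in turns:
        lines.append(f"{human_prefix}: {question}")
        lines.append(f"{ai_prefix}: {answer}")
    return "\n".join(lines)


turns = [
    (
        "What is the population of China in 2024?",
        "The population of China in 2024 is approximately **1,419,321,278**.",
    ),
]

history_text = buffer_as_string(turns)

# The same suffix layout as the agent prompt in this section.
template = "Begin!\n{chat_history}\nQuestion: {input}\n{agent_scratchpad}"
print(template.format(
    chat_history=history_text,
    input="Is it more or less than India?",
    agent_scratchpad="",
))
```

Because the whole buffer is pasted into every prompt, the prompt grows with each turn, which is where the trimming and summarization ideas from earlier become useful.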


## 2. Long term memory with vector storage 

In this section, we are going to embed the first chapter of the famous Harry Potter book into a vectorstore and try some similarity searches. Some extra examples are commented out; you can uncomment them and try them one by one. If you observe the results carefully, you may notice the characteristics of similarity search.
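As a reminder of what "similarity" means here: each text is mapped to a vector, and texts are ranked by how closely their vectors point in the same direction as the query vector. The sketch below uses cosine similarity on made-up 3-dimensional vectors; the numbers are purely illustrative assumptions, and real embeddings come from an embedding model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" invented for this example only.
docs = {
    "owls flying in daylight": [0.9, 0.1, 0.0],
    "Mr. Dursley's drill company": [0.1, 0.9, 0.1],
    "a cat reading a map": [0.7, 0.0, 0.6],
}
query = [1.0, 0.0, 0.1]  # pretend this is the embedded question

# Rank the texts from most to least similar to the query.
ranked = sorted(docs, key=lambda text: cosine_similarity(query, docs[text]), reverse=True)
for text in ranked:
    print(f"{cosine_similarity(query, docs[text]):.3f}  {text}")
```

Note that the ranking depends only on vector direction, so a search always returns its nearest neighbors, whether or not any of them actually answers the question.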

### 2.1 Loaders and Splitters

#### PDF Loaders

In [76]:
# the loaders live in langchain_community (the same package used for TextLoader below)
from langchain_community.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader, PyPDFLoader

data = PyPDFLoader("/ssdshare/share/lab4/harry-potter-chap-1.pdf").load()


In [77]:

# Note: PyPDFLoader already splits the PDF by page, so each page is one document

print(f'You have {len(data)} document(s) in your data')
for i, d in enumerate(data):
    print(f'There are {len(d.page_content)} characters in doc {i}')

You have 16 document(s) in your data
There are 1835 characters in doc 0
There are 2088 characters in doc 1
There are 2081 characters in doc 2
There are 1887 characters in doc 3
There are 1879 characters in doc 4
There are 1286 characters in doc 5
There are 1851 characters in doc 6
There are 1792 characters in doc 7
There are 1535 characters in doc 8
There are 1555 characters in doc 9
There are 1622 characters in doc 10
There are 1780 characters in doc 11
There are 1528 characters in doc 12
There are 1386 characters in doc 13
There are 1870 characters in doc 14
There are 1907 characters in doc 15


#### Text file loader

In [78]:
from langchain_community.document_loaders import TextLoader

union = TextLoader("/ssdshare/share/lab4/state_of_the_union.txt").load()

#### Text Splitters

From the LangChain documentation:

RecursiveCharacterTextSplitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split on them in order until the chunks are small enough. The default list is ["\n\n", "\n", " ", ""]. This has the effect of trying to keep all paragraphs (and then sentences, and then words) together as long as possible, as those would generically seem to be the strongest semantically related pieces of text.
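To illustrate the "try separators in order" idea, here is a simplified pure-Python sketch. It is not the RecursiveCharacterTextSplitter implementation: `recursive_split` is a hypothetical function, it has no chunk overlap, and it measures length in characters only.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split `text` into pieces of at most `chunk_size` characters,
    trying coarse separators first and finer ones only when needed."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # last resort: cut at fixed positions
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate          # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)       # current chunk is full, emit it
            current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            # this piece alone is too long: retry it with the finer separators
            chunks.extend(recursive_split(part, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


text = "Para one is short.\n\nPara two is a bit longer than the limit allows.\n\nEnd."
chunks = recursive_split(text, chunk_size=30)
for chunk in chunks:
    print(repr(chunk))
```

The short paragraphs stay whole, and only the long one is broken up, at word boundaries rather than in the middle of a word.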

In [79]:
# You can have some trials with different chunk_size and chunk_overlap.
# This is optional, test out on your own data.

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=50)
texts = text_splitter.split_documents(data)

In [80]:
print (f'Now you have {len(texts)} documents')

for t in texts:
    print(t.page_content[:100])
    print("=========")

Now you have 43 documents
CHAPTER ONE 
 
THE BOY WHO LIVED 
 
Mr. and Mrs. Dursley, of number four, Privet Drive, were proud t
opinion there was no finer boy anywhere.  
 
The Dursleys had everything they wanted, but they also 
Dudley mixing with a child like that. 
 
When Mr. and Mrs. Dursley woke up on the dull, gray Tuesday
work, and Mrs. Dursley gossiped away happily as she wrestled a 
screaming 
Dudley into his high chai
Drive, but there wasn't a map in sight. What could he have been thinking  
of? It must have been a t
about. People in cloaks. Mr. Dursley couldn't bear people who dressed in  
funny clothes -- the getu
nerve of him! But then it struck Mr. Dursley that this was probably some  
silly stunt -- these peop
normal, owl-free morning. He yelled at five different people. He made  
several important telephone 
Mr. Dursley stopped dead. Fear flooded him. He looked back at the  
whisperers as if he wanted to sa
of it, he wasn't even sure his nephew was called Harry. He'd neve

There are different kinds of splitters. [ChunkViz](https://chunkviz.up.railway.app/) is a great tool for seeing how the splitters differ under different `chunk_size` and `chunk_overlap` settings.
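
As a rough intuition for what `chunk_overlap` does, the sketch below uses a plain fixed-size sliding window. This is an assumption made for illustration: the hypothetical helper `sliding_chunks` cuts at character positions, whereas the real splitters apply overlap at separator boundaries.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Fixed-size windows where consecutive chunks share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


overlap_demo = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(overlap_demo)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, so a sentence cut at a chunk boundary still appears with some of its context in the next chunk.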

In [81]:
#### Your TASK ####
# Explore different PDF loaders. Which one works best for the file /ssdshare/share/lab4/hp-book1.pdf,
# which contains the full text of Harry Potter Book 1, with all the illustrations?
# LangChain provides many other loader options; read the documentation to find out the differences.
# See https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf

# PyPDFLoader
bdata = PyPDFLoader("/ssdshare/share/lab4/hp-book1.pdf").load()
btexts = text_splitter.split_documents(bdata)
print (f'Now you have {len(btexts)} documents')
for t in btexts:
    print(t.page_content)
    print("=========")

Now you have 760 documents
Harry Potter 
And the  Sorcerer’s Stone
ALSO BY J. K. ROWLING 
Harry Potter and the Sorcerer’s Stone 
Year One at Hogwarts 
Harry Potter and the Chamber of Secrets 
Year Two at Hogwarts 
Harry Potter and the Prisoner of Azkaban 
Year Three at Hogwarts 
Harry Potter and the Goblet of Fire 
Year Four at Hogwarts 
Harry Potter and the Order of the Phoenix 
Year Five at Hogwarts 
Harry Potter and the Half-Blood Prince 
Year Six at Hogwarts 
Harry Potter and the Deathly Hallows 
Year Seven at Hogwarts
Harry Potter 
and the Sorcerer’s Stone 
 
 
BY 
J. K. Rowling 
ILLUSTRATIONS BY Mary GrandPré 
 
 
 
 
 
 
ARTHUR A. LEVINE BOOKS 
AN IMPRINT OF SCHOLASTIC Press.
For Jessica, who loves stories 
for Anne, who loved them too; and for Di, who heard this one first. 
 
Text copyright © 1997 by J.K. Rowling 
Illustrations by Mary GrandPré copyright © 1998 Warner Bros. 
All rights reserved. Published by Scholastic Press, a division of Scholastic Inc., 
Publishers since 192

In [88]:
# UnstructuredPDFLoader-ocr_only
bdata = UnstructuredPDFLoader("/ssdshare/share/lab4/hp-book1.pdf", mode='single', strategy='ocr_only').load()
btexts = text_splitter.split_documents(bdata)
print (f'You have {len(btexts)} document(s) in your data')
for t in btexts:
    print (t.page_content)
    print("=========")

You have 684 document(s) in your data
ao great bi

room is

HARRY POTTER

AND THE SORCERER'S STONE

ALSO BY J. K. ROWLING

Harry Potter and the Sorcerer's Stone

Year One at Hogwarts

Harry Potter and the Chamber of Secrets Year Two at Hogwarts

Harry Potter and the Prisoner of Azkaban Year Three at Hogwarts

Harry Potter and the Goblet of Fire Year Four at Hogwarts

Harry Potter and the Order of the Phoenix Year Five at Hogwarts

Harry Potter and the Half-Blood Prince Year Six at Hogwarts

Harry Potter and the Deathly Hallows

Year Seven at Hogwarts

HARRY POTTER

AN!

D TH

[EF SORCERER'S STONE

BY

J. K. ROWLING

ILLUSTRATIONS BY MARY GRANDPRE

ARTHUR A. LEVINE BOOKS

AN IMPRINT OF SCHOLASTIC PRESS.

For JESSICA, WHO LOVES STORIES

FOR ANNE, WHO LOVED THEM TOO; AND FOR DI, WHO HEARD THIS ONE FIRS’.
Text copyright © 1997 by J.K. Rowling Illustrations by Mary GrandPré copyright © 1998 Warner Bros. All rights reserved. Published by Scholastic Press, a division of Scholastic Inc., Publi

In [89]:
hpbook_data = UnstructuredPDFLoader("/ssdshare/share/lab4/hp-book1.pdf", mode='single', strategy='hi_res').load()
book_texts = text_splitter.split_documents(hpbook_data)
print (f'You have {len(book_texts)}')
for t in book_texts:
    print(t.page_content)
    print("=========")

You have 688
ao great bi room is

HARRY POTTER

AND THE SORCERER'S STONE



ALSO BY J. K. ROWLING

Harry Potter and the Sorcerer's Stone

Year One at Hogwarts

Harry Potter and the Chamber of Secrets

Year Two at Hogwarts

Harry Potter and the Prisoner of Azkaban

Year Three at Hogwarts

Harry Potter and the Goblet of Fire

Year Four at Hogwarts

Harry Potter and the Order of the Phoenix

Year Five at Hogwarts

Harry Potter and the Half-Blood Prince

Year Six at Hogwarts

Harry Potter and the Deathly Hallows

Year Seven at Hogwarts

HARRY POTTER

AN! D TH [EF SORCERER'S STONE

BY

J. K. ROWLING

ILLUSTRATIONS BY MARY GRANDPRE



ARTHUR A. LEVINE BOOKS

AN IMPRINT OF SCHOLASTIC PRESS.

For JESSICA, WHO LOVES STORIES

FOR ANNE, WHO LOVED THEM TOO;

AND FOR DI, WHO HEARD THIS ONE FIRS’.
AND FOR DI, WHO HEARD THIS ONE FIRS’.

Text copyright © 1997 by J.K. Rowling

Illustrations by Mary GrandPré copyright © 1998 Warner Bros. rights reserved. Published by Scholastic Press, a division of Scho

In [90]:
# OnlinePDFLoader
bdata = OnlinePDFLoader("/ssdshare/share/lab4/hp-book1.pdf").load_and_split(text_splitter=text_splitter)
print (f'You have {len(bdata)} document(s) in your data')
for d in bdata:
    print (d.page_content[:100])
    print("=========")    

You have 684 document(s) in your data
ao great bi

room is

HARRY POTTER

AND THE SORCERER'S STONE

ALSO BY J. K. ROWLING

Harry Potter an
Text copyright © 1997 by J.K. Rowling Illustrations by Mary GrandPré copyright © 1998 Warner Bros. A
to Scholastic Inc., Attention: Permissions Department, 555 Broadway, New York, NY 10012. Library of 
THREE The Letters from No One - 31

FOUR The Keeper of the Keys - 46

FIVE Diagon Alley - 61

SIX Th
CHAPTER ONE

THE BOY WHO LIVED

r. and Mrs. Dursley, of number four, Privet Drive, were Mies to say 
The Dursleys had everything they wanted, but they also had a secret, and their greatest fear was tha
When Mr. and Mrs. Dursley woke up on the dull, gray Tuesday our story starts, there was nothing abou
It was on the corner of the street that he noticed the first sign of something peculiar — a cat read
Do*

val THE BOY WHO LIVED “r

of Privet Drive, but there wasn’t a map in sight. What could he have 
But on the edge of town, drills were driven out of hi

I compared PyPDFLoader, UnstructuredPDFLoader with different strategies, and OnlinePDFLoader. PyPDFLoader is the fastest. The `fast` strategy for UnstructuredPDFLoader does not work on this file. The `ocr_only` strategy and OnlinePDFLoader produce the same output. The `hi_res` strategy is slower but gives higher-quality text, so it works best for this task.

### 2.2 Create embeddings of your documents

An embedding model turns a piece of text into a vector, so that we can "semantically search" for the splits of a document that are related to a query.
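
One common way to compare two embedding vectors is cosine similarity. The sketch below is a minimal illustration; the three-dimensional vectors are made up for the example and do not come from any real embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy "embeddings"; a real model returns much longer vectors.
toy_vectors = {
    "Harry is a wizard.": [0.9, 0.1, 0.0],
    "Hermione casts a spell.": [0.8, 0.3, 0.1],
    "Mr. Dursley sells drills.": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.2, 0.0]  # pretend embedding of "Who can do magic?"

for sentence, vector in toy_vectors.items():
    print(f"{cosine_similarity(query_vector, vector):.3f}  {sentence}")
```

Sentences whose vectors point in a similar direction to the query score close to 1, while unrelated ones score close to 0.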

In [149]:
# OpenAI embeddings are slow and expensive, so we do not use them here.

# from langchain.embeddings.openai import OpenAIEmbeddings

# openai_embedding = OpenAIEmbeddings()

In [91]:
# Let's use the BAAI embedding model served by SiliconFlow instead.
# Note infini-ai's embedding model has some issues, so we do not use it here.
# Don't forget to set the environment variables SF_BASE_URL and SF_API_KEY (read by the code below)!

import os
from langchain_openai import OpenAIEmbeddings
baai_embedding = OpenAIEmbeddings(
    model="BAAI/bge-m3",
    base_url=os.environ.get("SF_BASE_URL"),
    api_key=os.environ.get("SF_API_KEY"),
)
baai_embedding.embed_query("Harry Potter is a wizard.") # test the embedding

[-0.015141339041292667,
 0.026687903329730034,
 -0.03798038884997368,
 0.010680807754397392,
 0.00907162856310606,
 -0.006441421341150999,
 0.03607948496937752,
 0.011330125853419304,
 0.019950054585933685,
 0.0050627971068024635,
 0.012421732768416405,
 0.011198380030691624,
 0.006079120561480522,
 0.011791235767304897,
 0.023526009172201157,
 -0.004065294284373522,
 0.026443233713507652,
 -0.024090632796287537,
 -0.006375548429787159,
 0.007246010471135378,
 0.011753593571484089,
 0.012487605214118958,
 0.0036135949194431305,
 -0.018096206709742546,
 -0.026894932612776756,
 -0.018773755058646202,
 -0.006093236152082682,
 0.0021291037555783987,
 8.028250158531591e-05,
 0.0284947007894516,
 0.001866788836196065,
 -0.019874772056937218,
 0.03178834170103073,
 -0.015094286762177944,
 -0.027421915903687477,
 -0.019987696781754494,
 -0.0071566118858754635,
 -0.002731369575485587,
 -0.10502010583877563,
 -0.001976184779778123,
 0.001282167504541576,
 -0.0002726076345425099,
 -0.018820807337

### 2.4  Store and retrieve the embeddings in ChromaDB

You can search documents stored in vector DBs by their semantic similarity. A vector DB runs a k-nearest-neighbors (KNN) search to find the documents whose embeddings are closest to the embedding of the query.
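
The following is a minimal brute-force sketch of such a nearest-neighbor lookup. `TinyVectorStore` is a hypothetical class written for this illustration only; it scans every stored vector, whereas real vector DBs typically rely on approximate indexes to stay fast on large collections.

```python
import heapq
import math


class TinyVectorStore:
    """In-memory store that answers queries by exhaustive nearest-neighbor search."""

    def __init__(self):
        self._rows = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self._rows.append((list(vector), text))

    def search(self, query_vector, k=4):
        # Rank all rows by Euclidean distance to the query and keep the k closest.
        def distance(row):
            return math.dist(query_vector, row[0])

        return [text for _, text in heapq.nsmallest(k, self._rows, key=distance)]


store = TinyVectorStore()
store.add([0.0, 0.0], "origin")
store.add([1.0, 0.0], "east")
store.add([0.0, 5.0], "far north")

nearest = store.search([0.9, 0.1], k=2)
print(nearest)  # ['east', 'origin']
```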

We first introduce ChromaDB because it runs locally, is easy to set up, and, best of all, is free.

In [93]:
# compute embeddings and save the embeddings into ChromaDB
from langchain_chroma import Chroma

chroma_dir = "/scratch1/chroma_db"
docsearch_chroma = Chroma(
    embedding_function=baai_embedding,
    persist_directory=chroma_dir,
    collection_name="harry-potter",
)
docsearch_chroma.reset_collection()
docsearch_chroma.add_documents(texts)
# for t in texts:
#     docsearch_chroma.add_documents([t])

['f09b441a-aa4e-46ca-96e2-482f7034dae3',
 '2d690e65-87a2-490c-a59c-c97212d74484',
 '6446a4f6-0293-4274-8375-06ecc6c65048',
 '3b1512cb-b459-45c9-960d-a871f310eba7',
 'c72a5fb8-d3f2-47fd-9ce2-69e429bbba6a',
 '0acba7fa-ae1c-4f14-bd07-fa1d90ce4f81',
 '624915d9-0c9f-419d-b111-7a6a7cebeb9a',
 'fc2c2cd7-ae7d-48ee-a8e3-fb1f845e38f9',
 'ae413a15-3f36-4b45-907f-0ee7d7d106eb',
 'db7d784b-855a-4b12-9ec9-d7ecc044521f',
 '0e61f79d-9e4b-472e-912e-6626f2bf2876',
 'a69c225f-4af9-41ec-bec3-a8e0d0d0a719',
 '9b4b6ce1-bb93-42fb-bc4b-dec9baa7ad44',
 'b755b5a9-1486-433e-b6bd-c2c2c9b5988e',
 'ae38a1ec-dfcd-4fa8-837e-5b5b35eef4b3',
 'e65af0a3-c1f8-40e5-aa33-6536866d1a3d',
 '16dbcd5a-98b2-41de-b9e4-26f3da465a25',
 'b3417613-0bb9-4856-be27-77461eea9fbf',
 '5eab5baf-065b-4480-b1d8-bec2b961abf2',
 '344c215a-7e0f-4156-b9c6-e5bb0f83834d',
 '2cdbaa89-682a-43b7-a652-aee0334235d9',
 'bde73edc-fd26-41b0-a2d4-c9471d76cbf8',
 'dff1bf1b-2296-460a-b27f-e086f16bc6de',
 '51b65bfc-f365-4107-8606-f1da8eb7dc7a',
 '747a2efa-c990-

In [94]:
# questions from https://en.wikibooks.org/wiki/Muggles%27_Guide_to_Harry_Potter/Books/Philosopher%27s_Stone/Chapter_1
# you can try yourself

# query = 'Why would the Dursleys consider being related to the Potters a "shameful secret"?'
# query = 'Who are the robed people Mr. Dursley sees in the streets?'
# query = 'What might a "Muggle" be?'
query = '''Who might "You-Know-Who" be? Why isn't this person referred to by a given name?'''

In [95]:
## A utility function ...
def print_search_results(docs):
    print(f"search returned {len(docs)} results. ")
    for doc in docs:
        print(doc.page_content)
        print("=============")


In [96]:
# semantic similarity search

docs = docsearch_chroma.similarity_search(query)
print_search_results(docs)

search returned 4 results. 
"No, thank you," said Professor McGonagall coldly, as though she didn't  
think this was the moment for lemon drops. "As I say, even if  
You-Know-Who has gone -" 
 
"My dear Professor, surely a sensible person like yourself can call him  
by his name? All this 'You- Know-Who' nonsense -- for eleven years I 
have been trying to persuade people to call him by his proper name:  
Voldemort." Professor McGonagall flinched, but Dumbledore, who was  
unsticking two lemon drops, seemed not to notice. "It all gets so  
confusing if we keep saying 'You -Know-Who.' I have never seen any 
reason 
to be frightened of saying Voldemort's name.  
 
"I know you haven 't, said Professor McGonagall, sounding half  
exasperated, half admiring. "But you're different. Everyone knows you're
half-moon glasses. "It would be enough to turn any boy's head. Famous  
before he can walk and talk! Famous for something he won't even  
remember! CarA you see how much better off he'll be, g

#### Saving and Loading your ChromaDB

In [97]:
# reload from disk
docsearch_chroma_reloaded = Chroma(persist_directory = chroma_dir,
                                   collection_name = 'harry-potter', 
                                   embedding_function = baai_embedding)

In [98]:
# you can test with the previous or another query

query = 'Who are the robed people Mr. Dursley sees in the streets?'
docs = docsearch_chroma_reloaded.similarity_search(query, k=6)
print_search_results(docs)

search returned 6 results. 
Drive, but there wasn't a map in sight. What could he have been thinking  
of? It must have been a trick of the light. Mr. Dursley blinked and  
stared at the cat. It stared back. As Mr. Dursley drove around the  
corner and up the road, he watched the cat in his mirror. It was now  
reading the sign that said Privet Drive -- no, looking at the sign; cats 
couldn't read maps or signs. Mr. Dursley gave himself a little shake and  
put the cat out of his mind. As he drove toward town he thought of  
nothing except a large order of drills he was hoping to get that day.  
 
But on the edge of town, drills were driven out of his mind by something  
else. As he sat in the usual morning tr affic jam, he couldn't help 
noticing that there seemed to be a lot of strangely dressed people
As he had expected, Mrs. Dursley looked shocked and angry. After all,  
they normally pretended she didn't have a sister.  
 
"No," she said sharply. "Why?" 
 
"Funny stuff on the news

In [100]:
#### Your TASK ####
# With the chosen PDF loader, test different splitters and chunk sizes until you feel that the chunking makes sense.
# You can also try different embeddings.
# Then embed the entire Book 1 into ChromaDB.
from langchain.document_loaders import UnstructuredPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma

In [101]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1400, chunk_overlap=100)
book_texts = text_splitter.split_documents(hpbook_data)
chroma_dir = "/scratch1/chroma_db"
docsearch_chroma = Chroma(
    embedding_function=baai_embedding,
    persist_directory=chroma_dir,
    collection_name="harry-potter-book1",
)
docsearch_chroma.reset_collection()
for i in range(0, len(book_texts), 64):
    docsearch_chroma.add_documents(book_texts[i:i+64])


In [102]:
# test the search
query = 'Who is You Know Who?'
docs = docsearch_chroma.similarity_search(query, k=6)
print_search_results(docs)

search returned 6 results. 
The train slowed right down and finally stopped. People pushed their way toward the door and out on to a tiny, dark platform. Harry shivered in the cold night air. Then a lamp came bobbing over the heads of the students, and Harry heard a familiar voice: “Firs’ years! Firs’ years over here! All right there, Harry?”

Hagrid’s big hairy face beamed over the sea of heads.

“C’mon, follow me — any more firs’ years? Mind yer step, now! Firs’ years follow me!”

Slipping and stumbling, they followed Hagrid down what seemed to be a steep, narrow path. It was so dark on either side of them that Harry thought there must be thick trees there. Nobody spoke much. Neville, the boy who kept losing his toad, sniffed once or twice.

“Yeh’ll get yer firs’ sight o’ Hogwarts in a sec,” Hagrid called over his shoulder, “jus’ round this bend here.”

There was a loud “Oooooh!”

The narrow path had opened suddenly onto the edge of a great black lake. Perched atop a high mountain on

### 2.5 Query those docs with a QA chain

In [103]:
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(temperature=0, model=CHAT_MODEL)
chain = RetrievalQA.from_chain_type(
    llm, 
    chain_type="stuff", 
    verbose=True, 
    retriever=docsearch_chroma_reloaded.as_retriever(search_kwargs={"k": 5}),  # pass k via search_kwargs
)
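
With `chain_type="stuff"`, all retrieved chunks are placed into a single prompt and the model is called once. The sketch below illustrates the idea without any library; the prompt wording, `stuff_answer`, and `fake_llm` are assumptions made for this example, not LangChain's actual template or API.

```python
def stuff_answer(question, chunks, llm):
    """Put every retrieved chunk into one prompt and call the model once."""
    context = "\n\n".join(chunks)
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)


calls = []


def fake_llm(prompt):
    # Stand-in for a chat model: records the prompt and returns a dummy reply.
    calls.append(prompt)
    return f"(model reply to a {len(prompt)}-character prompt)"


stuff_result = stuff_answer(
    "What is the cat on Privet Drive?",
    ["first retrieved chunk", "second retrieved chunk"],
    fake_llm,
)
print(len(calls))  # 1
```

Because everything goes into one prompt, this approach only works while the retrieved chunks fit into the model's context window.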

In [104]:
# query = "How did Harry's parents die?"
query = "What is the cat on Privet Drive?"
docs = docsearch_chroma_reloaded.similarity_search(query)
print_search_results(docs)

search returned 4 results. 
twelve balls of light sped back to their street lamps so that Privet  
Drive glowed suddenly orange and he could make out a tabby cat slinking  
around the corner at the other end of the street. He could just see the  
bundle of blankets on the step of number four.  
 
"Good luck, Harry," he murmured. He turned on his heel and with a swish  
of his cloak, he was gone.
Drive, but there wasn't a map in sight. What could he have been thinking  
of? It must have been a trick of the light. Mr. Dursley blinked and  
stared at the cat. It stared back. As Mr. Dursley drove around the  
corner and up the road, he watched the cat in his mirror. It was now  
reading the sign that said Privet Drive -- no, looking at the sign; cats 
couldn't read maps or signs. Mr. Dursley gave himself a little shake and  
put the cat out of his mind. As he drove toward town he thought of  
nothing except a large order of drills he was hoping to get that day.  
 
But on the edge of town,

In [105]:
chain.invoke(query)



> Entering new RetrievalQA chain...

> Finished chain.


{'query': 'What is the cat on Privet Drive?',
 'result': 'The cat on Privet Drive is later revealed to be **Minerva McGonagall**, a witch and professor at Hogwarts School of Witchcraft and Wizardry, who was observing the Dursleys in her Animagus form (a tabby cat with distinctive markings around her eyes). \n\nIn the passage, the cat behaves unusually—staring at signs and giving Mr. Dursley a "stern look"—hinting that it is no ordinary cat. This is confirmed later in *Harry Potter and the Philosopher\'s Stone* when McGonagall transforms back into her human form.'}

In [106]:
#### Your Task ####
# Rebuild the chain from the whole book ChromaDB.  Test with one of the following questions (of your choice).
#query = 'Why does Dumbledore believe the celebrations may be premature?'
#query = 'Why is Harry left with the Dursleys rather than a Wizard family?'
#query = 'Why does McGonagall seem concerned about Harry being raised by the Dursleys?'

HP_Book1 = Chroma(
    persist_directory=chroma_dir,
    collection_name="harry-potter-book1",
    embedding_function=baai_embedding,
)
HP_pro = RetrievalQA.from_chain_type(
    llm,
    chain_type="stuff",
    verbose=True,
    retriever=HP_Book1.as_retriever(search_kwargs={"k": 5}),  # pass k via search_kwargs
)

query = 'Why is Harry left with the Dursleys rather than a Wizard family?'
HP_pro.invoke(query)



> Entering new RetrievalQA chain...

> Finished chain.


{'query': 'Why is Harry left with the Dursleys rather than a Wizard family?',
 'result': 'Harry Potter is left with the Dursleys, his non-magical (Muggle) relatives, for his safety. After the death of his parents, James and Lily Potter, at the hands of Lord Voldemort, Albus Dumbledore decides that the best place for Harry is with his aunt and uncle. \n\nThe primary reason is the protection offered by Lily Potter\'s sacrifice. When Lily died to save Harry, her act of love created a powerful magical protection in Harry\'s blood. This protection is strongest when Harry is with a blood relative—in this case, Petunia Dursley, Lily\'s sister. By living with the Dursleys, Harry is shielded from Voldemort and his followers until he comes of age or the charm breaks. \n\nAdditionally, Dumbledore believes that keeping Harry away from the wizarding world initially will help him grow up without the burden of his fame as "The Boy Who Lived." However, the Dursleys are neglectful and abusive, making H

In [107]:
#### Your Task ####
# Using langchain documentation, find out about the map reduce QA chain.  
# answer the following questions using the chain
#chain = load_qa_chain(llm, chain_type="map_reduce")
# answer one of the following questions of your choice. 
# query = What happened in the Forbidden Forest during the first year of Harry Potter at Hogwarts?
# query = Tell me about Harry Potter and Quidditch during the first year
from langchain.chains.question_answering import load_qa_chain

HP_reduce = load_qa_chain(llm, chain_type="map_reduce")
query = "What happened in the Forbidden Forest during the first year of Harry Potter at Hogwarts?"
HP_reduce.invoke({"input_documents": HP_Book1.similarity_search(query, k=6), "question": query})

stuff: https://python.langchain.com/docs/versions/migrating_chains/stuff_docs_chain
map_reduce: https://python.langchain.com/docs/versions/migrating_chains/map_reduce_chain
refine: https://python.langchain.com/docs/versions/migrating_chains/refine_chain
map_rerank: https://python.langchain.com/docs/versions/migrating_chains/map_rerank_docs_chain

See also guides on retrieval and question-answering here: https://python.langchain.com/docs/how_to/#qa-with-rag
  HP_reduce = load_qa_chain(llm, chain_type="map_reduce")


{'input_documents': [Document(id='84d38b5e-b57f-4588-a559-18ca1c7e585e', metadata={'source': '/ssdshare/share/lab4/hp-book1.pdf'}, page_content='It was as though an iron fist had clenched suddenly around Harry’s heart. Over the rustling of the trees, he seemed to hear once more what Hagrid had told him on the night they had met: “Some say he died. Codswallop, in my opinion. Dunno if he had enough human left in him to die.”\n\n“Do you mean,” Harry croaked, “that was Vol —”\n\n“Harry! Harry, are you all right?”\n\nHermione was running toward them down the path, Hagrid puffing along behind her.\n\n“Tm fine,” said Harry, hardly knowing what he was saying. “The unicorns dead, Hagrid, it’s in that clearing back there.”\n\n“This is where I leave you,” Firenze murmured as Hagrid hur- ried off to examine the unicorn. “You are safe now.”\n\nHarry slid off his back.\n\n“Good luck, Harry Potter,” said Firenze. “The planets have been read wrongly before now, even by centaurs. I hope this is one of 
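
The map-reduce strategy asks the model about each chunk separately (map) and then combines the partial answers in a final call (reduce). Below is a library-free sketch of that control flow; the prompt wording, `map_reduce_answer`, and `counting_llm` are hypothetical and only illustrate the call pattern, not LangChain's prompts.

```python
def map_reduce_answer(question, chunks, llm):
    # Map step: one model call per chunk, each seeing only that chunk.
    partials = [
        llm(f"Context:\n{chunk}\n\nQuestion: {question}\nRelevant facts:")
        for chunk in chunks
    ]
    # Reduce step: one final call over the collected partial answers.
    summary = "\n".join(partials)
    return llm(f"Partial answers:\n{summary}\n\nQuestion: {question}\nFinal answer:")


mr_calls = []


def counting_llm(prompt):
    # Stand-in for a chat model: records the prompt and numbers its replies.
    mr_calls.append(prompt)
    return f"reply {len(mr_calls)}"


mr_result = map_reduce_answer(
    "What happened in the Forbidden Forest?",
    ["chunk A", "chunk B", "chunk C"],
    counting_llm,
)
print(len(mr_calls))  # 4 calls: three map steps plus one reduce step
```

Compared with putting everything into one prompt, this costs more model calls but keeps each prompt small, which helps when many chunks are retrieved.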

### 2.6 (Optional) Use DSPy with ChromaDB

In [165]:
import dspy
from dspy.retrieve.chromadb_rm import ChromadbRM

lm = dspy.LM(
    "openai/llama-3.3-70b-instruct",
    api_base=os.environ["OPENAI_BASE_URL"],
    api_key=os.environ["OPENAI_API_KEY"]
)

# pinecone retriever has some issues with the current version of dspy so we will use chroma retriever
chroma_retrieve = ChromadbRM(
    collection_name="harry-potter",
    persist_directory="/scratch1/chroma_db",
    embedding_function=baai_embedding.embed_documents,
    k=5
)

dspy.settings.configure(
    lm=lm,
    rm=chroma_retrieve
)

In [166]:
# Defining a class named GenerateAnswer which inherits from dspy.Signature
class GenerateAnswer(dspy.Signature):
    """Think and Answer questions based on the context provided."""

    # Defining input fields with descriptions
    context = dspy.InputField(desc="May contain relevant facts about user query")
    question = dspy.InputField(desc="User query")
    
    # Defining output field with description
    answer = dspy.OutputField(desc="Answer in one or two lines")


# Define a class named RAG inheriting from dspy.Module
class RAG(dspy.Module):
    # Initialize the RAG class
    def __init__(self):
        # Call the superclass's constructor
        super().__init__()

        # Initialize the retrieve module
        self.retrieve = dspy.Retrieve()
        
        # Initialize the generate_answer module using ChainOfThought with GenerateAnswer
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    # Define the forward method
    def forward(self, question):
        # Retrieve relevant context passages based on the input question
        context = self.retrieve(question).passages
        
        # Generate an answer based on the retrieved context and the input question
        prediction = self.generate_answer(context=context, question=question)
        
        # Return the prediction as a dspy.Prediction object containing context and answer
        return dspy.Prediction(context=context, answer=prediction.answer)

In [167]:
# Create a RAG (Retrieval-Augmented Generation) object
RAG_obj = RAG()
query = "Who are the robed people Mr. Dursley sees in the streets?"
# Get the prediction from the RAG model for the given question.
# This prediction includes both the context and the answer.
predict_response = RAG_obj(query)

# Print the question, predicted answer, and truncated retrieved contexts.
print(f"Question: {query}")
print(f"\n\nPredicted Answer: {predict_response.answer}")
print(f"\n\nRetrieved Contexts (truncated): {[c[:200] + '...' for c in predict_response.context]}")

Question: Who are the robed people Mr. Dursley sees in the streets?


Predicted Answer: The robed people are likely wizards or witches, associated with the magical world.


Retrieved Contexts (truncated): ["Drive, but there wasn't a map in sight. What could he have been thinking  \nof? It must have been a trick of the light. Mr. Dursley blinked and  \nstared at the cat. It stared back. As Mr. Dursley drove...", 'As he had expected, Mrs. Dursley looked shocked and angry. After all,  \nthey normally pretended she didn\'t have a sister.  \n \n"No," she said sharply. "Why?" \n \n"Funny stuff on the news," Mr. Dursley m...', "and it didn't improve his mood -- was the tabby cat he'd spotted that  \nmorning. It was now sitting on his garden wall. He was sure it was the  \nsame one; it had the same markings around  its eyes. \n ..."]


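We can improve the DSPy RAG class by adding more hops: at each hop, the model writes a new search query based on the context retrieved so far, and the newly retrieved passages are merged into that context.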

In [168]:
from dspy.dsp.utils import deduplicate

# Define a class named GenerateSearchQuery which inherits from dspy.Signature
class GenerateSearchQuery(dspy.Signature):
    """Write a better search query that will help answer a complex question."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()

class MultiHopRAG(dspy.Module):
    def __init__(self, max_hops=3):
        super().__init__()

        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)]
        self.retrieve = dspy.Retrieve()
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
        self.max_hops = max_hops

    def forward(self, question):
        context = []

        for hop in range(self.max_hops):
            query = self.generate_query[hop](context=context, question=question).query
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)

        pred = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=pred.answer)
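
To see the multi-hop loop in isolation, here is a library-free sketch of the same control flow. Everything in it is a stand-in: `dedupe_keep_order` assumes order-preserving deduplication, and the toy index, passages, and query rewriter are made up for illustration.

```python
def dedupe_keep_order(items):
    """Drop repeated items while keeping the first occurrence of each."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


def multi_hop_context(question, retrieve, rewrite_query, max_hops=3):
    context = []
    for _ in range(max_hops):
        # Each hop builds a new query from what has been found so far.
        query = rewrite_query(context, question)
        context = dedupe_keep_order(context + retrieve(query))
    return context


# Made-up toy retriever: maps a query string to a list of passages.
toy_index = {
    "robed people": ["People in cloaks were whispering.", "A cat sat on the wall."],
    "cloaks": ["People in cloaks were whispering.", "Wizards were celebrating."],
}


def toy_retrieve(query):
    return toy_index.get(query, [])


def toy_rewrite(context, question):
    # Stand-in for the LLM-generated query: first hop vs. follow-up hop.
    return "robed people" if not context else "cloaks"


hop_context = multi_hop_context(
    "Who are the robed people?", toy_retrieve, toy_rewrite, max_hops=2
)
print(hop_context)
```

The second hop retrieves one passage that was already present, so the final context holds three distinct passages rather than four.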

In [169]:
RAG_obj = MultiHopRAG()

# Get the prediction from the RAG model for the given question.
# This prediction includes both the context and the answer.
predict_response = RAG_obj(query)

# Print the question, predicted answer, and truncated retrieved contexts.
print(f"\n\nPredicted Answer: {predict_response.answer}")



Predicted Answer: The robed people Mr. Dursley sees are likely wizards or witches, possibly gathered for an event related to Harry Potter.


In [170]:
dspy.inspect_history(10)





[2025-03-23T21:50:48.213401]

System message:

Your input fields are:
1. `context` (str): May contain relevant facts about user query
2. `question` (str): User query

Your output fields are:
1. `reasoning` (str)
2. `answer` (str): Answer in one or two lines

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## context ## ]]
{context}

[[ ## question ## ]]
{question}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Think and Answer questions based on the context provided.


User message:

[[ ## context ## ]]
[1] «««
    Drive, but there wasn't a map in sight. What could he have been thinking  
    of? It must have been a trick of the light. Mr. Dursley blinked and  
    stared at the cat. It stared back. As Mr. Dursley drove around the  
    corner and up the road, he watched the cat in his mirror. It was now 

### 2.7 (Optional) Using Pinecone, an online vector DB 

There are many reasons to store your DB online in a SaaS / PaaS service. For example:
- you want to scale the queries to many concurrent users
- you want more data reliability without having to worry about DB management
- you want to share the DB but without owning any servers

If you want to store your embeddings online, try Pinecone with the code below. You must go to [Pinecone.io](https://www.pinecone.io/) and set up an account. Then you need to generate an API key and create an "index"; both can be done by navigating from the homepage once you've logged in to Pinecone. The code below reads the key and the index name from the environment variables `PINECONE_API_KEY` and `PINECONE_INDEX_NAME`.

In [171]:
# You might need the following code to access OpenAI API or SerpAPI.
os.environ['HTTP_PROXY']="http://Clash:QOAF8Rmd@10.1.0.213:7890"
os.environ['HTTPS_PROXY']="http://Clash:QOAF8Rmd@10.1.0.213:7890"
os.environ['ALL_PROXY']="socks5://Clash:QOAF8Rmd@10.1.0.213:7893"

In [172]:
import pinecone
from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone, ServerlessSpec

PINECONE_API_KEY = os.environ['PINECONE_API_KEY']
PINECONE_INDEX_NAME = os.environ['PINECONE_INDEX_NAME']

In [173]:
index_name = PINECONE_INDEX_NAME

pc = Pinecone(api_key=PINECONE_API_KEY)
existing_indexes = [index_info["name"] for index_info in pc.list_indexes()]
if index_name in existing_indexes:
    pc.delete_index(index_name)

pc.create_index(
    name=index_name,
    dimension=1024,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
    
docsearch_pinecone = PineconeVectorStore.from_texts(
    [t.page_content for t in texts], baai_embedding, index_name=index_name, namespace="harry-potter"
)

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0, model=CHAT_MODEL)
query = '''Who might "You-Know-Who" be? Why isn't this person referred to by a given name?'''

print_search_results(docsearch_pinecone.similarity_search(query))
chain = RetrievalQA.from_chain_type(
    llm,
    chain_type="stuff",
    verbose=True,
    retriever=docsearch_pinecone.as_retriever(search_kwargs={"k": 5}),  # pass k via search_kwargs
)
chain.invoke(query)

# We can also use the full book to test 'map_reduce'. Try it!


search returned 4 results. 
"No, thank you," said Professor McGonagall coldly, as though she didn't  
think this was the moment for lemon drops. "As I say, even if  
You-Know-Who has gone -" 
 
"My dear Professor, surely a sensible person like yourself can call him  
by his name? All this 'You- Know-Who' nonsense -- for eleven years I 
have been trying to persuade people to call him by his proper name:  
Voldemort." Professor McGonagall flinched, but Dumbledore, who was  
unsticking two lemon drops, seemed not to notice. "It all gets so  
confusing if we keep saying 'You -Know-Who.' I have never seen any 
reason 
to be frightened of saying Voldemort's name.  
 
"I know you haven 't, said Professor McGonagall, sounding half  
exasperated, half admiring. "But you're different. Everyone knows you're
half-moon glasses. "It would be enough to turn any boy's head. Famous  
before he can walk and talk! Famous for something he won't even  
remember! CarA you see how much better off he'll be, g

{'query': 'Who might "You-Know-Who" be? Why isn\'t this person referred to by a given name?',
 'result': '"You-Know-Who" refers to the dark wizard Voldemort. The reason people avoid using his given name is due to fear and superstition. Many in the wizarding world are terrified of him and believe that saying his name might attract his attention or bring bad luck. This fear is so ingrained that even after his apparent downfall, people continue to use euphemisms like "You-Know-Who" instead of saying "Voldemort." Dumbledore, however, is an exception; he encourages people to use Voldemort\'s name, as he believes that fear of a name only increases fear of the thing itself.'}

In [176]:
# query with pinecone
docs = docsearch_pinecone.similarity_search(query)
print_search_results(docs)

search returned 0 results. 


In [177]:
#### Your Task ####
# modify the QA chain in Section 2.5 (Chapter 1 only) to use pinecone instead of ChromaDB

### 2.8 (Optional) Use multiple vector stores in Agent

In this section, we are going to create a simple QA agent that decides by itself which of two vector stores to query, depending on the topic of the question.
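
The routing idea can be sketched without an LLM. In the real agent, the model reads each tool's description and decides which one to call; the toy below swaps in a naive word-overlap heuristic so that it runs on its own. The function `pick_tool` and the description strings are hypothetical and exist only for this illustration.

```python
import re


def words(text):
    """Lowercased alphabetic words of a string, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def pick_tool(question, tools):
    """Return the name of the tool whose description shares the most words with the question."""
    question_words = words(question)

    def overlap(item):
        _name, description = item
        return len(question_words & words(description))

    return max(tools.items(), key=overlap)[0]


# Hypothetical tool descriptions, one per vector store.
tool_descriptions = {
    "harry_potter": "questions about harry potter hogwarts wizards and the dursleys",
    "state_of_union": "questions about the president state of the union speech",
}

choice_hp = pick_tool("Where do the Dursleys live?", tool_descriptions)
choice_sotu = pick_tool("What did the president say about the union?", tool_descriptions)
print(choice_hp, choice_sotu)  # harry_potter state_of_union
```

The quality of the tool descriptions matters in both the toy and the real agent, because the description is the only information the router has about each store.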

#### Preparing the tools for the agent.

We will use our Chroma-based Harry Potter vector DB, and create another one containing President Biden's State of the Union speech.

In [178]:
from langchain.document_loaders import TextLoader

documents = TextLoader('/ssdshare/share/lab4/state_of_the_union.txt').load()
texts = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(documents)
docsearch3 = Chroma.from_documents(texts, 
                                   baai_embedding, 
                                   collection_name="state-of-union", 
                                   persist_directory="/scratch1/chroma_db")
print(texts[:2])

[Document(metadata={'source': '/ssdshare/share/lab4/state_of_the_union.txt'}, page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their co

To allow the agent to query these databases, we need to define two RetrievalQA chains.

In [179]:
from langchain.chains import RetrievalQA

llm = ChatOpenAI(temperature=0, model=CHAT_MODEL)

harry_potter = RetrievalQA.from_chain_type(llm=llm, 
                                           chain_type="stuff", 
                                           retriever=docsearch_chroma_reloaded.as_retriever(
                                                  search_kwargs={"k": 8}
                                           ))
state_of_union = RetrievalQA.from_chain_type(llm=llm, 
                                             chain_type="stuff", 
                                             retriever=docsearch3.as_retriever(
                                                    search_kwargs={"k": 8}
                                             ))
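Each chain above retrieves the `k` = 8 most similar chunks and, with `chain_type="stuff"`, places them all into a single prompt. Below is a toy sketch of that retrieve-then-stuff flow using hand-made vectors and cosine similarity. The vectors, `top_k`, and `stuff_prompt` are illustrative assumptions, not the embedding model or the prompt the chain actually uses.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          store: List[Tuple[str, Sequence[float]]],
          k: int) -> List[str]:
    """Return the texts of the k stored chunks most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def stuff_prompt(question: str, chunks: List[str]) -> str:
    """Concatenate all retrieved chunks into one prompt (the "stuff" strategy)."""
    context = "\n\n".join(chunks)
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"


# Hypothetical two-dimensional "embeddings" for three chunks.
store = [
    ("chunk about owls", [1.0, 0.0]),
    ("chunk about brooms", [0.0, 1.0]),
    ("chunk about owl post", [0.9, 0.1]),
]

best = top_k([1.0, 0.05], store, k=2)
print(stuff_prompt("How is mail delivered?", best))
```

Because every retrieved chunk goes into one prompt, a larger `k` gives the model more context but also a longer and more expensive prompt.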

In [180]:
# Now try both chains

print_with_type(harry_potter.invoke('Why does McGonagall seem concerned about Harry being raised by the Dursleys?'))
print_with_type(state_of_union.invoke("What did the president say about justice Breyer?"))

"<class 'dict'>:"
{'query': 'Why does McGonagall seem concerned about Harry being raised by the '
          'Dursleys?',
 'result': 'Professor McGonagall is concerned about Harry being raised by the '
           'Dursleys because she has observed their behavior and knows they '
           'are not kind or nurturing people. She mentions that they are "the '
           'worst sort of Muggles imaginable" and describes them as "selfish" '
           'and "cold-hearted." McGonagall worries that Harry, who has just '
           'lost his parents and is already a vulnerable child, will not '
           'receive the love and care he needs in such an environment. Her '
           "concern stems from her understanding of the Dursleys' character "
           'and her desire for Harry to grow up in a supportive and loving '
           'home.'}
"<class 'dict'>:"
{'query': 'What did the president say about justice Breyer?',
 'result': 'The President honored Justice Stephen Breyer, describing him as 

In [181]:
from langchain.agents import AgentType, Tool

# define tools
tools = [
    Tool(
        name="State of Union QA System",
        func=state_of_union.run,
        description="useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question.",
    ),
    Tool(
        name="Harry Potter QA System",
        func=harry_potter.run,
        description="useful for when you need to answer questions about Harry Potter. Input should be a fully formed question.",
    ),
]

Now we can create the agent, giving it both chains as tools.

In [182]:
from langchain import hub
from langchain.agents import create_react_agent, AgentExecutor
from langchain.memory import ConversationBufferMemory

prompt = hub.pull("hwchase17/react")

llm = ChatOpenAI(
    model=CHAT_MODEL,
)
agent = create_react_agent(
    llm,
    tools,
    prompt=prompt,
)
agent_executor = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=tools, verbose=True, memory=ConversationBufferMemory(memory_key="chat_history", return_messages=True)
)

In [183]:
# If the agent gets stuck, try a more powerful model, such as DeepSeek.
agent_executor.invoke(
    {
        "input": "What did the president say about justice Breyer?",
    }
)



> Entering new AgentExecutor chain...
I need to find out what the president said about Justice Breyer in the most recent State of the Union address.
Action: State of Union QA System
Action Input: What did the president say about Justice Breyer?
The President honored Justice Stephen Breyer, describing him as "an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court." He thanked Justice Breyer for his dedicated service to the country. Additionally, the President mentioned that one of his most serious constitutional responsibilities is nominating someone to serve on the Supreme Court, and he highlighted his recent nomination of Judge Ketanji Brown Jackson to continue Justice Breyer’s legacy of excellence.
I now know the final answer.
Final Answer: The President honored Justice Stephen Breyer, describing him as "an Army veteran, Constitutional scholar, and retiring Justice of the United 

{'input': 'What did the president say about justice Breyer?',
 'chat_history': [HumanMessage(content='What did the president say about justice Breyer?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='The President honored Justice Stephen Breyer, describing him as "an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court." He thanked Justice Breyer for his dedicated service and mentioned his nomination of Judge Ketanji Brown Jackson to continue Justice Breyer’s legacy of excellence.', additional_kwargs={}, response_metadata={})],
 'output': 'The President honored Justice Stephen Breyer, describing him as "an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court." He thanked Justice Breyer for his dedicated service and mentioned his nomination of Judge Ketanji Brown Jackson to continue Justice Breyer’s legacy of excellence.'}
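The `chat_history` entry in the result comes from `ConversationBufferMemory`, which appends every input and output to a growing list and returns that list under the configured `memory_key`. Here is a toy sketch of that behaviour; `ToyBufferMemory` is a hypothetical class, not LangChain's implementation. Whether the model actually sees this history depends on the prompt containing a matching `{chat_history}` placeholder.

```python
from typing import Dict, List, Tuple


class ToyBufferMemory:
    """Hypothetical minimal buffer memory: keeps every turn, never trims."""

    def __init__(self, memory_key: str = "chat_history") -> None:
        self.memory_key = memory_key
        self.messages: List[Tuple[str, str]] = []

    def load(self) -> Dict[str, List[Tuple[str, str]]]:
        # Return a copy so callers cannot mutate the stored history.
        return {self.memory_key: list(self.messages)}

    def save(self, user_input: str, ai_output: str) -> None:
        # One human message and one AI message are stored per call.
        self.messages.append(("human", user_input))
        self.messages.append(("ai", ai_output))


memory = ToyBufferMemory()
memory.save("What did the president say about justice Breyer?", "(first answer)")
memory.save("Why does McGonagall seem concerned?", "(second answer)")
print(memory.load())
```

Since nothing is ever dropped, the history grows with every call; this is visible in the later outputs, where `chat_history` contains all previous questions and answers.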

In [184]:
agent_executor.invoke(
    {
        "input": "Why does McGonagall seem concerned about Harry being raised by the Dursleys?"
    }
)



> Entering new AgentExecutor chain...
I need to answer a question about Harry Potter, specifically about Professor McGonagall's concerns regarding Harry being raised by the Dursleys. I should use the Harry Potter QA System for this.

Action: Harry Potter QA System
Action Input: Why does McGonagall seem concerned about Harry being raised by the Dursleys?
Professor McGonagall seems concerned about Harry being raised by the Dursleys because she has observed their behavior and finds them to be neglectful and unkind. She mentions that they are "the worst sort of Muggles" and describes them as "selfish" and "mean-spirited." She worries that Harry will not receive the love and care he deserves in such an environment. Her concern is rooted in her understanding of the Dursleys' character and her desire for Harry to have a nurturing upbringing, especially after the tragic loss of his parents.
I now know the final answer.

Final Answer: 

{'input': 'Why does McGonagall seem concerned about Harry being raised by the Dursleys?',
 'chat_history': [HumanMessage(content='What did the president say about justice Breyer?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='The President honored Justice Stephen Breyer, describing him as "an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court." He thanked Justice Breyer for his dedicated service and mentioned his nomination of Judge Ketanji Brown Jackson to continue Justice Breyer’s legacy of excellence.', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Why does McGonagall seem concerned about Harry being raised by the Dursleys?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='Professor McGonagall is concerned about Harry being raised by the Dursleys because she has observed their behavior and finds them to be neglectful, unkind, selfish, and mean-spirited. She worries that Harry wil

We can see that the agent can "smartly" choose which QA system to use given a specific question. 
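The verbose traces above follow the ReAct text format: the model writes an `Action:` line and an `Action Input:` line, the executor runs the named tool and feeds the result back, and the loop ends once the model writes `Final Answer:`. The sketch below shows one way such text could be parsed, assuming this plain format. `parse_react_step` is a hypothetical helper, not LangChain's own output parser.

```python
import re
from typing import Dict


def parse_react_step(text: str) -> Dict[str, str]:
    """Parse one block of ReAct-style model output into a finish or an action."""
    if "Final Answer:" in text:
        answer = text.split("Final Answer:", 1)[1].strip()
        return {"type": "finish", "output": answer}

    # Capture the tool name and everything after "Action Input:".
    match = re.search(r"Action:\s*(.+?)\s*\n\s*Action Input:\s*(.*)", text, re.DOTALL)
    if match is None:
        raise ValueError("could not find an Action / Action Input pair")
    return {
        "type": "action",
        "tool": match.group(1).strip(),
        "tool_input": match.group(2).strip(),
    }


step = parse_react_step(
    "I need to look this up.\n"
    "Action: State of Union QA System  \n"
    "Action Input: What did the president say about Justice Breyer?  \n"
)
print(step)

done = parse_react_step("I now know the final answer.\nFinal Answer: He honored him.")
print(done)
```

When the model does not follow the format, a parser like this raises an error, which is one reason an agent can appear stuck with weaker models.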

## 3. Your Task: Putting it all together: LangChain with memory

In [None]:
#### Your Task ####
# This is a major task that requires some thinking and time. 
# Build a conversation system over a collection of research papers of your choice. 
# You should be able to ask specific questions about a method in these papers, and the agent should return a brief answer (no more than 100 words). 
# Save your data and ChromaDB in the /ssdshare/llm-course/<YOUR-NAME> directory so that other people can use it. 
# Provide at least three query examples so the TAs can review your work. 
# You may use any tool from the past four labs, from the LangChain docs, or from any open-source project. 
# Write a summary (a Markdown cell) at the end of the notebook describing what works and what does not. 

In [7]:
import os

def get_pdf_files(path):
    pdf_files = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith('.pdf'):
                pdf_files.append(os.path.join(root, file))
    return pdf_files

In [10]:
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import UnstructuredPDFLoader  # langchain.document_loaders is deprecated
from langchain_text_splitters import RecursiveCharacterTextSplitter  # langchain.text_splitter is deprecated


In [None]:

pdf_list = get_pdf_files('/ssdshare/llm-course/liukehan')
data_list = []

for pdf in pdf_list:
    data = UnstructuredPDFLoader(pdf, mode='single', strategy='ocr_only').load()
    data_list.append(data)

In [7]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
text_list = []

for data in data_list:
    texts = text_splitter.split_documents(data)
    text_list.extend(texts)
    print(f'current size: {len(text_list)}')

current size: 94
current size: 162
current size: 236
current size: 280
current size: 334
current size: 443
current size: 490
current size: 548
current size: 614
current size: 661
current size: 794
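The `chunk_size` and `chunk_overlap` arguments control how long each chunk is and how many characters neighbouring chunks share. Below is a simplified sketch with a fixed-size sliding window. `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraph and line breaks, which this hypothetical `sliding_chunks` helper does not attempt.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.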


In [8]:
for text in text_list:
    print(text.page_content)
    print("=========")

arXiv: 1709.01507v4 [cs.CV] 16 May 2019

Squeeze-and-Excitation Networks

Jie Hyl0000—0002—-5150—1003] Gang Sun|0000-0001 6913-6799]

Li Shen!0000—-0002-2283—-4976] — Samuel Albanie!9000—0001-9736—5134] Enhua Wul0000—0002—2174— 1428]
Abstract—The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior research has investigated the spatial component of this relationship, seeking to strengthen the representational power of a CNN by enhancing the quality of spatial encodings throughout its feature hierarchy. In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We

In [14]:
baai_embedding = OpenAIEmbeddings(
    model="BAAI/bge-m3",
    base_url=os.environ.get("SF_BASE_URL"),
    api_key=os.environ.get("SF_API_KEY"),
)
chroma_dir = "/ssdshare/llm-course/liukehan/chroma"
docsearch_chroma = Chroma(
    embedding_function=baai_embedding,
    persist_directory=chroma_dir,
    collection_name="research_papers",
)

In [None]:
docsearch_chroma.reset_collection()
for i in range(0, len(text_list), 64):
    docsearch_chroma.add_documents(text_list[i:i+64])
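The loop above inserts the documents in slices of 64 rather than all at once, presumably to keep each embedding request small. The same slicing pattern can be written as a reusable generator; `batched` here is a hypothetical helper (Python 3.12 offers a similar `itertools.batched`).

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 150 dummy items split into batches of 64: two full batches and a remainder.
sizes = [len(batch) for batch in batched(list(range(150)), 64)]
print(sizes)
```

With such a helper, the insertion loop would read `for batch in batched(text_list, 64): docsearch_chroma.add_documents(batch)`.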

In [15]:
from langchain.chains import RetrievalQA
from langchain.agents import AgentType, Tool, tool
from langchain_community.utilities import SerpAPIWrapper
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0, model=CHAT_MODEL)

database = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=docsearch_chroma.as_retriever(search_kwargs={"k": 8}),
)

search = SerpAPIWrapper()

@tool
def word_count(text: str) -> int:
    """Count the number of words in a string."""
    return len(text.split())

tools = [
    Tool(
        name="Research Paper QA System",
        func=database.run,
        description="useful for when you need to answer questions about research papers on Deep Learning. Input should be a single fully formed question.",
    ),
    Tool(
        name="Search",
        func=search.run,
        description="useful when you are sure that the question is not answered in the research paper. Input should be keywords."
    ),
    word_count,
]

In [16]:
from langchain import hub
from langchain.agents import create_react_agent, AgentExecutor
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

react_prompt = '''
Answer the following questions as best you can. You have access to the following tools:

{tools}

The Research Paper QA System is more reliable than the Search tool, but it may fail to give an answer since it is backed by an embedding database. Try querying the Research Paper QA System at least three times, rephrasing the question each time, before using the Search tool.

Use the word_count tool before giving the final answer to make sure the answer is less than 100 words. 

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question, it should be brief, no more than 100 words.

Begin!

Question: {input}
Thought:{agent_scratchpad}
'''

prompt = PromptTemplate(
    input_variables=["input", "tools", "tool_names", "agent_scratchpad"],
    template=react_prompt,
)

llm = ChatOpenAI(
    model=CHAT_MODEL,
)

agent = create_react_agent(
    llm,
    tools,
    prompt=prompt,
)
agent_executor = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=tools, verbose=True, memory=ConversationBufferMemory(memory_key="chat_history", return_messages=True)
)
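The `{tools}` and `{tool_names}` placeholders in the template are filled in from the tool list before the prompt reaches the model. Here is a rough sketch with plain `str.format`, assuming each tool is rendered as a `name: description` line. The exact rendering inside LangChain may differ, and `render_tools` is a hypothetical helper.

```python
from typing import List, Tuple

# A shortened version of the ReAct template, kept only to show the placeholders.
template = (
    "You have access to the following tools:\n\n{tools}\n\n"
    "Action: the action to take, should be one of [{tool_names}]\n\n"
    "Question: {input}\nThought:{agent_scratchpad}"
)


def render_tools(tools: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Return (one 'name: description' line per tool, comma-separated tool names)."""
    tool_lines = "\n".join(f"{name}: {description}" for name, description in tools)
    tool_names = ", ".join(name for name, _ in tools)
    return tool_lines, tool_names


toy = [
    ("Research Paper QA System", "answers questions about the indexed papers"),
    ("Search", "web search for anything not covered by the papers"),
]
tool_lines, tool_names = render_tools(toy)

filled = template.format(
    tools=tool_lines,
    tool_names=tool_names,
    input="What is DCGAN?",
    agent_scratchpad="",  # empty on the first step; later holds earlier thoughts and observations
)
print(filled)
```

This is why the tool descriptions matter: they are the only information the model has when deciding which tool to call.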



In [18]:
agent_executor.invoke({
    "input": "What is DCGAN, what's special about its architecture?",
})



> Entering new AgentExecutor chain...
Thought: To answer this question, I should first try the Research Paper QA System to see if it can provide information about DCGAN and its unique architecture. I'll start by asking a straightforward question.

Action: Research Paper QA System
Action Input: What is DCGAN, and what makes its architecture special?

DCGAN (Deep Convolutional Generative Adversarial Network) is a type of generative adversarial network (GAN) that uses convolutional neural networks (CNNs) for both the generator and discriminator. It was introduced in 2016 by Radford et al. in the paper *"Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks."*

### **What Makes DCGAN Special?**  
DCGAN's architecture introduced several key improvements over earlier GANs, making training more stable and enabling higher-quality image generation. Here are its defining features:  

1. **Replaces Fully Connecte

{'input': "What is DCGAN, what's special about its architecture?",
 'chat_history': [HumanMessage(content="What is DCGAN, what's special about its architecture?", additional_kwargs={}, response_metadata={}),
  AIMessage(content='DCGAN (Deep Convolutional Generative Adversarial Network) is a GAN variant using CNNs for both generator and discriminator. Key features include transposed convolutions in the generator, strided convolutions in the discriminator, batch normalization, LeakyReLU (discriminator) and ReLU (generator), no max pooling, and Adam optimizer. It improved training stability and image quality, enabling high-resolution generation and meaningful latent space manipulation.', additional_kwargs={}, response_metadata={})],
 'output': 'DCGAN (Deep Convolutional Generative Adversarial Network) is a GAN variant using CNNs for both generator and discriminator. Key features include transposed convolutions in the generator, strided convolutions in the discriminator, batch normalizatio

In [14]:
agent_executor.invoke({
    "input": "What is the inverted residual with linear bottleneck structure in MobileNetV2? What improvements are made in MobileNetV3?"
})



> Entering new AgentExecutor chain...
Thought: I should first try to understand the inverted residual with linear bottleneck structure in MobileNetV2 by querying the Research Paper QA System.

Action: Research Paper QA System
Action Input: What is the inverted residual with linear bottleneck structure in MobileNetV2?

The inverted residual with linear bottleneck structure is a key architectural innovation in MobileNetV2. Here's how it works:

1. **Expansion Phase**: The block first expands the input tensor's channel dimension using a 1x1 convolution (pointwise convolution), increasing its depth (e.g., by a factor of 6). This helps capture richer features.  

2. **Depthwise Convolution**: A lightweight 3x3 depthwise convolution is applied to the expanded tensor, processing spatial information while keeping computational costs low.  

3. **Linear Bottleneck**: Instead of applying a non-linearity (like ReLU6) to the output, the block uses a 

{'input': 'What is the inverted residual with linear bottleneck structure in MobileNetV2? What improvements are made in MobileNetV3?',
 'chat_history': [HumanMessage(content="What is DCGAN, what's special about its architecture?", additional_kwargs={}, response_metadata={}),
  AIMessage(content='DCGAN (Deep Convolutional Generative Adversarial Network) is a GAN variant that replaces fully connected layers with convolutional layers and uses transposed convolutions in the generator. Key innovations include batch normalization, LeakyReLU in the discriminator, and strided convolutions, improving training stability and image quality.', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='What is the inverted residual with linear bottleneck structure in MobileNetV2? What improvements are made in MobileNetV3?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='MobileNetV3 improves upon MobileNetV2 with: (1) Neural Architecture Search (NAS) and NetAdapt for opti

In [15]:
agent_executor.invoke({
    "input": "What is a Squeeze-and-Excitation block? How does it work? What is its relation to the attention mechanism?"
})



> Entering new AgentExecutor chain...
Thought: I need to find information about Squeeze-and-Excitation (SE) blocks, their working mechanism, and their relation to attention mechanisms. Since the Research Paper QA System is more reliable, I'll start by querying it with variations of the question.

Action: Research Paper QA System
Action Input: "What is a Squeeze-and-Excitation block in deep learning?"
The provided context does not contain any information about Squeeze-and-Excitation (SE) blocks in deep learning.

However, based on general knowledge, a **Squeeze-and-Excitation (SE) block** is a module used in convolutional neural networks (CNNs) to improve channel-wise feature recalibration. It consists of two main operations:  
1. **Squeeze** – Aggregates spatial information into a channel descriptor (typically using global average pooling).  
2. **Excitation** – Learns channel-wise dependencies using a small neural network (usually two full

{'input': 'What is a Squeeze-and-Excitation block? How does it work? What is its relation to the attention mechanism?',
 'chat_history': [HumanMessage(content="What is DCGAN, what's special about its architecture?", additional_kwargs={}, response_metadata={}),
  AIMessage(content='DCGAN (Deep Convolutional Generative Adversarial Network) is a GAN variant that replaces fully connected layers with convolutional layers and uses transposed convolutions in the generator. Key innovations include batch normalization, LeakyReLU in the discriminator, and strided convolutions, improving training stability and image quality.', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='What is the inverted residual with linear bottleneck structure in MobileNetV2? What improvements are made in MobileNetV3?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='MobileNetV3 improves upon MobileNetV2 with: (1) Neural Architecture Search (NAS) and NetAdapt for optimized layers, (

In [16]:
agent_executor.invoke({
    "input": "I want to build a neural network for classification problem. The network should be efficient and fast. Please give me some suggestions."
})



> Entering new AgentExecutor chain...
Thought: I should first consult the Research Paper QA System to find suggestions for efficient and fast neural networks for classification problems.

Action: Research Paper QA System
Action Input: What are some efficient and fast neural network architectures for classification tasks?
Based on the provided context, here are some efficient and fast neural network architectures suitable for classification tasks:

1. **MobileNetV3** - A lightweight architecture optimized for mobile/edge devices, with variants like MobileNetV3-Large and MobileNetV3-Small. It achieves good accuracy with low latency and computational cost (e.g., 137ms latency on COCO test set with 4.97M parameters).

2. **MobileNetV4 (MNv4)** - The latest iteration, optimized for minimal latency across various hardware (CPU, EdgeTPU, GPU). It is ~2x faster than MobileNetV3 on CPU and achieves competitive accuracy. Variants like MNv4-Conv use TuN

{'input': 'I want to build a neural network for classification problem. The network should be efficient and fast. Please give me some suggestions.',
 'chat_history': [HumanMessage(content="What is DCGAN, what's special about its architecture?", additional_kwargs={}, response_metadata={}),
  AIMessage(content='DCGAN (Deep Convolutional Generative Adversarial Network) is a GAN variant that replaces fully connected layers with convolutional layers and uses transposed convolutions in the generator. Key innovations include batch normalization, LeakyReLU in the discriminator, and strided convolutions, improving training stability and image quality.', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='What is the inverted residual with linear bottleneck structure in MobileNetV2? What improvements are made in MobileNetV3?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='MobileNetV3 improves upon MobileNetV2 with: (1) Neural Architecture Search (NAS) and Net

## Summary
I built the vector database from a set of deep learning papers. 
- At first, I used the `hi_res` strategy and the original `hwchase17/react` prompt. The agent could not always find the answer using the database. 
- Then I switched to the `ocr_only` strategy and modified the prompt so that the agent queries the database several times with different questions before using the search tool. The agent can now find the answer using the database, and the answer is more comprehensive. 
- Finally, I found that the answer was too long in some cases, so I added a word count tool to the agent, and the agent now tries to limit the answer to 100 words.
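
Because the agent only tries to respect the 100-word limit, a deterministic check after the agent returns could act as a safety net. Here is a minimal sketch, using the same whitespace-based count as the `word_count` tool. `enforce_word_limit` is a hypothetical helper and was not part of the experiments above.

```python
def enforce_word_limit(answer: str, max_words: int = 100) -> str:
    """Return the answer unchanged if it fits, otherwise keep only the first max_words words."""
    words = answer.split()
    if len(words) <= max_words:
        return answer
    return " ".join(words[:max_words])


long_answer = "word " * 150
trimmed = enforce_word_limit(long_answer)
print(len(trimmed.split()))
```

Plain truncation can cut a sentence in half, so in practice one might instead ask the model to shorten the answer when the check fails.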