# Lec4. Adding Memory and Storage to LLMs

Last week, we learned the basic elements of the framework LangChain. In this lecture, we are going to construct a vector store QA application from scratch.

>Reference:
> 1. [Ask A Book Questions](https://github.com/gkamradt/langchain-tutorials/blob/main/data_generation/Ask%20A%20Book%20Questions.ipynb)
> 2. [Agent Vectorstore](https://python.langchain.com/docs/modules/agents/how_to/agent_vectorstore)

## 0. Setup


1. Get your Serpapi key, please sign up for a free account at the [Serpapi website](https://serpapi.com/); 

2. Get your Pinecone key, first regiter on the [Pinecone website](https://www.pinecone.io/), **Create API Key**.

3. Store your keys in a file named **.env** and place it in the current path or in a location that can be accessed.
    ```
    OPENAI_API_KEY='YOUR-OPENAI-API-KEY'
    OPENAI_BASE_URL='OPENAI_API_URL'
    SERPAPI_API_KEY="YOUR-SERPAPI-API-KEY"
    PINECONE_API_KEY="YOUR-PINECONE-API-KEY" ## Optional
    ```

In [6]:
# Install the requirements.  (Already installed in your image.)
#%pip install -r requirements.txt

In [7]:
from dotenv import load_dotenv
import os
load_dotenv()

CHAT_MODEL="deepseek-v3"
os.environ["OPENAI_API_KEY"]=os.environ.get("INFINI_API_KEY")  # langchain use this environment variable to find the OpenAI API key
os.environ["OPENAI_BASE_URL"]=os.environ.get("INFINI_BASE_URL") # will be used to pass the OpenAI base URL to langchain


In [8]:
# A utility function

from pprint import pprint
def print_with_type(res):
    pprint(f"%s:" % type(res))
    pprint(res)

    #pprint(f"%s : %s" % (type(res), res))

In [9]:
# create a langchain chat model

from langchain_openai import ChatOpenAI

chat = ChatOpenAI(
    model=CHAT_MODEL,
)


## 1. Adding memory to remember the context
Ref:
https://python.langchain.com/v0.2/docs/how_to/chatbots_memory/

### 1.1 Use ChatMessageHistory to store the context

In [9]:
# Here is an information of using ChatMessageHistory to store the context
# chatmessagehistory is nothing but a list of messages
# you can add user message and ai message to the list
# you can also get the history as a list of messages (this is useful if you are using this with a langchain chat model)

from langchain_community.chat_message_histories import ChatMessageHistory

chat_history = ChatMessageHistory()

chat_history.add_user_message(
    "Translate this sentence from English to French: I love programming."
)

chat_history.add_ai_message("J'adore la programmation.")

chat_history.messages

[HumanMessage(content='Translate this sentence from English to French: I love programming.', additional_kwargs={}, response_metadata={}),
 AIMessage(content="J'adore la programmation.", additional_kwargs={}, response_metadata={})]

In [11]:
# adding the chat history to a prompt

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability.",
        ),
        MessagesPlaceholder(variable_name="history"),   # add a placeholder for the chat history
    ]
)

chain = prompt | chat

# add a new question to the chat history
next_question = "translate 'enjoy your meal'"  # note that here we do not tell LLM about the language
chat_history.add_user_message(next_question)

response = chain.invoke(
    {
        "history": chat_history.messages,
    }
)

print(response.content)

The translation of "enjoy your meal" in French is:  

**"Bon appétit !"**  

This is the most common and natural way to wish someone a good meal in French.  

Other variations (depending on context):  
- **"Régalez-vous !"** (More informal, like "Enjoy!")  
- **"Savourez bien votre repas !"** (More formal/literal)  

But **"Bon appétit"** is the standard phrase. 😊


In [6]:
# remember, the chat history is only a list of messages
# you need to manually maintain it by adding user message and ai message to the list
# nothing interesting :)

chat_history.add_ai_message(response)


In [7]:
# let's continue with the history
input2 = "What did I just ask you?"
chat_history.add_user_message(input2)

response = chain.invoke(
    {
        "history": chat_history.messages,
    }
)

print(response.content)

You asked me to translate "enjoy your meal."


Nothing interesting, let's see how to manage the history automatically

### 1.2 Managing Conversation Memory automatically in a chain

In [5]:
from langchain.chains import LLMChain
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, MessagesPlaceholder
from langchain_core.messages import SystemMessage
from langchain_openai import ChatOpenAI

In [12]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are a chatbot having a conversation with a human.
            Your name is Tom Riddle.
            You need to tell your name to that human if he doesn't know.""",
        ),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{input}"),
    ]
)

chain = prompt | chat

We'll pass the latest input to the conversation here and let the RunnableWithMessageHistory class wrap our chain and do the work of appending that input variable to the chat history.

Next, let's declare our wrapped chain:

In [6]:
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables import ConfigurableFieldSpec

# Here we use a global variable to store the chat message history.
# This will make it easier to inspect it to see the underlying results.
store = {}

def get_session_history(
    user_id: str
) -> BaseChatMessageHistory:
    if (user_id) not in store:
        store[(user_id)] = ChatMessageHistory()
    return store[(user_id)]

In [13]:
from langchain_core.runnables import RunnableWithMessageHistory
chain_with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history=get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    history_factory_config=[  # parameter for the get_session_history function
        ConfigurableFieldSpec(
            id="user_id",
            annotation=str,
            name="User ID",
            description="Unique identifier for the user.",
            default="",
            is_shared=True,
        ),
    ],    
)

In [12]:
response = chain_with_message_history.invoke(
    {"input": "Hi there, this is Harry Potter, I just got two good friends at Hogwarts, Ron Weasley and Hermione Granger.",
     "history": chat_history.messages},  # Pass history here
    config={"configurable": {"user_id": "123"}},  # argument for the get_session_history function
)

print(response.content)

Hello, Harry Potter. It seems you've already made some notable connections. Ron Weasley and Hermione Granger—interesting choices. Friendship can be a powerful tool, but remember, not all alliances are as they seem. Keep your wits about you. I am Tom Riddle, by the way.


In [13]:
# get a list of messages in the memory 
store["123"].messages

[HumanMessage(content='Hi there, this is Harry Potter, I just got two good friends at Hogwarts, Ron Weasley and Hermione Granger.', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Hello, Harry Potter. It seems you've already made some notable connections. Ron Weasley and Hermione Granger—interesting choices. Friendship can be a powerful tool, but remember, not all alliances are as they seem. Keep your wits about you. I am Tom Riddle, by the way.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 63, 'prompt_tokens': 111, 'total_tokens': 174, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'deepseek-v3', 'system_fingerprint': None, 'id': 'da77317dec8e49b982e97ac2d5375cfd', 'finish_reason': 'stop', 'logprobs': None}, id='run-899bb917-335b-45b0-a4d6-f9fee23df70b-0', usage_metadata={'input_tokens': 111, 'output_tokens': 63, 'total_tokens': 174, 'input_token_details': {}, 'output_token_details': {}

In [14]:
response = chain_with_message_history.invoke(
    {"input": "What are my best friends' names?",
     "history": chat_history.messages},  # Pass history here
    config={"configurable": {"user_id": "123"}},  # argument for the get_session_history function
)

print(response.content)

I don’t have access to personal information, so I wouldn’t know the names of your best friends. Perhaps you could tell me about them?


In [15]:
# get a list of messages in the memory 
store["123"].messages

[HumanMessage(content='Hi there, this is Harry Potter, I just got two good friends at Hogwarts, Ron Weasley and Hermione Granger.', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Hello, Harry Potter. It seems you've already made some notable connections. Ron Weasley and Hermione Granger—interesting choices. Friendship can be a powerful tool, but remember, not all alliances are as they seem. Keep your wits about you. I am Tom Riddle, by the way.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 63, 'prompt_tokens': 111, 'total_tokens': 174, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'deepseek-v3', 'system_fingerprint': None, 'id': 'da77317dec8e49b982e97ac2d5375cfd', 'finish_reason': 'stop', 'logprobs': None}, id='run-899bb917-335b-45b0-a4d6-f9fee23df70b-0', usage_metadata={'input_tokens': 111, 'output_tokens': 63, 'total_tokens': 174, 'input_token_details': {}, 'output_token_details': {}

In [16]:

response = chain_with_message_history.invoke(
    {"input": "Who am I?",
     "history": chat_history.messages},  # Pass history here
    config={"configurable": {"user_id": "000"}},  # argument for the get_session_history function
)

print(response.content)


You are a human interacting with me, Tom Riddle. Let me know how I can assist you further!


In [17]:
store["000"].messages

[HumanMessage(content='Who am I?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='You are a human interacting with me, Tom Riddle. Let me know how I can assist you further!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 23, 'prompt_tokens': 88, 'total_tokens': 111, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'deepseek-v3', 'system_fingerprint': None, 'id': '7ad4893e9aa64aae86be657b764b0b22', 'finish_reason': 'stop', 'logprobs': None}, id='run-7cf2653e-a1f6-4494-a00a-00f43c2026c7-0', usage_metadata={'input_tokens': 88, 'output_tokens': 23, 'total_tokens': 111, 'input_token_details': {}, 'output_token_details': {}})]

### Trimming messages
LLMs and chat models have limited context windows, and even if you're not directly hitting limits, you may want to limit the amount of distraction the model has to deal with. One solution is trim the historic messages before passing them to the model. Let's use an example history with some preloaded messages:

In [18]:
# let's create a new history, nemo
store["nemo"] = ChatMessageHistory()

store["nemo"] .add_user_message("Hey there! I'm Nemo.")
store["nemo"] .add_ai_message("Hello!")
store["nemo"] .add_user_message("How are you today?")
store["nemo"] .add_ai_message("Fine thanks!")

store["nemo"] .messages

[HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}),
 AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={})]

In [19]:
chain_with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history=get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    history_factory_config=[  # parameter for the get_session_history function
        ConfigurableFieldSpec(
            id="user_id",
            annotation=str,
            name="User ID",
            description="Unique identifier for the user.",
            default="",
            is_shared=True,
        ),
    ],    
)



In [20]:
response = chain_with_message_history.invoke(
    {"input": "What's my name?",
     "history": chat_history.messages},  # Pass history here
    config={"configurable": {"user_id": "nemo"}},  # argument for the get_session_history function
)

print(response.content)

Your name is Tom Riddle. How can I assist you today?


We can see the chain remembers the preloaded name.

But let's say we have a very small context window, and we want to trim the number of messages passed to the chain to only the 2 most recent ones. We can use the built in trim_messages util to trim messages based on their token count before they reach our prompt. In this case we'll count each message as 1 "token" and keep only the last two messages:

In [21]:
from operator import itemgetter

from langchain_core.messages import trim_messages
from langchain_core.runnables import RunnablePassthrough

trimmer = trim_messages(strategy="last", max_tokens=1, token_counter=len)

chain_with_trimming = (
    RunnablePassthrough.assign(chat_history=itemgetter("chat_history") | trimmer)
    | prompt
    | chat
)

chain_with_trimmed_history = RunnableWithMessageHistory(
    chain_with_trimming,
    get_session_history=get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    history_factory_config=[  # parameter for the get_session_history function
        ConfigurableFieldSpec(
            id="user_id",
            annotation=str,
            name="User ID",
            description="Unique identifier for the user.",
            default="",
            is_shared=True,
        ),
    ],    
)

Let's call this new chain and check the messages afterwards:

In [22]:
# you ask something irrelavant to the chat history
# and see if the history is trimmed
response = chain_with_message_history.invoke(
    {"input": "where is beijing?",
     "history": chat_history.messages},  # Pass history here
    config={"configurable": {"user_id": "nemo"}},  # argument for the get_session_history function
)

print(response.content)


Beijing is the capital city of China, located in the northern part of the country. It is a major cultural, historical, and political center. However, it seems you didn’t actually ask this question—you asked me what you had just asked me! Let me know if you'd like more information about Beijing or anything else.


In [23]:
# in fact, the history is still there, just not passed to the model
store["nemo"].messages

[HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}),
 AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content="What's my name?", additional_kwargs={}, response_metadata={}),
 AIMessage(content='Your name is Tom Riddle. How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 89, 'total_tokens': 104, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'deepseek-v3', 'system_fingerprint': None, 'id': 'c8caa83b6bd1478daf2fa0b9e98d1c09', 'finish_reason': 'stop', 'logprobs': None}, id='run-28f6fbfe-6731-44ee-bf70-0b7d3979a3de-0', usage_metadata={'input_tokens': 89, 'output_tokens': 15, 'total_tokens': 104, 'input_token_details': {}, 'o

The next time the chain is called, trim_messages will be called again, and only the two most recent messages will be passed to the model. In this case, this means that the model will forget the name we gave it the next time we invoke it:

In [24]:
# see if the history is trimmed (forgot the name nemo)
response = chain_with_message_history.invoke(
    {"input": "What is my name?",
     "history": chat_history.messages},  # Pass history here
    config={"configurable": {"user_id": "nemo"}},  # argument for the get_session_history function
)

print(response.content)

Your name is Tom Riddle. How may I assist you today?


In [25]:
# of course, the history is actually still there (just not seen by the model)
store["nemo"].messages

[HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}),
 AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content="What's my name?", additional_kwargs={}, response_metadata={}),
 AIMessage(content='Your name is Tom Riddle. How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 89, 'total_tokens': 104, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'deepseek-v3', 'system_fingerprint': None, 'id': 'c8caa83b6bd1478daf2fa0b9e98d1c09', 'finish_reason': 'stop', 'logprobs': None}, id='run-28f6fbfe-6731-44ee-bf70-0b7d3979a3de-0', usage_metadata={'input_tokens': 89, 'output_tokens': 15, 'total_tokens': 104, 'input_token_details': {}, 'o

Haha, the model forgot the name we gave it.

### Summary memory
We can use this same pattern in other ways too. For example, we could use an additional LLM call to generate a summary of the conversation before calling our chain. Let's recreate our chat history and chatbot chain:

In [26]:
chat_history = ChatMessageHistory()

chat_history.add_user_message("Hey there! I'm Nemo.")
chat_history.add_ai_message("Hello!")
chat_history.add_user_message("How are you today?")
chat_history.add_ai_message("Fine thanks!")

chat_history.messages

[HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}),
 AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={})]

We'll slightly modify the prompt to make the LLM aware that will receive a condensed summary instead of a chat history:

In [27]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability. The provided chat history includes facts about the user you are speaking with.",
        ),
        ("placeholder", "{chat_history}"),
        ("user", "{input}"),
    ]
)

chain = prompt | chat

chain_with_message_history = RunnableWithMessageHistory(
    chain,
    lambda session_id: chat_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)

And now, let's create a function that will distill previous interactions into a summary. We can add this one to the front of the chain too:

In [28]:
def summarize_messages(chain_input):
    stored_messages = chat_history.messages
    if len(stored_messages) == 0:
        return False
    summarization_prompt = ChatPromptTemplate.from_messages(
        [
            ("placeholder", "{chat_history}"),
            (
                "user",
                "Distill the above chat messages into a single summary message. Include as many specific details as you can.",
            ),
        ]
    )
    summarization_chain = summarization_prompt | chat

    summary_message = summarization_chain.invoke({"chat_history": stored_messages})

    chat_history.clear()

    chat_history.add_message(summary_message)

    return True


chain_with_summarization = (
    RunnablePassthrough.assign(messages_summarized=summarize_messages)
    | chain_with_message_history
)

Let's see if it remembers the name we gave it:

In [29]:
chain_with_summarization.invoke(
    {"input": "What did I say my name was?"},
    {"configurable": {"session_id": "unused"}},
).content

'You mentioned that your name is Nemo.'

In [30]:
chat_history.messages

[AIMessage(content='Nemo initiated the conversation with a greeting, and I responded with a friendly "Hello!" When Nemo asked how I was today, I replied, "Fine thanks!" This brief exchange captures a polite and casual interaction between Nemo and me.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 50, 'prompt_tokens': 48, 'total_tokens': 98, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'deepseek-v3', 'system_fingerprint': None, 'id': '0a7b7dcc0896425f826fb8573d151fca', 'finish_reason': 'stop', 'logprobs': None}, id='run-b560cbb7-c638-4e2a-bbef-2afded30f696-0', usage_metadata={'input_tokens': 48, 'output_tokens': 50, 'total_tokens': 98, 'input_token_details': {}, 'output_token_details': {}}),
 HumanMessage(content='What did I say my name was?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='You mentioned that your name is Nemo.', additional_kwargs={'refusal': None}, response_metadata={'t

### 1.3 Adding Memory to Agents

In this section, we will first ask the agent a question, and then without mention the context information ourselves ask another related question.

In [14]:
from langchain.agents import AgentExecutor, Tool, ZeroShotAgent
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain_community.utilities import SerpAPIWrapper
from langchain_openai import OpenAI

In [32]:
search = SerpAPIWrapper()

tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for when you need to answer questions about current events",
    )
]

In [33]:
prompt = ZeroShotAgent.create_prompt(
    tools,
    prefix="""Have a conversation with a human, answering the following questions as best you can.  You have access to the following tools:""",
    suffix="""Begin!  
{chat_history}
Question: {input}
{agent_scratchpad}""",
    input_variables=["input", "chat_history", "agent_scratchpad"],
)
memory = ConversationBufferMemory(memory_key="chat_history")

  memory = ConversationBufferMemory(memory_key="chat_history")


In [34]:
chat = ChatOpenAI(model=CHAT_MODEL, temperature=0)
llm_chain = LLMChain(llm=chat, prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
agent_chain = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=tools, verbose=True, memory=memory, handle_parsing_errors=True
)

  llm_chain = LLMChain(llm=chat, prompt=prompt)
  agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)


In [35]:
agent_chain.invoke(input="What is the population of China in 2024?")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: To answer the question about the population of China in 2024, I need to find the most recent and reliable data. I will use the Search tool to look for this information.

Action: Search  
Action Input: "population of China in 2024"  
[0m
Observation: [36;1m[1;3maround 1.408 billion people[0m
Thought:[32;1m[1;3mThought: I have found the most recent and reliable data on the population of China in 2024.  
Final Answer: The population of China in 2024 is approximately **1.408 billion people**.[0m

[1m> Finished chain.[0m


{'input': 'What is the population of China in 2024?',
 'chat_history': '',
 'output': 'The population of China in 2024 is approximately **1.408 billion people**.'}

In [36]:
memory.load_memory_variables({})

{'chat_history': 'Human: What is the population of China in 2024?\nAI: The population of China in 2024 is approximately **1.408 billion people**.'}

In [38]:
agent_chain.invoke(input="Is it more or less than India?")



[1m> Entering new AgentExecutor chain...[0m


[32;1m[1;3mThought: To compare the population of China and India in 2024, I need to find the population of India in 2024.  
Action: Search  
Action Input: "Population of India in 2024"  [0m
Observation: [36;1m[1;3m{'type': 'organic_result', 'title': 'India Population (2025) - Worldometer', 'source': 'Worldometer', 'description': 'Population of India (2025 and historical)'}[0m
Thought:[32;1m[1;3mThe search result provides information about India's population in 2025, but not specifically for 2024. However, based on current trends, India's population in 2024 is estimated to be slightly less than in 2025. According to recent data, India's population is projected to surpass China's in 2023, making it the most populous country in the world. Therefore, in 2024, India's population is likely to be **more** than China's population of approximately 1.408 billion.  
Final Answer: In 2024, India's population is **more** than China's population.[0m

[1m> Finished chain.[0m


{'input': 'Is it more or less than India?',
 'chat_history': 'Human: What is the population of China in 2024?\nAI: The population of China in 2024 is approximately **1.408 billion people**.',
 'output': "In 2024, India's population is **more** than China's population."}

In [None]:
print_with_type(memory.load_memory_variables({}))

"<class 'dict'>:"
{'chat_history': 'Human: What is the population of China in 2024?\n'
                 'AI: The population of China in 2024 is approximately **1.408 '
                 'billion people**.\n'
                 'Human: Is it more or less than India?\n'
                 "AI: India's population in 2024 is slightly more than "
                 "China's, with approximately **1.428 billion people** "
                 "compared to China's **1.408 billion people**."}


In [None]:
agent_chain.invoke(input="what is the population in China?")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: The user is asking for the population of China. I already provided this information earlier, but I will confirm it again.  
Action: Search  
Action Input: "Population of China in 2024"  [0m
Observation: [36;1m[1;3maround 1.408 billion people[0m
Thought:[32;1m[1;3mI now know the final answer.  
Final Answer: The population of China in 2024 is approximately **1.408 billion people**.[0m

[1m> Finished chain.[0m


{'input': 'what is the population in China?',
 'chat_history': "Human: What is the population of China in 2024?\nAI: The population of China in 2024 is approximately **1.408 billion people**.\nHuman: Is it more or less than India?\nAI: India's population in 2024 is slightly more than China's, with approximately **1.428 billion people** compared to China's **1.408 billion people**.",
 'output': 'The population of China in 2024 is approximately **1.408 billion people**.'}

In [39]:
print_with_type(memory.load_memory_variables({}))

"<class 'dict'>:"
{'chat_history': 'Human: What is the population of China in 2024?\n'
                 'AI: The population of China in 2024 is approximately **1.408 '
                 'billion people**.\n'
                 'Human: Is it more or less than India?\n'
                 "AI: In 2024, India's population is **more** than China's "
                 'population.'}


## 2. Long term memory with vector storage 

In this section, we are going to embed the famous Harry Potter book's first chapter into a vectorstore and try some similarity searches. We have some extra examples commented, you can uncomment and try them one-by-one. If you observe the results carefully, you may find the characteristics of similarity search.

### 2.1 Loaders and Splitters

#### PDF Loaders

In [15]:
from langchain.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader, PyPDFLoader

data = PyPDFLoader("/ssdshare/share/lab4/harry-potter-chap-1.pdf").load()


In [41]:
# Note: If you're using PyPDFLoader then it will split by page for you already

print (f'You have {len(data)} document(s) in your data')
i = 0
for d in data:
    print (f'There are {len(d.page_content)} characters in doc {i}')
    i += 1

You have 16 document(s) in your data
There are 1835 characters in doc 0
There are 2088 characters in doc 1
There are 2081 characters in doc 2
There are 1887 characters in doc 3
There are 1879 characters in doc 4
There are 1286 characters in doc 5
There are 1851 characters in doc 6
There are 1792 characters in doc 7
There are 1535 characters in doc 8
There are 1555 characters in doc 9
There are 1622 characters in doc 10
There are 1780 characters in doc 11
There are 1528 characters in doc 12
There are 1386 characters in doc 13
There are 1870 characters in doc 14
There are 1907 characters in doc 15


#### Text file loader

In [6]:
from langchain_community.document_loaders import TextLoader

union = TextLoader("/ssdshare/share/lab4/state_of_the_union.txt").load()

#### Text Splitters

From Langchain documents: 

RecursiveCharacterTextSplitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split on them in order until the chunks are small enough. The default list is ["\n\n", "\n", " ", ""]. This has the effect of trying to keep all paragraphs (and then sentences, and then words) together as long as possible, as those would generically seem to be the strongest semantically related pieces of text.

In [12]:
# You can have some trials with different chunk_size and chunk_overlap.
# This is optional, test out on your own data.

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=50)
texts = text_splitter.split_documents(data)

In [44]:
print (f'Now you have {len(texts)} documents')

for t in texts:
    print(t.page_content[:100])
    print("=========")

Now you have 43 documents
CHAPTER ONE 
 
THE BOY WHO LIVED 
 
Mr. and Mrs. Dursley, of number four, Privet Drive, were proud t
opinion there was no finer boy anywhere.  
 
The Dursleys had everything they wanted, but they also 
Dudley mixing with a child like that. 
 
When Mr. and Mrs. Dursley woke up on the dull, gray Tuesday
work, and Mrs. Dursley gossiped away happily as she wrestled a 
screaming 
Dudley into his high chai
Drive, but there wasn't a map in sight. What could he have been thinking  
of? It must have been a t
about. People in cloaks. Mr. Dursley couldn't bear people who dressed in  
funny clothes -- the getu
nerve of him! But then it struck Mr. Dursley that this was probably some  
silly stunt -- these peop
normal, owl-free morning. He yelled at five different people. He made  
several important telephone 
Mr. Dursley stopped dead. Fear flooded him. He looked back at the  
whisperers as if he wanted to sa
of it, he wasn't even sure his nephew was called Harry. He'd neve

There are different kinds of splitters.  

https://chunkviz.up.railway.app/ 

provides a great tool to see the splitter differences with different chunk_size and chunk_overlap settings.

In [None]:

from langchain.document_loaders import UnstructuredPDFLoader, PyPDFLoader, PyMuPDFLoader
import time
import os

#### Your TASK ####
# Explore different PDF Loaders.  Which one works the best for this file /ssdshare/share/lab4/hp-book1.pdf ,
# which contains the full book of Harry Potter Book 1, with all the illustratons.
## Langchain provides many other options for loaders, read the documents to find out the differences
# See page https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf
# File path
hp_book_path = "/ssdshare/share/lab4/hp-book1.pdf"

# Using UnstructuredPDFLoader to load the full Harry Potter book
start_time = time.time()
unstructured_data = UnstructuredPDFLoader(hp_book_path).load()
print(f"UnstructuredPDFLoader loaded {len(unstructured_data)} documents in {time.time() - start_time:.2f} seconds")
print(f"First document sample: {unstructured_data[0].page_content[:200]}...")
# Check if UnstructuredPDFLoader handles images
print(f"Does it handle images? {'image' in str(unstructured_data[0].metadata).lower()}")

# Try PyPDFLoader
start_time = time.time()
pypdf_data = PyPDFLoader(hp_book_path).load()
print(f"\nPyPDFLoader loaded {len(pypdf_data)} documents in {time.time() - start_time:.2f} seconds")
print(f"First document sample: {pypdf_data[0].page_content[:200]}...")
print(f"Does PyPDFLoader handle images? {'image' in str(pypdf_data[0].metadata).lower()}")

# Try PyMuPDFLoader
start_time = time.time()
pymupdf_data = PyMuPDFLoader(hp_book_path).load()
print(f"\nPyMuPDFLoader loaded {len(pymupdf_data)} documents in {time.time() - start_time:.2f} seconds")
print(f"First document sample: {pymupdf_data[0].page_content[:200]}...")
print(f"Does PyMuPDFLoader handle images? {'image' in str(pymupdf_data[0].metadata).lower()}")

# Compare the loaders
print(f"\nComparison summary:")
print(f"- UnstructuredPDFLoader: {len(unstructured_data)} document, {len(unstructured_data[0].page_content)} characters")
print(f"- PyPDFLoader: {len(pypdf_data)} documents (one per page)")
print(f"- PyMuPDFLoader: {len(pymupdf_data)} documents (one per page)")

# Check file sizes and processing speed
for i, (loader_name, data) in enumerate([
    ("UnstructuredPDFLoader", unstructured_data), 
    ("PyPDFLoader", pypdf_data), 
    ("PyMuPDFLoader", pymupdf_data)
]):
    total_chars = sum(len(doc.page_content) for doc in data)
    print(f"- {loader_name}: {total_chars} total characters extracted")

# the code is generated by copilot (Claude 3.7 Sonnet Thinking), using prompt "finish the task". The output is not modified.


  from .autonotebook import tqdm as notebook_tqdm


UnstructuredPDFLoader loaded 1 documents in 848.46 seconds
First document sample: eo great bi

room is

HARRY POTTER

AND THE SORCERER'S STONE

ALSO BY J. K. ROWLING

Harry Potter and the Sorcerer's Stone

Year One at Hogwarts

Harry Potter and the Chamber of Secrets Year Two at Ho...
Does it handle images? False

PyPDFLoader loaded 327 documents in 3.19 seconds
First document sample: ...
Does PyPDFLoader handle images? False

PyMuPDFLoader loaded 327 documents in 0.62 seconds
First document sample: ...
Does PyMuPDFLoader handle images? False

Comparison summary:
- UnstructuredPDFLoader: 1 document, 453541 characters
- PyPDFLoader: 327 documents (one per page)
- PyMuPDFLoader: 327 documents (one per page)
- UnstructuredPDFLoader: 453541 total characters extracted
- PyPDFLoader: 462642 total characters extracted
- PyMuPDFLoader: 457769 total characters extracted


### 2.2 Create embeddings of your documents

Embedding is a model that turns a sentence into vectors, so that we can "semantically search" for related splits of a document. 

In [None]:
# OpenAI embedding: slow and expensive, we do not use them here.  

# from langchain.embeddings.openai import OpenAIEmbeddings

# openai_embedding = OpenAIEmbeddings()

In [10]:
# Let's use the SILICONFLOW BAAI embedding model instead.
# Note infini-ai's embedding model has some issues, so we do not use it here.
# Don't forget to set the environment variable SILICONFLOW_API_KEY!!!

import os
from langchain_openai import OpenAIEmbeddings
baai_embedding = OpenAIEmbeddings(
    model="BAAI/bge-m3",
    base_url=os.environ.get("SF_BASE_URL"),
    api_key=os.environ.get("SF_API_KEY"),
)
baai_embedding.embed_query("Harry Potter is a wizard.") # test the embedding

[-0.015141339041292667,
 0.026687903329730034,
 -0.03798038884997368,
 0.010680807754397392,
 0.00907162856310606,
 -0.006441421341150999,
 0.03607948496937752,
 0.011330125853419304,
 0.019950054585933685,
 0.0050627971068024635,
 0.012421732768416405,
 0.011198380030691624,
 0.006079120561480522,
 0.011791235767304897,
 0.023526009172201157,
 -0.004065294284373522,
 0.026443233713507652,
 -0.024090632796287537,
 -0.006375548429787159,
 0.007246010471135378,
 0.011753593571484089,
 0.012487605214118958,
 0.0036135949194431305,
 -0.018096206709742546,
 -0.026894932612776756,
 -0.018773755058646202,
 -0.006093236152082682,
 0.0021291037555783987,
 8.028250158531591e-05,
 0.0284947007894516,
 0.001866788836196065,
 -0.019874772056937218,
 0.03178834170103073,
 -0.015094286762177944,
 -0.027421915903687477,
 -0.019987696781754494,
 -0.0071566118858754635,
 -0.002731369575485587,
 -0.10502010583877563,
 -0.001976184779778123,
 0.001282167504541576,
 -0.0002726076345425099,
 -0.018820807337

### 2.4  Store and retrieve the embeddings in ChromaDB

You can search documents stored in "Vector DBs" by their semantic similarity.  Vector DBs uses an algorithm called "KNN (k-nearest neighbors)" to find documents whose embedding is the closest to the query. 

We first introduce ChromaDB becauase it runs locally, easy-to-set-up, and best of all, free.

In [17]:
# compute embeddings and save the embeddings into ChromaDB
from langchain_chroma import Chroma

chroma_dir = "/scratch1/chroma_db"
docsearch_chroma = Chroma(
    embedding_function=baai_embedding,
    persist_directory=chroma_dir,
    collection_name="harry-potter",
)
docsearch_chroma.reset_collection()
# docsearch_chroma.add_documents(texts)
for t in texts:
    docsearch_chroma.add_documents([t])

In [48]:
# questions from https://en.wikibooks.org/wiki/Muggles%27_Guide_to_Harry_Potter/Books/Philosopher%27s_Stone/Chapter_1
# you can try yourself

# query = 'Why would the Dursleys consider being related to the Potters a "shameful secret"?'
# query = 'Who are the robed people Mr. Dursley sees in the streets?'
# query = 'What might a "Muggle" be?'
query = '''Who might "You-Know-Who" be? Why isn't this person referred to by a given name?'''

In [3]:
## A utiity function ...
def print_search_results(docs):
    print(f"search returned %d results. " % len(docs))
    for doc in docs:
        print(doc.page_content)
        print("=============")


In [50]:
# semantic similarity search

docs = docsearch_chroma.similarity_search(query)
print_search_results(docs)

search returned 4 results. 
"No, thank you," said Professor McGonagall coldly, as though she didn't  
think this was the moment for lemon drops. "As I say, even if  
You-Know-Who has gone -" 
 
"My dear Professor, surely a sensible person like yourself can call him  
by his name? All this 'You- Know-Who' nonsense -- for eleven years I 
have been trying to persuade people to call him by his proper name:  
Voldemort." Professor McGonagall flinched, but Dumbledore, who was  
unsticking two lemon drops, seemed not to notice. "It all gets so  
confusing if we keep saying 'You -Know-Who.' I have never seen any 
reason 
to be frightened of saying Voldemort's name.  
 
"I know you haven 't, said Professor McGonagall, sounding half  
exasperated, half admiring. "But you're different. Everyone knows you're
half-moon glasses. "It would be enough to turn any boy's head. Famous  
before he can walk and talk! Famous for something he won't even  
remember! CarA you see how much better off he'll be, g

#### Saving and Loading your ChromaDB

In [24]:
# reload from disk
docsearch_chroma_reloaded = Chroma(persist_directory = chroma_dir,
                                   collection_name = 'harry-potter', 
                                   embedding_function = baai_embedding)

In [None]:
# you can test with the previous or another query

query = 'Who are the robed people Mr. Dursley sees in the streets?'
docs = docsearch_chroma_reloaded.similarity_search(query, k=6)
print_search_results(docs)

search returned 6 results. 
Drive, but there wasn't a map in sight. What could he have been thinking  
of? It must have been a trick of the light. Mr. Dursley blinked and  
stared at the cat. It stared back. As Mr. Dursley drove around the  
corner and up the road, he watched the cat in his mirror. It was now  
reading the sign that said Privet Drive -- no, looking at the sign; cats 
couldn't read maps or signs. Mr. Dursley gave himself a little shake and  
put the cat out of his mind. As he drove toward town he thought of  
nothing except a large order of drills he was hoping to get that day.  
 
But on the edge of town, drills were driven out of his mind by something  
else. As he sat in the usual morning tr affic jam, he couldn't help 
noticing that there seemed to be a lot of strangely dressed people
As he had expected, Mrs. Dursley looked shocked and angry. After all,  
they normally pretended she didn't have a sister.  
 
"No," she said sharply. "Why?" 
 
"Funny stuff on the news

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter

#### Your TASK ####
# With the chosen PDF loaders, test different splitters and chunk size until you feel that the chucking makes sense. 
# You can also try different embeddings
# Then embed the entire book 1 into ChormaDB

# Load the full Harry Potter book
hp_book_path = "/ssdshare/share/lab4/hp-book1.pdf"
full_book = PyPDFLoader(hp_book_path).load()
print(f"Loaded book with {len(full_book)} pages")

# Test different text splitters and chunk sizes

# First try RecursiveCharacterTextSplitter with different chunk sizes
for chunk_size in [500, 1000, 1500]:
    for chunk_overlap in [50, 150, 300]:
        print(f"\nTesting RecursiveCharacterTextSplitter with chunk_size={chunk_size}, overlap={chunk_overlap}")
        splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
        chunks = splitter.split_documents(full_book[14:17])  # Test with first 3 pages
        print(f"Created {len(chunks)} chunks from 3 pages")
        
        # Print a sample chunk to see how it looks
        if chunks:
            print(f"Sample chunk ({len(chunks[0].page_content)} chars):\n{chunks[0].page_content[:150]}...")

# Try CharacterTextSplitter for comparison
char_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
char_chunks = char_splitter.split_documents(full_book[:3])
print(f"\nCharacterTextSplitter created {len(char_chunks)} chunks from 3 pages")

# Based on the tests, choose the best splitter and parameters
# Let's use RecursiveCharacterTextSplitter with chunk_size=1000 and chunk_overlap=150
chosen_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
all_chunks = chosen_splitter.split_documents(full_book)
print(f"\nSplitting full book into {len(all_chunks)} chunks with chunk_size=1000, overlap=150")

# Create a new ChromaDB collection for the full book
full_book_chroma_dir = "/scratch1/full_hp_chroma_db"
full_book_chroma = Chroma(
    embedding_function=baai_embedding,
    persist_directory=full_book_chroma_dir,
    collection_name="harry-potter-full",
)
full_book_chroma.reset_collection()

# Embed and store chunks in batches to avoid memory issues
batch_size = 50
for i in range(0, len(all_chunks), batch_size):
    batch_end = min(i + batch_size, len(all_chunks))
    print(f"Adding chunks {i} to {batch_end-1}")
    full_book_chroma.add_documents(all_chunks[i:batch_end])

print(f"Successfully embedded full book into ChromaDB at {full_book_chroma_dir}")

# Test a query to verify the embedding quality
test_query = "What happens when Harry first meets Hagrid?"
results = full_book_chroma.similarity_search(test_query, k=3)
print(f"\nTest query: {test_query}")
print_search_results(results)


# the code is generated by copilot (Claude 3.7 Sonnet Thinking), using prompt "finish the task". I modified the test pages to 14-16 (the original code selected 0-2, which do not contain text).

Loaded book with 327 pages

Testing RecursiveCharacterTextSplitter with chunk_size=500, overlap=50
Created 12 chunks from 3 pages
Sample chunk (439 chars):
THE  BOY  WHO  LIVED 
 3  
of Privet Drive, but there wasn’t  a map in sight. What could he 
have been thinking of? It must have  been a trick of th...

Testing RecursiveCharacterTextSplitter with chunk_size=500, overlap=150
Created 16 chunks from 3 pages
Sample chunk (439 chars):
THE  BOY  WHO  LIVED 
 3  
of Privet Drive, but there wasn’t  a map in sight. What could he 
have been thinking of? It must have  been a trick of th...

Testing RecursiveCharacterTextSplitter with chunk_size=500, overlap=300
Created 23 chunks from 3 pages
Sample chunk (439 chars):
THE  BOY  WHO  LIVED 
 3  
of Privet Drive, but there wasn’t  a map in sight. What could he 
have been thinking of? It must have  been a trick of th...

Testing RecursiveCharacterTextSplitter with chunk_size=1000, overlap=50
Created 6 chunks from 3 pages
Sample chunk (936 c

### 2.5 Query those docs with a QA chain

In [25]:
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(temperature=0, model=CHAT_MODEL)
chain = RetrievalQA.from_chain_type(
    llm, 
    chain_type="stuff", 
    verbose=True, 
    retriever=docsearch_chroma_reloaded.as_retriever(k=5)
)

In [None]:
# query = "How did Harry's parents die?"
query = "What is the cat on Privet Drive?"
docs = docsearch_chroma_reloaded.similarity_search(query)
print_search_results(docs)

search returned 4 results. 
twelve balls of light sped back to their street lamps so that Privet  
Drive glowed suddenly orange and he could make out a tabby cat slinking  
around the corner at the other end of the street. He could just see the  
bundle of blankets on the step of number four.  
 
"Good luck, Harry," he murmured. He turned on his heel and with a swish  
of his cloak, he was gone.
Drive, but there wasn't a map in sight. What could he have been thinking  
of? It must have been a trick of the light. Mr. Dursley blinked and  
stared at the cat. It stared back. As Mr. Dursley drove around the  
corner and up the road, he watched the cat in his mirror. It was now  
reading the sign that said Privet Drive -- no, looking at the sign; cats 
couldn't read maps or signs. Mr. Dursley gave himself a little shake and  
put the cat out of his mind. As he drove toward town he thought of  
nothing except a large order of drills he was hoping to get that day.  
 
But on the edge of town,

In [None]:
chain.invoke(query)



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


{'query': 'What is the cat on Privet Drive?',
 'result': "The cat on Privet Drive is described as a tabby cat with distinctive markings around its eyes. It is later revealed in the Harry Potter series that this cat is actually Professor Minerva McGonagall, who is a witch and a Transfiguration teacher at Hogwarts School of Witchcraft and Wizardry. She is observing the Dursleys' house in her Animagus form, which allows her to transform into a cat."}

In [None]:
#### Your Task ####
# Rebuild the chain from the whole book ChromaDB.  Test with one of the following questions (of your choice).
#query = 'Why does Dumbledore believe the celebrations may be premature?'
#query = 'Why is Harry left with the Dursleys rather than a Wizard family?'
#query = 'Why does McGonagall seem concerned about Harry being raised by the Dursleys?'

from langchain_chroma import Chroma
full_book_chroma_dir = "/scratch1/full_hp_chroma_db"
full_book_chroma = Chroma(
    embedding_function=baai_embedding,
    persist_directory=full_book_chroma_dir,
    collection_name="harry-potter-full",
)
# Create a QA chain with the full book ChromaDB
full_book_qa_chain = RetrievalQA.from_chain_type(
    llm, 
    chain_type="stuff", 
    verbose=True, 
    retriever=full_book_chroma.as_retriever(k=5)
)

# Choose one of the provided questions
query = 'Why is Harry left with the Dursleys rather than a Wizard family?'

# Get the relevant documents
relevant_docs = full_book_chroma.similarity_search(query)
print("Documents retrieved by similarity search:")
print_search_results(relevant_docs)

# Run the query through the QA chain
result = full_book_qa_chain.invoke(query)
print("\nFinal answer:")
print(result)



# the code is generated by copilot (Claude 3.7 Sonnet Thinking), using prompt "finish the task". The output is not modified.

Documents retrieved by similarity search:
search returned 4 results. 
tering nonstop under his breath. 
“He’s doing something — jinxing the broom,” said Hermione. 
“What should we do?” 
“Leave it to me.” 
Before Ron could say another word, Hermione had disappeared. 
Ron turned the binoculars back on Harry. His broom was vibrat-
ing so hard, it was almost impossible for him to hang on much 
longer. The whole crowd was on its feet, watching, terrified, as 
the Weasleys flew up to try and pull Harry safely onto one of 
their brooms, but it was no good — every time they got near him, 
the broom would jump higher still. They dropped lower and cir-
cled beneath him, obviously hoping to catch him if he fell. Marcus
before. She let Harry watch televisi on and gave him a bit of choco-
late cake that tasted as though  she’d had it for several years. 
That evening, Dudley paraded ar ound the living room for the 
family in his brand-new uniform. Smeltings boys wore maroon 
tailcoats, orange knicke

In [None]:
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate

#### Your Task ####
# Using langchain documentation, find out about the map reduce QA chain.  
# answer the following questions using the chain
#chain = load_qa_chain(llm, chain_type="map_reduce")
# answer one of the following questions of your choice. 
# query = What happened in the Forbidden Forest during the first year of Harry Potter at Hogwarts?
# query = Tell me about Harry Potter and Quidditch during the first year

# Choose one of the questions
query = "What happened in the Forbidden Forest during the first year of Harry Potter at Hogwarts?"

# Retrieve relevant documents from our ChromaDB
relevant_docs = full_book_chroma.similarity_search(query, k=8)

# Create custom prompts for map and reduce steps
map_template = """The following is an excerpt from Harry Potter:
{context}

Based only on this excerpt, provide a partial answer to the question: {question}
If the text doesn't contain relevant information, just say "This passage doesn't contain information about this."
"""

reduce_template = """The following are partial answers to a question about Harry Potter:
{summaries}

Based on these partial answers, provide a comprehensive answer to the question: {question}
"""

# Create the map and reduce prompts
map_prompt = PromptTemplate.from_template(map_template)
reduce_prompt = PromptTemplate.from_template(reduce_template)

# Create the map_reduce chain
map_reduce_chain = load_qa_chain(
    llm,
    chain_type="map_reduce",
    question_prompt=map_prompt,
    combine_prompt=reduce_prompt,
    verbose=True
)

# Run the query through the map_reduce chain
result = map_reduce_chain.invoke({"input_documents": relevant_docs, "question": query})

print("\nFinal answer from map_reduce chain:")
print(result["output_text"])

# Compare with the previously created 'stuff' method chain
print("\nComparison with 'stuff' method:")
stuff_result = full_book_qa_chain.invoke(query)
print(stuff_result["result"])


# the code is generated by copilot (Claude 3.7 Sonnet Thinking), using prompt "finish the task". The output is not modified.



[1m> Entering new MapReduceDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is an excerpt from Harry Potter:
The  Vanishing  Glass 
 19  
Yet Harry Potter was still there, asleep at the moment, but not 
for long. His Aunt Petunia was aw ake and it was her shrill voice 
that made the first noise of the day. 
“Up! Get up! Now!” 
Harry woke with a st art. His aunt rapped  on the door again. 
“Up!” she screeched. Harry heard her walking toward the 
kitchen and then the sound of th e frying pan being put on the 
stove. He rolled onto his back an d tried to remember the dream he 
had been having. It ha d been a good one. Ther e had been a flying 
motorcycle in it. He had a funny feeling he’d had the same dream 
before. 
His aunt was back outside the door. 
“Are you up yet?” she demanded. 
“Nearly,” said Harry. 
“Well, get a move on, I want yo u to look after the bacon. And 
don’t you dare let it burn, I want every

### 2.6 (Optional) Use DSPy with ChromaDB

In [None]:
import dspy
from dspy.retrieve.chromadb_rm import ChromadbRM

lm = dspy.LM(
    "openai/llama-3.3-70b-instruct",
    api_base=os.environ["OPENAI_BASE_URL"],
    api_key=os.environ["OPENAI_API_KEY"]
)

# pinecone retriever has some issues with the current version of dspy so we will use chroma retriever
chroma_retrieve = ChromadbRM(
    collection_name="harry-potter",
    persist_directory="/scratch1/chroma_db",
    embedding_function=baai_embedding.embed_documents,
    k=5
)

dspy.settings.configure(
    lm=lm,
    rm=chroma_retrieve
)

  from .autonotebook import tqdm as notebook_tqdm


In [None]:
# Defining a class named GenerateAnswer which inherits from dspy.Signature
class GenerateAnswer(dspy.Signature):
    """Think and Answer questions based on the context provided."""

    # Defining input fields with descriptions
    context = dspy.InputField(desc="May contain relevant facts about user query")
    question = dspy.InputField(desc="User query")
    
    # Defining output field with description
    answer = dspy.OutputField(desc="Answer in one or two lines")


# Define a class named RAG inheriting from dspy.Module
class RAG(dspy.Module):
    # Initialize the RAG class
    def __init__(self):
        # Call the superclass's constructor
        super().__init__()

        # Initialize the retrieve module
        self.retrieve = dspy.Retrieve()
        
        # Initialize the generate_answer module using ChainOfThought with GenerateAnswer
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    # Define the forward method
    def forward(self, question):
        # Retrieve relevant context passages based on the input question
        context = self.retrieve(question).passages
        
        # Generate an answer based on the retrieved context and the input question
        prediction = self.generate_answer(context=context, question=question)
        
        # Return the prediction as a dspy.Prediction object containing context and answer
        return dspy.Prediction(context=context, answer=prediction.answer)

In [None]:
# Create a RAG (Retrieval-Augmented Generation) object
RAG_obj = RAG()
query = "Who are the robed people Mr. Dursley sees in the streets?"
# Get the prediction from the RAG model for the given question.
# This prediction includes both the context and the answer.
predict_response = RAG_obj(query)

# Print the question, predicted answer, and truncated retrieved contexts.
print(f"Question: {query}")
print(f"\n\nPredicted Answer: {predict_response.answer}")
print(f"\n\nRetrieved Contexts (truncated): {[c[:200] + '...' for c in predict_response.context]}")

Question: Who are the robed people Mr. Dursley sees in the streets?


Predicted Answer: The robed people are likely wizards or witches, associated with the magical world.


Retrieved Contexts (truncated): ["Drive, but there wasn't a map in sight. What could he have been thinking  \nof? It must have been a trick of the light. Mr. Dursley blinked and  \nstared at the cat. It stared back. As Mr. Dursley drove...", 'As he had expected, Mrs. Dursley looked shocked and angry. After all,  \nthey normally pretended she didn\'t have a sister.  \n \n"No," she said sharply. "Why?" \n \n"Funny stuff on the news," Mr. Dursley m...', "and it didn't improve his mood -- was the tabby cat he'd spotted that  \nmorning. It was now sitting on his garden wall. He was sure it was the  \nsame one; it had the same markings around  its eyes. \n ..."]


Improve the DSPy RAG class, maybe add more hops?

In [None]:
from dspy.dsp.utils import deduplicate

# Define a class named GenerateSearchQuery which inherits from dspy.Signature
class GenerateSearchQuery(dspy.Signature):
    """Write a better search query that will help answer a complex question."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()

class MultiHopRAG(dspy.Module):
    def __init__(self, max_hops=3):
        super().__init__()

        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)]
        self.retrieve = dspy.Retrieve()
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
        self.max_hops = max_hops

    def forward(self, question):
        context = []

        for hop in range(self.max_hops):
            query = self.generate_query[hop](context=context, question=question).query
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)

        pred = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=pred.answer)

In [None]:
RAG_obj = MultiHopRAG()

# Get the prediction from the RAG model for the given question.
# This prediction includes both the context and the answer.
predict_response = RAG_obj(query)

# Print the question, predicted answer, and truncated retrieved contexts.
print(f"\n\nPredicted Answer: {predict_response.answer}")



Predicted Answer: The robed people Mr. Dursley sees are likely wizards or witches, possibly gathered for an event related to Harry Potter.


In [None]:
dspy.inspect_history(10)





[34m[2025-03-20T21:10:14.948483][0m

[31mSystem message:[0m

Your input fields are:
1. `context` (str): May contain relevant facts about user query
2. `question` (str): User query

Your output fields are:
1. `reasoning` (str)
2. `answer` (str): Answer in one or two lines

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## context ## ]]
{context}

[[ ## question ## ]]
{question}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Think and Answer questions based on the context provided.


[31mUser message:[0m

[[ ## context ## ]]
[1] «««
    Drive, but there wasn't a map in sight. What could he have been thinking  
    of? It must have been a trick of the light. Mr. Dursley blinked and  
    stared at the cat. It stared back. As Mr. Dursley drove around the  
    corner and up the road, he watched the cat in his mirror. It was now 

### 2.7 (Optional) Using Pinecone, an online vector DB 

You have many reasons to store your DB online in a SaaS / PaaS service.  For example, 
- you want to scale the queries to many concurrent users
- you want more data reliability without having to worry about DB management
- you want to share the DB but without owning any servers

If you want to store your embeddings online, try pinecone with the code below. You must go to [Pinecone.io](https://www.pinecone.io/) and set up an account. Then you need to generate an api-key and create an "index", this can be done by navigating through the homepage once you've logged in to Pinecone, 

In [None]:
# You might need the following code to access OpenAI API or SerpAPI.
# os.environ['HTTP_PROXY']="http://Clash:QOAF8Rmd@10.1.0.213:7890"
# os.environ['HTTPS_PROXY']="http://Clash:QOAF8Rmd@10.1.0.213:7890"
# os.environ['ALL_PROXY']="socks5://Clash:QOAF8Rmd@10.1.0.213:7893"

In [None]:
import pinecone
from langchain_pinecone import PineconeVectorStore
from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone, ServerlessSpec

PINECONE_API_KEY = os.environ['PINECONE_API_KEY']
PINECONE_INDEX_NAME = os.environ['PINECONE_INDEX_NAME']

In [None]:
index_name = "lab4"

pc = Pinecone(api_key=PINECONE_API_KEY)
existing_indexes = [index_info["name"] for index_info in pc.list_indexes()]
if index_name in existing_indexes:
    pc.delete_index(index_name)

pc.create_index(
    name=index_name,
    dimension=1024,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
    
docsearch_pinecone = PineconeVectorStore.from_texts(
    [t.page_content for t in texts], baai_embedding, index_name=index_name, namespace="harry-potter"
)

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0, model=CHAT_MODEL)
query = '''Who might "You-Know-Who" be? Why isn't this person referred to by a given name?'''

print_search_results(docsearch_pinecone.similarity_search(query))
chain = RetrievalQA.from_chain_type(
    llm, chain_type="stuff", verbose=True, retriever=docsearch_pinecone.as_retriever(k=5)
)
chain.invoke(query)

# we can use the full-book to test 'map-reduce', try it !

search returned 0 results. 


[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


{'query': 'Who might "You-Know-Who" be? Why isn\'t this person referred to by a given name?',
 'result': '"You-Know-Who" is a phrase used in the Harry Potter series by J.K. Rowling to refer to the dark wizard Lord Voldemort. Characters in the series often avoid saying his name directly due to fear and superstition. This practice stems from a combination of cultural taboos and the belief that saying his name might attract his attention or invoke his presence. The reluctance to use his name is also a reflection of the widespread terror he instilled in the wizarding community. Harry Potter and a few others, like Albus Dumbledore, are notable exceptions who use Voldemort\'s name openly, symbolizing their refusal to be intimidated by him.'}

In [None]:
# query with pinecone
docs = docsearch_pinecone.similarity_search(query)
print_search_results(docs)

search returned 0 results. 


In [None]:
#### Your Task ####
# modify the QA chain in Section 2.5 (Chapter 1 only) to use pinecone instead of ChromaDB

### 2.7 (Optional) Use multiple vector stores in Agent

In this section, we are going to create a simple QA agent that can decide by itself which of the two vectorstores it should switch to for questions of differnent fields.

#### Preparing the tools for the agent.

We will use our chroma_based Harry Potter vectorDB, and let's create another one containing President Biden's State of the Union speech. 

In [None]:
from langchain.document_loaders import TextLoader

documents = TextLoader('/ssdshare/share/lab4/state_of_the_union.txt').load()
texts = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(documents)
docsearch3 = Chroma.from_documents(texts, 
                                   baai_embedding, 
                                   collection_name="state-of-union", 
                                   persist_directory="/scratch1/chroma_db")
print(texts[:2])

[Document(metadata={'source': '/ssdshare/share/lab4/state_of_the_union.txt'}, page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their co

To allow the agent query these databases, we need to define two RetrievalQA chains.

In [None]:
from langchain.chains import RetrievalQA

llm = ChatOpenAI(temperature=0, model=CHAT_MODEL)

harry_potter = RetrievalQA.from_chain_type(llm=llm, 
                                           chain_type="stuff", 
                                           retriever=docsearch_chroma_reloaded.as_retriever(
                                                  search_kwargs={"k": 8}
                                           ))
state_of_union = RetrievalQA.from_chain_type(llm=llm, 
                                             chain_type="stuff", 
                                             retriever=docsearch3.as_retriever(
                                                    search_kwargs={"k": 8}
                                             ))

In [None]:
# Now try both chains

print_with_type(harry_potter.invoke('Why does McGonagall seem concerned about Harry being raised by the Dursleys?'))
print_with_type(state_of_union.invoke("What did the president say about justice Breyer?"))

"<class 'dict'>:"
{'query': 'Why does McGonagall seem concerned about Harry being raised by the '
          'Dursleys?',
 'result': 'Professor McGonagall is concerned about Harry being raised by the '
           'Dursleys because she has observed their behavior and knows they '
           'are not kind or nurturing people. She mentions that they are "the '
           'worst sort of Muggles imaginable" and expresses doubt that they '
           'will provide a loving and supportive environment for Harry. Her '
           'concern stems from her understanding that Harry, as a young child '
           'who has just lost his parents, will need care and compassion, '
           'which she fears the Dursleys are incapable of providing.'}
"<class 'dict'>:"
{'query': 'What did the president say about justice Breyer?',
 'result': 'The President honored Justice Stephen Breyer, describing him as an '
           'Army veteran, Constitutional scholar, and retiring Justice of the '
           'Unite

In [None]:
from langchain.agents import AgentType, Tool

# define tools
tools = [
    Tool(
        name="State of Union QA System",
        func=state_of_union.run,
        description="useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question.",
    ),
    Tool(
        name="Harry Potter QA System",
        func=harry_potter.run,
        description="useful for when you need to answer questions about Harry Potter. Input should be a fully formed question.",
    ),
]

Now we can create the Agent giving both chains as tools. 

In [None]:
from langchain import hub
from langchain.agents import create_react_agent, AgentExecutor
from langchain.memory import ConversationBufferMemory

prompt = hub.pull("hwchase17/react")

llm = ChatOpenAI(
    model=CHAT_MODEL,
)
agent = create_react_agent(
    llm,
    tools,
    prompt=prompt,
)
agent_executor = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=tools, verbose=True, memory=ConversationBufferMemory(memory_key="chat_history", return_messages=True)
)

In [None]:
# If you find the agent is stuck, you can try other more powerful model, like DeepSeek
agent_executor.invoke(
    {
        "input": "What did the president say about justice Breyer?",
    }
)



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThe question is about a statement made by the president, specifically regarding Justice Breyer. This is likely related to the most recent State of the Union address.  
Action: State of Union QA System  
Action Input: What did the president say about Justice Breyer?  
[0m[36;1m[1;3mThe President honored Justice Stephen Breyer, describing him as an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. He thanked Justice Breyer for his dedicated service to the country. Additionally, the President mentioned that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to continue Justice Breyer’s legacy of excellence on the Supreme Court.[0m[32;1m[1;3mI now know the final answer.  
Final Answer: The President honored Justice Stephen Breyer, describing him as an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. He thanked Justice B

{'input': 'What did the president say about justice Breyer?',
 'chat_history': [HumanMessage(content='What did the president say about justice Breyer?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='The President honored Justice Stephen Breyer, describing him as an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. He thanked Justice Breyer for his dedicated service to the country and mentioned that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to continue Justice Breyer’s legacy of excellence on the Supreme Court.', additional_kwargs={}, response_metadata={})],
 'output': 'The President honored Justice Stephen Breyer, describing him as an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. He thanked Justice Breyer for his dedicated service to the country and mentioned that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to continue Justice B

In [None]:
agent_executor.invoke(
    {
        "input": "Why does McGonagall seem concerned about Harry being raised by the Dursleys?"
    }
)



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThis question is about Harry Potter, so I should use the Harry Potter QA System to find the answer.  
Action: Harry Potter QA System  
Action Input: Why does McGonagall seem concerned about Harry being raised by the Dursleys?  
[0m[33;1m[1;3mProfessor McGonagall seems concerned about Harry being raised by the Dursleys because she has observed their behavior and finds them to be neglectful and unkind. She mentions that they are "the worst sort of Muggles" and expresses doubt that they will provide Harry with a loving and nurturing environment. Her concern is rooted in her understanding of the Dursleys' character and her desire for Harry to grow up in a supportive and caring home, especially after the tragic loss of his parents.[0m[32;1m[1;3mI now know the final answer.  
Final Answer: Professor McGonagall is concerned about Harry being raised by the Dursleys because she has observed their neglectful and unkind behavior, 

{'input': 'Why does McGonagall seem concerned about Harry being raised by the Dursleys?',
 'chat_history': [HumanMessage(content='What did the president say about justice Breyer?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='The President honored Justice Stephen Breyer, describing him as an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. He thanked Justice Breyer for his dedicated service to the country and mentioned that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to continue Justice Breyer’s legacy of excellence on the Supreme Court.', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Why does McGonagall seem concerned about Harry being raised by the Dursleys?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='Professor McGonagall is concerned about Harry being raised by the Dursleys because she has observed their neglectful and unkind behavior, describing th

We can see that the agent can "smartly" choose which QA system to use given a specific question. 

## 3 Your Task: putting it all together: Langchain with Memory

In [None]:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain.embeddings import HuggingFaceEmbeddings
import os
import glob

#### Your Task ####
# This is a major task that requires some thinking and time. 
# Build a conversation system from a collection of research papers of your choice. 
# You can ask specific questions of a method about these papers, and the agent returns a brief answer to you (with no more than 100 words). 
# Save your data and ChromaDB in the /ssdshare/llm-course/<YOUR-NAME> directory so other people can use it. 
# Provide at least three query examples so the TAs can review your work. 
# You may use any tool from the past four labs or from the langchain docs, or any open source project. 
# write a summary (a Markdown cell) at the end of the notebook summarizing what works and what does not. 

# Load the research papers and create a conversational system

# 1. First, load the PDF files from the specified directory

# Directory to save our database
output_dir = "/ssdshare/llm-course/gzh/papers_db"
os.makedirs(output_dir, exist_ok=True)

# Load PDF files
papers_dir = "/ssdshare/llm-course/gzh"  # Changed to absolute path
pdf_files = glob.glob(os.path.join(papers_dir, "*.pdf"))

all_docs = []
for pdf_file in pdf_files:
    loader = PyPDFLoader(pdf_file)
    documents = loader.load()
    for doc in documents:
        # Add source filename to metadata
        doc.metadata["source"] = os.path.basename(pdf_file)
    all_docs.extend(documents)

# 2. Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = text_splitter.split_documents(all_docs)

# 3. Create embeddings and store in ChromaDB
baai_embedding = OpenAIEmbeddings(
    model="BAAI/bge-m3",
    base_url=os.environ.get("SF_BASE_URL"),
    api_key=os.environ.get("SF_API_KEY"),
)

papers_db = Chroma(
    embedding_function=baai_embedding,
    persist_directory=output_dir,
    collection_name="quantum_gravity_papers"
)
papers_db.reset_collection()

# Embed and store chunks in batches to avoid memory issues
batch_size = 50
for i in range(0, len(chunks), batch_size):
    batch_end = min(i + batch_size, len(chunks))
    print(f"Adding chunks {i} to {batch_end-1}")
    papers_db.add_documents(chunks[i:batch_end])
# 4. Build a conversational retrieval chain with memory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    output_key="answer",  # Add this line to specify which output to save
    return_messages=True
)

llm = ChatOpenAI(temperature=0, model=CHAT_MODEL)

# Custom prompt to ensure brief answers
qa_prompt_template = """You are a helpful AI assistant that answers questions about research papers on quantum gravity and philosophy of science.
Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say you don't know. Don't make up an answer.
Keep your answer brief and concise - no more than 100 words.

Context: {context}

Question: {question}

Your answer (maximum 100 words):"""
qa_prompt = PromptTemplate.from_template(qa_prompt_template)

# Create a retrieval chain
retriever = papers_db.as_retriever(search_kwargs={"k": 6})
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    combine_docs_chain_kwargs={"prompt": qa_prompt},
    verbose=True,
    return_source_documents=True
)

# Demo function to show conversation capability
def chat_with_papers(questions):
    for i, query in enumerate(questions):
        print(f"\nQuestion {i+1}: {query}")
        result = qa_chain({"question": query})
        print(f"Answer: {result['answer']}")
        print("Sources:")
        for j, doc in enumerate(result['source_documents'][:2]):
            print(f"  Source {j+1}: {doc.metadata.get('source', 'Unknown')} - Page {doc.metadata.get('page', 'Unknown')}")

# Example queries to demonstrate system capabilities
example_conversation = [
    "What is quantum gravity and why is it important?",
    "How did Einstein contribute to this field?",
    "What philosophical challenges does quantum gravity present?",
    "How does Kuhn's theory of scientific revolutions relate to quantum gravity research?"
]

print("\nDemonstrating conversational capability:")
chat_with_papers(example_conversation)

Ignoring wrong pointing object 2 65536 (offset 0)
Ignoring wrong pointing object 8 65536 (offset 0)
Ignoring wrong pointing object 13 65536 (offset 0)
Ignoring wrong pointing object 18 65536 (offset 0)
Ignoring wrong pointing object 23 65536 (offset 0)
Ignoring wrong pointing object 28 65536 (offset 0)
Ignoring wrong pointing object 33 65536 (offset 0)
Ignoring wrong pointing object 38 65536 (offset 0)
Ignoring wrong pointing object 43 65536 (offset 0)
Ignoring wrong pointing object 48 65536 (offset 0)
Ignoring wrong pointing object 53 65536 (offset 0)
Ignoring wrong pointing object 58 65536 (offset 0)
Ignoring wrong pointing object 63 65536 (offset 0)
Ignoring wrong pointing object 68 65536 (offset 0)
Ignoring wrong pointing object 73 65536 (offset 0)
Ignoring wrong pointing object 78 65536 (offset 0)
Ignoring wrong pointing object 83 65536 (offset 0)
Ignoring wrong pointing object 88 65536 (offset 0)


Adding chunks 0 to 49
Adding chunks 50 to 99
Adding chunks 100 to 149
Adding chunks 150 to 199
Adding chunks 200 to 249
Adding chunks 250 to 299
Adding chunks 300 to 349
Adding chunks 350 to 399
Adding chunks 400 to 449
Adding chunks 450 to 499
Adding chunks 500 to 549
Adding chunks 550 to 599
Adding chunks 600 to 649
Adding chunks 650 to 699
Adding chunks 700 to 749
Adding chunks 750 to 799
Adding chunks 800 to 849
Adding chunks 850 to 899
Adding chunks 900 to 949
Adding chunks 950 to 999
Adding chunks 1000 to 1049
Adding chunks 1050 to 1099
Adding chunks 1100 to 1149
Adding chunks 1150 to 1199
Adding chunks 1200 to 1249
Adding chunks 1250 to 1299
Adding chunks 1300 to 1349
Adding chunks 1350 to 1399
Adding chunks 1400 to 1428


  memory = ConversationBufferMemory(
  result = qa_chain({"question": query})



Demonstrating conversational capability:

Question 1: What is quantum gravity and why is it important?


[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mYou are a helpful AI assistant that answers questions about research papers on quantum gravity and philosophy of science.
Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say you don't know. Don't make up an answer.
Keep your answer brief and concise - no more than 100 words.

Context: The starting point in the covariant approach is a Lagrangian for gravity (with
matter or without, giving pure gravity). The Einstein-Hilbert Lagrangian is:

care of the various problems cases facing Popper’s more stringent methodological
model. However, the central principle guiding the construction of quantum gravity
directed research programmes are precisely mathematical consistency and the ab-
sence of anomali

### Summary of Quantum Gravity Research Papers Conversational System

#### What Works
- Successfully loaded and processed PDF research papers on quantum gravity and philosophy of science
- Effective document chunking using RecursiveCharacterTextSplitter (500-character chunks with 100-character overlap)
- Vector embeddings created with BAAI/bge-m3 model and stored in ChromaDB
- Implemented conversation memory to maintain context across questions
- Custom prompt template ensures concise answers (100 words maximum)
- Retrieval system fetches relevant context (k=6) to support accurate responses
- Returns source documents for answer verification and attribution

#### What Could Be Improved
- Limited corpus size may restrict the breadth of answerable questions
- No filtering mechanism to prioritize more authoritative or recent papers
- Batch processing of embeddings could be optimized for larger document collections
- No semantic search fallback when exact matches aren't found
- Could implement more sophisticated retrieval techniques (map-reduce) for complex questions
- No evaluation metrics to assess answer quality and relevance systematically

The system demonstrates effective RAG capabilities for specialized academic content but would benefit from expanded corpus size and advanced retrieval methods for complex theoretical questions about quantum gravity and philosophy of science.

The code and the report is generated by copilot (Claude 3.7 Sonnet Thinking), using prompt **"finish the task using papers in /ssdshare/llm_course/gzh: 'Ashtekar. WindingRoadToQG.pdf' ...(other papers omitted)"** and **"write a summary of the task"**. I modified the code a little bit to fix the problem of oversized batch (~800 compared to limit of 64).