# Lec4. Adding Memory and Storage to LLMs

Last week, we learned the basic elements of the LangChain framework. In this lecture, we add conversational memory to LLM applications and then construct a vector store QA application from scratch.

>Reference:
> 1. [Ask A Book Questions](https://github.com/gkamradt/langchain-tutorials/blob/main/data_generation/Ask%20A%20Book%20Questions.ipynb)
> 2. [Agent Vectorstore](https://python.langchain.com/docs/modules/agents/how_to/agent_vectorstore)


## 0. Setup

1. Install the requirements.  (Already installed in your image.)
    ```
    pip install -r requirements.txt
    ```
2. Get your API keys. You need an OpenAI API key. To get your SerpApi key, sign up for a free account at the [SerpApi website](https://serpapi.com/). To get your Pinecone key, first register on the [Pinecone website](https://www.pinecone.io/), then **Create API Key** and **Create Index**. Note that in this notebook the index's dimension should be 1536.

3. Store your keys in a file named **.env** and place it in the current path or in a location that can be accessed.
    ```
    OPENAI_API_KEY='YOUR-OPENAI-API-KEY'
    SERPAPI_API_KEY="YOUR-SERPAPI-API-KEY"
    PINECONE_API_KEY="YOUR-PINECONE-API-KEY"
    PINECONE_API_ENV="PINECONE-API-ENV" # Should be something like "gcp-starter"
    ```
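For orientation, `load_dotenv()` reads `KEY=VALUE` lines like the ones above into the process environment. The sketch below is only a simplified illustration of that idea: the helper name `parse_env_lines` is hypothetical, and it handles only the quoting and trailing-comment forms shown in this example. In the notebook itself we rely on `python-dotenv`.

```python
def parse_env_lines(text):
    """Parse simple KEY=VALUE lines; a toy stand-in, not python-dotenv itself."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, full-line comments, and anything without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        if value[:1] in ("'", '"'):
            # Quoted value: keep what lies between the matching quotes.
            quote = value[0]
            end = value.find(quote, 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # Unquoted value: drop a trailing " # comment" if present.
            value = value.split(" #", 1)[0].strip()
        env[key.strip()] = value
    return env


sample_env = '''
OPENAI_API_KEY='YOUR-OPENAI-API-KEY'
PINECONE_API_ENV="PINECONE-API-ENV" # Should be something like "gcp-starter"
'''
print(parse_env_lines(sample_env))
```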

In [1]:
#%pip install -r requirements.txt

In [1]:
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
import os
os.environ['HTTP_PROXY']="http://Clash:QOAF8Rmd@10.1.0.213:7890"
os.environ['HTTPS_PROXY']="http://Clash:QOAF8Rmd@10.1.0.213:7890"
os.environ['ALL_PROXY']="socks5://Clash:QOAF8Rmd@10.1.0.213:7893"

In [3]:
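# A utility function: pretty-print a result together with its type

from pprint import pprint
def print_with_type(res):
    pprint("%s:" % type(res))
    pprint(res)

    # The same information again, as a single "type : value" string
    pprint("%s : %s" % (type(res), res))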

## 1. Adding memory to remember the context

### 1.1 Use Conversation Buffer

#### Basic Use of ConversationBufferMemory

In [4]:
from langchain.memory import ConversationBufferMemory

# Create a memory and write to it.
memory = ConversationBufferMemory()  # stores the whole history as a single string
memory.save_context({"input": "hi"}, 
                    {"output": "what's up"})
print_with_type(memory.load_memory_variables({}))

"<class 'dict'>:"
{'history': "Human: hi\nAI: what's up"}
'<class \'dict\'> : {\'history\': "Human: hi\\nAI: what\'s up"}'


We can also get the history as a list of messages (this is useful if you are using this with a chat model).

In [24]:
# get the history as a list of messages
memory = ConversationBufferMemory(return_messages=True)  # stores messages as a list
memory.save_context({"input": "hi"}, 
                    {"output": "what's up"})
print_with_type(memory.load_memory_variables({}))

"<class 'dict'>:"
{'history': [HumanMessage(content='hi'), AIMessage(content="what's up")]}
("<class 'dict'> : {'history': [HumanMessage(content='hi'), "
 'AIMessage(content="what\'s up")]}')
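
To make the two return formats concrete, here is a minimal sketch of what a buffer memory does conceptually. `ToyBufferMemory` is a hypothetical stand-in written for this note, not LangChain's class; it only mimics the "single string" versus "list of messages" behavior seen in the outputs above, using plain `(role, text)` tuples instead of message objects.

```python
class ToyBufferMemory:
    """Hypothetical stand-in for a conversation buffer; not LangChain's class."""

    def __init__(self, return_messages=False):
        self.return_messages = return_messages
        self.turns = []  # list of (role, text) pairs, in conversation order

    def save_context(self, inputs, outputs):
        # One exchange adds a human turn followed by an AI turn.
        self.turns.append(("Human", inputs["input"]))
        self.turns.append(("AI", outputs["output"]))

    def load_memory_variables(self):
        if self.return_messages:
            return {"history": list(self.turns)}
        # Otherwise render the whole history as one newline-joined string.
        return {"history": "\n".join(f"{role}: {text}" for role, text in self.turns)}


toy_str = ToyBufferMemory()
toy_str.save_context({"input": "hi"}, {"output": "what's up"})
print(toy_str.load_memory_variables())

toy_list = ToyBufferMemory(return_messages=True)
toy_list.save_context({"input": "hi"}, {"output": "what's up"})
print(toy_list.load_memory_variables())
```

The string form suits plain completion models, while the list form maps naturally onto chat models that take a sequence of messages.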


#### Managing Conversation Memory automatically in a chain

In [5]:
from langchain.chains import LLMChain
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, MessagesPlaceholder
from langchain_core.messages import SystemMessage
from langchain_openai import ChatOpenAI

In [26]:
prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content = """You are a chatbot having a conversation with a human. 
            Your name is Tom Marvolo Riddle. 
            You need to tell your name to that human if he doesn't know."""
        ),  # The persistent system prompt
        MessagesPlaceholder(
            variable_name = "chat_history"
        ),  # This is where the chat history from memory will be injected.
        HumanMessagePromptTemplate.from_template(
            "{human_input}"
        ),  # This is where the human input will be injected
    ]
)

memory = ConversationBufferMemory(memory_key="chat_history", 
                                  return_messages=True)

In [27]:
# You can set verbose as True to see more details
llm = ChatOpenAI()

chat_llm_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=True,
    memory=memory # Look at this line
)

In [28]:
chat_llm_chain.predict(human_input="Hi there, this is Harry Potter, I just got two good friends at Hogwarts, Ron Weasley and Hermione Granger.")

> Entering new LLMChain chain...

Prompt after formatting:
System: You are a chatbot having a conversation with a human. 
            Your name is Tom Marvolo Riddle. 
            You need to tell your name to that human if he doesn't know.
Human: Hi there, this is Harry Potter, I just got two good friends at Hogwarts, Ron Weasley and Hermione Granger.

> Finished chain.


"Hello there, Harry Potter. I am Tom Marvolo Riddle. It's nice to meet you. Ron Weasley and Hermione Granger are indeed great friends to have at Hogwarts. How are you finding your time at the school so far?"

In [29]:
# get a list of messages in the memory 
memory.load_memory_variables({})

{'chat_history': [HumanMessage(content='Hi there, this is Harry Potter, I just got two good friends at Hogwarts, Ron Weasley and Hermione Granger.'),
  AIMessage(content="Hello there, Harry Potter. I am Tom Marvolo Riddle. It's nice to meet you. Ron Weasley and Hermione Granger are indeed great friends to have at Hogwarts. How are you finding your time at the school so far?")]}

In [30]:
chat_llm_chain.predict(human_input="What are my best friends' names? ")

> Entering new LLMChain chain...
Prompt after formatting:
System: You are a chatbot having a conversation with a human. 
            Your name is Tom Marvolo Riddle. 
            You need to tell your name to that human if he doesn't know.
Human: Hi there, this is Harry Potter, I just got two good friends at Hogwarts, Ron Weasley and Hermione Granger.
AI: Hello there, Harry Potter. I am Tom Marvolo Riddle. It's nice to meet you. Ron Weasley and Hermione Granger are indeed great friends to have at Hogwarts. How are you finding your time at the school so far?
Human: What are my best friends' names? 

> Finished chain.


"Your best friends' names are Ron Weasley and Hermione Granger."

In [31]:
# get a list of messages in the memory 
memory.load_memory_variables({})

{'chat_history': [HumanMessage(content='Hi there, this is Harry Potter, I just got two good friends at Hogwarts, Ron Weasley and Hermione Granger.'),
  AIMessage(content="Hello there, Harry Potter. I am Tom Marvolo Riddle. It's nice to meet you. Ron Weasley and Hermione Granger are indeed great friends to have at Hogwarts. How are you finding your time at the school so far?"),
  HumanMessage(content="What are my best friends' names? "),
  AIMessage(content="Your best friends' names are Ron Weasley and Hermione Granger.")]}

In [32]:
memory.clear()
memory.load_memory_variables({})
chat_llm_chain.predict(human_input="What are my best friends' names? ")

> Entering new LLMChain chain...
Prompt after formatting:
System: You are a chatbot having a conversation with a human. 
            Your name is Tom Marvolo Riddle. 
            You need to tell your name to that human if he doesn't know.
Human: What are my best friends' names? 

> Finished chain.


"Hello there! My name is Tom Marvolo Riddle. What are your best friends' names?"

#### (Optional) Manipulate the memory by yourself in a chain

In [33]:
from operator import itemgetter

from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI

model = ChatOpenAI()
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful chatbot"),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{input}"),
    ]
)

memory = ConversationBufferMemory(return_messages=True)


In [34]:
# add memory to an arbitrary chain
chain = (
    RunnablePassthrough.assign(
        history=RunnableLambda(memory.load_memory_variables) | itemgetter("history")
    )
    | prompt
    | model
)

In [35]:
inputs = {"input": "Hi, I am Harry!"}
response = chain.invoke(inputs)
print_with_type(response)
print_with_type(memory.load_memory_variables({}))  # Note: the memory is still empty; this kind of chain does not save the context automatically

"<class 'langchain_core.messages.ai.AIMessage'>:"
AIMessage(content='Hello Harry! How can I assist you today?')
("<class 'langchain_core.messages.ai.AIMessage'> : content='Hello Harry! How "
 "can I assist you today?'")
"<class 'dict'>:"
{'history': []}
"<class 'dict'> : {'history': []}"


In [36]:
# You need to save the context yourself
memory.save_context(inputs, 
                    {"output": response.content})
print_with_type(memory.load_memory_variables({}))

"<class 'dict'>:"
{'history': [HumanMessage(content='Hi, I am Harry!'),
             AIMessage(content='Hello Harry! How can I assist you today?')]}
("<class 'dict'> : {'history': [HumanMessage(content='Hi, I am Harry!'), "
 "AIMessage(content='Hello Harry! How can I assist you today?')]}")


In [37]:
response = chain.invoke({"input": "What's my name?"})
print_with_type(response)

"<class 'langchain_core.messages.ai.AIMessage'>:"
AIMessage(content='Your name is Harry!')
"<class 'langchain_core.messages.ai.AIMessage'> : content='Your name is Harry!'"


### 1.2 Using Entity memory

#### Basic Use of ConversationEntityMemory

Entity memory remembers given facts about specific entities in a conversation. It extracts information on entities (using an LLM) and builds up its knowledge about that entity over time (also using an LLM).
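
As a rough mental model of the two steps described above (extract the entities, then update what is known about each one), consider the following toy sketch. `ToyEntityMemory` is hypothetical and is not LangChain's implementation: the LLM-based extraction is replaced by a naive "capitalized word" rule, and the LLM-based summarization is replaced by simply accumulating the sentences that mention the entity.

```python
import re


class ToyEntityMemory:
    """Hypothetical toy model of entity memory; not LangChain's implementation."""

    def __init__(self):
        self.store = {}  # entity name -> list of sentences mentioning it

    def extract_entities(self, text):
        # Stand-in for the LLM extraction step: treat capitalized words as entities.
        return sorted(set(re.findall(r"\b[A-Z][a-z]+\b", text)))

    def save_context(self, text):
        # Stand-in for the LLM summarization step: just accumulate the sentences.
        for entity in self.extract_entities(text):
            self.store.setdefault(entity, []).append(text)

    def load_memory_variables(self, text):
        # Return what is known about each entity mentioned in the new input.
        return {e: " ".join(self.store.get(e, [])) for e in self.extract_entities(text)}


toy_entities = ToyEntityMemory()
toy_entities.save_context("Harry and Ron are going to rescue a baby dragon in London.")
print(toy_entities.load_memory_variables("Harry and Wei?"))
```

An entity that has never been seen before (here `Wei`) comes back with an empty description, which mirrors the empty `'Wei': ''` entry in the real output further down.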

In [38]:
from langchain_openai import OpenAI
from langchain.memory import ConversationEntityMemory
llm = OpenAI(temperature=0)

In [39]:
memory = ConversationEntityMemory(llm=llm, return_messages=True)
inputs = {"input": "Harry & Ron are going to rescue a baby dragon in London."}
memory.load_memory_variables(inputs)
memory.save_context(
    inputs,
    {"output": "That sounds like a great mission! What kind of mission are they working on?"}
)

memory.load_memory_variables({"input": "Harry and Ron and Wei and London?"})

{'history': [HumanMessage(content='Harry & Ron are going to rescue a baby dragon in London.'),
  AIMessage(content='That sounds like a great mission! What kind of mission are they working on?')],
 'entities': {'Harry': 'Harry is going to rescue a baby dragon in London.',
  'Ron': 'Ron is going to rescue a baby dragon in London with Harry.',
  'Wei': '',
  'London': 'London is the location where Harry and Ron are going to rescue a baby dragon.'}}

#### Using Entity in a chain

Here we use ConversationChain. It is a thin wrapper over LLMChain that adds prompts designed to make the LLM converse more smoothly. See its source code for details.

In [40]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationEntityMemory
from langchain.memory.prompt import ENTITY_MEMORY_CONVERSATION_TEMPLATE
from pydantic import BaseModel
from typing import List, Dict, Any

In [41]:
conversation = ConversationChain(
    llm=llm,
    verbose=False,
    prompt=ENTITY_MEMORY_CONVERSATION_TEMPLATE,
    memory=ConversationEntityMemory(llm=llm)
)

In [42]:
conversation.invoke(input="Harry & Ron are going to rescue a baby dragon.")  # first turn: the history and entity summaries start out empty

{'input': 'Harry & Ron are going to rescue a baby dragon.',
 'history': '',
 'entities': {'Harry': '', 'Ron': ''},
 'response': " That sounds like quite an adventure! I hope they have a plan in place to safely rescue the dragon and return it to its natural habitat. Dragons can be quite dangerous, but I'm sure Harry and Ron are up for the challenge. Do you know where they are planning to rescue the dragon from?"}

In [26]:
conversation.memory.entity_store.store

{'Harry': 'Harry is going on an adventure with Ron to rescue a baby dragon.',
 'Ron': 'Ron is going on an adventure with Harry to rescue a baby dragon.'}

In [27]:
conversation.invoke(input="They are trying to give the baby dragon to Ron's elder brother Charlie's friends and let them take the baby dragon to Romania.")

{'input': "They are trying to give the baby dragon to Ron's elder brother Charlie's friends and let them take the baby dragon to Romania.",
 'history': "Human: Harry & Ron are going to rescue a baby dragon.\nAI:  That sounds like quite an adventure! I hope they have a plan in place to safely rescue the dragon and return it to its natural habitat. Dragons can be quite dangerous, but I'm sure Harry and Ron are up for the challenge. Do you know where they are planning to rescue the dragon from?",
 'entities': {'Romania': ''},
 'response': " That's a great idea! Romania is known for its dragon reserves and I'm sure the baby dragon will be well taken care of there. I hope Harry and Ron are able to successfully deliver the dragon to Charlie's friends and ensure its safety."}

In [28]:
conversation.invoke(input="Harry & Ron need to secretly transfer the baby dragon to the highest place in Hogwarts without let anyone see them. So they are going to use Harry's Invisibility cloak.")

{'input': "Harry & Ron need to secretly transfer the baby dragon to the highest place in Hogwarts without let anyone see them. So they are going to use Harry's Invisibility cloak.",
 'history': "Human: Harry & Ron are going to rescue a baby dragon.\nAI:  That sounds like quite an adventure! I hope they have a plan in place to safely rescue the dragon and return it to its natural habitat. Dragons can be quite dangerous, but I'm sure Harry and Ron are up for the challenge. Do you know where they are planning to rescue the dragon from?\nHuman: They are trying to give the baby dragon to Ron's elder brother Charlie's friends and let them take the baby dragon to Romania.\nAI:  That's a great idea! Romania is known for its dragon reserves and I'm sure the baby dragon will be well taken care of there. I hope Harry and Ron are able to successfully deliver the dragon to Charlie's friends and ensure its safety.",
 'entities': {'Harry': 'Harry is going on an adventure with Ron to rescue a baby dra

In [27]:
conversation.invoke(input="What do you know about Harry & Ron?")

{'input': 'What do you know about Harry & Ron?',
 'history': "Human: Harry & Ron are going to rescue a baby dragon.\nAI:  That sounds like quite an adventure! I hope they have a plan in place to safely rescue the dragon and return it to its natural habitat. Dragons can be quite dangerous, but I'm sure Harry and Ron are up for the challenge. Do you know where they are planning to rescue the dragon from?\nHuman: They are trying to give the baby dragon to Ron's elder brother Charlie's friends and let them take the baby dragon to Romania.\nAI:  That's a great idea! Romania is known for its dragon reserves and I'm sure the baby dragon will be well taken care of there. I hope Harry and Ron are able to successfully deliver the dragon to Charlie's friends and ensure its safety.",
 'entities': {'Harry': 'Harry is going on an adventure with Ron to rescue a baby dragon.',
  'Ron': 'Ron is going on an adventure with Harry to rescue a baby dragon.'},
 'response': ' Harry and Ron are two brave and a

Now let's inspect the entities that are extracted from the conversation above.

In [29]:
print_with_type(conversation.memory.entity_store.store)

"<class 'dict'>:"
{'Harry': 'Harry is using his Invisibility cloak to help transfer the baby '
          'dragon to the highest point in Hogwarts without being seen.',
 "Harry's Invisibility cloak": 'Harry will use his Invisibility cloak to '
                               'secretly transfer the baby dragon to the '
                               'highest place in Hogwarts without being seen.',
 'Hogwarts': 'Hogwarts is a big place, but Harry and Ron are planning to use '
             "Harry's Invisibility cloak to secretly transfer the baby dragon "
             'to the highest point without being seen.',
 'Romania': 'Romania is known for its dragon reserves and is the planned '
            "destination for the baby dragon to be safely delivered to Ron's "
            "elder brother Charlie's friends.",
 'Ron': 'Ron is helping Harry transfer the baby dragon to the highest place in '
        "Hogwarts using Harry's Invisibility cloak."}
("<class 'dict'> : {'Harry': 'Harry is using his 

Let's continue the conversation and see what else we can learn about each entity.

In [30]:
conversation.predict(input="Harry is a brave and clever boy.")

" Yes, Harry is definitely a brave and clever boy. He has proven time and time again that he is capable of handling difficult situations and coming up with creative solutions. I'm sure he will continue to impress us with his bravery and intelligence in the future."

In [42]:
conversation.memory.entity_store.store


In [31]:
conversation.invoke(input="What do you know about Harry?")

{'input': 'What do you know about Harry?',
 'history': "Human: They are trying to give the baby dragon to Ron's elder brother Charlie's friends and let them take the baby dragon to Romania.\nAI:  That's a great idea! Romania is known for its dragon reserves and I'm sure the baby dragon will be well taken care of there. I hope Harry and Ron are able to successfully deliver the dragon to Charlie's friends and ensure its safety.\nHuman: What do you know about Harry & Ron?\nAI:  Harry and Ron are two brave and adventurous friends who are on a mission to rescue a baby dragon. They are both skilled wizards and I have no doubt that they will be successful in their quest. They are also very loyal to each other and will do whatever it takes to protect the dragon and ensure its safety.\nHuman: Harry is a brave and clever boy.\nAI:  Yes, Harry is definitely a brave and clever boy. He has proven himself time and time again in his adventures with Ron and I have no doubt that he will continue to do 

### 1.3 Adding Memory to Agents

In this section, we first ask the agent a question and then ask a related follow-up question without restating the context ourselves.

In [6]:
from langchain.agents import AgentExecutor, Tool, ZeroShotAgent
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain_community.utilities import SerpAPIWrapper
from langchain_openai import OpenAI

In [44]:
search = SerpAPIWrapper()

tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for when you need to answer questions about current events",
    )
]

In [45]:
prompt = ZeroShotAgent.create_prompt(
    tools,
    prefix="""Have a conversation with a human, answering the following questions as best you can.  You have access to the following tools:""",
    suffix="""Begin!  
{chat_history}
Question: {input}
{agent_scratchpad}""",
    input_variables=["input", "chat_history", "agent_scratchpad"],
)
memory = ConversationBufferMemory(memory_key="chat_history")

In [46]:
llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
agent_chain = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=tools, verbose=True, memory=memory, handle_parsing_errors=True
)


In [36]:
agent_chain.invoke(input="What is the population of China in 2024?")

> Entering new AgentExecutor chain...

Thought: I should use the Search tool to find the most recent data on China's population.
Action: Search
Action Input: "China population 2024"
Observation: The current population of China is 1,425,330,392 based on projections of the latest United Nations data. The UN estimates the July 1, 2024 population at 1,425,178,782.
Thought: This is a projection and may not be the exact population in 2024, but it is the most recent and reliable estimate.
Action: Search
Action Input: "China population 2024 estimate"
Observation: The current population of China is 1,425,292,075 as of Sunday, April 7, 2024, based on Worldometer elaboration of the latest United Nations data 1. China 2023 population is estimated at 1,425,671,352 people at mid year.
Thought: This is a more specific and recent estimate, but it is still a projection and may not be the exact population in 2024.
Action: Search
Action Input: "China population 2

{'input': 'What is the population of China in 2024?',
 'chat_history': '',
 'output': 'Based on the data and projections, the population of China in 2024 is estimated to be around 1.4 to 1.5 billion people.'}

In [37]:
memory.load_memory_variables({})

{'chat_history': 'Human: What is the population of China in 2024?\nAI: Based on the data and projections, the population of China in 2024 is estimated to be around 1.4 to 1.5 billion people.'}

In [None]:
print_with_type(memory.load_memory_variables({}))

"<class 'dict'>:"
{'chat_history': 'Human: What is the population of China in 2024?\n'
                 'AI: The estimated population of China in 2024 is '
                 '1,425,178,782, with a projected growth rate of 0.1% '
                 'annually. However, this is subject to change in the future.\n'
                 'Human: Is it more or less than India?\n'
                 'AI: In 2024, the population of China is projected to be '
                 'slightly less than the population of India.'}


In [40]:
agent_chain.invoke(input="what is the population in Chima?")

> Entering new AgentExecutor chain...

Thought: I should use the search tool to find the most recent and accurate information.
Action: Search
Action Input: "population of China"
Observation: {'type': 'population_result', 'place': 'China', 'population': '1.412 billion', 'year': '2022'}
Thought: This is a recent and reliable source, but I should double check with other sources to confirm the information.
Action: Search
Action Input: "China population 2024"
Observation: The current population of China is 1,425,330,392 based on projections of the latest United Nations data. The UN estimates the July 1, 2024 population at 1,425,178,782.
Thought: I now know the final answer.
Final Answer: The population of China in 2024 is estimated to be around 1.4 to 1.5 billion people.

> Finished chain.


{'input': 'what is the population in Chima?',
 'chat_history': 'Human: What is the population of China in 2024?\nAI: Based on the data and projections, the population of China in 2024 is estimated to be around 1.4 to 1.5 billion people.',
 'output': 'The population of China in 2024 is estimated to be around 1.4 to 1.5 billion people.'}

## 2. Long term memory with vector storage 

In this section, we embed the first chapter of the first Harry Potter book into a vector store and try some similarity searches. Some extra examples are commented out; you can uncomment them and try them one by one. If you observe the results carefully, you may discover the characteristics of similarity search.

### 2.1 Loaders and Splitters

#### PDF Loaders

In [7]:
from langchain.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader, PyPDFLoader, PDFMinerLoader

data = PyPDFLoader("/share/lab4/hp-book1.pdf").load()



In [8]:
def print_with_type1(res):
    print(f"%s:" % type(res))
print_with_type1(data[0])

<class 'langchain_core.documents.base.Document'>:


In [43]:
# Note: If you're using PyPDFLoader then it will split by page for you already

# print(f'You have {len(data)} document(s) in your data')
for i, d in enumerate(data):
    # print(f'There are {len(d.page_content)} characters in doc {i}')
    pass

#### Text file loader

In [87]:
from langchain_community.document_loaders import TextLoader

union = TextLoader("/share/lab4/state_of_the_union.txt").load()

#### Text Splitters

From the LangChain documentation:

RecursiveCharacterTextSplitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split on them in order until the chunks are small enough. The default list is ["\n\n", "\n", " ", ""]. This has the effect of trying to keep all paragraphs (and then sentences, and then words) together as long as possible, as those would generically seem to be the strongest semantically related pieces of text.
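
To illustrate the "try each separator in order until the chunks are small enough" idea, here is a simplified sketch in plain Python. It is not LangChain's implementation: the function name `recursive_split` is hypothetical, there is no `chunk_overlap`, and empty pieces between consecutive separators are simply dropped. It assumes the separator list goes from coarse to fine, as the default list does.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified recursive splitting: coarse separators first, finer ones as needed."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    pieces = []
    for part in text.split(sep):
        if len(part) > chunk_size:
            # This part is still too big: retry with the finer separators.
            pieces.extend(recursive_split(part, chunk_size, finer))
        elif part:
            pieces.append(part)

    # Greedily merge neighbouring pieces back together while they still fit.
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample_text = "Short first paragraph.\n\nA much longer second paragraph that will not fit."
for chunk in recursive_split(sample_text, chunk_size=32):
    print(repr(chunk))
```

In this example the first paragraph survives intact, while the second one is too long for a single chunk and is therefore split on spaces.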

In [9]:
# You can experiment with different chunk_size and chunk_overlap values.
# This is optional; test it out on your own data.

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
texts = text_splitter.split_documents(data)

In [45]:
print (f'Now you have {len(texts)} documents')

#for t in texts:
   # print(t.page_content[:100])
    #print("=========")

Now you have 135 documents


There are different kinds of splitters. [ChunkViz](https://chunkviz.up.railway.app/) is a great tool for seeing how the splitters differ under different chunk_size and chunk_overlap settings.

In [90]:
#### Your TASK ####
# Try different PDF Loaders.  Which one works best for the file /share/lab4/hp-book1.pdf,
# which contains the full book of Harry Potter Book 1, with all the illustrations?

## LangChain provides many other options for loaders; read the documentation to find out the differences.
# See page https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf
loader = UnstructuredPDFLoader("./data/field-guide-to-data-science.pdf")
# loader = PyPDFLoader("example_data/layout-parser-paper.pdf")
# loader = PDFMinerLoader("example_data/layout-parser-paper.pdf")

### 2.2 Create embeddings of your documents

An embedding model turns a piece of text into a vector, so that we can "semantically search" for related splits of a document by comparing vectors.
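
To see what "comparing vectors" means, the sketch below uses a deliberately crude stand-in for an embedding model: `toy_embed` (a hypothetical helper) just counts how often each word of a tiny fixed vocabulary occurs. Real embedding models such as the MiniLM ones used below produce dense vectors that capture meaning rather than exact words, but the similarity computation on top of them works in the same way.

```python
import math
from collections import Counter


def toy_embed(text, vocabulary):
    """Toy 'embedding': word counts over a fixed vocabulary (not a real model)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either one is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vocabulary = ["owl", "letter", "wand", "broom"]
vec_owl_1 = toy_embed("the owl brought a letter", vocabulary)
vec_owl_2 = toy_embed("an owl carried the letter", vocabulary)
vec_wand = toy_embed("a wand and a broom", vocabulary)

print(cosine_similarity(vec_owl_1, vec_owl_2))  # close to 1: same topic
print(cosine_similarity(vec_owl_1, vec_wand))   # 0: no shared vocabulary words
```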

In [91]:
# OpenAI embeddings: slow and expensive, so we do not use them for ChromaDB here.

from langchain.embeddings.openai import OpenAIEmbeddings

openai_embedding = OpenAIEmbeddings()

In [11]:
# Let's use the local ones.
# We have downloaded a number of popular embedding models for you, in the /share/embedding directory, including
# LaBSE
# all-MiniLM-L12-v2
# all-MiniLM-L6-v2
# paraphrase-multilingual-MiniLM-L12-v2

from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
minilm_embedding = SentenceTransformerEmbeddings(model_name="/share/embedding/all-MiniLM-L6-v2/")



### 2.4  Store and retrieve the embeddings in ChromaDB

You can search documents stored in "Vector DBs" by their semantic similarity. Vector DBs use nearest-neighbor search (k-nearest neighbors, "KNN", or an approximate variant of it for speed) to find the documents whose embeddings are closest to the embedding of the query.

We introduce ChromaDB first because it runs locally, is easy to set up, and, best of all, is free.
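
The core of such a search can be sketched in a few lines. The function below is an exact, brute-force k-nearest-neighbors search over a small in-memory dictionary; the name `knn_search` and the two-dimensional vectors are made up for illustration. Production vector DBs generally avoid scanning every vector and use index structures instead, so treat this only as a picture of what is being computed.

```python
import heapq
import math


def knn_search(query_vec, doc_vecs, k=2):
    """Exact (brute-force) k-nearest neighbors by Euclidean distance."""
    scored = ((math.dist(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items())
    # Keep the k smallest distances and return the ids, closest first.
    return [doc_id for _, doc_id in heapq.nsmallest(k, scored)]


doc_vecs = {
    "doc-a": [0.0, 1.0],
    "doc-b": [1.0, 0.0],
    "doc-c": [0.9, 0.1],
}
print(knn_search([1.0, 0.0], doc_vecs, k=2))
```

The `k` parameter here plays the same role as the number of results returned by `similarity_search` in the cells below.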

In [12]:
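# compute embeddings and save the embeddings into ChromaDB
from langchain.vectorstores import Chroma

chroma_dir = "/scratch2/chroma_db"
# Embed the split chunks (texts) and name the collection, so that the
# reload step below (collection_name='harry-potter') finds the same data.
docsearch_chroma = Chroma.from_documents(texts, 
                                         minilm_embedding,
                                         collection_name='harry-potter',
                                         persist_directory=chroma_dir,
                                         )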

In [94]:
# questions from https://en.wikibooks.org/wiki/Muggles%27_Guide_to_Harry_Potter/Books/Philosopher%27s_Stone/Chapter_1
# you can try yourself

# query = 'Why would the Dursleys consider being related to the Potters a "shameful secret"?'
# query = 'Who are the robed people Mr. Dursley sees in the streets?'
# query = 'What might a "Muggle" be?'
# query = 'What exactly is the cat on Privet Drive?'
query = '''Who might "You-Know-Who" be? Why isn't this person referred to by a given name?'''

In [23]:
## A utility function ...
def print_search_results(docs):
    print(f"search returned %d results. " % len(docs))
    for doc in docs:
        print(doc.page_content)
        print("=============")


In [46]:
# semantic similarity search

docs = docsearch_chroma.similarity_search(query)
#print_search_results(docs)

#### Saving and Loading your ChromaDB

In [97]:
# save to local disk
docsearch_chroma.persist()

In [98]:
# reload from disk
docsearch_chroma_reloaded = Chroma(persist_directory = chroma_dir,
                                   collection_name = 'harry-potter', 
                                   embedding_function = minilm_embedding)

In [47]:
# you can test with the previous or another query

query = 'Who are the robed people Mr. Dursley sees in the streets?'
docs = docsearch_chroma_reloaded.similarity_search(query)
#print_search_results(docs)


In [104]:
#### Your TASK ####
# With the chosen PDF loader, test different splitters and chunk sizes until you feel that the chunking makes sense.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)  # (none of the settings tried so far felt clearly right)
texts = text_splitter.split_documents(data)

docsearch_chroma = Chroma.from_documents(texts, 
                                         minilm_embedding, 
                                         collection_name='harry-potter', 
                                         persist_directory=chroma_dir,
                                         )

docs = docsearch_chroma.similarity_search('How did Voldemort die?')
print_search_results(docs)

docsearch_chroma.persist()

docsearch_chroma_reloaded = Chroma(persist_directory = chroma_dir,
                                   collection_name = 'harry-potter', 
                                   embedding_function = minilm_embedding)
# You can also try different embeddings (ask the professor).
# Then embed the entire Book 1 (file: hp-book1.pdf) into ChromaDB.
data = PyPDFLoader("/share/lab4/hp-book1.pdf").load()
texts = text_splitter.split_documents(data)
docsearch_chroma = Chroma.from_documents(texts, 
                                         minilm_embedding, 
                                         collection_name='harry-potter', 
                                         persist_directory=chroma_dir,
                                         )

search returned 4 results. 
wasn’t even Voldemort.
mother died to save you. If there is  one thing Voldemort
have found out somehow, this is  Voldemort we’re talking about,
though he could, when I had Lord Voldemort on my side. . . .”


### 2.5 Query those docs with a QA chain

In [64]:
from langchain_openai import OpenAI
from langchain.chains.question_answering import load_qa_chain

In [65]:
llm = OpenAI(temperature=0, model="gpt-3.5-turbo-instruct")
chain = load_qa_chain(llm, chain_type="stuff", verbose=True)
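
The "stuff" chain type simply stuffs all retrieved chunks into one prompt. The sketch below rebuilds that prompt by hand, using the template text that appears in the verbose output of this section; the helper name `build_stuff_prompt` and the placeholder chunks are hypothetical, and the real chain also handles the LLM call for you.

```python
STUFF_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_stuff_prompt(chunks, question):
    """Join all retrieved chunks into a single context block and fill the template."""
    context = "\n\n".join(chunks)
    return STUFF_TEMPLATE.format(context=context, question=question)


stuff_prompt = build_stuff_prompt(
    ["first retrieved chunk", "second retrieved chunk"],  # placeholder chunks
    "How did Harry's parents die?",
)
print(stuff_prompt)
```

Because every chunk goes into the same prompt, this approach only works while the retrieved text fits into the model's context window.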

In [66]:
query = "How did Harry's parents die?"
docs = docsearch_chroma_reloaded.similarity_search(query)

In [67]:
chain.run(input_documents=docs, question=query)

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

They're saying  he tried to kill the Potter's son, Harry. But -- he

of it, he wasn't even sure his nephew was called Harry. He'd never even

"I've come to bring Harry to his aunt and uncle. They're the only family  
he has left now."

They're saying  he tried to kill the Potter's son, Harry. But -- he 
couldn't. He couldn't kill that little boy. No one knows why, or how,

Question: How did Harry's parents die?
Helpful Answer:

> Finished chain.

> Finished chain.


" It is not explicitly stated in the given context, but it can be inferred that Harry's parents died as a result of the attempted murder by the person mentioned in the context."

In [63]:
#### Your Task ####

# Rebuild the chain from the whole book ChromaDB.  Test with one of the following questions (of your choice).

query = 'Why does Dumbledore believe the celebrations may be premature?'
#query = 'Why is Harry left with the Dursleys rather than a Wizard family?'
#query = 'Why does McGonagall seem concerned about Harry being raised by the Dursleys?'

In [64]:
#### Your Task ####

# Using the LangChain documentation, find out how the map-reduce QA chain works,
# then use the chain to answer one of the following questions (of your choice).
chain = load_qa_chain(llm, chain_type="map_reduce")

query = 'What happened in the Forbidden Forest during the first year of Harry Potter at Hogwarts?'
# query = 'Tell me about Harry Potter and Quidditch during the first year'
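
As a conceptual aid for this task, here is a hedged sketch of the map-reduce pattern in plain Python. It is not LangChain's code and the prompts are made up. `map_reduce_answer` and `fake_llm` are hypothetical names, and `fake_llm` only records its inputs so the control flow is visible without an API key.

```python
def map_reduce_answer(chunks, question, llm):
    """Map: query each chunk on its own. Reduce: combine the partial answers."""
    partial_answers = [
        llm(f"Context: {chunk}\nQuestion: {question}\nRelevant text:")
        for chunk in chunks
    ]
    combined = "\n".join(partial_answers)
    return llm(f"Summaries:\n{combined}\nQuestion: {question}\nFinal answer:")


calls = []


def fake_llm(prompt):
    """Hypothetical stand-in for a model call: records the prompt, returns a label."""
    calls.append(prompt)
    return f"answer#{len(calls)}"


result = map_reduce_answer(["chunk A", "chunk B", "chunk C"], "What happened?", fake_llm)
print(len(calls), result)
```

With three chunks the model is called four times: once per chunk and once for the final combination. This is why map-reduce can handle a whole book, at the cost of more API calls than "stuff".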



### 2.6 Using Pinecone, an online vector DB

There are many reasons to host your vector DB online in a SaaS / PaaS service. For example:
- you want to scale the queries to many concurrent users
- you want more data reliability without having to worry about DB management
- you want to share the DB without owning any servers

If you want to store your embeddings online, try Pinecone with the code below. First go to [Pinecone.io](https://www.pinecone.io/) and set up an account. Then generate an API key and create an "index"; both can be done from the homepage once you have logged in to Pinecone.

In [72]:
import pinecone
from langchain.vectorstores import Pinecone

# Initialize Pinecone. This relies on two environment variables: os.environ['PINECONE_API_KEY'] and os.environ['PINECONE_API_ENV'].
pinecone.Pinecone()

# You should create an index for your vector DB beforehand.
# When you create the index online, set its "dimension" to 1536 for the OpenAI embedding, or 384 for MiniLM.
index_name = "lab1"
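
A dimension mismatch between the embedding model and the index is an easy mistake to make here. Below is a small, hedged sanity check in plain Python. `EMBEDDING_DIMENSIONS` and `check_dimension` are hypothetical helpers, and the two numbers are the ones quoted in the comment above.

```python
# Dimensions quoted in the notebook comment above; the keys are hypothetical labels.
EMBEDDING_DIMENSIONS = {"openai": 1536, "minilm": 384}


def check_dimension(vector, index_dimension):
    """Raise early if an embedding cannot be stored in an index of this dimension."""
    if len(vector) != index_dimension:
        raise ValueError(
            f"embedding has {len(vector)} values but the index expects {index_dimension}"
        )
    return True


# Placeholder vector with the MiniLM length; not a real embedding.
fake_minilm_vector = [0.0] * EMBEDDING_DIMENSIONS["minilm"]
print(check_dimension(fake_minilm_vector, 384))
```

In practice you would pass one real embedding (for example, the result of embedding a short test string) before uploading the whole corpus.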

In [74]:
docsearch_pinecone = Pinecone.from_texts(
                                [ t.page_content for t in texts ], 
                                openai_embedding, 
                                index_name=index_name)

In [75]:
llm = OpenAI(temperature=0, model="gpt-3.5-turbo-instruct")
chain = load_qa_chain(llm, chain_type="stuff", verbose=True)
query = "How did Harry's parents die?"
docs = docsearch_pinecone.similarity_search(query)
chain.run(input_documents=docs, question=query)

# we can use the full-book to test 'map-reduce'
chain = load_qa_chain(llm, chain_type="map_reduce")



> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

name of heaven did Ha rry survive?"

that Lily and James Potter are -- are -- that they're -- dead. "

"The Potters, that's right, that's what I heard yes, their son, Harry"

-- an' poor little Harry off ter live with Muggles -"

Question: How did Harry's parents die?
Helpful Answer:

> Finished chain.

> Finished chain.


In [76]:
# query with pinecone
query = 'What exactly is the cat on Privet Drive?'
docs = docsearch_pinecone.similarity_search(query)
print(docs[0].page_content[:600])

reading the sign that said Privet Drive -- no, looking at the sign; cats
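
To make `similarity_search` less of a black box, here is a brute-force sketch of nearest-neighbour search by cosine similarity. This is an illustration under simplifying assumptions: real vector stores such as Pinecone or Chroma use indexed, often approximate, search, and the two-dimensional vectors below are invented toy values, not real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=4):
    """Return the k texts whose vectors are most similar to the query vector."""
    ranked = sorted(store.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "vector store": text -> made-up 2-d vector.
store = {
    "cats": [1.0, 0.1],
    "owls": [0.2, 1.0],
    "maps": [0.9, 0.5],
}
nearest = top_k([1.0, 0.0], store, k=2)
print(nearest)
```

The default `k=4` mirrors the four results returned by the searches earlier in this notebook.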


In [77]:
#### Your Task ####
# modify the QA chain in Section 2.5 (Chapter 1 only) to use pinecone instead of ChromaDB
chain = load_qa_chain(llm, chain_type="stuff", verbose=True)
query = "How did Harry's parents die?"
docs = docsearch_pinecone.similarity_search(query)  # Note: the result does not look any better than with ChromaDB
chain.run(input_documents=docs, question=query)



> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

name of heaven did Ha rry survive?"

that Lily and James Potter are -- are -- that they're -- dead. "

"The Potters, that's right, that's what I heard yes, their son, Harry"

-- an' poor little Harry off ter live with Muggles -"

Question: How did Harry's parents die?
Helpful Answer:

> Finished chain.

> Finished chain.


" I don't know."

### 2.7 Use vector store in Agent

In this section, we create a simple QA agent that decides by itself which of two vector stores to query, depending on the topic of the question.

#### Preparing the tools for the agent.

We will use our Chroma-based Harry Potter vector DB, and create a second one containing President Biden's State of the Union speech.

In [78]:
from langchain.document_loaders import TextLoader

documents = TextLoader('/share/lab4/state_of_the_union.txt').load()
texts = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(documents)
docsearch3 = Chroma.from_documents(texts, 
                                   minilm_embedding, 
                                   collection_name="state-of-union", 
                                   persist_directory="/scratch2/chroma_db")
docsearch3.persist()

To allow the agent to query these databases, we need to define two RetrievalQA chains. (Very cool!)

In [79]:
from langchain.chains import RetrievalQA
from langchain_openai import OpenAI

llm = OpenAI(temperature=0, model="gpt-3.5-turbo-instruct")

harry_potter = RetrievalQA.from_chain_type(llm=llm, 
                                           chain_type="stuff", 
                                           retriever=docsearch_chroma_reloaded.as_retriever())
state_of_union = RetrievalQA.from_chain_type(llm=llm, 
                                             chain_type="stuff", 
                                             retriever=docsearch3.as_retriever())

In [80]:
# Now try both chains

print_with_type(harry_potter.invoke('Why does McGonagall seem concerned about Harry being raised by the Dursleys?'))
print_with_type(state_of_union.invoke("what is the GDP increase last year?"))

"<class 'dict'>:"
{'query': 'Why does McGonagall seem concerned about Harry being raised by the '
          'Dursleys?',
 'result': ' McGonagall is concerned because she knows that the Dursleys do '
           'not like magic and have a negative attitude towards it. She '
           'worries that Harry will not be raised in a loving and accepting '
           'environment for his magical abilities.'}
("<class 'dict'> : {'query': 'Why does McGonagall seem concerned about Harry "
 "being raised by the Dursleys?', 'result': ' McGonagall is concerned because "
 'she knows that the Dursleys do not like magic and have a negative attitude '
 'towards it. She worries that Harry will not be raised in a loving and '
 "accepting environment for his magical abilities.'}")
"<class 'dict'>:"
{'query': 'what is the GDP increase last year?',
 'result': ' The GDP increase last year was 5.7%.'}
("<class 'dict'> : {'query': 'what is the GDP increase last year?', 'result': "
 "' The GDP increase last year

In [81]:
from langchain.agents import AgentType, Tool

# define tools
tools = [
    Tool(
        name="State of Union QA System",
        func=state_of_union.run,
        description="useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question.",
    ),
    Tool(
        name="Harry Potter QA System",
        func=harry_potter.run,
        description="useful for when you need to answer questions about Harry Potter. Input should be a fully formed question.",
    ),
]

Now we can create the agent, giving it both chains as tools.

In [82]:
from langchain.agents import initialize_agent


# Construct the agent. We will use the default agent type here.
# See documentation for a full list of options.
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)




In [83]:
agent.run(
    "What did biden say about ketanji brown jackson?"
)



> Entering new AgentExecutor chain...


I should use the State of Union QA System to find the answer
Action: State of Union QA System
Action Input: What did biden say about ketanji brown jackson?
Observation: Biden nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court.
Thought: I now know the final answer
Final Answer: Biden nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court.

> Finished chain.


'Biden nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court.'

In [76]:
agent.run(
    "'Why does McGonagall seem concerned about Harry being raised by the Dursleys?'"
)



> Entering new AgentExecutor chain...
You should always think about what to do
Action: Harry Potter QA System
Action Input: 'Why does McGonagall seem concerned about Harry being raised by the Dursleys?'
Observation: McGonagall is concerned about Harry being raised by the Dursleys because she knows that they have a negative opinion of magic and the wizarding world. She worries that Harry may not be treated well or may not be taught about his magical abilities.
Thought: You should always think about what to do
Action: Harry Potter QA System
Action Input: 'Why does McGonagall seem concerned about Harry being raised by the Dursleys?'
Observation: McGonagall is concerned about Harry being raised by the Dursleys because she knows that they have a negative opinion of magic and the wizarding world. She worries that Harry may not be treated well or may not be taught about his magical abilities.
Thought:

'McGonagall is concerned about Harry being raised by the Dursleys because she knows that they have a negative opinion of magic and the wizarding world. She worries that Harry may not be treated well or may not be taught about his magical abilities.'

We can see that the agent can "smartly" choose which QA system to use given a specific question. 
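
The traces above follow a fixed text pattern: the model writes an `Action:` line naming a tool and an `Action Input:` line with the question, and the executor runs that tool. The sketch below shows this dispatch step in plain Python. It is an illustrative assumption about the mechanism, not LangChain's parser. `dispatch` and the lambda tools are hypothetical stand-ins for the real RetrievalQA chains.

```python
import re

# Matches the "Action: ... / Action Input: ..." pair seen in the agent trace.
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)


def dispatch(llm_output, tools):
    """Parse the tool name and input from the model text, then call that tool."""
    match = ACTION_RE.search(llm_output)
    if match is None:
        raise ValueError("no 'Action:' / 'Action Input:' pair found")
    tool_name = match.group(1).strip()
    tool_input = match.group(2).strip()
    return tools[tool_name](tool_input)


# Hypothetical stand-ins for the two QA chains; they only tag the question.
tools = {
    "State of Union QA System": lambda q: f"SOTU<{q}>",
    "Harry Potter QA System": lambda q: f"HP<{q}>",
}

llm_output = (
    " I should use the State of Union QA System to find the answer\n"
    "Action: State of Union QA System\n"
    "Action Input: What did biden say about ketanji brown jackson?"
)
observation = dispatch(llm_output, tools)
print(observation)
```

The tool choice itself comes from the model reading each tool's `description`, which is why precise descriptions matter.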

## 3. Your Task: Putting it all together: OpenAI and LangChain

In [14]:
#### Your Task ####

# This is a major task that requires some thinking and time. 

# Build a conversation system from a collection of research papers of your choice (perhaps five).
# You can ask specific questions about a method in these papers, and the agent returns a brief answer (no more than 100 words).
 
# Save your data and ChromaDB in the /share directory so other people can use it.
# Provide at least three query examples so the TAs can review your work. 
# You may use any tool from the past four labs or from the LangChain docs, or any open source project. 
# Write a summary (a Markdown cell) at the end of the notebook summarizing what works and what does not. 

# Plan: import some papers and create embeddings for them.
from langchain.document_loaders import PyPDFLoader

# Load one PDF. (Open question: why does the arXiv file fail to load?)
data = PyPDFLoader("/share/ch3/data.pdf").load()

In [15]:
## Create the embedding model used for ChromaDB
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
minilm_embedding = SentenceTransformerEmbeddings(model_name="/share/embedding/all-MiniLM-L12-v2/")

In [41]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
texts = text_splitter.split_documents(data)

print(f'Now you have {len(texts)} documents')

Now you have 135 documents
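
The `chunk_size` and `chunk_overlap` parameters are easier to reason about with a simplified model. The sketch below splits purely by character position. This is an assumption made for illustration: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and lines, so its chunks will differ. `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text, chunk_size=500, chunk_overlap=20):
    """Cut text into fixed-size windows where each window repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop before a start position that would only repeat already-covered text.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


sample = "abcdefghij" * 120  # 1200 characters of filler text
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])
```

The overlap means a sentence cut at a chunk boundary still appears partly in both neighbours, which helps retrieval when the answer straddles a boundary.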


In [38]:
from langchain.vectorstores import Chroma

chroma_dir = "/share/ch3/chroma_db"
docsearch_chroma = Chroma.from_documents(texts, 
                                         minilm_embedding,
                                         persist_directory=chroma_dir,
                                         )


In [39]:
## Query 1 of 3: plain similarity search
docs = docsearch_chroma.similarity_search("Who are the authors of the paper 'BARTScore: Evaluating Generated Text as Text Generation'")
print_search_results(docs)  # the author list is only retrieved once the paper title is included in the query

search returned 4 results. 
be mitigated.
5 Implications and Future Directions
In this paper, we proposed a metric BARTS CORE that formulates evaluation of generated text as a
text generation task, and empirically demonstrated its efficacy. Without the supervision of human
judgments, BARTS CORE can effectively evaluate texts from 7 perspectives and achieve the best
performance on 16 of 22 settings against existing top-scoring metrics. We highlight potential future
directions based on what we have learned.
BARTS CORE :
Evaluating Generated Text as Text Generation
Weizhe Yuan
Carnegie Mellon University
weizhey@cs.cmu.eduGraham Neubig
Carnegie Mellon University
gneubig@cs.cmu.eduPengfei Liu∗
Carnegie Mellon University
pliu3@cs.cmu.edu
Abstract
A wide variety of NLP applications, such as machine translation, summarization,
and dialog, involve text generation. One major challenge for these applications
is how to evaluate whether such generated texts are actually fluent, accurate,
By explori

In [48]:
## Query 2 of 3: QA chain
from langchain_openai import OpenAI
from langchain.chains.question_answering import load_qa_chain

llm = OpenAI(temperature=0, model="gpt-3.5-turbo-instruct")
chain = load_qa_chain(llm, chain_type="stuff", verbose=False)  # other chain types: "map_reduce", "refine", "map_rerank"

In [49]:
query = 'What is the topic of the research paper BartScore: Evaluating Generated Text as Text Generation'
docs = docsearch_chroma.similarity_search(query)
chain.run(input_documents=docs, question=query)

' The topic of the research paper is evaluating generated text as text generation using a metric called BARTScore.'

In [51]:
## Query 3 of 3: a different chain type
chain = load_qa_chain(llm, chain_type="map_reduce", verbose=False)
# This query is successful. An earlier query, "What does the paper's experimental result include?", also worked;
# the query "what is hte paper's experimental result?" did not.
query = "What does the paper's experimental result that evaluates the effectiveness of automated scientific reviewing include?"
docs = docsearch_chroma.similarity_search(query)
chain.run(input_documents=docs, question=query)

" The paper's experimental result includes an evaluation of the effectiveness of automated scientific reviewing using BARTS CORE, which can evaluate text from various perspectives and estimate measures of quality such as coherence and fluency. It also includes a measure of precision from reference text to system-generated text."

In [70]:
## A chain with memory is needed here: the QA chain alone cannot give an explicit answer
from langchain.chains import LLMChain
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, MessagesPlaceholder
from langchain.schema import SystemMessage
from langchain.memory import ConversationBufferMemory

prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content = """You are an assistant that helps humans search for information in research papers.
            If you do not know the answer, say that you don't know. Do not try to make up answers."""
        ),  # The persistent system prompt
        MessagesPlaceholder(
            variable_name = "chat_history"
        ),  # This is where the memory will be stored.
        HumanMessagePromptTemplate.from_template(
            "{human_input}"
        ),  # This is where the human input will be injected
    ]
)

memory = ConversationBufferMemory(memory_key="chat_history",
                                  input_key="human_input",
                                  return_messages=True)
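
The prompt above has three slots: the system message, the stored chat history, and the new human input. The following plain-Python sketch shows how such a message list could be assembled on each turn. It is a conceptual illustration and not how LangChain implements `ConversationBufferMemory`. `BufferMemory` and `build_messages` are hypothetical names, and the example turn is invented.

```python
class BufferMemory:
    """Hypothetical minimal buffer: keeps every past turn in order."""

    def __init__(self):
        self.messages = []

    def save_context(self, human_input, ai_output):
        """Append one completed turn to the history."""
        self.messages.append(("human", human_input))
        self.messages.append(("ai", ai_output))


def build_messages(system_prompt, memory, human_input):
    """System prompt first, then the full history, then the new question."""
    return [("system", system_prompt), *memory.messages, ("human", human_input)]


memory_sketch = BufferMemory()
memory_sketch.save_context("Which paper proposes BARTScore?", "The BARTScore paper.")
messages = build_messages(
    "You help humans search for information in research papers.",
    memory_sketch,
    "Who wrote it?",
)
print(messages)
```

Because the earlier turn is replayed in full, a follow-up such as "Who wrote it?" can be resolved against the previous question. The cost is that the prompt grows with every turn.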

In [73]:

chain = load_qa_chain(llm, chain_type="stuff", verbose=False)
query = "What does the paper's experimental result include?"  # successful; the earlier "what is hte paper's experimental result?" was not
docs = docsearch_chroma.similarity_search(query)
chain.run(input_documents=docs, question=query)

" The paper's experimental result includes a fine-grained analysis and prompt analysis, as well as measures such as Kendall's Tau, BERTScore, PRISM, BLEURT, COMET, and BARTS CORE. It also includes evaluations of semantic overlap, linguistic quality, and factual correctness."

In [None]:
memory.load_memory_variables({})

In [None]:
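## Adapted from another example: a "stuff" QA chain that also keeps chat history
from langchain.prompts import PromptTemplate

# Assumption: this template is illustrative. It must expose the document slot ("context"),
# the memory slot ("chat_history"), and the question.
PROMPT = PromptTemplate(
    input_variables=["chat_history", "context", "question"],
    template=(
        "Answer the question using the extracted parts of the papers below.\n"
        "If you do not know the answer, say so.\n\n"
        "{context}\n\n{chat_history}\nHuman: {question}\nAssistant:"
    ),
)

memory = ConversationBufferMemory(memory_key="chat_history", input_key="question")
chain = load_qa_chain(llm, chain_type="stuff", memory=memory, prompt=PROMPT)

docs = docsearch_chroma.similarity_search(query)

# The stuff chain fills "context" from input_documents, so only the question is passed explicitly.
result = chain({"input_documents": docs, "question": query}, return_only_outputs=True)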

In [None]:
## Possibly expose the QA chains as Tools?
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chains import RetrievalQA
from langchain_openai import OpenAI

llm = OpenAI(temperature=0, model="gpt-3.5-turbo-instruct")

# Build the retrieval chain before the tools that call it.
harry_potter = RetrievalQA.from_chain_type(llm=llm, 
                                           chain_type="stuff", 
                                           retriever=docsearch_chroma_reloaded.as_retriever())

# define tools (state_of_union is the chain created in Section 2.7)
tools = [
    Tool(
        name="State of Union QA System",
        func=state_of_union.run,
        description="useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question.",
    ),
    Tool(
        name="Harry Potter QA System",
        func=harry_potter.run,
        description="useful for when you need to answer questions about Harry Potter. Input should be a fully formed question.",
    ),
]

agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [40]:
## Save the ChromaDB to local disk
docsearch_chroma.persist()  # written to the persist_directory given at creation (chroma_dir, i.e. "/share/ch3/chroma_db")