# Chatting With Your Data

Suppose you have some text documents (reports, news articles, PDFs, etc.) and want to ask questions related to the contents of those documents. LLMs, given their proficiency in understanding text, are a great tool for this.

In this exercise we'll create an application that answers questions about documents using LLMs.

Reference: [https://python.langchain.com/docs/use_cases/question_answering/](https://python.langchain.com/docs/use_cases/question_answering/)

## Overview

The pipeline for converting raw unstructured data into a QA chain looks like this:
1. `Loading`: [Document Loaders](https://python.langchain.com/docs/modules/data_connection/document_loaders/) load our data as LangChain `Documents`
2. `Splitting`: [Text splitters](https://python.langchain.com/docs/modules/data_connection/document_transformers/) break `Documents` into text chunks of a specified size
3. `Storage`: Storage (often a [vector store](https://python.langchain.com/docs/modules/data_connection/vectorstores/)) will house and [embed](https://www.pinecone.io/learn/vector-embeddings/) the text chunks
4. `Retrieval`: The app retrieves relevant text chunks from storage (e.g., text chunks [with similar embeddings](https://www.pinecone.io/learn/k-nearest-neighbor/) to the input question)
5. `Output`: An [LLM](https://python.langchain.com/docs/modules/model_io/models/llms/) produces an answer using a prompt that includes the question and the retrieved data

![./images/qa_flow.jpeg](./images/qa_flow.jpeg)
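
To make the five steps concrete before we bring in LangChain, here is a toy, dependency-free sketch of the same pipeline. Everything in it is an assumption made for illustration: the helper names (`split`, `embed`, `retrieve`, `build_prompt`) are ours, sentence splitting stands in for size-based splitting, and a bag-of-words count stands in for a learned embedding. It stops at building the prompt, since calling a real LLM needs an API key.

```python
import math
import re
from collections import Counter

def split(text):
    """Splitting: one chunk per sentence (a stand-in for size-based splitting)."""
    return re.split(r"(?<=[.!?])\s+", text.strip())

def embed(text):
    """Toy 'embedding': word counts. Real apps use a learned embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, store, k=1):
    """Retrieval: return the k chunks whose vectors are most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

def build_prompt(question, chunks):
    """Output: the prompt an LLM would receive (the model call itself is omitted)."""
    context = "\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"

# 1. Loading: a hard-coded string stands in for a document loader
document = ("Bigfoot is a large ape-like creature said to live in forests. "
            "Many sightings are reported in the Pacific Northwest. "
            "Scientists consider most evidence to be folklore or hoaxes.")
chunks = split(document)                      # 2. Splitting
store = [(embed(c), c) for c in chunks]       # 3. Storage
question = "Where are sightings reported?"
top = retrieve(question, store, k=1)          # 4. Retrieval
prompt = build_prompt(question, top)          # 5. Output (prompt only)
print(prompt)
```

The rest of this exercise replaces each toy function with its LangChain counterpart.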

## Quickstart

To give you a sneak preview, the above pipeline can all be wrapped in a single object: `VectorstoreIndexCreator`. Suppose we want a QA app over [this](https://en.wikipedia.org/wiki/Bigfoot) Wikipedia article about Bigfoot. We can create this in a few lines of code.

First let's install some necessary packages:

In [None]:
%pip install wikipedia chromadb tiktoken langchain langchainhub openai python-dotenv scikit-learn -q

Now let's configure our OpenAI API key. Create a file called `my.env` in your working directory with the following content:

```
OPENAI_API_KEY=<your API key>
```

Be sure to replace `<your API key>` with your actual OpenAI API key.

In [None]:
import os
import dotenv

dotenv.load_dotenv("my.env")
assert "OPENAI_API_KEY" in os.environ, "OpenAI API Key not set"

Next we initialize the `WikipediaLoader` and our `VectorstoreIndexCreator`:

In [None]:
from langchain.document_loaders import WikipediaLoader
from langchain.indexes import VectorstoreIndexCreator

loader = WikipediaLoader(query="Bigfoot", lang="en", load_max_docs=1)
index = VectorstoreIndexCreator().from_loaders([loader])

LangChain just retrieved the Wikipedia article, split it into chunks, embedded those chunks, and stored them in our vector store. Pretty cool, huh?

Now we can ask our question:

In [None]:
from langchain.llms.openai import OpenAI
from textwrap import wrap

llm = OpenAI(temperature=1, model="gpt-3.5-turbo-instruct")
output = index.query("What is Bigfoot?", llm=llm)
for line in wrap(output, width=140):
    print(line)

Under the hood, LangChain just retrieved relevant text chunks from our vector store and sent them to an OpenAI LLM along with our question. Job done!

Before we move on though, here are a few things to try out:
- Ask some different questions, or ask it to generate something else entirely, like a song!
- Change the temperature parameter and see how it affects the output. What do you notice?
- Change the `WikipediaLoader` query and ask about something else. Did you learn any fun facts?
- Change the `WikipediaLoader` to use a different language (e.g. `es` for Spanish). Does it still answer in English?

Ok, that's great, but how did it do that and how could we customize this for our specific use case? For that, let's take a look at how we can construct this pipeline piece by piece.

## Step 1. Load

Specify a `DocumentLoader` to load in your unstructured data as `Documents`. A `Document` is a piece of text (the `page_content`) and associated metadata.

In [None]:
from langchain.document_loaders import WikipediaLoader

loader = WikipediaLoader(query="Bigfoot", lang="en", load_max_docs=1, load_all_available_meta=True)
data = loader.load()

data

### Go deeper
- Browse the data loader integrations [here](https://integrations.langchain.com/).
- See further documentation on loaders [here](https://python.langchain.com/docs/modules/data_connection/document_loaders/).

## Step 2. Split

Split the `Document` into chunks for embedding and vector storage.

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)

splits

Here the `RecursiveCharacterTextSplitter` does its best to split the text into chunks around newlines and word breaks, with a maximum chunk size of 500 characters. But can we do better?

Wikipedia articles are already broken up into sections, so it would be great if our splits didn't cross those boundaries. Here's how we can do that:

First we split the article up into its component sections and sub-sections. In this case, main sections are denoted using `==` (e.g. `== History ==`) and sub-sections are denoted using `===` (e.g. `=== Indigenous and early records ===`).

In [None]:
from langchain.text_splitter import MarkdownHeaderTextSplitter

text_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=[("==", "Section"), ("===", "Sub-Section")])
section_splits = text_splitter.split_text(data[0].page_content)

section_splits

Then we use the `RecursiveCharacterTextSplitter` to split any sections that are still too large.

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(section_splits)

all_splits

Now our article is nicely split along its sections and we removed all of the headings at the same time!
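
If you are curious what this two-stage split amounts to, here is a rough, standalone sketch in plain Python. It is not how LangChain implements its splitters. The function names, the `"Introduction"` label for text before the first heading, and the flat `section` field (which does not nest sub-sections under sections) are all simplifying assumptions of ours.

```python
import re

# Matches wiki-style heading lines such as "== History ==" or "=== Records ==="
HEADING = re.compile(r"^(={2,3})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)

def split_by_sections(text):
    """Stage 1: cut the text at heading lines and return (heading, body) pairs."""
    sections, title, start = [], "Introduction", 0
    for match in HEADING.finditer(text):
        body = text[start:match.start()].strip()
        if body:
            sections.append((title, body))
        title, start = match.group(2), match.end()
    body = text[start:].strip()
    if body:
        sections.append((title, body))
    return sections

def pack_words(body, chunk_size):
    """Stage 2: pack whole words into chunks of at most chunk_size characters
    (a single word longer than chunk_size would become its own oversized chunk)."""
    chunks, current = [], ""
    for word in body.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > chunk_size:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def split_article(text, chunk_size=500):
    """Run both stages; each chunk remembers the section it came from."""
    return [
        {"section": title, "text": piece}
        for title, body in split_by_sections(text)
        for piece in pack_words(body, chunk_size)
    ]

article = """Bigfoot is a purported ape-like creature.

== History ==
Stories of wild men exist in many cultures.

=== Indigenous and early records ===
Many Indigenous cultures have tales of hairy giants living in the forest.
"""

toy_chunks = split_article(article, chunk_size=40)
for chunk in toy_chunks:
    print(chunk)
```

No chunk crosses a section boundary, and the heading lines themselves end up in the metadata rather than in the chunk text.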

### Go deeper

- `DocumentSplitters` are just one type of the more generic [`DocumentTransformers`](https://python.langchain.com/docs/modules/data_connection/document_transformers/), which can all be useful in this preprocessing step.
- `Context-aware splitters` keep the location ("context") of each split in the original `Document`:
    - [Markdown files](https://python.langchain.com/docs/use_cases/question_answering/how_to/document-context-aware-QA)
    - [Code (py or js)](https://python.langchain.com/docs/integrations/document_loaders/source_code)
    - [PDFs](https://python.langchain.com/docs/integrations/document_loaders/grobid)

## Step 3. Store

To be able to look up our document splits, we first need to store them somewhere. The most common way to do this is to embed the contents of each document then store the embedding and document in a vector store.

In [None]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

vector_store = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())
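
Conceptually, a vector store keeps each chunk next to a fixed-length vector computed from it. The sketch below shows that idea only; `toy_embed` and `ToyVectorStore` are hypothetical names of ours, and hashing words into buckets is a crude stand-in for a real embedding model such as the one behind `OpenAIEmbeddings`.

```python
import hashlib
import re

DIM = 16  # toy dimensionality; real embedding models use far more dimensions

def toy_embed(text, dim=DIM):
    """Map text to a fixed-length vector by hashing each word into a bucket.
    This captures word overlap only, not meaning."""
    vector = [0.0] * dim
    for word in re.findall(r"[a-z]+", text.lower()):
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vector[bucket] += 1.0
    return vector

class ToyVectorStore:
    """Keeps every text alongside its vector so both can be looked up together."""

    def __init__(self):
        self.records = []

    def add_texts(self, texts):
        for text in texts:
            self.records.append({"vector": toy_embed(text), "text": text})

toy_store = ToyVectorStore()
toy_store.add_texts([
    "Bigfoot is said to roam forests.",
    "Sasquatch is another name for Bigfoot.",
])
print(len(toy_store.records), len(toy_store.records[0]["vector"]))
```

Whatever the length of the input text, the stored vector always has `DIM` entries, which is what makes chunks comparable to each other and to a question.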

### Go deeper
- Browse the vector store integrations [here](https://integrations.langchain.com/vectorstores).
- See further documentation on vector stores [here](https://python.langchain.com/docs/modules/data_connection/vectorstores/).
- Browse the text embedding integrations [here](https://integrations.langchain.com/embeddings).
- See further documentation on embedding models [here](https://python.langchain.com/docs/modules/data_connection/text_embedding/).

## Step 4. Retrieve

Retrieve relevant splits for any question using [similarity search](https://www.pinecone.io/learn/what-is-similarity-search/).

In [None]:
question = "Did Native Americans believe in Bigfoot?"
docs = vector_store.similarity_search(question, k=1)
docs
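
For intuition, a similarity search can be sketched as "score every stored vector against the question vector and keep the top `k`". The example below uses cosine similarity, which is one common choice; the metric a particular vector store uses may differ. The three-dimensional vectors are hand-made, hypothetical values rather than real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def toy_similarity_search(query_vector, records, k=1):
    """Return the k (score, text) pairs with the highest similarity to the query."""
    scored = [(cosine_similarity(query_vector, vec), text) for vec, text in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical vectors, purely for illustration
records = [
    ([0.9, 0.1, 0.0], "Indigenous legends describe hairy giants."),
    ([0.1, 0.9, 0.0], "A 1967 film shows an alleged Bigfoot."),
    ([0.0, 0.2, 0.9], "Some sightings are attributed to bears."),
]
query_vector = [0.8, 0.2, 0.1]

results = toy_similarity_search(query_vector, records, k=1)
print(results[0][1])  # Indigenous legends describe hairy giants.
```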

### Go deeper

Vector stores are commonly used for retrieval, but they are not the only option. For example, SVMs (Support Vector Machines) can also be used.

LangChain [has many retrievers](https://python.langchain.com/docs/integrations/retrievers) including, but not limited to, vector stores. All retrievers implement a common method `get_relevant_documents()` (and its asynchronous variant `aget_relevant_documents()`).

In [None]:
from langchain.retrievers import SVMRetriever

svm_retriever = SVMRetriever.from_documents(all_splits, OpenAIEmbeddings())
docs_svm = svm_retriever.get_relevant_documents(question)
docs_svm

Some common ways to improve on vector similarity search include:
- `MultiQueryRetriever` [generates variants of the input question](https://python.langchain.com/docs/modules/data_connection/retrievers/MultiQueryRetriever) to improve retrieval.
- `Max marginal relevance` selects for [relevance and diversity](https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf) among the retrieved documents.
- Documents can be filtered during retrieval using [`metadata` filters](https://python.langchain.com/docs/use_cases/question_answering/how_to/document-context-aware-QA).

In [None]:
import logging

from langchain.chat_models import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever

logging.basicConfig()
logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)

retriever_from_llm = MultiQueryRetriever.from_llm(retriever=svm_retriever, llm=ChatOpenAI(temperature=0))
unique_docs = retriever_from_llm.get_relevant_documents(query=question)
unique_docs
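
The max marginal relevance idea from the list above can also be sketched in a few lines. This is a minimal illustration under our own assumptions (the function name `mmr`, the parameter name `lambda_mult`, and the tiny two-dimensional vectors are ours), not the code any particular vector store runs. Each round picks the candidate with the best trade-off between similarity to the query and similarity to what has already been picked.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return the indices of k documents chosen by maximal marginal relevance.
    lambda_mult=1.0 means pure relevance; lower values favour diversity."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: similarity to the closest already-selected document
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

doc_vecs = [
    [0.9, 0.1],  # 0: most relevant to the query
    [0.9, 0.2],  # 1: nearly the same direction as document 0
    [0.3, 0.9],  # 2: less relevant, but covers something different
]
query_vec = [1.0, 0.1]

print(mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0))  # [0, 1]
print(mmr(query_vec, doc_vecs, k=2, lambda_mult=0.3))  # [0, 2]
```

With `lambda_mult=1.0` the near-duplicate is returned as the second result; lowering it swaps in the more distinct document.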

## Step 5. Output

Distill the retrieved documents into an answer using an LLM/Chat model (e.g., `gpt-3.5-turbo`) with a `RetrievalQA` chain.

In [None]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from textwrap import wrap

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(llm, retriever=svm_retriever)

output = qa_chain({"query": question})
for line in wrap(output["result"], width=140):
    print(line)

Note that you can pass either an `LLM` or a `ChatModel` (as we did here) to the `RetrievalQA` chain. Don't worry about the difference too much for now: chat models are tuned for conversation and take a list of messages as input rather than a single string, which we will get to later.

### Go deeper

#### Choosing LLMs
- Browse the LLM integrations [here](https://integrations.langchain.com/llms).
- Browse the Chat Model integrations [here](https://integrations.langchain.com/chat-models).
- See further documentation on LLMs and chat models [here](https://python.langchain.com/docs/modules/model_io/models/).
- See a guide on local LLMs [here](https://python.langchain.com/docs/use_cases/question_answering/how_to/local_retrieval_qa).

#### Customizing the prompt

The prompt in the `RetrievalQA` chain can be easily customized.

In [None]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from textwrap import wrap

template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum and keep the answer as concise as possible. 
Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=svm_retriever,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

output = qa_chain({"query": question})
for line in wrap(output["result"], width=140):
    print(line)

We can also store and fetch prompts from the LangChain prompt hub.

For example, [here](https://smith.langchain.com/hub/rlm/rag-prompt) is a common prompt for RAG which we can load.

In [None]:
from langchain import hub
QA_CHAIN_PROMPT_HUB = hub.pull("rlm/rag-prompt")

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=svm_retriever,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT_HUB}
)

output = qa_chain({"query": question})
for line in wrap(output["result"], width=140):
    print(line)

#### Return source documents

The full set of retrieved documents used for answer distillation can be returned using `return_source_documents=True`.

In [None]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=svm_retriever,
    return_source_documents=True
)
result = qa_chain({"query": question})
result['source_documents']

#### Customizing retrieved document processing

Retrieved documents can be fed to an LLM for answer distillation in a few different ways.

The `stuff`, `refine`, `map_reduce`, and `map_rerank` chains for passing documents to an LLM prompt are well summarized [here](https://python.langchain.com/docs/modules/chains/document/).
 
`stuff` is commonly used because it simply "stuffs" all retrieved documents into the prompt.

We can pass `chain_type` to `RetrievalQA` to try them out.

In [None]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(llm, retriever=svm_retriever, chain_type="refine")

output = qa_chain({"query": question})
for line in wrap(output["result"], width=140):
    print(line)
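
To see how `stuff` and `refine` differ in shape, here is a toy sketch that counts model calls. The prompts are simplified wording of our own, not the templates LangChain uses, and `fake_llm` is a hypothetical stand-in that records prompts instead of calling a model.

```python
def fake_llm(prompt, calls):
    """Hypothetical model call: record the prompt and return a canned reply."""
    calls.append(prompt)
    return f"answer #{len(calls)}"

def stuff(question, docs, llm):
    """'stuff': put every document into one prompt and call the model once."""
    context = "\n\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")

def refine(question, docs, llm):
    """'refine': answer from the first document, then revise once per remaining document."""
    answer = llm(f"Context:\n{docs[0]}\n\nQuestion: {question}")
    for doc in docs[1:]:
        answer = llm(
            f"Existing answer: {answer}\nNew context:\n{doc}\n\n"
            f"Refine the existing answer to: {question}"
        )
    return answer

toy_docs = ["chunk one", "chunk two", "chunk three"]
stuff_calls, refine_calls = [], []

stuff("What is Bigfoot?", toy_docs, lambda p: fake_llm(p, stuff_calls))
refine("What is Bigfoot?", toy_docs, lambda p: fake_llm(p, refine_calls))

print(len(stuff_calls), len(refine_calls))  # 1 3
```

The trade-off is visible in the counts: `stuff` needs one call but a prompt large enough for every document, while `refine` keeps each prompt small at the cost of one call per document.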

In summary, users have many levels of abstraction to choose from for QA:

![./images/summary_chains.png](./images/summary_chains.png)

## But Wait, There's More!

We said we were going to teach you to chat with your data, not just ask it questions! Chatbots are one of the central LLM use-cases. The core features of chatbots are that they can have long-running conversations and have access to information that users want to know about.

Aside from basic prompting and LLMs, memory and retrieval are the core components of a chatbot. Memory allows a chatbot to remember past interactions, and retrieval provides a chatbot with up-to-date, domain-specific information.

### Chatbots

With a plain chat model, we can get chat completions by passing one or more messages to the model. The chat model will respond with a message in kind.

In [None]:
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
from langchain.chat_models import ChatOpenAI

chat = ChatOpenAI()
chat([HumanMessage(content="Translate this sentence from English to French: I love Bigfoot.")])

We can also pass in several messages at once, for example a system message that sets the task followed by a human message:

In [None]:
messages = [
    SystemMessage(content="You are a helpful assistant that translates English to French."),
    HumanMessage(content="I love Bigfoot.")
]
chat(messages)

We can then wrap our chat model in a ConversationChain, which has built-in memory for remembering past user inputs and model outputs:

In [None]:
from langchain.chains import ConversationChain  
  
conversation = ConversationChain(llm=chat)  
conversation.run("Translate this sentence from English to French: I love Bigfoot.")

In [None]:
conversation.run("Now translate it to German.")

Tada!

### Memory

As we mentioned above, memory is a core component of chatbots. One of the simplest and most commonly used forms of memory is `ConversationBufferMemory`:
- This memory allows for storing of messages in a `buffer`
- When called in a chain, it returns all of the messages it has stored

LangChain comes with many other types of memory, too. See [here](https://python.langchain.com/docs/modules/memory/) for in-depth documentation on memory types.

For now let's take a quick look at `ConversationBufferMemory`. We can manually add a few chat messages to the memory like so:

In [None]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("whats up?")

And now we can load from our memory. The key method exposed by all `Memory` classes is `load_memory_variables`. This takes in any initial chain input and returns a dictionary of memory variables which are added to the chain input.

Since this simple memory type doesn't actually take into account the chain input when loading memory, we can pass in an empty input for now:

In [None]:
memory.load_memory_variables({})

We can also keep a sliding window of the most recent `k` interactions using `ConversationBufferWindowMemory`:

In [None]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=1)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})
memory.load_memory_variables({})
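
The sliding-window behaviour is easy to picture with a bounded queue. The class below is a toy of our own (the name `WindowMemory` and its `load` method are hypothetical, not LangChain API); it only shows why the first exchange disappears once a second one is saved with `k=1`.

```python
from collections import deque

class WindowMemory:
    """Toy sliding-window memory: keeps only the last k (input, output) exchanges."""

    def __init__(self, k):
        # A deque with maxlen silently drops the oldest item when it is full
        self.turns = deque(maxlen=k)

    def save_context(self, user_input, ai_output):
        self.turns.append((user_input, ai_output))

    def load(self):
        lines = []
        for user_input, ai_output in self.turns:
            lines.append(f"Human: {user_input}")
            lines.append(f"AI: {ai_output}")
        return {"history": "\n".join(lines)}

window = WindowMemory(k=1)
window.save_context("hi", "whats up")
window.save_context("not much you", "not much")
print(window.load())  # {'history': 'Human: not much you\nAI: not much'}
```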

`ConversationSummaryMemory` is an extension of this theme. It creates a summary of the conversation over time.

This memory is most useful for longer conversations where the full message history would consume many tokens and might not fit into the LLM's context window.

In [None]:
from langchain.llms import OpenAI
from langchain.memory import ConversationSummaryMemory

llm = OpenAI(temperature=0)
memory = ConversationSummaryMemory(llm=llm)
memory.save_context({"input": "hi"},{"output": "whats up"})
memory.save_context({"input": "im working on better docs for chatbots"},{"output": "oh, that sounds like a lot of work"})
memory.save_context({"input": "yes, but it's worth the effort"},{"output": "agreed, good docs are important!"})

memory.load_memory_variables({})

`ConversationSummaryBufferMemory` extends this a bit further. It uses token length rather than number of interactions to determine when to flush interactions.

In [None]:
from langchain.memory import ConversationSummaryBufferMemory
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=10)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})

memory.load_memory_variables({})
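
As a rough mental model, this kind of memory keeps recent messages verbatim until a token budget is exceeded, then moves the oldest ones into a running summary. The sketch below makes two loud simplifications: `count_tokens` counts whitespace-separated words instead of using a real tokenizer, and `summarize` is a hypothetical placeholder that concatenates text where the real class would ask an LLM for a summary.

```python
def count_tokens(text):
    """Crude token estimate: whitespace-separated words (real tokenizers differ)."""
    return len(text.split())

def summarize(previous_summary, dropped_messages):
    """Hypothetical placeholder for an LLM summarization call."""
    return f"{previous_summary} {'; '.join(dropped_messages)}".strip()

class SummaryBufferSketch:
    """Toy memory: recent messages stay verbatim, older ones are folded into a summary."""

    def __init__(self, max_token_limit):
        self.max_token_limit = max_token_limit
        self.summary = ""
        self.buffer = []

    def save_context(self, user_input, ai_output):
        self.buffer += [f"Human: {user_input}", f"AI: {ai_output}"]
        dropped = []
        # Flush the oldest messages until the buffer fits within the budget
        while self.buffer and sum(count_tokens(m) for m in self.buffer) > self.max_token_limit:
            dropped.append(self.buffer.pop(0))
        if dropped:
            self.summary = summarize(self.summary, dropped)

    def load(self):
        return {"summary": self.summary, "recent": list(self.buffer)}

sketch = SummaryBufferSketch(max_token_limit=8)
sketch.save_context("hi", "whats up")
sketch.save_context("not much you", "not much")
print(sketch.load())
```

After the second exchange the first one no longer fits in the budget, so it moves into `summary` while the latest exchange stays in `recent`.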

### Conversation

We can unpack what goes on under the hood with `ConversationChain`.

We can specify our memory, here `ConversationBufferMemory`, and we can specify the prompt.

In [None]:
from langchain.prompts import (
    ChatPromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.chains import LLMChain

# LLM
llm = ChatOpenAI()

# Prompt 
prompt = ChatPromptTemplate(
    messages=[
        SystemMessagePromptTemplate.from_template(
            "You are a nice chatbot having a conversation with a human."
        ),
        # The `variable_name` here is what must align with memory
        MessagesPlaceholder(variable_name="chat_history"),
        HumanMessagePromptTemplate.from_template("{question}")
    ]
)

# Notice that we `return_messages=True` to fit into the MessagesPlaceholder
# Notice that `"chat_history"` aligns with the MessagesPlaceholder name
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
conversation = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=True,
    memory=memory
)

# Notice that we just pass in the `question` variable - `chat_history` gets populated by memory
conversation({"question": "hi"})

In [None]:
conversation({"question": "Translate this sentence from English to French: I love Bigfoot."})

In [None]:
conversation({"question": "Now translate the sentence to German."})

Not bad! But how do I ask it about my Wikipedia article?

### Chat Retrieval

Combining chat with document retrieval is a popular use case. It allows us to chat with specific information that the model was not trained on.

Let's create our memory, as before, but let's use `ConversationSummaryMemory`.

In [None]:
memory = ConversationSummaryMemory(llm=llm, memory_key="chat_history", return_messages=True)

Now we can use the retriever we set up earlier:

In [None]:
from langchain.chains import ConversationalRetrievalChain

qa = ConversationalRetrievalChain.from_llm(llm, retriever=svm_retriever, memory=memory)

qa("What is Bigfoot?")

In [None]:
qa("Give me a description of one of the sightings. What happened?")

It's just that easy! But wait, what if I want the chatbot to have access to multiple data stores and choose the correct one dynamically?

### Agents

The core idea of agents is to use an LLM to choose a sequence of actions to take. In chains, a sequence of actions is hardcoded (in code). In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.

Agents, such as the [conversational retrieval agent](https://python.langchain.com/docs/use_cases/question_answering/how_to/conversational_retrieval_agents), can be used for retrieval when necessary while also holding a conversation.

First we create the retriever tool:

In [None]:
from langchain.agents.agent_toolkits import create_retriever_tool

tool = create_retriever_tool(
    svm_retriever, 
    "search_bigfoot",
    "Searches and returns documents regarding Bigfoot."
)
tools = [tool]

Then we create the agent:

In [None]:
from langchain.agents.agent_toolkits import create_conversational_retrieval_agent

agent_executor = create_conversational_retrieval_agent(llm, tools, verbose=True)

output = agent_executor({"input": "hi, im bob"})

In [None]:
output = agent_executor({"input": "whats my name?"})

In [None]:
output = agent_executor({"input": "What color are Bigfoot's eyes?"})

output["output"]
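
To make the chain-versus-agent distinction from the start of this section concrete, here is a toy sketch of the control flow. Nothing in it is LangChain code: `toy_reasoner` is a hypothetical keyword check standing in for the LLM's decision so that the example runs offline, and the tool returns a canned placeholder string instead of retrieved documents.

```python
def search_bigfoot(query):
    """Hypothetical retrieval tool; returns a canned placeholder instead of real documents."""
    return f"(documents about Bigfoot matching: {query})"

TOOLS = {"search_bigfoot": search_bigfoot}

def toy_reasoner(user_input):
    """Stand-in for the LLM's decision: return a tool name, or None to answer directly.
    A real agent asks the model; a keyword check is used here only for illustration."""
    return "search_bigfoot" if "bigfoot" in user_input.lower() else None

def run_agent(user_input):
    """Decide at run time whether a tool is needed, then act on that decision."""
    tool_name = toy_reasoner(user_input)
    if tool_name is None:
        return {"tool": None, "output": "(model answers from the conversation alone)"}
    observation = TOOLS[tool_name](user_input)
    return {"tool": tool_name, "output": observation}

print(run_agent("hi, im bob"))
print(run_agent("What color are Bigfoot's eyes?"))
```

A chain would call the retriever on every input; here the retrieval step only runs when the decision function asks for it, which mirrors what the verbose agent output above shows for the greeting versus the Bigfoot question.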