
# Chatting with your own documents
4 ways to do question-answering

In [None]:
%pip install langchain
%pip install pypdf
%pip install openai
%pip install tiktoken
%pip install chromadb



In [None]:
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader('/content/sample_data/ai_index_ch01.pdf')
documents = loader.load()

In [None]:
documents[20]

Document(page_content='Chapter 1 Preview 21\nArtificial Intelligence\nIndex Report 2024National Affiliation\nTo illustrate the evolving geopolitical landscape of \nAI, the AI Index research team analyzed the country \nof origin of notable models.\nFigure 1.3.2 displays the total number of notable \nmachine learning models attributed to the location \nof researchers’ affiliated institutions.5 \nIn 2023, the United States led with 61 notable \nmachine learning models, followed by China with \n15, and France with 8. For the first time since 2019, \nthe European Union and the United Kingdom \ntogether have surpassed China in the number of \nnotable AI models produced (Figure 1.3.3). Since \n2003, the United States has produced more models \nthan other major geographic regions such as the \nUnited Kingdom, China, and Canada (Figure 1.3.4).1.3 Frontier AI ResearchChapter 1: Research and Development Artificial Intelligence\nIndex Report 2024\n233444581561\n0 5101520 25 30 35 40 45 50 55 60Egy

## Method 1: load_qa_chain

In [None]:
# The most generic interface: pass the documents and the question directly to a QA chain
from google.colab import userdata
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

chain = load_qa_chain(llm=OpenAI(api_key=userdata.get('OPENAI_API_KEY')), chain_type='map_reduce')
query = 'What was used if NVIDIA A100 SXM4 was not available?'
chain.run(input_documents=documents, question=query)



' If the NVIDIA A100 SXM4 was not available, a generalization of "NVIDIA A100" was used.'

In [None]:
# # It also lets you do QA over a set of documents:

# loaders = [...]
# documents = []
# for loader in loaders:
#     documents.extend(loader.load())

## What if the document is so long that it exceeds the token limit?

- **Solution 1: Chain Type**
The default `chain_type='stuff'` puts _all_ of the text from the documents into a single prompt. That doesn't work for our example: the combined text exceeds the model's token limit and triggers rate-limit errors, which is why this example uses `map_reduce` instead. The other chain types are:

1. `map_reduce`: It sends each document (here, each PDF page) to the LLM separately along with the question, then combines the per-document answers into a final answer. (`llm=OpenAI(batch_size=5)`, for example, sets how many of these per-document prompts are sent to the API in one request.)

2. `refine`: It feeds the first document to the LLM, then feeds that answer together with the next document back to the LLM, refining the answer one document at a time until all documents have been processed.

3. `map_rerank`: It sends each document to the LLM separately, asks it for both an answer and a score of how fully that answer addresses the question, and returns the highest-scoring answer.
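
The three strategies above differ mainly in how they call the model. The sketch below illustrates the control flow of each in plain Python. It assumes a hypothetical `llm` callable (prompt in, text out) in place of a real model, and the prompt wording is made up; it is not LangChain's actual prompt text or implementation.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a real model: any function from prompt to text.
LLM = Callable[[str], str]


def map_reduce(llm: LLM, chunks: List[str], question: str) -> str:
    # Map: ask the question against each chunk independently.
    partial = [llm(f"Context: {c}\nQuestion: {question}") for c in chunks]
    # Reduce: one extra call combines the per-chunk answers.
    combined = "\n".join(partial)
    return llm(f"Combine these partial answers:\n{combined}\nQuestion: {question}")


def refine(llm: LLM, chunks: List[str], question: str) -> str:
    if not chunks:
        raise ValueError("refine needs at least one chunk")
    # Start from an answer based on the first chunk only.
    answer = llm(f"Context: {chunks[0]}\nQuestion: {question}")
    # Revise that answer once per remaining chunk, strictly in order.
    for c in chunks[1:]:
        answer = llm(f"Existing answer: {answer}\nNew context: {c}\nQuestion: {question}")
    return answer


def map_rerank(scored_llm: Callable[[str], Tuple[str, int]],
               chunks: List[str], question: str) -> str:
    # Each call returns (answer, score); keep the single highest-scoring answer.
    results = [scored_llm(f"Context: {c}\nQuestion: {question}") for c in chunks]
    best_answer, _ = max(results, key=lambda r: r[1])
    return best_answer
```

One design consequence: the map calls in `map_reduce` and `map_rerank` are independent and could run in parallel, while `refine` is inherently sequential because each call depends on the previous answer.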

- **Solution 2: RetrievalQA**
One issue with using _all_ of the text is cost: the OpenAI API bills by the number of tokens, so feeding the whole document in for every question gets expensive. A better approach is to retrieve the relevant text chunks first and pass only those to the language model.
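
To make the token-limit and cost argument concrete, here is a rough comparison of a "stuff" prompt built from every chunk versus one built from only two chunks (standing in for what a retriever would return). It assumes the common rule of thumb of about 4 characters per English token (use `tiktoken` for exact counts); the chunk sizes and the context limit are made-up numbers for illustration.

```python
from typing import List

CONTEXT_LIMIT = 4096  # assumed context window in tokens, for illustration only


def estimate_tokens(text: str) -> int:
    # Rule-of-thumb estimate: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)


def stuff_prompt(chunks: List[str], question: str) -> str:
    # 'stuff' strategy: every chunk is pasted into one prompt.
    context = "\n\n".join(chunks)
    return f"Use the following context to answer the question.\n{context}\nQuestion: {question}"


def fits_context(prompt: str, limit: int = CONTEXT_LIMIT) -> bool:
    return estimate_tokens(prompt) <= limit


if __name__ == "__main__":
    chunks = ["x" * 1000] * 80  # 80 hypothetical chunks of 1,000 characters each
    question = "What was used if NVIDIA A100 SXM4 was not available?"
    # chunks[:2] stands in for the 2 chunks a retriever would select.
    for label, selected in [("all chunks", chunks), ("2 chunks", chunks[:2])]:
        prompt = stuff_prompt(selected, question)
        print(label, estimate_tokens(prompt), "tokens, fits:", fits_context(prompt))
```

Because input is billed per token, the ratio between the two estimates is also roughly the ratio between the input costs of the two prompts.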

## Method 2: RetrievalQA
`RetrievalQA` actually uses `load_qa_chain` under the hood: it first retrieves the most relevant chunks of text, then feeds only those chunks to the language model.
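
A minimal sketch of the split-then-retrieve idea follows. It assumes a toy bag-of-words vector in place of a real embedding model (the notebook uses `OpenAIEmbeddings` and `Chroma`), and `split_text` is a simplified fixed-width splitter, not LangChain's `CharacterTextSplitter`, which splits on a separator first and then merges pieces up to `chunk_size`.

```python
import math
import re
from collections import Counter
from typing import List


def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    # Simplified splitter: fixed-width character windows that share
    # `chunk_overlap` characters with their neighbour.
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    stop = max(len(text) - chunk_overlap, 1)  # avoid a trailing overlap-only chunk
    return [text[i:i + chunk_size] for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts (a stand-in for a real model).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Missing words count as 0 in a Counter, so only shared words contribute.
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(chunks: List[str], question: str, k: int = 2) -> List[str]:
    # Rank chunks by similarity to the question and keep the top k.
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]
```

Here `k` plays the same role as `search_kwargs={'k': 2}` in the retriever cell; only those `k` chunks would then be "stuffed" into the prompt.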

In [None]:
from langchain.chains import RetrievalQA
from langchain.indexes import VectorstoreIndexCreator
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

In [None]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings(api_key=userdata.get('OPENAI_API_KEY'))
db = Chroma.from_documents(texts, embeddings)

retriever = db.as_retriever(search_type='similarity', search_kwargs={'k': 2})
# 'similarity' returns the text chunks whose vectors are most similar to the question vector
# 'mmr' uses maximal marginal relevance search, which balances similarity to the query against diversity among the selected chunks

qa = RetrievalQA.from_chain_type(llm=OpenAI(api_key=userdata.get('OPENAI_API_KEY')), chain_type='stuff', retriever=retriever, return_source_documents=True)
query = 'How many AI publications in 2024?'
result = qa.invoke({'query': query})
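
The two search types described in the comments above can be contrasted in a few lines. This sketch assumes precomputed embedding vectors as plain Python lists and a `lambda_mult` weight; LangChain's MMR search accepts a parameter of that name through `search_kwargs`, along with `fetch_k`, which pre-selects candidates by similarity. The sketch skips that pre-selection and scores every vector.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query: Sequence[float], docs: List[Sequence[float]], k: int) -> List[int]:
    # Plain similarity: indices of the k vectors closest to the query.
    return sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)[:k]


def mmr_search(query: Sequence[float], docs: List[Sequence[float]], k: int,
               lambda_mult: float = 0.5) -> List[int]:
    # Maximal marginal relevance: greedily pick the vector that is relevant to
    # the query but not redundant with the vectors already picked.
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, docs[i])
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, with `query = [1.0, 1.0]` and `docs = [[1.0, 0.2], [1.0, 0.2], [0.0, 1.0]]` (two identical chunks and one different one), `similarity_search(query, docs, 2)` returns the duplicate pair `[0, 1]`, while `mmr_search(query, docs, 2)` returns `[0, 2]`. Setting `lambda_mult=1.0` removes the redundancy penalty and reduces MMR to plain similarity ranking.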

In [None]:
result

{'query': 'How many AI publications in 2024?',
 'result': " I don't know.",
 'source_documents': [Document(page_content='Chapter 1 Preview 10\nArtificial Intelligence\nIndex Report 2024232.67\n2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022050100150200Number of AI p ublic ations (in tho usands)Number of AI jo urnal p ublic ations, 20 10–22\nSource: Center for Security and E merging T echnolog y, 2023 | C hart: 202 4 AI Inde x repor tAI Journal Publications\nFigure 1.1.6 illustrates the total number of AI journal publications from 2010 to 2022. The number of AI journal \npublications experienced modest growth from 2010 to 2015 but grew approximately 2.4 times since 2015. \nBetween 2021 and 2022, AI journal publications saw a 4.5% increase.1.1 PublicationsChapter 1: Research and Development\nFigure 1.1.6Artificial Intelligence\nIndex Report 2024', metadata={'page': 9, 'source': '/content/sample_data/ai_index_ch01.pdf'}),
  Document(page_content='Chapter 1 Preview 10\nArt

## Method 3: VectorstoreIndexCreator
`VectorstoreIndexCreator` is a wrapper around the functionality above. Under the hood it does the same thing, but it exposes a higher-level interface that gets you started in three lines of code.

In [None]:
import os
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

In [None]:
index = VectorstoreIndexCreator().from_loaders([loader])
query = 'What was used if NVIDIA A100 SXM4 was not available?'
index.query(llm=OpenAI(api_key=userdata.get('OPENAI_API_KEY')), question=query, chain_type='stuff')

' \n\nA generalization of the hardware type was used, such as "NVIDIA A100."'

In [None]:
index = VectorstoreIndexCreator(
    text_splitter=CharacterTextSplitter(chunk_size=1000, chunk_overlap=0),
    embedding=OpenAIEmbeddings(),
    vectorstore_cls=Chroma
).from_loaders([loader])
index.query(llm=OpenAI(), question=query, chain_type='stuff')

' I do not know.'

## Method 4: ConversationalRetrievalChain
Very similar to `RetrievalQA`, but it adds a `chat_history` parameter for passing in the conversation so far, which makes follow-up questions possible.

`ConversationalRetrievalChain` = conversation memory + `RetrievalQA`

If you'd like your language model to have a memory of the previous conversation, use this method.

In the original example, the first question asked for the number of AI publications and got 500,000. The follow-up asked the LLM to divide this number by 2. Because the chain receives the chat history, the model knew that "this number" referred to 500,000 and returned 250,000.
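
Conceptually, the chain first asks the LLM to rewrite a follow-up such as "How many?" into a standalone question using the chat history, and only then runs retrieval and answering the way `RetrievalQA` does. The sketch below shows that two-step flow with hypothetical `condense_llm` and `answer_with_retrieval` callables; the prompt wording and the `Human:`/`Assistant:` history format are illustrative, not LangChain's exact templates.

```python
from typing import Callable, List, Tuple

ChatHistory = List[Tuple[str, str]]  # (question, answer) pairs, as in the notebook


def format_history(history: ChatHistory) -> str:
    # Render each earlier turn as a Human/Assistant pair.
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in history)


def conversational_answer(
    question: str,
    history: ChatHistory,
    condense_llm: Callable[[str], str],
    answer_with_retrieval: Callable[[str], str],
) -> str:
    if history:
        # Step 1: turn a follow-up like "How many?" into a self-contained question.
        standalone = condense_llm(
            "Given the conversation below, rephrase the follow-up question "
            "as a standalone question.\n"
            f"{format_history(history)}\n"
            f"Follow-up question: {question}\n"
            "Standalone question:"
        )
    else:
        # No history yet: the question is already standalone.
        standalone = question
    # Step 2: retrieve relevant chunks for the standalone question and answer it.
    return answer_with_retrieval(standalone)
```

Rewriting the question before retrieval matters because "How many?" on its own would match almost nothing useful in the vector store.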

In [None]:
from langchain.chains import ConversationalRetrievalChain

In [None]:
loader = PyPDFLoader('/content/sample_data/ai_index_ch01.pdf')
documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
db = Chroma.from_documents(texts, embeddings)
retriever = db.as_retriever(search_type='similarity', search_kwargs={'k': 4})
qa = ConversationalRetrievalChain.from_llm(OpenAI(), retriever)

In [None]:
chat_history = []
query = 'Which country had the most notable AI models?'
result = qa.invoke({'question': query, 'chat_history': chat_history})

In [None]:
result['answer']

' The United States had the most notable AI models in 2023.'

In [None]:
# Append the first turn so the follow-up question can refer back to it
chat_history.append((query, result['answer']))
query = 'How many?'
result = qa.invoke({'question': query, 'chat_history': chat_history})

In [None]:
chat_history

[('Which country had the most notable AI models?',
  ' The United States had the most notable AI models in 2023.')]

In [None]:
result['answer']

'  The United States had 61 notable AI models in 2023.'
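
For a longer conversation, every turn has to be appended to `chat_history` before the next question. A small helper keeps that bookkeeping in one place. This is a sketch assuming it receives a callable shaped like `qa.invoke` above (a dict in, a dict with an `'answer'` key out); `ask` is a hypothetical name, not part of LangChain.

```python
from typing import Callable, Dict, List, Tuple


def ask(invoke: Callable[[Dict], Dict], history: List[Tuple[str, str]], question: str) -> str:
    # Send a copy so the chain sees the history as it was before this turn.
    result = invoke({"question": question, "chat_history": list(history)})
    answer = result["answer"]
    # Remember this turn so later follow-ups can refer back to it.
    history.append((question, answer))
    return answer


# Usage with the chain from the cells above (requires an API key):
# chat_history = []
# ask(qa.invoke, chat_history, 'Which country had the most notable AI models?')
# ask(qa.invoke, chat_history, 'How many?')
```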

## Summary
Four ways to do QA with LangChain:
1. `load_qa_chain` uses all texts and accepts multiple documents
2. `RetrievalQA` uses `load_qa_chain` under the hood but retrieves relevant text chunks first
3. `VectorstoreIndexCreator` is the same as `RetrievalQA` with a higher-level interface.
4. `ConversationalRetrievalChain` is useful when you want to pass your chat history to the model.