# Session 2 - Demo 2.4 - Data-Augmented Question Answering

<a href="https://colab.research.google.com/github/dair-ai/pe-for-llms/blob/main/notebooks/session-2/demo-2.3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [7]:
%%capture
# update or install the necessary libraries
!pip install --upgrade openai
!pip install --upgrade langchain
!pip install --upgrade python-dotenv
!pip install chromadb

In [1]:
import openai
import os
import IPython
from langchain.llms import OpenAI
from dotenv import load_dotenv

In [2]:
load_dotenv()

# API configuration
openai.api_key = os.getenv("OPENAI_API_KEY")

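# for LangChain (load_dotenv() has already exported the key to os.environ;
# this assignment raises a TypeError if the key is missing)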
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")

First, we import the LangChain components needed to prepare the data we want to use as a source to augment generation.

In [3]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings.cohere import CohereEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.elastic_vector_search import ElasticVectorSearch
from langchain.vectorstores import Chroma
from langchain.docstore.document import Document
from langchain.prompts import PromptTemplate

As our data source, we will use a transcript of Andrej Karpathy's lecture on GPT.

In [29]:
#with open('../data/state_of_the_union.txt') as f:
with open('../data/kar-gpt.txt') as f:
    text_data = f.read()
# split the transcript on spaces into chunks of up to 1000 characters, with no overlap
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0, separator=" ")
texts = text_splitter.split_text(text_data)

embeddings = OpenAIEmbeddings()
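
To make the chunking step above more concrete, here is a minimal sketch of greedy splitting on a separator, using only the standard library. It is not LangChain's implementation: the function name `split_by_separator` is hypothetical, overlap is ignored (matching `chunk_overlap=0`), and a single piece longer than `chunk_size` is kept whole.

```python
def split_by_separator(text, chunk_size, separator=" "):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters."""
    chunks = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # the piece still fits (or it is the first piece of a new chunk)
            current = candidate
        else:
            # the chunk is full: store it and start a new one with this piece
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "the quick brown fox jumps over the lazy dog"
chunks = split_by_separator(sample, chunk_size=15)
print(chunks)
```

With `chunk_size=1000` and `separator=" "`, the same idea yields chunks of roughly 1000 characters that never break in the middle of a word.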

In [31]:
# embed every chunk and index it in Chroma; the chunk position is stored as its "source" id
docsearch = Chroma.from_texts(texts, embeddings, metadatas=[{"source": str(i)} for i in range(len(texts))])

Using embedded DuckDB without persistence: data will be transient


In [34]:
query = "What is the course about?"
docs = docsearch.similarity_search(query)
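
Conceptually, the similarity search above embeds the query and returns the chunks whose vectors are closest to it. The sketch below illustrates that ranking with toy 2-dimensional vectors and cosine similarity. The vectors, the helper names `cosine_similarity` and `top_k`, and the choice of cosine as the metric are assumptions for illustration; the vector store may use a different distance function internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]


# toy stand-ins for real embeddings
toy_doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
toy_query_vec = [0.9, 0.1]

ranked = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(ranked)
```

The returned indices play the same role as the `source` ids attached to each chunk: they identify which pieces of the transcript are passed on to the question-answering chain.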

In [10]:
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.llms import OpenAI

In [56]:
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="stuff")
query = "What is the course about?"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': ' This course is about understanding and appreciating how chat GPT works, and how to develop a transformer neural network. It requires proficiency in Python and some basic understanding of calculus and statistics.\nSOURCES: 1, 7, 107, 108'}
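
Since the answer and its references come back in a single `output_text` string, it can be handy to separate them. Below is a minimal sketch that assumes the `SOURCES:` marker appears exactly as in the output above; the helper name `split_answer_and_sources` is hypothetical and the example answer is shortened.

```python
def split_answer_and_sources(output_text):
    """Split the chain output into the answer text and a list of source ids."""
    answer, _, sources = output_text.partition("SOURCES:")
    source_ids = [s.strip() for s in sources.split(",") if s.strip()]
    return answer.strip(), source_ids


# shortened version of the output shown above
example_output = {
    "output_text": " This course is about understanding and appreciating how chat GPT works.\nSOURCES: 1, 7, 107, 108"
}

answer, source_ids = split_answer_and_sources(example_output["output_text"])
print(answer)
print(source_ids)
```

The ids map back to the `source` metadata assigned when the chunks were indexed, so each one can be used to look up the supporting chunk.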

In [55]:
template = """Given the following extracted parts of a long document and a question, create a final answer with references ("SOURCES"). 
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer.
Respond in Spanish.

QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER IN SPANISH:"""

# create a prompt template
PROMPT = PromptTemplate(template=template, input_variables=["summaries", "question"])

# query 
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="stuff", prompt=PROMPT)
query = "What is the course about?"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'input_documents': [Document(page_content="of fine tuning which we did not cover and that could be simple supervised fine tuning or it can be something more fancy like we see in charge of GPT we actually train a reward model and then do rounds of PPO to align it with respect to the reward model so there's a lot more that can be down on top of it I think for now we're starting to get to about two hours mark so I'm going to kind of finish here I hope you enjoyed the lecture and yeah go forth and transform see you later", metadata={'source': '108'}),
  Document(page_content="example of the prompt. People have come up with many, many examples and there are entire websites that index interactions with chat GPT and so many of them are quite humorous. Explain HTML to me like I'm a dog. Write reliefs notes for chest two. Write a note about Elon Musk buying a Twitter and so on. So as an example, please write a breaking news article about a leaf falling from a tree. In a shocking turn of events

Learn more about chain types here: https://docs.langchain.com/docs/components/chains/index_related_chains
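
For intuition about the `stuff` chain type used in this demo: all retrieved chunks are placed into one prompt and sent to the model in a single call. The sketch below shows that idea with plain string formatting. The shortened template, the helper `build_stuff_prompt`, and the per-document `Content:` / `Source:` layout are assumptions for illustration and may differ from what the library actually builds.

```python
# shortened template with the same placeholders as the prompt used in this demo
TEMPLATE = """Given the following extracted parts of a long document and a question, create a final answer with references ("SOURCES").

QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER:"""


def build_stuff_prompt(question, docs):
    """Concatenate every retrieved chunk into a single prompt string."""
    # hypothetical per-document layout; the library's own formatting may differ
    parts = [f"Content: {d['page_content']}\nSource: {d['source']}" for d in docs]
    return TEMPLATE.format(question=question, summaries="\n\n".join(parts))


# toy stand-ins for the retrieved documents
toy_docs = [
    {"page_content": "first retrieved chunk", "source": "1"},
    {"page_content": "second retrieved chunk", "source": "7"},
]

stuffed_prompt = build_stuff_prompt("What is the course about?", toy_docs)
print(stuffed_prompt)
```

Because every chunk goes into one prompt, this approach is simple but limited by the model's context window, which is one reason other chain types exist.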