# **05. Question Answering**

In [1]:
import os
import openai
import sys
sys.path.append('../..')

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.environ['OPENAI_API_KEY']

In [2]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
persist_directory = 'docs/chroma/'
embedding = OpenAIEmbeddings()
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)



In [3]:
print(vectordb._collection.count())

6087


In [4]:
question = "What undergraduate degrees are available to study computer science?"
docs = vectordb.similarity_search(question,k=4)
len(docs)

4

In [5]:
print(docs[0].page_content[0:200])

Programme  Conve ner: Dr J Buys  
Entry requirements – BSc Hons (CS): A BSc degree majoring in C omputer Scien ce from UCT, 
with an average of at  least 60% in both CSC3002F and CSC3003S, or permiss 


In [6]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

## **Retrieval Q&A chain**





**Process:**
- The question is used to search the vector store, which returns the set of documents (relevant splits) most similar to it. These documents are then used to build a system prompt for the LLM.  
- The system prompt and the original question are then fed into the LLM, which answers the question.  
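
The retrieval step above can be sketched in plain Python. This is only an illustration: the two-dimensional vectors, the helper names `cosine_similarity` and `top_k`, and the choice of cosine similarity are all assumptions made for the example. The vector store used in this notebook may score documents with a different metric, and real embeddings have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx)
              for idx, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]

# Hypothetical toy embeddings standing in for three document splits
toy_doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
toy_query_vec = [0.8, 0.2]

toy_best = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(toy_best)
```

The `k` argument plays the same role as `k=4` in the `similarity_search` call earlier in the notebook: it caps how many splits are handed on to the LLM.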

By default, all of the retrieved documents are passed into a single context window (the "stuff" method). When the number of documents becomes very large, they may no longer fit in that window. Alternative approaches:  
1. Map_reduce  
2. Refine   
3. Map_rerank  
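
As a rough sketch of what the "stuff" method does conceptually, the snippet below joins every split into one prompt and makes a single model call. `toy_llm`, `stuff_answer`, and the prompt wording are hypothetical stand-ins for illustration, not LangChain's internals.

```python
def toy_llm(prompt):
    # Hypothetical stand-in for a chat model: it only reports the prompt size.
    return f"[answer based on {len(prompt)} characters of prompt]"

def stuff_answer(question, splits, llm=toy_llm):
    """Put all splits into one prompt and call the model exactly once."""
    context = "\n\n".join(splits)
    prompt = (
        "Use the following pieces of context to answer the question.\n"
        f"{context}\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )
    return prompt, llm(prompt)

toy_splits = ["Split A about degrees.", "Split B about courses."]
stuff_prompt, stuff_result = stuff_answer("Which degrees exist?", toy_splits)
print(stuff_prompt)
```

Because the prompt grows with every split that is added, this approach is the first to hit the context-window limit, which is what motivates the alternatives listed above.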

In [7]:
from langchain.chains import RetrievalQA

In [8]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

In [9]:
result = qa_chain({"query": question})



In [10]:
print(result["result"])

At the University of Cape Town, undergraduate degrees available to study Computer Science include a Bachelor of Science (BSc) majoring in Computer Science and a Bachelor of Business Science specialising in Information Systems.


## **Prompt Template**

In [11]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""

QA_CHAIN_PROMPT = PromptTemplate.from_template(template)


In [12]:
# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

In [13]:
question = "List all the computer science courses that are available to study"

In [14]:
result = qa_chain({"query": question})

In [15]:
result["result"]

'Computer Science 1010, Human Computer Interaction, Artificial Intelligence, Network and Internetwork Security, Computer Game Design, High Performance Computing, Introduction to Computer Graphics. Thanks for asking!'

In [16]:
result["source_documents"][0]

Document(metadata={'page': 52, 'source': 'assets/1.Commerce-undergrad.pdf'}, page_content='Computer Science major courses and at least 55% for each course to be considered for a place in 4th year Computer Science courses. \nPlaces may be limited. Students who do not qualify for admission to 4th year Computer Science courses will be required to change their \nspecialisation or degree in consultation with the Head of Department and the Deputy Dean Undergraduate Studies  of Commerce.  \n \n \nBachelor of Business Science specialising in Information Systems [CB0 15INF01]  \n \nFirst Year Core Modules  \nCode  Course  NQF Credits  NQF Level  \nCML1001F  Business Law I  ................................ ................................ .............  18 5 \nDOC1103H  Skills for Commerce  ................................ ................................ ...... 2 5 \nINF1102F  Foundations of Information Systems  ................................ ............  18 5 \n  OR ........................

## **Map Reduce**
Each retrieved document is first sent to the LLM on its own to produce an answer; a final LLM call then combines those individual answers into one. This involves many more LLM calls than the stuff method, so it is slower, and the result may be less accurate because each intermediate answer is based on a single document and loses the context of the others.  
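
The call pattern can be sketched without any model at all. In the snippet below, `counting_llm` is a hypothetical stub that records every prompt it receives, and `map_reduce_answer` is an illustrative helper rather than LangChain's implementation; the prompt wording is likewise an assumption.

```python
def map_reduce_answer(question, splits, llm):
    """Answer per split (map), then combine the answers in one call (reduce)."""
    # Map step: one independent LLM call per split
    partial_answers = [
        llm(f"Context: {split}\nQuestion: {question}") for split in splits
    ]
    # Reduce step: one more call that only sees the partial answers
    combined = "\n".join(partial_answers)
    return llm(f"Partial answers:\n{combined}\nQuestion: {question}")

mr_calls = []

def counting_llm(prompt):
    # Hypothetical stub: records each prompt and returns a numbered answer.
    mr_calls.append(prompt)
    return f"answer {len(mr_calls)}"

mr_final = map_reduce_answer(
    "toy question", ["split one", "split two", "split three"], counting_llm
)
print(len(mr_calls), mr_final)
```

Three splits lead to four calls here (three map calls plus one reduce call), and the reduce prompt contains only the partial answers, not the original splits. That is where the extra cost and the loss of cross-document context come from.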

In [17]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"
)

In [18]:
result = qa_chain_mr({"query": question})



In [19]:
result["result"]

'The available computer science courses to study are:\n- CSC1010H Computer Science 1010\n- CSC4024Z Human Computer Interaction\n- CSC4025Z Artificial Intelligence\n- CSC4026Z Network and Internetwork Security\n- CSC4027Z Computer Game Design\n- CSC4028Z High Performance Computing\n- CSC4029Z Introduction to Computer Graphics'

## **Refine**
The refine chain also combines information across documents, but sequentially: each LLM call receives the answer from the previous call together with the next document, so information carries over from one call to the next.  
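
A minimal sketch of that sequential carry-over, again with a hypothetical stub in place of a real model. `refine_answer`, `recording_llm`, and the prompt wording are assumptions made for illustration, not the library's internals.

```python
def refine_answer(question, splits, llm):
    """Build an answer from the first split, then refine it with each later split."""
    answer = llm(f"Context: {splits[0]}\nQuestion: {question}")
    for split in splits[1:]:
        # Each call sees the previous answer plus one new split
        answer = llm(
            f"Existing answer: {answer}\n"
            f"New context: {split}\n"
            f"Question: {question}\n"
            "Refine the existing answer only if the new context helps."
        )
    return answer

refine_prompts = []

def recording_llm(prompt):
    # Hypothetical stub: records each prompt and returns a numbered draft.
    refine_prompts.append(prompt)
    return f"draft {len(refine_prompts)}"

refined = refine_answer(
    "toy question", ["split one", "split two", "split three"], recording_llm
)
print(refined)
```

The calls cannot run in parallel because each one depends on the previous draft, but every call after the first has access to what was learned from the earlier splits.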

In [20]:
# Same retriever as before, but documents are combined with the "refine" strategy
qa_chain_refine = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="refine"
)
result = qa_chain_refine({"query": question})
result["result"]

'Based on the additional context provided, the Computer Science major courses available to study are:\n\n1. CSC1010H - Computer Science 1010\n2. CSC4024Z - Human Computer Interaction\n3. CSC4025Z - Artificial Intelligence\n4. CSC4026Z - Network and Internetwork Security\n5. CSC4027Z - Computer Game Design\n6. CSC4028Z - High Performance Computing\n7. CSC4029Z - Introduction to Computer Graphics\n\nStudents must achieve at least 55% in each course to be considered for a place in 4th-year Computer Science courses. If students do not qualify for admission to 4th-year Computer Science courses, they may be required to change their specialization or degree in consultation with the Head of Department and the Deputy Dean Undergraduate Studies of Commerce.'

In [25]:
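# Enable LangSmith tracing. Plain Python variables are not read by LangChain, so set environment variables instead.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
# LANGCHAIN_API_KEY should come from the local .env file loaded in the first cell; never hard-code the key in the notebook.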

In [23]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"
)
result = qa_chain_mr({"query": question})
result["result"]



'The available computer science courses to study are:\n- CSC1010H Computer Science 1010\n- CSC4024Z Human Computer Interaction\n- CSC4025Z Artificial Intelligence\n- CSC4026Z Network and Internetwork Security\n- CSC4027Z Computer Game Design\n- CSC4028Z High Performance Computing\n- CSC4029Z Introduction to Computer Graphics'

In [26]:
# Re-run the refine chain so that its calls are captured by tracing
qa_chain_refine = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="refine"
)
result = qa_chain_refine({"query": question})
result["result"]

'Based on the additional context provided, the Computer Science major courses available to study are:\n\n1. CSC1010H - Computer Science 1010\n2. CSC4024Z - Human Computer Interaction\n3. CSC4025Z - Artificial Intelligence\n4. CSC4026Z - Network and Internetwork Security\n5. CSC4027Z - Computer Game Design\n6. CSC4028Z - High Performance Computing\n7. CSC4029Z - Introduction to Computer Graphics\n\nStudents must achieve at least 55% in each course to be considered for a place in 4th-year Computer Science courses. If students do not qualify for admission to 4th-year Computer Science courses, they may be required to change their specialization or degree in consultation with the Head of Department and the Deputy Dean Undergraduate Studies of Commerce.'