# Question Answering

In [8]:
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

persist_directory = "docs/Chroma"
embedding = OllamaEmbeddings(model = "nomic-embed-text")
vectordb = Chroma(persist_directory=persist_directory, embedding_function = embedding)


In [9]:
print(vectordb._collection.count())

228


In [12]:
question = "What are the major topics for this class?"
docs = vectordb.similarity_search(question, k=3)
len(docs)

3
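Conceptually, `similarity_search(question, k=3)` embeds the question and returns the `k` stored chunks whose embeddings are closest to it. The sketch below illustrates only that idea, using a hypothetical bag-of-words `embed` function and exact cosine similarity over a made-up list of chunks. It is not how Chroma works internally: the real store uses model embeddings (here from `nomic-embed-text`) and an approximate nearest-neighbour index.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_search(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query (exact, brute-force search)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Made-up chunks, for illustration only
chunks = [
    "the major topics for this class include supervised learning",
    "office hours are held on thursday",
    "probability and linear algebra are prerequisites for this class",
    "the final project is a piece of machine learning research",
]
docs = similarity_search("what are the major topics for this class", chunks, k=3)
print(len(docs))  # 3
print(docs[0])    # the chunk sharing the most words with the query
```

Brute-force scoring is fine for a handful of chunks; a vector store exists so that the same "closest k" question stays fast over thousands of chunks like the 228 in this collection.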

In [14]:
from langchain_community.llms import Ollama

llm = Ollama(
    model = "llama3"
)

### RetrievalQA chain

In [15]:
from langchain.chains import RetrievalQA

In [17]:
qa_chain = RetrievalQA.from_chain_type(
    llm, 
    retriever = vectordb.as_retriever()
)

In [20]:
result = qa_chain({"query": question})
result['result']

'Based on the context provided, it appears that the major topics for this class are:\n\n* Machine learning algorithms\n* Applications of machine learning in various fields, such as:\n\t+ Control of snake robots and flying autonomous aircraft\n\t+ Improving computer vision algorithms using machine learning\n\t+ Medical robotics and neuroscience\n\t+ Financial trading and market makings\n\t+ Understanding brain science and neuroscience through machine learning algorithms\n\t+ Optical illusions and musical instrument detection\n\nThese topics are mentioned repeatedly throughout the text, suggesting that they are significant aspects of the course.'
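`RetrievalQA.from_chain_type` uses the `"stuff"` chain type by default: all retrieved chunks are inserted into one prompt and the LLM is called once. A minimal plain-Python sketch of that assembly step, assuming a hypothetical `fake_llm` callable in place of the Ollama model and a simplified prompt (LangChain's default prompt wording differs):

```python
def stuff_documents(chunks: list[str], question: str) -> str:
    """Join every retrieved chunk into one context block and build a single prompt."""
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nHelpful Answer:"
    )


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the Ollama model: reports the prompt size."""
    return f"(answer generated from a {len(prompt)}-character prompt)"


retrieved = ["Chunk about learning algorithms.", "Chunk about class projects."]
prompt = stuff_documents(retrieved, "What are the major topics for this class?")
answer = fake_llm(prompt)  # one call, however many chunks were retrieved
print(prompt)
```

Because everything goes into a single prompt, `stuff` is simple and needs only one model call, but it stops working once the retrieved text no longer fits in the model's context window.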

### Prompt

In [24]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
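`PromptTemplate.from_template` treats `{context}` and `{question}` as input variables (f-string style by default); when the chain runs, the retrieved chunks fill `{context}` and the user's query fills `{question}`. A plain-Python sketch of that substitution with `str.format`, using a shortened copy of the template and a made-up context string:

```python
# Shortened copy of the template above; {context} and {question} are the slots
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know. Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""

# Hypothetical retrieved context, for illustration only
filled = template.format(
    context="The course covers supervised learning and learning theory.",
    question="Is probability a class topic?",
)
print(filled)
```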

In [25]:
# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

In [26]:
question = "Is probability a class topic?"
result = qa_chain({"query": question})

In [30]:
result["result"]

'No, based on the context provided, it seems that probability is not mentioned as one of the topics covered in the course or the projects done by students last year. Thanks for asking!'

In [32]:
result["source_documents"][0]

Document(page_content="And let's see. Oh, and the goal of the projec t should really be for you to do a publishable \npiece of research in machine learning, okay?  \nAnd if you go to the course website, you'll actuall y find a list of the projects that students \nhad done last year. And so I'm holding the li st in my hand. You can  go home later and \ntake a look at it online.  \nBut reading down this list, I see that last year, there were st udents that ap plied learning \nalgorithms to control a snake robot. Ther e was a few projects on improving learning \nalgorithms. There's a project on flying autonomous  aircraft. There was a project actually \ndone by our TA Paul on improvi ng computer vision algorithms  using machine learning.  \nThere are a couple of project s on Netflix rankings using learning algorithms; a few \nmedical robots; ones on segmenting [inaudibl e] to segmenting pieces of the body using \nlearning algorithms; one on musical instrume nt detection; anot her on irony
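With `return_source_documents=True`, the result also carries the retrieved chunks, so an answer can be checked against its evidence. Below is a sketch of a helper that summarizes them, assuming each document exposes `page_content` and a `metadata` dict with `source` and `page` keys (PDF loaders such as `PyPDFLoader` typically set these, but the keys depend on how the collection was built). A small dataclass stands in for LangChain's `Document` so the snippet runs on its own; the file path and page number are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Minimal stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(docs: list[Document], preview_chars: int = 60) -> list[str]:
    """One line per retrieved chunk: source, page, and the start of its text."""
    lines = []
    for i, doc in enumerate(docs):
        source = doc.metadata.get("source", "unknown source")
        page = doc.metadata.get("page", "?")
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"[{i}] {source} (page {page}): {preview}")
    return lines


# Hypothetical result shaped like the chain's output dict
result = {
    "result": "No, probability is not mentioned. Thanks for asking!",
    "source_documents": [
        Document("And let's see. Oh, and the goal of the project ...",
                 {"source": "docs/lecture01.pdf", "page": 8}),
    ],
}
for line in summarize_sources(result["source_documents"]):
    print(line)
```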

### RetrievalQA chain types

In [37]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever = vectordb.as_retriever(),
    chain_type = "map_reduce"
)

In [38]:
result = qa_chain_mr({"query": question})
result["result"]

'The final answer is: This Agreement is governed by English law.\n\nThere is no relevant text for the other questions. The president did not mention Michael Jackson in the given passage. Probability is not mentioned as a class topic or in any context in the provided text.'
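With `chain_type="map_reduce"`, each retrieved chunk is first sent to the LLM on its own (map), and the per-chunk outputs are then combined in a final call (reduce). The stray "English law" and "Michael Jackson" sentences above appear to come from the few-shot examples in LangChain's default map_reduce combine prompt, which the model echoed as if they were part of the task. A sketch of the control flow only, with a hypothetical `recording_llm` callable and simplified prompts (not LangChain's actual prompt text):

```python
from typing import Callable


def map_reduce_answer(chunks: list[str], question: str, llm: Callable[[str], str]) -> str:
    # Map step: one LLM call per chunk, each seeing only that chunk
    partial = [
        llm(f"Extract any text relevant to the question.\n{chunk}\nQuestion: {question}")
        for chunk in chunks
    ]
    # Reduce step: a single call that combines the per-chunk extracts
    summaries = "\n\n".join(partial)
    return llm(f"Combine these extracts into a final answer.\n{summaries}\nQuestion: {question}")


calls = []


def recording_llm(prompt: str) -> str:
    """Hypothetical stand-in model that records each prompt it receives."""
    calls.append(prompt)
    return f"extract {len(calls)}"


answer = map_reduce_answer(["chunk A", "chunk B", "chunk C"],
                           "Is probability a class topic?", recording_llm)
print(len(calls), answer)  # 4 extract 4  (3 map calls + 1 reduce call)
```

The extra calls make map_reduce slower than `stuff`, and because each chunk is processed in isolation, information spread across several chunks can be lost before the reduce step sees it.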

In [39]:
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.langchain.plus"
api_key = os.getenv("LANGSMITH_API_KEY")
if api_key:  # os.environ values must be strings, so skip if the key is unset
    os.environ["LANGCHAIN_API_KEY"] = api_key

In [40]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever = vectordb.as_retriever(),
    chain_type = "map_reduce"
)
result = qa_chain_mr({"query": question})
result["result"]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


"The final answer is:\n\nThis Agreement is governed by English law.\n\nThe president did not mention Michael Jackson.\n\nSince the provided passage does not contain any relevant information about probability, I do not have enough context to provide a meaningful answer. Therefore, my answer would be that I don't know if probability is a class topic."
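The `huggingface/tokenizers` message above is a warning rather than an error. As the message itself suggests, it can be silenced by setting `TOKENIZERS_PARALLELISM` explicitly, ideally in the first cell before any tokenizer is used:

```python
import os

# Set before any Hugging Face tokenizer runs; "false" disables tokenizer parallelism
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```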

In [41]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever = vectordb.as_retriever(),
    chain_type = "map_reduce"
)
result = qa_chain_mr({"query": question})
result["result"]

"I don't know the answer.\n\nPlease note that there are multiple texts provided, but none of them mention probability or machine learning concepts related to probability. The text appears to be discussing logistical and practical considerations for a televised class, but does not touch on the topic of probability."

### RetrievalQA limitations
 
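`RetrievalQA` does not preserve conversational history: each query is answered on its own, so a follow-up question cannot refer back to an earlier one.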

In [42]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

In [43]:
question = "Is probability a class topic?"
result = qa_chain({"query": question})
result["result"]

"I don't know. The text doesn't mention probability at all, so I wouldn't be able to answer the question based on this context."

In [44]:
question = "Why are those prerequisites needed?"
result = qa_chain({"query": question})
result["result"]

"I don't know the answer to this question based on the provided context. The text does not mention prerequisites for the algorithm discussed in the lecture, nor is there any explanation of why certain unsupervised learning algorithms are used to solve specific problems. Therefore, I cannot provide a helpful answer."
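Because each call starts from scratch, "those prerequisites" has nothing to refer to. One common remedy, used by LangChain's `ConversationalRetrievalChain`, is to first have the LLM rewrite the follow-up into a standalone question using the chat history, and only then retrieve and answer. A plain-Python sketch of the prompt-building part, with hypothetical helpers `format_history` and `condense_prompt` and a made-up first turn:

```python
def format_history(history: list[tuple[str, str]]) -> str:
    """Render (human, assistant) turns as plain text."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in history)


def condense_prompt(history: list[tuple[str, str]], follow_up: str) -> str:
    """Build a prompt asking the LLM to rewrite a follow-up as a standalone question."""
    return (
        "Given the following conversation and a follow up question, "
        "rephrase the follow up question to be a standalone question.\n\n"
        f"Chat History:\n{format_history(history)}\n"
        f"Follow Up Input: {follow_up}\nStandalone question:"
    )


# Hypothetical first turn, invented for illustration
history = [("What are the prerequisites for this class?",
            "Basic probability, linear algebra and programming.")]
prompt = condense_prompt(history, "Why are those prerequisites needed?")
print(prompt)
```

The standalone question the LLM returns (for example, one that names the prerequisites explicitly) would then go through retrieval and the QA chain exactly as before.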