![](images/qa_intro-9b468dbffe1cbe7f0bd822b28648db9e.png)
## Steps
1. Loading: Load unstructured data from the web as a document.
2. Splitting: Break the document into splits of a fixed size.
3. Storage: Embed the splits and store them in a vector database.
4. Retrieval: The app retrieves the splits most relevant to the question from storage.
5. Generation: An LLM produces an answer using a prompt that includes the question and the retrieved data.
6. Conversation: Hold a multi-turn conversation by adding memory to the QA chain.
![](images/qa_flow-9fbd91de9282eb806bda1c6db501ecec.jpeg)

In [2]:
```python
import os
import dotenv
import logging

import chromadb
from chromadb.config import Settings

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.embeddings import HuggingFaceEmbeddings

from chromadb_cli.chromadb_cli import load_doc_to_db, load_n_split_doc, get_db

logging.basicConfig()

collection_name = "airmaster"
```

In [13]:
```python
# client = chromadb.HttpClient(settings=Settings(allow_reset=True))
# client.reset()
```

True

## Step 1, 2, 3: Load, Split and store

In [2]:
```python
embedding_model = 'sentence-transformers/all-mpnet-base-v2'
model_kwargs = {'device': 'cpu'}
# Embedding model
hf_embeddings = HuggingFaceEmbeddings(model_name=embedding_model, model_kwargs=model_kwargs)

# Document to index
http_path = "https://www.airmaster.com.au/who-we-are"

# Load and split the document (500 is presumably the split size)
docs = load_n_split_doc(http_path, 500)
# Load the splits into the database
vectorstore = load_doc_to_db(documents=docs, collection_name=collection_name)
```
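`load_n_split_doc` is a project helper whose body is not shown, so what the `500` controls is an assumption (most likely the split size in characters). To illustrate what fixed-size splitting does, here is a minimal character splitter with overlap. `split_text` and its `overlap` parameter are hypothetical; LangChain's own splitters (for example `RecursiveCharacterTextSplitter`) additionally try to break on paragraph and sentence boundaries.

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so a short sentence cut at a
    chunk boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


sample = "Airmaster officially opened for business on 2nd February 1988. " * 20
pieces = split_text(sample, chunk_size=500, overlap=50)
print(len(sample), [len(p) for p in pieces])
```

Overlap trades a little storage for robustness: without it, a fact that straddles a boundary can end up split across two chunks and never be retrieved intact.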

## Step 4: Retrieve
Use the Chroma client (through the `get_db` helper) to reopen the collection stored in the `chromadb` database, then run a similarity search against it.

In [3]:
```python
vectorstore = get_db(collection_name=collection_name)

question = "When airmaster was founded?"
result = vectorstore.similarity_search(question, k=1)
result
```
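Conceptually, `similarity_search(question, k=1)` embeds the question and returns the stored split whose vector is closest to it. The toy sketch below shows that ranking with cosine similarity over made-up 3-dimensional vectors; the real vectors come from `all-mpnet-base-v2` (768 dimensions), and the distance metric Chroma actually uses depends on how the collection was configured. `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=1):
    """Return the k (text, score) pairs from store that are most similar to query_vec."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up "embeddings" standing in for the vectors stored in Chroma.
store = [
    ("Airmaster opened for business on 2nd February 1988.", [0.9, 0.1, 0.0]),
    ("Airmaster has built a winning culture.", [0.2, 0.9, 0.1]),
    ("Who We Are | Airmaster", [0.3, 0.3, 0.3]),
]
query_vec = [1.0, 0.0, 0.1]  # pretend embedding of "When airmaster was founded?"
print(top_k(query_vec, store, k=1))
```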

In [6]:
```python
# from langchain.retrievers import SVMRetriever

# svm_retriever = SVMRetriever.from_documents(docs, hf_embeddings)
# svm_retriever.get_relevant_documents(question)
```

## Step 5: Generation
Distill the retrieved documents into an answer using an LLM or chat model.

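1. `MultiQueryRetriever` uses the LLM to generate variants of the input question, retrieves documents for each variant, and returns the unique union of the results to improve retrieval.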

In [7]:

```python
from langchain.llms import GPT4All
from langchain.retrievers.multi_query import MultiQueryRetriever

logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

callback_handler = [StreamingStdOutCallbackHandler()]

llm = GPT4All(model="./models/llama-2-7b-chat.ggmlv3.q4_0.bin", callbacks=callback_handler, verbose=True)

retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectorstore.as_retriever(), llm=llm)

unique_docs = retriever_from_llm.get_relevant_documents(query=question)
unique_docs
```

Found model file at  ./models/llama-2-7b-chat.ggmlv3.q4_0.bin


llama.cpp: using Metal
llama.cpp: loading model from ./models/llama-2-7b-chat.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_interna

llama_new_context_with_model: max tensor size =    70.31 MB

1. What are the key events in Airmaster's history that led up to its founding?
2. Which companies or organizations were involved in the development of Airmaster technology before its founding?
3. How did Airmaster differentiate itself from other similar technologies when it was first introduced?

INFO:langchain.retrievers.multi_query:Generated queries: ["1. What are the key events in Airmaster's history that led up to its founding?", '2. Which companies or organizations were involved in the development of Airmaster technology before its founding?', '3. How did Airmaster differentiate itself from other similar technologies when it was first introduced?']


[Document(page_content='\u200b\n\u200bAirmaster’s commitment to sustainability is achieved through a proactive, integrated approach to helping organisations achieve energy efficiency in innovative ways.\n\xa0\nAirmaster officially opened for business on 2nd February 1988 from a small factory in Ferntree Gully. With an emphasis on service and relationships, Airmaster quickly attained high profile clients like Highpoint Shopping Centre and The Age building in Spencer Street.\n\u200b', metadata={'language': 'en', 'source': 'https://www.airmaster.com.au/who-we-are', 'title': 'Who We Are | Airmaster'}),
 Document(page_content='Who We Are | Airmaster\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\ntop of page', metadata={'language': 'en', 'source': 'https://www.airmaster.com.au/who-we-are', 'title': 'Who We Are | Airmaster'}),
 Document(page_content='\u200b\nSince its inception, Airmaster has built a winning culture based on employee retention and a deliver-at-all-costs attitude.\n\xa0\nAir
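The INFO log above shows that the generated queries keep their "1. ", "2. " prefixes, because the LLM's answer is split line by line. The sketch below imitates the two halves of multi-query retrieval in plain Python: parsing the LLM output into queries (stripping the numbering, which the run above did not do) and merging the per-query results into a de-duplicated union. `parse_queries`, `unique_union`, and the fake per-query results are hypothetical; real results are `Document` objects rather than strings.

```python
import re


def parse_queries(llm_output: str) -> list[str]:
    """Turn the LLM's numbered list into bare queries, dropping blank lines."""
    queries = []
    for line in llm_output.splitlines():
        line = re.sub(r"^\s*\d+[.)]\s*", "", line).strip()
        if line:
            queries.append(line)
    return queries


def unique_union(results_per_query: list[list[str]]) -> list[str]:
    """Merge per-query results in first-seen order, dropping duplicates."""
    seen = set()
    merged = []
    for results in results_per_query:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


llm_output = """1. What are the key events in Airmaster's history that led up to its founding?
2. Which companies or organizations were involved in the development of Airmaster technology before its founding?
3. How did Airmaster differentiate itself from other similar technologies when it was first introduced?"""

# Hypothetical retrieval results (document texts) for each generated query.
fake_results = [
    ["opened 2nd February 1988", "winning culture"],
    ["Who We Are | Airmaster", "opened 2nd February 1988"],
    ["winning culture"],
]
queries = parse_queries(llm_output)
merged_docs = unique_union(fake_results[: len(queries)])
print(queries[0])
print(merged_docs)
```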

2. `RetrievalQA` chains a retriever and an LLM for question answering against an index.

In [8]:
```python
from langchain.chains import RetrievalQA

retrieval_qa = RetrievalQA.from_chain_type(retriever=vectorstore.as_retriever(), llm=llm)
retrieval_qa({"query": question})
```

 Based on the text, we know that Airmaster was officially opened for business on February 2nd, 1988, so the answer is 1988.

{'query': 'When airmaster was founded?',
 'result': ' Based on the text, we know that Airmaster was officially opened for business on February 2nd, 1988, so the answer is 1988.'}

3. Customizing the prompt

In [9]:
```python
from langchain.prompts import PromptTemplate

template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum and keep the answer as concise as possible. 
Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""

QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
qa_chain = RetrievalQA.from_chain_type(
    retriever=vectorstore.as_retriever(),
    llm=llm,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)

qa_chain({"query": question})
```

 Airmaster was founded on 2nd February 1988.

{'query': 'When airmaster was founded?',
 'result': ' Airmaster was founded on 2nd February 1988.'}
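With the default `stuff` chain type, the retrieved documents fill `{context}` and the query fills `{question}`. The sketch below mimics that with plain `str.format`; `build_prompt` is a hypothetical helper rather than a LangChain function, and joining documents with a blank line is an assumption about the default separator.

```python
TEMPLATE = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""


def build_prompt(template: str, doc_texts: list[str], question: str) -> str:
    """Fill {context} with the retrieved texts and {question} with the user query."""
    context = "\n\n".join(doc_texts)
    return template.format(context=context, question=question)


prompt = build_prompt(
    TEMPLATE,
    [
        "Airmaster officially opened for business on 2nd February 1988.",
        "Since its inception, Airmaster has built a winning culture.",
    ],
    "When airmaster was founded?",
)
print(prompt)
```

The filled prompt plus the generated answer has to fit in the model's context window (`n_ctx = 2048` in the model-load log), which is one reason to keep the split size and the number of retrieved documents small.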

4. Customizing retrieved document processing
Retrieved documents can be fed to an LLM for answer distillation in a few different ways.

The `stuff`, `refine`, `map_reduce`, and `map_rerank` chain types for passing documents to an LLM prompt are summarized in the LangChain documentation on document chains.

`stuff` is commonly used because it simply "stuffs" all retrieved documents into a single prompt.

`load_qa_chain` is an easy way to pass documents to an LLM with any of these approaches; select one with the `chain_type` argument.
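The main trade-off between `stuff` and `map_reduce` is the number of LLM calls versus the size of each prompt. The toy functions below are simplified, hypothetical stand-ins (not LangChain's implementations or prompts) driven by a fake LLM that only counts its calls: `stuff` makes one call containing every document, while `map_reduce` makes one call per document plus a final call that combines the partial answers.

```python
from typing import Callable


def stuff(llm: Callable[[str], str], docs: list[str], question: str) -> str:
    """One LLM call: every document goes into a single prompt."""
    context = "\n\n".join(docs)
    return llm(f"{context}\nQuestion: {question}\nAnswer:")


def map_reduce(llm: Callable[[str], str], docs: list[str], question: str) -> str:
    """One call per document (map), then one call over the partial answers (reduce)."""
    partials = [llm(f"{doc}\nQuestion: {question}\nRelevant facts:") for doc in docs]
    combined = "\n".join(partials)
    return llm(f"{combined}\nQuestion: {question}\nFinal answer:")


class CountingLLM:
    """Stand-in for a real model that records how many times it was called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"answer #{self.calls}"


docs_demo = ["doc A", "doc B", "doc C"]
llm_stuff, llm_mr = CountingLLM(), CountingLLM()
stuff(llm_stuff, docs_demo, "When airmaster was founded?")
map_reduce(llm_mr, docs_demo, "When airmaster was founded?")
print(llm_stuff.calls, llm_mr.calls)
```

With a local 7B model every extra call adds noticeable latency, so `stuff` is usually the practical choice as long as the retrieved documents fit in the context window.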

In [19]:
```python
from langchain.chains.question_answering import load_qa_chain

chain = load_qa_chain(llm, chain_type="stuff")

chain({"input_documents": unique_docs, "question": question}, return_only_outputs=True)
```

 Based on the text provided, Airmaster was founded in 1988.

{'output_text': ' Based on the text provided, Airmaster was founded in 1988.'}

## Step 6: Chat

In [15]:
```python
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

qa = ConversationalRetrievalChain.from_llm(llm=llm, retriever=vectorstore.as_retriever(), memory=memory)

result = qa({"question": question})

result["answer"]
```


 Based on the text, we know that Airmaster was officially opened for business on February 2nd, 1988, so the answer is 1988.

' Based on the text, we know that Airmaster was officially opened for business on February 2nd, 1988, so the answer is 1988.'

In [16]:
```python
question2 = "What are the key events in Airmaster's history that led up to its founding?"

result = qa({"question": question2})

result["answer"]
```

 Can you tell me about the key events in Airmaster's history leading up to its founding in 1988? Based on the provided text, I can see that Airmaster was officially opened for business on February 2nd, 1988 from a small factory in Ferntree Gully. However, I cannot find any information about key events leading up to its founding. Therefore, I must answer "I don't know" to your question.

' Based on the provided text, I can see that Airmaster was officially opened for business on February 2nd, 1988 from a small factory in Ferntree Gully. However, I cannot find any information about key events leading up to its founding. Therefore, I must answer "I don\'t know" to your question.'
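The streamed text above appears to begin with a rephrased, standalone version of the follow-up ("Can you tell me about ...") that is missing from `result["answer"]`. This is consistent with how `ConversationalRetrievalChain` works: once the memory holds chat history, the chain first asks the LLM to condense the history and the new question into a standalone question, uses it for retrieval, and returns only the final answer. Because the same streaming `llm` handles both calls, both get printed, while the first question (empty history) printed only its answer. The sketch below imitates the buffer memory and the condense step in plain Python; `ChatBuffer`, `condense_prompt`, and the prompt wording are hypothetical, not LangChain's classes or exact prompt text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChatBuffer:
    """Keeps every (question, answer) turn, like a simple buffer memory."""
    turns: list[tuple[str, str]] = field(default_factory=list)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_text(self) -> str:
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.turns)


def condense_prompt(buffer: ChatBuffer, follow_up: str) -> Optional[str]:
    """Build the rephrasing prompt; None means there is no history to condense."""
    if not buffer.turns:
        return None  # first turn: the question is used as-is
    return (
        "Given the conversation below, rewrite the follow-up question "
        "as a standalone question.\n\n"
        f"Chat history:\n{buffer.as_text()}\n"
        f"Follow-up question: {follow_up}\n"
        "Standalone question:"
    )


buffer = ChatBuffer()
first = condense_prompt(buffer, "When airmaster was founded?")
buffer.add("When airmaster was founded?", "Airmaster opened for business on 2nd February 1988.")
second = condense_prompt(buffer, "What are the key events in Airmaster's history that led up to its founding?")
print(first)
print(second)
```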

In [17]:
```python
question3 = "Which companies or organizations were involved in the development of Airmaster technology before its founding?"

result = qa({"question": question3})
result["answer"]
```

 Who developed Airmaster technology before its founding? I don't know.

" I don't know."

In [20]:
```python
dotenv.load_dotenv()

HF_AUTH_TOKEN = os.getenv("HF_AUTH_TOKEN")
```

In [9]:
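```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# UnifiedQA (T5-small) model used by run_model in the cells below
architecture = "allenai/unifiedqa-t5-small"
model = T5ForConditionalGeneration.from_pretrained(architecture)
tokenizer = T5Tokenizer.from_pretrained(architecture)


def run_model(input_string, **generator_args):
    """Encode the prompt, generate with the given kwargs, and decode the outputs."""
    input_ids = tokenizer.encode(input_string, return_tensors="pt")
    res = model.generate(input_ids, **generator_args)
    return tokenizer.batch_decode(res, skip_special_tokens=True)


# Optional: Llama-2 chat through a text-generation pipeline.
# meta-llama models are gated, so HF_AUTH_TOKEN must belong to an approved account.
# from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
# model_name = "meta-llama/Llama-2-7b-chat-hf"
# llama_model = AutoModelForCausalLM.from_pretrained(model_name, use_auth_token=HF_AUTH_TOKEN)
# llama_tokenizer = AutoTokenizer.from_pretrained(model_name, use_auth_token=HF_AUTH_TOKEN)
#
# pipe = pipeline(
#     "text-generation",
#     model=llama_model,
#     tokenizer=llama_tokenizer,
#     max_length=100,
#     temperature=0,
#     top_p=0.95,
#     repetition_penalty=1.2,
# )
```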

In [9]:
```python
run_model("scott filled a tray with juice and put it in a freezer. the next day, scott opened the freezer. how did the juice most likely change? \\n (a) it condensed. (b) it evaporated. (c) it became a gas. (d) it became a solid.")
```

['it condensed.']

In [11]:
```python
run_model("which is best conductor? \\n (a) iron (b) feather (c) wood (d) plastic",
          temperature=0.9, num_beams=20)
```

['iron']
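Both `run_model` calls pass multiple-choice questions as a single lower-cased string: the question, a literal backslash-n separator (written `\\n` in Python source), then lettered options. A small hypothetical helper can build such strings from a question and a list of choices; it reproduces the conductor prompt above exactly.

```python
import string


def unifiedqa_input(question: str, choices: list[str]) -> str:
    """Format a multiple-choice question as 'question \\n (a) x (b) y ...'.

    The separator is the two characters backslash + n (not a newline),
    matching the strings passed to run_model above.
    """
    options = " ".join(
        f"({letter}) {choice}" for letter, choice in zip(string.ascii_lowercase, choices)
    )
    return f"{question} \\n {options}"


prompt_text = unifiedqa_input("which is best conductor?", ["iron", "feather", "wood", "plastic"])
print(prompt_text)
```

The result can be passed straight to `run_model(prompt_text)`.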

In [14]:
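```python
# `client` is only created in a commented-out cell near the top, so create it here.
client = chromadb.HttpClient(settings=Settings(allow_reset=True))
client.get_settings()
```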

Settings(environment='', chroma_db_impl=None, chroma_api_impl='chromadb.api.fastapi.FastAPI', chroma_telemetry_impl='chromadb.telemetry.posthog.Posthog', chroma_sysdb_impl='chromadb.db.impl.sqlite.SqliteDB', chroma_producer_impl='chromadb.db.impl.sqlite.SqliteDB', chroma_consumer_impl='chromadb.db.impl.sqlite.SqliteDB', chroma_segment_manager_impl='chromadb.segment.impl.manager.local.LocalSegmentManager', tenant_id='default', topic_namespace='default', is_persistent=False, persist_directory='./chroma', chroma_server_host='localhost', chroma_server_headers={}, chroma_server_http_port=8000, chroma_server_ssl_enabled=False, chroma_server_grpc_port=None, chroma_server_cors_allow_origins=[], chroma_server_auth_provider=None, chroma_server_auth_configuration_provider=None, chroma_server_auth_configuration_file=None, chroma_server_auth_credentials_provider=None, chroma_server_auth_credentials_file=None, chroma_server_auth_credentials=None, chroma_client_auth_provider=None, chroma_server_auth_