# Extractive QnA

So far, we have seen how to load our documents, split them into Nodes, create embeddings for the Nodes, insert them into a vector database, and finally query them with an input query.

In this Getting Started guide, we will use an LLM to perform extractive QnA on the returned Nodes, answering the question only from their text, and display the result directly. We will build this workflow from services and components already covered in this guide.

Let's get started.

In [1]:
```python
# Prepare the node embeddings for Paul Graham's essay:
# 1. Load the essay from the data/data-loader directory using the `file` data loader
# 2. Split it into Nodes using the `text_splitter` splitter
# 3. Enrich the Nodes with embeddings using `sentence_transformers`
# 4. Store the Nodes in an in-memory Qdrant collection and query it with the input question
import os
from pathlib import Path
from bodhilib import (
    get_data_loader,
    get_splitter,
    get_embedder,
    get_vector_db,
    Distance,
)

# Get the data directory path and add it to the data loader
current_dir = Path(os.getcwd())
data_dir = current_dir / ".." / "data" / "data-loader"
data_loader = get_data_loader("file")
data_loader.add_resource(dir=str(data_dir))
docs = data_loader.load()

# Split the documents into overlapping chunks (Nodes)
splitter = get_splitter("text_splitter", max_len=300, overlap=30)
nodes = splitter.split(docs)

# Enrich the Nodes with their embeddings (the return value is not needed here)
embedder = get_embedder("sentence_transformers")
_ = embedder.embed(nodes)

# Create a fresh in-memory Qdrant collection sized to the embedding dimension
collection_name = "test_collection"
vector_db = get_vector_db("qdrant", location=":memory:")
if collection_name in vector_db.get_collections():
    vector_db.delete_collection(collection_name)
vector_db.create_collection(
    collection_name=collection_name,
    dimension=embedder.dimension,
    distance=Distance.COSINE,
)
_ = vector_db.upsert(collection_name, nodes)

# Embed the input query and fetch the 5 most similar Nodes
input_query = "According to Paul Graham, how to tackle when you are in doubt?"
embedding = embedder.embed(input_query)
result = vector_db.query(collection_name, embedding[0].embedding, limit=5)
```
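To make the retrieval step less of a black box, here is a toy, dependency-free sketch of what this cell does conceptually: split text into overlapping word windows, turn each chunk into a vector, and return the `k` chunks with the highest cosine similarity to the query. It is an illustration under stated assumptions, not how bodhilib, `sentence_transformers`, or Qdrant work internally: the bag-of-words "embedding" stands in for a real embedding model, and counting `max_len`/`overlap` in words is an assumption. The helper names (`split_words`, `embed`, `cosine`, `top_k`) are hypothetical.

```python
import math
from collections import Counter


def split_words(text: str, max_len: int, overlap: int) -> list[str]:
    """Split text into windows of at most `max_len` words; consecutive windows
    share `overlap` words. Assumption: lengths are counted in words."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    words = text.split()
    step = max_len - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_len]))
        if start + max_len >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, highest score first."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = split_words("one two three four five six seven eight nine ten", max_len=4, overlap=1)
print(chunks)  # ['one two three four', 'four five six seven', 'seven eight nine ten']
print(top_k("nine ten", chunks, k=1))
```

With a real embedding model the ranking idea is the same; only the vectors come from a neural model instead of word counts, so paraphrases can match even without shared words.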

In [2]:
```python
# Create the prompt template for extracting answers from the given text chunks
from bodhilib import PromptTemplate

template = """Below are the text chunks from a blog/article.
1. Read and understand the text chunks
2. After the text chunks, there is a list of questions starting with `Question:`
3. Answer the questions from the information given in the text chunks
4. If you don't find the answer in the provided text chunks, say 'I couldn't find the answer to this question in the given text'

{% for text in texts %}
### START
{{ text }}
### END
{% endfor %}

Question: {{ query }}
Answer:
"""
prompt_template = PromptTemplate(template=template, format='jinja2')
```

In [3]:
```python
# Render the prompt from the texts of the retrieved Nodes and the input query
texts = [r.text for r in result]
prompt = prompt_template.to_prompts(texts=texts, query=input_query)
```
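The jinja2 template wraps each retrieved text between `### START` and `### END` markers and appends the question. The plain-Python sketch below assembles a prompt with the same layout, which can help when inspecting what the LLM actually receives. `HEADER` and `build_prompt` are hypothetical names, not part of bodhilib, and the exact whitespace may differ from what `PromptTemplate` renders.

```python
# Hypothetical header mirroring the instructions in the jinja2 template above
HEADER = """Below are the text chunks from a blog/article.
1. Read and understand the text chunks
2. After the text chunks, there is a list of questions starting with `Question:`
3. Answer the questions from the information given in the text chunks
4. If you don't find the answer in the provided text chunks, say 'I couldn't find the answer to this question in the given text'
"""


def build_prompt(texts: list[str], query: str) -> str:
    """Assemble the same layout as the template: header, delimited chunks, question."""
    parts = [HEADER]
    for text in texts:
        # Each retrieved chunk is wrapped in START/END markers
        parts.append(f"### START\n{text}\n### END")
    parts.append(f"Question: {query}\nAnswer:")
    return "\n".join(parts)


print(build_prompt(["First chunk.", "Second chunk."], "What is discussed?"))
```

Delimiting every chunk explicitly makes it easy for the model (and for you, when debugging) to see where one retrieved passage ends and the next begins.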

In [4]:
```python
# OpenAI API setup
import os
from getpass import getpass

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass('Enter your OpenAI API key: ')
```

In [5]:
```python
# Get the OpenAI LLM service instance
from bodhilib import get_llm

llm = get_llm('openai_chat', model='gpt-3.5-turbo')
```

In [6]:
```python
# Send the rendered prompt to the LLM
response = llm.generate(prompt)
```

In [7]:
```python
print(response.text)
```

According to the given text chunks, when you are in doubt, you should optimize for interestingness and give different types of work a chance to show you what they're like. You should also try lots of things, meet lots of people, read lots of books, and ask lots of questions.
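The template tells the model to reply with a fixed sentence when the text chunks do not contain the answer. If you post-process responses, a simple substring check can separate grounded answers from that fallback. A minimal sketch, assuming the model reproduces the sentence verbatim (it may not, for example if it uses curly apostrophes or rephrases); `FALLBACK` and `is_grounded` are hypothetical names:

```python
# Fallback sentence copied from instruction 4 of the prompt template
FALLBACK = "I couldn't find the answer to this question in the given text"


def is_grounded(answer: str) -> bool:
    """Return False if the response contains the template's fallback sentence.
    Assumes the model reproduces the sentence verbatim (case-insensitive match)."""
    return FALLBACK.lower() not in answer.lower()


print(is_grounded("When in doubt, optimize for interestingness."))  # True
print(is_grounded("I couldn't find the answer to this question in the given text."))  # False
```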


---
🎉 We just created a flow for Extractive QnA using different bodhilib components.
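Stripped of the specific components, the flow comes down to three steps: retrieve the relevant texts, render them into a prompt, and generate an answer. The sketch below expresses that as one function with pluggable callables, so any retriever, template, or LLM can be swapped in. `extractive_qna` and its parameters are hypothetical and are not the bodhilib or bodhiext API.

```python
from typing import Callable


def extractive_qna(
    query: str,
    retrieve: Callable[[str, int], list[str]],
    render_prompt: Callable[[list[str], str], str],
    generate: Callable[[str], str],
    limit: int = 5,
) -> str:
    """Hypothetical composition of the steps in this guide."""
    texts = retrieve(query, limit)          # e.g. embed the query + vector DB search
    prompt = render_prompt(texts, query)    # e.g. the jinja2 PromptTemplate
    return generate(prompt)                 # e.g. the OpenAI chat LLM


# Usage with stand-in callables (no real vector DB or LLM involved)
answer = extractive_qna(
    "What is in the notes?",
    retrieve=lambda q, k: ["note one", "note two"][:k],
    render_prompt=lambda texts, q: "\n".join(texts) + f"\nQuestion: {q}",
    generate=lambda prompt: f"echo: {prompt.splitlines()[-1]}",
)
print(answer)  # echo: Question: What is in the notes?
```

Passing the steps in as callables keeps the orchestration separate from any particular vector database or LLM provider.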

The Extractive QnA flow is used so frequently that bodhiext provides an implementation of it in the form of BodhiEngine. Let's check out [BodhiEngine](BodhiEngine) next.