Title: Build a Context-Aware Question Answering Chatbot using Free GenAI
Models and Vector Store
Objective:
Design and implement a RAG-based chatbot system that answers user questions using custom
knowledge (PDF, text, or dataset). It should use free HuggingFace models and a vector database
(FAISS or ChromaDB), and demonstrate a clear flow from data ingestion → embedding →
retrieval → generation.

Task Breakdown:
1. Data Ingestion (10 marks)
● Load your own text dataset (e.g., a collection of articles, FAQs, or PDF chunks).
● Preprocess and split it into meaningful chunks (using CharacterTextSplitter or
similar).

2. Embedding (10 marks)
● Use a free sentence transformer like all-MiniLM-L6-v2 to convert chunks into
embeddings.
● Store them in FAISS or ChromaDB.

3. Retriever Logic (10 marks)
● Implement a similarity search to fetch top-k relevant chunks for a given query.
● Justify your choice of similarity metric (cosine vs L2).

4. Response Generation (20 marks)
● Use a free instruct-tuned model from HuggingFace (e.g., google/flan-t5-base,
mistralai/Mistral-7B-Instruct-v0.1, or tiiuae/falcon-rw-1b) to generate
answers.

In [41]:
# 1. Data ingestion: load the knowledge base from a plain-text file
with open('/content/data.txt', encoding='utf-8') as f:
    text = f.read()
print(text)

Q: What is Generative AI?
A: Generative AI refers to artificial intelligence systems that can produce text, images, audio, or other data. It uses models like GPT or DALL·E to generate human-like content.

Q: What is a RAG-based chatbot?
A: A RAG (Retrieval-Augmented Generation) chatbot retrieves relevant information from a knowledge base and generates an answer using a language model.

Q: What is embedding in NLP?
A: Embedding is a technique that transforms text into numerical vectors to capture semantic meaning, allowing models to compare similarity between text chunks.

Q: What is ChromaDB?
A: ChromaDB is a fast and open-source vector database used to store and search embeddings efficiently. It’s commonly used in RAG-based architectures.

Q: What is FLAN-T5?
A: FLAN-T5 is a fine-tuned version of the T5 language model, trained to follow instructions across many tasks like question answering and summarization.

Q: Applications of Generative AI?
A: Generative AI is used in content creat

In [42]:
# Install LangChain (provides the text splitter used in the next cell)
!pip install langchain



In [43]:
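from langchain.text_splitter import CharacterTextSplitter

# Split on newlines into chunks of up to 1000 characters,
# carrying 200 characters of overlap from one chunk into the next.
splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)
chunks = splitter.split_text(text)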

In [44]:
print(chunks)
len(chunks)

['Q: What is Generative AI?\nA: Generative AI refers to artificial intelligence systems that can produce text, images, audio, or other data. It uses models like GPT or DALL·E to generate human-like content.\nQ: What is a RAG-based chatbot?\nA: A RAG (Retrieval-Augmented Generation) chatbot retrieves relevant information from a knowledge base and generates an answer using a language model.\nQ: What is embedding in NLP?\nA: Embedding is a technique that transforms text into numerical vectors to capture semantic meaning, allowing models to compare similarity between text chunks.\nQ: What is ChromaDB?\nA: ChromaDB is a fast and open-source vector database used to store and search embeddings efficiently. It’s commonly used in RAG-based architectures.\nQ: What is FLAN-T5?\nA: FLAN-T5 is a fine-tuned version of the T5 language model, trained to follow instructions across many tasks like question answering and summarization.\nQ: Applications of Generative AI?', 'A: FLAN-T5 is a fine-tuned vers

2
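With `chunk_size=1000` the whole FAQ collapses into two chunks, so a top-k search has very little to choose from. A minimal sketch of one alternative, assuming the file keeps a blank line between Q/A pairs as in the printout of `data.txt`, is to split on blank lines so that each pair becomes its own chunk. The helper name `split_qa_pairs` and the sample string are illustrative and not part of the original notebook.

```python
def split_qa_pairs(text: str) -> list[str]:
    """Split FAQ text into one chunk per Q/A pair, using blank lines as boundaries."""
    chunks = []
    for block in text.split("\n\n"):
        block = block.strip()
        if block:  # skip empty blocks caused by extra blank lines
            chunks.append(block)
    return chunks


# Hypothetical sample laid out like data.txt (made-up wording)
sample = (
    "Q: What is ChromaDB?\n"
    "A: ChromaDB is an open-source vector database.\n"
    "\n"
    "Q: What is FLAN-T5?\n"
    "A: FLAN-T5 is an instruction-tuned version of T5.\n"
)

qa_chunks = split_qa_pairs(sample)
print(len(qa_chunks))
for chunk in qa_chunks:
    print(repr(chunk))
```

Chunks produced this way could be passed to the vector store in place of `chunks`; whether that improves retrieval on the real data would need to be checked.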

In [46]:
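# 2. Embedding: install the community integrations plus the packages they rely on
# (chromadb and sentence-transformers are assumed not to be installed already)
!pip install -U langchain-community chromadb sentence-transformers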



In [47]:
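# HuggingFaceEmbeddings lives in langchain_community in current LangChain releases
from langchain_community.embeddings import HuggingFaceEmbeddings

# multi-qa-MiniLM-L6-cos-v1 is a free sentence-transformers model tuned for semantic search
embedding_model = HuggingFaceEmbeddings(model_name="multi-qa-MiniLM-L6-cos-v1")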

In [48]:
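# Chroma lives in langchain_community in current LangChain releases
from langchain_community.vectorstores import Chroma

# Embed every chunk and store the vectors in an in-memory Chroma collection
vectorDB = Chroma.from_texts(
    texts=chunks,
    embedding=embedding_model,
    collection_name="dharani"
)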

In [49]:
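# 3. Retrieval: fetch the k chunks most similar to the query.
# Only 2 chunks were stored, so fewer than k results can come back.
query = "What are technologies reshaping industries?"
docs = vectorDB.similarity_search(query, k=3)
top_chunks = [doc.page_content for doc in docs]
print("Top retrieved chunks: ", top_chunks)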

Top retrieved chunks:  ['Q: What is Generative AI?\nA: Generative AI refers to artificial intelligence systems that can produce text, images, audio, or other data. It uses models like GPT or DALL·E to generate human-like content.\nQ: What is a RAG-based chatbot?\nA: A RAG (Retrieval-Augmented Generation) chatbot retrieves relevant information from a knowledge base and generates an answer using a language model.\nQ: What is embedding in NLP?\nA: Embedding is a technique that transforms text into numerical vectors to capture semantic meaning, allowing models to compare similarity between text chunks.\nQ: What is ChromaDB?\nA: ChromaDB is a fast and open-source vector database used to store and search embeddings efficiently. It’s commonly used in RAG-based architectures.\nQ: What is FLAN-T5?\nA: FLAN-T5 is a fine-tuned version of the T5 language model, trained to follow instructions across many tasks like question answering and summarization.\nQ: Applications of Generative AI?', 'A: FLAN-
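The task asks for a justification of cosine vs L2, which the retrieval cell does not give. One possible argument: the embedding model's name (`-cos-`) suggests it was tuned for cosine similarity, and for unit-length vectors the two metrics rank results identically because ‖a − b‖² = 2 − 2·cos(a, b). The sketch below checks this on made-up 3-dimensional vectors; the vectors and helper names are illustrative assumptions, not real embeddings from the notebook.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (higher means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance between a and b (lower means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Toy "embeddings" with made-up numbers, for illustration only
query_vec = normalize([1.0, 0.2, 0.0])
chunk_vecs = {
    "chunk_a": normalize([0.9, 0.1, 0.1]),
    "chunk_b": normalize([0.0, 1.0, 0.3]),
    "chunk_c": normalize([0.5, 0.5, 0.5]),
}

# Rank by descending cosine similarity and by ascending L2 distance
by_cosine = sorted(
    chunk_vecs,
    key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
    reverse=True,
)
by_l2 = sorted(chunk_vecs, key=lambda name: l2_distance(query_vec, chunk_vecs[name]))

print(by_cosine)
print(by_l2)
```

If the stored embeddings are not normalized, the two rankings can differ, and cosine is usually the safer choice for text because it ignores vector length.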

In [52]:
from transformers import pipeline

# 4. Response generation with an instruction-tuned seq2seq model
generator = pipeline("text2text-generation", model="google/flan-t5-base")
context = "\n".join(top_chunks)
prompt = f"answer the question based on the context. context: {context} question: {query}"
# Use max_new_tokens instead of max_length: the pipeline warned that both were set
# and that max_new_tokens takes precedence, so max_length had no effect.
result = generator(prompt, max_new_tokens=200)
print("Generated Answer:", result[0]['generated_text'])

Device set to use cpu


Generated Answer: Generative AI is used in content creation, chatbots, design automation, drug discovery, and personalized learning tools.
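The retrieval and generation cells can be folded into one reusable function so that every new question goes through the same retrieve-then-generate flow. The following is a minimal sketch under stated assumptions: `build_prompt`, `answer_question`, `fake_retrieve` and `fake_generate` are hypothetical names, and the two `fake_*` functions are stand-ins so the example runs without downloading a model.

```python
from typing import Callable, List


def build_prompt(context_chunks: List[str], question: str) -> str:
    """Join the retrieved chunks and wrap them in the notebook's prompt template."""
    context = "\n".join(context_chunks)
    return (
        "answer the question based on the context. "
        f"context: {context} question: {question}"
    )


def answer_question(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    """Retrieve the top-k chunks for the question, then generate an answer from them."""
    top_chunks = retrieve(question, k)
    return generate(build_prompt(top_chunks, question))


# Stand-ins (assumptions) so the sketch runs without any model or vector store
def fake_retrieve(question: str, k: int) -> List[str]:
    corpus = ["Q: What is ChromaDB?\nA: ChromaDB is an open-source vector database."]
    return corpus[:k]


def fake_generate(prompt: str) -> str:
    return f"(model output for a prompt of {len(prompt)} characters)"


print(answer_question("What is ChromaDB?", fake_retrieve, fake_generate))
```

In the notebook, `retrieve` could wrap `vectorDB.similarity_search` and return each document's `page_content`, while `generate` could wrap the `generator` pipeline and return `result[0]['generated_text']`.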


In [53]:


# NOTE: `demo` (presumably a Gradio interface) is defined in a cell missing from this export.
demo.launch(share=True)


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://e327480d54c5d6cd85.gradio.live



