

> # 🤖 **Building a QA Bot Leveraging LangChain and LLM for Intelligent Document Querying**
### 🧠 *Prepared by: Abdessamad Bourkibate*  
### 🎓 *IBM Certified in AI Engineering with Python, PyTorch & TensorFlow*  

---

## 🪶 **Project Description**

This project develops an intelligent, interactive **Question-Answering (QA) bot** that applies **Retrieval-Augmented Generation (RAG)** techniques within the **LangChain** framework.

The system takes research documents (PDFs) and is designed to automatically:
- Segment their contents into overlapping text chunks sized for retrieval.  
- Generate advanced **vector embeddings** to enable precise information retrieval.  
- Store all processed data in a **vector database (ChromaDB)** for efficient, context-aware searching.

When a user submits a question, the bot retrieves the most relevant text segments and formulates a coherent, source-grounded answer using a **Large Language Model (LLM)**.  
Grounding the answer in retrieved passages, rather than in the model's memory alone, helps keep responses **accurate** and **faithful** to the original document content.
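The retrieve-then-generate step described above can be sketched without any framework. This is a minimal illustration under stated assumptions, not the notebook's code (the notebook's Gradio app returns retrieved text without calling an LLM). `build_grounded_prompt` and `answer_with_llm` are hypothetical names, and `llm` stands for any function mapping a prompt string to a completion, for example a wrapper around an OpenAI or Hugging Face model.

```python
from typing import Callable, Sequence


def build_grounded_prompt(question: str, passages: Sequence[str]) -> str:
    """Assemble a prompt asking the model to answer only from the numbered passages."""
    # Number each passage so the answer can cite it, e.g. "[1]"
    context = "\n\n".join(f"[{i}] {p.strip()}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the numbered passages below. "
        "If they do not contain the answer, say that you don't know.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_with_llm(question: str, passages: Sequence[str], llm: Callable[[str], str]) -> str:
    """Generation step: skip the model call entirely when retrieval returned nothing."""
    if not passages:
        return "No answer found."
    return llm(build_grounded_prompt(question, passages))


# Usage with a stub standing in for a real (hypothetical) model call
def stub_llm(prompt: str) -> str:
    return "LoRA trains small low-rank matrices while the base weights stay frozen [1]."


passages = ["LoRA freezes the original model weights and only trains small rank decomposition matrices."]
print(answer_with_llm("What is LoRA adaptation in large language models?", passages, stub_llm))
```

Keeping `llm` as a plain callable means the prompt logic can be tested without network access and the model backend can be swapped freely.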

---

## 🎯 **Objectives**
- Integrate **AI engineering** principles with **retrieval-based intelligence** for document comprehension.  
- Build a **modular pipeline** combining ingestion, embedding, and LLM-driven generation.  
- Provide researchers with a **practical tool** for efficient information extraction and validation.  
- Demonstrate the synergy between **LangChain**, **RAG**, and **LLM architectures** in real-world scenarios.

---

## ⚙️ **Core Technologies**
- **Python 3.10+**  
- **LangChain Framework**  
- **PyTorch / TensorFlow**  
- **ChromaDB (Vector Storage)**  
- **OpenAI or Hugging Face LLMs**

---

## 🚀 **Impact**
This solution empowers **researchers and data professionals** to rapidly extract verified insights from extensive text corpora.  
By bridging the gap between raw information and structured reasoning, it enhances **academic productivity** and **information quality assurance** in the digital research era.

---

> 🧭 *“Transforming raw documents into intelligent dialogue, where AI becomes the researcher’s trusted companion.”*




In [None]:
!pip install -U langchain langchain-community langchain-openai langchain-text-splitters chromadb gradio pypdf








In [None]:
!pip show langchain
!pip show langchain-community


Name: langchain
Version: 1.0.3
Summary: Building applications with LLMs through composability
Home-page: https://docs.langchain.com/
Author: 
Author-email: 
License: MIT
Location: /usr/local/lib/python3.12/dist-packages
Requires: langchain-core, langgraph, pydantic
Required-by: 
Name: langchain-community
Version: 0.4.1
Summary: Community contributed LangChain integrations.
Home-page: 
Author: 
Author-email: 
License: MIT
Location: /usr/local/lib/python3.12/dist-packages
Requires: aiohttp, dataclasses-json, httpx-sse, langchain-classic, langchain-core, langsmith, numpy, pydantic-settings, PyYAML, requests, SQLAlchemy, tenacity
Required-by: 
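The `pip show` check above can also be done from Python, which is handy for recording versions at the top of a notebook. A small sketch using the standard-library `importlib.metadata`; the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version string, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions(["langchain", "langchain-community", "chromadb"]))
```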


In [None]:
!wget "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/WgM1DaUn2SYPcCg_It57tA/A-Comprehensive-Review-of-Low-Rank-Adaptation-in-Large-Language-Models-for-Efficient-Parameter-Tuning-1.pdf" -O paper.pdf


--2025-11-02 02:06:07--  https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/WgM1DaUn2SYPcCg_It57tA/A-Comprehensive-Review-of-Low-Rank-Adaptation-in-Large-Language-Models-for-Efficient-Parameter-Tuning-1.pdf
Resolving cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)... 198.23.119.245
Connecting to cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)|198.23.119.245|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 353955 (346K) [application/pdf]
Saving to: ‘paper.pdf’


2025-11-02 02:06:07 (4.86 MB/s) - ‘paper.pdf’ saved [353955/353955]
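Before parsing, it can be worth confirming that the download really is a PDF, since a failed request can leave an HTML error page under the same name. PDF files normally begin with the bytes `%PDF-`; the sketch below checks for that signature. `has_pdf_magic` and `looks_like_pdf` are hypothetical helpers.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # standard PDF header signature


def has_pdf_magic(header: bytes) -> bool:
    """True if the given leading bytes look like the start of a PDF."""
    return header.startswith(PDF_MAGIC)


def looks_like_pdf(path) -> bool:
    """True if `path` is an existing file whose first bytes are the PDF signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return has_pdf_magic(f.read(len(PDF_MAGIC)))


print(looks_like_pdf("paper.pdf"))  # True is expected right after a successful download
```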



In [None]:
from langchain_community.document_loaders import PyPDFLoader

pdf_path = "paper.pdf"
loader = PyPDFLoader(pdf_path)
# load() returns one Document per PDF page (page text plus source/page metadata)
documents = loader.load()


In [None]:
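from langchain_text_splitters import CharacterTextSplitter

# Merge "\n\n"-separated pieces (the default separator) into chunks targeting
# 1000 characters, with up to 200 characters of overlap between neighbors
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(documents)
print(len(docs))                   # number of chunks
print(docs[0].page_content[:500])  # preview of the first chunk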


11
A Comprehensive Review of Low-Rank
Adaptation in Large Language Models for
Efficient Parameter Tuning
September 10, 2024
Abstract
Natural Language Processing (NLP) often involves pre-training large
models on extensive datasets and then adapting them for specific tasks
through fine-tuning. However, as these models grow larger, like GPT-3
with 175 billion parameters, fully fine-tuning them becomes computa-
tionally expensive. We propose a novel method called LoRA (Low-Rank
Adaptation) that signifi
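To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window chunker. It is an illustration only: LangChain's `CharacterTextSplitter` first splits on a separator (`"\n\n"` by default) and then merges pieces, so its chunk boundaries differ, and a chunk can exceed `chunk_size` when a single piece is longer. `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into windows of chunk_size characters; neighbors share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap repeats the tail of each chunk at the head of the next, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.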


In [None]:
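!pip install sentence-transformers langchain-huggingface

from langchain_huggingface import HuggingFaceEmbeddings

# Name the model explicitly (it is also HuggingFaceEmbeddings' default); it yields 768-dim vectors
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
# Smoke test: embed the first 5 chunks before indexing everything
embedded_docs = embeddings.embed_documents([doc.page_content for doc in docs[:5]])
print("Embeddings created")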


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.

Embeddings created
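`embed_documents` returns one fixed-length list of floats per input text (768 numbers per chunk for `sentence-transformers/all-mpnet-base-v2`, the default model of `HuggingFaceEmbeddings`). The toy sketch below mimics only that interface with a hashed bag-of-words vector: it captures word overlap but none of the meaning a sentence-transformers model learns. `hash_embed`, `embed_documents` here, and `dim=64` are illustrative assumptions, not LangChain code.

```python
import hashlib
import math
import re


def hash_embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: count hashed lowercase word tokens into `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # md5 gives a stable bucket across runs (Python's built-in hash() is salted per process)
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def embed_documents(texts: list[str], dim: int = 64) -> list[list[float]]:
    """Same output shape as an embed_documents call: one vector per input text."""
    return [hash_embed(t, dim) for t in texts]


vectors = embed_documents(["LoRA freezes the original weights.", "Chroma stores vectors."])
print(len(vectors), len(vectors[0]))  # → 2 64
```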


In [None]:
from langchain_community.vectorstores import Chroma

# Embed every chunk and index it in an in-memory (non-persistent) Chroma collection
db = Chroma.from_documents(docs, embeddings)
print("Vector database created")


Vector database created
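`Chroma.from_documents` embeds every chunk and indexes the vectors; `similarity_search` then embeds the query and returns the nearest chunks. The sketch below shows that idea as brute-force cosine similarity over an in-memory list. It is not Chroma's implementation (Chroma uses an approximate nearest-neighbor HNSW index and its own distance settings); `ToyVectorStore` and `keyword_embed` are hypothetical, and `embed` can be any function from text to a vector.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ToyVectorStore:
    """Brute-force store: keeps (text, vector) pairs and ranks them by cosine similarity."""

    def __init__(self, embed: Callable[[str], Sequence[float]]):
        self.embed = embed
        self.items: list[tuple[str, Sequence[float]]] = []

    def add_texts(self, texts: Sequence[str]) -> None:
        for text in texts:
            self.items.append((text, self.embed(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        """Return the k (text, similarity) pairs closest to the query, highest first."""
        q = self.embed(query)
        scored = [(text, cosine(q, vec)) for text, vec in self.items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Usage with a deliberately tiny "embedding": counts of three keywords
def keyword_embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in ("lora", "chroma", "gradio")]


store = ToyVectorStore(keyword_embed)
store.add_texts(["lora trains low-rank matrices", "chroma stores vectors", "gradio builds web demos"])
print(store.search("what is lora", k=1))  # → [('lora trains low-rank matrices', 1.0)]
```

Brute force costs one similarity per stored chunk per query, which is fine for the handful of chunks in one paper; an index like Chroma's matters once the corpus grows.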


In [None]:
query = "What is LoRA adaptation in large language models?"

# Return the 3 chunks whose embeddings are closest to the query's embedding
results = db.similarity_search(query, k=3)
for i, res in enumerate(results):
    print(f"Result {i+1}:\n{res.page_content}\n")


Result 1:
A Comprehensive Review of Low-Rank
Adaptation in Large Language Models for
Efficient Parameter Tuning
September 10, 2024
Abstract
Natural Language Processing (NLP) often involves pre-training large
models on extensive datasets and then adapting them for specific tasks
through fine-tuning. However, as these models grow larger, like GPT-3
with 175 billion parameters, fully fine-tuning them becomes computa-
tionally expensive. We propose a novel method called LoRA (Low-Rank
Adaptation) that significantly reduces the overhead by freezing the orig-
inal model weights and only training small rank decomposition matrices.
This leads to up to 10,000 times fewer trainable parameters and reduces
GPU memory usage by three times. LoRA not only maintains but some-
times surpasses fine-tuning performance on models like RoBERTa, De-
BERTa, GPT-2, and GPT-3. Unlike other methods, LoRA introduces
no extra latency during inference, making it more efficient for practical
applications. All releva
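Because answers should be source-grounded, it helps to show where each retrieved passage came from. `PyPDFLoader` records the file in `metadata["source"]` and a zero-based page index in `metadata["page"]`, and the text splitter copies that metadata onto each chunk. The sketch below formats results given as `(text, metadata)` pairs; with the LangChain objects you would pass `(res.page_content, res.metadata)`. `format_with_citations` is a hypothetical helper.

```python
from typing import Any, Mapping, Sequence


def format_with_citations(results: Sequence[tuple[str, Mapping[str, Any]]], preview: int = 200) -> str:
    """Render each retrieved passage with a 1-based page citation taken from its metadata."""
    lines = []
    for i, (text, meta) in enumerate(results, start=1):
        page = meta.get("page")
        # The loader's page index is 0-based; readers expect "p. 1" for the first page
        where = f"p. {page + 1}" if isinstance(page, int) else "page unknown"
        source = meta.get("source", "unknown source")
        snippet = " ".join(text.split())[:preview]  # collapse PDF line breaks, then truncate
        lines.append(f"[{i}] {source}, {where}: {snippet}")
    return "\n".join(lines)


print(format_with_citations([
    ("Natural Language Processing (NLP) often involves pre-training large\nmodels",
     {"source": "paper.pdf", "page": 0}),
]))
# → [1] paper.pdf, p. 1: Natural Language Processing (NLP) often involves pre-training large models
```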

In [None]:
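import gradio as gr

def answer_query(query, k=3):
    """Return the k chunks most similar to the query (retrieval only; no LLM is called)."""
    results = db.similarity_search(query, k=k)
    if not results:
        return "No answer found."
    # Show every retrieved passage, not just the first, separated by a divider
    return "\n\n---\n\n".join(res.page_content for res in results)

iface = gr.Interface(
    fn=answer_query,
    inputs="text",
    outputs="text",
    title="LangChain QA Bot"
)

iface.launch()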


It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://3dc1c154a157dc0aed.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


