Context-aware Q&A over a local medical knowledge base using FAISS vector search + HuggingFace or Groq-hosted LLMs.
This project implements a Retrieval-Augmented Generation (RAG) pipeline that lets you ask medical questions grounded in a curated PDF knowledge base (e.g. an encyclopedia of medicine). Instead of hallucinating, the LLM is constrained by retrieved passages from a FAISS vector store built from your documents.
Two primary entry points:
- `connect_memory_with_llm.py` – CLI prototype using a HuggingFace Inference endpoint (e.g. Mistral 7B Instruct).
- `medibot.py` – Streamlit chat UI using a Groq-hosted model (Llama 4 Maverick) with retrieval.
- FAISS vector store for fast semantic retrieval
- SentenceTransformer embeddings (`all-MiniLM-L6-v2`) – switchable to remote API mode
- Modular prompt template injection
- Groq or HuggingFace LLM backends
- Source document traceability (shows which chunks supported the answer)
- Caching of vector store + embeddings via Streamlit resource cache (see the sketch after this list)
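A minimal sketch of the caching pattern, assuming Streamlit's `st.cache_resource` decorator; the real `get_vectorstore()` in `medibot.py` may differ in detail:

```python
import streamlit as st
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

@st.cache_resource  # built once per server process, reused across reruns
def get_vectorstore():
    emb = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    return FAISS.load_local(
        "vectorstore/db_faiss", emb, allow_dangerous_deserialization=True
    )
```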
```
PDF(s) --> Text Splitter --> Embeddings --> FAISS Index (vectorstore/db_faiss)
                                                │
User Query --> Retriever (top-k) ---------------┘
                  │
           Prompt Assembly
                  │
     LLM Generation (HF or Groq)
                  │
       Answer + Source Chunks
```
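As a rough sketch of how these stages map onto LangChain components (the `repo_id`, `k`, and prompt text below are illustrative; the actual scripts may wire things slightly differently):

```python
import os

from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import HuggingFaceEmbeddings, HuggingFaceEndpoint
from langchain_community.vectorstores import FAISS

# Load the persisted index and expose it as a top-k retriever.
emb = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.load_local("vectorstore/db_faiss", emb, allow_dangerous_deserialization=True)
retriever = db.as_retriever(search_kwargs={"k": 3})

# Any LangChain LLM works here; an HF endpoint mirrors connect_memory_with_llm.py.
llm = HuggingFaceEndpoint(
    repo_id="mistralai/Mistral-7B-Instruct-v0.3",
    temperature=0.5,
    max_new_tokens=512,
    huggingfacehub_api_token=os.environ["HF_TOKEN"],
)

prompt = PromptTemplate(
    template="Answer using only this context.\n\nContext: {context}\n\nQuestion: {question}\n",
    input_variables=["context", "question"],
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",            # retrieved chunks are stuffed into the prompt
    retriever=retriever,
    return_source_documents=True,  # surfaces the chunks behind each answer
    chain_type_kwargs={"prompt": prompt},
)

result = qa.invoke({"query": "How is hypertension managed?"})
print(result["result"])
print(result["source_documents"])
```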
| File | Role |
|---|---|
| `create_memory_for_llm.py` | Builds FAISS index from PDFs (embedding + persist) |
| `connect_memory_with_llm.py` | CLI RAG query using HuggingFaceEndpoint |
| `medibot.py` | Streamlit chat interface using Groq Chat model + FAISS retrieval |
| `vectorstore/db_faiss` | Persisted FAISS index (created beforehand) |
| `data/` | PDF source documents |
Create a `.env` file (or export in shell):

```
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxx
GROQ_API_KEY=groq_xxxxxxxxxxxxxxxxxxx
```

Optional (future expansion):

```
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
HUGGINGFACE_REPO_ID=mistralai/Mistral-7B-Instruct-v0.3
```
Use the provided `requirements.txt` or `pyproject.toml`.

```
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

If using uv:

```
uv sync
```

```
export HF_TOKEN=hf_...yourtoken...
export GROQ_API_KEY=groq_...yourtoken...
```

Or create a `.env` file and rely on dotenv where enabled.
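Where dotenv loading is enabled, it amounts to roughly this (a minimal sketch using `python-dotenv`):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # picks up .env from the working directory, if present
hf_token = os.environ.get("HF_TOKEN")
groq_api_key = os.environ.get("GROQ_API_KEY")
```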
If you have not yet created `vectorstore/db_faiss`, run the memory creation script (adjust name if different):

```
python create_memory_for_llm.py
```

This should:

- Load PDFs from `data/`
- Chunk text
- Embed chunks using `HuggingFaceEmbeddings`
- Persist FAISS index under `vectorstore/db_faiss`
If the file does not yet exist, implement a pipeline similar to:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Load the source PDF and split it into overlapping chunks for retrieval.
loader = PyPDFLoader("data/The_GALE_ENCYCLOPEDIA_of_MEDICINE_SECOND.pdf")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.split_documents(docs)

# Embed the chunks and persist the FAISS index to disk.
emb = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.from_documents(chunks, emb)
db.save_local("vectorstore/db_faiss")
```

Run the CLI prototype:

```
source .venv/bin/activate
export HF_TOKEN=...   # if not in .env
python connect_memory_with_llm.py
```

Enter a query at the prompt: How is hypertension managed?
```
source .venv/bin/activate
export GROQ_API_KEY=groq_...   # if not in .env
streamlit run medibot.py
```

Open the URL shown (default: http://localhost:8501) and start chatting.
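Under the hood, the Groq chat model is constructed with `langchain_groq` roughly like this (a sketch; the exact model id used in `medibot.py` may differ):

```python
from langchain_groq import ChatGroq

# Reads GROQ_API_KEY from the environment / .env.
# The model id below is an assumption for "Llama 4 Maverick" on Groq – check medibot.py.
llm = ChatGroq(
    model="meta-llama/llama-4-maverick-17b-128e-instruct",
    temperature=0.0,
    max_tokens=512,
)
```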
`medibot.py` includes:

```python
get_vectorstore()               # local model download
get_vectorstore_hf_api(token)   # uses HuggingFace API
```

To switch:

```python
vectorstore = get_vectorstore_hf_api(os.environ["HF_TOKEN"])  # replace get_vectorstore()
```

Use this if your environment (e.g. limited disk) should call the HF Inference API instead of hosting the embedding model locally.
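If you need to adapt or reimplement the API-backed variant, `HuggingFaceInferenceAPIEmbeddings` can stand in for the local model. This is a hypothetical sketch; the existing `get_vectorstore_hf_api` may be implemented differently:

```python
import os
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain_community.vectorstores import FAISS

# Query-time embeddings are computed by the HF Inference API (no local model download).
# NOTE: the embedding model must match the one used to build the index.
emb = HuggingFaceInferenceAPIEmbeddings(
    api_key=os.environ["HF_TOKEN"],
    model_name="sentence-transformers/all-MiniLM-L6-v2",
)
vectorstore = FAISS.load_local(
    "vectorstore/db_faiss", emb, allow_dangerous_deserialization=True
)
```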
Modify `CUSTOM_PROMPT_TEMPLATE` in either script to adjust answer tone or style. Ensure the variables `{context}` and `{question}` remain.
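For example (an illustrative template, not the exact one shipped in the scripts):

```python
from langchain_core.prompts import PromptTemplate

CUSTOM_PROMPT_TEMPLATE = """
Use only the information in the context to answer the user's question.
If the answer is not in the context, say you don't know; do not make one up.

Context: {context}
Question: {question}

Answer directly, without small talk.
"""

prompt = PromptTemplate(
    template=CUSTOM_PROMPT_TEMPLATE,
    input_variables=["context", "question"],  # both placeholders must stay
)
```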
After building the vector store:
```
python - <<'PY'
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
emb = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
db = FAISS.load_local('vectorstore/db_faiss', emb, allow_dangerous_deserialization=True)
print('Index loaded. k=2 sample:\n', db.similarity_search('What is diabetes?', k=2))
PY
```

Both scripts request `return_source_documents=True`. The final output enumerates the raw Document objects; you can pretty-print them by iterating and showing `doc.metadata` plus a trimmed `doc.page_content`.
Example enhancement snippet:

```python
for i, d in enumerate(source_documents, 1):
    snippet = d.page_content[:300].replace('\n', ' ')
    print(f"[Source {i}] {snippet}...")
```

| Symptom | Cause | Fix |
|---|---|---|
| `InferenceClient.text_generation()` unexpected keyword `'token'` | Passing token in `model_kwargs` | Use `huggingfacehub_api_token` arg (already fixed) |
| `FAISS.load_local ... file not found` | Vector store not built | Run memory creation script first |
| Empty / irrelevant answers | Too small `k` or chunk size mismatch | Adjust `search_kwargs={'k': 5}` or rebuild with better chunking |
| Hallucinations | LLM ignoring context | Tighten prompt, lower temperature, reduce max tokens |
| `HF_TOKEN` not set error | Missing env var | Export token or add to `.env` |
| Virtualenv mismatch warning | Old `VIRTUAL_ENV` exported | `deactivate` then `source .venv/bin/activate` |
- Add multi-PDF ingestion (glob over `data/*.pdf`) – a possible starting point is sketched below
- Enable streaming tokens in UI
- Add OpenAI / Anthropic backend abstraction
- Persist chat history with sources
- Add evaluation harness (e.g. RAGAS) for answer faithfulness
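A possible starting point for the multi-PDF item, assuming LangChain's `DirectoryLoader` (illustrative, not yet part of the repo):

```python
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader

# Load every PDF under data/ instead of a single hard-coded file.
loader = DirectoryLoader("data/", glob="*.pdf", loader_cls=PyPDFLoader)
docs = loader.load()
```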
This tool is for educational and reference purposes only. It does not provide medical advice, diagnosis, or treatment recommendations. Always consult a licensed healthcare professional for medical decisions.