# 🔎 Hands-On: Retrieval-Augmented Generation (RAG) with LangChain + Chroma

**Last updated:** 2025-09-08 23:15

**Why this topic?**
- Bridges the gap between *playing with LLMs* and building **end-to-end applications**.
- Introduces **retrieval pipelines, embeddings, and vector databases**.
- **LangChain (or LlamaIndex)** orchestrates these components in a reproducible workflow.

**Agenda (45–60 min)**
1. Intro: Why RAG? Reducing hallucinations by grounding in data
2. Install & Setup
3. Load & Chunk Documents
4. Store/Retrieve with Chroma
5. Connect an LLM (Hugging Face or OpenAI)
6. Pipeline test (ask questions about the documents)
7. Mini-experiments: embedding swap, chunk size sensitivity
8. Log run settings for reproducibility
9. Wrap-up tasks

In [1]:
# Install dependencies
!pip -q install -U langchain langchain-community chromadb sentence-transformers pypdf transformers accelerate
# Optional OpenAI
# %pip -q install -U openai tiktoken langchain-openai


In [2]:
import json
import platform
import sys

import chromadb
import sentence_transformers
import transformers
try:
    import torch
    torch_v = torch.__version__
    cuda_ok = torch.cuda.is_available()
    device_name = torch.cuda.get_device_name(0) if cuda_ok else "CPU"
except Exception:  # torch is missing or the CUDA query failed; fall back to CPU
    torch_v, cuda_ok, device_name = "N/A", False, "CPU"

env = {
    "python": sys.version,
    "platform": platform.platform(),
    "torch": torch_v,
    "cuda": cuda_ok,
    "device": device_name,
    "transformers": transformers.__version__,
    "sentence_transformers": sentence_transformers.__version__,
    "chromadb": chromadb.__version__
}
print(json.dumps(env, indent=2))
with open("env_rag.json", "w") as f:
    json.dump(env, f, indent=2)

{
  "python": "3.12.11 (main, Jun  4 2025, 08:56:18) [GCC 11.4.0]",
  "platform": "Linux-6.1.123+-x86_64-with-glibc2.35",
  "torch": "2.8.0+cu126",
  "cuda": false,
  "device": "CPU",
  "transformers": "4.56.1",
  "sentence_transformers": "5.1.0",
  "chromadb": "1.0.21"
}


In [3]:
sample_text = """
CS 5588 – Data Science Capstone: This course explores practical GenAI systems including LLMs,
retrieval-augmented generation (RAG), LangChain toolchains, vector databases, Stable Diffusion,
and parameter-efficient fine-tuning (LoRA). Students will develop a research-grade, reproducible
capstone prototype with clear milestones and evaluation.
"""
with open("sample.txt", "w", encoding="utf-8") as f:
    f.write(sample_text)
print("Created sample.txt")

Created sample.txt


In [4]:
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

docs = TextLoader("sample.txt", encoding="utf-8").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = splitter.split_documents(docs)
print("Chunks:", len(chunks))
print("First chunk:\n", chunks[0].page_content[:300])

Chunks: 1
First chunk:
 CS 5588 – Data Science Capstone: This course explores practical GenAI systems including LLMs,
retrieval-augmented generation (RAG), LangChain toolchains, vector databases, Stable Diffusion,
and parameter-efficient fine-tuning (LoRA). Students will develop a research-grade, reproducible
capstone prot
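
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a plain sliding window over characters. This is a simplified sketch, not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter also tries to break on separators such as paragraphs and sentences). The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
        start += step
    return chunks


demo = "abcdefghijklmnopqrstuvwxyz"
parts = sliding_window_chunks(demo, chunk_size=10, chunk_overlap=4)
print(parts)
```

A text shorter than `chunk_size` comes back as a single chunk, which is consistent with the `Chunks: 1` output above: the sample text is shorter than 500 characters.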


In [5]:
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

emb = SentenceTransformerEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectordb = Chroma.from_documents(chunks, emb, persist_directory="chroma_minilm")
retriever = vectordb.as_retriever(search_kwargs={"k": 4})
print("Chroma DB ready")

Chroma DB ready
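
Conceptually, the retriever embeds the question and returns the `k` stored chunks whose vectors are closest to it. The sketch below shows that idea with hand-written toy vectors and cosine similarity. It is an illustration only: the helper names are hypothetical, real embeddings have hundreds of dimensions, and the distance metric Chroma is configured with may differ from cosine.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k document vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy 3-dimensional "embeddings" (made up for illustration).
toy_docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_query = [1.0, 0.0, 0.0]
nearest = top_k(toy_query, toy_docs, k=2)
print(nearest)
```

With only one chunk in the store, as in this notebook, asking for `k=4` can return at most that one chunk, so retrieval quality only becomes visible with a larger corpus.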


In [6]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_community.llms import HuggingFacePipeline

MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # fallback: "distilgpt2"
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
pipe = pipeline("text-generation", model=model, tokenizer=tok, max_new_tokens=200)
llm = HuggingFacePipeline(pipeline=pipe)
print("LLM ready:", MODEL_ID)

Device set to use cpu


LLM ready: TinyLlama/TinyLlama-1.1B-Chat-v1.0


In [7]:
from langchain.chains import RetrievalQA
qa = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, chain_type="stuff")
q = "What does this course focus on?"
print("Q:", q)
print("A:", qa.invoke({"query": q})["result"])  # .run() is deprecated; .invoke() returns a dict

Q: What does this course focus on?


A: Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

CS 5588 – Data Science Capstone: This course explores practical GenAI systems including LLMs,
retrieval-augmented generation (RAG), LangChain toolchains, vector databases, Stable Diffusion,
and parameter-efficient fine-tuning (LoRA). Students will develop a research-grade, reproducible
capstone prototype with clear milestones and evaluation.

Question: What does this course focus on?
Helpful Answer: This course focuses on practical GenAI systems. Students will learn about LLMs, retrieval-augmented generation, vector databases, Stable Diffusion, and parameter-efficient fine-tuning. They will develop a research-grade, reproducible capstone prototype with clear milestones and evaluation.
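
The printed answer contains the whole prompt, not just the model's reply: the `stuff` chain pastes the retrieved chunks into one prompt, and the text-generation pipeline returns the prompt together with the continuation. The sketch below rebuilds a prompt of the same shape and strips everything up to the last `Helpful Answer:` marker. The template wording is copied from the output above rather than from LangChain's source, and both helper names are hypothetical.

```python
ANSWER_MARKER = "Helpful Answer:"


def build_stuff_prompt(context_chunks, question):
    """Concatenate all retrieved chunks into a single prompt ("stuffing")."""
    context = "\n\n".join(context_chunks)
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, just say that you don't know, "
        "don't try to make up an answer.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        f"{ANSWER_MARKER}"
    )


def extract_answer(generated_text):
    """Keep only the text after the last answer marker; return the input if no marker is found."""
    _, found, tail = generated_text.rpartition(ANSWER_MARKER)
    return tail.strip() if found else generated_text.strip()


prompt = build_stuff_prompt(["Chunk one.", "Chunk two."], "What is this?")
fake_generation = prompt + " It is a toy example."  # stand-in for real model output
print(extract_answer(fake_generation))
```

In the notebook this would be applied as `extract_answer(qa.invoke({"query": q})["result"])`. Passing `return_full_text=False` when constructing the `transformers` pipeline is another option worth trying, though it was not tested in this run.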


In [8]:
emb_e5 = SentenceTransformerEmbeddings(model_name="intfloat/e5-small-v2")
vectordb_e5 = Chroma.from_documents(chunks, emb_e5, persist_directory="chroma_e5")
qa_e5 = RetrievalQA.from_chain_type(llm=llm, retriever=vectordb_e5.as_retriever(), chain_type="stuff")
print("MiniLM vs E5-small test:\n")
question = "List two GenAI techniques emphasized."
print("MiniLM:", qa.invoke({"query": question})["result"])
print("E5-small:", qa_e5.invoke({"query": question})["result"])

MiniLM vs E5-small test:

MiniLM: Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

CS 5588 – Data Science Capstone: This course explores practical GenAI systems including LLMs,
retrieval-augmented generation (RAG), LangChain toolchains, vector databases, Stable Diffusion,
and parameter-efficient fine-tuning (LoRA). Students will develop a research-grade, reproducible
capstone prototype with clear milestones and evaluation.

Question: List two GenAI techniques emphasized.
Helpful Answer: LLMs (LangModels), retrieval-augmented generation (RAG), and LangChain toolchains (a.k.a. Neural Machine Translation) are two of the most critical GenAI techniques emphasized in this capstone.
E5-small: Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

CS 5588 – Data Science Cap
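
One caveat for this comparison: the E5 model card asks that inputs be prefixed with `query: ` or `passage: `, while the cell above passes raw text to both models. A minimal sketch of adding those prefixes is shown below. The helper name `with_e5_prefix` is hypothetical, and wiring it into the embedding wrapper is left out because that depends on the wrapper's API.

```python
def with_e5_prefix(texts, kind):
    """Prepend the E5-style role prefix ("query: " or "passage: ") to each text."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {text}" for text in texts]


prefixed_query = with_e5_prefix(["List two GenAI techniques emphasized."], "query")
prefixed_docs = with_e5_prefix(["This course explores practical GenAI systems."], "passage")
print(prefixed_query[0])
print(prefixed_docs[0])
```

Without the prefixes the MiniLM versus E5 comparison may understate E5's retrieval quality, so treat the result above as a smoke test rather than a benchmark.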

In [9]:
splitter_small = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks_small = splitter_small.split_documents(docs)
vectordb_small = Chroma.from_documents(chunks_small, emb)
qa_small = RetrievalQA.from_chain_type(llm=llm, retriever=vectordb_small.as_retriever(), chain_type="stuff")
question = "Summarize the course in one sentence."
print("Default chunks:", qa.invoke({"query": question})["result"])
print("Smaller chunks:", qa_small.invoke({"query": question})["result"])

Default chunks: Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

CS 5588 – Data Science Capstone: This course explores practical GenAI systems including LLMs,
retrieval-augmented generation (RAG), LangChain toolchains, vector databases, Stable Diffusion,
and parameter-efficient fine-tuning (LoRA). Students will develop a research-grade, reproducible
capstone prototype with clear milestones and evaluation.

Question: Summarize the course in one sentence.
Helpful Answer: "This course explores practical GenAI systems, including LLMs, retrieval-augmented generation, LangChain toolchains, vector databases, Stable Diffusion, and parameter-efficient fine-tuning."
Smaller chunks: Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

CS 5588 – Data Science Capstone: This co

In [10]:
repro = {
    "embedding_models": ["sentence-transformers/all-MiniLM-L6-v2", "intfloat/e5-small-v2"],  # full Hub IDs, as loaded above
    "chunking": [{"size":500,"overlap":100},{"size":300,"overlap":50}],
    "llm": MODEL_ID
}
with open("rag_run_config.json", "w") as f:
    json.dump(repro, f, indent=2)
print("Saved rag_run_config.json")

Saved rag_run_config.json
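
The saved config records models and chunking settings but not which documents were indexed. One possible extension, sketched below, is to store a hash of the corpus alongside the settings so a later run can confirm it used the same input. The field names `corpus_sha256` and `corpus_chars` and both helper names are assumptions, not part of the original notebook.

```python
import hashlib
import json


def sha256_of_text(text):
    """Hex SHA-256 digest of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_run_record(config, corpus_text):
    """Return a copy of config extended with a fingerprint of the indexed text."""
    record = dict(config)  # shallow copy so the caller's dict is not modified
    record["corpus_sha256"] = sha256_of_text(corpus_text)
    record["corpus_chars"] = len(corpus_text)
    return record


example_config = {
    "llm": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "chunking": [{"size": 500, "overlap": 100}],
}
record = build_run_record(example_config, "hello")
print(json.dumps(record, indent=2, sort_keys=True))
```

In the notebook, `build_run_record(repro, sample_text)` would produce the dictionary to write to `rag_run_config.json`.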


In [11]:
# Reuse emb_e5 and the persisted "chroma_e5" store from In [8]; calling
# Chroma.from_documents again would insert the same chunks a second time.
db_e5 = Chroma(persist_directory="chroma_e5", embedding_function=emb_e5)
qa_e5 = RetrievalQA.from_chain_type(llm=llm, retriever=db_e5.as_retriever(), chain_type="stuff")

print("MiniLM:", qa.invoke({"query":"List two GenAI techniques emphasized."}))
print("E5-small:", qa_e5.invoke({"query":"List two GenAI techniques emphasized."}))


MiniLM: {'query': 'List two GenAI techniques emphasized.', 'result': "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nCS 5588 – Data Science Capstone: This course explores practical GenAI systems including LLMs,\nretrieval-augmented generation (RAG), LangChain toolchains, vector databases, Stable Diffusion,\nand parameter-efficient fine-tuning (LoRA). Students will develop a research-grade, reproducible\ncapstone prototype with clear milestones and evaluation.\n\nQuestion: List two GenAI techniques emphasized.\nHelpful Answer: LLMs and RAG."}
E5-small: {'query': 'List two GenAI techniques emphasized.', 'result': "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nCS 5588 – Data Science Capstone: This course explores practical GenAI systems including LLMs,\n