# **RAG-Using-Groq-LangChain-HuggingFace-and-ChromaDB**

### RAG Pipeline Overview

This project implements a **Retrieval-Augmented Generation (RAG)** pipeline using **LangChain**, **open-source Hugging Face embeddings**, **Chroma vector database**, and a **Groq-hosted LLM** for high-speed inference. The pipeline starts by loading a raw text document and splitting it into overlapping chunks with a recursive text splitter, which prefers to break at paragraph, line, and word boundaries. Each chunk is then converted into a dense vector embedding using a Sentence-Transformers model and stored in a Chroma vector store for efficient similarity-based retrieval.

At query time, the user’s question is embedded and compared against the stored vectors to retrieve the most relevant document chunks. These retrieved chunks are provided as contextual input to the LLM, which generates an answer grounded in that retrieved content. The system also returns the source documents used for generation, so each answer can be checked against the original text. This architecture keeps retrieval fully open-source, decouples knowledge storage from generation, and enables scalable question-answering over custom text data.


# Introduction


## Objective

Build a **Retrieval-Augmented Generation (RAG)** based question–answering system using **LangChain**, **open-source Hugging Face embeddings**, **ChromaDB** as the vector database, and a **Groq-hosted LLM** for fast inference.  
The goal is to ask natural language questions over a custom text file (`biden-sotu-2023-planned-official.txt`) without fine-tuning any Large Language Model.

In a RAG setup, when a user asks a question, the system first retrieves the most relevant document chunks from a vector database and then uses an LLM to generate an answer grounded in those retrieved documents.

---

## Definitions

* **LLM (Large Language Model)** – A neural network trained on massive text corpora to understand and generate human-like language  
* **LangChain** – A framework for building applications powered by LLMs, especially retrieval-based pipelines  
* **Text Embeddings** – Dense vector representations of text capturing semantic meaning  
* **Vector Database** – A database optimized for storing and searching high-dimensional vectors  
* **ChromaDB** – An open-source vector database used for embedding storage and similarity search  
* **RAG (Retrieval-Augmented Generation)** – A technique that combines information retrieval with text generation  

---

## Model & Technology Stack

* **LLM Provider**: Groq  
* **LLM Model**: `llama-3.1-8b-instant`  
* **Embedding Model**: Hugging Face Sentence-Transformers (open source)  
* **Vector Store**: ChromaDB (local, persistent)  
* **Framework**: LangChain  
* **Environment**: Google Colab  

This setup ensures that **retrieval and storage remain fully open-source**, while Groq is used only for fast, cost-effective LLM inference.

---

## What is a Retrieval-Augmented Generation (RAG) System?

Large Language Models are highly capable when answering questions based on knowledge seen during training. However, they can hallucinate or produce incorrect answers when queried about private, recent, or domain-specific data they were never trained on.

A **RAG system** addresses this limitation by augmenting the LLM with an external knowledge source.

A RAG pipeline has two core components:

### 1. Retriever  
The retriever converts documents into embeddings and stores them in a vector database.  
At query time, the user’s question is embedded and compared against stored vectors to retrieve the most relevant document chunks.

In this project:
- Text is split into chunks using a recursive text splitter  
- Embeddings are generated using an open-source Hugging Face model  
- ChromaDB is used for similarity-based retrieval  
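
The retrieval step above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the three-dimensional vectors and the names `toy_store` and `top_k` are hypothetical, and cosine similarity is used for clarity, whereas the metric ChromaDB actually applies depends on how the collection is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=3):
    """Return the k (text, score) pairs most similar to the query vector."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical 3-dimensional "embeddings"; real models produce hundreds of dimensions.
toy_store = [
    ("chunk about the economy", [0.9, 0.1, 0.0]),
    ("chunk about health care", [0.1, 0.9, 0.1]),
    ("chunk about infrastructure", [0.7, 0.2, 0.3]),
]
query_vector = [1.0, 0.0, 0.1]  # stands in for the embedded user question

hits = top_k(query_vector, toy_store, k=2)
for text, score in hits:
    print(f"{score:.3f}  {text}")
```

In the real pipeline the vector store performs this ranking internally; the sketch only shows why chunks whose vectors point in a similar direction to the query are returned first.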

### 2. Generator  
The generator is an LLM that takes:
- The user query  
- The retrieved document chunks  

and produces a grounded, context-aware answer.

Here, a **Groq-hosted LLaMA 3.1 Instant model** is used for generation, ensuring fast responses while relying on retrieved content for factual accuracy.
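
As a rough sketch of how the query and the retrieved chunks reach the generator, the function below concatenates all chunks into one prompt. The function name `build_stuff_prompt` and the prompt wording are assumptions made for illustration; they are not the template LangChain uses internally.

```python
def build_stuff_prompt(question, chunks):
    """Concatenate all retrieved chunks into one context block ahead of the question."""
    context = "\n\n".join(chunks)
    return (
        "Use only the context below to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical retrieved chunks and question
retrieved = ["First retrieved chunk.", "Second retrieved chunk."]
prompt = build_stuff_prompt("What was announced?", retrieved)
print(prompt)
```

Because every chunk is placed in a single prompt, this approach only works while the combined chunks fit within the model's context window.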

LangChain orchestrates this entire flow, allowing retrieval and generation to be combined into a single, clean pipeline.

---

## Why This Architecture?

* Reduces LLM hallucination by grounding answers in source documents  
* No fine-tuning required  
* Works with private or custom data  
* Modular and extensible  
* Open-source embeddings and vector storage  
* Follows a RAG pattern commonly used in real-world systems  

---

## More About This

This notebook demonstrates a practical, end-to-end RAG workflow suitable for research, enterprise QA bots, and document intelligence systems.  
You can extend this pipeline with:
- Multiple documents  
- Metadata-based filtering  
- MLflow/DagsHub experiment tracking  
- Evaluation and feedback loops  

Refer to the **References** section for deeper reading on LangChain, ChromaDB, embeddings, and RAG system design.


## Project Pipeline Overview

This section outlines the complete project pipeline, broken down step by step, as implemented in the notebook to build a Retrieval-Augmented Generation (RAG) system.

### Introduction and Objective  
The project aims to build a **RAG-based question–answering system** using **LangChain**, **open-source Hugging Face embeddings**, **ChromaDB**, and a **Groq-hosted LLM** for fast inference.  
The system enables natural language querying over a custom text file (`biden-sotu-2023-planned-official.txt`) without fine-tuning any large language model.

### Environment Setup and Library Installation  
All required Python libraries are installed, including:
- `langchain`
- `langchain-community`
- `langchain-text-splitters`
- `chromadb`
- `sentence-transformers`
- `groq`
- `langchain-groq`  

These libraries collectively support document loading, text splitting, embeddings, vector storage, retrieval, and LLM-based generation.

### Imports & Environment Validation  
Necessary modules are imported, and the **Groq API key** is retrieved securely from Colab secrets (`GROQ_API_KEY`).  
The environment is validated to ensure the key is accessible and the Groq client can be initialized successfully.
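
Outside Colab the `google.colab` module is not importable, so a loader that falls back to an environment variable can be handy. The sketch below is one possible approach, not the notebook's method; the helper name `load_groq_key` is hypothetical.

```python
import os

def load_groq_key(name="GROQ_API_KEY"):
    """Return the API key from Colab secrets if available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Google Colab
    except ImportError:
        return os.environ.get(name)
    try:
        return userdata.get(name)
    except Exception:
        # Secret missing or notebook access not granted: fall back to the environment
        return os.environ.get(name)

groq_key = load_groq_key()
if groq_key is None:
    print("GROQ_API_KEY not found in Colab secrets or environment variables")
```

Returning `None` instead of raising lets the caller decide whether a missing key should stop the run.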

### Load TXT Knowledge Base  
The custom knowledge source (`biden-sotu-2023-planned-official.txt`) is loaded into memory.  
This document serves as the external knowledge base over which the RAG system will operate.

### Chunk the Document  
The loaded text is split into smaller, overlapping chunks using a **RecursiveCharacterTextSplitter**, which prefers to break at paragraph, line, and word boundaries.  
Chunking helps manage context length limits and improves the quality of embedding-based retrieval.
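
To make the idea of overlapping chunks concrete, here is a minimal fixed-window sketch. It is a deliberate simplification: unlike `RecursiveCharacterTextSplitter`, it ignores separators and cuts purely by character position. The function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text, chunk_size=500, chunk_overlap=100):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # new characters contributed by each chunk
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
```

Each piece begins with the last three characters of the previous one, which is how overlap preserves context that would otherwise be cut at a chunk boundary.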

### Create Embeddings  
A **Hugging Face embedding model** (e.g., `sentence-transformers/all-MiniLM-L6-v2`) is initialized to convert each text chunk into a dense vector representation that captures semantic meaning.

### Create Chroma Vector Database  
A **local, persistent ChromaDB vector store** is created using the text chunks and their embeddings.  
This vector database enables efficient similarity search during query time.

### Initialize Groq LLM  
The **ChatGroq** Large Language Model is initialized with:
- Model name (e.g., `llama-3.1-8b-instant`)
- Groq API key
- Temperature parameter for controlled response generation  

This LLM is responsible for generating final answers based on retrieved context.
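
The effect of the temperature parameter can be illustrated with the conventional temperature-scaled softmax. This is a generic sketch with made-up logits; the source does not describe how Groq implements sampling, so treat it as background rather than a description of the service.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
probs_default = softmax_with_temperature(toy_logits, 1.0)
probs_low = softmax_with_temperature(toy_logits, 0.2)

print([round(p, 3) for p in probs_default])
print([round(p, 3) for p in probs_low])
```

A lower temperature concentrates probability on the highest-scoring token, which is why a low value is a common choice for factual question answering.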

### Build the RAG Pipeline  
The complete RAG pipeline is constructed using **LangChain’s `RetrievalQA`** chain.  
This step combines:
- The **ChromaDB retriever** for fetching relevant document chunks  
- The **Groq LLM** for generating grounded, context-aware answers  

### Ask Questions  
Users can now interact with the system by asking natural language questions.  
The pipeline retrieves relevant document sections and generates answers, optionally returning the source documents used for generation to ensure transparency and trust.


# **Create the Environment and Install Libraries**

In [None]:
!pip install -q -U langchain langchain-community langchain-core

In [17]:
# List all installed packages that contain the word "langchain"
!pip list | grep langchain

langchain                                1.2.4
langchain-classic                        1.0.1
langchain-community                      0.4.1
langchain-core                           1.2.6
langchain-groq                           1.1.1
langchain-text-splitters                 1.1.0


In [2]:
# Install required Python packages for building a RAG (Retrieval-Augmented Generation) pipeline
# using LangChain with ChromaDB vector store and Groq inference
# The -U flag upgrades packages if already installed

!pip install -U \
langchain \
langchain-community \
langchain-text-splitters \
chromadb \
sentence-transformers \
groq


Collecting langchain
  Downloading langchain-1.2.4-py3-none-any.whl.metadata (4.9 kB)
Collecting langchain-community
  Downloading langchain_community-0.4.1-py3-none-any.whl.metadata (3.0 kB)
Collecting langchain-text-splitters
  Downloading langchain_text_splitters-1.1.0-py3-none-any.whl.metadata (2.7 kB)
Collecting chromadb
  Downloading chromadb-1.4.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.2 kB)
Collecting groq
  Downloading groq-1.0.0-py3-none-any.whl.metadata (16 kB)
Collecting langchain-classic<2.0.0,>=1.0.0 (from langchain-community)
  Downloading langchain_classic-1.0.1-py3-none-any.whl.metadata (4.2 kB)
Collecting requests<3.0.0,>=2.32.5 (from langchain-community)
  Downloading requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Collecting dataclasses-json<0.7.0,>=0.6.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting build>=1.0.3 (from chromadb)
  Downloading build-1.4.0-py3-none-any.whl.

In [1]:
!pip install -q langchain-groq

## **PHASE 1 — Imports & environment validation**

In [2]:
# Import required modules for building a RAG (Retrieval-Augmented Generation) pipeline
# using LangChain with HuggingFace embeddings, Chroma vector store, and Groq LLM

import os  # Standard library for environment variables and file paths

# Text splitting utilities for chunking documents into semantically coherent pieces
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Local embeddings using HuggingFace sentence-transformers models (offline-capable)
from langchain_community.embeddings import HuggingFaceEmbeddings

# ChromaDB vector database integration - persistent storage for document embeddings
from langchain_community.vectorstores import Chroma

# LangChain chat-model wrapper for Groq-hosted LLMs (used here with a Llama 3.1 model)
from langchain_groq import ChatGroq



## **Validate Groq API key:**

In [3]:
from google.colab import userdata
groq_key = userdata.get("GROQ_API_KEY")
assert groq_key is not None, "GROQ_API_KEY not found in Colab secrets"

In [6]:
llm = ChatGroq(
    model_name="llama-3.1-8b-instant",
    temperature=0.2,
    groq_api_key=groq_key
)

In [7]:
llm.invoke("Say hello in one sentence.")


AIMessage(content='Hello, how can I assist you today?', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 10, 'prompt_tokens': 41, 'total_tokens': 51, 'completion_time': 0.010869875, 'completion_tokens_details': None, 'prompt_time': 0.002017812, 'prompt_tokens_details': None, 'queue_time': 0.005650345, 'total_time': 0.012887687}, 'model_name': 'llama-3.1-8b-instant', 'system_fingerprint': 'fp_9ca2574dca', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None, 'model_provider': 'groq'}, id='lc_run--019bc0dd-ec33-71c2-954a-174c856fdfd2-0', tool_calls=[], invalid_tool_calls=[], usage_metadata={'input_tokens': 41, 'output_tokens': 10, 'total_tokens': 51})

## **PHASE 2 — Load your TXT knowledge base**

In [8]:
# Define path to the input text file containing Biden's 2023 State of the Union address
# /content/ is the default working directory in Google Colab; upload the file there first
file_path = "/content/biden-sotu-2023-planned-official.txt"

# Open file in read mode with UTF-8 encoding to handle special characters/quotes
# Context manager (with statement) ensures automatic file closure even if errors occur
with open(file_path, "r", encoding="utf-8") as f:
    raw_text = f.read()  # Read entire file content into string variable

# Display document statistics - character count helps assess text splitter chunk size needs
print("Document length (characters):", len(raw_text))


Document length (characters): 41661


## **PHASE 3 — Chunk the document**

In [9]:
# Initialize the text splitter for the RAG pipeline
# chunk_size=500: caps each chunk at 500 characters, short enough for typical embedding models
# chunk_overlap=100: up to 20% overlap maintains context across chunk boundaries
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,      # Maximum characters per chunk
    chunk_overlap=100    # Overlapping characters between consecutive chunks
)

# Split the raw SOTU document into text chunks
# Recursive splitter tries separators in order: paragraphs (\n\n) → lines (\n) → words (spaces) → characters
chunks = text_splitter.split_text(raw_text)

# Verify chunking results and preview formatting
print("Total chunks:", len(chunks))
print("Sample chunk:\n", chunks[0][:300])  # First 300 chars of first chunk


Total chunks: 105
Sample chunk:
 Mr. Speaker. Madam Vice President. Our First Lady and Second Gentleman. Members of Congress and the Cabinet. Leaders of our military. Mr. Chief Justice, Associate Justices, and retired Justices of the Supreme Court. And you, my fellow Americans. I start tonight by congratulating the members of the 1
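
As a sanity check on the chunk count, the arithmetic below estimates how many chunks to expect. It assumes every chunk is completely full and overlaps its predecessor by exactly `chunk_overlap` characters, which the separator-aware splitter does not guarantee, so the figure is an approximation rather than a bound.

```python
import math

doc_length = 41661   # characters, from the document-length output above
chunk_size = 500
chunk_overlap = 100

# Under the full-chunk assumption, each chunk after the first adds
# (chunk_size - chunk_overlap) new characters to the coverage.
new_chars_per_chunk = chunk_size - chunk_overlap
estimated_chunks = math.ceil((doc_length - chunk_overlap) / new_chars_per_chunk)

print("Estimated chunks:", estimated_chunks)
```

The estimate lands close to the 105 chunks reported by the splitter; the small difference is expected because real chunks end at separators and are often shorter than 500 characters.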


## **PHASE 4 — Create embeddings**

In [10]:
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)




## **PHASE 5 — Create Chroma vector database (local, open source)**

In [11]:
# Create persistent Chroma vector database from text chunks for RAG semantic search
# Automatically generates embeddings for each chunk using the HuggingFace model
vectordb = Chroma.from_texts(
    texts=chunks,                    # List of pre-split text chunks from SOTU document
    embedding=embedding_model,       # Pre-initialized HuggingFaceEmbeddings instance
    persist_directory="/content/chroma_db"  # Local directory for persistent storage
)

# Convert vector database into retriever component for LangChain chains
# k=3 retrieves top 3 most semantically similar chunks per query
retriever = vectordb.as_retriever(search_kwargs={"k": 3})


## **PHASE 6 — Initialize Groq LLM**

In [12]:
# Initialize the Groq LLM client used for answer generation in the RAG pipeline
# llama-3.1-8b-instant: Meta's 8B-parameter Llama 3.1 model, served by Groq for low-latency inference
llm = ChatGroq(
    model_name="llama-3.1-8b-instant",  # Small, fast model suited to interactive RAG responses
    groq_api_key=groq_key,              # API key loaded from Colab secrets in the validation step
    temperature=0.2                     # Low temperature for factual, consistent responses
)


## **PHASE 7 — Build the RAG pipeline (Retriever + LLM)**

In [13]:
# With LangChain 1.x, RetrievalQA is imported from langchain_classic
# (older versions used: from langchain.chains import RetrievalQA)
from langchain_classic.chains import RetrievalQA

# Create complete RAG pipeline combining retriever + LLM
# Automatically handles: embed query → retrieve chunks → generate answer
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,                           # Groq Llama 3.1 model for answer generation
    retriever=retriever,               # ChromaDB top-3 chunk retriever
    chain_type="stuff",                # Stuff all retrieved docs into single LLM prompt
    return_source_documents=True       # Include source chunks with every answer
)


## **PHASE 8 — Ask questions (RAG in action)**

In [14]:
query = "What were the main goals of the Biden administration mentioned in the speech?"

result = qa_chain.invoke(query)

print("Answer:\n", result["result"])


Answer:
 The main goals of the Biden administration mentioned in the speech are:

1. To restore the soul of the nation.
2. To rebuild the backbone of America, specifically the middle class.
3. To unite the country.

These goals are mentioned as part of the administration's vision for the country, and the speech suggests that the administration has been working to achieve these goals through bipartisan efforts in Congress.


In [15]:
print("\nSources used:")
for doc in result["source_documents"]:
    print("-", doc.page_content[:200], "...")



Sources used:
- work together in this new Congress. The people sent us a clear message. Fighting for the sake of fighting, power for the sake of power, conflict for the sake of conflict, gets us nowhere. And that’s a ...
- work together in this new Congress. The people sent us a clear message. Fighting for the sake of fighting, power for the sake of power, conflict for the sake of conflict, gets us nowhere. And that’s a ...
- America. When I came to office, most everyone assumed bipartisanship was impossible. But I never believed it. That’s why a year ago, I offered a Unity Agenda for the nation. We’ve made real progress.  ...
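
The first two sources printed above begin with the same text. One plausible cause, stated here as an assumption rather than something verified against this run, is that the cell calling `Chroma.from_texts` was executed more than once against the same `persist_directory`, so the same chunks were stored twice. A hedged sketch for filtering such repeats before display follows; `Doc` is a stand-in class and `unique_by_content` is a hypothetical helper.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """Stand-in for a retrieved document; only page_content is modeled."""
    page_content: str

def unique_by_content(docs):
    """Keep the first occurrence of each distinct page_content, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique.append(doc)
    return unique

docs = [Doc("chunk A"), Doc("chunk A"), Doc("chunk B")]
deduped = unique_by_content(docs)
print([d.page_content for d in deduped])
```

Filtering at display time hides the repeats but does not free the retrieval slot they occupied; deleting the persisted directory and rebuilding the store once would address the cause if the assumption holds.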


In [16]:
result = qa_chain.invoke({"query": query})
print(result["result"])


The main goals of the Biden administration mentioned in the speech are:

1. To restore the soul of the nation.
2. To rebuild the backbone of America, specifically the middle class.
3. To unite the country.

Additionally, the speech mentions specific actions taken by the administration, such as:

1. Making it easier for doctors to prescribe effective treatments for opioid addiction.
2. Passing a gun safety law making historic investments in mental health.
3. Launching ARPA-H to drive breakthroughs in the fight against cancer, Alzheimer's, diabetes, and other diseases.

These goals and actions are part of the administration's vision for the country and its efforts to "finish the job" that the people sent them to do.
