In this notebook, we build a Retrieval Augmented Generation (RAG) system using Llama 3, LangChain, and ChromaDB. The goal is to enable question-answering over external documents (not part of the model’s training data) without fine-tuning the Large Language Model (LLM).
The system follows a two-step process:
- Retrieval – relevant document chunks are fetched from a vector database.
- Generation – the LLM uses the retrieved context to produce accurate answers.
This implementation uses a real-world dataset (EU AI Act 2023) to demonstrate how RAG improves factual correctness and reduces hallucinations.
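The two-step process above can be sketched with a toy in-memory retriever and a stub generator (pure Python, for illustration only; the real notebook replaces keyword overlap with embedding similarity and the stub with a Llama 3 call):

```python
def retrieve(query, chunks, k=1):
    """Rank chunks by naive keyword overlap with the query — a stand-in
    for embedding-based similarity search in a vector database."""
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

def generate(query, context):
    """Stand-in for the LLM call: a real system would prompt Llama 3
    with the retrieved context prepended to the question."""
    return f"Answer to '{query}' based on: {context[0]}"

chunks = [
    "The EU AI Act classifies AI systems by risk level.",
    "ChromaDB persists embeddings on disk.",
]
query = "What risk levels does the EU AI Act define?"
context = retrieve(query, chunks)
print(generate(query, context))
```

Even in this toy form, the shape is the same: retrieval narrows the corpus down to relevant chunks, and generation answers only from that context.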
- LLM (Large Language Model) – A deep learning model trained on large text corpora for NLP tasks
- Llama 3 – Open-source LLM by Meta, used here via HuggingFace Transformers
- LangChain – Framework for building LLM-powered applications with chaining and orchestration
- Vector Database – Stores high-dimensional embeddings for semantic search
- ChromaDB – Lightweight, persistent vector database used for document retrieval
- Embeddings – Numerical vector representations of text generated using Sentence Transformers
- RAG (Retrieval Augmented Generation) – Combines retrieval systems with LLMs for grounded responses
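Semantic search over embeddings boils down to comparing vectors, typically by cosine similarity. A minimal sketch with toy 3-dimensional vectors (real Sentence Transformers models produce 384- or 768-dimensional embeddings):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings": the first two vectors point in a similar direction,
# the third in a different one.
query   = [0.9, 0.1, 0.0]
chunk_a = [0.8, 0.2, 0.1]   # semantically close to the query
chunk_b = [0.0, 0.1, 0.9]   # unrelated

assert cosine_similarity(query, chunk_a) > cosine_similarity(query, chunk_b)
```

The vector database's job is to perform exactly this comparison efficiently across thousands of stored chunk embeddings and return the top matches.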
- Model: Llama 3
- Variant: 8B Chat (HuggingFace format)
- Framework: Transformers
- Optimization: Quantization using bitsandbytes for efficient inference
The model is integrated via a HuggingFace pipeline and used as the generator in the RAG system.
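A sketch of this setup, assuming `transformers`, `bitsandbytes`, and access to the gated `meta-llama/Meta-Llama-3-8B-Instruct` checkpoint on HuggingFace (the model id and generation settings are illustrative, not the notebook's exact values). Imports are kept inside the function so the sketch can be read without the GPU dependencies installed:

```python
def build_llama3_pipeline(model_id="meta-llama/Meta-Llama-3-8B-Instruct"):
    """Load Llama 3 with 4-bit bitsandbytes quantization and wrap it
    in a HuggingFace text-generation pipeline."""
    import torch
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              BitsAndBytesConfig, pipeline)

    # 4-bit NF4 quantization: fits the 8B model into far less GPU memory.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",
    )
    return pipeline("text-generation", model=model, tokenizer=tokenizer,
                    max_new_tokens=256)
```

The returned pipeline object is what LangChain later wraps as the generator in the RAG chain.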
The RAG pipeline in this notebook consists of:
- Document loading and chunking – the EU AI Act (2023) is loaded and split into smaller chunks for efficient retrieval
- Embeddings – Sentence Transformers / HuggingFace embeddings convert the text chunks into vector representations
- Vector store – embeddings are stored in ChromaDB with persistence enabled, allowing fast semantic similarity search
- Retrieval and generation – LangChain connects the retriever and the LLM; relevant chunks are retrieved and passed to Llama 3 for answer generation
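These stages can be wired together roughly as follows. This is a sketch against the LangChain/Chroma APIs (module paths vary across LangChain versions, and the chunk sizes, embedding model, and `k` here are illustrative defaults, not the notebook's exact configuration); imports sit inside the function so the sketch loads without the libraries installed:

```python
def build_rag_chain(llm, docs, persist_dir="chroma_db"):
    """Chunk documents, embed them, index them in ChromaDB, and wire
    the retriever and LLM into a RetrievalQA chain."""
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import Chroma
    from langchain.chains import RetrievalQA

    # 1. Split documents into overlapping chunks for retrieval.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000,
                                              chunk_overlap=100)
    chunks = splitter.split_documents(docs)

    # 2. Embed chunks with a Sentence Transformers model.
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2")

    # 3. Store the vectors in a persistent ChromaDB collection.
    vectordb = Chroma.from_documents(chunks, embeddings,
                                     persist_directory=persist_dir)

    # 4. Connect retriever and LLM into a question-answering chain.
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=vectordb.as_retriever(search_kwargs={"k": 4}),
        return_source_documents=True,
    )
```

Calling the returned chain with a question then retrieves the top-k chunks from ChromaDB and passes them to Llama 3 as grounding context.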
Large Language Models are powerful but limited to their training data; when asked about information they have never seen, they may hallucinate.
RAG solves this by integrating external knowledge sources:
- Retriever: Finds relevant document chunks using embeddings and vector similarity
- Generator: Produces answers grounded in retrieved context
This approach ensures:
- More accurate answers
- Reduced hallucination
- Up-to-date and domain-specific knowledge integration
In this notebook, we demonstrate how combining LangChain, ChromaDB, and Llama 3 creates an effective RAG system capable of answering questions about the EU AI Act with improved reliability. Future improvements can focus on better embeddings, chunking strategies, and advanced retrieval techniques.
Future Work ⚡✨
To further enhance the solution, we will focus on refining the RAG implementation. This will involve optimizing the document embeddings and exploring more advanced RAG architectures.