This project demonstrates a simple Retrieval-Augmented Generation (RAG) pipeline using Node.js, LangChain, OpenAI, and Qdrant. The goal is to index PDF documents, store their embeddings in a vector database (Qdrant), and perform semantic search and question answering over the indexed content.
- Project Structure
- Setup Instructions
- Environment Variables
- Code Explanation
- Docker Compose
- How It Works
- Troubleshooting
- License
rag/
├── .env
├── .gitignore
├── docker-compose.yml
├── indexing.js
├── query.js
├── package.json
├── package-lock.json
└── (PDF files to index)
- Clone the repository:
  git clone https://github.com/Samrat880/RAG-Assignment.git
  cd rag
- Install dependencies:
  npm install
  Install all required packages by running:
  npm install @langchain/community @langchain/core @langchain/openai pdf-parse qdrant-client dotenv
  Packages used:
  - @langchain/community – LangChain community integrations
  - @langchain/core – LangChain core utilities
  - @langchain/openai – OpenAI integration for LangChain
  - pdf-parse – PDF text extraction
  - qdrant-client – Qdrant vector database client
  - dotenv – Loads environment variables from the .env file
- Set up environment variables:
  Create a .env file in the rag directory.
- Start Qdrant (Vector Database) using Docker:
  docker-compose up -d
- Add your PDF files to the rag directory or specify their path in indexing.js.
- Index your documents:
  node indexing.js
- Query your documents:
  node query.js
Create a .env file in the rag directory with the following content:
OPENAI_API_KEY=your_openai_api_key_here
QDRANT_URL=http://localhost:6333
- OPENAI_API_KEY: Your OpenAI API key for embedding and language model calls.
- QDRANT_URL: URL for your local Qdrant instance (default: http://localhost:6333).
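The two variables above can be checked once at startup so that a missing key fails early with a clear message. The sketch below is only an illustration: `readConfig` is a hypothetical helper, not something the project is known to contain, and it assumes `QDRANT_URL` should fall back to the documented default.

```javascript
// Hypothetical helper (not part of the project): validate the values from .env.
// Pass a plain object for testing; by default it reads process.env.
function readConfig(env = process.env) {
  const openAIApiKey = (env.OPENAI_API_KEY || "").trim();
  if (!openAIApiKey) {
    throw new Error("OPENAI_API_KEY is missing; add it to the .env file");
  }
  // Fall back to the documented default when QDRANT_URL is not set.
  const qdrantUrl = (env.QDRANT_URL || "http://localhost:6333").trim();
  return { openAIApiKey, qdrantUrl };
}
```

In a script, this would typically run right after `require('dotenv').config()`, for example `const { qdrantUrl } = readConfig();`.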
The indexing.js script reads PDF files, splits them into chunks, generates embeddings using OpenAI, and stores them in Qdrant.
Key Steps:
- Load Environment Variables: Uses dotenv to load API keys and configuration.
- Read PDF Files: Uses pdf-parse to extract text from PDF files.
- Chunk Text: Splits the extracted text into manageable chunks for embedding.
- Generate Embeddings: Uses OpenAI's embedding API (via LangChain) to convert text chunks into vector representations.
- Store in Qdrant: Connects to Qdrant and stores each chunk's embedding with metadata (e.g., file name, chunk index).
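The "Chunk Text" step can be sketched as a fixed-size sliding window over the extracted text. This is one possible approach, not necessarily what indexing.js does (LangChain also ships its own text splitters); the function name `chunkText` and the default sizes of 1000 and 200 characters are assumptions chosen for illustration.

```javascript
// Split text into fixed-size character windows that overlap, so text cut at
// one boundary also appears at the start of the next chunk.
// chunkSize and overlap are illustrative defaults, not values from the project.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("need chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}
```

A larger overlap keeps more shared context between neighboring chunks at the cost of more embeddings to compute and store.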
Example Code Snippet:
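// Load environment variables from .env
require('dotenv').config();
const fs = require('fs');
const pdf = require('pdf-parse');

// await is not allowed at the top level of a CommonJS script,
// so the work runs inside an async function
async function main() {
  // Read and parse the PDF; data.text holds the extracted text
  const dataBuffer = fs.readFileSync('yourfile.pdf');
  const data = await pdf(dataBuffer);

  // Chunk text and generate embeddings
  // ... (chunking logic)
  // ... (embedding logic)

  // Store in Qdrant
  // ... (Qdrant client logic)
}

main().catch(console.error);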
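To make the "Store in Qdrant" step more concrete, the sketch below shows how chunks, their embeddings, and metadata might be combined into Qdrant points. `buildPoints` and its `startId` parameter are hypothetical names introduced here; the payload fields mirror the metadata mentioned above (file name, chunk index), and the actual script may rely on a LangChain vector store instead of building points by hand.

```javascript
// Hypothetical helper: pair each chunk with its embedding and metadata.
// Qdrant point ids must be unsigned integers or UUIDs; startId is an assumption
// that lets several files share one collection without id collisions.
function buildPoints(fileName, chunks, vectors, startId = 0) {
  if (chunks.length !== vectors.length) {
    throw new Error("expected one embedding vector per chunk");
  }
  return chunks.map((text, chunkIndex) => ({
    id: startId + chunkIndex,
    vector: vectors[chunkIndex],
    payload: { fileName, chunkIndex, text },
  }));
}
```

With Qdrant's REST API, an array like this is sent as `{ points: [...] }` in a `PUT` request to `/collections/<collection>/points`; the collection name is left to the caller.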
The query.js script lets you query the indexed documents using natural language.
Key Steps:
- Load Environment Variables: Loads API keys and configuration.
- Accept User Query: Takes a question from the user (via command line or hardcoded).
- Generate Query Embedding: Converts the user query into an embedding using OpenAI.
- Search Qdrant: Finds the most similar document chunks in Qdrant.
- Generate Answer: Uses OpenAI's language model to generate an answer based on the retrieved chunks.
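What "most similar" means in the search step can be illustrated with a small in-memory version. Qdrant performs this search server-side using an index, and the distance metric is configured per collection; cosine similarity is assumed here, and `cosineSimilarity` and `topK` are illustrative names rather than project code.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat similarity as 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force search: score every stored point, keep the k best.
function topK(queryVector, points, k = 3) {
  return points
    .map((p) => ({ ...p, score: cosineSimilarity(queryVector, p.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The brute-force scan is fine for a handful of chunks; a vector database exists precisely so that this comparison does not have to touch every stored vector.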
Example Code Snippet:
// Load environment variables
require('dotenv').config();
// Accept user query
const query = "What is the main topic of the document?";
// Generate embedding and search Qdrant
// ... (embedding and search logic)
// Generate answer using OpenAI
// ... (completion logic)
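The answer-generation step usually comes down to placing the retrieved chunks into the prompt sent to the language model. The following is a minimal sketch of that idea; `buildPrompt` and the instruction wording are assumptions, not the prompt used by query.js.

```javascript
// Hypothetical helper: assemble a prompt from the question and retrieved chunks.
// Chunks are numbered so the model (and the reader) can tell them apart.
function buildPrompt(question, chunks) {
  const context = chunks
    .map((chunk, i) => `[${i + 1}] ${chunk.trim()}`)
    .join("\n\n");
  return [
    "Answer the question using only the context below.",
    "If the context does not contain the answer, say so.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Telling the model to stay within the supplied context is a common way to reduce answers that are not backed by the indexed documents.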
The docker-compose.yml file is used to run Qdrant locally:
version: "3.8"
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
Start Qdrant with:
docker-compose up -d
- Indexing:
  - PDF files are read and split into chunks.
  - Each chunk is embedded using OpenAI and stored in Qdrant with metadata.
- Querying:
  - User submits a question.
  - The question is embedded and used to search for similar chunks in Qdrant.
  - The most relevant chunks are passed to OpenAI's language model to generate a final answer.
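The two phases above can be summarized as two short functions whose dependencies are passed in. Everything here is a stand-in for illustration: `split`, `embed`, `store`, and `complete` are hypothetical interfaces representing the chunker, the OpenAI embedding call, Qdrant, and the OpenAI language model, so the sketch shows the data flow rather than the project's actual code.

```javascript
// Hypothetical dependency shapes:
//   split(text)             -> string[]              (chunking)
//   embed(texts)            -> Promise<number[][]>   (embedding model)
//   store.add(records)      -> Promise<void>         (vector database write)
//   store.search(vector, k) -> Promise<record[]>     (vector database query)
//   complete(prompt)        -> Promise<string>       (language model)

// Indexing: text -> chunks -> embeddings -> stored records with metadata.
async function indexDocument(fileName, text, { split, embed, store }) {
  const chunks = split(text);
  const vectors = await embed(chunks);
  const records = chunks.map((chunk, i) => ({
    vector: vectors[i],
    payload: { fileName, chunkIndex: i, text: chunk },
  }));
  await store.add(records);
  return records.length;
}

// Querying: question -> embedding -> similar chunks -> generated answer.
async function answerQuestion(question, { embed, store, complete }, k = 3) {
  const [queryVector] = await embed([question]);
  const hits = await store.search(queryVector, k);
  const context = hits.map((hit) => hit.payload.text).join("\n---\n");
  return complete(`Context:\n${context}\n\nQuestion: ${question}`);
}
```

Passing the dependencies in makes each phase easy to exercise with stubs before any API key or running Qdrant instance is available.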
- Missing API Key: Ensure your .env file contains a valid OPENAI_API_KEY.
- Qdrant Connection Issues: Make sure Qdrant is running (docker-compose up -d) and QDRANT_URL is correct.
- Dependency Conflicts: If you see npm errors, try installing with --legacy-peer-deps:
  npm install --legacy-peer-deps
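For the Qdrant connection issue listed above, a quick reachability check can help separate "Qdrant is not running" from problems elsewhere in the scripts. The sketch below assumes Node 18 or later (for the built-in `fetch`) and an unauthenticated local instance; `isQdrantReachable` is a hypothetical helper, and it uses Qdrant's `GET /collections` endpoint simply as a cheap request that should succeed when the server is up.

```javascript
// Returns true when GET {baseUrl}/collections answers with a 2xx status.
// fetchImpl is injectable for testing; the default is Node 18+'s global fetch.
async function isQdrantReachable(baseUrl, fetchImpl = globalThis.fetch) {
  try {
    const url = new URL("/collections", baseUrl);
    const response = await fetchImpl(url);
    return response.ok;
  } catch {
    return false; // connection refused, DNS failure, or an invalid URL
  }
}
```

A typical call would be `isQdrantReachable(process.env.QDRANT_URL || "http://localhost:6333").then(console.log)`; a result of `false` suggests checking `docker-compose up -d` and the value of QDRANT_URL.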
This project is for educational purposes.