RAG (Retrieval-Augmented Generation) Project

This project demonstrates a simple Retrieval-Augmented Generation (RAG) pipeline using Node.js, LangChain, OpenAI, and Qdrant. The goal is to index PDF documents, store their embeddings in a vector database (Qdrant), and perform semantic search and question answering over the indexed content.

Project Structure

```
rag/
├── .env
├── .gitignore
├── docker-compose.yml
├── indexing.js
├── query.js
├── package.json
├── package-lock.json
└── (PDF files to index)
```

Setup Instructions

Clone the repository:

```
git clone https://github.com/Samrat880/RAG-Assignment.git
cd RAG-Assignment/rag
```

Install dependencies:
```
npm install
```
Installation

npm install installs everything already declared in package.json. To add the same packages to a new project, run:

```
npm install @langchain/community @langchain/core @langchain/openai pdf-parse qdrant-client dotenv
```

Packages used:

- @langchain/community – LangChain community integrations
- @langchain/core – LangChain core utilities
- @langchain/openai – OpenAI integration for LangChain
- pdf-parse – PDF text extraction
- qdrant-client – Qdrant vector database client
- dotenv – Loads environment variables from a .env file

Set up environment variables:
- Create a .env file in the rag directory containing the variables listed under Environment Variables.

Start Qdrant (Vector Database) using Docker:
```
docker-compose up -d
```
Add your PDF files to the rag directory or specify their path in indexing.js.
Index your documents:
```
node indexing.js
```
Query your documents:
```
node query.js
```

Environment Variables

Create a .env file in the rag directory with the following content:

```
OPENAI_API_KEY=your_openai_api_key_here
QDRANT_URL=http://localhost:6333
```

OPENAI_API_KEY: Your OpenAI API key for embedding and language model calls.
QDRANT_URL: URL for your local Qdrant instance (default: http://localhost:6333).
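Both scripts depend on these values, so failing fast when one is missing makes the "Missing API Key" problem under Troubleshooting easier to diagnose. A minimal sketch, assuming a hypothetical `loadConfig` helper that receives `process.env` (after `require('dotenv').config()`) and applies the default Qdrant URL:

```javascript
// Hypothetical helper: validate the variables indexing.js and query.js rely on.
// Pass process.env (after require('dotenv').config()) or any plain object.
function loadConfig(env) {
  const apiKey = (env.OPENAI_API_KEY || '').trim();
  if (!apiKey) {
    throw new Error('OPENAI_API_KEY is not set; add it to the .env file in the rag directory.');
  }

  // Fall back to the local Docker instance when QDRANT_URL is absent,
  // and strip trailing slashes so paths can be appended safely.
  const qdrantUrl = (env.QDRANT_URL || 'http://localhost:6333').replace(/\/+$/, '');

  return { openaiApiKey: apiKey, qdrantUrl };
}
```

Usage: `const config = loadConfig(process.env);` at the top of either script.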

Code Explanation

indexing.js

This script reads PDF files, splits them into chunks, generates embeddings using OpenAI, and stores them in Qdrant.

Key Steps:

Load Environment Variables:
Uses dotenv to load API keys and configuration.
Read PDF Files:
Uses pdf-parse to extract text from PDF files.
Chunk Text:
Splits the extracted text into manageable chunks for embedding.
Generate Embeddings:
Uses OpenAI's embedding API (via LangChain) to convert text chunks into vector representations.
Store in Qdrant:
Connects to Qdrant and stores each chunk's embedding with metadata (e.g., file name, chunk index).
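The chunking step can be as simple as a sliding window over characters. The sketch below is an illustration only, not the project's actual logic; `chunkText`, the 1000-character size, and the 200-character overlap are assumptions. LangChain also provides text splitters (such as `RecursiveCharacterTextSplitter`) that prefer paragraph and sentence boundaries over fixed offsets.

```javascript
// Hypothetical helper: split text into fixed-size chunks that overlap,
// so text near a chunk boundary appears in both neighbouring chunks.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError('overlap must be an integer in [0, chunkSize)');
  }

  // Collapse runs of whitespace left behind by PDF text extraction.
  const clean = text.replace(/\s+/g, ' ').trim();
  const chunks = [];
  const step = chunkSize - overlap;

  for (let start = 0; start < clean.length; start += step) {
    chunks.push(clean.slice(start, start + chunkSize));
    // Stop once the current window already reaches the end of the text.
    if (start + chunkSize >= clean.length) break;
  }
  return chunks;
}
```

For example, `chunkText(data.text)` would split the text returned by pdf-parse into overlapping chunks ready for embedding.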

Example Code Snippet:

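```javascript
// Load environment variables from .env
require('dotenv').config();

const fs = require('fs');
const pdf = require('pdf-parse');

// CommonJS modules do not support top-level await, so wrap the work in an async function
async function main() {
  // Read and parse the PDF; data.text holds the extracted plain text
  const dataBuffer = fs.readFileSync('yourfile.pdf');
  const data = await pdf(dataBuffer);

  // Chunk text and generate embeddings
  // ... (chunking logic)
  // ... (embedding logic)

  // Store in Qdrant
  // ... (Qdrant client logic)
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```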
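Qdrant stores each record as a point made of an `id`, a `vector`, and an optional JSON `payload`. The sketch below shows one way to pair chunks with their embeddings and attach the metadata mentioned above (file name and chunk index). `toPoints` is a hypothetical helper, and `vectors` is assumed to come from the embedding step in the same order as `chunks`. Qdrant point IDs must be unsigned integers or UUIDs, so this sketch uses sequential integers offset by `startId`.

```javascript
// Hypothetical helper: build Qdrant points ({ id, vector, payload }) from chunks
// and their embeddings. vectors[i] must be the embedding of chunks[i].
function toPoints(chunks, vectors, fileName, startId = 0) {
  if (chunks.length !== vectors.length) {
    throw new Error(`Got ${chunks.length} chunks but ${vectors.length} vectors`);
  }
  return chunks.map((text, i) => ({
    id: startId + i, // unsigned integer ID; Qdrant also accepts UUID strings
    vector: vectors[i],
    payload: {
      text, // keep the raw chunk so it can be passed to the language model later
      source: fileName,
      chunkIndex: i,
    },
  }));
}
```

The resulting array matches the list of points accepted by Qdrant's upsert operation; the exact call depends on the client library used in indexing.js.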

query.js

This script allows you to query the indexed documents using natural language.

Key Steps:

Load Environment Variables:
Loads API keys and configuration.
Accept User Query:
Takes a question from the user, either as a command-line argument or as a hardcoded string.
Generate Query Embedding:
Converts the user query into an embedding using OpenAI.
Search Qdrant:
Finds the most similar document chunks in Qdrant.
Generate Answer:
Uses OpenAI's language model to generate an answer based on the retrieved chunks.
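The search step ranks stored chunks by how close their vectors are to the query vector. Qdrant does this server-side using an index rather than a linear scan; the in-memory sketch below only illustrates the ranking, assuming the collection was created with cosine distance. `cosineSimilarity`, `topK`, and the `{ vector, payload }` point shape are assumptions for illustration.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same dimension');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k search over points shaped like { vector, payload },
// returning the best matches first.
function topK(queryVector, points, k = 3) {
  return points
    .map((p) => ({ score: cosineSimilarity(queryVector, p.vector), payload: p.payload }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```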

Example Code Snippet:

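```javascript
// Load environment variables from .env
require('dotenv').config();

// Accept the user query from the command line, or fall back to a hardcoded question
const query = process.argv.slice(2).join(' ') || 'What is the main topic of the document?';

async function main() {
  // Generate embedding and search Qdrant
  // ... (embedding and search logic)

  // Generate answer using OpenAI
  // ... (completion logic)
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```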
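Before the completion call, the retrieved chunks are combined with the question into a single prompt. The sketch below assumes each search hit has a `payload` with `text` and `source` fields (the project's actual payload layout is not shown) and numbers the excerpts so the model can refer to them. `buildPrompt` and the prompt wording are hypothetical.

```javascript
// Hypothetical helper: turn retrieved hits into a grounded prompt.
// Each hit is assumed to look like { payload: { text, source } }.
function buildPrompt(question, hits) {
  const context = hits
    .map((hit, i) => `[${i + 1}] (${hit.payload.source}) ${hit.payload.text}`)
    .join('\n\n');

  return [
    'Answer the question using only the context below.',
    'If the answer is not in the context, say that you do not know.',
    '',
    'Context:',
    context || '(no matching passages)',
    '',
    `Question: ${question}`,
    'Answer:',
  ].join('\n');
}
```

The resulting string can be passed to `invoke` on LangChain's `ChatOpenAI` model from `@langchain/openai`, which accepts a plain string as input.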

Docker Compose

The docker-compose.yml file is used to run Qdrant locally:

```yaml
version: "3.8"
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
```

Start Qdrant with:

```
docker-compose up -d
```

How It Works

Indexing:
- PDF files are read and split into chunks.
- Each chunk is embedded using OpenAI and stored in Qdrant with metadata.
Querying:
- User submits a question.
- The question is embedded and used to search for similar chunks in Qdrant.
- The most relevant chunks are passed to OpenAI's language model to generate a final answer.
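The query flow above can be expressed as three injected steps, which keeps the pipeline testable without network access. In this sketch `answerQuestion` and its `embed`, `search`, and `complete` callbacks are hypothetical; in a real run they would wrap the OpenAI embedding call, the Qdrant search, and the OpenAI completion respectively. Hits are assumed to be shaped like `{ payload: { text } }`.

```javascript
// Hypothetical orchestration of the query flow. The three callbacks are injected
// so the same code can run against OpenAI/Qdrant or against test stubs.
async function answerQuestion(question, { embed, search, complete, k = 3 }) {
  // 1. Embed the question (with the same embedding model used at indexing time).
  const queryVector = await embed(question);

  // 2. Retrieve the k most similar chunks.
  const hits = await search(queryVector, k);
  if (hits.length === 0) {
    return 'No relevant passages were found in the indexed documents.';
  }

  // 3. Ask the language model to answer from the retrieved context only.
  const context = hits.map((h) => h.payload.text).join('\n---\n');
  const prompt = `Context:\n${context}\n\nQuestion: ${question}\nAnswer using only the context above.`;
  return complete(prompt);
}
```

Injecting the steps also makes it easy to swap the embedding model or vector store later without touching the orchestration.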

Troubleshooting

Missing API Key:
Ensure your .env file contains a valid OPENAI_API_KEY.
Qdrant Connection Issues:
Make sure Qdrant is running (docker-compose up -d) and QDRANT_URL is correct.
Dependency Conflicts:
If npm reports a peer dependency conflict (an ERESOLVE error), try installing with --legacy-peer-deps:
```
npm install --legacy-peer-deps
```
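To tell a stopped container apart from a wrong URL, a small connectivity check can help. This sketch calls Qdrant's REST endpoint `GET /collections`, which lists existing collections. It assumes Node.js 18 or later for the built-in `fetch`; `checkQdrant` is a hypothetical helper, and the `fetchImpl` parameter exists only so the function can be exercised without a running server.

```javascript
// Hypothetical helper: resolves to the list of collection names, or rejects with a readable error.
async function checkQdrant(baseUrl = 'http://localhost:6333', fetchImpl = globalThis.fetch) {
  const url = `${baseUrl.replace(/\/+$/, '')}/collections`;
  let res;
  try {
    res = await fetchImpl(url);
  } catch (err) {
    // Network-level failure: container not running, wrong host, or wrong port.
    throw new Error(`Cannot reach Qdrant at ${url}: ${err.message}`);
  }
  if (!res.ok) {
    throw new Error(`Qdrant responded with HTTP ${res.status} at ${url}`);
  }
  // Qdrant wraps responses as { result: { collections: [{ name }, ...] }, status, time }.
  const body = await res.json();
  return body.result.collections.map((c) => c.name);
}
```

Usage: `checkQdrant(process.env.QDRANT_URL).then(console.log).catch((e) => console.error(e.message));`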

License

This project is for educational purposes.
