
document-intelligence-rag-system

Table of Contents

Overview

Features

High-Level Architecture

Technologies Used

Setup and Installation

Prerequisites

Google Cloud Project Setup

Local Project Setup

Usage

Future Enhancements

License

Contact

Overview

This project demonstrates a simple Retrieval-Augmented Generation (RAG) system written in Java. It uses Google Cloud's Vertex AI for embeddings and the Gemini API for generation: you ingest PDF documents, build a knowledge base from their content, and then query that knowledge base to get answers augmented by the retrieved context.

The application serves as a starting point for building production RAG solutions such as chatbots, document summarization, and intelligent search over private or domain-specific data.

Features

PDF Document Ingestion: Extracts text content from PDF files.

Vector Embeddings: Utilizes Vertex AI's text-embedding-004 model to create vector representations of document chunks.

In-Memory Vector Store: Stores document embeddings and their corresponding text chunks in a simple, in-memory vector database.

Semantic Search: Performs cosine similarity search to retrieve the most relevant document chunks based on a user's query.

Retrieval-Augmented Generation (RAG): Augments each user query with context retrieved from the documents before sending it to the Gemini LLM, so that responses are more accurate and grounded in the source material.

Generative AI: Integrates with the Google Gemini API (gemini-2.5-flash) for natural language understanding and generation.
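The in-memory vector store and cosine similarity search listed above fit in a few lines of plain Java. This is a minimal sketch, not the project's VectorStore.java: the class name InMemoryVectorStore, the use of double[] for embeddings, and the linear scan are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory vector store (not the project's VectorStore.java):
 * keeps chunk texts with their embeddings and returns the k chunks whose
 * embeddings are most cosine-similar to a query embedding.
 */
final class InMemoryVectorStore {

    /** A search hit: the chunk text and its cosine similarity to the query. */
    static final class Match {
        final String text;
        final double score;

        Match(String text, double score) {
            this.text = text;
            this.score = score;
        }
    }

    private final List<String> texts = new ArrayList<>();
    private final List<double[]> embeddings = new ArrayList<>();

    /** Stores a chunk together with a defensive copy of its embedding. */
    void add(String text, double[] embedding) {
        texts.add(text);
        embeddings.add(embedding.clone());
    }

    /** Linear scan over every stored chunk; returns up to k matches, best first. */
    List<Match> search(double[] query, int k) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<Match> matches = new ArrayList<>();
        for (int i = 0; i < texts.size(); i++) {
            matches.add(new Match(texts.get(i), cosine(query, embeddings.get(i))));
        }
        matches.sort((x, y) -> Double.compare(y.score, x.score)); // descending score
        return new ArrayList<>(matches.subList(0, Math.min(k, matches.size())));
    }

    /** cos(a, b) = (a . b) / (|a| * |b|); returns 0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("embedding dimensions differ");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

A linear scan costs O(n * d) per query for n chunks of dimension d. That is acceptable for a handful of PDFs, and it is one reason the Future Enhancements section suggests a dedicated vector database for larger collections.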

High-Level Architecture

The RAG pipeline operates in two main phases:

Ingestion Phase:

PDF documents are read and text content is extracted.

The extracted text is split into manageable chunks.

Each chunk is sent to Vertex AI's text-embedding-004 model to generate a vector embedding.

The chunk text and its embedding are stored in an in-memory vector store.
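The README does not specify how ChunkingUtil.java splits text. As a minimal sketch, assuming a fixed character budget per chunk, the hypothetical SimpleChunker below packs whole words into chunks of at most maxChars characters so that no word is cut in half.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixed-size chunker: packs whole words into chunks of at most maxChars characters. */
final class SimpleChunker {

    static List<String> chunk(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens when the input is blank
            }
            // Close the current chunk if adding a space plus this word would overflow it.
            if (current.length() > 0 && current.length() + 1 + word.length() > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }
}
```

A single word longer than maxChars still becomes its own oversized chunk; a production chunker would need an explicit rule for that case.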

Query Phase:

A user's natural language query is received.

The query is sent to Vertex AI's text-embedding-004 model to generate its embedding.

This query embedding is used to find the most semantically similar chunks in the in-memory vector store.

The original query is then augmented with the retrieved context (relevant chunks).

The augmented prompt is sent to the Gemini model (gemini-2.5-flash).

Gemini generates a response based on the query and the provided context.
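The augmentation step above amounts to assembling a prompt string from the question and the retrieved chunks. The sketch below shows one plausible layout; the class name PromptBuilder and the instruction wording are hypothetical, since the prompt used by GeminiService.java is not shown in this README.

```java
import java.util.List;

/** Hypothetical helper that builds the augmented prompt sent to the Gemini model. */
final class PromptBuilder {

    static String buildPrompt(String question, List<String> contextChunks) {
        StringBuilder sb = new StringBuilder();
        sb.append("Answer the question using only the context below. ")
          .append("If the context does not contain the answer, say so.\n\n");
        sb.append("Context:\n");
        for (int i = 0; i < contextChunks.size(); i++) {
            // Number each chunk so the model can refer to the one it relied on.
            sb.append('[').append(i + 1).append("] ").append(contextChunks.get(i)).append('\n');
        }
        sb.append("\nQuestion: ").append(question).append('\n');
        sb.append("Answer:");
        return sb.toString();
    }
}
```

Numbering the chunks makes it possible to ask the model to cite the chunk it used, and the explicit "say so" instruction is intended to discourage answers that are not supported by the retrieved context.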


Technologies Used

Java 11+

Apache Maven

Google Cloud Platform (GCP)

Vertex AI: For text-embedding-004 and Gemini models.

Google Gen AI SDK for Java (com.google.genai:google-genai)

Google Cloud Client Libraries for Java (com.google.cloud:google-cloud-aiplatform)

Apache PDFBox: For PDF text extraction.

slf4j-simple: For basic logging.

Setup and Installation

Prerequisites

Before you begin, ensure you have the following installed:

Java Development Kit (JDK) 11 or higher

Apache Maven 3.x

Google Cloud SDK (gcloud CLI): Authenticated and configured for your GCP project.

Install: Google Cloud SDK

Authenticate: gcloud auth application-default login

Google Cloud Project Setup

Create a Google Cloud Project: If you don't have one, create a new project in the Google Cloud Console.

Enable APIs:

Navigate to "APIs & Services" > "Enabled APIs & Services".

Enable the Vertex AI API (or run: gcloud services enable aiplatform.googleapis.com).

Set Environment Variables: The application reads your Google Cloud project ID and region from environment variables. gcloud does not set these for you, so export them in your shell before running the application:

Bash

export GOOGLE_CLOUD_PROJECT="your-gcp-project-id"

export GOOGLE_CLOUD_LOCATION="us-central1" # Or your preferred region (e.g., "asia-south1")

Replace "your-gcp-project-id" with your actual project ID. us-central1 is a common region for Vertex AI models.
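For illustration, here is a hypothetical way a Java application might read these two variables. GcpConfig is not a class from this repository, and the us-central1 fallback is an assumption. Taking a Map instead of calling System.getenv() directly keeps the logic easy to test.

```java
import java.util.Map;

/**
 * Hypothetical sketch (not taken from App.java) of reading the two
 * environment variables, failing fast if the project ID is missing.
 */
final class GcpConfig {
    /** Assumed fallback region, matching the README's suggested value. */
    static final String DEFAULT_LOCATION = "us-central1";

    final String project;
    final String location;

    private GcpConfig(String project, String location) {
        this.project = project;
        this.location = location;
    }

    /** Pass System.getenv() in real use; a plain Map keeps this testable. */
    static GcpConfig fromEnv(Map<String, String> env) {
        String project = env.get("GOOGLE_CLOUD_PROJECT");
        if (project == null || project.isBlank()) {
            throw new IllegalStateException("GOOGLE_CLOUD_PROJECT is not set");
        }
        String location = env.get("GOOGLE_CLOUD_LOCATION");
        if (location == null || location.isBlank()) {
            location = DEFAULT_LOCATION;
        }
        return new GcpConfig(project, location);
    }
}
```

In an application you would call GcpConfig.fromEnv(System.getenv()). Failing fast on a missing project ID gives a clearer message than a later failure inside the client library.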

Local Project Setup

Clone the Repository:

Bash

git clone https://github.com/Swastik466/document-intelligence-rag-system.git

cd document-intelligence-rag-system

Place PDF Documents: Create a pdfs directory in the root of the project and place your PDF files inside it.

├── pom.xml
├── src/
│   └── main/
│       └── java/
│           └── com/
│               └── example/
│                   └── rag/
│                       ├── App.java
│                       ├── ChunkingUtil.java
│                       ├── EmbeddingService.java
│                       ├── GeminiService.java
│                       └── VectorStore.java
└── pdfs/
    ├── document1.pdf
    └── document2.pdf

The application is configured to look for PDF files in this pdfs/ directory.
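A minimal sketch of how the pdfs/ directory could be scanned with java.nio.file. PdfFinder is a hypothetical helper, not part of this repository; the case-insensitive .pdf match and the non-recursive listing are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper that finds the PDF files the ingestion phase would read. */
final class PdfFinder {

    /** Returns regular files in dir whose names end in ".pdf" (any case), sorted by path. */
    static List<Path> findPdfs(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            throw new IOException("PDF directory not found: " + dir.toAbsolutePath());
        }
        // Files.list only looks at the top level of dir; subdirectories are not searched.
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

If the application uses a relative path such as Path.of("pdfs"), it is resolved against the current working directory, so the JAR should be started from the project root.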

Build the Project: Navigate to the project root directory in your terminal and build the executable JAR:

Bash

mvn clean install

This command compiles the code, runs tests, and packages the application into a single executable JAR file (named document-rag-1.0-SNAPSHOT-jar-with-dependencies.jar) in the target/ directory.

Usage

After a successful build, run the application from the project root (the directory containing pom.xml and pdfs/):

Bash

java -jar target/document-rag-1.0-SNAPSHOT-jar-with-dependencies.jar

The application will:

Ingest and process all PDF files found in the pdfs/ directory.

Build the in-memory vector store.

Prompt you to enter questions.

Future Enhancements

Persistent Vector Database: Replace the in-memory vector store with a more robust solution like Cloud SQL with pgvector or Vertex AI Vector Search for scalability and persistence.

Google Cloud Storage Integration: Allow ingesting PDFs directly from a GCS bucket instead of local files.

REST API: Wrap the RAG functionality in a Spring Boot (or similar) REST API for easier integration with front-end applications.

User Interface: Develop a simple web UI for uploading PDFs and asking questions.

Error Handling and Robustness: Enhance error handling, retry mechanisms, and logging for production readiness.

Advanced Chunking: Implement more sophisticated chunking strategies (e.g., using overlap, semantic chunking).

Multi-modal Input: Extend to handle other document types (images, presentations) using Gemini's multi-modal capabilities.

Deployment: Containerize the application (Docker) and deploy to platforms like Cloud Run or GKE.
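The overlap idea under Advanced Chunking can be sketched as a sliding window. In this hypothetical OverlappingChunker, size and overlap are measured in words (an assumption; token-based windows are also common), and consecutive chunks share overlap words so that text near a chunk boundary appears in both neighbouring chunks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sliding-window chunker: windows of `size` words; neighbours share `overlap` words. */
final class OverlappingChunker {

    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("require size > 0 and 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        String[] words = trimmed.split("\\s+");
        int step = size - overlap; // how far the window advances each time
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + size, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // this window already reached the end of the text
            }
        }
        return chunks;
    }
}
```

With size = 3 and overlap = 1, the words "a b c d e f g" become "a b c", "c d e", and "e f g".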

License

This project is licensed under the MIT License.

Contact

For any questions or collaborations, feel free to reach out:

GitHub: https://github.com/Swastik466

Email: swasbits@gmail.com
