A Spring Boot application that demonstrates how to use Spring AI with PostgreSQL's pgvector extension to store and query PDF documents using vector embeddings.
- PDF Document Processing: Automatically loads and processes PDF documents (Spring Boot and Spring Web reference guides)
- Vector Embeddings: Uses OpenAI's embedding model to create vector representations of document content
- PostgreSQL Vector Store: Stores embeddings in PostgreSQL using the pgvector extension
- Semantic Search: Provides REST API endpoints to query documents using natural language
- Intelligent Chunking: Splits documents into token-bounded chunks for better retrieval
- Java 17
- Spring Boot 3.5.5
- Spring AI 1.0.1
- PostgreSQL with pgvector extension
- OpenAI GPT-5 Nano
- Gradle
- Java 17 or higher
- Docker and Docker Compose
- OpenAI API key
git clone <repository-url>
cd test-openai-pgvector
Set your OpenAI API key:
export OPENAI_API_KEY=your_openai_api_key_here
docker-compose up -d
This will start a PostgreSQL database with the pgvector extension on port 5432.
./gradlew bootRun
The application will:
- Automatically create the required database schema
- Load and process PDF documents from src/main/resources/docs/
- Generate embeddings and store them in the vector database
Once the application is running, you can query the documents using the REST API:
# Ask a question about Spring Boot
curl "http://localhost:8080/ai/ask?message=What%20is%20Spring%20Boot?"
# Ask about Spring Web
curl "http://localhost:8080/ai/ask?message=How%20do%20I%20create%20a%20REST%20controller?"
Query the document store with natural language questions.
Parameters:
message (optional): The question to ask. Defaults to "What is Spring Boot".
Example:
curl "http://localhost:8080/ai/ask?message=How%20do%20I%20configure%20Spring%20Boot?"
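The same request can also be issued from Java. The sketch below is only an illustration and not part of the project: the class name AskClient and its helper methods are hypothetical, and the base URL is taken from the curl examples. It percent-encodes the question so that spaces and characters such as ? or & stay inside the message parameter.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the /ai/ask endpoint (not part of the project).
class AskClient {

    // Base URL taken from the curl examples; adjust if the app runs elsewhere.
    static final String BASE_URL = "http://localhost:8080/ai/ask";

    // Builds the request URI with the question percent-encoded.
    static URI buildUri(String question) {
        String encoded = URLEncoder.encode(question, StandardCharsets.UTF_8)
                .replace("+", "%20"); // URLEncoder emits '+' for spaces; use %20 instead
        return URI.create(BASE_URL + "?message=" + encoded);
    }

    // Sends the GET request and returns the response body as text.
    static String ask(String question) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(buildUri(question)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String question = "How do I configure Spring Boot?";
        System.out.println(buildUri(question));
        // Uncomment once the application is running:
        // System.out.println(ask(question));
    }
}
```

Only buildUri runs by default, so the sketch can be tried without a running server.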
The application configuration can be found in src/main/resources/application.properties:
- OpenAI Configuration: API key and model settings
- Database Configuration: PostgreSQL connection details
- Vector Store Configuration: pgvector index and distance settings
# OpenAI settings
spring.ai.openai.api-key=${OPENAI_API_KEY:openai_api_key}
spring.ai.openai.chat.options.model=gpt-5-nano
spring.ai.openai.chat.options.temperature=1.0
# PostgreSQL connection
spring.datasource.url=jdbc:postgresql://localhost:5432/samplevectordb
spring.datasource.username=postgres
spring.datasource.password=postgres
# Vector store configuration
spring.ai.vectorstore.pgvector.index-type=hnsw
spring.ai.vectorstore.pgvector.distance-type=cosine_distance
spring.ai.vectorstore.pgvector.dimensions=1536
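To make the cosine_distance setting concrete, here is a minimal sketch of the quantity it names: cosine distance is 1 minus the cosine similarity of two vectors, so it ranges from 0 (same direction) to 2 (opposite direction). The class name CosineDistance is hypothetical and the tiny vectors are illustrative. In the application the computation happens inside PostgreSQL, on vectors whose length matches the configured dimensions.

```java
// Illustrative only: pgvector performs this computation inside the database.
class CosineDistance {

    // Cosine distance = 1 - (a . b) / (|a| * |b|).
    // Returns NaN if either vector has zero length.
    static double cosineDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Orthogonal vectors: prints 1.0
        System.out.println(cosineDistance(new float[]{1f, 0f}, new float[]{0f, 1f}));
        // Identical vectors: prints 0.0
        System.out.println(cosineDistance(new float[]{1f, 0f}, new float[]{1f, 0f}));
    }
}
```

Because the measure depends only on direction, scaling a vector does not change its distance to another vector.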
src/
├── main/
│ ├── java/com/springai/test_openai_pgvector/
│ │ ├── TestOpenaiPgvectorApplication.java # Main Spring Boot application
│ │ ├── AskController.java # REST controller for queries
│ │ └── PdfLoader.java # PDF processing and loading
│ └── resources/
│ ├── application.properties # Application configuration
│ ├── docker-compose.yml # PostgreSQL setup
│ ├── schema.sql # Database schema
│ ├── docs/ # PDF documents to process
│ └── prompts/ # AI prompt templates
- Document Loading: The PdfLoader component automatically processes PDF documents on application startup
- Text Extraction: PDFs are read page by page and text is extracted
- Chunking: Documents are split into smaller chunks using a token-based text splitter
- Embedding Generation: Each chunk is converted to a vector embedding using OpenAI's embedding model
- Vector Storage: Embeddings are stored in PostgreSQL with metadata
- Query Processing: When a question is asked, the system:
  - Converts the question to an embedding
  - Performs similarity search to find relevant document chunks
  - Uses the retrieved context to generate an answer via OpenAI
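The chunking step in the pipeline above can be pictured with a small sketch. It is a simplification under stated assumptions, not the project's splitter: whitespace-separated words stand in for model tokens, and the class name ChunkSketch and the maxTokens parameter are hypothetical. A real token-based splitter counts tokens with the model's tokenizer rather than words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for a token-based splitter (words approximate tokens).
class ChunkSketch {

    // Splits text into consecutive chunks of at most maxTokens words each.
    static List<String> chunk(String text, int maxTokens) {
        if (maxTokens <= 0) {
            throw new IllegalArgumentException("maxTokens must be positive");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return chunks; // nothing to split
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start += maxTokens) {
            int end = Math.min(start + maxTokens, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Prints [a b, c d, e]
        System.out.println(chunk("a b c d e", 2));
    }
}
```

Each resulting chunk would then be embedded and stored as its own row, which is why chunk size affects both retrieval granularity and the number of embedding calls.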
The application uses a single table to store document embeddings:
CREATE TABLE vector_store (
id uuid DEFAULT uuid_generate_v4() PRIMARY KEY,
content text,
metadata json,
embedding vector(1536)
);
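For illustration, a similarity search against this table could look like the JDBC sketch below. This is an assumption about how such a query might be written by hand, not the query the vector store integration issues internally. The class VectorQuerySketch and its methods are hypothetical. It relies on pgvector's text representation of a vector ([x1,x2,...]) and on the <=> operator, which computes cosine distance.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hand-written nearest-neighbour query against vector_store (illustrative).
class VectorQuerySketch {

    // Orders rows by cosine distance to the query vector, closest first.
    static final String SEARCH_SQL =
            "SELECT content FROM vector_store ORDER BY embedding <=> ?::vector LIMIT ?";

    // Formats an embedding in pgvector's text form, e.g. [0.5,1.0,0.25].
    static String toVectorLiteral(float[] embedding) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < embedding.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(embedding[i]);
        }
        return sb.append(']').toString();
    }

    // Returns the content of the k chunks closest to the query embedding.
    // The query embedding must have the same dimension as the column (1536).
    static List<String> search(Connection conn, float[] query, int k) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SEARCH_SQL)) {
            ps.setString(1, toVectorLiteral(query));
            ps.setInt(2, k);
            try (ResultSet rs = ps.executeQuery()) {
                List<String> results = new ArrayList<>();
                while (rs.next()) {
                    results.add(rs.getString("content"));
                }
                return results;
            }
        }
    }
}
```

Calling search requires an open connection to the database from the configuration section and the PostgreSQL JDBC driver on the classpath.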
./gradlew build
./gradlew test
To add new PDF documents:
- Place PDF files in src/main/resources/docs/
- Update the PdfLoader class to reference the new resources
- Restart the application
- OpenAI API Key: Ensure your API key is set correctly
- Database Connection: Make sure PostgreSQL is running and accessible
- Memory Issues: Large PDFs may require increased JVM heap size
Enable debug logging for vector store operations:
logging.level.org.springframework.ai.vectorstore=DEBUG
This project is for demonstration purposes. Please ensure you comply with OpenAI's usage policies and any applicable licenses for the PDF documents used.