A production-ready Retrieval-Augmented Generation (RAG) system for querying Java 25 documentation using natural language.
Ask questions about Java 25 in plain English and get accurate answers from official documentation.
- 💬 Interactive Web Chat UI - React-based interface with real-time updates
- 🤖 Flexible Model Support - Use self-hosted (Ollama) or paid APIs (OpenAI, Anthropic, Gemini)
- 📊 Multiple Vector Databases - ChromaDB, pgvector, or Qdrant
- 🔍 Full Observability - OpenTelemetry + Grafana stack for traces, metrics, and logs
- 🚀 Production Ready - Docker deployment, monitoring, and health checks
- 💰 Cost Optimized - Built-in token usage optimization strategies
Get up and running in 2 commands:
# 1. Start all services (app + ChromaDB + Ollama + observability stack)
docker-compose up -d
# 2. Open http://localhost:8080 in your browser
That's it! Docker Compose deploys everything.
### Option 2: Local Development
For local development (running the app outside Docker):
Prerequisites: Java 17+, Maven 3.6+
# 1. Start infrastructure services only
docker-compose up -d chromadb ollama
# 2. Run the application locally
mvn spring-boot:run -Dspring-boot.run.profiles=dev
# 3. Open http://localhost:8080
- Getting Started
- Configuration
- Docker Deployment
- Observability
- API Reference
- Advanced Topics
- Troubleshooting
- Start the application (see Quick Start above)
- Open your browser to http://localhost:8080
- Type your question about Java 25 documentation
- View the answer with source references and syntax-highlighted code
Chat UI Features:
- ⚡ Real-time processing status updates via WebSocket
- 📝 Markdown rendering with syntax-highlighted code blocks
- 📚 Source references showing which docs were used
- 💾 Conversation history saved in browser
- 📱 Responsive design (desktop, tablet, mobile)
Before querying, ingest Java 25 documentation:
curl -X POST "http://localhost:8080/api/ingest?path=/path/to/java25/docs"
Response:
{
"documentsProcessed": 45,
"chunksCreated": 523,
"processingTimeMs": 12450,
"status": "SUCCESS"
}
Supported formats: Markdown (.md), HTML (.html), Plain text (.txt)
# Feature explanation
curl -X POST http://localhost:8080/api/query \
-H "Content-Type: application/json" \
-d '{"question": "What are sealed classes in Java 25?"}'
# Code examples
curl -X POST http://localhost:8080/api/query \
-H "Content-Type: application/json" \
-d '{"question": "Show me an example of pattern matching"}'
# Comparisons
curl -X POST http://localhost:8080/api/query \
-H "Content-Type: application/json" \
-d '{"question": "Difference between records and regular classes?"}'
Choose between self-hosted (free) or paid API models:
model:
provider: self-hosted
self-hosted:
base-url: http://localhost:11434
model-name: llama2 # or mistral, codellama, llama3
Available models: llama2, mistral, codellama, llama3
model:
provider: openai
openai:
api-key: ${OPENAI_API_KEY}
model-name: gpt-4-turbo-preview # or gpt-3.5-turbo
export OPENAI_API_KEY="sk-..."
model:
provider: anthropic
anthropic:
api-key: ${ANTHROPIC_API_KEY}
model-name: claude-3-sonnet-20240229
model:
provider: gemini
gemini:
api-key: ${GOOGLE_API_KEY}
model-name: gemini-pro # or gemini-1.5-pro, gemini-1.5-flash
export GOOGLE_API_KEY="your-api-key"
ChromaDB (Recommended for getting started):
docker run -d -p 8000:8000 chromadb/chroma:latest
PostgreSQL with pgvector:
docker run -d -p 5432:5432 \
-e POSTGRES_PASSWORD=yourpassword \
ankane/pgvector:latest
Qdrant:
docker run -d -p 6333:6333 qdrant/qdrant:latest
- dev - Self-hosted models + local ChromaDB + verbose logging
- prod - Paid APIs + optimized settings + production logging
- docker - Docker-specific URLs and settings
mvn spring-boot:run -Dspring-boot.run.profiles=dev
# Start all services (app + infrastructure + observability)
docker-compose up -d
# Check status
docker-compose ps
# View application logs
docker-compose logs -f my-java-genie-app
# View all logs
docker-compose logs -f
# Stop services
docker-compose down
Core Services:
- chromadb (port 8000) - Vector database
- ollama (port 11434) - Self-hosted LLM runtime
- my-java-genie-app (port 8080) - Main application
Observability Stack:
- grafana (port 3000) - Dashboards (admin/admin)
- tempo (port 3200) - Distributed tracing
- mimir (port 9009) - Metrics storage
- loki (port 3100) - Log aggregation
- alloy (ports 4317, 4318) - OpenTelemetry collector
- Port: 8000
- Volume: chroma-data for persistent storage
- Health Check: Automatic readiness verification
- Port: 11434
- Volume: ollama-data for model storage
- GPU Support: Enabled by default (remove if no GPU available)
- Models: Pull models using docker exec my-java-genie-ollama ollama pull llama2
- Port: 8080
- Profile: Uses application-docker.yml configuration
- Automatic startup: Deployed with docker-compose
- Volumes: ./docs mounted as /app/docs (read-only)
- Ports: 4317 (OTLP gRPC), 4318 (OTLP HTTP), 12345 (UI)
- Configuration: alloy-config.alloy
- Purpose: Receives telemetry data and routes to Tempo, Mimir, and Loki
- Port: 3200
- Configuration: tempo-config.yaml
- Volume: tempo-data for trace storage
- Retention: 48 hours (configurable)
- Port: 9009
- Configuration: mimir-config.yaml
- Volume: mimir-data for metrics storage
- Retention: 7 days (configurable)
- Port: 3100
- Configuration: loki-config.yaml
- Volume: loki-data for log storage
- Retention: 7 days (configurable)
- Port: 3000
- Credentials: admin/admin (change on first login)
- Volume: grafana-data for dashboards and settings
- Pre-configured: Datasources and RAG System dashboard included
# Start all services (including observability stack)
docker-compose up -d
# Start only core services (without observability)
docker-compose up -d chromadb ollama my-java-genie-app
# Start only observability stack
docker-compose up -d alloy tempo mimir loki grafana
# View logs
docker-compose logs -f
docker-compose logs -f my-java-genie-app
docker-compose logs -f grafana
# Check service status
docker-compose ps
# Stop services
docker-compose down
# Stop and remove volumes (clean slate)
docker-compose down -v
# Rebuild application image
docker-compose build my-java-genie-app
docker-compose up -d my-java-genie-app
# Restart specific service
docker-compose restart grafana
# List available models
docker exec my-java-genie-ollama ollama list
# Pull additional models
docker exec my-java-genie-ollama ollama pull mistral
docker exec my-java-genie-ollama ollama pull codellama
docker exec my-java-genie-ollama ollama pull llama3
# Remove a model
docker exec my-java-genie-ollama ollama rm llama2
# Check model info
docker exec my-java-genie-ollama ollama show llama2
Override default configuration using environment variables:
# In docker-compose.yml or .env file
SPRING_PROFILES_ACTIVE=docker
VECTOR_DB_URL=http://chromadb:8000
MODEL_BASE_URL=http://ollama:11434
MODEL_NAME=llama2
COLLECTION_NAME=java25_docs
Minimum (Core Services Only):
- CPU: 4 cores
- RAM: 8GB
- Disk: 20GB (for models and data)
Recommended (With Observability Stack):
- CPU: 8 cores
- RAM: 16GB
- GPU: NVIDIA GPU with 8GB+ VRAM (for faster inference)
- Disk: 50GB
Resource Breakdown:
- ChromaDB: ~200MB RAM
- Ollama: ~4-8GB RAM (depends on model)
- Application: ~512MB RAM
- Grafana Alloy: ~256MB RAM
- Tempo: ~512MB RAM
- Mimir: ~512MB RAM
- Loki: ~256MB RAM
- Grafana: ~256MB RAM
- Total: ~7-13GB RAM with full stack
If you have an NVIDIA GPU and want to use it with Ollama:
- Install NVIDIA Container Toolkit
- The docker-compose.yml already includes GPU configuration:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
- If you don't have a GPU, remove or comment out the deploy section in docker-compose.yml
Issue: Ollama container fails to start
# Check logs
docker logs my-java-genie-ollama
# If GPU error, remove GPU configuration from docker-compose.yml
Issue: ChromaDB connection refused
# Verify ChromaDB is running
docker ps | grep chromadb
# Check ChromaDB logs
docker logs my-java-genie-chromadb
# Test connection
curl http://localhost:8000/api/v1/heartbeat
Issue: Application can't connect to services
# Ensure all services are on the same network
docker network inspect my-java-genie_rag-network
# Check service names resolve correctly
docker exec my-java-genie-application ping chromadb
docker exec my-java-genie-application ping ollama
Issue: Out of disk space
# Check Docker disk usage
docker system df
# Clean up unused resources
docker system prune -a --volumes
# Remove specific volumes
docker volume rm my-java-genie_ollama-data
The system supports multiple language model providers. Configure via application.yml:
model:
provider: self-hosted
self-hosted:
base-url: http://localhost:11434
model-name: llama2 # or mistral, codellama, etc.
timeout-seconds: 60
temperature: 0.7
max-tokens: 500
Supported Ollama Models:
- llama2 - General purpose, good balance
- mistral - Fast and efficient
- codellama - Optimized for code understanding
- llama3 - Latest version with improved capabilities
Setup:
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a model
ollama pull llama2
# Start Ollama server
ollama serve
model:
provider: openai
openai:
api-key: ${OPENAI_API_KEY}
model-name: gpt-4-turbo-preview # or gpt-3.5-turbo
timeout-seconds: 30
temperature: 0.7
max-tokens: 500
Supported Models:
- gpt-4-turbo-preview - Best quality, higher cost
- gpt-4 - Excellent quality
- gpt-3.5-turbo - Fast and cost-effective
Setup:
export OPENAI_API_KEY="sk-..."
model:
provider: anthropic
anthropic:
api-key: ${ANTHROPIC_API_KEY}
model-name: claude-3-sonnet-20240229
timeout-seconds: 30
temperature: 0.7
max-tokens: 500
Supported Models:
- claude-3-opus-20240229 - Highest capability
- claude-3-sonnet-20240229 - Balanced performance
- claude-3-haiku-20240307 - Fast and efficient
model:
provider: gemini
gemini:
project-id: ${GOOGLE_CLOUD_PROJECT}
location: us-central1
model-name: gemini-pro
api-key: ${GOOGLE_API_KEY} # Alternative to ADC
timeout-seconds: 30
temperature: 0.7
max-tokens: 500
Supported Models:
- gemini-pro - Text-only model, optimized for text generation
- gemini-1.5-pro - Latest version with extended context window
- gemini-1.5-flash - Faster, cost-effective option
Authentication Options:
- API Key (Simplest):
export GOOGLE_API_KEY="your-api-key-here"
- Application Default Credentials (For GCP):
gcloud auth application-default login
export GOOGLE_CLOUD_PROJECT="your-project-id"
- Service Account (Production):
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
export GOOGLE_CLOUD_PROJECT="your-project-id"
Setup:
# Option 1: Using API Key
export GOOGLE_API_KEY="your-api-key"
# Option 2: Using Application Default Credentials
gcloud auth application-default login
export GOOGLE_CLOUD_PROJECT="your-project-id"
# Run the application
mvn spring-boot:run -Dspring-boot.run.profiles=prod
Features:
- Automatic retry with exponential backoff (3 attempts)
- Token usage tracking and cost estimation
- Safety filter handling
- Rate limiting support
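The "retry with exponential backoff (3 attempts)" behavior can be sketched as follows. This is a minimal illustration (the hypothetical `RetryingCaller` name, base delay, and lack of jitter are assumptions, not the actual client code):

```java
import java.util.concurrent.Callable;

public class RetryingCaller {
    // Retry a call up to maxAttempts times, doubling the delay between
    // attempts (exponential backoff). Rethrows the last failure if all
    // attempts are exhausted.
    public static <T> T callWithRetry(Callable<T> call, int maxAttempts,
                                      long baseDelayMillis) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    long delay = baseDelayMillis << (attempt - 1); // 1x, 2x, 4x...
                    Thread.sleep(delay);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated transient failure: fails twice, succeeds on attempt 3.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient error");
            return "ok";
        }, 3, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A production version would typically add random jitter to the delay so many clients retrying at once do not hammer the API in lockstep.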
Docker Setup:
docker run -d \
--name chromadb \
-p 8000:8000 \
-v chroma-data:/chroma/chroma \
chromadb/chroma:latest
Configuration:
vector-db:
type: chroma
connection-url: http://localhost:8000
collection-name: java25_docs
chroma:
tenant: default_tenant
database: default_database
Docker Setup:
docker run -d \
--name postgres-pgvector \
-e POSTGRES_PASSWORD=yourpassword \
-e POSTGRES_DB=ragdb \
-p 5432:5432 \
ankane/pgvector:latest
Configuration:
vector-db:
type: pgvector
pgvector:
host: localhost
port: 5432
database: ragdb
username: postgres
password: ${POSTGRES_PASSWORD}
schema: public
table-name: document_embeddings
Docker Setup:
docker run -d \
--name qdrant \
-p 6333:6333 \
-v qdrant-data:/qdrant/storage \
qdrant/qdrant:latest
Configuration:
vector-db:
type: qdrant
connection-url: http://localhost:6333
collection-name: java25_docs
qdrant:
api-key: ${QDRANT_API_KEY:}
use-tls: false
The system includes pre-configured profiles:
- Self-hosted models (Ollama)
- Local ChromaDB
- Verbose logging
- Lower similarity threshold for experimentation
- Enabled caching
Usage:
mvn spring-boot:run -Dspring-boot.run.profiles=dev
- Paid API models (OpenAI/Anthropic)
- Production vector database
- Optimized logging
- Higher similarity threshold for quality
- Extended cache TTL
Usage:
export OPENAI_API_KEY="your-key"
mvn spring-boot:run -Dspring-boot.run.profiles=prod
Create your own profile:
# Create application-custom.yml
cp src/main/resources/application-dev.yml src/main/resources/application-custom.yml
# Edit as needed
vim src/main/resources/application-custom.yml
# Run with custom profile
mvn spring-boot:run -Dspring-boot.run.profiles=custom
The easiest way to interact with the RAG system is through the web-based chat interface.
- Start the application (see Quick Start)
- Open your browser to http://localhost:8080
- Type your question about Java 25 in the chat input
- View the answer with source references
- Real-time Updates: WebSocket connection provides live processing status (embedding, searching, generating)
- Markdown Rendering: Formatted responses with syntax-highlighted code blocks
- Source References: Click to view which documentation sections were used
- Conversation History: Full session history maintained in browser
- Session Persistence: Conversations saved in localStorage
- Responsive Design: Works on desktop, tablet, and mobile devices
- Error Handling: Clear error messages with retry options
- Clear History: Reset conversation with one click
The UI is built with React and TypeScript:
- Frontend: React 18+ with TypeScript
- Communication: REST API for queries, WebSocket for real-time updates
- State Management: React Context API
- Markdown: react-markdown with syntax highlighting
- Deployment: Bundled into Spring Boot JAR or served separately
To develop the UI locally:
cd chat-ui
npm install
npm start
The development server runs on http://localhost:3000 and proxies API requests to http://localhost:8080.
Build the UI for production:
cd chat-ui
npm run build
This builds the UI into ../src/main/resources/static/, which will be bundled into the Spring Boot JAR.
- Styling: Modify CSS files in chat-ui/src/components/
- Themes: Change color schemes in CSS variables
- Code Highlighting: Adjust syntax highlighter theme in MessageList.tsx
- WebSocket: Configure reconnection behavior in websocket.ts
For detailed information, see docs/CHAT_UI.md.
Before querying, you need to ingest Java 25 documentation:
Ingest from Directory:
curl -X POST "http://localhost:8080/api/ingest?path=/path/to/java25/docs"
Response:
{
"documentsProcessed": 45,
"chunksCreated": 523,
"embeddingsGenerated": 523,
"failures": 0,
"processingTimeMs": 12450,
"status": "SUCCESS"
}
Supported Formats:
- Markdown (.md)
- HTML (.html)
- Plain text (.txt)
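Conceptually, ingestion slices each document into fixed-size windows that overlap by a set number of characters, so context at chunk boundaries is not lost. A minimal sketch of that splitting step (the `Chunker` helper is hypothetical; the real pipeline works on parsed document text, and the sizes match the chunk-size and chunk-overlap defaults in the Ingestion Configuration):

```java
import java.util.ArrayList;
import java.util.List;

public class Chunker {
    // Split text into chunks of at most chunkSize characters, where each
    // chunk overlaps the previous one by `overlap` characters.
    public static List<String> chunk(String text, int chunkSize, int overlap) {
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;              // advance per chunk
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break;         // last chunk reached
        }
        return chunks;
    }

    public static void main(String[] args) {
        String doc = "x".repeat(2500);               // stand-in for a parsed document
        List<String> chunks = chunk(doc, 1000, 200); // chunk-size: 1000, chunk-overlap: 200
        System.out.println(chunks.size() + " chunks"); // prints: 3 chunks
    }
}
```

Larger chunks mean fewer embeddings but coarser retrieval; the overlap controls how much boundary context each neighbor shares.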
Ingestion Configuration:
ingestion:
chunk-size: 1000 # Characters per chunk
chunk-overlap: 200 # Overlap between chunks
batch-size: 100 # Embeddings per batch
Basic Query:
curl -X POST http://localhost:8080/api/query \
-H "Content-Type: application/json" \
-d '{
"question": "What are sealed classes in Java 25?"
}'
Response:
{
"answer": "Sealed classes in Java 25 are classes that restrict which other classes can extend them. They provide more control over inheritance hierarchies...",
"sources": [
{
"filename": "sealed-classes.md",
"section": "Introduction",
"chunkIndex": 0
},
{
"filename": "sealed-classes.md",
"section": "Usage",
"chunkIndex": 2
}
],
"tokenUsage": {
"promptTokens": 1250,
"completionTokens": 180,
"totalTokens": 1430
},
"responseTimeMs": 2340
}
Here are example queries with expected response types:
1. Feature Explanation
curl -X POST http://localhost:8080/api/query \
-H "Content-Type: application/json" \
-d '{"question": "Explain record classes in Java 25"}'
Expected: Detailed explanation of record classes, syntax, and use cases.
2. Code Examples
curl -X POST http://localhost:8080/api/query \
-H "Content-Type: application/json" \
-d '{"question": "Show me an example of a sealed class"}'
Expected: Code snippet demonstrating sealed class syntax with explanation.
3. Comparison Questions
curl -X POST http://localhost:8080/api/query \
-H "Content-Type: application/json" \
-d '{"question": "What is the difference between records and regular classes?"}'
Expected: Comparative analysis highlighting key differences.
4. Best Practices
curl -X POST http://localhost:8080/api/query \
-H "Content-Type: application/json" \
-d '{"question": "When should I use pattern matching in switch statements?"}'
Expected: Guidelines and recommendations for appropriate usage.
5. Troubleshooting
curl -X POST http://localhost:8080/api/query \
-H "Content-Type: application/json" \
-d '{"question": "Why am I getting a compilation error with sealed classes?"}'
Expected: Common issues and solutions related to sealed classes.
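The same queries can be issued from Java using the JDK's built-in HTTP client. A sketch that builds the request shown in the curl examples (the `QueryClient` class is illustrative; it uses minimal hand-rolled JSON escaping, where a real client would use a JSON library such as Jackson):

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class QueryClient {
    // Build the JSON body for POST /api/query. Only quotes are escaped here;
    // this is deliberately minimal for illustration.
    public static String buildBody(String question) {
        return "{\"question\": \"" + question.replace("\"", "\\\"") + "\"}";
    }

    // Assemble the request against the endpoint described in the API reference.
    public static HttpRequest buildRequest(String baseUrl, String question) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/query"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(question)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("http://localhost:8080",
                "What are sealed classes in Java 25?");
        System.out.println(request.method() + " " + request.uri());
        // To actually send it (requires the app to be running):
        // java.net.http.HttpClient.newHttpClient()
        //     .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```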
My Java Genie includes comprehensive observability through OpenTelemetry integration with the Grafana stack.
The system automatically collects and exports:
- Distributed Traces: Track request flow through the entire query pipeline
- Custom Metrics: Monitor query performance, token usage, and costs
- Correlated Logs: All logs include trace and span IDs for easy correlation
- Start the observability stack:
docker-compose up -d
This starts:
- Grafana Alloy: OpenTelemetry collector (ports 4317, 4318, 12345)
- Grafana Tempo: Distributed tracing backend (port 3200)
- Grafana Mimir: Prometheus-compatible metrics storage (port 9009)
- Grafana Loki: Log aggregation system (port 3100)
- Grafana: Unified visualization dashboard (port 3000)
- Access Grafana at http://localhost:3000 (Username: admin, Password: admin)
- View the pre-built RAG System Overview dashboard
Enable OpenTelemetry in application.yml:
opentelemetry:
enabled: true
service-name: my-java-genie
traces:
enabled: true
endpoint: http://localhost:4317
sampling-rate: 1.0 # 100% for development, reduce for production
metrics:
enabled: true
endpoint: http://localhost:4317
export-interval-millis: 60000
logs:
enabled: true
endpoint: http://localhost:4317
This implementation uses Grafana's modern observability stack:
- Grafana Alloy (vs OpenTelemetry Collector): Native Grafana integration with better performance
- Grafana Tempo (vs Jaeger): More efficient storage and better scalability
- Grafana Mimir (vs Prometheus): Horizontally scalable with long-term retention
- Grafana Loki (vs ELK): Simpler architecture with lower resource requirements
- Grafana: http://localhost:3000 - Unified interface for all observability data
- Grafana Alloy: http://localhost:12345 - Collector status and configuration
- Tempo: http://localhost:3200 - Trace storage (accessed via Grafana)
- Mimir: http://localhost:9009 - Metrics storage (accessed via Grafana)
- Loki: http://localhost:3100 - Log storage (accessed via Grafana)
Traces show the complete flow of a query through the system:
process-query (root span)
├── embed-query (embedding generation)
├── vector-search (similarity search)
├── build-prompt (prompt construction)
└── llm-generate (LLM API call)
View traces in Grafana:
- Go to Explore → Select Tempo datasource
- Search by service name, operation, or duration
- Click on a trace to see detailed span information
- View span attributes (query text, tokens, model, etc.)
- Jump to related logs using trace correlation
Available metrics:
- rag.query.duration - Query processing time (histogram)
- rag.query.total - Total queries processed (counter)
- rag.query.errors - Failed queries (counter)
- rag.tokens.prompt - Prompt tokens per query (histogram)
- rag.tokens.completion - Completion tokens per query (histogram)
- rag.tokens.cost - Estimated cost in USD (histogram)
Query metrics in Grafana:
- Go to Explore → Select Mimir datasource
- Use PromQL queries:
# 95th percentile query duration
histogram_quantile(0.95, sum(rate(rag_query_duration_bucket[5m])) by (le, provider))
# Query rate by status
sum(rate(rag_query_total[5m])) by (status)
# Token usage by provider
histogram_quantile(0.95, sum(rate(rag_tokens_prompt_bucket[5m])) by (le, provider))
All logs include trace context for easy correlation:
View logs in Grafana:
- Go to Explore → Select Loki datasource
- Use LogQL queries:
# All logs from the RAG system
{service_name="my-java-genie"}
# Error logs only
{service_name="my-java-genie"} |= "ERROR"
# Logs for a specific trace
{service_name="my-java-genie"} | json | trace_id="abc123"
Jump between traces and logs:
- From a trace in Tempo, click "Logs for this span" to see related logs
- From logs in Loki, click on trace_id to view the full trace
The RAG System Overview dashboard includes:
- Query Duration (p50, p95, p99) - Track response times
- Query Rate by Status - Monitor success/error rates
- Token Usage (p95) - Track prompt and completion tokens
- Estimated Token Cost - Monitor API costs over time
- Error Rate by Type - Identify common errors
- Service Graph - Visualize dependencies
Access: Dashboards → RAG System → RAG System Overview
For production, reduce sampling to minimize overhead:
opentelemetry:
traces:
sampling-rate: 0.1 # Sample 10% of traces
Default retention periods (configurable):
- Tempo: 48 hours
- Mimir: 7 days
- Loki: 7 days
Adjust in tempo-config.yaml, mimir-config.yaml, and loki-config.yaml.
- Use TLS for production OTLP endpoints
- Secure Grafana with proper authentication
- Restrict access to observability UIs
- Use API keys for Grafana datasources
- Batch processing reduces export overhead
- Async export doesn't block application threads
- Memory limits prevent resource exhaustion
- Fallback to no-op if export fails
For detailed information, see docs/OBSERVABILITY.md and OBSERVABILITY_QUICKSTART.md.
The system implements several strategies to minimize token usage and reduce costs:
Configuration:
query:
max-retrieved-chunks: 5 # Limit context size
Impact: Reduces prompt tokens by limiting retrieved context. Balance between cost and answer quality.
Recommendation:
- Development: 5-7 chunks
- Production: 3-5 chunks
- Complex queries: 7-10 chunks
Configuration:
query:
similarity-threshold: 0.75 # Filter low-relevance chunks
Impact: Only includes highly relevant chunks, reducing noise and token count.
Recommendation:
- Strict (0.80+): High precision, may miss context
- Balanced (0.70-0.80): Good trade-off
- Lenient (0.60-0.70): More context, higher cost
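Under the hood, the threshold is applied to a similarity score between the query embedding and each chunk embedding. A minimal sketch assuming cosine similarity (the actual scoring function depends on the configured vector database; `SimilarityFilter` is a hypothetical helper):

```java
import java.util.List;
import java.util.stream.Collectors;

public class SimilarityFilter {
    // Cosine similarity between two embedding vectors of equal length.
    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Keep only chunk embeddings whose similarity to the query meets the
    // threshold, mirroring the similarity-threshold setting.
    public static List<double[]> filter(double[] query, List<double[]> chunks,
                                        double threshold) {
        return chunks.stream()
                .filter(c -> cosine(query, c) >= threshold)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        double[] query = {1, 0};
        List<double[]> chunks = List.of(
                new double[]{1, 0},   // similarity 1.0   -> kept
                new double[]{1, 1},   // similarity ~0.707 -> dropped at 0.75
                new double[]{0, 1});  // similarity 0.0   -> dropped
        System.out.println(filter(query, chunks, 0.75).size() + " chunk(s) kept");
    }
}
```

Note how the 0.75 default sits just above the ~0.707 score of a 45-degree neighbor: small threshold changes can noticeably change how much context (and how many tokens) reaches the prompt.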
Configuration:
model:
max-tokens: 500 # Limit response length
Impact: Caps completion tokens per response.
Recommendation:
- Brief answers: 300-500 tokens
- Detailed explanations: 500-800 tokens
- Comprehensive responses: 800-1500 tokens
Configuration:
model:
temperature: 0.7 # Control randomness
Impact: Lower temperature (0.3-0.5) produces more focused, deterministic responses with potentially fewer tokens.
Recommendation:
- Factual queries: 0.3-0.5
- Creative explanations: 0.7-0.9
Configuration:
query:
enable-cache: true
cache-ttl-minutes: 120
Impact: Caches responses for identical queries, eliminating redundant API calls.
Recommendation: Enable in production for frequently asked questions.
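A response cache of this kind is essentially a map from question to answer with a time-to-live. A minimal sketch (`QueryCache` is a hypothetical helper; the real implementation may also normalize questions before lookup and bound the cache size):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class QueryCache {
    private record Entry(String answer, long expiresAtMillis) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public QueryCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Return the cached answer if present and not expired, else null.
    public String get(String question) {
        Entry e = cache.get(question);
        if (e == null || System.currentTimeMillis() > e.expiresAtMillis) {
            cache.remove(question);
            return null;             // miss: caller queries the LLM
        }
        return e.answer;             // hit: no API call, no token cost
    }

    public void put(String question, String answer) {
        cache.put(question, new Entry(answer, System.currentTimeMillis() + ttlMillis));
    }

    public static void main(String[] args) {
        QueryCache cache = new QueryCache(120 * 60 * 1000L); // cache-ttl-minutes: 120
        cache.put("What are sealed classes?", "Sealed classes restrict...");
        System.out.println(cache.get("What are sealed classes?") != null ? "hit" : "miss");
    }
}
```

Every cache hit saves the full prompt-plus-completion token cost of a query, which is why this pays off most for frequently repeated questions.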
For ingestion, the system processes embeddings in batches:
Configuration:
ingestion:
batch-size: 100 # Embeddings per batch
Impact: Reduces API calls during document ingestion.
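The batching itself is a straightforward partition of the chunk list (`Batcher` is an illustrative helper; the 523 figure reuses the sample ingestion response above):

```java
import java.util.ArrayList;
import java.util.List;

public class Batcher {
    // Partition items into batches of at most batchSize, so a single
    // embedding API call can cover many chunks at once.
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> chunkIds = new ArrayList<>();
        for (int i = 0; i < 523; i++) chunkIds.add(i);  // 523 chunks to embed
        List<List<Integer>> batches = partition(chunkIds, 100); // batch-size: 100
        System.out.println(batches.size() + " API calls instead of 523");
    }
}
```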
Track token consumption via metrics:
# Access metrics endpoint
curl http://localhost:8080/actuator/metrics/rag.tokens.total
# View token usage in Docker logs
docker-compose logs -f my-java-genie-app | grep "Token usage"
Cost Estimation (OpenAI GPT-4 Turbo):
- Input: $0.01 per 1K tokens
- Output: $0.03 per 1K tokens
- Average query: ~1500 input + 300 output = $0.024
Monthly Cost Example:
- 1000 queries/month: ~$24
- 10,000 queries/month: ~$240
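The per-query figure follows directly from the listed prices. A small helper that reproduces the arithmetic (the prices are the README's example rates; check the provider's current pricing before relying on these numbers):

```java
public class TokenCost {
    // Example GPT-4 Turbo rates from the cost estimate above.
    static final double INPUT_PER_1K = 0.01;   // $ per 1K prompt tokens
    static final double OUTPUT_PER_1K = 0.03;  // $ per 1K completion tokens

    public static double estimateUsd(int promptTokens, int completionTokens) {
        return promptTokens / 1000.0 * INPUT_PER_1K
             + completionTokens / 1000.0 * OUTPUT_PER_1K;
    }

    public static void main(String[] args) {
        double perQuery = estimateUsd(1500, 300); // the "average query" above
        System.out.printf("per query: $%.3f, per 10k queries: $%.0f%n",
                perQuery, perQuery * 10_000);
        // prints: per query: $0.024, per 10k queries: $240
    }
}
```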
Endpoint: GET /api/health
Response:
{
"status": "UP",
"components": {
"languageModel": {
"status": "UP",
"provider": "openai",
"model": "gpt-4-turbo-preview"
},
"vectorDatabase": {
"status": "UP",
"type": "chroma",
"collection": "java25_docs"
}
}
}
Endpoint: POST /api/query
Request:
{
"question": "string (required)"
}
Response:
{
"answer": "string",
"sources": [
{
"filename": "string",
"section": "string",
"chunkIndex": "integer"
}
],
"tokenUsage": {
"promptTokens": "integer",
"completionTokens": "integer",
"totalTokens": "integer"
},
"responseTimeMs": "long"
}
Status Codes:
- 200 OK: Successful query
- 400 Bad Request: Invalid request format
- 503 Service Unavailable: Model or vector DB unavailable
- 504 Gateway Timeout: Query exceeded timeout
Endpoint: POST /api/ingest
Parameters:
- path (query parameter): Directory path containing documents
Response:
{
"documentsProcessed": "integer",
"chunksCreated": "integer",
"embeddingsGenerated": "integer",
"failures": "integer",
"processingTimeMs": "long",
"status": "string"
}
The system exposes Spring Boot Actuator endpoints:
Health: GET /actuator/health
curl http://localhost:8080/actuator/health
Metrics: GET /actuator/metrics
# List all metrics
curl http://localhost:8080/actuator/metrics
# Specific metric
curl http://localhost:8080/actuator/metrics/rag.query.duration
Prometheus: GET /actuator/prometheus
curl http://localhost:8080/actuator/prometheus
Log Levels:
logging:
level:
br.com.arquivolivre.myjavagenie: DEBUG # Application logs
br.com.arquivolivre.myjavagenie.service: DEBUG # Service layer
dev.langchain4j: WARN # LangChain4j library
View Logs:
# Docker logs (recommended)
docker-compose logs -f my-java-genie-app
# Filter for errors
docker-compose logs my-java-genie-app | grep ERROR
# Filter for specific patterns
docker-compose logs my-java-genie-app | grep "Token usage"
# Local development logs
mvn spring-boot:run | grep ERROR
1. "Language Model unavailable"
Cause: Model provider not running or misconfigured.
Solution:
# For Ollama
ollama serve
ollama list # Verify model is pulled
# For OpenAI
echo $OPENAI_API_KEY # Verify API key is set
# For Gemini
echo $GOOGLE_API_KEY # Verify API key is set
# OR
echo $GOOGLE_CLOUD_PROJECT # Verify project ID for ADC
gcloud auth application-default login
2. "Vector Database connection failed"
Cause: Vector DB not running or wrong connection URL.
Solution:
# Check ChromaDB
curl http://localhost:8000/api/v1/heartbeat
# Check Docker container
docker ps | grep chroma
docker logs chromadb
3. "No relevant documents found"
Cause: Documents not ingested or similarity threshold too high.
Solution:
# Verify ingestion
curl http://localhost:8080/api/health
# Lower threshold in config
query:
similarity-threshold: 0.6 # Lower from 0.75
4. "Query timeout"
Cause: Model response too slow or timeout too short.
Solution:
query:
timeout-seconds: 20 # Increase from 10
model:
self-hosted:
timeout-seconds: 90 # Increase for self-hosted
gemini:
timeout-seconds: 30 # Adjust for Gemini
5. High Token Usage
Cause: Too many chunks or large responses.
Solution:
query:
max-retrieved-chunks: 3 # Reduce from 5
similarity-threshold: 0.80 # Increase threshold
model:
max-tokens: 300 # Reduce from 500
6. Chat UI Not Loading
Cause: Static files not built or not found.
Solution:
# Rebuild the UI
cd chat-ui
npm run build
# Verify static files exist
ls -la src/main/resources/static/
# Rebuild Docker image
docker-compose build my-java-genie-app
docker-compose up -d my-java-genie-app
7. WebSocket Connection Failed
Cause: WebSocket endpoint not accessible or CORS issues.
Solution:
# Check WebSocket endpoint
curl -i -N -H "Connection: Upgrade" -H "Upgrade: websocket" \
http://localhost:8080/ws/chat?sessionId=test
# Check application logs for WebSocket errors
docker-compose logs -f my-java-genie-app | grep WebSocket
# Verify CORS configuration in application.yml
8. Chat UI Shows Old Data
Cause: Browser cache or stale session.
Solution:
- Clear browser cache (Ctrl+Shift+Delete)
- Clear localStorage: Open browser console and run localStorage.clear()
- Hard refresh: Ctrl+Shift+R (Windows/Linux) or Cmd+Shift+R (Mac)
9. Traces Not Appearing in Grafana
Cause: OpenTelemetry not configured or Alloy not running.
Solution:
# Check OpenTelemetry is enabled
grep "opentelemetry.enabled" src/main/resources/application.yml
# Verify Alloy is running
docker ps | grep alloy
curl http://localhost:12345/ready
# Check Alloy logs
docker logs my-java-genie-alloy
# Verify Tempo is receiving traces
docker logs my-java-genie-tempo
curl http://localhost:3200/api/search
10. Metrics Not Updating
Cause: Mimir not running or metrics not being exported.
Solution:
# Check Mimir status
docker logs my-java-genie-mimir
curl http://localhost:9009/ready
# Verify Alloy is forwarding metrics
curl http://localhost:12345/metrics
# Check metric export interval in application.yml
# Query Mimir directly
curl 'http://localhost:9009/prometheus/api/v1/query?query=up'
11. Logs Not Appearing in Loki
Cause: Loki not running or log format incorrect.
Solution:
# Check Loki status
docker logs my-java-genie-loki
curl http://localhost:3100/ready
# Verify log format includes trace_id
docker-compose logs my-java-genie-app | grep trace_id
# Query Loki directly
curl 'http://localhost:3100/loki/api/v1/labels'
12. Grafana Can't Connect to Datasources
Cause: Datasources not configured or services not reachable.
Solution:
# Check all observability services are running
docker-compose ps
# Verify network connectivity
docker exec my-java-genie-grafana ping tempo
docker exec my-java-genie-grafana ping mimir
docker exec my-java-genie-grafana ping loki
# Check Grafana datasource configuration
# Go to Grafana → Configuration → Data Sources
# Test each datasource connection
13. Gemini API Authentication Failed
Cause: Invalid API key or missing credentials.
Solution:
# Verify API key is set
echo $GOOGLE_API_KEY
# Or verify ADC is configured
gcloud auth application-default print-access-token
# Check project ID
echo $GOOGLE_CLOUD_PROJECT
# Verify credentials file exists (if using service account)
echo $GOOGLE_APPLICATION_CREDENTIALS
ls -la $GOOGLE_APPLICATION_CREDENTIALS
14. Gemini Rate Limiting
Cause: Exceeded API quota or rate limits.
Solution:
- Check Google Cloud Console for quota limits
- Implement request throttling in application
- Upgrade to higher quota tier if needed
- The system automatically retries with exponential backoff
15. Gemini Safety Filters Triggered
Cause: Query or response triggered content safety filters.
Solution:
- Review the query content
- Check application logs for safety filter details
- Adjust query phrasing if needed
- Consider using different model variant
Enable verbose logging:
logging:
level:
br.com.arquivolivre.myjavagenie: DEBUG
dev.langchain4j: DEBUG
io.opentelemetry: DEBUG # For OpenTelemetry debugging
Or via command line:
mvn spring-boot:run -Dlogging.level.br.com.arquivolivre.myjavagenie=DEBUG
If issues persist:
- Check the health endpoint: curl http://localhost:8080/actuator/health
- Review application logs: docker-compose logs -f my-java-genie-app
- View traces in Grafana for detailed request flow
- Consult specific documentation:
- docs/TROUBLESHOOTING.md - Comprehensive troubleshooting guide
- docs/CHAT_UI.md - Chat UI specific issues
- docs/OBSERVABILITY.md - Observability details
- OBSERVABILITY_QUICKSTART.md - Quick observability setup
The following files support Docker deployment:
- docker-compose.yml: Main Docker Compose configuration (deploys all services)
- Dockerfile: Multi-stage build for the Java application
- .dockerignore: Files to exclude from Docker build context
- .env.example: Example environment variables (copy to .env)
- src/main/resources/application-docker.yml: Docker-specific configuration
- docs/DOCKER.md: Comprehensive Docker deployment guide
my-java-genie/
├── docker-compose.yml # Docker orchestration
├── Dockerfile # Application container
├── pom.xml # Maven configuration
│
├── src/main/
│ ├── java/ # Java source code
│ │ └── br/com/arquivolivre/myjavagenie/
│ │ ├── config/ # Configuration classes
│ │ ├── controller/ # REST API endpoints
│ │ ├── service/ # Business logic
│ │ ├── repository/ # Data access layer
│ │ ├── model/ # Domain models & DTOs
│ │ └── exception/ # Custom exceptions
│ │
│ └── resources/
│ ├── application.yml # Default configuration
│ ├── application-dev.yml # Development profile
│ ├── application-prod.yml # Production profile
│ └── application-docker.yml # Docker profile
│
├── chat-ui/ # React TypeScript UI
│ ├── src/ # React components & services
│ └── package.json
│
├── grafana/ # Observability stack
│ ├── dashboards/ # Pre-built dashboards
│ └── provisioning/ # Datasource configs
│
├── docs/ # Documentation
│ ├── CHAT_UI.md
│ ├── OBSERVABILITY.md
│ ├── TROUBLESHOOTING.md
│ └── DOCKER.md
│
├── alloy-config.alloy # OpenTelemetry collector config
├── tempo-config.yaml # Distributed tracing config
├── mimir-config.yaml # Metrics storage config
└── loki-config.yaml # Log aggregation config
This project follows clean code principles and SOLID design. When contributing:
- Follow existing code structure and naming conventions
- Write unit tests for new functionality
- Update documentation for configuration changes
- Ensure all tests pass before submitting
This project is licensed under the MIT License - see the LICENSE file for details.
Copyright (c) 2025 Thiago Gonzaga
For issues and questions:
- Check the Troubleshooting section
- Review application logs: docker-compose logs -f my-java-genie-app
- Check health endpoint: http://localhost:8080/api/health
