ForgeQueue

Distributed Job Processing Platform (Horizontally Scalable by Design)

ForgeQueue is a distributed job processing system that allows users to submit long-running tasks—such as report generation, file processing, or data analysis—without waiting for immediate results. When a task is submitted, the system returns a job ID and processes it asynchronously in the background.

Users can query job status at any time (QUEUED, PROCESSING, COMPLETED, and DEAD_LETTER). The system includes automatic retry with backoff for transient failures, dead-letter handling for persistent failures, rate limiting to prevent abuse, and idempotent submission to avoid duplicate processing.

ForgeQueue provides reliable, trackable, and horizontally scalable background execution through safe multi-worker coordination.

Architecture Overview

Inspiration

This project was inspired by observing submission queues on competitive programming platforms like Codeforces during high-traffic contests.

Submissions often remain in an “in queue” state before evaluation, highlighting real-world issues such as queue backlogs, fairness, and increased processing latency under load.

This led to a deeper exploration of how distributed systems manage asynchronous workloads, handle concurrency, and maintain fault tolerance — ultimately resulting in the design of ForgeQueue.

Tech Stack

Backend

Java 17
Spring Boot 3.5
Spring Cloud Gateway
Hibernate / JPA
PostgreSQL 15
Redis 7

Reliability & Resilience

Atomic row-level leasing (SELECT FOR UPDATE SKIP LOCKED)
30s visibility timeout
Exponential backoff (5--300s)
±20% retry jitter
Dead-letter handling
Resilience4j (Circuit Breaker + Short-Term Retry)

Infrastructure & DevOps

Docker (multi-stage builds)
Docker Compose
GitHub Actions (CI/CD)
Docker Hub (image registry)
Testcontainers (integration testing)

API Documentation

OpenAPI (Swagger), aggregated via Gateway

Core Features

1️⃣ Asynchronous & Idempotent Job Submission

Immediate job acceptance with jobId (non-blocking API)
Duplicate-safe via composite constraint (user_id, idempotency_key)
Transaction-safe under concurrent submissions

2️⃣ Safe Distributed Job Processing

Atomic job leasing using:
```
SELECT ... FOR UPDATE SKIP LOCKED
```
Ensures a job is processed by only one worker
Enables safe multi-worker execution without coordination
Horizontally Scalable -- stateless workers scale independently
No central coordinator required & System scales linearly by adding more workers

3️⃣ Visibility Timeout & Crash Recovery

Jobs in PROCESSING are leased with expiry (lease_expires_at)
If a worker crashes, lease expires and job becomes eligible again
Guarantees no job remains permanently stuck

4️⃣ Retry Strategy

Short-Term Retry (Resilience4j)

Millisecond-level retries
Circuit breaker protection
Protects downstream systems

Long-Term Retry (Queue Logic)

Exponential backoff (5--300 seconds)
±20% jitter to prevent retry storms
Dead-letter transition after max attempts

5️⃣ Multi-Tenant Fairness

Gateway-Level Rate Limiting

Redis-backed RedisRateLimiter (token bucket)
Header based Per-user limiting , fallback to IP
Enforced consistently across instances using Redis

Worker-Level Concurrency Throttling

Limits number of active jobs per user
Redis TTL-based counters prevent stale locks on crashes
Ensures no single user can saturate worker capacity

6️⃣ Distributed Correctness Validation

System correctness validated under real infrastructure using Testcontainers

Validated properties:

Safe concurrent job leasing using SELECT FOR UPDATE SKIP LOCKED
No duplicate job leasing across concurrent worker instances
Automatic job retry after worker crash (visibility timeout recovery)
Idempotent job submission under concurrent retries

All tests run against real PostgreSQL and Redis containers and execute automatically in CI.

7️⃣ Observability & Execution Insights

On success:

Stores result_payload
Stores completed_at

On failure:

Stores last_error_message
Stores last_error_stacktrace
Stores failed_at on dead-letter transition

Provides traceable execution history and failure diagnostics.

8️⃣ Performance & Polling Optimization

Indexed polling on (status, next_run_at, priority)
Indexed lease expiry lookup
Short polling interval with bounded batch size
Optimized for high-concurrency worker coordination

Reduces lock contention and prevents sequential scan degradation.

9️⃣Containerization

Multi-stage Docker builds
Separate images for Core and Gateway
Docker Compose orchestration
Healthchecks enabled

Images are automatically published to Docker Hub on merge to main.

🔟 CI/CD Pipeline

Continuous Integration (CI)

On every push and pull request: - Builds multi-module Maven project - Runs integration tests (Testcontainers) - Validates distributed behavior - Builds Docker images

Continuous Delivery (CD)

On merge to main branch: - Logs into Docker Hub via GitHub Secrets - Builds production images - Tags images as latest - Pushes images automatically to Docker Hub

docker.io/<username>{=html}/forgequeue-core:latest\
docker.io/<username>{=html}/forgequeue-gateway:latest

Running Locally

1️⃣ Build Project

mvn clean install

2️⃣ Start Services

docker compose up --build

3️⃣ Access APIs

Core:

http://localhost:8080

Gateway (Public API + Swagger):

http://localhost:8081/swagger-ui.html

References

AWS Architecture Blog — Exponential Backoff and Jitter
https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
AWS Builders Library — Timeouts, Retries and Backoff with Jitter
https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/
Resilience4j Documentation (Retry & Circuit Breaker)
https://resilience4j.readme.io/docs
PostgreSQL Documentation — Row-level locking (FOR UPDATE SKIP LOCKED)
https://www.postgresql.org/docs/current/sql-select.html
RabbitMQ Documentation — Dead Letter Queues
https://www.rabbitmq.com/dlx.html
Spring Framework — Scheduling
https://docs.spring.io/spring-framework/reference/integration/scheduling.html
Spring Data JPA Documentation
https://docs.spring.io/spring-data/jpa/docs/current/reference/html/
Spring Cloud Gateway — Redis Rate Limiter
https://docs.spring.io/spring-cloud-gateway/docs/current/reference/html/#redis-rate-limiter
Testcontainers — Integration testing library for managing containerized dependencies during test.

https://java.testcontainers.org/

Future Improvements

Tier-Based Scheduling – Priority handling for paid users with fairness control
Job Notifications – Email/webhook callbacks with retries
Audit Logging – Track job state transitions
Metrics & Monitoring – Throughput, failures, alerting (e.g., Prometheus)
Worker Heartbeat – Extend leases for long-running jobs

Load Testing

Performed load testing using Apache JMeter on the POST /api/jobs endpoint to validate system stability under concurrent traffic.

Simulated approximately 200 requests/sec in a local single-node environment, observing controlled degradation via rate limiting
Rate limiting behavior (HTTP 429) verified under burst traffic scenarios
Stable database connection pool utilization observed (HikariCP)
No crashes, deadlocks, or job duplication detected during test runs

Testing conducted on a single-machine setup; distributed load generation and multi-node validation planned for future scaling evaluation.

Conclusion

ForgeQueue is designed as a backend systems showcase --- demonstrating distributed coordination, concurrency control, resilience engineering, and production-grade architectural decisions.

Built for learning, system design mastery, and real-world backend engineering.

Name		Name	Last commit message	Last commit date
Latest commit History 74 Commits
.github/workflows		.github/workflows
forgequeue-core		forgequeue-core
forgequeue-gateway		forgequeue-gateway
.gitignore		.gitignore
README.md		README.md
docker-compose.yml		docker-compose.yml
pom.xml		pom.xml

Folders and files

Latest commit

History

Repository files navigation

ForgeQueue

Distributed Job Processing Platform (Horizontally Scalable by Design)

Architecture Overview

Inspiration

Tech Stack

Backend

Reliability & Resilience

Infrastructure & DevOps

API Documentation

Core Features

1️⃣ Asynchronous & Idempotent Job Submission

2️⃣ Safe Distributed Job Processing

3️⃣ Visibility Timeout & Crash Recovery

4️⃣ Retry Strategy

Short-Term Retry (Resilience4j)

Long-Term Retry (Queue Logic)

5️⃣ Multi-Tenant Fairness

Gateway-Level Rate Limiting

Worker-Level Concurrency Throttling

6️⃣ Distributed Correctness Validation

7️⃣ Observability & Execution Insights

8️⃣ Performance & Polling Optimization

9️⃣Containerization

🔟 CI/CD Pipeline

Continuous Integration (CI)

Continuous Delivery (CD)

Running Locally

1️⃣ Build Project

2️⃣ Start Services

3️⃣ Access APIs

References

Future Improvements

Load Testing

Conclusion

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages