A minimal distributed task queue system implemented in Python, where a Master process distributes jobs to Worker nodes using Sockets or REST, with retry logic, job status tracking, and concurrency support.
This project is designed as an educational and architectural reference for distributed systems concepts rather than a production replacement for systems like Celery or Kafka.
# setup env
python -m venv .env
# activate env
.env\Scripts\active # Windows
# or
source .env/bin/activate
# run from project root
pip install -e .
# Run example
python examples/simple_task.py- Master–Worker architecture
- Distributed job execution
- REST / Socket-based communication
- Job lifecycle tracking
- Retry logic with configurable limits
- Worker health monitoring (heartbeat-based)
- Concurrent task execution
- Fault-tolerant job reassignment
- Clean separation of concerns
+-------------+ REST / TCP +-------------+
| | --------------------> | |
| Master | <-------------------- | Worker |
| | Status Updates | Node |
+-------------+ +-------------+
|
| Job State Tracking
v
+------------------+
| Job Repository |
| (In-Memory / DB) |
+------------------+
Responsible for:
- Accepting job submissions
- Scheduling and dispatching jobs
- Tracking job states
- Retrying failed jobs
- Monitoring worker health
Responsible for:
- Registering with the Master
- Fetching tasks
- Executing jobs concurrently
- Reporting results and failures
- Sending periodic heartbeats
Each job transitions through well-defined states:
SUBMITTED
↓
QUEUED
↓
ASSIGNED
↓
RUNNING
↓
COMPLETED
↓
FAILED ──→ RETRY_QUEUED (if retries remain)
↓
DEAD_LETTER (max retries exceeded)
-
Jobs are retried on:
- Worker failure
- Task execution failure
- Worker heartbeat timeout
-
Retry attempts are capped
-
Backoff strategy can be configured
-
Failed jobs beyond retry limit are moved to a dead-letter queue
- Handles multiple workers concurrently
- Thread-safe job state transitions
- Supports async or multi-threaded execution
- Executes multiple tasks in parallel
- Uses thread or process pools
- Task failures are isolated
Supported communication styles:
- REST (HTTP + JSON)
- Raw TCP sockets
Typical interactions:
- Job submission
- Task assignment
- Result reporting
- Heartbeat monitoring
- Worker crash detection via heartbeat timeout
- Automatic reassignment of in-progress jobs
- Network failure recovery
- Idempotent job handling (at-least-once delivery)
- Job state transitions
- Worker registration / removal
- Failure and retry metrics
- Execution logs for debugging
- Token-based authentication (optional)
- TLS support for communication
- Job payload validation
- Worker identity verification
- Horizontally scalable worker nodes
- Stateless workers
- Pluggable job storage backend
- Supports high-throughput task dispatch
- Background job processing
- Distributed batch computation
- Microservice task orchestration
- Learning distributed systems internals
- Interview / system design demonstrations
Development Status: Alpha This project is intended for:
- Learning
- Prototyping
- Architectural reference
Not recommended for production use without additional hardening.
- Job priorities
- Delayed / scheduled jobs
- DAG-based workflows
- Exactly-once semantics
- Persistent storage backend
- Web-based monitoring dashboard
This project is licensed under the GNU General Public License v3.0 or later (GPL-3.0-or-later).
See the LICENSE file for details.
Developer Jarvis (Pen Name) GitHub: https://github.com/DeveloperJarvis
This project intentionally avoids heavy dependencies to expose the core mechanics of distributed task execution clearly and transparently.