Distributed Task Queue (Python)

A minimal distributed task queue implemented in Python. A Master process distributes jobs to Worker nodes over raw TCP sockets or REST, with retry logic, job status tracking, and concurrent task execution.

This project is designed as an educational and architectural reference for distributed systems concepts rather than a production replacement for systems like Celery or Kafka.


How to Run

# setup env
python -m venv .env

# activate env
.env\Scripts\activate # Windows
# or
source .env/bin/activate # Linux / macOS

# install the package in editable mode (run from project root)
pip install -e .

# Run example
python examples/simple_task.py

📌 Features

  • Master–Worker architecture
  • Distributed job execution
  • REST / Socket-based communication
  • Job lifecycle tracking
  • Retry logic with configurable limits
  • Worker health monitoring (heartbeat-based)
  • Concurrent task execution
  • Fault-tolerant job reassignment
  • Clean separation of concerns

🏗 Architecture Overview

+-------------+        REST / TCP        +-------------+
|             |  -------------------->  |             |
|   Master    |  <--------------------  |   Worker    |
|             |     Status Updates       |   Node      |
+-------------+                          +-------------+
       |
       |  Job State Tracking
       v
+------------------+
| Job Repository   |
| (In-Memory / DB) |
+------------------+

🧠 Core Concepts

Master Process

Responsible for:

  • Accepting job submissions
  • Scheduling and dispatching jobs
  • Tracking job states
  • Retrying failed jobs
  • Monitoring worker health

Worker Nodes

Responsible for:

  • Registering with the Master
  • Fetching tasks
  • Executing jobs concurrently
  • Reporting results and failures
  • Sending periodic heartbeats

🔄 Job Lifecycle

Each job transitions through well-defined states:

SUBMITTED
   ↓
QUEUED
   ↓
ASSIGNED
   ↓
RUNNING ──→ COMPLETED (on success)
   ↓
FAILED ──→ RETRY_QUEUED (if retries remain)
   ↓
DEAD_LETTER (max retries exceeded)
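
The lifecycle above can be sketched as a small transition table. This is an illustration only, not the project's actual code: the names `JobState`, `ALLOWED`, and `transition` are hypothetical, and two edges (`ASSIGNED → FAILED` for a worker lost before the job starts, and `RETRY_QUEUED → ASSIGNED` for re-dispatch) are assumptions that the diagram does not spell out.

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = "SUBMITTED"
    QUEUED = "QUEUED"
    ASSIGNED = "ASSIGNED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    RETRY_QUEUED = "RETRY_QUEUED"
    DEAD_LETTER = "DEAD_LETTER"


# Hypothetical transition table. ASSIGNED -> FAILED and
# RETRY_QUEUED -> ASSIGNED are assumptions, not taken from the project.
ALLOWED = {
    JobState.SUBMITTED: {JobState.QUEUED},
    JobState.QUEUED: {JobState.ASSIGNED},
    JobState.ASSIGNED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.COMPLETED, JobState.FAILED},
    JobState.FAILED: {JobState.RETRY_QUEUED, JobState.DEAD_LETTER},
    JobState.RETRY_QUEUED: {JobState.ASSIGNED},
    JobState.COMPLETED: set(),    # terminal
    JobState.DEAD_LETTER: set(),  # terminal
}


def transition(current, target):
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Rejecting illegal moves in one place makes it harder for a late or duplicate status update to move a finished job back into an active state.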

🔁 Retry Logic

  • Jobs are retried on:

    • Worker failure
    • Task execution failure
    • Worker heartbeat timeout
  • Retry attempts are capped
  • Backoff strategy can be configured
  • Jobs that exceed the retry limit are moved to a dead-letter queue
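
As a rough sketch of how a capped retry policy with backoff might look, assuming exponential backoff (the project only says the strategy is configurable). The function names, parameters, and default values here are hypothetical.

```python
def backoff_delay(retries_used, base=1.0, factor=2.0, cap=30.0):
    """Exponential backoff in seconds, never exceeding `cap`."""
    return min(cap, base * factor ** retries_used)


def next_state_after_failure(retries_used, max_retries):
    """Decide what happens to a job that has just failed.

    Returns a (state, delay) pair: the job is re-queued with a delay while
    retries remain, otherwise it is dead-lettered and delay is None.
    """
    if retries_used < max_retries:
        return "RETRY_QUEUED", backoff_delay(retries_used)
    return "DEAD_LETTER", None
```

With these defaults the delays grow as 1 s, 2 s, 4 s, 8 s and level off at 30 s; a job with `max_retries=3` that has already used three retries goes to the dead-letter queue.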


⚙ Concurrency Model

Master

  • Handles multiple workers concurrently
  • Thread-safe job state transitions
  • Supports async or multi-threaded execution

Worker

  • Executes multiple tasks in parallel
  • Uses thread or process pools
  • Task failures are isolated
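
The worker-side model can be illustrated with the standard library's thread pool. This is a minimal sketch, assuming each task is a zero-argument callable; `run_tasks` and its return shape are hypothetical and not taken from the project.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_tasks(tasks, max_workers=4):
    """Run tasks in parallel and isolate failures.

    `tasks` maps a job ID to a zero-argument callable. The result maps each
    job ID to ("COMPLETED", value) or ("FAILED", error description).
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): job_id for job_id, fn in tasks.items()}
        for future in as_completed(futures):
            job_id = futures[future]
            try:
                results[job_id] = ("COMPLETED", future.result())
            except Exception as exc:
                # One failing task does not affect the others.
                results[job_id] = ("FAILED", repr(exc))
    return results
```

For CPU-bound tasks, `ProcessPoolExecutor` offers the same interface, at the cost of requiring picklable callables.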

📡 Communication

Supported communication styles:

  • REST (HTTP + JSON)
  • Raw TCP sockets

Typical interactions:

  • Job submission
  • Task assignment
  • Result reporting
  • Heartbeat monitoring
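
For the raw TCP option, messages need explicit framing because TCP is a byte stream with no message boundaries. The sketch below assumes a 4-byte big-endian length prefix followed by a UTF-8 JSON body; this wire format and the helper names are assumptions for illustration, not the project's documented protocol.

```python
import json
import socket
import struct

# Assumed wire format: 4-byte big-endian length, then a UTF-8 JSON payload.
HEADER = struct.Struct("!I")


def send_message(sock, message):
    """Serialize `message` to JSON and send it with a length prefix."""
    payload = json.dumps(message).encode("utf-8")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock, n):
    """Read exactly n bytes, looping because recv may return fewer."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed connection mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Receive one length-prefixed JSON message."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))
```

The same JSON bodies (for example `{"type": "heartbeat", "worker_id": "w1"}`, a hypothetical shape) could be posted over REST instead, where HTTP handles the framing.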

🧪 Fault Tolerance

  • Worker crash detection via heartbeat timeout
  • Automatic reassignment of in-progress jobs
  • Network failure recovery
  • At-least-once delivery: a job may run more than once, so job handlers should be idempotent
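
Heartbeat-based crash detection and reassignment could look roughly like the following. `HeartbeatMonitor`, its methods, and the default timeout are hypothetical; the clock is injectable so the logic can be exercised without waiting.

```python
import time


class HeartbeatMonitor:
    """Track worker heartbeats and collect jobs orphaned by dead workers."""

    def __init__(self, timeout=10.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {}  # worker_id -> time of last heartbeat
        self.assigned = {}   # worker_id -> set of in-progress job IDs

    def heartbeat(self, worker_id):
        self.last_seen[worker_id] = self.clock()
        self.assigned.setdefault(worker_id, set())

    def assign(self, worker_id, job_id):
        self.assigned.setdefault(worker_id, set()).add(job_id)

    def reap(self):
        """Drop workers whose heartbeat timed out; return job IDs to requeue."""
        now = self.clock()
        dead = [w for w, seen in self.last_seen.items()
                if now - seen > self.timeout]
        orphaned = []
        for worker_id in dead:
            orphaned.extend(sorted(self.assigned.pop(worker_id, set())))
            del self.last_seen[worker_id]
        return orphaned
```

Because a worker that was only slow may still finish a job that has already been reassigned, this approach gives at-least-once execution, which is why job handlers need to tolerate duplicates.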

📊 Observability

  • Job state transitions
  • Worker registration / removal
  • Failure and retry metrics
  • Execution logs for debugging

🔐 Security Considerations

  • Token-based authentication (optional)
  • TLS support for communication
  • Job payload validation
  • Worker identity verification
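
One possible shape for token-based worker verification uses an HMAC over the worker ID with a secret shared between Master and Workers. This scheme, along with the names `issue_token` and `verify_token`, is an assumption for illustration; the project does not document its token format.

```python
import hashlib
import hmac


def issue_token(worker_id, secret):
    """Derive a token for `worker_id` from a shared secret (bytes)."""
    return hmac.new(secret, worker_id.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_token(worker_id, token, secret):
    """Check a presented token using a constant-time comparison."""
    expected = issue_token(worker_id, secret)
    return hmac.compare_digest(expected, token)
```

A token like this only proves knowledge of the secret; it still needs TLS underneath so it cannot be captured and replayed from the wire.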

🚀 Scalability

  • Horizontally scalable worker nodes
  • Stateless workers
  • Pluggable job storage backend
  • Supports high-throughput task dispatch
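
A pluggable storage backend can be expressed as a small interface that the in-memory repository and any database-backed one both satisfy. The `JobStore` protocol and `InMemoryJobStore` class below are hypothetical names sketched for illustration, not the project's actual API.

```python
import threading
from typing import Optional, Protocol


class JobStore(Protocol):
    """Minimal interface a storage backend would need to provide."""

    def put(self, job_id: str, record: dict) -> None: ...

    def get(self, job_id: str) -> Optional[dict]: ...


class InMemoryJobStore:
    """Thread-safe in-memory backend; contents are lost when the process exits."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._jobs: dict = {}

    def put(self, job_id: str, record: dict) -> None:
        with self._lock:
            self._jobs[job_id] = dict(record)  # store a copy

    def get(self, job_id: str) -> Optional[dict]:
        with self._lock:
            record = self._jobs.get(job_id)
            return dict(record) if record is not None else None
```

Keeping job state behind an interface like this, rather than on the workers, is what allows workers to stay stateless and be added or removed freely.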

🎯 Use Cases

  • Background job processing
  • Distributed batch computation
  • Microservice task orchestration
  • Learning distributed systems internals
  • Interview / system design demonstrations

📦 Project Status

Development Status: Alpha

This project is intended for:

  • Learning
  • Prototyping
  • Architectural reference

Not recommended for production use without additional hardening.


🛠 Future Enhancements

  • Job priorities
  • Delayed / scheduled jobs
  • DAG-based workflows
  • Exactly-once semantics
  • Persistent storage backend
  • Web-based monitoring dashboard

📜 License

This project is licensed under the GNU General Public License v3.0 or later (GPL-3.0-or-later).

See the LICENSE file for details.


👤 Author

Developer Jarvis (Pen Name)
GitHub: https://github.com/DeveloperJarvis


💡 Notes

This project intentionally avoids heavy dependencies to expose the core mechanics of distributed task execution clearly and transparently.
