Distributed Task Queue (Python)

A minimal distributed task queue implemented in Python: a Master process distributes jobs to Worker nodes over raw TCP sockets or REST, with retry logic, job status tracking, and concurrency support.

This project is designed as an educational and architectural reference for distributed systems concepts rather than a production replacement for systems like Celery or Kafka.

How to Run

# set up a virtual environment
python -m venv .env

# activate it
.env\Scripts\activate       # Windows
# or
source .env/bin/activate    # macOS / Linux

# run from project root
pip install -e .

# Run example
python examples/simple_task.py

📌 Features

Master–Worker architecture
Distributed job execution
REST / Socket-based communication
Job lifecycle tracking
Retry logic with configurable limits
Worker health monitoring (heartbeat-based)
Concurrent task execution
Fault-tolerant job reassignment
Clean separation of concerns

🏗 Architecture Overview

+-------------+        REST / TCP        +-------------+
|             |  ---------------------->  |             |
|   Master    |  <----------------------  |   Worker    |
|             |     Status Updates       |   Node      |
+-------------+                          +-------------+
       |
       |  Job State Tracking
       v
+------------------+
| Job Repository   |
| (In-Memory / DB) |
+------------------+

🧠 Core Concepts

Master Process

Responsible for:

Accepting job submissions
Scheduling and dispatching jobs
Tracking job states
Retrying failed jobs
Monitoring worker health

Worker Nodes

Responsible for:

Registering with the Master
Fetching tasks
Executing jobs concurrently
Reporting results and failures
Sending periodic heartbeats

🔄 Job Lifecycle

Each job transitions through well-defined states:

SUBMITTED
   ↓
QUEUED
   ↓
ASSIGNED
   ↓
RUNNING ──→ COMPLETED (success)
   ↓
FAILED ──→ RETRY_QUEUED (if retries remain)
   ↓
DEAD_LETTER (max retries exceeded)
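The lifecycle can be enforced as a small state machine. The sketch below is an illustration, not the project's actual code: `JobState`, `ALLOWED`, and `Job` are hypothetical names, COMPLETED and DEAD_LETTER are treated as terminal, and two transitions are assumptions not drawn in the diagram (ASSIGNED → FAILED for a worker that dies before starting the job, and RETRY_QUEUED → ASSIGNED for re-dispatch). A reentrant lock keeps each transition thread-safe.

```python
import threading
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any


class JobState(Enum):
    SUBMITTED = auto()
    QUEUED = auto()
    ASSIGNED = auto()
    RUNNING = auto()
    COMPLETED = auto()
    FAILED = auto()
    RETRY_QUEUED = auto()
    DEAD_LETTER = auto()


# Assumed transition table; COMPLETED and DEAD_LETTER have no exits.
ALLOWED = {
    JobState.SUBMITTED: {JobState.QUEUED},
    JobState.QUEUED: {JobState.ASSIGNED},
    JobState.ASSIGNED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.COMPLETED, JobState.FAILED},
    JobState.FAILED: {JobState.RETRY_QUEUED, JobState.DEAD_LETTER},
    JobState.RETRY_QUEUED: {JobState.ASSIGNED},
    JobState.COMPLETED: set(),
    JobState.DEAD_LETTER: set(),
}


@dataclass
class Job:
    job_id: str
    max_retries: int = 3
    attempts: int = 0  # number of failed attempts so far
    state: JobState = JobState.SUBMITTED
    _lock: Any = field(default_factory=threading.RLock, repr=False, compare=False)

    def transition(self, new_state: JobState) -> None:
        # Reject illegal moves; the lock stops concurrent handlers racing.
        with self._lock:
            if new_state not in ALLOWED[self.state]:
                raise ValueError(f"{self.state.name} -> {new_state.name} not allowed")
            self.state = new_state

    def fail(self) -> None:
        # Record the failure, then route to RETRY_QUEUED or DEAD_LETTER.
        # RLock lets fail() hold the lock while calling transition().
        with self._lock:
            self.transition(JobState.FAILED)
            self.attempts += 1
            if self.attempts <= self.max_retries:
                self.transition(JobState.RETRY_QUEUED)
            else:
                self.transition(JobState.DEAD_LETTER)
```

With `max_retries=3`, a job can run at most four times (the first attempt plus three retries) before landing in DEAD_LETTER.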

🔁 Retry Logic

Jobs are retried on:
- Worker failure
- Task execution failure
- Worker heartbeat timeout

Retry attempts are capped at a configurable limit
Backoff strategy is configurable
Jobs that exceed the retry limit are moved to a dead-letter queue
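Capped retries with a configurable backoff could look like the following. This is an in-process sketch under assumptions: `backoff_delay` and `run_with_retries` are hypothetical names, exponential backoff with optional "full jitter" is only one possible strategy, and `sleep` is injectable so the logic can be tested without waiting. Inside the Master, a retry would more likely be re-queued with a "not before" timestamp than slept on.

```python
import random
import time
from typing import Callable, Optional, Tuple


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  jitter: bool = False) -> float:
    """Exponential backoff: base * 2**(attempt - 1) seconds, capped at `cap`.

    `attempt` is 1 for the first retry. With jitter=True the delay is drawn
    uniformly from [0, delay] to spread out simultaneous retries.
    """
    delay = min(cap, base * (2 ** (attempt - 1)))
    return random.uniform(0, delay) if jitter else delay


def run_with_retries(task: Callable[[], object], max_retries: int = 3,
                     base: float = 0.5, cap: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep
                     ) -> Tuple[str, object]:
    """Run `task` up to 1 + max_retries times.

    Returns ("COMPLETED", result) on success, or ("DEAD_LETTER", last_error)
    once the retry limit is exceeded.
    """
    last_error: Optional[BaseException] = None
    for attempt in range(max_retries + 1):
        try:
            return "COMPLETED", task()
        except Exception as exc:  # caught so the caller decides what happens next
            last_error = exc
            if attempt < max_retries:
                sleep(backoff_delay(attempt + 1, base, cap))
    return "DEAD_LETTER", last_error
```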

⚙ Concurrency Model

Master

Handles multiple workers concurrently
Thread-safe job state transitions
Supports async or multi-threaded execution

Worker

Executes multiple tasks in parallel
Uses thread or process pools
Task failures are isolated
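A worker's parallel execution with isolated failures can be sketched with the standard library's `concurrent.futures`. `execute_batch` is a hypothetical name and the project's worker may be structured differently; the point is that each task's exception is captured on its own future, so one failing job does not take down the pool or its neighbours.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Tuple


def execute_batch(jobs: Dict[str, Callable[[], object]],
                  max_workers: int = 4) -> Dict[str, Tuple[str, object]]:
    """Run jobs in parallel; return {job_id: (status, result_or_error)}."""
    results: Dict[str, Tuple[str, object]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): job_id for job_id, fn in jobs.items()}
        for future in as_completed(futures):
            job_id = futures[future]
            try:
                results[job_id] = ("COMPLETED", future.result())
            except Exception as exc:
                # Only this job's entry is marked FAILED; the rest keep running.
                results[job_id] = ("FAILED", repr(exc))
    return results
```

For CPU-bound tasks, `ProcessPoolExecutor` can replace `ThreadPoolExecutor`, with the caveat that submitted callables must then be picklable module-level functions (lambdas will not work).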

📡 Communication

Supported communication styles:

REST (HTTP + JSON)
Raw TCP sockets

Typical interactions:

Job submission
Task assignment
Result reporting
Heartbeat monitoring
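Raw TCP delivers a byte stream rather than discrete messages, so the socket transport needs framing. One common approach, assumed here rather than taken from the project, is a 4-byte big-endian length prefix followed by a UTF-8 JSON body. The helper names and message fields (`type`, `job_id`, `worker_id`) are hypothetical.

```python
import json
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian payload length


def send_message(sock: socket.socket, message: dict) -> None:
    """Serialize `message` as JSON and send it with a length prefix."""
    payload = json.dumps(message).encode("utf-8")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    # recv() may return fewer bytes than requested, so loop until complete.
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> dict:
    """Read exactly one length-prefixed JSON message."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))
```

The REST transport could carry the same JSON bodies over HTTP, where the request itself already delimits each message and no extra framing is needed.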

🧪 Fault Tolerance

Worker crash detection via heartbeat timeout
Automatic reassignment of in-progress jobs
Network failure recovery
Idempotent job handling (at-least-once delivery)
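Heartbeat-based crash detection and job reassignment can be sketched as a monitor that remembers when each worker last checked in and which jobs it holds. `HeartbeatMonitor` and its methods are hypothetical; the timeout is illustrative, and the clock is injectable so expiry can be tested without real waiting.

```python
import threading
import time
from typing import Callable, Dict, List, Set


class HeartbeatMonitor:
    """Tracks worker liveness and reclaims jobs from workers that go silent."""

    def __init__(self, timeout: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock
        self._last_seen: Dict[str, float] = {}
        self._running: Dict[str, Set[str]] = {}
        self._lock = threading.Lock()

    def heartbeat(self, worker_id: str) -> None:
        # A first heartbeat also registers the worker.
        with self._lock:
            self._last_seen[worker_id] = self.clock()
            self._running.setdefault(worker_id, set())

    def assign(self, worker_id: str, job_id: str) -> None:
        with self._lock:
            self._running.setdefault(worker_id, set()).add(job_id)

    def complete(self, worker_id: str, job_id: str) -> None:
        with self._lock:
            self._running.get(worker_id, set()).discard(job_id)

    def reap(self) -> List[str]:
        """Drop timed-out workers; return their in-flight job IDs to re-queue."""
        now = self.clock()
        orphaned: List[str] = []
        with self._lock:
            for worker_id, seen in list(self._last_seen.items()):
                if now - seen > self.timeout:
                    del self._last_seen[worker_id]
                    orphaned.extend(sorted(self._running.pop(worker_id, set())))
        return orphaned
```

A reclaimed job may already have partially run on the silent worker, so re-queueing it yields at-least-once delivery; this is why job handlers need to be idempotent.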

📊 Observability

Job state transitions
Worker registration / removal
Failure and retry metrics
Execution logs for debugging
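These signals can be covered with only the standard library by pairing `logging` with thread-safe counters. The logger name `task_queue`, the metric names, and the hook functions `on_transition` and `on_worker_event` are assumptions for illustration, not the project's API.

```python
import logging
import threading
from collections import Counter

logger = logging.getLogger("task_queue")  # hypothetical logger name


class Metrics:
    """Thread-safe counters for state transitions, retries, and worker churn."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()
        self._lock = threading.Lock()

    def incr(self, name: str, amount: int = 1) -> None:
        with self._lock:
            self._counts[name] += amount

    def snapshot(self) -> dict:
        with self._lock:
            return dict(self._counts)


metrics = Metrics()


def on_transition(job_id: str, old: str, new: str) -> None:
    # One log line per transition plus a per-state counter.
    logger.info("job=%s %s -> %s", job_id, old, new)
    metrics.incr(f"jobs.{new.lower()}")
    if new == "RETRY_QUEUED":
        metrics.incr("jobs.retries")


def on_worker_event(worker_id: str, event: str) -> None:
    # `event` is e.g. "registered" or "removed".
    logger.info("worker=%s %s", worker_id, event)
    metrics.incr(f"workers.{event}")
```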

🔐 Security Considerations

Token-based authentication (optional)
TLS support for communication
Job payload validation
Worker identity verification
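Optional token authentication and payload validation can be done with the standard library alone. In this sketch, which is an assumed scheme rather than the project's, each worker's token is an HMAC-SHA256 digest of its worker ID under a master secret, and `REQUIRED_FIELDS` is a hypothetical job schema.

```python
import hashlib
import hmac
from typing import List

# Hypothetical minimal job schema: field name -> expected type.
REQUIRED_FIELDS = {"job_id": str, "task": str, "payload": dict}


def issue_token(secret: bytes, worker_id: str) -> str:
    """Derive a per-worker token from the master secret (HMAC-SHA256)."""
    return hmac.new(secret, worker_id.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_worker(secret: bytes, worker_id: str, token: str) -> bool:
    # compare_digest runs in constant time, so timing does not reveal
    # how many leading characters of the token were correct.
    return hmac.compare_digest(issue_token(secret, worker_id), token)


def validate_job(message: dict) -> List[str]:
    """Return validation errors; an empty list means the job looks well-formed."""
    errors: List[str] = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in message:
            errors.append(f"missing field: {name}")
        elif not isinstance(message[name], expected):
            errors.append(f"{name} must be {expected.__name__}")
    return errors
```

The master secret could be generated with `secrets.token_bytes(32)`, and for TLS the same TCP sockets can be wrapped using `ssl.create_default_context()` and `SSLContext.wrap_socket()`.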

🚀 Scalability

Horizontally scalable worker nodes
Stateless workers
Pluggable job storage backend
Supports high-throughput task dispatch
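A pluggable storage backend usually means Master logic talks to an interface rather than a concrete store. This sketch uses `typing.Protocol`; `JobStore`, `InMemoryJobStore`, and `requeue_orphans` are hypothetical names, and a database-backed class exposing the same three methods could replace the in-memory one without changing the calling code.

```python
import threading
from typing import Dict, List, Optional, Protocol


class JobStore(Protocol):
    """Interface a storage backend would implement (hypothetical)."""

    def save(self, job_id: str, record: dict) -> None: ...
    def get(self, job_id: str) -> Optional[dict]: ...
    def find_by_state(self, state: str) -> List[str]: ...


class InMemoryJobStore:
    """Default backend: a lock-guarded dict. Contents are lost on restart."""

    def __init__(self) -> None:
        self._jobs: Dict[str, dict] = {}
        self._lock = threading.Lock()

    def save(self, job_id: str, record: dict) -> None:
        with self._lock:
            self._jobs[job_id] = dict(record)  # store a copy, not the caller's dict

    def get(self, job_id: str) -> Optional[dict]:
        with self._lock:
            record = self._jobs.get(job_id)
            return dict(record) if record is not None else None

    def find_by_state(self, state: str) -> List[str]:
        with self._lock:
            return [jid for jid, rec in self._jobs.items() if rec.get("state") == state]


def requeue_orphans(store: JobStore, job_ids: List[str]) -> None:
    # Depends only on the JobStore interface, so any backend works here.
    for job_id in job_ids:
        record = store.get(job_id)
        if record is not None:
            record["state"] = "QUEUED"
            store.save(job_id, record)
```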

🎯 Use Cases

Background job processing
Distributed batch computation
Microservice task orchestration
Learning distributed systems internals
Interview / system design demonstrations

📦 Project Status

Development Status: Alpha

This project is intended for:

Learning
Prototyping
Architectural reference

Not recommended for production use without additional hardening.

🛠 Future Enhancements

Job priorities
Delayed / scheduled jobs
DAG-based workflows
Exactly-once semantics
Persistent storage backend
Web-based monitoring dashboard

📜 License

This project is licensed under the GNU General Public License v3.0 or later (GPL-3.0-or-later).

See the LICENSE file for details.

👤 Author

Developer Jarvis (pen name)
GitHub: https://github.com/DeveloperJarvis

💡 Notes

This project intentionally avoids heavy dependencies to expose the core mechanics of distributed task execution clearly and transparently.
