feat: worker discovery, lifecycle events, and coordination #34

Merged
pratyush618 merged 4 commits into master from feat/worker-discovery on Mar 22, 2026

@pratyush618
Collaborator

Summary

  • Enhanced worker metadata: adds started_at, hostname, pid, and pool_type columns to the workers table, so operators can see which machine and process each worker runs on.
  • Worker lifecycle events: WORKER_ONLINE (after registration), WORKER_OFFLINE (dead worker reaped), WORKER_UNHEALTHY (resource health degraded). Subscribe via queue.on_event().
  • Status transitions: Workers report their status as active → draining → stopped. A shutdown signal sets the status to "draining" before the drain timeout.
  • Reap visibility: reap_dead_workers now returns the IDs of the reaped workers (previously only a count). A WORKER_OFFLINE event is fired for each reaped worker.
  • Health tracking: Heartbeat thread tracks resource health transitions and emits WORKER_UNHEALTHY on degradation.
  • Orphan rescue prep: list_claims_by_worker method added to all backends for future orphaned job rescue.
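
The reap-visibility and lifecycle-event bullets above can be sketched in a few lines. This is only an illustration, not taskito's implementation: `EventBus`, `WorkerRegistry`, `reap_and_notify`, the handler signature, and the heartbeat timeout are hypothetical stand-ins, and the real `queue.on_event()` signature may differ. Only the event name `WORKER_OFFLINE` and the fact that `reap_dead_workers` returns worker IDs come from this PR.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Event name taken from the PR; everything else below is a hypothetical stand-in.
WORKER_OFFLINE = "WORKER_OFFLINE"

Handler = Callable[[str, dict], None]


class EventBus:
    """Minimal stand-in for an event subscription API (not taskito's real one)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def on_event(self, event_type: str, handler: Handler) -> None:
        # Register a callback for one event type.
        self._handlers[event_type].append(handler)

    def emit(self, event_type: str, payload: dict) -> None:
        # Call every handler registered for this event type.
        for handler in self._handlers[event_type]:
            handler(event_type, payload)


class WorkerRegistry:
    """In-memory stand-in for the workers table, keyed by worker ID."""

    def __init__(self) -> None:
        self._last_heartbeat: Dict[str, float] = {}

    def heartbeat(self, worker_id: str, now: float) -> None:
        self._last_heartbeat[worker_id] = now

    def reap_dead_workers(self, now: float, timeout: float) -> List[str]:
        # Return the IDs of removed workers rather than just a count,
        # so the caller can act on each one individually.
        dead = [w for w, ts in self._last_heartbeat.items() if now - ts > timeout]
        for worker_id in dead:
            del self._last_heartbeat[worker_id]
        return dead


def reap_and_notify(
    registry: WorkerRegistry, bus: EventBus, now: float, timeout: float
) -> List[str]:
    """Reap dead workers and emit one WORKER_OFFLINE event per reaped ID."""
    reaped = registry.reap_dead_workers(now, timeout)
    for worker_id in reaped:
        bus.emit(WORKER_OFFLINE, {"worker_id": worker_id})
    return reaped
```

Returning IDs instead of a count is what makes the per-worker event possible: with only a count, the caller would know that something was reaped but not which worker to report as offline.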

Test plan

  • cargo check all feature combos
  • cargo test --workspace — 43+ Rust tests pass
  • uv run python -m pytest tests/python/ -v — 352 pass, 9 skipped
  • uv run ruff check py_src/ tests/ — clean
  • uv run mypy py_src/taskito/ tests/python/ --no-incremental — 0 errors

Storage layer changes:
- Add started_at, hostname, pid, pool_type columns to workers table
- register_worker accepts hostname, pid, pool_type params
- reap_dead_workers returns Vec<String> (reaped worker IDs)
- New update_worker_status method for lifecycle transitions
- New list_claims_by_worker for orphan rescue
- All 3 backends (SQLite, Postgres, Redis) + delegate macro updated
- gethostname crate for hostname detection in PyO3 bindings

Worker lifecycle and event changes:
- Add WORKER_ONLINE, WORKER_OFFLINE, WORKER_UNHEALTHY event types
- Emit WORKER_ONLINE after registration in run_worker()
- Emit WORKER_OFFLINE for each reaped dead worker during heartbeat
- Track resource health transitions, emit WORKER_UNHEALTHY on degradation
- Set worker status to "draining" on shutdown signal
- Update event type test for new events
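
The health-tracking and draining items above amount to edge detection plus a small status state machine. The sketch below shows one way to structure that, under stated assumptions: `WorkerState`, `record_health`, `set_status`, `on_shutdown_signal`, and the strict active → draining → stopped transition table are hypothetical and are not taken from the taskito code. Only the event name `WORKER_UNHEALTHY` and the three status strings come from this PR.

```python
from typing import Callable, Dict, Set

# Event name and status strings come from the PR; the rest is hypothetical.
WORKER_UNHEALTHY = "WORKER_UNHEALTHY"

# Assumption: statuses only move forward, one step at a time.
VALID_TRANSITIONS: Dict[str, Set[str]] = {
    "active": {"draining"},
    "draining": {"stopped"},
    "stopped": set(),
}


class WorkerState:
    """Tracks one worker's status and resource health."""

    def __init__(self, emit: Callable[[str], None]) -> None:
        self.status = "active"
        self._healthy = True
        self._emit = emit

    def record_health(self, healthy: bool) -> None:
        # Emit only on the healthy -> unhealthy edge, so a worker that
        # stays degraded across many heartbeats produces a single event.
        if self._healthy and not healthy:
            self._emit(WORKER_UNHEALTHY)
        self._healthy = healthy

    def set_status(self, new_status: str) -> None:
        # Reject transitions that are not in the table above.
        if new_status not in VALID_TRANSITIONS[self.status]:
            raise ValueError(f"invalid transition: {self.status} -> {new_status}")
        self.status = new_status

    def on_shutdown_signal(self) -> None:
        # Mark the worker as draining before waiting for in-flight jobs.
        self.set_status("draining")
```

Tracking the previous health value is what keeps the heartbeat loop from emitting WORKER_UNHEALTHY on every tick while a worker remains degraded. A recovery followed by a second degradation emits again.
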
@pratyush618 merged commit d21baf5 into master on Mar 22, 2026
10 checks passed
@pratyush618 deleted the feat/worker-discovery branch on March 31, 2026, 17:23