
🪲 BUG-#68: Fix busy-wait loop and enable multiprocessing on macOS via fork context #78

Merged
FernandoCelmer merged 8 commits into develop from feature/68 on Mar 26, 2026


@FernandoCelmer (Member) commented Mar 26, 2026

Description

  • dotflow/core/workflow.py:
      ◦ Replaced direct use of multiprocessing.Process/Queue with objects created from get_context("fork") on POSIX and get_context("spawn") on Windows
      ◦ Removed the is_darwin() platform checks and the macOS fallback warnings
      ◦ Replaced the busy-wait polling loop with a blocking queue.get(timeout=0.1) that catches queue.Empty
      ◦ Simplified SequentialGroup.run() by removing the intermediate thread layer
      ◦ Added a _processes list for liveness tracking
      ◦ Added an else fallback that marks a task as TypeStatus.FAILED when its worker process terminates without reporting a result
      ◦ Imported TaskError for structured error wrapping
  • tests/core/test_workflow_deadlock.py: New test file that verifies Parallel and SequentialGroup do not deadlock when a task fails before putting its result on the queue
  • tests/core/test_workflow_platform.py: New test file that verifies the correct multiprocessing context is selected per platform (fork on POSIX, spawn on Windows)
  • tests/core/test_workflow_parallel.py: Updated Queue type assertion to use fork context via get_context()
  • tests/core/test_workflow_sequential_group.py: Updated Queue type assertion to use fork context via get_context()
  • requirements.txt: Added requests dependency
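The per-platform context selection described for dotflow/core/workflow.py can be sketched as below. This is a minimal sketch, not the PR's code: select_mp_context is a hypothetical helper name, and it assumes os.name == "nt" is a sufficient Windows check. Creating Process and Queue objects from the returned context, rather than calling multiprocessing.set_start_method(), keeps the choice local instead of changing the interpreter-wide default.

```python
import multiprocessing
import os
from multiprocessing.context import BaseContext


def select_mp_context() -> BaseContext:
    """Hypothetical helper: "spawn" on Windows, "fork" everywhere else (POSIX)."""
    method = "spawn" if os.name == "nt" else "fork"
    return multiprocessing.get_context(method)


if __name__ == "__main__":
    ctx = select_mp_context()
    # ctx.Process / ctx.Queue would be used from here on; the global
    # default start method is left untouched.
    print(ctx.get_start_method())  # "fork" on Linux/macOS, "spawn" on Windows
```

One caveat: the Python documentation describes fork as potentially unsafe on macOS, because system libraries may start threads, and that is why Python 3.8 changed the macOS default to spawn. Forcing the fork context trades that risk for parallel execution of bound methods.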

Motivation and Context

The Parallel and SequentialGroup modes had two critical bugs:

  1. Busy-wait CPU spike: the original while not queue.empty(): queue.get() loop consumed 100% CPU while waiting for task results.
  2. macOS silent failures: since Python 3.8, the default start method on macOS has been spawn, which caused bound methods to fail silently in child processes. The original code worked around this with is_darwin() checks that degraded to sequential execution.
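The difference between the polling loop in item 1 and a blocking wait can be sketched as follows. Both collectors and the _double worker are hypothetical illustrations; the 0.1 s timeout matches the value listed in the description. The blocking version parks the collector inside queue.get() until an item arrives or the timeout expires, so an idle wait costs almost no CPU. The Python docs also note that Queue.empty() is not reliable for multiprocessing queues, which is another reason not to poll it.

```python
import os
import queue
import time
from multiprocessing import get_context


def busy_wait_collect(q, expected):
    """Anti-pattern: spins on q.empty(), keeping a core busy while idle."""
    results = []
    while len(results) < expected:
        while not q.empty():
            results.append(q.get())
    return results


def blocking_collect(q, expected, timeout=0.1):
    """Blocks in q.get() for up to `timeout` seconds per attempt instead of spinning."""
    results = []
    while len(results) < expected:
        try:
            results.append(q.get(timeout=timeout))
        except queue.Empty:
            # Nothing arrived yet; other checks (e.g. worker liveness)
            # could run here before waiting again.
            continue
    return results


def _double(q, value):
    time.sleep(0.2)  # simulate work so the collector actually has to wait
    q.put(value * 2)


if __name__ == "__main__":
    ctx = get_context("spawn" if os.name == "nt" else "fork")
    q = ctx.Queue()
    workers = [ctx.Process(target=_double, args=(q, i)) for i in range(3)]
    for w in workers:
        w.start()
    print(sorted(blocking_collect(q, expected=3)))  # [0, 2, 4]
    for w in workers:
        w.join()
```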

Additionally, if a worker process terminated without calling queue.put() (e.g., due to an unhandled exception not caught by Execution), the collection loop would deadlock indefinitely.

This PR fixes all three issues: it forces the fork context on POSIX (enabling parallel mode on macOS), replaces the busy-wait loop with a blocking queue.get(timeout=0.1), adds process-liveness detection so the collection loop can exit safely, and marks unreported tasks as FAILED so that on_failure callbacks fire correctly.
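The last step, marking tasks that never reported as FAILED so their failure callbacks still run, can be sketched independently of the multiprocessing code. Status, Task and reconcile are hypothetical stand-ins: Status mirrors TypeStatus.FAILED plus an assumed COMPLETED member, and the on_failure signature is assumed. dotflow's actual classes and callback API may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class Status(str, Enum):
    # Stand-in for dotflow's TypeStatus; COMPLETED is an assumed member.
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


@dataclass
class Task:
    # Hypothetical minimal task record, not dotflow's Task class.
    task_id: int
    status: Optional[Status] = None
    error: Optional[str] = None


def reconcile(tasks: List[Task], reported: Dict[int, Status],
              on_failure: Callable[[Task], None]) -> None:
    """Apply reported statuses; any task with no report is marked FAILED."""
    for task in tasks:
        if task.task_id in reported:
            task.status = reported[task.task_id]
        else:
            # The worker exited without queue.put(): record a failure instead
            # of leaving the task without a status.
            task.status = Status.FAILED
            task.error = "worker process exited without reporting a result"
        if task.status is Status.FAILED:
            on_failure(task)


if __name__ == "__main__":
    failed: List[Task] = []
    tasks = [Task(0), Task(1)]
    reconcile(tasks, {0: Status.COMPLETED}, on_failure=failed.append)
    print([t.status.value for t in tasks], [t.task_id for t in failed])
    # ['COMPLETED', 'FAILED'] [1]
```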

Closes #68

Types of changes

  • Bug fix (change that fixes an issue)
  • New feature (change which adds functionality)
  • Documentation

Checklist

  • I have performed a self-review of my own code
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the CHANGELOG
  • I have updated the documentation accordingly


@FernandoCelmer FernandoCelmer left a comment


🔍 Code Review

Code issues found: 2

See inline comments below.

Comment thread dotflow/core/workflow.py
Comment thread dotflow/core/workflow.py
@FernandoCelmer FernandoCelmer added the bug label Mar 26, 2026

@FernandoCelmer FernandoCelmer left a comment


🔍 Code Review

Code issues found: 3

# Severity Comment
1 [Suggestion] Windows spawn — bound method pickling risk
2 [Blocking] SequentialGroup: broad exception + silent task masking
3 [Blocking] Parallel: broad exception + silent task masking
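Comment 1 concerns how task targets reach the child under spawn: the Process object, including a bound-method target and its instance, is pickled and sent to a fresh interpreter, whereas fork copies the parent's memory and pickles nothing. A bound method pickles only if its instance does, as this minimal illustration with hypothetical Pipeline classes shows; a threading.Lock attribute is used here as a typical unpicklable field.

```python
import pickle
import threading


class Pipeline:
    """Hypothetical workflow object whose bound method is used as a task target."""

    def __init__(self, name):
        self.name = name

    def step(self):
        return f"{self.name}: ok"


class LockedPipeline(Pipeline):
    """Same object, but carrying an attribute that pickle cannot serialize."""

    def __init__(self, name):
        super().__init__(name)
        self._lock = threading.Lock()


def survives_spawn(target) -> bool:
    """Approximates spawn's requirement: the target must round-trip through pickle."""
    try:
        pickle.loads(pickle.dumps(target))
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    print(survives_spawn(Pipeline("etl").step))        # True
    print(survives_spawn(LockedPipeline("etl").step))  # False
    restored = pickle.loads(pickle.dumps(Pipeline("etl").step))
    print(restored())                                  # etl: ok
```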

Comment thread dotflow/core/workflow.py
Comment thread dotflow/core/workflow.py
Comment thread dotflow/core/workflow.py
@FernandoCelmer FernandoCelmer merged commit 933f9a4 into develop Mar 26, 2026
6 checks passed
@FernandoCelmer FernandoCelmer deleted the feature/68 branch March 26, 2026 06:28
@FernandoCelmer FernandoCelmer self-assigned this Mar 26, 2026

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

Parallel mode: busy-wait polling loop causes 100% CPU usage

1 participant