Skip to content

fix(pyamber): allow worker state to transit READY -> COMPLETED#5214

Merged
mengw15 merged 2 commits into
apache:mainfrom
mengw15:fix/5197-allow-worker-ready-to-completed
May 26, 2026
Merged

fix(pyamber): allow worker state to transit READY -> COMPLETED#5214
mengw15 merged 2 commits into
apache:mainfrom
mengw15:fix/5197-allow-worker-ready-to-completed

Conversation

@mengw15
Copy link
Copy Markdown
Contributor

@mengw15 mengw15 commented May 25, 2026

What changes were proposed in this PR?

A Python UDF worker whose upstream produces zero tuples receives an EndChannel marker before any data, so it never visits RUNNING. When _process_end_channel calls complete(), the state machine in Context rejected READY → COMPLETED and the worker died with InvalidTransitionException, leaving the downstream operator stuck and the workflow hung.

The Scala-side WorkerStateManager already lists COMPLETED in READY's allowed targets — this is a Python ↔ Scala parity drift. Add WorkerState.COMPLETED to the Python READY set.

Also lift the state-transition graph out of Context.__init__ into a module-level WORKER_STATE_TRANSITIONS constant so the test fixture can import it (single source of truth — the previous fixture independently duplicated the graph, which is what masked the parity gap from existing tests).

Any related issues, documentation, discussions?

Closes #5197.

How was this PR tested?

Added a regression test in test_state_manager.py covering READY → COMPLETED. Also manually verified the issue's reproducer workflow now completes; previously hung with the worker stuck in READY.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-7)

A Python UDF worker whose upstream produces zero tuples receives an
EndChannel marker before any data, so it never visits RUNNING. When
`_process_end_channel` then calls `complete()`, the state machine in
`Context` rejected the transition because READY's allowed targets were
`{PAUSED, RUNNING}`, and the worker thread died with
`InvalidTransitionException: Cannot transit from READY to COMPLETED`.
The downstream operator stayed in READY and the workflow hung.

The Scala-side WorkerStateManager already includes COMPLETED in
READY's allowed targets; the Python state graph had drifted out of
sync. Add `WorkerState.COMPLETED` to the READY set so the two stay
aligned and zero-input workers can complete cleanly.

Lift the state-transition graph out of `Context.__init__` into a
module-level `WORKER_STATE_TRANSITIONS` constant so the test fixture
can import it (single source of truth) and so the graph is built once
per process instead of once per Context.

Closes apache#5197.
Copy link
Copy Markdown
Contributor

@Yicong-Huang Yicong-Huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@Yicong-Huang Yicong-Huang changed the title fix(python): allow worker state to transit READY -> COMPLETED fix(pyamber): allow worker state to transit READY -> COMPLETED May 25, 2026
@Yicong-Huang
Copy link
Copy Markdown
Contributor

let's call it pyamber, because we have other components like pybuilder as well. calling it python is a bit too general.

@Yicong-Huang
Copy link
Copy Markdown
Contributor

Yicong-Huang commented May 25, 2026

also it will be ideal to simplify pr description. it is a bit too detailed. Especially on the "How was this PR tested?" part

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.33%. Comparing base (79a1d17) to head (648d89c).

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #5214   +/-   ##
=========================================
  Coverage     48.33%   48.33%           
  Complexity     2342     2342           
=========================================
  Files          1042     1042           
  Lines         39973    39974    +1     
  Branches       4251     4251           
=========================================
+ Hits          19322    19323    +1     
  Misses        19507    19507           
  Partials       1144     1144           
Flag Coverage Δ *Carryforward flag
access-control-service 39.53% <ø> (ø) Carriedforward from 79a1d17
agent-service 33.76% <ø> (ø) Carriedforward from 79a1d17
amber 51.09% <ø> (ø) Carriedforward from 79a1d17
computing-unit-managing-service 0.00% <ø> (ø) Carriedforward from 79a1d17
config-service 0.00% <ø> (ø) Carriedforward from 79a1d17
file-service 32.18% <ø> (ø) Carriedforward from 79a1d17
frontend 40.02% <ø> (ø) Carriedforward from 79a1d17
python 90.51% <100.00%> (+<0.01%) ⬆️
workflow-compiling-service 56.81% <ø> (ø) Carriedforward from 79a1d17

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@mengw15 mengw15 added this pull request to the merge queue May 25, 2026
Merged via the queue into apache:main with commit f89c07c May 26, 2026
19 checks passed
@mengw15 mengw15 deleted the fix/5197-allow-worker-ready-to-completed branch May 26, 2026 00:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

A workflow with two python udf operators doesn't complete the execution.

3 participants