Skip to content

fix(providers/standard): add response_timeout to HITLOperator to prevent race with execution_timeout#63475

Open
antonio-mello-ai wants to merge 3 commits intoapache:mainfrom
antonio-mello-ai:fix/hitl-response-timeout
Open

fix(providers/standard): add response_timeout to HITLOperator to prevent race with execution_timeout#63475
antonio-mello-ai wants to merge 3 commits intoapache:mainfrom
antonio-mello-ai:fix/hitl-response-timeout

Conversation

@antonio-mello-ai
Copy link
Contributor

Summary

  • Add a dedicated response_timeout parameter to HITLOperator that controls how long the HITLTrigger waits for a human response
  • This separates the human-response wait window from BaseOperator.execution_timeout, which wraps the entire execute() method and can kill the task before defer() is reached
  • For backward compatibility, if execution_timeout is set without response_timeout, it is automatically migrated with a deprecation warning and execution_timeout is cleared

Root Cause

HITLOperator reused BaseOperator.execution_timeout for two conflicting purposes:

  1. BaseOperator behavior: wraps execute() in a timeout that kills the task
  2. HITL trigger timeout: passed to HITLTrigger as the human response wait window

When execution_timeout is short (or notifiers are slow), the BaseOperator timeout fires before self.defer() is reached, killing the task with AirflowTaskTimeout instead of deferring.

Test Plan

  • New test: response_timeout correctly passed to trigger
  • New test: execution_timeout backward compatibility emits deprecation warning and migrates
  • New test: both timeouts work together when execution_timeout > response_timeout
  • Updated existing timeout tests to use response_timeout
  • All 78 existing tests pass (1 skipped)
  • All pre-commit hooks pass (prek)

Closes #55866

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

@eladkal eladkal requested a review from Lee-W March 12, 2026 20:33
@antonio-mello-ai antonio-mello-ai force-pushed the fix/hitl-response-timeout branch 2 times, most recently from abb5062 to 9bf0d68 Compare March 13, 2026 23:04
@Lee-W
Copy link
Member

Lee-W commented Mar 14, 2026

I'm not convinced this is required. From worker to triggerer should'nt take that long and a short execution_time usually does not make sense

@antonio-mello-ai
Copy link
Contributor Author

Thanks for the review, @Lee-W.

You're right that worker-to-triggerer is normally fast, and short execution_timeout values are uncommon. But the core issue here isn't about speed — it's a semantic conflict:

execution_timeout on BaseOperator wraps the entire execute() method and kills the task on expiry. HITLOperator reuses that same parameter to compute the trigger's response deadline. These are two fundamentally different concerns:

  1. Task execution guard (how long execute() can run before being killed)
  2. Human response window (how long the trigger waits for input after deferral)

When notifiers are involved (e.g., SmtpNotifier), the time before defer() is not negligible. With execution_timeout=timedelta(seconds=30) and a slow SMTP server, the task can be killed by BaseOperator's timeout before it even defers — which defeats the entire purpose of HITL.

Even without notifiers, the current design means users can't independently control "how long my task is allowed to run" vs "how long to wait for human input." These are separate knobs that shouldn't be coupled.

The fix is backward-compatible (auto-migrates execution_timeoutresponse_timeout with deprecation warning), so existing users aren't affected.

Happy to discuss alternative approaches if you see a simpler path, but I believe the semantic separation is the right fix here.

antonio-mello-ai and others added 2 commits March 15, 2026 10:29
…ent race with execution_timeout

HITLOperator reused BaseOperator's execution_timeout for two conflicting
purposes: (1) as a task execution guard that kills the task, and (2) as
the human response wait window passed to HITLTrigger. When short, the
BaseOperator timeout fires before defer() is reached, killing the task
instead of deferring.

Add a dedicated response_timeout parameter for the trigger wait window.
For backward compatibility, if execution_timeout is set without
response_timeout, it is migrated with a deprecation warning and
execution_timeout is cleared to prevent the race condition.

Closes apache#55866

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…timeout

The example DAG was using execution_timeout which triggers the new
deprecation warning during DAG import, causing DagBag import errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@antonio-mello-ai antonio-mello-ai force-pushed the fix/hitl-response-timeout branch from 9bf0d68 to 20ef120 Compare March 15, 2026 13:29
@Lee-W
Copy link
Member

Lee-W commented Mar 17, 2026

The task is basically waiting for a response, so I think it's okay to use execution_timeout. The notifier part is somewhat rare, but could be. thoughts ?@jason810496 @guan404ming

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HITLOperator misuses execution_timeout, causing premature timeout before deferral to HITLTrigger

3 participants