Fix LocalExecutor crash on non-picklable exceptions (e.g. httpx.HTTPStatusError) #64484

Closed
Pranaykarvi wants to merge 6 commits into apache:main from Pranaykarvi:fix/localexecutor-unserializable-exception

@Pranaykarvi
Contributor

Description

Closes #64476

When LocalExecutor runs a task in a subprocess, the result (including any
exception) is passed back to the scheduler via a multiprocessing.Queue.
Python serializes queue entries using pickle.

Some exceptions, such as httpx.HTTPStatusError from the httpx library,
are not pickle-safe. By default, pickle rebuilds an exception by calling its
class with only the positional args stored on the instance, so the required
keyword-only arguments (request, response) are never supplied during
deserialization. This causes:

TypeError: HTTPStatusError.__init__() missing 2 required keyword-only arguments: 'request' and 'response'

This exception propagates out of _read_results(), crashes the scheduler
loop, and takes down the entire scheduler pod. The only recovery is to
disable the offending DAG.
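
The failure mode can be reproduced without httpx or Airflow. The following is a minimal sketch, assuming a hypothetical `FakeStatusError` class that imitates an exception with keyword-only constructor arguments; it is not the httpx class itself. It shows that serialization succeeds and only deserialization raises.

```python
import pickle


class FakeStatusError(Exception):
    """Hypothetical stand-in for an exception with keyword-only init arguments."""

    def __init__(self, message, *, request, response):
        super().__init__(message)  # only `message` ends up in self.args
        self.request = request
        self.response = response


original = FakeStatusError("404 Not Found", request="GET /items", response="404")

# Serializing works: pickle records the class, self.args and the instance __dict__.
payload = pickle.dumps(original)

# Deserializing calls FakeStatusError(*args), i.e. with the message only,
# so the required keyword-only arguments are missing.
try:
    pickle.loads(payload)
    round_trip_error = None
except TypeError as err:
    round_trip_error = err

print(type(round_trip_error).__name__, round_trip_error)
```

Because the error only appears on load, the process that produced the exception sees nothing wrong; the process that reads the payload is the one that fails.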

Root Cause

In _execute_work_in_subprocess, raw exception objects were placed directly
onto the result queue:

output.put((key, TaskInstanceState.FAILED, e))

Any exception whose class is not trivially picklable would cause
deserialization to fail on the receiving end.

Fix

Wrap the exception in a plain Exception before putting it on the queue,
preserving the original type name, message, and full traceback as a string:

safe_exc = Exception(f"{type(e).__name__}: {str(e)}\n{traceback.format_exc()}")
output.put((key, TaskInstanceState.FAILED, safe_exc))

This is applied to both the ExecuteTask and ExecuteCallback branches.
A plain Exception with a string message is always pickle-safe regardless
of the original exception type.
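
The wrapping approach can be sketched in isolation as follows. `to_picklable` and `FakeStatusError` are hypothetical names introduced for illustration, and this version uses `traceback.format_exception` rather than `traceback.format_exc()` so the helper can be called with an explicit exception object; treat it as an approximation of the idea, not the exact change in this PR.

```python
import pickle
import traceback


class FakeStatusError(Exception):
    """Hypothetical exception with keyword-only init arguments."""

    def __init__(self, message, *, request, response):
        super().__init__(message)
        self.request = request
        self.response = response


def to_picklable(exc):
    """Hypothetical helper: flatten any exception into a plain Exception of text."""
    details = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    return Exception(f"{type(exc).__name__}: {exc}\n{details}")


try:
    raise FakeStatusError("503 Service Unavailable", request="req", response="resp")
except FakeStatusError as e:
    safe_exc = to_picklable(e)

# The wrapper holds a single string argument, so the round trip succeeds.
restored = pickle.loads(pickle.dumps(safe_exc))
print(type(restored).__name__)
print(str(restored).splitlines()[0])
```

The trade-off is that the receiver gets text only: the original exception class and attributes such as the request and response objects are not available after the round trip.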

Impact

  • Scheduler no longer crashes when a task raises a non-picklable exception
  • Debugging information (type name, message, traceback) is preserved as text; the original exception object and its attributes are not
  • Fix is minimal and does not affect any other executor behaviour
  • Applies to both task execution and callback execution paths

Testing

Reproduced the crash locally using a PythonOperator DAG that triggers an
httpx.HTTPStatusError (as reported in #64476). After this fix the
scheduler remains stable and the task is correctly marked as FAILED with
the full traceback visible in logs.

@boring-cyborg (bot) added the labels area:Executors-core (LocalExecutor & SequentialExecutor), area:providers, and provider:google (Google (including GCP) related issues) on Mar 30, 2026
Development

Successfully merging this pull request may close these issues.

Running Python DAG crashes Scheduler pod
