Fix QueuedLocalWorker crashing with EOFError #13215

Merged
kaxil merged 1 commit into apache:master on Dec 21, 2020


RikHeijdens
Contributor

LocalExecutor uses a multiprocessing.Queue to distribute tasks to the instances of QueuedLocalWorker. If LocalExecutor exits for any reason (e.g. because it encountered an unhandled exception), each QueuedLocalWorker instance it manages also exits, with an uncaught EOFError raised while it tries to read from the task queue.
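
A minimal sketch of this failure mode, under stated assumptions: the receiving end of a `multiprocessing.Pipe` stands in for the task queue (CPython's `multiprocessing.Queue` reads from such a pipe internally), and `naive_worker` is a hypothetical loop, not Airflow's `QueuedLocalWorker`. Once the sending end is closed and any buffered task has been consumed, `recv()` raises `EOFError`, and nothing in the loop catches it:

```python
import multiprocessing


def naive_worker(conn, processed):
    """Hypothetical worker loop with no EOF handling (not Airflow's code).

    `conn` is the receiving end of a Pipe standing in for the task queue;
    `processed` collects the (key, command) pairs that were consumed.
    """
    while True:
        key, command = conn.recv()  # raises EOFError once the sender is gone
        if key is None:
            return  # poison pill: orderly shutdown
        processed.append((key, command))


if __name__ == "__main__":
    reader, writer = multiprocessing.Pipe(duplex=False)
    writer.send(("task-1", "echo hello"))
    writer.close()  # simulate the managing executor going away

    processed = []
    try:
        naive_worker(reader, processed)
    except EOFError:
        # In a real worker process this would surface as an uncaught traceback.
        print("worker crashed with EOFError after", processed)
```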

Because each exiting worker logs its own traceback, the root cause of the issue, i.e. that the managing LocalExecutor terminated, gets buried. By catching the EOFError in QueuedLocalWorker, logging an error and exiting gracefully, we avoid this problem and make the root cause clearer to the Airflow administrator.
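
A minimal sketch of the approach described above, assuming the receiving end of a `multiprocessing.Pipe` as a stand-in for the task queue (CPython's `multiprocessing.Queue` reads from such a pipe internally). `guarded_worker`, its return values and the logger name are hypothetical and are not the code merged in this PR:

```python
import logging
import multiprocessing

log = logging.getLogger("queued_worker_sketch")  # hypothetical logger name


def guarded_worker(conn, processed):
    """Hypothetical worker loop that treats EOF as "the executor is gone".

    Returns "eof" if the task source disappeared and "poison_pill" on an
    orderly shutdown, so callers can tell the two apart.
    """
    while True:
        try:
            key, command = conn.recv()
        except EOFError:
            # Log one concise line instead of letting a traceback escape,
            # so the executor's own failure stays the most visible error.
            log.error("Task source was closed by the other end; worker exiting.")
            return "eof"
        if key is None:
            return "poison_pill"  # shutdown requested by the executor
        processed.append((key, command))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    reader, writer = multiprocessing.Pipe(duplex=False)
    writer.send(("task-1", "echo hello"))
    writer.close()  # simulate the managing executor going away

    processed = []
    reason = guarded_worker(reader, processed)
    print(reason, processed)
```

The `try` wraps only the `recv()` call and catches only `EOFError`, so any other exception still surfaces with a full traceback.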

LocalExecutor uses a multiprocessing.Queue to distribute tasks to the
instances of QueuedLocalWorker. If for some reason LocalExecutor exits
(e.g. because it encountered an unhandled exception), then each of the
QueuedLocalWorker instances that it manages will also exit with an
EOFError while trying to read from the task queue.

This obscures the root cause of the issue, i.e. that the LocalExecutor
terminated. By catching EOFError, logging an error and exiting gracefully,
we circumvent this issue.
boring-cyborg bot added the area:Scheduler (Scheduler or dag parsing Issues) label Dec 21, 2020
kaxil requested a review from ashb December 21, 2020 11:42
ashb modified the milestones: Airflow 2.0.1, Airflow 2.1 Dec 21, 2020
github-actions

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

github-actions bot added the full tests needed (We need to run full set of tests for this PR to merge) label Dec 21, 2020
kaxil merged commit 484f95f into apache:master Dec 21, 2020

3 participants