[AIRFLOW-4401] SynchronizedQueue used where empty() is used. #5199
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Make sure you have checked all steps below.
Jira
Description
It is a known problem https://bugs.python.org/issue23582 that
multiprocessing.Queue empty() method is not reliable - sometimes it might
return True even if another process already put something in the queue.
This resulted in some of the tasks not picked up when sync() methods
were called (in AirflowKubernetesScheduler, LocalExecutor,
DagFileProcessor). This was less of a problem if the method was called in sync()
in tests and when graceful shutdown was executed (some tasks could be still
unprocessed while the shutdown occured).
All the cases impacted follow the same pattern now:
while not queue.empty():
res = queue.get()
....
This loop runs always in single (main) process so it is safe to run it this way -
there is no risk that some other process will retrieve the data from the queue in
between empty() and get().
Note that unlike in the standard multiprocessing.Queue, you cannot rely
on data being immediately available after empty() is False. You should be
prepared that subsequent get_nowait() raises Empty, or (better) use get()
to retrieve the data.
In all these cases overhead for inter-processing locking is negligible
comparing to the action executed (Parsing DAG, executing job)
so it appears it should be safe to merge the change.
Tests
No need. Lots of tests for that already (flaky ones).
Commits
Documentation
Code Quality
flake8