-
Notifications
You must be signed in to change notification settings - Fork 17.2k
Resolve Beam Dataflow job id by name after launcher returns #67711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
evgeniy-b
wants to merge
1
commit into
apache:main
Choose a base branch
from
evgeniy-b:fix-beam-dataflow-job-id-resolve-by-name
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+129
−71
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@evgeniy-b as I understand in case when users run in parallel 2 or more Jobs with the same name or on Dataflow the Job with this name already present than this code returns
Noneas JobID value, please correct me if I am wrong?In the current logic with callbacks the code parse Apache Beam logs for availability of JobID and when getting it then starts the waiting process in deferrable or non-deferable mode. It means that we always have unique Job ID.
This new logic looks for me as a breaking change because returns
Noneas JobID in case when in Dataflow the users have 2 or more Jobs with the same name. It is possible scenario for the most of our users because in Dataflow is impossible to remove finished Jobs the user can only archived it. And our_fetch_all_jobsmethod does not sort Jobs by finished or running and returns all Jobs with the same name.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me explain a bit how I arrived here. On an airflow cluster I maintain I noticed python beam jobs running with
deferrable=False, so I switched that flag to true to not waste worker resources. On the next day the jobs failed while transitioning to async triggers because their STDOUT didn't contain the job ID. In the sync mode a missing job ID doesn't prevent the task from succeeding:_DataflowJobsController.wait_for_donepollsself._refresh_jobs():airflow/providers/google/src/airflow/providers/google/cloud/hooks/dataflow.py
Lines 532 to 542 in a7174b5
_refresh_jobscallsself._get_current_jobs():airflow/providers/google/src/airflow/providers/google/cloud/hooks/dataflow.py
Lines 465 to 471 in a7174b5
_get_current_jobs— with no_job_id— callsself._fetch_jobs_by_prefix_name(self._job_name.lower()):airflow/providers/google/src/airflow/providers/google/cloud/hooks/dataflow.py
Lines 328 to 339 in a7174b5
_fetch_jobs_by_prefix_namecallsself._fetch_all_jobs()and returns every prefix-matched job (archived + running, no terminal-state filter):airflow/providers/google/src/airflow/providers/google/cloud/hooks/dataflow.py
Lines 460 to 463 in a7174b5
So today's sync path already silently picks up every prefix-matched job whenever the regex misses.
With default
append_job_name=Truethe job name will be unique and job ID will be retrieved.But you are right, it is a degradation: for jobs without unique names but printing out their IDs to console, the job ID will become missing.
I guess an alternative could be to replicate the sync mode's behavior in the async path which currently fails without
job_id. However it means that xcom and a link to the job will stay broken.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think job ID in output detection should be reverted. While it is awkward in principle, it is the only way (?) to reliably get ID when job names are not unique. Then name-based ID detection can be used as a fallback but only when
append_job_name=True. And if the trigger receives empty job ID it should fallback to polling status of all jobs matching the name (and not in terminal status).@MaksYermak what's your take on this?