Skip to content

Comments

Query TaskReschedule only if task is UP_FOR_RESCHEDULE#9087

Merged
turbaszek merged 6 commits intoapache:masterfrom
PolideaInternal:local-job-perf-improvement
Jun 9, 2020
Merged

Query TaskReschedule only if task is UP_FOR_RESCHEDULE#9087
turbaszek merged 6 commits intoapache:masterfrom
PolideaInternal:local-job-perf-improvement

Conversation

@turbaszek
Copy link
Member


Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Unit tests coverage for changes (not needed for documentation changes)
  • Target Github ISSUE in description if exists
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Jun 1, 2020
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice optimisation :)

@mik-laj
Copy link
Member

mik-laj commented Jun 1, 2020

@seelmann Can you look at it?

@seelmann
Copy link
Member

seelmann commented Jun 1, 2020

Makes perfect sense, +1.

Another optimization could be to not fetch all TaskReschedule objects but only the first one. Similar in ready_to_reschedule.py only the last one is needed. I wanted to change that long time ago but didn't find the time and then forgot...

@turbaszek
Copy link
Member Author

Another optimization could be to not fetch all TaskReschedule objects but only the first one. Similar in ready_to_reschedule.py only the last one is needed.

Sounds reasonable, thanks!

@turbaszek turbaszek requested a review from potiuk June 2, 2020 13:51
@turbaszek
Copy link
Member Author

@seelmann would like to take another look?

Copy link
Member

@seelmann seelmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

turbaszek and others added 2 commits June 3, 2020 09:54
Co-authored-by: Stefan Seelmann <mail@stefan-seelmann.de>

task_reschedules = TaskReschedule.find_for_task_instance(task_instance=ti)
if not task_reschedules:
task_reschedule = TaskReschedule.query_for_task_instance(task_instance=ti, descending=True).first()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
task_reschedule = TaskReschedule.query_for_task_instance(task_instance=ti, descending=True).first()
task_reschedule = (
TaskReschedule.query_for_task_instance(task_instance=ti, descending=True)
.with_entities(TR.reschedule_date)
.first()
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh does with_entites say "just select this one column? Nice!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows you to change the content of the parameter passed to the query method on an existing object. We already use it in the project. https://github.com/apache/airflow/pull/8729/files

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


task_reschedules = TaskReschedule.find_for_task_instance(task_instance=ti)
if not task_reschedules:
task_reschedule = TaskReschedule.query_for_task_instance(task_instance=ti, descending=True).first()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please pass on the same session object.

Suggested change
task_reschedule = TaskReschedule.query_for_task_instance(task_instance=ti, descending=True).first()
task_reschedule = (
TaskReschedule.query_for_task_instance(task_instance=ti, descending=True, session=session)
.with_entities(TR.reschedule_date)
.first()
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You missed Kamil's suggestion of .with_entites here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's there :)

@turbaszek
Copy link
Member Author

@ashb @mik-laj your suggestions are added

@turbaszek turbaszek requested review from ashb and mik-laj June 4, 2020 13:09
@mik-laj
Copy link
Member

mik-laj commented Jun 7, 2020

@turbaszek Should we merge it?

@turbaszek turbaszek merged commit b762763 into apache:master Jun 9, 2020
@turbaszek turbaszek deleted the local-job-perf-improvement branch June 9, 2020 12:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:performance area:Scheduler including HA (high availability) scheduler

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants