Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce sla_miss metric #23402

Merged
merged 3 commits into from
Jul 26, 2022
Merged

Introduce sla_miss metric #23402

merged 3 commits into from
Jul 26, 2022

Conversation

Jorricks
Copy link
Contributor

@Jorricks Jorricks commented May 2, 2022

We have a metric for both sla callback failures and sla email failures.
However, we are not tracking when an SLA is not met.
This is crucial to know how many SLAs we had in total in statsd/prometheus.

Update: Forgot to install pre-commit :) Will fix it in a bit.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragement file, named {pr_number}.significant.rst, in newsfragments.

@@ -231,6 +233,7 @@ def test_dag_file_processor_sla_miss_doesnot_raise_integrity_error(self, dag_mak
.count()
)
assert sla_miss_count == 1
mock_stats_incr.assert_called_with('sla_missed')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super small nit: There are other tests that already mock the stats object (for testing other SLA related metrics) that you could add this line to, instead of adding a net-new mock object to this test. But it's not a big deal, still approving. Thanks for the commit :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I thought about this but I choose this test on purpose as this is one of the only ones(if not the only one) that creates the SLA by the scheduler code (instead of creating it inside the testing code).
Therefore I feel like this was the best way to do it.

@o-nikolas
Copy link
Contributor

Just looking through open PRs I've reviewed. Is there any development waiting for this one @Jorricks? Or is it just awaiting a review from a committer?

@potiuk
Copy link
Member

potiuk commented Jun 28, 2022

needs at least rebase to fix conflict

@eladkal
Copy link
Contributor

eladkal commented Jul 24, 2022

@Jorricks can you rebase and resolve conflicts?

@Jorricks
Copy link
Contributor Author

Hey,
Sorry I dropped the ball on this one. Had some other projects going on.
Rebased and fixed conflict :)

@potiuk
Copy link
Member

potiuk commented Jul 25, 2022

Rebased again @Jorricks - we had some "Werkzeug" drama this morning. But I am afraid some tests that failed will fail again and will need to be fixed :(

@Jorricks
Copy link
Contributor Author

@potiuk could you please help me out here?
I don't think the error I saw was actually related to my PR but I am not sure. Could you check to be sure?
I rebased my branch in the meantime to see if that fixes it.

@potiuk
Copy link
Member

potiuk commented Jul 26, 2022

Likely it was intermittent failure

@potiuk
Copy link
Member

potiuk commented Jul 26, 2022

MsSQL is not the most "stable" part of our CI.

@Jorricks
Copy link
Contributor Author

MsSQL is not the most "stable" part of our CI.

So should we merge this then?

@potiuk potiuk merged commit 7e631a9 into apache:main Jul 26, 2022
@potiuk
Copy link
Member

potiuk commented Jul 26, 2022

Why not :)

@ephraimbuddy ephraimbuddy added the type:new-feature Changelog: New Features label Aug 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Scheduler Scheduler or dag parsing Issues kind:documentation type:new-feature Changelog: New Features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants