fix: take flaky_fail_count into account for total tests in new TA impl #275
base: main
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found.

Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #275      +/-   ##
==========================================
+ Coverage   94.31%   94.33%   +0.01%
==========================================
  Files        1229     1229
  Lines       45293    45296       +3
  Branches     1448     1448
==========================================
+ Hits        42720    42729       +9
+ Misses       2269     2263       -6
  Partials      304      304

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
In the old TA implementation the possible outcomes were: pass, fail, and skip. Flaky failures were a subset of fails, which meant we shouldn't include the flaky_fail_count in the total tests calculation. In the new implementation that's not the case: flakiness is represented by the outcome of a testrun, which means we have to count flaky fails in the total tests.

This commit also does some light refactoring of the polars aggregation expressions for simplicity, and fixes a bug where we weren't running fill_nan when it was needed.
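As a rough illustration of the change to the totals calculation — a minimal sketch with made-up column names (pass_count, fail_count, flaky_fail_count, skip_count); the real schema may differ:

import polars as pl

# Hypothetical per-test counts; column names are illustrative, not the real schema.
table = pl.DataFrame(
    {
        "pass_count": [10, 5],
        "fail_count": [2, 0],
        "flaky_fail_count": [1, 0],  # in the new model this is its own outcome, not a subset of fails
        "skip_count": [0, 3],
    }
)

# Old-style total: flaky fails were already inside fail_count, so adding them would double-count.
# New-style total: flaky fail is a distinct outcome, so it must be added explicitly.
totals = table.select(
    (
        pl.col("pass_count")
        + pl.col("fail_count")
        + pl.col("flaky_fail_count")
        + pl.col("skip_count")
    ).alias("total_tests")
)
print(totals)  # 13 and 8 total tests for the two rows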
Force-pushed from db169bf to 9626f7d.
.top_k(min(100, max(table.height // 20, 1)))
.sum()
).alias("total_slow_tests"),
pl.lit(num_slow_tests).alias("total_slow_tests"),
@cursor what does .lit do here
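For context on pl.lit: it wraps a plain Python value in a polars literal expression, so a value computed outside the query (here num_slow_tests) can be aliased and selected like a column. A minimal sketch with made-up data:

import polars as pl

df = pl.DataFrame({"duration": [0.1, 2.5, 0.3]})

num_slow_tests = 1  # an ordinary Python int computed outside the query
out = df.select(pl.lit(num_slow_tests).alias("total_slow_tests"))
print(out)  # a single row holding the constant value 1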
@@ -63,11 +68,11 @@ def dedup_table(table: pl.DataFrame) -> pl.DataFrame:
.agg(
pl.col("testsuite").alias("testsuite"),
pl.col("flags").explode().unique().alias("flags"),
failure_rate_expr.fill_nan(0).alias("failure_rate"),
flake_rate_expr.fill_nan(0).alias("flake_rate"),
failure_rate_expr.alias("failure_rate"),
Did the old approach not do the same thing? Or did you just want everything on the same line?
If the former, I kinda thought the other way was more readable
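For what it's worth, the reason fill_nan matters at all (a minimal sketch, not the actual failure_rate_expr): when a group has zero runs, a float division of 0.0 by 0.0 yields NaN rather than null, so fill_nan(0) is what turns the no-data case into a 0% rate.

import polars as pl

df = pl.DataFrame({"fail_count": [0.0, 2.0], "total_count": [0.0, 4.0]})

rates = df.select(
    # 0.0 / 0.0 produces NaN (not null), so fill_null alone would miss it
    (pl.col("fail_count") / pl.col("total_count")).fill_nan(0).alias("failure_rate")
)
print(rates)  # [0.0, 0.5]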
@@ -93,6 +98,17 @@ def _has_commits_before_cutoff(repoid: int) -> bool:
).exists()


@lru_cache(maxsize=1000)
why 1000? Is the default of 128 not good enough for us?
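For reference, maxsize caps how many distinct argument tuples lru_cache keeps before evicting the least recently used entry, so 1000 vs. the default 128 only matters once more than 128 distinct repoids are active between evictions. A minimal sketch with a placeholder body (the real function presumably queries the database):

from functools import lru_cache

@lru_cache(maxsize=1000)
def _has_commits_before_cutoff(repoid: int) -> bool:
    print(f"cache miss for {repoid}")  # printed only the first time a repoid is seen
    return repoid % 2 == 0  # placeholder result, not the real query

_has_commits_before_cutoff(1)
_has_commits_before_cutoff(1)  # served from the cache, no second "cache miss" line
print(_has_commits_before_cutoff.cache_info())  # hits=1, misses=1, maxsize=1000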
how come the test results got swapped around in that test? Couple other small things but nothing major