
fix: take flaky_fail_count into account for total tests in new TA impl #275


Open

joseph-sentry wants to merge 2 commits into main from joseph/fix-ta-aggregate

Conversation

joseph-sentry (Contributor)

In the old TA implementation the possible outcomes were pass, fail, and skip. Flaky failures were a subset of fails, which meant we shouldn't include the flaky_fail_count in the total tests calculation.

In the new implementation that's not the case: flakiness is represented by the outcome of a testrun, which means that we have to count flaky fails in the total tests.

This commit also does some light refactoring of the polars aggregation expressions for simplicity, and fixes a bug where we weren't running fill_nan when it was needed.

codecov-notifications bot commented Jul 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

✅ All tests successful. No failed tests found.



codecov bot commented Jul 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.33%. Comparing base (0e2bf73) to head (9626f7d).
Report is 1 commits behind head on main.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #275      +/-   ##
==========================================
+ Coverage   94.31%   94.33%   +0.01%     
==========================================
  Files        1229     1229              
  Lines       45293    45296       +3     
  Branches     1448     1448              
==========================================
+ Hits        42720    42729       +9     
+ Misses       2269     2263       -6     
  Partials      304      304              
Flag      Coverage Δ
apiunit   96.52% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown.


@joseph-sentry joseph-sentry force-pushed the joseph/fix-ta-aggregate branch from db169bf to 9626f7d Compare July 9, 2025 17:35
@joseph-sentry joseph-sentry enabled auto-merge July 9, 2025 17:41
-    .top_k(min(100, max(table.height // 20, 1)))
-    .sum()
-).alias("total_slow_tests"),
+pl.lit(num_slow_tests).alias("total_slow_tests"),
Contributor

@cursor what does .lit do here

@@ -63,11 +68,11 @@ def dedup_table(table: pl.DataFrame) -> pl.DataFrame:
     .agg(
         pl.col("testsuite").alias("testsuite"),
         pl.col("flags").explode().unique().alias("flags"),
+        failure_rate_expr.fill_nan(0).alias("failure_rate"),
+        flake_rate_expr.fill_nan(0).alias("flake_rate"),
-        failure_rate_expr.alias("failure_rate"),
Contributor

Did the old approach not do the same thing? Or did you just want everything on the same line?

Contributor

If the former, I kinda thought the other way was more readable

@@ -93,6 +98,17 @@ def _has_commits_before_cutoff(repoid: int) -> bool:
).exists()


@lru_cache(maxsize=1000)
Contributor

why 1000? Is the default of 128 not good enough for us?
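For context, functools.lru_cache defaults to maxsize=128; a larger maxsize only changes how many distinct argument tuples stay cached before least-recently-used entries are evicted. A minimal sketch with a stand-in function (not the PR's real helper):

```python
from functools import lru_cache


# Stand-in for a database-backed helper; with maxsize=1000, up to
# 1000 distinct arguments are memoized before LRU eviction kicks in.
@lru_cache(maxsize=1000)
def lookup(repoid: int) -> int:
    return repoid * 2


lookup(1)
lookup(1)  # served from cache: one hit
lookup(2)
info = lookup.cache_info()
print(info.hits, info.misses, info.maxsize)  # 1 2 1000
```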

@ajay-sentry ajay-sentry (Contributor) left a comment

how come the test results got swapped around in that test? Couple other small things but nothing major


3 participants