FIX]: Fix false-positive SUCCESS status for commit tests with missing comparison rows#1119
Open
x15sr71 wants to merge 1 commit into
Open
FIX]: Fix false-positive SUCCESS status for commit tests with missing comparison rows#1119x15sr71 wants to merge 1 commit into
x15sr71 wants to merge 1 commit into
Conversation
7a8582c to
b276a67
Compare
b276a67 to
a6b5d2d
Compare
|
This was referenced Jun 8, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



Please prefix your pull request with one of the following: [FEATURE] [FIX] [IMPROVEMENT].
In raising this pull request, I confirm the following (please check boxes):
My familiarity with the project is as follows (check one):
Dependency
PR #1118 must be merged first or deployed simultaneously. This PR calls
get_test_results()insideprogress_type_request. Without #1118, that function intermittently crashes with SQLAlchemy lazy-load failures on every commit test completion.Supersedes #1091
PR #1091 was raised two months ago for the same bug and remains open.
This PR supersedes it with the following differences:
from sqlalchemy.sql.functions import countimportupdate_build_badgeupdated to accept pre-computed results, eliminating a redundantget_test_results()call per completionPR #1091 is closed in favour of this one.
The bug
File:
mod_ci/controllers.py—progress_type_request(),TestStatus.completedbranchWhen a commit test completes, status was determined by two counts:
count(got IS NOT NULL)returns 0 in two distinct cases:TestResultFilerows exist and all havegot = NULL(files matched) — correct SUCCESSTestResultFilerows exist at all — incorrect SUCCESSThe dual-count has no way to distinguish these.
get_test_results()explicitly checks whether expected outputs exist in the schema when noTestResultFilerows are present, and flags that as an error. The dual-count logic has no equivalent check.For
TestType.pull_requesttests this is harmless —comment_pr()overrides the result viaget_test_results(), which detects missing rows. ForTestType.committests,comment_pr()is never called. The dual-count result is posted to GitHub as-is.Production evidence
Commands run on the production VM and database:
Confirmed the code path is entered on completion:
Confirmed false positives — 10 distinct commit tests on the main CCExtractor repository where
crashes = 0,results = 0, missing comparison rows exist, and no other failure in the same run could have caused FAILURE. These tests definitively receivedStatus.SUCCESS:Five concrete examples. GitHub API confirms
CI - windows: successwas delivered for tests 6199 and 6197 (commits12a27f34,ba59eb08) — the linux context for the same commits correctly showed failure, confirming these are the windows-platform test entries. For the three older tests (3484, 3464, 3460), no GitHub status history is available; the platform code deterministically computes and attempts to post SUCCESS when crashes = 0 and results = 0, which the DB confirms was the case for all 10:12a27f34a0e9201ca30cbe588695a0e122d0843cba59eb0887551b8f0944021991b26bfcbf945ee49784cd5bd116b991fed24abdd07259df6ddcdb950bbdfc13eee68b39ed2196c05d87b87dd7a3eefc5127da50d14655401c4086e39d8b2d7786c5038fScope of detection gap — 876 distinct commit tests on the main repo completed with at least one missing comparison row where the dual-count could not detect the gap (whether those tests showed SUCCESS or FAILURE depended on whether other regression tests in the same run had detected failures independently):
Each test run evaluates many regression tests and sums failures across all of them. For the 866 runs outside the confirmed 10, other regression tests within the same run produced real detected failures — wrong exit codes or hash mismatches — which independently triggered FAILURE. The missing rows in those runs did not affect the final verdict because the run was already failing for another reason. For the 10, no other regression test in the same run produced a detectable failure, so the counting logic had nothing else to catch it and posted SUCCESS.
The fix
get_test_results(), already used bycomment_pr()for PR tests and byupdate_build_badgein the same file.get_test_resultsis already imported at line 47 — no new imports needed.t['error']key at the test level is confirmed fromupdate_build_badgein the same file, which already iteratesget_test_results()output usingtest['error']on every test completion.update_build_badgeis updated to accept the pre-computedtest_resultsas an optional parameter, avoiding a redundant second call toget_test_resultson every test completion.Impact
SUCCESSSUCCESS(unchanged)FAILUREFAILURE(unchanged)SUCCESS← bugFAILURE(correct)FAILUREFAILURE(unchanged)Previously
update_build_badgemade a separate internal call toget_test_results()on every completion. The pre-computed results are now passed through, so no additional queries are introduced by this fix.