[FIX]: Fix false-positive SUCCESS status for commit tests with missing comparison rows by x15sr71 · Pull Request #1119 · CCExtractor/sample-platform

x15sr71 · 2026-06-08T06:52:36Z

Please prefix your pull request with one of the following: [FEATURE] [FIX] [IMPROVEMENT].

In raising this pull request, I confirm the following (please check boxes):

I have read and understood the contributors guide.
I have checked that another pull request for this purpose does not exist.
I have considered, and confirmed that this submission will be valuable to others.
I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
I give this submission freely, and claim no ownership to its content.

My familiarity with the project is as follows (check one):

I have never used the project.
I have used the project briefly.
I have used the project extensively, but have not contributed previously.
I am an active contributor to the project.

Dependency

PR #1118 must be merged first or deployed simultaneously. This PR calls get_test_results() inside progress_type_request. Without #1118, that function can crash intermittently with SQLAlchemy lazy-load failures whenever a commit test completes.

Supersedes #1091

PR #1091 was raised two months ago for the same bug and had not been merged.
This PR supersedes it with the following differences:

Production database evidence quantifying confirmed false positives (10 proven, 876 affected)
GitHub API verification confirming SUCCESS was delivered for specific commits
Removal of the now-unused `from sqlalchemy.sql.functions import count` import
update_build_badge updated to accept pre-computed results, eliminating a redundant get_test_results() call per completion

PR #1091 is closed in favour of this one.

The bug

File: mod_ci/controllers.py — progress_type_request(), TestStatus.completed branch

When a commit test completes, status was determined by two counts:

crashes = count(TestResult where exit_code != expected_rc)
results  = count(TestResultFile where got IS NOT NULL)

if crashes > 0 or results > 0:
    state = Status.FAILURE
else:
    state = Status.SUCCESS   # also fires when zero rows exist

count(got IS NOT NULL) returns 0 in two distinct cases:

All TestResultFile rows exist and all have got = NULL (files matched) — correct SUCCESS
No TestResultFile rows exist at all — incorrect SUCCESS

The dual-count has no way to distinguish these. get_test_results() explicitly checks whether expected outputs exist in the schema when no TestResultFile rows are present, and flags that as an error. The dual-count logic has no equivalent check.
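The ambiguity can be reproduced without a database. The following is a minimal sketch, not the platform code: `dual_count_state` and its list-based inputs are hypothetical stand-ins for the two SQLAlchemy counts described above.

```python
from typing import Optional

# Hypothetical stand-in for TestResultFile rows: each entry is the `got`
# value of one row (None means the output matched the expected file).
GotValues = list[Optional[str]]


def dual_count_state(exit_codes: list[tuple[int, int]], got_values: GotValues) -> str:
    """Mimic the old logic: two counts, FAILURE if either is non-zero."""
    # Each exit_codes entry is (actual exit code, expected return code).
    crashes = sum(1 for actual, expected in exit_codes if actual != expected)
    results = sum(1 for got in got_values if got is not None)
    return "FAILURE" if crashes > 0 or results > 0 else "SUCCESS"


# Rows exist and every comparison matched: SUCCESS is correct.
all_matched = dual_count_state([(0, 0)], [None, None])
# No comparison rows were written at all: the counts are identical to the case above.
rows_missing = dual_count_state([(0, 0)], [])
# One hash differs: the only case the second count can detect.
mismatch = dual_count_state([(0, 0)], [None, "abc123"])
```

Both `all_matched` and `rows_missing` come out as SUCCESS, because an empty list and a list of `None` values both count zero non-null entries.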

For TestType.pull_request tests this is harmless — comment_pr() overrides the result via get_test_results(), which detects missing rows. For TestType.commit tests, comment_pr() is never called. The dual-count result is posted to GitHub as-is.

Production evidence

Commands run on the production VM and database:

Confirmed the code path is entered on completion:

sed -n '2430,2437p' /var/www/sample-platform/mod_ci/controllers.py
    elif status == TestStatus.completed:
        # Determine if success or failure
        # It fails if any of these happen:
        # - A crash (unexpected exit code)
        # - A not None value on the "got" of a TestResultFile (
        #       meaning the hashes do not match)
        crashes = g.db.query(count(TestResult.exit_code)).filter(

Confirmed false positives — 10 distinct commit tests on the main CCExtractor repository where crashes = 0, results = 0, missing comparison rows exist, and no other failure in the same run could have caused FAILURE. These tests definitively received Status.SUCCESS:

SELECT COUNT(DISTINCT t.id) AS confirmed_false_positive_commit_tests
FROM test t
JOIN test_progress tp ON tp.test_id = t.id AND tp.status = 'completed'
JOIN fork f ON f.id = t.fork_id
WHERE t.test_type = 'commit'
  AND f.github LIKE '%CCExtractor/ccextractor%'
  AND EXISTS (
      SELECT 1
      FROM test_result tr
      LEFT JOIN test_result_file trf
          ON trf.test_id = tr.test_id
         AND trf.regression_test_id = tr.regression_test_id
      WHERE tr.test_id = t.id
        AND trf.test_id IS NULL
        AND tr.exit_code = 0
        AND tr.expected_rc = 0
        AND tr.regression_test_id NOT IN (139, 238, 239)
  )
  AND NOT EXISTS (
      SELECT 1 FROM test_result_file trf2
      WHERE trf2.test_id = t.id AND trf2.got IS NOT NULL
  )
  AND NOT EXISTS (
      SELECT 1 FROM test_result tr2
      WHERE tr2.test_id = t.id AND tr2.exit_code != tr2.expected_rc
  );
-- Result: 10

Five concrete examples. GitHub API confirms CI - windows: success was delivered for tests 6199 and 6197 (commits 12a27f34, ba59eb08) — the linux context for the same commits correctly showed failure, confirming these are the windows-platform test entries. For the three older tests (3484, 3464, 3460), no GitHub status history is available; the platform code deterministically computes and attempts to post SUCCESS when crashes = 0 and results = 0, which the DB confirms was the case for all 10:

| Test ID | Commit |
| --- | --- |
| 6199 | `12a27f34a0e9201ca30cbe588695a0e122d0843c` |
| 6197 | `ba59eb0887551b8f0944021991b26bfcbf945ee4` |
| 3484 | `9784cd5bd116b991fed24abdd07259df6ddcdb95` |
| 3464 | `0bbdfc13eee68b39ed2196c05d87b87dd7a3eefc` |
| 3460 | `5127da50d14655401c4086e39d8b2d7786c5038f` |

Scope of detection gap — 876 distinct commit tests on the main repo completed with at least one missing comparison row where the dual-count could not detect the gap (whether those tests showed SUCCESS or FAILURE depended on whether other regression tests in the same run had detected failures independently):

SELECT COUNT(DISTINCT t.id) AS affected_commit_tests
FROM test t
JOIN test_progress tp ON tp.test_id = t.id AND tp.status = 'completed'
JOIN fork f ON f.id = t.fork_id
WHERE t.test_type = 'commit'
  AND f.github LIKE '%CCExtractor/ccextractor%'
  AND EXISTS (
      SELECT 1
      FROM test_result tr
      LEFT JOIN test_result_file trf
          ON trf.test_id = tr.test_id
         AND trf.regression_test_id = tr.regression_test_id
      WHERE tr.test_id = t.id
        AND trf.test_id IS NULL
        AND tr.exit_code = 0
        AND tr.expected_rc = 0
        AND tr.regression_test_id NOT IN (139, 238, 239)
  );
-- Result: 876

Each test run evaluates many regression tests and sums failures across all of them. For the 866 runs outside the confirmed 10, other regression tests within the same run produced real detected failures — wrong exit codes or hash mismatches — which independently triggered FAILURE. The missing rows in those runs did not affect the final verdict because the run was already failing for another reason. For the 10, no other regression test in the same run produced a detectable failure, so the counting logic had nothing else to catch it and posted SUCCESS.
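The difference between the two queries can be illustrated on a toy dataset. This sketch uses an in-memory SQLite database with a simplified, assumed schema (only the columns the queries touch, no `test`, `fork`, or `test_progress` tables); the test IDs and rows are invented for illustration and are not production data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_result (test_id INT, regression_test_id INT, exit_code INT, expected_rc INT);
CREATE TABLE test_result_file (test_id INT, regression_test_id INT, got TEXT);
-- Test 1: comparison row exists and matched (got IS NULL).
INSERT INTO test_result VALUES (1, 10, 0, 0);
INSERT INTO test_result_file VALUES (1, 10, NULL);
-- Test 2: clean exit code but no comparison row at all.
INSERT INTO test_result VALUES (2, 10, 0, 0);
-- Test 3: one missing row, but another regression test has a hash mismatch.
INSERT INTO test_result VALUES (3, 10, 0, 0);
INSERT INTO test_result VALUES (3, 11, 0, 0);
INSERT INTO test_result_file VALUES (3, 11, 'deadbeef');
""")

# Anti-join: test_result rows with a clean exit code and no matching file row.
missing_rows_sql = """
SELECT DISTINCT tr.test_id
FROM test_result tr
LEFT JOIN test_result_file trf
    ON trf.test_id = tr.test_id
   AND trf.regression_test_id = tr.regression_test_id
WHERE trf.test_id IS NULL
  AND tr.exit_code = 0
  AND tr.expected_rc = 0
"""
affected = sorted(row[0] for row in conn.execute(missing_rows_sql))

# Narrow to runs where nothing else in the same run could have caused FAILURE.
false_positive_sql = missing_rows_sql + """
  AND NOT EXISTS (SELECT 1 FROM test_result_file f
                  WHERE f.test_id = tr.test_id AND f.got IS NOT NULL)
  AND NOT EXISTS (SELECT 1 FROM test_result r
                  WHERE r.test_id = tr.test_id AND r.exit_code != r.expected_rc)
"""
false_positives = sorted(row[0] for row in conn.execute(false_positive_sql))
```

Here tests 2 and 3 are both affected by the detection gap, but only test 2 is a false positive: test 3 already fails through its hash mismatch, which mirrors the split between the 876 affected runs and the 10 confirmed ones.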

The fix

Replace the dual-count logic with get_test_results(), already used by comment_pr() for PR tests and by update_build_badge in the same file. get_test_results is already imported at line 47 — no new imports needed.
The t['error'] key at the test level is confirmed from update_build_badge in the same file, which already iterates get_test_results() output using test['error'] on every test completion.
The upstream cause of missing rows is addressed separately by [FIX]: Guard against empty output file path in ServerComparer ccx_testsuite#14, but this platform-side counting bug exists independently of how rows went missing.
update_build_badge is updated to accept the pre-computed test_results as an optional parameter, avoiding a redundant second call to get_test_results on every test completion.
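The shape of the change might look roughly like the sketch below. Everything here is an assumption for illustration: `get_test_results_stub`, `state_from_results`, and `update_build_badge_sketch` are hypothetical names, the real functions take different arguments, and the result structure (a list of categories, each with a `tests` list whose entries carry a boolean `error` key) is inferred from the description above rather than copied from the code.

```python
from typing import Any, Optional

# Assumed shape of get_test_results() output: a list of categories, each
# holding a list of per-test dicts that carry a boolean 'error' key.
Results = list[dict[str, Any]]

calls = {"n": 0}  # counts how often the (stubbed) results are fetched


def get_test_results_stub(test_id: int) -> Results:
    """Hypothetical stand-in for the platform's get_test_results()."""
    calls["n"] += 1
    return [{"category": "General", "tests": [{"error": False}, {"error": True}]}]


def state_from_results(test_results: Results) -> str:
    """Return FAILURE if any regression test is flagged as an error."""
    for category in test_results:
        for t in category["tests"]:
            if t["error"]:
                return "FAILURE"
    return "SUCCESS"


def update_build_badge_sketch(test_id: int, test_results: Optional[Results] = None) -> str:
    """Reuse pre-computed results when given; otherwise fetch them."""
    if test_results is None:
        test_results = get_test_results_stub(test_id)
    return "failing" if state_from_results(test_results) == "FAILURE" else "passing"


results = get_test_results_stub(1)             # computed once on completion
state = state_from_results(results)            # used for the commit status
badge = update_build_badge_sketch(1, results)  # reuses the same results
```

Keeping `test_results` optional means existing callers that pass nothing would still work, while the completion path fetches the results only once.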

Impact

| Scenario | Before | After |
| --- | --- | --- |
| All comparisons pass | `SUCCESS` | `SUCCESS` (unchanged) |
| Hash mismatch detected | `FAILURE` | `FAILURE` (unchanged) |
| Comparison rows missing (outputs expected) | `SUCCESS` ← bug | `FAILURE` (correct) |
| Exit code mismatch | `FAILURE` | `FAILURE` (unchanged) |

Previously update_build_badge made a separate internal call to get_test_results() on every completion. The pre-computed results are now passed through, so no additional queries are introduced by this fix.

sonarqubecloud · 2026-06-08T07:16:49Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

x15sr71 requested review from canihavesomecoffee and thealphadollar as code owners June 8, 2026 06:52

x15sr71 force-pushed the fix/commit-status-false-positive branch from 7a8582c to b276a67 Compare June 8, 2026 07:07

fix: use get_test_results() for commit status to prevent false-positive SUCCESS

a6b5d2d

x15sr71 force-pushed the fix/commit-status-false-positive branch from b276a67 to a6b5d2d Compare June 8, 2026 07:15

This was referenced Jun 8, 2026

[FIX]: correct GitHub commit status when test output files are missing #1091

Closed

ci: trigger sample platform test runs CCExtractor/ccextractor#2279

Open

x15sr71 wants to merge 1 commit into CCExtractor:master from x15sr71:fix/commit-status-false-positive
