[FIX]: Fix false-positive SUCCESS status for commit tests with missing comparison rows #1119

Open
x15sr71 wants to merge 1 commit into CCExtractor:master from x15sr71:fix/commit-status-false-positive

x15sr71 (Contributor) commented Jun 8, 2026

Please prefix your pull request with one of the following: [FEATURE] [FIX] [IMPROVEMENT].

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.

My familiarity with the project is as follows (check one):

  • I have never used the project.
  • I have used the project briefly.
  • I have used the project extensively, but have not contributed previously.
  • I am an active contributor to the project.

Dependency

PR #1118 must be merged first or deployed simultaneously. This PR calls get_test_results() inside progress_type_request. Without #1118, that function intermittently crashes with SQLAlchemy lazy-load failures on every commit test completion.

Supersedes #1091

PR #1091 was raised two months ago for the same bug and remains open.
This PR supersedes it with the following differences:

  • Production database evidence quantifying confirmed false positives (10 proven, 876 affected)
  • GitHub API verification confirming SUCCESS was delivered for specific commits
  • Removal of the now-unused import of count from sqlalchemy.sql.functions
  • update_build_badge updated to accept pre-computed results, eliminating a redundant get_test_results() call per completion

PR #1091 is closed in favour of this one.


The bug

File: mod_ci/controllers.py, progress_type_request(), TestStatus.completed branch

When a commit test completes, status was determined by two counts:

crashes = count(TestResult where exit_code != expected_rc)
results  = count(TestResultFile where got IS NOT NULL)

if crashes > 0 or results > 0:
    state = Status.FAILURE
else:
    state = Status.SUCCESS   # also fires when zero rows exist

count(got IS NOT NULL) returns 0 in two distinct cases:

  • All TestResultFile rows exist and all have got = NULL (files matched) — correct SUCCESS
  • No TestResultFile rows exist at all — incorrect SUCCESS

The dual-count has no way to distinguish these. get_test_results() explicitly checks whether expected outputs exist in the schema when no TestResultFile rows are present, and flags that as an error. The dual-count logic has no equivalent check.
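The ambiguity can be reproduced without a database. The following is a minimal sketch in plain Python; the row dictionaries and the helper name count_mismatches are hypothetical stand-ins for TestResultFile rows and the count query, not the platform's actual models.

```python
def count_mismatches(result_files):
    """Mimic count(TestResultFile where got IS NOT NULL) on a list of row dicts."""
    return sum(1 for row in result_files if row["got"] is not None)


# Case 1: comparison rows exist and every file matched (got is NULL).
all_matched = [
    {"regression_test_id": 1, "got": None},
    {"regression_test_id": 2, "got": None},
]

# Case 2: no comparison rows were written at all.
rows_missing = []

# Case 3: one output differed, so got holds the mismatching hash.
one_mismatch = [{"regression_test_id": 1, "got": "abc123"}]

print(count_mismatches(all_matched))   # 0
print(count_mismatches(rows_missing))  # 0 (indistinguishable from case 1)
print(count_mismatches(one_mismatch))  # 1
```

Cases 1 and 2 produce the same count, so any decision based only on that number treats "everything matched" and "nothing was compared" identically.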

For TestType.pull_request tests this is harmless — comment_pr() overrides the result via get_test_results(), which detects missing rows. For TestType.commit tests, comment_pr() is never called. The dual-count result is posted to GitHub as-is.


Production evidence

Commands run on the production VM and database:

Confirmed the code path is entered on completion:

sed -n '2430,2437p' /var/www/sample-platform/mod_ci/controllers.py
    elif status == TestStatus.completed:
        # Determine if success or failure
        # It fails if any of these happen:
        # - A crash (unexpected exit code)
        # - A not None value on the "got" of a TestResultFile (
        #       meaning the hashes do not match)
        crashes = g.db.query(count(TestResult.exit_code)).filter(

Confirmed false positives — 10 distinct commit tests on the main CCExtractor repository where crashes = 0, results = 0, missing comparison rows exist, and no other failure in the same run could have caused FAILURE. These tests definitively received Status.SUCCESS:

SELECT COUNT(DISTINCT t.id) AS confirmed_false_positive_commit_tests
FROM test t
JOIN test_progress tp ON tp.test_id = t.id AND tp.status = 'completed'
JOIN fork f ON f.id = t.fork_id
WHERE t.test_type = 'commit'
  AND f.github LIKE '%CCExtractor/ccextractor%'
  AND EXISTS (
      SELECT 1
      FROM test_result tr
      LEFT JOIN test_result_file trf
          ON trf.test_id = tr.test_id
         AND trf.regression_test_id = tr.regression_test_id
      WHERE tr.test_id = t.id
        AND trf.test_id IS NULL
        AND tr.exit_code = 0
        AND tr.expected_rc = 0
        AND tr.regression_test_id NOT IN (139, 238, 239)
  )
  AND NOT EXISTS (
      SELECT 1 FROM test_result_file trf2
      WHERE trf2.test_id = t.id AND trf2.got IS NOT NULL
  )
  AND NOT EXISTS (
      SELECT 1 FROM test_result tr2
      WHERE tr2.test_id = t.id AND tr2.exit_code != tr2.expected_rc
  );
-- Result: 10

Five concrete examples. GitHub API confirms CI - windows: success was delivered for tests 6199 and 6197 (commits 12a27f34, ba59eb08) — the linux context for the same commits correctly showed failure, confirming these are the windows-platform test entries. For the three older tests (3484, 3464, 3460), no GitHub status history is available; the platform code deterministically computes and attempts to post SUCCESS when crashes = 0 and results = 0, which the DB confirms was the case for all 10:

| Test ID | Commit |
| --- | --- |
| 6199 | 12a27f34a0e9201ca30cbe588695a0e122d0843c |
| 6197 | ba59eb0887551b8f0944021991b26bfcbf945ee4 |
| 3484 | 9784cd5bd116b991fed24abdd07259df6ddcdb95 |
| 3464 | 0bbdfc13eee68b39ed2196c05d87b87dd7a3eefc |
| 3460 | 5127da50d14655401c4086e39d8b2d7786c5038f |
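For anyone wanting to repeat the GitHub API check on the two recent commits, a sketch along the following lines should work. It uses the REST endpoint GET /repos/{owner}/{repo}/commits/{ref}/statuses; the helper names are hypothetical, the context string "CI - linux" is an assumption (only "CI - windows" is quoted above), and the sample payload is made up for illustration rather than copied from GitHub.

```python
import json
import urllib.request


def fetch_statuses(owner, repo, sha, token=None):
    """Fetch the first page of statuses for a commit from the GitHub REST API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/commits/{sha}/statuses"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def latest_state_by_context(statuses):
    """Return {context: state}, keeping the most recent status per context."""
    latest = {}
    # ISO 8601 UTC timestamps sort correctly as plain strings.
    for status in sorted(statuses, key=lambda s: s["created_at"]):
        latest[status["context"]] = status["state"]
    return latest


# Illustrative payload only; real data would come from fetch_statuses(...).
sample = [
    {"context": "CI - windows", "state": "pending", "created_at": "2026-01-01T10:00:00Z"},
    {"context": "CI - windows", "state": "success", "created_at": "2026-01-01T11:00:00Z"},
    {"context": "CI - linux", "state": "failure", "created_at": "2026-01-01T11:05:00Z"},
]
print(latest_state_by_context(sample))
```

The endpoint is paginated, so a commit with many status updates may need more than one request; this sketch reads only the first page.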

Scope of detection gap — 876 distinct commit tests on the main repo completed with at least one missing comparison row where the dual-count could not detect the gap (whether those tests showed SUCCESS or FAILURE depended on whether other regression tests in the same run had detected failures independently):

SELECT COUNT(DISTINCT t.id) AS affected_commit_tests
FROM test t
JOIN test_progress tp ON tp.test_id = t.id AND tp.status = 'completed'
JOIN fork f ON f.id = t.fork_id
WHERE t.test_type = 'commit'
  AND f.github LIKE '%CCExtractor/ccextractor%'
  AND EXISTS (
      SELECT 1
      FROM test_result tr
      LEFT JOIN test_result_file trf
          ON trf.test_id = tr.test_id
         AND trf.regression_test_id = tr.regression_test_id
      WHERE tr.test_id = t.id
        AND trf.test_id IS NULL
        AND tr.exit_code = 0
        AND tr.expected_rc = 0
        AND tr.regression_test_id NOT IN (139, 238, 239)
  );
-- Result: 876

Each test run evaluates many regression tests and sums failures across all of them. For the 866 runs outside the confirmed 10, other regression tests within the same run produced real detected failures — wrong exit codes or hash mismatches — which independently triggered FAILURE. The missing rows in those runs did not affect the final verdict because the run was already failing for another reason. For the 10, no other regression test in the same run produced a detectable failure, so the counting logic had nothing else to catch it and posted SUCCESS.
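The difference between the two counts can be illustrated on toy data. This is a sketch using an in-memory SQLite database with a simplified schema: it keeps only the columns the two queries touch and omits the joins on test, test_progress and fork as well as the excluded regression test IDs. The rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_result (
    test_id INT, regression_test_id INT, exit_code INT, expected_rc INT);
CREATE TABLE test_result_file (
    test_id INT, regression_test_id INT, got TEXT);
""")
conn.executemany("INSERT INTO test_result VALUES (?, ?, ?, ?)", [
    (1, 10, 0, 0),                  # run 1: comparison row present
    (2, 10, 0, 0), (2, 11, 0, 0),   # run 2: row for regression test 10 missing
    (3, 10, 0, 0), (3, 11, 0, 0),   # run 3: row missing, plus a hash mismatch
    (4, 10, 0, 0), (4, 11, 1, 0),   # run 4: row missing, plus a wrong exit code
])
conn.executemany("INSERT INTO test_result_file VALUES (?, ?, ?)", [
    (1, 10, None),
    (2, 11, None),
    (3, 11, "deadbeef"),
])

# Runs with at least one cleanly exited regression test lacking a comparison row.
AFFECTED_SQL = """
SELECT DISTINCT tr.test_id
FROM test_result tr
LEFT JOIN test_result_file trf
    ON trf.test_id = tr.test_id
   AND trf.regression_test_id = tr.regression_test_id
WHERE trf.test_id IS NULL
  AND tr.exit_code = 0
  AND tr.expected_rc = 0
"""

# Same, but only runs where nothing else would have triggered FAILURE.
CONFIRMED_SQL = AFFECTED_SQL + """
  AND NOT EXISTS (SELECT 1 FROM test_result_file f
                  WHERE f.test_id = tr.test_id AND f.got IS NOT NULL)
  AND NOT EXISTS (SELECT 1 FROM test_result r
                  WHERE r.test_id = tr.test_id AND r.exit_code != r.expected_rc)
"""

affected = sorted(row[0] for row in conn.execute(AFFECTED_SQL))
confirmed = sorted(row[0] for row in conn.execute(CONFIRMED_SQL))
print(affected)   # [2, 3, 4]
print(confirmed)  # [2]
```

Runs 3 and 4 play the role of the 866: they have a missing row but fail for another reason. Run 2 plays the role of the confirmed 10: the missing row is the only problem, so the old logic reports SUCCESS.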


The fix

  1. Replace the dual-count logic with get_test_results(), already used by comment_pr() for PR tests and by update_build_badge in the same file. get_test_results is already imported at line 47 — no new imports needed.
  2. The t['error'] key at the test level is confirmed from update_build_badge in the same file, which already iterates get_test_results() output using test['error'] on every test completion.
  3. The upstream cause of the missing rows is addressed separately in ccx_testsuite#14 ([FIX]: Guard against empty output file path in ServerComparer), but this platform-side counting bug exists regardless of how the rows went missing.
  4. update_build_badge is updated to accept the pre-computed test_results as an optional parameter, avoiding a redundant second call to get_test_results on every test completion.
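The shape of the change described in the list above can be sketched as follows. This is not the code in the diff: the stub, the function names ending in _sketch, the badge text, and the assumed result layout (a list of categories, each with a "tests" list whose entries carry an "error" flag) are illustrative assumptions built around the t['error'] key mentioned in point 2.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


calls = {"get_test_results": 0}


def get_test_results_stub(test):
    """Hypothetical stand-in for get_test_results(); counts how often it runs."""
    calls["get_test_results"] += 1
    return test["canned_results"]


def state_from_results(test_results):
    """FAILURE if any regression test is flagged as an error, else SUCCESS."""
    failed = any(t["error"] for category in test_results for t in category["tests"])
    return Status.FAILURE if failed else Status.SUCCESS


def update_build_badge_sketch(test, test_results=None):
    """Reuse pre-computed results when given; compute them only as a fallback."""
    if test_results is None:
        test_results = get_test_results_stub(test)
    total = sum(len(category["tests"]) for category in test_results)
    passed = sum(
        1 for category in test_results for t in category["tests"] if not t["error"]
    )
    return f"{passed}/{total}"


def on_test_completed_sketch(test):
    """Compute results once, derive the status, and pass them on to the badge."""
    test_results = get_test_results_stub(test)
    state = state_from_results(test_results)
    badge = update_build_badge_sketch(test, test_results)
    return state, badge


# One regression test is flagged as an error (e.g. its comparison rows are missing).
run_with_missing_rows = {
    "canned_results": [{"tests": [{"error": False}, {"error": True}]}]
}
state, badge = on_test_completed_sketch(run_with_missing_rows)
print(state, badge, calls["get_test_results"])
```

Keeping test_results optional means existing callers that pass only the test keep working, while the completion path avoids a second computation.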

Impact

| Scenario | Before | After |
| --- | --- | --- |
| All comparisons pass | SUCCESS | SUCCESS (unchanged) |
| Hash mismatch detected | FAILURE | FAILURE (unchanged) |
| Comparison rows missing (outputs expected) | SUCCESS ← bug | FAILURE (correct) |
| Exit code mismatch | FAILURE | FAILURE (unchanged) |

Previously update_build_badge made a separate internal call to get_test_results() on every completion. The pre-computed results are now passed through, so no additional queries are introduced by this fix.

x15sr71 force-pushed the fix/commit-status-false-positive branch from 7a8582c to b276a67 on June 8, 2026 07:07
x15sr71 force-pushed the fix/commit-status-false-positive branch from b276a67 to a6b5d2d on June 8, 2026 07:15
sonarqubecloud Bot commented Jun 8, 2026
