
test(InsightsTest): match retrySearchUntil threshold to asserted asset count#2500

Merged
cmgrote merged 2 commits into main from fix-ci-flakes-insights-suggestions on May 18, 2026

@sriram-atlan
Contributor

Summary

Two surgical fixes to make the daily Test (leangraph-test) workflow deterministic. Both target eventual-consistency races against the Elasticsearch (ES) index that surface only under the workflow's parallel matrix on a shared tenant.

| Test | Failure signature | Fix |
|---|---|---|
| InsightsTest.searchAssets | expected [3] but found [2] on the typeName aggregation bucket count | Bump retrySearchUntil(index, 3L) → retrySearchUntil(index, 4L) so the retry waits for all 4 assets to be indexed before evaluating the aggregation |
| SuggestionsTest.findSuggestions* | expected [1] but found [0] on the ownerGroups / descriptions / assignedTerms suggestion lists | Extend awaitConsistency() to retry-search for the peer columns until their non-tag metadata, not just their tags, is also visible in ES |

Together these address MS-1269 and MS-1270.

Root-cause walkthrough

InsightsTest.searchAssets

The test creates 4 assets (1 AtlanCollection, 2 Folder, 1 AtlanQuery) and asserts that the typeName terms aggregation has 3 buckets, one per type. Further down it also asserts entities.size() == 4 and approximateCount == 4L.

But it called retrySearchUntil(index, 3L), which retries only until hits ≥ 3. When the 4th asset (typically the most recently created AtlanQuery) hasn't been indexed yet, the retry returns at 3 hits with only 2 distinct typeName values present, so the bucket-count assertion intermittently fails. Aligning the retry threshold with the asserted asset count closes the race.
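To make the race concrete, here is a minimal Java sketch (not code from the test suite) that models ES visibility as a prefix of the creation order and counts the distinct typeName buckets visible at the moment the retry threshold is first met. The class and method names are hypothetical, and it assumes the worst case described above: the AtlanQuery is created, and indexed, last.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RetryThresholdSketch {

    // Assumed creation order, and (worst case) the order in which assets
    // become visible in ES: the AtlanQuery is created, and indexed, last.
    static final List<String> CREATED_TYPES =
            List.of("AtlanCollection", "Folder", "Folder", "AtlanQuery");

    /**
     * Returns how many distinct typeName buckets an aggregation would report
     * if the retry loop stops at the first moment minHits assets are visible.
     */
    static int bucketsOnceThresholdMet(long minHits) {
        // Worst case: the retry returns as soon as exactly minHits assets are indexed.
        int visible = (int) Math.min(minHits, CREATED_TYPES.size());
        Set<String> buckets = new HashSet<>(CREATED_TYPES.subList(0, visible));
        return buckets.size();
    }

    public static void main(String[] args) {
        System.out.println("threshold 3L -> " + bucketsOnceThresholdMet(3L) + " buckets");
        System.out.println("threshold 4L -> " + bucketsOnceThresholdMet(4L) + " buckets");
    }
}
```

With a threshold of 3L the sketch sees only AtlanCollection and Folder (2 buckets); with 4L all three types are present, which is what the bucket-count assertion expects.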

SuggestionsTest.awaitConsistency

awaitConsistency() currently calls waitForTagsToSync(taggedAssetGuids, log), which covers __classificationNames and __traitNames but not ownerGroups, description, userDescription, or __meanings. The Suggestions API aggregates these additional fields from peer columns. Under parallel matrix load on a shared tenant, the ES outbox processor (30-second idle poll) hasn't drained the peer's non-tag updates by the time findSuggestionsDefault runs, so all aggregations come back empty.
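A minimal sketch of the gap, assuming an ES document can be modelled as a plain Map: a tag-only check can succeed while the same peer column is still missing the non-tag fields the Suggestions API aggregates. The field names come from the description above; the class name, the allPresent helper and the placeholder document are hypothetical, and this is not the actual waitForTagsToSync implementation.

```java
import java.util.List;
import java.util.Map;

final class PeerVisibilitySketch {

    // Tag fields covered by the existing tag-sync wait.
    static final List<String> TAG_FIELDS =
            List.of("__classificationNames", "__traitNames");

    // Non-tag fields the Suggestions API also aggregates from peer columns.
    static final List<String> METADATA_FIELDS =
            List.of("ownerGroups", "description", "userDescription", "__meanings");

    /** True only if every listed field is present (non-null) in the document. */
    static boolean allPresent(Map<String, Object> doc, List<String> fields) {
        return fields.stream().allMatch(field -> doc.get(field) != null);
    }

    public static void main(String[] args) {
        // Hypothetical snapshot of a peer column in ES (placeholder values):
        // the tag update has landed, the non-tag bulk update has not.
        Map<String, Object> partiallySynced = Map.of(
                "__classificationNames", "TagA",
                "__traitNames", List.of("TagA"));

        System.out.println("tags visible:     " + allPresent(partiallySynced, TAG_FIELDS));
        System.out.println("metadata visible: " + allPresent(partiallySynced, METADATA_FIELDS));
    }
}
```

A wait built only on the first check returns early for this document, which matches the empty aggregations seen in CI.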

Diagnostic evidence:

  • Local isolated run: SuggestionsTest passes 24/24 in 1m 39s against leangraph-test
  • CI parallel run on the same tenant: 0 aggregation buckets for owner groups
  • [ms-1268-trace] server logs (separate diagnostic branch) confirm the peer column's __meanings and ownerGroups are correctly persisted in Cassandra and emitted into the ES bulk update body — they just hadn't arrived in ES yet when the test queried

The extension issues an explicit search for Columns named COLUMN_NAME1 on which all four metadata fields exist, retrying until both metadata-bearing peers (t1c1 and v1c1) are visible. retrySearchUntil already encapsulates exponential backoff and bounded retries.
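retrySearchUntil's internals are not part of this PR. As a rough sketch of the bounded-retry-with-exponential-backoff pattern it is described as encapsulating, the Java below polls a hypothetical count supplier until the expected number of fully populated peer columns (2, standing in for t1c1 and v1c1) is visible. The helper name, parameters and delays are assumptions rather than the SDK's API; a real version would run the search described above on each attempt.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

final class BoundedRetrySketch {

    /**
     * Polls countSupplier until it reports at least expected hits, doubling the
     * delay between attempts, for at most maxAttempts polls. Returns the last
     * observed count; the caller decides whether falling short is a failure.
     */
    static long retryUntil(LongSupplier countSupplier, long expected,
                           int maxAttempts, long initialDelayMs) throws InterruptedException {
        long delay = initialDelayMs;
        long count = countSupplier.getAsLong();
        for (int attempt = 1; attempt < maxAttempts && count < expected; attempt++) {
            Thread.sleep(delay);
            delay *= 2; // exponential backoff
            count = countSupplier.getAsLong();
        }
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated index: one more fully populated peer column becomes
        // visible on each poll, capped at the 2 peers under test.
        AtomicInteger polls = new AtomicInteger();
        LongSupplier matchingPeers = () -> Math.min(polls.getAndIncrement(), 2);

        long seen = retryUntil(matchingPeers, 2L, 5, 10L);
        System.out.println("matching peers: " + seen + " after " + polls.get() + " polls");
    }
}
```

Returning the last count instead of throwing keeps the sketch neutral; a test helper would typically fail loudly once maxAttempts is exhausted.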

What this is not fixing

  • MS-1267 (CategoryPreProcessor parent-anchor): a real lean-graph product bug whose fix is already merged to switchable-graph-provider as #6721
  • MS-1268 (term mutation response certificateStatus): a cascade from MS-1267, resolved once the MS-1267 fix was deployed

With this PR in place, and setting aside the existing environment-specific PurposeTest / asset-import token-permission gaps, the Test (leangraph-test) workflow should hold steady.

Test plan

  • CI green on this PR (PR build / unit tests)
  • After merge: re-run Test (leangraph-test) workflow, confirm Integration (InsightsTest) and Integration (SuggestionsTest) are green for 3 consecutive runs
  • Confirm the non-leangraph nightly Test workflow still passes both jobs (no regression on the other tenant)

🤖 Generated with Claude Code

sriram-atlan requested a review from cmgrote as a code owner on May 17, 2026 19:38
…t count

InsightsTest.searchAssets creates 4 assets (1 AtlanCollection + 2 Folder
+ 1 AtlanQuery) and asserts both that there are 3 distinct typeName
aggregation buckets and that entities.size() == 4 / approximateCount == 4L.

It called retrySearchUntil(index, 3L), which retries until hits >= 3.
When the 4th asset (commonly the last-created AtlanQuery) hasn't been
indexed yet, retry stops at 3 hits but only 2 distinct types exist, and
the bucket-count assertion intermittently fails as "expected [3] but
found [2]" on the new daily Test (leangraph-test) workflow's matrix run.

Match the threshold to the asset count actually under assertion. This
is a latent test bug — independent of any ES refresh-semantics changes
on the server side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sriram-atlan force-pushed the fix-ci-flakes-insights-suggestions branch from 41ca63c to bf51b71 on May 18, 2026 06:14
sriram-atlan changed the title from "test: stabilise InsightsTest.searchAssets + SuggestionsTest.awaitConsistency" to "test(InsightsTest): match retrySearchUntil threshold to asserted asset count" on May 18, 2026
cmgrote enabled auto-merge on May 18, 2026 09:46
cmgrote merged commit 6d39885 into main on May 18, 2026
7 checks passed
cmgrote added the ignore (Exclude from release notes) label on May 20, 2026

2 participants