Preserve row signature column order when column analysis has errors by abhishekrb19 · Pull Request #19162 · apache/druid

abhishekrb19 · 2026-03-16T01:59:02Z

Fixes #18437 and relates to #18966.

When column analysis encounters errors during fold, the current behavior can cause row signatures to flap on the Brokers, which in turn leads to sporadic query failures or incorrect query results, since query plans rely on the Broker’s segment metadata cache. This issue is more pronounced during segment analysis on realtime servers with JSON columns, where the fold may sometimes produce column analysis errors, presumably due to type coercion.

This patch ensures that columns are not skipped when such errors occur preserving the row signature's order.

Release note

Preserve row signature column order when column analysis encounters errors, preventing schema flapping and sporadic query failures or incorrect results (fixes #18437).

Note: #19176 may still occur, where the current behavior is that types will fall back to string when such errors are encountered.

This PR has:

been self-reviewed.
a release note entry in the PR description.
added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
been tested in a test Druid cluster.

When column analysis has errors, the current behavior causes signatures to flap causing invalid query plans and errors / incorrect query results. This patch ensures that columns aren't skipped on such errors.

abhishekrb19 · 2026-03-16T02:08:11Z

server/src/main/java/org/apache/druid/segment/metadata/AbstractSegmentMetadataCache.java

      ColumnType valueType = entry.getValue().getTypeSignature();

      // this shouldn't happen, but if it does, first try to fall back to legacy type information field in case


I think it’s also possible to simplify the surrounding code as the comments note. However, it may be better to do that separately but happy to bundle it into this bug fix as well if folks think it's safe to do so.

gianm

Interesting that this is what turned out to be causing unstable signatures. I suppose this change is fine. It would be good to fix the underlying problem too. Single-column analyses really shouldn't have errors.

gianm · 2026-03-16T16:36:24Z

server/src/main/java/org/apache/druid/segment/metadata/AbstractSegmentMetadataCache.java

  {
    final RowSignature.Builder rowSignatureBuilder = RowSignature.builder();
    for (Map.Entry<String, ColumnAnalysis> entry : analysis.getColumns().entrySet()) {
-      if (entry.getValue().isError()) {


I'm surprised you're facing analysis errors when merge is off. Most analysis errors I've seen are related to incompatible types in fold, which won't happen here since we aren't doing merge. I wonder if we should fix the underlying probably that causes the errors to happen… do you have more info about that?

In particular, on your historicals or realtime tasks do you see warnings from SegmentAnalyzer like Error analyzing column or Unknown column type? If we can fix those then the type would be correct here too, rather than getting reset to STRING.

Thanks for taking a look!

Yes, these analysis errors are indeed coming from fold. I think I got tripped up by the error logs I had added: Folding column[xyz] is an error[error:cannot_merge_diff_types: [json] and [STRING]] for mergeStrategy[strict] dataSources[abc].

As for why this happens here, I’m not entirely sure....but these were happening only on the realtime tasks in our case. So I had a suspicion that the data for some of these JSON columns in the ingestion spec were numeric strings/null/sparse data and contributing to these analysis errors [cannot_merge_diff_types](error:cannot_merge_diff_types: [json] and [STRING]]) .

For the unstable signature issue, this can cause spurious query failures or return no data. Unfortunately, there’s no good workaround other than forcing a reorder of the columns in the ingestion spec, so this fix should help address it.

For any type-related correctness issues, I think users can likely work around them with some form of casting in the queries. (I hadn't seen this issue reported by users so far)

As for why this happens here, I’m not entirely sure....but these were happening only on the realtime tasks in our case.

Ah, they're probably trying to merge the analyses from different persists. Are you using auto typing? Either type: auto for a dimension spec or useSchemaDiscovery: true? I wonder if there is some issue when different persists select different types for the same column. That kind of thing is generally reconciled at query time and also reconciled when the segment was built and merged, but maybe we missed a place where it needs to be reconciled in the segmentMetadata path.

Are you using auto typing? Either type: auto for a dimension spec or useSchemaDiscovery: true?

We don't have auto discovery enabled. The columns are a mix of json type ("formatVersion": v5) and primitives string and long (no auto typing).

I wonder if there is some issue when different persists select different types for the same column. That kind of thing is generally reconciled at query time and also reconciled when the segment was built and merged, but maybe we missed a place where it needs to be reconciled in the segmentMetadata path

Hmm, I see. I wonder if the json type follows a similar code path as the auto-type indexer. Also created an issue to track the type correctness issue - #19176.

gianm · 2026-03-16T16:36:54Z

server/src/test/java/org/apache/druid/segment/metadata/CoordinatorSegmentMetadataCacheTest.java

  }

+  /**
+   * Verifies that columns with analysis errors are included in the row signature with {@link ColumnType#UNKNOWN_COMPLEX}


I believe errors show up as STRING not COMPLEX.

gianm

Approved but I'm interested in more information about your setup that leads to these errors. There might be another thing we can fix that would cause the types to be detected more correctly.

abhishekrb19 · 2026-03-18T03:38:47Z

Approved but I'm interested in more information about your setup that leads to these errors. There might be another thing we can fix that would cause the types to be detected more correctly.

Thanks @gianm. I’ve responded in the discussion with some details of our setup #19162 (comment) (and #19176 to detect types more accurately in such cases).

Fix apache#18437

8cd7cc4

When column analysis has errors, the current behavior causes signatures to flap causing invalid query plans and errors / incorrect query results. This patch ensures that columns aren't skipped on such errors.

abhishekrb19 added the Bug label Mar 16, 2026

abhishekrb19 commented Mar 16, 2026

View reviewed changes

gianm reviewed Mar 16, 2026

View reviewed changes

abhishekrb19 added 2 commits March 16, 2026 22:11

Merge branch 'master' into fix_analysis_order

e61d4ac

Fixup comment.

735e74a

gianm approved these changes Mar 17, 2026

View reviewed changes

abhishekrb19 mentioned this pull request Mar 18, 2026

Column analyses errors results in falling back to string types #19176

Open

abhishekrb19 merged commit 746cae6 into apache:master Mar 18, 2026
37 checks passed

abhishekrb19 deleted the fix_analysis_order branch March 18, 2026 03:42

github-actions bot added this to the 37.0.0 milestone Mar 18, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Preserve row signature column order when column analysis has errors#19162

Preserve row signature column order when column analysis has errors#19162
abhishekrb19 merged 3 commits intoapache:masterfrom
abhishekrb19:fix_analysis_order

abhishekrb19 commented Mar 16, 2026 •

edited

Loading

Uh oh!

abhishekrb19 Mar 16, 2026

Uh oh!

gianm left a comment

Uh oh!

gianm Mar 16, 2026

Uh oh!

gianm Mar 16, 2026

Uh oh!

abhishekrb19 Mar 17, 2026 •

edited

Loading

Uh oh!

gianm Mar 17, 2026

Uh oh!

abhishekrb19 Mar 18, 2026

Uh oh!

gianm Mar 16, 2026

Uh oh!

gianm left a comment

Uh oh!

abhishekrb19 commented Mar 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		ColumnType valueType = entry.getValue().getTypeSignature();

		// this shouldn't happen, but if it does, first try to fall back to legacy type information field in case

Conversation

abhishekrb19 commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Release note

Uh oh!

abhishekrb19 Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

gianm left a comment

Choose a reason for hiding this comment

Uh oh!

gianm Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

gianm Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

abhishekrb19 Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gianm Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

abhishekrb19 Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

gianm Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

gianm left a comment

Choose a reason for hiding this comment

Uh oh!

abhishekrb19 commented Mar 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

abhishekrb19 commented Mar 16, 2026 •

edited

Loading

abhishekrb19 Mar 17, 2026 •

edited

Loading