Preserve row signature column order when column analysis has errors#19162
Preserve row signature column order when column analysis has errors#19162abhishekrb19 merged 3 commits intoapache:masterfrom
Conversation
When column analysis has errors, the current behavior causes signatures to flap causing invalid query plans and errors / incorrect query results. This patch ensures that columns aren't skipped on such errors.
| ColumnType valueType = entry.getValue().getTypeSignature(); | ||
|
|
||
| // this shouldn't happen, but if it does, first try to fall back to legacy type information field in case |
There was a problem hiding this comment.
I think it’s also possible to simplify the surrounding code as the comments note. However, it may be better to do that separately but happy to bundle it into this bug fix as well if folks think it's safe to do so.
gianm
left a comment
There was a problem hiding this comment.
Interesting that this is what turned out to be causing unstable signatures. I suppose this change is fine. It would be good to fix the underlying problem too. Single-column analyses really shouldn't have errors.
| { | ||
| final RowSignature.Builder rowSignatureBuilder = RowSignature.builder(); | ||
| for (Map.Entry<String, ColumnAnalysis> entry : analysis.getColumns().entrySet()) { | ||
| if (entry.getValue().isError()) { |
There was a problem hiding this comment.
I'm surprised you're facing analysis errors when merge is off. Most analysis errors I've seen are related to incompatible types in fold, which won't happen here since we aren't doing merge. I wonder if we should fix the underlying probably that causes the errors to happen… do you have more info about that?
There was a problem hiding this comment.
In particular, on your historicals or realtime tasks do you see warnings from SegmentAnalyzer like Error analyzing column or Unknown column type? If we can fix those then the type would be correct here too, rather than getting reset to STRING.
There was a problem hiding this comment.
Thanks for taking a look!
Yes, these analysis errors are indeed coming from fold. I think I got tripped up by the error logs I had added: Folding column[xyz] is an error[error:cannot_merge_diff_types: [json] and [STRING]] for mergeStrategy[strict] dataSources[abc].
As for why this happens here, I’m not entirely sure....but these were happening only on the realtime tasks in our case. So I had a suspicion that the data for some of these JSON columns in the ingestion spec were numeric strings/null/sparse data and contributing to these analysis errors [cannot_merge_diff_types](error:cannot_merge_diff_types: [json] and [STRING]]) .
- For the unstable signature issue, this can cause spurious query failures or return no data. Unfortunately, there’s no good workaround other than forcing a reorder of the columns in the ingestion spec, so this fix should help address it.
- For any type-related correctness issues, I think users can likely work around them with some form of casting in the queries. (I hadn't seen this issue reported by users so far)
There was a problem hiding this comment.
As for why this happens here, I’m not entirely sure....but these were happening only on the realtime tasks in our case.
Ah, they're probably trying to merge the analyses from different persists. Are you using auto typing? Either type: auto for a dimension spec or useSchemaDiscovery: true? I wonder if there is some issue when different persists select different types for the same column. That kind of thing is generally reconciled at query time and also reconciled when the segment was built and merged, but maybe we missed a place where it needs to be reconciled in the segmentMetadata path.
There was a problem hiding this comment.
Are you using auto typing? Either type: auto for a dimension spec or useSchemaDiscovery: true?
We don't have auto discovery enabled. The columns are a mix of json type ("formatVersion": v5) and primitives string and long (no auto typing).
I wonder if there is some issue when different persists select different types for the same column. That kind of thing is generally reconciled at query time and also reconciled when the segment was built and merged, but maybe we missed a place where it needs to be reconciled in the segmentMetadata path
Hmm, I see. I wonder if the json type follows a similar code path as the auto-type indexer. Also created an issue to track the type correctness issue - #19176.
| } | ||
|
|
||
| /** | ||
| * Verifies that columns with analysis errors are included in the row signature with {@link ColumnType#UNKNOWN_COMPLEX} |
There was a problem hiding this comment.
I believe errors show up as STRING not COMPLEX.
gianm
left a comment
There was a problem hiding this comment.
Approved but I'm interested in more information about your setup that leads to these errors. There might be another thing we can fix that would cause the types to be detected more correctly.
Thanks @gianm. I’ve responded in the discussion with some details of our setup #19162 (comment) (and #19176 to detect types more accurately in such cases). |
Fixes #18437 and relates to #18966.
When column analysis encounters errors during fold, the current behavior can cause row signatures to flap on the Brokers, which in turn leads to sporadic query failures or incorrect query results, since query plans rely on the Broker’s segment metadata cache. This issue is more pronounced during segment analysis on realtime servers with JSON columns, where the fold may sometimes produce column analysis errors, presumably due to type coercion.
This patch ensures that columns are not skipped when such errors occur preserving the row signature's order.
Release note
Preserve row signature column order when column analysis encounters errors, preventing schema flapping and sporadic query failures or incorrect results (fixes #18437).
Note: #19176 may still occur, where the current behavior is that types will fall back to string when such errors are encountered.
This PR has: