Found via ClickGap automated review. Please close or comment if this is incorrect or needs adjustment.
Retrospective finding from a historical scan of PR #69445 (merged 2024-12-05). Confirmed on current codebase — close with a note if already fixed.
Describe what's wrong
ComplexTypeSchemaProcessorFunctions.cpp:258-260 has a copy-paste error in the tuple-of-map subfield evolution branch. tryGet(tmp_node_map) succeeds (populating tmp_node_map), but the next line moves tmp_node_array into current_node — leaving a stale/null Array variable as the evolved type instead of the freshly-populated Map. Result: corrupted schema evolution for Iceberg Array-of-Map columns; potential data loss when reading.
Root cause: src/Storages/ObjectStorage/DataLakes/Iceberg/ComplexTypeSchemaProcessorFunctions.cpp:258-260 — the conditional dispatches on tmp_node_map but moves tmp_node_array. The two earlier branches in the if/else-if chain correctly move the variable they tested with tryGet; this branch breaks the pattern (likely a copy-paste from the Array branch above).
Why we believe this is a bug: src/Storages/ObjectStorage/DataLakes/Iceberg/ComplexTypeSchemaProcessorFunctions.cpp:258-260 — the conditional dispatches on tmp_node_map but moves tmp_node_array. The two earlier branches in the if/else-if chain correctly move the variable they tested with tryGet; this branch breaks the pattern (likely a copy-paste from the Array branch above).
Affected locations:
src/Storages/ObjectStorage/DataLakes/Iceberg/ComplexTypeSchemaProcessorFunctions.cpp:259 — current_node = std::move(tmp_node_array) — should be tmp_node_map after tryGet(tmp_node_map) succeeded on line 258
Impact: Iceberg tables with Array-of-Map columns that undergo schema evolution may have their evolved schema corrupted. The processor outputs an Array-typed transformer where a Map-typed transformer is expected, causing wrong-type ActionsDAG construction. Downstream: incorrect column reads (likely empty/null) or type-mismatch errors.
Does it reproduce on most recent release?
Likely yes — see testability note in additional context.
How to reproduce
Create an Iceberg table with a column of type Array(Map(String, X)). Evolve the schema (add/remove fields in the inner Map). Read the table after evolution — expect type mismatch or wrong values. Reading the tryBuildSchemaProcessor path in stateless tests can't reach this (Iceberg requires real Avro/S3); maintainer to validate via integration test or direct code review.
Expected behavior
The code should not exhibit the behavior described in the root cause above.
Error message and/or stacktrace
Additional context
Suggested fix: Line 259: change std::move(tmp_node_array) to std::move(tmp_node_map). Same shape as the matching tuple-of-array branch one block above which correctly uses tmp_node_array.
Analysis details: Confidence HIGH | Severity P1 | Testability: INTEGRATION_TEST
Found during automated review of PR #69445.
ClickGapAI · Confidence: HIGH · Severity: P1 · Finding: h_pr69445_audit_recovery_4f701264
Found via ClickGap automated review. Please close or comment if this is incorrect or needs adjustment.
Retrospective finding from a historical scan of PR #69445 (merged 2024-12-05). Confirmed on current codebase — close with a note if already fixed.
Describe what's wrong
ComplexTypeSchemaProcessorFunctions.cpp:258-260 has a copy-paste error in the tuple-of-map subfield evolution branch.
tryGet(tmp_node_map)succeeds (populating tmp_node_map), but the next line movestmp_node_arrayinto current_node — leaving a stale/null Array variable as the evolved type instead of the freshly-populated Map. Result: corrupted schema evolution for Iceberg Array-of-Map columns; potential data loss when reading.Root cause: src/Storages/ObjectStorage/DataLakes/Iceberg/ComplexTypeSchemaProcessorFunctions.cpp:258-260 — the conditional dispatches on tmp_node_map but moves tmp_node_array. The two earlier branches in the if/else-if chain correctly move the variable they tested with tryGet; this branch breaks the pattern (likely a copy-paste from the Array branch above).
Why we believe this is a bug: src/Storages/ObjectStorage/DataLakes/Iceberg/ComplexTypeSchemaProcessorFunctions.cpp:258-260 — the conditional dispatches on tmp_node_map but moves tmp_node_array. The two earlier branches in the if/else-if chain correctly move the variable they tested with tryGet; this branch breaks the pattern (likely a copy-paste from the Array branch above).
Affected locations:
src/Storages/ObjectStorage/DataLakes/Iceberg/ComplexTypeSchemaProcessorFunctions.cpp:259— current_node = std::move(tmp_node_array) — should be tmp_node_map after tryGet(tmp_node_map) succeeded on line 258Impact: Iceberg tables with Array-of-Map columns that undergo schema evolution may have their evolved schema corrupted. The processor outputs an Array-typed transformer where a Map-typed transformer is expected, causing wrong-type ActionsDAG construction. Downstream: incorrect column reads (likely empty/null) or type-mismatch errors.
Does it reproduce on most recent release?
Likely yes — see testability note in additional context.
How to reproduce
Expected behavior
The code should not exhibit the behavior described in the root cause above.
Error message and/or stacktrace
Additional context
Suggested fix: Line 259: change
std::move(tmp_node_array)tostd::move(tmp_node_map). Same shape as the matching tuple-of-array branch one block above which correctly uses tmp_node_array.Analysis details: Confidence HIGH | Severity P1 | Testability:
INTEGRATION_TESTFound during automated review of PR #69445.
ClickGapAI · Confidence: HIGH · Severity: P1 · Finding:
h_pr69445_audit_recovery_4f701264