Fix array flattening in Spark 4+ (#847, #848)
Conversation
No actionable comments were generated in the most recent review. 🎉
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Walkthrough: flattenSchema's array handling now detects Spark 4+ and generates version-appropriate array access expressions.

Changes: Spark 4+ array indexing compatibility
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
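The summary above does not show how the Spark 4+ detection works. As a rough illustration only, assuming the version comes from Spark's version string (e.g. `SparkSession.version` returning `"4.0.0"`), it could look like the sketch below; `VersionSketch`, `majorVersion`, and `arrayIndexExpr` are hypothetical names, not the PR's actual code:

```scala
// Hypothetical sketch: parse the major version out of Spark's version string
// and pick the array access syntax accordingly. Per the review's description,
// Spark 4+ needs get(col, i) where earlier versions used col[i].
object VersionSketch {
  def majorVersion(versionString: String): Int =
    versionString.split('.').head.toInt

  def arrayIndexExpr(fieldPath: String, i: Int, sparkVersion: String): String =
    if (majorVersion(sparkVersion) >= 4) s"get($fieldPath, $i)"
    else s"$fieldPath[$i]"
}
```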
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/SparkUtils.scala (1)
Lines 198-203: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
The `path.contains('[')` log-path selector misses Spark 4 `get(…)` expressions.

In `flattenGroup`, when a `path` arrives from `flattenStructArray` or `flattenNestedArrays` on Spark 4, it will look like `` get(parent`field`, 0). `` with no `[` present, so `path.contains('[')` evaluates to `false` and the log entry uses `col(...)` syntax even though the path is a SQL expression that `col` cannot evaluate. This only affects the `stringFields` log output (line 209), not actual execution, but the logged "flattening code" becomes incorrect and misleading for Spark 4 paths.

🔧 Proposed fix
```diff
-          if (path.contains('['))
+          if (path.contains('[') || path.contains('('))
             stringFields += s"""expr("$path`${field.name}` AS `$newFieldName`")"""
           else
             stringFields += s"""col("$path`${field.name}`").as("$newFieldName")"""
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/SparkUtils.scala` around lines 198 - 203, The log-path selector in flattenGroup currently checks only path.contains('[') so Spark 4 SQL-style paths like get(parent`field`, 0) are misclassified; update the condition that decides whether to append an expr(...) vs col(...) to stringFields to also detect SQL/get-style expressions (for example check path.contains("get(") or otherwise detect '(' in the path) and use the expr(...) branch for those cases; locate the logic in flattenGroup where stringFields is appended (related symbols: flattenGroup, flattenStructArray, flattenNestedArrays, path, stringFields, fields, getNewFieldName) and change the conditional so Spark 4 get(...) expressions are logged with expr(...) rather than col(...).
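The widened condition from the proposed fix can be exercised in isolation. This is a sketch with a hypothetical helper name (`isSqlExpressionPath`), not code from the PR:

```scala
object PathSketch {
  // Paths containing '[' (bracket indexing) or '(' (e.g. Spark 4 get(...)
  // calls) are SQL expressions and should be logged via expr(...), not col(...).
  def isSqlExpressionPath(path: String): Boolean =
    path.contains('[') || path.contains('(')
}
```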
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/SparkUtils.scala`:
- Around line 129-133: In flattenStructArray's primitive-array branch the actual
Column added to fields still uses bracket indexing
(expr(s"$path`${structField.name}`[$i]")) which fails on Spark 4; change the
Column expression to use getArrayIndexExpr(fieldName, i) like stringFields does
so fields and stringFields are consistent; update the fields += expr(...) call
inside flattenStructArray to build the Column via
expr(getArrayIndexExpr(fieldName, i)) and keep the existing
.as(getNewFieldName(...), structField.metadata) so the selection matches the
logger output.
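The consistency this prompt asks for (both `fields` and `stringFields` deriving the element access from the same helper) can be sketched without a Spark dependency. `FlattenSketch` and its members are hypothetical stand-ins for the PR's helpers, with each expression shown as the string that would be passed to `expr(...)`:

```scala
object FlattenSketch {
  // Stand-in for the PR's version-aware helper: Spark 4+ uses get(col, i),
  // earlier versions use bracket indexing col[i].
  def getArrayIndexExpr(fieldName: String, i: Int, spark4Plus: Boolean): String =
    if (spark4Plus) s"get($fieldName, $i)" else s"$fieldName[$i]"

  // Build all element-access expressions for a primitive array from the same
  // helper, so the selected Columns and the logged strings cannot diverge.
  def buildSelections(fieldName: String, size: Int, spark4Plus: Boolean): Seq[String] =
    (0 until size).map(i => getArrayIndexExpr(fieldName, i, spark4Plus))
}
```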
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 89b865c8-138f-40cc-808c-24df907d5622
📒 Files selected for processing (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/utils/SparkUtils.scala
JaCoCo code coverage report - 'cobol-parser'
JaCoCo code coverage report - 'spark-cobol'