-
Notifications
You must be signed in to change notification settings - Fork 86
#792 Add EBCDIC to ASCII encoders for all single byte code pages #794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… in error line.
…the error message.
WalkthroughThe PR adds bidirectional ASCII↔EBCDIC mappings across single-byte code pages, refactors SingleByteCodePage to accept an explicit asciiToEbcdic Array and provides getReverseTable, and enhances parser error reporting by adding comment-aware position adjustment (CommentPolicy) and extending SyntaxErrorException with optional position and field metadata. Changes
Sequence Diagram(s)sequenceDiagram
participant API as ANTLRParser.parse
participant Policy as CommentPolicy
participant Visitor as ParserVisitor
participant Strategy as ThrowErrorStrategy
participant Exc as SyntaxErrorException
API->>Policy: read commentPolicy (truncateComments ...)
Policy-->>API: return policy
API->>API: compute adjPos from policy
API->>Strategy: instantiate ThrowErrorStrategy(adjPos)
API->>Visitor: new ParserVisitor(..., commentPolicy, ...)
Note over API,Strategy: on syntax error during parse
Strategy->>Strategy: recover() calculates offset column = col - adjPos
Strategy->>Exc: throw SyntaxErrorException(line, Some(adjCol), fieldOpt, msg)
Exc->>API: error propagated (message built using posOpt & fieldOpt)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🧰 Additional context used🧬 Code graph analysis (2)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (1)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/SyntaxErrorsSpec.scala (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
🔇 Additional comments (6)
Comment |
JaCoCo code coverage report - 'spark-cobol'
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecodersSpec.scala (1)
85-519: Consider parameterized tests to reduce duplication.The test structure is highly repetitive across code pages. Consider refactoring to use ScalaTest's parameterized test features (e.g.,
TableDrivenPropertyChecks) to reduce duplication while maintaining coverage.Example approach:
forAll(Table( ("codePage", "expectedUnicode", "ebcdicBytes"), (new CodePage273, " {Ä!~Ü^...", Array(0x40, 0x43, ...)), (new CodePage274, " æÄ!üÜ^...", Array(0x40, 0x9C, ...)), // ... )) { (enc, expected, bytes) => val actualUnicode = decodeEbcdicString(bytes, KeepAll, enc, false) val actualEbcdicBytes = enc.convert(expected, expected.length) assert(actualUnicode == expected) assert(actualEbcdicBytes.sameElements(bytes)) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (40)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ANTLRParser.scala(3 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala(5 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/BinaryPropertiesAdder.scala(1 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage037.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage037Ext.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1025.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1047.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1140.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1141.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1142.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1143.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1144.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1145.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1146.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1147.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1148.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1160.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage273.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage274.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage275.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage277.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage278.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage280.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage284.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage285.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage297.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage500.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage838.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage870.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage875.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePageCommon.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePageCommonExt.scala(2 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala(3 hunks)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/exceptions/SyntaxErrorException.scala(1 hunks)cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecodersSpec.scala(8 hunks)cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/FakeCodePage.scala(2 hunks)cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/DataSizeSpec.scala(3 hunks)cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/PicValidationSpec.scala(3 hunks)cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/SyntaxErrorsSpec.scala(9 hunks)spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/utils/CustomCodePage.scala(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-19T09:12:31.865Z
Learnt from: yruslan
Repo: AbsaOSS/cobrix PR: 775
File: spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala:0-0
Timestamp: 2025-08-19T09:12:31.865Z
Learning: In Cobrix COBOL writer, null values are encoded as zero bytes (0x00) for all field types, including PIC X fields, rather than using traditional COBOL padding (EBCDIC spaces 0x40 for PIC X). This design decision prioritizes consistency with the reader side, which already treats zero bytes as nulls, ensuring round-trip fidelity between reading and writing operations.
Applied to files:
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/PicValidationSpec.scala
🧬 Code graph analysis (37)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage297.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1147.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala (2)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/policies/CommentPolicy.scala (1)
CommentPolicy(19-23)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ANTLRParser.scala (1)
truncateComments(105-119)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage037Ext.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage284.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage870.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePageCommonExt.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1148.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1141.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage875.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage500.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/DataSizeSpec.scala (2)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/policies/CommentPolicy.scala (1)
CommentPolicy(19-23)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/policies/StringTrimmingPolicy.scala (1)
StringTrimmingPolicy(19-46)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage274.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage277.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1143.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/PicValidationSpec.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/policies/CommentPolicy.scala (1)
CommentPolicy(19-23)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage280.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/utils/CustomCodePage.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1160.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1145.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/TwoByteCodePage.scala (3)
convert(35-92)convert(94-96)supportsEncoding(98-98)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage275.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage273.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage037.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1025.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/FakeCodePage.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecodersSpec.scala (2)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala (1)
decodeEbcdicString(48-63)cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (2)
convert(37-45)convert(54-76)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1140.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage285.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1146.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage278.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage838.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1142.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1047.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/SyntaxErrorsSpec.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/CopybookParser.scala (2)
CopybookParser(42-490)parseTree(198-241)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1144.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePageCommon.scala (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala (3)
SingleByteCodePage(24-79)SingleByteCodePage(81-96)getReverseTable(82-95)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
- GitHub Check: Spark 3.4.4 on Scala 2.12.20
- GitHub Check: Spark 3.5.7 on Scala 2.12.20
- GitHub Check: Spark 2.4.8 on Scala 2.11.12
- GitHub Check: Spark 3.5.7 on Scala 2.13.17
🔇 Additional comments (43)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/BinaryPropertiesAdder.scala (1)
56-56: LGTM! Exception constructor correctly adapted.The update to the new
SyntaxErrorExceptionsignature is correct, properly passing optional field metadata.cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1145.scala (1)
23-23: LGTM! Bidirectional mapping correctly implemented.The constructor now passes both forward and reverse mappings to
SingleByteCodePage, and the reverse mapping is computed lazily usinggetReverseTable. This enables ASCII-to-EBCDIC encoding support for CP1145.Also applies to: 56-56
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala (2)
45-45: LGTM! CommentPolicy integration enables accurate error positions.The new
commentPolicyparameter allows the parser to adjust error positions based on whether comment truncation is enabled, improving the accuracy of syntax error reporting.
561-562: The review comment is incorrect and should be ignored.The position adjustment does not assume single-byte characters; it operates on character positions (code points) throughout. The code uses character-based indexing on all sides:
commentsUpToChar(Int = 6 by default) represents a COBOL column number- Scala's
String.slice()andString.drop()operations in ANTLRParser.scala work on character indices, not bytes- ANTLR 4.7.2's
getCharPositionInLine()returns character positions (code points), not byte positions- The calculation
commentPolicy.commentsUpToChar + 1andctx.stop.getCharPositionInLineare both character-position valuesCOBOL fixed format is inherently character-based (column positions), not byte-based. Multi-byte UTF-8 characters are correctly represented as single characters in this model, so the position adjustment handles them correctly.
Likely an incorrect or invalid review comment.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage277.scala (1)
22-22: LGTM! Bidirectional mapping correctly implemented.The changes follow the same pattern as other code pages, enabling ASCII-to-EBCDIC encoding support for CP277.
Also applies to: 55-55
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/FakeCodePage.scala (1)
19-19: LGTM! Test code updated to match new API.The test code page correctly implements the two-mapping constructor pattern, ensuring test coverage for the bidirectional encoding functionality.
Also applies to: 29-29
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1147.scala (1)
23-23: LGTM! Bidirectional mapping correctly implemented.The changes enable ASCII-to-EBCDIC encoding support for CP1147, consistent with the pattern applied across all single-byte code pages.
Also applies to: 56-56
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage285.scala (1)
22-22: LGTM! Bidirectional mapping correctly implemented.The changes enable ASCII-to-EBCDIC encoding support for CP285, following the established pattern.
Also applies to: 55-55
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1142.scala (1)
25-25: LGTM! Bidirectional mapping correctly implemented.The changes enable ASCII-to-EBCDIC encoding support for CP1142, consistent with the pattern applied across all single-byte code pages.
Also applies to: 58-58
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/SyntaxErrorsSpec.scala (3)
44-46: LGTM! Enhanced error assertions validate position and field metadata.The updated assertions now verify
posOpt.isEmptyandfieldOpt.contains("GRP_FIELD"), ensuring the error reporting enhancements work correctly for group field validation errors.
49-63: LGTM! New test validates position reporting for malformed statements.The test verifies that parser syntax errors include accurate position information (line 2, position 13), which is essential for developer experience when debugging copybook issues.
78-79: LGTM! Comprehensive test coverage for enhanced error reporting.All existing tests are updated to verify the new
posOptandfieldOptfields, ensuring consistent error reporting behavior across different syntax error scenarios.Also applies to: 93-94, 108-109, 124-126, 140-141, 155-156, 182-183, 198-199
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/PicValidationSpec.scala (1)
29-29: Test setup correctly updated for new API.The changes properly align with the new ParserVisitor constructor signature requiring CommentPolicy and the updated error strategy accepting position adjustment.
Also applies to: 38-38, 59-59
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage275.scala (1)
22-22: Bidirectional mapping correctly implemented.The changes properly add ASCII-to-EBCDIC reverse mapping support by passing both mappings to the base class and lazily computing the reverse table. This pattern is consistent across all code pages in this PR.
Also applies to: 54-55
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePageCommonExt.scala (1)
25-25: LGTM.Consistent with the bidirectional mapping pattern applied across all code pages.
Also applies to: 92-93
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1141.scala (1)
26-26: LGTM.Bidirectional mapping correctly implemented following the established pattern.
Also applies to: 58-59
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1143.scala (1)
25-25: LGTM.Changes follow the consistent bidirectional mapping pattern.
Also applies to: 57-58
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage274.scala (1)
22-22: LGTM.Bidirectional mapping implementation is consistent with the PR pattern.
Also applies to: 54-55
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage284.scala (1)
22-22: LGTM.The bidirectional mapping changes are correctly applied.
Also applies to: 54-55
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1148.scala (1)
26-26: LGTM.Bidirectional mapping correctly implemented, completing the consistent pattern across all code pages in this PR.
Also applies to: 58-59
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1025.scala (1)
22-22: LGTM! Bidirectional mapping support added correctly.The addition of
asciiToEbcdicMappingenables ASCII-to-EBCDIC encoding for this code page. The lazy evaluation ensures the reverse table is computed only when needed, and the use ofSingleByteCodePage.getReverseTableprovides a consistent implementation across all code pages.Also applies to: 59-60
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1160.scala (1)
23-23: LGTM! Consistent bidirectional mapping implementation.The changes follow the same pattern as other code pages in this PR, enabling ASCII-to-EBCDIC encoding with lazy evaluation of the reverse mapping.
Also applies to: 72-73
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1047.scala (1)
23-23: LGTM! Bidirectional mapping support added.The implementation is consistent with the PR's objective to add ASCII-to-EBCDIC encoding support for all single-byte code pages.
Also applies to: 58-59
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage037Ext.scala (1)
24-24: LGTM! Bidirectional mapping correctly implemented.The addition of
asciiToEbcdicMappingenables encoding support for the extended Code Page 037, consistent with the refactoring pattern across all code pages.Also applies to: 56-57
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1140.scala (1)
26-26: LGTM! Encoding support added for Code Page 1140.The bidirectional mapping implementation follows the consistent pattern established across all code pages in this PR.
Also applies to: 58-59
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/DataSizeSpec.scala (1)
28-28: LGTM! Test adapted to parser API changes.The addition of
CommentPolicy()and the update toThrowErrorStrategy(6)correctly adapt this test to the enhanced error reporting with comment-aware position adjustment introduced in the broader parser refactoring.Also applies to: 38-38, 59-59
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage870.scala (1)
23-23: LGTM! Bidirectional mapping support added for Code Page 870.The implementation follows the consistent pattern across all code pages, enabling ASCII-to-EBCDIC encoding for the Latin-2 charset.
Also applies to: 60-61
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1146.scala (1)
23-23: LGTM! Bidirectional mapping correctly implemented.The addition of
asciiToEbcdicMappingcompletes the consistent refactoring pattern across all code pages in this PR, enabling ASCII-to-EBCDIC encoding for Code Page 1146.Also applies to: 55-56
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecodersSpec.scala (2)
86-96: LGTM! Excellent round-trip testing.The updated test pattern now verifies both Unicode decoding and EBCDIC encoding round-trips, ensuring bidirectional mapping correctness. The approach of comparing
actualEbcdicBytes.sameElements(ebcdicBytes)validates that the reverse mapping is accurate.
296-314: LGTM! Good coverage for Cyrillic characters.The new CP1025 test validates both Cyrillic and Latin character support with round-trip verification, following the same robust pattern used for other code pages.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage273.scala (1)
24-24: LGTM! Bidirectional mapping correctly implemented.The changes properly enable EBCDIC↔ASCII conversion by:
- Passing both mappings to the superclass constructor (line 24)
- Lazily computing the reverse mapping via
getReverseTable(line 57)The
lazy valappropriately defers computation until first use, and the pattern aligns with the broader PR changes across all code pages.Also applies to: 57-57
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage278.scala (1)
22-22: LGTM! Consistent with other code page updates.The bidirectional mapping implementation follows the same pattern as CodePage273 and other code pages in this PR.
Also applies to: 55-55
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/utils/CustomCodePage.scala (1)
21-21: LGTM! Test helper updated correctly.The test code page now supports bidirectional mapping, enabling comprehensive testing of the encoding/decoding paths.
Also applies to: 54-54
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage838.scala (1)
22-22: LGTM! Thai script code page updated correctly.The bidirectional mapping support is properly implemented following the established pattern.
Also applies to: 73-73
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage500.scala (1)
22-22: LGTM! International code page updated correctly.The implementation is consistent with other code pages in this PR.
Also applies to: 55-55
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePageCommon.scala (1)
24-24: LGTM! Common code page refactored correctly.The change from an optional/function-based approach to a direct
Array[Byte]mapping simplifies the implementation and aligns with the refactoredSingleByteCodePagebase class.Also applies to: 59-59
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/exceptions/SyntaxErrorException.scala (1)
19-40: LGTM! Enhanced error reporting with optional metadata.The updated exception properly handles optional position and field information:
- Position: Formats as "line X:Y" when available, "line X" otherwise
- Field: Appends ", field " when available
- Gracefully handles all combinations of present/absent optional values
The implementation improves error diagnostics while maintaining backward compatibility for cases where position/field information isn't available.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/CodePage1144.scala (2)
25-25: LGTM! Bidirectional encoding support enabled.The updated constructor correctly passes both
ebcdicToAsciiMappingandasciiToEbcdicMappingto enable ASCII-to-EBCDIC conversion for the Spark writer, aligning with the PR objectives.
58-58: LGTM! Efficient reverse mapping generation.The lazy evaluation ensures the reverse table is only computed when ASCII-to-EBCDIC encoding is actually needed, which is appropriate for this potentially expensive operation.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ANTLRParser.scala (4)
32-41: LGTM! Configurable position adjustment for accurate error reporting.The
posAdjustmentparameter correctly shifts error column positions to account for truncated comments, ensuring error messages reference positions in the original copybook rather than the processed version.
69-69: LGTM! ParserVisitor now receives comment policy.Passing
commentPolicyto theParserVisitorenables it to handle comment-aware position adjustments during parsing.
71-71: LGTM! Position adjustment correctly accounts for leading comment truncation.The
adjPoscalculation properly sets the offset to match the number of leading characters removed from each line, ensuring error positions are reported relative to the original copybook.
86-86: LGTM! Error handler properly configured with position adjustment.The error handler instantiation correctly passes
adjPosto enable comment-aware error position reporting throughout the parsing process.
...arser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/BinaryPropertiesAdder.scala
Outdated
Show resolved
Hide resolved
...ser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/codepage/SingleByteCodePage.scala
Outdated
Show resolved
Hide resolved
…transform/BinaryPropertiesAdder.scala Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Closes #792
Summary by CodeRabbit
New Features
Bug Fixes
Tests