
#842 Add options for handling of null values when writing EBCDIC files #846

Merged
yruslan merged 4 commits into master from feature/842-add-null-handling-options-for-writer
May 6, 2026

Conversation

@yruslan
Collaborator

@yruslan yruslan commented May 6, 2026

Summary by CodeRabbit

  • New Features

    • Introduced three new writer-only configuration options for null value handling during EBCDIC output: write_null_strings_as_spaces, write_null_display_numbers_as_zeros, and write_null_comp3_numbers_as_zeros.
    • Enhanced null-aware support for primitive fields during COBOL copybook writing.
  • Documentation

    • Simplified limitations section in README, retaining only EBCDIC output restrictions.
  • Tests

    • Expanded test coverage for null handling scenarios with variable-length fields and COMP-3 number encoding.
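The new options can be set on the Spark writer like any other Cobrix option. A minimal, illustrative sketch (the DataFrame `df` and the copybook path are placeholders, not taken from this PR):

```scala
// Hypothetical usage of the three new writer-only options introduced by this PR.
df.write
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("write_null_strings_as_spaces", "true")          // null PIC X(n) -> spaces
  .option("write_null_display_numbers_as_zeros", "true")   // null DISPLAY numerics -> zeros
  .option("write_null_comp3_numbers_as_zeros", "true")     // null COMP-3 numerics -> packed zero
  .save("/path/to/output")
```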

@coderabbitai
Contributor

coderabbitai Bot commented May 6, 2026

Warning

Rate limit exceeded

@yruslan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 43 minutes and 42 seconds before requesting another review.


⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5e6ed249-6765-4afb-9408-bae30e3eec2e

📥 Commits

Reviewing files that changed from the base of the PR and between 703d8e1 and 6b30b93.

📒 Files selected for processing (2)
  • README.md
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala

Walkthrough

This PR introduces writer parameter infrastructure to the COBOL parser. It adds a new WriterParameters configuration class, extends CobolParameters and ReaderParameters to carry writer settings, implements null-aware primitive field writing in the Copybook class, threads writer parameters through the record combiner's write paths, and updates documentation with new writer-only options and simplified limitations.

Changes

Writer Configuration & Null-Aware Field Writing

Layer / File(s) Summary
Data Model
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/WriterParameters.scala, CobolParameters.scala, ReaderParameters.scala
New WriterParameters case class with isEbcdic, nullStringsAsSpaces, nullDisplayNumbersAsZeros, nullComp3NumbersAsZeros fields. CobolParameters and ReaderParameters each gain an optional writerParameters field to carry writer configuration through the reader and writer pipelines.
Configuration Parsing
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala
Added three public writer option constants. Extended parse method signature with isWriter: Boolean = false flag. Introduced parseWriterParameters helper to construct WriterParameters from configuration when in writer mode; wired result into CobolParameters and ReaderParameters construction.
Core Writer Logic
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala
New public method setPrimitiveFieldNullAware performs null-aware field writing for EBCDIC output: for null values, fills bytes based on field type (AlphaNumeric → spaces; Integral/Decimal → zeros; COMP3 → zero values) according to WriterParameters; delegates to existing setPrimitiveField for non-null values.
Writer Integration
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala, spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/NestedRecordCombiner.scala
DefaultSource writer path now passes isWriter = true to parser. NestedRecordCombiner extracts writerParameters from readerParameters and threads it through processRDD, new writeToBytes signature, and recursive write paths (GroupField, PrimitiveArray, GroupArray), enabling null-aware encoding at all nesting levels.
Tests & Documentation
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParserSuite.scala, spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/NestedWriterSuite.scala, spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/fixtures/TextComparisonFixture.scala, README.md
New CobolParametersParserSuite validates writer parameter parsing. NestedWriterSuite expanded with new copybook definition copybookWithDependingOnWithComp3 and test cases for OCCURS DEPENDING ON with null values and COMP-3 numbers. TextComparisonFixture adds ANSI color support for binary comparison output. README documents new writer-only options and simplifies limitations to single EBCDIC output note.
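The null-aware fill behaviour summarized above for `setPrimitiveFieldNullAware` can be sketched as follows. This is a hedged illustration, not the actual Cobrix implementation: the object and method names are invented, and the byte values are the standard EBCDIC space (0x40) and zoned zero (0xF0).

```scala
// Illustrative sketch of the null-fill rules described in the walkthrough.
object NullFillSketch {
  val EbcdicSpace: Byte = 0x40.toByte // EBCDIC space character
  val EbcdicZero: Byte  = 0xF0.toByte // EBCDIC zoned digit '0'

  // AlphaNumeric null -> spaces (when nullStringsAsSpaces is enabled)
  def fillNullString(recordBytes: Array[Byte], offset: Int, len: Int): Unit =
    java.util.Arrays.fill(recordBytes, offset, offset + len, EbcdicSpace)

  // DISPLAY numeric null -> zoned zeros (when nullDisplayNumbersAsZeros is enabled)
  def fillNullDisplayNumber(recordBytes: Array[Byte], offset: Int, len: Int): Unit =
    java.util.Arrays.fill(recordBytes, offset, offset + len, EbcdicZero)
}
```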

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • AbsaOSS/cobrix#827: Modifies writer codepath and Copybook primitive-field write APIs; this PR adds null-aware setPrimitiveFieldNullAware and threads WriterParameters through those same paths.
  • AbsaOSS/cobrix#840: Also modifies NestedRecordCombiner's processRDD and write paths to accept new writer-related parameters alongside existing infrastructure.
  • AbsaOSS/cobrix#829: Introduces earlier EBCDIC writer work that this PR extends with WriterParameters, null-aware writing, and full configuration threading.

Poem

🐰 A copybook now sees what's null,
And fills each byte with grace so full—
Spaces, zeros, COMP3 too,
The writer brings precision true!
Parameters dance through every call,
Making null-aware handling all in all! 📝✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main objective of the changeset: adding null handling options for EBCDIC file writing, which is the core purpose reflected across all modified files.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented May 6, 2026

JaCoCo code coverage report - 'cobol-parser'

Overall Project 91.05% -0.29% 🍏
Files changed 0%

File Coverage
Copybook.scala 71.92% -14.16%

@github-actions

github-actions Bot commented May 6, 2026

JaCoCo code coverage report - 'spark-cobol'

Overall Project 83.37% 🍏
Files changed 100% 🍏

File Coverage
NestedRecordCombiner.scala 89.06% 🍏
DefaultSource.scala 83.16% 🍏

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (3)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/NestedRecordCombiner.scala (2)

138-170: 💤 Low value

Document the new writerParameters parameter.

The Scaladoc blocks for processRDD (lines 138–159) and writeToBytes (lines 361–379) describe every other parameter but omit the newly added writerParameters. Please extend both docblocks so the public contract of these helpers is complete.

Also applies to: 361-385

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/NestedRecordCombiner.scala`
around lines 138 - 170, The Scaladoc for processRDD and writeToBytes is missing
documentation for the newly added writerParameters parameter; update both
docblocks to include a `@param` entry named writerParameters describing its type
(WriterParameters), purpose (configuration options for writing—e.g.,
compression, encoding, padding, trimming, RDW behavior, etc.), and any key
effects on output (how it alters record generation such as trimming, padding, or
RDW adjustments). Locate the docblocks above the private[cobrix] def
processRDD(...) and the writeToBytes(...) method and add a concise `@param`
writerParameters line consistent with the existing style and detail level used
for other parameters.

86-86: ⚡ Quick win

Avoid .get on writerParameters; surface a clearer error or default.

readerParameters.writerParameters.get will throw a confusing NoSuchElementException: None.get if any future caller invokes combine without isWriter = true populating the field. Either fall back to a default WriterParameters() or fail fast with a descriptive message.

♻️ Proposed defensive fix
-      readerParameters.writerParameters.get)
+      readerParameters.writerParameters.getOrElse(
+        throw new IllegalStateException(
+          "Writer parameters were not initialized. Ensure CobolParametersParser.parse is invoked with isWriter = true on the writer path.")
+      ))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/NestedRecordCombiner.scala`
at line 86, The call readerParameters.writerParameters.get is unsafe and can
throw NoSuchElementException; in the combine method (and any path relying on
isWriter) replace the .get with a safe alternative: use
readerParameters.writerParameters.getOrElse(WriterParameters()) to supply a
sensible default WriterParameters instance, or use getOrElse(throw new
IllegalArgumentException("writerParameters must be set when calling combine with
isWriter=true")) to fail fast with a clear message; update the code paths that
reference readerParameters.writerParameters accordingly so they no longer call
.get directly.
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParserSuite.scala (1)

21-40: ⚡ Quick win

Consider expanding test coverage.

The suite covers a single happy-path case. Consider adding tests that:

  • Verify default values when writer options are absent (e.g., nullStringsAsSpaces default).
  • Assert writerParameters is None when isWriter = false, to lock down the reader vs. writer parsing contract.
  • Validate that unknown writer options are rejected when pedantic = true (the test sets pedantic but never relies on it).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParserSuite.scala`
around lines 21 - 40, Add tests to cover defaults, reader-vs-writer behavior,
and pedantic unknown-option rejection: create new test cases in
CobolParametersParserSuite that call CobolParametersParser.parse with (a) a
Parameters map missing writer options and isWriter = true and assert
parsedParams.writerParameters.get has expected default values (e.g.,
nullStringsAsSpaces default), (b) a Parameters map with writer options but
isWriter = false and assert parsedParams.writerParameters is None, and (c) a
Parameters map containing an unknown writer option with pedantic = "true" and
assert the parser throws or returns an error according to
CobolParametersParser.parse behavior; reference the Parameters constructor,
CobolParametersParser.parse, WriterParameters and parsedParams.writerParameters
to locate the code under test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala`:
- Around line 551-554: getFieldPositionAndSize currently treats any non-zero
fieldStartOffsetOverride as an override which allows negative sentinels to be
used directly; change its override check to "fieldStartOffsetOverride > 0" (the
same logic used in setPrimitiveField) so 0 or negative values fall back to
configuredStartOffset + field.binaryProperties.offset, preventing negative
offsets from being applied and out-of-bounds writes.
- Around line 522-525: The two branches handling DISPLAY numeric nulls (case i:
Integral ... and case d: Decimal ...) are bypassing the encoder by filling
recordBytes with 0xF0, which breaks sign-aware DISPLAY encodings (e.g., PIC
S9...). Replace the raw java.util.Arrays.fill(...) calls with a call into the
existing encoder via setPrimitiveField(...), passing a zero value (0 for
Integral, BigDecimal(0) for Decimal) and the same field
descriptor/offset/fieldLength so the normal sign-aware DISPLAY formatting is
used; keep the guard writerParameters.nullDisplayNumbersAsZeros and reuse the
same parameters (Integral/Decimal instances and offset/fieldLength) when
invoking setPrimitiveField.

In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala`:
- Around line 117-118: The public case class constructor CobolParameters was
changed by inserting writerParameters before options which breaks source and
binary compatibility; restore compatibility by moving writerParameters to the
end of the parameter list or give it a default value and keep the original
parameter order so existing positional calls still work (e.g., ensure the
signature keeps options in its original position and append writerParameters:
Option[WriterParameters] = None or provide a factory/builder that accepts the
new field), updating the CobolParameters definition and any companion-object
constructors accordingly.

In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/ReaderParameters.scala`:
- Around line 143-145: The change inserted writerParameters before options in
the ReaderParameters case class which breaks the public constructor signature
and thus binary compatibility (affecting generated apply() and copy()); to fix,
restore the original parameter order so options remains in its original position
(move writerParameters to the end of the parameter list) or alternatively add a
backward-compatible companion factory/overload that preserves the original
positional signature; update the ReaderParameters declaration and any companion
object factory methods so that apply() and copy() signatures remain unchanged
while still supporting writerParameters as an optional field.

In `@README.md`:
- Around line 1697-1702: The README's EBCDIC Writer "Handling of PIC X(n)" text
is inconsistent with the new options; update that section to reference the new
options (write_null_strings_as_spaces, write_null_display_numbers_as_zeros,
write_null_comp3_numbers_as_zeros) and describe the actual behavior depending on
their values (e.g., when write_null_strings_as_spaces=true PIC X(n) nulls are
written as space characters, otherwise as 0x00 bytes; similarly describe DISPLAY
and COMP-3 numeric behaviors and state defaults), and mention that strict_schema
remains unaffected; replace the hardcoded statement that PIC X(n) nulls are
written as 0x00 with this configurable description.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a08db0fd-1906-4a01-9ae9-33cd22d207f6

📥 Commits

Reviewing files that changed from the base of the PR and between 9e407ff and 703d8e1.

📒 Files selected for processing (11)
  • README.md
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/ReaderParameters.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/WriterParameters.scala
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParserSuite.scala
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/NestedRecordCombiner.scala
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/fixtures/TextComparisonFixture.scala
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/NestedWriterSuite.scala

Comment on lines +522 to +525
case i: Integral if writerParameters.nullDisplayNumbersAsZeros && i.compact.isEmpty =>
java.util.Arrays.fill(recordBytes, offset, offset + fieldLength, 0xF0.toByte)
case d: Decimal if writerParameters.nullDisplayNumbersAsZeros && d.compact.isEmpty =>
java.util.Arrays.fill(recordBytes, offset, offset + fieldLength, 0xF0.toByte)
Contributor

@coderabbitai coderabbitai Bot May 6, 2026


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t bypass the encoder for DISPLAY zero writes.

Filling DISPLAY numerics with 0xF0 works for unsigned zoned digits, but it skips sign-aware encoding for signed or sign-separate DISPLAY fields. That can emit an invalid zero representation for PIC S9... DISPLAY while the normal encoder already knows how to format zero correctly. Please route these branches through setPrimitiveField(..., 0 / BigDecimal(0), ...) instead of raw byte fills.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala`
around lines 522 - 525, The two branches handling DISPLAY numeric nulls (case i:
Integral ... and case d: Decimal ...) are bypassing the encoder by filling
recordBytes with 0xF0, which breaks sign-aware DISPLAY encodings (e.g., PIC
S9...). Replace the raw java.util.Arrays.fill(...) calls with a call into the
existing encoder via setPrimitiveField(...), passing a zero value (0 for
Integral, BigDecimal(0) for Decimal) and the same field
descriptor/offset/fieldLength so the normal sign-aware DISPLAY formatting is
used; keep the guard writerParameters.nullDisplayNumbersAsZeros and reuse the
same parameters (Integral/Decimal instances and offset/fieldLength) when
invoking setPrimitiveField.

Collaborator Author


Yeah, this is a valid point. But I'm going to wait for a use case from a user before implementing the suggestion. The reason is that null is different from 0, and the way different systems handle nulls may vary. As soon as anybody has issues with nulls for signed DISPLAY numbers, I will implement this suggestion.

Contributor


@yruslan, that's a fair call — deferring until there's a concrete use case avoids over-engineering a potentially system-specific behaviour. I'll note the rationale for future reviews.

(ʕ •ᴥ•ʔ)っ 🐇✨


✏️ Learnings added
Learnt from: yruslan
Repo: AbsaOSS/cobrix PR: 846
File: cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala:522-525
Timestamp: 2026-05-06T08:58:03.850Z
Learning: In `cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala`, the `setPrimitiveFieldNullAware` method intentionally fills DISPLAY numeric null fields with raw `0xF0` bytes (EBCDIC zero) rather than routing through the encoder with a zero value. The maintainer (yruslan) has deliberately deferred sign-aware encoding for null DISPLAY numerics (e.g. `PIC S9... DISPLAY`) until a concrete user case arises, because `null` and `0` have different semantics and null handling for signed DISPLAY fields may vary across systems. Do not flag this as a bug in future reviews unless there is a specific user-reported issue.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: yruslan
Repo: AbsaOSS/cobrix PR: 820
File: cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala:50-50
Timestamp: 2026-02-11T08:15:51.206Z
Learning: Rename the field from _corrupted_fields to _corrupt_fields to align with Apache Spark's naming convention (e.g., _corrupt_record). Apply this change consistently across the codebase where the field is defined or referenced (including cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala) and update any tests or serialization logic that rely on the old name.

Comment on lines +117 to 118
writerParameters: Option[WriterParameters],
options: Map[String, String]
Contributor


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

cd cobol-parser && find . -name "CobolParameters.scala" -type f

Repository: AbsaOSS/cobrix

Length of output: 139


🏁 Script executed:

cat -n cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala | sed -n '100,130p'

Repository: AbsaOSS/cobrix

Length of output: 1520


🏁 Script executed:

# Check if CobolParameters is a case class and its visibility
cat -n cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala | sed -n '1,120p' | tail -40

Repository: AbsaOSS/cobrix

Length of output: 3034


🏁 Script executed:

# Search for existing usages of CobolParameters constructor with positional arguments
rg "CobolParameters\s*\(" cobol-parser/src --type scala -A 2 -B 1 | head -60

Repository: AbsaOSS/cobrix

Length of output: 1077


🏁 Script executed:

# Get the full class definition and its scope
cat -n cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala | sed -n '1,80p'

Repository: AbsaOSS/cobrix

Length of output: 7066


🏁 Script executed:

# Check the constructor call in CobolParametersParser to see if positional arguments are used
cat -n cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala | grep -A 100 "val cobolParameters = CobolParameters("

Repository: AbsaOSS/cobrix

Length of output: 5451


🏁 Script executed:

# Check git history to see if writerParameters is new and its placement changed
cd cobol-parser && git log --oneline -n 20 -- src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala

Repository: AbsaOSS/cobrix

Length of output: 160


🏁 Script executed:

# Check for default value on writerParameters - if it has one, it's backwards compatible
cat cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala | grep -A 1 "writerParameters"

Repository: AbsaOSS/cobrix

Length of output: 381


This changes the public CobolParameters constructor shape and breaks backward compatibility.

Adding writerParameters before options (without a default value) breaks both source and binary compatibility for code using positional arguments with this public case class. Existing callers will fail to compile, and previously compiled code will break at runtime. Since this case class is part of the public API, the safe approach is to append new fields at the end with a default value, or refactor behind a builder/factory method instead of inserting into the middle of the signature.
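The compatibility-preserving pattern the comment recommends — appending the new field at the end with a default — can be sketched with a simplified stand-in for the real case class (field names here are illustrative, not the actual `CobolParameters` signature):

```scala
// Simplified stand-in: 'options' keeps its original position and the new
// field is appended last with a default, so existing positional callers
// continue to compile unchanged.
case class ParamsSketch(
  copybookContent: String,
  options: Map[String, String] = Map.empty,
  writerParameters: Option[String] = None // new field, appended with default
)

// Old-style positional call, unaffected by the new trailing field:
val legacyCall = ParamsSketch("01 RECORD.", Map("encoding" -> "ebcdic"))
```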

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala`
around lines 117 - 118, The public case class constructor CobolParameters was
changed by inserting writerParameters before options which breaks source and
binary compatibility; restore compatibility by moving writerParameters to the
end of the parameter list or give it a default value and keep the original
parameter order so existing positional calls still work (e.g., ensure the
signature keeps options in its original position and append writerParameters:
Option[WriterParameters] = None or provide a factory/builder that accepts the
new field), updating the CobolParameters definition and any companion-object
constructors accordingly.

Comment on lines 143 to 145
recordsToExclude: Set[String] = Set.empty,
writerParameters: Option[WriterParameters] = None,
options: Map[String, String] = Map.empty
Contributor


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

# First, find and read the ReaderParameters.scala file
find . -path "*cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/ReaderParameters.scala" -type f

Repository: AbsaOSS/cobrix

Length of output: 153


🏁 Script executed:

# Read the ReaderParameters.scala file to see the full case class definition
cat -n cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/ReaderParameters.scala | head -160

Repository: AbsaOSS/cobrix

Length of output: 13134


🏁 Script executed:

# Search for usages of ReaderParameters() constructor with positional arguments
rg "ReaderParameters\s*\(" --type scala -A 3 -B 1 | head -100

Repository: AbsaOSS/cobrix

Length of output: 11424


🏁 Script executed:

# Check for publication configuration and artifact metadata
find . -name "pom.xml" -o -name "build.sbt" -o -name "build.gradle" | head -5

Repository: AbsaOSS/cobrix

Length of output: 196


🏁 Script executed:

# Check for any binary compatibility or versioning guidelines
find . -name "CHANGELOG*" -o -name "CONTRIBUTING*" -o -name "README*" | grep -E "\.(md|txt)$" | head -10

Repository: AbsaOSS/cobrix

Length of output: 148


🏁 Script executed:

# Check the main build configuration to see if this is published to Maven/artifact repos
head -100 build.sbt

Repository: AbsaOSS/cobrix

Length of output: 3677


🏁 Script executed:

# Check pom.xml files for publish configuration
cat cobol-parser/pom.xml | grep -A 10 -E "<distributionManagement>|<repository>|<name>" | head -40

Repository: AbsaOSS/cobrix

Length of output: 40


🏁 Script executed:

# Check README to understand the project scope
head -50 README.md

Repository: AbsaOSS/cobrix

Length of output: 2583


🏁 Script executed:

# Check if there are any explicit tests or documentation about binary compatibility
rg -i "binary.?compatible|backward.?compatible" --type md --type scala | head -20

Repository: AbsaOSS/cobrix

Length of output: 40


🏁 Script executed:

# Look for any versioning or release documentation
find . -name "VERSION*" -o -name "RELEASE*" -o -name ".github" -type d

Repository: AbsaOSS/cobrix

Length of output: 68


🏁 Script executed:

# Check for any version management or semantic versioning docs
cat .github/workflows/*.yml 2>/dev/null | grep -A 5 -B 5 "version\|release" | head -40

Repository: AbsaOSS/cobrix

Length of output: 1312


🏁 Script executed:

# Verify whether WriterParameters is a new addition and check imports
rg "WriterParameters" --type scala -B 2 | head -60

Repository: AbsaOSS/cobrix

Length of output: 5095


🏁 Script executed:

# Check if any test or documentation mentions positional constructor usage
rg "ReaderParameters\(" cobol-parser/src/test/scala --type scala -A 2 | grep -v "recordFormat\|isEbcdic\|isText" | head -40

Repository: AbsaOSS/cobrix

Length of output: 4342


Binary compatibility risk: inserting writerParameters before options breaks the public ReaderParameters case-class constructor signature.

Even with a default value, inserting writerParameters before options changes the generated apply() and copy() method signatures. Pre-compiled external callers using positional construction will no longer be binary-compatible. Since this is a published library on Maven Central, binary compatibility with external consumers must be preserved.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/ReaderParameters.scala`
around lines 143-145: the change inserted writerParameters before options in
the ReaderParameters case class which breaks the public constructor signature
and thus binary compatibility (affecting generated apply() and copy()); to fix,
restore the original parameter order so options remains in its original position
(move writerParameters to the end of the parameter list) or alternatively add a
backward-compatible companion factory/overload that preserves the original
positional signature; update the ReaderParameters declaration and any companion
object factory methods so that apply() and copy() signatures remain unchanged
while still supporting writerParameters as an optional field.
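
The append-at-the-end approach suggested above can be sketched on a simplified stand-in case class (the field names below are illustrative, not the full `ReaderParameters` signature). Note that appending a field preserves source-level compatibility for positional call sites; strict binary compatibility of the generated `apply()`/`copy()` would still need a check such as MiMa against the previously published jar:

```scala
// Simplified sketch: appending a new optional field keeps all pre-existing
// parameters in their original positions, so positional call sites compile
// unchanged. Names here are placeholders, not the real ReaderParameters.
case class ReaderParams(
  recordFormat: String = "F",
  isEbcdic: Boolean = true,
  options: Map[String, String] = Map.empty,
  // The new field goes LAST so existing positional callers still line up.
  writerParameters: Option[String] = None
)

object ReaderParamsDemo {
  def main(args: Array[String]): Unit = {
    // A pre-existing positional call site keeps its original meaning.
    val legacy = ReaderParams("V", false, Map("a" -> "1"))
    assert(legacy.options("a") == "1")
    assert(legacy.writerParameters.isEmpty)

    // New callers opt in to the new field by name.
    val updated = legacy.copy(writerParameters = Some("nulls-as-spaces"))
    assert(updated.writerParameters.contains("nulls-as-spaces"))
    println("ok")
  }
}
```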

Comment thread README.md
Comment on lines +1697 to +1702
| Option (usage example) | Description |
|---------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| .option("strict_schema", "true") | If 'true' (default) Cobrix will throw an exception if a field exists in the copybook but not in the Spark schema. Array count fields (defined in DEPENDING ON clause) are auto-generated and never required to exist in Spark schema. |
| .option("write_null_strings_as_spaces", "false") | If 'true' Cobrix will write `null` alphanumeric fields as spaces when writing output files. |
| .option("write_null_display_numbers_as_zeros", "false") | If 'true' Cobrix will write `null` numeric fields having DISPLAY format as a sequence of zeros when writing output files. |
| .option("write_null_comp3_numbers_as_zeros", "false") | If 'true' Cobrix will write `null` numeric fields having COMP-3 format as zeros when writing EBCDIC files. |
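
As a usage sketch, the three new writer options from the table can be enabled together on a Spark write. The DataFrame, copybook contents, and output path below are placeholders, not taken from this PR:

```scala
// Hypothetical write with all three null-handling options enabled.
// `df` and `copybookContents` are assumed to exist; the path is a placeholder.
df.write
  .format("cobol")
  .option("copybook_contents", copybookContents)
  .option("write_null_strings_as_spaces", "true")        // null PIC X(n) -> spaces
  .option("write_null_display_numbers_as_zeros", "true") // null DISPLAY numbers -> '0' digits
  .option("write_null_comp3_numbers_as_zeros", "true")   // null COMP-3 numbers -> packed zero
  .save("/path/to/output")
```

With all three options left at their default of 'false', null values keep the previous writer behavior described later in the README.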
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the later null-handling docs to match these new options.

This section now documents configurable null handling, but the later EBCDIC Writer -> Implementation details -> Handling of PIC X(n) section still says null values are written as 0x00 bytes. That leaves the README internally contradictory for write_null_strings_as_spaces=true.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` around lines 1697-1702: the README's EBCDIC Writer "Handling of
PIC X(n)" text is inconsistent with the new options; update that section to
reference the new options (write_null_strings_as_spaces,
write_null_display_numbers_as_zeros, write_null_comp3_numbers_as_zeros) and
describe the actual behavior depending on their values (e.g., when
write_null_strings_as_spaces=true PIC X(n) nulls are written as space
characters, otherwise as 0x00 bytes; similarly describe DISPLAY and COMP-3
numeric behaviors and state defaults), and mention that strict_schema remains
unaffected; replace the hardcoded statement that PIC X(n) nulls are written as
0x00 with this configurable description.

@yruslan yruslan merged commit 672bde3 into master May 6, 2026
7 checks passed
@yruslan yruslan deleted the feature/842-add-null-handling-options-for-writer branch May 6, 2026 09:09
