
#826 Add support for writing VRL files with RDW headers (#827)

Merged
yruslan merged 5 commits into master from feature/826-add-support-vrl-writes on Mar 3, 2026
Conversation


yruslan (Collaborator) commented Mar 3, 2026

Closes #826

Summary by CodeRabbit

  • New Features

    • Writing: RDW support for variable-length records (big- and little-endian) with RDW-aware field encoding.
    • Field offsets: allow explicit field start-offset overrides for precise write positioning.
  • Bug Fixes / Validation

    • Stricter write-parameter validation to reject unsupported configurations with clear errors.
  • Tests

    • New tests validating variable-length writer behavior, RDW handling, and validation error cases.
  • Style / Docs

    • Shortened string codes for record formats for clearer output/diagnostics.


coderabbitai bot commented Mar 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 493c763 and 5de36fa.

📒 Files selected for processing (1)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala
🚧 Files skipped from review as they are similar to previous changes (1)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala

Walkthrough

Adds RDW (4-byte Record Description Word) writing support and writer-side parameter validation; changes Copybook.setPrimitiveField to accept configuredStartOffset and fieldStartOffsetOverride; sets concise RecordFormat string codes; integrates RDW-aware encoding in BasicRecordCombiner and adds variable-length EBCDIC writer tests.
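The RDW write described in the walkthrough can be sketched as a stand-alone function. This is illustrative only, not the PR's code; the exact byte positions assumed here (length in the first two bytes for big-endian, in the last two bytes for little-endian) should be checked against Cobrix's actual RDW decoder:

```scala
// Hypothetical sketch of writing a 4-byte RDW header.
// Assumed layout: big-endian stores the record length high byte first in
// bytes 0-1 with zero padding in bytes 2-3; little-endian is assumed to
// store it low byte first in bytes 2-3.
object RdwWriter {
  def writeRdw(recordLength: Int, bigEndian: Boolean): Array[Byte] = {
    if (recordLength < 0)
      throw new IllegalArgumentException(s"Invalid RDW length $recordLength.")
    if (recordLength > 0xFFFF)
      throw new IllegalArgumentException(s"RDW length $recordLength exceeds 65535.")
    val hi = ((recordLength >> 8) & 0xFF).toByte
    val lo = (recordLength & 0xFF).toByte
    if (bigEndian) Array[Byte](hi, lo, 0, 0)
    else Array[Byte](0, 0, lo, hi)
  }
}
```

The validation mirrors the review discussion below: a negative adjusted length or a length over 65535 cannot be encoded in a 2-byte length field, so failing fast beats writing a truncated header.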

Changes

Cohort / File(s) Summary
Copybook offset handling
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala
Replaced single startOffset param with configuredStartOffset and fieldStartOffsetOverride; compute startByte/endByte using override when >0, otherwise use field.binaryProperties.offset + configuredStartOffset.
RecordFormat string codes
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/recordformats/RecordFormat.scala
Added toString overrides returning short codes: FixedLength="F", FixedBlock="FB", VariableLength="V", VariableBlock="VB", AsciiText="D", CobrixAsciiText="D2".
Writer parameter validation
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala, spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/parameters/CobolParametersValidator.scala
Call to validateParametersForWriting(readerParameters) added in DefaultSource; new validator enforces allowed record formats (FixedLength or VariableLength), disallows variable_size_occurs, and prohibits start/end offsets for writing.
RDW-aware encoding
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala
Detects RDW presence and endianness, adjusts output size (+4 bytes when RDW), sets data startOffset (4) for RDW, writes 4-byte RDW header (big/little-endian variants), validates RDW lengths, and calls Copybook.setPrimitiveField with the new offset params.
Variable-length writer tests
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/VariableLengthEbcdicWriterSuite.scala
New test suite validating big- and little-endian RDW writes and asserting errors for unsupported writer parameter combinations (record_format, variable_size_occurs, start/end offsets).
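To make the Copybook offset change concrete, here is a minimal sketch of the selection rule described in the table above. The helper name is hypothetical; the real logic lives inside Copybook.setPrimitiveField:

```scala
// Hypothetical helper illustrating the offset-selection rule:
// an explicit per-field override wins when positive; otherwise the
// field's binary offset is shifted by the configured start offset
// (e.g. 4 when the record is preceded by a 4-byte RDW).
object OffsetResolution {
  def resolveStartByte(fieldBinaryOffset: Int,
                       configuredStartOffset: Int,
                       fieldStartOffsetOverride: Int): Int =
    if (fieldStartOffsetOverride > 0)
      fieldStartOffsetOverride
    else
      fieldBinaryOffset + configuredStartOffset
}
```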

Sequence Diagram(s)

sequenceDiagram
    participant DefaultSource as DefaultSource
    participant Validator as CobolParametersValidator
    participant Combiner as BasicRecordCombiner
    participant Copybook as Copybook

    DefaultSource->>Validator: validateParametersForWriting(readerParameters)
    activate Validator
    Validator->>Validator: check recordFormat, variable_size_occurs, offsets
    alt invalid
        Validator-->>DefaultSource: throw IllegalArgumentException
    else valid
        Validator-->>DefaultSource: return
    end
    deactivate Validator

    DefaultSource->>Combiner: combine(records, readerParameters)
    activate Combiner
    Combiner->>Combiner: Detect RDW & endianness
    alt RDW present
        Combiner->>Combiner: Compute record length, validate
        Combiner->>Combiner: Write 4-byte RDW header (big/little-endian)
        Combiner->>Combiner: set startOffset = 4
    else No RDW
        Combiner->>Combiner: set startOffset = 0
    end
    loop for each field
        Combiner->>Copybook: setPrimitiveField(field, outBytes, value, configuredStartOffset, fieldStartOffsetOverride)
        activate Copybook
        Copybook->>Copybook: use override if > 0 else configuredStartOffset
        Copybook-->>Combiner: field bytes written
        deactivate Copybook
    end
    Combiner-->>DefaultSource: return output byte array
    deactivate Combiner
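The writer-side validation step in the diagram can be sketched like this. The method name matches the walkthrough, but the parameter list and exact messages are approximations of the rules described (allowed formats F/V, no variable_size_occurs, no start/end offsets), not the actual CobolParametersValidator code:

```scala
// Hypothetical sketch of validateParametersForWriting.
object WriterValidation {
  def validateParametersForWriting(recordFormat: String,
                                   hasVariableSizeOccurs: Boolean,
                                   startOffset: Int,
                                   endOffset: Int): Unit = {
    // Only fixed-length (F) and variable-length (V) formats are writable.
    if (recordFormat != "F" && recordFormat != "V")
      throw new IllegalArgumentException(
        s"Record format '$recordFormat' is not supported for writing.")
    // Variable-size OCCURS would make record sizes row-dependent.
    if (hasVariableSizeOccurs)
      throw new IllegalArgumentException(
        "'variable_size_occurs' is not supported for writing.")
    // Start/end offsets only make sense when reading.
    if (startOffset != 0 || endOffset != 0)
      throw new IllegalArgumentException(
        "Record start/end offsets are not supported for writing.")
  }
}
```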

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Poem

🐰 Hoppity-hop, four bytes lead the way,

RDW headers whisper where records play,
Offsets sorted, overrides in store,
Fields nestle in bytes and leap some more,
A rabbit cheers — write on, encore! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding support for writing VRL files with RDW headers, which matches the core objective.
Linked Issues check ✅ Passed The pull request successfully implements RDW handling for variable-record-length file writes as required by issue #826, with comprehensive validation, parameter updates, and test coverage.
Out of Scope Changes check ✅ Passed All changes directly support VRL/RDW writing: Copybook offset handling, RecordFormat string representations, parameter validation, RDW encoding in BasicRecordCombiner, and comprehensive tests.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.



github-actions bot commented Mar 3, 2026

JaCoCo code coverage report - 'cobol-parser'

Overall Project 91.4% -0.24% 🍏
Files changed 30.84% 🍏

File Coverage
Copybook.scala 83.07% -0.93% 🍏
RecordFormat.scala 58.06% 🍏


github-actions bot commented Mar 3, 2026

JaCoCo code coverage report - 'spark-cobol'

Overall Project 82.15% -0.69% 🍏
Files changed 74.26%

File Coverage
DefaultSource.scala 93.69% 🍏
CobolParametersValidator.scala 78.32% -1.64% 🍏
BasicRecordCombiner.scala 58.59% -14.93%

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/VariableLengthEbcdicWriterSuite.scala (1)

85-100: Test intent and configuration are slightly out of sync.

The test name says “RDW adjustment,” but no rdw_adjustment option is set. Either rename the test or add that option so coverage matches the name.

Suggested minimal fix (rename for accuracy)
-    "write simple variable -record-length EBCDIC data files with little-endian RDWs and RDW adjustment" in {
+    "write simple variable-record-length EBCDIC data files with little-endian RDWs and record-length-includes-RDW" in {
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala`:
- Around line 82-95: Validate recordLength computed from
cobolSchema.getRecordSize + adjustment1 + adjustment2 before writing into ar:
ensure it is non-negative and within allowed range for the chosen header format
(if isRdwBigEndian then recordLength must be <= 0xFFFF; if little-endian then
must be within 0..0xFFFFFFFF), and throw a clear IllegalArgumentException (or
similar) referencing recordLength and the offending adjustments when the check
fails. Do this validation in the method that contains the ar[] writes (the block
using recordLength and isRdwBigEndian) so serialization never writes
truncated/negative RDW values.

---

Nitpick comments:
In
`@spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/VariableLengthEbcdicWriterSuite.scala`:
- Around line 85-100: The test name claims “RDW adjustment” but the write
options do not set rdw_adjustment; either update the test description string in
the test case starting with "write simple variable -record-length EBCDIC data
files with little-endian RDWs and RDW adjustment" to remove the “RDW adjustment”
phrase, or add the missing writer option by calling .option("rdw_adjustment",
"<value>") on the DataFrameWriter (alongside .option("is_rdw_big_endian",
"false") and .option("is_rdw_part_of_record_length", "true")) so the
configuration matches the test name.

ℹ️ Review info


📥 Commits

Reviewing files that changed from the base of the PR and between 307daf5 and 84bcc54.

📒 Files selected for processing (6)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/Copybook.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/recordformats/RecordFormat.scala
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/parameters/CobolParametersValidator.scala
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/VariableLengthEbcdicWriterSuite.scala

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala (1)

81-93: Move RDW validation outside the map for efficiency.

The RDW length computation and validation depend only on cobolSchema and readerParameters, which are constant across all rows. Performing this validation inside df.rdd.map repeats the same check for every row.

♻️ Suggested refactor
     val startOffset = if (hasRdw) 4 else 0
 
+    val recordLength = if (hasRdw) {
+      val recordLengthLong = cobolSchema.getRecordSize.toLong + adjustment1.toLong + adjustment2.toLong
+      if (recordLengthLong < 0) {
+        throw new IllegalArgumentException(
+          s"Invalid RDW length $recordLengthLong. Check 'is_rdw_part_of_record_length' and 'rdw_adjustment'."
+        )
+      }
+      if (isRdwBigEndian && recordLengthLong > 0xFFFFL) {
+        throw new IllegalArgumentException(
+          s"RDW length $recordLengthLong exceeds 65535 and cannot be encoded in big-endian mode."
+        )
+      }
+      recordLengthLong.toInt
+    } else 0
+
     df.rdd.map { row =>
       val ar = new Array[Byte](size)
 
       if (hasRdw) {
-        val recordLengthLong = cobolSchema.getRecordSize.toLong + adjustment1.toLong + adjustment2.toLong
-        if (recordLengthLong < 0) {
-          throw new IllegalArgumentException(
-            s"Invalid RDW length $recordLengthLong. Check 'is_rdw_part_of_record_length' and 'rdw_adjustment'."
-          )
-        }
-        if (isRdwBigEndian && recordLengthLong > 0xFFFFL) {
-          throw new IllegalArgumentException(
-            s"RDW length $recordLengthLong exceeds 65535 and cannot be encoded in big-endian mode."
-          )
-        }
-        val recordLength = recordLengthLong.toInt
         if (isRdwBigEndian) {

ℹ️ Review info


📥 Commits

Reviewing files that changed from the base of the PR and between 84bcc54 and 39013cd.

📒 Files selected for processing (2)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/VariableLengthEbcdicWriterSuite.scala
🚧 Files skipped from review as they are similar to previous changes (1)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/VariableLengthEbcdicWriterSuite.scala

Quoted context from BasicRecordCombiner.scala:

val size = cobolSchema.getRecordSize
val hasRdw = readerParameters.recordFormat == RecordFormat.VariableLength
coderabbitai bot commented Mar 3, 2026

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Repository: AbsaOSS/cobrix

🏁 Scripts executed to verify the finding:

#!/bin/bash
# First, find the RecordFormat enum definition
fd -e scala -x grep -l "enum.*RecordFormat\|object RecordFormat" {} \;

# Search for VariableBlock usage in the codebase
rg "VariableBlock" --type scala -B2 -A2

# Check how RDW is handled in reader parameters across the codebase
rg "hasRdw|rdw" --type scala -i -B2 -A2 | head -100

cat -n spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala | sed -n '55,85p'

# Search for how VariableBlock is handled in BasicRecordCombiner context
rg "VariableBlock|VariableLength|hasRdw" spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala -B2 -A2

# Check if there are other uses of RDW checks for VariableBlock vs VariableLength
rg "recordFormat.*==.*Variable" spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/ -A1


Add VariableBlock to the RDW check.

Line 65 should also check for RecordFormat.VariableBlock since VB files use RDW headers for individual records within blocks, similar to V format files. Change:

val hasRdw = readerParameters.recordFormat == RecordFormat.VariableLength

to:

val hasRdw = readerParameters.recordFormat == RecordFormat.VariableLength || readerParameters.recordFormat == RecordFormat.VariableBlock

Without this, VariableBlock files will be written without proper RDW headers, causing downstream reading failures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala`
at line 65, Update the RDW detection in BasicRecordCombiner by expanding the
hasRdw condition to include RecordFormat.VariableBlock; specifically modify the
logic that computes hasRdw from checking only readerParameters.recordFormat ==
RecordFormat.VariableLength to also consider readerParameters.recordFormat ==
RecordFormat.VariableBlock so VariableBlock files get RDW headers just like
VariableLength files (look for the hasRdw variable and the
readerParameters.recordFormat / RecordFormat enum references).

yruslan (Collaborator, Author) commented:

This is not applicable here. The COBOL writer is not going to support variable-block headers.

coderabbitai bot replied:

@yruslan, thank you for the clarification! I understand that variable-block format support is intentionally not included in the writer implementation. I've noted this design decision for future reference.


✏️ Learnings added
Learnt from: yruslan
Repo: AbsaOSS/cobrix PR: 827
File: spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala:65-65
Timestamp: 2026-03-03T08:34:52.931Z
Learning: In the COBOL writer (BasicRecordCombiner), variable-block (VB) format is not supported for writing. Only VariableLength format uses RDW headers in the writer; VariableBlock format will not be implemented.



🧠 Learnings used
Learnt from: yruslan
Repo: AbsaOSS/cobrix PR: 799
File: cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/impl/StreamProcessor.scala:83-103
Timestamp: 2025-11-13T14:28:55.864Z
Learning: In Cobrix, RDW (Record Descriptor Word) length field contains only the payload size, not including the 4-byte RDW header itself. This is a historical convention. When reading files with RDW that includes the header size, users should use `option("rdw_adjustment", "-4")`.

Learnt from: yruslan
Repo: AbsaOSS/cobrix PR: 820
File: cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala:435-436
Timestamp: 2026-02-11T08:01:53.018Z
Learning: In `cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala`, the `extractHierarchicalRecord` method hardcodes `generateCorruptedFields = false` because validation in `CobolParametersParser` throws an `IllegalArgumentException` when `generate_corrupted_fields` is used with segment-children (hierarchical data), preventing users from enabling this unsupported combination.

Learnt from: yruslan
Repo: AbsaOSS/cobrix PR: 820
File: cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala:50-50
Timestamp: 2026-02-11T08:15:51.206Z
Learning: Rename the field from _corrupted_fields to _corrupt_fields to align with Apache Spark's naming convention (e.g., _corrupt_record). Apply this change consistently across the codebase where the field is defined or referenced (including cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala) and update any tests or serialization logic that rely on the old name.
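The first learning above (RDW length holds only the payload size) has a practical reading-side counterpart. A hedged usage sketch follows; paths are placeholders and `spark` is assumed to be an existing SparkSession:

```scala
// Usage sketch: reading a variable-length file whose RDW length field
// includes the 4-byte header itself, compensated with rdw_adjustment.
// Paths below are placeholders, not from this PR.
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("record_format", "V")
  .option("rdw_adjustment", "-4")
  .load("/path/to/data")
```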

coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala (1)

65-65: ⚠️ Potential issue | 🟠 Major

Handle VariableBlock explicitly in RDW detection (or reject it upstream).

hasRdw only matches RecordFormat.VariableLength. If VariableBlock is allowed by writer validation, records from this combiner will miss RDW handling.

💡 Proposed fix
-    val hasRdw = readerParameters.recordFormat == RecordFormat.VariableLength
+    val hasRdw = readerParameters.recordFormat == RecordFormat.VariableLength ||
+      readerParameters.recordFormat == RecordFormat.VariableBlock

To verify whether this is already blocked upstream, run:

#!/bin/bash
# Verify writer-side validation and local RDW gating for VariableBlock.
fd 'CobolParametersValidator.scala' -e scala -x rg -nP -C3 'VariableBlock|VariableLength|recordFormat|rdw|writer' {}
rg -nP --type scala -C3 'hasRdw|RecordFormat\.VariableLength|RecordFormat\.VariableBlock' spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala`:
- Around line 78-90: The code in BasicRecordCombiner computes recordLengthLong
and then narrows it to an Int (recordLength), which can silently overflow; add a
guard before the toInt conversion that throws an IllegalArgumentException if
recordLengthLong > Int.MaxValue (and keep the existing big-endian > 65535
check), so that when isRdwBigEndian is false and the little-endian path uses
4-byte RDW it still fails fast for lengths exceeding Int.MaxValue instead of
producing a wrapped recordLength value.


ℹ️ Review info


📥 Commits

Reviewing files that changed from the base of the PR and between 39013cd and 493c763.

📒 Files selected for processing (1)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/BasicRecordCombiner.scala

yruslan merged commit 5e31b62 into master on Mar 3, 2026
6 of 7 checks passed
yruslan deleted the feature/826-add-support-vrl-writes branch March 3, 2026 09:34


Development

Successfully merging this pull request may close these issues.

Add support for writing variable-record-length files with RDWs

1 participant