#795 Do not create a redundant data stream. #800

yruslan · 2025-11-13T14:42:31Z

Summary by CodeRabbit

Refactor
- Optimized stream processing by eliminating unnecessary stream duplication during in-place operations, reducing memory overhead.
Documentation
- Added clarifying comments documenting internal processing behavior during EBCDIC conversions.

coderabbitai · 2025-11-13T14:42:42Z

Walkthrough

The pull request eliminates stream duplication in COBOL in-place processing by passing the original input stream directly to the stream processor instead of creating a copy. Additionally, explanatory comments are added to document multi-invocation behavior and redundant options logging in COBOL parameter handling.

Changes

Cohort / File(s)	Summary
Stream copying optimization `cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/impl/CobolProcessorInPlace.scala`	Removed `dataStream = inputStream.copyStream()` and updated `StreamProcessor.processStreamInPlace()` call to receive original `inputStream`. Updated resource cleanup to close `inputStream` instead of `dataStream`.
Documentation comments `spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/SparkCobolProcessor.scala`	Added explanatory comments inside `getCobolParameters()` documenting multi-invocation behavior during EBCDIC processing, redundant options logging, and conditional logging behavior when `ignoreRedundantOptions` is enabled.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

CobolProcessorInPlace.scala: Verify that passing the original stream instead of a copy does not introduce unintended side effects on stream state or buffering behavior. Confirm that closing the original stream at the end of processing is appropriate for all call paths.
SparkCobolProcessor.scala: Straightforward documentation addition; minimal review burden.

Poem

🐰 A stream once copied, now flows free,
No duplicate dance, just unity!
Comments light the way so clear,
What happens when invoked twice, my dear.
Simpler paths and lighter chains,
Efficiency flows through our veins! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly describes the main change: removing redundant stream copying in CobolProcessorInPlace.scala, which aligns perfectly with the primary modification in the changeset.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fixup/795-changes-forgot-to-commit

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9cb4a42 and 163ec6f.

📒 Files selected for processing (2)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/impl/CobolProcessorInPlace.scala (1 hunks)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/SparkCobolProcessor.scala (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/impl/CobolProcessorInPlace.scala (1)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/VarLenNestedReader.scala (1)

recordExtractor (54-89)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
GitHub Check: Spark 2.4.8 on Scala 2.11.12
GitHub Check: Spark 3.4.4 on Scala 2.12.20
GitHub Check: Spark 3.5.7 on Scala 2.12.20
GitHub Check: Spark 3.5.7 on Scala 2.13.17

🔇 Additional comments (2)

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/SparkCobolProcessor.scala (1)

201-202: Clear documentation of method invocation and parameter semantics.

The added comments effectively document that getCobolParameters may be invoked multiple times during EBCDIC processing and explain the purpose of the ignoreRedundantOptions parameter. The comments correctly indicate that when ignoreRedundantOptions=true (used for copybook parsing), redundant option logging can be suppressed—this aligns with line 203 where the parameter controls the negated boolean passed to CobolParametersParser.parse().

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/impl/CobolProcessorInPlace.scala (1)

49-59: LGTM! Efficient elimination of redundant stream copy.

The change correctly removes the intermediate stream copy and passes inputStream directly to the processor. The resource management is properly maintained with inputStream.close() in the finally block. This improves memory efficiency by avoiding unnecessary stream duplication while preserving the original error-handling semantics.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2025-11-13T14:44:52Z

JaCoCo code coverage report - 'cobol-parser'

Overall Project	91.52%	🍏
Files changed	100%	🍏

File	Coverage
CobolProcessorInPlace.scala	100%	🍏

github-actions · 2025-11-13T14:44:54Z

JaCoCo code coverage report - 'spark-cobol'

File	Coverage [96.68%]	🍏
SparkCobolProcessor.scala	96.68%	🍏

Total Project Coverage	81.1%	🍏

#795 Do not create a redundant data stream.

163ec6f

yruslan merged commit e85b90d into master Nov 13, 2025
7 checks passed

yruslan deleted the fixup/795-changes-forgot-to-commit branch November 13, 2025 14:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

#795 Do not create a redundant data stream. #800

#795 Do not create a redundant data stream. #800

Uh oh!

yruslan commented Nov 13, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Nov 13, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Nov 13, 2025

Uh oh!

github-actions bot commented Nov 13, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

#795 Do not create a redundant data stream. #800

#795 Do not create a redundant data stream. #800

Uh oh!

Conversation

yruslan commented Nov 13, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

github-actions bot commented Nov 13, 2025

JaCoCo code coverage report - 'cobol-parser'

Uh oh!

github-actions bot commented Nov 13, 2025

JaCoCo code coverage report - 'spark-cobol'

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

yruslan commented Nov 13, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 13, 2025 •

edited

Loading