Skip to content

fix: close the file writers properly in fail cases#18776

Open
danny0405 wants to merge 3 commits into
apache:masterfrom
danny0405:close-handles
Open

fix: close the file writers properly in fail cases#18776
danny0405 wants to merge 3 commits into
apache:masterfrom
danny0405:close-handles

Conversation

@danny0405
Copy link
Copy Markdown
Contributor

@danny0405 danny0405 commented May 19, 2026

Describe the issue this Pull Request addresses

Several writer construction and write-handle failure paths can leave underlying file writers or output streams open when an exception is thrown before the normal close path runs. This affects create, append, merge, binary copy, stream writer, HFile bootstrap index, Parquet utility, and Spark helper code paths.

This PR makes those failure paths close the relevant writer resources, preserving the original failure while adding close failures as suppressed exceptions where applicable. It does not change storage format, public APIs, or write configuration defaults.

Summary and Changelog

This change improves resource cleanup for file writers and output streams when write initialization or write/close logic fails, and consolidates repeated close-with-suppression logic in client write handles.

Commit 1: fix: close the file writers properly in fail cases (0f4784f2c99)

  • Close HoodieFileWriter/log writer resources when BaseCreateHandle, HoodieAppendHandle, and HoodieWriteMergeHandle encounter fail-fast write or close errors.
  • Move HoodieSortedMergeHandle pending-record writing into writeIncomingRecords() so the base merge close path can handle writer cleanup consistently.
  • Close HoodieBinaryCopyHandle copier on binary-copy failures.
  • Close output streams if Parquet/HFile stream writer construction fails in Spark, Flink, Hadoop, and HFile writer paths.
  • Use try-with-resources in ParquetUtils.serializeRecordsToLogBlock.
  • Ensure SparkHelpers closes HoodieAvroParquetWriter in a finally block.
  • Harden HFileBootstrapIndexWriter begin/close handling so partially initialized writers and streams are closed.
  • Harden HoodieParquetBinaryCopyBase initialization, close, and column writer cleanup paths.
  • Add tests for create-handle writer cleanup and Parquet binary-copy close behavior.

Commit 2: add common utils (a7da5bc4d5)

  • Add org.apache.hudi.util.AutoClosableUtils in hudi-client-common for closing AutoCloseable resources while preserving suppressed exceptions.
  • Reuse the utility in BaseCreateHandle, HoodieAppendHandle, and HoodieWriteMergeHandle.
  • Update testFileWriterClosedWhenDoWriteFails to explicitly run with hoodie.write.ignore.failed=false, matching the fail-fast path under test.

Impact

No public API, configuration, storage format, or compatibility changes are introduced. The user-visible behavior is safer cleanup on exceptional writer paths, reducing leaked file writers/output streams and preserving close failures as suppressed exceptions where applicable.

There is no expected performance impact on successful write paths beyond small helper calls during close.

Risk Level

medium

The changes touch core write-handle close paths and multiple writer initialization paths, so lifecycle regressions are possible if a writer has unusual close semantics. The risk is mitigated by keeping normal close behavior intact, nulling writer references after close attempts, and adding targeted tests for failure cleanup.

Validation run:

  • git diff --check
  • mvn -pl hudi-client/hudi-client-common -am -DskipTests -DskipITs -Dcheckstyle.skip -Dspotbugs.skip compile
  • mvn -pl hudi-client/hudi-client-common -am -DskipITs -Dcheckstyle.skip -Dspotbugs.skip -Dtest=TestHoodieCreateHandle#testFileWriterClosedWhenDoWriteFails -Dsurefire.failIfNoSpecifiedTests=false test
  • mvn -pl hudi-hadoop-common -DskipITs -Dcheckstyle.skip -Dspotbugs.skip -Dtest=TestHoodieParquetBinaryCopyBaseSchemaEvolution test

Documentation Update

none

This is an internal resource-cleanup fix and utility refactor with no user-facing configuration, API, or behavior that requires documentation updates.

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions Bot added the size:L PR with lines of changes in (300, 1000] label May 19, 2026
Copy link
Copy Markdown
Contributor

@hudi-agent hudi-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for working on this! The PR tightens resource cleanup across the write-handle and stream-writer paths, and the new AutoClosableUtils helper makes the close-with-suppressed pattern reusable. A couple of edge cases worth double-checking in the inline comments around RuntimeException handling in the HFile bootstrap close path. Please take a look at any inline comments, and this should be ready for a Hudi committer or PMC member to take it from here. Two small naming/readability nits below — the class name typo and the unexplained reflection.

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 55.34884% with 96 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.21%. Comparing base (cd2c8b8) to head (1ee5fbf).
⚠️ Report is 20 commits behind head on master.

Files with missing lines Patch % Lines
...e/hudi/parquet/io/HoodieParquetBinaryCopyBase.java 52.85% 27 Missing and 6 partials ⚠️
...otstrap/index/hfile/HFileBootstrapIndexWriter.java 48.48% 14 Missing and 3 partials ⚠️
.../java/org/apache/hudi/util/AutoCloseableUtils.java 39.13% 14 Missing ⚠️
...in/java/org/apache/hudi/io/HoodieAppendHandle.java 30.76% 9 Missing ⚠️
...ava/org/apache/hudi/io/HoodieBinaryCopyHandle.java 0.00% 9 Missing ⚠️
...ge/row/HoodieRowDataParquetOutputStreamWriter.java 40.00% 3 Missing ⚠️
...udi/io/storage/HoodieSparkParquetStreamWriter.java 78.57% 3 Missing ⚠️
...i/io/storage/hadoop/HoodieParquetStreamWriter.java 78.57% 3 Missing ⚠️
...va/org/apache/hudi/io/HoodieSortedMergeHandle.java 0.00% 2 Missing ⚠️
.../hudi/io/storage/hadoop/HoodieAvroHFileWriter.java 50.00% 1 Missing and 1 partial ⚠️
... and 1 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18776      +/-   ##
============================================
+ Coverage     68.15%   68.21%   +0.05%     
- Complexity    29149    29332     +183     
============================================
  Files          2521     2528       +7     
  Lines        141353   141939     +586     
  Branches      17549    17633      +84     
============================================
+ Hits          96341    96819     +478     
- Misses        37079    37144      +65     
- Partials       7933     7976      +43     
Flag Coverage Δ
common-and-other-modules 44.38% <21.86%> (-0.06%) ⬇️
hadoop-mr-java-client 44.86% <13.61%> (-0.12%) ⬇️
spark-client-hadoop-common 48.22% <38.53%> (-0.10%) ⬇️
spark-java-tests 48.84% <39.52%> (-0.09%) ⬇️
spark-scala-tests 44.92% <42.85%> (+0.07%) ⬆️
utilities 37.45% <26.19%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...main/java/org/apache/hudi/io/BaseCreateHandle.java 89.79% <100.00%> (+2.56%) ⬆️
...java/org/apache/hudi/common/util/ParquetUtils.java 91.08% <100.00%> (ø)
...scala/org/apache/spark/sql/hudi/SparkHelpers.scala 63.04% <100.00%> (ø)
...ava/org/apache/hudi/io/HoodieWriteMergeHandle.java 81.56% <91.66%> (+0.49%) ⬆️
...va/org/apache/hudi/io/HoodieSortedMergeHandle.java 0.00% <0.00%> (ø)
.../hudi/io/storage/hadoop/HoodieAvroHFileWriter.java 88.60% <50.00%> (-2.31%) ⬇️
...ge/row/HoodieRowDataParquetOutputStreamWriter.java 75.75% <40.00%> (-7.01%) ⬇️
...udi/io/storage/HoodieSparkParquetStreamWriter.java 72.72% <78.57%> (-6.59%) ⬇️
...i/io/storage/hadoop/HoodieParquetStreamWriter.java 72.72% <78.57%> (-6.59%) ⬇️
...in/java/org/apache/hudi/io/HoodieAppendHandle.java 76.08% <30.76%> (-1.79%) ⬇️
... and 4 more

... and 37 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@hudi-bot
Copy link
Copy Markdown
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Copy link
Copy Markdown
Contributor

@hudi-agent hudi-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for working on this! This PR consistently propagates close failures as suppressed exceptions and tightens cleanup across many write handles and stream-writer constructors. A few interaction-level concerns to double-check in the inline comments — mostly around what state callers see after the new "close on failure" paths run and where the new try/finally regions don't quite cover all failure modes. Please take a look at any inline comments, and this should be ready for a Hudi committer or PMC member to take it from here. A couple of consistency suggestions below around reusing the new close helper.

} catch (Throwable t) {
log.error("Error writing record " + record, t);
if (!config.getIgnoreWriteFailed()) {
closeLogWriterQuietly(t);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Closing writer here makes the handle inconsistent: recordList may still hold buffered records from earlier successful writes, and if a caller (or finally-block) later invokes close(), appendDataAndDeleteBlocks will hit writer.appendBlocks(blocks) with a null writer, then the catch block at line 519 will dereference writer.getLogFile().getPath() and throw a second NPE that replaces the original exception. Could we either skip closing the writer here, or also clear/flush recordList so the subsequent close() no-ops?

- AI-generated; verify before applying. React 👍/👎 to flag quality.


private void closeWriterAfterFailure(Throwable failure) {
try {
this.writer.close();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This only catches IOException — if writer.close() throws a RuntimeException, it propagates out of closeWriterAfterFailure and replaces the original failure passed in via failure. Could we route this through AutoCloseableUtils.closeQuietlyWithSuppressed(writer, failure) for consistency with the other handles?

- AI-generated; verify before applying. React 👍/👎 to flag quality.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

this.writer = new HFileWriterImpl(context, outputStream);
} finally {
if (this.writer == null) {
closeQuietly(outputStream);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 The try { ... } finally { if (this.writer == null) closeQuietly(outputStream); } only protects the HFileWriterImpl constructor. If writer.appendFileInfo(...) on line 116-117 throws (or any later constructor line), this.writer is non-null but the constructor exits, leaving the writer (and its underlying output stream) leaked since the caller never gets a reference. Could the try region cover the rest of the constructor, or close this.writer itself on failure?

- AI-generated; verify before applying. React 👍/👎 to flag quality.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

}

private void closeWriterAfterFailure(Throwable failure) {
try {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 nit: this handle re-implements the same close-and-suppress logic instead of reusing the new AutoCloseableUtils.closeQuietlyWithSuppressed that the other handles in this PR adopted — could you switch to the shared util for consistency?

- AI-generated; verify before applying. React 👍/👎 to flag quality.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

// ParquetFileWriter does not implement Closeable in all versions, so attempt close() reflectively.
closeMethod = parquetFileWriter.getClass().getMethod("close");
} catch (NoSuchMethodException e) {
return;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 nit: the comment explains what the reflection does but not why — could you call out which Parquet versions are missing close() here? Otherwise a future reader is likely to delete the reflection thinking it's gratuitous.

- AI-generated; verify before applying. React 👍/👎 to flag quality.

throw e;
} finally {
keyToNewRecords = null;
writtenRecordKeys = null;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 nit: keyToNewRecords / writtenRecordKeys are already nulled inside the try-block on the success path — duplicating the assignment in finally is harmless but a bit noisy. Could the finally just rely on the try-block, or alternatively drop the success-path assignment and keep only the finally?

- AI-generated; verify before applying. React 👍/👎 to flag quality.

return record.prependMetaFields(schema, targetSchema, metadataValues, prop);
}

private void closeFileWriterQuietly(Throwable failure) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you intentionally not call markClosed() here or its an oversight.

}

private void closeWriterAfterFailure(Throwable failure) {
try {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

log.info("Schema evolution enabled for binary copy: {}", schemaEvolutionEnabled);
records = this.writer.binaryCopy(inputFiles, Collections.singletonList(path), writeScheMessageType, schemaEvolutionEnabled);
} catch (IOException e) {
closeWriterAfterFailure(e);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about closeWriterQuietly to be in line w/ other naming in this patch


private void closeWriterAfterFailure(Throwable failure) {
try {
this.writer.close();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

/**
* Utility methods for closing {@link AutoCloseable} resources.
*/
public final class AutoCloseableUtils {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have UTs for this class?

.withConf(parquetConfig.getStorageConf().unwrapAs(Configuration.class))
.build();
} catch (IOException | RuntimeException e) {
closeQuietly(outputStream);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets standardize all naming.
closeOutputStreamQuietly

}
failure.addSuppressed(e);
}
return failure;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we set the writer = null at the end of the method

try {
writer.start();
} catch (Exception e) {
closeParquetFileWriterQuietly(writer);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we set the writer = null w/n closeParquetFileWriterQuietly

cWriter.writeNull(rlvl, dlvl - 1);
}
} else {
cWriter.writeNull(rlvl, dlvl); // 因为repeatition level没有重复所以后面都是以0在第一层,definition level是字段path的第0层
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry. can we avoid comments in non english language. I understand it was not from this patch.
but lets fix it.

} else {
throw closeException;
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we set cWriter = null in the end?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L PR with lines of changes in (300, 1000]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants