fix: close the file writers properly in fail cases#18776
Conversation
hudi-agent
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for working on this! The PR tightens resource cleanup across the write-handle and stream-writer paths, and the new AutoClosableUtils helper makes the close-with-suppressed pattern reusable. A couple of edge cases worth double-checking in the inline comments around RuntimeException handling in the HFile bootstrap close path. Please take a look at any inline comments, and this should be ready for a Hudi committer or PMC member to take it from here. Two small naming/readability nits below — the class name typo and the unexplained reflection.
hudi-agent
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for working on this! This PR consistently propagates close failures as suppressed exceptions and tightens cleanup across many write handles and stream-writer constructors. A few interaction-level concerns to double-check in the inline comments — mostly around what state callers see after the new "close on failure" paths run and where the new try/finally regions don't quite cover all failure modes. Please take a look at any inline comments, and this should be ready for a Hudi committer or PMC member to take it from here. A couple of consistency suggestions below around reusing the new close helper.
| } catch (Throwable t) { | ||
| log.error("Error writing record " + record, t); | ||
| if (!config.getIgnoreWriteFailed()) { | ||
| closeLogWriterQuietly(t); |
There was a problem hiding this comment.
🤖 Closing writer here makes the handle inconsistent: recordList may still hold buffered records from earlier successful writes, and if a caller (or finally-block) later invokes close(), appendDataAndDeleteBlocks will hit writer.appendBlocks(blocks) with a null writer, then the catch block at line 519 will dereference writer.getLogFile().getPath() and throw a second NPE that replaces the original exception. Could we either skip closing the writer here, or also clear/flush recordList so the subsequent close() no-ops?
- AI-generated; verify before applying. React 👍/👎 to flag quality.
|
|
||
| private void closeWriterAfterFailure(Throwable failure) { | ||
| try { | ||
| this.writer.close(); |
There was a problem hiding this comment.
🤖 This only catches IOException — if writer.close() throws a RuntimeException, it propagates out of closeWriterAfterFailure and replaces the original failure passed in via failure. Could we route this through AutoCloseableUtils.closeQuietlyWithSuppressed(writer, failure) for consistency with the other handles?
- AI-generated; verify before applying. React 👍/👎 to flag quality.
| this.writer = new HFileWriterImpl(context, outputStream); | ||
| } finally { | ||
| if (this.writer == null) { | ||
| closeQuietly(outputStream); |
There was a problem hiding this comment.
🤖 The try { ... } finally { if (this.writer == null) closeQuietly(outputStream); } only protects the HFileWriterImpl constructor. If writer.appendFileInfo(...) on line 116-117 throws (or any later constructor line), this.writer is non-null but the constructor exits, leaving the writer (and its underlying output stream) leaked since the caller never gets a reference. Could the try region cover the rest of the constructor, or close this.writer itself on failure?
- AI-generated; verify before applying. React 👍/👎 to flag quality.
| } | ||
|
|
||
| private void closeWriterAfterFailure(Throwable failure) { | ||
| try { |
There was a problem hiding this comment.
🤖 nit: this handle re-implements the same close-and-suppress logic instead of reusing the new AutoCloseableUtils.closeQuietlyWithSuppressed that the other handles in this PR adopted — could you switch to the shared util for consistency?
- AI-generated; verify before applying. React 👍/👎 to flag quality.
| // ParquetFileWriter does not implement Closeable in all versions, so attempt close() reflectively. | ||
| closeMethod = parquetFileWriter.getClass().getMethod("close"); | ||
| } catch (NoSuchMethodException e) { | ||
| return; |
There was a problem hiding this comment.
🤖 nit: the comment explains what the reflection does but not why — could you call out which Parquet versions are missing close() here? Otherwise a future reader is likely to delete the reflection thinking it's gratuitous.
- AI-generated; verify before applying. React 👍/👎 to flag quality.
| throw e; | ||
| } finally { | ||
| keyToNewRecords = null; | ||
| writtenRecordKeys = null; |
There was a problem hiding this comment.
🤖 nit: keyToNewRecords / writtenRecordKeys are already nulled inside the try-block on the success path — duplicating the assignment in finally is harmless but a bit noisy. Could the finally just rely on the try-block, or alternatively drop the success-path assignment and keep only the finally?
- AI-generated; verify before applying. React 👍/👎 to flag quality.
| return record.prependMetaFields(schema, targetSchema, metadataValues, prop); | ||
| } | ||
|
|
||
| private void closeFileWriterQuietly(Throwable failure) { |
There was a problem hiding this comment.
did you intentionally not call markClosed() here or its an oversight.
| } | ||
|
|
||
| private void closeWriterAfterFailure(Throwable failure) { | ||
| try { |
| log.info("Schema evolution enabled for binary copy: {}", schemaEvolutionEnabled); | ||
| records = this.writer.binaryCopy(inputFiles, Collections.singletonList(path), writeScheMessageType, schemaEvolutionEnabled); | ||
| } catch (IOException e) { | ||
| closeWriterAfterFailure(e); |
There was a problem hiding this comment.
how about closeWriterQuietly to be in line w/ other naming in this patch
|
|
||
| private void closeWriterAfterFailure(Throwable failure) { | ||
| try { | ||
| this.writer.close(); |
| /** | ||
| * Utility methods for closing {@link AutoCloseable} resources. | ||
| */ | ||
| public final class AutoCloseableUtils { |
There was a problem hiding this comment.
do we have UTs for this class?
| .withConf(parquetConfig.getStorageConf().unwrapAs(Configuration.class)) | ||
| .build(); | ||
| } catch (IOException | RuntimeException e) { | ||
| closeQuietly(outputStream); |
There was a problem hiding this comment.
lets standardize all naming.
closeOutputStreamQuietly
| } | ||
| failure.addSuppressed(e); | ||
| } | ||
| return failure; |
There was a problem hiding this comment.
shouldn't we set the writer = null at the end of the method
| try { | ||
| writer.start(); | ||
| } catch (Exception e) { | ||
| closeParquetFileWriterQuietly(writer); |
There was a problem hiding this comment.
shouldn't we set the writer = null w/n closeParquetFileWriterQuietly
| cWriter.writeNull(rlvl, dlvl - 1); | ||
| } | ||
| } else { | ||
| cWriter.writeNull(rlvl, dlvl); // 因为repeatition level没有重复所以后面都是以0在第一层,definition level是字段path的第0层 |
There was a problem hiding this comment.
sorry. can we avoid comments in non english language. I understand it was not from this patch.
but lets fix it.
| } else { | ||
| throw closeException; | ||
| } | ||
| } |
There was a problem hiding this comment.
should we set cWriter = null in the end?
Describe the issue this Pull Request addresses
Several writer construction and write-handle failure paths can leave underlying file writers or output streams open when an exception is thrown before the normal close path runs. This affects create, append, merge, binary copy, stream writer, HFile bootstrap index, Parquet utility, and Spark helper code paths.
This PR makes those failure paths close the relevant writer resources, preserving the original failure while adding close failures as suppressed exceptions where applicable. It does not change storage format, public APIs, or write configuration defaults.
Summary and Changelog
This change improves resource cleanup for file writers and output streams when write initialization or write/close logic fails, and consolidates repeated close-with-suppression logic in client write handles.
Commit 1: fix: close the file writers properly in fail cases (
0f4784f2c99)HoodieFileWriter/log writer resources whenBaseCreateHandle,HoodieAppendHandle, andHoodieWriteMergeHandleencounter fail-fast write or close errors.HoodieSortedMergeHandlepending-record writing intowriteIncomingRecords()so the base merge close path can handle writer cleanup consistently.HoodieBinaryCopyHandlecopier on binary-copy failures.ParquetUtils.serializeRecordsToLogBlock.SparkHelpersclosesHoodieAvroParquetWriterin afinallyblock.HFileBootstrapIndexWriterbegin/close handling so partially initialized writers and streams are closed.HoodieParquetBinaryCopyBaseinitialization, close, and column writer cleanup paths.Commit 2: add common utils (
a7da5bc4d5)org.apache.hudi.util.AutoClosableUtilsinhudi-client-commonfor closingAutoCloseableresources while preserving suppressed exceptions.BaseCreateHandle,HoodieAppendHandle, andHoodieWriteMergeHandle.testFileWriterClosedWhenDoWriteFailsto explicitly run withhoodie.write.ignore.failed=false, matching the fail-fast path under test.Impact
No public API, configuration, storage format, or compatibility changes are introduced. The user-visible behavior is safer cleanup on exceptional writer paths, reducing leaked file writers/output streams and preserving close failures as suppressed exceptions where applicable.
There is no expected performance impact on successful write paths beyond small helper calls during close.
Risk Level
medium
The changes touch core write-handle close paths and multiple writer initialization paths, so lifecycle regressions are possible if a writer has unusual close semantics. The risk is mitigated by keeping normal close behavior intact, nulling writer references after close attempts, and adding targeted tests for failure cleanup.
Validation run:
git diff --checkmvn -pl hudi-client/hudi-client-common -am -DskipTests -DskipITs -Dcheckstyle.skip -Dspotbugs.skip compilemvn -pl hudi-client/hudi-client-common -am -DskipITs -Dcheckstyle.skip -Dspotbugs.skip -Dtest=TestHoodieCreateHandle#testFileWriterClosedWhenDoWriteFails -Dsurefire.failIfNoSpecifiedTests=false testmvn -pl hudi-hadoop-common -DskipITs -Dcheckstyle.skip -Dspotbugs.skip -Dtest=TestHoodieParquetBinaryCopyBaseSchemaEvolution testDocumentation Update
none
This is an internal resource-cleanup fix and utility refactor with no user-facing configuration, API, or behavior that requires documentation updates.
Contributor's checklist