CC-2515: Open temp Parquet files with OVERWRITE mode #363
Change the `ParquetRecordWriterProvider` class to open the `AvroParquetWriter` instance for each Parquet temp file with "overwrite" mode. This required using the `AvroParquetWriter.Builder` class rather than the deprecated constructor, which did not have a way to specify the mode.
The Avro format's writer already uses an "overwrite" mode, so this brings the same behavior to the Parquet format.
Note that the connector "commits" the records written to a temp file by closing it and moving it to the final location, as part of the HDFS connector's write-ahead log behavior. The change in this PR is useful when the connector restarts and there's already an existing Parquet temp file; without this change, a `FileAlreadyExistsException` is thrown.
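
The restart scenario can be illustrated with a small, self-contained sketch. It is an analogy only: it uses `java.nio.file` on the local filesystem rather than the Parquet or Hadoop APIs, and the class and method names (`TempFileModes`, `writeCreateOnly`, `writeOverwrite`, `leftoverTempFile`) are hypothetical and do not appear in the connector. `CREATE_NEW` stands in for a create-only writer, and `TRUNCATE_EXISTING` stands in for "overwrite" mode.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class TempFileModes {

    // Simulates a temp file left behind by a task that stopped before committing.
    public static Path leftoverTempFile() {
        try {
            Path tmp = Files.createTempFile("records-", ".tmp");
            Files.write(tmp, "stale".getBytes(StandardCharsets.UTF_8));
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Create-only open: refuses to touch a file that already exists.
    // Returns false when the leftover file blocks the write.
    public static boolean writeCreateOnly(Path tmp, String data) {
        try {
            Files.write(tmp, data.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Overwrite open: creates the file if missing, otherwise discards old contents.
    public static void writeOverwrite(Path tmp, String data) {
        try {
            Files.write(tmp, data.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING,
                    StandardOpenOption.WRITE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static String read(Path p) {
        try {
            return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path tmp = leftoverTempFile();
        System.out.println("create-only succeeded: " + writeCreateOnly(tmp, "fresh"));
        writeOverwrite(tmp, "fresh");
        System.out.println("contents after overwrite: " + read(tmp));
    }
}
```

In the Parquet library, the corresponding setting is, to the best of my knowledge, `ParquetFileWriter.Mode.OVERWRITE` passed through the builder's `withWriteMode(...)`; the exact builder calls used in this PR are in the diff, not reproduced here.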