CC-2515: Open temp Parquet files with OVERWRITE mode #363

Merged
merged 2 commits into from Aug 9, 2018

@rhauch rhauch commented Aug 9, 2018

Change the ParquetRecordWriterProvider class to open the AvroParquetWriter instance for each Parquet temp file with ‘overwrite’ mode. This required using the AvroParquetWriter.Builder class rather than the deprecated constructor, which did not have a way to specify the mode.

The Avro format's writer already uses an "overwrite" mode, so this brings the same behavior to the Parquet format.

Note that the connector "commits" the records written to a temp file by closing it and moving it to the final location, as part of the HDFS connector's write-ahead log behavior. The change in this PR is useful when the connector restarts and a Parquet temp file already exists; without this change a FileAlreadyExistsException is thrown.
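
To make the restart scenario concrete, here is a minimal sketch of the difference between a create-only open and an overwrite open. It uses only `java.nio.file`, not the Hadoop or Parquet APIs, so it is an analogy rather than the connector's code. The class name `TempFileModes`, its methods, and the file name are hypothetical, and the exception it shows is `java.nio.file.FileAlreadyExistsException`, not Hadoop's class of the same simple name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not part of the connector.
final class TempFileModes {

  // Create-only open: fails if the temp file is left over from an earlier run.
  static void writeCreate(Path path, String data) throws IOException {
    Files.write(
        path,
        data.getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE_NEW,
        StandardOpenOption.WRITE
    );
  }

  // Overwrite open: creates the file if missing, otherwise truncates and rewrites it.
  static void writeOverwrite(Path path, String data) throws IOException {
    Files.write(
        path,
        data.getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE,
        StandardOpenOption.TRUNCATE_EXISTING,
        StandardOpenOption.WRITE
    );
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("temp-file-modes");
    Path temp = dir.resolve("part-0_tmp.parquet"); // assumed name, for illustration

    // First run: writes some records, then "crashes" before the commit (close + move).
    writeCreate(temp, "uncommitted records");

    // Restart with a create-only open: the leftover temp file blocks the writer.
    try {
      writeCreate(temp, "records after restart");
    } catch (FileAlreadyExistsException e) {
      System.out.println("create-only open failed: " + e.getClass().getSimpleName());
    }

    // Restart with an overwrite open: the leftover temp file is simply replaced.
    writeOverwrite(temp, "records after restart");
    System.out.println(new String(Files.readAllBytes(temp), StandardCharsets.UTF_8));
  }
}
```

Discarding the leftover file should be safe in this analogy for the same reason given above: records in a temp file have not been committed until the file is closed and moved to its final location.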

@rhauch rhauch changed the title CC-2515: Open Parquet files with OVERWRITE mode CC-2515: Open temp Parquet files with OVERWRITE mode Aug 9, 2018
@rhauch rhauch requested a review from a team August 9, 2018 18:34
conf.getHadoopConfiguration()
);
writer = AvroParquetWriter.<GenericRecord>builder(path)
.withSchema(avroSchema)

I believe the format for multiline chained calls is still 2 tabs in connectors. Not a big deal though.

@kkonstantine kkonstantine left a comment

LGTM

@rhauch rhauch merged commit 6be4c55 into confluentinc:4.0.x Aug 9, 2018