Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PARQUET-353: Recycle compressors in parquet write path #282

Closed
wants to merge 3 commits into from

Conversation

nitin2goyal
Copy link

No description provided.

@liancheng
Copy link
Contributor

How can we retest this PR? The last failure should have been fixed by #269.

@nitin2goyal
Copy link
Author

Commited a dummy checkin such that tests re-run automatically

@liancheng
Copy link
Contributor

+1

@nitin2goyal Thanks for the fix :)

@julienledem
Copy link
Member

This code has changed. you may want to rebase.

@@ -171,6 +171,8 @@ private final STATE error() throws IOException {

private STATE state = STATE.NOT_STARTED;

private final CodecFactory codecFactory;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this is not the right place to put this as this class does not use the codecFactory.
it just creates it and makes it available. I'd rather have private members not accessible to the outside to respect encapsulation and layering of the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@julienledem
Copy link
Member

Thanks for looking into this. A lot of the codec factory code has changed recently.

@rdblue
Copy link
Contributor

rdblue commented Dec 1, 2015

It looks like the purpose of this is to tie the lifecycle of the CodecFactory to ParquetFileWriter. That seems like a good idea (resources are freed when the file is closed), but ends up exposing internals (as Julien noted) because we don't delegate compression to ParquetFileWriter.

I think it would be cleaner to attach the lifecycle of the codec factory to the record writer for MR, which has a close method for clean up. The CodecFactory would be instantiated in the writer's constructor instead of passing in a compressor. This should also call release in the ParquetWriter#close method to avoid the same leak when not writing via MR.

@rdblue
Copy link
Contributor

rdblue commented Dec 1, 2015

To get this into the pending release, I've implemented the changes I recommended in #295.

@asfgit asfgit closed this in 4916903 Dec 11, 2015
piyushnarang pushed a commit to piyushnarang/parquet-mr that referenced this pull request Jun 15, 2016
This updates the use of CodecFactory in the output format and writer
classes so that its lifecycle is tied to ParquetWriter and
ParquetRecordWriter. When those classes are closed, the resources held
by the CodecFactory associated with the instance are released.

This is an alternative to and closes apache#282.

Author: Ryan Blue <blue@apache.org>

Closes apache#295 from rdblue/PARQUET-353-release-compressor-resources and squashes the following commits:

a00f4b7 [Ryan Blue] PARQUET-353: Release compression resources.
rdblue added a commit to rdblue/parquet-mr that referenced this pull request Jul 13, 2016
This updates the use of CodecFactory in the output format and writer
classes so that its lifecycle is tied to ParquetWriter and
ParquetRecordWriter. When those classes are closed, the resources held
by the CodecFactory associated with the instance are released.

This is an alternative to and closes apache#282.

Author: Ryan Blue <blue@apache.org>

Closes apache#295 from rdblue/PARQUET-353-release-compressor-resources and squashes the following commits:

a00f4b7 [Ryan Blue] PARQUET-353: Release compression resources.

Conflicts:
	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java
	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordWriter.java
	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetWriter.java
Resolution:
    Minor changes due to an argument relocation in the ByteBuffer patch that wasn't backported
rdblue added a commit to rdblue/parquet-mr that referenced this pull request Jan 6, 2017
This updates the use of CodecFactory in the output format and writer
classes so that its lifecycle is tied to ParquetWriter and
ParquetRecordWriter. When those classes are closed, the resources held
by the CodecFactory associated with the instance are released.

This is an alternative to and closes apache#282.

Author: Ryan Blue <blue@apache.org>

Closes apache#295 from rdblue/PARQUET-353-release-compressor-resources and squashes the following commits:

a00f4b7 [Ryan Blue] PARQUET-353: Release compression resources.

Conflicts:
	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java
	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordWriter.java
	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetWriter.java
Resolution:
    Minor changes due to an argument relocation in the ByteBuffer patch that wasn't backported
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants