PARQUET-353: Recycle compressors in parquet write path #282
How can we retest this PR? The last failure should have been fixed by #269.
Committed a dummy check-in so that the tests re-run automatically.
+1 @nitin2goyal Thanks for the fix :)
This code has changed. You may want to rebase.
@@ -171,6 +171,8 @@ private final STATE error() throws IOException {

  private STATE state = STATE.NOT_STARTED;

  private final CodecFactory codecFactory;
This does not seem like the right place for this field, because this class does not use the codecFactory. It just creates it and makes it available. I'd rather keep private members inaccessible from the outside, to respect the encapsulation and layering of the code.
+1
Thanks for looking into this. A lot of the codec factory code has changed recently.
It looks like the purpose of this is to tie the lifecycle of the CodecFactory to ParquetFileWriter. That seems like a good idea (resources are freed when the file is closed), but ends up exposing internals (as Julien noted) because we don't delegate compression to ParquetFileWriter. I think it would be cleaner to attach the lifecycle of the codec factory to the record writer for MR, which has a close method for clean up. The CodecFactory would be instantiated in the writer's constructor instead of passing in a compressor. This should also call
To get this into the pending release, I've implemented the changes I recommended in #295.
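For readers following the thread, here is a minimal sketch of the lifecycle pattern recommended above: the writer creates the factory in its constructor and releases it in `close()`, so nothing outside the writer needs access to it. This is not the code from #295. `SketchCodecFactory`, `SketchRecordWriter`, and all of their methods are hypothetical stand-ins that do not depend on Parquet or Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a factory that pools compressors.
// This is NOT Parquet's CodecFactory; the names and methods are assumptions.
final class SketchCodecFactory {
  private final List<String> pooled = new ArrayList<>();
  private boolean released = false;

  // Hands out a pretend compressor and remembers it for later cleanup.
  String getCompressor(String codecName) {
    if (released) {
      throw new IllegalStateException("factory already released");
    }
    String compressor = codecName + "-compressor";
    pooled.add(compressor);
    return compressor;
  }

  // Drops every pooled compressor; calling it more than once is harmless.
  void release() {
    pooled.clear();
    released = true;
  }
}

// Hypothetical record writer that owns the factory for its whole lifetime.
final class SketchRecordWriter implements AutoCloseable {
  // Private and never exposed: callers cannot reach the factory.
  private final SketchCodecFactory codecFactory;
  private final String compressor;
  private boolean closed = false;

  // The factory is instantiated here instead of a compressor being passed in.
  SketchRecordWriter(String codecName) {
    this.codecFactory = new SketchCodecFactory();
    this.compressor = codecFactory.getCompressor(codecName);
  }

  // Pretends to compress a record by tagging it with the compressor name.
  String write(String record) {
    if (closed) {
      throw new IllegalStateException("writer is closed");
    }
    return compressor + ":" + record;
  }

  // Closing the writer releases the compression resources it created.
  @Override
  public void close() {
    if (!closed) {
      codecFactory.release();
      closed = true;
    }
  }
}

class LifecycleSketch {
  public static void main(String[] args) {
    // try-with-resources guarantees close() runs, which releases the factory.
    try (SketchRecordWriter writer = new SketchRecordWriter("snappy")) {
      System.out.println(writer.write("row1"));
    }
  }
}
```

Because the factory is created and released inside the writer, the cleanup point is the writer's existing `close()` method and no accessor for the factory is needed.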
This updates the use of CodecFactory in the output format and writer classes so that its lifecycle is tied to ParquetWriter and ParquetRecordWriter. When those classes are closed, the resources held by the CodecFactory associated with the instance are released. This is an alternative to and closes apache#282.

Author: Ryan Blue <blue@apache.org>

Closes apache#295 from rdblue/PARQUET-353-release-compressor-resources and squashes the following commits:

a00f4b7 [Ryan Blue] PARQUET-353: Release compression resources.
This updates the use of CodecFactory in the output format and writer classes so that its lifecycle is tied to ParquetWriter and ParquetRecordWriter. When those classes are closed, the resources held by the CodecFactory associated with the instance are released. This is an alternative to and closes apache#282.

Author: Ryan Blue <blue@apache.org>

Closes apache#295 from rdblue/PARQUET-353-release-compressor-resources and squashes the following commits:

a00f4b7 [Ryan Blue] PARQUET-353: Release compression resources.

Conflicts:
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordWriter.java
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetWriter.java

Resolution: Minor changes due to an argument relocation in the ByteBuffer patch that wasn't backported