PARQUET-2022: ZstdDecompressorStream should close zstdInputStream #889

Merged: 1 commit, merged Apr 19, 2021

@dongjoon-hyun (Member) commented Apr 11, 2021

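`ZstdDecompressorStream` should close its `zstdInputStream`, because `CompressionInputStream.close()` closes only the inner (wrapped) stream.

```java
import com.github.luben.zstd.ZstdInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.io.compress.CompressionInputStream;

public class ZstdDecompressorStream extends CompressionInputStream {

  // Decoder over the same inner stream. This class does not override close(),
  // so the inherited close() never closes it.
  private ZstdInputStream zstdInputStream;

  public ZstdDecompressorStream(InputStream stream) throws IOException {
    super(stream);
    zstdInputStream = new ZstdInputStream(stream);
  }

  // read(...) and resetState() overrides omitted.
}
```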
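To make the leak concrete without Hadoop or zstd-jni on the classpath, here is a minimal, stdlib-only sketch. All class names in it are hypothetical stand-ins: `BaseCompressionInputStream` plays the role of `CompressionInputStream` (its `close()` closes only the wrapped stream, as described above), `TrackingDecoderStream` plays `ZstdInputStream`, and `LeakyDecompressorStream` mirrors the excerpt. The `close()` override in `FixedDecompressorStream` shows one plausible shape of the fix, not necessarily the code merged in 8c08403.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for CompressionInputStream: close() closes only `in`.
abstract class BaseCompressionInputStream extends InputStream {
  protected final InputStream in;

  BaseCompressionInputStream(InputStream in) {
    this.in = in;
  }

  @Override
  public void close() throws IOException {
    in.close();
  }
}

// Hypothetical stand-in for ZstdInputStream: records whether close() ran.
class TrackingDecoderStream extends FilterInputStream {
  boolean closed = false;

  TrackingDecoderStream(InputStream in) {
    super(in);
  }

  @Override
  public void close() throws IOException {
    closed = true;
    super.close();
  }
}

// Same structure as the excerpt: the decoder is created in the constructor,
// but close() is inherited, so the decoder is never closed.
class LeakyDecompressorStream extends BaseCompressionInputStream {
  protected final TrackingDecoderStream decoder;

  LeakyDecompressorStream(InputStream stream) {
    super(stream);
    decoder = new TrackingDecoderStream(stream);
  }

  @Override
  public int read() throws IOException {
    return decoder.read();
  }
}

// Sketch of the fix: close the decoder, then let the base class close `in`.
// The finally block closes `in` even if the decoder's close() throws.
class FixedDecompressorStream extends LeakyDecompressorStream {
  FixedDecompressorStream(InputStream stream) {
    super(stream);
  }

  @Override
  public void close() throws IOException {
    try {
      decoder.close();
    } finally {
      super.close();
    }
  }
}

final class CloseCheck {
  private CloseCheck() {}

  // Closes a leaky or fixed stream over an in-memory source and reports
  // whether the inner decoder stream saw close().
  static boolean decoderClosed(boolean useFix) {
    InputStream source = new ByteArrayInputStream(new byte[] {1, 2, 3});
    LeakyDecompressorStream stream =
        useFix ? new FixedDecompressorStream(source) : new LeakyDecompressorStream(source);
    try {
      stream.close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return stream.decoder.closed;
  }
}
```

`CloseCheck.decoderClosed(false)` returns `false` (the decoder leaks) and `CloseCheck.decoderClosed(true)` returns `true`. Because the decoder wraps the same source, the fixed version closes that source twice; `java.io.Closeable.close()` specifies that closing an already-closed stream has no effect, so this is expected to be harmless. Applied to the real class, the same override would call `zstdInputStream.close()` inside the `try` block.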

Make sure you have checked all steps below.

Jira

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    Use revised test case.

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain Javadoc that explains what they do


@Fokko (Contributor) left a comment

Looks good, thanks for fixing this @dongjoon-hyun

@dongjoon-hyun (Member, Author)

Thank you, @Fokko !

@dongjoon-hyun (Member, Author)

Could you review this, @gszadovszky ?

@dongjoon-hyun (Member, Author)

Could you review this, @ggershinsky ?

@ggershinsky (Contributor)

will do

@dongjoon-hyun (Member, Author)

Thank you so much, @ggershinsky !

@ggershinsky (Contributor) left a comment

LGTM

@dongjoon-hyun (Member, Author)

Thank you for your review and approval, @ggershinsky !

@dongjoon-hyun (Member, Author)

Gentle ping, @gszadovszky .

@gszadovszky merged commit 8c08403 into apache:master on Apr 19, 2021

@dongjoon-hyun (Member, Author)

Thank you, @Fokko , @ggershinsky , @gszadovszky !

@dongjoon-hyun deleted the PARQUET-2022 branch on April 19, 2021, 15:35

@shangxinli (Contributor)

Thanks @dongjoon-hyun for working on this!

cc @vectorijk

@dongjoon-hyun (Member, Author)

Thank you, @shangxinli .

elikkatz added a commit to TheWeatherCompany/parquet-mr that referenced this pull request Jun 2, 2021
* 'master' of https://github.com/apache/parquet-mr: (222 commits)
  PARQUET-2052: Integer overflow when writing huge binary using dictionary encoding (apache#910)
  PARQUET-2041: Add zstd to `parquet.compression` description of ParquetOutputFormat Javadoc (apache#899)
  PARQUET-2050: Expose repetition & definition level from ColumnIO (apache#908)
  PARQUET-1761: Lower Logging Level in ParquetOutputFormat (apache#745)
  PARQUET-2046: Upgrade Apache POM to 23 (apache#904)
  PARQUET-2048: Deprecate BaseRecordReader (apache#906)
  PARQUET-1922: Deprecate IOExceptionUtils (apache#825)
  PARQUET-2037: Write INT96 with parquet-avro (apache#901)
  PARQUET-2044: Enable ZSTD buffer pool by default (apache#903)
  PARQUET-2038: Upgrade Jackson version used in parquet encryption. (apache#898)
  Revert "[WIP] Refactor GroupReadSupport to unuse deprecated api (apache#894)"
  PARQUET-2027: Fix calculating directory offset for merge (apache#896)
  [WIP] Refactor GroupReadSupport to unuse deprecated api (apache#894)
  PARQUET-2030: Expose page size row check configurations to ParquetWriter.Builder (apache#895)
  PARQUET-2031: Upgrade to parquet-format 2.9.0 (apache#897)
  PARQUET-1448: Review of ParquetFileReader (apache#892)
  PARQUET-2020: Remove deprecated modules (apache#888)
  PARQUET-2025: Update Snappy version to 1.1.8.3 (apache#893)
  PARQUET-2022: ZstdDecompressorStream should close `zstdInputStream` (apache#889)
  PARQUET-1982: Random access to row groups in ParquetFileReader (apache#871)
  ...

# Conflicts:
#	parquet-column/src/main/java/org/apache/parquet/example/data/simple/SimpleGroup.java
#	parquet-hadoop/pom.xml
#	parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
#	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
dongjoon-hyun added a commit to dongjoon-hyun/parquet-mr that referenced this pull request Aug 20, 2021
…pache#889)

(cherry picked from commit 8c08403)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
shangxinli pushed a commit to shangxinli/parquet-mr that referenced this pull request Sep 9, 2021
5 participants