PARQUET-783: Close the underlying stream when an H2SeekableInputStream is closed #388

Closed
wants to merge 1 commit into base: master

4 participants
@mallman
Contributor

mallman commented Dec 1, 2016

This PR addresses https://issues.apache.org/jira/browse/PARQUET-783.

ParquetFileReader opens a SeekableInputStream to read a footer. In the process, it opens a new FSDataInputStream and wraps it. However, H2SeekableInputStream does not override the close method. Therefore, when ParquetFileReader closes it, the underlying FSDataInputStream is not closed. As a result, these stale connections can exhaust a cluster's data nodes' connection resources and lead to mysterious HDFS read failures in HDFS clients, e.g.:

org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-905337612-172.16.70.103-1444328960665:blk_1720536852_646811517
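
To illustrate the failure mode described above, here is a minimal sketch that uses only `java.io` rather than the actual Parquet or Hadoop classes. `TrackingStream`, `LeakyWrapper`, `ClosingWrapper`, and `CloseDelegationSketch` are hypothetical names invented for this example; `TrackingStream` stands in for the underlying FSDataInputStream. The sketch assumes the wrapper inherits the no-op `close()` of `java.io.InputStream`, and it is not the code from this patch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for the underlying stream: records whether close() was called.
class TrackingStream extends ByteArrayInputStream {
  boolean closed = false;

  TrackingStream(byte[] data) {
    super(data);
  }

  @Override
  public void close() throws IOException {
    closed = true;
    super.close();
  }
}

// Wrapper that forwards reads but inherits InputStream.close(), which does nothing.
class LeakyWrapper extends InputStream {
  private final InputStream stream;

  LeakyWrapper(InputStream stream) {
    this.stream = stream;
  }

  @Override
  public int read() throws IOException {
    return stream.read();
  }
}

// The same wrapper with close() overridden to delegate to the wrapped stream.
class ClosingWrapper extends InputStream {
  private final InputStream stream;

  ClosingWrapper(InputStream stream) {
    this.stream = stream;
  }

  @Override
  public int read() throws IOException {
    return stream.read();
  }

  @Override
  public void close() throws IOException {
    stream.close();
  }
}

class CloseDelegationSketch {
  // Closes the chosen wrapper and reports whether the underlying stream was closed too.
  static boolean underlyingClosedAfter(boolean useFix) throws IOException {
    TrackingStream underlying = new TrackingStream(new byte[] {1, 2, 3});
    try (InputStream wrapper =
        useFix ? new ClosingWrapper(underlying) : new LeakyWrapper(underlying)) {
      wrapper.read();
    }
    return underlying.closed;
  }

  public static void main(String[] args) throws IOException {
    System.out.println("leaky wrapper closed underlying: " + underlyingClosedAfter(false));
    System.out.println("closing wrapper closed underlying: " + underlyingClosedAfter(true));
  }
}
```

In this sketch the caller closes the wrapper correctly in both cases, yet only the wrapper that overrides `close()` releases the underlying stream, which mirrors why closing the reader alone was not enough.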

@mallman mallman changed the title from Close the underlying stream when an H2SeekableInputStream is closed to PARQUET-783: Close the underlying stream when an H2SeekableInputStream is closed Dec 1, 2016

@mallman mallman force-pushed the VideoAmp:parquet-783-close_underlying_inputstream branch from 83e88ad to f4b27c1 Dec 1, 2016

@gszadovszky

Thanks for the patch.
LGTM.

@julienledem

Member

julienledem commented Dec 5, 2016

+1

@asfgit asfgit closed this in 09d28fe Dec 5, 2016

rdblue added a commit to rdblue/parquet-mr that referenced this pull request Jan 6, 2017

PARQUET-783: Close the underlying stream when an H2SeekableInputStream is closed

This PR addresses https://issues.apache.org/jira/browse/PARQUET-783.

`ParquetFileReader` opens a `SeekableInputStream` to read a footer. In the process, it opens a new `FSDataInputStream` and wraps it. However, `H2SeekableInputStream` does not override the `close` method. Therefore, when `ParquetFileReader` closes it, the underlying `FSDataInputStream` is not closed. As a result, these stale connections can exhaust a cluster's data nodes' connection resources and lead to mysterious HDFS read failures in HDFS clients, e.g.:

```
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-905337612-172.16.70.103-1444328960665:blk_1720536852_646811517
```

Author: Michael Allman <michael@videoamp.com>

Closes apache#388 from mallman/parquet-783-close_underlying_inputstream and squashes the following commits:

f4b27c1 [Michael Allman] PARQUET-783 Close the underlying stream when an H2SeekableInputStream is closed

rdblue added a commit to rdblue/parquet-mr that referenced this pull request Jan 10, 2017

PARQUET-783: Close the underlying stream when an H2SeekableInputStream is closed

@LuciferYang

LuciferYang commented Jan 23, 2017

Should we release a new version like 1.9.1? This is a serious problem.

@julienledem

Member

julienledem commented Jan 23, 2017

@LuciferYang To get the discussion going around a patch release you can start a thread on the dev mailing list and open a jira for it to track progress.

julienledem added a commit to julienledem/parquet-mr that referenced this pull request Jun 9, 2017

PARQUET-783: Close the underlying stream when an H2SeekableInputStream is closed


@jhpoelen jhpoelen referenced this pull request Jul 12, 2018

Closed

GUODA service down? #49
