Implement unbuffer interface for HdfsFileInputStream #16017
Sorry, I'm not familiar with unbuffer. First, I'd like to know how and when the client calls the unbuffer
method. Second, I'd like to know how it affects performance and what the benefits are.
I also need to make sure it doesn't affect other scenarios, so please provide some more information and test examples.
Thanks!
core/client/fs/src/main/java/alluxio/client/file/AlluxioFileInStream.java
core/client/hdfs3/src/main/java/alluxio/hadoop/HdfsFileInputStream.java
core/client/hdfs/src/main/java/alluxio/hadoop/HdfsFileInputStream.java
@Jackson-Wang-7 Thanks for your review. I rebased onto the master branch and committed our latest changes.
Mostly LGTM, and I suggest you find one more person to take a look.
/ping @jja725 PTAL thanks!
core/client/hdfs3/pom.xml
<parent>
  <groupId>org.alluxio</groupId>
  <artifactId>alluxio-core-client</artifactId>
  <version>2.10.0-SNAPSHOT</version>
@Xenorith do we need to update the script for this new pom file?
no need, the script looks for the version string in all pom.xml files:
function update_poms() {
find . -name pom.xml | xargs -t -n 1 perl -pi -e "s/${1}/${2}/g"
}
core/client/hdfs3/pom.xml
<artifactId>alluxio-core-client-hdfs3</artifactId>
<packaging>jar</packaging>
<name>Alluxio Core - Client - HDFS3</name>
<description>HDFS Client of Alluxio Core For HDFS 3</description>
Please help us update the doc regarding the hdfs3 client
Good point! Let's align first internally then comment on this PR on the recommended doc change.
The new client/hdfs3 and shaded/client-hadoop3 modules are currently a copy of the existing client/hdfs and shaded/client modules. Adding them allows for changes that are available only in hadoop3, such as Alluxio#16017 (comment).
Both client jars will be built by default, but the symlink at client/alluxio-VERSION-client.jar will point to the hadoop-2 one to maintain backward compatibility. If the hadoop-3 profile is activated by adding `-Phadoop-3`, the symlink will be overridden to point to the new hadoop3 shaded client jar.
Note that in the current state of this PR, the tarball generation will result in pointing to the hadoop3 shaded client jar because the hadoop-3 profile is activated by default.
The new client/hdfs3 and shaded/client-hadoop3 modules are currently a copy of the existing client/hdfs and shaded/client modules. Adding them allows for changes that are available only in hadoop3, such as #16017 (comment).
Both client jars will be built by default, but the symlink at client/alluxio-VERSION-client.jar will point to the hadoop-2 one to maintain backward compatibility. If the hadoop-3 profile is activated by adding `-Phadoop-3`, the symlink will be overridden to point to the new hadoop3 shaded client jar.
pr-link: #16699
change-id: cid-a6ffd09414e8259078fd5a8f68c3e287d85feec5
#16699 has merged to unblock the maven/pom parts of this change
@jiacheliu3 @jja725 @Jackson-Wang-7 @dbw9580 Please review this PR again, thanks!
Is there any new change since my last LGTM, apart from the pom change?
LGTM
Just curious: does any compute framework other than Impala use this function?
According to HADOOP-14747, HBase also uses the unbuffer function.
LGTM, thanks for the work!
byte[] cacheMiss = new byte[partialReadSize];
stream.unbuffer();
stream.seek(offset);
stream.unbuffer();
Assert.assertEquals(partialReadSize, stream.read(cacheMiss));
stream.unbuffer();
Assert.assertArrayEquals(
    Arrays.copyOfRange(testData, offset, offset + partialReadSize), cacheMiss);
Assert.assertEquals(0, manager.mPagesServed);
Assert.assertEquals(1, manager.mPagesCached);

byte[] cacheHit = new byte[partialReadSize];
stream.unbuffer();
stream.seek(offset);
stream.unbuffer();
Assert.assertEquals(partialReadSize, stream.read(cacheHit));
stream.unbuffer();
Assert.assertArrayEquals(
    Arrays.copyOfRange(testData, offset, offset + partialReadSize), cacheHit);
Assert.assertEquals(1, manager.mPagesServed);
could you add some more comments in the test explaining the behavior?
The logic here is essentially the same as readPartialPage; it tests that the unbuffer method does not affect the read behavior.
LGTM
LGTM
alluxio-bot, merge this please
### What changes are proposed in this pull request?
Implement unbuffer interface for HdfsFileInputStream. Fix Alluxio#16016.
### Why are the changes needed?
If the unbuffer method is not implemented, then impala will not be able to use the file handle cache.
### Does this PR introduce any user facing changes?
Implement CanUnbuffer and StreamCapabilities for HdfsFileInputStream.
pr-link: Alluxio#16017
change-id: cid-b50163c7b4f199b8a61d5818a0e4739039f2745c
What changes are proposed in this pull request?
Implement unbuffer interface for HdfsFileInputStream. Fix #16016.
Why are the changes needed?
If the unbuffer method is not implemented, then Impala will not be able to use the file handle cache.
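To illustrate why a handle cache depends on unbuffer, here is a minimal sketch of the general client-side pattern. It is not Impala's actual code. `UnbufferCapable`, `HandleCacheSketch`, and `DemoUnbufferableStream` are hypothetical names, and `UnbufferCapable` is a local stand-in for Hadoop's `CanUnbuffer` and `StreamCapabilities` interfaces so that the example compiles without Hadoop on the classpath. The assumption is that a client only keeps a stream open in its cache if the stream can first release its buffers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for Hadoop's CanUnbuffer plus StreamCapabilities. */
interface UnbufferCapable {
  void unbuffer();

  boolean hasCapability(String capability);
}

/** Hypothetical client-side cache of open file handles, keyed by path. */
class HandleCacheSketch {
  private final Map<String, InputStream> mHandles = new HashMap<>();

  /**
   * Hands a stream back after a read. Returns true if the handle was kept
   * open in the cache, false if it had to be closed instead.
   */
  boolean release(String path, InputStream in) {
    if (in instanceof UnbufferCapable) {
      UnbufferCapable capable = (UnbufferCapable) in;
      if (capable.hasCapability("in:unbuffer")) {
        // Drop buffers and other per-read resources, but keep the handle open.
        capable.unbuffer();
        mHandles.put(path, in);
        return true;
      }
    }
    // Without unbuffer support an idle handle would pin resources, so close it.
    try {
      in.close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return false;
  }

  /** Returns the cached handle for the path and removes it, or null if absent. */
  InputStream acquire(String path) {
    return mHandles.remove(path);
  }
}

/** Demo stream that only records whether unbuffer was called. */
class DemoUnbufferableStream extends ByteArrayInputStream implements UnbufferCapable {
  boolean mUnbuffered = false;

  DemoUnbufferableStream(byte[] data) {
    super(data);
  }

  @Override
  public void unbuffer() {
    mUnbuffered = true;
  }

  @Override
  public boolean hasCapability(String capability) {
    return "in:unbuffer".equals(capability);
  }
}
```

In this sketch, a stream that does not advertise the capability is closed on release, so the next read of the same path has to reopen the file.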
Does this PR introduce any user facing changes?
Implement CanUnbuffer and StreamCapabilities for HdfsFileInputStream.
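As a rough illustration of what implementing these two interfaces involves, the sketch below shows a stream that releases its internal delegate on unbuffer and lazily recreates it at the current position on the next read. It is not the code from this PR. `CanUnbufferSketch` and `StreamCapabilitiesSketch` are local stand-ins that mirror the single methods of Hadoop's `CanUnbuffer` and `StreamCapabilities`, and `LazyBufferStream` with its fields is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

/** Local stand-in mirroring org.apache.hadoop.fs.CanUnbuffer. */
interface CanUnbufferSketch {
  void unbuffer();
}

/** Local stand-in mirroring org.apache.hadoop.fs.StreamCapabilities. */
interface StreamCapabilitiesSketch {
  boolean hasCapability(String capability);
}

/** Hypothetical stream whose buffered delegate can be dropped and rebuilt. */
class LazyBufferStream extends InputStream
    implements CanUnbufferSketch, StreamCapabilitiesSketch {
  static final String UNBUFFER = "in:unbuffer";

  private final byte[] mData;
  private int mPos = 0;
  private ByteArrayInputStream mDelegate; // null while unbuffered
  int mOpenCount = 0; // how many times the delegate was (re)created

  LazyBufferStream(byte[] data) {
    mData = data;
  }

  /** Creates the delegate at the current position if it is not open. */
  private ByteArrayInputStream delegate() {
    if (mDelegate == null) {
      mDelegate = new ByteArrayInputStream(mData, mPos, mData.length - mPos);
      mOpenCount++;
    }
    return mDelegate;
  }

  @Override
  public int read() {
    int b = delegate().read();
    if (b >= 0) {
      mPos++;
    }
    return b;
  }

  @Override
  public int read(byte[] buf, int off, int len) {
    int n = delegate().read(buf, off, len);
    if (n > 0) {
      mPos += n;
    }
    return n;
  }

  /** Moves the logical position; the delegate is rebuilt on the next read. */
  void seek(int pos) {
    if (pos < 0 || pos > mData.length) {
      throw new IllegalArgumentException("position out of range: " + pos);
    }
    mPos = pos;
    mDelegate = null;
  }

  /** Releases the delegate but keeps the position, so later reads continue. */
  @Override
  public void unbuffer() {
    mDelegate = null;
  }

  @Override
  public boolean hasCapability(String capability) {
    return UNBUFFER.equalsIgnoreCase(capability);
  }
}
```

The key property is that unbuffer only discards resources and never changes the logical position, so calling it before or after seek and read leaves the bytes returned unchanged.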