HIVE-25960: Fix S3a recursive listing logic. #3031

Merged
merged 3 commits into from Feb 22, 2022

Member

@ayushtkn ayushtkn commented Feb 15, 2022

Changes replace to replaceFirst and, rather than comparing against the FileStatus, compares against the actual Path.

@ayushtkn
Member Author

This was introduced as part of HIVE-22411.
@abstractdog, any pointers on this? Of the people involved there, you are the only one I see still active :-)

@@ -361,7 +361,7 @@ private static void listS3FilesRecursive(FileStatus base, FileSystem fs, List<Fi
     RemoteIterator<LocatedFileStatus> remoteIterator = fs.listFiles(base.getPath(), true);
     while (remoteIterator.hasNext()) {
       LocatedFileStatus each = remoteIterator.next();
-      Path relativePath = new Path(each.getPath().toString().replace(base.toString(), ""));
+      Path relativePath = new Path(each.getPath().toString().replaceFirst(base.getPath().toString(), ""));
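The difference between the two calls in this hunk can be seen with plain strings. The sketch below is only an illustration: the class name, helper names, and sample paths are hypothetical and are not taken from Hive. It shows that replace removes every occurrence of the base, while replaceFirst removes only the first one.

```java
class ReplaceVsReplaceFirst {
    // Same call shape as the removed line: every occurrence of base is dropped.
    static String stripAll(String child, String base) {
        return child.replace(base, "");
    }

    // Same call shape as the added line: only the first occurrence is dropped.
    // Note that replaceFirst interprets its first argument as a regular expression.
    static String stripFirst(String child, String base) {
        return child.replaceFirst(base, "");
    }

    public static void main(String[] args) {
        // Hypothetical paths: the base string happens to repeat deeper in the child.
        String base = "/warehouse/t";
        String child = "/warehouse/t/backup/warehouse/t/file";
        System.out.println(stripAll(child, base));
        System.out.println(stripFirst(child, base));
    }
}
```

With these sample values, replace also strips the nested copy of the base, so the resulting relative path no longer matches the real directory layout.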
Contributor

The change makes sense.
Some ideas:
Isn't Path.relativize meant for the same purpose?
a) If so, could you please check its source to see whether it does the same thing and performs no worse than this version?
b) If it isn't, could you please refactor this into a utility method here? In that case, please include a unit test in TestFileUtils.

Member Author

Thanks @abstractdog for the review.

> isn't Path.relativize for the same purpose?

That Path and the Path used here are different classes. The one with relativize is from the Java standard library (java.nio.file.Path); the one used here is from Hadoop-Common (org.apache.hadoop.fs.Path):
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Path.java
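For reference, this is roughly what relativize looks like on the Java standard library's Path. It is a minimal sketch with a hypothetical class name and sample paths, and it only covers java.nio.file.Path; it does not apply to the Hadoop Path used in listS3FilesRecursive.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class NioRelativizeDemo {
    // Uses java.nio.file.Path, not org.apache.hadoop.fs.Path.
    static Path relative(String base, String child) {
        return Paths.get(base).relativize(Paths.get(child));
    }

    public static void main(String[] args) {
        // Hypothetical local-style paths, chosen only for illustration.
        System.out.println(relative("/warehouse/t", "/warehouse/t/part=1/file.orc"));
    }
}
```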

> b) if it isn't, could you please refactor this to utility method here? in this case, please include unit test into TestFileUtils

Done
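The utility method added in the PR is not shown in this conversation. As a rough sketch of what such a helper could look like, the hypothetical stripBasePrefix below (the name and the sample bucket path are assumptions, not the PR's code) removes the base only when it is a literal prefix. It also avoids the regex interpretation that replaceFirst applies to its first argument.

```java
class RelativePathSketch {
    // Hypothetical helper: strips base from the front of child only when child
    // actually starts with it; otherwise child is returned unchanged.
    static String stripBasePrefix(String child, String base) {
        return child.startsWith(base) ? child.substring(base.length()) : child;
    }

    public static void main(String[] args) {
        // Hypothetical location containing regex metacharacters.
        String base = "s3a://bucket/db(1)/t";
        String child = base + "/part=1/f";
        // Literal prefix stripping yields the relative part.
        System.out.println(stripBasePrefix(child, base));
        // replaceFirst treats "(1)" as a capture group, so the pattern does not
        // match the literal parentheses and the string comes back unchanged.
        System.out.println(child.replaceFirst(base, ""));
    }
}
```

If a regex-based call is kept, wrapping the base in java.util.regex.Pattern.quote is another way to force a literal match.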

@ayushtkn
Member Author

The test failure in testCliDriver[empty_skip_header_footer_aggr] isn't related to this change.

@abstractdog abstractdog self-requested a review February 22, 2022 10:49
Contributor

@abstractdog abstractdog left a comment

LGTM +1

@ayushtkn ayushtkn merged commit 756a8fc into apache:master Feb 22, 2022
DongWei-4 pushed a commit to DongWei-4/hive that referenced this pull request Oct 28, 2022
dengzhhu653 pushed a commit to dengzhhu653/hive that referenced this pull request Dec 15, 2022