Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CARBONDATA-4050]Avoid redundant RPC calls to get file status when CarbonFile is instantiated with fileStatus constructor #4010

Closed

Conversation

VenuReddy2103
Copy link
Contributor

@VenuReddy2103 VenuReddy2103 commented Nov 18, 2020

Why is this PR needed?

In createCarbonDataFileBlockMetaInfoMapping method, we get list of carbondata files in the segment, loop through all the carbon files and make a map of fileNameToMetaInfoMapping<path-string, BlockMetaInfo>

In that carbon files loop, if the file is of AbstractDFSCarbonFile type, we get the org.apache.hadoop.fs.FileStatus thrice for each file. And the method to get file status is an RPC call(fileSystem.getFileStatus(path)). It takes ~2ms in the cluster for each call. Thus, incur an overhead of ~6ms per file. So overall driver side query processing time has increased significantly when there are more carbon files. Hence caused TPC-DS queries performance degradation.

What changes were proposed in this PR?

Avoided redundant RPC calls to get file status in getAbsolutePath(), getSize() and getLocations() methods when CarbonFile is instantiated with FileStatus constructor

Does this PR introduce any user interface change?

  • No

Is any new testcase added?

  • No

…tSize() and getLocations() methods when CarbonFile is instantiated with fileStatus constructor
@VenuReddy2103
Copy link
Contributor Author

retest this please

@@ -541,7 +547,10 @@ public boolean createNewLockFile() throws IOException {
@Override
public String[] getLocations() throws IOException {
BlockLocation[] blkLocations;
FileStatus fileStatus = fileSystem.getFileStatus(path);
FileStatus fileStatus = this.fileStatus;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the fileSystem.getFileStatus(path); in line 533, it can be changed also

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refer to reply in below comment.

FileStatus fileStatus = fileSystem.getFileStatus(path);
FileStatus fileStatus = this.fileStatus;
if (fileStatus == null) {
fileStatus = fileSystem.getFileStatus(path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one file status is got, we can cache it in the class field member

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, as mentioned in PR #2465, we shouldn't get once and cache the filestatus in class field member. So have fixed to use the existing fileStatus from class field only if available (i.e., when Carbon file is instantiated with a FileStatus constructor AbstractDFSCarbonFile(FileStatus fileStatus)). This is particularly the case where we have obtained all the filestatuses in the segment and create carbonfiles with filestatus argument constructor. It happens only from implementation of protected abstract CarbonFile[] getFiles(FileStatus[] listStatus);

@CarbonDataQA2
Copy link

retest this please

1 similar comment
@ajantha-bhat
Copy link
Member

retest this please

@CarbonDataQA2
Copy link

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3079/

@CarbonDataQA2
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4835/

@kunal642
Copy link
Contributor

kunal642 commented Dec 2, 2020

retest this please

@CarbonDataQA2
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5028/

@CarbonDataQA2
Copy link

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3272/

@kunal642
Copy link
Contributor

kunal642 commented Dec 3, 2020

LGTM

@asfgit asfgit closed this in c5f464f Dec 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants