[CARBONDATA-3052] Improve drop table performance by reducing the namenode RPC calls during physical deletion of files #2868
Conversation
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1102/
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1313/
Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9365/
retest this please
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1115/
Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9377/
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1325/
f79f0fa to c415d4b
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1118/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1330/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9382/
@@ -64,15 +67,14 @@ class SubqueryWithFilterAndSortTestCase extends QueryTest with BeforeAndAfterAll
     dis.close()
   }
   def deleteFile(filePath: String) {
-    val file = FileFactory.getCarbonFile(filePath, FileFactory.getFileType(filePath))
+    val file = new File(filePath)
why is this modification needed?
Not required. I will remove
c415d4b to eddcc00
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1134/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9398/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1346/
If the table is on S3, will it behave correctly, since S3 does not have a "folder" concept?
/**
 * This method will delete the files recursively from file system
 *
 * @return
complete the comment
ok
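For illustration only, the completed comment could look like the sketch below; the return semantics are an assumption based on the surrounding code, which returns the result of deleteFile:
/**
 * This method will delete the files recursively from the file system.
 *
 * @return true if the file or folder was deleted successfully, false otherwise
 */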
try {
  return deleteFile(file.getAbsolutePath(), FileFactory.getFileType(file.getAbsolutePath()));
} catch (IOException e) {
  LOGGER.error("Exception occurred:" + e.getMessage());
include the exception in the error log
ok
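A sketch of what the suggested change could look like; it assumes the project's logger exposes an overload that takes the Throwable as the last argument (the exact signature depends on the logging facade in use):
try {
  return deleteFile(file.getAbsolutePath(), FileFactory.getFileType(file.getAbsolutePath()));
} catch (IOException e) {
  // Pass the exception object itself so the stack trace is logged, not only the message text
  LOGGER.error("Exception occurred while deleting " + file.getAbsolutePath(), e);
}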
I have not changed any existing behavior, so it should work fine
eddcc00 to 4cdcbc6
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1167/
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1380/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9431/
CarbonFile carbonFile = FileFactory.getCarbonFile(path[i].getAbsolutePath());
boolean delete = carbonFile.delete();
if (!delete) {
  throw new IOException("Error while deleting the folders and files");
better to print the file location
ok
deleteRecursive(file[i]);
boolean delete = file[i].delete();
if (!delete) {
  throw new IOException("Error while deleting the folders and files");
better to print the file location
ok
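A sketch of the suggested messages for both deletion paths above, including the path of the entry that could not be deleted (this assumes CarbonFile exposes getAbsolutePath, mirroring the java.io.File variant):
// CarbonFile variant
boolean deleted = carbonFile.delete();
if (!deleted) {
  throw new IOException("Error while deleting the file: " + carbonFile.getAbsolutePath());
}

// java.io.File variant
if (!file[i].delete()) {
  throw new IOException("Error while deleting the file: " + file[i].getAbsolutePath());
}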
4cdcbc6 to f9cc4dd
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1176/
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1389/
retest this please
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1179/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9443/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1392/
LGTM
[CARBONDATA-3052] Improve drop table performance by reducing the namenode RPC calls during physical deletion of files. This closes #2868
Problem
The current drop table command takes more than 1 minute to delete 3000 files from HDFS during the drop table operation.
Analysis
Even though the table is stored on HDFS, we explicitly iterate through the table folders recursively and delete each file one by one. Every file deletion and every directory listing makes one RPC call to the namenode, so deleting 3000 files issues 3000 delete RPCs plus several more listing RPCs for each folder.
Solution
HDFS provides an API that deletes all folders and files under a given path recursively in a single RPC call. Use that API to improve drop table performance.
Result: After these changes, the drop table operation that deletes 3000 files from HDFS takes ~2 seconds instead of more than 1 minute.
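As an illustration of the approach (not the exact CarbonData change, which goes through its FileFactory/CarbonFile abstraction), a minimal sketch using the public Hadoop FileSystem API; the class and method names here are hypothetical:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class DropTablePathCleaner {

  /**
   * Deletes the whole table folder with one recursive delete request instead of
   * listing every folder and deleting each file individually, which costs one
   * namenode RPC per file plus one per directory listing.
   */
  public static boolean deleteTablePath(String tablePath, Configuration conf) throws IOException {
    Path path = new Path(tablePath);
    FileSystem fs = path.getFileSystem(conf);
    // recursive = true lets HDFS remove the entire subtree in a single RPC call
    return fs.delete(path, true);
  }
}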
Any interface change? No
Any backward compatibility impacted? No
Document update required? No
Testing done: Verified on cluster
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA: NA