[CARBONDATA-4081] Fix multiple issues with clean files command #4051
Conversation
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5137/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3375/
Can you add a test case for fault testing?
Force-pushed from 9853b3d to f87b3a5
@QiangCai, I have changed the current test cases themselves. Please check.
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5166/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3404/
fileStatus is null
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5168/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3406/
@vikramahuja1001: when the issue was found, what other file was present in the segment metadata directory? How is it impacting?
Force-pushed from 6090eda to fff268e
@ajantha-bhat, in the case of a partition table, there is a tmp directory inside the segment folder. While checking for stale segments, we blindly read all entries in the segment folder, collect their names, and then read the segment files based on those names. Since the tmp directory is not a file, it cannot be read and throws an exception. The fix is to only consider entries whose names end with ".segment".
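The filtering described in this comment can be sketched as below. This is an illustrative snippet with hypothetical helper names (`SegmentFileFilter`, `isSegmentFile`), not the actual CarbonData code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: when listing the segment metadata directory, keep only
// ".segment" files, skipping e.g. the tmp directory created for
// partition tables. Class and method names are hypothetical.
public class SegmentFileFilter {

    public static boolean isSegmentFile(String fileName) {
        return fileName.endsWith(".segment");
    }

    public static List<String> filterSegmentFiles(List<String> entryNames) {
        List<String> segmentFiles = new ArrayList<>();
        for (String name : entryNames) {
            if (isSegmentFile(name)) {
                segmentFiles.add(name);
            }
        }
        return segmentFiles;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("0_1608113216402.segment", "tmp");
        // Only the ".segment" entry survives the filter.
        System.out.println(filterSegmentFiles(entries));
    }
}
```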
@@ -157,11 +157,12 @@ public static void deleteExpiredDataFromTrash(String tablePath) {
// Deleting the timestamp based subdirectories in the trashfolder by the given timestamp.
try {
if (FileFactory.isFileExist(trashPath)) {
Suggested change:
- if (FileFactory.isFileExist(trashPath)) {
+ CarbonFile trashFile = FileFactory.getCarbonFile(trashPath);
+ if (trashFile.exists()) {
+   CarbonFile[] timestampFolderList = trashFile.listFiles();
I think we should deprecate listFiles, fileExist, and the other APIs in FileFactory that take a path; they should take a CarbonFile instead. This way we can force the user to think about reusing the CarbonFile object.
We can raise a JIRA and assign it to some new contributors.
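The resolve-once pattern being suggested can be sketched with plain `java.io.File` as an analogue (the real CarbonData types are `CarbonFile`/`FileFactory`; names below are illustrative only):

```java
import java.io.File;

// Analogue of the suggested CarbonFile reuse: resolve the file handle
// once, then call exists()/listFiles() on that same object instead of
// passing the path string to separate path-based API calls.
public class ReuseFileHandle {

    public static int countEntries(String path) {
        File dir = new File(path);        // resolve once
        if (!dir.exists()) {              // reuse the handle
            return 0;
        }
        File[] entries = dir.listFiles(); // reuse again, no re-resolution
        return entries == null ? 0 : entries.length;
    }

    public static void main(String[] args) {
        System.out.println(countEntries("."));
    }
}
```

An API that accepts the handle object rather than a path makes the caller hold on to one resolved object, which is the reuse the comment is asking for.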
Currently Vikram can handle it for his PR by accepting the changes suggested by Indhu. But in many places we are not reusing the CarbonFile object; that can be handled as per my above point.
done
@ajantha-bhat, I can raise a PR for that as well.
@@ -181,7 +182,7 @@ public static void emptyTrash(String tablePath) {
// if the trash folder exists delete the contents of the trash folder
try {
if (FileFactory.isFileExist(trashPath)) {
Handle the same as the above comment.
done
CarbonFilters.getPartitions(Seq.empty[Expression], sparkSession, carbonTable))
}
} else {
LOGGER.info(s"Can not do clean files operation for the MV: ${carbonTable.getTableName}")
Clean files for MV is not supported?
Changed the code.
please handle based on isInternalCleanCall
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5175/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3413/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5182/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3420/
// if insert overwrite in progress, do not allow delete segment
if (SegmentStatusManager.isOverwriteInProgressInTable(carbonTable)) {
// if insert overwrite in progress and table not a MV, do not allow delete segment
if (!carbonTable.isMV && SegmentStatusManager.isOverwriteInProgressInTable(carbonTable)) {
Not here. Handle it at the place where we call clean files for the MV when clean files is called for the main table. Otherwise, if the user calls clean files on the MV table while an insert overwrite is happening concurrently, no exception is thrown now, which is out of sync with the main table's clean files behavior.
please handle based on isInternalCleanCall
override def processData(sparkSession: SparkSession): Seq[Row] = {
Checker.validateTableExists(databaseNameOp, tableName, sparkSession)
val carbonTable = CarbonEnv.getCarbonTable(databaseNameOp, tableName)(sparkSession)
setAuditTable(carbonTable)
// if insert overwrite in progress, do not allow delete segment
if (SegmentStatusManager.isOverwriteInProgressInTable(carbonTable)) {
// if insert overwrite in progress and table not a MV, do not allow delete segment
Suggested change:
- // if insert overwrite in progress and table not a MV, do not allow delete segment
+ // if insert overwrite in progress and table is not a MV, do not allow delete segment
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3434/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5194/
LGTM
Why is this PR needed?
While getting stale segments, we list files and take all of them; if there is any folder or file other than the .segment files, it leads to further issues while copying data to the trash folder.
When AbstractDFSCarbonFile is created with a path and HadoopConf instead of a fileStatus, and the file does not exist, listDirs returns an empty result (since fileStatus is empty) and getAbsolutePath throws a "file does not exist" exception.
Clean files is not allowed while a concurrent insert overwrite is in progress. Since concurrent loading to an MV is by default an insert overwrite operation, a clean files operation on that MV would fail and throw an exception.
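The second issue above (listing a path whose file status is unavailable) can be illustrated with a `java.io.File` analogue. This is a hedged sketch, not the actual AbstractDFSCarbonFile code; the point is that a missing path should yield an empty listing rather than a "file does not exist" exception:

```java
import java.io.File;

// Sketch: guard directory listing when the underlying file may not
// exist, so a missing file status produces an empty result instead of
// an exception propagating to the caller. Names are illustrative.
public class SafeListing {

    public static String[] listDirsSafely(File file) {
        if (!file.exists() || !file.isDirectory()) {
            return new String[0]; // empty result, no exception
        }
        String[] names = file.list();
        return names == null ? new String[0] : names;
    }

    public static void main(String[] args) {
        // A nonexistent path returns an empty array rather than throwing.
        System.out.println(listDirsSafely(new File("/no/such/path")).length);
    }
}
```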
What changes were proposed in this PR?
Does this PR introduce any user interface change?
Is any new testcase added?