Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse… #3977

Closed
wants to merge 1 commit into from

Conversation

marchpure
Copy link
Contributor

@marchpure marchpure commented Oct 12, 2020

…rt stage

Why is this PR needed?

ISSUE1: In the insertstage flow, there is a empty file with suffix '.loading' to mark the stage in the status of 'in processing'. We update the modifiedtime of '.loading' file for monitoring the insertstage start time, which can be used for calculate TIMEOUT, help to retry and recovery.
Before, we use setModifiedTime function to update the modifiedtime, which has a serious bug.
For S3 file, setModifiedTime operation do not take effect. leading to the incorrect inserstage starttime of 'loading' file.

ISSUE2: For now, Insertstage non-parttion table will not merge index files, which will degrade the query performance heavily.

What changes were proposed in this PR?

  1. Update the modifiedtime of loading files based on recreating files.
  2. Trigger indexfiles merging after insertstage non-parttiontable.

Does this PR introduce any user interface change?

  • No

Is any new testcase added?

  • No

@marchpure
Copy link
Contributor Author

retest this please

@CarbonDataQA1
Copy link

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2610/

@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4360/

@marchpure
Copy link
Contributor Author

retest this please

@Indhumathi27
Copy link
Contributor

@marchpure If setLastModifiedTime operation do not take effect on S3, in other places also, we need to check and update

// Try to recreate loading files if the loading file exists
// or create loading files directly if the loading file doesn't exist
// set isFailed to be false when (delete and) createfile success
var isFailed = if (stageLoadingFile.exists()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isFailed will always be false. can remove it and update the comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if IOException happend in stageLoadingFile.createNewFile(), createNewFile() will return FALE, make isFailed to be TRUE

@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4394/

@CarbonDataQA1
Copy link

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2642/

@CarbonDataQA1
Copy link

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4398/

@CarbonDataQA1
Copy link

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2647/

@marchpure
Copy link
Contributor Author

retest this please

// Try to recreate loading files if the loading file exists
// or create loading files directly if the loading file doesn't exist
// set isFailed to be false when (delete and) createfile success
var isFailed = if (stageLoadingFile.exists()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var isFailed = if (stageLoadingFile.exists()) {
val isFailed = if (stageLoadingFile.exists()) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified code according to your suggestion

FileFactory.createNewFile(
this.schemaIndexFilePath,
new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.ALL));
if (FileFactory.isFileExist(this.schemaIndexFilePath)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (FileFactory.isFileExist(this.schemaIndexFilePath)) {
CarbonFile schemaIndexFile = FileFactory.getCarbonFile(this.schemaIndexFilePath);
if (schemaIndexFile.exists()) {
schemaIndexFile.delete();
}
schemaIndexFile.createNewFile(new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.ALL));
this.lastModifiedTime = schemaIndexFile.getLastModifiedTime();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified code according to your suggestion

@CarbonDataQA1
Copy link

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4405/

@CarbonDataQA1
Copy link

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2652/

@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4409/

@CarbonDataQA1
Copy link

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2655/

@CarbonDataQA1
Copy link

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4413/

@CarbonDataQA1
Copy link

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2659/

@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4417/

@CarbonDataQA1
Copy link

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2663/

@marchpure
Copy link
Contributor Author

retest this please

@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4419/

@CarbonDataQA1
Copy link

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2665/

@marchpure
Copy link
Contributor Author

retest this please

@CarbonDataQA1
Copy link

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2791/

@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4545/

@CarbonDataQA1
Copy link

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4553/

@CarbonDataQA1
Copy link

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2802/

@CarbonDataQA1
Copy link

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2812/

@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4563/

@marchpure
Copy link
Contributor Author

retest this please

@CarbonDataQA1
Copy link

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4575/

@CarbonDataQA1
Copy link

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2826/

@QiangCai
Copy link
Contributor

retest this please

@CarbonDataQA1
Copy link

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2894/

@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4650/

@Indhumathi27
Copy link
Contributor

@marchpure please update the PR description for MergeIndex changes also

@@ -43,11 +43,6 @@ class MergeIndexEventListener extends OperationEventListener with Logging {
override def onEvent(event: Event, operationContext: OperationContext): Unit = {
event match {
case preStatusUpdateEvent: LoadTablePreStatusUpdateEvent =>
// skip merge index in case of insert stage flow
if (null != operationContext.getProperty(CarbonCommonConstants.IS_INSERT_STAGE) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This property also has to be removed from CarbonCommonConstants, as no more usage will be found

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified code according to your suggestion

…rt stage

Why is this PR needed?
In the insertstage flow, there is a empty file with suffix '.loading' to mark the stage in the status of 'in processing'. We update the modifiedtime of '.loading' file for monitoring the insertstage start time, which can be used for calculate TIMEOUT, help to retry and recovery.
Before, we use setModifiedTime function to update the modifiedtime, which has a serious bug.
For S3 file, setModifiedTime operation do not take effect. leading to the incorrect inserstage starttime of 'loading' file.

What changes were proposed in this PR?
Update the modifiedtime of loading files based on recreating files.

Does this PR introduce any user interface change?
No

Is any new testcase added?
No
@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4670/

@CarbonDataQA1
Copy link

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2914/

@marchpure
Copy link
Contributor Author

retest this please

@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4676/

@CarbonDataQA1
Copy link

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2919/

@marchpure
Copy link
Contributor Author

@marchpure please update the PR description for MergeIndex changes also

I have updated the PR description for MergeIndex changes

@Indhumathi27
Copy link
Contributor

retest this please

@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4692/

@CarbonDataQA1
Copy link

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2935/

@Indhumathi27
Copy link
Contributor

LGTM

@asfgit asfgit closed this in f5118f9 Oct 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants