Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success #3994

Closed
wants to merge 1 commit into from

Conversation

ajantha-bhat
Copy link
Member

Why is this PR needed?

For compaction, we don't register in-progress segment. so, when unable to get a table status lock. compaction can fail. That time compaction partial segment needs to be cleaned. If the partial segment is failed to clean up due to unable to get lock or IO issues. When the user retries the compaction. carbon uses the same segment id. so while writing the segment file for new compaction. list only the files mapping to the current compaction, not all the files which contain stale files.

What changes were proposed in this PR?

While writing the segment file, consider index files belongs to the current load only in the segment folder.

Does this PR introduce any user interface change?

  • No

Is any new testcase added?

  • No [As it happens in concurrent scenario randomly, manually verified]

@ajantha-bhat
Copy link
Member Author

@QiangCai : please check this.

@CarbonDataQA1
Copy link

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2876/

@CarbonDataQA1
Copy link

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4632/

@ajantha-bhat
Copy link
Member Author

retest this please

@CarbonDataQA1
Copy link

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4640/

@CarbonDataQA1
Copy link

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2884/

@@ -398,27 +398,29 @@ public static void mergeIndexAndWriteSegmentFile(CarbonTable carbonTable, String
* @throws IOException
*/
public static String writeSegmentFile(CarbonTable carbonTable, String segmentId, String UUID,
Copy link
Contributor

@QiangCai QiangCai Oct 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the update, it will have more than one timestamp, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, Let me check and modify

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have pushed now. waiting for the build

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QiangCai : I have thought about, only non update scenario I can handle this issue. For update there is no easy way currently to find out which is stale and which is not. One way for update is to read old segment file and add files that timestamp greater than old segment file content + old segment file content.

@CarbonDataQA1
Copy link

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2903/

@CarbonDataQA1
Copy link

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4659/

@marchpure
Copy link
Contributor

Hi, Ajantha. I have fixed this in #3999 , please have a check.

@ajantha-bhat
Copy link
Member Author

Hi, Ajantha. I have fixed this in #3999 , please have a check.

@marchpure: I think your PR is not handling all the scenarios, so update scenario and test case will fail (even mine also)

@marchpure
Copy link
Contributor

Hi, Ajantha. I have fixed this in #3999 , please have a check.

@marchpure: I think your PR is not handling all the scenarios, so update scenario and test case will fail (even mine also)

yes. update test case all passes
i am handing other 10 test failures

@ajantha-bhat
Copy link
Member Author

@marchpure : If you dont assign UUID for all update case or compaction after update case, all testcases will pass. But the original data mismatch will still happen for segment that has stale files and went for update and compaction (that scenario you can test)

@marchpure
Copy link
Contributor

marchpure commented Oct 27, 2020

@marchpure : If you dont assign UUID for all update case or compaction after update case, all testcases will pass. But the original data mismatch will still happen for segment that has stale files and went for update and compaction (that scenario you can test)

yes. I have test it. in pr3999, update will generate new segment like merge into. which can void recreate segment file.

@ajantha-bhat
Copy link
Member Author

@marchpure : yes, If update writes into new segment, It can fix the issue. If you handle this in your PR then I will close mine.

@ajantha-bhat
Copy link
Member Author

handled in #3999 , I will close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants