[CARBONDATA-4040] Fix data mismatch in case of compaction failure and retry success #3994
Conversation
@QiangCai : please check this.
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2876/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4632/
retest this please
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4640/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2884/
@@ -398,27 +398,29 @@ public static void mergeIndexAndWriteSegmentFile(CarbonTable carbonTable, String
 * @throws IOException
 */
public static String writeSegmentFile(CarbonTable carbonTable, String segmentId, String UUID,
for the update, it will have more than one timestamp, right?
Agree, let me check and modify.
I have pushed now. Waiting for the build.
@QiangCai : I have thought about it; I can handle this issue only for the non-update scenario. For update, there is currently no easy way to tell which files are stale and which are not. One way for update is to read the old segment file and take its content, plus the files whose timestamp is greater than the old segment file's timestamp.
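The update-case idea described above can be sketched as follows. This is a hedged illustration, not the actual CarbonData code: the method name `rebuildFileList` and its parameters are hypothetical, standing in for "old segment file content plus any file in the folder newer than the old segment file".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed update handling: start from the files
// recorded in the old segment file, then add only the files in the segment
// folder whose timestamp is newer than the old segment file's timestamp.
// Files older than the old segment file but not recorded in it are stale.
public class UpdateSegmentFileSketch {

  static Set<String> rebuildFileList(Set<String> oldSegmentFileContent,
      Map<String, Long> filesInFolderWithTimestamp, long oldSegmentFileTimestamp) {
    Set<String> result = new LinkedHashSet<>(oldSegmentFileContent);
    filesInFolderWithTimestamp.forEach((file, ts) -> {
      if (ts > oldSegmentFileTimestamp) {
        // Newer than the old segment file, so it was written by the current update.
        result.add(file);
      }
    });
    return result;
  }

  public static void main(String[] args) {
    Set<String> old = new LinkedHashSet<>(Arrays.asList("a.carbonindex"));
    Map<String, Long> folder = new LinkedHashMap<>();
    folder.put("a.carbonindex", 100L);      // already tracked by the old segment file
    folder.put("stale.carbonindex", 150L);  // stale leftover, older than the segment file
    folder.put("new.carbonindex", 300L);    // written by the current update
    // Prints [a.carbonindex, new.carbonindex]
    System.out.println(rebuildFileList(old, folder, 200L));
  }
}
```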
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2903/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4659/
Hi, Ajantha. I have fixed this in #3999, please take a look.
@marchpure: I think your PR is not handling all the scenarios, so the update scenario and test case will fail (mine does too).
Yes, all the update test cases pass.
@marchpure : If you don't assign a UUID for all update cases or compaction-after-update cases, all test cases will pass. But the original data mismatch will still happen for a segment that has stale files and then goes through update and compaction (you can test that scenario).
Yes, I have tested it. In PR #3999, update will generate a new segment, like merge into, which can avoid recreating the segment file.
@marchpure : Yes, if update writes into a new segment, it can fix the issue. If you handle this in your PR, then I will close mine.
Handled in #3999, I will close this.
Why is this PR needed?
For compaction, we don't register the in-progress segment, so compaction can fail when it is unable to acquire the table status lock. In that case the partial compaction segment needs to be cleaned up. If cleaning up the partial segment also fails (unable to acquire the lock, or IO issues) and the user retries the compaction, Carbon reuses the same segment id. So, while writing the segment file for the new compaction, we must list only the files belonging to the current compaction, not all the files in the folder, which would include the stale files.
What changes were proposed in this PR?
While writing the segment file, consider only the index files in the segment folder that belong to the current load.
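The fix described above can be sketched as a filename filter. This is a minimal illustration, not the actual patch: the method `filterCurrentLoadIndexFiles` is hypothetical, and it assumes (as CarbonData index file names do) that the load timestamp is embedded in the file name, so stale index files from a previously failed compaction attempt carry a different timestamp.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch of the proposed fix: when writing the segment file
// after a compaction retry, keep only the index files whose name carries
// the current load's timestamp, skipping stale files left behind by a
// previous failed attempt on the same segment id.
public class SegmentIndexFilter {

  static List<String> filterCurrentLoadIndexFiles(List<String> filesInSegmentFolder,
      String currentLoadTimestamp) {
    return filesInSegmentFolder.stream()
        .filter(name -> name.endsWith(".carbonindex")
            || name.endsWith(".carbonindexmerge"))
        .filter(name -> name.contains(currentLoadTimestamp))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList(
        "part-0-2_batchno0-0-2-1600000000000.carbonindex",  // stale, from the failed attempt
        "part-0-2_batchno0-0-2-1600000099999.carbonindex",  // written by the current retry
        "part-0-2_batchno0-0-2-1600000099999.carbondata");  // data file, not an index file
    // Prints [part-0-2_batchno0-0-2-1600000099999.carbonindex]
    System.out.println(filterCurrentLoadIndexFiles(files, "1600000099999"));
  }
}
```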
Does this PR introduce any user interface change?
Is any new test case added?