[CARBONDATA-3334] fixed multiple segment file issue for partition #3167

kunal642 · 2019-03-28T09:06:30Z

Problem:
During a partition load, the FactTimestamp in the load model is changed to the current timestamp while the merge index files are being written, so a new segment file containing the mergeindex entry is written.

Solution:
Set a new timestamp only if FactTimestamp in the load model is 0L (meaning nothing has been set).
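
For readers who want to see the shape of the check, here is a minimal sketch of the guard described in the Solution. It is not the actual diff: `LoadModelSketch`, `FactTimestampGuard`, `ensureFactTimeStamp` and the injectable `now` clock are hypothetical names used only for illustration, and the real load model class has many more fields.

```scala
// Hypothetical stand-in for the load model; only the field discussed here is modelled.
final class LoadModelSketch(var factTimeStamp: Long = 0L)

object FactTimestampGuard {
  // Assign a fresh timestamp only when none has been set yet (0L is treated as "unset").
  // The clock is passed in so that the behaviour can be checked deterministically.
  def ensureFactTimeStamp(
      model: LoadModelSketch,
      now: () => Long = () => System.currentTimeMillis()): Long = {
    if (model.factTimeStamp == 0L) {
      model.factTimeStamp = now()
    }
    model.factTimeStamp
  }
}
```

With a guard like this, a second call made later in the flow (for example just before the merge index step) returns the timestamp chosen by the load instead of generating a new one.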


CarbonDataQA · 2019-03-28T09:11:55Z

Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11083/

CarbonDataQA · 2019-03-28T09:19:52Z

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2823/

...src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala

CarbonDataQA · 2019-03-28T09:54:53Z

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2824/

CarbonDataQA · 2019-03-28T10:53:53Z

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11084/

CarbonDataQA · 2019-03-28T10:57:59Z

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3056/

kunal642 · 2019-03-28T14:20:47Z

retest this please

CarbonDataQA · 2019-03-28T14:23:41Z

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3057/

CarbonDataQA · 2019-03-28T14:34:51Z

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2825/

CarbonDataQA · 2019-03-28T15:29:57Z

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11085/

xuchuanyin · 2019-04-04T12:03:04Z

I feel this modification is not elegant...
The modification still confuses me even after I've read the description of the PR...

It is the MergeIndexWriter for partition loading that causes the problem, but you modify the loadModel instead. It seems that the writer and the loadModel have some negotiation between them, which makes the code complicated, especially since the loadModel is used at the beginning of loading while the MergeIndexWriter runs almost at the end of loading.

kunal642 · 2019-04-04T12:28:55Z

Hi xuchuanyin,
The problem was that before launching the job to merge index files we change the fact timestamp, due to which a new segment file is written with the mergeindex details.
Example:
The load writes 0_t1.segment,
while the mergeindex writes 0_t2.segment.

My fix is that if the load has already identified a timestamp for the segment file, then mergeindex should use the same one.

I understand the check does not look good, but the other solution would be to remove the 0_t1.segment file in the merge index flow. I don't think that would be clean either.
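
The file-name example above can be sketched as follows. This is only an illustration: the `<segmentId>_<timestamp>.segment` pattern is inferred from the `0_t1.segment` / `0_t2.segment` example in this comment, and `SegmentFileNameSketch`, `segmentFileName` and `filesWritten` are hypothetical helpers, not CarbonData APIs.

```scala
object SegmentFileNameSketch {
  // Name pattern assumed from the example in this thread: <segmentId>_<timestamp>.segment
  def segmentFileName(segmentId: String, timestamp: Long): String =
    s"${segmentId}_${timestamp}.segment"

  // Distinct segment file names produced when the load and the merge index step
  // each write a segment file using the timestamp they were given.
  def filesWritten(segmentId: String, loadTs: Long, mergeTs: Long): Set[String] =
    Set(segmentFileName(segmentId, loadTs), segmentFileName(segmentId, mergeTs))
}
```

When the two timestamps differ, the set holds two file names for the same segment; when the merge index step reuses the load's timestamp, both steps resolve to a single name.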

kunal642 · 2019-04-15T04:52:42Z

retest this please

CarbonDataQA · 2019-04-15T05:05:24Z

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2873/

CarbonDataQA · 2019-04-15T06:21:06Z

Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11133/

CarbonDataQA · 2019-04-15T06:35:00Z

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3104/

ravipesala · 2019-04-15T07:32:19Z

retest this please

ravipesala · 2019-04-15T07:32:26Z

LGTM

CarbonDataQA · 2019-04-15T07:47:05Z

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2876/

CarbonDataQA · 2019-04-15T08:56:48Z

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3107/

CarbonDataQA · 2019-04-15T08:59:35Z

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11136/

kunal642 · 2019-04-16T07:55:05Z

@ravipesala the build passed, please merge

Problem: During partition load, while writing merge index files the FactTimestamp in load model is being changed to current timestamp due to which a new file with mergeindex entry is written. Solution: Set new timestamp if FactTimestamp in load model is 0L(meaning nothing is set). This closes #3167

qiuchenjian reviewed Mar 28, 2019

...src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala

fixed multiple segment file issue for partition

4584a23

kunal642 force-pushed the bug/CARBONDATA-3334 branch from 0700b14 to 4584a23 Compare March 28, 2019 09:42

asfgit closed this in 32af97e Apr 17, 2019
