
[CARBONDATA-3334] fixed multiple segment file issue for partition #3167

Closed
wants to merge 1 commit into from


kunal642
Contributor

Problem:
During a partition load, the FactTimestamp in the load model is reset to the current timestamp while the merge index files are being written. As a result, a second segment file containing the merge index entry is written.

Solution:
Set a new timestamp only if the FactTimestamp in the load model is 0L (meaning no timestamp has been set yet).
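A minimal sketch of the guard described above, in Java. It assumes a simplified, hypothetical `LoadModel` class whose `factTimeStamp` field defaults to 0L; the class, its accessors, and the `ensureFactTimestamp` helper are illustrative only and are not CarbonData's actual code.

```java
// Sketch of the "only assign a timestamp if none is set" guard.
class FactTimestampGuard {
    // Assign a fresh timestamp only when the model still holds 0L ("not set"),
    // so a later step (e.g. merge index) reuses the timestamp chosen by the load.
    static void ensureFactTimestamp(LoadModel model, long now) {
        if (model.getFactTimeStamp() == 0L) {
            model.setFactTimeStamp(now);
        }
    }

    public static void main(String[] args) {
        LoadModel model = new LoadModel();
        ensureFactTimestamp(model, 1000L); // load: nothing set yet, so 1000 is used
        ensureFactTimestamp(model, 2000L); // merge index: existing 1000 is kept
        System.out.println(model.getFactTimeStamp()); // prints 1000
    }
}

// Hypothetical, simplified stand-in for CarbonData's load model (assumption).
final class LoadModel {
    private long factTimeStamp; // 0L means "not set yet"

    long getFactTimeStamp() { return factTimeStamp; }

    void setFactTimeStamp(long ts) { this.factTimeStamp = ts; }
}
```

The second call models the merge index step: because a timestamp is already set, it is reused rather than replaced, so no second segment file name is produced.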

Be sure to complete all of the following checklist items to help us incorporate
your contribution quickly and easily:

  • Any interfaces changed?

  • Any backward compatibility impacted?

  • Document update required?

  • Testing done
    Please provide details on
    - Whether new unit test cases have been added or why no new tests are required?
    - How it is tested? Please attach test report.
    - Is it a performance related change? Please attach the performance test report.
    - Any additional information to help reviewers in testing this change.

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@CarbonDataQA

Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11083/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2823/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2824/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11084/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3056/

@kunal642
Contributor Author

retest this please

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3057/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2825/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11085/

@xuchuanyin
Contributor

xuchuanyin commented Apr 4, 2019

I feel this modification is not elegant...
It still confuses me, even after reading the description of the PR...

It is the MergeIndexWriter for partition loading that causes the problem, but you modified the loadModel instead. The writer and the loadModel now seem to negotiate implicitly, which makes the code complicated, especially since the loadModel is set up at the beginning of loading while the MergeIndexWriter runs near the very end.

@kunal642
Contributor Author

kunal642 commented Apr 4, 2019

Hi xuchuanyin,
The problem was that, before launching the job to merge the index files, we were changing the fact timestamp, due to which a new segment file was written with the merge index details.
Example:
the load writes 0_t1.segment,
while the merge index step writes 0_t2.segment.

My fix is that if the load has already identified a timestamp for the segment file, then the merge index step should use the same one.

I understand the check does not look good, but the other solution would be to remove the 0_t1.segment file in the merge index flow. I don't think that approach would be clean either.
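To make the example above concrete, here is a small Java sketch of why the two flows end up with different segment files. It assumes the file name is built as `<segmentId>_<timestamp>.segment`, following the `0_t1.segment` pattern in the comment; `segmentFileName` and `filesWritten` are hypothetical helpers, not CarbonData APIs.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class SegmentFileNames {
    // Hypothetical helper: builds "<segmentId>_<timestamp>.segment" (assumed naming).
    static String segmentFileName(String segmentId, long timestamp) {
        return segmentId + "_" + timestamp + ".segment";
    }

    // Collects the distinct segment files written by the load and the merge index step.
    // resetTimestamp = true models the bug: merge index picks a new timestamp.
    static Set<String> filesWritten(long loadTs, long mergeTs, boolean resetTimestamp) {
        Set<String> files = new LinkedHashSet<>();
        files.add(segmentFileName("0", loadTs));     // written by the load
        long ts = resetTimestamp ? mergeTs : loadTs; // timestamp used by merge index
        files.add(segmentFileName("0", ts));         // written by the merge index step
        return files;
    }

    public static void main(String[] args) {
        System.out.println(filesWritten(1L, 2L, true));  // [0_1.segment, 0_2.segment]
        System.out.println(filesWritten(1L, 2L, false)); // [0_1.segment]
    }
}
```

With the timestamp reused, both steps target the same file name, so only one segment file exists for segment 0.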

@kunal642
Contributor Author

retest this please

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2873/

@CarbonDataQA

Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11133/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3104/

@ravipesala
Contributor

retest this please

@ravipesala
Contributor

LGTM

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2876/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3107/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11136/

@kunal642
Contributor Author

@ravipesala The build passed. Please merge.

asfgit closed this in 32af97e on Apr 17, 2019
asfgit pushed a commit that referenced this pull request May 16, 2019
This closes #3167
qiuchenjian pushed a commit to qiuchenjian/carbondata that referenced this pull request Jun 14, 2019
This closes apache#3167

5 participants