Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CARBONDATA-3820] CDC update as new segment should use target table sort_scope #3901

Closed
wants to merge 1 commit into from

Conversation

ajantha-bhat
Copy link
Member

@ajantha-bhat ajantha-bhat commented Aug 26, 2020

Why is this PR needed?

Originally when CDC is developed, it was using target table sort_scope for new segment added but
#3764 has added nosort (this is wrong code, but no functional impact as it was not changing new segment load to no_sort)
#3856 has changed it to no_sort (creates a functional impact by changing target table new segment to use to no_sort)

What changes were proposed in this PR?

CDC update as new segment should use target table sort_scope

Does this PR introduce any user interface change?

  • No

Is any new testcase added?

  • No (verified manually the flows)

@ajantha-bhat
Copy link
Member Author

@QiangCai , @ravipesala @marchpure @akashrn5 : please check this

@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3875/

@CarbonDataQA1
Copy link

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2134/

@Zhangshunyu
Copy link
Contributor

set 'no_sort' for cdc is for better load performance during merge, but i think we should keep same as target table.

@ajantha-bhat
Copy link
Member Author

ajantha-bhat commented Aug 28, 2020

set 'no_sort' for cdc is for better load performance during merge, but i think we should keep same as target table.

@Zhangshunyu : to have a faster CDC merge, target table itself can be created with no_sort. Now if target table is global sort, old segments are sorted and new CDC segmets are not. so, I fixed it.

@Zhangshunyu
Copy link
Contributor

@ajantha-bhat yes, agree with this pr's change, to keep same as target.

@akashrn5
Copy link
Contributor

@ajantha-bhat if the target table is no sort and since we are inserting new segment as a separate segment during merge, we can sort this segment and write which will help in query, instead of blindly going with target table sort?

@ajantha-bhat
Copy link
Member Author

@ajantha-bhat if the target table is no sort and since we are inserting new segment as a separate segment during merge, we can sort this segment and write which will help in query, instead of blindly going with target table sort?

It is not blindly. The user has decided whether his table needs to be sorted or not based on his requirement (no_sort if want good load speed, global_sort if want good query speed), so it is better to have all segment follow user decision.

@QiangCai
Copy link
Contributor

QiangCai commented Sep 1, 2020

LGTM

@asfgit asfgit closed this in 5e4f16c Sep 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants