[CARBONDATA-3273] [CARBONDATA-3274] Fix for SORT_SCOPE in CarbonLoadDataCommand #3103

Closed
wants to merge 1 commit into from


@NamanRastogi NamanRastogi commented Jan 25, 2019

Problem 1: With no SORT_COLUMNS, data loading used SORT_SCOPE=LOCAL_SORT instead of NO_SORT.
Solution: Added a check for SORT_COLUMNS in CarbonLoadDataCommand.

Problem 2: On a table with SORT_COLUMNS set and SORT_SCOPE not specified, data loading did not consider CARBON.OPTIONS.SORT.SCOPE when resolving SORT_SCOPE.
Solution: Added a check of CARBON.OPTIONS.SORT.SCOPE while loading.

  • Any interfaces changed? -> No
  • Any backward compatibility impacted? -> No
  • Document update required? -> No
  • Testing done -> Yes
  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2441/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10699/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2669/

// If there are Sort Columns given for the table and Sort Scope is not specified,
// we will take it as whichever sort scope given or LOCAL_SORT as default
if (tableProperties.get(CarbonCommonConstants.SORT_COLUMNS) == null ||
tableProperties.get(CarbonCommonConstants.SORT_COLUMNS).equals("")) {
Contributor

Suggested change
tableProperties.get(CarbonCommonConstants.SORT_COLUMNS).equals("")) {
tableProperties.get(CarbonCommonConstants.SORT_COLUMNS).trim.equals("")) {

Contributor Author

A string of only spaces will be caught before reaching this step, so there is no need to add trim.

Contributor

Use StringUtils.isBlank() to avoid retrieving the member twice.

Contributor Author

Done
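
For illustration, a minimal sketch of the single-lookup blank check discussed above. It assumes `tableProperties` behaves like a `java.util.Map` whose `get` returns null for a missing key. The `isBlank` helper is a hypothetical stand-in written with the standard library, not the commons-lang `StringUtils.isBlank` itself, and the key `sort_columns` is a placeholder.

```scala
import java.util.{Map => JMap}

object BlankCheckSketch {
  // Hypothetical stand-in for StringUtils.isBlank:
  // true for null, "" or whitespace-only input.
  def isBlank(s: String): Boolean = s == null || s.trim.isEmpty

  // One lookup instead of two: the value is fetched once and tested once.
  def sortColumnsUnset(tableProperties: JMap[String, String]): Boolean =
    isBlank(tableProperties.get("sort_columns")) // placeholder key, an assumption
}
```

With this shape the null case, the empty case and the spaces-only case are all handled by the same call, which is why a separate trim is not needed at the call site.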

optionsFinal.put(CarbonCommonConstants.SORT_SCOPE, "NO_SORT")
}
else if (tableProperties.get(CarbonCommonConstants.SORT_SCOPE) == null ||
tableProperties.get(CarbonCommonConstants.SORT_SCOPE).equals("")) {
Contributor

Suggested change
tableProperties.get(CarbonCommonConstants.SORT_SCOPE).equals("")) {
tableProperties.get(CarbonCommonConstants.SORT_SCOPE).trim.equals("")) {

Contributor Author

A string of only spaces will be caught before reaching this step, so there is no need to add trim.

@NamanRastogi NamanRastogi changed the title [CARBONDATA-3273] Fix SORT_SCOPE with no SORT_COLUMNS [CARBONDATA-3273] Fix for SORT_SCOPE in CarbonLoadDataCommand Jan 25, 2019
@NamanRastogi NamanRastogi changed the title [CARBONDATA-3273] Fix for SORT_SCOPE in CarbonLoadDataCommand [CARBONDATA-3273] [CARBONDATA-3274] Fix for SORT_SCOPE in CarbonLoadDataCommand Jan 25, 2019
@NamanRastogi NamanRastogi force-pushed the load_sort_scope_fix branch 2 times, most recently from a36a71c to 8b73763 Compare January 25, 2019 12:26
table.getDatabaseName + "." + table.getTableName,
carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE,
carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE,
"LOCAL_SORT"))))
Contributor

Why was LOAD_SORT_SCOPE_DEFAULT changed to "NO_SORT" in CarbonCommonConstants while the default value here is "LOCAL_SORT"?
The DDL doc also describes "NO_SORT" as the default value.

Contributor Author

If SORT_COLUMNS are set and no SORT_SCOPE is found, LOCAL_SORT is used.
If no SORT_COLUMNS are set, SORT_SCOPE is taken as NO_SORT.

The default behaviour is NO_SORT if SORT_COLUMNS are not provided, and LOCAL_SORT if SORT_COLUMNS are provided explicitly in the CREATE TABLE command. If nothing is specified in the CREATE TABLE command, SORT_SCOPE is taken as NO_SORT and SORT_COLUMNS as none.
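
The rule described above can be sketched as a small function. This is only an illustration under stated assumptions: `tableSortScope` is a hypothetical name, the inputs are plain nullable strings standing in for the table properties, and the property lookups that may override LOCAL_SORT are left out.

```scala
object TableSortScopeSketch {
  // A property counts as "set" when it is neither null nor blank.
  private def isSet(s: String): Boolean = s != null && s.trim.nonEmpty

  // sortColumns / sortScope: table property values, null when absent.
  def tableSortScope(sortColumns: String, sortScope: String): String =
    if (!isSet(sortColumns)) "NO_SORT"        // no SORT_COLUMNS: NO_SORT
    else if (!isSet(sortScope)) "LOCAL_SORT"  // SORT_COLUMNS set, no SORT_SCOPE found
    else sortScope                            // SORT_SCOPE given on the table
}
```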

Contributor

please avoid using the plain string here

Contributor

About the behavior you mentioned above, is that exactly what the doc describes? If not, please update the doc in this PR so that the change is complete.

Contributor

Besides, please check the comments of this method and make sure they are still correct after your modification.

Contributor Author

@xuchuanyin
Changed plain string to enum.
Updated the doc as well. It now includes the priority of Sort Scope checking.
Comments above are correct.

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2445/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2448/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2675/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10705/

if (tableProperties.get(CarbonCommonConstants.SORT_COLUMNS) == null ||
tableProperties.get(CarbonCommonConstants.SORT_COLUMNS).equals("")) {
// If tableProperties.SORT_COLUMNS is null
optionsFinal.put(CarbonCommonConstants.SORT_SCOPE, "NO_SORT")
Contributor

Instead of plain string 'NO_SORT', do we have Constant Variable or enum for this?

Contributor Author

Done.
Used SortScopeOptions.SortScope.LOCAL_SORT.name for string "LOCAL_SORT"
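
As a rough illustration of why an enum name is preferable to a string literal, the sketch below models the idea with a Scala `Enumeration`. The `SortScope` object here is a hypothetical stand-in, not the project's `SortScopeOptions.SortScope`, and only the two values mentioned in this thread are modelled.

```scala
object SortScopeNameSketch {
  // Hypothetical stand-in for the project's sort-scope enum.
  object SortScope extends Enumeration {
    val NO_SORT: Value = Value("NO_SORT")
    val LOCAL_SORT: Value = Value("LOCAL_SORT")
  }

  // The option string is derived from the enum value, so a misspelt
  // name fails at compile time instead of silently at load time.
  val noSortName: String = SortScope.NO_SORT.toString
  val localSortName: String = SortScope.LOCAL_SORT.toString

  // Checks whether a raw string matches one of the modelled values.
  def isKnown(raw: String): Boolean = SortScope.values.exists(_.toString == raw)
}
```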

// If tableProperties.SORT_COLUMNS is null
optionsFinal.put(CarbonCommonConstants.SORT_SCOPE, "NO_SORT")
}
else if (tableProperties.get(CarbonCommonConstants.SORT_SCOPE) == null ||
Contributor

Move to the previous line.
Please be careful with the code style.

Contributor Author

Done.

}
else if (tableProperties.get(CarbonCommonConstants.SORT_SCOPE) == null ||
tableProperties.get(CarbonCommonConstants.SORT_SCOPE).equals("")) {
// If tableProperties.SORT_COLUMNS is not null
Contributor

please optimize the comment here:
in case SORT_COLUMNS is not set and SORT_SCOPE is set

Contributor Author

Comment

    else if (StringUtils.isBlank(tableProperties.get(CarbonCommonConstants.SORT_SCOPE))) {
      // If tableProperties.SORT_COLUMNS is not null
      // and tableProperties.SORT_SCOPE is null
      optionsFinal.put(CarbonCommonConstants.SORT_SCOPE,

is already there.

Contributor

By saying ‘set’ or 'not set', it refers to the business logic while ‘null’ or 'not null' does not refer to that logic directly.

@NamanRastogi
Contributor Author

NamanRastogi commented Jan 28, 2019

@xuchuanyin I have updated the Doc as well. Added Priority for Sort_Scope checking.

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2472/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10730/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2700/

@kumarvishal09
Contributor

retest this please

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2479/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10737/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2708/

// If tableProperties.SORT_COLUMNS is not null
// and tableProperties.SORT_SCOPE is null
optionsFinal.put(CarbonCommonConstants.SORT_SCOPE,
carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_TABLE_LOAD_SORT_SCOPE +
Contributor

I think the first priority should be load options. Please check it.

Contributor Author

Done.
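
A minimal sketch of a "first value that is set wins" lookup, with the load option checked first as suggested above. The remaining order follows the nested getProperty calls quoted in this thread (the table-level load property, then CARBON_OPTIONS_SORT_SCOPE, then LOAD_SORT_SCOPE, then LOCAL_SORT), which apply when the table has SORT_COLUMNS but no SORT_SCOPE. The parameter names are hypothetical and the values are passed in directly rather than read from real property keys, so this is an illustration of the ordering only.

```scala
object SortScopePrioritySketch {
  // Returns the first candidate that is neither null nor blank, else the fallback.
  def firstSet(candidates: Seq[String], fallback: String): String =
    candidates.find(c => c != null && c.trim.nonEmpty).getOrElse(fallback)

  // Highest priority first; all parameter names are assumptions for this sketch.
  def resolve(loadOption: String,          // sort scope given in the load options
              tableLoadProperty: String,   // CARBON_TABLE_LOAD_SORT_SCOPE + db.table
              optionsProperty: String,     // CARBON_OPTIONS_SORT_SCOPE
              loadProperty: String         // LOAD_SORT_SCOPE
             ): String =
    firstSet(Seq(loadOption, tableLoadProperty, optionsProperty, loadProperty),
      "LOCAL_SORT")
}
```

Expressing the chain as an ordered list keeps the priority visible in one place, which is harder to see in nested default arguments.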

1. With SORT_COLUMNS=null, loading data was resulting in SORT_SCOPE=LOCAL_SORT.

2. With SORT_COLUMNS!=null and SORT_SCOPE not provided, loading data was not checking CARBON.OPTIONS.SORT.SCOPE.
@ravipesala
Contributor

LGTM

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10746/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2718/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2490/

@xuchuanyin
Contributor

retest this please

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2494/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10752/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2723/

@asfgit asfgit closed this in b82cd1b Jan 29, 2019
asfgit pushed a commit that referenced this pull request Jan 30, 2019
…ataCommand

Problem1: With no SORT_COLUMNS, loading data was taking SORT_SCOPE=LOCAL_SORT instead of NO_SORT.
Solution: Added a check for SORT_COLUMNS in CarbonLoadDataCommand

Problem2: On table with some SORT_COLUMNS and SORT_SCOPE not specified, SORT_SCOPE was not considering CARBON.OPTIONS.SORT.SCOPE for SORT_SCOPE.
Solution: Added checking of CARBON.OPTIONS.SORT.SCOPE while loading.

This closes #3103
qiuchenjian pushed a commit to qiuchenjian/carbondata that referenced this pull request Jun 14, 2019
This closes apache#3103
6 participants