
[CARBONDATA-3373] Optimize scenarios where SQL contains an IN clause with many values #3209

Closed
wants to merge 2 commits into from

Conversation

zhxiaoping commented May 7, 2019

When a SQL query contains an IN clause with many values and spark.sql.codegen.wholeStage is false, the query is slow.

The reason is that the CarbonScan row-level filter checks every row against the filter values with a linear scan, so the overall cost is O(n * m) for n rows and m filter values (roughly quadratic for large IN lists). We can replace the list with a HashSet to improve query performance.

SQL example: select * from xx where field in (1,2,3,4,5,6)
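A minimal sketch of the idea (illustrative Java, not the actual CarbonData code): building the filter values into a HashSet once makes each row's membership test O(1) instead of a linear scan over the whole IN list.

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class InFilterSketch {
  public static void main(String[] args) {
    Object[] filterKeys = {1, 2, 3, 4, 5, 6};
    Object rowValue = 4;

    // Old approach: linear scan over the filter values for every row, O(m) per row.
    boolean matchByScan = Arrays.asList(filterKeys).contains(rowValue);

    // New approach: build the set once, then each per-row lookup is O(1).
    Set<Object> filterKeysSet = new HashSet<>(Arrays.asList(filterKeys));
    boolean matchBySet = filterKeysSet.contains(rowValue);

    System.out.println(matchByScan + " " + matchBySet); // prints: true true
  }
}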
Be sure to complete all of the following checklist items to help us incorporate your contribution quickly and easily:

  • Any interfaces changed?

  • Any backward compatibility impacted?

  • Document update required?

  • Testing done
    Please provide details on
    - Whether new unit test cases have been added or why no new tests are required?
    - How it is tested? Please attach test report.
    - Is it a performance related change? Please attach the performance test report.
    - Any additional information to help reviewers in testing this change.

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

zzcclp (Contributor) commented May 7, 2019

add to whitelist

CarbonDataQA commented May 7, 2019

Can one of the admins verify this patch?

CarbonDataQA commented May 7, 2019

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11387/

CarbonDataQA commented May 7, 2019

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3322/

.getMeasureObjectBasedOnDataType(columnPage, startIndex,
msrType, msrColumnEvaluatorInfo.getMeasure());

if (filterValuesSet.contains(msrValue)) {

KanakaKumar (Contributor) commented May 8, 2019

HashSet/Map lookup depends on hashCode. I think this may not match in the case of decimal and float values. For example, comparing 123.000 and 123.0 with equals would work, but their hashCodes would be different.

I think the optimal approach would be to use a TreeMap with a per-data-type comparator [TreeMap(Comparator<? super K> comparator)], which does a binary search on the sorted filter values.

@ravipesala, @kumarvishal09, please correct me if it was intentional to use an Array for any specific reason.
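A minimal sketch of the suggested alternative (illustrative only; the real code would plug in CarbonData's per-data-type comparator): a TreeSet routes lookups through compareTo, so BigDecimals that differ only in scale still match, at O(log m) per lookup.

import java.math.BigDecimal;
import java.util.Comparator;
import java.util.TreeSet;

public class TreeSetFilterSketch {
  public static void main(String[] args) {
    TreeSet<BigDecimal> filterValues = new TreeSet<>(Comparator.naturalOrder());
    filterValues.add(new BigDecimal("123.000"));

    // contains() uses the comparator (compareTo), which ignores scale.
    System.out.println(filterValues.contains(new BigDecimal("123.0"))); // true
  }
}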

zhxiaoping (Author) commented May 8, 2019

I debugged the decimal and float cases. If a field's type is float, one of its values is 123.000, and the filter condition is 123.00, then at the point shown in the screenshot the field's value has been converted to 123.0 and the filter's value has also been converted to 123.0, so they are equal. The decimal case is the same.
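A small illustration of this point (a sketch under the assumption that both sides are converted to the same Double before the lookup): once converted, equals and hashCode agree, so the HashSet lookup matches.

import java.util.HashSet;
import java.util.Set;

public class FloatConversionSketch {
  public static void main(String[] args) {
    // Both literals convert to the same double value, 123.0.
    Double filterValue = Double.parseDouble("123.00");
    Double columnValue = Double.parseDouble("123.000");

    Set<Object> filterSet = new HashSet<>();
    filterSet.add(filterValue);
    System.out.println(filterSet.contains(columnValue)); // true
  }
}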

KanakaKumar (Contributor) commented May 14, 2019

OK, the Float and Double hashCode implementations are taken care of. But the Decimal data type does create a problem. Please refer to the sample code below.

import java.math.BigDecimal;
import java.util.HashSet;
import java.util.TreeSet;

BigDecimal bigDecimal = new BigDecimal("4743.00");
BigDecimal bigDecimal2 = new BigDecimal("4743.0");

System.out.println(bigDecimal.compareTo(bigDecimal2) == 0); // true: compareTo ignores scale

HashSet<BigDecimal> decimals = new HashSet<>();
decimals.add(bigDecimal);
decimals.add(bigDecimal2);

System.out.println(decimals); // Output: [4743.00, 4743.0] -- both stored, hashCodes differ
System.out.println(decimals.contains(new BigDecimal("4743")));      // false
System.out.println(decimals.contains(new BigDecimal("4743.0000"))); // false
System.out.println(decimals.contains(new BigDecimal("4743.00")));   // true
System.out.println(decimals.contains(new BigDecimal("4743.0")));    // true

TreeSet<BigDecimal> treeSet = new TreeSet<>();
treeSet.add(bigDecimal);
treeSet.add(bigDecimal2);

System.out.println(treeSet); // Output: [4743.00] -- second add is a compareTo duplicate
System.out.println(treeSet.contains(new BigDecimal("4743")));      // true
System.out.println(treeSet.contains(new BigDecimal("4743.0000"))); // true
System.out.println(treeSet.contains(new BigDecimal("4743.00")));   // true
System.out.println(treeSet.contains(new BigDecimal("4743.0")));    // true

So, please add a test case to validate this scenario also.

ravipesala (Contributor) commented May 16, 2019

I agree with @KanakaKumar. The current implementation is based on SerializableComparator, so it is better to use a TreeSet, which relies on compareTo, instead of hashCode and equals. BigDecimal behaves differently under equals and compareTo, so we should stick with a TreeSet to preserve the old behaviour rather than change it.

zhxiaoping (Author) commented May 19, 2019

If you review this code without its context, it is exactly as you said above. But before my modified code runs, the filter value and the column value have already been converted to the same type and the same precision, so the situation you are worried about can never happen. I added a test case to the project and all tests pass. One last point: using a TreeMap would increase the lookup time complexity (O(log m) instead of O(1)). Please review again, thank you.

MarvinLitt (Contributor) commented May 25, 2019

What is the status of this PR? Does it need to be modified again?

KanakaKumar (Contributor) commented Jun 7, 2019

@zhxiaoping, OK. It seems IncludeFilterExecuterImpl is currently not used for the Decimal type. Still, please update the test case with rows having different precision (4743, 4743.00, 4743.0000, 4743.0) and assert that the IN filter query returns all of them, so that a failure is caught if this path is ever used for BigDecimals in the future; a sketch of such a check follows.
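A sketch of the scenario such a test should cover (plain Java for illustration, not the project's actual test harness): four rows that are numerically equal but carry different scales, filtered through a HashSet, where only the exact-scale row matches.

import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class DecimalInFilterTestSketch {
  public static void main(String[] args) {
    BigDecimal[] rows = {
        new BigDecimal("4743"), new BigDecimal("4743.00"),
        new BigDecimal("4743.0000"), new BigDecimal("4743.0")
    };
    // IN filter with one literal; all four rows are numerically equal to it.
    Set<Object> filterSet = new HashSet<>();
    filterSet.add(new BigDecimal("4743.0"));

    long matched = Arrays.stream(rows).filter(filterSet::contains).count();
    // HashSet semantics match only the exact-scale row (1 of 4); a correct
    // IN filter should match all 4. This is the regression to assert on.
    System.out.println("matched = " + matched); // prints: matched = 1
  }
}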


public void setFilterKeys(Object[] filterKeys) {
  this.filterKeys = filterKeys;
}

public void setFilterKeysSet(Object[] filterKeys) {
  this.filterKeysSet = new HashSet<Object>(Arrays.asList(filterKeys));
}

qiuchenjian (Contributor) commented May 15, 2019

It would be better to check whether filterKeys is null.

zhxiaoping (Author) commented May 19, 2019

The set must be able to contain null: when the SQL has a 'where field is null' condition, this code path also runs. If we removed the null value, the filter could not match null values and the result would be wrong.
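A quick illustration of this point (sketch): a HashSet accepts a null element, and contains(null) then returns true, which is what the 'field is null' path relies on.

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class NullFilterValueSketch {
  public static void main(String[] args) {
    Object[] filterKeys = {1, null, 3};
    Set<Object> filterKeysSet = new HashSet<>(Arrays.asList(filterKeys));
    System.out.println(filterKeysSet.contains(null)); // true
  }
}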

qiuchenjian (Contributor) commented May 22, 2019

I mean that if filterKeys is null, Arrays.asList(filterKeys) will throw a NullPointerException; in that case we should use new HashSet&lt;Object&gt;() directly instead.
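The guarded version being suggested would look roughly like this (a sketch; whether the guard is actually needed is what the reply below addresses):

public void setFilterKeysSet(Object[] filterKeys) {
  // Arrays.asList(null) throws a NullPointerException, so guard first.
  this.filterKeysSet = (filterKeys == null)
      ? new HashSet<Object>()
      : new HashSet<Object>(Arrays.asList(filterKeys));
}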

zhxiaoping (Author) commented May 22, 2019

Object[] keysBasedOnFilter = filterValues.getMeasuresFilterValuesList()
    .toArray(new Object[filterValues.getMeasuresFilterValuesList().size()]);
msrColumnExecuterInfo.setFilterKeysSet(keysBasedOnFilter);

filterKeys itself is never null, but an element inside it can be null.

CarbonDataQA commented May 19, 2019

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11505/

CarbonDataQA commented May 19, 2019

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3437/

ravipesala (Contributor) commented Jun 7, 2019

LGTM

CarbonDataQA commented Jun 10, 2019

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3688/

CarbonDataQA commented Jun 10, 2019

Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11753/

CarbonDataQA commented Jun 11, 2019

Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11758/

CarbonDataQA commented Jun 11, 2019

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3692/

zhxiaoping (Author) commented Jun 11, 2019

retest this please

CarbonDataQA commented Jun 11, 2019

Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11769/

CarbonDataQA commented Jun 11, 2019

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3703/

CarbonDataQA commented Jun 12, 2019

Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11776/

CarbonDataQA commented Jun 12, 2019

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3710/

zhxiaoping force-pushed the zhxiaoping:zcarbon branch from 50ee512 to 4424f2b on Jun 12, 2019
CarbonDataQA commented Jun 12, 2019

Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11795/

CarbonDataQA commented Jun 12, 2019

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3728/

CarbonDataQA commented Jun 13, 2019

Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11802/

zhxiaoping (Author) commented Jun 13, 2019

retest this please

CarbonDataQA commented Jun 13, 2019

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3735/

CarbonDataQA commented Jun 13, 2019

Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11805/

CarbonDataQA commented Jun 13, 2019

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3738/

kevinjmh mentioned this pull request on Jun 13, 2019
kevinjmh (Member) commented Jun 14, 2019

retest this please

CarbonDataQA commented Jun 14, 2019

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3561/

CarbonDataQA commented Jun 14, 2019

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3768/

CarbonDataQA commented Jun 14, 2019

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/11835/

zhxiaoping mentioned this pull request on Jun 19, 2019
zhxiaoping closed this on Jun 19, 2019