[CARBONDATA-1086] updated configuration-parameters.md and dml-operation-on-carbondata for SORT_SCOPE #1205
Can one of the admins verify this patch?
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3236/
Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/641/
sort.inmemory.size.inmb: the default value is 2048. Please check, confirm, and update accordingly.
carbon.load.batch.sort.size.inmb: I think it doesn't have a default value. Please check, confirm, and update accordingly.
The default value of sort.inmemory.size.inmb is 1024 as per the code.
@sgururajshetty I have made the required changes. Please review.
Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/695/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3290/
Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/706/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3302/
LGTM
* batch_sort_size_inmb : Size of data in MB to be processed in batch. By default it is the 45 percent size of sort.inmemory.size.inmb(Memory size in MB available for in-memory sort).
For GLOBAL_SORT :
Suggestion: add below note:
'SINGLE_PASS' must be false.
Hi,
I tried executing the LOAD query with single_pass='true' and sort_scope='BATCH_SORT'. It executed successfully, and I was able to fetch the records in sorted order.
The syntax I used for the load query: LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','SINGLE_PASS'='TRUE','SORT_SCOPE'='BATCH_SORT','batch_sort_size_inmb'='7');
Please let me know if I am doing anything wrong.
I mean that if SORT_SCOPE=GLOBAL_SORT, SINGLE_PASS must be false.
Sorry, my mistake. Now when sort_scope='GLOBAL_SORT', single_pass can be 'true'. I have raised a PR to remove this restriction from the code (PR-1224).
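For reference, the batch_sort_size_inmb default quoted in the diff at the top of this thread can be worked out with a short sketch. It assumes the default is a plain 45 percent of sort.inmemory.size.inmb with no rounding, which this thread does not confirm; the function and constant names are hypothetical and not part of CarbonData.

```python
# Sketch only: illustrates the "45 percent of sort.inmemory.size.inmb" rule
# from the reviewed doc text. CarbonData's actual rounding is not confirmed here.

SORT_INMEMORY_SIZE_MB_DEFAULT = 1024  # default reported "as per the code" in this thread
BATCH_SORT_PERCENT = 45               # percentage stated in the reviewed documentation


def default_batch_sort_size_mb(
    sort_inmemory_size_mb: int = SORT_INMEMORY_SIZE_MB_DEFAULT,
) -> float:
    """Return 45 percent of the in-memory sort size, in MB (hypothetical helper)."""
    if sort_inmemory_size_mb <= 0:
        raise ValueError("sort_inmemory_size_mb must be positive")
    return sort_inmemory_size_mb * BATCH_SORT_PERCENT / 100


if __name__ == "__main__":
    # With the 1024 MB default confirmed in review.
    print(default_batch_sort_size_mb())
    # With the 2048 MB value first suggested in review.
    print(default_batch_sort_size_mb(2048))
```

With the 1024 MB default, the batch size comes to roughly 460 MB, which is why an explicit value such as 'batch_sort_size_inmb'='7' in the load statement above produces far smaller batches.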
431a333 to fe83540
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3307/
Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/711/
docs/dml-operation-on-carbondata.md
Outdated
* GLOBAL_SORT : The sorting scope is bigger and one index tree per task will be created, thus loading is slower but query is faster.
* NO_SORT : Feasible if we want to load our data in unsorted manner.
Introduce this first
Avoid using "we".
docs/dml-operation-on-carbondata.md
Outdated
@@ -149,6 +149,50 @@ You can use the following options to load data:
* If this option is set to TRUE, then high.cardinality.identify.enable property will be disabled during data load.
- **SORT_SCOPE:** This property can have four possible values :
* BATCH_SORT : The sorting scope is smaller and more index tree will be created,thus loading is faster but query maybe slower.
Mention that this sort is based on memory size, and that Carbon will create one index for each batch.
docs/dml-operation-on-carbondata.md
Outdated
* LOCAL_SORT : The sorting scope is bigger and one index tree per data node will be created, thus loading is slower but query is faster.
* GLOBAL_SORT : The sorting scope is bigger and one index tree per task will be created, thus loading is slower but query is faster.
Not per task; it sorts across the whole cluster.
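To make the four SORT_SCOPE values discussed in this review concrete, here is a minimal sketch of a helper that validates the scope and renders a LOAD DATA statement in the same shape as the one quoted earlier in the conversation. The helper, its name, and its validation rule are hypothetical illustrations and not part of CarbonData; it also does no quote escaping, so it is only suitable for simple option values.

```python
# Sketch only: a hypothetical helper, not a CarbonData API.
# The four scope names come from the reviewed documentation text.
VALID_SORT_SCOPES = ("BATCH_SORT", "LOCAL_SORT", "GLOBAL_SORT", "NO_SORT")


def build_load_statement(path: str, table: str, options: dict) -> str:
    """Render a LOAD DATA statement with an OPTIONS clause.

    Assumption: option values contain no single quotes (no escaping is done).
    """
    scope = options.get("SORT_SCOPE")
    if scope is not None and scope.upper() not in VALID_SORT_SCOPES:
        raise ValueError(f"unknown SORT_SCOPE: {scope!r}")
    # Each option is rendered as 'KEY'='VALUE', in insertion order.
    rendered = ",".join(f"'{key}'='{value}'" for key, value in options.items())
    return f"LOAD DATA INPATH '{path}' INTO TABLE {table} OPTIONS({rendered})"


if __name__ == "__main__":
    print(build_load_statement(
        "hdfs://localhost:54310/uniqdata/2000_UniqData.csv",
        "uniqdata",
        {"DELIMITER": ",", "SORT_SCOPE": "BATCH_SORT", "batch_sort_size_inmb": "7"},
    ))
```

Rejecting unknown scope names before the statement is rendered is a design choice of this sketch; how CarbonData itself handles an invalid SORT_SCOPE is not covered in this thread.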
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3371/
Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/774/
c226ad1 to ee0bf41
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3372/
Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/775/
…md for sort_scope feature
…batch.sort.size.inmb
ee0bf41 to c0e33a9
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/33/
SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1212/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/581/
No description provided.