
[CARBONDATA-1086] updated configuration-parameters.md and dml-operation-on-carbondata for SORT_SCOPE #1205

Closed
wants to merge 4 commits into from

Conversation

vandana7
Contributor

No description provided.

@asfgit

asfgit commented Jul 28, 2017

Can one of the admins verify this patch?


@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3236/

@CarbonDataQA

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/641/

@sgururajshetty
Contributor

sort.inmemory.size.inmb - the default value is 2048. Please check and confirm once and update the same.
carbon.load.batch.sort.size.inmb - I think it doesn't have a default value. Please check and confirm once and update the same.

@PallaviSingh1992
Contributor

sort.inmemory.size.inmb - the default value is 1024 as per the code.
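For context, the two parameters under discussion are normally set in carbon.properties. A minimal sketch, assuming the default of 1024 reported above from the code; the batch-sort value shown is a hypothetical example, since the thread notes no confirmed default:

```properties
# carbon.properties (sketch; values per the discussion above, verify against the code)

# Memory (in MB) available for in-memory sort; default per the code is 1024
sort.inmemory.size.inmb=1024

# Size (in MB) of each batch for batch sort during load; no confirmed default
# in this thread, so it is set explicitly here (example value, hypothetical)
carbon.load.batch.sort.size.inmb=32
```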

@vandana7
Contributor Author

@sgururajshetty I have made the required changes, please review.

@CarbonDataQA

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/695/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3290/

@CarbonDataQA

Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/706/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3302/

@sgururajshetty
Contributor

LGTM
@chenliang613 kindly review and merge.


* batch_sort_size_inmb : Size of data in MB to be processed in batch. By default it is the 45 percent size of sort.inmemory.size.inmb(Memory size in MB available for in-memory sort).

For GLOBAL_SORT :
Contributor

Suggestion: add below note:
'SINGLE_PASS' must be false.

Contributor Author

Hi,
I tried executing the LOAD query with single_pass='true' and sort_scope='BATCH_SORT'; it executed successfully and I was able to fetch the records in sorted order.
The syntax I used to execute the load query - LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','SINGLE_PASS'='TRUE','SORT_SCOPE'='BATCH_SORT','batch_sort_size_inmb'='7');

Please let me know if I am doing anything wrong.

Contributor

I mean that if SORT_SCOPE=GLOBAL_SORT, single_pass must be false.

Contributor

Sorry, my mistake. Now when sort_scope='GLOBAL_SORT', single_pass can be 'true'. I have raised a PR to remove this restriction in the code (PR-1224).
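To summarize the resolution of this thread: with the restriction removed, a GLOBAL_SORT load with single pass would look like the following sketch. The table name and HDFS path are reused from the earlier comment for illustration only:

```sql
-- Sketch only: assumes the restriction discussed above has been lifted (PR-1224)
LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv'
INTO TABLE uniqdata
OPTIONS(
  'DELIMITER'=',',
  'QUOTECHAR'='"',
  'SINGLE_PASS'='TRUE',
  'SORT_SCOPE'='GLOBAL_SORT'
);
```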

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3307/

@CarbonDataQA

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/711/

@CarbonDataQA

Can one of the admins verify this patch?


* GLOBAL_SORT : The sorting scope is bigger and one index tree per task will be created, thus loading is slower but query is faster.

* NO_SORT : Feasible if we want to load our data in unsorted manner.
Contributor

Introduce this first

Contributor

not we

@@ -149,6 +149,50 @@ You can use the following options to load data:

* If this option is set to TRUE, then high.cardinality.identify.enable property will be disabled during data load.

- **SORT_SCOPE:** This property can have four possible values :

* BATCH_SORT : The sorting scope is smaller and more index trees will be created, thus loading is faster but queries may be slower.
Contributor

mention this is sort based on memory size, and carbon will create one index for each batch


* LOCAL_SORT : The sorting scope is bigger and one index tree per data node will be created, thus loading is slower but query is faster.

* GLOBAL_SORT : The sorting scope is bigger and one index tree per task will be created, thus loading is slower but query is faster.
Contributor

Not per task; it is sorting across the whole cluster.

@ravipesala
Contributor

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3371/

@ravipesala
Contributor

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/774/

@ravipesala
Contributor

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3372/

@ravipesala
Contributor

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/775/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/33/

@ravipesala
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1212/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/581/

@vandana7 closed this Nov 15, 2017
8 participants