
[CARBONDATA-1086] updated configuration-parameters.md and dml-operation-on-carbondata for SORT_SCOPE #1205

Closed
wants to merge 4 commits into from

Conversation

vandana7
Contributor

No description provided.

@asfgit

asfgit commented Jul 28, 2017

Can one of the admins verify this patch?


@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3236/

@CarbonDataQA

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/641/

@sgururajshetty
Contributor

sort.inmemory.size.inmb - the default value is 2048. Please check and confirm once and update the same.
carbon.load.batch.sort.size.inmb - I think it doesn't have a default value. Please check and confirm once and update the same.

@PallaviSingh1992
Contributor

sort.inmemory.size.inmb - the default value is 1024 as per the code.
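For context, the two parameters under discussion are normally set in carbon.properties. A minimal sketch, assuming the default of 1024 reported above from the code; the batch-sort value shown is a hypothetical example, since the thread notes no confirmed default:

```properties
# carbon.properties (sketch; values per the discussion above, verify against the code)

# Memory (in MB) available for in-memory sort; default per the code is 1024
sort.inmemory.size.inmb=1024

# Size (in MB) of each batch for batch sort during load; no confirmed default
# in this thread, so it is set explicitly here (example value, hypothetical)
carbon.load.batch.sort.size.inmb=32
```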

@vandana7
Contributor Author

@sgururajshetty I have made the required changes, please review.

@CarbonDataQA

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/695/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3290/

@CarbonDataQA

Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/706/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3302/

@sgururajshetty
Contributor

LGTM
@chenliang613 kindly review and merge.


* batch_sort_size_inmb : Size of data in MB to be processed in batch. By default it is the 45 percent size of sort.inmemory.size.inmb(Memory size in MB available for in-memory sort).

For GLOBAL_SORT :
Contributor

Suggestion: add below note:
'SINGLE_PASS' must be false.

Contributor Author

Hi,
I tried executing the LOAD query with single_pass='true' and sort_scope='BATCH_SORT'; it executed successfully and I was able to fetch the records in sorted order.
The syntax I used to execute the load query - LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','SINGLE_PASS'='TRUE','SORT_SCOPE'='BATCH_SORT','batch_sort_size_inmb'='7');

Please let me know if I am doing anything wrong.

Contributor

I mean that if SORT_SCOPE=GLOBAL_SORT, single_pass must be false.

Contributor

Sorry, my mistake. Now when sort_scope='GLOBAL_SORT', single_pass can be 'true'. I have raised a PR to remove this restriction in the code (PR-1224).
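To summarize the resolution of this thread: with the restriction removed, a GLOBAL_SORT load with single pass would look like the following sketch. The table name and HDFS path are reused from the earlier comment for illustration only:

```sql
-- Sketch only: assumes the restriction discussed above has been lifted (PR-1224)
LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv'
INTO TABLE uniqdata
OPTIONS(
  'DELIMITER'=',',
  'QUOTECHAR'='"',
  'SINGLE_PASS'='TRUE',
  'SORT_SCOPE'='GLOBAL_SORT'
);
```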

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3307/

@CarbonDataQA

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/711/

@CarbonDataQA

Can one of the admins verify this patch?


* GLOBAL_SORT : The sorting scope is bigger and one index tree per task will be created, thus loading is slower but query is faster.

* NO_SORT : Feasible if we want to load our data in unsorted manner.
Contributor

Introduce this first

Contributor

not we

@@ -149,6 +149,50 @@ You can use the following options to load data:

* If this option is set to TRUE, then high.cardinality.identify.enable property will be disabled during data load.

- **SORT_SCOPE:** This property can have four possible values :

* BATCH_SORT : The sorting scope is smaller and more index trees will be created, thus loading is faster but queries may be slower.
Contributor

mention this is sort based on memory size, and carbon will create one index for each batch


* LOCAL_SORT : The sorting scope is bigger and one index tree per data node will be created, thus loading is slower but query is faster.

* GLOBAL_SORT : The sorting scope is bigger and one index tree per task will be created, thus loading is slower but query is faster.
Contributor

Not per task; it is sorting across the whole cluster.

@ravipesala
Contributor

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3371/

@ravipesala
Contributor

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/774/

@ravipesala
Contributor

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3372/

@ravipesala
Contributor

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/775/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/33/

@ravipesala
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1212/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/581/

@vandana7 closed this Nov 15, 2017
8 participants