[CARBONDATA-2750] Updated documentation on Local Dictionary Support #2590

Closed

Contributor

@praveenmeenakshi56 praveenmeenakshi56 commented Jul 31, 2018

Updated Documentation on Local Dictionary Support. Changed default scenario for Local dictionary to false.

NOTE: Another PR had placed the Bad Records path section in the middle of the Local Dictionary documentation. This PR only moves it under Bad Records Handling.

  • Any interfaces changed?
    NA
  • Any backward compatibility impacted?
    NA
  • Document update required?
    Document Updated
  • Testing done
    Please provide details on
    - Whether new unit test cases have been added or why no new tests are required?
    - How it is tested? Please attach test report.
    - Is it a performance related change? Please attach the performance test report.
    - Any additional information to help reviewers in testing this change.
    NA
  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    NA

@@ -126,20 +126,20 @@ This tutorial is going to introduce all commands and data operations on CarbonDa

- **Local Dictionary Configuration**

Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in:

Please add a note listing the data types that are not supported.

Add a small sentence on what local dictionary means

| LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. |
| LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will be disabled for the table |
| LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (maximum - 100000) |
| LOCAL_DICTIONARY_INCLUDE | all string/varchar columns which are not included in dictionary include| Columns for which Local Dictionary is generated. |

"which are not included in dictionary include" -- please refine.

@@ -126,20 +126,20 @@ This tutorial is going to introduce all commands and data operations on CarbonDa

- **Local Dictionary Configuration**

Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in:
1. Getting more compression on dimension columns with less cardinality.
2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.

Please explain: what is the cost of enabling local dictionary?

@@ -508,6 +511,9 @@ Users can specify which columns to include and exclude for local dictionary gene
```
ALTER TABLE tablename UNSET TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE','LOCAL_DICTIONARY_THRESHOLD','LOCAL_DICTIONARY_INCLUDE','LOCAL_DICTIONARY_EXCLUDE')
```

**NOTE:** For old tables, by default, local dictionary is disabled. If user wants local dictionary, he/she can enable/disable local dictionary for new data on those tables at his/her discretion.

"he/she" change to "user"

@ravipesala
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6078/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6396/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7675/


By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.

**Bottleneck for Local Dictionary:** The memory size will increase when local dictionary is enabled.

Please change "bottleneck" to "The cost"

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7683/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6407/

@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6083/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7708/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6434/

@ravipesala
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6100/

@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa

- **Local Dictionary Configuration**

Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in:
1. Getting more compression on dimension columns with less cardinality.
2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.

remove No-Dictionary
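
As a rough illustration of the filter benefit quoted above (filtering on encoded data), here is a minimal Python sketch. It is not CarbonData's implementation: the function name `filter_equals` and the list-based dictionary are assumptions made only for this example.

```python
from typing import List


def filter_equals(dictionary: List[str], encoded: List[int], literal: str) -> List[int]:
    """Return the row indexes whose value equals `literal`.

    `dictionary` holds the unique values; `encoded` holds, per row, the
    position of that row's value in `dictionary` (hypothetical layout).
    """
    try:
        # One string lookup against the dictionary instead of one per row.
        key = dictionary.index(literal)
    except ValueError:
        # The literal is not in the dictionary, so no row can match.
        return []
    # The per-row work is a cheap integer comparison on the encoded data.
    return [row for row, k in enumerate(encoded) if k == key]
```

The string comparison happens once against the dictionary, and the scan over rows compares only integer keys, which is the reason the quoted sentence gives for faster filter queries.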


By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.

**The cost for Local Dictionary:** The memory size will increase when local dictionary is enabled.

can add a sentence as to why it will increase

| LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) |
| LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. |
| LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will be disabled for the table |
| LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (maximum - 100000) |

description not correct. need to explain what threshold means and what happens beyond threshold

| LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. |
| LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will be disabled for the table |
| LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (maximum - 100000) |
| LOCAL_DICTIONARY_INCLUDE | all string/varchar columns not specified in dictionary include| Columns for which Local Dictionary is generated. |

if i don't specify this property, what is the behaviour?

| LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. |
| LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will be disabled for the table |
| LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (maximum - 100000) |
| LOCAL_DICTIONARY_INCLUDE | all string/varchar columns not specified in dictionary include| Columns for which Local Dictionary is generated. |
| LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary is not generated |

**NOTE:** If the cardinality exceeds the threshold, this column will not use local dictionary encoding. And in this case, the data loading performance will decrease since there is a rollback procedure for local dictionary encoding.

fallback?

For line 149 (162):

Encoded data and Actual data are both stored when Local Dictionary is enabled.

please change it to:

Encoded data with & without Local dictionary are both stored when Local Dictionary is enabled during data loading, so it requires more memory than before.
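
The threshold and fallback behaviour discussed in this thread can be sketched conceptually in Python. This is only an illustration under stated assumptions, not CarbonData's actual code path: the function name `encode_with_local_dictionary` and the in-memory representation are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def encode_with_local_dictionary(
    values: List[str], threshold: int
) -> Tuple[Optional[List[str]], list]:
    """Dictionary-encode `values`, falling back to raw values if the
    number of distinct values exceeds `threshold`.

    Returns (dictionary, encoded_keys) on success,
    or (None, raw_values) on fallback.
    """
    dictionary: Dict[str, int] = {}  # unique value -> integer key
    encoded: List[int] = []
    for v in values:
        key = dictionary.get(v)
        if key is None:
            if len(dictionary) >= threshold:
                # Cardinality exceeds the threshold: abandon the encoding.
                # The encoding work done so far is thrown away, which is
                # the extra loading cost mentioned in the NOTE.
                return None, list(values)
            key = len(dictionary)
            dictionary[v] = key
        encoded.append(key)
    # Both `values` (raw) and `encoded` are held in memory until here,
    # which mirrors why memory use is higher during data loading.
    return list(dictionary), encoded
```

For example, with a threshold of 2 the column `["a", "b", "a"]` is encoded as keys `[0, 1, 0]` against the dictionary `["a", "b"]`, while `["a", "b", "c"]` has three distinct values and falls back to the raw data.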

@@ -170,6 +183,9 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true','LOCAL_DICTIONARY_THRESHOLD'='1000',
'LOCAL_DICTIONARY_INCLUDE'='column1','LOCAL_DICTIONARY_EXCLUDE'='column2')
```

sentence not easy to understand. need simpler language to explain the reason

@@ -524,6 +540,9 @@ Users can specify which columns to include and exclude for local dictionary gene
```
ALTER TABLE tablename UNSET TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE','LOCAL_DICTIONARY_THRESHOLD','LOCAL_DICTIONARY_INCLUDE','LOCAL_DICTIONARY_EXCLUDE')
```

**NOTE:** For old tables, by default, local dictionary is disabled. If user wants local dictionary, user can enable/disable local dictionary for new data on those tables at their discretion.

Local dictionary is disabled for new tables also. Need to mention that it can be enabled for old tables also.

* DECIMAL
* TIMESTAMP
* DATE
* CHAR

Why is CHAR not supported? As far as I know, SparkSQL treats both varchar and char as string, so in CarbonData we actually see string.

what about complex? you didn't mention it

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7741/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6466/

@praveenmeenakshi56
Contributor Author

retest this please

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7747/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6472/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7759/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6484/

@ravipesala
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6130/

@ravipesala
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6132/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6491/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7767/

@sraghunandan
Contributor

Lgtm

@ravipesala
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6143/

@ravipesala
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6150/

@ravipesala
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6160/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6502/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7778/

@ravipesala
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6161/

@ravipesala
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6162/

@ravipesala
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6163/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7780/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6505/

Conflicts:
	docs/data-management-on-carbondata.md
@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6169/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6515/

@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6170/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7790/

@chenliang613
Contributor

LGTM

@asfgit asfgit closed this in 40571b8 Aug 6, 2018
kunal642 pushed a commit to kunal642/carbondata that referenced this pull request Aug 7, 2018
Updated Documentation on Local Dictionary Support. Changed default scenario for Local dictionary to false

This closes apache#2590
asfgit pushed a commit that referenced this pull request Aug 9, 2018
Updated Documentation on Local Dictionary Support. Changed default scenario for Local dictionary to false

This closes #2590