
[CARBONDATA-2760] Reduce Memory footprint and store size for local dictionary encoded columns #2529

Closed

Conversation

kumarvishal09 (Contributor) commented Jul 19, 2018

Why this PR?

  1. The local dictionary encoded page uses UnsafeVarLengthColumnPage, which internally maintains the offset of each value in another column page; because of this, the memory footprint is high.
  2. While compressing a complex primitive string data type column page, the data is converted to LV (length-value) format even when it is already encoded with dictionary values; because of this, the store size is high.

Solution:

  1. Use UnsafeFixLengthColumnPage for local dictionary encoded columns.
  2. There is no need to convert to LV format during query when a local dictionary is present, so use UnsafeFixLengthColumnPage here as well (see the sketch after the checklist below).
  • Any interfaces changed?

  • Any backward compatibility impacted?

  • Document update required?

  • Testing done
    Existing test cases cover this change; additionally tested on a 3-node setup with 135 million records.

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
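
For background, here is a minimal, self-contained Java sketch of why a fixed-length page suits local-dictionary-encoded data. The class and variable names are hypothetical, not CarbonData's actual UnsafeFixLengthColumnPage/UnsafeVarLengthColumnPage code: every value becomes a fixed-width surrogate key, so the per-row offset array that a variable-length page must carry disappears.

```java
// Minimal sketch (hypothetical types, not CarbonData's actual classes) of why a
// fixed-length page suits local-dictionary-encoded data: every value is a
// fixed-width surrogate key, so no per-row offset array is needed.
import java.util.HashMap;
import java.util.Map;

public class FixedLengthDictPageSketch {
  public static void main(String[] args) {
    String[] rows = {"red", "green", "red", "blue", "green", "red"};

    // Build a local dictionary: value -> surrogate key.
    Map<String, Short> dict = new HashMap<>();
    short[] page = new short[rows.length];   // fixed 2 bytes per row
    short nextKey = 0;
    for (int i = 0; i < rows.length; i++) {
      Short key = dict.get(rows[i]);
      if (key == null) {
        key = nextKey++;
        dict.put(rows[i], key);
      }
      page[i] = key;
    }

    // A variable-length page would also keep an int offset per row (4 bytes
    // each) on top of the encoded bytes; the fixed-length page stores only
    // the fixed-width keys.
    int fixedLengthBytes = rows.length * Short.BYTES;
    int varLengthOverhead = rows.length * Integer.BYTES;
    System.out.println("fixed-length page bytes: " + fixedLengthBytes);
    System.out.println("extra offset bytes a var-length page would carry: " + varLengthOverhead);
  }
}
```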

@@ -431,6 +439,15 @@ private void ensureArraySize(int requestSize, DataType dataType) {
System.arraycopy(doubleData, 0, newArray, 0, arrayElementCount);
doubleData = newArray;
}
} else if (dataType == DataTypes.BYTE_ARRAY) {
Review comment (Contributor):
Increasing by 16 is too low; the capacity can be doubled, as in the ArrayList case.

Reply (kumarvishal09, author):
Fixed
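
For context, here is a minimal sketch of the growth strategy the reviewer suggests, assuming a byte-array page similar to the one in the diff above (field and method names are illustrative): doubling the capacity amortizes reallocation cost, while growing by a fixed 16 bytes forces repeated copies as the page fills.

```java
// Minimal sketch of the suggested growth strategy: double the capacity (as
// java.util.ArrayList effectively does) instead of growing by a fixed
// increment of 16.
public class GrowthSketch {
  private byte[] byteArrayData = new byte[16];
  private int arrayElementCount = 0;

  private void ensureArraySize(int requestSize) {
    if (arrayElementCount + requestSize > byteArrayData.length) {
      // Double until the request fits, rather than adding a fixed 16.
      int newCapacity = byteArrayData.length;
      while (newCapacity < arrayElementCount + requestSize) {
        newCapacity <<= 1;
      }
      byte[] newArray = new byte[newCapacity];
      System.arraycopy(byteArrayData, 0, newArray, 0, arrayElementCount);
      byteArrayData = newArray;
    }
  }

  public static void main(String[] args) {
    GrowthSketch page = new GrowthSketch();
    page.ensureArraySize(100);
    // Capacity grows 16 -> 32 -> 64 -> 128 instead of eight +16 steps.
    System.out.println("capacity grown to: " + page.byteArrayData.length);
  }
}
```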

@@ -201,12 +201,18 @@ public void putDouble(int rowId, double value) {

@Override
public void putBytes(int rowId, byte[] bytes) {
try {
ensureMemory(eachRowSize);
} catch (MemoryException e) {
Review comment (Contributor):
MemoryException can be made a runtime exception.

Reply (kumarvishal09, author):
OK, will handle this in a different PR.
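
For context, the reviewer's point is that if MemoryException extended RuntimeException, call sites such as putBytes above would not need a try/catch. A minimal sketch (the constructor shown is illustrative, not CarbonData's actual class):

```java
// Minimal sketch: declaring MemoryException as an unchecked exception removes
// the need for try/catch at every call site such as putBytes.
public class MemoryException extends RuntimeException {
  public MemoryException(String message) {
    super(message);
  }
}
```

With an unchecked MemoryException, putBytes could call ensureMemory(eachRowSize) directly and let the exception propagate to a single handler.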

byte[] data = new byte[totalLength];
int numberOfRows = getEndLoop();
int destOffset = 0;
for (int i = 0; i < numberOfRows; i++) {
Review comment (Contributor):
Get the data directly as a single byte array instead of looping row by row.

Reply (kumarvishal09, author):
Fixed
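
For context, a minimal sketch of the reviewer's suggestion (hypothetical names, with plain on-heap arrays standing in for the unsafe page): because a fixed-length page stores rows contiguously, all rows can be copied out in one bulk operation instead of a per-row loop.

```java
// Minimal sketch: contiguous fixed-width rows can be extracted with one bulk
// copy rather than a per-row loop.
import java.util.Arrays;

public class BulkCopySketch {
  public static void main(String[] args) {
    int eachRowSize = 4;
    int numberOfRows = 3;
    byte[] backingStore = new byte[] {
        1, 1, 1, 1,  2, 2, 2, 2,  3, 3, 3, 3  // contiguous fixed-width rows
    };

    // Per-row loop (what the original code did):
    byte[] looped = new byte[numberOfRows * eachRowSize];
    for (int i = 0; i < numberOfRows; i++) {
      System.arraycopy(backingStore, i * eachRowSize, looped, i * eachRowSize, eachRowSize);
    }

    // Single bulk copy (what the reviewer asked for):
    byte[] bulk = Arrays.copyOf(backingStore, numberOfRows * eachRowSize);

    System.out.println(Arrays.equals(looped, bulk)); // true
  }
}
```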

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6088/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7327/

@kumarvishal09 kumarvishal09 changed the title [WIP] Reduce Memory footprint and store size for local dictionary encoded columns [CARBONDATA-2760] Reduce Memory footprint and store size for local dictionary encoded columns Jul 19, 2018
@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7321/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7332/

@brijoobopanna
Contributor

retest this please

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6097/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7338/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6102/

@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5929/

@ravipesala
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5932/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7366/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6127/

@ravipesala
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5940/

@brijoobopanna
Contributor

retest sdv please

@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5948/

@gvramana
Contributor

LGTM

@asfgit asfgit closed this in 43285bb Jul 23, 2018
asfgit pushed a commit that referenced this pull request Jul 30, 2018
[CARBONDATA-2760] Reduce Memory footprint and store size for local dictionary encoded columns

Problem:
The local dictionary encoded page uses UnsafeVarLengthColumnPage, which internally maintains the offset of each value in another column page; because of this, the memory footprint is high.
While compressing a complex primitive string data type column page, the data is converted to LV (length-value) format even when it is already encoded with dictionary values; because of this, the store size is high.

Solution:
Use UnsafeFixLengthColumnPage for local dictionary encoded columns.
There is no need to convert to LV format during query when a local dictionary is present, so use UnsafeFixLengthColumnPage here as well.

This closes #2529