
[CARBONDATA-2760] Reduce Memory footprint and store size for local dictionary encoded columns #2529

Closed

Conversation

kumarvishal09 (Contributor) commented Jul 19, 2018

Why this PR?

  1. The local dictionary encoded page uses UnsafeVarLengthColumnPage, which internally maintains the offset of each value in another column page; because of this, the memory footprint is high.
  2. While compressing a complex primitive string data type column page, the data is converted to LV (length-value) format even when it is already encoded with dictionary values; because of this, the store size is high.

Solution:

  1. Use UnsafeFixLengthColumnPage for local dictionary encoded columns.
  2. There is no need to convert to LV format during query when a local dictionary is present, so use UnsafeFixLengthColumnPage here as well (see the sketch after the checklist below).
  • Any interfaces changed?

  • Any backward compatibility impacted?

  • Document update required?

  • Testing done
    Existing test cases cover this change; additionally tested on a 3-node setup with 135 million records.

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
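
For background, here is a minimal, self-contained Java sketch of why a fixed-length page suits local-dictionary-encoded data. The class and variable names are hypothetical, not CarbonData's actual UnsafeFixLengthColumnPage/UnsafeVarLengthColumnPage code: every value becomes a fixed-width surrogate key, so the per-row offset array that a variable-length page must carry disappears.

```java
// Minimal sketch (hypothetical types, not CarbonData's actual classes) of why a
// fixed-length page suits local-dictionary-encoded data: every value is a
// fixed-width surrogate key, so no per-row offset array is needed.
import java.util.HashMap;
import java.util.Map;

public class FixedLengthDictPageSketch {
  public static void main(String[] args) {
    String[] rows = {"red", "green", "red", "blue", "green", "red"};

    // Build a local dictionary: value -> surrogate key.
    Map<String, Short> dict = new HashMap<>();
    short[] page = new short[rows.length];   // fixed 2 bytes per row
    short nextKey = 0;
    for (int i = 0; i < rows.length; i++) {
      Short key = dict.get(rows[i]);
      if (key == null) {
        key = nextKey++;
        dict.put(rows[i], key);
      }
      page[i] = key;
    }

    // A variable-length page would also keep an int offset per row (4 bytes
    // each) on top of the encoded bytes; the fixed-length page stores only
    // the fixed-width keys.
    int fixedLengthBytes = rows.length * Short.BYTES;
    int varLengthOverhead = rows.length * Integer.BYTES;
    System.out.println("fixed-length page bytes: " + fixedLengthBytes);
    System.out.println("extra offset bytes a var-length page would carry: " + varLengthOverhead);
  }
}
```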

@@ -431,6 +439,15 @@ private void ensureArraySize(int requestSize, DataType dataType) {
System.arraycopy(doubleData, 0, newArray, 0, arrayElementCount);
doubleData = newArray;
}
} else if (dataType == DataTypes.BYTE_ARRAY) {
Review comment (Contributor):
Increasing by 16 is too low; the capacity can be doubled, as in the ArrayList case.

Reply (kumarvishal09, author):
Fixed
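
For context, here is a minimal sketch of the growth strategy the reviewer suggests, assuming a byte-array page similar to the one in the diff above (field and method names are illustrative): doubling the capacity amortizes reallocation cost, while growing by a fixed 16 bytes forces repeated copies as the page fills.

```java
// Minimal sketch of the suggested growth strategy: double the capacity (as
// java.util.ArrayList effectively does) instead of growing by a fixed
// increment of 16.
public class GrowthSketch {
  private byte[] byteArrayData = new byte[16];
  private int arrayElementCount = 0;

  private void ensureArraySize(int requestSize) {
    if (arrayElementCount + requestSize > byteArrayData.length) {
      // Double until the request fits, rather than adding a fixed 16.
      int newCapacity = byteArrayData.length;
      while (newCapacity < arrayElementCount + requestSize) {
        newCapacity <<= 1;
      }
      byte[] newArray = new byte[newCapacity];
      System.arraycopy(byteArrayData, 0, newArray, 0, arrayElementCount);
      byteArrayData = newArray;
    }
  }

  public static void main(String[] args) {
    GrowthSketch page = new GrowthSketch();
    page.ensureArraySize(100);
    // Capacity grows 16 -> 32 -> 64 -> 128 instead of eight +16 steps.
    System.out.println("capacity grown to: " + page.byteArrayData.length);
  }
}
```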

@@ -201,12 +201,18 @@ public void putDouble(int rowId, double value) {

@Override
public void putBytes(int rowId, byte[] bytes) {
try {
ensureMemory(eachRowSize);
} catch (MemoryException e) {
Review comment (Contributor):
MemoryException can be made a runtime exception.

Reply (kumarvishal09, author):
OK, will handle this in a different PR.
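
For context, the reviewer's point is that if MemoryException extended RuntimeException, call sites such as putBytes above would not need a try/catch. A minimal sketch (the constructor shown is illustrative, not CarbonData's actual class):

```java
// Minimal sketch: declaring MemoryException as an unchecked exception removes
// the need for try/catch at every call site such as putBytes.
public class MemoryException extends RuntimeException {
  public MemoryException(String message) {
    super(message);
  }
}
```

With an unchecked MemoryException, putBytes could call ensureMemory(eachRowSize) directly and let the exception propagate to a single handler.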

byte[] data = new byte[totalLength];
int numberOfRows = getEndLoop();
int destOffset = 0;
for (int i = 0; i < numberOfRows; i++) {
Review comment (Contributor):
Get the data directly as a single byte array instead of looping row by row.

Reply (kumarvishal09, author):
Fixed
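
For context, a minimal sketch of the reviewer's suggestion (hypothetical names, with plain on-heap arrays standing in for the unsafe page): because a fixed-length page stores rows contiguously, all rows can be copied out in one bulk operation instead of a per-row loop.

```java
// Minimal sketch: contiguous fixed-width rows can be extracted with one bulk
// copy rather than a per-row loop.
import java.util.Arrays;

public class BulkCopySketch {
  public static void main(String[] args) {
    int eachRowSize = 4;
    int numberOfRows = 3;
    byte[] backingStore = new byte[] {
        1, 1, 1, 1,  2, 2, 2, 2,  3, 3, 3, 3  // contiguous fixed-width rows
    };

    // Per-row loop (what the original code did):
    byte[] looped = new byte[numberOfRows * eachRowSize];
    for (int i = 0; i < numberOfRows; i++) {
      System.arraycopy(backingStore, i * eachRowSize, looped, i * eachRowSize, eachRowSize);
    }

    // Single bulk copy (what the reviewer asked for):
    byte[] bulk = Arrays.copyOf(backingStore, numberOfRows * eachRowSize);

    System.out.println(Arrays.equals(looped, bulk)); // true
  }
}
```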

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6088/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7327/

@kumarvishal09 kumarvishal09 changed the title [WIP] Reduce Memory footprint and store size for local dictionary encoded columns [CARBONDATA-2760] Reduce Memory footprint and store size for local dictionary encoded columns Jul 19, 2018
@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7321/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7332/

@brijoobopanna
Contributor

retest this please

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6097/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7338/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6102/

@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5929/

@ravipesala
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5932/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7366/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6127/

@ravipesala
Contributor

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5940/

@brijoobopanna
Contributor

retest sdv please

@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5948/

@gvramana
Contributor

LGTM

@asfgit asfgit closed this in 43285bb Jul 23, 2018
asfgit pushed a commit that referenced this pull request Jul 30, 2018
[CARBONDATA-2760] Reduce Memory footprint and store size for local dictionary encoded columns

Problem:
The local dictionary encoded page uses UnsafeVarLengthColumnPage, which internally maintains the offset of each value in another column page; because of this, the memory footprint is high.
While compressing a complex primitive string data type column page, the data is converted to LV (length-value) format even when it is already encoded with dictionary values; because of this, the store size is high.

Solution:
Use UnsafeFixLengthColumnPage for local dictionary encoded columns.
There is no need to convert to LV format during query when a local dictionary is present, so use UnsafeFixLengthColumnPage here as well.

This closes #2529