Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CARBONDATA-3692] Support NoneCompression during loading data. #3611

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Pickupolddriver
Copy link

Why is this PR needed?

In some cases, the data need to be uncompressed after loading into Carbondata file.
In the current version, the project do not support loading data without compression.

What changes were proposed in this PR?

Provide a new Compressor as NoneCompressor implement the AbstractCompressor.
This compressor can be set by calling
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR,"none");

Does this PR introduce any user interface change?

Yes

Is any new testcase added?

Yes

Why is this PR needed?

In some cases, the data need to be uncompressed after loading into Carbondata file.
In the current version, the project do not support loading data without compression.

What changes were proposed in this PR?

Provide a new Compressor as NoneCompressor implement the AbstractCompressor.
This compressor can be set by calling
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR,"none");

Does this PR introduce any user interface change?
Yes

Is any new testcase added?
Yes
@CarbonDataQA1
Copy link

Can one of the admins verify this patch?

@kunal642
Copy link
Contributor

Can you please explain the scenario where no-compression would be beneficial?

@Pickupolddriver
Copy link
Author

Can you please explain the scenario where no-compression would be beneficial?

This NoneCompress Compressor will improve the speed of loading data from Flink to OBS File by trade-off space and IO in some cases.

For example: when loading data from Flink to OBS, data needs to be compressed by Flink to temporary files and then decompressed by OBS.
After adding the NoneCompressor, users can use the NoneCompressor load data without compress first and then decompress the temporary files.

@ajantha-bhat
Copy link
Member

@Pickupolddriver : Agree that it can improve the loading speed. But data will be 3x bigger. So, storage cost on OBS will be 3x more!

CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true")
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "none")
loadData()
checkAnswer(sql(s"SELECT count(*) FROM $tableName"), Seq(Row(16)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change select count(*) to select * so that actual data is validated

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you want to change all the test cases in this class from select count(*) to *?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, just change the newly added test cases to select *

@Pickupolddriver
Copy link
Author

@Pickupolddriver : Agree that it can improve the loading speed. But data will be 3x bigger. So, storage cost on OBS will be 3x more!

Data would be processed after loaded to OBS. So if we could provide a NonCompressor, it could avoid the data being compressed and then uncompressed. And the uncompressed data would be deleted after processed in OBS.

@@ -35,6 +35,7 @@
private final Map<String, Compressor> allSupportedCompressors = new HashMap<>();

public enum NativeSupportedCompressor {
NONE("none",NoneCompressor.class),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add space after ,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest change to NONE("nocompress", NoneCompressor.class)
nocompress will be appended to the data file name

@@ -0,0 +1,51 @@
package org.apache.carbondata.core.datastore.compression;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add license header as other source file


import java.io.IOException;

public class NoneCompressor extends AbstractCompressor {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add description to this class that it does not perform any compression

@ajantha-bhat
Copy link
Member

add to whitelist

@QiangCai
Copy link
Contributor

please rebase

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants