Closed
Search before asking
- I had searched in the issues and found no similar issues.
Description
We have a table that is ingested from Hive.
However, the data volume of the Doris table is 4 times that of the Hive table.
The Hive table is stored in ORC format.
I found that a DOUBLE column takes up the most storage.
I tried all the encoding and compression methods and found that the compression ratio depends on the encoding method, the compression method, and the data distribution.
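To illustrate why the encoding and the data distribution both matter for a DOUBLE column, here is a minimal sketch. It is not the Doris or Kudu implementation. It assumes zlib as a stand-in for the storage engine's compression codec, uses a simplified byte-level shuffle rather than a true bit shuffle, and all function names and sample data are hypothetical.

```python
import random
import struct
import zlib

WIDTH = 8  # bytes in an IEEE 754 double


def plain_encode(values):
    """Pack doubles back to back, little-endian (a 'plain' layout)."""
    return struct.pack(f"<{len(values)}d", *values)


def byte_shuffle_encode(values):
    """Group byte 0 of every value, then byte 1, and so on.

    Simplified stand-in for bit-shuffle: similar bytes (sign, exponent,
    high mantissa) end up next to each other before compression.
    """
    raw = plain_encode(values)
    return b"".join(raw[i::WIDTH] for i in range(WIDTH))


def byte_shuffle_decode(data):
    """Invert byte_shuffle_encode and return the original doubles."""
    n = len(data) // WIDTH
    raw = bytes(data[j * n + i] for i in range(n) for j in range(WIDTH))
    return list(struct.unpack(f"<{n}d", raw))


def compressed_size(data):
    """Size in bytes after zlib (assumed stand-in for the real codec)."""
    return len(zlib.compress(data, 6))


def report(name, values):
    """Print raw size and compressed size for both layouts."""
    plain = plain_encode(values)
    shuffled = byte_shuffle_encode(values)
    print(name, len(plain), compressed_size(plain), compressed_size(shuffled))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: a narrow-range, two-decimal column vs. a noisy one.
    narrow = [round(rng.uniform(20.0, 30.0), 2) for _ in range(10_000)]
    noisy = [rng.random() * 1e9 for _ in range(10_000)]
    report("narrow", narrow)
    report("noisy", noisy)
```

Running the script prints the raw size followed by the compressed size of each layout for both sample columns. The actual numbers depend on the data, which is the reason a per-column encoding choice is useful.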
For reference, Kudu tables created through Impala can specify an encoding per column (https://impala.apache.org/docs/build/html/topics/impala_kudu.html):
CREATE TABLE various_encodings
(
  id BIGINT PRIMARY KEY,
  c1 BIGINT ENCODING PLAIN_ENCODING,
  c2 BIGINT ENCODING AUTO_ENCODING,
  c3 TINYINT ENCODING BIT_SHUFFLE,
  c4 DOUBLE ENCODING BIT_SHUFFLE,
  c5 BOOLEAN ENCODING RLE,
  c6 STRING ENCODING DICT_ENCODING,
  c7 STRING ENCODING PREFIX_ENCODING
) ..
Solution
Support specifying the encoding per column in Doris.
Example SQL:
CREATE TABLE ads_snapshot_celldata_LZ4F (
  `xxx` varchar(255) encoding 'plain' NOT NULL COMMENT "",
  `yyy` array<double encoding "plain"> NOT NULL COMMENT "",
  `pt` varchar(255) NOT NULL COMMENT "partition"
)
ENGINE=OLAP
...
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct