[Enhancement] Support encoding method for column #50631

@liutang123

Search before asking

  • I searched the issues and found no similar issues.

Description

We have a table that is ingested from Hive, where it is stored in ORC format.
However, the data volume of the Doris table is 4 times that of the Hive table.
I found that a DOUBLE column takes up the most storage.
I tried all the encoding and compression methods and found that the compression ratio depends on the encoding method, the compression method, and the data distribution.
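
To illustrate the interaction between encoding, compression, and data distribution, here is a minimal Python sketch. It is not Doris or Kudu code: `byte_shuffle` is a simplified, byte-level stand-in for a BIT_SHUFFLE-style encoding, zlib stands in for whatever codec the storage engine uses, and the three datasets are made up for illustration. It only prints the resulting sizes so they can be compared across distributions.

```python
import random
import struct
import zlib


def byte_shuffle(raw: bytes, width: int = 8) -> bytes:
    """Group byte 0 of every value together, then byte 1, and so on.

    Simplified byte-level analogue of a bit-shuffle encoding (assumption:
    real engines transpose at the bit level and use other codecs).
    """
    return b"".join(raw[i::width] for i in range(width))


def byte_unshuffle(shuffled: bytes, width: int = 8) -> bytes:
    """Invert byte_shuffle; len(shuffled) must be a multiple of width."""
    count = len(shuffled) // width
    out = bytearray(len(shuffled))
    for i in range(width):
        out[i::width] = shuffled[i * count:(i + 1) * count]
    return bytes(out)


def compressed_sizes(values):
    """Return byte sizes of a DOUBLE column: raw, zlib only, shuffle + zlib."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return {
        "raw": len(raw),
        "plain+zlib": len(zlib.compress(raw)),
        "shuffle+zlib": len(zlib.compress(byte_shuffle(raw))),
    }


# Hypothetical data distributions, chosen only to show that results vary.
rng = random.Random(0)
datasets = {
    "narrow range": [rng.uniform(100.0, 101.0) for _ in range(10_000)],
    "wide range": [rng.uniform(-1e300, 1e300) for _ in range(10_000)],
    "few distinct": [rng.choice([0.5, 1.5, 2.5]) for _ in range(10_000)],
}
for name, values in datasets.items():
    print(name, compressed_sizes(values))
```

Which combination wins depends on the data, which is why a per-column choice of encoding would be useful.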

For reference, Kudu tables in Impala support a per-column ENCODING attribute (see https://impala.apache.org/docs/build/html/topics/impala_kudu.html):

CREATE TABLE various_encodings
(
  id BIGINT PRIMARY KEY,
  c1 BIGINT ENCODING PLAIN_ENCODING,
  c2 BIGINT ENCODING AUTO_ENCODING,
  c3 TINYINT ENCODING BIT_SHUFFLE,
  c4 DOUBLE ENCODING BIT_SHUFFLE,
  c5 BOOLEAN ENCODING RLE,
  c6 STRING ENCODING DICT_ENCODING,
  c7 STRING ENCODING PREFIX_ENCODING
) ..

Solution

Support specifying the encoding per column in Doris.
Example SQL:

CREATE TABLE ads_snapshot_celldata_LZ4F (
  `xxx` varchar(255) encoding 'plain' NOT NULL COMMENT "",
  `yyy` array<double encoding "plain"> NOT NULL COMMENT "",
  `pt` varchar(255) NOT NULL COMMENT "partition"
)
ENGINE=OLAP
...

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
