
MDEV-11371 - Big column compressed(innodb) #261

Closed
wants to merge 3 commits

Conversation

GCSAdmin

Some big columns (blob/text/varchar/varbinary) waste a lot of space, so we introduce a "compressed" attribute into the column definition when creating or altering a table.
When a column is defined as compressed, the column data is compressed using zlib (other compression algorithms are not supported yet).
Compared with the compressed row format, we get a better compression ratio, better performance, and more flexibility.
For example:

create table tcompress (
  c1 int,
  c2 blob compressed,
  c3 text compressed,
  c4 text
) engine = innodb;

We implement this big-column compression in the following steps:

  1. Support the 'compressed' syntax, and save this attribute in the .frm file.
  2. Store the 'compressed' attribute in the InnoDB layer; we add a DATA_IS_COMPRESSED flag in prtype.
  3. If needed, compress in row_mysql_store_col_in_innobase_format and decompress in row_sel_store_mysql_field.
  4. Use a compress header to control how to compress/decompress.
    The compress header is 1 byte:
    Bit 7: always 1, meaning compressed;
    Bits 5-6: compression algorithm; always 0, meaning zlib (other compression algorithms may be supported in the future);
    Bits 0-3: number of bytes of "Record Original Length".
    Record Original Length: 1-4 bytes.
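
To make the header layout concrete, here is a minimal C sketch of encoding this prefix (the helper names are hypothetical, not the actual patch code, and little-endian storage of the length is an assumption):

    #include <stddef.h>
    #include <stdint.h>

    /* Sketch only.  Bit 7: always 1 (compressed); bits 5-6: algorithm
       (0 = zlib); bits 0-3: byte count of "Record Original Length". */
    static uint8_t make_compress_header(uint8_t algorithm, uint8_t len_bytes)
    {
        return (uint8_t)(0x80 | ((algorithm & 0x3) << 5) | (len_bytes & 0xF));
    }

    /* Writes the 1-byte header plus the original length (assumed
       little-endian here); returns the prefix size in bytes. */
    static size_t write_compress_prefix(uint8_t *dst, uint32_t original_len)
    {
        uint8_t len_bytes = original_len <= 0xFFu     ? 1 :
                            original_len <= 0xFFFFu   ? 2 :
                            original_len <= 0xFFFFFFu ? 3 : 4;
        dst[0] = make_compress_header(0 /* zlib */, len_bytes);
        for (uint8_t i = 0; i < len_bytes; i++)
            dst[1 + i] = (uint8_t)(original_len >> (8 * i));
        return (size_t)(1 + len_bytes);
    }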

We add a system global variable 'field_compress_min_len': a column value is compressed only if its data length exceeds 'field_compress_min_len'. The default is 128.
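
A rough illustration of that threshold check, as a sketch under assumptions (maybe_compress_col is a hypothetical name, not the patch's function; it reuses write_compress_prefix from the sketch above):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zlib.h>

    /* Sketch: store raw if not above the threshold, otherwise emit
       header + original length + zlib-deflated body. */
    uint8_t *maybe_compress_col(const uint8_t *data, uint32_t len,
                                uint32_t field_compress_min_len,
                                size_t *out_len)
    {
        if (len <= field_compress_min_len) {      /* below threshold: raw copy */
            uint8_t *raw = malloc(len);
            if (raw) { memcpy(raw, data, len); *out_len = len; }
            return raw;
        }
        uLongf body_len = compressBound(len);
        uint8_t *buf = malloc(5 + body_len);      /* prefix is at most 5 bytes */
        if (!buf) return NULL;
        size_t prefix = write_compress_prefix(buf, len);
        if (compress(buf + prefix, &body_len, data, len) != Z_OK) {
            free(buf);
            return NULL;
        }
        *out_len = prefix + body_len;
        return buf;
    }
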
We also add 3 new errors:
ER_FIELD_TYPE_NOT_ALLOWED_AS_COMPRESSED_FIELD: only text/blob/varchar/varbinary columns may have the compressed attribute.
ER_FIELD_CAN_NOT_COMPRESSED_AND_INDEX: a column with the compressed attribute cannot be part of an index.
ER_FIELD_CAN_NOT_COMPRESSED_IN_CURRENT_ENGINESS: column compression is supported in InnoDB only.

create table t1(c1 blob compressed, c2 varchar(1000) compressed); means
that columns c1 and c2 of table t1 have the compress property and their
data is stored compressed. The compress property is transparent to the user.

fix format

set default_storage_engine = @default_storage_engine_old;

drop table t1,t2,t3;
Contributor

Please add more test cases covering the CHAR/MEDIUMBLOB/MEDIUMTEXT/LONGBLOB/LONGTEXT types.
Also, could you run a performance test and share the results?

Author

I will add more test cases.

@svoj

svoj commented Nov 29, 2016

Hi GCSAdmin,

Thanks for your contribution. A JIRA task has been created to track this pull request: https://jira.mariadb.org/browse/MDEV-11371

This task was added to 10.2.4 backlog, which is planned to be handled between 2016-12-15 and 2016-12-22.

Thanks,
Sergey

@svoj svoj changed the title Big column compressed(innodb) MDEV-11371 - Big column compressed(innodb) Nov 29, 2016
@svoj svoj self-assigned this Nov 29, 2016
@laurynas-biveinis
Contributor

Maybe you'd find something interesting in percona/percona-server@35d5d3f, test cases perhaps, although the feature grammar is incompatible.

@svoj

svoj commented Nov 29, 2016

@GCSAdmin, before we continue, could you explain what benefits column compression gives over two alternatives: InnoDB native compression and COMPRESS()/UNCOMPRESS()?

Are you using indexes over compressed columns? Is it at all supported in this patch?

@GCSAdmin
Author

GCSAdmin commented Nov 29, 2016

@svoj @plinux
In principle, column compression is equivalent to COMPRESS()/UNCOMPRESS(). The benefit of column compression is that it is transparent to the application layer: the application gets the benefits without requiring any changes.

Before we chose column compression, we compared page compression and column compression (using COMPRESS()/UNCOMPRESS()). The test results follow:
Storage testing (the data comes from a real game DB):
original data: 51G (data size)
page-compressed data: 24G
column-compressed data: 7.3G

Performance testing (the data comes from the real game DB): [results were attached as an image in the original comment]

As column compression gives a better compression ratio, better performance, and more flexibility, we chose it.

We do not currently support indexes over compressed columns.

@vuvova
Member

vuvova commented Nov 29, 2016

Also, why is it better than what Percona has (commit link above)? Percona's implementation is more complex, but it also allows much better compression with their external dictionary feature.

Why is it better than a special data type (we might have user-defined types in 10.3) that is "like a blob, but compresses everything before storage"?

And, anyway, this feature cannot possibly go into 10.2, it's too late for that. It can go into 10.3.

@janlindstrom
Contributor

janlindstrom commented Nov 29, 2016

Thank you for your contribution; however, I would implement compressed columns in the upper layer, not inside InnoDB. Benefits would include:

  • Reduced payload provided to InnoDB (and possibly other storage engines supporting this)
  • Reduced payload coming out of InnoDB
  • This would also work for replication, i.e. reduced payload on the binary log, as the column would be compressed in the binary log
  • Do we really need to always uncompress these columns into the InnoDB buffer pool?

@willhan123
Contributor

willhan123 commented Nov 29, 2016

@janlindstrom

You are right, but many users may expect to solve their problems in the database layer.
We can provide an optional storage-layer solution with column compression.

Do we really need to always uncompress these columns into the InnoDB buffer pool?
We only uncompress when needed, which gives more efficient use of memory.

@janlindstrom
Contributor

By upper layer I meant still inside the MariaDB server, i.e. an implementation somewhere in the sql directory.

@felixliang

@laurynas-biveinis thank you for your suggestion; we are reading Percona's test cases, and they may help.

@svoj

svoj commented Nov 29, 2016

Also note column compression in AliSQL: https://jira.mariadb.org/browse/MDEV-11381

when alter table adds a compressed column, inplace alter is not supported
@felixliang

@vuvova hi vuvova, it is better to implement big-column compression in the MariaDB server or InnoDB layer, because we can then do some optimization of data export and import using mysqldump.

For example, when the big column is compressed, we can add grammar to keep the data compressed while exporting it out of MySQL and to keep it compressed while importing it into MySQL.

To achieve this, when we use mysqldump to export, we use …; in this case, data that is compressed does not need to be uncompressed, and the generated backup SQL file, whose format is …, means that when we use the backup SQL file to restore, the compressed data need not be recompressed. This implementation means a lot to us.

@felixliang

@janlindstrom hi janlindstrom, the big-column compression feature can surely be implemented in the upper layer, i.e. the MariaDB server; we chose to implement it in the InnoDB layer because we think this implementation is simple enough to finish.

@svoj svoj added this to the 10.2 milestone Mar 1, 2017
@svoj

svoj commented Apr 21, 2017

@felixliang one of our developers wonders if you also considered a trigger (compress) + view (uncompress) solution. Why didn't it work for you? I assume because you want simpler syntax.

@svoj

svoj commented Apr 22, 2017

@felixliang, @plinux in your implementations you store the compression algorithm for every row (also the wrap flag in AliSQL).

Do you really need this information in every row, or is a per-table setting acceptable?

@laurynas-biveinis
Contributor

@svoj, FWIW in our implementation we decided to go with per-row algorithm info so that in the future, if new algorithms are implemented, an ALTER TABLE of an existing compressed table to a new algorithm could be a metadata-only operation, with rows rewritten in the new algorithm as they are updated.

@svoj

svoj commented Apr 22, 2017

@laurynas-biveinis thanks!

@felixliang

@felixliang, @plinux in your implementations you store the compression algorithm for every row (also the wrap flag in AliSQL).

Do you really need this information in every row, or is a per-table setting acceptable?

@svoj
hi svoj, we store the compression algorithm for every row (it supports at most 4 compression algorithms), so that when we do an "alter table" operation to change the column's compression algorithm, we don't need to copy data; we only need to change metadata.

but right now TMySQL doesn't yet support changing a column's compression algorithm instantly.

@felixliang

@felixliang one of our developers wonders if you also considered a trigger (compress) + view (uncompress) solution. Why didn't it work for you? I assume because you want simpler syntax.

@svoj
hi svoj, in our opinion the solution of using a trigger (compress) + view (uncompress) to solve the compression problem is not a good idea. It brings very complicated jobs to the DBA, and the trigger will add load to the DB servers.

@svoj

svoj commented May 8, 2017

@felixliang thanks for your answers. I'm almost done porting this to the SQL layer (mostly to class Field). Will send you an email with details soon.

@felixliang

@svoj

so you will pick up the Tencent Game DBA Team's implementation for MariaDB 10.3, right?

at the latest meetup, Monty said you would evaluate our implementation and AliSQL's, so I don't know which one you will choose?

@svoj

svoj commented May 8, 2017

@felixliang I evaluated the Tencent, Alibaba and Percona code bases. Unfortunately we can't take any implementation as is, because we want a storage engine independent solution.

To keep things simple we won't take the compression dictionary from Percona for this first implementation. It will be possible to add it later though.

Our first implementation will cover all Tencent and Alibaba requirements, except for Alibaba heap alloc (which can be added easily later anyway).

Syntax-wise we will be compatible with the Tencent patch, but we had to rename system variables.

.frm is not compatible with any implementation.

Same for data: generally we store the same information, but we reserve 4 bits for compression algorithm and we don't store compressed flag. In our implementation compression_algorithm == 0 means uncompressed.

@svoj

svoj commented May 8, 2017

One nice benefit of implementing this at SQL layer is that we can avoid data recompression in many cases when we need to copy data, like:

  • ALTER TABLE ALGORITHM=copy
  • CREATE TABLE ... SELECT
  • INSERT ... SELECT

@vinchen
Contributor

vinchen commented May 9, 2017

Hi, @svoj

We hope that the new implementation of blob compression in MariaDB can be binary compatible with what Tencent's patch does.

In our Tencent implementation, the compressed header has one bit for the compressed flag, 2 bits for the compression algorithm, and 4 bits for the byte count of "Record Original Length":
Bit 7: always 1, meaning compressed;
Bits 5-6: compression algorithm; always 0, meaning zlib (other compression algorithms may be supported in the future);
Bits 0-3: number of bytes of "Record Original Length".

Same for data: generally we store the same information, but we reserve 4 bits for compression algorithm and we don't store compressed flag. In our implementation compression_algorithm == 0 means uncompressed.

And does it mean that the higher 4 bits are the compression algorithm and the lower 4 bits are the byte count of "Record Original Length"?

If so, we think it can be binary compatible.

And zlib's algorithm should be 0x08, and it should be the default algorithm.
The first header byte should be:

Header Byte = (0x08 << 4) | bytes_of_original_length
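
For instance (illustration only, assuming the original length fits in one byte), the header of a zlib-compressed value would be:

    /* Needs <stdint.h>.  zlib (0x08) in the high 4 bits, one length byte. */
    uint8_t header = (uint8_t)((0x08 << 4) | 1);   /* == 0x81 */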

What do you think?

@svoj

svoj commented May 9, 2017

@vinchen, I understand your wish to make it binary compatible.

Our header format is as follows:

Generic compressed header format (1 byte):

Bits 1-4: algorithm specific bits
Bits 5-8: compression algorithm

If the compression algorithm is 0, then the header is immediately followed by
uncompressed data.

If the compression algorithm is zlib:

Bits 1-2: N, where N + 1 bytes are occupied by the original data length
Bit 3: unused
Bit 4: true if zlib wrapper is present
Bits 5-8: store 1 (zlib)

The difference is: in your implementation you reserve 4 bits for the byte count of the original data length, while in our implementation we reserve only 2 bits. We also store bytes_of_original_length - 1.
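
As a minimal sketch of that byte, assuming bit 1 is the least significant bit (mariadb_zlib_header is a hypothetical name, not the actual patch code):

    #include <stdint.h>

    /* Sketch only.  Bits 5-8: algorithm (1 = zlib); bit 4: zlib wrapper
       flag; bits 1-2: bytes_of_original_length - 1 (bit 3 unused). */
    static uint8_t mariadb_zlib_header(uint8_t len_bytes, int zlib_wrapper)
    {
        return (uint8_t)((1u << 4)                    /* algorithm = zlib */
                       | (zlib_wrapper ? 0x08u : 0u)  /* wrapper present */
                       | ((len_bytes - 1u) & 0x3u));  /* length byte count - 1 */
    }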

In theory making your implementation binary compatible with ours is not that complex, but we'll have to discuss it with Monty. Adding support for Alibaba and Percona headers is a lot more complex.

@svoj

svoj commented May 9, 2017

@felixliang, @GCSAdmin, @vinchen, @plinux patch is in bb-10.3-svoj: 733ddb9

Note that there are still a bunch of edge cases not covered (many are explained in the revision comment).
Please consider this patch a prototype for now: behaviour and storage formats may change.

Your feedback will be greatly appreciated.

@HugeFelix

HugeFelix commented May 11, 2017

hi @svoj

Maybe a storage-engine-independent solution is better, because it can support any storage engine.

But it is very interesting that the Tencent, Alibaba and Percona column compression implementations are very similar: all do it in the InnoDB layer.

And with an implementation in the InnoDB layer, we can also avoid data recompression in the following cases:

  1. ALTER TABLE ALGORITHM=copy
  2. CREATE TABLE ... SELECT
  3. INSERT ... SELECT

For cases 2 and 3 above, we may need to use a HINT to avoid data recompression.

Another thing: when we back up data logically, we use a HINT in the SELECT syntax to avoid data recompression.

@svoj

svoj commented May 12, 2017

@HugeFelix are there any reasons to keep it in InnoDB? The only reason I got so far is simplicity.

MariaDB compared to MySQL has a lot more storage engines available. Thus we have to care about all available storage engines equally.

@HugeFelix

Yes. Simplicity is important for us. And we really understand and support a storage-engine-independent solution in MariaDB.

@svoj

svoj commented May 13, 2017

@HugeFelix Nice, thanks! It was agreed to change row storage format to be compatible with Tencent. This is generic enough and doesn't cost us much effort.

The metadata (the value stored in unireg_check) hasn't been decided yet, but I guess we should come up with some nice solution there too.

@svoj
Copy link

svoj commented Aug 31, 2017

Pushed fdc4779

@svoj svoj closed this Aug 31, 2017