Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Clone in Desktop Download ZIP

Loading…

Doubling space by primary key #14

Closed
jonahcohen opened this Issue · 2 comments

3 participants

@jonahcohen
Collaborator

From @yoshinorim:

I just noticed primary keys are stored twice in RocksDB SE – as both key and value, even though
they are not needed to be stored as value.

How to repeat:

10 column primary key

create table r1 (
id1 bigint not null,
id2 bigint not null,
id3 bigint not null,
id4 bigint not null,
id5 bigint not null,
id6 bigint not null,
id7 bigint not null,
id8 bigint not null,
id9 bigint not null,
id10 bigint not null,
id11 bigint not null,
id12 bigint not null,
primary key (id1, id2, id3, id4, id5, id6, id7, id8, id9, id10)
) engine=rocksdb charset=latin1 collate=latin1_bin;

1 column primary key

create table r2 (
id1 bigint not null,
id2 bigint not null,
id3 bigint not null,
id4 bigint not null,
id5 bigint not null,
id6 bigint not null,
id7 bigint not null,
id8 bigint not null,
id9 bigint not null,
id10 bigint not null,
id11 bigint not null,
id12 bigint not null,
primary key (id1)
) engine=rocksdb charset=latin1 collate=latin1_bin;

Inserting 2 million rows on these tables. Values sequentially increasing from 1 to 2 million (all columns have same values per row) -- can be generated like this.
for(my $i=1; $i<= 2000000; $i++) {
for(my $j=0; $j < 12; $j++) {
if($j < 11) {
print "$i,";
}else {
print "$i\n";
}
}
}

then make compaction happen.

Size in KB:
id1-id10 pk:
353764

id1 pk:
212984

innodb
id1-id10 pk:
264241

id1 pk:
260046

So basically long primary keys increase on-disk space.

I also checked via gdb (setting breakpoint at rocksdb::WriteBatch::Put()) and confirmed.
insert into r1 values (1,1,1,1,1,1,1,1,1,1,1,1);
=>
(gdb) p key.size_
$1 = 84
(gdb) p value.size_
$2 = 96

insert into r2 values (1,1,1,1,1,1,1,1,1,1,1,1);
=>
(gdb) p key.size_
$3 = 12
(gdb) p value.size_
$4 = 96

I expected value.size_ were 12 and 84 respectively. Since both key fields and value fields are stored, duplicate fields can be removed from value fields.

@jonahcohen jonahcohen added the bug label
@spetrunia spetrunia referenced this issue from a commit
@spetrunia spetrunia Remove unused code: support for storing part of index column in the v…
…alue

part of RocksDB key-value pair.

Initially, this space was used to store lengths of varchar columns
(mem-comparable images of varchars were endspace-padded). Then, we
switched to using mem-comparable encoding of varchars that stored
length in the key part of the key-value pair, so support for index
"tails" became dead code.

I'm removing it now as I'm going to make changes in data storage
(Issue #14), so I will likely break the dead code anyway.
a4e83ad
@spetrunia spetrunia referenced this issue from a commit
@spetrunia spetrunia Fix issue #14: Doubling space by primary key
SQL table records are stored as (pk_keyval -> record_val) key-value
pairs. When PK column datatype allows to decode PK column value from
its mem-comparable form in pk_keyval, we don't need to store the
column in record_val.
037e3e9
@spetrunia
Collaborator

I have made a fix for this. I pushed it into this branch:
https://github.com/MySQLOnRocksDB/mysql-5.6/tree/webscalesql-5.6.21.55-issue14

The branch has three revisions:

  1. Fix the testsuite (It was broken by somebody who migrated to MySQLOnRocksDB repo. This is not related to this bug): 703e9a6

  2. A patch that removes a piece of RocksDB-SE code that is no longer used: a4e83ad

  3. The fix itself: 037e3e9 . There are no specific tests in the patch, the test coverage is that tests in t/rocksdb*.test and suite/rocksdb tests pass after the change.

Do I just push this to the main branch, or I need to submit patches to review into reviews.facebook.net ? (I haven't yet figured out how to do that)

@yoshinorim
Owner

Thanks Sergey, please submit patches to review.facebook.net (follow instructions at https://github.com/MySQLOnRocksDB/mysql-5.6/wiki).

@spetrunia spetrunia added this to the high-pri milestone
@spetrunia spetrunia referenced this issue from a commit
@spetrunia spetrunia Fix issue #14: Doubling space by primary key
Summary:
SQL table records are stored as (pk_keyval -> record_val) key-value
pairs. When PK column datatype allows to decode PK column value from
its mem-comparable form in pk_keyval, we don't need to store the
column in record_val.

Test Plan: ./mtr t/rocks*test && ./mtr --suite=rocksdb && ./mtr --suite=rocksdb_rpl

Reviewers: hermanlee4, jonahcohen, maykov, yoshinorim

Reviewed By: yoshinorim

Differential Revision: https://reviews.facebook.net/D33117
c6ae5cd
@spetrunia spetrunia closed this
@spetrunia spetrunia referenced this issue in MariaDB/webscalesql-5.6
Closed

Doubling space by primary key #32

@spetrunia spetrunia referenced this issue from a commit
@spetrunia spetrunia Issue #14: post-review fixes: spelling, better comments and variable …
…names

Test Plan: N/A

Reviewers: maykov, hermanlee4, yoshinorim, jonahcohen

Reviewed By: jonahcohen

Differential Revision: https://reviews.facebook.net/D33453
43c740d
@spetrunia spetrunia referenced this issue from a commit
@spetrunia spetrunia Fix issue #14: Doubling space by primary key
Summary:
SQL table records are stored as (pk_keyval -> record_val) key-value
pairs. When PK column datatype allows to decode PK column value from
its mem-comparable form in pk_keyval, we don't need to store the
column in record_val.

Test Plan: ./mtr t/rocks*test && ./mtr --suite=rocksdb && ./mtr --suite=rocksdb_rpl

Reviewers: hermanlee4, jonahcohen, maykov, yoshinorim

Reviewed By: yoshinorim

Differential Revision: https://reviews.facebook.net/D33117
7ea412e
@spetrunia spetrunia referenced this issue from a commit
@spetrunia spetrunia Issue #14: post-review fixes: spelling, better comments and variable …
…names

Test Plan: N/A

Reviewers: maykov, hermanlee4, yoshinorim, jonahcohen

Reviewed By: jonahcohen

Differential Revision: https://reviews.facebook.net/D33453
8d664e0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.