Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: non-uniform hash-sharded index with power-of-2 number of buckets #91109

Closed
michae2 opened this issue Nov 2, 2022 · 7 comments · Fixed by #109374
Closed

sql: non-uniform hash-sharded index with power-of-2 number of buckets #91109

michae2 opened this issue Nov 2, 2022 · 7 comments · Fixed by #109374
Assignees
Labels
A-hash-sharding Hash-sharded indexes C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@michae2
Copy link
Collaborator

michae2 commented Nov 2, 2022

Looks like the hash bucket calculation mod(fnv32(crdb_internal.datums_to_bytes(x, y)), z) is poorly distributed when x and y are INT8 columns with equal values, and z is a power of 2. For example:

CREATE TABLE ijk (i, j, k) AS SELECT x, x, x FROM generate_series(0, 1048575) s (x);
CREATE INDEX ON ijk (i) USING HASH WITH BUCKET_COUNT = 8;
CREATE INDEX ON ijk (i, j) USING HASH WITH BUCKET_COUNT = 8;
CREATE INDEX ON ijk (i, j, k) USING HASH WITH BUCKET_COUNT = 8;
SELECT crdb_internal_i_shard_8, COUNT(crdb_internal_i_shard_8) FROM ijk GROUP BY 1 ORDER BY 1;
SELECT crdb_internal_i_j_shard_8, COUNT(crdb_internal_i_j_shard_8) FROM ijk GROUP BY 1 ORDER BY 1;
SELECT crdb_internal_i_j_k_shard_8, COUNT(crdb_internal_i_j_k_shard_8) FROM ijk GROUP BY 1 ORDER BY 1;

Gives:

demo@127.0.0.1:26257/defaultdb> SELECT crdb_internal_i_shard_8, COUNT(crdb_internal_i_shard_8) FROM ijk GROUP BY 1 ORDER BY 1;
  crdb_internal_i_shard_8 | count
--------------------------+---------
                        0 | 131071
                        1 | 131071
                        2 | 131072
                        3 | 131072
                        4 | 131073
                        5 | 131073
                        6 | 131072
                        7 | 131072
(8 rows)


Time: 770ms total (execution 770ms / network 0ms)

demo@127.0.0.1:26257/defaultdb> SELECT crdb_internal_i_j_shard_8, COUNT(crdb_internal_i_j_shard_8) FROM ijk GROUP BY 1 ORDER BY 1;
  crdb_internal_i_j_shard_8 | count
----------------------------+---------
                          1 | 376905
                          3 |  16348
                          5 | 638976
                          7 |  16347
(4 rows)


Time: 836ms total (execution 835ms / network 0ms)

demo@127.0.0.1:26257/defaultdb> SELECT crdb_internal_i_j_k_shard_8, COUNT(crdb_internal_i_j_k_shard_8) FROM ijk GROUP BY 1 ORDER BY 1;
  crdb_internal_i_j_k_shard_8 | count
------------------------------+---------
                            0 | 131073
                            1 | 131071
                            2 | 131071
                            3 | 131072
                            4 | 131072
                            5 | 131073
                            6 | 131072
                            7 | 131072
(8 rows)


Time: 976ms total (execution 976ms / network 0ms)

The hash-sharded index is still unevenly distributed with 16 buckets:

demo@127.0.0.1:26257/defaultdb> CREATE INDEX ij16 ON ijk (i, j) USING HASH WITH BUCKET_COUNT = 16;
CREATE INDEX


Time: 5.629s total (execution 5.629s / network 0.000s)

demo@127.0.0.1:26257/defaultdb> SELECT crdb_internal_i_j_shard_16, COUNT(crdb_internal_i_j_shard_16) FROM ijk GROUP BY 1 ORDER BY 1;
  crdb_internal_i_j_shard_16 | count
-----------------------------+---------
                           1 | 184357
                           3 |  12224
                           5 | 487461
                           7 |   8174
                           9 | 192548
                          11 |   4124
                          13 | 151515
                          15 |   8173
(8 rows)


Time: 819ms total (execution 819ms / network 0ms)

With 14 it's evenly distributed across half the buckets:

demo@127.0.0.1:26257/defaultdb> CREATE INDEX ij14 ON ijk (i, j) USING HASH WITH BUCKET_COUNT = 14;
CREATE INDEX


Time: 6.913s total (execution 6.912s / network 0.000s)

demo@127.0.0.1:26257/defaultdb> SELECT crdb_internal_i_j_shard_14, COUNT(crdb_internal_i_j_shard_14) FROM ijk GROUP BY 1 ORDER BY 1;
  crdb_internal_i_j_shard_14 | count
-----------------------------+---------
                           1 | 149778
                           3 | 150045
                           5 | 149398
                           7 | 149531
                           9 | 150521
                          11 | 149379
                          13 | 149924
(7 rows)


Time: 846ms total (execution 846ms / network 0ms)

With a prime number of buckets it's fine:

demo@127.0.0.1:26257/defaultdb> CREATE INDEX ij17 ON ijk (i, j) USING HASH WITH BUCKET_COUNT = 17;
CREATE INDEX


Time: 6.533s total (execution 6.532s / network 0.000s)

demo@127.0.0.1:26257/defaultdb> SELECT crdb_internal_i_j_shard_17, COUNT(crdb_internal_i_j_shard_17) FROM ijk GROUP BY 1 ORDER BY 1;
  crdb_internal_i_j_shard_17 | count
-----------------------------+--------
                           0 | 61416
                           1 | 61852
                           2 | 61537
                           3 | 61534
                           4 | 61605
                           5 | 62347
                           6 | 61745
                           7 | 61611
                           8 | 61527
                           9 | 61494
                          10 | 62198
                          11 | 61752
                          12 | 61749
                          13 | 61353
                          14 | 61502
                          15 | 61921
                          16 | 61433
(17 rows)


Time: 849ms total (execution 849ms / network 0ms)

This example is (admittedly) contrived, but there could be other cases like this. Maybe we should default to creating a prime number of buckets?

Jira issue: CRDB-21107

Epic CRDB-27601

@michae2 michae2 added C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-sql-schema-deprecated Use T-sql-foundations instead T-sql-queries SQL Queries Team labels Nov 2, 2022
@blathers-crl blathers-crl bot added this to Triage in SQL Foundations Nov 2, 2022
@blathers-crl blathers-crl bot added this to Triage in SQL Queries Nov 2, 2022
@michae2 michae2 added the E-quick-win Likely to be a quick win for someone experienced. label Nov 2, 2022
michae2 added a commit to michae2/cockroach that referenced this issue Nov 2, 2022
We've discovered a case where non-random data and a power-of-2 number of
buckets cause an uneven distribution of data among buckets in a hash-
sharded index. There's an old trick of using a prime number of buckets
in hash tables to avoid this sort of unfortunate clustering. Let's use
that.

Fixes: cockroachdb#91109

Epic: None

Release note (performance improvement): We now recommend using a prime
number of buckets in hash-sharded indexes to increase the chance of an
even distribution of data among buckets. The default value of
`sql.defaults.default_hash_sharded_index_bucket_count` has been changed
to a prime number.
@ajwerner
Copy link
Contributor

ajwerner commented Nov 2, 2022

From a prescient @RaduBerinde

What feels wrong is that we do mod 8, so we only look at the last few bits of the fnv32, which is not great for a non-cryptographic hash. Using a prime number for the buckets would fix that. I think we could also mod against a bigger prime number before modding down to 8, or shuffle the bits somehow.

I think this is a good idea. The question is how to pick a prime that's larger than the bucket count to mod with first. If we had a limit on the number of buckets, which I'm sure we should have but I'm less sure we do have, we could just do the smallest prime larger than that limit.

michae2 added a commit to michae2/cockroach that referenced this issue Nov 2, 2022
We've discovered a case where a combination of non-random data and a
power-of-two number of buckets causes an uneven distribution of rows
in a hash-sharded index. While this specific case is very contrived, it
illustrates a small weakness in the current hash-shard calculation:
modulo by a power-of-two number of buckets only uses the last few bits
of the hash value.

Radu suggested this would be a problem in cockroachdb#67865 and also suggested a
fix: add an intermediate modulo by a larger prime before modulo by num
buckets, so we'll try that.

Fixes: cockroachdb#91109

Epic: None

Release note (performance improvement): This change updates the shard
calculation of newly-created hash-sharded indexes so that uneven
distributions of rows are less likely.
@ajwerner
Copy link
Contributor

ajwerner commented Nov 2, 2022

The thing that's weird about your input is that it has a lot of repetition of bytes because the values are all the same for every tuple. I don't know how often that comes up. If you did this:

CREATE TABLE ijk (i, j, k) AS SELECT * FROM generate_series(0, 100), generate_series(0, 100), generate_series(0, 100) ;
CREATE INDEX ON ijk (i) USING HASH WITH BUCKET_COUNT = 8;
CREATE INDEX ON ijk (i, j) USING HASH WITH BUCKET_COUNT = 8;
CREATE INDEX ON ijk (i, j, k) USING HASH WITH BUCKET_COUNT = 8;
SELECT crdb_internal_i_shard_8, COUNT(crdb_internal_i_shard_8) FROM ijk GROUP BY 1 ORDER BY 1;
SELECT crdb_internal_i_j_shard_8, COUNT(crdb_internal_i_j_shard_8) FROM ijk GROUP BY 1 ORDER BY 1;
SELECT crdb_internal_i_j_k_shard_8, COUNT(crdb_internal_i_j_k_shard_8) FROM ijk GROUP BY 1 ORDER BY 1;

Then you get a perfect distribution.

@ajwerner
Copy link
Contributor

ajwerner commented Nov 8, 2022

Deep down, I think what's going on here is that the bitstring is the same if the column values are the same. This is not a real problem in practice. I'm going to leave this around as documentation but am closing as won't fix.

@postamar postamar moved this from Triage to Cold storage in SQL Foundations Nov 8, 2022
@ajwerner ajwerner closed this as completed Nov 8, 2022
SQL Queries automation moved this from Triage to Done Nov 8, 2022
SQL Foundations automation moved this from Cold storage to Done Nov 8, 2022
@exalate-issue-sync exalate-issue-sync bot removed the T-sql-queries SQL Queries Team label Nov 8, 2022
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
@michae2
Copy link
Collaborator Author

michae2 commented Jul 21, 2023

@odessit55 found another case using DECIMAL:

CREATE TABLE products (
  ts DECIMAL NOT NULL PRIMARY KEY USING HASH WITH (bucket_count=16),
  product_id INT8
);

INSERT INTO products SELECT generate_series(0, 1023), 101;

Which shows a similar problem:

demo@127.0.0.1:26257/demoapp/defaultdb> SELECT crdb_internal_ts_shard_16 AS shard, COUNT(*) FROM products GROUP BY 1 ORDER BY 1;
  shard | count
--------+--------
      0 |     1
      1 |   127
      2 |     1
      3 |   126
      4 |     1
      5 |   126
      6 |     1
      7 |   127
      8 |     3
      9 |   126
     10 |     2
     11 |   128
     12 |     1
     13 |   127
     14 |     1
     15 |   126
(16 rows)

Time: 3ms total (execution 3ms / network 0ms)

Even if we expect real-world data to be better distributed, it doesn't look great when simple user tests result in non-uniform distributions.

@michae2 michae2 reopened this Jul 21, 2023
SQL Foundations automation moved this from Done to Triage Jul 21, 2023
@mgartner mgartner moved this from Done to New Backlog in SQL Queries Jul 21, 2023
@michae2 michae2 changed the title sql: non-uniform hash-sharded index on two equal INT8 columns with power-of-2 number of buckets sql: non-uniform hash-sharded index with power-of-2 number of buckets Jul 25, 2023
@michae2 michae2 added A-hash-sharding Hash-sharded indexes and removed E-quick-win Likely to be a quick win for someone experienced. labels Jul 25, 2023
@rafiss
Copy link
Collaborator

rafiss commented Aug 22, 2023

@michae2 in that most recent example, the bucket count was specified manually, so it seems this issue is going beyond just changing the default bucket count. Do you think we should change the hashing function?

@michae2
Copy link
Collaborator Author

michae2 commented Aug 22, 2023

@michae2 in that most recent example, the bucket count was specified manually, so it seems this issue is going beyond just changing the default bucket count. Do you think we should change the hashing function?

Yes, I think we should.

@rafiss
Copy link
Collaborator

rafiss commented Aug 23, 2023

One idea is to use mod(fnv32(md5(crdb_internal.datums_to_bytes(columns))), bucket_count). In that example:

> create table t(a decimal primary key, b int as (mod(fnv32(md5(crdb_internal.datums_to_bytes(a))), 16)) virtual);

> INSERT INTO t SELECT generate_series(0, 1023);

> SELECT b AS shard, COUNT(*) FROM t GROUP BY 1 ORDER BY 1;
  shard | count
--------+--------
      0 |    51
      1 |    54
      2 |    65
      3 |    68
      4 |    52
      5 |    67
      6 |    58
      7 |    67
      8 |    74
      9 |    73
     10 |    73
     11 |    56
     12 |    72
     13 |    63
     14 |    64
     15 |    67
(16 rows)

@rafiss rafiss self-assigned this Aug 23, 2023
@craig craig bot closed this as completed in 9d3115d Sep 8, 2023
SQL Foundations automation moved this from Triage to Done Sep 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-hash-sharding Hash-sharded indexes C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
Archived in project
SQL Queries
New Backlog
3 participants