sql/sem/builtins: add crdb_internal.datums_to_bytes, use in hash sharding #67865
Conversation
Force-pushed from 649b2e4 to 667a412
@RaduBerinde this PR has the changes. Ideally that big select in
Nice!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/sql/create_table.go, line 2716 at r2 (raw file):
}

func makeHashShardComputeExpr(colNames []string, buckets int) *string {
[nit] probably on your TODO list but this should have a comment with sample expressions
pkg/sql/create_table.go, line 2752 at r2 (raw file):
}
}

// Construct an expression which is the sum of all of the casted and hashed
[nit] comment needs updating ("casted")
pkg/sql/create_table.go, line 2761 at r2 (raw file):
expr = &tree.BinaryExpr{
	Left:     modBuckets(hashedColumnExpr(i)),
	Operator: tree.MakeBinaryOperator(tree.Plus),
Would it be easier to just do fnv32(encode_key(col1, col2, col3)) % buckets?
pkg/sql/opt/memo/logical_props_builder.go, line 2644 at r2 (raw file):
// TODO(ajwerner,RaduBerinde): Deal with the cases like
// `crdb_internal.key_encode(f4+1)`.
if funcExpr, ok := e.(*FunctionExpr); ok && funcExpr.Properties.CompositeInsensitive {
This can be simplified to just make sure that checkTypes is false below when funcExpr.Properties.CompositeInsensitive = true. The code below already checks that the inputs are not sensitive.
pkg/sql/sem/builtins/builtins.go, line 3548 at r1 (raw file):
},
Volatility: tree.VolatilityImmutable,
Info: "Converts datums into key-encoded bytes. " +
Not sure if it should be here or in a comment above the built-in but we should mention the property key_encode(a) == key_encode(b) iff (a IS NOT DISTINCT FROM b)
pkg/sql/sem/builtins/key_encode_builtin_test.go, line 1 at r1 (raw file):
package builtins_test
[nit] missing header
pkg/sql/sem/builtins/key_encode_builtin_test.go, line 24 at r1 (raw file):
// can cross product them all and we get different values which align to the uniformity
// expectations.
func TestCrdbInternalKeyEncode(t *testing.T) {
I don't quite get what this test is trying to achieve, it seems overly complicated. Is it essentially just cross-checking the distinct count against the distinct key count? Wouldn't we get the same or better by generating a random values clause and doing something like
with v as (values (1),(2),(3))
select (select count(distinct v.*) from v) - (select count(distinct key_encode(v.*)) from v);
pkg/sql/sem/builtins/key_encode_builtin_test.go, line 73 at r1 (raw file):
// Generate a query to cross join all of the values of the two rows
// with also the column ordinal between them.
crossJoinQuery := func() string {
[nit] would help to show a sample query
pkg/sql/sem/tree/function_definition.go, line 108 at r2 (raw file):
HasSequenceArguments bool

// CompositeInsensitive indicates that this function does not care about
CompositeInsensitive indicates that this function returns equal results when evaluated on equal inputs. This is a non-trivial property for composite types which can be equal but not identical (e.g. decimals 1.0 and 1.00). For example, converting a decimal to string is not CompositeInsensitive.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @RaduBerinde)
pkg/sql/create_table.go, line 2761 at r2 (raw file):
Previously, RaduBerinde wrote…
Would it be easier to just do fnv32(encode_key(col1, col2, col3)) % buckets?
I was looking at the distribution of values and was startled by it, which led me down a small rabbit hole. Consider the test I wrote with the big cross join of two rows. All of the rows end up being unique and constructing unique keys using encode_key. What startled me is that it seemed that no matter how I ended up randomizing the inputs, the bucket counts were perfectly equal. That alarmed me. It got me thinking that fnv32 must have strong correlations that are not being taken into account.
To make this concrete, let's imagine the following for the one non-nil row
tdb.Exec(t, `INSERT
INTO t
SELECT mod(i, 1<<15)::INT2,
i::INT4,
i::INT8,
i::FLOAT4,
i::FLOAT8,
i::STRING,
i::CHAR,
i::STRING::BYTES,
i::DECIMAL,
i::INTERVAL,
i::OID,
i::TIMESTAMPTZ,
i::TIMESTAMP,
i::DATE,
('127.0.0.' || mod(i, 256)::STRING)::INET,
i::VARBIT,
ARRAY[i::STRING],
ARRAY[i]
FROM (
SELECT $1::INT AS i
);`, rand.Intn(math.MaxInt32))
If I do this query with %s as the cross join (including or not! the column ordinals), I always find the following distribution
SELECT k, count(k) AS c
FROM (
SELECT mod(fnv32(crdb_internal.key_encode(input.*)), 8) AS k
FROM (%s) AS input
)
GROUP BY k
ORDER BY k
0, 32768
1, 32768
2, 32768
3, 32768
4, 32768
5, 32768
6, 32768
7, 32768
That scared me; I'd expect it to be independent. In fact, most formulations that don't involve the previous step in the hash of the next step don't end up being independent (the current formula is just as broken). If you use fnv64, it's a bit better but still not perfect. If you use sha256 and then fnv32, you get something that feels truly independent. Also, if you incorporate the result of the previous step in the current step, then it feels independent. I don't know the right tradeoff or the math here.
So, here's what I'm thinking for the two options:
SELECT mod(fnv32(sha256(crdb_internal.key_encode(a.i2, 0, b.i4, 1, c.i8, 2, d.f4, 3, e.f8, 4, f.s, 5, g.c, 6, h.b, 7, i.dc, 8, j.ival, 9, k.oid, 10, l.tstz, 11, m.ts, 12, n.da, 13, o.inet, 14, p.vb, 15, q.sa, 16, r.ia, 17))), 8)
FROM t a, t b, t c, t d, t e, t f, t g, t h, t i, t j, t k, t l, t m, t n, t o, t p, t q, t r;
SELECT
*
FROM
(
SELECT
mod(
fnv32(
crdb_internal.key_encode(
a.i2,
0,
fnv32(
crdb_internal.key_encode(
b.i4,
1,
fnv32(
crdb_internal.key_encode(
c.i8,
2,
fnv32(
crdb_internal.key_encode(
d.f4,
3,
fnv32(
crdb_internal.key_encode(
e.f8,
4,
fnv32(
crdb_internal.key_encode(
f.s,
5,
fnv32(
crdb_internal.key_encode(
g.c,
6,
fnv32(
crdb_internal.key_encode(
h.b,
7,
fnv32(
crdb_internal.key_encode(
i.dc,
8,
fnv32(
crdb_internal.key_encode(
j.ival,
9,
fnv32(
crdb_internal.key_encode(
k.oid,
10,
fnv32(
crdb_internal.key_encode(
l.tstz,
11,
fnv32(
crdb_internal.key_encode(
m.ts,
12,
fnv32(
crdb_internal.key_encode(
n.da,
13,
fnv32(
crdb_internal.key_encode(
o.inet,
14,
fnv32(crdb_internal.key_encode(p.vb, 15, fnv32(crdb_internal.key_encode(q.sa, 16, fnv32(crdb_internal.key_encode(r.ia, 17))))))
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
),
8
)
FROM
t AS a, t AS b, t AS c, t AS d, t AS e, t AS f, t AS g, t AS h, t AS i, t AS j, t AS k, t AS l, t AS m, t AS n, t AS o, t AS p, t AS q, t AS r
);
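To see the effect being discussed without a database, here is a minimal Go sketch of the same experiment, bucketing 32-bit FNV-1 hash values mod 8 with the standard library's hash/fnv. The string formatting below stands in for the real key encoding and is purely an illustrative assumption:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// fnv32Bucket hashes an encoded key with 32-bit FNV-1 and reduces it modulo
// the bucket count, mirroring mod(fnv32(crdb_internal.key_encode(...)), 8).
func fnv32Bucket(encoded []byte, buckets uint32) uint32 {
	h := fnv.New32()
	h.Write(encoded) // fnv's Write never returns a non-nil error
	return h.Sum32() % buckets
}

func main() {
	// Bucket 1<<15 synthetic "rows". With a power-of-two bucket count,
	// only the low bits of the hash decide the shard, which is where the
	// suspicious regularities come from.
	counts := make([]int, 8)
	for i := 0; i < 1<<15; i++ {
		encoded := []byte(fmt.Sprintf("row-%d", i))
		counts[fnv32Bucket(encoded, 8)]++
	}
	fmt.Println(counts)
}
```

Swapping in structured inputs (like the cross-joined rows above) is what exposes how aligned the low bits can be.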
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/sql/create_table.go, line 2761 at r2 (raw file):
Previously, ajwerner wrote…
In that example, we have two random rows in t (generated with that Exec statement) and we're mixing and matching all combinations?
When you say "That scared me, I'd expect it to be independent", what do you mean? Just that the counts shouldn't look so perfect?
What feels wrong is that we do mod 8, so we only look at the last few bits of the fnv32, which is not great for a non-cryptographic hash. Using a prime number for the buckets would fix that. I think we could also mod against a bigger prime number before modding down to 8, or shuffle the bits somehow.
I think this can be a problem even for a single column, though I can't find an example.
My concern with your first suggestion is that sha is fairly expensive to calculate. And the second suggestion feels like it's obfuscating the underlying problem but not solving it entirely.
pkg/sql/create_table.go, line 2761 at r2 (raw file): Previously, RaduBerinde wrote…
I guess this example kind-of illustrates your finding:
Note how changing the last value doesn't change the bucket, even though the overall fnv value changes. I think it's probably possible that, given the very specific way in which these rows are constructed, for any row you can find 32768 other rows where the values that are different don't affect the lower 8 bits of the fnv. Or something along those lines. Not sure how much of an alarm this whole thing is.. It would be more alarming if we could find data where the buckets are super skewed.
pkg/sql/create_table.go, line 2761 at r2 (raw file): Previously, RaduBerinde wrote…
*lower 3 bits of the fnv
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @RaduBerinde)
pkg/sql/create_table.go, line 2761 at r2 (raw file):
Previously, RaduBerinde wrote…
*lower 3 bits of the fnv
Given all of this and efficiency considerations, I'm thinking about just adding a different builtin which takes the datums and hashes them but chains the hash values together to make things more independent. Would such a function upset you? I can also be convinced to stop caring.
h := fnv.New32()
var buf []byte
for i, datum := range args {
	buf = buf[:0]
	var err error
	buf, err = rowenc.EncodeTableKey(buf, datum, encoding.Ascending)
	if err != nil {
		return err
	}
	if _, err = h.Write(buf); err != nil {
		return err
	}
	var scratch [binary.MaxVarintLen64]byte
	// Hash the index value to make sequences of the same value independent.
	n := binary.PutUvarint(scratch[:], uint64(i))
	if _, err = h.Write(scratch[:n]); err != nil {
		return err
	}
	// Hash the current hash value to increase the data independence.
	n = binary.PutUvarint(scratch[:], uint64(h.Sum32()))
	if _, err = h.Write(scratch[:n]); err != nil {
		return err
	}
}
pkg/sql/opt/memo/logical_props_builder.go, line 2644 at r2 (raw file):
Previously, RaduBerinde wrote…
This can be simplified to just make sure that checkTypes is false below when funcExpr.Properties.CompositeInsensitive = true. The code below already checks that the inputs are not sensitive.
I don't think that's quite true. The Child(0) of FunctionExpr is its arg list. I think I could do the following but something about it felt icky at the time.
checkTypes := !opt.IsCompositeInsensitiveOp(e)
if funcExpr, ok := e.(*FunctionExpr); ok && funcExpr.Properties.CompositeInsensitive {
checkTypes = false
e = funcExpr.Args
}
pkg/sql/sem/builtins/key_encode_builtin_test.go, line 24 at r1 (raw file):
Previously, RaduBerinde wrote…
I don't quite get what this test is trying to achieve, it seems overly complicated. Is it essentially just cross-checking the distinct count against the distinct key count? Wouldn't we get the same or better by generating a random values clause and doing something like
with v as (values (1),(2),(3)) select (select count(distinct v.*) from v) - (select count(distinct key_encode(v.*)) from v);
Sure, I was mostly just testing that all the data types worked and that NULLs worked, and then got too clever. I'll take another pass.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @RaduBerinde)
pkg/sql/create_table.go, line 2761 at r2 (raw file):
Previously, ajwerner wrote…
In general I'm opposed to coming up with this kind of one-off supposed improvements, unless it is following some standard practice or has clear math/reasoning behind it. It's a bit like trying to come up with your own security scheme.
Passing the current hash value through the hasher makes no sense to me. What "data independence" does it increase? It's mathematically equivalent to applying this obscure, static uint32->uint32 function to the current hash. At best, it's a bijective function and you effectively did nothing. At worst, it's collapsing different hashes into the same hash, making things worse.
Also don't see a benefit to hashing the index value (given that the concatenated key is such that you can split it into pieces without ambiguity). I don't know what "make sequences of the same value independent" means.
To me the legit concept is that we take what would be the key for a non-sharded index and hash that. It doesn't get cleaner than that conceptually - it's equivalent to having a hash-based KV store, which is what this is trying to simulate in the first place. I have no objection to a built-in that does that (to save on holding the entire key in memory).
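The "encode the whole would-be index key, then hash it once" idea can be sketched in self-contained Go. Here encodeDatum is a hypothetical length-prefixed stand-in for CockroachDB's real key encoding (rowenc.EncodeTableKey); only the overall shape — one unambiguous buffer, one hash — is the point:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// encodeDatum appends a length-prefixed copy of the value, so the
// concatenation can be split back into columns without ambiguity.
// (Hypothetical stand-in for the real key encoding.)
func encodeDatum(buf, datum []byte) []byte {
	var lenPrefix [4]byte
	binary.BigEndian.PutUint32(lenPrefix[:], uint32(len(datum)))
	buf = append(buf, lenPrefix[:]...)
	return append(buf, datum...)
}

// hashKey key-encodes all datums into one buffer and hashes that buffer
// once, rather than chaining per-column hash values together.
func hashKey(datums ...[]byte) uint32 {
	var buf []byte
	for _, d := range datums {
		buf = encodeDatum(buf, d)
	}
	h := fnv.New32()
	h.Write(buf)
	return h.Sum32()
}

func main() {
	// Because of the length prefixes, ("a","bc") and ("ab","c") encode to
	// different buffers, so they hash as different keys.
	fmt.Println(hashKey([]byte("a"), []byte("bc")), hashKey([]byte("ab"), []byte("c")))
}
```

A streaming builtin could feed the same encoded chunks into the hasher incrementally to avoid holding the entire key in memory.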
pkg/sql/opt/memo/logical_props_builder.go, line 2644 at r2 (raw file): Previously, ajwerner wrote…
So? It's a recursive function..
pkg/sql/opt/memo/logical_props_builder.go, line 2644 at r2 (raw file): Previously, RaduBerinde wrote…
Ah, maybe ScalarListOp is not marked as insensitive, we should fix that
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @RaduBerinde)
pkg/sql/create_table.go, line 2761 at r2 (raw file):
Previously, RaduBerinde wrote…
I don't have a formalism for the "independence", so I'll stop worrying about it. Anecdotally, I observe that if you feed the hash value back in, two rows seem to no longer be related. I've been convinced that I don't care about that. I also don't care much about the buffering. Thanks for all the engagement; I've been convinced to stop caring.
pkg/sql/opt/memo/logical_props_builder.go, line 2644 at r2 (raw file):
Previously, RaduBerinde wrote…
Ah, maybe ScalarListOp is not marked as insensitive, we should fix that
The value of checkTypes is not passed in recursively (for good reason). The ScalarListExpr that is Args will have its types checked, right?
pkg/sql/opt/memo/logical_props_builder.go, line 2644 at r2 (raw file): Previously, ajwerner wrote…
Not if it (the ScalarListOp) is marked as an insensitive op. We are checking that all paths to variable leaves only pass through composite-insensitive ops.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @RaduBerinde)
pkg/sql/opt/memo/logical_props_builder.go, line 2644 at r2 (raw file):
Previously, RaduBerinde wrote…
Not if it (the ScalarListOp) is marked as an insensitive op. We are checking that all paths to variable leaves only pass through composite-insensitive ops.
If ScalarListOp were marked as an insensitive op, then wouldn't that mean that function invocations would be treated as insensitive? At that point anything related to the function properties would be irrelevant.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @RaduBerinde)
pkg/sql/opt/memo/logical_props_builder.go, line 2644 at r2 (raw file):
Previously, ajwerner wrote…
If ScalarListOp were marked as an insensitive op, then wouldn't that mean that function invocations would be treated as insensitive? At that point anything related to the function properties would be irrelevant.
An example of a function that, I think, should be composite sensitive is json_build_object. I think that thing already has some problems:
demo@127.0.0.1:52608/movr> SELECT json_build_object('a', -0.0);
json_build_object
---------------------
{"a": 0.0}
(1 row)
Time: 0ms total (execution 0ms / network 0ms)
demo@127.0.0.1:52608/movr> SELECT json_build_object('a', -0);
json_build_object
---------------------
{"a": 0}
(1 row)
demo@127.0.0.1:52608/movr> SELECT (json_build_object('a', '-0.0'::JSON)->'a')::FLOAT;
float8
----------
-0
(1 row)
pkg/sql/opt/memo/logical_props_builder.go, line 2644 at r2 (raw file): Previously, ajwerner wrote…
ScalarListOp is "under" the function. The function would be checked separately. I believe that what you're discovering above is that you can't simply use -0 in a statement to create a -0 float (and I believe that's consistent with pg).
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @RaduBerinde)
pkg/sql/opt/memo/logical_props_builder.go, line 2644 at r2 (raw file):
Previously, RaduBerinde wrote…
ScalarListOp is "under" the function. The function would be checked separately.
I believe that what you're discovering above is that you can't simply use -0 in a statement to create a -0 float (and I believe that's consistent with pg).
Sorry for the confusion. What I was missing was that getOuterCols was also recursive. I think that makes this check N^2 in the depth of the expression, but yes, if we also mark ScalarListOp as IsCompositeInsensitiveOp then it all works out. Thanks!
pkg/sql/opt/memo/logical_props_builder.go, line 2644 at r2 (raw file): Previously, ajwerner wrote…
Good point, maybe leave a TODO in there for the quadratic behavior.
I noticed there already is a
Force-pushed from 4279e15 to ca026e9
Looking good!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @RaduBerinde)
pkg/sql/create_table.go, line 2569 at r3 (raw file):
// makeShardColumnDesc returns a new column descriptor for a hidden computed shard column
// based on all the `colNames`.
func makeShardColumnDesc(
[nit] this deserves a comment explaining the argument, maybe pointing to the two make*ComputeExpr for details.
pkg/sql/create_table.go, line 2625 at r3 (raw file):
Exprs: tree.Exprs{
	expr,
	&tree.CastExpr{
Is this cast necessary? Maybe we can just use tree.AdjustValueToType instead? Though I think we should error out much earlier if the value is too large.
Force-pushed from b57a6bc to 94587d9
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)
pkg/sql/create_table.go, line 2716 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] probably on your TODO list but this should have a comment with sample expressions
Done.
pkg/sql/create_table.go, line 2569 at r3 (raw file):
Previously, RaduBerinde wrote…
[nit] this deserves a comment explaining the argument, maybe pointing to the two make*ComputeExpr for details.
Done.
pkg/sql/create_table.go, line 2625 at r3 (raw file):
Previously, RaduBerinde wrote…
Is this cast necessary? Maybe we can just use tree.AdjustValueToType instead? Though I think we should error out much earlier if the value is too large.
Removed. Also added a check and test.
pkg/sql/opt/memo/logical_props_builder.go, line 2644 at r2 (raw file):
Previously, RaduBerinde wrote…
Good point, maybe leave a TODO in there for the quadratic behavior.
This is now fixed 🎉
pkg/sql/sem/tree/function_definition.go, line 108 at r2 (raw file):
Previously, RaduBerinde wrote…
CompositeInsensitive indicates that this function returns equal results when evaluated on equal inputs. This is a non-trivial property for composite types which can be equal but not identical (e.g. decimals 1.0 and 1.00). For example, converting a decimal to string is not CompositeInsensitive.
Removed as it's not needed anymore.
pkg/sql/sem/builtins/key_encode_builtin_test.go, line 1 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] missing header
Done.
pkg/sql/sem/builtins/key_encode_builtin_test.go, line 24 at r1 (raw file):
Previously, ajwerner wrote…
Sure, I was mostly just testing that all the data types worked and that they NULLs worked and then got too clever. I'll take another pass.
Cleaned it up roughly as you suggested.
pkg/sql/sem/builtins/key_encode_builtin_test.go, line 73 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] would help to show a sample query
Done.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)
pkg/sql/sem/tree/function_definition.go, line 108 at r2 (raw file):
Previously, ajwerner wrote…
Removed as it's not needed anymore.
Hm, how come? I don't think we can infer a hash value for composite types without it..
pkg/sql/sem/tree/function_definition.go, line 108 at r2 (raw file): Previously, RaduBerinde wrote…
I think you might have encountered a bug. You made the scalar list composite insensitive and we're probably hitting this code:
The data type of scalar list is
Force-pushed from 4b9db56 to a6ca290
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)
pkg/sql/sem/tree/function_definition.go, line 108 at r2 (raw file):
Previously, RaduBerinde wrote…
I think you might have encountered a bug. You made the scalar list composite insensitive and we're probably hitting this code:
if opt.IsCompositeInsensitiveOp(e) {
	// The operator is known to be composite-insensitive. If its result is a
	// non-composite type, it is also composite-identical.
	return true, !colinfo.HasCompositeKeyEncoding(e.(opt.ScalarExpr).DataType())
}
The data type of scalar list is Any and it's not handled. There are others like Tuple that aren't handled. We should change HasCompositeKeyEncoding to CanHaveCompositeKeyEncoding and have the default return value for unhandled types be true.
Alrighty, it's back and it's now confirmed needed. See how you feel about the new logic for CanHaveCompositeKeyEncoding.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)
pkg/sql/sem/tree/function_definition.go, line 108 at r2 (raw file):
Previously, ajwerner wrote…
Alrighty, it's back and it's now confirmed needed. See how you feel about the new logic for CanHaveCompositeKeyEncoding.
👍 👍 looks great!
Force-pushed from 5c08eab to e21e00e
@RaduBerinde anything else you want on this?
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)
Force-pushed from e21e00e to 7bdae14
This patch also fixes the behavior to properly categorize types like any and unknown. It does so by adding all of the families and setting the default to be true instead of the prior false. Release note: None
This new builtin function encodes datums using the key encoding. It is useful in particular because it is a function with immutable volatility from a large number of data types to bytes. Release note (sql change): Added a builtin function, crdb_internal.datums_to_bytes, which can encode any data type which can be used in a forward index key into bytes in an immutable way. This function is now used in the expression for hash sharded indexes.
Force-pushed from 7bdae14 to f172756
bors r+
Build succeeded:
We've discovered a case where a combination of non-random data and a power-of-two number of buckets causes an uneven distribution of rows in a hash-sharded index. While this specific case is very contrived, it illustrates a small weakness in the current hash-shard calculation: modulo by a power-of-two number of buckets only uses the last few bits of the hash value. Radu suggested this would be a problem in cockroachdb#67865 and also suggested a fix: add an intermediate modulo by a larger prime before modulo by num buckets, so we'll try that. Fixes: cockroachdb#91109 Epic: None Release note (performance improvement): This change updates the shard calculation of newly-created hash-sharded indexes so that uneven distributions of rows are less likely.
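That intermediate-prime fix can be sketched in a few lines of Go; the specific prime 1000003 here is an illustrative assumption, not necessarily the constant the actual patch uses:

```go
package main

import "fmt"

// shardBucket reduces a 32-bit hash modulo a large prime before reducing
// modulo the bucket count. A direct modulo by a power-of-two bucket count
// looks only at the low bits of the hash; the prime step breaks up input
// regularities that are aligned to powers of two.
func shardBucket(h uint32, buckets uint32) uint32 {
	const largePrime = 1000003 // illustrative choice of prime
	return (h % largePrime) % buckets
}

func main() {
	fmt.Println(shardBucket(0, 8), shardBucket(5, 8), shardBucket(1000003, 8)) // prints: 0 5 0
}
```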
This new builtin function encodes datums using the key encoding. It is useful
in particular because it is a function with immutable volatility from a large
number of data types to bytes.
The function is then adopted as the default expression for hash sharded indexes if the version is new enough.
Fixes #67170.
Release note (sql change): Added a builtin function,
crdb_internal.datums_to_bytes, which can encode any data type which can be
used in a forward index key into bytes in an immutable way. This function
is now used in the expression for hash sharded indexes.