
ttljob: fix job to handle composite PKs #116988

Merged
1 commit merged into cockroachdb:master on Jan 8, 2024

Conversation

@rafiss rafiss commented Dec 22, 2023

Certain types, like decimals and collated strings, store their contents in the value of the encoded KeyValue. The ttljob code that decodes the span bounds into datums for the purpose of creating SQL query bounds previously did not take this into account.

fixes #116845
Release note (bug fix): Fixed a bug in the row-level TTL job that would cause it to skip expired rows if the primary key of the table included columns of the collated string or decimal type.

@rafiss rafiss added backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. labels Dec 22, 2023
@cockroach-teamcity commented:

This change is Reviewable

@rafiss rafiss force-pushed the fix-ttl-composite-key-decoding branch 3 times, most recently from 0a095c7 to 99bd654 Compare December 22, 2023 19:02
@rafiss rafiss marked this pull request as ready for review December 23, 2023 15:29
@rafiss rafiss requested review from a team as code owners December 23, 2023 15:29
@rafiss rafiss requested review from DrewKimball and removed request for a team December 23, 2023 15:29
@DrewKimball DrewKimball left a comment

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)


pkg/sql/rowenc/index_encoding.go line 466 at r1 (raw file):

	if kvVal := keyValue.Value; kvVal != nil && kvVal.GetTag() == roachpb.ValueType_TUPLE {
		valueBytes, err := kvVal.GetTuple()

I believe we can rely on the value having a Tuple encoding for column families that include a primary key column, so skipping non-tuple encodings should be fine (although a comment explaining why would be nice).

The one case I can see where the encoding might be Bytes instead is for a temporary index created during a backfill, but I'm guessing/hoping the TTL code doesn't interact with that case? E.g. what happens if a primary key is added to a table with row-level TTL enabled? My reading is that this temporary index is never read apart from the backfill operation.


pkg/sql/ttl/ttljob/ttljob_processor.go line 415 at r1 (raw file):

	alloc *tree.DatumAlloc,
) (bounds QueryBounds, hasRows bool, _ error) {
	const maxRows = 1

I think it'll be necessary to pass in the table descriptor as well, and set this constant to desc.IndexKeysPerRow(primaryIndexDesc) or desc.NumFamilies() (these are equivalent for primary indexes). That'll ensure that we scan all the KVs that make up the row. If we didn't do that, we'd miss a composite value encoded in a family other than 0. Then, we have to iterate through them while decoding. Here's an example of when a composite primary key value can be in a family ID > 0.
(Also, I think the reverse scan retrieves the last column family first, so the current state might be broken even if we disallow the nonzero column family case).

As an extra snarl, it's possible for a family KV to be omitted if all columns making it up are NULL. So, we have to check while decoding whether we've reached the next row, and stop if we have. The row fetcher check is here for reference:

unchangedPrefix := (!rf.args.SpansCanOverlap || rf.spanID == spanID) &&
	rf.table.spec.MaxKeysPerRow > 1 && rf.indexKey != nil && bytes.HasPrefix(rf.kv.Key, rf.indexKey)
if unchangedPrefix {
	// Skip decoding!
	rf.keyRemainingBytes = rf.kv.Key[len(rf.indexKey):]
	return false, spanID, nil
}
// The current key belongs to a new row.
if rf.mustDecodeIndexKey {

BTW, because a primary index includes NOT NULL constraints, we can rely on the KVs being present for any column family that contains a composite value for a primary index column.


pkg/sql/ttl/ttljob/ttljob_processor_test.go line 328 at r1 (raw file):

	}

	for _, tc := range testCases {

Can we add a non primary-key column, and test different combinations of column families?

CREATE TABLE %s (a STRING, b STRING COLLATE en_US_u_ks_level2, c INT, PRIMARY KEY(a,b))
CREATE TABLE %s (a STRING, b STRING COLLATE en_US_u_ks_level2, c INT, PRIMARY KEY(a,b), FAMILY (a, b), FAMILY (c))
CREATE TABLE %s (a STRING, b STRING COLLATE en_US_u_ks_level2, c INT, PRIMARY KEY(a,b), FAMILY (c), FAMILY (a, b))
CREATE TABLE %s (a STRING, b STRING COLLATE en_US_u_ks_level2, c INT, PRIMARY KEY(a,b), FAMILY (a), FAMILY (b), FAMILY (c))

^ I suspect some of these will be broken.
To add to the above, an interesting case will be when c is NULL, since then its family's KV will be omitted.

It's probably also worth checking decimals/floats. For reference, floats are only composite for -0, and decimals are only composite for -0 or trailing zeroes. The logic should work for both composite and non-composite rows.


pkg/sql/ttl/ttljob/ttljob_test.go line 581 at r1 (raw file):

	collatedStringType := types.MakeCollatedString(types.String, "en" /* locale */)
	var indexableTyps []*types.T
	for _, typ := range append(types.Scalar, collatedStringType) {

We use types.Scalar in a bunch of places for tests - I wonder if we're missing out on useful testing coverage by not including collated strings.

@rafiss rafiss left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball)


pkg/sql/ttl/ttljob/ttljob_processor_test.go line 328 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…


Nice, thanks for those other examples. I've added families to the randomized integration test, but for this TestSpanToQueryBoundsCompositeKeys unit test I'm having trouble, since it uses replicationtestutils.EncodeKV, which requires one column family. Are you aware of another helper that could be used here?

@DrewKimball DrewKimball left a comment
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)


pkg/sql/ttl/ttljob/ttljob_processor_test.go line 328 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…


That limitation doesn't look fundamental. Would something like this work?

diff --git a/pkg/ccl/streamingccl/replicationtestutils/encoding.go b/pkg/ccl/streamingccl/replicationtestutils/encoding.go
index c4813593c3f..7d602ffaac7 100644
--- a/pkg/ccl/streamingccl/replicationtestutils/encoding.go
+++ b/pkg/ccl/streamingccl/replicationtestutils/encoding.go
@@ -26,6 +26,28 @@ func EncodeKV(
        t *testing.T, codec keys.SQLCodec, descr catalog.TableDescriptor, pkeyVals ...interface{},
 ) roachpb.KeyValue {
        require.Equal(t, 1, descr.NumFamilies(), "there can be only one")
+       indexEntries := encodeKVImpl(t, codec, descr, pkeyVals...)
+       require.Equal(t, 1, len(indexEntries))
+       return roachpb.KeyValue{Key: indexEntries[0].Key, Value: indexEntries[0].Value}
+}
+
+// EncodeKVs is similar to EncodeKV, but can be used for a table with multiple
+// column families, in which case up to one KV is returned per family.
+func EncodeKVs(
+       t *testing.T, codec keys.SQLCodec, descr catalog.TableDescriptor, pkeyVals ...interface{},
+) []roachpb.KeyValue {
+       indexEntries := encodeKVImpl(t, codec, descr, pkeyVals...)
+       require.GreaterOrEqual(t, len(indexEntries), 1)
+       kvs := make([]roachpb.KeyValue, len(indexEntries))
+       for i := range indexEntries {
+               kvs[i] = roachpb.KeyValue{Key: indexEntries[i].Key, Value: indexEntries[i].Value}
+       }
+       return kvs
+}
+
+func encodeKVImpl(
+       t *testing.T, codec keys.SQLCodec, descr catalog.TableDescriptor, pkeyVals ...interface{},
+) []rowenc.IndexEntry {
        primary := descr.GetPrimaryIndex()
        require.LessOrEqual(t, primary.NumKeyColumns(), len(pkeyVals))

@@ -42,9 +64,10 @@ func EncodeKV(
        indexEntries, err := rowenc.EncodePrimaryIndex(codec, descr, primary,
                colMap, datums, includeEmpty)
        require.NoError(t, err)
-       require.Equal(t, 1, len(indexEntries))
-       indexEntries[0].Value.InitChecksum(indexEntries[0].Key)
-       return roachpb.KeyValue{Key: indexEntries[0].Key, Value: indexEntries[0].Value}
+       for i := range indexEntries {
+               indexEntries[i].Value.InitChecksum(indexEntries[i].Key)
+       }
+       return indexEntries
 }

@rafiss rafiss force-pushed the fix-ttl-composite-key-decoding branch from 99bd654 to 60c83bd Compare January 6, 2024 09:00
@rafiss rafiss requested a review from a team as a code owner January 6, 2024 09:00
@rafiss rafiss requested review from msbutler and removed request for a team January 6, 2024 09:00
@rafiss rafiss force-pushed the fix-ttl-composite-key-decoding branch 2 times, most recently from 16cc7a3 to da9d26a Compare January 7, 2024 15:30
@rafiss rafiss left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @msbutler)


pkg/sql/ttl/ttljob/ttljob_processor.go line 415 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…


Thanks for the explanation. I've taken all of this into account now.


pkg/sql/ttl/ttljob/ttljob_processor_test.go line 328 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…


Thanks! I've added these cases to the randomized test, and made the non-random test use families as well.

@DrewKimball DrewKimball left a comment

Thanks for working through the complexity! I think this is close.

Reviewed 6 of 6 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msbutler and @rafiss)


pkg/sql/rowenc/index_encoding.go line 450 at r2 (raw file):

// DecodeIndexKey, but eagerly decodes the []EncDatum to tree.Datums.
// Also, unlike DecodeIndexKey, this function is able to handle types
// with composite encoding.

[nit] Think it would be worth adding a comment to DecodeIndexKey mentioning that it doesn't work for composite encodings?


pkg/sql/rowenc/index_encoding.go line 510 at r2 (raw file):

				// columns, so the slice we're iterating through might contain KVs from
				// a different row at the end.
				break kvLoop

I don't think this works for decimals and floats, since they don't always have a composite values entry. Consider the case when the value in the target row is non-composite, but the value in the next row is composite (and the nullable family thing causes us to read extra KVs). We'd end up reading the value from the next row as if it's part of the current row.

Here's an example that I think would run into this problem:

CREATE TABLE xy (
  x FLOAT,
  y INT,
  FAMILY (y),
  FAMILY (x),
  PRIMARY KEY (x)
)

INSERT INTO xy VALUES (-10, NULL); -- TTL scan starts with this row.
INSERT INTO xy VALUES (-0, NULL);

For both rows, there will be no entry for the y column family, since it's NULL. Since there are two column families, we have to read the first two KVs, which will be the x column families for each row. On the first pass, the x value is non-composite, so the encoding will be DatumEncoding_ASCENDING_KEY rather than DatumEncoding_VALUE. This will cause the check above to fail, and so rather than breaking out of the loop on the new row, we'll overwrite the previous -10 with the -0 of the second row.

You can use keys.GetRowPrefixLength to determine the prefix of the key that can be compared to check for a new row. I'd suggest just determining up front how many of the KVs are from the desired row - then, the outer loop can just iterate up to that point without any special break needed in the loop body.

@rafiss rafiss left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @msbutler)


pkg/sql/rowenc/index_encoding.go line 450 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] Think it would be worth adding a comment to DecodeIndexKey mentioning that it doesn't work for composite encodings?

Done.


pkg/sql/rowenc/index_encoding.go line 510 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…


That's a helpful example, thanks. I think the randomized test would hopefully hit that eventually -- I will check.

GetRowPrefixLength is exactly what I was looking for initially, but I couldn't find the right way to compare the key prefixes.

@rafiss rafiss force-pushed the fix-ttl-composite-key-decoding branch from da9d26a to 3a133ba Compare January 8, 2024 05:45
@DrewKimball DrewKimball left a comment

:lgtm: Great work!

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @msbutler and @rafiss)


rafiss commented Jan 8, 2024

tftr!

bors r=DrewKimball


craig bot commented Jan 8, 2024

Build succeeded:

@craig craig bot merged commit 5243cc3 into cockroachdb:master Jan 8, 2024
8 of 9 checks passed

blathers-crl bot commented Jan 8, 2024

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 3a133ba to blathers/backport-release-23.1-116988: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


error creating merge commit from 3a133ba to blathers/backport-release-23.2-116988: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@yuzefovich yuzefovich left a comment

Nice work on figuring this out! I have a couple of fixups that I'll open in a separate PR, but the change LGTM.

Reviewed 1 of 5 files at r1, 3 of 6 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


-- commits line 11 at r3:
nit: my intuition says that this bug shouldn't affect decimals because there the compositeness is only present in the number of zeroes. In other words, the "key" for a decimal number should be sufficient to specify a boundary for the TTL job query, so we shouldn't miss any expired rows. Is my intuition incorrect?


pkg/sql/ttl/ttljob/ttljob_test.go line 581 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

We use types.Scalar in a bunch of places for tests - I wonder if we're missing out on useful testing coverage by not including collated strings.

I did add a collated string into randgen.SeedTypes in #96695.

@DrewKimball DrewKimball left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


-- commits line 11 at r3:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: my intuition says that this bug shouldn't affect decimals because there the compositeness is only present in the number of zeroes. In other words, the "key" for a decimal number should be sufficient to specify a boundary for the TTL job query, so we shouldn't miss any expired rows. Is my intuition incorrect?

I think that's correct, yes.

Successfully merging this pull request may close these issues.

sql: row-level TTL behaves incorrectly with composite type primary key