New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
upgrade: remove buggy TTL repair #110364
upgrade: remove buggy TTL repair #110364
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code lgtm!, but please include this test: https://cockroachlabs.slack.com/archives/C05RE28U495/p1694457604827269
Ah missed that, added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
-- commits
line 12 at r1:
Please extend the release note:
- what is the symptom of the bug a customer would see.
- what version the bug was introduced in.
- what, if anything, does the customer need to do if they encounter the bug other than upgrade to the version containing the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @knz)
Previously, knz (Raphael 'kena' Poss) wrote…
Please extend the release note:
- what is the symptom of the bug a customer would see.
- what version the bug was introduced in.
- what, if anything, does the customer need to do if they encounter the bug other than upgrade to the version containing the fix.
the bug was introduced in v23.1.9
pkg/sql/logictest/testdata/logic_test/upgrade
line 0 at r1 (raw file):
nit: let's rename the file to something more descriptive. perhaps mixed_version_upgrade_repair_descriptors
or mixed_version_row_level_ttl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
Previously, rafiss (Rafi Shamim) wrote…
the bug was introduced in v23.1.9
I am personally aware, yes. I am just pointing out that this information should be included in the release note too.
as per https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/186548364/Release+notes#Bug-fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
Previously, knz (Raphael 'kena' Poss) wrote…
I am personally aware, yes. I am just pointing out that this information should be included in the release note too.
as per https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/186548364/Release+notes#Bug-fix
Probably worth mentioning in the RN as well that "clusters using TTL should not be upgraded to 23.1.9 at all and should be upgraded to 23.1.10 or later directly".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @knz)
Previously, knz (Raphael 'kena' Poss) wrote…
Probably worth mentioning in the RN as well that "clusters using TTL should not be upgraded to 23.1.9 at all and should be upgraded to 23.1.10 or later directly".
i thought you were probably aware, but added the comment to help with amending the note :)
pkg/sql/catalog/tabledesc/table_desc_builder.go
line 246 at r2 (raw file):
tdb.changes.Add(catalog.StrippedDanglingBackReferences) } }
one more thing i forgot from the first review; let's add a short-circuit here:
func FirstUpgradeFromRelease( |
for 23.1, let's make this repair logic only happen if a certain environment variable is set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rafiss)
Previously, rafiss (Rafi Shamim) wrote…
i thought you were probably aware, but added the comment to help with amending the note :)
Done.
pkg/sql/logictest/testdata/logic_test/upgrade
line at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: let's rename the file to something more descriptive. perhaps
mixed_version_upgrade_repair_descriptors
ormixed_version_row_level_ttl
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rafiss)
pkg/sql/catalog/tabledesc/table_desc_builder.go
line 246 at r2 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
one more thing i forgot from the first review; let's add a short-circuit here:
func FirstUpgradeFromRelease( for 23.1, let's make this repair logic only happen if a certain environment variable is set
Should this be a separate PR that is made against the release-23.1 branch and backported to 23.1.10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rafiss)
pkg/sql/catalog/tabledesc/table_desc_builder.go
line 246 at r2 (raw file):
Previously, ecwall (Evan Wall) wrote…
Should this be a separate PR that is made against the release-23.1 branch and backported to 23.1.10?
Also it looks like this is only used for 23.2
firstUpgradeTowardsV23_2 = upgrade.NewTenantUpgrade(
"prepare upgrade to v23.2 release",
toCV(clusterversion.V23_2Start),
FirstUpgradeFromReleasePrecondition,
FirstUpgradeFromRelease,
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @knz)
pkg/sql/catalog/tabledesc/table_desc_builder.go
line 246 at r2 (raw file):
Previously, ecwall (Evan Wall) wrote…
Also it looks like this is only used for 23.2
firstUpgradeTowardsV23_2 = upgrade.NewTenantUpgrade( "prepare upgrade to v23.2 release", toCV(clusterversion.V23_2Start), FirstUpgradeFromReleasePrecondition, FirstUpgradeFromRelease, )
this upgrade exists in 23.1 also.
cockroach/pkg/upgrade/upgrades/upgrades.go
Lines 330 to 335 in 83bdbcc
firstUpgradeTowardsV23_1 = upgrade.NewTenantUpgrade( | |
"prepare upgrade to v23.1 release", | |
toCV(clusterversion.V23_1Start), | |
FirstUpgradeFromReleasePrecondition, | |
NoTenantUpgradeFunc, | |
) |
anyway, sure, let's only add the short-circuit logic in the backport
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like TestStripDanglingBackReferences needs to be updated
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @knz)
pkg/sql/logictest/testdata/logic_test/mixed_version_upgrade_repair_descriptors
line 9 at r3 (raw file):
query T SELECT crdb_internal.node_executable_version()
you can remove this assertion, or if you keep it, the expected value on this branch is 23.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rafiss)
pkg/sql/logictest/testdata/logic_test/mixed_version_upgrade_repair_descriptors
line 9 at r3 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
you can remove this assertion, or if you keep it, the expected value on this branch is
23.1
I'll remove that.
Is this needed below?
query B retry
SELECT version LIKE '22.2-%' FROM [SHOW CLUSTER SETTING version]
----
true
Shouldn't the version be 23.1 after the upgrade?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @knz)
pkg/sql/logictest/testdata/logic_test/mixed_version_upgrade_repair_descriptors
line 9 at r3 (raw file):
Previously, ecwall (Evan Wall) wrote…
I'll remove that.
Is this needed below?
query B retry SELECT version LIKE '22.2-%' FROM [SHOW CLUSTER SETTING version] ---- true
Shouldn't the version be 23.1 after the upgrade?
ah, i had this check from when i was testing on the 23.1 branch.
when testing on master, the check should be:
query B retry
SELECT version LIKE '23.1-%' FROM [SHOW CLUSTER SETTING version]
----
true
Note that this is looking at the logical cluster version, not the binary version.
On master, the previous logical version starts at 23.1, and the check for 23.1-%
makes sure that the first upgrade migration (the one that runs all repairs) has finished running. (after all upgrade migrations run, SHOW CLUSTER SETTING version
would show 23.2)
On the 23.1 branch, this check should be looking for 22.2-%
instead
`upgrade all` upgrades all nodes in the cluster. It has the same behavior as using `upgrade <idx>` for each index in the cluster. Release note: None
Fixes #110363 The TTL descriptor repair in FirstUpgradeFromReleasePrecondition incorrectly removes TTL fields from table descriptors after incorrectly comparing the table descriptor's TTL job schedule ID to a set of job IDs. This change removes the repair until tests are properly added. Release note (bug fix): Remove buggy TTL descriptor repair. Previously, upgrading from 22.2.X to 23.1.9 incorrectly removed TTL storage params from tables (visible via `SHOW CREATE TABLE <ttl-table>;`) while attempting to repair table descriptors. This resulted in the node that attempts to run the TTL job crashing due to a panic caused by the missing TTL storage params. Clusters currently on 22.2.X should NOT be upgraded to 23.1.9 and should be upgraded to 23.1.10 or later directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @knz)
bors r=rafiss |
Build failed (retrying...): |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 48b0df4 to blathers/backport-release-23.1-110364: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. error creating merge commit from 48b0df4 to blathers/backport-release-23.1.10-rc-110364: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.10-rc failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Fixes #110363
The TTL descriptor repair in FirstUpgradeFromReleasePrecondition incorrectly
removes TTL fields from table descriptors after incorrectly comparing the
table descriptor's TTL job schedule ID to a set of job IDs.
This change removes the repair until tests are properly added.
Release note (bug fix): Remove buggy TTL descriptor repair. Previously,
upgrading from 22.2.X to 23.1.9 incorrectly removed TTL storage params from
tables (visible via
SHOW CREATE TABLE <ttl-table>;
) while attempting torepair table descriptors. This resulted in the node that attempts to run the
TTL job crashing due to a panic caused by the missing TTL storage params.
Clusters currently on 22.2.X should NOT be upgraded to 23.1.9 and should
be upgraded to 23.1.10 or later directly.