roachtest: busted key encoding of tuples when spilling to disk in row containers #125367

Closed
cockroach-teamcity opened this issue Jun 8, 2024 · 3 comments · Fixed by #125471
Labels
branch-release-24.1.1-rc Used to mark GA and release blockers and technical advisories for 24.1.1-rc C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. O-rsg Random Syntax Generator T-sql-queries SQL Queries Team

cockroach-teamcity commented Jun 8, 2024

roachtest.unoptimized-query-oracle/disable-rules=all/seed-multi-region failed with artifacts on release-24.1.1-rc @ b7a5b158354408939cb3d680aca4305c91b415af:

	tab_13.crdb_internal_mvcc_timestamp ASC NULLS LAST,
	tab_12._uuid ASC NULLS FIRST,
	tab_14._timestamptz ASC NULLS FIRST,
	tab_13._timestamp DESC,
	tab_15._int2 ASC NULLS LAST,
	tab_14._uuid,
	col_41 DESC,
	tab_12._bytes ASC,
	tab_15._bool ASC NULLS LAST,
	col_44 ASC NULLS LAST,
	tab_12._interval ASC NULLS FIRST,
	tab_13._bytes NULLS LAST,
	tab_12._bool DESC NULLS LAST,
	tab_13.tableoid NULLS FIRST,
	tab_14.crdb_internal_mvcc_timestamp ASC NULLS LAST,
	tab_15._string NULLS LAST,
	tab_15._timestamptz DESC NULLS FIRST,
	tab_12._timestamp,
	tab_14._int8 ASC NULLS FIRST,
	tab_15._decimal NULLS FIRST,
	tab_14._int4 DESC,
	tab_12.crdb_internal_mvcc_timestamp ASC NULLS LAST,
	tab_13._uuid NULLS LAST,
	tab_13._float8 ASC NULLS FIRST,
	tab_12._int8,
	tab_12._decimal ASC NULLS FIRST,
	tab_15._int4 ASC,
	tab_14._bytes ASC,
	tab_15._interval NULLS FIRST,
	tab_13._int4 ASC NULLS FIRST,
	tab_13._interval NULLS LAST,
	tab_13._decimal NULLS FIRST,
	col_40 DESC,
	tab_15._timestamp NULLS LAST,
	tab_13._timestamptz,
	tab_14._float8 DESC,
	tab_15._uuid NULLS LAST,
	tab_14._decimal ASC NULLS FIRST,
	tab_15._float8 NULLS FIRST,
	tab_13._date DESC,
	tab_13._int8 ASC NULLS LAST,
	tab_14._interval ASC NULLS FIRST,
	tab_12._timestamptz NULLS LAST,
	tab_13._string DESC NULLS LAST,
	tab_12._date DESC NULLS LAST,
	tab_15._date ASC NULLS FIRST,
	tab_14._date NULLS LAST
LIMIT
	19:::INT8
test artifacts and logs in: /artifacts/unoptimized-query-oracle/disable-rules=all/seed-multi-region/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

This test on roachdash | Improve this report!

Jira issue: CRDB-39414

@cockroach-teamcity added the branch-release-24.1.1-rc, C-test-failure, O-roachtest, O-robot, O-rsg, release-blocker, and T-sql-queries labels Jun 8, 2024
@cockroach-teamcity added this to the 24.1 milestone Jun 8, 2024
@yuzefovich
Member

It appears that the problem is with the sort in the row-by-row engine when it spills to disk, which is not a blocker and most definitely not a regression.

@yuzefovich removed the release-blocker label Jun 10, 2024
@michae2
Collaborator

michae2 commented Jun 10, 2024

Looking at unoptimized-query-oracle000.failure.log it seems like the difference has to do with the first interval column?

I've reduced it down to this so far. The absolute values are different, but it appears to reproduce the difference in the value of that first interval column.

repro.txt

@michae2 changed the title from "roachtest: unoptimized-query-oracle/disable-rules=all/seed-multi-region failed" to "unoptimized-query-oracle failed: difference in interval result value" Jun 10, 2024
@yuzefovich self-assigned this Jun 11, 2024
@yuzefovich
Member

This one is rather annoying and edge case'y. Here is a reduced reproduction:

CREATE TABLE t AS SELECT g AS _int8, g * '1 day'::INTERVAL AS _interval, g::DECIMAL AS _decimal FROM generate_series(1, 5) AS g;
UPDATE t SET _interval = '7 years 1 mon 887 days 18:22:39.99567';

SET testing_optimizer_random_seed = 1481092000980190599;
SET testing_optimizer_disable_rule_probability = 1.000000;
SET vectorize = off;
SET distsql_workmem = '2B';

SELECT
	tab_15._interval AS col_40,
	(NULL:::STRING, NULL:::JSONB, NULL:::TIME) AS col_42
FROM
		t AS tab_14 JOIN t AS tab_15 ON (tab_14._int8) = (tab_15._int8)
ORDER BY
	col_42,
	tab_14._interval,
	tab_14._decimal,
	col_40 DESC
LIMIT
	1000:::INT8;

RESET testing_optimizer_random_seed;
RESET testing_optimizer_disable_rule_probability;

SELECT
	tab_15._interval AS col_40,
	(NULL:::STRING, NULL:::JSONB, NULL:::TIME) AS col_42
FROM
		t AS tab_14 JOIN t AS tab_15 ON (tab_14._int8) = (tab_15._int8)
ORDER BY
	col_42,
	tab_14._interval,
	tab_14._decimal,
	col_40 DESC
LIMIT
	1000:::INT8;

The problem is that we use a faulty key encoding for tuples (one for which we don't actually have a corresponding key-decoding function), yet we don't hit an error because the leftover bytes are consumed when decoding the following column. In this case that column happens to be an interval, so we end up with "encoding corruption" of its value. We do have #49975 to track this. I'll send a patch to error out when we need to spill tuples to disk in a key-encoding context, and I'll check whether we can / should remove the faulty key encoding altogether.
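
To make the failure mode concrete, here is a toy sketch in Go. It is not CockroachDB's actual key encoding: the one-byte marker, the helper names (`encodeInt`, `encodeTuple`, `decodeRowNaive`, `encodeRowChecked`), and the integer-only row layout are all assumptions invented for illustration. It only shows how a tuple written without a matching decoder can leave bytes behind that the next column's decoder silently picks up, and what refusing to encode such a row could look like.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// intMarker is a hypothetical type tag for this toy encoding only.
const intMarker byte = 0x01

var (
	errBadEncoding      = errors.New("malformed toy encoding")
	errTupleKeyEncoding = errors.New("tuples are not supported in key-encoding context")
)

// encodeInt appends one marker byte followed by 8 big-endian bytes.
func encodeInt(buf []byte, v uint64) []byte {
	var b [8]byte
	binary.BigEndian.PutUint64(b[:], v)
	return append(append(buf, intMarker), b[:]...)
}

// decodeInt reads one value written by encodeInt and returns the remaining bytes.
func decodeInt(buf []byte) (uint64, []byte, error) {
	if len(buf) < 9 || buf[0] != intMarker {
		return 0, nil, errBadEncoding
	}
	return binary.BigEndian.Uint64(buf[1:9]), buf[9:], nil
}

// encodeTuple is the "faulty" part of the toy: elements are written back to
// back with no length prefix or terminator, so a reader cannot tell where the
// tuple ends.
func encodeTuple(buf []byte, elems []uint64) []byte {
	for _, e := range elems {
		buf = encodeInt(buf, e)
	}
	return buf
}

// encodeRow writes a two-column row: a tuple column followed by an int column.
func encodeRow(tuple []uint64, col uint64) []byte {
	return encodeInt(encodeTuple(nil, tuple), col)
}

// decodeRowNaive has no tuple decoder. It reads a single value for the tuple
// column, then a single value for the next column, and ignores trailing bytes.
func decodeRowNaive(buf []byte) (uint64, uint64, error) {
	first, rest, err := decodeInt(buf)
	if err != nil {
		return 0, 0, err
	}
	second, _, err := decodeInt(rest)
	if err != nil {
		return 0, 0, err
	}
	return first, second, nil
}

// encodeRowChecked sketches the "error out" approach: in this toy, any row
// that carries a tuple column is refused instead of being written.
func encodeRowChecked(tuple []uint64, col uint64) ([]byte, error) {
	if tuple != nil {
		return nil, errTupleKeyEncoding
	}
	return encodeInt(nil, col), nil
}

func main() {
	buf := encodeRow([]uint64{1, 2}, 42)
	_, col, err := decodeRowNaive(buf)
	fmt.Println("decoded column:", col, "error:", err)

	_, err = encodeRowChecked([]uint64{1, 2}, 42)
	fmt.Println("checked encode error:", err)
}
```

In this sketch the naive decoder returns no error, yet the second column comes back as 2 (the tuple's second element) rather than the 42 that was written. That is the same shape as the corrupted interval above: the bytes decode cleanly, just as the wrong value.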

@craig (bot) closed this as completed in f5e65c5 Jun 12, 2024
@yuzefovich changed the title from "unoptimized-query-oracle failed: difference in interval result value" to "roachtest: busted key encoding of tuples when spilling to disk in row containers" Jun 12, 2024