Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest.unoptimized-query-oracle/disable-rules=half failed due to extra precision decimals #86790

Closed
rytaft opened this issue Aug 24, 2022 · 14 comments · Fixed by #88485
Closed
Assignees
Labels
C-test-failure Broken test (automatically or manually discovered). sync-me T-sql-queries SQL Queries Team

Comments

@rytaft
Copy link
Collaborator

rytaft commented Aug 24, 2022

roachtest.unoptimized-query-oracle/disable-rules=half failed with artifacts on master @ 80c274877a917580af62be6eb0cd48c8c7ae9c08:

test artifacts and logs in: /artifacts/unoptimized-query-oracle/disable-rules=half/run_1
	query_comparison_util.go:237,query_comparison_util.go:69,unoptimized_query_oracle.go:51,test_runner.go:896: . 17242 statements run: expected unoptimized and optimized results to be equal
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*queryComparisonHelper).makeError
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/query_comparison_util.go:345
		  | [...repeated from below...]
		Wraps: (2) . 17242 statements run
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runUnoptimizedQueryOracleImpl
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/unoptimized_query_oracle.go:160
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerUnoptimizedQueryOracle.func1.1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/unoptimized_query_oracle.go:54
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runOneRoundQueryComparison
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/query_comparison_util.go:236
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runQueryComparison
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/query_comparison_util.go:69
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerUnoptimizedQueryOracle.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/unoptimized_query_oracle.go:51
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:896
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1571
		Wraps: (4) expected unoptimized and optimized results to be equal
		  |   []string{
		  |   	strings.Join({
		  |   		"-9.8,0,0,104,0,1661263847326383409.0",
		  | + 		"000000000",
		  |   	}, ""),
		  |   	strings.Join({
		  |   		"-9.8,104,0,104,0,1661263847077927364.0",
		  | + 		"000000000",
		  |   	}, ""),
		  |   }
		  | sql: SELECT
		  | 	(-9.8):::DECIMAL AS col_37722,
		  | 	tab_21236.col1_0 AS col_37723,
		  | 	0:::OID AS col_37724,
		  | 	tab_21236.tableoid AS col_37725,
		  | 	0:::OID AS col_37726,
		  | 	(tab_21236.crdb_internal_mvcc_timestamp::DECIMAL / 1:::DECIMAL::DECIMAL)::DECIMAL AS col_37727
		  | FROM
		  | 	defaultdb.public.table1@[0] AS tab_21236
		  | 	JOIN defaultdb.public.table1@[0] AS tab_21237 ON
		  | 			(tab_21236.crdb_internal_mvcc_timestamp) = (tab_21237.crdb_internal_mvcc_timestamp)
		  | ORDER BY
		  | 	tab_21236.tableoid ASC
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

Originally posted by @cockroach-teamcity in #86308 (comment)

Jira issue: CRDB-18936

@blathers-crl
Copy link

blathers-crl bot commented Aug 24, 2022

Hi @rytaft, please add a C-ategory label to your issue. Check out the label system docs.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@rytaft
Copy link
Collaborator Author

rytaft commented Aug 24, 2022

This failure looks like the same issue: #86308 (comment)

@rytaft
Copy link
Collaborator Author

rytaft commented Aug 24, 2022

Hey @cucaroach can you please look into this failure? Seems to me like we have one version returning a number with much higher precision than the other. Doesn't look like a correctness issue, but we should figure out what's going on and fix it to prevent the test from continuing to fail for this reason.

@rytaft rytaft added the C-test-failure Broken test (automatically or manually discovered). label Aug 24, 2022
@DrewKimball
Copy link
Collaborator

#85318 also looks the same as this one

@DrewKimball
Copy link
Collaborator

I found a reproduction here

@rytaft
Copy link
Collaborator Author

rytaft commented Aug 24, 2022

Great, closing this as a duplicate.

@rytaft rytaft closed this as completed Aug 24, 2022
@DrewKimball
Copy link
Collaborator

Reopening this one to specifically track the decimal precision issue.

@cucaroach
Copy link
Contributor

Interestingly it repros with vectorize = off too.

demo@127.0.0.1:26257/defaultdb> CREATE TABLE t (x INT8 NOT NULL);
CREATE TABLE


Time: 13ms total (execution 13ms / network 0ms)

demo@127.0.0.1:26257/defaultdb> INSERT INTO t VALUES (-4429772553778622992);
INSERT 0 1


Time: 11ms total (execution 11ms / network 0ms)

demo@127.0.0.1:26257/defaultdb> EXPLAIN  SELECT (x / 1)::DECIMAL FROM t;
           info
--------------------------
  distribution: local
  vectorized: true

  • render
  │
  └── • scan
        missing stats
        table: t@t_pkey
        spans: FULL SCAN
(9 rows)


Time: 1ms total (execution 1ms / network 0ms)

demo@127.0.0.1:26257/defaultdb> EXPLAIN (VERBOSE) SELECT (x / 1)::DECIMAL FROM t;
                        info
----------------------------------------------------
  distribution: local
  vectorized: true

  • render
  │ columns: ("numeric")
  │ render numeric: x::DECIMAL
  │
  └── • scan
        columns: (x)
        estimated row count: 1,000 (missing stats)
        table: t@t_pkey
        spans: FULL SCAN
(12 rows)


Time: 1ms total (execution 1ms / network 1ms)

demo@127.0.0.1:26257/defaultdb> SELECT (x / 1)::DECIMAL FROM t;
        numeric
------------------------
  -4429772553778622992
(1 row)


Time: 1ms total (execution 1ms / network 0ms)

demo@127.0.0.1:26257/defaultdb> set vectorize = off;
SET


Time: 1ms total (execution 0ms / network 0ms)

demo@127.0.0.1:26257/defaultdb> SELECT (x / 1)::DECIMAL FROM t;
        numeric
------------------------
  -4429772553778622992
(1 row)


Time: 1ms total (execution 1ms / network 0ms)

demo@127.0.0.1:26257/defaultdb> SET testing_optimizer_disable_rule_probability = 1;
SET


Time: 1ms total (execution 0ms / network 0ms)

warning: error retrieving the database name: ERROR: use of crdb_internal_vtable_pk column not allowed. (SQLSTATE XXUUU)
demo@127.0.0.1:26257/?> SELECT (x / 1)::DECIMAL FROM t;
         numeric
--------------------------
  -4429772553778622992.0
(1 row)


Time: 2ms total (execution 1ms / network 0ms)

warning: error retrieving the database name: ERROR: use of crdb_internal_vtable_pk column not allowed. (SQLSTATE XXUUU)
demo@127.0.0.1:26257/?> EXPLAIN (VERBOSE) SELECT (x / 1)::DECIMAL FROM t;
                                        info
------------------------------------------------------------------------------------
  distribution: local
  vectorized: false

  • render
  │ columns: ("numeric")
  │ render numeric: x / 1
  │
  └── • scan
        columns: (x, rowid, crdb_internal_mvcc_timestamp, tableoid)
        estimated row count: 1 (100% of the table; stats collected 17 seconds ago)
        table: t@t_pkey
        spans: FULL SCAN
(12 rows)


Time: 2ms total (execution 1ms / network 1ms)

warning: error retrieving the database name: ERROR: use of crdb_internal_vtable_pk column not allowed. (SQLSTATE XXUUU)
demo@127.0.0.1:26257/?> 

@cucaroach
Copy link
Contributor

I think this issue is super fundamental:

❯ cr demo --empty -e "SELECT 1 / 1"
        ?column?
-------------------------
  1.0000000000000000000
(1 row)


Time: 1ms

vs.

❯ sudo -u postgres psql -c "SELECT 1/1"
 ?column? 
----------
        1
(1 row)

@cucaroach
Copy link
Contributor

Related: #41448

@cucaroach
Copy link
Contributor

So my current take is we have two bugs:

  1. sql: make / operator perform integer division #41448 which is out of scope for 22.2
  2. the optimizer doesn't follow the same rules as the eval engine

I think we could consider #2 for 23.1 (ie if we're gonna constant fold integer division the result should be a decimal so we're internally consistent but that would come at the cost of more visible disagreement with PG and potentially turn the heat up on #41448). Probably we should let sleeping dogs lie at wait for #41448.

Anyways all this is to say that I'm only gonna look for an expedient roachtest specific fix for this and ignore the larger issues.

@DrewKimball
Copy link
Collaborator

Here's a similar problem from #87495 that happens when using // on decimals instead of / on ints:

statement ok
CREATE TABLE t (x DECIMAL NOT NULL);
INSERT INTO t VALUES (1.0);

query T
EXPLAIN (VEC) SELECT (x::DECIMAL // 1::DECIMAL)::DECIMAL FROM t;
----
│
└ Node 1
  └ *colfetcher.ColBatchScan

query R
SELECT (x::DECIMAL // 1::DECIMAL)::DECIMAL FROM t;
----
1.0

statement ok
SET testing_optimizer_disable_rule_probability = 1;

query T
EXPLAIN (VEC) SELECT (x::DECIMAL // 1::DECIMAL)::DECIMAL FROM t;
----
│
└ Node 1
  └ *colexecprojconst.projFloorDivDecimalDecimalConstOp
    └ *colfetcher.ColBatchScan

query R
SELECT (x::DECIMAL // 1::DECIMAL)::DECIMAL FROM t;
----
1

@mgartner
Copy link
Collaborator

Another reproduction here: #87605 (comment)

mgartner added a commit to mgartner/cockroach that referenced this issue Sep 12, 2022
…ests

The division (`/`) and floor division (`//`) operators were making
costfuzz and unoptimized-query-oracle tests flaky. This commit disables
generation of these operators as a temporary mitigation for these
flakes.

Informs cockroachdb#86790

Release note: None
craig bot pushed a commit that referenced this issue Sep 12, 2022
87645: ui: fix txn insight query bug, align summary card, remove contended keys in details page r=ericharmeling a=ericharmeling

This commit fixes a small bug on the transaction insight details page
that was incorrectly mapping the waiting transaction statement
fingerprints to the blocking transaction statements. The commit also
aligns the summary cards in the details page. The commit also removes
the contended key from the details page while we look for a more user-
friendly format to display row contention.

Before:

![image](https://user-images.githubusercontent.com/27286675/189216476-8211d598-5d4e-4255-846f-82c785764016.png)


After:

![image](https://user-images.githubusercontent.com/27286675/189216006-f01edeb6-ab2f-42ac-9978-6fce85b9a79a.png)

Fixes #87838.

Release note: None
Release justification: bug fix

87715: roachtest: add 4s of sleep after restart when upgrading nodes r=yuzefovich a=yuzefovich

We have seen cases where a transient error could occur when a newly-upgraded node serves as a gateway for a distributed query due to remote nodes not being able to dial back to the gateway for some reason (investigation of it is tracked in #87634). For now, we're papering over these flakes by 4 second sleep.

Addresses: #87104.

Release note: None

87840: roachtest: do not generate division ops in costfuzz and unoptimized tests r=mgartner a=mgartner

The division (`/`) and floor division (`//`) operators were making costfuzz and unoptimized-query-oracle tests flaky. This commit disables generation of these operators as a temporary mitigation for these flakes.

Informs #86790

Release note: None

87854: kvcoord: reliably handle stuck watcher error r=erikgrinaker a=tbg

Front-ports parts of #87253.

When a rangefeed gets stuck, and the server is local, the server might
notice the cancellation before the client, and may send a cancellation
error back in a rangefeed event.

We now handle this the same as the other case (where the stream client
errors out due to the cancellation).

This also checks in the test from
#87253 (which is on
release-22.1).

Fixes #87370.

No release note since this will be backported to release-22.2
Release note: None


Co-authored-by: Eric Harmeling <eric.harmeling@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
blathers-crl bot pushed a commit that referenced this issue Sep 12, 2022
…ests

The division (`/`) and floor division (`//`) operators were making
costfuzz and unoptimized-query-oracle tests flaky. This commit disables
generation of these operators as a temporary mitigation for these
flakes.

Informs #86790

Release note: None
craig bot pushed a commit that referenced this issue Sep 23, 2022
87652: kv: Handle invalid lease transfer checks r=andrewbaptist a=andrewbaptist

Previously it would be a programming error when checking if the leaseholder move was valid if the leaseholder wasn't one of the replicas. This check was overly strict.

Release note: None

88379: pgwire: add encoding for JSON arguments r=ZhouXing19 a=rafiss

fixes #88355

This adds pgwire decoding support for JSON (oid=114). This does not add
oid=114 to the OidToType map in types.go since that introduces many
other assumptions about the type; namely that the type can be used as
the type of a column descriptor. That change would mean that existing
tables that were defined with JSON would be inconsistent with new tables
defined with a JSON column.

Release note (sql change): The pgwire protocol implementation can
now accept arguments of the JSON type (oid=114). Previously, it could
only accept JSONB (oid=3802). Internally, JSON and JSONB values are
still identical, so this change only affects how the values are received
over the wire protocol.

88485: xform: make FoldDivOne an essential normalization r=msirek a=msirek

Fixes #86790

This fixes sqlsmith errors caused by differences in the number of
displayed digits right of the decimal place for `expression / 1`
by making the folding of this expression essential.

Release note: None

88528: scripts/bench: update benchstat install instructions r=tbg a=michae2

As far as I can tell, we're not relying on the JSON output in github.com/cockroachdb/benchstat so change the instructions here to install the original benchstat instead of the fork.

Release note: None

Co-authored-by: Andrew Baptist <baptist@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Mark Sirek <sirek@cockroachlabs.com>
Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
@craig craig bot closed this as completed in 956948e Sep 23, 2022
msirek pushed a commit to msirek/cockroach that referenced this issue Sep 26, 2022
Fixes cockroachdb#86790

This fixes sqlsmith errors caused by differences in the number of
displayed digits right of the decimal place for `expression / 1`
by making the folding of this expression essential.

Release note: None
@rytaft rytaft added the sync-me label Jul 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-test-failure Broken test (automatically or manually discovered). sync-me T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants