roachtest: cdc/scan/catchup/nodes=5/cpu=16/rows=1G/ranges=100/protocol=mux/format=json/sink=null failed #121270

Open
cockroach-teamcity opened this issue Mar 28, 2024 · 3 comments
Labels:

  • A-cdc: Change Data Capture
  • branch-release-23.2: Used to mark GA and release blockers and technical advisories for 23.2
  • C-test-failure: Broken test (automatically or manually discovered)
  • O-roachtest
  • O-robot: Originated from a bot
  • P-1: Issues/test failures with a fix SLA of 1 month
  • T-cdc

cockroach-teamcity commented Mar 28, 2024

roachtest.cdc/scan/catchup/nodes=5/cpu=16/rows=1G/ranges=100/protocol=mux/format=json/sink=null failed with artifacts on release-23.2 @ 39c1e341e4cabc9339d5b0932d9d4aef296914c7:

(assertions.go:333).Fail: 
	Error Trace:	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cdc_bench.go:338
	            				github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cdc_bench.go:106
	            				main/pkg/cmd/roachtest/test_runner.go:1097
	            				src/runtime/asm_amd64.s:1650
	Error:      	Received unexpected error:
	            	pq: failed to resolve targets in the CHANGEFEED stmt: table "kv.kv" does not exist
	Test:       	cdc/scan/catchup/nodes=5/cpu=16/rows=1G/ranges=100/protocol=mux/format=json/sink=null
(require.go:1360).NoError: FailNow called
test artifacts and logs in: /artifacts/cdc/scan/catchup/nodes=5/cpu=16/rows=1G/ranges=100/protocol=mux/format=json/sink=null/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=16
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/cdc

This test on roachdash | Improve this report!

Jira issue: CRDB-37167

cockroach-teamcity added the branch-release-23.2, C-test-failure, O-roachtest, O-robot, release-blocker, and T-cdc labels Mar 28, 2024
cockroach-teamcity added this to the 23.2 milestone Mar 28, 2024
blathers-crl bot added the A-cdc label Mar 28, 2024
jayshrivastava commented

The error is surprising, since we initialized and ran the workload before starting the changefeed. From the test log:

11:20:26 cluster_synced.go:2588:    1: done
11:20:26 cluster.go:686: test status: 
11:20:26 cdc_bench.go:274: configuring zones
11:20:27 util.go:103: waiting for initial up-replication... (<2m0s)
11:20:27 util.go:128: up-replication complete
11:20:27 cdc_bench.go:289: creating table with 100 ranges
11:20:27 cluster.go:2360: running cmd `./cockroach workload init k...` on nodes [:6]; details in run_112027.349642002_n6_cockroach-workload-i.log
11:20:28 util.go:103: waiting for initial up-replication... (<2m0s)
11:20:28 util.go:128: up-replication complete
11:20:28 cdc_bench.go:302: ingesting 1,000,000,000 rows using insert
11:20:28 cluster.go:2360: running cmd `./cockroach workload init k...` on nodes [:6]; details in run_112028.997533708_n6_cockroach-workload-i.log
11:41:06 cdc_bench.go:308: starting coordinator node
11:41:06 cluster.go:686: test status: starting nodes :6
11:41:09 cockroach.go:430: teamcity-14594903-1711603835-53-n6cpu16 (system): starting cockroach processes
11:41:10 cockroach.go:1128: teamcity-14594903-1711603835-53-n6cpu16 (system): setting cluster settings
teamcity-14594903-1711603835-53-n6cpu16: executing sql
11:41:11 cockroach.go:1122: log into DB console with user=roachprod password=cockroachdb
11:41:11 cdc_bench.go:321: running changefeed catchup scan

rharding6373 removed the release-blocker label Apr 1, 2024

wenyihu6 commented Apr 5, 2024

This is pretty surprising: the ALTER TABLE kv.kv statement finished without erroring, yet the CREATE CHANGEFEED that followed could not resolve kv.kv. I wonder whether something went wrong when setting schema_locked = true that made the table appear to have been dropped.

// Lock the schema of kv.kv before the changefeed is created.
_, err := conn.ExecContext(ctx, "ALTER TABLE kv.kv SET (schema_locked = true);")
require.NoError(t, err)
node_id	application_name	flags	statement_id	key	anonymized	count	first_attempt_count	max_retries	last_error	last_error_code	rows_avg	rows_var	idle_lat_avg	idle_lat_var	parse_lat_avg	parse_lat_var	plan_lat_avg	plan_lat_var	run_lat_avg	run_lat_var	service_lat_avg	service_lat_var	overhead_lat_avg	overhead_lat_var	bytes_read_avg	bytes_read_var	rows_read_avg	rows_read_var	rows_written_avg	rows_written_var	network_bytes_avg	network_bytes_var	network_msgs_avg	network_msgs_var	max_mem_usage_avg	max_mem_usage_var	max_disk_usage_avg	max_disk_usage_var	contention_time_avg	contention_time_var	cpu_sql_nanos_avg	cpu_sql_nanos_var	mvcc_step_avg	mvcc_step_var	mvcc_step_internal_avg	mvcc_step_internal_var	mvcc_seek_avg	mvcc_seek_var	mvcc_seek_internal_avg	mvcc_seek_internal_var	mvcc_block_bytes_avg	mvcc_block_bytes_var	mvcc_block_bytes_in_cache_avg	mvcc_block_bytes_in_cache_var	mvcc_key_bytes_avg	mvcc_key_bytes_var	mvcc_value_bytes_avg	mvcc_value_bytes_var	mvcc_point_count_avg	mvcc_point_count_var	mvcc_points_covered_by_range_tombstones_avg	mvcc_points_covered_by_range_tombstones_var	mvcc_range_key_count_avg	mvcc_range_key_count_var	mvcc_range_key_contained_points_avg	mvcc_range_key_contained_points_var	mvcc_range_key_skipped_points_avg	mvcc_range_key_skipped_points_var	implicit_txn	full_scan	sample_plan	database_name	exec_node_ids	txn_fingerprint_id	index_recommendations	latency_seconds_min	latency_seconds_max	latency_seconds_p50	latency_seconds_p90	latency_seconds_p99
6			7291334691105395411	ALTER TABLE kv.kv SET (schema_locked = _)		1	1	0	NULL	NULL	0	NaN	0	NaN	8.4775e-05	NaN	0.006001413	NaN	0.000794136	NaN	0.00688322	NaN	2.8960000000004954e-06	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	10240	NaN	0	NaN	0	NaN	273168	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	t	f	"{""Children"": [], ""Name"": """"}"	defaultdb	{6}	14579202069275632908	{}	0.00688322	0.00688322	0	0	0
6		!	6668642469535916242	CREATE CHANGEFEED FOR TABLE kv.kv INTO '_' WITH OPTIONS (format = '_', end_time = '_', cursor = '_')		1	1	0	"failed to resolve targets in the CHANGEFEED stmt: table ""kv.kv"" does not exist"	XXUUU	0	NaN	0	NaN	9.5702e-05	NaN	0.000138428	NaN	0.009439099	NaN	0.009677336	NaN	4.106999999999722e-06	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	10240	NaN	0	NaN	0	NaN	302778	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	0	NaN	t	f	"{""Children"": [], ""Name"": """"}"	defaultdb	{6}	17575433554488465165	{}	0.009677336	0.009677336	0	0	0
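The statement-statistics dump above is tab-separated, with CSV-style double-quote escaping in fields such as `last_error`. A minimal Go sketch using `encoding/csv` with a tab delimiter can pull out only the columns that matter here. The helper `selectColumns` is hypothetical, and the sample input is an abbreviated version of the two rows above (only a handful of columns are kept).

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// selectColumns parses tab-separated rows (quoted fields use "" for a literal
// quote) and returns the requested columns for each data row, keyed by name.
func selectColumns(tsv string, want []string) ([]map[string]string, error) {
	r := csv.NewReader(strings.NewReader(tsv))
	r.Comma = '\t'
	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("no header row")
	}
	// Map each header name to its column index.
	index := make(map[string]int, len(records[0]))
	for i, name := range records[0] {
		index[name] = i
	}
	var out []map[string]string
	for _, rec := range records[1:] {
		row := make(map[string]string, len(want))
		for _, col := range want {
			i, ok := index[col]
			if !ok {
				return nil, fmt.Errorf("unknown column %q", col)
			}
			row[col] = rec[i]
		}
		out = append(out, row)
	}
	return out, nil
}

func main() {
	// Abbreviated sample: a subset of the columns from the two rows above.
	lines := []string{
		strings.Join([]string{"node_id", "flags", "key", "count", "last_error", "last_error_code"}, "\t"),
		strings.Join([]string{"6", "", "ALTER TABLE kv.kv SET (schema_locked = _)", "1", "NULL", "NULL"}, "\t"),
		strings.Join([]string{"6", "!", "CREATE CHANGEFEED FOR TABLE kv.kv INTO '_' WITH OPTIONS (format = '_', end_time = '_', cursor = '_')", "1",
			`"failed to resolve targets in the CHANGEFEED stmt: table ""kv.kv"" does not exist"`, "XXUUU"}, "\t"),
	}
	rows, err := selectColumns(strings.Join(lines, "\n"), []string{"key", "last_error", "last_error_code"})
	if err != nil {
		panic(err)
	}
	for _, row := range rows {
		fmt.Printf("%s\n  error: %s (%s)\n", row["key"], row["last_error"], row["last_error_code"])
	}
}
```

Using `encoding/csv` rather than a plain `strings.Split` means the doubled quotes around `kv.kv` in `last_error` are unescaped correctly.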


wenyihu6 commented Apr 5, 2024

I asked about this on Slack: https://cockroachlabs.slack.com/archives/C04N0AS14CT/p1712352853313569. This looks like a bug.

wenyihu6 added the P-1 label Apr 5, 2024
wenyihu6 removed their assignment Apr 17, 2024