
roachtest: import/tpcc/warehouses=1000/nodes=32 failed [replication failures and clock latency jumps] #122348

Closed
cockroach-teamcity opened this issue Apr 14, 2024 · 8 comments
Labels

  • branch-release-24.1: Used to mark GA and release blockers and technical advisories for 24.1
  • C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
  • C-test-failure: Broken test (automatically or manually discovered).
  • GA-blocker
  • O-roachtest
  • O-robot: Originated from a bot.
  • P-1: Issues/test failures with a fix SLA of 1 month
  • T-kv: KV Team

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Apr 14, 2024

roachtest.import/tpcc/warehouses=1000/nodes=32 failed with artifacts on release-24.1 @ 6e3e9e6012dc04dabc64333a87e77a80d9f9d46b:

(monitor.go:154).Wait: monitor failure: pq: internal error while retrieving user account memberships: operation "get-user-session" timed out after 10.008s (given timeout 10s): internal error while retrieving user account: get default settings error: interrupted during singleflight load-value:defaultsettings-roachprod-100-1: context deadline exceeded
test artifacts and logs in: /artifacts/import/tpcc/warehouses=1000/nodes=32/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

/cc @cockroachdb/sql-queries

This test on roachdash | Improve this report!

Jira issue: CRDB-37821

@cockroach-teamcity added the branch-release-24.1, C-test-failure, O-roachtest, O-robot, release-blocker, and T-sql-queries labels Apr 14, 2024
@cockroach-teamcity cockroach-teamcity added this to the 24.1 milestone Apr 14, 2024
@rytaft
Collaborator

rytaft commented Apr 17, 2024

Looks like SQL Foundations

@rytaft added the T-sql-foundations label and removed the T-sql-queries label Apr 17, 2024
@blathers-crl blathers-crl bot added this to Triage in SQL Foundations Apr 17, 2024
@rytaft rytaft changed the title roachtest: import/tpcc/warehouses=1000/nodes=32 failed roachtest: [operation "get-user-session" timed out] import/tpcc/warehouses=1000/nodes=32 failed Apr 17, 2024
@rafiss
Collaborator

rafiss commented Apr 17, 2024

The error message is the workload health check failing at 15:59:07:

health: 15:59:07 health_checker.go:84: health check terminated on node 4 with pq: internal error while retrieving user account memberships: operation "get-user-session" timed out after 10.008s (given timeout 10s): internal error while retrieving user account: get default settings error: interrupted during singleflight load-value:defaultsettings-roachprod-100-1: context deadline exceeded

The first step of the health check is to connect to the cluster. Part of authenticating is to run the get-user-session operation internally, which has a 10-second timeout. If it fails, it means that some part of the cluster is unhealthy and prevented the get-user-session operation from loading the user data it needed.

The logs from n4 show some replication errors happening earlier:

I240414 15:58:20.656955 11824 kv/kvserver/replica_command.go:1860 ⋮ [T1,Vsystem,n4,replicate,s4,r78/3:‹/Table/110/1{-/29/3/…}›] 1139  could not successfully add and upreplicate LEARNER replica(s) on [n26,s26], rolling back: operation ‹"send-snapshot"› timed out after 38.43s (given timeout 1h0m0s): ‹range_id:78 coordinator_replica:<node_id:4 store_id:4 replica_id:3 type:VOTER_FULL > recipient_replica:<node_id:26 store_id:26 replica_id:7 type:LEARNER > delegated_sender:<node_id:4 store_id:4 replica_id:3 type:VOTER_FULL > term:6 first_index:112 sender_queue_name:REPLICATE_QUEUE descriptor_generation:24 queue_on_delegate_len:-1 snap_id:81c2d9ef-481e-4e4d-b08b-58056427f2b3 ›: remote failed to send snapshot: recv msg error: grpc: ‹context deadline exceeded› [code 4/DeadlineExceeded]
I240414 15:58:20.671192 11824 kv/kvserver/replica_command.go:2137 ⋮ [T1,Vsystem,n4,replicate,s4,r78/3:‹/Table/110/1{-/29/3/…}›] 1140  rolled back LEARNER n26,s26 in r78:‹/Table/110/1{-/29/3/-2767}› [(n12,s12):6, (n3,s3):2, (n4,s4):3, (n26,s26):7LEARNER, next=8, gen=25, sticky=1713113828.131410469,0]
E240414 15:58:20.671494 11824 kv/kvserver/queue.go:1202 ⋮ [T1,Vsystem,n4,replicate,s4,r78/3:‹/Table/110/1{-/29/3/…}›] 1141  operation ‹"replicate queue process replica 78"› timed out after 1m0.015s (given timeout 1m0s): failed to replicate after 5 retries
E240414 15:58:20.672481 296 kv/kvserver/queue.go:1202 ⋮ [T1,Vsystem,n4,replicate,s4,r287/6:‹/Table/108/1/81{2/10/…-3/3/3…}›] 1142  needs lease, not adding: repl=(n10,s10):4 seq=5 start=1713110295.001470725,0 epo=1 pro=1713110295.003861671,0
I240414 15:58:20.672498 296 kv/kvserver/queue.go:918 ⋮ [T1,Vsystem,n4,replicate,s4,r287/6:‹/Table/108/1/81{2/10/…-3/3/3…}›] 1143  ‹queue.replicate: processing replica [outer]›: skipping 6 since replica can't be processed needs lease, not adding: repl=(n10,s10):4 seq=5 start=1713110295.001470725,0 epo=1 pro=1713110295.003861671,0
E240414 15:58:20.673842 296 kv/kvserver/queue.go:1202 ⋮ [T1,Vsystem,n4,replicate,s4,r370/4:‹/Table/108/1/532/{5/10…-7/22…}›] 1144  needs lease, not adding: repl=(n29,s29):1 seq=5 start=1713110299.001662446,0 epo=1 pro=1713110299.064308260,0

There are also some clock latency jumps:

W240414 15:59:03.903171 13356 kv/kvserver/spanlatch/manager.go:610 ⋮ [T1,Vsystem,n4,s4,r267/3:‹/Table/110/1/2{9/3/-…-15/9/…}›] 1429  have been waiting 15s to acquire write latch ‹/Local/Range/Table/110/1/29/3/-2767/Transaction/"00000000-0000-0000-0000-000000000000"›@0,0 for request RecomputeStats [/Table/110/1/‹29›/‹3›/‹-2767›], held by write latch ‹/Local/Range/Table/110/1/29/3/-2767/Transaction/"00000000-0000-0000-0000-000000000000"›@0,0 for request RecomputeStats [/Table/110/1/‹29›/‹3›/‹-2767›]
W240414 15:59:03.904504 13150 kv/kvserver/spanlatch/manager.go:610 ⋮ [T1,Vsystem,n4,s4,r267/3:‹/Table/110/1/2{9/3/-…-15/9/…}›] 1430  have been waiting 15s to acquire write latch ‹/Local/Range/Table/110/1/29/3/-2767/Transaction/"00000000-0000-0000-0000-000000000000"›@0,0 for request RecomputeStats [/Table/110/1/‹29›/‹3›/‹-2767›], held by write latch ‹/Local/Range/Table/110/1/29/3/-2767/Transaction/"00000000-0000-0000-0000-000000000000"›@0,0 for request RecomputeStats [/Table/110/1/‹29›/‹3›/‹-2767›]
W240414 15:59:03.914373 14505 kv/kvserver/spanlatch/manager.go:610 ⋮ [T1,Vsystem,n4,s4,r267/3:‹/Table/110/1/2{9/3/-…-15/9/…}›] 1431  have been waiting 15s to acquire write latch ‹/Local/Range/Table/110/1/29/3/-2767/Transaction/"00000000-0000-0000-0000-000000000000"›@0,0 for request RecomputeStats [/Table/110/1/‹29›/‹3›/‹-2767›], held by write latch ‹/Local/Range/Table/110/1/29/3/-2767/Transaction/"00000000-0000-0000-0000-000000000000"›@0,0 for request RecomputeStats [/Table/110/1/‹29›/‹3›/‹-2767›]
W240414 15:59:03.969015 13469 kv/kvserver/spanlatch/manager.go:610 ⋮ [T1,Vsystem,n4,s4,r267/3:‹/Table/110/1/2{9/3/-…-15/9/…}›] 1432  have been waiting 15s to acquire write latch ‹/Local/Range/Table/110/1/29/3/-2767/Transaction/"00000000-0000-0000-0000-000000000000"›@0,0 for request RecomputeStats [/Table/110/1/‹29›/‹3›/‹-2767›], held by write latch ‹/Local/Range/Table/110/1/29/3/-2767/Transaction/"00000000-0000-0000-0000-000000000000"›@0,0 for request RecomputeStats [/Table/110/1/‹29›/‹3›/‹-2767›]
W240414 15:59:04.058812 15381 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=6,raddr=‹10.142.0.253:29000›,class=default,rpc] 1433  latency jump (prev avg 172.50ms, current 338.57ms)
I240414 15:59:04.517880 43559 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r1810/5:‹/Table/114/1/97{3/10/…-4/2/-…}›] 1434  applied snapshot 84be03c3 from (n26,s26):3 at applied index 26 (total=26ms data=2.0 KiB excise=true ingestion=6@25ms)
W240414 15:59:04.650433 14588 kv/kvserver/spanlatch/manager.go:610 ⋮ [T1,Vsystem,n4,s4,r267/3:‹/Table/110/1/2{9/3/-…-15/9/…}›] 1435  have been waiting 15s to acquire write latch ‹/Local/Range/Table/110/1/29/3/-2767/Transaction/"00000000-0000-0000-0000-000000000000"›@0,0 for request RecomputeStats [/Table/110/1/‹29›/‹3›/‹-2767›], held by write latch ‹/Local/Range/Table/110/1/29/3/-2767/Transaction/"00000000-0000-0000-0000-000000000000"›@0,0 for request RecomputeStats [/Table/110/1/‹29›/‹3›/‹-2767›]
W240414 15:59:05.321603 16453 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=14,raddr=‹10.142.0.163:29000›,class=raft,rpc] 1436  latency jump (prev avg 183.06ms, current 277.63ms)
W240414 15:59:05.430425 8455 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=10,raddr=‹10.142.0.16:29000›,class=default,rpc] 1437  latency jump (prev avg 166.39ms, current 354.80ms)
W240414 15:59:05.433493 4436 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=17,raddr=‹10.142.0.245:29000›,class=default,rpc] 1438  latency jump (prev avg 178.82ms, current 299.18ms)
W240414 15:59:05.504544 13129 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=14,raddr=‹10.142.0.163:29000›,class=default,rpc] 1439  latency jump (prev avg 194.30ms, current 309.77ms)
W240414 15:59:05.646795 7599 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=18,raddr=‹10.142.0.210:29000›,class=default,rpc] 1440  latency jump (prev avg 187.55ms, current 297.18ms)
W240414 15:59:05.668718 2522 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=5,raddr=‹10.142.0.85:29000›,class=raft,rpc] 1441  latency jump (prev avg 208.74ms, current 357.35ms)
W240414 15:59:05.711929 12983 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=22,raddr=‹10.142.0.80:29000›,class=default,rpc] 1442  latency jump (prev avg 200.82ms, current 301.83ms)
W240414 15:59:05.734415 19491 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=30,raddr=‹10.142.0.213:29000›,class=raft,rpc] 1443  latency jump (prev avg 192.03ms, current 304.17ms)
W240414 15:59:05.756897 5154 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=19,raddr=‹10.142.0.72:29000›,class=rangefeed,rpc] 1444  latency jump (prev avg 174.77ms, current 325.34ms)
W240414 15:59:05.773111 5122 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=6,raddr=‹10.142.0.253:29000›,class=rangefeed,rpc] 1445  latency jump (prev avg 199.78ms, current 392.87ms)
I240414 15:59:05.774899 43728 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r3188/4:‹/Table/113/1/2{53/82…-81/25…}›] 1446  applied snapshot 341387ce from (n32,s32):2 at applied index 17 (total=142ms data=901 B excise=true ingestion=6@140ms)
W240414 15:59:05.803060 15220 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=13,raddr=‹10.142.0.199:29000›,class=default,rpc] 1447  latency jump (prev avg 175.47ms, current 302.25ms)
W240414 15:59:05.803116 11134 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=20,raddr=‹10.142.0.81:29000›,class=raft,rpc] 1448  latency jump (prev avg 176.72ms, current 302.51ms)
I240414 15:59:05.812669 243 kv/kvserver/store_raft.go:699 ⋮ [T1,Vsystem,n4,s4,r2877/6:‹/Table/113/1/721/{20941-33238}›,raft] 1449  raft ready handling: 0.50s [append=0.32s, apply=0.00s, non-blocking-sync=0.18s, other=0.00s], wrote [append-batch=222 B, append-sst=807 KiB (1), ]; node might be overloaded
W240414 15:59:05.927763 2719 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=8,raddr=‹10.142.0.192:29000›,class=system,rpc] 1450  latency jump (prev avg 197.27ms, current 407.13ms)
W240414 15:59:05.959638 7061 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=30,raddr=‹10.142.0.213:29000›,class=system,rpc] 1451  latency jump (prev avg 198.32ms, current 439.01ms)
W240414 15:59:05.968020 7681 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=4,raddr=‹10.142.0.147:29000›,class=system,rpc] 1452  latency jump (prev avg 196.96ms, current 327.21ms)
W240414 15:59:06.051436 9848 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=11,raddr=‹10.142.1.32:29000›,class=raft,rpc] 1453  latency jump (prev avg 187.92ms, current 363.86ms)
W240414 15:59:06.051519 11773 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=31,raddr=‹10.142.0.152:29000›,class=default,rpc] 1454  latency jump (prev avg 166.64ms, current 363.84ms)
W240414 15:59:06.051586 11909 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=26,raddr=‹10.142.1.3:29000›,class=rangefeed,rpc] 1455  latency jump (prev avg 269.59ms, current 425.52ms)
W240414 15:59:06.090414 5151 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=21,raddr=‹10.142.1.1:29000›,class=raft,rpc] 1456  latency jump (prev avg 180.60ms, current 317.54ms)
W240414 15:59:06.262189 16271 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=3,raddr=‹10.142.1.20:29000›,class=default,rpc] 1457  latency jump (prev avg 185.04ms, current 338.64ms)
W240414 15:59:06.323990 2860 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=9,raddr=‹10.142.0.175:29000›,class=system,rpc] 1458  latency jump (prev avg 165.17ms, current 271.30ms)
W240414 15:59:06.519444 7734 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=29,raddr=‹10.142.1.4:29000›,class=default,rpc] 1459  latency jump (prev avg 187.07ms, current 331.56ms)
W240414 15:59:06.667912 4851 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=20,raddr=‹10.142.0.81:29000›,class=system,rpc] 1460  latency jump (prev avg 190.67ms, current 305.02ms)
W240414 15:59:06.678668 327 server/node.go:1301 ⋮ [T1,Vsystem,n4,summaries] 1461  health alerts detected: {Alerts:[{StoreID:4 Category:‹METRICS› Description:‹requests.slow.latch› Value:5}]}
W240414 15:59:06.745774 281 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=1,raddr=‹10.142.0.15:29000›,class=system,rpc] 1462  latency jump (prev avg 189.40ms, current 347.14ms)
W240414 15:59:06.746673 5672 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=24,raddr=‹10.142.0.63:29000›,class=system,rpc] 1463  latency jump (prev avg 215.13ms, current 347.94ms)
W240414 15:59:06.836284 15381 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=6,raddr=‹10.142.0.253:29000›,class=default,rpc] 1464  latency jump (prev avg 219.42ms, current 396.75ms)
W240414 15:59:06.867859 3558 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=13,raddr=‹10.142.0.199:29000›,class=system,rpc] 1465  latency jump (prev avg 188.66ms, current 469.20ms)
W240414 15:59:06.867931 3820 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=15,raddr=‹10.142.0.221:29000›,class=system,rpc] 1466  latency jump (prev avg 204.09ms, current 469.24ms)
W240414 15:59:06.896167 4436 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=17,raddr=‹10.142.0.245:29000›,class=default,rpc] 1467  latency jump (prev avg 202.13ms, current 376.45ms)
W240414 15:59:06.896602 9647 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=12,raddr=‹10.142.0.204:29000›,class=default,rpc] 1468  latency jump (prev avg 182.73ms, current 386.62ms)
W240414 15:59:06.896644 17629 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=25,raddr=‹10.142.0.44:29000›,class=default,rpc] 1469  latency jump (prev avg 192.70ms, current 386.72ms)
W240414 15:59:06.991562 7599 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=18,raddr=‹10.142.0.210:29000›,class=default,rpc] 1470  latency jump (prev avg 203.18ms, current 341.56ms)
W240414 15:59:07.059973 18283 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=28,raddr=‹10.142.0.212:29000›,class=raft,rpc] 1471  latency jump (prev avg 201.74ms, current 314.57ms)
W240414 15:59:07.142753 15504 kv/kvserver/replica_command.go:684 ⋮ [T1,Vsystem,n4,s4,r267/3:‹/Table/110/1/2{9/3/-…-15/9/…}›,*kvpb.AdminSplitRequest] 1472  retrying split after err: split at key /Table/110/1/‹312›/‹6›/‹-3001› failed: descriptor changed: [expected] r267:‹/Table/110/{1/29/3/-2767-2}› [(n12,s12):1, (n3,s3):2, (n4,s4):3, (n26,s26):4LEARNER, next=5, gen=25, sticky=1713110851.315439149,0] != [actual] r267:‹/Table/110/1/2{9/3/-2767-15/9/-2665}› [(n12,s12):1, (n3,s3):2, (n4,s4):3, (n26,s26):4LEARNER, next=5, gen=26, sticky=1713110851.315439149,0]: unexpected value: ‹raw_bytes:"\305\231\321/\003\010\213\002\022\010\366n\211\245\213\206\3651\032\t\366n\211\366\327\221\206\365\227\"\010\010\014\020\014\030\001 \000\"\010\010\003\020\003\030\002 \000\"\010\010\004\020\004\030\003 \000\"\010\010\032\020\032\030\004 \001(\0050\032:\n\010\255\254\230\314\342\247\214\343\027" timestamp:<wall_time:1713110296825040740 > ›
W240414 15:59:07.298011 7681 2@rpc/clock_offset.go:291 ⋮ [T1,Vsystem,n4,rnode=4,raddr=‹10.142.0.147:29000›,class=system,rpc] 1473  latency jump (prev avg 210.32ms, current 315.91ms)
W240414 15:59:07.459183 42359 sql/user.go:280 ⋮ [T1,Vsystem,n4,client=35.230.168.164:37914,hostssl,user=‹roachprod›] 1474  user lookup for ‹"roachprod"› failed: get default settings error: interrupted during singleflight ‹load-value:defaultsettings-roachprod-100-1›: context deadline exceeded

This might mean the cluster is overloaded, or there's some other problem at a lower level. @rytaft I'm moving this back to SQL Queries so you can decide which direction to pursue.

@rafiss rafiss changed the title roachtest: [operation "get-user-session" timed out] import/tpcc/warehouses=1000/nodes=32 failed roachtest: import/tpcc/warehouses=1000/nodes=32 failed [replication failures and clock latency jumps] Apr 17, 2024
@rytaft added the GA-blocker label and removed the release-blocker label Apr 17, 2024
@rytaft
Collaborator

rytaft commented Apr 17, 2024

This shouldn't block a beta, but marking as GA-blocker for now since we'd like to dig a bit deeper.

@rytaft added the P-1 label Apr 18, 2024
@yuzefovich
Member

Looking closer at n4, here are some interesting messages:

I240414 15:57:39.385090 247 kv/kvserver/store_raft.go:699 ⋮ [T1,Vsystem,n4,s4,r287/6:‹/Table/108/1/81{2/10/…-3/3/3…}›,raft] 441  raft ready handling: 0.65s [append=0.00s, apply=0.22s, , other=0.00s], wrote [append-batch=51 B, apply=312 B (1)] pebble stats: [commit-wait 107ns sem 442ns]; node might be overloaded
...
I240414 15:58:48.719835 13712 kv/kvserver/replica_command.go:474 ⋮ [T1,Vsystem,n4,s4,r267/3:‹/Table/110/1/2{9/3/-…-15/9/…}›,*kvpb.AdminSplitRequest] 1257  initiating a split of this range at key /Table/110/1/‹375›/‹1›/‹-3001› [r1273] (manual); delayed by 45.1s to resolve: replica r267/4 not caught up: StateSnapshot match=0 next=12 learner paused (without success); r267/4 is waiting for a Raft snapshot

Then at 15:59:07 we get this error. I think the KV team should look closer at whether this behavior is expected.

@yuzefovich yuzefovich added this to Incoming in KV via automation Apr 18, 2024
@yuzefovich yuzefovich removed this from Triage in SQL Foundations Apr 18, 2024
@blathers-crl bot added the T-kv label Apr 18, 2024
@yuzefovich removed the T-sql-foundations label Apr 18, 2024
@cockroach-teamcity
Member Author

roachtest.import/tpcc/warehouses=1000/nodes=32 failed with artifacts on release-24.1 @ 88c9d88cb0093d35f2d630137fd178518aa48569:

(monitor.go:154).Wait: monitor failure: pq: internal error while retrieving user account memberships: operation "get-user-session" timed out after 10.019s (given timeout 10s): interrupted during singleflight load-value:/global/-1: context deadline exceeded
test artifacts and logs in: /artifacts/import/tpcc/warehouses=1000/nodes=32/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=true
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@kvoli
Collaborator

kvoli commented Apr 23, 2024

This overload is likely due to #122309 which #122824 will address.

@miraradeva
Contributor

I managed to reproduce the issue on a GCE worker running off master, and it definitely looks like the same issue as #122309. I also ran the test with #122824 and confirmed it results in a more stable cluster (e.g., no slow RPCs, no slow latch requests).

@miraradeva
Contributor

The backported fix just merged; this should now be fixed.

@miraradeva added the C-bug label Apr 25, 2024