Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ccl/changefeedccl: TestChangefeedSchemaChange failed under stress #29569

Closed
cockroach-teamcity opened this issue Sep 5, 2018 · 16 comments
Closed
Assignees
Labels
A-sql-executor SQL txn logic C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Milestone

Comments

@cockroach-teamcity
Copy link
Member

SHA: https://github.com/cockroachdb/cockroach/commits/5692414c3217d909da84d483864fb204dc781952

Parameters:

TAGS=
GOFLAGS=-race

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=TestChangefeedSchemaChange PKG=github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=883234&tab=buildLog

=== RUN   TestChangefeedSchemaChange
W180905 04:34:43.825075 3293 server/status/runtime.go:310  [n?] Could not parse build timestamp: parsing time "" as "2006/01/02 15:04:05": cannot parse "" as "2006"
I180905 04:34:43.884926 3293 server/server.go:849  [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled
I180905 04:34:43.885531 3293 base/addr_validation.go:279  [n?] server certificate addresses: IP=127.0.0.1,::1; DNS=localhost,*.local; CN=node
I180905 04:34:43.885637 3293 base/addr_validation.go:319  [n?] web UI certificate addresses: IP=127.0.0.1,::1; DNS=localhost,*.local; CN=node
I180905 04:34:43.911945 3293 server/config.go:493  [n?] 1 storage engine initialized
I180905 04:34:43.912074 3293 server/config.go:496  [n?] RocksDB cache size: 128 MiB
I180905 04:34:43.912151 3293 server/config.go:496  [n?] store 0: in-memory, size 0 B
I180905 04:34:44.063763 3293 server/node.go:373  [n?] **** cluster d57a7672-012f-4b3d-ab65-0fc550bf94d7 has been created
I180905 04:34:44.063871 3293 server/server.go:1411  [n?] **** add additional nodes by specifying --join=127.0.0.1:41465
I180905 04:34:44.064879 3293 gossip/gossip.go:407  [n1] NodeDescriptor set to node_id:1 address:<network_field:"tcp" address_field:"127.0.0.1:41465" > attrs:<> locality:<> ServerVersion:<major_val:2 minor_val:0 patch:0 unstable:13 > build_tag:"v2.2.0-alpha.00000000-491-g5692414" started_at:1536122084064556185 
I180905 04:34:44.066467 3280 gossip/gossip.go:941  [n1] gossip connectivity
 
I180905 04:34:44.098242 3293 storage/store.go:1551  [n1,s1] [n1,s1]: failed initial metrics computation: [n1,s1]: system config not yet available
I180905 04:34:44.099393 3293 server/node.go:476  [n1] initialized store [n1,s1]: disk (capacity=512 MiB, available=512 MiB, used=0 B, logicalBytes=6.9 KiB), ranges=1, leases=0, queries=0.00, writes=0.00, bytesPerReplica={p10=7103.00 p25=7103.00 p50=7103.00 p75=7103.00 p90=7103.00 pMax=7103.00}, writesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=0.00 p90=0.00 pMax=0.00}
I180905 04:34:44.099681 3293 storage/stores.go:242  [n1] read 0 node addresses from persistent storage
I180905 04:34:44.100034 3293 server/node.go:699  [n1] connecting to gossip network to verify cluster ID...
I180905 04:34:44.104485 3293 server/node.go:724  [n1] node connected via gossip and verified as part of cluster "d57a7672-012f-4b3d-ab65-0fc550bf94d7"
I180905 04:34:44.104873 3293 server/node.go:548  [n1] node=1: started with [<no-attributes>=<in-mem>] engine(s) and attributes []
I180905 04:34:44.121227 3358 storage/replica_command.go:298  [split,n1,s1,r1/1:/M{in-ax}] initiating a split of this range at key /System/"" [r2]
I180905 04:34:44.153693 3293 server/status/recorder.go:611  [n1] available memory from cgroups (8.0 EiB) exceeds system memory 16 GiB, using system memory
I180905 04:34:44.153796 3293 server/server.go:1802  [n1] Could not start heap profiler worker due to: directory to store profiles could not be determined
I180905 04:34:44.171708 3293 server/server.go:1548  [n1] starting https server at 127.0.0.1:33541 (use: 127.0.0.1:33541)
I180905 04:34:44.171818 3293 server/server.go:1550  [n1] starting grpc/postgres server at 127.0.0.1:41465
I180905 04:34:44.171872 3293 server/server.go:1551  [n1] advertising CockroachDB node at 127.0.0.1:41465
I180905 04:34:44.282136 3583 storage/replica_command.go:298  [split,n1,s1,r2/1:/{System/-Max}] initiating a split of this range at key /System/NodeLiveness [r3]
W180905 04:34:44.396854 3484 storage/intent_resolver.go:668  [n1,s1] failed to push during intent resolution: failed to push "unnamed" id=0e00bc1b key=/Table/SystemConfigSpan/Start rw=true pri=0.00049705 iso=SERIALIZABLE stat=PENDING epo=0 ts=1536122084.279131764,0 orig=1536122084.279131764,0 max=1536122084.279131764,0 wto=false rop=false seq=6
I180905 04:34:44.402003 3105 storage/replica_command.go:298  [split,n1,s1,r3/1:/{System/NodeL…-Max}] initiating a split of this range at key /System/NodeLivenessMax [r4]
I180905 04:34:44.456405 3398 sql/event_log.go:126  [n1,intExec=optInToDiagnosticsStatReporting] Event: "set_cluster_setting", target: 0, info: {SettingName:diagnostics.reporting.enabled Value:true User:root}
I180905 04:34:44.519950 3605 storage/replica_command.go:298  [split,n1,s1,r4/1:/{System/NodeL…-Max}] initiating a split of this range at key /System/tsd [r5]
I180905 04:34:44.652059 3590 storage/replica_command.go:298  [split,n1,s1,r5/1:/{System/tsd-Max}] initiating a split of this range at key /System/"tse" [r6]
I180905 04:34:44.771722 3265 storage/replica_command.go:298  [split,n1,s1,r6/1:/{System/tse-Max}] initiating a split of this range at key /Table/SystemConfigSpan/Start [r7]
I180905 04:34:44.838390 3588 sql/event_log.go:126  [n1,intExec=set-setting] Event: "set_cluster_setting", target: 0, info: {SettingName:version Value:2.0-13 User:root}
I180905 04:34:44.924482 3611 storage/replica_command.go:298  [split,n1,s1,r7/1:/{Table/System…-Max}] initiating a split of this range at key /Table/11 [r8]
I180905 04:34:45.033338 3622 sql/event_log.go:126  [n1,intExec=disableNetTrace] Event: "set_cluster_setting", target: 0, info: {SettingName:trace.debug.enable Value:false User:root}
I180905 04:34:45.054270 3626 storage/replica_command.go:298  [split,n1,s1,r8/1:/{Table/11-Max}] initiating a split of this range at key /Table/12 [r9]
I180905 04:34:45.194197 3699 storage/replica_command.go:298  [split,n1,s1,r9/1:/{Table/12-Max}] initiating a split of this range at key /Table/13 [r10]
I180905 04:34:45.218932 3614 rpc/nodedialer/nodedialer.go:92  [consistencyChecker,n1,s1,r1/1:/{Min-System/}] connection to n1 established
I180905 04:34:45.307754 3646 sql/event_log.go:126  [n1,intExec=initializeClusterSecret] Event: "set_cluster_setting", target: 0, info: {SettingName:cluster.secret Value:7bd7f491-fd12-4085-ad61-87a845c00777 User:root}
I180905 04:34:45.353748 3684 storage/replica_command.go:298  [split,n1,s1,r10/1:/{Table/13-Max}] initiating a split of this range at key /Table/14 [r11]
I180905 04:34:45.462730 3328 sql/event_log.go:126  [n1,intExec=create-default-db] Event: "create_database", target: 50, info: {DatabaseName:defaultdb Statement:CREATE DATABASE IF NOT EXISTS defaultdb User:root}
I180905 04:34:45.481549 3716 storage/replica_command.go:298  [split,n1,s1,r11/1:/{Table/14-Max}] initiating a split of this range at key /Table/15 [r12]
I180905 04:34:45.517717 3751 storage/replica_command.go:298  [split,n1,s1,r12/1:/{Table/15-Max}] initiating a split of this range at key /Table/16 [r13]
I180905 04:34:45.522768 3703 sql/event_log.go:126  [n1,intExec=create-default-db] Event: "create_database", target: 51, info: {DatabaseName:postgres Statement:CREATE DATABASE IF NOT EXISTS postgres User:root}
I180905 04:34:45.561913 3779 storage/replica_command.go:298  [split,n1,s1,r13/1:/{Table/16-Max}] initiating a split of this range at key /Table/17 [r14]
I180905 04:34:45.609483 3811 storage/replica_command.go:298  [split,n1,s1,r14/1:/{Table/17-Max}] initiating a split of this range at key /Table/18 [r15]
I180905 04:34:45.639230 3293 server/server.go:1604  [n1] done ensuring all necessary migrations have run
I180905 04:34:45.639351 3293 server/server.go:1607  [n1] serving sql connections
I180905 04:34:45.678438 3759 server/server_update.go:67  [n1] no need to upgrade, cluster already at the newest version
I180905 04:34:45.690926 3761 sql/event_log.go:126  [n1] Event: "node_join", target: 1, info: {Descriptor:{NodeID:1 Address:{NetworkField:tcp AddressField:127.0.0.1:41465} Attrs: Locality: ServerVersion:2.0-13 BuildTag:v2.2.0-alpha.00000000-491-g5692414 StartedAt:1536122084064556185 LocalityAddress:[]} ClusterID:d57a7672-012f-4b3d-ab65-0fc550bf94d7 StartedAt:1536122084064556185 LastUp:1536122084064556185}
I180905 04:34:45.692114 3693 storage/replica_command.go:298  [split,n1,s1,r15/1:/{Table/18-Max}] initiating a split of this range at key /Table/19 [r16]
I180905 04:34:45.745276 3697 storage/replica_command.go:298  [split,n1,s1,r16/1:/{Table/19-Max}] initiating a split of this range at key /Table/20 [r17]
I180905 04:34:45.829285 3846 storage/replica_command.go:298  [split,n1,s1,r17/1:/{Table/20-Max}] initiating a split of this range at key /Table/21 [r18]
I180905 04:34:45.891908 3861 storage/replica_command.go:298  [split,n1,s1,r18/1:/{Table/21-Max}] initiating a split of this range at key /Table/22 [r19]
I180905 04:34:45.933295 3280 gossip/gossip.go:941  [n1] gossip connectivity
  n1 [sentinel];
 
I180905 04:34:45.948406 3740 storage/replica_command.go:298  [split,n1,s1,r19/1:/{Table/22-Max}] initiating a split of this range at key /Table/23 [r20]
I180905 04:34:46.058337 3855 storage/replica_command.go:298  [split,n1,s1,r20/1:/{Table/23-Max}] initiating a split of this range at key /Table/50 [r21]
I180905 04:34:46.140143 3879 storage/replica_command.go:298  [split,n1,s1,r21/1:/{Table/50-Max}] initiating a split of this range at key /Table/51 [r22]
I180905 04:34:46.592504 3928 sql/event_log.go:126  [n1,client=127.0.0.1:60008,user=root] Event: "set_cluster_setting", target: 0, info: {SettingName:changefeed.experimental_poll_interval Value:0s User:root}
I180905 04:34:46.669651 3928 sql/event_log.go:126  [n1,client=127.0.0.1:60008,user=root] Event: "create_database", target: 52, info: {DatabaseName:d Statement:CREATE DATABASE d User:root}
--- FAIL: TestChangefeedSchemaChange (23.24s)
=== RUN   TestChangefeedSchemaChange/add_column_with_default
==================
WARNING: DATA RACE
Write at 0x00c42157f9a8 by goroutine 59:
  github.com/cockroachdb/cockroach/pkg/sql.(*txnState).finishSQLTxn()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/txn_state.go:250 +0x1d8
  github.com/cockroachdb/cockroach/pkg/sql.glob..func3()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_fsm.go:281 +0x59
  github.com/cockroachdb/cockroach/pkg/util/fsm.Transitions.apply()
      /go/src/github.com/cockroachdb/cockroach/pkg/util/fsm/fsm.go:107 +0x20e
  github.com/cockroachdb/cockroach/pkg/util/fsm.(*Machine).ApplyWithPayload()
      /go/src/github.com/cockroachdb/cockroach/pkg/util/fsm/fsm.go:136 +0x165
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).txnStateTransitionsApplyWrapper()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1837 +0x1f5
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1204 +0x15c9
  github.com/cockroachdb/cockroach/pkg/sql.(*internalExecutorImpl).initConnEx.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:210 +0x97

Previous read at 0x00c42157f9a8 by goroutine 214:
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).serialize()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1945 +0x137
  github.com/cockroachdb/cockroach/pkg/sql.(*SessionRegistry).SerializeAll()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/exec_util.go:849 +0x263
  github.com/cockroachdb/cockroach/pkg/server.(*statusServer).ListLocalSessions()
      /go/src/github.com/cockroachdb/cockroach/pkg/server/status.go:1288 +0xfb
  github.com/cockroachdb/cockroach/pkg/server/serverpb._Status_ListLocalSessions_Handler.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/server/serverpb/status.pb.go:1559 +0xa1
  github.com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc.OpenTracingServerInterceptor.func1()
      /go/src/github.com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc/server.go:48 +0xc23
  github.com/cockroachdb/cockroach/pkg/rpc.NewServerWithInterceptor.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:168 +0x10c
  github.com/cockroachdb/cockroach/pkg/server/serverpb._Status_ListLocalSessions_Handler()
      /go/src/github.com/cockroachdb/cockroach/pkg/server/serverpb/status.pb.go:1561 +0x1e3
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).processUnaryRPC()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1011 +0xde7
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).handleStream()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1249 +0x1364
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:680 +0xac

Goroutine 59 (running) created at:
  github.com/cockroachdb/cockroach/pkg/sql.(*internalExecutorImpl).initConnEx()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:199 +0x492
  github.com/cockroachdb/cockroach/pkg/sql.(*internalExecutorImpl).execInternal()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:449 +0x54d
  github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).Exec()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:292 +0x1d1
  github.com/cockroachdb/cockroach/pkg/sql.LeaseStore.release()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/lease.go:249 +0x58d
  github.com/cockroachdb/cockroach/pkg/sql.releaseLease.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/lease.go:1020 +0x112
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:324 +0xf3

Goroutine 214 (finished) created at:
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).serveStreams.func1()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:678 +0xb8
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/transport.(*http2Server).operateHeaders()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/transport/http2_server.go:417 +0x1446
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/transport.(*http2Server).HandleStreams()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/transport/http2_server.go:457 +0x87a
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).serveStreams()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:676 +0x16e
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).handleRawConn.func2()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:631 +0x53
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).handleRawConn.func3()
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:639 +0x34
==================
I180905 04:34:51.128654 4422 storage/replica_command.go:298  [split,n1,s1,r26/1:/{Table/55-Max}] initiating a split of this range at key /Table/56 [r27]
I180905 04:34:51.138204 4069 sql/event_log.go:126  [n1,client=127.0.0.1:60018,user=root] Event: "create_table", target: 56, info: {TableName:d.public.add_column_def Statement:CREATE TABLE add_column_def (a INT PRIMARY KEY) User:root}
I180905 04:34:51.231234 4369 ccl/changefeedccl/table_history.go:291  [n1,client=127.0.0.1:60018,user=root] add_column_def 1 1536122091.076811487,0
I180905 04:34:51.280319 3928 sql/lease.go:345  [n1,client=127.0.0.1:60008,user=root] publish: descID=56 (add_column_def) version=2 mtime=2018-09-05 04:34:51.245656098 +0000 UTC
I180905 04:34:51.509040 3928 sql/lease.go:345  [n1,client=127.0.0.1:60008,user=root] publish: descID=56 (add_column_def) version=2 mtime=2018-09-05 04:34:51.279313832 +0000 UTC
I180905 04:34:51.633487 4369 ccl/changefeedccl/table_history.go:291  [n1,client=127.0.0.1:60018,user=root] add_column_def 2 1536122091.279313832,1
I180905 04:34:51.634219 3928 sql/event_log.go:126  [n1,client=127.0.0.1:60008,user=root] Event: "alter_table", target: 56, info: {TableName:d.public.add_column_def Statement:ALTER TABLE add_column_def ADD COLUMN b STRING DEFAULT 'd' User:root MutationID:1 CascadeDroppedViews:[]}
I180905 04:34:51.634363 3928 sql/event_log.go:126  [n1,client=127.0.0.1:60008,user=root] Event: "alter_table", target: 56, info: {TableName:d.public.add_column_def Statement:ALTER TABLE add_column_def ADD COLUMN b STRING DEFAULT 'd' User:root MutationID:1 CascadeDroppedViews:[]}
I180905 04:34:51.647891 4470 sql/lease.go:877  new lease: 56("add_column_def") ver=2:1536122377.799372693,0, refcount=0
I180905 04:34:51.829529 3928 sql/lease.go:345  [n1,client=127.0.0.1:60008,user=root,scExec=] publish: descID=56 (add_column_def) version=3 mtime=2018-09-05 04:34:51.828676068 +0000 UTC
I180905 04:34:51.860794 3928 sql/lease.go:315  publish (1 leases): desc=[{add_column_def 56 2}]
I180905 04:34:51.862232 4487 sql/lease.go:877  new lease: 56("add_column_def") ver=3:1536122425.582245917,0, refcount=0
I180905 04:34:51.895004 3928 sql/lease.go:315  publish (1 leases): desc=[{add_column_def 56 2}]
I180905 04:34:51.955264 3928 sql/backfill.go:135  [n1,client=127.0.0.1:60008,user=root,scExec=] Running backfill for "add_column_def", v=3, m=1
I180905 04:34:52.167223 3928 sql/lease.go:345  [n1,client=127.0.0.1:60008,user=root,scExec=] publish: descID=56 (add_column_def) version=4 mtime=2018-09-05 04:34:52.166581255 +0000 UTC
I180905 04:34:52.308670 3928 sql/event_log.go:126  [n1,client=127.0.0.1:60008,user=root,scExec=] Event: "finish_schema_change", target: 56, info: {MutationID:1}
I180905 04:34:52.386748 4442 sql/lease.go:877  new lease: 56("add_column_def") ver=4:1536122362.908308835,0, refcount=0
I180905 04:34:52.388552 3928 sql/lease.go:315  publish (1 leases): desc=[{add_column_def 56 3}]
    --- FAIL: TestChangefeedSchemaChange/add_column_with_default (1.42s)
    	testing.go:730: race detected during execution of test

@cockroach-teamcity cockroach-teamcity added this to the 2.2 milestone Sep 5, 2018
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Sep 5, 2018
@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/1fdb03d6531245890570587d59687630064102c4

Parameters:

TAGS=
GOFLAGS=-race

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=TestChangefeedSchemaChange PKG=github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=890575&tab=buildLog

@tbg tbg modified the milestones: 2.2, 2.1 Sep 24, 2018
@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/6d8bd75a2cf8ee76d5e9a3999e3544d78acaa1e4

Parameters:

TAGS=
GOFLAGS=-race

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=TestChangefeedSchemaChange PKG=github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=930970&tab=buildLog

I180928 05:43:16.272088 11241 storage/split_queue.go:170  [n1,split,s1,r21/1:/Table/5{0-1}] split done
I180928 05:43:16.364337 11198 sql/event_log.go:126  [n1,client=127.0.0.1:56232,user=root] Event: "set_cluster_setting", target: 0, info: {SettingName:changefeed.experimental_poll_interval Value:0s User:root}
I180928 05:43:16.364780 10953 storage/split_queue.go:116  [n1,split,s1,r10/1:/Table/1{3-4}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.365075 10953 storage/split_queue.go:116  [n1,split,s1,r19/1:/Table/2{2-3}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.365325 10953 storage/split_queue.go:116  [n1,split,s1,r13/1:/Table/1{6-7}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.365583 10953 storage/split_queue.go:116  [n1,split,s1,r7/1:/Table/{SystemCon…-11}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.365848 10953 storage/split_queue.go:116  [n1,split,s1,r18/1:/Table/2{1-2}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.366127 10953 storage/split_queue.go:116  [n1,split,s1,r14/1:/Table/1{7-8}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.366365 10953 storage/split_queue.go:116  [n1,split,s1,r5/1:/System/ts{d-e}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.366590 10953 storage/split_queue.go:116  [n1,split,s1,r1/1:/{Min-System/}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.366830 10953 storage/split_queue.go:116  [n1,split,s1,r22/1:/{Table/51-Max}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.367115 10953 storage/split_queue.go:116  [n1,split,s1,r17/1:/Table/2{0-1}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.367421 10953 storage/split_queue.go:116  [n1,split,s1,r21/1:/Table/5{0-1}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.367656 10953 storage/split_queue.go:116  [n1,split,s1,r12/1:/Table/1{5-6}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.367871 10953 storage/split_queue.go:116  [n1,split,s1,r6/1:/{System/tse-Table/System…}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.368087 10953 storage/split_queue.go:116  [n1,split,s1,r3/1:/System/NodeLiveness{-Max}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.368235 10953 storage/split_queue.go:116  [n1,split,s1,r2/1:/System/{-NodeLive…}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.368415 10953 storage/split_queue.go:116  [n1,split,s1,r11/1:/Table/1{4-5}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.369026 10953 storage/split_queue.go:116  [n1,split,s1,r4/1:/System/{NodeLive…-tsd}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.369287 10953 storage/split_queue.go:116  [n1,split,s1,r15/1:/Table/1{8-9}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.369535 10953 storage/split_queue.go:116  [n1,split,s1,r16/1:/Table/{19-20}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.369768 10953 storage/split_queue.go:116  [n1,split,s1,r20/1:/Table/{23-50}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.370002 10953 storage/split_queue.go:116  [n1,split,s1,r9/1:/Table/1{2-3}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.370248 10953 storage/split_queue.go:116  [n1,split,s1,r8/1:/Table/1{1-2}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.388114 10953 storage/split_queue.go:116  [n1,split,s1,r10/1:/Table/1{3-4}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.388390 10953 storage/split_queue.go:116  [n1,split,s1,r11/1:/Table/1{4-5}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.388549 11198 sql/event_log.go:126  [n1,client=127.0.0.1:56232,user=root] Event: "create_database", target: 52, info: {DatabaseName:d Statement:CREATE DATABASE d User:root}
I180928 05:43:16.388614 10953 storage/split_queue.go:116  [n1,split,s1,r15/1:/Table/1{8-9}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.388803 10953 storage/split_queue.go:116  [n1,split,s1,r16/1:/Table/{19-20}] shouldQueue: shouldQ=false priority=0.0
I180928 05:43:16.388981 10953 storage/split_queue.go:116  [n1,split,s1,r9/1:/Table/1{2-3}] shouldQueue: shouldQ=false priority=0.0
    --- FAIL: TestChangefeedSchemaChange/enterprise (22.19s)

@tbg
Copy link
Member

tbg commented Oct 2, 2018

last one doesn't have the stack (see end of this post) but the failure one up does:

[05:54:26]
WARNING: DATA RACE
[05:54:26]
Write at 0x00c420abc0a8 by goroutine 285:
[05:54:26]
  github.com/cockroachdb/cockroach/pkg/sql.(*txnState).finishSQLTxn()
[05:54:26]
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/txn_state.go:250 +0x1d8
[05:54:26]
  github.com/cockroachdb/cockroach/pkg/sql.glob..func3()
[05:54:26]
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_fsm.go:281 +0x59
[05:54:26]
  github.com/cockroachdb/cockroach/pkg/util/fsm.Transitions.apply()
[05:54:26]
      /go/src/github.com/cockroachdb/cockroach/pkg/util/fsm/fsm.go:107 +0x20e
[05:54:26]
  github.com/cockroachdb/cockroach/pkg/util/fsm.(*Machine).ApplyWithPayload()
[05:54:26]
      /go/src/github.com/cockroachdb/cockroach/pkg/util/fsm/fsm.go:136 +0x165
[05:54:26]
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).txnStateTransitionsApplyWrapper()
[05:54:26]
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1837 +0x1f5
[05:54:26]
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
[05:54:26]
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1204 +0x15c9
[05:54:26]
  github.com/cockroachdb/cockroach/pkg/sql.(*internalExecutorImpl).initConnEx.func1()
[05:54:26]
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:210 +0x97
[05:54:26]
[05:54:26]
Previous read at 0x00c420abc0a8 by goroutine 87:
[05:54:26]
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).serialize()
[05:54:26]
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1945 +0x137
[05:54:26]
  github.com/cockroachdb/cockroach/pkg/sql.(*SessionRegistry).SerializeAll()
[05:54:26]
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/exec_util.go:849 +0x263
[05:54:26]
  github.com/cockroachdb/cockroach/pkg/server.(*statusServer).ListLocalSessions()
[05:54:26]
      /go/src/github.com/cockroachdb/cockroach/pkg/server/status.go:1288 +0xfb
[05:54:26]
  github.com/cockroachdb/cockroach/pkg/server/serverpb._Status_ListLocalSessions_Handler.func1()
[05:54:26]
      /go/src/github.com/cockroachdb/cockroach/pkg/server/serverpb/status.pb.go:1559 +0xa1
[05:54:26]
  github.com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc.OpenTracingServerInterceptor.func1()
[05:54:26]
      /go/src/github.com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc/server.go:48 +0xc23
[05:54:26]
  github.com/cockroachdb/cockroach/pkg/rpc.NewServerWithInterceptor.func1()
[05:54:26]
      /go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:168 +0x10c
[05:54:26]
  github.com/cockroachdb/cockroach/pkg/server/serverpb._Status_ListLocalSessions_Handler()
[05:54:26]
      /go/src/github.com/cockroachdb/cockroach/pkg/server/serverpb/status.pb.go:1561 +0x1e3
[05:54:26]
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).processUnaryRPC()
[05:54:26]
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1011 +0xde7
[05:54:26]
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).handleStream()
[05:54:26]
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1249 +0x1364
[05:54:26]
  github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1()
[05:54:26]
      /go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:680 +0xac
==================
WARNING: DATA RACE
Write at 0x00c42101a268 by goroutine 81:
  github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).commit()
      /go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go:992 +0x1be
  github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).Commit()
      /go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:573 +0x4c
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).commitSQLTransaction()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:586 +0xb6
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).handleAutoCommit()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1345 +0x140
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState.func4()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:194 +0x1ba
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:435 +0xe43
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:95 +0x457
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1080 +0x349d
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1080 +0x349d
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).txnStateTransitionsApplyWrapper()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1881 +0xfc9
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1206 +0x15df
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1080 +0x349d
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).txnStateTransitionsApplyWrapper()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1881 +0xfc9
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1206 +0x15df
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1080 +0x349d
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1206 +0x15df
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1080 +0x349d
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1206 +0x15df
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1080 +0x349d
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1080 +0x349d
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1206 +0x15df
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1080 +0x349d
  github.com/cockroachdb/cockroach/pkg/sql.(*schemaChangerCollection).execSchemaChanges()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/exec_util.go:928 +0x277
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).txnStateTransitionsApplyWrapper()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1881 +0xfc9
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1206 +0x15df
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1206 +0x15df
  github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:417 +0x1e0
  github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).serveImpl.func3()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:268 +0x191

Previous read at 0x00c42101a268 by goroutine 102:
  [failed to restore the stack]

Goroutine 81 (running) created at:
  github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).serveImpl()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:267 +0x12d1
  github.com/cockroachdb/cockroach/pkg/sql/pgwire.serveConn()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:163 +0x240
  github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*Server).ServeConn()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/server.go:453 +0x9b2
  github.com/cockroachdb/cockroach/pkg/server.(*Server).Start.func19.1()
      /go/src/github.com/cockroachdb/cockroach/pkg/server/server.go:1631 +0x1b5
  github.com/cockroachdb/cockroach/pkg/util/netutil.(*Server).ServeWith.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/util/netutil/net.go:141 +0xd9

Goroutine 102 (running) created at:
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask()
      /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:319 +0x14c
  github.com/cockroachdb/cockroach/pkg/kv.GRPCTransportFactory()
      /go/src/github.com/cockroachdb/cockroach/pkg/kv/transport_race.go:107 +0x2a8
  github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendToReplicas()
      /go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:1298 +0x117
  github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendRPC()
      /go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:387 +0x2cb
  github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendSingleRange()
      /go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:457 +0x1ef
  github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendPartialBatch()
      /go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:1088 +0x3da
  github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).divideAndSendBatchToRanges()
      /go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:759 +0x17d4
  github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).Send()
      /go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:671 +0x683
  github.com/cockroachdb/cockroach/pkg/internal/client.lookupRangeFwdScan()
      /go/src/github.com/cockroachdb/cockroach/pkg/internal/client/range_lookup.go:319 +0x51e
  github.com/cockroachdb/cockroach/pkg/internal/client.RangeLookup()
      /go/src/github.com/cockroachdb/cockroach/pkg/internal/client/range_lookup.go:202 +0xa43
  github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).RangeLookup()
      /go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:318 +0xba
  github.com/cockroachdb/cockroach/pkg/kv.(*RangeDescriptorCache).performRangeLookup()
      /go/src/github.com/cockroachdb/cockroach/pkg/kv/range_cache.go:407 +0x3d7
  github.com/cockroachdb/cockroach/pkg/kv.(*RangeDescriptorCache).lookupRangeDescriptorInternal.func3()
      /go/src/github.com/cockroachdb/cockroach/pkg/kv/range_cache.go:286 +0x167
  github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight.(*Group).doCall()
      /go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:118 +0x4c
==================

@tbg tbg assigned nvanbenschoten and unassigned danhhz Oct 2, 2018
@tbg tbg added the A-coreperf label Oct 2, 2018
@tbg
Copy link
Member

tbg commented Oct 2, 2018

@nvanbenschoten this looks like you or @andreimatei might be able to put an end to this

@nvanbenschoten
Copy link
Member

Looks like a race between connExecutor's use of its txnState's txn when accessed through (*statusServer).ListLocalSessions() and its use of its txnState's txn in finishSQLTxn. The former holds a mutex, the later doesn't.

I'll leave this to @andreimatei since he's been working in this area recently while touching tracing.

@nvanbenschoten nvanbenschoten added A-kv-client Relating to the KV client and the KV interface. and removed A-coreperf labels Oct 2, 2018
@nvanbenschoten nvanbenschoten added A-sql-executor SQL txn logic and removed A-kv-client Relating to the KV client and the KV interface. labels Oct 2, 2018
@andreimatei andreimatei modified the milestones: 2.1, 2.2 Oct 4, 2018
@tbg tbg added this to Incoming in KV via automation Oct 30, 2018
@tbg
Copy link
Member

tbg commented Oct 30, 2018

@andreimatei just checking that you're aware that this ended up on your stack.

@andreimatei
Copy link
Contributor

ack

@andreimatei
Copy link
Contributor

@jordan, passing on to you :)
Trying to shed some failures from my plate.

@tbg tbg added this to Triage in (DEPRECATED) SQL execution via automation Dec 11, 2018
@tbg tbg removed this from Incoming in KV Dec 11, 2018
@tbg
Copy link
Member

tbg commented Dec 11, 2018

@andreimatei 👍 ps if you throw something into another team, do me a favor and also add their project (that's how we are keeping track of ownership, FSVO "we"), thanks!

@jordanlewis
Copy link
Member

Could this have been fixed by #33249? cc @petermattis. I am not seeing it reproduce locally with stressrace but runs don't come very quickly.

@petermattis
Copy link
Collaborator

The first race is exactly what was fixed by #33249. I can't tell about the second race. I'll take this over and verify that #33249 fixed the problem.

@petermattis
Copy link
Collaborator

Heh, I see that the diagnosis of the problem here was exactly what I fixed in #33249. And that diagnosis was done almost 2 months ago. I'm roachprod-stressracing this test on master right now (actually, TestChangefeedSchemaChangeAllowBackfill because the test has since been renamed).

Note that TestChangefeedSchemaChangeAllowBackfill has its own issue. See #33166. I might end up fixing that issue too.

@petermattis
Copy link
Collaborator

20m of roachprod-stressrace without a failure:

~ make roachprod-stressrace PKG=./ccl/changefeedccl/ TESTS=TestChangefeedSchemaChangeAllowBackfill CLUSTER=peter-stress
...
1549 runs so far, 0 failures, over 20m10s

I'm going to see if I can reproduce a failure here before #33249 went in.

@petermattis
Copy link
Collaborator

Something I'm confused about here is that I don't see where ListLocalSessions is being called by this test.

@petermattis
Copy link
Collaborator

Something is calling SHOW CLUSTER QUERIES in this test. Not sure who yet.

@petermattis
Copy link
Collaborator

petermattis commented Dec 19, 2018

Specifically, SELECT query_id FROM [SHOW CLUSTER QUERIES] WHERE (query LIKE 'CREATE CHANGEFEED%') AND (start > $1). I guess this is expected, then.

Update: Ah, sinklessFeedFactory.Feed executes this query. Now the race above makes sense. I'm confident #33249 fixes the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-executor SQL txn logic C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Projects
No open projects
Development

No branches or pull requests

7 participants