
Version mismatch errors during cluster upgrade #20003

@gpaul

Is this a question, feature request, or bug report?

BUG REPORT

  1. Please supply the header (i.e. the first few lines) of your most recent
    log file for each node in your cluster. On most unix-based systems
    running with defaults, this boils down to the output of

    grep -F '[config]' cockroach-data/logs/cockroach.log

    When log files are not available, supply the output of cockroach version
    and all flags/environment variables passed to cockroach start instead.
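
Where grep is not at hand, or the collection needs to be scripted, the same fixed-string filter can be done in Python. This is a minimal sketch that assumes the default log path named above. config_lines and config_header are hypothetical helper names. As with grep -F, the brackets in '[config]' are matched literally.

```python
from pathlib import Path
from typing import Iterable, List


def config_lines(lines: Iterable[str]) -> List[str]:
    """Keep lines containing the literal text '[config]', like grep -F:
    the brackets are plain characters here, not a regex character class."""
    return [line.rstrip("\n") for line in lines if "[config]" in line]


def config_header(log_path: str) -> List[str]:
    """Read a log file and return its '[config]' header lines."""
    # errors="replace" keeps a partially corrupted log from aborting the read.
    with Path(log_path).open(encoding="utf-8", errors="replace") as f:
        return config_lines(f)
```

For example, config_header("cockroach-data/logs/cockroach.log") returns the same lines the grep command prints, ready to paste into the report.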

/opt/mesosphere/active/cockroach/bin/cockroach start --logtostderr --cache=100MiB --store=/var/lib/dcos/cockroach --certs-dir=/run/dcos/pki/cockroach --advertise-host=172.17.0.3 --host=172.17.0.3 --port=26257 --http-host=127.0.0.1 --http-port=8090 --log-dir= --join=172.17.0.3,172.17.0.2,172.17.0.4
I171113 09:29:47.659301 1 cli/start.go:785  CockroachDB CCL v1.1.2 (linux amd64, built 2017/11/07 08:40:54, go1.9.2)
I171113 09:29:47.760595 1 server/config.go:311  available memory from cgroups (8.0 EiB) exceeds system memory 31 GiB, using system memory
I171113 09:29:47.760664 1 server/config.go:425  system total memory: 31 GiB
I171113 09:29:47.760779 1 server/config.go:427  server configuration:
max offset                500000000
cache size                100 MiB
SQL memory pool size      128 MiB
scan interval             10m0s
scan max idle time        200ms
metrics sample interval   10s
event log enabled         true
linearizable              false
I171113 09:29:47.760996 12 cli/start.go:503  starting cockroach node
I171113 09:29:47.761051 12 cli/start.go:505  using local environment variables: COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING=true
W171113 09:29:47.761994 12 security/certificate_loader.go:252  error finding key for /run/dcos/pki/cockroach/client.root.crt: could not read key file /run/dcos/pki/cockroach/client.root.key: open /run/dcos/pki/cockroach/client.root.key: permission denied
I171113 09:29:47.768751 12 storage/engine/rocksdb.go:405  opening rocksdb instance at "/var/lib/dcos/cockroach/local"
W171113 09:29:47.804815 12 gossip/gossip.go:1241  [n?] no incoming or outgoing connections
I171113 09:29:47.805159 12 storage/engine/rocksdb.go:405  opening rocksdb instance at "/var/lib/dcos/cockroach"
I171113 09:29:47.817498 24 gossip/client.go:129  [n?] started gossip client to 172.17.0.2:26257
I171113 09:29:48.140959 12 server/config.go:527  [n?] 1 storage engine initialized
I171113 09:29:48.141005 12 server/config.go:530  [n?] RocksDB cache size: 100 MiB
I171113 09:29:48.141027 12 server/config.go:530  [n?] store 0: RocksDB, max size 0 B, max open file limit 11384
I171113 09:29:48.142335 12 server/server.go:819  [n?] sleeping for 161.979362ms to guarantee HLC monotonicity
I171113 09:29:48.324021 12 server/node.go:461  [n1] initialized store [n1,s1]: disk (capacity=788 GiB, available=203 GiB, used=2.4 MiB, logicalBytes=20 MiB), ranges=23, leases=0, writes=0.00, bytesPerReplica={p10=0.00 p25=0.00 p50=1987.00 p75=23656.00 p90=51983.00}, writesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=0.00 p90=0.00}
I171113 09:29:48.324116 12 server/node.go:326  [n1] node ID 1 initialized
I171113 09:29:48.324432 12 gossip/gossip.go:327  [n1] NodeDescriptor set to node_id:1 address:<network_field:"tcp" address_field:"172.17.0.3:26257" > attrs:<> locality:<> ServerVersion:<major_val:1 minor_val:1 patch:0 unstable:0 >
I171113 09:29:48.324706 12 storage/stores.go:303  [n1] read 2 node addresses from persistent storage
I171113 09:29:48.325142 12 server/node.go:606  [n1] connecting to gossip network to verify cluster ID...
I171113 09:29:48.325222 12 server/node.go:631  [n1] node connected via gossip and verified as part of cluster "3dfb4c5c-dd06-4ea7-96a2-5c670fe2514b"
I171113 09:29:48.325334 12 server/node.go:403  [n1] node=1: started with [<no-attributes>=/var/lib/dcos/cockroach] engine(s) and attributes []
I171113 09:29:48.325596 12 sql/executor.go:408  [n1] creating distSQLPlanner with address {tcp 172.17.0.3:26257}
I171113 09:29:48.336739 12 server/server.go:946  [n1] starting https server at 127.0.0.1:8090
I171113 09:29:48.336782 12 server/server.go:947  [n1] starting grpc/postgres server at 172.17.0.3:26257
I171113 09:29:48.336809 12 server/server.go:948  [n1] advertising CockroachDB node at 172.17.0.3:26257
I171113 09:29:48.351394 12 server/server.go:1090  [n1] done ensuring all necessary migrations have run
I171113 09:29:48.351424 12 server/server.go:1092  [n1] serving sql connections
I171113 09:29:48.351528 12 cli/start.go:582  node startup completed:
CockroachDB node starting at 2017-11-13 09:29:48.351458 +0000 UTC (took 0.7s)
build:      CCL v1.1.2 @ 2017/11/07 08:40:54 (go1.9.2)
admin:      https://127.0.0.1:8090
sql:        postgresql://root@172.17.0.3:26257?application_name=cockroach&sslmode=verify-full&sslrootcert=%2Frun%2Fdcos%2Fpki%2Fcockroach%2Fca.crt
logs:
store[0]:   path=/var/lib/dcos/cockroach
status:     restarted pre-existing node
clusterID:  3dfb4c5c-dd06-4ea7-96a2-5c670fe2514b
nodeID:     1
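
Each log line above appears to follow one fixed layout. It begins with a severity letter (I for info, W for warning) joined to a YYMMDD date. After that come a microsecond timestamp, what looks like a goroutine ID, the source file:line, and finally the message. Lines such as "cache size 100 MiB" continue the previous entry. The following is a minimal Python sketch for pulling the version and the warnings out of such a header. It assumes that layout, which is inferred from this excerpt and not taken from a format specification. It also assumes E and F as the remaining severity letters. LogEntry, parse_line, find_version and warning_entries are hypothetical helpers.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional

# Assumed line layout, inferred from the excerpt above (not from a spec):
#   <severity><YYMMDD> <HH:MM:SS.micro> <goroutine> <file>:<line>  <message>
LINE_RE = re.compile(
    r"^([IWEF])(\d{6}) (\d{2}:\d{2}:\d{2}\.\d{6}) (\d+) ([^\s:]+):(\d+)\s+(.*)$"
)
# Matches e.g. "CockroachDB CCL v1.1.2" in the first startup line.
VERSION_RE = re.compile(r"CockroachDB \w+ (v\d+\.\d+\.\d+)")


@dataclass
class LogEntry:
    severity: str  # assumed to be one of "I", "W", "E", "F"
    timestamp: datetime
    goroutine: int
    file: str
    line: int
    message: str


def parse_line(text: str) -> Optional[LogEntry]:
    """Parse one log line; return None for lines that do not start an entry,
    such as the command line or the 'server configuration' continuation lines."""
    m = LINE_RE.match(text)
    if m is None:
        return None
    sev, day, clock, gid, path, lineno, msg = m.groups()
    ts = datetime.strptime(f"{day} {clock}", "%y%m%d %H:%M:%S.%f")
    return LogEntry(sev, ts, int(gid), path, int(lineno), msg)


def find_version(lines: Iterable[str]) -> Optional[str]:
    """Return the first CockroachDB version string found in any parsed entry."""
    for text in lines:
        entry = parse_line(text)
        if entry is not None:
            m = VERSION_RE.search(entry.message)
            if m:
                return m.group(1)
    return None


def warning_entries(lines: Iterable[str]) -> List[LogEntry]:
    """Return every parsed entry whose severity is W (warning)."""
    entries = (parse_line(text) for text in lines)
    return [e for e in entries if e is not None and e.severity == "W"]
```

Applied to the excerpt above, find_version yields "v1.1.2". warning_entries picks out the client.root.key permission-denied warning and the gossip "no incoming or outgoing connections" warning.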

  2. Please describe the issue you observed:
  • What did you do?
    A cluster of 3 CockroachDB nodes running v1.0.6 was working correctly.
    Two of the nodes (node 1 and node 2) were upgraded to v1.1.2 one at a time by stopping cockroach, replacing the binary, and starting it again with the same command line.

  • What did you expect to see?
    Queries should keep working on node 3, which is still running v1.0.6.

  • What did you see instead?
    Queries on node 3 begin to fail, reporting:

sqlalchemy.exc.InternalError: (psycopg2.InternalError) version mismatch in flow request: 3; this node accepts 6 through 6
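
Read literally, the error says that the receiving node accepts flow-request versions only within an inclusive range, here 6 through 6, and refuses anything outside it. A version-3 request is therefore rejected. The following is a minimal Python sketch of that range check, and it reproduces the reported message. It makes one assumption that this report does not confirm: the version-3 request was sent on behalf of node 3 (v1.0.6), and the 6-through-6 range belongs to an upgraded v1.1.2 node. FlowServer, check and VersionMismatchError are hypothetical names, not CockroachDB's API.

```python
from dataclasses import dataclass


class VersionMismatchError(Exception):
    """Raised when a flow request's version is outside the accepted range."""


@dataclass(frozen=True)
class FlowServer:
    """Hypothetical model of a node's accepted flow-request version range."""

    min_accepted: int
    version: int

    def check(self, request_version: int) -> None:
        # Accept only versions in the inclusive range [min_accepted, version].
        if not (self.min_accepted <= request_version <= self.version):
            raise VersionMismatchError(
                f"version mismatch in flow request: {request_version}; "
                f"this node accepts {self.min_accepted} through {self.version}"
            )


# Numbers taken from the reported error; which node sent and which node
# rejected the request is an assumption (see the lead-in).
upgraded_node = FlowServer(min_accepted=6, version=6)
try:
    upgraded_node.check(3)
except VersionMismatchError as err:
    print(err)  # version mismatch in flow request: 3; this node accepts 6 through 6
```

Under this model, a distributed query in a mixed-version cluster succeeds only if every participating node's range covers the version that the planning node sends. The version-3 request reported here does not meet that condition.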
