
stability: cyan crashing after on disk/memory state diverged #10723

Closed

mberhault opened this issue Nov 16, 2016 · 15 comments

mberhault commented Nov 16, 2016

cyan nodes started crashing after an upgrade to 981b8aa at 20161116-04:00 UTC.
The previous build on cyan was 85f80d9 at 20161115-23:00 UTC.

Nodes crashed within a minute with:

F161116 04:00:56.433545 107 storage/replica.go:1113  [n1,s1,r3974/4:/Table/51/1/908{59106…-81657…}] on-disk and in-memory state diverged:
[Lease.StartStasis.WallTime: 1479268833127627548 != 1479268840607062711 Lease.Expiration.WallTime: 1479268833377627548 != 1479268840857062711 Lease.ProposedTS.WallTime: 1479268824127630948 != 1479268831607066111]
@mberhault

Details for all nodes, showing the fatal line (which includes the affected key) and each startup line with its sha.

0: cockroach@104.209.248.59
I161115 23:00:30.028362 1 cli/start.go:299  CockroachDB beta-20161110-153-g85f80d9 (linux amd64, built 2016/11/15 22:23:17, go1.7.3)
I161116 04:00:39.923882 1 cli/start.go:299  CockroachDB beta-20161110-161-g981b8aa (linux amd64, built 2016/11/16 03:46:45, go1.7.3)
F161116 04:00:56.433545 107 storage/replica.go:1113  [n1,s1,r3974/4:/Table/51/1/908{59106…-81657…}] on-disk and in-memory state diverged:
I161116 05:00:34.442329 1 cli/start.go:299  CockroachDB beta-20161110-163-g9d9b348 (linux amd64, built 2016/11/16 04:06:25, go1.7.3)
F161116 05:00:43.564244 122 storage/replica.go:1113  [n1,s1,r2607/1:/System/tsd/cr.store.queue.re…] on-disk and in-memory state diverged:
I161116 06:00:42.026812 1 cli/start.go:299  CockroachDB beta-20161110-167-g183f7c0 (linux amd64, built 2016/11/16 05:02:06, go1.7.3)
F161116 06:00:51.223745 103 storage/replica.go:1113  [n1,s1,r1100/4:/System/tsd/cr.store.mutex.…] on-disk and in-memory state diverged:
I161116 08:47:25.673727 1 cli/start.go:299  CockroachDB beta-20161110-169-gdd6e8e1 (linux amd64, built 2016/11/16 06:59:08, go1.7.3)
F161116 08:47:37.384188 105 storage/replica.go:1113  [n1,s1,r1231/3:/System/tsd/cr.node.sql.…] on-disk and in-memory state diverged:

1: cockroach@40.79.32.211
I161115 23:00:32.890924 1 cli/start.go:299  CockroachDB beta-20161110-153-g85f80d9 (linux amd64, built 2016/11/15 22:23:17, go1.7.3)
I161116 04:00:42.736691 1 cli/start.go:299  CockroachDB beta-20161110-161-g981b8aa (linux amd64, built 2016/11/16 03:46:45, go1.7.3)
F161116 04:00:54.302041 99 storage/replica.go:1113  [n2,s2,r1261/4:/System/tsd/cr.{node.…-store…}] on-disk and in-memory state diverged:
I161116 05:00:37.073277 1 cli/start.go:299  CockroachDB beta-20161110-163-g9d9b348 (linux amd64, built 2016/11/16 04:06:25, go1.7.3)
F161116 05:00:46.413715 99 storage/replica.go:1113  [n2,s2,r1100/7:/System/tsd/cr.store.mutex.…] on-disk and in-memory state diverged:
I161116 06:00:44.684708 1 cli/start.go:299  CockroachDB beta-20161110-167-g183f7c0 (linux amd64, built 2016/11/16 05:02:06, go1.7.3)
F161116 06:00:54.150128 93 storage/replica.go:1113  [n2,s2,r2432/8:/System/tsd/cr.store.mutex.re…] on-disk and in-memory state diverged:
I161116 08:47:28.699418 1 cli/start.go:299  CockroachDB beta-20161110-169-gdd6e8e1 (linux amd64, built 2016/11/16 06:59:08, go1.7.3)
F161116 08:47:37.781689 119 storage/replica.go:1113  [n2,s2,r4721/2:/System/tsd/cr.store.queue.gc…] on-disk and in-memory state diverged:

2: cockroach@104.209.249.37
I161115 23:00:35.531929 1 cli/start.go:299  CockroachDB beta-20161110-153-g85f80d9 (linux amd64, built 2016/11/15 22:23:17, go1.7.3)
I161116 04:00:45.651326 1 cli/start.go:299  CockroachDB beta-20161110-161-g981b8aa (linux amd64, built 2016/11/16 03:46:45, go1.7.3)
F161116 04:00:56.426405 95 storage/replica.go:1113  [n3,s3,r3974/1:/Table/51/1/908{59106…-81657…}] on-disk and in-memory state diverged:
I161116 05:00:39.730120 1 cli/start.go:299  CockroachDB beta-20161110-163-g9d9b348 (linux amd64, built 2016/11/16 04:06:25, go1.7.3)
F161116 05:00:46.412893 116 storage/replica.go:1113  [n3,s3,r1100/9:/System/tsd/cr.store.mutex.…] on-disk and in-memory state diverged:
I161116 06:00:47.603702 1 cli/start.go:299  CockroachDB beta-20161110-167-g183f7c0 (linux amd64, built 2016/11/16 05:02:06, go1.7.3)
F161116 06:50:12.954826 115 storage/replica.go:1113  [n3,s3,r1507/1:/Table/51/1/262{47379…-70021…}] on-disk and in-memory state diverged:
I161116 08:47:31.588655 1 cli/start.go:299  CockroachDB beta-20161110-169-gdd6e8e1 (linux amd64, built 2016/11/16 06:59:08, go1.7.3)
F161116 09:16:23.028046 72 storage/replica.go:1113  [n3,s3,r1297/5:/Table/51/1/674{60066…-82634…}] on-disk and in-memory state diverged:

3: cockroach@104.46.104.189
I161116 04:00:48.458129 1 cli/start.go:299  CockroachDB beta-20161110-161-g981b8aa (linux amd64, built 2016/11/16 03:46:45, go1.7.3)
I161116 05:00:42.757135 1 cli/start.go:299  CockroachDB beta-20161110-163-g9d9b348 (linux amd64, built 2016/11/16 04:06:25, go1.7.3)
I161116 06:00:50.532790 1 cli/start.go:299  CockroachDB beta-20161110-167-g183f7c0 (linux amd64, built 2016/11/16 05:02:06, go1.7.3)
F161116 06:46:15.539709 120 storage/replica.go:1113  [n4,s4,r1645/5:/Table/51/1/195{54823…-77379…}] on-disk and in-memory state diverged:
I161116 08:47:34.567746 1 cli/start.go:299  CockroachDB beta-20161110-169-gdd6e8e1 (linux amd64, built 2016/11/16 06:59:08, go1.7.3)

4: cockroach@104.46.105.52
I161115 23:00:41.543618 1 cli/start.go:299  CockroachDB beta-20161110-153-g85f80d9 (linux amd64, built 2016/11/15 22:23:17, go1.7.3)
I161116 04:00:51.533333 1 cli/start.go:299  CockroachDB beta-20161110-161-g981b8aa (linux amd64, built 2016/11/16 03:46:45, go1.7.3)
F161116 04:01:09.183781 104 storage/replica.go:1113  [n5,s5,r957/12:/Table/51/1/489{21672…-44222…}] on-disk and in-memory state diverged:
I161116 05:00:45.313537 1 cli/start.go:299  CockroachDB beta-20161110-163-g9d9b348 (linux amd64, built 2016/11/16 04:06:25, go1.7.3)
I161116 06:00:52.888546 1 cli/start.go:299  CockroachDB beta-20161110-167-g183f7c0 (linux amd64, built 2016/11/16 05:02:06, go1.7.3)
I161116 08:47:37.489834 1 cli/start.go:299  CockroachDB beta-20161110-169-gdd6e8e1 (linux amd64, built 2016/11/16 06:59:08, go1.7.3)
F161116 09:16:23.026196 28 storage/replica.go:1113  [n5,s5,r1297/7:/Table/51/1/674{60066…-82634…}] on-disk and in-memory state diverged:

5: cockroach@104.46.103.35
I161115 23:00:44.834157 1 cli/start.go:299  CockroachDB beta-20161110-153-g85f80d9 (linux amd64, built 2016/11/15 22:23:17, go1.7.3)
I161116 04:00:54.162996 1 cli/start.go:299  CockroachDB beta-20161110-161-g981b8aa (linux amd64, built 2016/11/16 03:46:45, go1.7.3)
F161116 04:01:09.183090 123 storage/replica.go:1113  [n6,s6,r957/10:/Table/51/1/489{21672…-44222…}] on-disk and in-memory state diverged:
I161116 05:00:48.106295 1 cli/start.go:299  CockroachDB beta-20161110-163-g9d9b348 (linux amd64, built 2016/11/16 04:06:25, go1.7.3)
I161116 06:00:55.765135 1 cli/start.go:299  CockroachDB beta-20161110-167-g183f7c0 (linux amd64, built 2016/11/16 05:02:06, go1.7.3)
F161116 06:46:15.537996 88 storage/replica.go:1113  [n6,s6,r1645/7:/Table/51/1/195{54823…-77379…}] on-disk and in-memory state diverged:
I161116 08:47:40.533880 1 cli/start.go:299  CockroachDB beta-20161110-169-gdd6e8e1 (linux amd64, built 2016/11/16 06:59:08, go1.7.3)


mberhault commented Nov 16, 2016

btw, I have paused continuous deployment while we look at this. We already went through three upgrades since the problem started manifesting.

@mberhault

silenced alerts on cyan for 6h.

@bdarnell

Here's one of the failures on cockroach@104.46.103.35:

F161116 06:46:15.537996 88 storage/replica.go:1113  [n6,s6,r1645/7:/Table/51/1/195{54823…-77379…}] on-disk and in-memory state diverged:
[Lease.StartStasis.WallTime: 1479268833430905692 != 1479268840705036366 Lease.Expiration.WallTime: 1479268833680905692 != 1479268840955036366 Lease.ProposedTS.WallTime: 1479268824430909492 != 1479268831705040066]

Here are some entries from the raft log (ignoring all the "regular" commands and just focusing on leader and lease stuff):

/Local/RangeID/1645/u/RaftLog/logIndex:434494: Term:246 Index:434494  by {1 1 4}
RequestLease [/Table/51/1/1955482351830207526/"758f09b7-cc55-4ab8-ba64-f0222a000862"/1011373,/Min)
origin_replica:<node_id:1 store_id:1 replica_id:4 > max_lease_index:363825 cmd:<header:<timestamp:<wall_time:1479268824431031695 logical:0 > replica:<node_id:0 store_id:0 replica_id:0 > range_id:1645 user_priority:NORMAL read_consistency:CONSISTENT max_span_request_keys:0 distinct_spans:false return_range_info:false > requests:<request_lease:<header:<key:"\273\211\375\033#D\326\241b\214&\022758f09b7-cc55-4ab8-ba64-f0222a000862\000\001\370\017n\255" > lease:<start:<wall_time:1479268824430905692 logical:0 > start_stasis:<wall_time:1479268833430905692 logical:0 > expiration:<wall_time:1479268833680905692 logical:0 > replica:<node_id:1 store_id:1 replica_id:4 > proposed_ts:<wall_time:1479268824430909492 logical:0 > > > > > write_batch:<> 

/Local/RangeID/1645/u/RaftLog/logIndex:434502: Term:246 Index:434502  by {1 1 4}
RequestLease [/Table/51/1/1955482351830207526/"758f09b7-cc55-4ab8-ba64-f0222a000862"/1011373,/Min)
origin_replica:<node_id:1 store_id:1 replica_id:4 > max_lease_index:363832 cmd:<header:<timestamp:<wall_time:1479268831705130168 logical:0 > replica:<node_id:0 store_id:0 replica_id:0 > range_id:1645 user_priority:NORMAL read_consistency:CONSISTENT max_span_request_keys:0 distinct_spans:false return_range_info:false > requests:<request_lease:<header:<key:"\273\211\375\033#D\326\241b\214&\022758f09b7-cc55-4ab8-ba64-f0222a000862\000\001\370\017n\255" > lease:<start:<wall_time:1479268831705036366 logical:0 > start_stasis:<wall_time:1479268840705036366 logical:0 > expiration:<wall_time:1479268840955036366 logical:0 > replica:<node_id:1 store_id:1 replica_id:4 > proposed_ts:<wall_time:1479268831705040066 logical:0 > > > > > write_batch:<> 

/Local/RangeID/1645/u/RaftLog/logIndex:434503: Term:248 Index:434503 : EMPTY

/Local/RangeID/1645/u/RaftLog/logIndex:434504: Term:248 Index:434504  by {6 6 7}
RequestLease [/Table/51/1/1955482351830207526/"758f09b7-cc55-4ab8-ba64-f0222a000862"/1011373,/Min)
origin_replica:<node_id:6 store_id:6 replica_id:7 > max_lease_index:363831 cmd:<header:<timestamp:<wall_time:1479278775531014937 logical:0 > replica:<node_id:0 store_id:0 replica_id:0 > range_id:1645 user_priority:NORMAL read_consistency:CONSISTENT max_span_request_keys:0 distinct_spans:false return_range_info:false > requests:<request_lease:<header:<key:"\273\211\375\033#D\326\241b\214&\022758f09b7-cc55-4ab8-ba64-f0222a000862\000\001\370\017n\255" > lease:<start:<wall_time:1479278775530902837 logical:0 > start_stasis:<wall_time:1479278784530902837 logical:0 > expiration:<wall_time:1479278784780902837 logical:0 > replica:<node_id:6 store_id:6 replica_id:7 > proposed_ts:<wall_time:1479278775530973137 logical:0 > > > > > 

In the assertion, the "on-disk" values are from the lease at index 434494, and the "in-memory" values are from the lease at index 434502 (434504 is the last entry in the log at this point, and has apparently not been processed yet). Note that this is an extension; the start time and replica descriptor are the same.

The proposed timestamps on the leases translate to Wed Nov 16 04:00:31 2016, just before this node was restarted with 981b8aa. So it looks like these leases were proposed just before the restart, and the latter one was applied after it. This suggests that #10681 handles commands that were proposed by the earlier version (which called evaluateProposalTwice) differently than before.

I think a brand-new cluster would be fine, as would a cluster upgrading from beta-20161103, so this would only be an issue for clusters upgrading from 1110 or unreleased master builds. I'm still re-reviewing #10681 to figure out what might have changed, though.

@bdarnell

I think I see it: This line before my PR sets a non-nil but empty WriteBatch, but this line after my PR prefers the WriteBatch that was serialized into the command over the one we just generated if it is non-nil.

So there should be a quick fix: just change the tag numbers, since we're not using these fields for anything real yet. My upcoming cleanup PR to break up the ProposalData struct should make this logic less fragile.

bdarnell added a commit to bdarnell/cockroach that referenced this issue Nov 16, 2016
Older versions were writing non-nil but empty values into these fields,
leading to problems on upgrade. Since there is no non-experimental use
of these fields yet, renumber them to discard all old data.

Fixes cockroachdb#10723
@mberhault

15 minutes and everything looks good.
Removed alert silence and re-enabled continuous deployment.

@mberhault mberhault reopened this Nov 16, 2016
@mberhault

Cyan node 104.209.249.37 died with inconsistency failure:

E161116 14:28:16.155816 986799 storage/replica_command.go:1887  [n3,s3,r1231/13:/System/tsd/cr.node.sql.…]
replica {1 1 3} is inconsistent: expected checksum b6e61649aa7fb428f62e220ecc6949e94c1673bb04e3f0bf5b72c8fcb4f43855ac891ee8054f38c5d6b709bb007119bec0100f24e1232be39ce183863a657864, got b6b211a82b2a7a1a48fe0f77a32221a309c3d944d63fde97f374b8a8ce6ecddb8283c827f0c89692f780bfe1fac3cca7cb514a2d1305f6ad52f250a001871251
--- leaseholder
+++ follower
-0.000000000,0 /Local/RangeID/1231/r/RangeStats
-  ts:<zero>
-  value:^R^D^H^@^P^@^X^@ ^@(^@2|ТÚ'^C Ê7°Gj<8d><87>^T^Q^@^@^@^@^@^@^@^@^Y^@^@^@^@^@^@^@^@!9°>^B^@^@^@^@)ù^W^@^@^@^@^@^@1ìE^C^@^@^@^@^@9ù^W^@^@^@^@^@^@AMj;^B^@^@^@^@Iù^W^@^@^@^@^@^@Q^@^@^@^@^@^@^@^@Y^@^@^@^@^@^@^@^@aÂ^U^@^@^@^@^@^@i ^@^@^@^@^@^@^@p^A
-  raw_key:/Local/RangeID/1231/r/RangeStats raw_value:120408001000180020002800327cd0a2da270309ca37b0476a8d87141100000000000000001900000000000000002139b03e020000000029f91700000000000031ec4503000000000039f917000000000000414d6a3b020000000049f91700000000000051000000000000000059000000000000000061c2150000000000006920000000000000007001
+0.000000000,0 /Local/RangeID/1231/r/RangeStats
+  ts:<zero>
+  value:^R^D^H^@^P^@^X^@ ^@(^@2|%`¿^U^C        Ê7°Gj<8d><87>^T^Q^@^@^@^@^@^@^@^@^Y^@^@^@^@^@^@^@^@!¹«>^B^@^@^@^@)ù^W^@^@^@^@^@^@1ìE^C^@^@^@^@^@9ù^W^@^@^@^@^@^@AÍe;^B^@^@^@^@Iù^W^@^@^@^@^@^@Q^@^@^@^@^@^@^@^@Y^@^@^@^@^@^@^@^@aÂ^U^@^@^@^@^@^@i ^@^@^@^@^@^@^@p^A
+  raw_key:/Local/RangeID/1231/r/RangeStats raw_value:120408001000180020002800327c2560bf150309ca37b0476a8d871411000000000000000019000000000000000021b9ab3e020000000029f91700000000000031ec4503000000000039f91700000000000041cd653b020000000049f91700000000000051000000000000000059000000000000000061c2150000000000006920000000000000007001
-0.000000000,0 /System/tsd/cr.node.sql.bytesin/3/10s/2016-11-16T04:00:00Z
-  ts:<zero>
-  value:2Q^@^@^@^@d^H<80><80>ƺ­äÚÃ^T^P<80>ȯ %^Z^M^H^@0^A9^@^@<80>DÌ^RâA^Z^M^H^A0^A9^@^@ X<92>^UâA^Z^M^H^B0^A9^@^@<80>^Xf^XâA^Z^M^H^C0^A9^@^@ n|^ZâA:^L^H<92>ÛôȾäÚÃ^T^P^@
-  raw_key:/System/tsd/cr.node.sql.bytesin/3/10s/2016-11-16T04:00:00Z raw_value:32510000000064088080c6baade4dac3141080c8afa0251a0d080030013900008044cc12e2411a0d08013001390000a0589215e2411a0d0802300139000080186618e2411a0d08033001390000a06e7c1ae2413a0c0892dbf4c8bee4dac3141000
+0.000000000,0 /System/tsd/cr.node.sql.bytesin/3/10s/2016-11-16T04:00:00Z
+  ts:<zero>
+  value:2B^@^@^@^@d^H<80><80>ƺ­äÚÃ^T^P<80>ȯ %^Z^M^H^@0^A9^@^@<80>DÌ^RâA^Z^M^H^A0^A9^@^@ X<92>^UâA^Z^M^H^B0^A9^@^@<80>^Xf^XâA:^L^H<92>ÛôȾäÚÃ^T^P^@
+  raw_key:/System/tsd/cr.node.sql.bytesin/3/10s/2016-11-16T04:00:00Z raw_value:32420000000064088080c6baade4dac3141080c8afa0251a0d080030013900008044cc12e2411a0d08013001390000a0589215e2411a0d0802300139000080186618e2413a0c0892dbf4c8bee4dac3141000
-0.000000000,0 /System/tsd/cr.node.sql.bytesin/4/10s/2016-11-16T04:00:00Z
-  ts:<zero>
-  value:2Q^@^@^@^@d^H<80><80>ƺ­äÚÃ^T^P<80>ȯ %^Z^M^H^@0^A9^@^@ ù<99>^LâA^Z^M^H^A0^A9^@^@^@EP^OâA^Z^M^H^B0^A9^@^@À°^E^RâA^Z^M^H^C0^A9^@^@À<98>^O^TâA:^L^Hõ»½È¾äÚÃ^T^P^@
-  raw_key:/System/tsd/cr.node.sql.bytesin/4/10s/2016-11-16T04:00:00Z raw_value:32510000000064088080c6baade4dac3141080c8afa0251a0d0800300139000020f9990ce2411a0d080130013900000045500fe2411a0d08023001390000c0b00512e2411a0d08033001390000c0980f14e2413a0c08f5bbbdc8bee4dac3141000
+0.000000000,0 /System/tsd/cr.node.sql.bytesin/4/10s/2016-11-16T04:00:00Z
+  ts:<zero>
+  value:2B^@^@^@^@d^H<80><80>ƺ­äÚÃ^T^P<80>ȯ %^Z^M^H^@0^A9^@^@ ù<99>^LâA^Z^M^H^A0^A9^@^@^@EP^OâA^Z^M^H^B0^A9^@^@À°^E^RâA:^L^Hõ»½È¾äÚÃ^T^P^@
+  raw_key:/System/tsd/cr.node.sql.bytesin/4/10s/2016-11-16T04:00:00Z raw_value:32420000000064088080c6baade4dac3141080c8afa0251a0d0800300139000020f9990ce2411a0d080130013900000045500fe2411a0d08023001390000c0b00512e2413a0c08f5bbbdc8bee4dac3141000
-0.000000000,0 /System/tsd/cr.node.sql.bytesin/5/10s/2016-11-16T04:00:00Z
-  ts:<zero>
-  value:2Q^@^@^@^@d^H<80><80>ƺ­äÚÃ^T^P<80>ȯ %^Z^M^H^@0^A9^@^@`<96>¢WâA^Z^M^H^A0^A9^@^@à<]ZâA^Z^M^H^B0^A9^@^@ É^[]âA^Z^M^H^C0^A9^@^@À);_âA:^L^H<86>Ì<8d><8b>¾äÚÃ^T^P^@
-  raw_key:/System/tsd/cr.node.sql.bytesin/5/10s/2016-11-16T04:00:00Z raw_value:32510000000064088080c6baade4dac3141080c8afa0251a0d080030013900006096a257e2411a0d08013001390000e03c5d5ae2411a0d0802300139000020c91b5de2411a0d08033001390000c0293b5fe2413a0c0886cc8d8bbee4dac3141000
+0.000000000,0 /System/tsd/cr.node.sql.bytesin/5/10s/2016-11-16T04:00:00Z
+  ts:<zero>
+  value:2B^@^@^@^@d^H<80><80>ƺ­äÚÃ^T^P<80>ȯ %^Z^M^H^@0^A9^@^@`<96>¢WâA^Z^M^H^A0^A9^@^@à<]ZâA^Z^M^H^B0^A9^@^@ É^[]âA:^L^H<86>Ì<8d><8b>¾äÚÃ^T^P^@
+  raw_key:/System/tsd/cr.node.sql.bytesin/5/10s/2016-11-16T04:00:00Z raw_value:32420000000064088080c6baade4dac3141080c8afa0251a0d080030013900006096a257e2411a0d08013001390000e03c5d5ae2411a0d0802300139000020c91b5de2413a0c0886cc8d8bbee4dac3141000
-0.000000000,0 /System/tsd/cr.node.sql.bytesin/6/10s/2016-11-16T04:00:00Z
-  ts:<zero>
-  value:2Q^@^@^@^@d^H<80><80>ƺ­äÚÃ^T^P<80>ȯ %^Z^M^H^@0^A9^@^@@+^MõáA^Z^M^H^A0^A9^@^@à<9c>Ú÷áA^Z^M^H^B0^A9^@^@@ú<81>úáA^Z^M^H^C0^A9^@^@<80>C<94>üáA:^L^H¿<94>ï;äÚÃ^T^P^@
-  raw_key:/System/tsd/cr.node.sql.bytesin/6/10s/2016-11-16T04:00:00Z raw_value:32510000000064088080c6baade4dac3141080c8afa0251a0d08003001390000402b0df5e1411a0d08013001390000e09cdaf7e1411a0d0802300139000040fa81fae1411a0d08033001390000804394fce1413a0c08bf94efcdbee4dac3141000
+0.000000000,0 /System/tsd/cr.node.sql.bytesin/6/10s/2016-11-16T04:00:00Z
+  ts:<zero>
+  value:2B^@^@^@^@d^H<80><80>ƺ­äÚÃ^T^P<80>ȯ %^Z^M^H^@0^A9^@^@`<96>¢WâA^Z^M^H^A0^A9^@^@à<]ZâA^Z^M^H^B0^A9^@^@ É^[]âA:^L^H<86>Ì<8d><8b>¾äÚÃ^T^P^@
+  raw_key:/System/tsd/cr.node.sql.bytesin/5/10s/2016-11-16T04:00:00Z raw_value:32420000000064088080c6baade4dac3141080c8afa0251a0d080030013900006096a257e2411a0d08013001390000e03c5d5ae2411a0d0802300139000020c91b5de2413a0c0886cc8d8bbee4dac3141000
-0.000000000,0 /System/tsd/cr.node.sql.bytesin/6/10s/2016-11-16T04:00:00Z
-  ts:<zero>
-  value:2Q^@^@^@^@d^H<80><80>ƺ­äÚÃ^T^P<80>ȯ %^Z^M^H^@0^A9^@^@@+^MõáA^Z^M^H^A0^A9^@^@à<9c>Ú÷áA^Z^M^H^B0^A9^@^@@ú<81>úáA^Z^M^H^C0^A9^@^@<80>C<94>üáA:^L^H¿<94>ï;äÚÃ^T^P^@
-  raw_key:/System/tsd/cr.node.sql.bytesin/6/10s/2016-11-16T04:00:00Z raw_value:32510000000064088080c6baade4dac3141080c8afa0251a0d08003001390000402b0df5e1411a0d08013001390000e09cdaf7e1411a0d0802300139000040fa81fae1411a0d08033001390000804394fce1413a0c08bf94efcdbee4dac3141000
+0.000000000,0 /System/tsd/cr.node.sql.bytesin/6/10s/2016-11-16T04:00:00Z
+  ts:<zero>
+  value:2B^@^@^@^@d^H<80><80>ƺ­äÚÃ^T^P<80>ȯ %^Z^M^H^@0^A9^@^@@+^MõáA^Z^M^H^A0^A9^@^@à<9c>Ú÷áA^Z^M^H^B0^A9^@^@@ú<81>úáA:^L^H¿<94>ï;äÚÃ^T^P^@
+  raw_key:/System/tsd/cr.node.sql.bytesin/6/10s/2016-11-16T04:00:00Z raw_value:32420000000064088080c6baade4dac3141080c8afa0251a0d08003001390000402b0df5e1411a0d08013001390000e09cdaf7e1411a0d0802300139000040fa81fae1413a0c08bf94efcdbee4dac3141000
-0.000000000,0 /System/tsd/cr.node.sql.bytesout/3/10s/2016-11-16T04:00:00Z
-  ts:<zero>
-  value:2Q^@^@^@^@d^H<80><80>ƺ­äÚÃ^T^P<80>ȯ %^Z^M^H^@0^A9^@^@^@"[Ï·A^Z^M^H^A0^A9^@^@^@Ú^BÓ·A^Z^M^H^B0^A9^@^@^@3¹Ö·A^Z^M^H^C0^A9^@^@^@M<86>Ù·A:^L^H<92>ÛôȾäÚÃ^T^P^@
-  raw_key:/System/tsd/cr.node.sql.bytesout/3/10s/2016-11-16T04:00:00Z raw_value:32510000000064088080c6baade4dac3141080c8afa0251a0d0800300139000000225bcfb7411a0d0801300139000000da02d3b7411a0d080230013900000033b9d6b7411a0d08033001390000004d86d9b7413a0c0892dbf4c8bee4dac3141000
+0.000000000,0 /System/tsd/cr.node.sql.bytesout/3/10s/2016-11-16T04:00:00Z
+  ts:<zero>
+  value:2B^@^@^@^@d^H<80><80>ƺ­äÚÃ^T^P<80>ȯ %^Z^M^H^@0^A9^@^@^@"[Ï·A^Z^M^H^A0^A9^@^@^@Ú^BÓ·A^Z^M^H^B0^A9^@^@^@3¹Ö·A:^L^H<92>ÛôȾäÚÃ^T^P^@
+  raw_key:/System/tsd/cr.node.sql.bytesout/3/10s/2016-11-16T04:00:00Z raw_value:32420000000064088080c6baade4dac3141080c8afa0251a0d0800300139000000225bcfb7411a0d0801300139000000da02d3b7411a0d080230013900000033b9d6b7413a0c0892dbf4c8bee4dac3141000
-0.000000000,0 /System/tsd/cr.node.sql.bytesout/4/10s/2016-11-16T04:00:00Z
-  ts:<zero>
-  value:2Q^@^@^@^@d^H<80><80>ƺ­äÚÃ^T^P<80>ȯ %^Z^M^H^@0^A9^@^@^@É<91>µ·A^Z^M^H^A0^A9^@^@^@³ ¹·A^Z^M^H^B0^A9^@^@^@Ú³¼·A^Z^M^H^C0^A9^@^@^@|t¿·A:^L^Hõ»½È¾äÚÃ^T^P^@
-  raw_key:/System/tsd/cr.node.sql.bytesout/4/10s/2016-11-16T04:00:00Z raw_value:32510000000064088080c6baade4dac3141080c8afa0251a0d0800300139000000c991b5b7411a0d0801300139000000b320b9b7411a0d0802300139000000dab3bcb7411a0d08033001390000007c74bfb7413a0c08f5bbbdc8bee4dac3141000
+0.000000000,0 /System/tsd/cr.node.sql.bytesout/4/10s/2016-11-16T04:00:00Z
+  ts:<zero>

The diff keeps going for a while. Attaching the full log:
cockroach.stderr.gz

@mberhault

@petermattis's proposal is to revert cyan back to 85f80d9 and see if it stabilizes, then try latest master again.

@mberhault

Correction: the proposal is to wipe cyan, start it with 85f80d9, let it run for a bit, then upgrade to latest master.
@bdarnell: is there anything you want to poke at on cyan?

@bdarnell

I think reverting to 85f80d9 and then upgrading to master is the right way to go. The inconsistency could be a result of the upgrade from 981b8aa to master, so skipping that range of versions should be safe.

@mberhault

Cyan wiped and restarted with 85f80d9. Will run block_writer against it for a little bit, then push the latest master.

@mberhault

Cyan is wedged with the same issue as delta: #10733

@bdarnell

To be clear, it got stuck while still running 85f80d9 and was never upgraded to master, right?

@mberhault

That's correct, it was started with 85f80d9 and was never upgraded.


@bdarnell

Cyan has been upgraded and we haven't seen this issue recur, so it looks like we can close it.
