osd: stray pg should/could not send notify anymore #25589
Conversation
retest this please
I'm confused by step 4. If the interval did not change, why does the stray send the notify? Also, with step 5, is this because the notify and the full log query pass each other on the wire?
We should open a tracker ticket to track this issue.
Step 4 is the case where the stray pg handles an advanced osdmap and then handle_activate_map() in OSD::advance_pg(); the default action when a stray pg reacts to ActMap is to send a notify (see PG::RecoveryState::Stray::react(const ActMap&)). Currently the stray pg only stops sending once it goes into the active state. Step 5 happens because the primary runs proc_replica_log() first, and only then handles the notify in PG::RecoveryState::Primary::react(const MNotifyRec& notevt).
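For context, here is a toy model of that path (illustration only, not the Ceph source; the real reaction is the PG::RecoveryState::Stray::react(const ActMap&) mentioned above, reached from OSD::advance_pg() via handle_activate_map()):

```cpp
// Toy model (illustration only, not the Ceph source): every osdmap
// advance ends in handle_activate_map(), which delivers ActMap to the PG
// state machine, and a PG in Stray answers ActMap by re-sending its
// notify -- even when the peering interval is unchanged.
#include <cstdio>

struct StrayPG {
  unsigned last_update = 1005;  // self-reported view; the primary's rebuilt
                                // peer info may already be rewound past this
  bool active = false;          // today, only activation stops the notify

  void react_act_map() {        // models Stray::react(const ActMap&)
    if (!active)
      printf("stray -> primary: notify(last_update=%u)\n", last_update);
  }
};

int main() {
  StrayPG stray;
  stray.react_act_map();  // step 4: same-interval advance, old info re-sent
  return 0;
}
```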
@liewegas I opened a new tracker ticket, and then found that we already have the same failed assert tracked there; see http://tracker.ceph.com/issues/15373.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
@athanatos mind taking a look?
retest this please
This PR is the suspected cause of http://tracker.ceph.com/issues/39441, so putting it on hold for the moment.
The root cause for http://tracker.ceph.com/issues/39441 was not this PR, and the bug has now been fixed. Let's run this PR through another round of testing.
@Aran85 needs rebase
let us say:

1. the primary pg queries the full log from a stray pg
2. the primary rewinds the stray pg's log in proc_replica_log() and generates a new peer info
3. an osdmap update arrives that does not change the peering interval
4. the stray pg reacts to ActMap and sends its old info
5. the new peer info is replaced with the older info in Primary::react(const MNotifyRec& notevt)
6. when activating, the primary pg uses the old peer info to search for missing objects

Fixes: http://tracker.ceph.com/issues/37679
Signed-off-by: Zengran Zhang <zhangzengran@sangfor.com.cn>
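A hypothetical sketch of the direction the title points in (the flag and the exact condition here are assumptions, not the actual patch): within an unchanged peering interval, re-notifying can only replace the primary's rebuilt peer info with the stray's stale self-view, so the stray could simply suppress the repeat notify.

```cpp
// Hypothetical suppression (flag name and condition are assumptions, not
// the actual patch): skip the ActMap notify when the peering interval has
// not changed since the last one, so the peer info the primary rebuilt in
// proc_replica_log() cannot be clobbered by our stale self-view.
#include <cstdio>

struct StrayPG {
  unsigned interval = 42;      // current peering interval (toy value)
  unsigned last_notified = 0;  // interval of our last notify

  void react_act_map() {       // models Stray::react(const ActMap&)
    if (last_notified == interval)
      return;                  // same interval: primary already has our info
    printf("stray -> primary: notify (interval %u)\n", interval);
    last_notified = interval;
  }
};

int main() {
  StrayPG stray;
  stray.react_act_map();  // first advance in the interval: notify sent
  stray.react_act_map();  // later same-interval advance: suppressed
  return 0;
}
```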
@tchaikov rebased, thanks.
rados run: http://pulpito.ceph.com/nojha-2019-05-07_17:20:56-rados-fix-pg-notify-distro-basic-smithi/
@Aran85 I noticed http://pulpito.ceph.com/nojha-2019-05-07_17:20:56-rados-fix-pg-notify-distro-basic-smithi/3937720/, which we have not seen before. Can you take a look to make sure it's not related?
hi @neha-ojha
It seems that some EC shard reads failed but the whole stripe decoded successfully, so I think it is related to the read failure. Can we get the osd log?
Should be in the log directory in http://qa-proxy.ceph.com/teuthology/nojha-2019-05-07_17:20:56-rados-fix-pg-notify-distro-basic-smithi/3937720/remote/
@neha-ojha @Aran85 There is a pg [7,4,3]/[7,4,2147483647]p7(0) with async=[3(2)]. The read of 3(0) is getting ENOENT: read_result_t(r=0, errors={3(0)=-2}). Everything looks OK to me except that I'm not sure why 3(0) is missing. There is also a missing shard 7(0), causing a pull that reads 3(0), which is the easiest way to fix the primary. Keep in mind the async recovery target is 3(2), a different shard that happens to also be stored on osd.3. So after 3(0) fails, the primary is able to read 4(1) and 6(2) to reconstruct shard 0 for osd.7. Is this what the fix does, make 6(2), a stray, available? If so, maybe we need to add "enough copies available" to the log-whitelist.
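To make the shard arithmetic concrete, here is a toy reconstruction assuming a k=2, m=1 XOR-parity layout (an assumption; the real pool's EC profile may differ): losing shard 0 is survivable because shards 1 and 2 suffice to rebuild it, which is why finding the stray 6(2) matters.

```cpp
// Toy k=2,m=1 XOR reconstruction (an assumption about the EC profile):
// parity shard 2 = shard 0 XOR shard 1, so any two shards rebuild the third.
#include <array>
#include <cstdio>

int main() {
  std::array<unsigned char, 4> shard0{{'o', 'b', 'j', '1'}};  // lost (ENOENT)
  std::array<unsigned char, 4> shard1{{'d', 'a', 't', 'a'}};
  std::array<unsigned char, 4> shard2;  // parity
  for (int i = 0; i < 4; ++i) shard2[i] = shard0[i] ^ shard1[i];

  // 3(0) returned -2, so rebuild shard 0 from 4(1) and the stray 6(2):
  std::array<unsigned char, 4> rebuilt;
  for (int i = 0; i < 4; ++i) rebuilt[i] = shard1[i] ^ shard2[i];
  printf("rebuilt shard 0: %.4s\n",
         reinterpret_cast<const char*>(rebuilt.data()));  // prints "obj1"
  return 0;
}
```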
Thanks @dzafman for helping to debug. 6(2) being an available stray is not what the fix does; it is because the missing search found the 6(2) shard. I checked the osd log and have no idea why 3(0) is missing. 3(0)'s last update while it was in the acting set is 87'1005; 87'1004 is a modify and 87'1005 is a delete, both on the same object. Then the interval changed and the auth log rewound to 87'1004, so 3(0) treated 87'1005 as a divergent log entry and rolled back to 87'1004. When 7(0) read the shard from 3(0) for recovery, the read error occurred on 3(0). I see it's because get_onode returned -2, but I also see some get_onode errors during the above process. Related log lines for 3(0):
- 87'1005 delete applied
- 87'1005 rolled back to 87'1004
- sub read at 87'1004 error
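A minimal sketch of the divergent-entry rule at work in that timeline, with toy types rather than the real PGLog API:

```cpp
// Toy types (not the real PGLog API) showing the rule 3(0) applied:
// entries newer than the auth log head are divergent and get rolled back.
#include <cstdio>
#include <tuple>
#include <utility>
#include <vector>

struct eversion {
  unsigned epoch, version;
  bool operator>(const eversion& o) const {
    return std::tie(epoch, version) > std::tie(o.epoch, o.version);
  }
};

int main() {
  eversion auth_head{87, 1004};  // auth log rewound to 87'1004
  std::vector<std::pair<eversion, const char*>> local{
      {{87, 1004}, "modify obj"},   // kept
      {{87, 1005}, "delete obj"}};  // divergent: rolled back
  for (const auto& [v, op] : local)
    printf("%u'%u %s -> %s\n", v.epoch, v.version, op,
           v > auth_head ? "divergent, roll back" : "keep");
  return 0;
}
```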
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
unstale
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!