Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd: stray pg should/could not send notify anymore #25589

Closed
wants to merge 1 commit into from

Conversation

Aran85
Copy link
Contributor

@Aran85 Aran85 commented Dec 17, 2018

let us say:

  1. primary pg queried full log from stray pg
  2. primary rewind the stray pg log when proc_replica_log(),and generate new peer info
  3. osdmap update and did not change peering interval
  4. stray pg react ActMap and send old info
  5. the new peer info was replaced with older info when Primary::react(const MNotifyRec& notevt)
  6. when activating, primary pg will use old peer info to search missing..

Fixes: http://tracker.ceph.com/issues/37679

Signed-off-by: Zengran Zhang zhangzengran@sangfor.com.cn

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

@tchaikov
Copy link
Contributor

retest this please

@liewegas
Copy link
Member

I'm confused by step 4. If the interval did not change, why does the stray send the notify?

Also, with step 5, is this because the notify and the ful log query pass each other on the wire?

@liewegas
Copy link
Member

We should open a tracker ticket to track this issue.

@Aran85
Copy link
Contributor Author

Aran85 commented Dec 18, 2018

I'm confused by step 4. If the interval did not change, why does the stray send the notify?

Also, with step 5, is this because the notify and the ful log query pass each other on the wire?

step 4 is the case that it the stray pg handle an advance osdmap and then handle_activate_map in OSD::advance_pg(), the default action when stray pg react ActMap is sending notify(see PG::RecoveryState::Stray::react(const ActMap&)). currently stray pg will stop sending if it goes into active state.

step 5 is because primary proc_replica_log first, and then handle notify in PG::RecoveryState::Primary::react(const MNotifyRec& notevt)

@Aran85
Copy link
Contributor Author

Aran85 commented Dec 18, 2018

@liewegas I opened a new tracker, and then found that we already have a same failed assert there, see http://tracker.ceph.com/issues/15373.

@stale
Copy link

stale bot commented Feb 16, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
If you are a maintainer or core committer, please follow-up on this issue to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Feb 16, 2019
@Aran85
Copy link
Contributor Author

Aran85 commented Apr 5, 2019

@athanatos mind take a look?

@stale stale bot removed the stale label Apr 5, 2019
@Aran85
Copy link
Contributor Author

Aran85 commented Apr 19, 2019

retest this please

@liewegas
Copy link
Member

This PR is suspected cause of http://tracker.ceph.com/issues/39441 ... putting on hold for the moment

@neha-ojha
Copy link
Member

The root cause for http://tracker.ceph.com/issues/39441 was not this PR, and the bug has now been fixed. Let's run this PR through another round of testing.

@tchaikov
Copy link
Contributor

tchaikov commented May 6, 2019

@Aran85 needs rebase

let us say:

primary pg queried full log from stray pg
primary rewind the stray pg log when proc_replica_log(),and generate new peer info
osdmap update and did not change peering interval
stray pg react ActMap and send old info
the new peer info was replaced with older info when Primary::react(const MNotifyRec& notevt)
when activating, primary pg will use old peer info to search missing..

Fixes: http://tracker.ceph.com/issues/37679

Signed-off-by: Zengran Zhang <zhangzengran@sangfor.com.cn>
@Aran85 Aran85 reopened this May 7, 2019
@Aran85
Copy link
Contributor Author

Aran85 commented May 7, 2019

@tchaikov rebased, thanks.

@neha-ojha
Copy link
Member

@Aran85
Copy link
Contributor Author

Aran85 commented May 9, 2019

hi @neha-ojha

Error(s) ignored for 2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head enough copies available

It seems that some ec shard read failed but whole stripes decode successfully, So i think it is related with the read failure.. can we get the osd log?

@neha-ojha
Copy link
Member

hi @neha-ojha

Error(s) ignored for 2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head enough copies available

It seems that some ec shard read failed but whole stripes decode successfully, So i think it is related with the read failure.. can we get the osd log?

Should be in the log directory in http://qa-proxy.ceph.com/teuthology/nojha-2019-05-07_17:20:56-rados-fix-pg-notify-distro-basic-smithi/3937720/remote/

@dzafman
Copy link
Contributor

dzafman commented May 9, 2019

@neha-ojha @Aran85 There is a pg [7,4,3]/[7,4,2147483647]p7(0) async=[3(2)]. The read of 3(0) is getting ENOENT read_result_t(r=0, errors={3(0)=-2}. Everything looks ok to me except that I'm not sure why 3(0) is missing. So there is also a missing shard 7(0) causing a pull that reads 3(0) which is the easiest way to fix the primary. Keep in mind the async recovery is to 3(2) a different shard that happens to also be stored on osd.3. So after 3(0) fails, the primary is able to read 4(1) and 6(2) to reconstruct shard 0 for osd.7. Is this what the fix does? Makes 6(2) a stray available?

If so, maybe we need to add "enough copies available" to the log-whitelist.

@Aran85
Copy link
Contributor Author

Aran85 commented May 9, 2019

Thanks @dzafman for help to debug. 6(2) being a stray available is not the fix does, it is because search missing found the 6(2) shard . I checked the osd log, and have no idea about why 3(0) is missing.

3(0) last update when it is in acting set is 87'1005, 87'1004 is a modify, 87'1005 is a delete, they are on same object.Then interval change, auth log rewind to 87'1004, so 3(0) treat the 87'1005 as divergent log, and rolled back to 87'1004. when 7(0) read shard from 3(0) for recover, read error occurred on 3(0). I see it's because get_onode return -2. but I also see some get_onode error on above process. related logs of 3(0):

87'1005 delete appiled

2019-05-07 17:55:47.883 7fdf67fb1700 10 osd.3 pg_epoch: 87 pg[2.15s0( v 87'1004 (0'0,87'1004] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 19/86/19) [3,4,6]p3(0) r=0 lpr=86 crt=87'1004 lcod 87'1003 mlcod 87'1003 active+clean] append_log log((0'0,87'1004], crt=87'1004) [87'1005 (87'1004) delete 2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head by client.4331.0:7378 2019-05-07 17:55:47.885437 0]
2019-05-07 17:55:47.883 7fdf67fb1700 10 osd.3 pg_epoch: 87 pg[2.15s0( v 87'1005 (0'0,87'1005] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 19/86/19) [3,4,6]p3(0) r=0 lpr=86 luod=87'1004 lua=87'1004 crt=87'1004 lcod 87'1003 mlcod 87'1003 active+clean] add_log_entry 87'1005 (87'1004) delete 2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head by client.4331.0:7378 2019-05-07 17:55:47.885437 0
2019-05-07 17:55:47.883 7fdf67fb1700 10 osd.3 pg_epoch: 87 pg[2.15s0( v 87'1005 (0'0,87'1005] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 19/86/19) [3,4,6]p3(0) r=0 lpr=86 luod=87'1004 lua=87'1004 crt=87'1004 lcod 87'1003 mlcod 87'1003 active+clean] append_log approx pg log length = 1005
2019-05-07 17:55:47.883 7fdf67fb1700 10 osd.3 pg_epoch: 87 pg[2.15s0( v 87'1005 (0'0,87'1005] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 19/86/19) [3,4,6]p3(0) r=0 lpr=86 luod=87'1004 lua=87'1004 crt=87'1004 lcod 87'1003 mlcod 87'1003 active+clean] append_log transaction_applied = 1
2019-05-07 17:55:47.883 7fdf67fb1700 6 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 87'1005, trimmed: , trimmed_dups: , clear_divergent_priors: 0
2019-05-07 17:55:47.883 7fdf67fb1700 10 bluestore(/var/lib/ceph/osd/ceph-3) queue_transactions ch 0x55e9716b4c00 2.15s0_head
2019-05-07 17:55:47.883 7fdf67fb1700 20 bluestore(/var/lib/ceph/osd/ceph-3) _txc_create osr 0x55e96ed0f560 = 0x55e971d5b300 seq 15
2019-05-07 17:55:47.883 7fdf67fb1700 20 bluestore(/var/lib/ceph/osd/ceph-3).collection(2.15s0_head 0x55e9716b4c00) get_onode oid 0#2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head#3ed key 0x808000000000000002adc4568321736d'ithi19813912-5#20oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo!='0xfffffffffffffffe00000000000003ed'o'
2019-05-07 17:55:47.883 7fdf67fb1700 20 bluestore(/var/lib/ceph/osd/ceph-3).collection(2.15s0_head 0x55e9716b4c00) r -2 v.len 0
2019-05-07 17:55:47.883 7fdf67fb1700 15 bluestore(/var/lib/ceph/osd/ceph-3) _rename 2.15s0_head 0#2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head# -> 0#2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head#3ed
2019-05-07 17:55:47.883 7fdf67fb1700 10 bluestore(/var/lib/ceph/osd/ceph-3) _rename 2.15s0_head 0#2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head# -> 0#2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head#3ed = 0
2019-05-07 17:55:47.883 7fdf67fb1700 15 bluestore(/var/lib/ceph/osd/ceph-3) _omap_setkeys 2.15s0_head 0#2:a8000000::::head#
2019-05-07 17:55:47.883 7fdf67fb1700 20 bluestore(/var/lib/ceph/osd/ceph-3) _omap_setkeys 0x00000000000006a0'.0000000087.00000000000000001005' <- 0000000087.00000000000000001005
2019-05-07 17:55:47.883 7fdf67fb1700 20 bluestore(/var/lib/ceph/osd/ceph-3) _omap_setkeys 0x00000000000006a0'._fastinfo' <- _fastinfo
2019-05-07 17:55:47.883 7fdf67fb1700 20 bluestore(/var/lib/ceph/osd/ceph-3) _omap_setkeys 0x00000000000006a0'.can_rollback_to' <- can_rollback_to
2019-05-07 17:55:47.883 7fdf67fb1700 20 bluestore(/var/lib/ceph/osd/ceph-3) _omap_setkeys 0x00000000000006a0'.rollback_info_trimmed_to' <- rollback_info_trimmed_to
2019-05-07 17:55:47.883 7fdf67fb1700 10 bluestore(/var/lib/ceph/osd/ceph-3) _omap_setkeys 2.15s0_head 0#2:a8000000::::head# = 0
2019-05-07 17:55:47.883 7fdf67fb1700 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_calc_cost 0x55e971d5b300 cost 5878 (1 ios * 4000 + 1878 bytes)
2019-05-07 17:55:47.883 7fdf67fb1700 20 bluestore(/var/lib/ceph/osd/ceph-3) _txc_write_nodes txc 0x55e971d5b300 onodes 0x55e970811440 shared_blobs
2019-05-07 17:55:47.883 7fdf67fb1700 20 bluestore.extentmap(0x55e970811590) update 0#2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head#3ed
2019-05-07 17:55:47.883 7fdf67fb1700 20 bluestore(/var/lib/ceph/osd/ceph-3) _record_onode onode 0#2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head#3ed is 1990 (1073 bytes onode + 2 bytes spanning blobs + 915 bytes inline extents)
2019-05-07 17:55:47.883 7fdf67fb1700 20 bluestore(/var/lib/ceph/osd/ceph-3) _txc_finalize_kv txc 0x55e971d5b300 allocated 0x[] released 0x[]
2019-05-07 17:55:47.883 7fdf67fb1700 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_state_proc txc 0x55e971d5b300 prepare
2019-05-07 17:55:47.883 7fdf67fb1700 20 bluestore(/var/lib/ceph/osd/ceph-3) _txc_finish_io 0x55e971d5b300
2019-05-07 17:55:47.883 7fdf67fb1700 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_state_proc txc 0x55e971d5b300 io_done
2019-05-07 17:55:47.883 7fdf67fb1700 10 osd.3 pg_epoch: 87 pg[2.15s0( v 87'1005 (0'0,87'1005] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 19/86/19) [3,4,6]p3(0) r=0 lpr=86 luod=87'1004 lua=87'1004 crt=87'1004 lcod 87'1003 mlcod 87'1003 active+clean] op_applied version 87'1005

87'1005 rolled back to 87'1004

2019-05-07 17:56:23.175 7f7563e29700 10 osd.3 pg_epoch: 106 pg[2.15s0( v 87'1005 (0'0,87'1005] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 104/105/105) [7,4,3]/[7,4,2147483647]p7(0) r=-1 lpr=105 pi=[86,105)/2 crt=87'1004 lcod 0'0 remapped mbc={}] do_peering_event: epoch_sent: 106 epoch_requested: 106 MInfoRec from 7(0) info: 2.15s0( v 87'1004 (0'0,87'1004] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 104/105/105)
2019-05-07 17:56:23.175 7f7563e29700 10 osd.3 pg_epoch: 106 pg[2.15s0( v 87'1005 (0'0,87'1005] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 104/105/105) [7,4,3]/[7,4,2147483647]p7(0) r=-1 lpr=105 pi=[86,105)/2 crt=87'1004 lcod 0'0 remapped mbc={}] state<Started/Stray>: got info from osd.7(0) 2.15s0( v 87'1004 (0'0,87'1004] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 104/105/105)
2019-05-07 17:56:23.175 7f7563e29700 10 rewind_divergent_log truncate divergent future 87'1004
2019-05-07 17:56:23.175 7f7563e29700 20 rewind_divergent_log original_crt = 87'1004
2019-05-07 17:56:23.175 7f7563e29700 10 rewind_divergent_log future divergent 87'1005 (87'1004) delete 2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head by client.4331.0:7378 2019-05-07 17:55:47.885437 0
2019-05-07 17:56:23.175 7f7563e29700 20 _merge_object_divergent_entries: merging hoid 2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head entries: 87'1005 (87'1004) delete 2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head by client.4331.0:7378 2019-05-07 17:55:47.885437 0
2019-05-07 17:56:23.175 7f7563e29700 20 _merge_object_divergent_entries: keeping 87'1005 (87'1004) delete 2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head by client.4331.0:7378 2019-05-07 17:55:47.885437 0
2019-05-07 17:56:23.175 7f7563e29700 10 _merge_object_divergent_entries: hoid 2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head prior_version: 87'1004 first_divergent_update: 87'1005 last_divergent_update: 87'1005
2019-05-07 17:56:23.175 7f7563e29700 10 _merge_object_divergent_entries: hoid 2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head has no more recent entries in log
2019-05-07 17:56:23.175 7f7563e29700 10 _merge_object_divergent_entries: hoid 2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head must be rolled back or recovered, attempting to rollback
2019-05-07 17:56:23.175 7f7563e29700 10 _merge_object_divergent_entries: hoid 2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head olog_can_rollback_to: 87'1004
2019-05-07 17:56:23.175 7f7563e29700 10 _merge_object_divergent_entries: hoid 2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head original_crt: 87'1004
2019-05-07 17:56:23.175 7f7563e29700 10 _merge_object_divergent_entries: hoid 2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head rolling back 87'1005 (87'1004) delete 2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head by client.4331.0:7378 2019-05-07 17:55:47.885437 0
2019-05-07 17:56:23.175 7f7563e29700 10 _merge_object_divergent_entries: hoid 2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head rolled back
2019-05-07 17:56:23.175 7f7563e29700 5 osd.3 pg_epoch: 106 pg[2.15s0( v 87'1004 (0'0,87'1004] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 104/105/105) [7,4,3]/[7,4,2147483647]p7(0) r=-1 lpr=105 pi=[86,105)/2 crt=87'1004 lcod 0'0 remapped mbc={}] exit Started/Stray 2019-05-07 17:56:23.179138 7 0.001081
2019-05-07 17:56:23.175 7f7563e29700 5 osd.3 pg_epoch: 106 pg[2.15s0( v 87'1004 (0'0,87'1004] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 104/105/105) [7,4,3]/[7,4,2147483647]p7(0) r=-1 lpr=105 pi=[86,105)/2 crt=87'1004 lcod 0'0 remapped mbc={}] enter Started/ReplicaActive
2019-05-07 17:56:23.175 7f7563e29700 5 osd.3 pg_epoch: 106 pg[2.15s0( v 87'1004 (0'0,87'1004] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 104/105/105) [7,4,3]/[7,4,2147483647]p7(0) r=-1 lpr=105 pi=[86,105)/2 crt=87'1004 lcod 0'0 remapped mbc={}] enter Started/ReplicaActive/RepNotRecovering
2019-05-07 17:56:23.175 7f7563e29700 10 osd.3 pg_epoch: 106 pg[2.15s0( v 87'1004 (0'0,87'1004] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 104/105/105) [7,4,3]/[7,4,2147483647]p7(0) r=-1 lpr=105 pi=[86,105)/2 crt=87'1004 lcod 0'0 remapped mbc={}] state<Started/ReplicaActive>: In ReplicaActive, about to call activate
2019-05-07 17:56:23.175 7f7563e29700 10 osd.3 pg_epoch: 106 pg[2.15s0( v 87'1004 (0'0,87'1004] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 104/105/105) [7,4,3]/[7,4,2147483647]p7(0) r=-1 lpr=105 pi=[86,105)/2 crt=87'1004 lcod 0'0 remapped mbc={}] activate - no missing, moving last_complete 87'1004 -> 87'1004
2019-05-07 17:56:23.175 7f7563e29700 10 osd.3 pg_epoch: 106 pg[2.15s0( v 87'1004 (0'0,87'1004] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 104/105/105) [7,4,3]/[7,4,2147483647]p7(0) r=-1 lpr=105 pi=[86,105)/2 crt=87'1004 lcod 0'0 remapped mbc={}] state<Started/ReplicaActive>: Activate Finished
2019-05-07 17:56:23.175 7f7563e29700 6 write_log_and_missing with: dirty_to: 0'0, dirty_from: 87'1005, writeout_from: 4294967295'18446744073709551615, trimmed: , trimmed_dups: , clear_divergent_priors: 0
2019-05-07 17:56:23.175 7f7563e29700 10 bluestore(/var/lib/ceph/osd/ceph-3) queue_transactions ch 0x561e392b8400 2.15s0_head
2019-05-07 17:56:23.175 7f7563e29700 20 bluestore(/var/lib/ceph/osd/ceph-3) _txc_create osr 0x561e37fcdc20 = 0x561e38f19900 seq 4
2019-05-07 17:56:23.175 7f7563e29700 20 bluestore(/var/lib/ceph/osd/ceph-3).collection(2.15s0_head 0x561e392b8400) get_onode oid 0#2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head# key 0x808000000000000002adc4568321736d'ithi19813912-5#20oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo!='0xfffffffffffffffeffffffffffffffff'o'
2019-05-07 17:56:23.175 7f7563e29700 20 bluestore(/var/lib/ceph/osd/ceph-3).collection(2.15s0_head 0x561e392b8400) r -2 v.len 0
2019-05-07 17:56:23.175 7f7563e29700 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_add_transaction op 13 got ENOENT on 0#2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head#
2019-05-07 17:56:23.175 7f7563e29700 20 bluestore(/var/lib/ceph/osd/ceph-3).collection(2.15s0_head 0x561e392b8400) get_onode oid 0#2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head#3ed key 0x808000000000000002adc4568321736d'ithi19813912-5#20oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo!='0xfffffffffffffffe00000000000003ed'o'
2019-05-07 17:56:23.175 7f7563e29700 20 bluestore(/var/lib/ceph/osd/ceph-3).collection(2.15s0_head 0x561e392b8400) r -2 v.len 0
2019-05-07 17:56:23.175 7f7563e29700 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_add_transaction op 38 got ENOENT on 0#2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head#3ed
2019-05-07 17:56:23.175 7f7563e29700 15 bluestore(/var/lib/ceph/osd/ceph-3) _omap_rmkey_range 2.15s0_head 0#2:a8000000::::head#

sub read 87'1004 error

2019-05-07 17:56:23.191 7f755fe21700 10 osd.3 pg_epoch: 106 pg[2.15s0( v 87'1004 (0'0,87'1004] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 104/105/105) [7,4,3]/[7,4,2147483647]p7(0) r=-1 lpr=105 pi=[86,105)/2 luod=0'0 crt=87'1004 lcod 0'0 active+remapped mbc={}] _handle_message: MOSDECSubOpRead(2.15s0 106/105 ECSubRead(tid=254, to_read={2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head=0,524288,0}, subchunks={2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head=[0,1]}, attrs_to_read=2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head)) v3
2019-05-07 17:56:23.191 7f755fe21700 25 osd.3 pg_epoch: 106 pg[2.15s0( v 87'1004 (0'0,87'1004] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 104/105/105) [7,4,3]/[7,4,2147483647]p7(0) r=-1 lpr=105 pi=[86,105)/2 luod=0'0 crt=87'1004 lcod 0'0 active+remapped mbc={}] handle_sub_read case1: reading the complete chunk/shard.
2019-05-07 17:56:23.191 7f755fe21700 15 bluestore(/var/lib/ceph/osd/ceph-3) read 2.15s0_head 0#2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head# 0x080000
2019-05-07 17:56:23.191 7f755fe21700 20 bluestore(/var/lib/ceph/osd/ceph-3).collection(2.15s0_head 0x561e392b8400) get_onode oid 0#2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head# key 0x808000000000000002adc4568321736d'ithi19813912-5#20oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo!='0xfffffffffffffffeffffffffffffffff'o'
2019-05-07 17:56:23.191 7f755fe21700 20 bluestore(/var/lib/ceph/osd/ceph-3).collection(2.15s0_head 0x561e392b8400) r -2 v.len 0
2019-05-07 17:56:23.191 7f755fe21700 10 bluestore(/var/lib/ceph/osd/ceph-3) read 2.15s0_head 0#2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head# 0x0
80000 = -2
2019-05-07 17:56:23.191 7f755fe21700 5 osd.3 pg_epoch: 106 pg[2.15s0( v 87'1004 (0'0,87'1004] local-lis/les=86/87 n=1 ec=86/19 lis/c 86/86 les/c/f 87/87/0 104/105/105) [7,4,3]/[7,4,2147483647]p7(0) r=-1 lpr=105 pi=[86,105)/2 luod=0'0 crt=87'1004 lcod 0'0 active+remapped mbc={}] handle_sub_read: Error -2 reading 2:adc45683:::smithi19813912-5 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head, fast read, probably ok

@stale
Copy link

stale bot commented Jul 8, 2019

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Jul 8, 2019
@liewegas
Copy link
Member

liewegas commented Jul 8, 2019

unstale

@stale stale bot removed the stale label Jul 8, 2019
@stale
Copy link

stale bot commented Sep 6, 2019

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Sep 6, 2019
@stale
Copy link

stale bot commented Dec 5, 2019

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@stale stale bot closed this Dec 5, 2019
@tchaikov tchaikov reopened this Dec 5, 2019
@stale stale bot removed the stale label Dec 5, 2019
@stale
Copy link

stale bot commented Feb 3, 2020

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Feb 3, 2020
@stale
Copy link

stale bot commented May 3, 2020

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@stale stale bot closed this May 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants