Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
osd/PG: fix last_complete re-calculation on splitting
We add hard-limit for pg_logs now, which means we might keep trimming old log entries irrespective of pg's current missing_set. This as a result can cause the last_complete pointer moving far ahead of the real on-disk version (the oldest need of missing_set, for instance) the corresponding pg should have on splitting: ``` 2019-04-19 06:41:52.559247 7efd4725c700 10 osd.2 271 Splitting pg[5.6( v 270'943 lc 0'0 (238'300,270'943] local-lis/les=250/251 n=943 ec=223/223 lis/c 250/223 les/251/224/0 250/271/229) [5,2] r=1 lpr=271 pi=[223,271)/4 crt=270'943 unknown NOTIFY m=518 mbc={}] into 5.16 2019-04-19 06:41:52.561413 7efd4725c700 10 osd.2 pg_epoch: 271 pg[5.6( v 270'943 lc 238'300 (238'300,270'943] local-lis/les=250/251 n=943 ec=223/223 lis/c 250/223 c/f 251/224/0 250/271/229) [5,2] r=1 lpr=271 pi=[223,271)/4 crt=270'943 lcod 0'0 unknown NOTIFY m=261 mbc={}] release_backoffs [MIN,MAX) ``` For the above example, parent's last_complete cursor changed from **0'0** to **238'300** directly due to the effort of trying to catch up the oldest log entry changing when splitting was done. However, back into v12.2.9 primary would still reference shard's last_complete field when trying to figure out all possible locations of a currently missing object (see PG::MissingLoc::add_source_info): ```c++ if (oinfo.last_complete < need) { if (omissing.is_missing(soid)) { ldout(pg->cct, 10) << "search_for_missing " << soid << " " << need << " also missing on osd." << fromosd << dendl; continue; } } ``` Hence a wrongly calculated last_complete could then make primary mis-consider that a specific shard might have the authoritative object it currently looking for: ``` 2019-04-19 06:41:52.904163 7fd4cfb5a700 10 osd.5 pg_epoch: 271 pg[5.6( v 270'943 lc 238'300 (238'300,270'943] local-lis/les=250/251 n=471 ec=223/223 lis/c 250/223 les/ c/f 251/224/0 250/271/229) [5,2] r=0 lpr=271 pi=[223,271)/4 crt=270'943 lcod 226'77 mlcod 0'0 peering m=16 mbc={}] proc_replica_log for osd.2: 5.6( v 270'943 lc 238'30 0 (238'300,270'943] local-lis/les=250/251 n=471 ec=223/223 lis/c 250/223 les/c/f 251/224/0 250/271/229) log((249'563,270'943], crt=270'943) missing(261 may_include_del etes = 1) 2019-04-19 06:41:52.904645 7fd4cfb5a700 20 osd.5 pg_epoch: 271 pg[5.6( v 270'943 lc 238'300 (238'300,270'943] local-lis/les=250/251 n=471 ec=223/223 lis/c 250/223 les/ c/f 251/224/0 250/271/229) [5,2] r=0 lpr=271 pi=[223,271)/4 crt=270'943 lcod 226'77 mlcod 0'0 peering m=16 mbc={}] after missing 5:624c3a7a:::benchmark_data_smithi190 _39968_object1382:head need 226'110 have 0'0 2019-04-19 06:41:53.567820 7fd4d035b700 10 osd.5 pg_epoch: 272 pg[5.6( v 270'943 lc 0'0 (238'300,270'943] local-lis/les=271/272 n=471 ec=223/223 lis/c 250/223 les/c/f 251/224/0 250/271/229) [5,2] r=0 lpr=271 pi=[223,271)/4 crt=270'943 lcod 226'77 mlcod 0'0 unknown m=16 u=13 mbc={255={(1+0)=220,(2+0)=28}}] search_for_missing 5:624c3a 7a:::benchmark_data_smithi190_39968_object1382:head 226'110 is on osd.2 ``` note that ```5:624c3a7a:::benchmark_data_smithi190_39968_object1382:head 226'110``` was actually missing on both primary and shard osd.2 whereas primary insisted that object should exist on shard osd.2! #26175 posted an indirect fix for the above problem by ignoring last_complete when checking the missing set, but it should generally make more sense to fill in the last_complete field correctly whenever possible. Hence coming this additional fix. Fixes: http://tracker.ceph.com/issues/26958 Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn> (cherry picked from commit aad5d47)
- Loading branch information