Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.
Sign upmimic: osd/PG: fix last_complete re-calculation on splitting #28259
Conversation
We add hard-limit for pg_logs now, which means we might keep trimming old log entries irrespective of pg's current missing_set. This as a result can cause the last_complete pointer moving far ahead of the real on-disk version (the oldest need of missing_set, for instance) the corresponding pg should have on splitting: ``` 2019-04-19 06:41:52.559247 7efd4725c700 10 osd.2 271 Splitting pg[5.6( v 270'943 lc 0'0 (238'300,270'943] local-lis/les=250/251 n=943 ec=223/223 lis/c 250/223 les/251/224/0 250/271/229) [5,2] r=1 lpr=271 pi=[223,271)/4 crt=270'943 unknown NOTIFY m=518 mbc={}] into 5.16 2019-04-19 06:41:52.561413 7efd4725c700 10 osd.2 pg_epoch: 271 pg[5.6( v 270'943 lc 238'300 (238'300,270'943] local-lis/les=250/251 n=943 ec=223/223 lis/c 250/223 c/f 251/224/0 250/271/229) [5,2] r=1 lpr=271 pi=[223,271)/4 crt=270'943 lcod 0'0 unknown NOTIFY m=261 mbc={}] release_backoffs [MIN,MAX) ``` For the above example, parent's last_complete cursor changed from **0'0** to **238'300** directly due to the effort of trying to catch up the oldest log entry changing when splitting was done. However, back into v12.2.9 primary would still reference shard's last_complete field when trying to figure out all possible locations of a currently missing object (see PG::MissingLoc::add_source_info): ```c++ if (oinfo.last_complete < need) { if (omissing.is_missing(soid)) { ldout(pg->cct, 10) << "search_for_missing " << soid << " " << need << " also missing on osd." << fromosd << dendl; continue; } } ``` Hence a wrongly calculated last_complete could then make primary mis-consider that a specific shard might have the authoritative object it currently looking for: ``` 2019-04-19 06:41:52.904163 7fd4cfb5a700 10 osd.5 pg_epoch: 271 pg[5.6( v 270'943 lc 238'300 (238'300,270'943] local-lis/les=250/251 n=471 ec=223/223 lis/c 250/223 les/ c/f 251/224/0 250/271/229) [5,2] r=0 lpr=271 pi=[223,271)/4 crt=270'943 lcod 226'77 mlcod 0'0 peering m=16 mbc={}] proc_replica_log for osd.2: 5.6( v 270'943 lc 238'30 0 (238'300,270'943] local-lis/les=250/251 n=471 ec=223/223 lis/c 250/223 les/c/f 251/224/0 250/271/229) log((249'563,270'943], crt=270'943) missing(261 may_include_del etes = 1) 2019-04-19 06:41:52.904645 7fd4cfb5a700 20 osd.5 pg_epoch: 271 pg[5.6( v 270'943 lc 238'300 (238'300,270'943] local-lis/les=250/251 n=471 ec=223/223 lis/c 250/223 les/ c/f 251/224/0 250/271/229) [5,2] r=0 lpr=271 pi=[223,271)/4 crt=270'943 lcod 226'77 mlcod 0'0 peering m=16 mbc={}] after missing 5:624c3a7a:::benchmark_data_smithi190 _39968_object1382:head need 226'110 have 0'0 2019-04-19 06:41:53.567820 7fd4d035b700 10 osd.5 pg_epoch: 272 pg[5.6( v 270'943 lc 0'0 (238'300,270'943] local-lis/les=271/272 n=471 ec=223/223 lis/c 250/223 les/c/f 251/224/0 250/271/229) [5,2] r=0 lpr=271 pi=[223,271)/4 crt=270'943 lcod 226'77 mlcod 0'0 unknown m=16 u=13 mbc={255={(1+0)=220,(2+0)=28}}] search_for_missing 5:624c3a 7a:::benchmark_data_smithi190_39968_object1382:head 226'110 is on osd.2 ``` note that ```5:624c3a7a:::benchmark_data_smithi190_39968_object1382:head 226'110``` was actually missing on both primary and shard osd.2 whereas primary insisted that object should exist on shard osd.2! #26175 posted an indirect fix for the above problem by ignoring last_complete when checking the missing set, but it should generally make more sense to fill in the last_complete field correctly whenever possible. Hence coming this additional fix. Fixes: http://tracker.ceph.com/issues/26958 Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn> (cherry picked from commit aad5d47)
This comment has been minimized.
This comment has been minimized.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
pdvian commentedMay 28, 2019
http://tracker.ceph.com/issues/39538