Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mimic: osd/PG: fix last_complete re-calculation on splitting #28259

Merged
merged 1 commit into from Jul 1, 2019

Conversation

@pdvian
Copy link

pdvian commented May 28, 2019

We add hard-limit for pg_logs now, which means we might keep trimming
old log entries irrespective of pg's current missing_set. This as a
result can cause the last_complete pointer moving far ahead of the real
on-disk version (the oldest need of missing_set, for instance) the
corresponding pg should have on splitting:

```
2019-04-19 06:41:52.559247 7efd4725c700 10 osd.2 271 Splitting pg[5.6( v 270'943 lc 0'0 (238'300,270'943] local-lis/les=250/251 n=943 ec=223/223 lis/c 250/223 les/251/224/0 250/271/229) [5,2] r=1 lpr=271 pi=[223,271)/4 crt=270'943 unknown NOTIFY m=518 mbc={}] into 5.16
2019-04-19 06:41:52.561413 7efd4725c700 10 osd.2 pg_epoch: 271 pg[5.6( v 270'943 lc 238'300 (238'300,270'943] local-lis/les=250/251 n=943 ec=223/223 lis/c 250/223 c/f 251/224/0 250/271/229) [5,2] r=1 lpr=271 pi=[223,271)/4 crt=270'943 lcod 0'0 unknown NOTIFY m=261 mbc={}] release_backoffs [MIN,MAX)
```

For the above example, parent's last_complete cursor changed from **0'0** to
**238'300** directly due to the effort of trying to catch up the oldest
log entry changing when splitting was done. However, back into v12.2.9 primary
would still reference shard's last_complete field when trying to figure out all
possible locations of a currently missing object (see PG::MissingLoc::add_source_info):

```c++
  if (oinfo.last_complete < need) {
    if (omissing.is_missing(soid)) {
      ldout(pg->cct, 10) << "search_for_missing " << soid << " " << need
                         << " also missing on osd." << fromosd << dendl;
      continue;
    }
  }
```

Hence a wrongly calculated last_complete could then make primary mis-consider
that a specific shard might have the authoritative object it currently
looking for:

```
2019-04-19 06:41:52.904163 7fd4cfb5a700 10 osd.5 pg_epoch: 271 pg[5.6( v 270'943 lc 238'300 (238'300,270'943] local-lis/les=250/251 n=471 ec=223/223 lis/c 250/223 les/
c/f 251/224/0 250/271/229) [5,2] r=0 lpr=271 pi=[223,271)/4 crt=270'943 lcod 226'77 mlcod 0'0 peering m=16 mbc={}] proc_replica_log for osd.2: 5.6( v 270'943 lc 238'30
0 (238'300,270'943] local-lis/les=250/251 n=471 ec=223/223 lis/c 250/223 les/c/f 251/224/0 250/271/229) log((249'563,270'943], crt=270'943) missing(261 may_include_del
etes = 1)
2019-04-19 06:41:52.904645 7fd4cfb5a700 20 osd.5 pg_epoch: 271 pg[5.6( v 270'943 lc 238'300 (238'300,270'943] local-lis/les=250/251 n=471 ec=223/223 lis/c 250/223 les/
c/f 251/224/0 250/271/229) [5,2] r=0 lpr=271 pi=[223,271)/4 crt=270'943 lcod 226'77 mlcod 0'0 peering m=16 mbc={}]  after missing 5:624c3a7a:::benchmark_data_smithi190
_39968_object1382:head need 226'110 have 0'0
2019-04-19 06:41:53.567820 7fd4d035b700 10 osd.5 pg_epoch: 272 pg[5.6( v 270'943 lc 0'0 (238'300,270'943] local-lis/les=271/272 n=471 ec=223/223 lis/c 250/223 les/c/f
251/224/0 250/271/229) [5,2] r=0 lpr=271 pi=[223,271)/4 crt=270'943 lcod 226'77 mlcod 0'0 unknown m=16 u=13 mbc={255={(1+0)=220,(2+0)=28}}] search_for_missing 5:624c3a
7a:::benchmark_data_smithi190_39968_object1382:head 226'110 is on osd.2
```

note that ```5:624c3a7a:::benchmark_data_smithi190_39968_object1382:head 226'110```
was actually missing on both primary and shard osd.2 whereas primary insisted that
object should exist on shard osd.2!

#26175 posted an indirect fix
for the above problem by ignoring last_complete when checking the missing set,
but it should generally make more sense to fill in the last_complete field correctly
whenever possible.
Hence coming this additional fix.

Fixes: http://tracker.ceph.com/issues/26958
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit aad5d47)
@tchaikov tchaikov added this to the mimic milestone May 28, 2019
@smithfarm smithfarm added the core label May 28, 2019
@smithfarm smithfarm requested review from jdurgin and neha-ojha May 28, 2019
@yuriw

This comment has been minimized.

Copy link
Contributor

yuriw commented Jun 26, 2019

@yuriw yuriw merged commit f2c799a into ceph:mimic Jul 1, 2019
4 checks passed
4 checks passed
Docs: build check OK - docs built
Details
Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
make check make check succeeded
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.