osd/PrimaryLogPG: fix recovering hang when have unfound objects #16558

Merged

hjwsm1989 commented Jul 25, 2017

pg_log.get_missing() returns a pg_missing_tracker_t.
If we bind it like this:
const pg_missing_t &missing = pg_log.get_missing();
the referenced object (missing) does not change when we modify the
pglog's missing set in recover_primary()/recover_got(), which causes
recovery to hang.

Signed-off-by: huangjun <huangjun@xsky.com>
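
The behavior described above can be reproduced outside Ceph with a small sketch. It assumes the two missing-set types are distinct types related by an implicit conversion, so that a const reference of one type bound to an object of the other type ends up referring to a converted temporary copy rather than to the original. `MissingSet`, `Missing`, `MissingTracker`, `Log`, and the two helper functions are hypothetical stand-ins for illustration, not Ceph code.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Simplified stand-in for a templated missing set (hypothetical names).
template <bool TrackChanges>
class MissingSet {
public:
  MissingSet() = default;

  // Converting constructor from the other instantiation. A constructor like
  // this is what allows a const reference to bind to a temporary copy.
  template <bool Other>
  MissingSet(const MissingSet<Other>& other) : items_(other.items()) {}

  void add(const std::string& oid) { items_[oid] = 1; }
  void got(const std::string& oid) { items_.erase(oid); }
  std::size_t num_missing() const { return items_.size(); }
  const std::map<std::string, int>& items() const { return items_; }

private:
  std::map<std::string, int> items_;
};

using Missing = MissingSet<false>;        // plays the role of pg_missing_t
using MissingTracker = MissingSet<true>;  // plays the role of pg_missing_tracker_t

// Stand-in for the log object that owns the live missing set.
class Log {
public:
  const MissingTracker& get_missing() const { return missing_; }
  void add(const std::string& oid) { missing_.add(oid); }
  void recover_got(const std::string& oid) { missing_.got(oid); }

private:
  MissingTracker missing_;
};

// Mismatched type: the reference binds to a converted temporary (a snapshot
// taken at this point), so later changes to the log's set are not visible.
std::size_t stale_count_after_recovery() {
  Log log;
  log.add("existing_7");
  const Missing& missing = log.get_missing();
  log.recover_got("existing_7");
  return missing.num_missing();
}

// Matching type: the reference aliases the log's own set and sees the update.
std::size_t live_count_after_recovery() {
  Log log;
  log.add("existing_7");
  const auto& missing = log.get_missing();
  log.recover_got("existing_7");
  return missing.num_missing();
}
```

In this sketch the mismatched binding keeps reporting one missing object after it has been recovered, while the matching binding reports none. Declaring the reference with the exact returned type (or `auto`) is one way to avoid the hidden copy; this is offered as an illustration of the pitfall, not as a statement of what the patch changes.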


hjwsm1989 commented Jul 25, 2017

retest this please


jdurgin commented Jul 25, 2017

Which code path will result in a recovery hang?


hjwsm1989 commented Jul 25, 2017

We hit a PG stuck in the recovering state while running the lost_unfound.py test,
and got a weird log like this:
2017-07-09 16:14:06.882851 7f2c28830700 10 osd.0 pg_epoch: 24 pg[1.4( v 24'1288 (12'0,24'1288] local-les=24 n=1246 ec=8 les/c/f 24/15/0 22/23/18) [0,2]/[0] r=0 lpr=23 pi=8-22/5 bft=2 crt=24'1288 lcod 13'6 mlcod 13'6 active+recovering+undersized+degraded+remapped] last_complete now 24'1288 log.complete_to at end
2017-07-09 16:14:06.883383 7f2c28830700 -1 log_channel(cluster) log [ERR] : 1.4 recovery ending with 1: {1:227e0c77:::existing_7:head=24'895(13'6)}

The PG's dout() output shows no missing or unfound objects, yet missing.num_missing() > 0.


scienceluo commented Jul 25, 2017

Hi huang, which version?


hjwsm1989 commented Jul 25, 2017

retest this please


jdurgin commented Jul 25, 2017

Ah, I see what you mean. Nice debugging! @scienceluo this would only affect kraken and later, where the missing_set was refactored with different types to allow persisting it.

@yuriw yuriw merged commit 6942e39 into ceph:master Jul 27, 2017

4 checks passed

Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
make check: make check succeeded
make check (arm64): make check succeeded

@hjwsm1989 hjwsm1989 deleted the hjwsm1989:fix-recovering-hang-with-unfound-objects branch Jul 27, 2017
