New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix asserts caused by DNE pgs left behind after lots of OSD restarts #20571
Conversation
So all these problems were caused by a partly-imported PG? Several of the backtraces on that ticket are once the OSD is already running, after OSD::handle_advance_map gets into PG::start_peering_interval. (And one of them reports it migrated to other OSDs, as they started removing OSDs...but maybe there was a whole stale set which had started but not finished deleting the PG?) Are you sure we want to delete the collection, though? Or I guess if |
@gregsfortytwo Yes, a DNE PG is empty unless this is a pg_info_t::pg_history_t corruption. We've noticed these DNE PGs after many crashes which we assumed was newly started backfill or recovery. I didn't figure out what would leave behind a newly created PG in DNE state (split?). I tested my fix by hacking ceph-objectstore-tool to clear epoch_created. |
@gregsfortytwo Per recent logs for crash start_peering_interval() assert, the DNE pg is also appears empty because last_update.version == 0. 2018-02-14 03:05:02.468466 7f92379c7700 10 osd.46 pg_epoch: 40258 pg[7.77s5( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c 0/0 les/c/f 0/0/0 0/0/0) [51,41,53,23,25,9,58,31]/[51,2147483647,53,33,38,59,2147483647,4] r=-1 lpr=40258 pi=[28396,40202)/9 crt=0'0 unknown NOTIFY] Not blocking outgoing recovery messages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a simple enough fix, then. Just one nit.
src/osd/OSD.cc
Outdated
service.pg_remove_epoch(pg->pg_id); | ||
pg->unlock(); | ||
// Delete pg | ||
RWLock::WLocker l(pg_map_lock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it matters at this point in the boot process, but we need to drop this before invoking recursive_remove_collection() since that prompts disk accesses. I'd just put a block around the pg_map modifying bits.
@gregsfortytwo Dropped pg_map_lock before recursive_remove_collection() call. |
LGTM |
{ | ||
// Delete pg | ||
RWLock::WLocker l(pg_map_lock); | ||
auto p = pg_map.find(pg->get_pgid()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a pg is returned by _open_lock_pg()
, i think we can assume that this pg is always added to pg_map
, am i right? so this check is not necessary, and can be replaced with an assert() i guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tchaikov Changed to an assert
Fixes: http://tracker.ceph.com/issues/21833 Signed-off-by: David Zafman <dzafman@redhat.com>
@tchaikov I don't know that another QA run is really necessary. This change passed make check and run-standalone.sh. I don't think rados suite would even go through this code path. I'll manually test this again, remove needs-qa and then merge. |
No description provided.