osd: Corrupt objects stop snaptrim and mark pg snaptrim_error #15635

Merged
jdurgin merged 2 commits into ceph:master from dzafman:wip-13837 on Jun 28, 2017

3 participants
@dzafman
Member

dzafman commented Jun 12, 2017

No description provided.

@dzafman

Member

dzafman commented Jun 12, 2017

In Progress

      ldout(pg->cct, 10) << "waiting for it to clear"
                         << dendl;
      return transit< WaitRWLock >();
    } else {
      // XXX: This can't happen anymore

@liewegas

liewegas Jun 12, 2017

Member

remove it then?

  // XXX: Caller doesn't expect this
  if (obc->ssc == NULL)
    return ObjectContextRef(); // -ENOENT!

@liewegas

liewegas Jun 12, 2017

Member

Maybe derr here? It makes me a bit nervous to silently turn this into an ENOENT when we can't load the snapset.

@liewegas liewegas added the core label Jun 20, 2017

osd: On errors during snaptrim stop so pg can be repaired
Fix get_object_context() to return errors for trim_object()
Have trim_object() return errors to trimming caller
Add pg state snaptrim_error to indicate that a trim has been stopped
Remove queue_snap_trim bool, just check snap_trimq after scrub finishes
Catch errors from snapset decoding in get_snapset_context()

Fixes: http://tracker.ceph.com/issues/13837

Signed-off-by: David Zafman <dzafman@redhat.com>
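
The error flow described in this commit message can be sketched in isolation. This is a toy model, not the actual Ceph code: `ToyStore`, `ToyPG`, `trim_all`, the simplified signatures, and the choice of `-EIO` for an undecodable snapset are all assumptions made for illustration. It only shows the shape of the change, where a lookup failure is returned to the trimming caller, which stops and flags the PG instead of continuing.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real OSD types.
struct SnapSetContext {};
struct ObjectContext { std::shared_ptr<SnapSetContext> ssc; };
using ObjectContextRef = std::shared_ptr<ObjectContext>;

// Toy object store: object name -> whether its snapset decodes cleanly.
struct ToyStore { std::map<std::string, bool> objects; };

// Toy PG: records trimmed objects and a snaptrim_error-like flag.
struct ToyPG {
  bool snaptrim_error = false;
  std::vector<std::string> trimmed;
};

// Returns 0 and fills *out on success, or a negative errno on failure,
// rather than handing back an empty reference the caller cannot interpret.
int get_object_context(const ToyStore& store, const std::string& oid,
                       ObjectContextRef* out) {
  auto it = store.objects.find(oid);
  if (it == store.objects.end())
    return -ENOENT;              // object does not exist
  if (!it->second)
    return -EIO;                 // assumed error code for a corrupt snapset
  auto obc = std::make_shared<ObjectContext>();
  obc->ssc = std::make_shared<SnapSetContext>();
  *out = obc;
  return 0;
}

// Trims one object; propagates lookup errors to the caller.
int trim_object(const ToyStore& store, const std::string& oid, ToyPG& pg) {
  ObjectContextRef obc;
  int r = get_object_context(store, oid, &obc);
  if (r < 0)
    return r;
  assert(obc && obc->ssc);       // success implies a usable context
  pg.trimmed.push_back(oid);
  return 0;
}

// Trimming loop: stops at the first error and marks the PG so that
// the problem is visible and the PG can be repaired.
int trim_all(const ToyStore& store, const std::vector<std::string>& to_trim,
             ToyPG& pg) {
  for (const auto& oid : to_trim) {
    int r = trim_object(store, oid, pg);
    if (r < 0) {
      pg.snaptrim_error = true;
      return r;
    }
  }
  return 0;
}
```

In this sketch, trimming objects `a`, `b`, `c` where `b` has a corrupt snapset trims only `a`, returns the error from `b`, and leaves the flag set; `c` is not touched until the PG has been repaired.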

@dzafman dzafman changed the title from DNM: osd: Skip really corrupt objects in trim_object() to osd: Skip really corrupt objects in trim_object() Jun 23, 2017

@dzafman

Member

dzafman commented Jun 23, 2017

Remaining issue: if a "snapset" is missing and repair fixes it, the object context cache is not refreshed, so the snap trimmer still can't complete. A reboot of the OSD cleans up the PG.

@dzafman

Member

dzafman commented Jun 23, 2017

retest this please

@dzafman dzafman changed the title from osd: Skip really corrupt objects in trim_object() to osd: Corrupt objects stop snaptrim and mark pg snaptrim_error Jun 23, 2017

osd: Clear object context cache to get repair information
Signed-off-by: David Zafman <dzafman@redhat.com>
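
Why a stale cache blocks the trimmer even after a successful repair can be shown with a small model. This is a hedged sketch, not the OSD's real object context cache: `ToyDisk`, `ToyObc`, and `ToyObcCache` are hypothetical names, and the real cache is considerably more involved. The point is only that a cached entry built while the snapset was missing keeps being returned until the cache is cleared.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical on-disk state: object name -> snapset present and valid.
struct ToyDisk { std::map<std::string, bool> snapset_ok; };

// Hypothetical cached context: remembers what was seen at load time.
struct ToyObc { bool has_snapset; };

class ToyObcCache {
  std::map<std::string, std::shared_ptr<ToyObc>> cache_;

public:
  // Returns the cached context if present; otherwise loads from disk
  // and caches the result, including a "snapset missing" result.
  std::shared_ptr<ToyObc> lookup(const ToyDisk& disk, const std::string& oid) {
    auto it = cache_.find(oid);
    if (it != cache_.end())
      return it->second;
    auto d = disk.snapset_ok.find(oid);
    bool ok = (d != disk.snapset_ok.end()) && d->second;
    auto obc = std::make_shared<ToyObc>(ToyObc{ok});
    cache_[oid] = obc;
    assert(cache_.count(oid) == 1);  // entry is now cached
    return obc;
  }

  // Drops every cached context so the next lookup re-reads the disk.
  void clear() { cache_.clear(); }
};
```

In this model, a lookup made before repair caches `has_snapset == false`; after the on-disk state is fixed, lookups still return the stale entry until `clear()` is called, which mirrors why clearing the cache lets the trimmer see the repaired information without restarting the OSD.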
@dzafman

Member

dzafman commented Jun 28, 2017

dzafman-2017-06-26_14:07:20-rados-wip-13837-distro-basic-smithi

Testing passed.

The Shaman "Ubuntu Xenial notcmalloc" build failed, which explains the failed package fetches below.

Failed
1328151 Failed to fetch package
1328164 "cephtest/swift/test/functional -v -a ''!fails_on_rgw''" seen in master
1328203 Out of space Tracker #20422
1328248 Failed to fetch package
1328254 cp: cannot stat ‘/var/log/audit/audit.log’: No such file or directory
1328282 Out of space Tracker #20422
1328316 Out of space Tracker #20422
1328339 Failed to fetch package
1328360 Failed to fetch package

Dead
1328310 Out of space Tracker #20422
1328370 Stuck waiting for clean early on. Tracker #20439

@jdurgin jdurgin merged commit 27569da into ceph:master Jun 28, 2017

4 checks passed

Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
default: Build finished.
make check: make check succeeded

@dzafman dzafman deleted the dzafman:wip-13837 branch Jun 29, 2017
