os/ObjectStore: properly clear object map when replaying OP_REMOVE #11388

Merged
2 commits merged into ceph:master on Oct 24, 2016

ukernel
Contributor

@ukernel ukernel commented Oct 10, 2016

To remove an object, filestore needs to unlink the corresponding object
file from the filesystem and remove the corresponding object keys from
DBObjectMap. When replaying an OP_REMOVE operation, it's possible that the
operation had only partially completed: the object file has been deleted, but
the object keys in DBObjectMap have not.

The fix is to force-clear the object keys if the object file does not exist.

Fixes: http://tracker.ceph.com/issues/17177
Signed-off-by: Yan, Zheng <zyan@redhat.com>

fdcache.clear(o);
return 0;
- } else if (hardlink == 1) {
+ if (hardlink == 0 || hardlink == 1) {
Contributor

we always clear omap first, and then unlink the object file. so, if the hardlink is 1 here, we should have cleared the omap already, am i right?

Contributor Author

why? we get object file's hardlink count before clearing omap and unlinking object file. hardlink == 1 is the most common case

Member

@liewegas liewegas Oct 14, 2016

So to clarify, the situation is

  • clear omap
  • unlink
  • unlink persists to disk, but omap does not
  • crash
  • replay sees hardlink 0 and doesn't clear omap

?

Contributor Author

yes
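
The sequence confirmed above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not Ceph code: `ToyStore`, `replay_remove`, and the `clear_when_missing` flag are hypothetical names, with a map of link counts standing in for the filesystem and a set standing in for DBObjectMap.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy model, NOT Ceph code. All names here are hypothetical.
struct ToyStore {
  std::map<std::string, int> files;  // object name -> hard link count
  std::set<std::string> omap;        // objects that still have omap keys

  int link_count(const std::string& o) const {
    auto it = files.find(o);
    return it == files.end() ? 0 : it->second;
  }

  // Replays a remove. clear_when_missing = false models the behaviour
  // discussed as buggy; true models the behaviour proposed in this PR.
  void replay_remove(const std::string& o, bool clear_when_missing) {
    int hardlink = link_count(o);
    if (hardlink == 0 && !clear_when_missing) {
      return;  // file is gone, so the old path assumes nothing is left
    }
    if (hardlink == 0 || hardlink == 1) {
      omap.erase(o);  // last link, or no file at all: drop the keys
    }
    if (hardlink > 0 && --files[o] == 0) {
      files.erase(o);  // model unlinking the last link
    }
  }
};

// State after the crash in the thread: the unlink reached disk,
// the omap clear did not.
inline bool stale_keys_after_replay(bool clear_when_missing) {
  ToyStore s;
  s.omap.insert("obj");  // keys survived; no entry in s.files
  s.replay_remove("obj", clear_when_missing);
  return s.omap.count("obj") > 0;
}

inline void check_remove_replay() {
  assert(stale_keys_after_replay(false));  // old path leaves stale keys
  assert(!stale_keys_after_replay(true));  // new path clears them
}
```

Calling `check_remove_replay()` exercises both paths. The model deliberately ignores everything else the real remove path does (fd cache, throttling, sequencer positions).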

…_RENAME

FileStore::_close_replay_guard does not sync the object map. If the OSD
crashes while executing FileStore::_collection_move_rename, it's
possible that the replay guard is set but the object map update
gets lost. When recovering, the OSD checks the replay guard and does
nothing.

The fix is to sync the object map in FileStore::_close_replay_guard().

Signed-off-by: Yan, Zheng <zyan@redhat.com>
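
The failure mode in this commit message can be sketched with a toy model as well. It is a simplification under stated assumptions, not the FileStore implementation: `ToyKV`, `ToyGuardedStore`, and the `sync_map` flag are hypothetical, the replay guard is assumed to be durable as soon as it is set, and object map writes are assumed to sit in a volatile buffer until `sync()` is called.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model, NOT Ceph code. All names here are hypothetical.
struct ToyKV {
  std::map<std::string, std::string> durable;  // survives a crash
  std::map<std::string, std::string> pending;  // lost on a crash

  void set(const std::string& k, const std::string& v) { pending[k] = v; }
  void sync() {
    for (const auto& kv : pending) durable[kv.first] = kv.second;
    pending.clear();
  }
  void crash() { pending.clear(); }
};

struct ToyGuardedStore {
  ToyKV object_map;
  bool guard_set = false;  // assumed durable once written

  void close_replay_guard(bool sync_map) {
    if (sync_map) object_map.sync();  // the step this commit adds
    guard_set = true;
  }

  void move_rename(bool sync_map) {
    if (guard_set) return;  // replay: guard says the op already ran
    object_map.set("new_obj", "keys");
    close_replay_guard(sync_map);
  }
};

// Runs the op, crashes, replays, and reports whether the update survived.
inline bool map_update_survives(bool sync_map) {
  ToyGuardedStore s;
  s.move_rename(sync_map);
  s.object_map.crash();
  s.move_rename(sync_map);  // replay after restart
  return s.object_map.durable.count("new_obj") > 0;
}

inline void check_guard_replay() {
  assert(!map_update_survives(false));  // guard set, update lost
  assert(map_update_survives(true));    // sync before closing the guard
}
```

Without the sync, the replay is skipped because the guard is already set, so the lost update is never reapplied. With the sync, the update is durable before the guard can suppress the replay.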
@ukernel
Contributor Author

ukernel commented Oct 14, 2016

ping @athanatos @liewegas @tchaikov

@liewegas
Member

lgtm!

@athanatos
Contributor

lgtm! (Whoa, good catch!)

@tchaikov
Contributor

lgtm also.

@badone
Contributor

badone commented Oct 18, 2016

Thanks @ukernel Hopefully we can get this merged soon.

@yuriw yuriw merged commit 73a1b45 into ceph:master Oct 24, 2016
@ukernel ukernel deleted the wip-17177 branch January 12, 2017 01:44
6 participants