Wip rebuild monstore #10933
Force-pushed from ab164e2 to f6e8134.
@@ -33,6 +33,7 @@ install(PROGRAMS

add_executable(ceph-objectstore-tool
  ceph_objectstore_tool.cc
  rebuild_mondb.cc
Probably also needs to go in automake?
will do. thought we were ditching off automake =)
@gregsfortytwo Can you take a look?
Recovery using OSDs
-------------------

But what if all monitors fails at the same time? Since users are encouraged to
s/fails/fail/
Paging @jecluis as well. ;)
Symptoms of store corruption
----------------------------

Ceph monitor stores the `cluster map`_ in a key/value store. If a monitor in a
We might want to say something like "a key/value store such as LevelDB".
I realize it's configurable and we might switch to RocksDB, but I think in the medium term anybody looking at this is going to know that LevelDB got messed up and not seeing that this is a repair tool for that case will confuse them. (Avoiding the name is especially weird since we are including LevelDB error messages below.)
Still reading code, but I don't see any tests for this. I just know it's going to break every time we change storage stuff (like with ceph-mgr or something, for instance) and we aren't going to notice if the nightlies don't tell us. So, to start with, we want a basic scenario where we append a "rebuild" task at the end of one (or several) normal RADOS tests. We also want at least one where we have an OSD we deliberately keep out-of-date, like Sam mentioned above, and one we'll write when the FS repair stuff is done. Perhaps others, but that's all I've got off the top of my head.
    const OSDSuperblock& sb,
    MonitorDBStore& ms)
{
  // stolen from AuthMonitor::prepare_command(), where prefix is "auth add"
It sounds like you actually copied the functions? If so we should definitely pull them out into a shared cc file or something, rather than having two copies that can diverge with different behavior or bugs!
not really, i adapted them.
@gregsfortytwo can we leave the refactor for another PR? as i need to restructure monitor carefully to cater for the needs on both sides.
Otherwise looks fine to me (although I only skimmed the pgmap updates).
Force-pushed from f6e8134 to 5e09f33.
all comments addressed except for #10933 (comment) and #10933 (comment). i am working on a ceph-qa-suite task to
jenkins, retest this please.
Force-pushed from b401aea to 2b5e2d2.
test added at ceph/ceph-qa-suite#1169
Force-pushed from f5cfc21 to 053f5a9.
@athanatos could you take a look again? thanks!
unsigned ntrimmed = 0;
{
  auto t = make_shared<MonitorDBStore::Transaction>();
  for (auto e = first_committed; e && e < sb.oldest_map; e++) {
Why e && e < oldest_map? What does first_committed = 0 mean?
if there is no such k/v pair in store.db, ms.get(prefix, key) will simply return 0. this creates the illusion for ceph-objectstore-tool that there is always an osdmap#0 in store.db even if there is nothing in it. in the real world, the osdmap starts at 1. i will add some comments to the code to explain this.
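To illustrate the point about the zero sentinel, here is a minimal sketch. `MockStore` and `epochs_to_trim` are hypothetical names made up for this example and are not part of Ceph; the only behaviour borrowed from the discussion is that a missing key reads back as 0 and that real osdmap epochs start at 1.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the monitor store (not MonitorDBStore).
// A missing key reads as 0, mirroring the behaviour described above.
struct MockStore {
  std::map<std::pair<std::string, std::string>, uint64_t> kv;

  uint64_t get(const std::string& prefix, const std::string& key) const {
    auto it = kv.find(std::make_pair(prefix, key));
    return it == kv.end() ? 0 : it->second;
  }
};

// Number of epochs a trim would erase, given the oldest epoch the OSD
// still holds. Returns 0 when the store has no "first_committed" key.
uint64_t epochs_to_trim(const MockStore& ms, uint64_t oldest_map) {
  const uint64_t first_committed = ms.get("osdmap", "first_committed");
  if (first_committed == 0) {
    // Key absent: there is no osdmap#0, so there is nothing to trim.
    return 0;
  }
  return first_committed < oldest_map ? oldest_map - first_committed : 0;
}
```

Without the zero check, an empty store would look as if it held epochs 0 through `oldest_map - 1`, and the trim would queue erasures for maps that never existed.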
Force-pushed from b7cd123 to 8ad055e.
@athanatos mind taking a look again? thanks as always.
for (auto e = first_committed; first_committed && e < sb.oldest_map; e++) {
  t->erase(prefix, e);
  t->erase(prefix, ms.combine_strings("full", e));
  t->put(prefix, first_committed_name, e + 1);
Probably do this once at the end?
will do.
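A rough sketch of what "once at the end" could look like, under stated assumptions: `MockTransaction` and `trim_old_maps` are hypothetical and only record operations, the `"full_"` key format is a guess at what `combine_strings` produces, and none of this is the code that was eventually merged.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical transaction that only records what was queued; it is not
// MonitorDBStore::Transaction.
struct MockTransaction {
  std::vector<std::string> erased;
  std::vector<std::pair<std::string, uint64_t>> puts;

  void erase(const std::string& prefix, const std::string& key) {
    erased.push_back(prefix + "/" + key);
  }
  void put(const std::string& prefix, const std::string& key, uint64_t v) {
    puts.emplace_back(prefix + "/" + key, v);
  }
};

// Queue erasure of epochs [first_committed, oldest_map) and record the new
// first_committed a single time after the loop. Returns the epochs trimmed.
unsigned trim_old_maps(MockTransaction& t,
                       uint64_t first_committed,
                       uint64_t oldest_map) {
  unsigned ntrimmed = 0;
  for (uint64_t e = first_committed; first_committed && e < oldest_map; e++) {
    t.erase("osdmap", std::to_string(e));
    t.erase("osdmap", "full_" + std::to_string(e));  // assumed key format
    ntrimmed++;
  }
  if (ntrimmed > 0) {
    // The per-iteration put of e + 1 would end at oldest_map, so write
    // that final value once.
    t.put("osdmap", "first_committed", oldest_map);
  }
  return ntrimmed;
}
```

The final stored value is the same as with the per-iteration `put`, since the last iteration writes `(oldest_map - 1) + 1`; the transaction just carries one write instead of one per trimmed epoch.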
Force-pushed from 8ad055e to cee230e.
Force-pushed from cee230e to 0fddc11.
LGTM, @gregsfortytwo One last look?
per the discussion with @jecluis
@jecluis i reconsidered your suggestion on pulling only the meta directory, i think it would be easier to just run this tool on the OSD side:
Force-pushed from 0fddc11 to f1b45f5.
so ceph-objectstore-tool is able to use it when rebuilding monitor db. Fixes: http://tracker.ceph.com/issues/17179 Signed-off-by: Kefu Chai <kchai@redhat.com>
Fixes: http://tracker.ceph.com/issues/17179 Signed-off-by: Kefu Chai <kchai@redhat.com>
Fixes: http://tracker.ceph.com/issues/17179 Signed-off-by: Kefu Chai <kchai@redhat.com>
document the process to recover from leveldb corruption. Fixes: http://tracker.ceph.com/issues/17179 Signed-off-by: Kefu Chai <kchai@redhat.com>
Force-pushed from f1b45f5 to 79a9f29.
test passed at http://pulpito.ceph.com/kchai-2016-09-15_15:48:11-rados:singleton-wip-kefu-testing---basic-mira/ . @jecluis @athanatos @gregsfortytwo, thanks for your reviews. is this changeset good to merge?
It looks good to me; it would probably be good to get a final ack from @gregsfortytwo.
This all looks great to me. Reviewed-by:
Thanks @tchaikov! :)
Fixes: http://tracker.ceph.com/issues/17179