Wip rebuild monstore #10933

tchaikov · 2016-08-31T10:24:55Z

Fixes: http://tracker.ceph.com/issues/17179

athanatos · 2016-08-31T14:26:27Z

src/tools/CMakeLists.txt

@@ -33,6 +33,7 @@ install(PROGRAMS

 add_executable(ceph-objectstore-tool
  ceph_objectstore_tool.cc
+  rebuild_mondb.cc


Probably also needs to go in automake?

will do. thought we were ditching automake =)

athanatos · 2016-08-31T14:26:58Z

@gregsfortytwo Can you take a look?

tchaikov · 2016-08-31T15:26:23Z

doc/rados/troubleshooting/troubleshooting-mon.rst

+Recovery using OSDs
+-------------------
+
+But what if all monitors fails at the same time? Since users are encouraged to


s/fails/fail/

gregsfortytwo · 2016-08-31T22:05:52Z

Paging @jecluis as well. ;)

gregsfortytwo · 2016-08-31T22:08:08Z

doc/rados/troubleshooting/troubleshooting-mon.rst

+Symptoms of store corruption
+----------------------------
+
+Ceph monitor stores the `cluster map`_ in a key/value store. If a monitor in a


We might want to say something like "a key/value store such as LevelDB".

I realize it's configurable and we might switch to RocksDB, but I think in the medium term anybody looking at this is going to know that LevelDB got messed up and not seeing that this is a repair tool for that case will confuse them. (Avoiding the name is especially weird since we are including LevelDB error messages below.)

gregsfortytwo · 2016-08-31T22:41:49Z

Still reading code, but I don't see any tests for this. I just know it's going to break every time we change storage stuff (like with ceph-mgr or something, for instance) and we aren't going to notice if the nightlies don't tell us.

So a basic scenario where we append a "rebuild" task at the end of one (or several) normal RADOS tests to start with. At least one where we have an OSD we deliberately keep out-of-date like Sam mentioned above. One we'll write when the FS repair stuff is done. Perhaps others but that's all I've got off the top of my head.

gregsfortytwo · 2016-08-31T22:57:18Z

src/tools/rebuild_mondb.cc

+                       const OSDSuperblock& sb,
+                       MonitorDBStore& ms)
+{
+  // stolen from AuthMonitor::prepare_command(), where prefix is "auth add"


It sounds like you actually copied the functions? If so we should definitely pull them out into a shared cc file or something, rather than having two copies that can diverge with different behavior or bugs!

not really, i adapted them.

@gregsfortytwo can we leave the refactor for another PR? as i need to restructure monitor carefully to cater for the needs on both sides.

gregsfortytwo · 2016-08-31T23:22:42Z

Otherwise looks fine to me (although I only skimmed the pgmap updates).

tchaikov · 2016-09-01T08:46:39Z

all comments addressed except for #10933 (comment) and #10933 (comment).

i am working on a ceph-qa-suite task to

nuke all store.db
rebuild one from OSDs
restore it to the first mon, mkfs on other mons
revive them

tchaikov · 2016-09-07T15:24:43Z

jenkins, retest this please.

tchaikov · 2016-09-08T11:57:17Z

changelog

rebased against master
ceph-monstore-tool: instead of getting the client.admin key and adding an almighty keyring, import all keyrings from the --keyring file to ready the mon.

tchaikov · 2016-09-08T13:38:20Z

test added at ceph/ceph-qa-suite#1169

tchaikov · 2016-09-12T02:35:59Z

changelog

rebase against master to resolve conflicts.

@athanatos could you take a look again? thanks!

athanatos · 2016-09-12T06:10:58Z

src/tools/rebuild_mondb.cc

+  unsigned ntrimmed = 0;
+  {
+    auto t = make_shared<MonitorDBStore::Transaction>();
+    for (auto e = first_committed; e && e < sb.oldest_map; e++) {


Why e && e < oldest_map? What does first_committed = 0 mean?

if there is no such k/v pair in store.db, ms.get(prefix, key) will simply return 0. this creates the illusion for ceph-objectstore-tool that there is always an osdmap#0 in store.db, even if there is nothing in it. in the real world, the osdmap starts at 1. i will add some comments to the code to explain this.
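
The guard discussed above can be illustrated with a small sketch. `ToyStore` and `epochs_to_trim` are hypothetical names invented for this example and are not part of the patch. The only behaviour taken from the discussion is that a missing key reads back as 0 and that osdmap epochs start at 1.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the monitor store: a missing key reads back as 0,
// which is the behaviour described in the reply above.
struct ToyStore {
  std::map<std::pair<std::string, std::string>, uint64_t> kv;

  uint64_t get(const std::string& prefix, const std::string& key) const {
    auto it = kv.find(std::make_pair(prefix, key));
    return it == kv.end() ? 0 : it->second;
  }
};

// Return the epochs that would be trimmed: [first_committed, oldest_map).
// first_committed == 0 means "key absent", not "osdmap#0 exists", so nothing
// is trimmed in that case.
std::vector<uint64_t> epochs_to_trim(const ToyStore& store,
                                     uint64_t oldest_map) {
  std::vector<uint64_t> out;
  const uint64_t first_committed = store.get("osdmap", "first_committed");
  if (first_committed == 0) {
    return out;  // empty store: epochs start at 1, there is no epoch 0
  }
  for (uint64_t e = first_committed; e < oldest_map; ++e) {
    out.push_back(e);
  }
  return out;
}
```

With an empty store the function returns no epochs, which is why the loop condition in the patch tests for a non-zero value before comparing against `oldest_map`.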

tchaikov · 2016-09-12T08:54:31Z

changelog

add comment explaining why we don't want to trim osdmap#0.
do not apply transaction if it's empty.
make sure the store.db is consistent after each transaction adding osdmaps.

@athanatos mind taking a look again? thanks as always.

athanatos · 2016-09-12T09:19:34Z

src/tools/rebuild_mondb.cc

+  for (auto e = first_committed; first_committed && e < sb.oldest_map; e++) {
+    t->erase(prefix, e);
+    t->erase(prefix, ms.combine_strings("full", e));
+    t->put(prefix, first_committed_name, e + 1);


Probably do this once at the end?

tchaikov · 2016-09-12T09:56:12Z

changelog

update the "first_committed" once at the end.

tchaikov · 2016-09-12T14:15:24Z

changelog

remove trailing spaces.

@athanatos

tchaikov · 2016-09-12T18:19:01Z

test passed at http://pulpito.front.sepia.ceph.com:80/kchai-2016-09-12_18:18:40-rados:singleton-wip-kefu-testing---basic-mira/

athanatos · 2016-09-13T03:50:33Z

LGTM, @gregsfortytwo One last look?

tchaikov · 2016-09-14T13:59:33Z

per the discussion with @jecluis

import from OSD's keyring also (already implemented, and documented). the path is hardwired to ${data_path}/keyring.
check the crc of osdmaps
consider rsync'ing and using the meta directory only.

@jecluis i reconsidered your suggestion on pulling only the meta directory. i think it would be easier to just run this tool on the OSD side:

from the developer's point of view: we can reuse the interface exposed by ObjectStore, so it works even with bluestore.
from the user's point of view: yes, it's unnecessary to copy the store.db back and forth. but the process is scriptable, so i guess it's less painful than it sounds when performing disaster recovery. if it takes 5 seconds to collect from one OSD and there are 1000 OSD instances, it would take less than 1.5 hours to rebuild the mon db.
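
The estimate above can be checked with a few lines of arithmetic. This is a sketch only: `rebuild_hours` is a hypothetical helper, it assumes the OSDs are processed strictly one after another, and the 5-second and 1000-OSD figures are the illustrative numbers from the comment, not measurements.

```cpp
#include <cassert>

// Total sequential collection time in hours, given a per-OSD cost in seconds.
double rebuild_hours(unsigned num_osds, double seconds_per_osd) {
  assert(seconds_per_osd >= 0.0);
  return num_osds * seconds_per_osd / 3600.0;
}
```

For 1000 OSDs at 5 seconds each this is 5000 seconds, or roughly 1.39 hours, which is consistent with the "less than 1.5 hours" figure.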

tchaikov · 2016-09-15T09:57:57Z

changelog

rebase against master.
check the crc of osdmaps before putting them into store.db.
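
As an illustration of the idea of verifying a checksum before storing a map, here is a minimal sketch. It uses a plain bitwise CRC-32C over an opaque byte string. How the real tool obtains and compares the osdmap CRC is not shown in this thread, so `crc32c`, `put_if_crc_ok`, and the map-based store below are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78,
// initial value and final XOR of 0xFFFFFFFF.
uint32_t crc32c(const std::string& data) {
  uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
      // Conditionally XOR in the polynomial when the low bit is set.
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Store the blob under its epoch only when the checksum matches; a mismatch
// leaves the store untouched and reports failure to the caller.
bool put_if_crc_ok(std::map<uint64_t, std::string>& store, uint64_t epoch,
                   const std::string& blob, uint32_t expected_crc) {
  if (crc32c(blob) != expected_crc) {
    return false;
  }
  store[epoch] = blob;
  return true;
}
```

Rejecting a blob before it reaches the store keeps a corrupted map on one OSD from being copied into the rebuilt monitor database.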

tchaikov · 2016-09-15T16:55:02Z

test passed at http://pulpito.ceph.com/kchai-2016-09-15_15:48:11-rados:singleton-wip-kefu-testing---basic-mira/ .

@jecluis @athanatos @gregsfortytwo, thanks for your reviews.

is this changeset good to merge?

athanatos · 2016-09-16T04:23:53Z

It looks good to me; it would probably be good to get a final ack from @gregsfortytwo.

gregsfortytwo

This all looks great to me. Reviewed-by:

Thanks @tchaikov! :)

tchaikov added feature tools labels Aug 31, 2016

tchaikov assigned athanatos Aug 31, 2016

tchaikov force-pushed the wip-rebuild-monstore branch 3 times, most recently from ab164e2 to f6e8134 Compare August 31, 2016 11:13

athanatos reviewed Aug 31, 2016

athanatos assigned gregsfortytwo Aug 31, 2016

tchaikov reviewed Aug 31, 2016

gregsfortytwo assigned jecluis Aug 31, 2016

gregsfortytwo reviewed Aug 31, 2016

tchaikov force-pushed the wip-rebuild-monstore branch from f6e8134 to 5e09f33 Compare September 1, 2016 08:43

tchaikov force-pushed the wip-rebuild-monstore branch 2 times, most recently from b401aea to 2b5e2d2 Compare September 8, 2016 11:54

tchaikov mentioned this pull request Sep 8, 2016

tasks: add rebuild_mondb ceph/ceph-qa-suite#1169

Merged

tchaikov force-pushed the wip-rebuild-monstore branch from f5cfc21 to 053f5a9 Compare September 12, 2016 02:35

athanatos reviewed Sep 12, 2016

tchaikov force-pushed the wip-rebuild-monstore branch 2 times, most recently from b7cd123 to 8ad055e Compare September 12, 2016 08:53

athanatos reviewed Sep 12, 2016

tchaikov force-pushed the wip-rebuild-monstore branch from 8ad055e to cee230e Compare September 12, 2016 09:55

tchaikov force-pushed the wip-rebuild-monstore branch from cee230e to 0fddc11 Compare September 12, 2016 14:12

tchaikov force-pushed the wip-rebuild-monstore branch from 0fddc11 to f1b45f5 Compare September 15, 2016 07:39

tchaikov added 4 commits September 15, 2016 17:56

mon/AuthMonitor: make AuthMonitor::IncType public

19ef4f1

so ceph-objectstore-tool is able to use it when rebuilding monitor db. Fixes: http://tracker.ceph.com/issues/17179 Signed-off-by: Kefu Chai <kchai@redhat.com>

tools/ceph-objectstore-tool: add "update-mon-db" command

24faea7

Fixes: http://tracker.ceph.com/issues/17179 Signed-off-by: Kefu Chai <kchai@redhat.com>

tools/ceph_monstore_tool: add "rebuild" command

d909fa0

Fixes: http://tracker.ceph.com/issues/17179 Signed-off-by: Kefu Chai <kchai@redhat.com>

doc: add rados/operations/disaster-recovery.rst

79a9f29

document the process to recover from leveldb corruption. Fixes: http://tracker.ceph.com/issues/17179 Signed-off-by: Kefu Chai <kchai@redhat.com>

tchaikov force-pushed the wip-rebuild-monstore branch from f1b45f5 to 79a9f29 Compare September 15, 2016 09:57

gregsfortytwo approved these changes Sep 17, 2016


tchaikov merged commit 19acda7 into ceph:master Sep 17, 2016

tchaikov deleted the wip-rebuild-monstore branch September 17, 2016 13:29
