jewel: tools: add a tool to rebuild mon store from OSD #11126

Merged
9 commits merged into ceph:jewel from wip-17179-jewel, Oct 20, 2016

tchaikov
Contributor

@tchaikov tchaikov commented Sep 19, 2016

@tchaikov tchaikov added this to the jewel milestone Sep 19, 2016
@tchaikov tchaikov changed the title Wip 17179 jewel -> jewel: add a tool to rebuild mon store from OSD, Sep 19, 2016
@zphj1987
Contributor

[root@lab8106 mon]# /usr/bin/ceph-mon -f --cluster ceph --id lab8106 --setuser ceph --setgroup ceph
starting mon.lab8106 rank 0 at 192.168.8.106:6789/0 mon_data /var/lib/ceph/mon/ceph-lab8106 fsid fa7ec1a1-662a-4ba3-b478-7cb570482b62
/root/rpmbuild/BUILD/ceph-11.0.0-2460-g22053d0/src/mon/PGMonitor.cc: In function 'void PGMonitor::check_osd_map(epoch_t)' thread 7f1f995f2500 time 2016-09-20 15:58:28.203619
/root/rpmbuild/BUILD/ceph-11.0.0-2460-g22053d0/src/mon/PGMonitor.cc: 911: FAILED assert(err == 0)
 ceph version v11.0.0-2460-g22053d0 (22053d057fb03e9c932da2771d7c90556567d1e4)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x55bb3cb61d55]
 2: (PGMonitor::check_osd_map(unsigned int)+0x150d) [0x55bb3cab811d]
 3: (PGMonitor::on_active()+0xf6) [0x55bb3cab8566]
 4: (PaxosService::_active()+0x195) [0x55bb3c9d5655]
 5: (PaxosService::election_finished()+0x7a) [0x55bb3c9d5cda]
 6: (Monitor::win_election(unsigned int, std::set<int, std::less<int>, std::allocator<int> >&, unsigned long, MonCommand const*, int, std::set<int, std::less<int>, std::allocator<int> > const*)+0x246) [0x55bb3c993a16]
 7: (Monitor::win_standalone_election()+0x17f) [0x55bb3c993e5f]
 8: (Monitor::bootstrap()+0xa1b) [0x55bb3c9949cb]
 9: (Monitor::init()+0xea) [0x55bb3c994c2a]
 10: (main()+0x2628) [0x55bb3c8e4958]
 11: (__libc_start_main()+0xf5) [0x7f1f94eccb15]
 12: (()+0x298445) [0x55bb3c958445]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

*** Caught signal (Aborted) **
 in thread 7f1f995f2500 thread_name:ceph-mon
 ceph version v11.0.0-2460-g22053d0 (22053d057fb03e9c932da2771d7c90556567d1e4)
 1: (()+0x68fdea) [0x55bb3cd4fdea]
 2: (()+0xf100) [0x7f1f95a9b100]
 3: (gsignal()+0x37) [0x7f1f94ee05f7]
 4: (abort()+0x148) [0x7f1f94ee1ce8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x55bb3cb61f37]
 6: (PGMonitor::check_osd_map(unsigned int)+0x150d) [0x55bb3cab811d]
 7: (PGMonitor::on_active()+0xf6) [0x55bb3cab8566]
 8: (PaxosService::_active()+0x195) [0x55bb3c9d5655]
 9: (PaxosService::election_finished()+0x7a) [0x55bb3c9d5cda]
 10: (Monitor::win_election(unsigned int, std::set<int, std::less<int>, std::allocator<int> >&, unsigned long, MonCommand const*, int, std::set<int, std::less<int>, std::allocator<int> > const*)+0x246) [0x55bb3c993a16]
 11: (Monitor::win_standalone_election()+0x17f) [0x55bb3c993e5f]
 12: (Monitor::bootstrap()+0xa1b) [0x55bb3c9949cb]
 13: (Monitor::init()+0xea) [0x55bb3c994c2a]
 14: (main()+0x2628) [0x55bb3c8e4958]
 15: (__libc_start_main()+0xf5) [0x7f1f94eccb15]
 16: (()+0x298445) [0x55bb3c958445]

@tchaikov

I tested your tool, found a problem, and want to report it.

I have a cluster with 1 mon and 2 OSDs, and

  • I stop all OSDs and stop the mon
  • rebuild the mon data and start the mon; everything works well
  • then I repeat: stop all OSDs, stop the mon
  • rebuild the mon data and start the mon; this time the mon will not start. Is something wrong?

Starting from a clean cluster, the first rebuild works. Rebuilding again on top of that rebuilt environment, the mon will not start.

@tchaikov
Contributor Author

@zphj1987 thanks for the testing, i will try to repeat your steps tomorrow.

@renhwztetecs
Contributor

@tchaikov
I also tested the wip-17179-jewel branch, and the leader mon hit the same core dump:

2016-09-22 16:49:59.900190 7fab37d8b700 5 mon.node173@0(leader).paxos(paxos active c 1..3) is_readable = 1 - now=2016-09-22 16:49:59.900190 lease_expire=2016-09-22 16:50:04.900174 has v0 lc 3
2016-09-22 16:49:59.916854 7fab37d8b700 -1 mon/PGMonitor.cc: In function 'void PGMonitor::check_osd_map(epoch_t)' thread 7fab37d8b700 time 2016-09-22 16:49:59.900215
mon/PGMonitor.cc: 892: FAILED assert(err == 0)

ceph version 10.2.2.8 (4cf7ed7423032cffc3768f1a091251d3733b26d0)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7fab3eddbaa5]
2: (PGMonitor::check_osd_map(unsigned int)+0x1528) [0x7fab3eb3efe8]
3: (PGMonitor::on_active()+0xf6) [0x7fab3eb3f416]
4: (PaxosService::_active()+0x207) [0x7fab3ea88217]
5: (Context::complete(int)+0x9) [0x7fab3ea54a39]
6: (void finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xac) [0x7fab3ea5b27c]
7: (Paxos::finish_round()+0xd1) [0x7fab3ea7f5c1]
8: (Paxos::commit_finish()+0x656) [0x7fab3ea81df6]
9: (C_Committed::finish(int)+0x2b) [0x7fab3ea8501b]
10: (Context::complete(int)+0x9) [0x7fab3ea54a39]
11: (MonitorDBStore::C_DoTransaction::finish(int)+0xa7) [0x7fab3ea83bc7]
12: (Context::complete(int)+0x9) [0x7fab3ea54a39]
13: (Finisher::finisher_thread_entry()+0x216) [0x7fab3ed020e6]
14: (()+0x7df3) [0x7fab3d308df3]
15: (clone()+0x6d) [0x7fab3bbd33dd]
NOTE: a copy of the executable, or objdump -rdS <executable> is needed to interpret this.

@tchaikov
Contributor Author

@renhwztetecs i am not able to reproduce your issue with the following steps. i repeated them 3 times, no luck.

../src/stop.sh
rm -rf /tmp/rebuild-db; mkdir /tmp/rebuild-db
for i in 0 1 2; do ./bin/ceph-objectstore-tool --data-path dev/osd$i  --op update-mon-db --mon-store-path /tmp/rebuild-db; done
./bin/ceph-monstore-tool /tmp/rebuild-db/ rebuild -- --keyring ./keyring
for i in b c; do  rm -rf dev/mon.$i; done
rm -rf dev/mon.a/store.db/
mv /tmp/rebuild-db/store.db/ dev/mon.a/
for i in b c; do ./bin/ceph-mon --mkfs -c ceph.conf -i $i --keyring=./keyring;done
CEPH_NUM_MON=3 CEPH_NUM_OSD=3 ../src/vstart.sh -x -l -k mon osd
./bin/ceph -s

could you please paste your script so i can repeat it? thanks!

@renhwztetecs
Contributor

renhwztetecs commented Sep 23, 2016

@tchaikov
The cluster information and steps are below.
What did I miss? The environment is still in place, so I can reproduce it.

cluster:

[root@node181 mon]# ceph osd tree
ID WEIGHT  TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 0.51472 root default                                       
-2 0.22887     host node181                                   
 1 0.07629         osd.1         up  1.00000          1.00000 
 2 0.07629         osd.2         up  1.00000          1.00000 
 0 0.07628         osd.0         up  1.00000          1.00000 
-3 0.28586     host node173                                   
 4 0.09529         osd.4         up  1.00000          1.00000 
 5 0.09529         osd.5         up  1.00000          1.00000 
 3 0.09528         osd.3         up  1.00000          1.00000 

steps info

  1. stop all services
    systemctl stop ceph-osd.target
    systemctl stop ceph-mon@node173
    systemctl stop ceph-osd.target
    systemctl stop ceph-mon@node181
  2. create the temporary store and back up the original mon
    mkdir -p /tmp/mon-store node173
    mkdir -p /tmp/mon-store node181
    mv /var/lib/ceph/mon/ceph-node173 /var/lib/ceph/mon/ceph-node173_back
    mv /var/lib/ceph/mon/ceph-node181 /var/lib/ceph/mon/ceph-node181_back
  3. collect the cluster map from OSDs in node173
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3/ --op update-mon-db --mon-store-path /tmp/mon-store/
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4/ --op update-mon-db --mon-store-path /tmp/mon-store/
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5/ --op update-mon-db --mon-store-path /tmp/mon-store/
  4. sync mon-store to node181
    rsync -avz /tmp/mon-store/ 10.118.202.181:/tmp/mon-store/
  5. collect the cluster map from OSDs in node181
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op update-mon-db --mon-store-path /tmp/mon-store/
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --op update-mon-db --mon-store-path /tmp/mon-store/
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2/ --op update-mon-db --mon-store-path /tmp/mon-store/
  6. rebuild the monitor store
    /usr/bin/ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /etc/ceph/ceph.client.admin.keyring
    mkdir -p /var/lib/ceph/mon/ceph-node181
    cp -r /tmp/mon-store/* /var/lib/ceph/mon/ceph-node181
    cp /keyring /var/lib/ceph/mon/ceph-node181
    touch done; touch systemd
    chown ceph:ceph -R ../ceph-node181
    mkdir -p /var/lib/ceph/mon/ceph-node173 // in node173
    scp -r /tmp/mon-store/* 10.118.202.173:/var/lib/ceph/mon/ceph-node173
    cp /keyring /var/lib/ceph/mon/ceph-node173
    touch done; touch systemd
    chown ceph:ceph -R ../ceph-node173
  7. start mon
    systemctl stop ceph-mon@node173
    systemctl stop ceph-mon@node181
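
The collection steps above follow one pattern: run `ceph-objectstore-tool --op update-mon-db` once per OSD on a host, then carry the same store directory to the next host. A minimal dry-run sketch of that pattern follows. The function name `plan_collect` and its arguments are hypothetical; the command line it prints is copied from the steps above, and nothing is executed.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the per-OSD collection commands
# for one host. $1 is the mon store path; the remaining args are OSD ids.
plan_collect() {
    store=$1
    shift
    for id in "$@"; do
        echo "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$id --op update-mon-db --mon-store-path $store"
    done
}

# OSD ids per host, as listed in the `ceph osd tree` output above.
plan_collect /tmp/mon-store 3 4 5   # node173
plan_collect /tmp/mon-store 0 1 2   # node181
```

Printing the plan first makes it easy to confirm that every OSD on every host is visited exactly once before anything touches the store.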

@tchaikov
Contributor Author

@renhwztetecs i don't see anything obvious other than

mkdir -p /tmp/mon-store node173
mkdir -p /tmp/mon-store node181

can you file a bug on the tracker? maybe we can continue the investigation there instead of reusing this PR?

and could you upload your restored store.db and the output of

for i in `seq 0 5`; do
  ls /var/lib/ceph/osd/ceph-$i/current/meta/
done
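
The `mkdir -p /tmp/mon-store node173` line flagged above passes two separate path arguments, so it creates `/tmp/mon-store` plus a second directory named `node173` relative to the current directory. A small self-contained demonstration in a scratch directory (the function name `demo_mkdir` and the `store` directory are stand-ins, not part of the reported steps):

```shell
#!/bin/sh
# Show that `mkdir -p A B` creates two directories, A and B.
demo_mkdir() {
    scratch=$(mktemp -d) || return 1
    # Run in a subshell so the caller's working directory is untouched.
    (cd "$scratch" && mkdir -p "$scratch/store" node173) || return 1
    echo "$scratch"
}

dir=$(demo_mkdir)
ls "$dir"    # lists two entries: node173 and store
```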

@renhwztetecs
Contributor

Yeah!
I'll push it later.

@tchaikov tchaikov changed the title jewel: add a tool to rebuild mon store from OSD -> [DNM] jewel: add a tool to rebuild mon store from OSD, Sep 30, 2016
@tchaikov
Contributor Author

fix posted at #11276

@tchaikov tchaikov changed the title [DNM] jewel: add a tool to rebuild mon store from OSD -> jewel: add a tool to rebuild mon store from OSD, Sep 30, 2016
@@ -215,7 +215,10 @@ int update_osdmap(ObjectStore& fs, OSDSuperblock& sb, MonitorDBStore& ms)

// trim stale maps
unsigned ntrimmed = 0;
for (auto e = first_committed; e < sb.oldest_map; e++) {
Member

This commit says "doc: ...", but then it's also touching rebuild_mondb.cc ? Should that change be split out?

Contributor Author

@ktdreyer, yes. the code change spilled out into the doc change in the original commit. let me fix it.
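
For readers skimming the hunk under review: the `// trim stale maps` loop visits the half-open epoch range from `first_committed` up to, but not including, `sb.oldest_map`. A tiny sketch of which epochs such a loop touches (the function name `stale_epochs` and the values 3 and 6 are made up for illustration and are not taken from the PR):

```shell
#!/bin/sh
# Print the epochs visited by a loop of the form
#   for (e = first; e < oldest; e++)
# i.e. the half-open range [first, oldest).
stale_epochs() {
    e=$1
    while [ "$e" -lt "$2" ]; do
        echo "$e"
        e=$((e + 1))
    done
}

stale_epochs 3 6   # prints 3, 4 and 5, one per line; epoch 6 is not visited
```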

@renhwztetecs
Contributor

No core dump this time, and the test passes. 👍
However, client.admin lacks caps after the rebuild.
I pushed a PR to fix it; please review:
#11381
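
One way to spot an entity without caps before running the rebuild is to scan the keyring file that will be passed to `--keyring`. The sketch below is a hypothetical pre-flight check, not part of `ceph-monstore-tool`; it assumes the usual keyring layout of `[entity]` sections followed by `key = ...` and `caps <service> = "..."` lines, and the key values in the sample are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print every entity in a keyring file whose
# section contains no "caps ..." line.
entities_without_caps() {
    awk '
        /^\[/ {
            # A new [entity] section starts: report the previous one if needed.
            if (name != "" && !has_caps) print name
            name = $0
            sub(/^\[/, "", name)
            sub(/\].*$/, "", name)
            has_caps = 0
            next
        }
        /^[ \t]*caps[ \t]/ { has_caps = 1 }
        END { if (name != "" && !has_caps) print name }
    ' "$1"
}

# Sample keyring with placeholder keys; client.admin has no caps lines.
keyring=$(mktemp)
cat > "$keyring" <<'EOF'
[mon.]
    key = PLACEHOLDER
    caps mon = "allow *"
[client.admin]
    key = PLACEHOLDER
EOF
entities_without_caps "$keyring"   # prints: client.admin
```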

@ghost ghost added core feature labels Oct 10, 2016
@ghost

ghost commented Oct 10, 2016

@tchaikov could you please update the 301bbca commit message to list the files in which the code changes were located? It's not a big deal, but we usually expect a list of the files impacted by the conflict. Thanks :-)

@ghost ghost self-assigned this Oct 10, 2016
@tchaikov tchaikov self-assigned this Oct 10, 2016
@tchaikov
Contributor Author

@dachary sure, will do!

@tchaikov
Contributor Author

i will update the cross references in this changeset after the commits (#11276) are merged in master

ghost pushed a commit that referenced this pull request Oct 10, 2016
…m OSD

Reviewed-by: Loic Dachary <ldachary@redhat.com>
@tchaikov tchaikov changed the title jewel: add a tool to rebuild mon store from OSD -> [DNM] jewel: add a tool to rebuild mon store from OSD, Oct 10, 2016
so ceph-objectstore-tool is able to use it when rebuilding monitor
db.

Fixes: http://tracker.ceph.com/issues/17179
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 19ef4f1)
Fixes: http://tracker.ceph.com/issues/17179
Signed-off-by: Kefu Chai <kchai@redhat.com>
Conflicts:
	src/tools/CMakeLists.txt: this file was added in master, so
		update src/CMakeLists.txt instead
	src/tools/Makefile-server.am: jewel is still using autotools,
		so update this file also.
	src/tools/rebuild_mondb.cc: move the code spilled into
		doc/rados/troubleshooting/troubleshooting-mon.rst
		by accident back to this commit.
(cherry picked from commit 24faea7)
Fixes: http://tracker.ceph.com/issues/17179
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit d909fa0)
tchaikov and others added 6 commits October 18, 2016 10:49
document the process to recover from leveldb corruption.

Fixes: http://tracker.ceph.com/issues/17179
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 79a9f29)
Conflicts:
        src/tools/rebuild_mondb.cc:
		remove the code change in this file from this commit.
		the removed code is added in another commit.
In general we return negative codes for error cases, so there is
no need to perform the cast here.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 6a1c01d)
As follows:

[ 72%] Building CXX object src/tools/CMakeFiles/ceph-objectstore-tool.dir/RadosDump.cc.o
/home/jenkins-build/build/workspace/ceph-pull-requests/src/tools/rebuild_mondb.cc: In function ‘int update_mon_db(ObjectStore&, OSDSuperblock&, const string&, const string&)’:
/home/jenkins-build/build/workspace/ceph-pull-requests/src/tools/rebuild_mondb.cc:289:22: warning: ‘crc’ may be used uninitialized in this function [-Wmaybe-uninitialized]
         if (have_crc && osdmap.get_crc() != crc) {
                      ^
/home/jenkins-build/build/workspace/ceph-pull-requests/src/tools/rebuild_mondb.cc:238:14: note: ‘crc’ was declared here
     uint32_t crc;

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit f16a314)
…e.db

we should rebuild pgmap_meta table from the collected osdmaps

Fixes: http://tracker.ceph.com/issues/17400
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit cdfa7a6)
we take it as an error if no caps are granted to an entity in the
specified keyring file when rebuilding the monitor db.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit b4bd400)
to make sure the recovered monitor store is ready for use.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit af8e211)
@tchaikov tchaikov changed the title [DNM] jewel: add a tool to rebuild mon store from OSD -> jewel: add a tool to rebuild mon store from OSD, Oct 18, 2016
@tchaikov tchaikov removed their assignment Oct 18, 2016
@tchaikov
Contributor Author

changelog

  • rebase against master
  • fix the xrefs, so they point to the corresponding commits in master.

@ghost

ghost commented Oct 18, 2016

jenkins test this please

ghost pushed a commit that referenced this pull request Oct 18, 2016
…m OSD

Reviewed-by: Loic Dachary <ldachary@redhat.com>
@ghost

ghost commented Oct 20, 2016

It passed the rados (http://tracker.ceph.com/issues/17487#note-19) suite except for two jobs that are, I believe, unrelated. It also passed the upgrade/jewel-x and upgrade/hammer-x (http://tracker.ceph.com/issues/17487#note-22) suites.

@ghost ghost merged commit fb74b16 into ceph:jewel Oct 20, 2016
@tchaikov tchaikov deleted the wip-17179-jewel branch October 20, 2016 09:37
@theanalyst theanalyst changed the title jewel: add a tool to rebuild mon store from OSD -> "jewel: tools: add a tool to rebuild mon store from OSD", Nov 17, 2016
@theanalyst theanalyst changed the title "jewel: tools: add a tool to rebuild mon store from OSD" -> jewel: tools: add a tool to rebuild mon store from OSD, Nov 17, 2016
@Alanwalker3

@tchaikov I want to ask why the fsmap cannot be restored.

@tchaikov
Contributor Author

tchaikov commented Jul 17, 2017

because i don't understand cephfs well enough to do this, and it was not my first priority at the time. but please feel free to send a PR to enable this feature. i will be more than happy to test and review it.

@Alanwalker3

@tchaikov, thank you very much for your answer. I have another question: why can't the pgmap be fully restored? Compare the status:
Before restoring:
pgmap v94: 2048 pgs, 2 pools, 34294 bytes data, 23 objects 127 MB used, 2818 GB / 2818 GB avail 2048 active+clean
After restoring:
pgmap v13: 2048 pgs, 2 pools, 34294 bytes data, 23 objects 131 MB used, 2818 GB / 2818 GB avail 2048 active+clean
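
Read field by field, the two status lines agree on the PG count, pool count, data size, object count and PG states; only the pgmap version counter (v94 vs v13) and the used-space figure (127 MB vs 131 MB) differ. A small sketch that pulls the version out of such a line, assuming the `pgmap v<N>:` prefix seen above (the function name `pgmap_version` is hypothetical):

```shell
#!/bin/sh
# Extract the numeric pgmap version from a status line that starts
# with "pgmap v<N>:". Prints nothing if the line does not match.
pgmap_version() {
    printf '%s\n' "$1" | sed -n 's/^pgmap v\([0-9][0-9]*\):.*/\1/p'
}

before='pgmap v94:2048 pgs, 2 pools, 34294 bytes data,23 objects 127 MB used, 2818 GB / 2818 GB avail 2048 active+clean'
after='pgmap v13: 2048 pgs, 2 pools, 34294 bytes data,23 objects 131 MB used, 2818 GB / 2818 GB avail 2048 active+clean'

echo "before=v$(pgmap_version "$before") after=v$(pgmap_version "$after")"
# prints: before=v94 after=v13
```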

@tchaikov
Contributor Author

@Alanwalker3 it would be great if we can move this discussion to ceph-devel.

@Alanwalker3

@tchaikov How can I join ceph-devel?

@tchaikov
Contributor Author

tchaikov commented Jul 18, 2017 via email

@Alanwalker3

@tchaikov From the file src/tools/rebuild_mondb.cc, we can see that only auth, monitor, osdmap, and pgmap_pg are updated, and nothing is done for pgmap. So if it is the reason that we cannot restore pgmap fully.

@tchaikov
Contributor Author

@Alanwalker3 yes, we don't restore pgmap, as it will be reconstructed anyway after the cluster is back online.

> So if it is the reason that we cannot restore pgmap fully.

i don't follow you. could you please rephrase this ?

@Alanwalker3

@tchaikov Sorry for the late reply; I was busy with work. I agree with you that the pgmap will be reconstructed anyway after the cluster is back online, so there is no need to restore it.
I have another question about one of the known limitations: the MDS keyrings and other keyrings are missing. What does "other keyrings" mean and contain?
I also tried to use ceph-decoder to decode auth:1, but it reports an error. The command is:
ceph-decode import auth.1.txt type MonitorDBStore::Transaction decode dump_json.
