jewel: tools: add a tool to rebuild mon store from OSD #11126

Merged
9 commits merged into ceph:jewel from tchaikov:wip-17179-jewel Oct 20, 2016

tchaikov commented Sep 19, 2016

@tchaikov tchaikov added this to the jewel milestone Sep 19, 2016
@tchaikov tchaikov changed the title Wip 17179 jewel jewel: add a tool to rebuild mon store from OSD Sep 19, 2016

zphj1987 commented Sep 20, 2016

[root@lab8106 mon]# /usr/bin/ceph-mon -f --cluster ceph --id lab8106 --setuser ceph --setgroup ceph
starting mon.lab8106 rank 0 at 192.168.8.106:6789/0 mon_data /var/lib/ceph/mon/ceph-lab8106 fsid fa7ec1a1-662a-4ba3-b478-7cb570482b62
/root/rpmbuild/BUILD/ceph-11.0.0-2460-g22053d0/src/mon/PGMonitor.cc: In function 'void PGMonitor::check_osd_map(epoch_t)' thread 7f1f995f2500 time 2016-09-20 15:58:28.203619
/root/rpmbuild/BUILD/ceph-11.0.0-2460-g22053d0/src/mon/PGMonitor.cc: 911: FAILED assert(err == 0)
 ceph version v11.0.0-2460-g22053d0 (22053d057fb03e9c932da2771d7c90556567d1e4)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x55bb3cb61d55]
 2: (PGMonitor::check_osd_map(unsigned int)+0x150d) [0x55bb3cab811d]
 3: (PGMonitor::on_active()+0xf6) [0x55bb3cab8566]
 4: (PaxosService::_active()+0x195) [0x55bb3c9d5655]
 5: (PaxosService::election_finished()+0x7a) [0x55bb3c9d5cda]
 6: (Monitor::win_election(unsigned int, std::set<int, std::less<int>, std::allocator<int> >&, unsigned long, MonCommand const*, int, std::set<int, std::less<int>, std::allocator<int> > const*)+0x246) [0x55bb3c993a16]
 7: (Monitor::win_standalone_election()+0x17f) [0x55bb3c993e5f]
 8: (Monitor::bootstrap()+0xa1b) [0x55bb3c9949cb]
 9: (Monitor::init()+0xea) [0x55bb3c994c2a]
 10: (main()+0x2628) [0x55bb3c8e4958]
 11: (__libc_start_main()+0xf5) [0x7f1f94eccb15]
 12: (()+0x298445) [0x55bb3c958445]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2016-09-20 15:58:28.205597 7f1f995f2500 -1 /root/rpmbuild/BUILD/ceph-11.0.0-2460-g22053d0/src/mon/PGMonitor.cc: In function 'void PGMonitor::check_osd_map(epoch_t)' thread 7f1f995f2500 time 2016-09-20 15:58:28.203619
/root/rpmbuild/BUILD/ceph-11.0.0-2460-g22053d0/src/mon/PGMonitor.cc: 911: FAILED assert(err == 0)

 ceph version v11.0.0-2460-g22053d0 (22053d057fb03e9c932da2771d7c90556567d1e4)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x55bb3cb61d55]
 2: (PGMonitor::check_osd_map(unsigned int)+0x150d) [0x55bb3cab811d]
 3: (PGMonitor::on_active()+0xf6) [0x55bb3cab8566]
 4: (PaxosService::_active()+0x195) [0x55bb3c9d5655]
 5: (PaxosService::election_finished()+0x7a) [0x55bb3c9d5cda]
 6: (Monitor::win_election(unsigned int, std::set<int, std::less<int>, std::allocator<int> >&, unsigned long, MonCommand const*, int, std::set<int, std::less<int>, std::allocator<int> > const*)+0x246) [0x55bb3c993a16]
 7: (Monitor::win_standalone_election()+0x17f) [0x55bb3c993e5f]
 8: (Monitor::bootstrap()+0xa1b) [0x55bb3c9949cb]
 9: (Monitor::init()+0xea) [0x55bb3c994c2a]
 10: (main()+0x2628) [0x55bb3c8e4958]
 11: (__libc_start_main()+0xf5) [0x7f1f94eccb15]
 12: (()+0x298445) [0x55bb3c958445]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

     0> 2016-09-20 15:58:28.205597 7f1f995f2500 -1 /root/rpmbuild/BUILD/ceph-11.0.0-2460-g22053d0/src/mon/PGMonitor.cc: In function 'void PGMonitor::check_osd_map(epoch_t)' thread 7f1f995f2500 time 2016-09-20 15:58:28.203619
/root/rpmbuild/BUILD/ceph-11.0.0-2460-g22053d0/src/mon/PGMonitor.cc: 911: FAILED assert(err == 0)

 ceph version v11.0.0-2460-g22053d0 (22053d057fb03e9c932da2771d7c90556567d1e4)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x55bb3cb61d55]
 2: (PGMonitor::check_osd_map(unsigned int)+0x150d) [0x55bb3cab811d]
 3: (PGMonitor::on_active()+0xf6) [0x55bb3cab8566]
 4: (PaxosService::_active()+0x195) [0x55bb3c9d5655]
 5: (PaxosService::election_finished()+0x7a) [0x55bb3c9d5cda]
 6: (Monitor::win_election(unsigned int, std::set<int, std::less<int>, std::allocator<int> >&, unsigned long, MonCommand const*, int, std::set<int, std::less<int>, std::allocator<int> > const*)+0x246) [0x55bb3c993a16]
 7: (Monitor::win_standalone_election()+0x17f) [0x55bb3c993e5f]
 8: (Monitor::bootstrap()+0xa1b) [0x55bb3c9949cb]
 9: (Monitor::init()+0xea) [0x55bb3c994c2a]
 10: (main()+0x2628) [0x55bb3c8e4958]
 11: (__libc_start_main()+0xf5) [0x7f1f94eccb15]
 12: (()+0x298445) [0x55bb3c958445]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

*** Caught signal (Aborted) **
 in thread 7f1f995f2500 thread_name:ceph-mon
 ceph version v11.0.0-2460-g22053d0 (22053d057fb03e9c932da2771d7c90556567d1e4)
 1: (()+0x68fdea) [0x55bb3cd4fdea]
 2: (()+0xf100) [0x7f1f95a9b100]
 3: (gsignal()+0x37) [0x7f1f94ee05f7]
 4: (abort()+0x148) [0x7f1f94ee1ce8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x55bb3cb61f37]
 6: (PGMonitor::check_osd_map(unsigned int)+0x150d) [0x55bb3cab811d]
 7: (PGMonitor::on_active()+0xf6) [0x55bb3cab8566]
 8: (PaxosService::_active()+0x195) [0x55bb3c9d5655]
 9: (PaxosService::election_finished()+0x7a) [0x55bb3c9d5cda]
 10: (Monitor::win_election(unsigned int, std::set<int, std::less<int>, std::allocator<int> >&, unsigned long, MonCommand const*, int, std::set<int, std::less<int>, std::allocator<int> > const*)+0x246) [0x55bb3c993a16]
 11: (Monitor::win_standalone_election()+0x17f) [0x55bb3c993e5f]
 12: (Monitor::bootstrap()+0xa1b) [0x55bb3c9949cb]
 13: (Monitor::init()+0xea) [0x55bb3c994c2a]
 14: (main()+0x2628) [0x55bb3c8e4958]
 15: (__libc_start_main()+0xf5) [0x7f1f94eccb15]
 16: (()+0x298445) [0x55bb3c958445]
2016-09-20 15:58:28.208314 7f1f995f2500 -1 *** Caught signal (Aborted) **
 in thread 7f1f995f2500 thread_name:ceph-mon

 ceph version v11.0.0-2460-g22053d0 (22053d057fb03e9c932da2771d7c90556567d1e4)
 1: (()+0x68fdea) [0x55bb3cd4fdea]
 2: (()+0xf100) [0x7f1f95a9b100]
 3: (gsignal()+0x37) [0x7f1f94ee05f7]
 4: (abort()+0x148) [0x7f1f94ee1ce8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x55bb3cb61f37]
 6: (PGMonitor::check_osd_map(unsigned int)+0x150d) [0x55bb3cab811d]
 7: (PGMonitor::on_active()+0xf6) [0x55bb3cab8566]
 8: (PaxosService::_active()+0x195) [0x55bb3c9d5655]
 9: (PaxosService::election_finished()+0x7a) [0x55bb3c9d5cda]
 10: (Monitor::win_election(unsigned int, std::set<int, std::less<int>, std::allocator<int> >&, unsigned long, MonCommand const*, int, std::set<int, std::less<int>, std::allocator<int> > const*)+0x246) [0x55bb3c993a16]
 11: (Monitor::win_standalone_election()+0x17f) [0x55bb3c993e5f]
 12: (Monitor::bootstrap()+0xa1b) [0x55bb3c9949cb]
 13: (Monitor::init()+0xea) [0x55bb3c994c2a]
 14: (main()+0x2628) [0x55bb3c8e4958]
 15: (__libc_start_main()+0xf5) [0x7f1f94eccb15]
 16: (()+0x298445) [0x55bb3c958445]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

     0> 2016-09-20 15:58:28.208314 7f1f995f2500 -1 *** Caught signal (Aborted) **
 in thread 7f1f995f2500 thread_name:ceph-mon

 ceph version v11.0.0-2460-g22053d0 (22053d057fb03e9c932da2771d7c90556567d1e4)
 1: (()+0x68fdea) [0x55bb3cd4fdea]
 2: (()+0xf100) [0x7f1f95a9b100]
 3: (gsignal()+0x37) [0x7f1f94ee05f7]
 4: (abort()+0x148) [0x7f1f94ee1ce8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x55bb3cb61f37]
 6: (PGMonitor::check_osd_map(unsigned int)+0x150d) [0x55bb3cab811d]
 7: (PGMonitor::on_active()+0xf6) [0x55bb3cab8566]
 8: (PaxosService::_active()+0x195) [0x55bb3c9d5655]
 9: (PaxosService::election_finished()+0x7a) [0x55bb3c9d5cda]
 10: (Monitor::win_election(unsigned int, std::set<int, std::less<int>, std::allocator<int> >&, unsigned long, MonCommand const*, int, std::set<int, std::less<int>, std::allocator<int> > const*)+0x246) [0x55bb3c993a16]
 11: (Monitor::win_standalone_election()+0x17f) [0x55bb3c993e5f]
 12: (Monitor::bootstrap()+0xa1b) [0x55bb3c9949cb]
 13: (Monitor::init()+0xea) [0x55bb3c994c2a]
 14: (main()+0x2628) [0x55bb3c8e4958]
 15: (__libc_start_main()+0xf5) [0x7f1f94eccb15]
 16: (()+0x298445) [0x55bb3c958445]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

@tchaikov

I tested your tool and found a problem that I want to report.

I have a cluster with 1 mon and 2 OSDs, and:

  • I stop all OSDs, then stop the mon
  • rebuild the mon data and start the mon; everything works well
  • then I repeat: stop all OSDs, stop the mon
  • rebuild the mon data and start the mon; this time the mon will not start. Is something wrong?

Starting from a clean state, the first rebuild works. Rebuilding a second time on top of the already rebuilt environment leaves the mon unable to start.

tchaikov commented Sep 20, 2016

@zphj1987 thanks for the testing, i will try to repeat your steps tomorrow.

renhwztetecs commented Sep 22, 2016

@tchaikov
I also tested the wip-17179-jewel version, and the leader mon hit the same core dump:

2016-09-22 16:49:59.900190 7fab37d8b700 5 mon.node173@0(leader).paxos(paxos active c 1..3) is_readable = 1 - now=2016-09-22 16:49:59.900190 lease_expire=2016-09-22 16:50:04.900174 has v0 lc 3
2016-09-22 16:49:59.916854 7fab37d8b700 -1 mon/PGMonitor.cc: In function 'void PGMonitor::check_osd_map(epoch_t)' thread 7fab37d8b700 time 2016-09-22 16:49:59.900215
mon/PGMonitor.cc: 892: FAILED assert(err == 0)

ceph version 10.2.2.8 (4cf7ed7423032cffc3768f1a091251d3733b26d0)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7fab3eddbaa5]
2: (PGMonitor::check_osd_map(unsigned int)+0x1528) [0x7fab3eb3efe8]
3: (PGMonitor::on_active()+0xf6) [0x7fab3eb3f416]
4: (PaxosService::_active()+0x207) [0x7fab3ea88217]
5: (Context::complete(int)+0x9) [0x7fab3ea54a39]
6: (void finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xac) [0x7fab3ea5b27c]
7: (Paxos::finish_round()+0xd1) [0x7fab3ea7f5c1]
8: (Paxos::commit_finish()+0x656) [0x7fab3ea81df6]
9: (C_Committed::finish(int)+0x2b) [0x7fab3ea8501b]
10: (Context::complete(int)+0x9) [0x7fab3ea54a39]
11: (MonitorDBStore::C_DoTransaction::finish(int)+0xa7) [0x7fab3ea83bc7]
12: (Context::complete(int)+0x9) [0x7fab3ea54a39]
13: (Finisher::finisher_thread_entry()+0x216) [0x7fab3ed020e6]
14: (()+0x7df3) [0x7fab3d308df3]
15: (clone()+0x6d) [0x7fab3bbd33dd]
NOTE: a copy of the executable, or objdump -rdS <executable> is needed to interpret this.

tchaikov commented Sep 22, 2016

@renhwztetecs i am not able to reproduce your issue with following steps. i repeated it for 3 times, no luck.

../src/stop.sh
rm -rf /tmp/rebuild-db; mkdir /tmp/rebuild-db
for i in 0 1 2; do ./bin/ceph-objectstore-tool --data-path dev/osd$i  --op update-mon-db --mon-store-path /tmp/rebuild-db; done
./bin/ceph-monstore-tool /tmp/rebuild-db/ rebuild -- --keyring ./keyring
for i in b c; do  rm -rf dev/mon.$i; done
rm -rf dev/mon.a/store.db/
mv /tmp/rebuild-db/store.db/ dev/mon.a/
for i in b c; do ./bin/ceph-mon --mkfs -c ceph.conf -i $i --keyring=./keyring;done
CEPH_NUM_MON=3 CEPH_NUM_OSD=3 ../src/vstart.sh -x -l -k mon osd
./bin/ceph -s

could you please paste your script so i can repeat it? thanks!
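
For anyone repeating the sequence above, here is a minimal sketch of the collect-and-rebuild part as a reusable function. The tool names and flags are the ones used in the script above; the function names `run` and `rebuild_plan` and the `DRY_RUN` variable are hypothetical, introduced only for illustration. With `DRY_RUN=1` (the default) it only prints the commands, so the plan can be checked before touching any OSD.

```shell
#!/bin/sh
# Sketch only: print (or run) the collect-and-rebuild commands for a set of OSDs.
# DRY_RUN, run and rebuild_plan are hypothetical names, not part of Ceph.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Dry run: show the command instead of executing it.
        echo "$*"
    else
        "$@"
    fi
}

# rebuild_plan STORE KEYRING OSD_DIR...
rebuild_plan() {
    store="$1"; keyring="$2"; shift 2
    for osd in "$@"; do
        # Stop at the first OSD whose maps cannot be collected.
        run ceph-objectstore-tool --data-path "$osd" \
            --op update-mon-db --mon-store-path "$store" || return 1
    done
    # Rebuild the monitor store from everything collected so far.
    run ceph-monstore-tool "$store" rebuild -- --keyring "$keyring"
}
```

For example, `rebuild_plan /tmp/rebuild-db ./keyring dev/osd0 dev/osd1 dev/osd2` prints the three collect commands followed by the rebuild command. All OSDs and monitors are assumed to be stopped first, as in the script above.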

renhwztetecs commented Sep 23, 2016

@tchaikov
The following is the cluster information and the steps I ran.
What did I miss? The environment is still in place, so I can reproduce it.

cluster:

[root@node181 mon]# ceph osd tree
ID WEIGHT  TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 0.51472 root default                                       
-2 0.22887     host node181                                   
 1 0.07629         osd.1         up  1.00000          1.00000 
 2 0.07629         osd.2         up  1.00000          1.00000 
 0 0.07628         osd.0         up  1.00000          1.00000 
-3 0.28586     host node173                                   
 4 0.09529         osd.4         up  1.00000          1.00000 
 5 0.09529         osd.5         up  1.00000          1.00000 
 3 0.09528         osd.3         up  1.00000          1.00000 

steps info

  1. stop all services
    systemctl stop ceph-osd.target
    systemctl stop ceph-mon@node173
    systemctl stop ceph-osd.target
    systemctl stop ceph-mon@node181
  2. create the temporary store and back up the original mon
    mkdir -p /tmp/mon-store node173
    mkdir -p /tmp/mon-store node181
    mv /var/lib/ceph/mon/ceph-node173 /var/lib/ceph/mon/ceph-node173_back
    mv /var/lib/ceph/mon/ceph-node181 /var/lib/ceph/mon/ceph-node181_back
  3. collect the cluster map from OSDs in node173
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3/ --op update-mon-db --mon-store-path /tmp/mon-store/
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4/ --op update-mon-db --mon-store-path /tmp/mon-store/
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5/ --op update-mon-db --mon-store-path /tmp/mon-store/
  4. sync mon-store to node181
    rsync -avz /tmp/mon-store/ 10.118.202.181:/tmp/mon-store/
  5. collect the cluster map from OSDs in node181
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op update-mon-db --mon-store-path /tmp/mon-store/
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --op update-mon-db --mon-store-path /tmp/mon-store/
    /usr/bin/ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2/ --op update-mon-db --mon-store-path /tmp/mon-store/
  6. rebuild the monitor store
    /usr/bin/ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /etc/ceph/ceph.client.admin.keyring
    mkdir -p /var/lib/ceph/mon/ceph-node181
    cp -r /tmp/mon-store/* /var/lib/ceph/mon/ceph-node181
    cp /keyring /var/lib/ceph/mon/ceph-node181
    touch done; touch systemd
    chown ceph:ceph -R ../ceph-node181
    mkdir -p /var/lib/ceph/mon/ceph-node173 // in node173
    scp -r /tmp/mon-store/* 10.118.202.173:/var/lib/ceph/mon/ceph-node173
    cp /keyring /var/lib/ceph/mon/ceph-node173
    touch done; touch systemd
    chown ceph:ceph -R ../ceph-node173
  7. start mon
    systemctl stop ceph-mon@node173
    systemctl stop ceph-mon@node181
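
Step 6 above relies on the current directory (`touch done; touch systemd`, `chown ... ../ceph-node181`). As a sketch under assumptions, the same install step can be written with absolute paths only. The function name `install_mon_store` and its arguments are hypothetical; the `done` and `systemd` marker files and the optional `ceph:ceph` owner are taken from the steps above, and copying the keyring is left out.

```shell
#!/bin/sh
# install_mon_store SRC DST [OWNER]
# Sketch only: copy a rebuilt store into a monitor data directory using
# absolute paths, so the marker files and the ownership change do not
# depend on the current directory. The function name is hypothetical.
install_mon_store() {
    src="$1"; dst="$2"; owner="${3:-}"
    # Refuse to install if the rebuilt store is missing.
    [ -d "$src/store.db" ] || { echo "no store.db under $src" >&2; return 1; }
    mkdir -p "$dst" || return 1
    # Copy the contents of SRC (including store.db) into DST.
    cp -r "$src"/. "$dst"/ || return 1
    # Marker files used in the steps above.
    touch "$dst/done" "$dst/systemd" || return 1
    # Ownership change is optional; the steps above used ceph:ceph.
    if [ -n "$owner" ]; then
        chown -R "$owner" "$dst" || return 1
    fi
}
```

A call such as `install_mon_store /tmp/mon-store /var/lib/ceph/mon/ceph-node181 ceph:ceph` would correspond to the node181 part of step 6.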

tchaikov commented Sep 23, 2016

@renhwztetecs i don't see anything obvious other than

mkdir -p /tmp/mon-store node173
mkdir -p /tmp/mon-store node181

can you file a bug on the tracker? maybe we can continue the investigation there instead of reusing this PR.

and could you upload your restored store.db and the output of

for i in `seq 0 5`; do
  ls /var/lib/ceph/osd/ceph-$i/current/meta/
done
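
The two `mkdir` lines singled out in this comment each pass two operands, so `mkdir -p` creates two directories: `/tmp/mon-store` and a directory named after the node in the current directory. A small sketch that shows this in a throwaway directory (the function name `demo_mkdir_operands` is hypothetical, and a temporary path stands in for `/tmp/mon-store`):

```shell
#!/bin/sh
# Sketch only: show that `mkdir -p A B` creates two directories, A and B.
# Runs in a temporary directory; the names mirror the quoted commands.
demo_mkdir_operands() {
    work="$(mktemp -d)" || return 1
    (
        cd "$work" || exit 1
        # Same shape as `mkdir -p /tmp/mon-store node173`.
        mkdir -p "$work/mon-store" node173
        # List what was created, one name per line, sorted.
        ls -1 | sort
    )
    rc=$?
    rm -rf "$work"
    return "$rc"
}
```

Calling `demo_mkdir_operands` lists both `mon-store` and `node173`. The extra directory is harmless for the rebuild itself, which is consistent with it being the only oddity noted.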

renhwztetecs commented Sep 23, 2016

yeah!
I'll push it later.

renhwztetecs commented Sep 26, 2016

@tchaikov tchaikov changed the title jewel: add a tool to rebuild mon store from OSD [DNM] jewel: add a tool to rebuild mon store from OSD Sep 30, 2016

tchaikov commented Sep 30, 2016

fix posted at #11276

@tchaikov tchaikov force-pushed the tchaikov:wip-17179-jewel branch from ae0277a to 651b906 Sep 30, 2016
@tchaikov tchaikov changed the title [DNM] jewel: add a tool to rebuild mon store from OSD jewel: add a tool to rebuild mon store from OSD Sep 30, 2016
@@ -215,7 +215,10 @@ int update_osdmap(ObjectStore& fs, OSDSuperblock& sb, MonitorDBStore& ms)

// trim stale maps
unsigned ntrimmed = 0;
for (auto e = first_committed; e < sb.oldest_map; e++) {

ktdreyer Sep 30, 2016

This commit says "doc: ...", but then it's also touching rebuild_mondb.cc ? Should that change be split out?

tchaikov Oct 1, 2016

@ktdreyer, yes. the code change spilled out into the doc change in the original commit. let me fix it.

@tchaikov tchaikov force-pushed the tchaikov:wip-17179-jewel branch from 651b906 to e9f323d Oct 1, 2016

renhwztetecs commented Oct 9, 2016

No core dump now, and the test passes. 👍
However, client.admin lacks caps after the rebuild.
I pushed a PR to fix it; please review:
#11381

@ghost ghost added core feature labels Oct 10, 2016

ghost commented Oct 10, 2016

@tchaikov could you please update the 301bbca commit message to list the files in which the code changes were located? It's not a big deal but we usually expect a list of files impacted by the conflict. Thanks :-)

@ghost ghost self-assigned this Oct 10, 2016
@tchaikov tchaikov self-assigned this Oct 10, 2016

tchaikov commented Oct 10, 2016

@dachary sure, will do!

@tchaikov tchaikov force-pushed the tchaikov:wip-17179-jewel branch from e9f323d to 787577a Oct 10, 2016

tchaikov commented Oct 10, 2016

i will update the cross references in this changeset after the commits (#11276) are merged in master

ghost pushed a commit that referenced this pull request Oct 10, 2016
…m OSD

Reviewed-by: Loic Dachary <ldachary@redhat.com>
@tchaikov tchaikov changed the title jewel: add a tool to rebuild mon store from OSD [DNM] jewel: add a tool to rebuild mon store from OSD Oct 10, 2016
tchaikov added 3 commits Aug 29, 2016
Fixes: http://tracker.ceph.com/issues/17179
Signed-off-by: Kefu Chai <kchai@redhat.com>
Conflicts:
	src/tools/CMakeLists.txt: this file was added in master, so
		update src/CMakeLists.txt instead
	src/tools/Makefile-server.am: jewel is still using autotools,
		so update this file also.
        src/tools/rebuild_mondb.cc: move the code spilled into
                doc/rados/troubleshooting/troubleshooting-mon.rst
                by accident back to this commit.
(cherry picked from commit 24faea7)
Fixes: http://tracker.ceph.com/issues/17179
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit d909fa0)
so ceph-objectstore-tool is able to use it when rebuilding monitor
db.

Fixes: http://tracker.ceph.com/issues/17179
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 19ef4f1)
xiexingguo and others added 6 commits Sep 18, 2016
In general we return negative codes for error cases, so there is
no need perform the cast here.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 6a1c01d)
document the process to recover from leveldb corruption.

Fixes: http://tracker.ceph.com/issues/17179
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 79a9f29)
Conflicts:
        src/tools/rebuild_mondb.cc:
		remove the code change in this file from this commit.
		and the code gets removed is added in anther commit.
As follow:

[ 72%] Building CXX object src/tools/CMakeFiles/ceph-objectstore-tool.dir/RadosDump.cc.o
/home/jenkins-build/build/workspace/ceph-pull-requests/src/tools/rebuild_mondb.cc: In function ‘int update_mon_db(ObjectStore&, OSDSuperblock&, const string&, const string&)’:
/home/jenkins-build/build/workspace/ceph-pull-requests/src/tools/rebuild_mondb.cc:289:22: warning: ‘crc’ may be used uninitialized in this function [-Wmaybe-uninitialized]
         if (have_crc && osdmap.get_crc() != crc) {
                      ^
/home/jenkins-build/build/workspace/ceph-pull-requests/src/tools/rebuild_mondb.cc:238:14: note: ‘crc’ was declared here
     uint32_t crc;

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit f16a314)
…e.db

we should rebuild pgmap_meta table from the collected osdmaps

Fixes: http://tracker.ceph.com/issues/17400
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit cdfa7a6)
we take it as an error if no caps is granted to an entity in the
specified keyring file when rebuilding the monitor db.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit b4bd400)
to make sure the recovered monitor store is ready for use.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit af8e211)
@tchaikov tchaikov force-pushed the tchaikov:wip-17179-jewel branch from 787577a to 25a35d4 Oct 18, 2016
@tchaikov tchaikov changed the title [DNM] jewel: add a tool to rebuild mon store from OSD jewel: add a tool to rebuild mon store from OSD Oct 18, 2016
@tchaikov tchaikov removed their assignment Oct 18, 2016

tchaikov commented Oct 18, 2016

changelog

  • rebase against master
  • fix the xref, so they point to the corresponding commits in master.

ghost commented Oct 18, 2016

jenkins test this please

ghost pushed a commit that referenced this pull request Oct 18, 2016
…m OSD

Reviewed-by: Loic Dachary <ldachary@redhat.com>

ghost commented Oct 20, 2016

It passed the rados (http://tracker.ceph.com/issues/17487#note-19) suite except for two jobs that are, I believe unrelated. It also passed the upgrade/jewel-x and upgrade/hammer-x (http://tracker.ceph.com/issues/17487#note-22) suites.

@ghost ghost merged commit fb74b16 into ceph:jewel Oct 20, 2016
1 of 2 checks passed
default: Build finished.
Signed-off-by: all commits in this PR are signed
@tchaikov tchaikov deleted the tchaikov:wip-17179-jewel branch Oct 20, 2016
@theanalyst theanalyst changed the title jewel: add a tool to rebuild mon store from OSD "jewel: tools: add a tool to rebuild mon store from OSD" Nov 17, 2016
@theanalyst theanalyst changed the title "jewel: tools: add a tool to rebuild mon store from OSD" jewel: tools: add a tool to rebuild mon store from OSD Nov 17, 2016

Alanwalker3 commented Jul 17, 2017

@tchaikov I'd like to ask why fsmap cannot be restored.

tchaikov commented Jul 17, 2017

because i don't understand cephfs enough to do this, and that was not my first priority by then. but please feel free to send a PR to enable this feature. i will be more than happy to test and review it.

Alanwalker3 commented Jul 17, 2017

@tchaikov, thank you very much for your answer. I have another question: why can't pgmap be restored fully? You can see the status:
Before restoring:
pgmap v94: 2048 pgs, 2 pools, 34294 bytes data, 23 objects 127 MB used, 2818 GB / 2818 GB avail 2048 active+clean
After restoring:
pgmap v13: 2048 pgs, 2 pools, 34294 bytes data, 23 objects 131 MB used, 2818 GB / 2818 GB avail 2048 active+clean

tchaikov commented Jul 17, 2017

@Alanwalker3 it would be great if we can move this discussion to ceph-devel.

Alanwalker3 commented Jul 18, 2017

@tchaikov How can I join ceph-devel?

tchaikov commented Jul 18, 2017

Alanwalker3 commented Jul 22, 2017

@tchaikov From the file src/tools/rebuild_mondb.cc, we can see that only auth, monitor, osdmap and pgmap_pg are updated, and nothing is done for pgmap. So if it is the reason that we cannot restore pgmap fully.

tchaikov commented Jul 23, 2017

@Alanwalker3 yes, we don't restore pgmap. as it will be reconstructed anyway after the cluster is back online.

So if it is the reason that we cannot restore pgmap fully.

i don't follow you. could you please rephrase this ?

Alanwalker3 commented Sep 4, 2017

@tchaikov I was busy with work, so I could not reply sooner. I agree that pgmap will be reconstructed anyway once the cluster is back online, so there is no need to restore it.
I have another question about one of the known limitations: "the MDS keyrings and other keyrings are missing". What does "other keyrings" mean, and what does it include?
I also tried to use ceph-decoder to decode auth:1, but it reports an error. The command is:
ceph-decode import auth.1.txt type MonitorDBStore::Transaction decode dump_json.

This issue was closed.