
mon/OSDMonitor:Make the pg_num check more accurate #39062

Merged
merged 1 commit into ceph:master on Sep 29, 2021


@fyzard1991 (Contributor) commented Jan 25, 2021

In the check_pg_num function, find the OSDs selected by the current pool's CRUSH rule and check whether the average number of PGs on those OSDs would exceed 'mon_max_pg_per_osd'. This makes the pg_num check more accurate by counting all the PGs on the OSDs used by the new pool.
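A minimal C++ sketch of the check described above, under stated assumptions: PoolInfo, check_pg_num_sketch and the pgs_per_osd map are hypothetical simplified stand-ins, not Ceph's OSDMonitor types, and the set of OSDs selected by the pool's CRUSH rule is assumed to be precomputed. It sums the PGs already on those OSDs, adds pg_num * size for the new pool, and compares the total against mon_max_pg_per_osd times the number of those OSDs.

```cpp
#include <cstdint>
#include <map>
#include <ostream>
#include <set>

// Hypothetical, simplified model of a pool; NOT Ceph's pg_pool_t.
struct PoolInfo {
  uint64_t pg_num;
  uint64_t size;       // replica count
  std::set<int> osds;  // OSDs selected by the pool's CRUSH rule (assumed precomputed)
};

// Returns true if creating new_pool keeps the average PG count on the OSDs
// it would use at or below max_pg_per_osd. pgs_per_osd maps an OSD id to the
// number of PG instances already placed on it (hypothetical input).
bool check_pg_num_sketch(const std::map<int, uint64_t>& pgs_per_osd,
                         const PoolInfo& new_pool,
                         uint64_t max_pg_per_osd,
                         std::ostream* ss) {
  // Count every PG already on the OSDs the new pool would use,
  // not just the PGs of this pool.
  uint64_t existing = 0;
  for (int osd : new_pool.osds) {
    auto it = pgs_per_osd.find(osd);
    if (it != pgs_per_osd.end()) {
      existing += it->second;
    }
  }
  uint64_t projected = existing + new_pool.pg_num * new_pool.size;
  // Limit scales with the OSDs under the rule, not all in OSDs in the cluster.
  uint64_t max_total = max_pg_per_osd * new_pool.osds.size();
  if (projected > max_total) {
    if (ss) {
      *ss << "pg_num " << new_pool.pg_num << " size " << new_pool.size
          << " would mean " << projected
          << " total pgs on the rule's OSDs, which exceeds max " << max_total
          << "\n";
    }
    return false;
  }
  return true;
}
```

Comparing the summed total against max_pg_per_osd * number_of_OSDs is equivalent to comparing the per-OSD average against max_pg_per_osd, and it avoids rounding from integer division. The ERANGE message quoted later in this thread ("mon_max_pg_per_osd 10000 * num_in_osds 3") shows the limit being scaled by an OSD count in the same way.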

Fixes: https://tracker.ceph.com/issues/47062

Signed-off-by: Jerry Luo <luojierui@chinatelecom.cn>

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@fyzard1991 (Contributor Author)

@tchaikov Excuse me, here is my new PR.

@tchaikov (Contributor)

jenkins test make check

@fyzard1991 (Contributor Author)

@tchaikov Thank you for your help.

@tchaikov (Contributor)

jenkins test make check

@tchaikov added the DNM label Jan 31, 2021
@tchaikov (Contributor)

Adding the DNM (do not merge) label, as the test failure might be related.

@tchaikov (Contributor) commented Feb 5, 2021

jenkins test make check

@fyzard1991 (Contributor Author)

jenkins test make check

@tchaikov (Contributor) commented Mar 8, 2021

jenkins test make check

@tchaikov (Contributor) left a comment:

2021-03-09T08:44:40.689 INFO:tasks.ceph.mon.a.smithi025.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-1742-g89fb622b/rpm/el8/BUILD/ceph-17.0.0-1742-g89fb622b/src/osd/OSDMapMapping.h: In function 'mempool::osdmap_mapping::vector<pg_t>& OSDMapMapping::get_osd_acting_pgs(unsigned int)' thread 7f23d5b69700 time 2021-03-09T08:44:40.688560+0000
2021-03-09T08:44:40.690 INFO:tasks.ceph.mon.a.smithi025.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-1742-g89fb622b/rpm/el8/BUILD/ceph-17.0.0-1742-g89fb622b/src/osd/OSDMapMapping.h: 325: FAILED ceph_assert(osd < acting_rmap.size())
2021-03-09T08:44:40.691 INFO:tasks.ceph.mon.a.smithi025.stderr: ceph version 17.0.0-1742-g89fb622b (89fb622b10ca2e58fe0c913a8956377a886e2ab4) quincy (dev)
2021-03-09T08:44:40.691 INFO:tasks.ceph.mon.a.smithi025.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f23e34db1d8]
2021-03-09T08:44:40.692 INFO:tasks.ceph.mon.a.smithi025.stderr: 2: /usr/lib64/ceph/libceph-common.so.2(+0x2763f2) [0x7f23e34db3f2]
2021-03-09T08:44:40.692 INFO:tasks.ceph.mon.a.smithi025.stderr: 3: (OSDMonitor::check_pg_num(long, int, int, int, std::ostream*)+0x4fc) [0x559346ba097c]
2021-03-09T08:44:40.692 INFO:tasks.ceph.mon.a.smithi025.stderr: 4: (OSDMonitor::prepare_new_pool(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int, unsigned int, unsigned long, unsigned long, float, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned long, OSDMonitor::FastReadType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream*)+0x6cd) [0x559346bdfced]
2021-03-09T08:44:40.693 INFO:tasks.ceph.mon.a.smithi025.stderr: 5: (OSDMonitor::prepare_command_impl(boost::intrusive_ptr<MonOpRequest>, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&)+0x1845a) [0x559346c07b5a]
2021-03-09T08:44:40.693 INFO:tasks.ceph.mon.a.smithi025.stderr: 6: (OSDMonitor::prepare_command(boost::intrusive_ptr<MonOpRequest>)+0xf4) [0x559346c13054]
2021-03-09T08:44:40.694 INFO:tasks.ceph.mon.a.smithi025.stderr: 7: (OSDMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0x373) [0x559346c172c3]
2021-03-09T08:44:40.694 INFO:tasks.ceph.mon.a.smithi025.stderr: 8: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0xa6d) [0x559346b98c2d]
2021-03-09T08:44:40.695 INFO:tasks.ceph.mon.a.smithi025.stderr: 9: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x2794) [0x559346a81c74]
2021-03-09T08:44:40.695 INFO:tasks.ceph.mon.a.smithi025.stderr: 10: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x7f9) [0x559346a864d9]
2021-03-09T08:44:40.696 INFO:tasks.ceph.mon.a.smithi025.stderr: 11: (Monitor::_ms_dispatch(Message*)+0x5f6) [0x559346a87766]
2021-03-09T08:44:40.696 INFO:tasks.ceph.mon.a.smithi025.stderr: 12: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c) [0x559346ab5b0c]
2021-03-09T08:44:40.696 INFO:tasks.ceph.mon.a.smithi025.stderr: 13: (DispatchQueue::entry()+0x126a) [0x7f23e37150ea]
2021-03-09T08:44:40.697 INFO:tasks.ceph.mon.a.smithi025.stderr: 14: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f23e37c2ae1]
2021-03-09T08:44:40.697 INFO:tasks.ceph.mon.a.smithi025.stderr: 15: (Thread::_entry_func(void*)+0xd) [0x7f23e35c241d]
2021-03-09T08:44:40.697 INFO:tasks.ceph.mon.a.smithi025.stderr: 16: /lib64/libpthread.so.0(+0x814a) [0x7f23e0fc114a]
2021-03-09T08:44:40.698 INFO:tasks.ceph.mon.a.smithi025.stderr: 17: clone()
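The backtrace above fails on ceph_assert(osd < acting_rmap.size()) inside OSDMapMapping::get_osd_acting_pgs(unsigned int), called from OSDMonitor::check_pg_num. A minimal sketch of a defensive lookup, assuming a simplified reverse map from OSD id to the PGs acting on it (PgList and pgs_on_osd are hypothetical names, not Ceph's API): an id outside the map is treated as hosting no PGs instead of aborting the monitor. Which id actually tripped the assert is not visible in the log; one possibility (an assumption) is a negative CRUSH bucket id, which becomes a very large value once converted to the unsigned int parameter.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-OSD reverse map (osd id -> PGs acting on it),
// loosely modeled on the acting_rmap named in the assert above.
using PgList = std::vector<int>;

// Returns how many PGs act on `osd`, or 0 if the id is negative or beyond the
// end of the map, rather than asserting osd < acting_rmap.size().
std::size_t pgs_on_osd(const std::vector<PgList>& acting_rmap, int osd) {
  if (osd < 0 || static_cast<std::size_t>(osd) >= acting_rmap.size()) {
    return 0;  // unknown or non-OSD id: contributes no PGs to the count
  }
  return acting_rmap[static_cast<std::size_t>(osd)].size();
}
```

Whether silently counting zero is the right policy depends on why the id is out of range; if it signals stale mapping state, returning an error to the caller may be preferable.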

https://pulpito.ceph.com/kchai-2021-03-09_08:22:40-rados-wip-kefu-testing-2021-03-09-1428-distro-basic-smithi/5949086/

@fyzard1991 (Contributor Author)

@tchaikov All tests have passed.

@fyzard1991 (Contributor Author)

@tchaikov Excuse me, could you please review this pull request?

@fyzard1991 (Contributor Author)

jenkins test make check

@tchaikov (Contributor) left a comment:

2021-05-01T17:27:24.670 INFO:tasks.workunit.client.0.smithi042.stderr:Error EINVAL: pool size must be between 1 and 10
2021-05-01T17:27:24.672 INFO:tasks.workunit.client.0.smithi042.stderr:+ return 0
2021-05-01T17:27:24.672 INFO:tasks.workunit.client.0.smithi042.stderr:+ ceph osd pool set foo size 3
2021-05-01T17:27:24.941 INFO:tasks.workunit.client.0.smithi042.stderr:Error ERANGE: pool id 52 pg_num 123 size 3 would mean 18446744073709551587 total pgs, which exceeds max 30000 (mon_max_pg_per_osd 10000 * num_in_osds 3)
2021-05-01T17:27:24.942 DEBUG:teuthology.orchestra.run:got remote process result: 34

https://pulpito.ceph.com/kchai-2021-05-01_14:22:40-rados-wip-kefu-testing-2021-05-01-1113-distro-basic-smithi/6088233/

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@tchaikov (Contributor)

@fyzard1991 hi Jerry, did you manage to fix the test failure?

@fyzard1991 (Contributor Author)

@tchaikov Yes. I think this test failed because the condition for counting PGs on out OSDs was written incorrectly: some PGs were left out of the count, which drove the value of projected below its lower limit.
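The number in the ERANGE message earlier in this thread, 18446744073709551587, equals 2^64 - 29, which is what an unsigned 64-bit counter holds after going 29 below zero. That is consistent with the explanation above: if more PGs are subtracted than were counted, the total wraps around instead of going negative. A minimal sketch of that arithmetic (the function names are hypothetical, not from the Ceph code):

```cpp
#include <cstdint>

// Hypothetical illustration: subtracting more than was counted wraps an
// unsigned 64-bit value modulo 2^64 instead of producing a negative number.
uint64_t projected_after_subtract(uint64_t counted, uint64_t subtracted) {
  return counted - subtracted;  // wraps when subtracted > counted
}

// Variant that clamps at zero instead of wrapping.
uint64_t projected_saturating(uint64_t counted, uint64_t subtracted) {
  return subtracted > counted ? 0 : counted - subtracted;
}
```

For example, projected_after_subtract(100, 129) yields 18446744073709551587, the same value as in the log. Clamping avoids the absurd total but can hide the miscount, so fixing the counting condition itself (as described above) is the real remedy.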

@tchaikov (Contributor)

@fyzard1991 Could you remove the merge commit from the PR? I will try to review your change this week.

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

In the check_pg_num function, find the OSDs selected by the current
pool's CRUSH rule and check whether the average number of PGs on those
OSDs would exceed 'mon_max_pg_per_osd'. This makes the pg_num check
more accurate by counting all the PGs on the OSDs used by the new pool.

Fixes: https://tracker.ceph.com/issues/47062
Signed-off-by: Jerry Luo <luojierui@chinatelecom.cn>
@tchaikov merged commit 3f52895 into ceph:master on Sep 29, 2021
@fyzard1991 (Contributor Author)

@tchaikov Thank you for your help
