New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mon/MonClient: hunt monitors in parallel #11128

Merged
merged 7 commits into from Feb 14, 2017

Conversation

Projects
None yet
4 participants
@tchaikov
Contributor

tchaikov commented Sep 19, 2016

@athanatos

This comment has been minimized.

Contributor

athanatos commented Sep 19, 2016

LGTM!

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Sep 21, 2016

changelog

  • rebase against master,
  • always clear pending_cons in _reopen_session().
@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Sep 23, 2016

rebase against master for a clean jenkins run.

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Sep 26, 2016

changelog:

  • keep an "auth" pointer around so we can get the authorizer as requested by objecter when needed
  • do not assert(_opened()) in ms_dispatch().

@tchaikov tchaikov removed the needs-qa label Sep 26, 2016

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Sep 26, 2016

@yuriw, i will test this PR in my branch, it still fails some tests.

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Sep 27, 2016

jenkins, retest this please

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Oct 12, 2016

changelog

  • rebase against master
  • more cleanup
  • the MonClient::auth should outlive its active_conn, as the reset event could reset all active mon connections, but the objecter (or the owner of MonClient) still expect it to offer the AuthAuthorizer when connecting to a peer, an OSD for example.
@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Oct 12, 2016

retest this please.

@athanatos

This comment has been minimized.

Contributor

athanatos commented Oct 17, 2016

@tchaikov Looks like this has merge conflicts again :(

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Oct 19, 2016

changelog

  • rebased against master

@tchaikov tchaikov changed the title from mon/MonClient: hunt monitors in parallel to [DNM] mon/MonClient: hunt monitors in parallel Oct 21, 2016

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Oct 21, 2016

@ghost

This comment has been minimized.

ghost commented Nov 17, 2016

jenkins test this please (no logs)

@ghost

This comment has been minimized.

ghost commented Nov 21, 2016

jenkins test this please (eio now ignored in master)

@ghost

This comment has been minimized.

ghost commented Nov 22, 2016

jenkins test this please (fast mark down now fixed in master)

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Nov 23, 2016


2016-11-23 09:42:52.091559 7ff843cd8740 -1 asok(0x29c82f0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/home/jenkins-build/build/workspace/ceph-pull-requests/build/src/test/td/t-7200/out/client.admin.7555.asok': (17) File exists
2016-11-23 09:42:52.130459 7ff7ef652700  0 cephx: no TicketHandler for service mds
2016-11-23 09:42:52.130466 7ff7ef652700  0 cephx: no TicketHandler for service mds
2016-11-23 09:42:52.130520 7ff7ef652700  0 -- 127.0.0.1:0/2054089483 >> 127.0.0.1:6848/560830648 conn(0x2a97b10 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER
2016-11-23 09:42:52.130538 7ff7ef652700  0 cephx: no TicketHandler for service mds

4 - cephtool-test-mds.sh (Timeout)

retest this please.

@tchaikov tchaikov changed the title from [DNM] mon/MonClient: hunt monitors in parallel to mon/MonClient: hunt monitors in parallel Dec 1, 2016

@tchaikov tchaikov assigned athanatos and unassigned athanatos Dec 1, 2016

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Dec 1, 2016

will run it through the rados qa suite first.

@tchaikov tchaikov requested a review from jdurgin Feb 7, 2017

@tchaikov tchaikov changed the title from mon/MonClient: hunt monitors in parallel to [DNM] mon/MonClient: hunt monitors in parallel Feb 8, 2017

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Feb 8, 2017

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1  0x00007f99f575540a in __GI_abort () at abort.c:89
#2  0x00007f99f9eea710 in CephxClientHandler::handle_response (this=0x7f99c803bcf0, ret=0, indata=...) at /var/ceph/ceph/src/auth/cephx/CephxClientHandler.cc:195
#3  0x00007f99f9b441bb in MonConnection::authenticate (this=0x7f99bc002468, m=0x7f99e4002930) at /var/ceph/ceph/src/mon/MonClient.cc:1252
#4  0x00007f99f9b4387d in MonConnection::handle_auth (this=0x7f99bc002468, m=0x7f99e4002930, entity_name=..., want_keys=21, keyring=0x55ed95436f00)
    at /var/ceph/ceph/src/mon/MonClient.cc:1195
#5  0x00007f99f9b3befd in MonClient::handle_auth (this=0x55ed953f5fc8, m=0x7f99e4002930) at /var/ceph/ceph/src/mon/MonClient.cc:491
#6  0x00007f99f9b397f7 in MonClient::ms_dispatch (this=0x55ed953f5fc8, m=0x7f99e4002930) at /var/ceph/ceph/src/mon/MonClient.cc:270
#7  0x00007f99f9b7b4ed in Messenger::ms_deliver_dispatch (this=0x55ed954178a0, m=0x7f99e4002930) at /var/ceph/ceph/src/msg/Messenger.h:602
#8  0x00007f99f9b7a2e4 in DispatchQueue::entry (this=0x55ed95417a18) at /var/ceph/ceph/src/msg/DispatchQueue.cc:197
#9  0x00007f99f9c92a2a in DispatchQueue::DispatchThread::entry (this=0x55ed95417bc0) at /var/ceph/ceph/src/msg/DispatchQueue.h:102
#10 0x00007f99f9dc3a55 in Thread::entry_wrapper (this=0x55ed95417bc0) at /var/ceph/ceph/src/common/Thread.cc:89
#11 0x00007f99f9dc398a in Thread::_entry_func (arg=0x55ed95417bc0) at /var/ceph/ceph/src/common/Thread.cc:69
#12 0x00007f9a029a0424 in start_thread (arg=0x7f99e9ffb700) at pthread_create.c:333
#13 0x00007f99f58099bf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105

not ready for review yet

@tchaikov tchaikov changed the title from [DNM] mon/MonClient: hunt monitors in parallel to mon/MonClient: hunt monitors in parallel Feb 11, 2017

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Feb 11, 2017

tested at http://pulpito.ceph.com/kchai-2017-02-11_07:28:54-rados-wip-16091-monc-in-parallel---basic-smithi/, all tests passed. the failed ones are :

@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Feb 11, 2017

@jdurgin could you take a look at this one? thanks!

@@ -24,28 +26,20 @@
#include "common/config.h"
#include "auth/AuthClientHandler.h"
#include "auth/RotatingKeyRing.h"
#include "common/SimpleRNG.h"

This comment has been minimized.

@jdurgin

jdurgin Feb 14, 2017

Member

this is unused now - can get rid of the header

@@ -362,20 +398,19 @@ class MonClient : public Dispatcher {
}
uint64_t get_global_id() const {
return global_id;
if (active_con) {

This comment has been minimized.

@jdurgin

jdurgin Feb 14, 2017

Member

looks like we need to take monc_lock here now, to protect accessing active_con

@@ -111,14 +107,17 @@ int MonClient::get_monmap_privately()
ldout(cct, 10) << "have " << monmap.epoch << " fsid " << monmap.fsid << dendl;
std::random_device rd;

This comment has been minimized.

@jdurgin

jdurgin Feb 14, 2017

Member

let's use random_r() to avoid seeding issues with this (http://www.pcg-random.org/posts/cpps-random_device.html)

This comment has been minimized.

@tchaikov

tchaikov Feb 14, 2017

Contributor

@jdurgin, i don't think we need a rng emitting a number sequence different from each other in each MonClient.

This comment has been minimized.

@jdurgin

jdurgin Feb 14, 2017

Member

what happens to our reconnect attempts when std::random_device keeps returning the same sequence of monitors, such that the first mon_client_hunt_parallel of the chosen sequence are unreachable?

same issue in _add_conns()

This comment has been minimized.

@tchaikov

tchaikov Feb 14, 2017

Contributor

@jdurgin

#include <random>
#include <vector>
#include <iostream>
#include <algorithm>

using namespace std;

void print_rng(int n)
{
  vector<unsigned> ranks(n);
  for (unsigned i = 0; i < n; i++) {
    ranks[i] = i;
  }
  std::random_device rd;
  std::mt19937 rng(rd());
  std::shuffle(ranks.begin(), ranks.end(), rng);
  for (unsigned i = 0; i < n; i++) {
    cout << ranks[i] << ", ";
  }
  cout << endl;
}

int main()
{
  for (int i = 0; i < 10; i++) {
    print_rng(10);
  }
}
[kchai@rex001 tmp]$ ./a.out
2, 9, 7, 0, 6, 5, 8, 3, 1, 4,
6, 0, 9, 3, 2, 5, 1, 4, 7, 8,
6, 2, 7, 5, 4, 0, 3, 9, 8, 1,
2, 9, 0, 5, 8, 1, 3, 7, 6, 4,
5, 1, 9, 6, 2, 8, 3, 7, 4, 0,
7, 3, 8, 5, 1, 4, 0, 2, 6, 9,
2, 4, 9, 6, 8, 1, 0, 5, 7, 3,
4, 6, 3, 2, 5, 9, 0, 1, 8, 7,
9, 4, 1, 8, 0, 2, 3, 6, 7, 5,
4, 7, 2, 1, 0, 8, 5, 6, 9, 3,
[kchai@rex001 tmp]$ ./a.out
8, 7, 2, 5, 4, 3, 0, 1, 9, 6,
4, 1, 8, 6, 5, 9, 2, 3, 0, 7,
6, 7, 2, 9, 0, 8, 1, 5, 3, 4,
7, 1, 0, 8, 5, 4, 2, 9, 6, 3,
0, 9, 1, 6, 5, 2, 4, 7, 8, 3,
5, 2, 8, 4, 6, 9, 0, 7, 1, 3,
9, 3, 6, 0, 1, 7, 5, 4, 2, 8,
4, 9, 6, 5, 8, 0, 3, 1, 2, 7,
0, 2, 7, 3, 5, 1, 9, 4, 6, 8,
6, 3, 9, 0, 4, 1, 5, 2, 7, 8,
[kchai@rex001 tmp]$ gcc --version
gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

also, i tested this with gcc 5.4.1, gcc 6.3, clang 3.9 and clang 4.0, all of them are generating different sequences in each iteration in the for loop, and in each run. so std::random_device is able to generate non-deterministic random number sequence.

the article you are referencing states that

C++11 random number generators can be initialized either with a single integer or with an object that matches the C++11 Seed Sequence concept.

right. i am using a single integer returned by std::random_device::operator() for creating this "single integer". and this "single integer" is used to seed the mt19937 random number engine. let's look at how "random" the std::random_device is. std::random_device uses the rdrand instruction if available. see https://en.wikipedia.org/wiki/RdRand#Compilers. this matches with my finding testing with different versions of compilers.

It Might Actually Be Deterministic

yes, it's scaring. but the point is "Might". right, libstdc++ always returns 0.0. maybe because it just cannot tell the entropy of rdrand instruction, but definitely not because it is "deterministic". the stackoverflow post referenced by that article applies to the gcc on MingW, which is not supported by us.

This comment has been minimized.

@jdurgin

jdurgin Feb 14, 2017

Member

What's the downside to using a known good method? I realize that this is fine for our typical linux user, but it seems easy to avoid potential bugs in ports to other platforms (e.g. arm, mips, running on an ethernet drive), or using a different stdlib, etc. I see no reason to rely on potentially deterministic behavior when there's a simple alternative.

This comment has been minimized.

@tchaikov

tchaikov Feb 14, 2017

Contributor

@jdurgin

What's the downside to using a known good method?

i think it would be better if we can maintain less code by leveraging existing and well-documented / tested library.

it seems easy to avoid potential bugs in ports to other platforms

on platforms w/o rdrand, libstdc++ reads from /dev/urandom instead so we already avoid potential bugs by using std::random_device, right?

libc++ use arc4random() or nacl_secure_random() or /dev/urandom or rand_s() on windows with different settings. all of them are non-deterministic.

or using a different stdlib

i will try my best to understand the behavior of the supported stdlib as long as we care about it.

josh, i do care about portability just like you do, but i also want to avoid re-inventing the wheel to achieve this. even if the wheel is already in our arsenal.

This comment has been minimized.

@jdurgin

jdurgin Feb 14, 2017

Member

I wasn't suggesting using a non-library function, but this isn't worth holding up this pr for.

This comment has been minimized.

@tchaikov

tchaikov Feb 14, 2017

Contributor

@jdurgin this scares me:

-/*
- * rand() is not thread-safe.  random_r family segfaults.
- * boost::random::* have build issues.
- */

quote from src/common/SimpleRNG.h.

@@ -1111,3 +1128,137 @@ void MonClient::handle_get_version_reply(MMonGetVersionReply* m)
}
m->put();
}
AuthAuthorizer* MonClient::build_authorizer(int service_id) const {
assert(auth || active_con->get_auth());

This comment has been minimized.

@jdurgin

jdurgin Feb 14, 2017

Member

need monc_lock here for using active_con

@@ -169,12 +202,18 @@ class MonClient : public Dispatcher {
double reopen_interval_multiplier;
string _pick_random_mon();

This comment has been minimized.

@jdurgin

jdurgin Feb 14, 2017

Member

this method is deleted in the implementation file

tchaikov added some commits Sep 14, 2016

auth: AuthClientHandler::init() pass parameter by const ref
Signed-off-by: Kefu Chai <kchai@redhat.com>
mon/monclient: hunt for multiple monitor in parallel
* add an option "mon_client_hunt_parallel" for the maxmimum number of parallel
  hunting sessions.

Fixes: http://tracker.ceph.com/issues/16091
Signed-off-by: Steven Dieffenbach <sdieffen@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
mon/MonClient: use __func__ for function names
Signed-off-by: Steven Dieffenbach <sdieffen@redhat.com>
mon/MonClient: remove unnecessary include
Signed-off-by: Kefu Chai <kchai@redhat.com>
mon/MonClient: remove unnecessary helper functions
refactor _reopen_session() by removing wrapper around it.

Signed-off-by: Kefu Chai <kchai@redhat.com>
client: move monc->set_want_keys() before monc->init()
if monc's tick connect to the mon before monc.set_want_keys() is called,
monc won't ask for the key for MDS service, and hence will fail to
build_authorizer() for MDS service. this change ready us for the
feature of monc-connect-to-mon-in-parallel.

Signed-off-by: Kefu Chai <kchai@redhat.com>
mon/MonClient: mark monc_lock a mutable
so we can label the getters of MonClient with the `const` specifier.

Signed-off-by: Kefu Chai <kchai@redhat.com>
@tchaikov

This comment has been minimized.

Contributor

tchaikov commented Feb 14, 2017

@jdurgin, fixed, rebased against master and repushed

@jdurgin

This comment has been minimized.

Member

jdurgin commented Feb 14, 2017

the rest looks good, nice cleanups along the way too!

@jdurgin jdurgin merged commit 4961f2e into ceph:master Feb 14, 2017

2 of 3 checks passed

default Build finished.
Details
Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details

@tchaikov tchaikov deleted the tchaikov:wip-16091 branch Feb 14, 2017

@tchaikov tchaikov referenced this pull request Aug 21, 2018

Merged

auth,common: add lockless auth #23591

0 of 3 tasks complete

tchaikov added a commit to tchaikov/ceph that referenced this pull request Aug 23, 2018

auth: drop the RWLock in AuthClientHandler
the `RWLock` in `AuthClientHandler` was introduced in ceph#1636. back to
then, we were accessing `MonClient::auth` directly, see
ceph@a2eb6ae#diff-fa6c2eba8356ae1442d1bf749beacfdfL6209
. and in ceph#11128, we added a helper method offering accessing to
`MonClient::auth`. this method guards the access to `auth` using
`monc_lock`.

and there is no other users outside of `MonClient` using
`AuthClientHandler`. and all accesses to `MonClient::auth` are now
protected by `monc_lock`.

Signed-off-by: Kefu Chai <kchai@redhat.com>

@tchaikov tchaikov referenced this pull request Aug 23, 2018

Merged

auth: drop the RWLock in AuthClientHandler #23699

0 of 3 tasks complete

tchaikov added a commit to tchaikov/ceph that referenced this pull request Aug 23, 2018

auth: drop the RWLock in AuthClientHandler
the `RWLock` in `AuthClientHandler` was introduced in ceph#1636. back to
then, we were accessing `MonClient::auth` directly, see
ceph@a2eb6ae#diff-fa6c2eba8356ae1442d1bf749beacfdfL6209
. and in ceph#11128, we added a helper method offering accessing to
`MonClient::auth`. this method guards the access to `auth` using
`monc_lock`.

and there is no other users outside of `MonClient` using
`AuthClientHandler`. and all accesses to `MonClient::auth` are now
protected by `monc_lock`.

Signed-off-by: Kefu Chai <kchai@redhat.com>

tchaikov added a commit to tchaikov/ceph that referenced this pull request Aug 23, 2018

auth: drop the RWLock in AuthClientHandler
the `RWLock` in `AuthClientHandler` was introduced in ceph#1636. back to
then, we were accessing `MonClient::auth` directly, see
ceph@a2eb6ae#diff-fa6c2eba8356ae1442d1bf749beacfdfL6209
. and in ceph#11128, we added a helper method offering accessing to
`MonClient::auth`. this method guards the access to `auth` using
`monc_lock`.

and there is no other users outside of `MonClient` using
`AuthClientHandler`. and all accesses to `MonClient::auth` are now
protected by `monc_lock`.

Signed-off-by: Kefu Chai <kchai@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment