New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr,librados: service map #15858

Merged
merged 14 commits into from Jul 10, 2017

Conversation

Projects
None yet
6 participants
@liewegas
Member

liewegas commented Jun 22, 2017

todo

  • mgr needs to persist the ServiceMap
  • expose ServiceMap and status to modules
  • allow librados clients to subscribe to servicemap updates
  • include services in 'ceph -s'
  • allow daemon status to be set prior to initial connect/startup?

@liewegas liewegas requested a review from jcsp Jun 22, 2017

@@ -3764,6 +3764,22 @@ int RGWRados::init_rados()
}
}
{

This comment has been minimized.

@liewegas

liewegas Jun 22, 2017

Member

@yehudasa how can i tell if we are radosgw-admin and note that accordingly in the metadata?

  1. random cli commands shouldn't register
  2. long-running radosgw-admin jobs maybe should, but they should appear as something different?

This comment has been minimized.

@mattbenjamin

mattbenjamin Jun 23, 2017

Contributor

@liewegas strongly approve #1; as regards #2, if some invocations of radosgw-admin do register, will it be confusing when those drop off (finish or are stopped)?

This comment has been minimized.

@mattbenjamin

mattbenjamin Jun 23, 2017

Contributor

we might like RGWRados to have more context--the type of instance is manifest at RGWStoreManager::get_storage(..) [rgw_main.cc, rgw_admin.cc, librgw.cc], and we'd like to be able to distinguish radosgw, nfs-ganesha instantiating librgw, and radosgw-admin, for sure

This comment has been minimized.

@yehudasa

yehudasa Jun 23, 2017

Member

@liewegas @mattbenjamin yeah, I'd go with (1), and maybe have some mechanism to explicitly set (2) when needed (e.g., when running radosgw-admin metadata sync run).

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 22, 2017

$ ceph service dump
{
    "epoch": 4,
    "modified": "2017-06-22 17:54:58.426099",
    "services": {
        "rgw": {
            "daemons": {
                "0": {
                    "start_epoch": 2,
                    "start_stamp": "2017-06-22 17:54:48.401011",
                    "gid": 4121,
                    "addr": "127.0.0.1:0/2028575351",
                    "metadata": {
                        "host": "gnit",
                        "num_handles": "\u0001",
                        "pid": "16525",
                        "zone_name": "",
                        "zonegroup_id": ""
                    }
                }
            }
        }
    }
}
$ ceph service status
{
    "rgw": {
        "0": {
            "status_stamp": "2017-06-22 17:54:48.401011",
            "status": {},
            "last_beacon": "2017-06-22 17:54:58.401271"
        }
    }
}
@liewegas

This comment has been minimized.

Member

liewegas commented Jun 23, 2017

  cluster:
    id:     9c7c1020-ab0a-4736-ab27-877af5150f74
    health: HEALTH_OK
 
  services:
    mon: 1 daemons, quorum a
    mgr: x(active)
    mds: 1/1/1 up {0=a=up:active}
    osd: 1 osds: 1 up, 1 in
    rgw: 1 active: j
 
  data:
    pools:   7 pools, 56 pgs
    objects: 47 objects, 5952 bytes
    usage:   215 GB used, 157 GB / 372 GB avail
    pgs:     56 active+clean
++p;
}
}
return changed;

This comment has been minimized.

@jcsp

jcsp Jun 23, 2017

Contributor

Maybe set pending_service_map_dirty to changed so that caller can just check that one thing instead of that and the return value.

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 23, 2017

BTW I'm just testing this with MON=1 OSD=1 MDS=1 RGW=1 ../src/vstart.sh -d -n -x -l

@dillaman

This comment has been minimized.

Contributor

dillaman commented Jun 23, 2017

Is it a typical RGW deployment where each is using a unique user id? Looks like at least ceph-ansible creates ids based on the hostname [1], so that will certainly be much longer than the single-character this is expecting.

[1] https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-rgw/tasks/pre_requisite.yml#L30

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 23, 2017

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 23, 2017

metadata["zone_name"] = zone_name;
string name = g_conf->name.get_id();
if (name.find("rgw.") == 0) {
name = name.substr(4);

This comment has been minimized.

@mattbenjamin

mattbenjamin Jun 23, 2017

Contributor

is name the attribute which should indicate that (e.g.) this is a "radosgw" or "nfs-ganesha" instance? a new field?
for nfs-ganesha, there is other info that we'd eventually be able to publish (e.g., exports)

This comment has been minimized.

@dillaman

dillaman Jun 23, 2017

Contributor

I would think the "rgw" used in service_daemon_register below is what should be changed -- name is the "unique" id for this instance of the daemon.

This comment has been minimized.

@mattbenjamin

mattbenjamin Jun 23, 2017

Contributor

@dillaman wouldn't it be useful to have some grouping, e.g.,
"rgw" {
"radosw" {
... }
"nfs" {
...}
...
}

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 26, 2017

@jcsp This moves the ephemeral status to DaemonState and makes the interaction look more like osdmap/mdsmap. The update flow is still daemon -> mgr's ServiceMap pending_service_map -> mon -> mgr's ServiceMap service_map and into DaemonState. I think this is how it needs to be, though, if we want to prevent making the daemon visible as an official part of the service until it is persisted...

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 26, 2017

@jcsp updated (altho for some reason i'm not seeing the commits at https://github.com/liewegas/ceph/commits/wip-mgr-servicemap appear here, github weirdness?).

@jcsp

This comment has been minimized.

Contributor

jcsp commented Jun 26, 2017

Github is definitely confused, commits linked on this page are not found. Will pull the branch and look locally.

@jcsp

This comment has been minimized.

Contributor

jcsp commented Jun 27, 2017

Some additions made while playing with this:
liewegas#52

In that PR I've made the services send the same common metadata that OSDs and MDSs use. The hostname from the metadata is also picked out for use in DaemonStateIndex, so that the services appear properly when people list their services by server. Now that we're inventing messages specifically for this, we should probably make hostname a first class thing rather than mixing it in with the freeform metadata though.

I'm not sure about the way mgr is subscribing to servicemap from the mons -- while the mons are the ones persisting it, I think we should think of the mgr as the authority (only writer), so it should not be listening to the map continuously, rather it should load it one time on startup similar to what it does with module config and with the osd/mds metadata. If we make that change, then the servicemap in ClusterState goes away and all that data is owned by DaemonServer/DaemonStateIndex, which feels right to me.

The output in ceph status probably shouldn't be a list of daemon names -- someone with 16 RGW gateways named after their domain names is going to find that painful. It should probably just be a count. I suspect we'll rejigger this to have a status-interpreting piece on the mgr that generates that string and any transforms the service status data into cluster health checks, at which point the completely generic piece in Monitor would go away, but just displaying the daemon count is still a sensible thing for now.

I think the interface to python should be woven in with the get_server_python/get_metadata_python piece, rather than exposing the servicemap wholesale to be walked by python -- most plugin/ui code will want to report/reason about services in general, including both the native ones and the librados ones.

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 27, 2017

I think the ClusterState piece needs to stay:

  1. It seems important/useful to distinguish between services that are persisted as part of the map so that consumers can rely on the map contents without seeing things disappear after a mgr restart (or have to wait 1s-10s of seconds for a "complete" map to be rebuilt). The subscription was the easiest way to get continuous stream to the mgr of what has been persisted (both during startup and when updates are sent to the mon--MMonMgrReport doesn't have an ack).
  2. When services start they immediately appear in DaemonState, but won't be persisted as part of the service map yet. We'd need some limbo state in DaemonState to manage which services are registered and which ones aren't.

Agree on ceph -s. I'll include a summary string in ServiceMap, and fall back to a simple daemon count?

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 27, 2017

Not sure I follow the suggestion about the get_{server,metadata}_python(). With your hostname change daemons should already appear there, right? And the top-down map view is exposed via get() the same way the other map types are.

}
r.service_daemon_register("rgw", name, metadata);
first = false;
}

This comment has been minimized.

@yehudasa

yehudasa Jun 29, 2017

Member

@liewegas as you mentioned in the comment to this commit, cannot tell if it is standalone daemon. Probably the wrong place to put this initialization. I think this should be done somewhere around the frontend(s) initialization. I'll look into it.

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 29, 2017

@yehudasa

This comment has been minimized.

Member

yehudasa commented Jun 29, 2017

@liewegas can we register, and update metadata later?

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 29, 2017

@yehudasa

This comment has been minimized.

Member

yehudasa commented Jun 29, 2017

@liewegas the tricky bit here is that all frontend initialization happens after rados related init. Flipping around initialization doesn't look like an easy task, and without having that info we wouldn't be able to provide interesting info e.g., frontends used, ports, etc. And I think being able to update info dynamically later would probably beneficial anyway.

@jcsp

This comment has been minimized.

Contributor

jcsp commented Jun 30, 2017

I guess this is a general issue -- any application that stores its config inside RADOS can't report any of that until RADOS is up.

Perhaps we should change the way we initialize MgrClient in librados, and only fire up a mgr session at the point someone asks to define their service metadata (or if they ever use rados_mgr_command). That way the metadata would always be available at mgr session start, and as a bonus we would be avoiding starting pointless mgr sessions for clients that don't need them.

As for changing the metadata at runtime... I guess it depends what we consider to be metadata. If there's something that a daemon can change at runtime, then I would call that information about the workload rather than metadata about the daemon itself -- we do have the status part for the changeable stuff.

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 30, 2017

Changing the MgrClient start time sounds good to me. I'll give that a go today.

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 30, 2017

@yehudasa ok, the service_daemon_register can happen at any time now. (the sample patch just moved it down after connect() to demonstrate but presumably it'll go elsewhere.)

metadata is still fixed for the lifetime of the librados instance, but remember there's service_daemon_status() for the stuff that changes.

@yehudasa

This comment has been minimized.

Member

yehudasa commented Jun 30, 2017

@liewegas thanks!

@liewegas

This comment has been minimized.

Member

liewegas commented Jul 1, 2017

retest this please

@liewegas liewegas modified the milestone: luminous Jul 6, 2017

liewegas and others added some commits Jun 22, 2017

mgr: index daemon state by string service type
If we use a string we can allow for other service names like
'rgw' and 'rbd-mirror' and 'iscsigw' and so on.

Signed-off-by: Sage Weil <sage@redhat.com>
vstart.sh: give rgw daemons letter names
Signed-off-by: Sage Weil <sage@redhat.com>
mgr: add ServiceMap
Signed-off-by: Sage Weil <sage@redhat.com>
mon: persist ServiceMap
Signed-off-by: Sage Weil <sage@redhat.com>
mon: include services in 'ceph -s'
Signed-off-by: Sage Weil <sage@redhat.com>
mgr: allow/track service registrations
Signed-off-by: Sage Weil <sage@redhat.com>
mgr: implement 'service dump' and 'service status'
Signed-off-by: Sage Weil <sage@redhat.com>
librados: allow service registrations
Signed-off-by: Sage Weil <sage@redhat.com>
mgr/PyModules: expose service_map to modules
Signed-off-by: Sage Weil <sage@redhat.com>
mgr: expose daemon status to modules
Signed-off-by: Sage Weil <sage@redhat.com>
librados: include common daemon metadata in service_daemon_register
Aside from being generally useful, this uniformity enables ceph-mgr
to have some common information about all the hosts
it knows about; otherwise we would sometimes learn
about a host without e.g. knowing about the cpu/ram/kernel.

Signed-off-by: John Spray <john.spray@redhat.com>
mgr: set hostname propery on ServiceMap services
Signed-off-by: John Spray <john.spray@redhat.com>
mon: make service summary string customizable; simple default
Eventually the mgr can populate this field with something tailored to the
service.

Signed-off-by: Sage Weil <sage@redhat.com>
rgw: register as a service
support dynamic reload, and also add frontend info

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
@liewegas

This comment has been minimized.

Member

liewegas commented Jul 10, 2017

@jcsp i think this is ready to merge!

@jcsp

jcsp approved these changes Jul 10, 2017

@jcsp jcsp merged commit 44bce9e into ceph:master Jul 10, 2017

4 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
make check make check succeeded
Details
make check (arm64) make check succeeded
Details
@ukernel

This comment has been minimized.

Member

ukernel commented Jul 12, 2017

 ceph version 12.0.3-2915-ge27477e (e27477ef0bccf4c2b41f13cf732f0b6e883ccc7b) luminous (rc)
 1: (()+0x83e074) [0x555555d92074]
 2: (()+0x115b0) [0x7ffff73905b0]
 3: (gsignal()+0x9f) [0x7ffff45b38df]
 4: (abort()+0x16a) [0x7ffff45b54da]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7ffff4ef552d]
 6: (()+0x8f2d6) [0x7ffff4ef32d6]
 7: (()+0x8f321) [0x7ffff4ef3321]
 8: (()+0x8f539) [0x7ffff4ef3539]
 9: (()+0x53e772) [0x555555a92772]
 10: (()+0x54810d) [0x555555a9c10d]
 11: (ServiceMap::decode(ceph::buffer::list::iterator&)+0x3e) [0x555555b63ece]
 12: (MgrStatMonitor::update_from_paxos(bool*)+0x268) [0x555555a8b2c8]
 13: (PaxosService::refresh(bool*)+0x3ff) [0x5555559ba88f]
 14: (Monitor::refresh_from_paxos(bool*)+0x1a3) [0x555555898ac3]
 15: (Monitor::init_paxos()+0x11d) [0x555555898f1d]
 16: (Monitor::preinit()+0x95e) [0x5555558b1c2e]
 17: (main()+0x43fe) [0x5555557dca5e]
 18: (__libc_start_main()+0xf1) [0x7ffff459e401]
 19: (_start()+0x2a) [0x55555586816a]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.

commit "mon: persist ServiceMap" causes monitor to crash.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment