
Service Discovery in BOSH Background

James Bayer edited this page Nov 10, 2013 · 1 revision

These discussions originated with some members of the BOSH team talking about SkyDNS and whether it could be relevant to BOSH.

Chris Brown

To some extent, yes, this would introduce more variability to a system. Jobs wouldn't be able to rely on knowing everything about the services that they are related to at deploy time anymore. The question is whether they ever needed to or whether they could get away with being able to interact with services through information that they could discover. There is already some variability in BOSH deployments because we allow dynamic IPs to be claimed by jobs that don't have static IPs.

BOSH's current approach of statically defining everything is great and one of my favourite features. In fact, our entire technology stack, from the health monitor in BOSH, through the health manager in Cloud Foundry, to the reaping of orphan services in the service brokers, takes this approach of defining a state that the world should be in and then aggressively changing the world to match that definition without human interaction. It's a fantastic pattern, and I think that any way we can help BOSH accomplish it is worth looking into: although BOSH is statically defined, the world that it lives in unfortunately is not.

The most important part of service discovery would be that if BOSH detected that a service was being affected by an outage, it could seamlessly try to recover the service while presenting the new URL for the recovered service to the rest of the system. It could also allow different nodes in the cluster to be routed to different services based on locality requirements. Machines in a certain AZ could be pointed at the corresponding service in the same AZ without resorting to a messy solution such as a new job with a different properties configuration per AZ. Additionally, if an ephemeral service became degraded in one AZ, BOSH could automatically advertise another one that was working in an adjacent AZ, even if the performance would be slightly suboptimal. The feature may also enable the service to be upgraded out of band of the deployments which use it, but I haven't thought enough about this to be able to comment on it.
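The locality idea above can be sketched in a few lines. This is only an illustration, not anything BOSH does today: the `Endpoint` type, the `pick_endpoint` function, the AZ names, and the URLs are all hypothetical, and it assumes the director already knows which endpoints are healthy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """A hypothetical advertised service endpoint."""
    url: str
    az: str
    healthy: bool


def pick_endpoint(endpoints, client_az, adjacent_azs):
    """Prefer a healthy endpoint in the client's AZ, then adjacent AZs in order.

    Returns None when no healthy endpoint exists in any of those AZs.
    """
    for az in [client_az, *adjacent_azs]:
        for endpoint in endpoints:
            if endpoint.az == az and endpoint.healthy:
                return endpoint
    return None


# Hypothetical example data: the local endpoint in az1 is degraded.
endpoints = [
    Endpoint("memcache://10.0.1.5:11211", "az1", healthy=False),
    Endpoint("memcache://10.0.2.5:11211", "az2", healthy=True),
]

chosen = pick_endpoint(endpoints, client_az="az1", adjacent_azs=["az2"])
print(chosen.url if chosen else "no healthy endpoint")
```

A machine in `az1` falls back to the `az2` endpoint here because its local one is marked unhealthy, which is the "slightly suboptimal but working" case described above.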

For me, service discovery would also be a method of sharing functionality between releases. There has been talk for a long time of being able to easily share release artifacts (packages and/or jobs) between different releases, so that every time you need a service that has been made before you don't need to copy and paste it into your new release. Service oriented architectures are, for the most part, a sensible pattern, and to push that opinion across we could enable and encourage them with BOSH.

This is easiest to explain with an example. If we had a distributed system that required a large, high availability Memcache cluster, we might not want to complicate our main release with all the knowledge of how to run, scale, and orchestrate this cluster. Instead, we should look at it as just an interface that needs to be met. In a pokey little development environment in BOSH-lite this could be a single node instance, whereas in production it could be multiple levels of load balancing sitting in front of thousands of Memcache nodes. We don't really care, as long as there is a URL that the rest of our distributed system can use to talk to something that quacks like Memcache.

A release or service could advertise the services it provides while listing the services it requires in order to run. At deploy time these dependencies could be resolved: the deployment could ask the BOSH director for the URL of the Memcache service, which would refer to another deployment on that same director. This way a release would be a composable tree of services that could be worked on and improved independently, as long as they didn't break their contract.
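As a rough sketch of that deploy-time resolution, the snippet below matches each required service against what other deployments advertise. Everything here is assumed for illustration: the `provides`/`requires` keys, the deployment names, the URL, and the `resolve_requirements` function are hypothetical and do not describe an existing director API.

```python
def resolve_requirements(deployments, consumer):
    """Map each service the consumer requires to a URL advertised elsewhere.

    `deployments` is a hypothetical registry: deployment name -> dict with
    optional "provides" (service name -> URL) and "requires" (list of names).
    """
    advertised = {}
    for name, spec in deployments.items():
        if name == consumer:
            continue  # a deployment does not satisfy its own requirements here
        for service, url in spec.get("provides", {}).items():
            # First provider wins; a real system would need a tie-break policy.
            advertised.setdefault(service, url)

    required = deployments[consumer].get("requires", [])
    missing = [service for service in required if service not in advertised]
    if missing:
        raise LookupError("no deployment provides: " + ", ".join(missing))
    return {service: advertised[service] for service in required}


# Hypothetical registry held by the director.
deployments = {
    "memcache-cluster": {"provides": {"memcache": "memcache://10.0.3.10:11211"}},
    "main-app": {"requires": ["memcache"]},
}

print(resolve_requirements(deployments, "main-app"))
```

Failing the deploy when a requirement has no provider keeps the contract explicit: the consumer only ever sees a URL, never the details of how the providing deployment is built.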

Admittedly, the above problem could be solved right now by specifying, in the deployment manifest, the URL of a service provided by a separate deployment. There would be no need for a service discovery feature in the director for this to work. A concern that I have about the above is that it starts to bend the outer/inner shell boundary a little too much for my liking.

The challenges mentioned above are not problems that have been encountered so far in Cloud Foundry, though, so even if this feature were on the roadmap I could see it being very, very low priority. However, I have heard murmurings of possibly splitting cf-release up into different releases that work together.

Anyway, that's enough [inline image] from me. I'm not sure if what I've written above is a good idea but it's something that I've been thinking about for a while. I don't actually have a clue what I'm talking about and most of this is me dreaming about features that I'd love to see years down the line.

Matthew Kocher

You bring up a lot of good points about releases providing services, which I think is a really good way of looking at this. CF Release currently provides a runtime service that's available at api.#{system_domain}. It relies on two instances of a relational database service, and a blobstore service. Right now, aws bootstrap provides these, and it's a fairly broken abstraction. If you squint enough, it begins to look a lot like User Provided Services in Cloud Controller.
