Proposal: VM consolidation, either before or after CC-Bridge bypass #173

Closed
emalm opened this issue Jul 3, 2017 · 7 comments

Comments

@emalm
Member

emalm commented Jul 3, 2017

After one applies the CC-Bridge bypass operations file to cf-deployment, there are only a few services running on both the cc-bridge and diego-brain instance groups (ignoring the omnipresent consul and metron agents):

  • cc-bridge: cc_uploader and tps_watcher
  • diego-brain: auctioneer, file_server, and ssh_proxy

In aggregate, these services consume few computational and network resources and so could be co-located on existing cf-deployment VMs to reduce the total amount of IaaS resources required to deploy Cloud Foundry.

Of these five services, the auctioneer and tps_watcher each claim a service-specific lock to determine the active instance, so there is no benefit to deploying more than one instance per availability zone: one per AZ is enough to tolerate an AZ failure. This active-standby coordination pattern is similar to that of the cloud_controller_clock job on the cc-clock instance group. The cc_uploader, file_server, and ssh_proxy services, by contrast, are all simultaneously active, so they are better suited to scaling horizontally within an AZ, similar to the cloud_controller_ng service on the api instance group or other effectively stateless services in CF.
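
To make the active-standby pattern concrete, here is a minimal Python sketch. It is only an illustration: the `ServiceLock` class is a hypothetical in-memory stand-in for whatever lock service the real jobs use, and the instance names are made up. It shows why extra instances within an AZ add nothing: only the lock holder does work, and a standby takes over once the lock is released.

```python
import threading


class ServiceLock:
    """Toy in-memory stand-in for a service-specific lock (hypothetical)."""

    def __init__(self):
        self._guard = threading.Lock()
        self.holder = None

    def try_acquire(self, instance_id):
        # The first claimant gets the lock; the current holder keeps it.
        with self._guard:
            if self.holder is None:
                self.holder = instance_id
            return self.holder == instance_id

    def release(self, instance_id):
        # Only the current holder can give the lock up.
        with self._guard:
            if self.holder == instance_id:
                self.holder = None


def active_instances(lock, instance_ids):
    """Return the instances that would do work: those holding the lock."""
    return [i for i in instance_ids if lock.try_acquire(i)]


lock = ServiceLock()
instances = ["auctioneer/z1", "auctioneer/z2", "auctioneer/z3"]
print(active_instances(lock, instances))  # ['auctioneer/z1']

# If the AZ hosting the active instance fails, a standby takes over.
lock.release("auctioneer/z1")
print(active_instances(lock, instances[1:]))  # ['auctioneer/z2']
```

With one instance per AZ, at most one is active at any time, so the others serve purely as failover capacity.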

With these properties in mind, we propose the following set of changes:

  • move the auctioneer and tps_watcher jobs to the cc-clock instance group,
  • move the cc_uploader and file_server jobs to the api instance group,
  • move the ssh_proxy job to the router instance group,
  • eliminate the cc-bridge and diego-brain instance groups entirely.
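
The placement rule behind these changes can be sketched in Python. This is an illustrative model, not a BOSH manifest or operations file: the dictionaries and the `target_group` and `consolidate` helpers are hypothetical, and only the five jobs being moved are listed (jobs already on the target instance groups are omitted).

```python
# Scaling pattern of each job, as described in the proposal text.
SCALING = {
    "auctioneer": "active-standby",
    "tps_watcher": "active-standby",
    "cc_uploader": "horizontal",
    "file_server": "horizontal",
    "ssh_proxy": "horizontal",
}

# Layout after the CC-Bridge bypass, before consolidation.
BEFORE = {
    "cc-bridge": ["cc_uploader", "tps_watcher"],
    "diego-brain": ["auctioneer", "file_server", "ssh_proxy"],
}


def target_group(job):
    """Pick the destination instance group for a job."""
    # ssh_proxy goes with the routers because it routes SSH traffic.
    if job == "ssh_proxy":
        return "router"
    # Active-standby jobs join cc-clock; horizontally scaled jobs join api.
    return "cc-clock" if SCALING[job] == "active-standby" else "api"


def consolidate(before):
    """Regroup every job by its destination instance group."""
    after = {}
    for jobs in before.values():
        for job in jobs:
            after.setdefault(target_group(job), []).append(job)
    return {group: sorted(jobs) for group, jobs in after.items()}


AFTER = consolidate(BEFORE)
for group in sorted(AFTER):
    print(group, AFTER[group])
```

Under these assumptions the cc-bridge and diego-brain groups no longer appear in the result, which matches the last bullet above.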

Moving the ssh_proxy to the router instance group is also consistent with the proxy's function as a router for the SSH protocol and is already so consolidated in the BOSH-Lite operations file. This configuration would also simplify the network topology required for HAProxy to balance load across each service and reduce the number of static IPs required when an external load-balancer such as an F5 handles traffic for the gorouters and the SSH-Proxy instances.

If this consolidation is done before the CC-Bridge bypass is complete, the remaining bridge jobs fit the same scheme. The stager, nsync_listener, and tps_listener are horizontally scalable in the same way as the cc_uploader and could be placed on the api instance group. The nsync_bulker uses the active-standby coordination pattern, which suits it for co-location on the cc-clock instance group.

This proposal is merely a starting point for discussion. In particular, we eagerly solicit feedback from @zrob, @Gerg, and the rest of the CAPI team, as well as from @jvshahid and the rest of the Diego team. It may also make sense to rename the cc-clock instance group to reflect the multitude of services it would now host.

Thanks,
Eric

@cf-gitbot

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/148265715

The labels on this github issue will be updated when the story is started.

@dsabeti
Contributor

dsabeti commented Jul 3, 2017

Thanks for starting this thread, @ematpl. We love to get input from release teams about tips for colocating jobs effectively and meaningfully.

My initial thoughts:

  • I really like the idea of colocating the ssh_proxy with the router. It seems like they have similar concerns.
  • Moving the cc_uploader and file_server to the api VM also seems pretty logical.
  • It seems like you've outlined a good functional reason for colocating the auctioneer and tps_watcher with cc-clock, but there's no intrinsic reason to bind those jobs together. What would you name that job to describe its responsibilities and/or the services deployed to it?

@emalm
Member Author

emalm commented Jul 3, 2017

  • It seems like you've outlined a good functional reason for colocating the auctioneer and tps_watcher with cc-clock, but there's no intrinsic reason to bind those jobs together. What would you name that job to describe its responsibilities and/or the services deployed to it?

Yeah, I'm struggling with a good name to encapsulate the responsibilities of the services on that VM: the cc-clock and nsync_bulker both do periodic synchronization of CF apps and tasks between CC and Diego, the auctioneer places new app and task instances on cells, and the tps_watcher watches the Diego API for instance crashes and submits them to CC. So there's not a consistent theme of responsibilities or interaction patterns across them: the only real commonality is their scaling characteristics, but so far I haven't thought of a good name to describe that property. To be fair, the diego-brain instance group already has this junk-drawer-like composition without even the scaling-behavior consistency, so we could continue to use that vague metaphor and call this instance group simply the brain. (Mnemonic: you want only one "lobe" per AZ.)

@dsabeti
Contributor

dsabeti commented Jul 5, 2017

@anEXPer @staylor14 are good at naming things, maybe they can help.

This proposal seeks to consolidate some VMs based on their purposes and scaling characteristics. One instance_group is a collection of things that scale the same way, but otherwise don't seem to have a ton in common. Is there a meaningful name we can give this instance_group? It contains cc-clock, nsync_bulker, auctioneer, and tps_watcher.

@emalm
Member Author

emalm commented Jul 27, 2017

@dsabeti @anEXPer what if we called that last instance group scheduler? That does describe the auctioneer's role, and has some alignment with the periodic operation of the cc-clock and nsync-bulker jobs.

@dsabeti
Contributor

dsabeti commented Jul 27, 2017

@ematpl I think that's as close as we're going to get to a reasonable name. Let's go with that.

emalm added a commit to emalm/cf-deployment that referenced this issue Aug 10, 2017
- Moves ssh-proxy to router
- Moves cc_uploader to api
- Moves file_server to api
- Consolidates remaining jobs from cc-bridge, cc-clock, and diego-brain (cloud_controller_clock; nsync and tps; auctioneer) instance-groups on new scheduler instance-group
emalm added a commit to emalm/cf-deployment that referenced this issue Aug 10, 2017
- Moves ssh_proxy job to router instance-group
- Moves cc_uploader job to api instance-group
- Moves file_server job to api instance-group
- Consolidates remaining jobs from cc-bridge, cc-clock, and diego-brain (cloud_controller_clock; nsync and tps; auctioneer) instance-groups on new scheduler instance-group
emalm added a commit to emalm/cf-deployment that referenced this issue Aug 10, 2017
- Moves ssh_proxy job to router instance-group
- Moves cc_uploader job to api instance-group
- Moves file_server job to api instance-group
- Consolidates remaining jobs from cc-bridge, cc-clock, and diego-brain instance-groups (cloud_controller_clock; nsync and tps; auctioneer) on new scheduler instance-group
emalm added 18 more commits to emalm/cf-deployment that referenced this issue between Aug 10 and Sep 19, 2017 (Aug 10; Aug 15; Aug 16, four times; Aug 19; Aug 23; Aug 24; Aug 29, twice; Aug 31; Sep 7; Sep 11; Sep 12; Sep 15, twice; Sep 19), each with the same message:
- Moves ssh_proxy job to router instance-group
- Moves cc_uploader job to api instance-group
- Moves file_server job to api instance-group
- Consolidates remaining jobs from cc-bridge, cc-clock, and diego-brain instance-groups (cloud_controller_clock; nsync and tps; auctioneer) on new scheduler instance-group
@emalm
Member Author

emalm commented Jun 21, 2018

Closed via #201.

emalm closed this as completed Jun 21, 2018