Proposal: VM consolidation, either before or after CC-Bridge bypass #173
We have created an issue in Pivotal Tracker to manage this: https://www.pivotaltracker.com/story/show/148265715
The labels on this GitHub issue will be updated when the story is started.
Thanks for starting this thread, @ematpl. We love to get input from release teams about tips for colocating jobs effectively and meaningfully. My initial thoughts:
Yeah, I'm struggling with a good name to encapsulate the responsibilities of the services on that VM: the
@anEXPer @staylor14 are good at naming things, maybe they can help. This proposal seeks to consolidate some VMs based on their purposes and scaling characteristics. One instance_group is a collection of things that scale the same way, but otherwise don't seem to have a ton in common. Is there a meaningful name we can give this instance_group? It contains
@ematpl I think that's as close as we're going to get to a reasonable name. Let's go with that.
- Moves ssh_proxy job to router instance-group
- Moves cc_uploader job to api instance-group
- Moves file_server job to api instance-group
- Consolidates remaining jobs from cc-bridge, cc-clock, and diego-brain instance-groups (cloud_controller_clock; nsync and tps; auctioneer) on new scheduler instance-group
Closed via #201.
After one applies the CC-Bridge bypass operations file to cf-deployment, there are only a few services running on both the `cc-bridge` and `diego-brain` instance groups (ignoring the omnipresent consul and metron agents):

- `cc-bridge`: `cc_uploader` and `tps_watcher`
- `diego-brain`: `auctioneer`, `file_server`, and `ssh_proxy`

In aggregate, these services consume few computational and network resources and so could be co-located on existing cf-deployment VMs to reduce the total amount of IaaS resources required to deploy Cloud Foundry.

Of these five services, the `auctioneer` and `tps_watcher` each claim a service-specific lock to determine the active instance, so there is no benefit to deploying more than one instance per availability zone (AZ), which is already enough to tolerate an AZ failure. This active-standby coordination pattern is similar to that of the `cloud_controller_clock` job on the `cc-clock` instance group. The `cc_uploader`, `file_server`, and `ssh_proxy` services are all simultaneously active, though, and so they are better suited to scaling horizontally within an AZ, similar to the `cloud_controller_ng` service on the `api` instance group or other effectively stateless services in CF.

With these properties in mind, we propose the following set of changes:

- move the `auctioneer` and `tps_watcher` jobs to the `cc-clock` instance group,
- move the `cc_uploader` and `file_server` jobs to the `api` instance group,
- move the `ssh_proxy` job to the `router` instance group,
- remove the `cc-bridge` and `diego-brain` instance groups entirely.

Moving the `ssh_proxy` to the `router` instance group is also consistent with the proxy's function as a router for the SSH protocol, and the two are already consolidated this way in the BOSH-Lite operations file. This configuration would also simplify the network topology required for HAProxy to balance load across each service, and it would reduce the number of static IPs required when an external load balancer such as an F5 handles traffic for the gorouters and the SSH-Proxy instances.

If this consolidation is to be done before the CC-Bridge bypass is complete, the `stager`, `nsync_listener`, and `tps_listener` are horizontally scalable in the same way that the `cc_uploader` is and could be placed on the `api` instance group, and the `nsync_bulker` uses the active-standby coordination pattern that would suit it for co-location on the `cc-clock` instance group.

This proposal is merely a starting point for discussion. In particular, we eagerly solicit feedback from @zrob, @Gerg, and the rest of the CAPI team, as well as from @jvshahid and the rest of the Diego team. It may also make sense to rename the `cc-clock` instance group to reflect the multitude of services it would now host.

Thanks,
Eric
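
For readers who want to check the resulting layout, here is a minimal Python sketch of the post-bypass moves proposed above. It is only an illustration, not a BOSH operations file: the `before` and `moves` dictionaries and the `consolidate` helper are hypothetical names, and the destination groups list only the jobs mentioned in this issue (real cf-deployment instance groups carry more jobs, including the consul and metron agents).

```python
# Job placement after the CC-Bridge bypass, as described in this issue.
# Destination groups show only the jobs named in the issue text.
before = {
    "cc-bridge": ["cc_uploader", "tps_watcher"],
    "diego-brain": ["auctioneer", "file_server", "ssh_proxy"],
    "cc-clock": ["cloud_controller_clock"],
    "api": ["cloud_controller_ng"],
    "router": ["gorouter"],
}

# Proposed destination for each of the five jobs.
moves = {
    "auctioneer": "cc-clock",   # active-standby, claims a lock
    "tps_watcher": "cc-clock",  # active-standby, claims a lock
    "cc_uploader": "api",       # all instances active
    "file_server": "api",       # all instances active
    "ssh_proxy": "router",      # routes SSH traffic
}


def consolidate(layout, moves):
    """Return a new layout with moved jobs relocated and empty groups removed."""
    # Strip every moved job from its current group.
    result = {
        group: [job for job in jobs if job not in moves]
        for group, jobs in layout.items()
    }
    # Append each moved job to its destination group.
    for job, target in moves.items():
        result.setdefault(target, []).append(job)
    # Groups left without jobs (here cc-bridge and diego-brain) are dropped.
    return {group: jobs for group, jobs in result.items() if jobs}


after = consolidate(before, moves)
for group, jobs in after.items():
    print(f"{group}: {', '.join(jobs)}")
```

Under these assumptions, `cc-bridge` and `diego-brain` end up empty and disappear, which matches the last item of the proposal.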