DESIGN: External IPs #1161

Closed
thockin opened this issue Sep 3, 2014 · 24 comments

Labels
kind/design Categorizes issue or PR as related to design. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@thockin
Member

thockin commented Sep 3, 2014

Goal

To evaluate options for connecting “external” IP addresses to kubernetes-hosted applications.

Non-Goals

To discuss the kubernetes Service abstraction (mostly, see #1107).

Background

Running Pods and Services within a kubernetes cluster is fairly well defined (or becoming so).  A frequent question, though, is how to publish something to the “outside world” (for which the definition varies depending on the hosting situation).  Today we have a Service.CreateExternalLoadBalancer flag, which tells kubernetes to invoke cloud-provider logic to establish a load balancer across all minions on the service port on the minion’s interface.  This port is answered by the kube-proxy, which does its own round-robin “balancing”.
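
For reference, the round-robin behavior described above amounts to something like the following sketch. This is not the actual kube-proxy code; the endpoint addresses are made up.

```go
package main

import (
	"fmt"
	"sync"
)

// roundRobin cycles through a service's endpoints, the way the description
// above says kube-proxy balances connections arriving on the service port.
type roundRobin struct {
	mu        sync.Mutex
	endpoints []string // "podIP:port" pairs; values here are illustrative
	next      int
}

func (rr *roundRobin) pick() string {
	rr.mu.Lock()
	defer rr.mu.Unlock()
	ep := rr.endpoints[rr.next]
	rr.next = (rr.next + 1) % len(rr.endpoints)
	return ep
}

func main() {
	rr := &roundRobin{endpoints: []string{"10.0.1.5:8080", "10.0.2.7:8080"}}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.pick()) // alternates between the two endpoints
	}
}
```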

This design was easy to establish as a prototype, but is problematic for the long term because it depends on a flat service-port namespace, which we want to make go away (see #1107), and because it uses two levels of balancing, which is not good for performance or predictability.

We should be able to establish a more performant and direct pattern for externalizing kubernetes-hosted applications.

Design

There are a few ways this could be done.  The remainder of this doc will explore the trade-offs.  I will describe the need for external access as an external port, rather than an external IP.  This leaves the implementations free to optimize IP usage, if needed.

Option 1) Flag some services as “external”, run load balancers inside k8s

Services carry a flag indicating the need for an external port.  When a Service is created with this flag set, the kubernetes master will spin up a new pod (call it a “service pod”) which runs a kubernetes-aware balancer (this might just be kube-proxy or an enlightened HAProxy or ...).  Like all pods, this pod must be managed by a controller (call it a “service controller”) and is given a pod IP address.  The controller for this pod has the responsibility to reach out to the cloud provider and provision an external port which forwards to the service pod’s IP.  Whatever portal solution we decide to use (see #1107) would forward to this service pod.  If the service pod is relocated, the service controller may need to update the cloud provider.
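
A minimal sketch of the Option 1 control loop, assuming hypothetical type and method names (none of these exist in kubernetes as written):

```go
package main

import "fmt"

// Service carries the hypothetical "external" flag described in Option 1.
type Service struct {
	Name     string
	External bool // request an external port for this service
}

// CloudProvider stands in for whatever cloud-provider module the service
// controller would call; the method name is made up for this sketch.
type CloudProvider interface {
	EnsureExternalPort(targetIP string, port int) (externalAddr string, err error)
}

type fakeCloud struct{}

func (fakeCloud) EnsureExternalPort(targetIP string, port int) (string, error) {
	return fmt.Sprintf("203.0.113.1:%d", port), nil // placeholder external address
}

// reconcile is the service controller's job for one Service: make sure the
// balancer ("service pod") exists, then ask the cloud provider to forward an
// external port to its pod IP. If the service pod is relocated, this runs
// again with the new pod IP and the cloud provider is updated.
func reconcile(svc Service, cloud CloudProvider, ensureServicePod func(Service) (string, error)) error {
	if !svc.External {
		return nil
	}
	podIP, err := ensureServicePod(svc)
	if err != nil {
		return err
	}
	addr, err := cloud.EnsureExternalPort(podIP, 80)
	if err != nil {
		return err
	}
	fmt.Printf("service %q reachable externally at %s\n", svc.Name, addr)
	return nil
}

func main() {
	svc := Service{Name: "my-service", External: true}
	_ = reconcile(svc, fakeCloud{}, func(Service) (string, error) { return "10.0.1.9", nil })
}
```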

Pros:

  • Fairly consistent experience across cloud providers
  • Same implementation, minus external, can be used for “real” internal load balancing

Cons:

  • Less obvious how to integrate real cloud-provider load-balancing
  • Users don’t get to choose their balancing solution
    • could be mitigated by extra fields in Service
  • Expensive for singleton pods that want an external port
    • could be mitigated by changing the external flag to indicate N=1 vs N>1

Option 2) Flag some services as “external”, let cloudprovider define the meaning

Similar to option 1, but more abstract.  Services carry a flag indicating the need for an external port.  When a Service is created with this flag set, the kubernetes master will reach out to the cloud provider and provision an external service.  The cloud provider modules would determine what this means.  For the simplest cases, this could be implemented the same as option 1.  For cloud providers that can use native load balancers, this could be implemented in terms of them.
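
A rough sketch of what Option 2 delegates to the cloud provider. The interface and method names are hypothetical, not the actual kubernetes cloudprovider package:

```go
package main

import "fmt"

// ExternalBalancer is what Option 2 would ask each cloud provider to supply.
type ExternalBalancer interface {
	// EnsureExternalService makes the named service reachable from outside
	// the cluster and returns the address clients should use. A provider
	// with native load balancers would program one of those; a bare-bones
	// provider could fall back to the Option 1 service-pod approach.
	EnsureExternalService(serviceName string, backends []string) (string, error)
	// TeardownExternalService releases whatever was provisioned.
	TeardownExternalService(serviceName string) error
}

// noNativeLB is a provider without native balancing; it just reports where a
// cluster-hosted balancer (e.g. a kube-proxy pod) would be reachable.
type noNativeLB struct{}

func (noNativeLB) EnsureExternalService(name string, backends []string) (string, error) {
	return "203.0.113.10:80", nil // placeholder address for illustration
}

func (noNativeLB) TeardownExternalService(name string) error { return nil }

func main() {
	var cloud ExternalBalancer = noNativeLB{}
	addr, _ := cloud.EnsureExternalService("my-service", []string{"10.0.1.5:8080"})
	fmt.Println("my-service exposed at", addr)
}
```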

Pros:

  • As an abstraction, provides freedom to evolve the implementation
  • Sets a pattern for internal load balancing as a function of cloud provider

Cons:

  • Less consistent across cloud providers
  • Users don’t get to choose their balancing solution
    • could be mitigated by extra fields in Service
  • Expensive for singleton pods that want an external port
    • could be mitigated by changing the external flag to indicate N=1 vs N>1

Option 3) Any pod can be flagged as “external”, users set up their own balancing

Similar to option 2, but not tied to Services.  Pods carry a flag indicating the need for an external port.  When a Pod is created with this flag set, the kubernetes master will reach out to the cloud provider and provision an external port.  The cloud provider modules would determine what this means.  The end user is responsible for setting up the pod to be a load balancer, though we can make kube-proxy be suitable for this purpose.  Because this is a plain pod, it needs to be run under a controller, itself - probably a replication controller of N=1.  In order for this pattern to hold for internal load balancing, the user would need another Service with a Portal (see #1107) that forwards to it.
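
A sketch of the Option 3 shape, with hypothetical field names: the external request hangs off the pod, and the master (via the cloud provider or an admin-maintained mapping) forwards an external port to it. What the pod does with the traffic, kube-proxy, HAProxy, or a plain single-container app, is up to the user.

```go
package main

import "fmt"

// Pod carries the hypothetical per-pod external flag from Option 3.
type Pod struct {
	Name         string
	ExternalPort int    // 0 means "not external"
	IP           string // pod IP assigned by the cluster
}

// provisionExternal is what the master would do when it sees the flag:
// arrange for the external port to forward to the pod. The forward function
// stands in for a cloud provider call or an admin-managed mapping.
func provisionExternal(p Pod, forward func(externalPort int, podIP string) error) error {
	if p.ExternalPort == 0 {
		return nil
	}
	return forward(p.ExternalPort, p.IP)
}

func main() {
	// The user-run balancer pod, managed by a replication controller of N=1.
	balancer := Pod{Name: "my-balancer", ExternalPort: 80, IP: "10.0.3.4"}
	err := provisionExternal(balancer, func(port int, ip string) error {
		fmt.Printf("forward external :%d -> %s:%d\n", port, ip, port)
		return nil
	})
	if err != nil {
		fmt.Println("provisioning failed:", err)
	}
}
```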

Pros:

  • Users can configure anything they want as their balancer
  • This is the underlying form of options 1 and 2 anyway
  • Conceptually simple
  • Very composable
  • Works well for singleton pods that want an external IP but don’t need balancing

Cons:

  • Forces users to set up balancing themselves
  • Less obvious how to integrate real cloud-provider load-balancing
  • Not obvious which master module is responsible for syncing external IP to internal IP when the pod is relocated
  • Internal use through Portals (See DESIGN: Services v2 #1107) would be an extra hop (kube-proxy + load balancer) since it is not part of the Service abstraction
    • could be mitigated with a new kind of Service, “singleton” or something to make a direct portal
@thockin
Member Author

thockin commented Sep 3, 2014

@kelseyhightower Since you were asking about this today.

@jbeda
Contributor

jbeda commented Sep 4, 2014

How does this devolve where there is no cloud provider? What about a simple case where external IPs are 1:1 with nodes or external IPs must be provisioned and forwarded explicitly?

For me the sniff test here is that we should make it super easy to install a routing service into a k8s cluster that can run an L7 load balancer and dispatch (based on some rules) to services. I'd love to map that out for the various options here.

@thockin
Member Author

thockin commented Sep 4, 2014

No cloud provider: I don't know. There's a bunch of things we do that assume a cloud provider module.

If you only have 1 external IP per VM, this all breaks down - you cannot have a stable external IP for a service.

If external IPs must be provisioned and forwarded explicitly, that's exactly what these proposals address, no?

Maybe I just don't get what you mean by "routing service"?

For an L7 balancer, we probably need some other consideration - this has been focused on L3. How do we designate that we want an L3 vs L7 balancer? Maybe that becomes a kind of service, so you can spec:

Service:

  • Name: my-service
  • Kind: http
  • External: true

?
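
For concreteness, that sketch might render as a struct along these lines. The field names are hypothetical; this is not an existing kubernetes API.

```go
package main

import "fmt"

// Service is a hypothetical rendering of the spec above. "Kind" would select
// an L7 (http) versus L3 (tcp) balancer; "External" requests an externally
// reachable port/IP for the service.
type Service struct {
	Name     string
	Kind     string // "http" or "tcp"
	External bool
}

func main() {
	fmt.Printf("%+v\n", Service{Name: "my-service", Kind: "http", External: true})
}
```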


@lavalamp
Member

lavalamp commented Sep 5, 2014

I prefer a hybrid of options 2 and 1: the cloudprovider is tasked with providing an externally accessible IP address, but our default cloudprovider should implement this via a bridge/loadbalancer k8s application (a replicationController plus pods w/ hostport or equivalent; I think this could be accomplished with our proxy if we made it take env var configuration).

@jbeda
Contributor

jbeda commented Sep 5, 2014

Some scenarios we need to support (I think):

  • A case where a ticket with network engineering must be filed to forward connections from the internet to a specific machine and port. In this case the mapping is very static and isn't configured interactively.
  • The case where you buy cheap machines from a hosting provider where you get a single IP per machine. External ports will be competing with other native daemons listening on the machine. There is a fixed pool of IPs available.

As for "routing service": Dynamically configured L7 load balancer that forwards to other services/jobs based. Specifically, I see it like this:

  • Run N instances of an HTTP proxy on my k8s cluster
  • Configure each of those behind an L3 load balancer. As such, they have to listen on port 80. If an L3 load balancer isn't available, gather all of the external IPs such that I can publish them in DNS and use DNS round robin.
  • Provide an API to users to configure the router. It would be rules like "forward all HTTP traffic sent to blog.mydomain.com to the wordpress-mydomain service" or "Forward all HTTP requests to blog.mydomain.com and with path prefix /static to the static-assets-mydomain service"

Many users are going to be running where they don't have a cloud L7 service to lean on and will want to run their own shared LB service across their cluster.
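
A minimal sketch of the kind of rule table such a router could serve, using the host, path, and service names from the examples above (the code itself is illustrative, not an existing component):

```go
package main

import (
	"fmt"
	"strings"
)

// rule is one entry in the dynamically configured router described above:
// match on host (and optionally a path prefix), forward to a service.
type rule struct {
	Host       string
	PathPrefix string // "" matches any path
	Service    string
}

// More specific rules come first; route checks them in order.
var rules = []rule{
	{Host: "blog.mydomain.com", PathPrefix: "/static", Service: "static-assets-mydomain"},
	{Host: "blog.mydomain.com", Service: "wordpress-mydomain"},
}

// route returns the backing service for a request.
func route(host, path string) (string, bool) {
	for _, r := range rules {
		if r.Host == host && strings.HasPrefix(path, r.PathPrefix) {
			return r.Service, true
		}
	}
	return "", false
}

func main() {
	svc, _ := route("blog.mydomain.com", "/static/logo.png")
	fmt.Println(svc) // static-assets-mydomain
	svc, _ = route("blog.mydomain.com", "/2014/09/some-post")
	fmt.Println(svc) // wordpress-mydomain
}
```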

@jbeda
Contributor

jbeda commented Sep 5, 2014

Okay -- here is a concrete proposal:

Have an explicit set of external IPs and ports that users can access. Kubernetes knows which external IPs map to which nodes. (Access policy for which users get to map to which IPs/ports is TBD.) If the cloud supports L3 load balancing or assignable IPs then it is understood that this mapping is fluid and robust against rescheduling.

When a user wants to expose a service or pod (I slightly prefer pod) externally they can say:

  • give me any external port
  • give me any IP but port 80
  • give me a whole IP
  • give me port 80 on a specific IP

We need to make sure that users that have mapped DNS to an IP can continue to use that IP over time. Because of this I think we'll end up with external IPs as a first class entity in the system.

Most users won't see this though -- I think that they'll get access to an HTTP router (described above) and will use that. The HTTP router will use this to get a stable IP and probably claim 80 and 443 on that IP. More advanced users can map the same IP to multiple machines using L3 load balancing, etc.
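
To make the proposal concrete, a first-class external IP claim might look something like this sketch. The type, field names, and addresses are hypothetical; the four request flavors in the list above map onto which fields are left empty.

```go
package main

import "fmt"

// ExternalIPClaim sketches external IPs/ports as first-class entities.
type ExternalIPClaim struct {
	IP      string // "" = any IP; otherwise a specific external IP the user already holds
	Port    int    // 0 = any port; 80 = specifically port 80
	WholeIP bool   // true = claim every port on the assigned IP
	Target  string // service or pod the traffic should reach
}

func main() {
	claims := []ExternalIPClaim{
		{Target: "http-router"},                                // give me any external port
		{Port: 80, Target: "http-router"},                      // give me any IP but port 80
		{WholeIP: true, Target: "http-router"},                 // give me a whole IP
		{IP: "203.0.113.10", Port: 80, Target: "http-router"},  // give me port 80 on a specific IP
	}
	for _, c := range claims {
		fmt.Printf("%+v\n", c)
	}
}
```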

@smarterclayton
Contributor

Labeling Joe's as option 4. If you expose a pod externally on a random port (proxy or iptables) you either need to determine up front that the port is available, or assign later (schedule?) as a binding of external port -> pod port. When a pod is deleted you need to remove the invalid binding / return the port to a pool.
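
A minimal sketch of the binding/pool bookkeeping described here, with a made-up port range:

```go
package main

import (
	"errors"
	"fmt"
)

// portPool hands out external ports and takes them back when a pod goes
// away. The 30000-30010 range is just for illustration.
type portPool struct {
	free     []int
	bindings map[string]int // pod name -> external port
}

func newPortPool(lo, hi int) *portPool {
	p := &portPool{bindings: map[string]int{}}
	for port := lo; port <= hi; port++ {
		p.free = append(p.free, port)
	}
	return p
}

// bind assigns an external port to a pod (the "assign later / schedule" case).
func (p *portPool) bind(pod string) (int, error) {
	if len(p.free) == 0 {
		return 0, errors.New("no external ports left")
	}
	port := p.free[0]
	p.free = p.free[1:]
	p.bindings[pod] = port
	return port, nil
}

// release removes the binding when the pod is deleted and returns the port
// to the pool.
func (p *portPool) release(pod string) {
	if port, ok := p.bindings[pod]; ok {
		delete(p.bindings, pod)
		p.free = append(p.free, port)
	}
}

func main() {
	pool := newPortPool(30000, 30010)
	port, _ := pool.bind("frontend-1")
	fmt.Println("frontend-1 bound to external port", port)
	pool.release("frontend-1") // pod deleted: binding removed, port returned
}
```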

@bgrant0607 bgrant0607 added kind/design Categorizes issue or PR as related to design. sig/network Categorizes an issue or PR as relevant to SIG Network. labels Sep 25, 2014
@smarterclayton
Contributor

Going a bit further here as relevant to #561, I think Joe's option 4 is the best way to do the following:

  • describe a set of internal IPs that are candidates for balancing (service label selector, which is what services are intended to do as opposed to portals)
  • request / be assigned a "pet" IP address (or addresses) that is stable and available to external users (the user indicates a request for an external IP and the system attempts to make it as stable as possible, but you still may have to get it) that can be mapped to those internal IPs
  • the infrastructure is responsible for satisfying those requests for stable and available IPs that map to those internal IPs
    • It may do so via an external TCP load balancer (GCE, ELB, F5, existing HA system solutions)
    • It may do so via a "best effort" IPtables rule on a host (not really stable, just cheap and easy to implement)
    • It should be easy for an administrator to transform a "best effort" rule on a host to a "really available" IP later, although potentially not without transient disruption to applications

Once you have those three constructs, you can build any sort of HTTP proxy solution running in pods (as discussed in openshift/origin#88 and #561) reliably. We should be able to provide a solution out of the box.

@eparis
Contributor

eparis commented Sep 25, 2014

I'm of the belief (after "discussing" with @smarterclayton and a whiteboard) that external IPs (or whatever we want to call them) should be first-class citizens that kube knows about, but which kube does not 'manage'. The infrastructure needs to provide these 'external resources' and, in communication with kube (either the apiserver or etcd itself), configure itself so that traffic which hits the right external IP/port will get forwarded to (one of) the right minion IP/ports.

Kube would need to know what resources (IPs, ports) are available, and the admin needs to specify that a pod/service (I prefer service) should be mapped to an external resource. It would then be up to the cloud provider/external resource to get traffic destined for a given 'external IP/port' to the right internal service. I do not believe that a minion IP address should be an 'external IP' in any way. Minions are too ephemeral and external IPs are too pet-like.

For a bare metal standup this means a simple implementation could be a single machine with 1 or more static IPs assigned. That single machine could listen to the apiserver/etcd to learn about mappings from external address to internal services and set up iptables rules. This pushes the problem of HOW to do the mapping outside of kube itself and onto the 'cloud provider'. The simple iptables machine could be replaced with haproxy/f5/some clustered magic/etc. It also means that, if needed, one could stand up an haproxy container to route things INSIDE the kube cluster and have the external machine just push data at that/those internal routers....
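
As a sketch of that single-machine iptables implementation: a small daemon on the gateway box that learns mappings (hard-coded here; in practice it would watch the apiserver or etcd) and shells out to iptables. The addresses and the DNAT-only approach are illustrative assumptions, not a prescribed design.

```go
package main

import (
	"fmt"
	"os/exec"
)

// mapping ties an external IP/port on this gateway box to an internal
// service IP/port inside the cluster.
type mapping struct {
	ExternalIP   string
	ExternalPort int
	InternalIP   string
	InternalPort int
}

// applyDNAT programs one PREROUTING rule forwarding the external address to
// the internal one. Requires root and the iptables binary on the gateway.
func applyDNAT(m mapping) error {
	args := []string{
		"-t", "nat", "-A", "PREROUTING",
		"-d", m.ExternalIP, "-p", "tcp", "--dport", fmt.Sprint(m.ExternalPort),
		"-j", "DNAT", "--to-destination", fmt.Sprintf("%s:%d", m.InternalIP, m.InternalPort),
	}
	return exec.Command("iptables", args...).Run()
}

func main() {
	m := mapping{ExternalIP: "198.51.100.7", ExternalPort: 80, InternalIP: "10.0.4.2", InternalPort: 8080}
	if err := applyDNAT(m); err != nil {
		fmt.Println("could not program rule:", err)
	}
}
```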

@smarterclayton
Contributor

admin needs to specify that a pod/service (I prefer service) should be mapped to an external resource

If the service gets deleted, should the external binding go away? Would I then be unable to rename the service that an external IP connects to without potentially losing a stable IP? Needs more thought for sure.

@eparis
Contributor

eparis commented Sep 25, 2014

Whether the 'external service' maps to a label on a service or a label on a pod, I'm not sure I care (pod might make more sense). But yes, if you delete the internal service/pod, the external resource doesn't go anywhere. If you add a new service/pod with the right labels, it magically starts forwarding traffic to the right service/minion....

@bgrant0607 bgrant0607 added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Oct 4, 2014
@bgrant0607
Member

/cc @kelseyhightower Since it looked like he was asking about this on IRC this morning.

@bgrant0607 bgrant0607 added this to the v0.9 milestone Oct 20, 2014
@KyleAMathews
Contributor

Also SSL termination. Any service I'm exposing to the outside world will be run over SSL so I'll need an easy way to do that.

@jbeda
Contributor

jbeda commented Oct 23, 2014

@KyleAMathews I agree that SSL termination should be easy, but I think that is really at a different layer.

One way to think about it -- a cluster will have a limited set of external IPs that will be mapped to host machines (VMs or otherwise) in a variety of ways. In some environments we can reconfigure and expand this set (either via API or filing a ticket with an ops team) while other environments will have to stick with what they have.

This issue is really about:

  • How do we represent the reality of how external IPs are mapped to host nodes?
  • How do we get traffic coming in on those IPs to specific pods (directly or through services?)
  • [advanced] How do we present ways for users to expand/reconfigure the IP situation?

Stuff like HTTP load balancing and SSL termination would be done by a service that is run on kubernetes (or built into the hosting cloud) and uses this mechanism to provide an API at that level. It is super important, but we've got to get this stuff nailed down first.

@nhorman

nhorman commented Nov 7, 2014

Hey all, new to this, but have been lurking for a while.

Having read this, it seems to me that we should treat external IPs as exactly that: external. We have no guarantee, as users of a cloud service, that we will be able to assert any configuration on the network connecting minions such that a user-provided external IP will be reachable on any given pod. As such, I think we need to handle external IPs as a mapped resource, not unlike the way service addressing is currently handled (we might even be able to piggyback on some of that infrastructure). To provide a concrete example, if a kube user (to use eparis' parlance from https://github.com/GoogleCloudPlatform/kubernetes/pull/2003/files) wants to provide external access to a service that he is running in kubernetes, he:

  1. Obtains a private IP address range from the infrastructure admin
  2. Adds etcd keys mapping one of the private IP addresses obtained in (1) to an alternate endpoint (note: not a public-facing IP address)
  3. Adds a resource key to the json file used to describe a pod, indicating that a given container should allocate an additional interface using the selected mapped IP address in (2).
  4. When kubelet on a given minion receives the json information in (3) to deploy the pod, the minion creates an additional interface in the pod's namespace with the private IP address and assigns it to the correct container; it also establishes either iptables rules or xfrm routes to tunnel all traffic to and from that IP address to the alternate endpoint specified in (2).
  5. The alternate endpoint is then responsible for performing any and all needed DNAT/SNAT translations so that traffic to/from the private IP address assigned to a minion for public use appears to originate/terminate at the public-facing IP address

This approach does place some burden on both the kube user and the infrastructure admin. Specifically, it requires that the infrastructure admin be able to route the private address mapping range dynamically (that is to say, physical routers will have to have their route tables updated to reflect the potentially changing location of a pod, say if it crashes and is restarted on another minion), but I think if a given infrastructure provider wants to provide cloud services, and only needs to manage a limited address range within the borders of their own networks, that may be reasonable. It also requires kube users to stand up their own NAT gateways so that public addresses are properly translated to internal private addresses. I don't think that is a big deal, though, as infrastructure admins and kube admins will need to provide these systems as a service anyway, given that they will need to exist on the periphery of a given kubernetes cluster so that they have proper access to the kube cluster's etcd daemon to properly manage their tunnel endpoints.
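
For illustration, the two translations step 5 asks of the alternate endpoint (the NAT gateway) could look like the following rule pair. The addresses are made up: the public IP is what the outside world sees, the mapped IP is the private address on the pod's extra interface.

```go
package main

import "fmt"

// natPair holds the two translations the gateway performs.
type natPair struct {
	publicIP string
	mappedIP string
}

// rules returns iptables argument lists for the gateway: DNAT inbound
// traffic from the public address to the mapped private address, and SNAT
// outbound traffic so replies appear to originate from the public address.
// A real gateway would run each as `iptables <args...>` (root required).
func (n natPair) rules() [][]string {
	return [][]string{
		{"-t", "nat", "-A", "PREROUTING", "-d", n.publicIP, "-j", "DNAT", "--to-destination", n.mappedIP},
		{"-t", "nat", "-A", "POSTROUTING", "-s", n.mappedIP, "-j", "SNAT", "--to-source", n.publicIP},
	}
}

func main() {
	pair := natPair{publicIP: "198.51.100.20", mappedIP: "172.16.8.5"}
	for _, args := range pair.rules() {
		fmt.Println("iptables", args)
	}
}
```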

@eparis
Contributor

eparis commented Nov 7, 2014

@nhorman So if the 'alternate endpoint' is responsible for setting up the dnat/snat translations, why do I need an additional interface in the pod? Can I not just get it to 'go right to the pod' or maybe 'go to the minion' and have the minion get it to the pod?

@nhorman

nhorman commented Nov 7, 2014

@eparis Strictly speaking you don't need an additional interface, but from an implementation standpoint I think it's much easier to implement with separate interfaces. Creating an extra interface to hold an extra IP address for the pod allows you to avoid the problems of port sharing (i.e. if two pods both want to use port 80). It also avoids the confusion that arises from packet aliasing (that is to say, if an external address maps to a pod address, you don't have to figure out which pod an external packet belongs to on the minion; you know based on the destination address).

@brendandburns
Contributor

Closing this as it is largely obsolete, and we have a working solution. (Possibly with usability improvements needed)

@ramschmaerchen

@brendandburns To ease looking for the current working solution, could you point us to its source/documentation? That would help a lot. Thanks!

@bgrant0607
Member

/cc @satnam6502

@devurandom

@brendandburns I second @ramschmaerchen's request: Where is the documentation located?

I see the docs about external Services, but that seems to imply that I need a "cloud provider" (I assume GCE or AWS, etc), which creates the actual load balancer ("LoadBalancer: … also ask the cloud provider for a load balancer …").

@thockin
Member Author

thockin commented Jun 27, 2015

Without a cloud provider we don't have a way to auto-provision load-balancers. After 1.0 we will make this even more modular so you can drop some scripts in place to do whatever provisioning you want without modifying kubernetes code.



@jswoods

jswoods commented Jul 22, 2015

@thockin - do you have any docs yet on how I may be able to drop in some scripts to provision my own load balancers? Or perhaps point me to the issue/place to look where those docs will be created?

@bprashanth
Contributor

vishh pushed a commit to vishh/kubernetes that referenced this issue Apr 6, 2016
b3atlesfan pushed a commit to b3atlesfan/kubernetes that referenced this issue Feb 5, 2021
Replaces gorillalabs go-powershell with bhendo/go-powershell