
HTTP L7 load balancer / reverse proxy #561

Closed
lavalamp opened this issue Jul 22, 2014 · 51 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@lavalamp
Member

Configurable, using labels to route traffic.

From discussion: https://groups.google.com/d/msg/google-containers/frOLMyNl5U4/W5_DQUL933IJ

I suppose, if we had some sort of dynamic router based on label queries, you could make that work for your needs with a bit of configuration. I'm not sure if there's really a need for that once DNS naming is set up, though.

I think you're going to want to incorporate some sort of http/s router or at least have a suggested means of configuring one. It seems to be one of the most obvious use cases for Kubernetes.

Like I said, I'm not sure this is needed if there's a good load balancer and DNS name resolution, but filing this for tracking the discussion.

@smarterclayton
Contributor

Discussed a bit in #260 already. I've got some folks looking at adding arbitrary load balancer units and label-based query backends for http(s), SNI, and websockets - will have them describe some of what they're working on soon.

@bgrant0607 bgrant0607 added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Sep 25, 2014
@bgrant0607
Member

/cc @thockin

@smarterclayton
Contributor

Pull openshift/origin#88 in origin is a prototype of a route - a resource representing an inbound connection from the external network that would be satisfied by a load balancer and direct traffic to a service. It will be (but is not yet) complemented by a Go client implementation that can read endpoints like the kube-proxy does and generate arbitrary proxy server configs for things like apache, haproxy, and nginx. Ideally, those would be routers running in docker containers with external IPs, parameterized by the address of the master.
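
A minimal sketch of that template-driven approach, assuming hypothetical types rather than the actual openshift/origin#88 code: given the routes and the endpoints backing each service, render a proxy config from a Go text/template, and have the router agent regenerate it whenever either changes.

    package main

    import (
        "os"
        "text/template"
    )

    // Hypothetical shapes standing in for what a router agent would watch
    // from the apiserver: routes plus the endpoints backing each service.
    type Backend struct {
        ServiceName string
        Endpoints   []string // "ip:port" pairs
    }

    type Route struct {
        Host    string
        Backend Backend
    }

    // A toy haproxy-flavored template; a real agent would re-render this on
    // every change and then reload the proxy process.
    var configTmpl = template.Must(template.New("cfg").Parse(
        `{{range .}}backend {{.Backend.ServiceName}}
    {{range .Backend.Endpoints}}    server {{.}} {{.}}
    {{end}}{{end}}`))

    func main() {
        routes := []Route{{
            Host: "www.example.com",
            Backend: Backend{
                ServiceName: "frontend",
                Endpoints:   []string{"10.0.0.1:8080", "10.0.0.2:8080"},
            },
        }}
        // Write the rendered config to stdout; a real agent would write the
        // proxy's config file and signal the proxy to reload.
        if err := configTmpl.Execute(os.Stdout, routes); err != nil {
            panic(err)
        }
    }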

@thockin
Member

thockin commented Sep 25, 2014

Yeah, I think something will need to work more or less out of the box here. Services are great, but they are mostly an internal construct - our story about routing external traffic is not as strong as it needs to be. Not enough hours in the day to think about it all :)


@lavalamp
Member Author

FYI, /api/v1betaX/proxy/services/serviceName works already, and it's as load balanced as anything else in our system ;)

@thockin
Member

thockin commented Sep 25, 2014

I somehow doubt we want to route all external traffic through our apiserver :)


@rektide

rektide commented Oct 4, 2014

Just a heads-up: Mailgun has a very slick Go-based, etcd-configured (with additional HTTP control points) HTTP proxy, Vulcand: https://github.com/mailgun/vulcand

@bgrant0607 bgrant0607 added priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. and removed kind/support-question labels Dec 3, 2014
@bgrant0607 bgrant0607 changed the title Consider an HTTP proxy component HTTP L7 load balancer / reverse proxy Dec 4, 2014
@bgrant0607 bgrant0607 added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Dec 4, 2014
@bgrant0607
Member

For reference, GCE's L7 APIs:

"Route" sounds too network-y. URLMapper sounds more accurate.

Not sure how fancy we'd want to get with URL mapping. Probably at least permit a target path, in order to facilitate multiplexing. Ideally not more general-purpose pattern matching.

Copied from #2585:

OpenShift's Route type:

type Route struct {
    TypeMeta   `json:",inline" yaml:",inline"`
    ObjectMeta `json:"metadata,omitempty" yaml:"metadata,omitempty"`
    // Required: Alias/DNS that points to the service
    // Can be host or host:port
    // host and port are combined to follow the net/url URL struct
    Host string `json:"host" yaml:"host"`
    // Optional: Path that the router watches for, to route traffic to the service
    Path string `json:"path,omitempty" yaml:"path,omitempty"`
    // the name of the service that this route points to
    ServiceName string `json:"serviceName" yaml:"serviceName"`
}

Much like our service proxy watches endpoints, an HTTP reverse proxy, such as HAProxy, could watch routes and reprogram itself (or a management agent could do that to the proxy).

I'd change this to follow v1beta3 metadata/spec/status conventions: host, path, and serviceName would go in spec. Also, rather than just the service name, I'd be inclined to use whatever our canonical object cross-reference format is: ObjectReference or a (partial) URL (#1490 (comment)). I'm more and more leaning towards partial URLs, which would be generated similarly to selfLink upon GET of a particular API version.
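
A rough sketch of what that restructuring could look like, mirroring the struct above; the field names here are illustrative, not a settled API:

    // Hypothetical shape only: the OpenShift Route above, refactored to the
    // v1beta3 metadata/spec/status convention suggested in this comment.
    type Route struct {
        TypeMeta   `json:",inline"`
        ObjectMeta `json:"metadata,omitempty"`

        Spec   RouteSpec   `json:"spec,omitempty"`
        Status RouteStatus `json:"status,omitempty"`
    }

    type RouteSpec struct {
        // Host (or host:port) this route accepts traffic for
        Host string `json:"host"`
        // Optional path prefix the router matches before forwarding
        Path string `json:"path,omitempty"`
        // Cross-reference to the backing service; could equally be a
        // partial URL, per the discussion above
        Service ObjectReference `json:"service"`
    }

    type RouteStatus struct {
        // Filled in by the router, e.g. the address it is serving on
        Address string `json:"address,omitempty"`
    }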

@phemmer

phemmer commented Jan 26, 2015

This is very much of interest to us, as we run numerous applications, all fronted by a single external endpoint (e.g., http://api.example.com).

We've run our own in-house layer 7 load balancer / router for a few years now, and aside from URL rewriting (which is mentioned with the 'target path' thing), the only other feature I can think of which would be of interest is source address filtering.
Our specific use case for this is to allow internal applications to talk to other internal applications through the load balancer using 'private' routes. But we could probably work around this by just adding authentication to these routes.

I'm also wondering if it would be good to define what http headers get added to the request. Since a layer 7 router hides a lot of the client information, this'll need to be passed on. Information such as client IP address, client/server port, whether the client is using SSL, SSL client cert subject & verification status, etc.
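
To make the header question concrete, here is a minimal Go reverse-proxy sketch (the backend address and the exact header set are assumptions, not a proposal) that forwards client information the backend would otherwise lose:

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    func main() {
        // Assumed backend address; a real router would pick this per route.
        backend, err := url.Parse("http://10.0.0.1:8080")
        if err != nil {
            log.Fatal(err)
        }

        proxy := httputil.NewSingleHostReverseProxy(backend)
        orig := proxy.Director
        proxy.Director = func(req *http.Request) {
            orig(req)
            // ReverseProxy already appends X-Forwarded-For; pass along the
            // rest of the client information the backend can no longer see.
            proto := "http"
            if req.TLS != nil {
                proto = "https"
            }
            req.Header.Set("X-Forwarded-Proto", proto)
            req.Header.Set("X-Forwarded-Host", req.Host)
        }

        log.Fatal(http.ListenAndServe(":8080", proxy))
    }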

@bgrant0607 bgrant0607 added priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. status/help-wanted area/example and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels Feb 28, 2015
@supirman

Hi, I am a master's student who wants to apply to GSoC. I am interested in this problem, and I have worked on a project involving nginx as a load balancer and reverse proxy in the past. Can this be implemented using nginx?

@smarterclayton
Contributor

It could, although we would prefer a solution that is generic across load balancers. The OpenShift route concept and mechanism are mostly implemented, integrate automatically with HAProxy and F5 today, and have a generic template mode for any other router. I think a lot of the work for this has really been done, and it's a matter of defining how we evolve the service API and then moving that code over to Kube.


@glerchundi

Hi guys, I've already created a working example of an HTTP reverse proxy and load balancer inside Kubernetes using nginx + confd (with an etcd backend, which is just a proxy to the master etcd). It is composed of three components:

  • etcd-proxy: persists routing and upstream data
  • nginx-loadbalancer: using nginx and confd, works as a reverse proxy, load balancing traffic based on the registered upstreams
  • loadbalancer-feeder: listens to Kubernetes pod events using kubelistener (single pods or ones created by replication controllers) and updates the load balancer upstreams accordingly

A working controller+service example is also available at: https://github.com/glerchundi/kubernetes-http-loadbalancer

Any comment is really appreciated!

@bgrant0607
Member

@glerchundi Thanks for the pointers! Just a quick comment for now since we're trying to wrap up 1.0. If you define a service (set portalIP: "None" if you don't need a VIP allocated) for your pods, the endpoints controller will generate a list of their addresses and ports in an Endpoints object for you.
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/services.md#headless-services

@glerchundi

@bgrant0607 ok, thanks for pointing that out. After watching some videos about Kubernetes, it seems that the best pattern to handle this is to watch services, not replication controllers / pods, and modify the HTTP load balancer accordingly.

Something like:

Service 1: App Production (label: app, production)
Service 2: App Canary (label: app, canary)
Service 3: Load Balancer (selector: app)

This would balance between two services (App Production & App Canary) which in turn will balance between all pods belonging to the corresponding replication controller.

@bgrant0607
Member

@glerchundi Do you mean "service 1" and "service 2" or "replication controller 1" and "replication controller 2"? Otherwise, yes, that's the recommended approach.

@smarterclayton
Contributor

I don't think joining is terrible. The route is an atomic unit of change; in some cases I'd prefer to change one route for a blue-to-green deployment than two. I think we could make the argument that paths together is strictly better than paths separate.

On Aug 7, 2015, at 8:54 PM, Prashanth B notifications@github.com wrote:

Another idea we discussed was to move from a service-per-route model to a service-per-path model. E.g.:

type: route
Spec:
    host: foo.bar.com
    paths:
    - /prod: svc1, port
    - /test: svc2, port
    tlsMode: Termination
    secret: cert
Status:
    host: foo.bar.com
    ingressIp: 134

That seems to fit better with my mental model of a website with multiple endpoints serviced by different groups of pods, all sharing a common security policy.

The openshift model is to have a route for /prod and another for /test, and since they all join the same router things work out. But in a world where a single route creates a new loadbalancer, the ability to specify multiple services per path makes it easier to get a single ip for the entire site.

Another wrinkle is that a router might be one or multiple IPs and DNSes. If we had elastic ip binding, I might want to add that to a given router (shared or no) and thus there may be multiple effective IPs/DNS entries for clients.

Is there a reason to handle each path with a different route object?



@bprashanth
Contributor

Ok, so it sounds like the main reason is that a single-service-per-route keeps it hermetic. I mostly buy that. I also like small resources because they're easier to update, watch, display, etc., and I can invalidate a single path<->service map (e.g. because the service doesn't have a nodeport, which is currently required for GCE L7).

So the model we're assuming is: a hostname + multiple URL endpoints, each managed by a different Kubernetes service.

To make a useful 1.1 L7 API around this, I think we need the ability to route all requests for that hostname, through a single IP, to different backends based on routes, without creating a global router up front (because GCE only allows one cert per load balancer IP, and mixing the models complicates things).

There are 2 ways to achieve this:

  1. Keep the routes simple (more like the first route, #561 (comment)) and implement a basic claims model that allows joining
  2. Make the route expressive enough to accommodate the basic requirement (more like the second route, #561 (comment)), so we have a usable API even without joining

@thockin and @bgrant0607 wdyt?

@thockin
Member

thockin commented Aug 8, 2015

I'm having a hard time seeing the whole picture from this thread. I want to be the voice of "do the simplest thing we can get away with" here. I argued with Prashanth that the multitude of tiny route objects feels awkward to me. Admittedly, I am not the webbiest guy, but my mental model is really: some set of inputs arrive at a mux which decides, based on path, which backend Service:Port to send traffic to.

e.g. ingress{"foobar.com"} -> map{"/foo": Service{"foo", 80}, "/bar": Service{"bar", 8080}}

Changing that to multiple route objects seems confusing and unnecessary. Can someone explain it?

It might help to get a sketch of the data model and some examples using it.
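
For what it's worth, that "single mux object" mental model could be sketched roughly like this in Go (purely illustrative types, not a proposed API):

    package main

    import "fmt"

    // ServiceRef names a backing Service and port.
    type ServiceRef struct {
        Name string
        Port int
    }

    // Ingress is the single object for one external hostname: a map from
    // URL path prefix to the Service that should receive that traffic.
    type Ingress struct {
        Host  string
        Paths map[string]ServiceRef
    }

    func main() {
        ing := Ingress{
            Host: "foobar.com",
            Paths: map[string]ServiceRef{
                "/foo": {Name: "foo", Port: 80},
                "/bar": {Name: "bar", Port: 8080},
            },
        }
        // A router programs its backend mux from this one object.
        for path, svc := range ing.Paths {
            fmt.Printf("%s%s -> %s:%d\n", ing.Host, path, svc.Name, svc.Port)
        }
    }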


@smarterclayton
Contributor

I may not have expressed it clearly, but having multiple paths and services per route doesn't seem so bad (nor multiple hosts) because you can change them atomically (which when doing a blue-green cutover has some advantages). We did not start with that due to caution, but in practice folks have asked for it.

@thockin
Member

thockin commented Aug 8, 2015

Gotcha. That sort of approximates GCE's API, too. We do need to think about what is possible in AWS and others before we design something unimplementable.

@justinsb (should have looped him in sooner, sorry).


@smarterclayton
Contributor

If we do have multiples, we have to consider partial rejection on shared routers for duplicate hosts or paths.


@bprashanth
Contributor

Please review #12827 when you have time

@justinsb
Member

AWS ELB has very limited Layer 7 support. Although it has some Layer 7 features, these are limited to SSL termination (with a single cert), sticky sessions based on cookies, and writing an access log. There is no path-based routing for example. Typically you set up ELB in front of nginx/haproxy. I think we would likely want to do the same thing for AWS, with a k8s managed nginx/haproxy/vulcand.

In other words, the AWS API for load balancing is so limited that I do not think we should even try to constrain the k8s API to fit within it. Rather, we should have a k8s option that uses a cloudprovider Layer 4 load balancer in front of a k8s managed software load balancer. If you have a better load balancer (GCE, OpenStack, hardware) then ideally we would allow you to use that instead. But AWS will be primarily software implemented.

@phemmer

phemmer commented Aug 18, 2015

AWS ELB has very limited Layer 7 support. ... There is no path-based routing for example

In AWS land, they have a separate service for this, API Gateway. It's basically another layer that sits on top of the ELB.

@justinsb
Member

Oh, good point @phemmer. I hadn't seen it marketed this way, but it does look like we could indeed use API Gateway as "just" a Layer 7 load balancer. I can't help but worry that this isn't really what it's intended for, but I'm very happy for the suggestion - we'll have to evaluate it!

@smarterclayton
Contributor

Yeah, that's how we use ELB today - as the HA layer for a pair of redundant proxies.


@bprashanth
Contributor

Rather, we should have a k8s option that uses a cloudprovider Layer 4 load balancer in front of a k8s managed software load balancer. If you have a better load balancer (GCE, OpenStack, hardware) then ideally we would allow you to use that instead. But AWS will be primarily software implemented.

This is exactly the case for loadbalancer classes (either embedded in the ingress point or via claims). I'm a little wary of offering this out of the box, because there are several multi-tier setups (ELB L7 for SSL termination -> nginx, F5 L4 -> apache SSL proxy -> L7, etc.). You would have 2 loadbalancer controllers, one for AWS and another for haproxy.

Handwaving a bit in this example (sketched in code after the list), with AWS and haproxy loadbalancer controllers running in the cluster:

  1. create svc
  2. create ingresspoint {/foo: svc1, class:haproxy, layer:7}
  3. wait for haproxy loadbalancer controller to allocate an ip
  4. wrap the ip in a service with nodeport (there have been a couple of discussions on how to do this: Simple services with external IPs on bare-metal #10456, DESIGN: External IPs #1161, External IPs support #12561)
  5. create ingresspoint {/foo: nodeportsvc1, class: elb, layer:4}
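
Continuing the handwaving, the two ingress points from steps 2 and 5 might look like this; IngressPoint and its fields are hypothetical, just matching the pseudo-objects in the steps above:

    // Hypothetical resource matching the pseudo-objects in the steps above.
    type IngressPoint struct {
        Class string            // which loadbalancer controller claims this
        Layer int               // 4 or 7
        Paths map[string]string // path -> service name
    }

    // Step 2: the haproxy controller terminates L7 for /foo -> svc1.
    var inner = IngressPoint{
        Class: "haproxy",
        Layer: 7,
        Paths: map[string]string{"/foo": "svc1"},
    }

    // Step 5: the ELB controller provides an L4 entry point in front of the
    // nodeport service that wraps the haproxy IP from steps 3-4.
    var outer = IngressPoint{
        Class: "elb",
        Layer: 4,
        Paths: map[string]string{"/foo": "nodeportsvc1"},
    }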

In other words, the AWS API for load balancing is so limited that I do not think we should even try to constrain the k8s API to fit within it.

This is why I'd like to move away from the current interface/cloud-provider model to a more plugin-centric approach. Each loadbalancer is a different beast and kube should just get out of the way. Most of the points @justinsb mentioned for ELB are true for the GCE LB as well.

@pires
Contributor

pires commented Aug 27, 2015

/cc @mikedanese

https://github.com/kubernetes/contrib/tree/master/service-loadbalancer seems like a nice project to fork and take further to support other load-balancing solutions.

@mikedanese
Member

@bprashanth actually authored that package. I'm just git blamed since I moved it out of the main repo.

@JeanMertz

I haven't seen this mentioned here, but is there any thought on integrating the new HTTPS LBs on GCE?

https://cloud.google.com/compute/docs/load-balancing/http/ssl-certificates?hl=en_US

We currently create separate RCs to host a standard nginx reverse proxy with SSL termination for each "real" service that we host on Kubernetes. This is manageable so far, but leveraging Kubernetes to auto-create a managed LB for us would be much better.

@bprashanth
Contributor

@pires yeah, the real challenge is providing a consistent interface that allows multiple loadbalancers to co-exist in the same cluster. https://github.com/kubernetes/kubernetes/pull/12827/files talks about our efforts in this direction.

@JeanMertz yes - see the TLS bits in the same proposal.

@jayunit100
Member

Related to https://github.com/kubernetes/contrib/tree/master/service-loadbalancer, which proposes resolving this issue as a possible next iteration.

@aronchick aronchick added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 1, 2015
@bgrant0607
Member

Can this be closed in favor of more specific follow-up issues?

@thockin
Member

thockin commented Oct 23, 2015

I'm closing this in favor of the more detailed bugs, now that we have something. Yay!

@thockin thockin closed this as completed Oct 23, 2015
vishh pushed a commit to vishh/kubernetes that referenced this issue Apr 6, 2016
Make machine-id sources flag a comma-separated list.
wking pushed a commit to wking/kubernetes that referenced this issue Jul 21, 2020
Remove node_modules from GitBook
b3atlesfan pushed a commit to b3atlesfan/kubernetes that referenced this issue Feb 5, 2021
Resolving conflicts for pull request kubernetes#561 and adding documentation.
linxiulei pushed a commit to linxiulei/kubernetes that referenced this issue Jan 18, 2024
Added arm64 targets for linux binaries