
[WIP] add support for keepalived #68

Merged (24 commits) on Aug 12, 2015

Conversation

Kosta-Github
Contributor

This is work-in-progress (but actually seems to work so far for me).

I would like to use keepalived for this functionality (from the docs):

... high-availability is achieved by VRRP protocol.
VRRP is a fundamental brick for router failover.

This allows you to specify a virtual IP and bind it to one of the nodes running HAProxy in the cluster. If that node becomes unreachable, the virtual IP automatically fails over to another node in the cluster. This is probably similar to AWS's Elastic IP, but I am not familiar with that, since I cannot use AWS for various reasons.
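For illustration, a minimal keepalived sketch of that idea; the interface name, router id, priority and virtual IP below are made-up placeholders, not values from this PR:

# /etc/keepalived/keepalived.conf (sketch with placeholder values)
vrrp_instance VI_1 {
    state BACKUP              # let VRRP elect the master based on priority
    interface eth0            # interface that will carry the virtual IP
    virtual_router_id 51      # must be identical on all peers
    priority 100              # highest reachable priority wins the election
    advert_int 1              # VRRP advertisement interval in seconds
    virtual_ipaddress {
        10.0.0.100            # the virtual IP that fails over between nodes
    }
}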

The question is: would you be interested in integrating this functionality into your technology stack? If so, I would add something to the README.md as well and do some more testing.

This functionality paired with my last PR #59 provides you with a nice highly available load balancer mechanism.

@sielaq
Contributor

sielaq commented Jul 2, 2015

Hi @Kosta-Github ,

Why do you need a VIP and keepalived?
What is the problem you are trying to solve?
Is <your_service>.service.consul not HA enough?

@Kosta-Github
Contributor Author

From within the cluster the consul service discovery mechanism works fine. But once we start directing traffic from outside the cluster into it, we cannot use the consul mechanism anymore.

Since our infrastructure doesn't support elastic load balancing or elastic IPs, I set up a virtual IP, and all external DNS queries such as service_1.my_product.my_company.com, service_2.my_product.my_company.com, ... resolve to this virtual IP. The virtual IP in turn is tied to one of the cluster nodes via keepalived, so that node receives the outside traffic and the HAProxy running on it does the load balancing. As soon as that node goes down or becomes unreachable, keepalived automatically fails over to one of the other cluster nodes.

This way my PR #59 still allows the different services to be accessed from the outside by their corresponding service names (service_1, service_2, ...).

@sielaq
Contributor

sielaq commented Jul 7, 2015

We use hardware for SSL termination, so we can plug HAProxy etc. in behind it.
You can simulate this by setting up a front-end Apache / nginx
and then using the ProxyPass mechanism:

ProxyPreserveHost Off
ProxyPass /your_endpoint  http://your_service.service.consul connectiontimeout=3 timeout=10 retry=2
ProxyPassReverse /your_endpoint  http://your_service.service.consul

Moreover, you can run a consul agent on your Apache boxes and generate the Apache configuration more dynamically, and use mod_proxy_balancer.
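For example, a consul-template sketch along those lines; the service name your_service and the endpoint path are placeholders, and mod_proxy, mod_proxy_http, mod_proxy_balancer plus an lbmethod module are assumed to be loaded:

# apache-proxy.conf.ctmpl (sketch): one BalancerMember per healthy instance of the service
<Proxy balancer://your_service>
{{range service "your_service"}}    BalancerMember http://{{.Address}}:{{.Port}} retry=2
{{end}}</Proxy>

ProxyPreserveHost Off
ProxyPass        /your_endpoint balancer://your_service/
ProxyPassReverse /your_endpoint balancer://your_service/

A consul-template watch on that service would rewrite this file and reload Apache whenever instances come and go.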

@Kosta-Github
Contributor Author

I went for this kind of setup: https://thejimmahknows.com/high-availability-using-haproxy-and-keepalived/

This has been working pretty nicely for us for the past week.

I don't want to add another load balancer in front of that, since that would become a single point of failure again.
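The core of the setup described in that guide is roughly the following sketch (placeholder values again): keepalived tracks a local HAProxy check and lowers the node's priority when HAProxy dies, so the virtual IP moves to a node with a working HAProxy:

vrrp_script check_haproxy {
    script "pidof haproxy"    # non-zero exit code marks HAProxy as down on this node
    interval 2                # run the check every 2 seconds
    weight -20                # drop this node's priority so the VIP moves elsewhere
}

vrrp_instance VI_1 {
    # ... same interface / router id / virtual_ipaddress block as in the sketch above ...
    track_script {
        check_haproxy
    }
}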

@bogus-py
Contributor

bogus-py commented Jul 8, 2015

I would not recommend using the same HAProxy for both internal and public-facing services. Once you make your HAProxy accessible from the outside, anybody can access any of your services by manipulating the Host header.
For this to succeed the attacker needs some knowledge of how your internal services are named, but that is nothing more than security by obscurity.

I would do as @sielaq suggested: run separate HAProxys (or nginx, varnish, Apache httpd, whatever) that only give access to your public-facing services, and use keepalived for HA.

@Kosta-Github
Contributor Author

Ok, sorry, I wasn't clear enough: by "outside of the cluster" I still mean from inside the company's intranet. For traffic from the internet there are additional systems in place, doing auth, SSL termination, ... But those systems should not be tightly coupled to the cluster implementation...

@bogus-py
Contributor

bogus-py commented Jul 8, 2015

Got it.

Here's another idea on this that I've been playing with:
The consul domain (consul.) is configurable. What if we set it to something like consul.intern.mycompany.com. and configure our DNS servers for intern.mycompany.com to forward the consul queries to consul accordingly? This way we have proper DNS inside our intranet, and with the right dns-search setting for Docker within the cluster everything is still resolvable as usual.
Any thoughts?
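For illustration, the consul side of that idea is just the agent's domain setting; the value below is the hypothetical domain from the comment:

# consul agent config fragment (sketch)
{
  "domain": "consul.intern.mycompany.com."
}

Services would then resolve as e.g. your_service.service.consul.intern.mycompany.com instead of your_service.service.consul.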

@Kosta-Github
Contributor Author

The problem for me is that I cannot change the company-wide DNS settings in that way.

I already had a fight with DevOps to allow mapping all DNS queries *.my_product.my_company.com to my_product.my_company.com. They came up with a solution that allows me to define at most 20 names <name_01>.my_product.my_company.com ... <name_20>.my_product.my_company.com that will be mapped to my_product.my_company.com; no wildcards possible. And adding/changing those names needs to propagate through the DNS settings, which right now takes >15 minutes...

And again, this is for the company intranet, not for internet accessibility.

@bogus-py
Contributor

bogus-py commented Jul 8, 2015

Got it. In my case I'm the DevOps dude :-)

What I had in mind wasn't wildcard A-records but DNS delegation (configure the DNS servers to use the consul DNS interface to resolve all *.consul.intern.mycompany.com). But this of course requires cooperation from your DNS admin.
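As a sketch of that delegation with dnsmasq (assuming the resolvers can forward a subdomain to consul's DNS interface on its default port 8600):

# dnsmasq.conf (sketch): forward the consul subdomain to the local consul agent's DNS port
server=/consul.intern.mycompany.com/127.0.0.1#8600

With BIND the equivalent would be a forward zone for consul.intern.mycompany.com pointing at the consul agents.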

{{env "KEEPALIVED_VIP"}} # the virtual IP
}
unicast_peer { # IP addresses of all other peer nodes
{{range nodes}}{{$n := .}}{{if ne $n.Address $node.Address}}{{$n.Address}}
Contributor


I would do:

{{range service "consul"}}{{$n := .}}{{if ne $n.Address $node.Address}}{{$n.Address}}

{{nodes}} also contains the slaves; are you running keepalived on every consul host?
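For reference, a sketch of how the suggested change might look in the full unicast_peer block, assuming $node is assigned to the local node (with an Address field) earlier in the PR's template:

unicast_peer { # peers limited to the nodes running the consul server service
    {{range service "consul"}}{{$n := .}}{{if ne $n.Address $node.Address}}{{$n.Address}}
    {{end}}{{end}}
}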

Contributor Author


good catch; I will change that...

in order to limit peer list to the nodes running a `consul agent`
sielaq added a commit that referenced this pull request Aug 12, 2015
@sielaq sielaq merged commit 0e08d7c into eBayClassifiedsGroup:master Aug 12, 2015
@Kosta-Github
Contributor Author

cool; thanks for merging!

@Kosta-Github Kosta-Github deleted the Kosta/keepalived branch August 13, 2015 15:22