
Stale Haproxy processes #200

Open
malterb opened this issue Feb 17, 2016 · 22 comments

@malterb
Contributor

malterb commented Feb 17, 2016

Hi,

I am running into the issue that I constantly get stale haproxy processes. I have tried "everything" but can't get it to work. This is my bamboo.log for one occasion where it happened:

2016/02/17 16:42:48 Starting update loop
2016/02/17 16:42:48 Environment variable not set: MARATHON_USE_EVENT_STREAM
2016/02/17 16:42:48 Environment variable not set: STATSD_ENABLED
2016/02/17 16:42:48 bamboo_startup => 2016-02-17T16:42:48Z
2016/02/17 16:42:48 Queuing an haproxy update.
2016/02/17 16:42:48 Skipped HAProxy configuration reload due to lack of changes
2016/02/17 16:42:48 subscribe_event => 2016-02-17T16:42:49.973Z
2016/02/17 16:42:48 Queuing an haproxy update.
2016/02/17 16:42:48 Skipped HAProxy configuration reload due to lack of changes
2016/02/17 16:43:25 status_update_event => 2016-02-17T16:43:25.568Z
2016/02/17 16:43:25 Queuing an haproxy update.
2016/02/17 16:43:25 Generating validation command
2016/02/17 16:43:25 Validating config
2016/02/17 16:43:25 Exec cmd: haproxy -c -f /tmp/bamboo601755456
2016/02/17 16:43:25 Exec cmd: haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf $(cat /var/run/haproxy.pid)
2016/02/17 16:43:25 Cleaning up config
2016/02/17 16:43:25 Exec cmd:
2016/02/17 16:43:25 Reloaded HAProxy configuration
2016/02/17 16:43:27 status_update_event => 2016-02-17T16:43:28.492Z
2016/02/17 16:43:27 Queuing an haproxy update.
2016/02/17 16:43:27 Generating validation command
2016/02/17 16:43:27 Validating config
2016/02/17 16:43:27 Exec cmd: haproxy -c -f /tmp/bamboo935768479
2016/02/17 16:43:27 Exec cmd: haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf $(cat /var/run/haproxy.pid)
2016/02/17 16:43:27 Cleaning up config
2016/02/17 16:43:27 Exec cmd:
2016/02/17 16:43:27 Reloaded HAProxy configuration
2016/02/17 16:54:56 Domain mapping: Stated changed

As you can see from my processes, there are two sets of haproxies:

root@haproxy:/# ps aux | grep haproxy
haproxy  22450  0.0  0.0  25796  5212 ?        Ss   16:43   0:00 haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 22327 22328 22329 22330
haproxy  22451  0.0  0.0  25816  4720 ?        Ss   16:43   0:00 haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 22327 22328 22329 22330
haproxy  22452  0.0  0.0  25816  4516 ?        Ss   16:43   0:00 haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 22327 22328 22329 22330
haproxy  22453  0.0  0.0  26044  5404 ?        Ss   16:43   0:00 haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 22327 22328 22329 22330
haproxy  22460  0.0  0.0  25712  5020 ?        Ss   16:43   0:00 haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 22450 22451 22452 22453
haproxy  22461  0.0  0.0  25852  4960 ?        Ss   16:43   0:00 haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 22450 22451 22452 22453
haproxy  22462  0.0  0.0  25824  5308 ?        Ss   16:43   0:00 haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 22450 22451 22452 22453
haproxy  22463  0.0  0.0  26072  5460 ?        Ss   16:43   0:00 haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 22450 22451 22452 22453
root     22568  0.0  0.0  10472  2224 pts/0    S+   16:52   0:00 grep --color=auto haproxy
root@haproxy:/# cat /var/run/haproxy.pid
22460
22461
22462
22463

and I use the following config for bamboo:

  "HAProxy": {
    "TemplatePath": "/var/bamboo/haproxy_template.cfg",
    "OutputPath": "/etc/haproxy/haproxy.cfg",
    "ReloadCommand": "haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf $(cat /var/run/haproxy.pid)",
    "ReloadValidationCommand": "haproxy -c -f {{.}}"
  },
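One thing worth noting about this style of `ReloadCommand`: the shell expands `$(cat /var/run/haproxy.pid)` before the new haproxy starts, so if the pidfile is stale, missing, or being rewritten at that moment, the new process signals the wrong set of PIDs (or none), and the previous instances linger. A minimal defensive sketch, assuming the paths from the config above and that the reload runs as root (this is an illustration, not bamboo's actual behavior):

```shell
#!/usr/bin/env bash
# Sketch (assumption, not bamboo's built-in logic): filter the pidfile down
# to PIDs that still refer to live processes before handing them to -sf.
# Running as root is assumed, since kill -0 on another user's process can
# fail with EPERM even though the process exists.
live_pids() {
  local pidfile="$1" pid
  [ -r "$pidfile" ] || return 0
  while read -r pid; do
    # kill -0 checks for process existence without sending a signal
    kill -0 "$pid" 2>/dev/null && printf '%s\n' "$pid"
  done < "$pidfile"
}

# Illustrative use in a ReloadCommand:
# haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D \
#   -sf $(live_pids /var/run/haproxy.pid)
```

Word splitting in the command substitution turns the newline-separated PIDs into separate `-sf` arguments, which is what haproxy expects.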

and the config parts of my haproxy.cfg:

global
        log /dev/log    local0
        log /dev/log    local1 notice
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin
        stats timeout 30s
        user haproxy
        group haproxy
        daemon
        tune.ssl.default-dh-param 2048
        nbproc 4

defaults
        log     global
        mode    http
        option  httplog
        option  forwardfor
        option  dontlognull
        option  forceclose
        timeout connect 5000
        timeout client  50000
        timeout server  50000

Sorry for the long post. Would've gone to SO or SF, but thought this might be an issue with bamboo.

Can anyone point me in the right direction?

@hammi85

hammi85 commented Feb 19, 2016

I can report exactly the same issue. When I run my bamboo Docker container from an old build, everything works fine, but since I updated my container 2 days ago this has been happening to me too.

A little help would be awesome :)

@rasputnik
Contributor

I've seen this before: constant haproxy reloads will often cause a logjam if they happen too frequently, although Bamboo attempts to debounce reloads.

Are you constantly redeploying apps?
A haproxy reload should only happen when Marathon tasks move around in Mesos, causing the config to change and requiring a reload.

I fixed #177 - which caused unnecessary restarts - for the 0.2.14 release, are you using an older version?

@mohamedhaleem

We typically deploy multiple times in the course of a day. This is part of a CI/CD environment, and it happens for me with 0.2.14. Sorry about the long post.

Here is the snip in bamboo.json

  "HAProxy": {
    "TemplatePath": "/var/bamboo/haproxy_template.cfg",
    "OutputPath": "/etc/haproxy/haproxy.cfg",
    "ReloadCommand": "/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf $(cat /var/run/haproxy.pid)",
    "ReloadValidationCommand": "/sbin/haproxy -c -f {{.}}"
  }

Before starting bamboo, here is what the ps output looks like:

#> ps aux | grep haproxy |grep -v grep
root 30508 0.0 0.0 46332 1724 ? Ss 17:55 0:00 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
haproxy 30509 0.0 0.0 52108 3608 ? S 17:55 0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
haproxy 30510 7.5 0.0 52380 2028 ? Ss 17:55 13:03 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds

#> cat /var/run/haproxy.pid
30510

At the next refresh or app deploy, we notice the following:

Bamboo logs
2016/02/20 20:51:08 Starting update loop
2016/02/20 20:51:08 bamboo_startup => 2016-02-20T20:51:08Z
2016/02/20 20:51:08 Queuing an haproxy update.
2016/02/20 20:51:08 Exec cmd: /sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
[martini] listening on :8000 (development)
2016/02/20 20:51:08 HAProxy: Configuration updated

#> ps aux | grep haproxy |grep -v grep
haproxy 30820 7.2 0.0 52072 1784 ? Ss 20:51 0:02 /sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 30510

After the next refresh...

#> ps aux | grep haproxy |grep -v grep
haproxy 30820 7.2 0.0 52072 1784 ? Ss 20:51 0:02 /sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 30510
haproxy 30770 7.2 0.0 52072 1784 ? Ss 20:51 0:02 /sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 30820

The next refresh just keeps adding to the list of haproxy processes.

@timoreimann
Contributor

HAProxy processes are designed to live as long as there are still connections being served. Could it possibly be that you have some long-running connections still pending when the reload is initiated?

We're operating in a high-frequency deployment environment as well. For us, it's not uncommon to see 15-20 HAProxy processes alive at the same time due to long-running WebSocket connections. They do rotate out after a few hours and get replaced by newer processes, however, which is an indication of progress. You might want to check on that behavior as well.

@malterb
Contributor Author

malterb commented Feb 23, 2016

There shouldn't be any long-running connections, to be honest (5s max). Our problem is that the stale haproxy processes still accept connections and cause 503s due to now-defunct instances.

@timoreimann
Contributor

It seems strange that HAProxy takes over so many PIDs. For us, it's only ever one PID that's passed to -sf, and the PID file never contains more than one entry either.

I'd try to figure out whether the PID file is populated and cleaned up properly. Are you using HAProxy natively or inside Docker?
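One quick way to check the pidfile against reality is to compare the set of running haproxy PIDs with the file's contents. The helper below is hypothetical (not part of bamboo) and parameterized so the comparison logic can be exercised without a live haproxy:

```shell
#!/usr/bin/env bash
# Sketch: print PIDs that are running but absent from the pidfile -- the
# "stale" processes that should have exited after a -sf handover.
# stale_pids and the paths in the comment below are illustrative assumptions.
stale_pids() {
  local pidfile="$1"; shift   # remaining args: PIDs currently running
  local pid
  for pid in "$@"; do
    # -x matches whole lines, so PID 30 does not match a line "305"
    grep -qx "$pid" "$pidfile" 2>/dev/null || printf '%s\n' "$pid"
  done
}

# On a live box (illustrative):
# stale_pids /var/run/haproxy.pid $(pgrep -x haproxy)
```

Any PIDs it prints are processes the pidfile no longer tracks, which is exactly the situation shown in the `ps` output above.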

@malterb
Contributor Author

malterb commented Feb 23, 2016

I always get 4 because of nbproc. The issue remains even when I use nbproc 1 and hence only one PID.

@mohamedhaleem

I found a similar problem others have reported with consul-template / haproxy: hashicorp/consul-template#442

Could this be a similar Go-related issue?

Today we updated to 0.2.15 and changed the reload command as follows:

"ReloadCommand": "/bin/systemctl reload haproxy"

So far, it seems to be working a world better.
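Part of why `systemctl reload haproxy` behaves better is presumably that systemd serializes reload jobs per unit, so two rapid deploys can't race over the pidfile. The same serialization can be sketched with `flock`; the function name and lock path here are illustrative assumptions, not anything bamboo or haproxy ships:

```shell
#!/usr/bin/env bash
# Sketch: serialize reloads behind an exclusive lock so back-to-back deploys
# cannot both read the pidfile before either new haproxy has rewritten it.
# reload_locked and the default lock path are assumptions for illustration.
reload_locked() {
  local lockfile="${1:-/tmp/haproxy-reload.lock}"
  exec 9>"$lockfile"
  # wait up to 10s for any in-flight reload to finish
  if ! flock -w 10 9; then
    echo "another reload is still in progress" >&2
    return 1
  fi
  # The real reload would run here, e.g.:
  # haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D \
  #   -sf $(cat /var/run/haproxy.pid)
  echo "reload done"
  flock -u 9
}
```

Pointing bamboo's `ReloadCommand` at a wrapper like this would make overlapping update events queue up instead of spawning overlapping `-sf` handovers.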

@jmprusi

jmprusi commented Mar 8, 2016

EDIT: Even with grace 0s I'm still getting stale haproxy processes.

I was having this issue (haproxy 1.6 inside Docker); using "grace 0s" in the "defaults" section of the haproxy template config solves the issue.

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        timeout connect 5000
        timeout client  50000
        timeout server  50000
        grace  0s

# Template Customization
frontend http-in
        bind *:80
        {{ $services := .Services }}

Documentation:
https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#4.2-grace

This works for short-lived connections; if you have long-running connections, those will get killed, so perhaps you can increase the grace period. But the weird thing is: why does haproxy keep accepting new connections?

@malterb
Contributor Author

malterb commented Mar 16, 2016

Has anyone tried marathon-lb's reload command?

https://github.com/mesosphere/marathon-lb/blob/master/service/haproxy/run

#!/bin/bash
exec 2>&1
export PIDFILE="/tmp/haproxy.pid"
exec 200<$0

reload() {
  echo "Reloading haproxy"
  if ! haproxy -c -f /marathon-lb/haproxy.cfg; then
    echo "Invalid config"
    return 1
  fi
  if ! flock 200; then
    echo "Can't acquire lock, reload already in progress?"
    return
  fi

  # Begin to drop SYN packets with firewall rules
  IFS=',' read -ra ADDR <<< "$PORTS"
  for i in "${ADDR[@]}"; do
    iptables -w -I INPUT -p tcp --dport $i --syn -j DROP
  done

  # Wait to settle
  sleep 0.1

  # Save the current HAProxy state
  socat /var/run/haproxy/socket - <<< "show servers state" > /var/state/haproxy/global

  # Trigger reload
  haproxy -p $PIDFILE -f /marathon-lb/haproxy.cfg -D -sf $(cat $PIDFILE)

  # Remove the firewall rules
  IFS=',' read -ra ADDR <<< "$PORTS"
  for i in "${ADDR[@]}"; do
    iptables -w -D INPUT -p tcp --dport $i --syn -j DROP
  done

  # Need to wait 1s to prevent TCP SYN exponential backoff
  sleep 1
  flock -u 200
}

mkdir -p /var/state/haproxy
mkdir -p /var/run/haproxy

reload

trap reload SIGHUP
while true; do sleep 0.5; done

@malterb
Contributor Author

malterb commented Mar 18, 2016

Found hashicorp/consul-template#442 and golang/go#13164

It could actually be related to Go. I just compiled bamboo with Go 1.6 and will update this thread accordingly.

BTW: Another reload script that I might try if that doesn't work: https://github.com/eBayClassifiedsGroup/PanteraS/blob/master/infrastructure/haproxy_reload.sh

@imrangit

@elmalto: were you able to resolve the issue with the latest Go 1.6 or did you employ a reload script?

-Imran

@malterb
Contributor Author

malterb commented Apr 19, 2016

I have not seen this issue since upgrading to Go 1.6.

Malte

@mvallerie
Contributor

Hey,

As stated in #206, we had this issue before on our Mesos cluster. After migrating our Docker images to Go 1.6 about 2 days ago, it looks like that fixed it.

I can't confirm this for certain, since we also have a lot of long-running connections, but the number of haproxy processes after 2 days seems much more reasonable than before. I'll have another look during the next week and post again if something changes.

Thanks for having found that out anyway :).

@j1n6
Contributor

j1n6 commented May 16, 2016

The upgrade might have helped.
I have a hunch that it's likely to be HAProxy itself.

Do you have any information/data about how often your deployment triggers reload?


@mvallerie
Contributor

Hey, sure!

Usually on this cluster we get around 0 to 5 updates a day. The day it failed, we had many more (probably around 10), which resulted in something like 15+ haproxy processes on some Mesos slaves.

We have one bamboo running (as a Docker container) on each Mesos slave. Right now they have been up for 1 week; we had some updates last Friday, but the number of haproxy processes only increased to 6. More importantly, that number sometimes goes down, which wasn't the case before the upgrade.

I have a hunch that it's likely to be HAProxy itself.

My guess is you are right. We used marathon-lb before bamboo, and we also had this issue with it.

@j1n6
Contributor

j1n6 commented May 16, 2016

I suggest moving to Nginx to replace HAProxy; there's a branch that @bluepeppers has been working on that would support multiple reload destinations, but it's still WIP.

@mvallerie
Contributor

mvallerie commented May 16, 2016

Does nginx support TCP balancing (as haproxy does) outside of its "Plus" version? That looks unclear to me.

I know it may work after building nginx with some extra modules. I'm just unsure about what those "extra modules" may or may not support compared to haproxy.

@j1n6
Contributor

j1n6 commented May 17, 2016

Yup, it does support it.
If you are using the Nginx Plus version, both TCP and UDP are supported out of the box.

If you are using the open-source version, try this Nginx-compatible fork: https://github.com/alibaba/tengine


@mvallerie
Contributor

mvallerie commented Feb 23, 2017

@activars Just to let you know (we're still on haproxy): it happened again today on one of our Mesos slaves. This issue now seems to occur only in very rare, specific situations (this is the only time it has happened since), and is probably more related to haproxy or Go, so it's probably not necessary to reopen.

According to the refs above, upgrading haproxy to the latest 1.5.x might be the way to fix it for good. Considering minor version upgrades shouldn't harm, I prepared a Docker image including haproxy 1.5.19 (vs 1.5.8) and based on golang 1.8 (vs 1.6; well, that's not minor, but let's trust the promise of compatibility, and let me know if that sounds like a terrible mistake :).

I'm going to test this during the next few days.

@nagsharma32

I still have the problem. Running haproxy:1.7.5

@tcolgate

I'm very sorry for the lack of comms on this thread. We no longer run bamboo (we're no longer on Mesos) and are not going to be able to provide ongoing maintenance. If anyone is interested in maintaining it going forward, please raise another issue and we'll look at redirecting people to a fork.
