
Stale Haproxy processes #200

Open
malterb opened this issue Feb 17, 2016 · 22 comments

@malterb
Contributor

malterb commented Feb 17, 2016

Hi,

I am running into the issue that I constantly get stale haproxy processes. I have tried "everything" but can't get it to work. This is my bamboo.log for one occasion where it happened:

2016/02/17 16:42:48 Starting update loop
2016/02/17 16:42:48 Environment variable not set: MARATHON_USE_EVENT_STREAM
2016/02/17 16:42:48 Environment variable not set: STATSD_ENABLED
2016/02/17 16:42:48 bamboo_startup => 2016-02-17T16:42:48Z
2016/02/17 16:42:48 Queuing an haproxy update.
2016/02/17 16:42:48 Skipped HAProxy configuration reload due to lack of changes
2016/02/17 16:42:48 subscribe_event => 2016-02-17T16:42:49.973Z
2016/02/17 16:42:48 Queuing an haproxy update.
2016/02/17 16:42:48 Skipped HAProxy configuration reload due to lack of changes
2016/02/17 16:43:25 status_update_event => 2016-02-17T16:43:25.568Z
2016/02/17 16:43:25 Queuing an haproxy update.
2016/02/17 16:43:25 Generating validation command
2016/02/17 16:43:25 Validating config
2016/02/17 16:43:25 Exec cmd: haproxy -c -f /tmp/bamboo601755456
2016/02/17 16:43:25 Exec cmd: haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf $(cat /var/run/haproxy.pid)
2016/02/17 16:43:25 Cleaning up config
2016/02/17 16:43:25 Exec cmd:
2016/02/17 16:43:25 Reloaded HAProxy configuration
2016/02/17 16:43:27 status_update_event => 2016-02-17T16:43:28.492Z
2016/02/17 16:43:27 Queuing an haproxy update.
2016/02/17 16:43:27 Generating validation command
2016/02/17 16:43:27 Validating config
2016/02/17 16:43:27 Exec cmd: haproxy -c -f /tmp/bamboo935768479
2016/02/17 16:43:27 Exec cmd: haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf $(cat /var/run/haproxy.pid)
2016/02/17 16:43:27 Cleaning up config
2016/02/17 16:43:27 Exec cmd:
2016/02/17 16:43:27 Reloaded HAProxy configuration
2016/02/17 16:54:56 Domain mapping: Stated changed

As you can see from my processes, there are two sets of haproxies:

root@haproxy:/# ps aux | grep haproxy
haproxy  22450  0.0  0.0  25796  5212 ?        Ss   16:43   0:00 haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 22327 22328 22329 22330
haproxy  22451  0.0  0.0  25816  4720 ?        Ss   16:43   0:00 haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 22327 22328 22329 22330
haproxy  22452  0.0  0.0  25816  4516 ?        Ss   16:43   0:00 haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 22327 22328 22329 22330
haproxy  22453  0.0  0.0  26044  5404 ?        Ss   16:43   0:00 haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 22327 22328 22329 22330
haproxy  22460  0.0  0.0  25712  5020 ?        Ss   16:43   0:00 haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 22450 22451 22452 22453
haproxy  22461  0.0  0.0  25852  4960 ?        Ss   16:43   0:00 haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 22450 22451 22452 22453
haproxy  22462  0.0  0.0  25824  5308 ?        Ss   16:43   0:00 haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 22450 22451 22452 22453
haproxy  22463  0.0  0.0  26072  5460 ?        Ss   16:43   0:00 haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 22450 22451 22452 22453
root     22568  0.0  0.0  10472  2224 pts/0    S+   16:52   0:00 grep --color=auto haproxy
root@haproxy:/# cat /var/run/haproxy.pid
22460
22461
22462
22463

and I use the following config for bamboo:

  "HAProxy": {
    "TemplatePath": "/var/bamboo/haproxy_template.cfg",
    "OutputPath": "/etc/haproxy/haproxy.cfg",
    "ReloadCommand": "haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf $(cat /var/run/haproxy.pid)",
    "ReloadValidationCommand": "haproxy -c -f {{.}}"
  },
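One thing worth noting about this style of `ReloadCommand`: the shell expands `$(cat /var/run/haproxy.pid)` before the new haproxy starts, so if the pidfile is stale, missing, or being rewritten at that moment, the new process signals the wrong set of PIDs (or none), and the previous instances linger. A minimal defensive sketch, assuming the paths from the config above and that the reload runs as root (this is an illustration, not bamboo's actual behavior):

```shell
#!/usr/bin/env bash
# Sketch (assumption, not bamboo's built-in logic): filter the pidfile down
# to PIDs that still refer to live processes before handing them to -sf.
# Running as root is assumed, since kill -0 on another user's process can
# fail with EPERM even though the process exists.
live_pids() {
  local pidfile="$1" pid
  [ -r "$pidfile" ] || return 0
  while read -r pid; do
    # kill -0 checks for process existence without sending a signal
    kill -0 "$pid" 2>/dev/null && printf '%s\n' "$pid"
  done < "$pidfile"
}

# Illustrative use in a ReloadCommand:
# haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D \
#   -sf $(live_pids /var/run/haproxy.pid)
```

Word splitting in the command substitution turns the newline-separated PIDs into separate `-sf` arguments, which is what haproxy expects.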

and the config parts of my haproxy.cfg:

global
        log /dev/log    local0
        log /dev/log    local1 notice
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin
        stats timeout 30s
        user haproxy
        group haproxy
        daemon
        tune.ssl.default-dh-param 2048
        nbproc 4

defaults
        log     global
        mode    http
        option  httplog
        option  forwardfor
        option  dontlognull
        option  forceclose
        timeout connect 5000
        timeout client  50000
        timeout server  50000

Sorry for the long post. Would've gone to SO or SF, but thought this might be an issue with bamboo.

Can anyone point me in the right direction?

@hammi85

hammi85 commented Feb 19, 2016

I can report exactly the same issue. When I run my bamboo Docker container from an old build, everything works fine, but since I updated my container 2 days ago this has been happening to me too.

A little help would be awesome :)

@rasputnik
Contributor

I've seen this before: constant haproxy reloads will often cause a logjam if they happen too frequently, although Bamboo attempts to debounce reloads.

Are you constantly redeploying apps?
A haproxy reload should only happen when Marathon tasks move around in Mesos, causing the config to change and requiring a reload.

I fixed #177 - which caused unnecessary restarts - for the 0.2.14 release, are you using an older version?

@mohamedhaleem

We typically deploy multiple times in the course of a day. This is part of a CI/CD environment, and it happens for me with 0.2.14. Sorry about the long post.

Here is the snip in bamboo.json

  "HAProxy": {
    "TemplatePath": "/var/bamboo/haproxy_template.cfg",
    "OutputPath": "/etc/haproxy/haproxy.cfg",
    "ReloadCommand": "/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf $(cat /var/run/haproxy.pid)",
    "ReloadValidationCommand": "/sbin/haproxy -c -f {{.}}"
  }

Before starting bamboo, here is what the ps output looks like:

#> ps aux | grep haproxy |grep -v grep
root 30508 0.0 0.0 46332 1724 ? Ss 17:55 0:00 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
haproxy 30509 0.0 0.0 52108 3608 ? S 17:55 0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
haproxy 30510 7.5 0.0 52380 2028 ? Ss 17:55 13:03 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds

#> cat /var/run/haproxy.pid
30510

At the next refresh or app deploy, we notice the following:

Bamboo logs
2016/02/20 20:51:08 Starting update loop
2016/02/20 20:51:08 bamboo_startup => 2016-02-20T20:51:08Z
2016/02/20 20:51:08 Queuing an haproxy update.
2016/02/20 20:51:08 Exec cmd: /sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
[martini] listening on :8000 (development)
2016/02/20 20:51:08 HAProxy: Configuration updated

#> ps aux | grep haproxy |grep -v grep
haproxy 30820 7.2 0.0 52072 1784 ? Ss 20:51 0:02 /sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 30510

After the next refresh...

#> ps aux | grep haproxy |grep -v grep
haproxy 30820 7.2 0.0 52072 1784 ? Ss 20:51 0:02 /sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 30510
haproxy 30770 7.2 0.0 52072 1784 ? Ss 20:51 0:02 /sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 30820

The next refresh just keeps adding to the list of haproxy processes.

@timoreimann
Contributor

HAProxy processes are designed to live as long as there are still connections being served. Could it possibly be that you have some long-running connections still pending when the reload is initiated?

We're operating in a high-frequency deployment environment as well. For us, it's not uncommon to see 15-20 HAProxy processes alive at the same time due to long-running WebSocket connections. They do rotate out after a few hours and get replaced by newer processes, however, which is an indication of progress. You might want to check on that behavior as well.

@malterb
Contributor Author

malterb commented Feb 23, 2016

There shouldn't be any long-running connections, to be honest (5s max). Our problem is that the stale haproxy processes still accept connections and cause 503s due to now-defunct instances.

@timoreimann
Contributor

It seems strange that HAProxy takes over so many PIDs. For us, it's only ever one PID that's passed to -sf, and the PID file never contains more than one entry either.

I'd try to figure out whether the PID file is populated and cleaned up properly. Are you using HAProxy natively or inside Docker?
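One quick way to check the pidfile against reality is to compare the set of running haproxy PIDs with the file's contents. The helper below is hypothetical (not part of bamboo) and parameterized so the comparison logic can be exercised without a live haproxy:

```shell
#!/usr/bin/env bash
# Sketch: print PIDs that are running but absent from the pidfile -- the
# "stale" processes that should have exited after a -sf handover.
# stale_pids and the paths in the comment below are illustrative assumptions.
stale_pids() {
  local pidfile="$1"; shift   # remaining args: PIDs currently running
  local pid
  for pid in "$@"; do
    # -x matches whole lines, so PID 30 does not match a line "305"
    grep -qx "$pid" "$pidfile" 2>/dev/null || printf '%s\n' "$pid"
  done
}

# On a live box (illustrative):
# stale_pids /var/run/haproxy.pid $(pgrep -x haproxy)
```

Any PIDs it prints are processes the pidfile no longer tracks, which is exactly the situation shown in the `ps` output above.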

@malterb
Contributor Author

malterb commented Feb 23, 2016

I always get 4 because of nbproc. The issue remains even when I use nbproc 1 and hence only one PID.

@mohamedhaleem

I found a similar problem others have reported with consul-template / haproxy: hashicorp/consul-template#442

Could this be a similar Go-related issue?

Today we updated to 0.2.15 and changed the reload command as follows:

"ReloadCommand": "/bin/systemctl reload haproxy"

So far, it seems to be working a world better.
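Part of why `systemctl reload haproxy` behaves better is presumably that systemd serializes reload jobs per unit, so two rapid deploys can't race over the pidfile. The same serialization can be sketched with `flock`; the function name and lock path here are illustrative assumptions, not anything bamboo or haproxy ships:

```shell
#!/usr/bin/env bash
# Sketch: serialize reloads behind an exclusive lock so back-to-back deploys
# cannot both read the pidfile before either new haproxy has rewritten it.
# reload_locked and the default lock path are assumptions for illustration.
reload_locked() {
  local lockfile="${1:-/tmp/haproxy-reload.lock}"
  exec 9>"$lockfile"
  # wait up to 10s for any in-flight reload to finish
  if ! flock -w 10 9; then
    echo "another reload is still in progress" >&2
    return 1
  fi
  # The real reload would run here, e.g.:
  # haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D \
  #   -sf $(cat /var/run/haproxy.pid)
  echo "reload done"
  flock -u 9
}
```

Pointing bamboo's `ReloadCommand` at a wrapper like this would make overlapping update events queue up instead of spawning overlapping `-sf` handovers.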

@jmprusi

jmprusi commented Mar 8, 2016

EDIT: Even with grace 0s I'm still getting stale haproxy processes.

I was having this issue (haproxy 1.6 inside Docker); using "grace 0s" in the "defaults" section of the haproxy template config solves the issue.

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        timeout connect 5000
        timeout client  50000
        timeout server  50000
        grace  0s

# Template Customization
frontend http-in
        bind *:80
        {{ $services := .Services }}

Documentation:
https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#4.2-grace

This works for short-lived connections; if you have long-running connections, those will get killed, so perhaps you can increase the grace period. But the weird thing is: why does haproxy keep accepting new connections?

@malterb
Contributor Author

malterb commented Mar 16, 2016

Has anyone tried marathon-lb's reload command?

https://github.com/mesosphere/marathon-lb/blob/master/service/haproxy/run

#!/bin/bash
exec 2>&1
export PIDFILE="/tmp/haproxy.pid"
exec 200<$0

reload() {
  echo "Reloading haproxy"
  if ! haproxy -c -f /marathon-lb/haproxy.cfg; then
    echo "Invalid config"
    return 1
  fi
  if ! flock 200; then
    echo "Can't acquire lock, reload already in progress?"
    return
  fi

  # Begin to drop SYN packets with firewall rules
  IFS=',' read -ra ADDR <<< "$PORTS"
  for i in "${ADDR[@]}"; do
    iptables -w -I INPUT -p tcp --dport $i --syn -j DROP
  done

  # Wait to settle
  sleep 0.1

  # Save the current HAProxy state
  socat /var/run/haproxy/socket - <<< "show servers state" > /var/state/haproxy/global

  # Trigger reload
  haproxy -p $PIDFILE -f /marathon-lb/haproxy.cfg -D -sf $(cat $PIDFILE)

  # Remove the firewall rules
  IFS=',' read -ra ADDR <<< "$PORTS"
  for i in "${ADDR[@]}"; do
    iptables -w -D INPUT -p tcp --dport $i --syn -j DROP
  done

  # Need to wait 1s to prevent TCP SYN exponential backoff
  sleep 1
  flock -u 200
}

mkdir -p /var/state/haproxy
mkdir -p /var/run/haproxy

reload

trap reload SIGHUP
while true; do sleep 0.5; done

@malterb
Contributor Author

malterb commented Mar 18, 2016

Found hashicorp/consul-template#442 and golang/go#13164

It could actually be related to Go. I just compiled bamboo with Go 1.6 and will update this thread accordingly.

BTW: Another reload script that I might try if that doesn't work: https://github.com/eBayClassifiedsGroup/PanteraS/blob/master/infrastructure/haproxy_reload.sh

@imrangit

@elmalto: were you able to resolve the issue with the latest Go 1.6 or did you employ a reload script?

-Imran

@malterb
Contributor Author

malterb commented Apr 19, 2016

I have not seen this issue since upgrading to Go 1.6.

Malte

@mvallerie
Contributor

Hey,

As stated in #206, we had this issue before on our Mesos cluster. After migrating our Docker images to Go 1.6 about 2 days ago, it looks like that fixed it.

I can't confirm this for certain, since we also have a lot of long-running connections, but the number of haproxy processes after 2 days seems much more reasonable than before. I'll have another look during the next week and post again if something changes.

Thanks for having found that out anyway :).

@j1n6
Contributor

j1n6 commented May 16, 2016

The upgrade might have helped.
I have a hunch that it's likely to be HAProxy itself.

Do you have any information/data about how often your deployment triggers reload?


@mvallerie
Contributor

Hey, sure!

Usually on this cluster we get around 0 to 5 updates a day. The day it failed, we had many more (probably around 10), which resulted in something like 15+ haproxy processes on some Mesos slaves.

We have one bamboo running (as a Docker container) on each Mesos slave. Right now they have been up for 1 week; we had some updates last Friday, but the number of haproxy processes only increased to 6. More importantly, that number sometimes goes down, which wasn't the case before the upgrade.

I have a hunch that it's likely to be HAProxy itself.

My guess is you are right. We used marathon-lb before bamboo, and we also had this issue with it.

@j1n6
Contributor

j1n6 commented May 16, 2016

I suggest moving to Nginx to replace HAProxy; there's a branch that @bluepeppers has been working on that would support multiple reload destinations, but it's still WIP.

@mvallerie
Contributor

mvallerie commented May 16, 2016

Does nginx support TCP balancing (as haproxy does) outside of its "Plus" version? That looks unclear to me.

I know it may work after building nginx with some extra modules. I'm just unsure about what those "extra modules" may or may not support compared to haproxy.

@j1n6
Contributor

j1n6 commented May 17, 2016

Yup, it does support it.
If you are using the Nginx Plus version, both TCP and UDP are supported out of the box.

If you are using the open-source version, try this Nginx-compatible fork: https://github.com/alibaba/tengine


@mvallerie
Contributor

mvallerie commented Feb 23, 2017

@activars Just to let you know (we're still on haproxy): it happened again today on one of our Mesos slaves. This issue now seems to occur only in very rare, specific situations (this is the only time it has happened since), and is probably more related to haproxy or Go, so it's probably not necessary to reopen.

According to the refs above, upgrading haproxy to the latest 1.5.x might be the way to fix it for good. Considering minor version upgrades shouldn't harm, I prepared a Docker image including haproxy 1.5.19 (vs 1.5.8) and based on golang 1.8 (vs 1.6; well, that's not minor, but let's trust the promise of compatibility, and let me know if that sounds like a terrible mistake :).

I'm going to test this during the next few days.

@nagsharma32

I still have the problem. Running haproxy:1.7.5

@tcolgate

I'm very sorry for the lack of comms on this thread. We no longer run bamboo (we're no longer on Mesos) and are not going to be able to provide ongoing maintenance. If anyone is interested in maintaining it going forward, please raise another issue and we'll look at redirecting people to a fork.
