Data Plane API is killed by SIGTERM in openshift #329

Open
git001 opened this issue Apr 2, 2024 · 8 comments · May be fixed by #330

Comments


git001 commented Apr 2, 2024

Introduction

I am trying to run HAProxy in front of Craft CMS and want to use the Data Plane API for management.

Data Plane API infos

This is the dataplane api version.

curl -vLo dataplaneapi-haproxy-v2.9.1.tar.gz \
  https://github.com/haproxytech/dataplaneapi/archive/refs/tags/v2.9.1.tar.gz

Because the trace output of the dataplaneapi binary was not very helpful, I added the following lines to the code; they produce the output below.

original code

} else {
	// if nbproc is not set, use master socket with 1 process
	ms := runtime_options.MasterSocket(masterSocket, 1)
	runtimeClient, err = runtime_api.New(ctx, mapsDir, ms)
	if err == nil {
		return runtimeClient
	}
	log.Warningf("Error setting up runtime client with master socket: %s : %s", masterSocket, err.Error())
}

my "patch"

# client-native/cn.go:96
			} else {
				// if nbproc is not set, use master socket with 1 process
				out, myerr := exec.Command("ls", "-la", "/data/haproxy/run/master-socket").Output()

				if myerr != nil {
					fmt.Printf("%s", myerr)
				}

				fmt.Println("Command Successfully Executed")
				output := string(out[:])
				fmt.Println(output)

				ms := runtime_options.MasterSocket(masterSocket, 1)
				runtimeClient, err = runtime_api.New(ctx, mapsDir, ms)
				if err == nil {
					return runtimeClient
				}
				log.Warningf("Error setting up runtime client with master socket (1): %s : %s", masterSocket, err.Error())
			}

This is the output when I run HAProxy with the Data Plane API:

alex@CPC-aleks-RW2GP on 02/04/2024 at 22:49:17_UTC /mnt/c/local_data/git-repos/craftcms_k8s$ oc -n craftcms logs craftcms-hap-b87b89874-ppssv
[NOTICE]   (1) : New program 'api' (8) forked
[NOTICE]   (1) : New worker (9) forked
[NOTICE]   (1) : Loading success.
[WARNING]  (9) : fcgi-servers/craftcms1 changed its IP from (none) to 10.129.2.11 by kube-dns/dns1.
fcgi-servers/craftcms1 changed its IP from (none) to 10.129.2.11 by kube-dns/dns1.
[WARNING]  (9) : Server fcgi-servers/craftcms1 ('craftcms-php.craftcms.svc.cluster.local') is UP/READY (resolves again).
Server fcgi-servers/craftcms1 ('craftcms-php.craftcms.svc.cluster.local') is UP/READY (resolves again).
[WARNING]  (9) : Server fcgi-servers/craftcms1 administratively READY thanks to valid DNS answer.
Server fcgi-servers/craftcms1 administratively READY thanks to valid DNS answer.
configuration file /data/haproxy/etc/dataplaneapi.yaml does not exists, creating one

time="2024-04-02T22:49:16Z" level=info msg="Build from: "
time="2024-04-02T22:49:16Z" level=info msg="HAProxy Data Plane API  .dev.dirty"
time="2024-04-02T22:49:16Z" level=info msg="Build date: 2024-04-02T22:45:03Z"
time="2024-04-02T22:49:16Z" level=info msg="Reload strategy: custom"
Command Successfully Executed
srwxr-xr-x. 1 1000940000 root 0 Apr  2 22:49 /data/haproxy/run/master-socket

time="2024-04-02T22:49:16Z" level=warning msg="Error setting up runtime client with master socket (1): /data/haproxy/run/master-socket;sockpair@7 : dial unix /data/haproxy/run/master-socket;sockpair@7: connect: no such file or directory"

[NOTICE]   (1) : haproxy version is 2.9.6-9eafce5
[NOTICE]   (1) : path to executable is /usr/local/sbin/haproxy
[ALERT]    (1) : Current program 'api' (8) exited with code 1 (Exit)
[ALERT]    (1) : exit-on-failure: killing every processes with SIGTERM
[ALERT]    (1) : Current worker (9) exited with code 143 (Terminated)
[WARNING]  (1) : All workers exited. Exiting... (1)

My observations

This line confuses me

time="2024-04-02T22:49:16Z" level=warning \
msg="Error setting up runtime client with master socket (1): \
  /data/haproxy/run/master-socket;sockpair@7 : \
    dial unix /data/haproxy/run/master-socket;sockpair@7: \
      connect: no such file or directory"

because the ls before the command runtime_api.New(...) shows that the socket is there.

srwxr-xr-x. 1 1000940000 root 0 Apr  2 22:49 /data/haproxy/run/master-socket

and I can execute the help command on the master socket

alex@CPC-aleks-RW2GP on 02/04/2024 at 23:11:02_UTC /mnt/c/local_data/git-repos/craftcms_k8s$ oc -n craftcms rsh --shell /bin/bash craftcms-hap-b87b89874-ddxs8
groups: cannot find name for group ID 1000940000
1000940000@craftcms-hap-b87b89874-ddxs8:/$ echo "help"|socat /data/haproxy/run/master-socket -
The following commands are valid at this level:
  @!<pid>                                 : send a command to the <pid> process
  @<relative pid>                         : send a command to the <relative pid> process
  @master                                 : send a command to the master process
  hard-reload                             : achieve a hard-reload (-st) of haproxy
  operator                                : lower the level of the current CLI session to operator
  reload                                  : achieve a soft-reload (-sf) of haproxy
  show cli level                          : display the level of the current CLI session
  show cli sockets                        : dump list of cli sockets
  show proc                               : show processes status
  show startup-logs                       : report logs emitted during HAProxy startup
  show version                            : show version of the current process
  user                                    : lower the level of the current CLI session to user
  help [<command>]                        : list matching or all commands
  prompt [timed]                          : toggle interactive mode with prompt
  quit                                    : disconnect

My assumption is that dataplaneapi tries to connect to /data/haproxy/run/master-socket;sockpair@7 which of course does not exist.

Important

When is the ;sockpair@7 added to the master-socket?

haproxy infos

haproxy run

This is how HAProxy is started.

oc -n craftcms exec craftcms-hap-b87b89874-ddxs8 -- ps axf
    PID TTY      STAT   TIME COMMAND
     24 ?        Rs     0:00 ps axf
      1 ?        Ss     0:00 haproxy -f /data/haproxy/etc/haproxy.cfg -db -W -S /data/haproxy/run/master-socket
      8 ?        Sl     0:01 haproxy -f /data/haproxy/etc/haproxy.cfg -db -W -S /data/haproxy/run/master-socket

haproxy config

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log stdout format raw daemon debug
    pidfile     /data/haproxy/run/haproxy.pid
    master-worker
    stats socket /data/haproxy/run/stats mode 660 level admin expose-fd listeners

resolvers kube-dns
  nameserver dns1 dns-default.openshift-dns.svc.cluster.local:53
  accepted_payload_size 4096
  resolve_retries       3
  timeout resolve       1s
  timeout retry         1s
  hold other           30s
  hold refused         30s
  hold nx              30s
  hold timeout         30s
  hold valid           10s
  hold obsolete        30s

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    balance                 leastconn
    log                     global
    option                  httplog
    option                  dontlognull
    option                  log-health-checks
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s

userlist haproxy-dataplaneapi
    user admin insecure-password mypassword
#
program api
   command /usr/bin/dataplaneapi -f=/data/haproxy/etc/dataplaneapi.yaml --log-to=stdout --log-level=trace --spoe-dir=/data/haproxy/spoe --maps-dir=/data/haproxy/maps --ssl-certs-dir=/data/haproxy/ssl --general-storage-dir=/data/haproxy/general --host 0.0.0.0 --port 5555 --haproxy-bin /usr/sbin/haproxy --config-file /data/haproxy/etc/haproxy.cfg --reload-cmd "kill -SIGUSR2 1" --restart-cmd "kill -SIGUSR2 1" --reload-delay 5 --userlist haproxy-dataplaneapi --socket-path=/data/haproxy/run/data-plane.sock
   no option start-on-reload

#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
frontend craft-cms
  bind *:8080

  tcp-request inspect-delay 5s
  tcp-request content accept if HTTP

  monitor-uri /health
  http-request deny if { path_sub -i %0a %0d }
  http-request deny if { hdr_len(content-length) 0 }
  http-request del-header Proxy
  http-request set-header Host %[req.hdr(Host),lower]

  acl exist-php-ext path_sub -i .php
  http-request set-path /index.php%[path] if !exist-php-ext !{ path_end .php }

  http-response set-header Strict-Transport-Security "max-age=16000000; includeSubDomains; preload;"

  default_backend fcgi-servers

listen stats
  bind *:1936
  monitor-uri /healthz
  http-request use-service prometheus-exporter if { path /metrics }
  stats enable
  stats uri /

backend fcgi-servers

  option httpchk
  http-check connect proto fcgi
  http-check send meth GET uri /fpm-ping
  
  use-fcgi-app php-fpm

  # https://www.haproxy.com/blog/circuit-breaking-haproxy
  server-template craftcms 5 craftcms-php.craftcms.svc.cluster.local:9000 proto fcgi check resolvers kube-dns init-addr none observe layer7  error-limit 5  on-error mark-down inter 10s  rise 30  slowstart 40s

fcgi-app php-fpm
    log-stderr global
    option keep-conn
    option mpxs-conns
    option max-reqs 10

    docroot /app/web
    index index.php
    path-info ^(/.+\.php)(/.*)?$

Output of haproxy -vv

haproxy -vv
HAProxy version 2.9.6-9eafce5 2024/02/26 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2025.
Known bugs: http://www.haproxy.org/bugs/bugs-2.9.6.html
Running on: Linux 5.14.0-284.52.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jan 30 08:35:38 EST 2024 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_PTHREAD_EMULATION=1 USE_LINUX_TPROXY=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_TFO=1 USE_QUIC=1 USE_PROMEX=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_QUIC_OPENSSL_COMPAT=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY +LUA +MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT -PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL +PROMEX +PTHREAD_EMULATION +QUIC +QUIC_OPENSSL_COMPAT +RT +SHM_OPEN +SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 -SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=8).
Built with OpenSSL version : OpenSSL 3.0.2 15 Mar 2022
Running on OpenSSL version : OpenSSL 3.0.2 15 Mar 2022
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
OpenSSL providers loaded : default
Built with Lua version : Lua 5.4.4
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.39 2021-10-29
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 11.4.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
       quic : mode=HTTP  side=FE     mux=QUIC  flags=HTX|NO_UPG|FRAMED
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter
Available filters :
        [BWLIM] bwlim-in
        [BWLIM] bwlim-out
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace

Dockerfile

That's the Dockerfile of the image.

FROM haproxytech/haproxy-ubuntu:2.9

COPY container-files/ /

RUN set -x \
  && cp /usr/bin/dataplaneapi /usr/bin/dataplaneapi.orig \
  && cp /data/haproxy/bin/dataplaneapi /usr/bin/dataplaneapi \
  && mkdir -p /data/haproxy/etc \
    /data/haproxy/run \
    /data/haproxy/maps \
    /data/haproxy/ssl \
    /data/haproxy/general \
    /data/haproxy/spoe \
  && chown -R 1001:0 /data \
  && chmod -R g=u /data \
  && /usr/bin/dataplaneapi -v

USER 1001
Collaborator

mjuraga commented Apr 3, 2024

Hi, can you paste the contents of /data/haproxy/etc/dataplaneapi.yaml?

Author

git001 commented Apr 3, 2024

Here it is:

$ cat  /data/haproxy/etc/dataplaneapi.yaml
config_version: 2
name: craftcms-hap-debug
mode: single
status: ""
dataplaneapi:
  advertised:
    api_address: ""
    api_port: 0
haproxy:
  reload:
    reload_strategy: custom

Author

git001 commented Apr 3, 2024

Maybe there is another issue and the message above hides the original problem.
Because OpenShift restricts the Pod's runtime environment, the socket call itself could fail.

https://github.com/haproxytech/client-native/blob/e914b0d0f77265cc83cb61eea069dea75c38a706/runtime/runtime_single_client.go#L209-L223

Which, I think, is called from here.

https://github.com/haproxytech/client-native/blob/e914b0d0f77265cc83cb61eea069dea75c38a706/runtime/runtime_single_client.go#L62-L80

Collaborator

mjuraga commented Apr 3, 2024

Hi @git001, thanks for your input, but the bug is in the dataplaneapi code. It picks up the master socket location from an environment variable set by HAProxy (https://github.com/haproxytech/dataplaneapi/blob/master/configure_data_plane.go#L121) and doesn't properly sanitize it: it doesn't remove the sockpair@ suffix added by the latest versions of HAProxy. We can fix this, or you could give it a shot if that interests you.

Author

git001 commented Apr 3, 2024

@mjuraga thanks for the tip. I will try to fix it with a PR

Author

git001 commented Apr 3, 2024

I have now fixed the sockpair@ bug with this code.

...
	// Override options with env variables
	if os.Getenv("HAPROXY_MWORKER") == "1" {
		mWorker = true
		masterRuntime := os.Getenv("HAPROXY_MASTER_CLI")
		if misc.IsUnixSocketAddr(masterRuntime) {

			fmt.Printf("before Replace masterRuntime :%v:\n", masterRuntime)

			if strings.HasPrefix(masterRuntime, "unix@") {
				haproxyOptions.MasterRuntime = strings.Replace(masterRuntime, "unix@", "", 1)
				if strings.Contains(haproxyOptions.MasterRuntime, "sockpair@") {
					semikolon := strings.Index(haproxyOptions.MasterRuntime, ";")
					haproxyOptions.MasterRuntime = haproxyOptions.MasterRuntime[:semikolon]
				}
			}

			fmt.Printf("after Replace masterRuntime :%v:\n", haproxyOptions.MasterRuntime)
		}
	}
...

The output below shows that the sockpair@ suffix is gone.

before Replace masterRuntime :unix@/data/haproxy/run/master-socket;sockpair@7:
after Replace masterRuntime :/data/haproxy/run/master-socket:
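
For comparison, the same sanitization can be written as a small standalone function that is easy to unit test. This is only a sketch, not the code in the PR: the name sanitizeMasterSocket is hypothetical, and it assumes the value always has the shape "[unix@]<path>[;sockpair@<fd>]" seen in the output above.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeMasterSocket (hypothetical helper) reduces a master CLI
// address to a plain filesystem path. It assumes the input looks like
// "[unix@]<path>[;sockpair@<fd>]".
func sanitizeMasterSocket(raw string) string {
	// Drop the optional "unix@" scheme prefix.
	path := strings.TrimPrefix(raw, "unix@")
	// Keep everything before the first ';'. strings.Cut returns the
	// whole string when no ';' is present, so there is no -1 index
	// to guard against.
	path, _, _ = strings.Cut(path, ";")
	return path
}

func main() {
	in := "unix@/data/haproxy/run/master-socket;sockpair@7"
	fmt.Printf("before :%v:\n", in)
	fmt.Printf("after  :%v:\n", sanitizeMasterSocket(in))
}
```

Using strings.Cut (Go 1.18 or newer) keeps the function total: an address without a ";" or without the "unix@" prefix passes through unchanged apart from the parts it is meant to strip.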

The problem with the SIGTERM is still there.

 haproxy -f /data/haproxy/etc/haproxy.cfg -db -W -S /data/haproxy/run/master-socket
[NOTICE]   (9) : New program 'api' (11) forked
[NOTICE]   (9) : New worker (12) forked
[NOTICE]   (9) : Loading success.
[WARNING]  (12) : fcgi-servers/craftcms1 changed its IP from (none) to 10.129.2.11 by kube-dns/dns1.
fcgi-servers/craftcms1 changed its IP from (none) to 10.129.2.11 by kube-dns/dns1.
[WARNING]  (12) : Server fcgi-servers/craftcms1 ('craftcms-php.craftcms.svc.cluster.local') is UP/READY (resolves again).
Server fcgi-servers/craftcms1 ('craftcms-php.craftcms.svc.cluster.local') is UP/READY (resolves again).
[WARNING]  (12) : Server fcgi-servers/craftcms1 administratively READY thanks to valid DNS answer.
Server fcgi-servers/craftcms1 administratively READY thanks to valid DNS answer.
configuration file /data/haproxy/etc/dataplaneapi.yaml does not exists, creating one
time="2024-04-03T11:26:23Z" level=info msg="HAProxy Data Plane API  .dev.dirty"
time="2024-04-03T11:26:23Z" level=info msg="Reload strategy: custom"
time="2024-04-03T11:26:23Z" level=info msg="Build from: "
time="2024-04-03T11:26:23Z" level=info msg="Build date: 2024-04-03T11:23:50Z"

before Replace masterRuntime :unix@/data/haproxy/run/master-socket;sockpair@7:
after Replace masterRuntime :/data/haproxy/run/master-socket:

Command Successfully Executed
srwxr-xr-x. 1 1000950000 root 0 Apr  3 11:26 /data/haproxy/run/master-socket

ms :{/data/haproxy/run/master-socket 1}: masterSocket :/data/haproxy/run/master-socket: mapsDir :{/data/haproxy/maps}:
[NOTICE]   (9) : haproxy version is 2.9.6-9eafce5
[NOTICE]   (9) : path to executable is /usr/local/sbin/haproxy
[ALERT]    (9) : Current program 'api' (11) exited with code 1 (Exit)
[ALERT]    (9) : exit-on-failure: killing every processes with SIGTERM
[ALERT]    (9) : Current worker (12) exited with code 143 (Terminated)
[WARNING]  (9) : All workers exited. Exiting... (1)

As mentioned in #329 (comment), could the socket call in client-native be another issue?

git001 added a commit to git001/dataplaneapi that referenced this issue Apr 3, 2024
Commit haproxy/haproxy@8a02257 added the `sockpair@` suffix
to the master socket.

fix: haproxytech#329
git001 linked a pull request Apr 3, 2024 that will close this issue
Collaborator

mjuraga commented Apr 5, 2024

As a test, could you try running dpapi standalone, that is, not started from the program section?

Author

git001 commented Apr 5, 2024

Any chance of merging PR #330?
