azure-lb: Don't redirect nc listener output to pidfile #1528
The `lb_start()` function spawns an `nc` listener background process and echoes the resulting pid to `$pidfile`. Due to a bug in the redirection, all future data received by the `nc` process is also appended to `$pidfile`.

If binary data is received later and appended to `$pidfile`, the monitor operation fails when `grep` searches the now-binary file.

```
line 97: kill: Binary: arguments must be process or job IDs ]
line 97: kill: file: arguments must be process or job IDs ]
line 97: kill: /var/run/nc_PF2_02.pid: arguments must be process or job IDs ]
line 97: kill: matches: arguments must be process or job IDs ]
```

Then the start operation fails during recovery. `lb_start()` spawns a new `nc` process, but the old process is still running and using the configured port.

```
nc_PF2_02_start_0:777:stderr [ Ncat: bind to :::62502: Address already in use. QUITTING. ]
```

This patch fixes the issue by removing the `nc &` command from the section whose output gets redirected to `$pidfile`. Now, only the `nc` PID is echoed to `$pidfile`.

Resolves: RHBZ#1850778
Resolves: RHBZ#1850779
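For readers who want to reproduce the redirection problem without a cluster, here is a minimal sketch. It is not the agent's actual code: `fake_listener` is a hypothetical stand-in for the `nc` listener, and the file names are made up for the demo. It assumes the bug pattern is a single redirection that covers both the backgrounded command and the `echo` of its PID, as the description above suggests.

```shell
#!/bin/sh
# Hypothetical stand-in for the nc listener: it prints data a moment after
# starting, the way nc prints whatever a client later sends to the port.
fake_listener() {
    sleep 1
    echo "data-from-client"
}

workdir=$(mktemp -d)
buggy_pidfile="$workdir/buggy.pid"
fixed_pidfile="$workdir/fixed.pid"

# Buggy pattern: the redirection covers the backgrounded command as well,
# so the listener's stdout stays attached to the pidfile after the echo.
{ fake_listener & echo $!; } > "$buggy_pidfile"

# Fixed pattern: start the listener outside the redirected section, then
# redirect only the echo. Discarding the listener output here is just to
# keep the demo quiet; it is not something the patch description states.
fake_listener > /dev/null &
echo $! > "$fixed_pidfile"

wait  # let both stand-in listeners finish writing

buggy_lines=$(wc -l < "$buggy_pidfile" | tr -d ' ')
fixed_lines=$(wc -l < "$fixed_pidfile" | tr -d ' ')
echo "buggy pidfile lines: $buggy_lines"   # PID plus the late listener output
echo "fixed pidfile lines: $fixed_lines"   # PID only
rm -rf "$workdir"
```

In the buggy case the pidfile ends up with two lines (the PID and the late listener output), which is the kind of extra content that later confuses the monitor operation. In the fixed case it holds only the PID.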
LGTM.
@nrwahl2, apologies in advance if this is not the correct forum, but I'm seeing this exact issue on my clusters, and it persists after updating to resource-agents.x86_64 4.1.1-61.el7 from the CentOS Base repo. My servers are running RHEL 7.9; however, we're pulling Pacemaker/Resource Agents from CentOS Base. I've confirmed that the updated azure-lb.sh in /usr/lib/ocf/resource.d/heartbeat from resource-agents.x86_64 4.1.1-61.el7 does not reflect the changes in this pull request. Thanks in advance!
@sjohnsonsf On RHEL, the fix was introduced in https://centos.pkgs.org/7/centos-updates-x86_64/resource-agents-4.1.1-61.el7_9.4.x86_64.rpm.html Can you give that a try?
@nrwahl2 pulling from the Updates repo seems to have done it. Can't thank you enough for the prompt reply!
@sjohnsonsf No problem, glad it's sorted out!
Hey @nrwahl2, since patching, our clusters have stabilized, but the azure-lb resource still fails for us, though it no longer causes a cluster-down scenario. What's particularly interesting is that it has occurred every Tuesday at exactly the same time. As we're a large environment, I am reviewing relevant logs via Splunk, etc. to correlate, but I wanted to see if you had any experience with what could trigger this bug (what's sending data to the listener port?). Any tips would be helpful. Thanks again!
@sjohnsonsf Try changing:
to
I'm investigating an apparent bug right now where the So far, it seems that redirecting output prevents the
@nrwahl2 thanks. The culprit in our environment is a BeyondTrust appliance scanning VMs for root password change/management. We're in the process of testing to confirm, but glad to know the Resource Agent will be able to handle this sort of thing in the future. Thanks again!