ss_fping.php script times out when there is a lot of packet loss #5221

Closed
jdcoats opened this issue Feb 2, 2023 · 11 comments
Labels: confirmed, enhancement, resolved

@jdcoats

jdcoats commented Feb 2, 2023

When monitoring with Advanced Ping while there is significant packet loss, the loss is not recorded. Instead of documented latency or loss, I get script server timeouts and broken graphs. Also note the bad line breaks in the log.
Example

2023/02/02 13:59:12 - SPINE: Poller[Main Poller] PID[23002] PT[140031989618368] WARNING: SS[2] The PHP Script Server did not respond in time for Timeout[7.00], Command[/var/www/localhost/htdocs/cacti/scripts/ss_fping.php ss_fping '104.170.44.162' '20' 'ICMP' ''
2023/02/02 13:57:10 - SPINE: Poller[Main Poller] PID[16189] PT[140596601669312] WARNING: SS[1] The PHP Script Server did not respond in time for Timeout[7.00], Command[/var/www/localhost/htdocs/cacti/scripts/ss_fping.php ss_fping '104.170.44.176' '20' 'ICMP' ''
] and will therefore be restarted2023/02/02 13:56:18 - SPINE: Poller[Main Poller] PID[12747] PT[140048724903616] WARNING: SS[1] The PHP Script Server did not respond in time for Timeout[7.01], Command[/var/www/localhost/htdocs/cacti/scripts/ss_fping.php ss_fping '104.170.44.176' '20' 'ICMP' ''
2023/02/02 13:56:11 - SPINE: Poller[Main Poller] PID[12747] PT[140048741689024] WARNING: SS[1] The PHP Script Server did not respond in time for Timeout[7.00], Command[/var/www/localhost/htdocs/cacti/scripts/ss_fping.php ss_fping '104.170.44.161' '20' 'ICMP' ''
2023/02/02 13:55:10 - SPINE: Poller[Main Poller] PID[9094] PT[140213569451712] WARNING: SS[6] The PHP Script Server did not respond in time for Timeout[7.00], Command[/var/www/localhost/htdocs/cacti/scripts/ss_fping.php ss_fping '104.170.44.162' '20' 'ICMP' ''
2023/02/02 13:54:11 - SPINE: Poller[Main Poller] PID[5663] PT[140093134186176] WARNING: SS[3] The PHP Script Server did not respond in time for Timeout[7.00], Command[/var/www/localhost/htdocs/cacti/scripts/ss_fping.php ss_fping '104.170.44.162' '20' 'ICMP' ''
] and will therefore be restarted2023/02/02 13:52:14 - SPINE: Poller[Main Poller] PID[31159] PT[140229734295232] WARNING: SS[14] The PHP Script Server did not respond in time for Timeout[7.00], Command[/var/www/localhost/htdocs/cacti/scripts/ss_fping.php ss_fping '104.170.44.161' '20' 'ICMP' ''

image

ping 104.170.44.161 -t

Pinging 104.170.44.161 with 32 bytes of data:
Reply from 104.170.44.161: bytes=32 time=51ms TTL=120
Request timed out.
Request timed out.
Reply from 104.170.44.161: bytes=32 time=52ms TTL=120
Request timed out.
Request timed out.
Reply from 104.170.44.161: bytes=32 time=52ms TTL=120
Reply from 104.170.44.161: bytes=32 time=51ms TTL=120
Request timed out.
Request timed out.
Request timed out.
Reply from 104.170.44.161: bytes=32 time=51ms TTL=120
Reply from 104.170.44.161: bytes=32 time=52ms TTL=120
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Reply from 104.170.44.161: bytes=32 time=51ms TTL=120
Request timed out.
Request timed out.
Request timed out.
Reply from 104.170.44.161: bytes=32 time=52ms TTL=120
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Reply from 104.170.44.161: bytes=32 time=50ms TTL=120
Request timed out.
Reply from 104.170.44.161: bytes=32 time=52ms TTL=120

@jdcoats added the bug and unverified labels Feb 2, 2023
@TheWitness
Member

yea, that's the whole problem with doing Advanced ping in Cacti.

  1. in bad packet loss situations the poller times out (not good)
  2. you never quite know what the script timeout value should be.
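Both points come down to the same arithmetic: if probes run one after another and every one is lost, the run takes roughly the number of sweeps times the per-probe timeout. A minimal sketch of that budget check follows. It assumes the `'20'` argument in the logged command is the sweep count and that `Timeout[7.00]` is the script timeout in seconds; the 1 second per-probe timeout and both function names are hypothetical, chosen only for illustration.

```python
def worst_case_seconds(sweeps: int, per_probe_timeout: float) -> float:
    """Upper bound on run time when every probe is lost and probes run serially."""
    return sweeps * per_probe_timeout


def fits_script_timeout(sweeps: int, per_probe_timeout: float,
                        script_timeout: float) -> bool:
    """True if even a fully lossy run finishes before the script timeout."""
    return worst_case_seconds(sweeps, per_probe_timeout) < script_timeout


if __name__ == "__main__":
    # 20 sweeps and a 7 second script timeout are read from the log above;
    # the 1 second per-probe timeout is an assumption.
    print(worst_case_seconds(20, 1.0))
    print(fits_script_timeout(20, 1.0, 7.0))
```

Under these assumptions a fully lossy run cannot finish inside the script timeout, which would match the warnings in the log.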

Adding @xmacan and @thurban to the thread. I know they've got some other solutions. I would do some of this "stuff" Async myself and write a plugin to store it in a table :) But then, I would need time to write it ;)

@TheWitness
Member

@jdcoats are these ICMP pings? If so, would you have a problem pinging the SSH port instead? ICMP relies on the 'ping' command, which does not have a sub-second timeout. Best to use TCP or UDP. Also, watch the Script Timeout in Cacti. If it's too high, everything starts to break.
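To illustrate what a TCP probe with a sub-second timeout can look like, here is a minimal sketch. It is not the code in `ss_fping.php`; the function name `tcp_probe` and the 0.5 second default are assumptions made for this example.

```python
import socket
import time
from typing import Optional


def tcp_probe(host: str, port: int, timeout: float = 0.5) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None on failure or timeout."""
    start = time.perf_counter()
    try:
        # create_connection accepts a float timeout, so sub-second values work.
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers timeouts, refused connections, and name resolution errors.
        return None
```

A call such as `tcp_probe(host, 22)` would time a connect to the SSH port and give up after half a second, which keeps the worst-case run time predictable.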

Another possible enhancement: instead of inserting "0", we insert "U". I like the second half. That way we could add an area fill to show when the host was down. That would be a nice 1.3.0 feature enhancement.
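A rough sketch of the "U" idea, assuming the script reports space-separated `name:value` pairs: when no reply arrives, the latency fields are emitted as "U" (which RRDtool treats as unknown) rather than "0". The function name `summarize` and the exact field names are assumptions for illustration, not the script's actual output format.

```python
from typing import List, Optional


def summarize(rtts_ms: List[Optional[float]]) -> str:
    """Format probe results, using "U" for any value that cannot be measured."""
    replies = [r for r in rtts_ms if r is not None]
    loss = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms) if rtts_ms else None
    if replies:
        lo, avg, hi = min(replies), sum(replies) / len(replies), max(replies)
    else:
        lo = avg = hi = None  # no replies at all: latency is unknown, not zero
    fields = {"min": lo, "avg": avg, "max": hi, "loss": loss}
    return " ".join(
        f"{name}:{'U' if value is None else format(value, '.2f')}"
        for name, value in fields.items()
    )
```

With every probe lost, this yields unknown latency and 100% loss, so a graph could distinguish "host down" from "zero latency".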

@TheWitness
Member

Maybe what we need to do is exit after two failed ping attempts.

@TheWitness
Member

ss_fping.zip

This version of the script will exit if more than 25% of ping sweeps failed.
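The early-exit rule can be sketched as follows. This is not the code in the attached zip, only an illustration of the idea under stated assumptions: `probe` is a hypothetical callable that returns a round-trip time in milliseconds or `None` on loss, and the loop stops as soon as failures exceed 25% of the planned sweeps.

```python
from typing import Callable, List, Optional


def sweep_with_early_exit(
    probe: Callable[[], Optional[float]],
    sweeps: int,
    max_loss_fraction: float = 0.25,
) -> List[Optional[float]]:
    """Run up to `sweeps` probes, stopping once failures exceed the allowed share."""
    results: List[Optional[float]] = []
    failures = 0
    for _ in range(sweeps):
        rtt = probe()
        results.append(rtt)
        if rtt is None:
            failures += 1
            # More than 25% of the planned sweeps have failed: stop early
            # instead of waiting out the remaining timeouts.
            if failures > sweeps * max_loss_fraction:
                break
    return results
```

With 20 sweeps the loop gives up on the sixth lost probe, which bounds the time spent waiting on a badly degraded host.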

@TheWitness
Member

@jdcoats any update?

@jdcoats
Author

jdcoats commented Mar 23, 2023

I will try to confirm this tomorrow, sorry for the delay

@jdcoats
Author

jdcoats commented Mar 23, 2023

now if only i could force some packet loss :)

@TheWitness
Member

😂

@TheWitness added the confirmed label and removed the unverified label Mar 23, 2023
@TheWitness changed the title from "ping issue" to "ss_fping.php script times out when there is a lot of packet loss" Mar 23, 2023
@TheWitness added this to the v1.2.25 milestone Mar 23, 2023
@jdcoats
Author

jdcoats commented Mar 29, 2023

when i re-create the packet loss I see this

SPINE: Poller[[Main Poller](https://syslog01.tmh.org/cacti/pollers.php?action=edit&id=1)] PID[6786] PT[140055276394176] WARNING: SS[0] The PHP Script Server did not respond in time for Timeout[5.00], Command[/var/www/localhost/htdocs/cacti/scripts/ss_fping.php ss_fping '10.6.33.77' '20' 'ICMP' ''

The packet loss is: 43 packets transmitted, 33 received, 23.2558% packet loss, time 42155ms
and then the graph is NaN
image

I can find a few times where something was actually graphed
image
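The loss percentage quoted in this comment follows directly from the transmitted and received counts: (43 - 33) / 43 is about 23.26%. A small sketch that recomputes it from a summary line of that shape; the function name `loss_percent` and the regular expression are assumptions for illustration.

```python
import re

# Matches the "N packets transmitted, M received" part of a ping summary line.
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received")


def loss_percent(summary: str) -> float:
    """Recompute packet loss (in percent) from a ping summary line."""
    match = SUMMARY.search(summary)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets transmitted")
    return 100.0 * (sent - received) / sent
```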

@TheWitness
Member

I'm going to defer this to 1.3.x as 1.2.x is EODL.

@TheWitness TheWitness modified the milestones: v1.2.25, v1.3.0 Mar 31, 2023
TheWitness added a commit that referenced this issue Apr 2, 2023
This fix should work pretty well when Cacti starts using the `fping` binary now.
@TheWitness added the resolved and enhancement labels and removed the bug label Apr 2, 2023
@TheWitness
Member

Okay, this should be resolved now in the develop branch.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 2, 2023