
Spine using CPU is very high #88

Closed
hel2o opened this issue Dec 22, 2016 · 26 comments

hel2o commented Dec 22, 2016

When I use availability/reachability methods other than SNMP Uptime, Spine's CPU usage is very high. Is this a bug?

cigamit (Member) commented Dec 22, 2016

What version of MySQL, and which method are you using?

hel2o (Author) commented Dec 22, 2016

SNMP Uptime and Ping (TCP) behave normally.
The other methods make CPU usage very high.
MySQL Ver 14.14 Distrib 5.7.16
Cacti Ver 1.0
Spine Ver 1.0

cigamit (Member) commented Dec 22, 2016

  1. Upgrade to MySQL 5.7.17; they just fixed an issue that caused all sorts of grief.
  2. Be more specific about which methods.
  3. If it's specific to ICMP, let us know whether you are using setuid() on the binary or Linux capabilities.
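On point 3: Spine's ICMP ping needs a raw socket, which requires either a setuid-root binary or the cap_net_raw capability. A minimal sketch of both options; the install path below is an assumption, adjust it to your prefix:

```shell
# Path is an example; adjust to your install prefix.
SPINE="${SPINE:-/usr/local/spine/bin/spine}"

if [ -x "$SPINE" ]; then
    # Option 1: Linux capabilities -- grants only raw-socket access
    sudo setcap cap_net_raw+ep "$SPINE"
    getcap "$SPINE"

    # Option 2: setuid root (what 'chmod +s spine' does later in this thread)
    # sudo chown root:root "$SPINE" && sudo chmod u+s "$SPINE"
else
    echo "spine binary not found at $SPINE"
fi
```

Capabilities are the narrower choice: the binary gains only raw-socket access rather than full root.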

cigamit (Member) commented Dec 22, 2016

Lastly, add these to the mix:
processes, threads, devices

hel2o (Author) commented Dec 22, 2016

OK, I'll try updating MySQL to 5.7.17.
Normal methods: SNMP Uptime, Ping (UDP), SNMP Desc, SNMP getNext
Abnormal methods: Ping and SNMP Uptime, Ping or SNMP Uptime
Method: spine, Processes: 4, Threads: 8, Hosts: 273, HostsPerProcess: 69, DataSources: 5481, RRDsProcessed: 3028
-- thanks

hel2o (Author) commented Dec 22, 2016

After upgrading MySQL to 5.7.17, the problem still exists!

cigamit (Member) commented Dec 23, 2016

What type of pings?

hel2o (Author) commented Dec 25, 2016

ICMP and UDP

cigamit (Member) commented Dec 27, 2016

ICMP is a bit of a problem: with many threads, every thread gets interrupted every time an ICMP reply comes back from a monitored host. If you reduce the threads and processes, things will get a little better. For a site your size, 1 process and 8 threads should be good enough. Also, consider using boost if your polling times go up.
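One way to see whether many threads are waking on each ICMP reply is to sample per-thread CPU while the poller runs. A sketch using standard procps tools; it assumes spine is currently running:

```shell
# List spine PIDs as a comma-separated string; empty if not running.
pids=$(pgrep -d, spine)

if [ -n "$pids" ]; then
    # -H shows individual threads; -b -n 1 takes one batch-mode sample
    top -H -b -n 1 -p "$pids"
else
    echo "spine is not currently running"
fi
```

If most threads show nonzero CPU during a single poll of one ICMP host, the per-reply wakeups described above are the likely culprit.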

hel2o (Author) commented Dec 28, 2016

No, I only have one device using ICMP detection; the others use SNMP. I tried a fresh CentOS 6.8 install and got the same result.

cigamit (Member) commented Dec 28, 2016

ls -altr from the spine bin directory.

hel2o (Author) commented Dec 28, 2016

[screenshot attached]

cigamit (Member) commented Dec 28, 2016

chmod +s spine (as root)

cigamit (Member) commented Dec 28, 2016

The net result should be 'rwsr-sr-x' or close to it.

hel2o (Author) commented Dec 28, 2016

[screenshot attached]
CPU is still very high
[two more screenshots attached]

cigamit (Member) commented Dec 28, 2016

What OS version and release? I'm polling over 700 hosts using ICMP and this does not happen. You should strace the process once it starts, or strace it manually to see what is happening.

In fact, you can debug the actual host:

strace -s 2500 ./spine -R -V 3 -S -f host_id -l host_id

With the host_id of the host you think is causing this.
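When sharing the trace, it helps to follow threads and write the output to a file. A variant of the command above, using device ID 654 from this thread (adjust the path and ID to your setup):

```shell
# -f follows the polling threads, -tt adds timestamps, -o saves to a file
# that can be attached to the issue. Run from the spine bin directory.
if command -v strace >/dev/null 2>&1 && [ -x ./spine ]; then
    strace -f -tt -s 2500 -o /tmp/spine-654.strace \
        ./spine -R -V 3 -S -f 654 -l 654
    wc -l /tmp/spine-654.strace
else
    echo "strace or ./spine not available here"
fi
```

A CPU-bound loop usually shows up in the trace as the same small set of syscalls repeating rapidly between timestamps.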

hel2o (Author) commented Dec 28, 2016

root@ISC:bin#./spine -R -V 5 -S -f 654 -l 654
SPINE: Using spine config file [../etc/spine.conf]
2016-12-28 10:32:58 - SPINE: Poller[1] DEBUG: The path_php_server variable is /var/www/html/cacti/script_server.php
2016-12-28 10:32:58 - SPINE: Poller[1] DEBUG: The path_cactilog variable is /var/www/html/cacti/log/cacti.log
DEBUG: The log_destination variable is 4 (STDOUT)
DEBUG: The path_php variable is /usr/bin/php
DEBUG: The availability_method variable is 2
DEBUG: The ping_recovery_count variable is 2
DEBUG: The ping_failure_count variable is 2
DEBUG: The ping_method variable is 2
DEBUG: The ping_retries variable is 1
DEBUG: The ping_timeout variable is 400
DEBUG: The snmp_retries variable is 3
DEBUG: The log_perror variable is 0
DEBUG: The log_pwarn variable is 0
DEBUG: The boost_redirect variable is 0
DEBUG: The log_pstats variable is 0
DEBUG: The threads variable is 8
DEBUG: The polling interval is 60 seconds
DEBUG: The number of concurrent processes is 4
DEBUG: The script timeout is 30
DEBUG: The selective_device_debug variable is
DEBUG: The spine_log_level variable is 1
DEBUG: The number of php script servers to run is 10
DEBUG: StartDevice='654', EndDevice='654', TotalPHPScripts='0
DEBUG: The PHP Script Server is Not Required
DEBUG: The Maximum SNMP OID Get Size is 100
Version 1.0.0 starting
DEBUG: MySQL is Thread Safe!
DEBUG: Spine is running asroot.
SPINE: Initializing Net-SNMP API
DEBUG: Issues with SNMP Header Version information, assuming old version of Net-SNMP.
SPINE: Initializing PHP Script Server(s)
NOTE: Spine will support multithread device polling.
DEBUG: Initial Value of Active Threads is 0
DEBUG: Valid Thread to be Created
DEBUG: The Value of Active Threads is 1
DEBUG: In Poller, About to Start Polling of Device for Device ID 654
Device[654] DEBUG: Entering ICMP Ping
Device[654] DEBUG: ICMP Device Alive, Try Count:1, Time:7.3528 ms
Device[654] PING: Result ICMP: Device is Alive
Device[654] TH[1] Device has no information for recache.
Device[654] TH[1] Total Time: 0.066 Seconds
Device[654] TH[1] DEBUG: HOST COMPLETE: About to Exit Device Polling Thread Function
DEBUG: The Value of Active Threads is 7 for Device ID 654
DEBUG: Thread Cleanup Complete
DEBUG: PHP Script Server Pipes Closed
DEBUG: Allocated Variable Memory Freed
DEBUG: MYSQL Free & Close Completed
DEBUG: Net-SNMP Close Completed
Time: 0.2125 s, Threads: 8, Devices: 1

cigamit (Member) commented Dec 28, 2016

Well, that is not the one. It's just doing a ping and exiting, 200 ms total.

hel2o (Author) commented Dec 28, 2016

I enabled only that host (host_id 654), and it alone makes the CPU high.

cigamit (Member) commented Dec 28, 2016

Strace is the next step, then. Again, though: what OS and release?

hel2o (Author) commented Dec 28, 2016

Linux version 2.6.32-642.el6.x86_64 (mockbuild@worker1.bsys.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Tue May 10 17:27:01 UTC 2016
CentOS release 6.8 (Final)

cigamit (Member) commented Dec 28, 2016

Same here. Once you find the host, do the strace, and run a tcpdump as well. This makes no sense. Not sure if GitHub allows you to upload them, though.
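A capture limited to the ICMP exchange keeps the file small enough to attach. A sketch, assuming a placeholder device address of 10.0.0.1 (substitute the real one); tcpdump needs root:

```shell
# -c 100 stops after 100 packets so the capture terminates on its own;
# 10.0.0.1 is a placeholder for the monitored device's address.
if command -v tcpdump >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    tcpdump -i any -c 100 -w /tmp/spine-icmp.pcap icmp and host 10.0.0.1
else
    echo "tcpdump requires root; rerun this capture as root"
fi
```

Running the capture during one poller cycle should show whether replies arrive once per ping or flood in repeatedly.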

cigamit (Member) commented Dec 29, 2016

Are you planning on attaching the required output?

interduo (Contributor) commented Dec 29, 2016

I have the same problem. I will debug it after the 5th of January 2017.

cigamit (Member) commented Dec 29, 2016

Very odd, in that we moved to semaphores to avoid this problem. The tcpdump/strace will be important to understand what is going on. It might be in the socket timeout; I'll have a look. Just UDP ping, right?

cigamit (Member) commented Dec 29, 2016

Pretty sure this commit resolved it:
Cacti/spine@93cc9de
