snmp plugin strange segfault #804

Closed
adnane- opened this Issue Nov 13, 2014 · 4 comments

adnane- commented Nov 13, 2014

We are using the latest collectd with the SNMP plugin, and we are facing an issue.
When a network switch is not responding, collectd segfaults and the daemon gets killed.

Querying the switch directly with snmpwalk, we get the following:
snmpwalk -v 2c -c comstrg switch IF-MIB::ifInOctets
Timeout: No Response from switch

The same switch, when configured in collectd, triggers a timeout (logged as a collectd error) followed by a segfault, even though it is the only device in the collectd configuration file.

Here are the relevant log lines:

[2014-11-13 15:55:01] snmp plugin: host esomehost: snmp_sess_synch_response failed: Timeout
Nov 13 15:55:01 machine kernel: collectd[41452]: segfault at 18 ip 000000363b075fb3 sp 00007f4cb1602580 error 4 in libc-2.12.so[363b000000+18a000]

We're using: CentOS 6.6
Collectd commit: ec0e109

Compilation:
./configure --prefix=/usr/local/monitoring/snmp --disable-apple_sensors --disable-aquaero --disable-mic --disable-netapp --disable-nut --disable-lpar --disable-oracle --disable-pf --disable-onewire --disable-redis --disable-routeros --disable-tokyotyrant --disable-tape --disable-sigrok --disable-write_mongodb --disable-write_redis --disable-zfs_arc --disable-xmms --disable-rrdcached --disable-pinba --disable-modbus --disable-gmond --disable-libvirt --disable-java --disable-madwifi --disable-wireless --disable-olsrd --disable-teamspeak2 --enable-write_riemann --disable-amqp --enable-snmp --enable-write_graphite --disable-perl

############################################
Configurations : https://gist.github.com/adnane-/05905e0db4c37b7c322a

valgrind -v --leak-check=full /usr/local/monitoring/collectd/sbin/collectd -P /usr/local/monitoring/collectd//var/run//collectd.pid -C /usr/local/monitoring/collectd/etc/collectd.conf --> https://gist.github.com/bragonznx/4da47f411d1ec974d2ec

strace /usr/local/monitoring/collectd/sbin/collectd -P /usr/local/monitoring/collectd//var/run//collectd.pid -C /usr/local/monitoring/collectd/etc/collectd.conf --> https://gist.github.com/adnane-/1d69129f004f04179932

@adnane- adnane- changed the title from snmp plugin get collectd killed for segfault to snmp plugin gets collectd killed for segfault Nov 13, 2014

@adnane- adnane- changed the title from snmp plugin gets collectd killed for segfault to snmp plugin strange segfault Nov 19, 2014

Contributor

mfournier commented Nov 19, 2014

@adnane- since you seem to have built collectd from source, could you please add --enable-debug to the ./configure options, then run collectd in gdb (gdb /usr/local/monitoring/collectd/sbin/collectd, then run -P /usr/local/monitoring/collectd//var/run//collectd.pid -C /usr/local/monitoring/collectd/etc/collectd.conf)? When the segfault occurs, type backtrace in the gdb shell and paste the output here. Thanks!

@mfournier mfournier added the Bug label Nov 19, 2014

Contributor

mfournier commented Nov 19, 2014

Also, to follow up on your post on the mailing list: does this problem occur only when you monitor the 4 devices together? Or can you also get collectd to crash when monitoring only the 4th one?

@pyr pyr closed this in 781f635 Nov 19, 2014

pyr added a commit that referenced this issue Nov 19, 2014

Let snmp_synch_response deal with PDU freeing
When reading from tables, upon errors the PDUs sent are already
freed by snmp_synch_response, since they are freed right after
snmp_send is called.

This commit syncs collectd's approach with other occurrences of
snmp_synch_response calls.

There might be a few corner cases where we leak PDUs, but it
is unclear how to check for those since we would need to
have an indication that snmp_send was never called, which
as far as I can tell is not possible.

The potential for failure in snmp_send is rather low and will
be easily spotted though, since when crafting invalid PDUs
snmp_send will constantly fail and since valid configurations
can never leak memory.

This fixes #804
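
The ownership rule described in this commit message can be sketched with a small mock. This is not collectd or net-snmp code: `toy_pdu`, `toy_synch_response` and `toy_read_once` are hypothetical names, and the mock assumes the callee always releases the request PDU, which is a simplification of the behaviour the commit message describes for the error path. The point is only that the caller must forget its `req` pointer after the call instead of freeing it a second time.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an SNMP PDU; not the real net-snmp type. */
typedef struct { int id; } toy_pdu;

static int live_pdus = 0; /* PDUs currently allocated and not yet freed */

static toy_pdu *toy_pdu_create(int id) {
  toy_pdu *p = malloc(sizeof(*p));
  if (p != NULL) {
    p->id = id;
    live_pdus++;
  }
  return p;
}

static void toy_pdu_free(toy_pdu *p) {
  if (p == NULL)
    return;
  live_pdus--;
  free(p);
}

enum { TOY_SUCCESS = 0, TOY_TIMEOUT = 2 };

/* Mock synchronous request. ASSUMPTION: it takes ownership of req and
 * frees it in every case; on success it hands back a response in *res. */
static int toy_synch_response(toy_pdu *req, toy_pdu **res, int device_answers) {
  assert(req != NULL);
  int id = req->id;
  toy_pdu_free(req); /* the caller's pointer is dangling from here on */
  *res = NULL;
  if (!device_answers)
    return TOY_TIMEOUT;
  *res = toy_pdu_create(id);
  return (*res != NULL) ? TOY_SUCCESS : TOY_TIMEOUT;
}

/* Caller side: returns 0 on success, -1 on failure. */
static int toy_read_once(int device_answers) {
  toy_pdu *req = toy_pdu_create(1);
  toy_pdu *res = NULL;
  if (req == NULL)
    return -1;

  int status = toy_synch_response(req, &res, device_answers);
  req = NULL; /* already freed by the callee; freeing again would be a double free */

  if (status != TOY_SUCCESS || res == NULL) {
    toy_pdu_free(res); /* safe: toy_pdu_free ignores NULL */
    return -1;
  }
  toy_pdu_free(res);
  return 0;
}
```

Calling `toy_read_once(0)` models the unresponsive switch from this report: the call fails, but no PDU is freed twice and `live_pdus` returns to zero.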

bragonznx commented Nov 19, 2014

I confirm that 781f635 fixes the bug!

pyr added a commit that referenced this issue Nov 19, 2014

Avoid reintroducing #610, updates the fix to #804
We might as well avoid freeing the req pointer
only when failures occur; otherwise perform as before.

bragonznx commented Nov 19, 2014

I can confirm it also works with 79e90bb.

mfournier added a commit that referenced this issue Nov 19, 2014

snmp: avoid freeing req under normal operation
Equivalent patch to 79e90bb, to avoid issue #804 introduced while
fixing #610.