network plugin: Client doesn't retry to resolve server. #627

Closed
yanosz opened this Issue May 30, 2014 · 5 comments

yanosz commented May 30, 2014

Version: 4.10
Distribution: OpenWRT

collectd stops transmitting data via the network plugin after a longer outage (tested with IPv6 only).

  • E.g. no IP addresses are available at boot and connectivity is restored a few hours later -> collectd never starts transmitting
  • IP connectivity is down for a few days -> collectd doesn't resume transmitting once it's restored

g3ntleman commented May 31, 2014

I'm suffering from the same issue.

christf commented May 31, 2014

me too.

Member

octo commented Sep 6, 2014

#609 has some additional information, namely:

Apr 30 07:30:50 host-c collectd:  network plugin: getaddrinfo (host-d.example.com, (null)) failed: Name or service not known
Apr 30 07:30:50 host-c collectd:  network plugin: network_config_add_server: sockent_open failed.
Apr 30 07:30:50 host-c collectd:  network plugin: getaddrinfo (host-d.example.com, (null)) failed: Name or service not known
Apr 30 07:30:50 host-c collectd:  network plugin: network_config_add_server: sockent_open failed.

This looks like a problem with DNS resolution. AFAIK the resolver caches negative results, based on the SOA record of your nameserver. That doesn't really explain why it would work after restarting collectd, but it could explain those errors.

Member

octo commented Sep 6, 2014

Never mind, the problem is that network_config_add_server(), which is called at configure time, tries to open the socket – and fails. Opening the socket should of course be retried at runtime, i.e. when calls to write() determine that opening the socket hasn't been tried for a while.
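
To illustrate the retry-at-runtime idea, here is a minimal sketch, not collectd's actual code. The names lazy_sock_t, lazy_sock_ensure_open() and fake_open() are hypothetical, and the real resolve-and-open step is replaced by a callback. The send path asks for a socket on every write, and a failed attempt is only repeated after a retry interval has passed.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a socket entry; not collectd's real sockent_t. */
typedef struct {
  int fd;                /* -1 while no socket is open */
  time_t next_attempt;   /* earliest time for the next open attempt */
  time_t retry_interval; /* seconds to wait after a failed attempt */
} lazy_sock_t;

/* Callback that resolves the server and opens a socket; returns fd or -1. */
typedef int (*open_fn)(void *user_data);

static void lazy_sock_init(lazy_sock_t *s, time_t retry_interval) {
  s->fd = -1;
  s->next_attempt = 0; /* allow the very first attempt immediately */
  s->retry_interval = retry_interval;
}

/* Called from the send path. Returns true if a socket is available. */
static bool lazy_sock_ensure_open(lazy_sock_t *s, time_t now,
                                  open_fn do_open, void *user_data) {
  if (s->fd >= 0)
    return true; /* already open, nothing to do */
  if (now < s->next_attempt)
    return false; /* tried recently; skip this write and wait */

  s->fd = do_open(user_data);
  if (s->fd < 0) {
    /* Resolution or socket creation failed; schedule the next attempt. */
    s->next_attempt = now + s->retry_interval;
    return false;
  }
  return true;
}

/* Illustrative opener: fails until *remaining_failures is zero, then
 * pretends to have opened file descriptor 3. */
static int fake_open(void *user_data) {
  int *remaining_failures = user_data;
  if (*remaining_failures > 0) {
    (*remaining_failures)--;
    return -1;
  }
  return 3;
}
```

With this shape, a failure at configure time or during an outage is no longer permanent: a later write simply triggers a new attempt once the interval has elapsed. Passing the current time in as a parameter is only a convenience for testing the sketch.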

octo changed the title from "network transmissions: Crash on network outages" to "network plugin: Client doesn't retry to resolve server." Sep 6, 2014

octo added a commit that referenced this issue Sep 6, 2014

network plugin: Improve client connecting behavior.
This moves the socket creation logic so it's called from
networt_send_buffer_plain(). This allows us to recover after network
failures or when collectd was started before the network was available.

Fixes: #627

Member

octo commented Sep 6, 2014

This should fix it. It's currently sitting in collectd-5.3 and will be merged to the collectd-5.4 and master branches. This will not be fixed in 4.* though.

Best regards,
—octo
