New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry on `Network is unreachable` (7) #1603

Closed
drcrallen opened this Issue Jun 21, 2017 · 15 comments

Comments

Projects
None yet
6 participants
@drcrallen
Copy link

drcrallen commented Jun 21, 2017

In one of our systemd units that depends on network.target, we had an odd scenario where the network bounced a bit on startup. This happened to have a down period when right when a curl command tried to fetch over http. The command failed with Immediate connect fail for 169.254.169.254: Network is unreachable and did NOT retry despite having retry flags set (--max-time 10 --retry 5 --retry-delay 10).

As far as I can tell this is because the retry mechanics at https://github.com/curl/curl/blob/curl-7_54_0/src/tool_operate.c#L1573 do not check for CURLE_COULDNT_CONNECT(7), even though the network being unreachable is often transient or temporary.

This is very easily reproducible by being on a laptop and disabling wireless (assuming no wired connection) and attempting a curl command with retry. Then enabling the wireless and trying again.

I did this

Executed curl with retry flags set but with no network active. For example, turn off wireless on your laptop and do

$ /usr/local/Cellar/curl/7.54.0/bin/curl -v 8.8.8.8 --max-time 60 --retry 5 --retry-delay 10 --retry-max-time 30 --retry-connrefused
* Rebuilt URL to: 8.8.8.8/
*   Trying 8.8.8.8...
* TCP_NODELAY set
* Immediate connect fail for 8.8.8.8: Network is unreachable
* Closing connection 0
curl: (7) Couldn't connect to server

I expected the following

Retry on errors

curl/libcurl version

curl 7.54.0 (x86_64-cros-linux-gnu) libcurl/7.54.0 OpenSSL/1.0.2k zlib/1.2.11 c-ares/1.12.0
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile NTLM SSL libz TLS-SRP UnixSockets HTTPS-proxy

operating system

Linux 4.9 (CoreOS)

@drcrallen

This comment has been minimized.

Copy link

drcrallen commented Jun 21, 2017

wait, there's an explicit flag, nevermind

@drcrallen drcrallen closed this Jun 21, 2017

@drcrallen

This comment has been minimized.

Copy link

drcrallen commented Jun 21, 2017

Actually, I take that back, https://github.com/curl/curl/blob/curl-7_54_0/src/tool_operate.c#L1583 seems to only consider connection refused, and not consider the network being unavailable. So I'm not sure the flag will work.

@bagder

This comment has been minimized.

Copy link
Member

bagder commented Jun 21, 2017

I don't consider most instances of CURLE_COULDNT_CONNECT to be transient. Usually that happens when the server is down, you're using the wrong port number or the wrong IP address or similar. In such cases a retry is pointless.

But I could probably agree to having retry-connrefused also consider "couldn't connect" as a form of refused, and thus make retry even for that condition if this option is used.

@drcrallen

This comment has been minimized.

Copy link

drcrallen commented Jun 22, 2017

Thanks @bagder

Looking at http://www-numi.fnal.gov/offline_software/srt_public_context/WebDocs/Errors/unix_system_errors.html would the following make sense to add in addition to the already present ECONNREFUSED?

  • ENETDOWN
  • ENETUNREACH
  • ENETRESET

at https://github.com/curl/curl/blob/curl-7_54_0/src/tool_operate.c#L1583

@bagder

This comment has been minimized.

Copy link
Member

bagder commented Jun 22, 2017

@gnawhleinad, do you have any opinions on this, as you brought the --retry-connrefused option in the first place?

@gnawhleinad

This comment has been minimized.

Copy link
Contributor

gnawhleinad commented Jun 22, 2017

would the following make sense to add in addition to the already present ECONNREFUSED?

  • ENETDOWN
  • ENETUNREACH
  • ENETRESET

do you have any opinions on this, as you brought the --retry-connrefused option in the first place?

@bagder, @drcrallen, how about, instead, making it (more) consistent with wget's implementation (in src/connect.c#L652-L661) and add ENETUNREACH and EHOSTUNREACH?

@bagder

This comment has been minimized.

Copy link
Member

bagder commented Jun 24, 2017

I don't think consistency with wget is important in this regard as I really don't users expect our options to work identically. What's important is what we consider "connection refused" might be. What network errors can we consider being transient and suitable for retry when this option is used.

So @gnawhleinad suggest adding two errnos to be considered for "retry for connection refused", out of which one overlaps with @drcrallen's three new suggested errnos... Not sure where this puts us...

@jay

This comment has been minimized.

Copy link
Member

jay commented Jun 25, 2017

What's important is what we consider "connection refused" might be. What network errors can we consider being transient and suitable for retry when this option is used.

transient is open to interpretation. netreset could be transient but it doesn't strike me as similar to a connrefused type situation since the transfer may have already started. i think the rest could be considered connrefused

@gnawhleinad

This comment has been minimized.

Copy link
Contributor

gnawhleinad commented Jun 27, 2017

I think we had this conversation/idea but we could (additionally) introduce a separate flag (e.g. --retry-oserrno) to accept an arbitrary set of oserrno to be considered transient.

In any case, I'm more than happy to write a pull-request in any direction(s). Let me know how I can help!

@drcrallen

This comment has been minimized.

Copy link

drcrallen commented Jul 21, 2017

Network unreachable solves my immediate problem. The other two were suggestive. Adopting the criteria what uses seems like a reasonable step and we can see if anyone else complains about the lack of the other error codes.

Oserrno sounds like a good catch-all, but is not very portable. This means the suggested use should probably be "only use until there is community consensus on how to handle this kind of error." Does that sound accurate?

@bagder bagder added the help wanted label Jul 29, 2017

@wkornewald

This comment has been minimized.

Copy link

wkornewald commented Aug 21, 2017

I've hit this issue indirectly when accessing localhost for a service that only listens on IPv4 127.0.0.1. In this case, curl also tries ::1 which is unreachable and just fails:

/ # curl --retry 15 --retry-connrefused --connect-timeout 4 --retry-delay 1 -v http://localhost
* Rebuilt URL to: http://localhost/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* connect to 127.0.0.1 port 80 failed: Connection refused
*   Trying ::1...
* TCP_NODELAY set
* Immediate connect fail for ::1: Network unreachable
*   Trying ::1...
* TCP_NODELAY set
* Immediate connect fail for ::1: Network unreachable
* Failed to connect to localhost port 80: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 80: Connection refused
@pvgoran

This comment has been minimized.

Copy link

pvgoran commented Dec 20, 2017

I don't think consistency with wget is important in this regard as I really don't users expect our options to work identically.

Well, I kind of expected that curl's --retry-connrefused would do the same thing. Then I discovered that curl doesn't retry on "No route to host", and I was like "damn, curl can't do this... I'll have to stick with wget which I like less".

So, I don't say that curl needs to implement this option in the same way as wget does, but it would be nice to have some option that would allow retrying when the host is unreachable or when the network is down overall. Maybe it could be --retry-noroute which would handle ENETDOWN, ENETUNREACH and EHOSTUNREACH, maybe these error codes could be handled by --retry-connrefused...

@bagder bagder closed this in ccd1ec7 Feb 15, 2018

@pvgoran

This comment has been minimized.

Copy link

pvgoran commented Feb 15, 2018

How comes ccd1ec7 closes this issue?.. The functionality is still not there, it's just added to the TODO, right?

@bagder

This comment has been minimized.

Copy link
Member

bagder commented Feb 15, 2018

That is what we do with issues that don't seem to be going anywhere. Explained here: https://curl.haxx.se/docs/bugs.html#Lack_of_time_interest

If anyone wants to work on this, we can reopen this issue again or create a new!

@pvgoran

This comment has been minimized.

Copy link

pvgoran commented Feb 16, 2018

Thanks for the explanation.

@lock lock bot locked as resolved and limited conversation to collaborators May 17, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.