New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

c-ares deviates from stock resolver on "http://1346569778" #893

Closed
bagder opened this Issue Jun 23, 2016 · 10 comments

Comments

Projects
None yet
2 participants
@bagder
Member

bagder commented Jun 23, 2016

I did this

"curl http://1346569778"

I expected the following

* Rebuilt URL to: http://1346569778/
*   Trying 80.67.6.50...
* Connected to 1346569778 (80.67.6.50) port 80 (#0)
> GET / HTTP/1.1
> Host: 1346569778
> User-Agent: curl/7.47.0

But if I use curl built with c-ares we instead get this:

* Rebuilt URL to: http://1346569778/
* Added connection 0. The cache now contains 1 members
* Could not resolve: 1346569778 (Domain name not found)
* Closing connection 0

curl/libcurl version

current git master: curl-7_49_1-59-g91697d2: 91697d2

operating system

Tried it on Linux but is most certainly universal

@jay

This comment has been minimized.

Member

jay commented Jun 23, 2016

I can't reproduce this in Windows but I can in Linux.
Windows:

curl 7.50.0-DEV (i686-w64-mingw32) libcurl/7.50.0-DEV mbedTLS/2.2.1 zlib/1.2.8 libidn/1.32 libssh2/1.7.0 nghttp2/1.11.0
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS Debug TrackMemory IDN IPv6 Largefile SSPI Kerberos SPNEGO NTLM SSL libz HTTP2

* STATE: INIT => CONNECT handle 0xe68f68; line 1402 (connection #-5000)
* Added connection 0. The cache now contains 1 members
* STATE: CONNECT => WAITRESOLVE handle 0xe68f68; line 1439 (connection #0)
* Could not resolve host: 1346569778
* Closing connection 0
* The cache now contains 0 members
curl: (6) Could not resolve host: 1346569778

Linux:

curl 7.49.1 (x86_64-pc-linux-gnu) libcurl/7.49.1 OpenSSL/1.0.2h zlib/1.2.8 libidn/1.28 nghttp2/1.11.1 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp 
Features: IDN IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets 

*   Trying 80.67.6.50...
* Connected to 1346569778 (80.67.6.50) port 80 (#0)
@jay

This comment has been minimized.

Member

jay commented Jun 23, 2016

hm are you saying integer format is or is not expected to work? what is your curl -V

@jay

This comment has been minimized.

Member

jay commented Jun 23, 2016

I compiled both as synch and the only difference I see is in getaddrinfo. Both call Curl_getaddrinfo_ex which calls getaddrinfo. In Windows it returns error WSAHOST_NOT_FOUND (EAI_NONAME), but in Linux it returns success and addr receives the IP. In Windows if I add ai_flags AI_NUMERICHOST then it will work.

@bagder

This comment has been minimized.

Member

bagder commented Jun 23, 2016

I was triggered by this Firefox bug which points to the infamous WHATWG spec that says this URL format is legitimate and should get "canonicalized" to the normal numerical format and then used. The procedure is important as it would change what to send in the Host: header for a request to the target server.

An interesting variation on the same style is: curl http://0x50.0x43.6.0x32 which also works on Linux with the stock resolver because getaddrinfo thinks its fine.

@jay

This comment has been minimized.

Member

jay commented Jun 23, 2016

handling it early would probably be best if we want the effective url to have the x.x.x.x ip address. here is some scratch i put in parseurlandfill to get it working:

diff --git a/lib/url.c b/lib/url.c
index 3f0bde2..83cb6b9 100644
--- a/lib/url.c
+++ b/lib/url.c
@@ -4390,6 +4390,45 @@ static CURLcode parseurlandfillconn(struct SessionHandle
       }
     }
   }
+  else {
+    char *p;
+    bool isneg = false;
+    bool isintstr = true;
+    unsigned ip = 0;
+    for(p = conn->host.name; *p; ++p) {
+      unsigned n;
+      unsigned max = isneg ? (unsigned)INT_MIN : UINT_MAX;
+      if(p == conn->host.name && *p == '-') {
+        isneg = true;
+        continue;
+      }
+      if(p - conn->host.name > 10 || *p < '0' || *p > '9') {
+        isintstr = false;
+        break;
+      }
+      n = *p - 48;
+      if(ip > max / 10 || (ip == max / 10 && n > max % 10)) {
+        isintstr = false;
+        break;
+      }
+      ip *= 10;
+      ip += n;
+    }
+    if(isintstr) {
+      int x[4];
+      if(isneg)
+        ip = (unsigned)-(int)ip;
+      x[3] = ip % 256;
+      ip /= 256;
+      x[2] = ip % 256;
+      ip /= 256;
+      x[1] = ip % 256;
+      ip /= 256;
+      x[0] = ip % 256;
+      ip /= 256;
+      snprintf(conn->host.name, 16, "%d.%d.%d.%d", x[0], x[1], x[2], x[3]);
+    }
+  }

   if(data->set.scope_id)
     /* Override any scope that was set above.  */

issues:
i didn't test negative
ipv4 as ipv6
didn't rebuild url so that effective url shows ip
doesn't handle 0x50.0x43.6.0x32 shenanigans
probably other things

> GET / HTTP/1.1
> Host: 80.67.6.50
> User-Agent: curl/7.50.0-DEV
> Accept: */*
@bagder

This comment has been minimized.

Member

bagder commented Jun 24, 2016

i didn't test negative

I don't think negative works anywhere.

doesn't handle 0x50.0x43.6.0x32

I think that's just as well!

snprintf(conn->host.name, 16, "%

Can we be certain that the target buffer is always allocated that big already? For a command line likecurl 9 for example... (which admittedly is a funny IP address, but still...)

@jay

This comment has been minimized.

Member

jay commented Jun 25, 2016

yes, I'm sure you're right so how about this:

diff --git a/lib/url.c b/lib/url.c
index 3f0bde2..f789aaf 100644
--- a/lib/url.c
+++ b/lib/url.c
@@ -4390,6 +4390,27 @@ static CURLcode parseurlandfillconn(struct SessionHandle
       }
     }
   }
+  /* If we have an address like 1346569778 convert that to IPv4 x.x.x.x */
+  else {
+    char *p;
+    unsigned ip = 0;
+    for(p = conn->host.name; *p; ++p) {
+      unsigned n = *p - 48;
+      const unsigned max = 4294967295u;
+      if(*p < '0' || *p > '9' || ip > max / 10 ||
+         (ip == max / 10 && n > max % 10))
+        break;
+      ip = (ip * 10) + n;
+    }
+    if(p != conn->host.name && !*p) {
+      free(conn->host.name);
+      conn->host.name = aprintf("%d.%d.%d.%d", ip / 16777216,
+                                (ip / 65536) % 256, (ip / 256) % 256,
+                                ip % 256);
+      if(!conn->host.name)
+        return CURLE_OUT_OF_MEMORY;
+    }
+  }

   if(data->set.scope_id)
     /* Override any scope that was set above.  */

issues:
strtol may be cleaner with some check
host.name iirc is actually alloc'd somewhere else so I don't think I can just free(conn->host.name) without leaving something dangling

@bagder

This comment has been minimized.

Member

bagder commented Jun 27, 2016

strtol may be cleaner with some check

Maybe, but then again long is 64 bit on many platforms these days and I think we should leave any number using more than 32bits alone and pass it to the name resolver. So it would need some additional funny checks that may very well not make using strtol() worth it...

host.name iirc is actually alloc'd somewhere else

Yeah. the allocation is held in 'rawalloc'. See this logic

@bagder

This comment has been minimized.

Member

bagder commented Jun 27, 2016

I'm trying to parse that section of the WHATWG spec, and if I understand it correctly it seems to imply that also "1234.1234" should be considered a valid IPv4 address. That spec is horribly hard to read and parse at a glance, but step 14 and 15 does not limit the math to just one numerical part but can in fact be one, two or three. Perhaps. At the same time step 8 and 9 seem to contradict that so I am certainly not sure!

@jay

This comment has been minimized.

Member

jay commented Jun 28, 2016

IPV4 parser 6.2 says n is the result of parsing. And number algorithm parsing is in IPv4 number parser section (it's immediately above current). In IPv4 number parser the last step of the number parser is "Return the mathematical integer value that is represented by input in radix-R notation". So if you have 65535 you return that. Then back at step 8 of IPV4 parser it says "If any item in numbers is greater than 255, syntax violation".

In any case what I wrote isn't compatible with all that since it only does base 10 with no separators. Also the effective url should maybe be changed, probably by parsing earlier and repairing the url in that case.

@bagder bagder closed this in c443a8c Oct 27, 2016

@lock lock bot locked as resolved and limited conversation to collaborators May 7, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.