Please document syntax and semantics of NO_PROXY environment variable. #1208

Closed
piotr-dobrogost opened this Issue Jan 13, 2017 · 11 comments

piotr-dobrogost commented Jan 13, 2017

Curl's man page about NO_PROXY:

list of host names that shouldn't go through any proxy. If set to a asterisk '*' only, it matches all hosts.

Curl's manual about NO_PROXY:

If the host name matches one of these strings, or the host is within the domain of one of these strings, transactions with that node will not be proxied.

The description in the manual lacks information about the asterisk, and the man page should use the manual's phrasing, which is more specific than the general "list of host names" wording it uses now.

In addition, even the description in the manual is not specific enough. From https://curl.haxx.se/mail/archive-2007-11/0043.html :

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Wed, 14 Nov 2007 23:33:52 +0100 (CET)
On Wed, 14 Nov 2007, Peter Love wrote:

Is there an RFC which describes NO_PROXY usage?

Not that I'm aware of. When I implemented it (many years ago), I just checked
how other tools were documented to treat it and cloned that behavior.

It seems that the behavior was cloned but not the documentation, which is unfortunate given that curl's behavior is very often the informal specification for many aspects of dealing with HTTP.

The documentation should answer at least the following questions:

  • Should values be fully qualified domain names (FQDN) to match exact domain? If one wants to match example.com should the value be example.com or example.com. (trailing dot) or either is allowed?
  • How to specify subdomains, with the dot at the beginning, with asterisk and the dot or nothing is required (subdomains are matched by default)?
  • Do subdomains on any level match; .example.com match foo.example.com and foo.bar.example.com and so forth?
  • Is it possible to match any subdomain without matching parent domain; to match any *.example.com but not example.com?

Member

bagder commented Jan 13, 2017

@eramoto just worked on some "no proxy" fixes in #1140, which just landed. Can you come up with a suggested documentation change together so that those questions get their answers?

Contributor

eramoto commented Jan 16, 2017

The current curl (7.52.2) behaves as follows:

Should values be fully qualified domain names (FQDN) to match exact domain?
If one wants to match example.com should the value be example.com or example.com. (trailing dot) or either is allowed?

The value should be example.com to match example.com.

How to specify subdomains, with the dot at the beginning, with asterisk and the dot or nothing is required (subdomains are matched by default)?

Subdomains are matched whether or not the value has a leading dot (e.g. .example.com or example.com),
but the value cannot be a regular expression (e.g. *.example.com).

Do subdomains on any level match; .example.com match foo.example.com and foo.bar.example.com and so forth?

A value equal to the parent domain (e.g. example.com) matches subdomains on any level (e.g. foo.example.com, foo.bar.example.com, etc.).

Is it possible to match any subdomain without matching parent domain; to match any *.example.com but not example.com?

If the value names a subdomain (e.g. somewhere.example.com),
it matches that domain and its child domains (e.g. somewhere.example.com and any *.somewhere.example.com)
but does not match the parent domain (e.g. example.com).
For example, the following command accesses somewhere.example.com directly:
NO_PROXY=somewhere.example.com curl -x http://proxy.example.com http://somewhere.example.com
The following command accesses example.com through the proxy:
NO_PROXY=somewhere.example.com curl -x http://proxy.example.com http://example.com

I suggest adding the following description to the documentation.

If a NO_PROXY value fully matches the tail of the accessed host name, the host is accessed directly.
If no value matches in this way, the host is accessed through the proxy.
* To directly access a host with a trailing dot (e.g. example.com.),
  the value must have a trailing dot to match.
* To directly access a host without a trailing dot (e.g. example.com),
  the value must not have a trailing dot.
* To match subdomains on any level (e.g. foo.example.com, foo.bar.example.com, etc.),
  specify the parent domain (e.g. example.com).
* To match a subdomain without matching its parent domain (e.g. the parent domain is example.com),
  specify the domain including the subdomain (e.g. somewhere.example.com).

The behavior is the same whether or not the value has a leading dot, because the leading dot is ignored.
* If the accessed host is somewhere.example.com, both .example.com and example.com match.

Note that NO_PROXY cannot specify a regular expression (e.g. the asterisk in *.example.com).

(I apologize for my poor English.)
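
The rules proposed above can be sketched in a few lines of Python. This is not curl's implementation: `no_proxy_match` is a hypothetical helper that models only the behavior described in this thread. It additionally assumes that entries are separated by commas or spaces, that comparison is case-insensitive, and that a tail match must begin at a label boundary (so example.com would not match notexample.com). None of those three points is stated in the thread.

```python
def no_proxy_match(host: str, no_proxy: str) -> bool:
    """Return True if `host` would be accessed directly (hypothetical helper)."""
    value = no_proxy.strip()
    if value == "*":
        # A lone asterisk matches all hosts (from the man page text quoted above).
        return True
    host = host.lower()  # assumption: case-insensitive comparison
    # Assumption: entries are separated by commas and/or spaces.
    for token in value.replace(",", " ").split():
        token = token.lower()
        if token.startswith("."):
            token = token[1:]  # a leading dot is ignored
        if not token:
            continue
        if host == token:
            return True  # exact match, trailing dot included
        # Assumption: the tail match must start at a label boundary.
        if host.endswith("." + token):
            return True
    return False


if __name__ == "__main__":
    for host, value in [
        ("somewhere.example.com", "example.com"),
        ("somewhere.example.com", ".example.com"),
        ("example.com", "somewhere.example.com"),
        ("example.com.", "example.com"),
        ("foo.example.com", "*.example.com"),
    ]:
        print(host, value, no_proxy_match(host, value))
```

Under these assumptions the first two pairs match and the last three do not, which agrees with the answers given in this comment.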

piotr-dobrogost commented Jan 16, 2017

@eramoto Thank you for answering my questions.

Another question is how to specify ip (v4, v6) addresses? I guess matching hosts/ip addresses is something also used by DNS resolvers and HTTP servers. I wonder if what curl does is in line with other software? This just begs to be included in some RFC...

Contributor

eramoto commented Jan 18, 2017

Another question is how to specify ip (v4, v6) addresses?

If the host to be accessed directly is given as an IP address (e.g. http://192.168.100.2/foobar), you should specify that IP address (e.g. 192.168.100.2). You cannot specify a subnet mask or similar (e.g. 192.168.100.0/24), because no DNS lookup is done and the IP address is treated as a domain name when testing for a match.
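
A minimal sketch of this point, assuming the IP literal is compared as an ordinary host string. `string_match` is a hypothetical stand-in, not a curl function. Python's standard `ipaddress` module appears only for contrast, to show what a subnet-aware check would report.

```python
import ipaddress


def string_match(host: str, entry: str) -> bool:
    """Hypothetical stand-in: treat an IP literal like any other host name."""
    if entry.startswith("."):
        entry = entry[1:]  # a leading dot is ignored
    return host == entry or host.endswith("." + entry)


host = "192.168.100.2"

# The same string matches.
exact = string_match(host, "192.168.100.2")

# A CIDR block is just a different string, so it does not match.
cidr_as_text = string_match(host, "192.168.100.0/24")

# For contrast only: a subnet-aware comparison, which the thread says curl does not do.
in_subnet = ipaddress.ip_address(host) in ipaddress.ip_network("192.168.100.0/24")

print(exact, cidr_as_text, in_subnet)
```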

I wonder if what curl does is in line with other software?

I hardly know anything about what other software does.
Do you know about it, @bagder?

piotr-dobrogost commented Jan 18, 2017

treat ip address as domain when test for a match

I guess with the difference that domain is matched from the right end and the ip address from the left end, right?
So

  • 192.168 will match any ip address of the form 192.168.*.*
  • specifying 192.168. (trailing dot) is the same as specifying 192.168

Contributor

eramoto commented Jan 19, 2017

domain is matched from the right end and the ip address is matched from the left end

I think the behavior you describe is the correct one.
Unfortunately, the current NO_PROXY matching does not work that way.
Currently:

  • 192.168 match any ip address of the form *.*.192.168
  • specifying 192.168. (trailing dot) is the same as specifying 192.168. (trailing dot)
    (This is the same as the following point from my previous answer.)
  • To directly access a host with a trailing dot (e.g. example.com.),
    the value must have a trailing dot to match.

I will try to fix it (and the documentation).

piotr-dobrogost commented Jan 19, 2017

192.168 match any ip address of the form *.*.192.168

I highly doubt above is true as matching ip addresses this way makes no sense.

specifying 192.168. (trailing dot) is the same as specifying 192.168. (trailing dot)

I guess there's some error above as this is the same string (192.168.) mentioned twice.

piotr-dobrogost commented Jan 20, 2017

There's the https://github.com/libproxy/libproxy library, which aims to unify getting proxy configuration and which is used by a growing number of apps. I wanted to compare how it treats the NO_PROXY environment variable, but it seems there's no documentation for this library; see libproxy/libproxy/issues/52
However, looking at repo it's quite clear the relevant code is in the following files:

Looking at the second file above libproxy would match foo.example.com against NO_PROXY=.example.com (dot at the beginning) but it would not match it against NO_PROXY=example.com which curl matches if I understand correctly.
Looking at the third file above we see that libproxy supports special string <local> in NO_PROXY which matches any single-label domain name. I think it's a cute feature :) Does curl support something like this?
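
The two libproxy behaviors described in this comment can be sketched as follows. This models only the reading of libproxy given here and is not derived from libproxy's source; `libproxy_style_match` is a hypothetical name, and cases the comment does not mention (for example example.com against .example.com) should be treated as unspecified.

```python
def libproxy_style_match(host: str, entry: str) -> bool:
    """Hypothetical model of the libproxy behavior described in this comment."""
    if entry == "<local>":
        # Special string: matches any single-label name (no dot in it).
        return "." not in host
    if entry.startswith("."):
        # A leading dot is required to match subdomains.
        return host.endswith(entry)
    # Without a leading dot, only the exact name matches.
    return host == entry


if __name__ == "__main__":
    print(libproxy_style_match("foo.example.com", ".example.com"))
    print(libproxy_style_match("foo.example.com", "example.com"))
    print(libproxy_style_match("intranet", "<local>"))
```

By contrast, according to the answers earlier in this thread, curl would match foo.example.com against both .example.com and example.com.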

There's a libproxy-bin Fedora package with a helper binary called proxy which, given a URL as an argument, shows which proxy, if any, should be used to connect to the specified address.

Ideally libcurl would use libproxy and we would have uniform way to get proxy configuration... If not then maybe both projects could at least treat NO_PROXY the same way?

Member

bagder commented Jan 21, 2017

See #977 for a libproxy pull request. I have my doubts about the library quality though, and I'm not sure it is a responsible action for us to add support for it without it being taken care of better.

piotr-dobrogost commented Jan 25, 2017

Thank you for pointing out the issue with libproxy support.

quite commented Sep 8, 2017

@piotr-dobrogost commented on Jan 19:

192.168 match any ip address of the form *.*.192.168

I highly doubt above is true as matching ip addresses this way makes no sense.

I think there's confusion here. It seems to me like the code does not
do anything special with IP addresses; it doesn't know the concept.
It just matches the same way as for FQDNs. Thus, 192.168. would match some.freaky.domain.192.168..
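
The difference between the two readings discussed in this thread can be illustrated with a short sketch. `suffix_match` follows the right-anchored matching described for host names; `prefix_match` is the left-anchored behavior that was expected for IP addresses. Both names are hypothetical, and the rule that a match must fall on a dot boundary is an assumption.

```python
def suffix_match(host: str, entry: str) -> bool:
    """Right-anchored match, as described for host names in this thread."""
    if entry.startswith("."):
        entry = entry[1:]  # a leading dot is ignored
    # Assumption: the match must begin at a dot boundary.
    return host == entry or host.endswith("." + entry)


def prefix_match(host: str, entry: str) -> bool:
    """Left-anchored match: hypothetical, the behavior expected for IP addresses."""
    if entry.endswith("."):
        entry = entry[:-1]  # treat 192.168. like 192.168
    return host == entry or host.startswith(entry + ".")


if __name__ == "__main__":
    for host in ("192.168.1.5", "10.0.192.168", "some.freaky.domain.192.168."):
        print(
            host,
            suffix_match(host, "192.168"),
            suffix_match(host, "192.168."),
            prefix_match(host, "192.168"),
        )
```

With suffix matching, 192.168 matches 10.0.192.168 but not 192.168.1.5, and 192.168. matches only a host that itself ends in 192.168. followed by nothing else.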

lock bot locked as resolved and limited conversation to collaborators May 6, 2018
