Add `--proto-default` option #351

Closed
wants to merge 1 commit into
from

Projects

None yet

3 participants

@waisbrot

This modifies the protocol-guessing algorithm to respect
the available protocols. For example:

  • curl --proto -ftp ftp.gnu.org does not consider FTP as the protocol (instead it will try HTTP)
  • curl --proto '=https' www.gnu.org will try HTTPS

The impetus here is that I want to be able to get an HTTPS connection when I have a scheme-less URL.

I'm not sure if --proto was the right option to go after here, but like that this stops curl from selecting a disallowed protocol and then complaining about it.

@jay
Member
jay commented Jul 20, 2015

This might be short sighted of me but I can't see how anything would break doing this.

I modified your patch slightly, draft here. It doesn't try as hard. The reason being the current behavior is to continue to parse and fill regardless of if the http protocol is allowed and I don't know the implications of changing that behavior. Also some style changes to conform.

@waisbrot

Thanks, @jay . I had added the final guessing loop in because it seemed confusing that --proto '=ftp' google.com wouldn't work analagously to --proto '=https' google.com. That was not wholly satisfying as a solution, though, because the command-line tool's docs would then imply that curl --proto '-all,imap,ftp' foo.com would get you IMAP, when in fact you'd get FTP.

Anyway, I think your modification is good, and I too believe that this would be a benign patch.

@bagder
Member
bagder commented Jul 21, 2015

I think there's a problem with using --proto for this (new) functionality:

What if you just want the default guess to be HTTPS if no proto is given, but still allow HTTP if the protocol is explicitly given? Seems perfectly reasonable to me...

@jay
Member
jay commented Jul 21, 2015

@bagder what are you suggesting, a separate option?

     --proto-default PROTOCOL  Default to PROTOCOL if URL has no protocol
@waisbrot

Another implementation I'd thought about was to use the --ssl flag. If you pass --ssl then it tries HTTPS when given no schema. But this is a little confusing, too, because --ssl currently means to try SSL and fall back to unencrypted (which wouldn't work here). And --ssl-reqd would be the same as --ssl --proto -http, I suppose.

Piggy-backing on an existing flag is attractive because it's less code and fewer semi-overlapping options for a user to decipher, but the new option @jay's describing sounds reasonable.

@bagder
Member
bagder commented Jul 22, 2015

Using --ssl could possibly work, but then it'd be restricted to only SSL and you couldn't tell curl to use another protocol when none is specified. Also, since --ssl is an existing option that only is used for a certain set of protocols today, there may be existing applications that are using it unconditionally and that then might start to fail if we'd use it for non-protocol URLs.

I think I prefer a new option simply because it has the least risk of destroying any existing apps and it offers the biggest flexibility.

@waisbrot waisbrot changed the title from Use HTTPS when --proto '=https' is specified to Add `--proto-default` option Jul 30, 2015
@waisbrot

I've completely overhauled this PR (I hope it's OK that I'm continuing in this thread). Now it adds a --proto-default option to the command line tool (CURLOPT_DEFAULT_PROTOCOL in the library) which expects a valid protocol name. When given, curl always attempts to use that protocol for unspecified URLs.

The patch is much bigger. If there's anything I can do to make it easier to review, please let me know.

Some examples:

  • --proto-default https example.com -> https://example.com
  • ftp.example.com -> ftp://example.com
  • --proto-default http ftp.example.com -> http://ftp.example.com
  • --proto-default https http://example.com -> http://example.com
@waisbrot

The test failure is because symbols-in-versions isn't in sync. I don't know what the correct change to make is there.

@jay
Member
jay commented Jul 31, 2015

Thanks for your work on this. Don't worry about the symbols thing I'll get it once we figure out if it's going in and what version. I'll do a review soon, but I think we're past the feature window for 7.44 (@bagder ??). I just glanced but I can tell you this line below will have to go, i'm guessing it was for debug:

fprintf(stderr, "protop=%s\n", protop);
Nathaniel Waisbrot Add the `--proto-default` option
When `--proto-default <protocol>` is passed, any URLs without a
protocol specification will attempt to use the supplied default.

When the option is not used, curl will follow its usual plan of
guessing from the hostname and falling back to 'http'.
3d7084f
@bagder
Member
bagder commented Aug 1, 2015

Yes, feature window is closed until the coming release, due to ship on Aug 12

@jay jay self-assigned this Aug 14, 2015
@jay
Member
jay commented Aug 17, 2015

I took a look at Nathaniel's patch today and I'm wondering if it makes more sense for me to implement something similar using the same syntax as --proto and --proto-redir. I'm concerned someone may confuse the syntax as they use the different options.

I wrote up a doc to see how it would look, but it's complicated in comparison. I'm not sure if the benefits of having the same syntax and being able to still allow the best guess based on subdomain are worth it. As far as the "popular order" thing goes I couldn't think of a better way to handle that.

.IP "--proto-default <protocols>"
Tells curl to use the listed protocols when making a best guess for any URL
missing a scheme name. See --proto for how protocols are represented.

Normally a protocol best guess is based on the name of a host's outermost
subdomain. If it is named ftp, dict, ldap, imap, smtp or pop3 then curl's best
guess would be that protocol. In any other case curl's best guess is http. You
can use this option to modify that behavior.

If you use this option and there is no outermost subdomain match to one of the
aforementioned protocol names then curl chooses the first enabled default
protocol in popular order, with the exception that http will always be first
unless it's disabled as a default protocol. Currently the popular order is:

http, https, ftp, sftp, ftps, smtp, scp, file, smtps, imap, tftp, ldap, imaps,
ldaps, telnet, pop3, rtmp, pop3s, rtsp, gopher, dict

Typically though this option would be used to restrict the best guess to a
single protocol such as https.

Examples:

.RS
.IP "--proto-default =https ftp.mozilla.org"
curl would use https://ftp.mozilla.org because only https is enabled as a
default protocol.

.IP "--proto-default -all,https,ftp ftp.mozilla.org"
curl would use ftp://ftp.mozilla.org because the outermost subdomain matches
ftp which is enabled as a default protocol.

.IP "--proto-default -all,https,http ftp.mozilla.org"
curl would use http://ftp.mozilla.org because http is the first enabled default
protocol in popular order.
.RE

The default protocols set by this option do not override the allowed protocols
set by \fB--proto\fP and \fB--proto-redir\fP. It's possible to default to a
protocol which is disallowed and that will cause an error.

(Added in 7.45.0)

If there's not a lot of interest in this I'll continue my review of his patch.

@waisbrot

I don't like how the "popular order" mechanic works, because it has confusing effects. For example, there's no way to handle Badger's original scenario of defaulting to HTTPS but allowing explicitly-specified HTTP URLs.

If you allowed changes to the popular order, that'd be cool, but then it wouldn't match --proto. But if you allowed changes to the popular order, then you could modify --proto and handle this without adding an option: --proto =https,+all means HTTPS is the default and all protocols are allowed (so ftp.example.com will use HTTPS). --proto -http,+http moves HTTP to the end of the line (so ftp.example.com will use FTP, but example.com will use HTTPS).

@jay
Member
jay commented Aug 17, 2015

Actually it allows for that. You could default to https by doing --proto-default =https. So --proto-default =https http://foo.com would be http://foo.com and --proto-default =https foo.com would be https://foo.com. Any misinterpretation of this is I'm sure my fault, what I wrote was a just draft.

@waisbrot

Yes, I see you're right. So setting = one protocol would be mostly equivalent to what I've got, but you could also set =ftp,+ldap and get the domain-matching behavior for those two (but also this would use ftp for everything it didn't guess). Is --proto-default =all the same as the currently released behavior?

With your proposed change, would curl --proto -http www.google.com get an HTTPS connection, because https is now the first enabled protocol (and there's not domain-match)?

@jay
Member
jay commented Aug 18, 2015

Yes --proto-default -http www.google.com (I assume you meant that and not --proto) would get an HTTPS connection, but nevermind because it's too complicated, I think. Nothing lost though. I'll finish a review of your patch tomorrow, it looks like an easier way to address the issue.

@waisbrot

No, I actually meant --proto based on

curl chooses the first enabled default protocol in popular order, with the exception that http will always be first unless it's disabled as a default protocol

That would produce a similar effect to my original PR, but with more flexibility.

@jay
Member
jay commented Aug 19, 2015

I did a review of your patch this evening, you can see it at jay@a45724e

I still have to add a test and do some testing. You'll notice I refer in the docs to the protocol as a scheme as well because it's more correct. Although in the codebase protocol seems to be popular. Also, I have a reservation about using the protocol as a string. I wonder if it would be better to use the CURLPROTO flags instead of a string so the user can do something like
curl_easy_setopt(curl, CURLOPT_DEFAULT_PROTOCOL, CURLPROTO_HTTPS);
instead of
curl_easy_setopt(curl, CURLOPT_DEFAULT_PROTOCOL, "https");
the former being consistent with our other protocol options. I don't know which is less likely to cause an error. Using the flags would be a problem if the user combines them like with the other two options where that's permissible. anyone have an opinion on this

@bagder
Member
bagder commented Aug 19, 2015

My personal preference would be to leave it as a string to make it more obvious to users that it really is a single scheme you'd specify.

@waisbrot

All of your changes look good to me @jay .

@jay
Member
jay commented Aug 23, 2015

The final review changeset can be found at jay@03e5a5a. I added two tests and changed some more documentation. I squashed both changesets into yours and it landed in 9756d1d. Thanks!

@jay jay closed this Aug 23, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment