Non-special relative handling differs from whatwg-url #6

Closed
TimothyGu opened this issue Jun 6, 2021 · 13 comments

Comments

@TimothyGu

Given the non-special relative URL abc:rootless resolved against abc://host/path, the WHATWG URL Standard uses the input string verbatim. But spec-url actually tries to resolve it:

[image not preserved]

See scheme state step 2.9. That step starts the relative URL resolution process even though a scheme has already been seen, but it is only taken for special URLs.

Same issue happens with abc:/rooted resolved against abc://host/path:

[image not preserved]
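
For reference, the behaviour the Standard prescribes can be checked with the `URL` class built into Node.js and browsers. This is a minimal sketch, assuming a spec-compliant implementation; the helper name `resolveHref` is made up for illustration, and the `http:` case is added only as a contrast that the issue text does not mention explicitly.

```javascript
// Resolve a reference against a base with the built-in WHATWG URL class.
const resolveHref = (input, base) => new URL(input, base).href;

// Non-special scheme: the input is kept as is, the base is ignored.
const rootless = resolveHref('abc:rootless', 'abc://host/path');
const rooted = resolveHref('abc:/rooted', 'abc://host/path');

// Special scheme (contrast case): the matching scheme is dropped and the
// remainder is resolved as a relative reference.
const special = resolveHref('http:rootless', 'http://host/path');

console.log(rootless); // abc:rootless
console.log(rooted);   // abc:/rooted
console.log(special);  // http://host/rootless
```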

@TimothyGu
Author

A fix appears to be changing the definition of ~goto from:

The non-strict goto operation (url1 ~goto url2) is defined to be (url1 goto url2') where url2' is url2 with its scheme token removed if it case-insensitively compares equal to the scheme token of url1, or url2 otherwise.

to:

The non-strict goto operation (url1 ~goto url2) is defined to be (url1 goto url2') where url2' is url2 with its scheme token removed if it case-insensitively compares equal to the scheme token of url1 and if url2 is a web-URL or file-URL, or url2 otherwise.
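
The proposed definition can be sketched on a toy model. Everything here is an assumption made for illustration and not spec-url's actual code: URLs are plain objects keyed by token type, `ORDER` and `SPECIAL_SCHEMES` are hypothetical constants, and `goto` is a simplified stand-in for the strict goto.

```javascript
// Toy model (hypothetical): a URL is an object whose keys are token types.
const ORDER = ['scheme', 'authority', 'root', 'dirs', 'file', 'query', 'hash'];
// Assumed to be the schemes of web-URLs and file-URLs.
const SPECIAL_SCHEMES = new Set(['http', 'https', 'ws', 'wss', 'ftp', 'file']);

// Index of the first token type present in url (ORDER.length for the empty URL).
function firstTokenIndex(url) {
  const index = ORDER.findIndex((key) => url[key] !== undefined);
  return index === -1 ? ORDER.length : index;
}

// Simplified strict goto: keep the tokens of url1 that rank below the first
// token of url2, then append all tokens of url2.
function goto(url1, url2) {
  const cut = firstTokenIndex(url2);
  const result = {};
  for (const key of ORDER.slice(0, cut)) {
    if (url1[key] !== undefined) result[key] = url1[key];
  }
  const merged = { ...result, ...url2 };
  // A reference that starts with directories extends the directories of url1.
  if (ORDER[cut] === 'dirs' && url1.dirs !== undefined) {
    merged.dirs = [...url1.dirs, ...url2.dirs];
  }
  return merged;
}

// Proposed non-strict goto: drop the scheme of url2 only if it matches the
// scheme of url1 (case-insensitively) AND url2 is a web-URL or file-URL.
function nonStrictGoto(url1, url2) {
  const scheme1 = url1.scheme === undefined ? undefined : url1.scheme.toLowerCase();
  const scheme2 = url2.scheme === undefined ? undefined : url2.scheme.toLowerCase();
  if (scheme2 !== undefined && scheme1 === scheme2 && SPECIAL_SCHEMES.has(scheme2)) {
    const { scheme, ...rest } = url2;
    return goto(url1, rest);
  }
  return goto(url1, url2);
}

const generic = nonStrictGoto(
  { scheme: 'abc', authority: 'host', root: '/', file: 'path' },
  { scheme: 'abc', file: 'rootless' },
);
const web = nonStrictGoto(
  { scheme: 'http', authority: 'host', root: '/', file: 'path' },
  { scheme: 'HTTP', file: 'foo' },
);
console.log(generic); // { scheme: 'abc', file: 'rootless' }
console.log(web);     // { scheme: 'http', authority: 'host', root: '/', file: 'foo' }
```

With the extra condition, the generic reference passes through untouched, while the web-URL reference loses its scheme and is resolved against the base.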

@alwinb
Owner

alwinb commented Jun 6, 2021

Thank you! Yes, you are right. Apparently there are no tests for that case in the WPT test suite.
I will make a change. Fortunately no large changes are needed, it seems, but I'll investigate.

@alwinb
Owner

alwinb commented Jun 6, 2021

Hmm I wonder if there is a bit more to say about this. Chrome has the same behaviour, thus disagrees with the WHATWG Standard, whilst Firefox and Safari agree with it. Related issue: whatwg/url#385.

Maybe it is cleaner to use the strict goto for generic URLs rather than modify the non-strict goto. I'll need to have a better look at it.

@alwinb
Owner

alwinb commented Jun 6, 2021

Alright. It seems that indeed, using the strict goto for generic URLs solves the problem.
I've exposed the strictness as an option for the resolve operations and updated the WHATWGParseResolve function to agree with the spec.

Let me know if this works for you.

I am thinking about removing forceResolve (because the forcing itself is non-strict behaviour, so passing a 'strict' argument may be confusing) and replacing it with a WHATWGResolve method that picks the right strictness option (for the goto) depending on its arguments.
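
A rough sketch of what such a wrapper might look like. This is hypothetical and not the library's API: `makeWhatwgResolve`, the `{ strict }` option shape, the object form of the parsed URLs, and the scheme list are all assumptions made for illustration.

```javascript
// Assumed to be the schemes of web-URLs and file-URLs.
const SPECIAL_SCHEMES = new Set(['http', 'https', 'ws', 'wss', 'ftp', 'file']);

// Hypothetical helper: wraps any resolve(input, base, { strict }) function
// and chooses the strictness from the scheme of the (parsed) input.
function makeWhatwgResolve(resolve) {
  return (input, base) => {
    const scheme = input.scheme === undefined ? undefined : input.scheme.toLowerCase();
    // Generic URLs get the strict goto; web- and file-URLs the non-strict one.
    // For a schemeless input the choice has no effect, so strict is used.
    const strict = !SPECIAL_SCHEMES.has(scheme);
    return resolve(input, base, { strict });
  };
}

// Stub resolver that only reports which variant was requested.
const whatwgResolve = makeWhatwgResolve((input, base, { strict }) =>
  (strict ? 'strict' : 'non-strict'));

const genericChoice = whatwgResolve({ scheme: 'abc', file: 'rootless' }, { scheme: 'abc' });
const webChoice = whatwgResolve({ scheme: 'http', file: 'foo' }, { scheme: 'http' });
console.log(genericChoice); // strict
console.log(webChoice);     // non-strict
```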

@TimothyGu
Author

I agree it's a little confusing to have a "strict" argument in forceResolve, when it only affects goto.

Why can't we make the ~goto operation a bit more intelligent, so that it does nothing for generic URLs, as I described in #6 (comment)? In other words, what are the considerations for threading a parsing mode through all the operations (forced resolution, resolution, pre-resolution) versus just detecting the parsing mode from url2 in the non-strict goto?

@alwinb
Owner

alwinb commented Jun 7, 2021

Hm I can see that threading through a strictness argument isn't necessarily so elegant. But since you asked, I started thinking about it and then I found out how many implicit guidelines I have been trying to follow.

  • First, RFC3986 already makes the distinction between a non-strict and a strict 'merge', and resolve. (I renamed the merge ops to 'goto', to stress that they're not commutative). So they are existing concepts. Just to be clear, by the way: the 'strict' option here is not a parser mode! It's just a way to present two separate but related binary operations on parsed URL structures via an additional boolean argument.
  • I want to maintain that distinction, to honor the RFC but mostly because I think it is a useful conceptual difference. I think that if people are able to identify them as distinct operations, then they'll understand 'unusual' URL behaviour better as well. Such as that of web browsers!
  • In fact it helped me to finally understand the WHATWG behaviour, and that it can be characterised by really surprisingly few changes to the RFCs. Which in turn recovers a lot of good things from the RFCs that have been lost.
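
To make the RFC distinction concrete, here is a minimal sketch of reference resolution following the pseudocode in RFC 3986 section 5.2, with `strict` as a boolean option. The function names are made up for illustration, and this is the RFC's algorithm on strings, not spec-url's implementation. One assumption beyond the RFC pseudocode: schemes are compared case-insensitively.

```javascript
// Regular expression from RFC 3986, Appendix B.
const URI_RE = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

function parse(ref) {
  const m = URI_RE.exec(ref);
  return { scheme: m[2], authority: m[4], path: m[5], query: m[7], fragment: m[9] };
}

// RFC 3986 section 5.2.4.
function removeDotSegments(path) {
  let input = path;
  const output = [];
  while (input.length > 0) {
    if (input.startsWith('../')) input = input.slice(3);
    else if (input.startsWith('./')) input = input.slice(2);
    else if (input.startsWith('/./')) input = input.slice(2);
    else if (input === '/.') input = '/';
    else if (input.startsWith('/../')) { input = input.slice(3); output.pop(); }
    else if (input === '/..') { input = '/'; output.pop(); }
    else if (input === '.' || input === '..') input = '';
    else {
      // Move the first segment (with its leading slash, if any) to the output.
      const next = input.indexOf('/', 1);
      const end = next === -1 ? input.length : next;
      output.push(input.slice(0, end));
      input = input.slice(end);
    }
  }
  return output.join('');
}

// RFC 3986 section 5.2.3.
function merge(base, refPath) {
  if (base.authority !== undefined && base.path === '') return '/' + refPath;
  return base.path.slice(0, base.path.lastIndexOf('/') + 1) + refPath;
}

// RFC 3986 sections 5.2.2 and 5.3. The base is assumed to be an absolute URI.
function resolve(baseString, refString, { strict = true } = {}) {
  const base = parse(baseString);
  const r = parse(refString);
  // The only difference between the two variants: the non-strict one ignores
  // a scheme on the reference that matches the scheme of the base.
  if (!strict && r.scheme !== undefined && base.scheme !== undefined &&
      r.scheme.toLowerCase() === base.scheme.toLowerCase()) {
    r.scheme = undefined;
  }
  const t = { fragment: r.fragment };
  if (r.scheme !== undefined) {
    Object.assign(t, { scheme: r.scheme, authority: r.authority,
      path: removeDotSegments(r.path), query: r.query });
  } else if (r.authority !== undefined) {
    Object.assign(t, { scheme: base.scheme, authority: r.authority,
      path: removeDotSegments(r.path), query: r.query });
  } else if (r.path === '') {
    Object.assign(t, { scheme: base.scheme, authority: base.authority,
      path: base.path, query: r.query !== undefined ? r.query : base.query });
  } else {
    const path = r.path.startsWith('/') ? r.path : merge(base, r.path);
    Object.assign(t, { scheme: base.scheme, authority: base.authority,
      path: removeDotSegments(path), query: r.query });
  }
  let result = '';
  if (t.scheme !== undefined) result += t.scheme + ':';
  if (t.authority !== undefined) result += '//' + t.authority;
  result += t.path;
  if (t.query !== undefined) result += '?' + t.query;
  if (t.fragment !== undefined) result += '#' + t.fragment;
  return result;
}

// The example from RFC 3986 section 5.4.2.
console.log(resolve('http://a/b/c/d;p?q', 'http:g', { strict: true }));  // http:g
console.log(resolve('http://a/b/c/d;p?q', 'http:g', { strict: false })); // http://a/b/c/g
```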

But there's a general design strategy behind it. There is so much to learn from mathematics when it comes to API design. It's not even much of a stretch to say that mathematics is all about API design. Now, URLs aren't so interesting mathematically, but they're not too bad either. For example,

  • The set of all URLs together with the strict goto operation is a monoid (well, almost, spot the exception!)
  • Likewise with the non-strict version
  • Normalisation induces an equivalence relation on the set of all URLs that is congruent with those monoid laws. It is a nice (very simple) logic. The RFC even specifies a hierarchy of such equivalence classes. That's mathematically quite neat and it makes me a bit happy.
  • Resolution however, if considered as a binary operation on all URLs, is a partial function (it may throw), which makes it not as clean mathematically. You can make it tidy I guess by limiting its domain, but I haven't bothered.
  • The force operation breaks the congruence. That's a bit sad.
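
The point about resolution being a partial function can be observed with the built-in WHATWG `URL` class, used here only as a stand-in (spec-url's own resolve operation is not shown). A minimal sketch; the helper `tryResolve` is made up for illustration.

```javascript
// Returns the resolved href, or null when resolution fails for this pair.
function tryResolve(input, base) {
  try {
    return new URL(input, base).href;
  } catch (error) {
    // The URL constructor signals a failed parse with a TypeError.
    if (error instanceof TypeError) return null;
    throw error;
  }
}

// A fragment-only reference can be resolved against a base with an opaque path...
const defined = tryResolve('#frag', 'mailto:a@example.com');
// ...but a relative path cannot, so the operation is undefined for this pair.
const notDefined = tryResolve('other', 'mailto:a@example.com');

console.log(defined);    // mailto:a@example.com#frag
console.log(notDefined); // null
```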

Trying to stay aware of the maths, or to uncover it, usually leads to APIs with properties that are very good for software. And in general, users don't have to know the maths to benefit from that.

Alright... that's what I'm trying to do anyway. :)

But concretely, yeah, I don't care much about two pre-resolve variants being exposed. And yeah, having a separate strict- and non-strict force-resolve, I think is just too much. I just haven't figured out a more ideal way to do it yet. I have to change force-resolve anyway to collect validation errors, which is on the back of my mind as well. So there'll be more tweaks at some point probably.

@alwinb
Owner

alwinb commented Jun 13, 2021

I think I can close this issue.
If you run into things, don't hesitate to open a new issue (or reopen this, but I don't know if that option is available).

Thank you!

@alwinb alwinb closed this as completed Jun 13, 2021
@alwinb
Owner

alwinb commented Jun 13, 2021

As an aside, I'm playing with the idea of setting up a wiki to collect questions, discussions and closed issues that contain useful information. I don't know if that is a good idea or not, or if it will go anywhere, but we'll see.

@TimothyGu
Author

It might be useful to provide some information on general design, such as the rationale of the force operation, etc. The wiki could be a good way of conveying that information.

@alwinb
Owner

alwinb commented Jun 16, 2021

There is a little bit of text about that in my specification. It is in the Parsing section, under the subtitle "A note about repeated slashes". There's also an old comment of mine about it here (which however isn't entirely up to date with how things are handled now).

@alwinb alwinb reopened this Jun 23, 2021
@alwinb
Owner

alwinb commented Jun 23, 2021

Meh. I don't like my solution, yours is clearly nicer. I will look at it again.

@alwinb
Owner

alwinb commented Jul 3, 2021

I did an update to my characterisation; in 0.7.0 I've removed the non-strict goto, and changed the definition of a base-URL.

The force operation can now be said to coerce file- and web-URLs to base URLs.

Then I specified three resolution operations: strict and non-strict (I think they match RFC3986) and forced-resolution, which characterises WHATWG resolution (modulo normalisation).
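
For illustration, the WHATWG behaviour that forced resolution is meant to characterise can be seen with the built-in `URL` class. This sketch shows only the observable WHATWG results, not spec-url's force operation itself; reading these results as "coercion to a base URL" is an interpretation of the comment above, and the helper `href` is made up for the example.

```javascript
// Parse without a base and return the serialised result.
const href = (input) => new URL(input).href;

// Web-URLs always end up with an authority, however the slashes are written.
const noSlashes = href('http:foo');
const oneSlash = href('http:/foo/bar');
const threeSlashes = href('http:///foo');

// A generic (non-special) URL is left as it is.
const genericUrl = href('abc:foo');

console.log(noSlashes);    // http://foo/
console.log(oneSlash);     // http://foo/bar
console.log(threeSlashes); // http://foo/
console.log(genericUrl);   // abc:foo
```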

Still not ideal, but better, I think.

@alwinb
Owner

alwinb commented Oct 7, 2021

@TimothyGu, if you have some time, would you have ideas or suggestions as to how I can explain the design in a better and more accessible way? I know this is important, but so far I have a hard time with it. (I know, it's not a very concrete question.)
