Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix addQueryArgs behavior when encountering URL fragments #16656

Open
wants to merge 8 commits into
base: master
from

Conversation

@davilera
Copy link
Contributor

davilera commented Jul 18, 2019

Description

Fixes #16655.

How has this been tested?

I’ve added a new test in @wordpress/url package that reproduces the issue described in #16655. Then, I've modified addQueryArgs so that the URL fragment is first removed from the URL, then the function runs as always, and finally the fragment is re-appended in the resulting URL at its proper location.

Checklist:

  • My code is tested.
  • My code follows the WordPress code style.
  • My code follows the accessibility standards.
  • My code has proper inline documentation.
  • I've included developer documentation if appropriate.
@davilera davilera requested review from aduth, talldan and youknowriad as code owners Jul 18, 2019
@swissspidy swissspidy changed the title Fix/16655 Fix addQueryArgs behavior when encountering URL fragments Jul 18, 2019
@swissspidy swissspidy self-requested a review Jul 18, 2019
@swissspidy

This comment has been minimized.

Copy link
Member

swissspidy commented Jul 18, 2019

Does removeQueryArgs handle fragments correctly?

@davilera

This comment has been minimized.

Copy link
Contributor Author

davilera commented Jul 18, 2019

Apparently, it doesn't @swissspidy, as it assumes the fragment is part of the argument. For example:

removeQueryArgs( 'https://example.com/?a=1&b=2#fragment', 'a' );

returns:

https://example.com/?b=2%23fragment

but the expected result is:

https://example.com/?b=2#fragment

Moreover, if the given URL is already invalid, I'm pretty sure the functions included in @wordpress/url don't behave as expected either.

For example, if query args appear after the fragment, they should be part of the fragment. So, the fragment in a URL such as https://example.com/#fragment?a=1 is #fragment?a=1, as you can see when using document.location.hash. However:

wp.url.getFragment( 'https://example.com/#fragment?a=1' )

returns:

#fragment

@swissspidy

This comment has been minimized.

Copy link
Member

swissspidy commented Jul 18, 2019

It would be great if we could fix that too then :)

@davilera

This comment has been minimized.

Copy link
Contributor Author

davilera commented Jul 18, 2019

Should I open a new issue or address this here? Also, I've reviewed all the functions included in @wordpress/url and I think these are the ones that might need some tweaking:

addQueryArgs (already done)
• getQueryString
• isValidQueryString
• getFragment
• isValidFragment
• getQueryArg
• hasQueryArg
• removeQueryArgs

I've talked to @aduth and he thinks we should probably look for an already-existing URL parsing library or a "native" solution. Right now, @wordpress/url uses qs.

The native solution sounds quite good:

const a = document.createElement( 'a' );
a.href = 'https://example.com/some/fantastic/page/?a=1&b=2#fragment';

a.protocol; // https:
a.hostname; // example.com
a.pathname; // /some/fantastic/page/
a.search;   // ?a=1&b=2
a.hash;     // #fragment

Thoughts?

@swissspidy

This comment has been minimized.

Copy link
Member

swissspidy commented Jul 18, 2019

Don't really have a strong opinion about how to best tackle it.

The native solution only gets you so far as browser support goes, see https://developer.mozilla.org/en-US/docs/Web/API/URL for example. qs would help with that and also some edge cases I guess.

@davilera

This comment has been minimized.

Copy link
Contributor Author

davilera commented Jul 18, 2019

If you don't like the idea of using the native solution (about 94% support), I feel like a good idea is to follow the same approach I implemented in my PR. Essentially, remove the fragment using a regexp (/#.*$/), work with a "clean" version of the URL and tweak it as needed, and re-append the fragment at the end.

In the future, we can transition to the native solution, if you want. Does this sound better?

Either way, I'll be glad to take care of it and propose a new PR.

@swissspidy

This comment has been minimized.

Copy link
Member

swissspidy commented Jul 18, 2019

Sounds good.

Please note that it's not about liking the idea of the native solution. WordPress still supports IE11 and IE11 does not have support for it. That's a blocker.

@davilera

This comment has been minimized.

Copy link
Contributor Author

davilera commented Jul 19, 2019

So, it took me a little bit longer than expected, but I finally got the PR ready, @swissspidy. I first refactored all the tests so that they match the specification and I then tweaked the functions so that those tests pass. If you want to review my work, I'd recommend you first take a look at how the tests change, as those reflect my intent ;-)

How does this PR look?

PS. I know it wasn't about liking anything; that was just poor phrasing from my end.

@davilera

This comment has been minimized.

Copy link
Contributor Author

davilera commented Jul 22, 2019

BTW, I tried to tweak the minimum amount of code possible from our @wordpress/url library, but I feel like using something like the url library internally might result in a more robust solution...

expect( isValidFragment( '#yes-it-is?' ) ).toBe( true );
expect( isValidFragment( '#yes-it-is?not-a-query' ) ).toBe( true );
expect( isValidFragment( '#yes-it-is#' ) ).toBe( true );
expect( isValidFragment( '#yes-it-#is' ) ).toBe( true );

This comment has been minimized.

Copy link
@adamsilverstein

adamsilverstein Jul 23, 2019

Contributor

I'm surprised these are valid fragments (containing hashes)

This comment has been minimized.

Copy link
@davilera

davilera Jul 23, 2019

Author Contributor

Me too, but that's what the URL API returns in Firefox.

@aduth

This comment has been minimized.

Copy link
Member

aduth commented Jul 24, 2019

BTW, I tried to tweak the minimum amount of code possible from our @wordpress/url library, but I feel like using something like the url library internally might result in a more robust solution...

I agree with this general sentiment. I don't look forward to trying to maintaining the correct spec behavior in this module. We could maybe find one a bit more browser-friendly / minimal than node-url, but it feels like something we should be rely on an existing mature, tested solution.

packages/url/src/index.js Outdated Show resolved Hide resolved
}

return encodeURI( fragment )
.replace( /%5B/gi, '[' )

This comment has been minimized.

Copy link
@aduth

aduth Jul 24, 2019

Member

Can you explain for what reason these are here?

This comment has been minimized.

Copy link
@davilera

davilera Jul 30, 2019

Author Contributor

Sure! In Firefox 67, this:

new URL( 'https://example.com/some-page/?q=1#a hash-%with[spaces_and-öther characterß]' ).hash

returns this:

"#a%20hash-%with[spaces_and-%C3%B6ther%20character%C3%9F]"

That is, it encoded almost all characters as encodeURI does, the only exceptions being %, [, and ]. That's why I URI encoded the fragment and then reverted the scapes:

encodeURI( "a hash-%with[spaces_and-öther characterß]" )

returns:

"a%20hash-%25with%5Bspaces_and-%C3%B6ther%20character%C3%9F%5D"

which, after applying the replaces, becomes:

"a%20hash-%with[spaces_and-%C3%B6ther%20character%C3%9F]"

That's why I applied those... Does this makes sense?

This comment has been minimized.

Copy link
@aduth

aduth Jul 30, 2019

Member

Sort of. But rather than rely on some anecdotal observations of a specific browser, is there a specification from which we can draw on for the precise behavior to which we should conform?

In any case, it would be important to include sufficient inline code comments to make this behavior as clear as possible to the next maintainer.

David Aguilera and others added 2 commits Jul 30, 2019
Co-Authored-By: Andrew Duthie <andrew@andrewduthie.com>
@davilera

This comment has been minimized.

Copy link
Contributor Author

davilera commented Jul 30, 2019

I don't look forward to trying to maintaining the correct spec behavior in this module.

Neither do I. I actually think the current PR implementation is pretty good at it.

We could maybe find one a bit more browser-friendly / minimal than node-url

Any ideas? I thought about using url-polyfill, but I'm afraid I don't know how I should include it in the project...

@aduth

This comment has been minimized.

Copy link
Member

aduth commented Jul 30, 2019

I'm afraid I don't know how I should include it in the project...

For true polyfills (for which I think this qualifies), there's a pattern in core already:

Depending on how to test for support, it could then look something like:

function gutenberg_polyfill_url( &$scripts ) {
	$scripts->add( 'wp-polyfill-url', '...', array(), 'x.x.x' );
	did_action( 'init' ) && $scripts->add_inline_script(
		'wp-polyfill',
		wp_get_script_polyfill(
			$scripts,
			array(
				'\'URL\' in window' => 'wp-polyfill-url',
			)
		)
	);
}
add_action( 'wp_default_scripts', 'gutenberg_polyfill_url' );

Getting that path of the file might be tricky, and perhaps require factoring out vendor script registration to occur on init (in time for wp_default_scripts) rather than wp_enqueue_scripts. gutenberg_register_vendor_scripts provides a pattern for downloading / registering a script from a URL in Gutenberg.

All this would need to correspond with an equivalent effort in core to ensure the script is added in time for the next major release.

I realize as I type this that it might be a fair bit of work. The advantage of the polyfill is that if we can rely on a stable solution from the browser in modern browsers, it only adds overhead in the legacy.

Looking back through the conversation though, something strikes me from your comment in #16656 (comment) .

Does the technique using an anchor tag require modern browsers? Or is that just if we wanted to specifically use the URL constructor? One potential downside to this approach (and I assume also the same with URL and perhaps its polyfill) is that it limits usage of this module to work only in the browser, not in Node. This would be unfortunate.

For the purposes of this pull request then, if we can make reasonable (tested) improvements without compromising Node support, maybe it's fine to move forward with it as-is and explore those alternatives separately?

@davilera

This comment has been minimized.

Copy link
Contributor Author

davilera commented Jul 30, 2019

Thanks for the detailed answer, @aduth!

Does the technique using an anchor tag require modern browsers? Or is that just if we wanted to specifically use the URL constructor?

That's actually a pretty good question. I guess we'd have to test it.

One potential downside to this approach (and I assume also the same with URL and perhaps its polyfill) is that it limits usage of this module to work only in the browser, not in Node. This would be unfortunate.

I agree.

For the purposes of this pull request then, if we can make reasonable (tested) improvements without compromising Node support, maybe it's fine to move forward with it as-is and explore those alternatives separately?

I agree.

As you said, node-url is also an option that looks quite good, especially in terms of using it both in the browser and Node. But, unless someone find out yet another unexpected issue, the current implementation + PR is close enough to the proper solution, and future refactors are always possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

4 participants
You can’t perform that action at this time.