Skip to content
This repository has been archived by the owner on Apr 12, 2024. It is now read-only.

fix($browser): normalize inputted URLs #16606

Merged
merged 2 commits into from Nov 21, 2018
Merged

fix($browser): normalize inputted URLs #16606

merged 2 commits into from Nov 21, 2018

Conversation

jbedard
Copy link
Contributor

@jbedard jbedard commented Jun 20, 2018

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
fix

What is the current behavior? (You can also link to an open issue here)

What is the new behavior (if this is a feature change)?

Does this PR introduce a breaking change?

Please check if the PR fulfills these requirements

  • The commit message follows our guidelines
  • Fix/Feature: Docs have been added/updated
  • Fix/Feature: Tests have been added; existing tests pass

Other information:

@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of the commit author(s) and merge this pull request when appropriate.

@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of the commit author(s) and merge this pull request when appropriate.

Copy link
Contributor

@Narretz Narretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor things, but technically good to go

src/ng/browser.js Show resolved Hide resolved
@@ -131,6 +131,15 @@ function Browser(window, document, $log, $sniffer) {
// setter
if (url) {
var sameState = lastHistoryState === state;
var hasTrailingSlash = url[url.length - 1] === '/';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex would look nicer and could be re-used below

test/ng/browserSpecs.js Outdated Show resolved Hide resolved
@jbedard
Copy link
Contributor Author

jbedard commented Jun 24, 2018

Updated. Sorry I squashed/rebased everything so it might be hard to see the changes.

I added a commit (21263c9) refactoring the mock window.location in the $browser specs. I think this makes it more realistic by normalizing URLs assigned to window.location.* similar to the real Location. This was required for the tests added to the main commit to fail before (and pass after) the fix.

Also, as @petebacondarwin pointed out, none of the tests in the first commit actually fail in master right now. So I'm tempted to just drop them. I swear I saw these fail at some point though. @dmartres can you confirm your tests (from #16098) did fail at some point? Any idea why they are not be failing?

src/ng/browser.js Outdated Show resolved Hide resolved
browser.onUrlChange(callback);
browser.url('http://server/abc?q=\'');

browser.$$checkUrlChange();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are testing a private API here, I feel like maybe we should test this in $location spec instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid invoking the private API from tests? It really is testing logic only in browser.js though :/

// Normalize the inputted URL ...
url = urlResolve(url).href;

// ... but keep the (lack of) trailing '/' consistent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that there was a slash initially and urlResolve stripped it off?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if a browser "normalizes" to have no slash at the end then yes, and I'd vote to handle that just in case. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't hurt. Let's do it 😃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although now I'm wondering if trimming this is actually a problem 🤔 (originally I didn't trim anything, but we added this to avoid adding / to many tests).

The whole point of this is to make the persisted lastBrowserUrl normalized the same way document.location.href is normalized. So the comparison below and maybe also the comparison in $location would not fail due to the normalized value being different then the non-normalized lastBrowserUrl. By removing this / we are making lastBrowserUrl different then the normalized value...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we are normalizing all the time, so it shouldn't matter? Or if it does, we should be able to add a test that shows this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we strip the slash here and assign to lastBrowserUrl, then later we compare to $browser.url() (which is location.href, which has a slash), then maybe it will detect changes when it shouldn't?
browser.js#L148
browser.js#L231

I won't change anything unless I can prove it with a test though...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we already have a test that proves it actually (but I'll try to add more).

Notice when I updated the MockWindow to use aElement.href to normalize the window.location.href mock... one test also had it's input changed otherwise it failed. I think the input shouldn't have to change and the fact that it failed shows what I'm talking about here... if we are comparing lastBrowserURL to $browser.url() then they must be normalized the same way. $browser.url() is location.href, so lastBrowserURL should be normalized just like location.href.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. So we have to remove the slash recreation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I think we must keep the browser-normalized URL, slash or not.

See the new tests in the latest commit, 4 fail with the slash-removal logic (and they are run twice, so 8 failures total). Without the urlResolve there are 14 failures.

@@ -567,6 +568,30 @@ describe('browser', function() {
expect(replaceState).not.toHaveBeenCalled();
expect(locationReplace).not.toHaveBeenCalled();
});

it('should not do pushState with a URL only different in encoding', function() {
//A URL from something such as window.location.href
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space after //.

@gkalpak gkalpak modified the milestones: 1.7.x - won't fix, 1.7.x Sep 26, 2018
@jbedard jbedard force-pushed the 16100 branch 2 times, most recently from c117089 to 768105f Compare October 3, 2018 06:26
@googlebot
Copy link

CLAs look good, thanks!

1 similar comment
@googlebot
Copy link

CLAs look good, thanks!

@googlebot googlebot added cla: yes and removed cla: no labels Oct 3, 2018
@jbedard
Copy link
Contributor Author

jbedard commented Oct 3, 2018

Tests are currently failing in IE because IE is not normalizing the URLs like other browsers. Most likely the same IE bug that his block is solving: https://github.com/angular/angular.js/blob/v1.7.4/src/ng/urlUtils.js#L65-L71

@jbedard
Copy link
Contributor Author

jbedard commented Oct 5, 2018

Some tests are failing in IE, and I've realized those are failing for the same original issue #16098 was trying to fix. Also the reason I thought none of the tests from #16098 were "working" (failing before the fix) is because they probably only failed in IE and I wasn't trying IE...

So this PR doesn't fully fix the bug. It does for some cases of URLs being normalized, but IE doesn't normalize ' in the URL. Either we have to manually decode %27 to ' in urlResolve or just live with the fact that IE considers those to be different, meaning only half-fix the bug...

For reference RFC1738 says: "only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL" and it seems like IE doesn't respect that.

Should we add specific handling for those chars the RFC lists? Or just respect what a.href = "..." does and have some potential browser inconsistencies?

@gkalpak
Copy link
Member

gkalpak commented Oct 5, 2018

Should we add specific handling for those chars the RFC lists? Or just respect what a.href = "..." does and have some potential browser inconsistencies?

Tough one 😁
I vote for special handling if it only affects IE and browser inconsistencies otherwise.

@jbedard
Copy link
Contributor Author

jbedard commented Oct 5, 2018

I think this could get ugly :|

Chrome:

  • encodes ' in params
  • decodes -_. in paths

IE:

  • decodes -_. in paths

Firefox:

  • encodes ' in params
  • decodes . in the paths

Everything else in $-_.+!*'(), seems to not be changed by urlResolve, so I believe this bug would be reproduced with those.

I'm going to sleep on it :|

@petebacondarwin
Copy link
Member

URL handling is hard

@jbedard
Copy link
Contributor Author

jbedard commented Oct 16, 2018

I've updated this to (I think) fix it better, although that makes the change bigger and potentially more questionable about merging vs wontfix.

This still adds URL normalizing to $browser.url(newValue) by using urlResolve like it originally did, but I've now updated urlResolve to normalize things that browsers don't do on their own such as decoding unnecessarily encoded things and making all hex-encoding upper case.

There's a whole section in RFC3986 talking about normalization/comparison and defines the ABNF for the path, query and fragment. Those are the 3 parts of the URL this PR tries to normalize.

(Note that this RFC3986 replaces the one I was previously referencing)

Copy link
Member

@petebacondarwin petebacondarwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very interesting. Can we get Travis green?

@jbedard
Copy link
Contributor Author

jbedard commented Oct 16, 2018

But Chrome and FF did just fine :(

@jbedard
Copy link
Contributor Author

jbedard commented Oct 16, 2018

Most of the failures were because all browsers except Chrome+FF add the ? and # if you assign to HTMLAnchorElement.search/hash. However now that I avoided that assignment when unnecessary... Chrome+FF no longer remove the ? and # when empty.

Might require some manually trimming of ?# :/

@jbedard
Copy link
Contributor Author

jbedard commented Oct 25, 2018

For reference: here's a commit which attempts to normalize encoding in the pathname/search/hash: jbedard@5d33cee

While this normalizes a lot of things it also introduces some ugliness and inconsistencies across browsers (appending blank ? and #), so for now I'm dropping this commit.

@jbedard jbedard force-pushed the 16100 branch 4 times, most recently from 2b6e6fa to e9e18f5 Compare October 29, 2018 16:17
@jbedard
Copy link
Contributor Author

jbedard commented Oct 30, 2018

I've changed this back to the basic fix, just letting the browser do the normalizing with urlResolve (assigning + reading from a HTMLAnchronElement.href). Browsers are a little inconsistent and a lot of things don't get normalized such as mentioned above as well as thing like making escaped characters consistently cased etc. I think all that is too big of a change at this point though. For this reason I don't think #16100 should actually be marked as fixed...

The added tests show some examples where invoking $browser.url(...) would previously push a new URL/state and now it won't. However $browser.url(...) is really only invoked from $location which already has similar normalizing, but I think there are still cases where this will allow the early return and avoid an extra pushState.

I think this is ready...

Copy link
Member

@mgol mgol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but it'd be great if you could add more details in the final commit message...

@jbedard
Copy link
Contributor Author

jbedard commented Oct 31, 2018

@mgol the thing is... I'm not 100% sure what those cases are lol

I think it would be:
a) when $browser.url(x) is called twice in a row with the same url but encoded different it will no longer do an extra url/hash change
b) when the pendingLocation is being used $browser.url() will return a normalized version like $browser.url() normally does when it returns location.href

So I think if a user is at a url, then clicks a link to the same url with different encoding, it will no longer do pushState. I also wouldn't mind testing that better but I'm having trouble writing one...

Calls to `$browser.url` now normalize the inputted URL ensuring multiple
calls only differing in formatting do not force a browser `pushState`.

Normalization is done the same as the browser location URL and may
differ per browser and may be changed by browsers. Today no browsers
fully normalize URLs so this does not fix all instances of this issue.

See angular#16100
Closes angular#16606
@jbedard jbedard merged commit 4a3ae43 into angular:master Nov 21, 2018
Narretz pushed a commit that referenced this pull request Nov 26, 2018
Calls to `$browser.url` now normalize the inputted URL ensuring multiple
calls only differing in formatting do not force a browser `pushState`.

Normalization is done the same as the browser location URL and may
differ per browser and may be changed by browsers. Today no browsers
fully normalize URLs so this does not fix all instances of this issue.

See #16100
Closes #16606
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants