Skip to content
This repository has been archived by the owner on Apr 12, 2024. It is now read-only.

fix(angular.encodeUriSegment): do not encode semi-colon #5019

Closed
wants to merge 2 commits into from

Conversation

brettporter
Copy link
Contributor

RFC 3986 indicates that ; is not encoded as part of the URI, as is the case
with other members of sub-delim. Changed encodeUriSegment to match that
behaviour, along with the corresponding spec.

This was causing a practical issue with Java servers that append ;jsessionid=... to the path. The ; was encoded, leading to the following error in PhantomJS as it continually saw the current URL as different to the $location.absUrl() which had been encoded:

Error: [$rootScope:infdig] 10 $digest() iterations reached. Aborting!
Watchers fired in the last 5 iterations: [["fn: $locationWatch; newVal: 7; oldVal: 6"],["fn: $locationWatch; newVal: 8; oldVal: 7"],["fn: $locationWatch; newVal: 9; oldVal: 8"],["fn: $locationWatch; newVal: 10; oldVal: 9"],["fn: $locationWatch; newVal: 11; oldVal: 10"]]

I note that the specs previously called out ; specifically for being encoded, which was introduced in 9e30baa, and that it has been in place for some time. Is there a reason this might have been needed that I've missed?

I also note that the same code is in ngResource, but I'm unfamiliar with the codebase and what impact it would have to change that as well. Should the code in there be updated as well?

RFC 3986 indicates that ; is not encoded as part of the URI, as is the case
with other members of sub-delim. Changed encodeUriSegment to match that
behaviour, along with the corresponding spec.
@petebacondarwin
Copy link
Member

This is basically a duplicate of #4067, where this has been discussed in some detail. Can you look at that one and add your thoughts?

@brettporter
Copy link
Contributor Author

Sorry I missed that in searching. Will do.

@brettporter
Copy link
Contributor Author

Hi @petebacondarwin - I think this is actually different, despite the similarity. The code here affects location, while the other PR only handles ngResource (which was the other case I mentioned above).

The assessment in there is right in this case too - just doing the substitution means %3B becomes ; and information is lost in the other direction.

Here is an alternate solution I could propose:
brettporter@bac97d9

This avoids the encode/decode loop. It would need more work to complete (specs for that case, and better handling for setting pathEncoded directly since setting $$path will encode ;).

Would you consider reopening, or a new PR, for such a change?

The other alternative would be to parse out the the parameters as a new variable. I tried that and had some issues with it being removed, since any code that uses $$path directly would lose the parameters.

@petebacondarwin
Copy link
Member

@brettporter - so the fix is basically the same but because the same function (encodeUriQuery) appears in both Angular.js (for the core ng module) and resource.js (for the ngResource module), it would actually need to be changed in both places. I wonder if these functions should be made available on the angular object rather than duplicating in ngResource.

But back to the issue: if you look through the discussion there is a problem as mentioned in this comment: #4067. Semi-colons should be encoded, unless they are being used in a special application specific sense. This makes it rather difficult for Angular since how does it know in what sense the semi-colon is being used?

@eonlepapillon suggests a solution that is specific to ngResource but perhaps from your situation there needs to be a higher level configuration? Or perhaps $location simply needs a similar configuration facility.

Either way, I don't think this PR is the right solution and I suggest that once we flush out the best solution it should go in a new PR.

Change location to retain the expected path encoding. The decode/encode cycle
for the location path loses data as %3B (a semi-colon in the path) and ; (a
sub-delim in the URI for path parameters) are considered to be different.
@brettporter
Copy link
Contributor Author

@petebacondarwin - thanks for the quick response. We're in agreement, my first two paragraphs were an attempt to say the same thing as your first two.

It's certainly possible that there could be a similar configuration facility for those manipulating $location directly, however that's not the issue I'm dealing with here - rather that $location goes through a cycle of decoding and parsing the current URL, then encoding and reconstructing it, that loses information without any intervention from the application.

Here's a simple example: https://github.com/brettporter/angular-phonecat/compare/semicolon-path-uri

If you access http://localhost:8000/app/index.html;jsessionid.html, you get index.html as you should, but note that the window URL changes to http://localhost:8000/app/index.html%3Bjsessionid.html. If you then refresh, you get the file index.html;jsessionid.html (note the change in page title).

What $location needs to do is maintain its internal representation of $$path in such a way that it can remember which elements of it were part of the path, and which were sub-delim.

Here's a better link to the fix I propose, which maintains an encoded representation of $$path: https://github.com/brettporter/angular.js/compare/semicolon-path-uri

I'm willing to put the extra work in to finish that off for a new PR, or adjust to a different solution - but would like to confirm if that's an acceptable solution first.

@brettporter
Copy link
Contributor Author

@petebacondarwin any suggestions on how to proceed here?

@petebacondarwin
Copy link
Member

I'm going to see if we can get this into the 1.2.5 milestone next week. Then the team can have a chat about what the best solution would be.

@IgorMinar
Copy link
Contributor

I'm sorry, but I wasn't able to verify your CLA signature. CLA signature is required for any code contributions to AngularJS.

Please sign our CLA and ensure that the CLA signature email address and the email address in this PR's commits match.

If you signed the CLA as a corporation, please let me know the company's name.

Thanks a bunch!

PS: If you signed the CLA in the past then most likely the email addresses don't match. Please sign the CLA again or update the email address in the commit of this PR.
PS2: If you are a Googler, please sign the CLA as well to simplify the CLA verification process.

@brettporter
Copy link
Contributor Author

Thanks for considering it! I've signed the CLA again with the email address on the commit in case it wasn't linked up properly before.

Is there anything else I should do with the PR at the moment to make it easier for you, or just wait until you've had a chance to discuss the solution?

@ghost ghost assigned jeffbcross Dec 9, 2013
@IgorMinar
Copy link
Contributor

I'm sorry, but I wasn't able to verify your CLA signature. CLA signature is required for any code contributions to AngularJS.

Please sign our CLA and ensure that the CLA signature email address and the email address in this PR's commits match.

If you signed the CLA as a corporation, please let me know the company's name.

Thanks a bunch!

PS: If you signed the CLA in the past then most likely the email addresses don't match. Please sign the CLA again or update the email address in the commit of this PR.
PS2: If you are a Googler, please sign the CLA as well to simplify the CLA verification process.

@brettporter
Copy link
Contributor Author

@IgorMinar I've signed the CLA twice, most recently with the email in this commit. However, it had me sign in to Google, so I'm not sure if that's using the Google account instead. How can I make sure the CLA gets accepted?

@mary-poppins
Copy link

CLA signature verified! Thank you!

Someone from the team will now triage your PR and it will be processed based on the determined priority (doc updates and fixes with tests are prioritized over other changes).

@jeffbcross jeffbcross modified the milestones: 1.3.0-beta.17, 1.3.0 Jul 21, 2014
@jeffbcross
Copy link
Contributor

Here's my current thinking of how $location and $resource should handle URLs (breaking changes ahead...):

  • Path, query and hash should be automatically encoded to the extent that the spec absolutely requires (e.g. $location.path('/foo^bar'); $location.path(); == /foo%5Ebar)
  • Should assume that any unencoded, contextually-acceptable delimiter characters contained in each url part are intended to be preserved as delimiters (e.g. $location.path('/foo;bar'); $location.path(); == /foo;bar/)
  • Should not decode any already-encoded characters provided as url parts (e.g. $location.path('/foo%3Bbar'); $location.path(); == /foo%3Bbar/)

Although significant, I don't think such a breaking change would be too drastic to be included in 1.3.

@petebacondarwin & @brettporter what do you think of this approach?

@petebacondarwin
Copy link
Member

@jeffbcross - so you are saying that applications should be responsible for pre-encoding such delimiters if they want them to pass through? For example what if you wanted to use semi-colons as a delimiter but then had a semi-colon in the content:

/foo%3Bbar;doo%3Bdar

How would we ensure that we didn't double encode this?

@jeffbcross
Copy link
Contributor

@petebacondarwin yes that's what I'm saying, but I'm still trying to think of different workable policies. I think my suggestion is the most intuitive and compliant as an "Automatic Encoding Policy". (<-- look, british period placement) Do you have a cohesive idea to put forward for a "Configurable Encoding Policy"? I didn't think what was proposed in #4067 was flexible enough, since it was All Or Nothing ignore of encodable characters within a string.

What I don't like about my suggested approach is that it would introduce more work for developers who are marshaling strings from one piece of data into the url, who are assuming that the service will encode more than it actually would.

Ensuring against double encoding would be a little expensive, I'm experimenting with some approaches.

@jeffbcross
Copy link
Contributor

@petebacondarwin I added a proof of concept here (a couple of tests are failing): jeffbcross@f57758c

@brettporter
Copy link
Contributor Author

Thanks for picking this up again. Not sure I have the cycles to page this back into my brain right now, but happy to test any changes you come up with against my specific use case.

@IgorMinar
Copy link
Contributor

just a few notes about this issue:

  1. It wasn't called out explicitly but I believe that the ;jsessionid problem only affects the html5 mode. but the problem is more generic.
  2. it would be great to have a failing test demonstrating the infinite digest issue as that would make understanding this issue much faster
  3. the $location setters currently expect that the value to be set is not encoded this is an important thing to remember and should be added to the documentation.
  4. the root problem is that a url can have more than one valid encoded string representation but our change detection expects a strict string match between the window.location.href and the url we store internally. This internal url, is constructed by parsing and decoding window.location.href and putting all parts together and encoding them. Because our encoding might not match the original encoding exactly, we might end up with a different string, which then makes change detection think that we have a new url. This could then result in infinite digest, but I wasn't able to reproduce that (haven't tried too hard though).
  5. navigating to https://docs.angularjs.org/api/ng/service/$location;foo=bar changes the url to https://docs.angularjs.org/api/ng/service/$location%3Bfoo=bar this is wrong
  6. I don't quite understand the scenario when you would want ; to be encoded as %3B when you are updating the url via $location. If you are changing the url via $location, the input should be decoded, so you have no control over encoding. If you were to use encoded input, we would double-encode. Can someone elaborate?

So to wrap up:

  • the original commit 0ea270e looks on the right track if applied to path and query parts of the url. why was it reverted?
  • the follow up commit f728b78 fixes the problem only for path but not query, so it's not ideal
  • we should also change
    if (!changeCounter || oldUrl != $location.absUrl()) {
    to compared decoded values not encoded ones

@IgorMinar
Copy link
Contributor

with regards to 6. - I read #4067 (comment) and in case of $resource and http requests made to the server it does make sense to distinguish between the two scenarios. I don't think it applies to window.location though. Anyone thinks differently?

@jeffbcross
Copy link
Contributor

Per some IRL discussion with @IgorMinar, here is the proposed approach.

  • Add semicolon to the "whitelist" of characters to represent as unencoded internally
  • Run window.location.href through the same partial-decoding logic when comparing $location's URL to the current URL.

I'm going to write a test to reproduce the infinite digest, and then will implement this approach.

@jeffbcross
Copy link
Contributor

Here is a failing test. The test is kinda hacky because of limitations of the mock $browser implementation, but I think it captures the case correctly. Removing semicolons from both path parts will make the test pass. It will also pass if the url returned in the $browser.$$url getter encodes the semicolon.

  it('should not infinitely digest when using a semicolon in initial path', function() {
    module(function($locationProvider) {
      $locationProvider.html5Mode(true);
    });

    inject(function($location, $browser, $rootScope) {
      Object.prototype.__defineGetter__.call($browser, '$$url', function() {
        return 'http://server/;jsessionid=foo';
      });
      Object.prototype.__defineSetter__.call($browser, '$$url', angular.noop);
      $location.path('/;jsessionid=foo');
      $rootScope.$digest();
    });
  });

@jeffbcross
Copy link
Contributor

Here's a better test, using the real $browser with mock window:

  it('should not infinitely digest when using a semicolon in initial path', function() {
    module(function($windowProvider, $locationProvider, $browserProvider) {
      $locationProvider.html5Mode(true);
      $windowProvider.$get = function() {
        var win = {};
        angular.extend(win, window);
        win.location = {
          href: 'http://localhost:9876/;jsessionid=foo'
        };
        return win;
      }
      $browserProvider.$get = function($document, $window) {
        var sniffer = {history: true, hashchange: true}
        var logs = {log:[], warn:[], info:[], error:[]};
        var fakeLog = {log: function() { logs.log.push(slice.call(arguments)); },
                   warn: function() { logs.warn.push(slice.call(arguments)); },
                   info: function() { logs.info.push(slice.call(arguments)); },
                   error: function() { logs.error.push(slice.call(arguments)); }};


        var b = new Browser($window, $document, fakeLog, sniffer);
        return b;
      }
    });

    inject(function($location, $browser, $rootScope) {
      $rootScope.$digest();
    });
  });

jeffbcross added a commit to jeffbcross/angular.js that referenced this pull request Jul 28, 2014
Some servers require characters within path segments to contain semicolons,
such as `/;jsessionid=foo` in order to work correctly. RFC-3986 includes
semicolons as acceptable sub-delimiters inside of path and query, but $location
currently encodes semicolons. This can cause an infinite digest to occur since $location
is comparing the internal semicolon-encoded url with the semicolon-unencoded url returned
from window.location.href, causing Angular to believe the url is changing with each digest
loop.

This fix adds ";" to the list of characters to unencode after encoding queries or path segments.

Closes angular#5019
jeffbcross added a commit to jeffbcross/angular.js that referenced this pull request Jul 28, 2014
Some servers require characters within path segments to contain semicolons,
such as `/;jsessionid=foo` in order to work correctly. RFC-3986 includes
semicolons as acceptable sub-delimiters inside of path and query, but $location
currently encodes semicolons. This can cause an infinite digest to occur since $location
is comparing the internal semicolon-encoded url with the semicolon-unencoded url returned
from window.location.href, causing Angular to believe the url is changing with each digest
loop.

This fix adds ";" to the list of characters to unencode after encoding queries or path segments.

Closes angular#5019
jeffbcross added a commit to jeffbcross/angular.js that referenced this pull request Jul 28, 2014
Some servers require characters within path segments to contain semicolons,
such as `/;jsessionid=foo` in order to work correctly. RFC-3986 includes
semicolons as acceptable sub-delimiters inside of path and query, but $location
currently encodes semicolons. This can cause an infinite digest to occur since $location
is comparing the internal semicolon-encoded url with the semicolon-unencoded url returned
from window.location.href, causing Angular to believe the url is changing with each digest
loop.

This fix adds ";" to the list of characters to unencode after encoding queries or path segments.

Closes angular#5019
jeffbcross added a commit to jeffbcross/angular.js that referenced this pull request Jul 29, 2014
Some servers require characters within path segments to contain semicolons,
such as `/;jsessionid=foo` in order to work correctly. RFC-3986 includes
semicolons as acceptable sub-delimiters inside of path and query, but $location
currently encodes semicolons. This can cause an infinite digest to occur since $location
is comparing the internal semicolon-encoded url with the semicolon-unencoded url returned
from window.location.href, causing Angular to believe the url is changing with each digest
loop.

This fix adds ";" to the list of characters to unencode after encoding queries or path segments.

Closes angular#5019
jeffbcross added a commit to jeffbcross/angular.js that referenced this pull request Jul 29, 2014
Some servers require characters within path segments to contain semicolons,
such as `/;jsessionid=foo` in order to work correctly. RFC-3986 includes
semicolons as acceptable sub-delimiters inside of path and query, but $location
currently encodes semicolons. This can cause an infinite digest to occur since $location
is comparing the internal semicolon-encoded url with the semicolon-unencoded url returned
from window.location.href, causing Angular to believe the url is changing with each digest
loop.

This fix adds ";" to the list of characters to unencode after encoding queries or path segments.

Closes angular#5019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants