Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing a string that contains a semi-colon doesn't properly get encoded #9224

Closed
bobber205 opened this issue Sep 23, 2014 · 10 comments
Closed

Passing a string that contains a semi-colon doesn't properly get encoded #9224

bobber205 opened this issue Sep 23, 2014 · 10 comments

Comments

@bobber205
Copy link

@bobber205 bobber205 commented Sep 23, 2014

Discovered this today in a rails app as the backend (since it totally barfs on unescaped semis)

I had a string such as "test;test"

passed it to params in $http post() and put() and got

test;test instead of 
%3B

Shouldn't it encode?

@caitp
Copy link
Contributor

@caitp caitp commented Sep 23, 2014

https://github.com/angular/angular.js/blob/master/src/Angular.js#L1128 the comment explains why, but if it breaks rails, hmm --- maybe an option around this would be good

@bobber205
Copy link
Author

@bobber205 bobber205 commented Sep 23, 2014

My workaround was to stop sending my JSON data as a param.
Doesn't look like ; is a valid url separator?
http://stackoverflow.com/questions/3867460/valid-url-separators

@jeffbcross
Copy link
Contributor

@jeffbcross jeffbcross commented Sep 29, 2014

Semicolon is a valid delimiter for some parts of url (at least search, I think path as well).

@jeffbcross jeffbcross modified the milestones: 1.3.0-rc.5, 1.3.0, Backlog Sep 29, 2014
@DanielHeath
Copy link

@DanielHeath DanielHeath commented Nov 24, 2014

From the duplicate issue: http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2 is the relevant spec - semicolons should definitely be escaped.

@DanielHeath
Copy link

@DanielHeath DanielHeath commented Nov 24, 2014

RE

function encodeUriQuery(val, pctEncodeSpaces) {

What's so bad about encoding things that might not need to be encoded (to justify a parallel implementation in angular with its own bugs, tests, support etc)?

@pkozlowski-opensource
Copy link
Member

@pkozlowski-opensource pkozlowski-opensource commented Nov 29, 2014

I must say that I tend to agree with @DanielHeath here. The way I read this comment from rfc3986 is that query can consists of pchars which in turn can contain unreserved / pct-encoded / sub-delims / ":" / "@". But IMO it doesn't say anything about %-encoding / not encoding of sub-delims used in their non-sub-delims role.

In the mentioned RFC we can read (2.2):

URI producing applications should percent-encode data octets that
correspond to characters in the reserved set unless these characters
are specifically allowed by the URI scheme to represent data in that
component. If a reserved character is found in a URI component and
no delimiting role is known for that character, then it must be
interpreted as representing the data octet corresponding to that
character's encoding in US-ASCII.

So much for the spec... Then, when we look at what is going on "in the wild" we can easily see that there are multiple gen-delims and sub-delims used commonly un-encoded (ex.: foo[]=bar, foo+bar).

We are not the only ones struggling with the spec / reality interpretation here, ex.:
http://www.456bereastreet.com/archive/201008/what_characters_are_allowed_unencoded_in_query_strings/

To sum up: we are in the grey area here.... The spec seem to suggest that we should %-encode all the gen-delims and sub-delims when used in their non-delimiting role but this is not what most of the software is doing :-/ So it seems that whatever we do we might break some of the impls.

Coming back to the issue at hand: we could %-encode ; and this will keep some back-ends happy while might break others. I guess we might try in 1.4 and see what gets broken in practice or advice people to %-encode things in their own to match their backend. What we should do in 1.4 is to extract URL-generation logic and let people easily override it.

Putting as 1.4 suggestion.

@petebacondarwin
Copy link
Member

@petebacondarwin petebacondarwin commented Dec 11, 2014

See #8377

It is a grey area. The spec says that encoding (or not encoding) is application specific.

Perhaps the best 1.4 solution (as part of the $http refactoring) would be to provide a configuration for whether to encode this...

@pkozlowski-opensource
Copy link
Member

@pkozlowski-opensource pkozlowski-opensource commented Dec 11, 2014

Yeh, this is definitively grey area. And spec really make this open to interpretation... This is one more reason for me to push for a dedicated service that is responsible for request params serialisation - it would be much easier to test / configure this as a separate service.

@schmod
Copy link
Contributor

@schmod schmod commented Feb 11, 2015

I work with a platform that has a (partial) implementation of Matrix URIs.

Admittedly, Matrix URIs never became a standard, and are not in widespread use. However, there are definitely existing implementations that rely on unescaped semicolons being in the path.

The query seems to be much more of a grey area, although the older HTML specifications encouraged CGI developers to support semicolon-delimited query strings.

IMO, there's enough precedent to make me apprehensive about automatically escaping semicolons in any URI component. If a developer is interacting with an idiosyncratic server implementation, it seems to make the most sense to fix the server, or write an interceptor to "sanitize" URLs before they are sent to the server.

HOWEVER

@bobber205's issue is a very specific and isolated use case: Putting semicolons inside of a single query parameter value. In this exact situation (and only this situation), I actually think that it is appropriate to escape semicolons, as server implementations might misinterpret an unescaped semicolon as a delimiter.

In summary:

  • path: Don't escape semicolons. Breaks matrix parameters, and no evidence that Angular's current behavior is problematic.
  • query: Escaping the entire query string would be unwise, as it breaks an established (albeit archaic) expectation for servers to be able to handle semicolon-delimited query strings.
  • query values: Probably safe to escape, especially if the server interprets semicolons as query delimiters. Passing complex data types (ie. JSON) via query parameters is a bad idea, but "real-world" string values often contain semicolons, and it seems like a good idea to escape them.
pkozlowski-opensource added a commit to pkozlowski-opensource/angular.js that referenced this issue Mar 4, 2015
pkozlowski-opensource added a commit to pkozlowski-opensource/angular.js that referenced this issue Mar 4, 2015
pkozlowski-opensource added a commit to pkozlowski-opensource/angular.js that referenced this issue Apr 1, 2015
pkozlowski-opensource added a commit to pkozlowski-opensource/angular.js that referenced this issue Apr 2, 2015
@bobber205
Copy link
Author

@bobber205 bobber205 commented Apr 2, 2015

Thanks for all the hardwork! ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

7 participants
You can’t perform that action at this time.