New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing a string that contains a semi-colon doesn't properly get encoded #9224

Closed
bobber205 opened this Issue Sep 23, 2014 · 10 comments

Comments

Projects
None yet
7 participants
@bobber205

Discovered this today in a rails app as the backend (since it totally barfs on unescaped semis)

I had a string such as "test;test"

passed it to params in $http post() and put() and got

test;test instead of 
%3B

Shouldn't it encode?

@caitp

This comment has been minimized.

Show comment
Hide comment
@caitp

caitp Sep 23, 2014

Contributor

https://github.com/angular/angular.js/blob/master/src/Angular.js#L1128 the comment explains why, but if it breaks rails, hmm --- maybe an option around this would be good

Contributor

caitp commented Sep 23, 2014

https://github.com/angular/angular.js/blob/master/src/Angular.js#L1128 the comment explains why, but if it breaks rails, hmm --- maybe an option around this would be good

@bobber205

This comment has been minimized.

Show comment
Hide comment
@bobber205

bobber205 Sep 23, 2014

My workaround was to stop sending my JSON data as a param.
Doesn't look like ; is a valid url separator?
http://stackoverflow.com/questions/3867460/valid-url-separators

My workaround was to stop sending my JSON data as a param.
Doesn't look like ; is a valid url separator?
http://stackoverflow.com/questions/3867460/valid-url-separators

@jeffbcross

This comment has been minimized.

Show comment
Hide comment
@jeffbcross

jeffbcross Sep 29, 2014

Contributor

Semicolon is a valid delimiter for some parts of url (at least search, I think path as well).

Contributor

jeffbcross commented Sep 29, 2014

Semicolon is a valid delimiter for some parts of url (at least search, I think path as well).

@DanielHeath

This comment has been minimized.

Show comment
Hide comment
@DanielHeath

DanielHeath Nov 24, 2014

From the duplicate issue: http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2 is the relevant spec - semicolons should definitely be escaped.

From the duplicate issue: http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2 is the relevant spec - semicolons should definitely be escaped.

@DanielHeath

This comment has been minimized.

Show comment
Hide comment
@DanielHeath

DanielHeath Nov 24, 2014

RE

function encodeUriQuery(val, pctEncodeSpaces) {

What's so bad about encoding things that might not need to be encoded (to justify a parallel implementation in angular with its own bugs, tests, support etc)?

RE

function encodeUriQuery(val, pctEncodeSpaces) {

What's so bad about encoding things that might not need to be encoded (to justify a parallel implementation in angular with its own bugs, tests, support etc)?

@pkozlowski-opensource

This comment has been minimized.

Show comment
Hide comment
@pkozlowski-opensource

pkozlowski-opensource Nov 29, 2014

Member

I must say that I tend to agree with @DanielHeath here. The way I read this comment from rfc3986 is that query can consists of pchars which in turn can contain unreserved / pct-encoded / sub-delims / ":" / "@". But IMO it doesn't say anything about %-encoding / not encoding of sub-delims used in their non-sub-delims role.

In the mentioned RFC we can read (2.2):

URI producing applications should percent-encode data octets that
correspond to characters in the reserved set unless these characters
are specifically allowed by the URI scheme to represent data in that
component. If a reserved character is found in a URI component and
no delimiting role is known for that character, then it must be
interpreted as representing the data octet corresponding to that
character's encoding in US-ASCII.

So much for the spec... Then, when we look at what is going on "in the wild" we can easily see that there are multiple gen-delims and sub-delims used commonly un-encoded (ex.: foo[]=bar, foo+bar).

We are not the only ones struggling with the spec / reality interpretation here, ex.:
http://www.456bereastreet.com/archive/201008/what_characters_are_allowed_unencoded_in_query_strings/

To sum up: we are in the grey area here.... The spec seem to suggest that we should %-encode all the gen-delims and sub-delims when used in their non-delimiting role but this is not what most of the software is doing :-/ So it seems that whatever we do we might break some of the impls.

Coming back to the issue at hand: we could %-encode ; and this will keep some back-ends happy while might break others. I guess we might try in 1.4 and see what gets broken in practice or advice people to %-encode things in their own to match their backend. What we should do in 1.4 is to extract URL-generation logic and let people easily override it.

Putting as 1.4 suggestion.

I must say that I tend to agree with @DanielHeath here. The way I read this comment from rfc3986 is that query can consists of pchars which in turn can contain unreserved / pct-encoded / sub-delims / ":" / "@". But IMO it doesn't say anything about %-encoding / not encoding of sub-delims used in their non-sub-delims role.

In the mentioned RFC we can read (2.2):

URI producing applications should percent-encode data octets that
correspond to characters in the reserved set unless these characters
are specifically allowed by the URI scheme to represent data in that
component. If a reserved character is found in a URI component and
no delimiting role is known for that character, then it must be
interpreted as representing the data octet corresponding to that
character's encoding in US-ASCII.

So much for the spec... Then, when we look at what is going on "in the wild" we can easily see that there are multiple gen-delims and sub-delims used commonly un-encoded (ex.: foo[]=bar, foo+bar).

We are not the only ones struggling with the spec / reality interpretation here, ex.:
http://www.456bereastreet.com/archive/201008/what_characters_are_allowed_unencoded_in_query_strings/

To sum up: we are in the grey area here.... The spec seem to suggest that we should %-encode all the gen-delims and sub-delims when used in their non-delimiting role but this is not what most of the software is doing :-/ So it seems that whatever we do we might break some of the impls.

Coming back to the issue at hand: we could %-encode ; and this will keep some back-ends happy while might break others. I guess we might try in 1.4 and see what gets broken in practice or advice people to %-encode things in their own to match their backend. What we should do in 1.4 is to extract URL-generation logic and let people easily override it.

Putting as 1.4 suggestion.

@petebacondarwin

This comment has been minimized.

Show comment
Hide comment
@petebacondarwin

petebacondarwin Dec 11, 2014

Member

See #8377

It is a grey area. The spec says that encoding (or not encoding) is application specific.

Perhaps the best 1.4 solution (as part of the $http refactoring) would be to provide a configuration for whether to encode this...

Member

petebacondarwin commented Dec 11, 2014

See #8377

It is a grey area. The spec says that encoding (or not encoding) is application specific.

Perhaps the best 1.4 solution (as part of the $http refactoring) would be to provide a configuration for whether to encode this...

@pkozlowski-opensource

This comment has been minimized.

Show comment
Hide comment
@pkozlowski-opensource

pkozlowski-opensource Dec 11, 2014

Member

Yeh, this is definitively grey area. And spec really make this open to interpretation... This is one more reason for me to push for a dedicated service that is responsible for request params serialisation - it would be much easier to test / configure this as a separate service.

Yeh, this is definitively grey area. And spec really make this open to interpretation... This is one more reason for me to push for a dedicated service that is responsible for request params serialisation - it would be much easier to test / configure this as a separate service.

@schmod

This comment has been minimized.

Show comment
Hide comment
@schmod

schmod Feb 11, 2015

Contributor

I work with a platform that has a (partial) implementation of Matrix URIs.

Admittedly, Matrix URIs never became a standard, and are not in widespread use. However, there are definitely existing implementations that rely on unescaped semicolons being in the path.

The query seems to be much more of a grey area, although the older HTML specifications encouraged CGI developers to support semicolon-delimited query strings.

IMO, there's enough precedent to make me apprehensive about automatically escaping semicolons in any URI component. If a developer is interacting with an idiosyncratic server implementation, it seems to make the most sense to fix the server, or write an interceptor to "sanitize" URLs before they are sent to the server.

HOWEVER

@bobber205's issue is a very specific and isolated use case: Putting semicolons inside of a single query parameter value. In this exact situation (and only this situation), I actually think that it is appropriate to escape semicolons, as server implementations might misinterpret an unescaped semicolon as a delimiter.

In summary:

  • path: Don't escape semicolons. Breaks matrix parameters, and no evidence that Angular's current behavior is problematic.
  • query: Escaping the entire query string would be unwise, as it breaks an established (albeit archaic) expectation for servers to be able to handle semicolon-delimited query strings.
  • query values: Probably safe to escape, especially if the server interprets semicolons as query delimiters. Passing complex data types (ie. JSON) via query parameters is a bad idea, but "real-world" string values often contain semicolons, and it seems like a good idea to escape them.
Contributor

schmod commented Feb 11, 2015

I work with a platform that has a (partial) implementation of Matrix URIs.

Admittedly, Matrix URIs never became a standard, and are not in widespread use. However, there are definitely existing implementations that rely on unescaped semicolons being in the path.

The query seems to be much more of a grey area, although the older HTML specifications encouraged CGI developers to support semicolon-delimited query strings.

IMO, there's enough precedent to make me apprehensive about automatically escaping semicolons in any URI component. If a developer is interacting with an idiosyncratic server implementation, it seems to make the most sense to fix the server, or write an interceptor to "sanitize" URLs before they are sent to the server.

HOWEVER

@bobber205's issue is a very specific and isolated use case: Putting semicolons inside of a single query parameter value. In this exact situation (and only this situation), I actually think that it is appropriate to escape semicolons, as server implementations might misinterpret an unescaped semicolon as a delimiter.

In summary:

  • path: Don't escape semicolons. Breaks matrix parameters, and no evidence that Angular's current behavior is problematic.
  • query: Escaping the entire query string would be unwise, as it breaks an established (albeit archaic) expectation for servers to be able to handle semicolon-delimited query strings.
  • query values: Probably safe to escape, especially if the server interprets semicolons as query delimiters. Passing complex data types (ie. JSON) via query parameters is a bad idea, but "real-world" string values often contain semicolons, and it seems like a good idea to escape them.

pkozlowski-opensource added a commit to pkozlowski-opensource/angular.js that referenced this issue Mar 4, 2015

pkozlowski-opensource added a commit to pkozlowski-opensource/angular.js that referenced this issue Mar 4, 2015

pkozlowski-opensource added a commit to pkozlowski-opensource/angular.js that referenced this issue Apr 1, 2015

pkozlowski-opensource added a commit to pkozlowski-opensource/angular.js that referenced this issue Apr 2, 2015

@bobber205

This comment has been minimized.

Show comment
Hide comment
@bobber205

bobber205 Apr 2, 2015

Thanks for all the hardwork! ❤️

Thanks for all the hardwork! ❤️

netman92 added a commit to netman92/angular.js that referenced this issue Aug 8, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment