deepObject parameter style produces invalid URIs #1942

Closed
vvanpo opened this issue Jun 10, 2019 · 4 comments
Labels
param serialization Issues related to parameter and/or header serialization

@vvanpo

vvanpo commented Jun 10, 2019

According to the RFC,

[ IPv6 host address ] is the only place where square bracket characters are allowed in the URI syntax.

So requiring OpenAPI clients to serialize parameters using square brackets means asking them to violate RFC 3986.

The same is true for the allowReserved field. The only reserved characters that need to be encoded for query strings at all are #, [, and ]. Allowing these to pass through unencoded produces an invalid query string.
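To make the concern concrete, here is a minimal sketch of a client that serializes a flat object with style=deepObject while percent-encoding the brackets, so the resulting query stays within the RFC 3986 grammar. The helper name `serialize_deep_object` and the `color` parameter are assumptions chosen for illustration; they are not defined by the OpenAPI Specification.

```python
from urllib.parse import quote


def serialize_deep_object(name, obj):
    """Serialize a flat dict as name[key]=value pairs with everything percent-encoded.

    Hypothetical helper: safe="" makes quote() encode every reserved
    character, including "[" and "]".
    """
    pairs = []
    for key, value in obj.items():
        raw_name = f"{name}[{key}]"
        pairs.append(f"{quote(raw_name, safe='')}={quote(str(value), safe='')}")
    return "&".join(pairs)


print(serialize_deep_object("color", {"R": 100, "G": 200, "B": 150}))
# color%5BR%5D=100&color%5BG%5D=200&color%5BB%5D=150
```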

@wickedest

This article might be relevant. The TL;DR:

So because square brackets are only allowed in the "host" subcomponent, they "should" be percent encoded in other components and subcomponents, and in this case in the "query" component, unless RFC 3986 explicitly allows unencoded square brackets to represent data in the query component, which is does not.

However, if a "URI producing application" fails to do what it "should" do, by leaving square brackets unencoded in the query, then readers of the URI are not to reject the URI outright. Instead, the square brackets are to be considered as belonging to the data of the query component, since they are not used as delimiters in that component.

I'm not an authority on this subject (and maybe this article isn't either), but my interpretation of the article is that the square brackets (from gen-delims) should be percent-encoded, but do not have to be because they have no meaning in the query. I believe ? and # are query delimiters, and that it might mean that these can all be abused in the same way: : / [ ] @.

@karenetheridge
Member

karenetheridge commented Sep 25, 2023

so... does that mean that to use style=deepObject (and pipeDelimited too) we have to use allowReserved=true?

I ask because all the examples in the spec (the table at https://spec.openapis.org/oas/v3.1.0#style-examples) appear to show the query (or path) portion of the URI fully serialized and escaped, yet these styles have reserved characters in them. I think that's important enough that it should be mentioned explicitly in the spec.

Alternatively, the examples need to be modified to properly escape these characters, just as spaceDelimited uses %20 instead of a space for its delimiter.

@handrews
Member

handrews commented May 14, 2024

TL;DR: if you want your URL to be parsed correctly by a strictly RFC3986-compliant parser, percent-encode [, ], and |.
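As a quick illustration of that advice, the sketch below shows what the three characters look like once percent-encoded, and applies the same idea to a pipeDelimited array. The function `serialize_pipe_delimited` is a hypothetical helper written for this example, assuming explode=false.

```python
from urllib.parse import quote

# The three characters named above and their percent-encoded forms.
for ch in "[]|":
    print(ch, "->", quote(ch, safe=""))
# [ -> %5B
# ] -> %5D
# | -> %7C


def serialize_pipe_delimited(name, values):
    """Join values with an encoded pipe (hypothetical helper, explode=false)."""
    encoded = [quote(str(v), safe="") for v in values]
    return f"{quote(name, safe='')}=" + "%7C".join(encoded)


print(serialize_pipe_delimited("color", ["blue", "black", "brown"]))
# color=blue%7Cblack%7Cbrown
```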

OK, I delved into this more and have come up with what I hope is the correct interpretation of the following passage, the last two paragraphs of RFC 3986 §2.2 "Reserved Characters":

A subset of the reserved characters (gen-delims) is used as
delimiters of the generic URI components described in Section 3. A
component's ABNF syntax rule will not use the reserved or gen-delims
rule names directly; instead, each syntax rule lists the characters
allowed within that component (i.e., not delimiting it), and any of
those characters that are also in the reserved set are "reserved" for
use as subcomponent delimiters within the component. Only the most
common subcomponents are defined by this specification; other
subcomponents may be defined by a URI scheme's specification, or by
the implementation-specific syntax of a URI's dereferencing
algorithm, provided that such subcomponents are delimited by
characters in the reserved set allowed within that component.

URI producing applications should percent-encode data octets that
correspond to characters in the reserved set unless these characters
are specifically allowed by the URI scheme to represent data in that
component. If a reserved character is found in a URI component and
no delimiting role is known for that character, then it must be
interpreted as representing the data octet corresponding to that
character's encoding in US-ASCII.

This pulls in more context than the blog post cited above, which I think makes things more clear:

  • Both RFC3986 and other specifications can define delimiter roles for allowed reserved characters in a given component.
  • In that context, the "no delimiting role is known for that character" clearly refers to allowed reserved characters, which may or may not be functioning as delimiters.

This is why application/x-www-form-urlencoded can define delimiting semantics for = and &, and character escape semantics for +: All three are allowed in query strings, and none have been assigned any semantics in any component by RFC 3986.

On the other hand, the OpenAPI Specification and various web frameworks cannot do the same with [ and ], as those two characters, along with #, are the only reserved characters not allowed in query strings.
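The set of characters allowed in a query can be checked mechanically. The following is a sketch of a strict validator built from the RFC 3986 ABNF, where query = *( pchar / "/" / "?" ) and pchar = unreserved / pct-encoded / sub-delims / ":" / "@". The name `is_valid_query` is an assumption for this example, and the check covers only the query component, not a whole URI.

```python
import re

# unreserved: A-Z a-z 0-9 - . _ ~
# sub-delims: ! $ & ' ( ) * + , ; =
# plus ":" "@" from pchar and "/" "?" from the query rule, or a %XX triplet.
QUERY_RE = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*")


def is_valid_query(query):
    """Return True if the string matches the RFC 3986 query production."""
    return QUERY_RE.fullmatch(query) is not None


print(is_valid_query("a=b&c=d+e"))         # True
print(is_valid_query("color%5BR%5D=100"))  # True
print(is_valid_query("color[R]=100"))      # False
print(is_valid_query("color=blue|black"))  # False
```

Note that | is rejected as well: it is not a reserved character, but it is not in the unreserved set either, so it has no unencoded form in a query.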

So what that last paragraph is saying is along the lines of: If you don't have reason to think that a query string is in application/x-www-form-urlencoded format (see #1502), then you MUST treat =, &, and + as their literal character values. It is not saying that you have to treat illegal characters as their literal values.

That said, it seems like many systems stuff unencoded [ and ] characters in query strings, although in some cases people may just think that because the percent-encoding/decoding is happening out of sight. Which is a bit bizarre, because prior to RFC 3986, those characters were in the unsafe set, rather than the reserved set, and always needed to be encoded. This is a pretty good demonstration of why Postel's Law has been falling out of favor: you end up with all of these weird corners where things that violate specs are expected to keep working.

So the upshot is: it might work, and if you're absolutely certain you know all of the parsers involved will handle it, then you can get away with unencoded [ and ]. But we're talking about APIs and not intended-for-human-readability things, so just percent-encode and be safe (and we should absolutely do that in our examples). Because at some point, someone will run that URI through a validating parser that is actually implemented correctly (they seem to be shockingly rare but do exist), and it will cause an error.
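For what it's worth, percent-encoding costs nothing on the receiving side, because a form-style parser decodes the names before the application sees them. Below is a minimal sketch using Python's lenient `urllib.parse.parse_qsl`; the helper `parse_deep_object` is hypothetical and handles only one level of nesting.

```python
import re
from urllib.parse import parse_qsl


def parse_deep_object(query, name):
    """Collect name[key]=value pairs from a query string into a dict.

    Hypothetical helper: parse_qsl percent-decodes names and values, so
    encoded and unencoded brackets arrive here looking the same.
    """
    pattern = re.compile(re.escape(name) + r"\[([^\[\]]*)\]")
    result = {}
    for key, value in parse_qsl(query, keep_blank_values=True):
        match = pattern.fullmatch(key)
        if match:
            result[match.group(1)] = value
    return result


print(parse_deep_object("color%5BR%5D=100&color%5BG%5D=200", "color"))
# {'R': '100', 'G': '200'}
print(parse_deep_object("color[R]=100&color[G]=200", "color"))
# {'R': '100', 'G': '200'}
```

The second call only succeeds because this particular parser is lenient about the unencoded brackets, which is exactly the behavior a strict parser would not share.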


The final wrinkle is that people using deepObject (or doing whatever PHP does) might want to leave the delimiter [ and ] un-encoded to avoid conflicts with those characters in the actual data. Which is, of course, the point of using reserved characters as delimiters. But it's still against the spec. I just would not recommend deepObject at all, TBH, and would rather offer more flexibility through supporting #1502 in the next minor release than try to make deepObject usable with strict URI parsers.
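The conflict can be shown with a property name that itself contains a bracket. This is a sketch only: the property name `a]b` and the helper `split_raw` are made up for illustration, and the second serialization is the non-compliant form discussed above.

```python
from urllib.parse import quote, unquote

prop = "a]b"  # hypothetical property name containing a bracket

# Option 1: encode everything (valid RFC 3986 query characters only).
fully_encoded = quote(f"color[{prop}]", safe="")
# Option 2: literal delimiters, encoded data (not valid per RFC 3986).
literal_delims = f"color[{quote(prop, safe='')}]"

print(fully_encoded)   # color%5Ba%5Db%5D
print(literal_delims)  # color[a%5Db]


def split_raw(raw_name):
    """Split on literal brackets before decoding; return (name, property) or None."""
    if "[" not in raw_name or not raw_name.endswith("]"):
        return None
    head, _, rest = raw_name.partition("[")
    return unquote(head), unquote(rest[:-1])


print(split_raw(literal_delims))  # ('color', 'a]b')
print(split_raw(fully_encoded))   # None
print(unquote(fully_encoded))     # color[a]b]
```

With everything encoded, the decoded name `color[a]b]` no longer shows which brackets were delimiters and which were data, so a deepObject producer that encodes its delimiters needs property names that never contain brackets.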

But we should clarify it in the patch releases anyway, so I'll throw it on the (increasingly enormous) pile.

@handrews
Member

PR merged for 3.0.4 and ported to 3.1.1 via PR #3921!
This is addressed by the new Appendix E.
