Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UriBuilder.fragment always encodes the complete fragment, but must only encode parameter values #5269

Closed
mkarg opened this issue Feb 27, 2023 · 10 comments · Fixed by #5379
Closed

Comments

@mkarg
Copy link
Member

mkarg commented Feb 27, 2023

According to Wikipedia, the following is a valid URI: http://www.example.org/foo.xml#xpointer(//Rube)

(I understand that Wikipedia is non-normative, but I assume we agree that it is correct in this particular case.)

In particular this example proofs that the fragment my contain unencoded slashes, just as the path.

When trying to produce this exact URI using UriBuilder, then the result depends of the way how the URI builder is used:

    final URI uri = new URI("http://www.example.org/foo.xml#xpointer(//Rube)").normalize();
    System.out.println(uri); // prints "http://www.example.org/foo.xml#xpointer(//Rube)"
    System.out.println(UriBuilder.fromUri(uri).build()); // prints "http://www.example.org/foo.xml#xpointer(//Rube)"
    System.out.println(UriBuilder.fromUri(uri).fragment("xpointer(//{type})").build("Rube")); // prints "http://www.example.org/foo.xml#xpointer%28%2F%2FRube%29"

If the UriBuilder is initialized from a URI, then it keeps the correct syntax.

If the UriBuilder is overwriting the fragment part, then it not only encodes the parameter values (which is correct), but in fact also encodes the static parts of the contains (which is incorrect).

Apparently there is no way to create the correct syntax (forward slash instead of %2F) using the .fragment() method!

@jansupol
Copy link
Contributor

jansupol commented Mar 2, 2023

Trying to decode http://www.example.org/foo.xml#xpointer%28%2F%2FRube%29, I get http://www.example.org/foo.xml#xpointer(//Rube). Both URIs are equivalent, one percent-encoded and one not.

Why do you think the behaviour is incorrect?

@mkarg
Copy link
Member Author

mkarg commented Mar 3, 2023

Because not everything right of the hash character is data. According to the URI spec, the same syntax rules do apply on the right just as on the left. It is incorrect to "arbitrarily" encode URIs. This is used by several specifications and applications. For example: Gitpod. If my JAX-RS server wants to return a URI pointing to my-URI within GitPod, then GitPod expects to find a URI as the fragment part: https://gitpod.io#my-uri. It is a difference if my-uri contains a slash in the syntax (not encoded), or a slash in data (encoded). A browser will not automatically assume that it must decode my-uri before it can call it, and always decoding fragments would be just wrong (in the Gitpod example, unencoding would produce invalid my-uri, as data MUST stay encoded). Neither will an application know. It is essential for the browser / application to know, if it must decode the URI. Due to that, no decoding happens. Due to that, it is essential to have control over encoding, i. e. that UriBuilder only encodes data, not templates.

So the correct implementation is that only data is encoded, but not the template.

@jansupol
Copy link
Contributor

jansupol commented May 10, 2023

The HTML 5 spec says:

  1. Let fragment be document's URL's fragment.

....

  1. Let fragmentBytes be the result of percent-decoding fragment.

So fragment should be percent-encoded.

This seems to conflict with RFC7303, which refers to XPointerFramework which does not percent-encode for certain media types (XMLs).

If I understand that well, the UriBuilder would need to know what media type the encoding is for, or a boolean argument to encode, or not to encode.

@mkarg
Copy link
Member Author

mkarg commented May 12, 2023

Yes and no. What I take from this conflicting definitions is: It is up to the application's current use case, whether the fragment is to be encoded or not. JAX-RS runtime should need to know the difference between HTML and XPointer use case, but should be neutral (not support "both" cases, but actually not supporting "any" particular case).

As a result, I repeat by claim: The application must have sole control if or if not to percent-encode the fragment.

This is easily to achive if you follow my original proposal at the top of this issue: Always encode the parameters of .build(), but never encode the parameter of .fragment(). As a result, HTML5 and XPointer is working by full discretion of the calling application:

  • HTML5: .fragment("{e}").build("string-to-be-encoded");
  • XPath: .fragment("string-to-keep-unencoded/{e}").build("string-to-be-encoded");

@jansupol
Copy link
Contributor

jansupol commented May 12, 2023

This looks hacky. Why fragment should not encode and build should and not otherwise? Why for XML the user cannot use fragment({e}).build("someFancyMethod()")?

The RFC 6570 seems more of a solution here; while fragment("{e}").build("Hello World!") encodes, fragment("{+e}").build("Hello World!") would not.

@mkarg
Copy link
Member Author

mkarg commented May 15, 2023

JAX-RS 3.1's JavaDocs of URIBuilder (which are normative) unambiguously mandates the following:

Builder methods perform contextual encoding of characters not permitted in the corresponding URI component following ... RFC 3986 .... Note that only characters not permitted in a particular component are subject to encoding.

RFC 3986 explicitly allows slashes to be part of the fragment component.

As a consequence, this proofs that Jersey has a bug in the following case, as Jersey does encode characters permitted in fragments according to RFC 3986, but MUST NOT encode them: System.out.println(UriBuilder.fromUri(uri).fragment("xpointer(//{type})").build("Rube")); // prints "http://www.example.org/foo.xml#xpointer%28%2F%2FRube%29"

Regarding "hacky": The proposal was an attempt to let users of JAX-RS 3.1 unambiguously decide which is to be encoded and which is not, without the need to change the JAX-RS specification. It might not be perfect, but it is much less "hacky" than the corrently inconsequent outcome of Jersey, which sometimes does encode and sometimes does not:

    final URI uri = new URI("http://www.example.org/foo.xml#xpointer(//Rube)").normalize();
    System.out.println(uri); // prints "http://www.example.org/foo.xml#xpointer(//Rube)"
    System.out.println(UriBuilder.fromUri(uri).build()); // prints "http://www.example.org/foo.xml#xpointer(//Rube)"
    System.out.println(UriBuilder.fromUri(uri).fragment("xpointer(//{type})").build("Rube")); // prints "http://www.example.org/foo.xml#xpointer%28%2F%2FRube%29"

So I would propose to separate this issue into two issues:

  • This one for Jersey, dealing with the bug that Jersey treats Slashes as forbidden characters in fragment, while actually they are explicitly allowed by RFC 3986.
  • A new one for Jakarta REST 4, proposing to respect RFC 6570 for the URI template syntax in all places templates are used in JAX-RS.

@jansupol
Copy link
Contributor

jansupol commented May 15, 2023

RFC 3986:

  • Section 1.2.3. Hierarchical Identifiers:

    The generic syntax uses the slash ("/"), question mark ("?"), and
    number sign ("#") characters to delimit components

  • Section 2.1. Percent-Encoding

    A percent-encoding mechanism is used to represent a data octet in a
    component when that octet's corresponding character is outside the
    allowed set or is being used as a delimiter of, or within, the
    component

  • Section 3.5. Fragment

    fragment = *( pchar / "/" / "?" )

From this, I read that a slash is a delimiter and should be percent-encoded.


The support for RFC 6570 does not seem to bring any backward incompatibility (unlike stopping the encoding for fragment) and can be introduced in any version of Jersey as a new feature.

@mkarg
Copy link
Member Author

mkarg commented May 15, 2023

IMHO you incorrectly assume that. Section 3.5 tells the explicitly allowed characters for fragment, which says that it allows any number of pchars, slashes and questionmarks. That is done to override the limitations given by 1.2.3.

@mkarg
Copy link
Member Author

mkarg commented Jun 11, 2023

I have proposed support for RFC 6570 for Jakarta REST 4.0. Thank you for this great idea! :-)

Nevertheless, RFC 6570 explicitly supports my claim that you misunderstood section 3.5 of RFC 3986 (hence that Jersey does it wrong currently), as it cleary proofs in section 3.2.4 "Fragment Expansion: {#var}" that in slashes in fragments MUST NOT get encoded:

{#path:6}/here     #/foo/b/here

@jansupol
Copy link
Contributor

Yes, we would need to revisit UriBuilder.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants