Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unclear when to percent encode and decode URI strings #2245

Open
RichardTea opened this issue Jan 19, 2023 · 3 comments
Open

Unclear when to percent encode and decode URI strings #2245

RichardTea opened this issue Jan 19, 2023 · 3 comments
Assignees
Milestone

Comments

@RichardTea
Copy link

RichardTea commented Jan 19, 2023

Section 2.8 URIs is unclear.
It does not appear to be possible for an implementation to determine whether a URI has been percent-encoded or written as-is.

Reserved characters (as defined by RFC 3986, Section 2.2. and RFC 3987, Section 2.2.) MUST be percent-encoded.

":" / "/" / "?" / "#" / "[" / "]" / "@" / "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

Additionally:

Paths with non-ASCII characters MAY be written as-is, with JSON string escaping, or with percent-encoding; all these options are valid.

RFC 3986 separately states that "%" MUST be encoded as "%25", but it is unclear from the text whether or not this requirement also applies to glTF - "%" is an ASCII character, and is not an RFC 3986/3987 reserved character, so the text indicates that it MAY be written as-is.

If space and percent MAY be written as-is or percent-encoded, then all of the following are permitted:

Case uri raw UTF-8 string file on disk Note
1 big ball.png big ball.png Clear. Nothing is percent encoded
2 big%20ball.png big ball.png Indistinguishable from 3
3 big%20ball.png big%20ball.png Indistinguishable from 2
4 big%ball.png big%ball.png Distinguishable as %ba is not valid UTF-8
5 big%25ball.png big%ball.png Indistinguishable from 6
6 big%25ball.png big%25ball.png Indistinguishable from 5

It is not possible for a reader to distinguish between cases 2 and 3, or between cases 5 and 6.
A compliant reader must try both URIs (perhaps re-encoding for HTTPS)

I recommend two changes:

a) The wording of the first requirement for relative paths should be changed to:

Paths containing reserved characters (as defined by RFC 3986, Section 2.2. and RFC 3987, Section 2.2.) MUST be percent-encoded.

b) The glTF standard should explicitly require paths containing the "%" character be percent-encoded.
This would prohibit cases 3, 4 and 6, and eliminate the confusion.

Sadly I am aware of implementations that export using all six methods - Blender's Khronos glTF Blender I/O v1.5.17 even does this in the same file!

@RichardTea RichardTea changed the title Unclear when to percent encode and decode URI substrings Unclear when to percent encode and decode URI strings Jan 19, 2023
@scurest
Copy link

scurest commented Jan 20, 2023

Space and percent have to be percent-encoded.

RFC 3986 separately states that "%" MUST be encoded as "%25", but it is unclear from the text whether or not this requirement also applies to glTF

glTF URIs are normal URIs, all the normal rules apply.

"%" is an ASCII character, and is not an RFC 3986/3987 reserved character, so the text indicates that it MAY be written as-is

I don't see how it indicates that. The text says reserved characters have to be percent encoded and says non-ASCII characters may be written multiple ways, but neither applies to percent.

Sadly I am aware of implementations that export using all six methods - Blender's Khronos glTF Blender I/O v1.5.17 even does this in the same file!

If you can reproduce a wrongly encoded URI in the current version, you should report a bug.

@javagl
Copy link
Contributor

javagl commented Jan 20, 2023

Some strongly related discussion (with a table that shows possible ambiguities that is very similar to the one on the first post here) can be found around #1449 (comment) .

@RichardTea
Copy link
Author

RichardTea commented Jan 23, 2023

Space and percent have to be percent-encoded.

The trouble is that the published document does not actually say that.

I note that the linked issue it's explicitly noted that "space" can be either encoded or unencoded. #1449 (comment)

I'm fine with a rule along the lines of "if nothing else needs to be percent-encoded then space may be left as-is", but any paths containing the "%" character absolutely must be encoded.

glTF URIs are normal URIs, all the normal rules apply.

I agree that the intent probably was that glTF URIs MUST be either URIs (RFC 3986) or IRIs (RFC 3987).
Unfortunately, the text is ambiguous and a large number of implementations do not encode the % character.

If you can reproduce a wrongly encoded URI in the current version, you should report a bug.

Snippet from a Khronos glTF Blender I/O v3.4.50 export:

"buffers" : [
        {
            "byteLength" : 840,
            "uri" : "example 10%.bin"
        }
    ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants