Bug in decoding of "+" in xmldb URI-related functions #1824
Comments
I'd say both cases should feature in the docs. Could you add a label to make it easier for me to keep track of them? |
So the bug is that we are using Java's There is a good explanation of how the Path and Query parts of a URI have different encoding/decoding requirements here: https://stackoverflow.com/questions/1634271/url-encoding-the-space-character-or-20#answer-29948396 Coincidentally I was just fixing a very similar issue in IzPack! |
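To make the path/query distinction concrete, here is a minimal Java sketch using two JDK classes. java.net.URLDecoder implements form (query-string) decoding, where "+" stands for a space, while java.net.URI.getPath() only resolves percent-escapes. Which decoder eXist actually calls is not stated in this thread, so the class name PlusDecodingDemo and any mapping onto eXist's internals are assumptions made for illustration. The Charset overload of URLDecoder.decode needs Java 10 or later.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of eXist-db.
final class PlusDecodingDemo {

    // Form/query-string decoding: '+' becomes a space, %XX becomes a byte.
    static String decodeAsQuery(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    // Path decoding: only %XX escapes are resolved; '+' stays literal.
    static String decodeAsPath(String s) {
        return URI.create(s).getPath();
    }

    public static void main(String[] args) {
        System.out.println(decodeAsQuery("A+B.xml"));
        System.out.println(decodeAsPath("/db/A+B.xml"));
        System.out.println(decodeAsPath("/db/A%20B.xml"));
    }
}
```

If a database path is run through the query-style decoder, a literal "+" in a resource name is lost, which matches the symptom reported in this issue.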
@joewiz If you would like to contribute some XQuery tests for |
@duncdrum Done. @adamretter Very promising theory! I had been wondering about I actually started work on an xqsuite module for this, but for some reason I was unable to execute the tests (#1827). Here's the test as I left it when I encountered the problems running any xqsuite test:
xquery version "3.1";
module namespace ru="http://exist-db.org/test/resource-uris";
declare namespace test="http://exist-db.org/xquery/xqsuite";
declare
    %test:args("A+B")
    %test:assertEquals("A+B")
function ru:decode-plus-character($string as xs:string) {
    xmldb:decode($string)
};
declare
    %test:args("A+B")
    %test:assertEquals("A+B")
function ru:decode-uri-plus-character($uri as xs:anyURI) {
    xmldb:decode-uri($uri)
}; |
@adamretter It looks like @shabanovd applied a similar fix to the Java admin client in #1344. I notice that in places like https://github.com/eXist-db/exist/pull/1344/files#diff-14cc12330c6d6445bb390d1229576bb1L1426 he stripped off |
@joewiz As eXide 2.4.5 was released recently, I could test it: when I use eXide to create a resource/collection with a + sign in its name, that works, i.e. the collection browser shows it too. From eXide I can delete it, but not browse it. Looking again, eXide sends all data as query strings in HTTP GET requests, so my guess is that the difference between creating/deleting and browsing is on the server side. |
Decoding query strings is not idempotent: %2b → + → space. Very likely this should leave "+" as is? |
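The chain %2b → + → space can be reproduced with a short Java sketch. It uses java.net.URLDecoder as a stand-in for any query-style decoder; DoubleDecodeDemo and decodeTimes are hypothetical names, and nothing here is taken from eXist's code.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of eXist-db.
final class DoubleDecodeDemo {

    // Applies query-style decoding the given number of times.
    static String decodeTimes(String s, int times) {
        String out = s;
        for (int i = 0; i < times; i++) {
            out = URLDecoder.decode(out, StandardCharsets.UTF_8);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(decodeTimes("A%2bB", 1)); // first pass yields the plus sign
        System.out.println(decodeTimes("A%2bB", 2)); // second pass turns it into a space
    }
}
```

A value that passes through such a decoder once on the client side and once on the server side therefore ends up with a space where the user typed "+".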
@hungerburg Thank you, but for eXide-specific comments/observations could you please add to eXist-db/eXide#185 (or always feel free to start a new issue if that one doesn't cover it)? This issue is strongest if it stays focused on the problem with the eXist core. |
I hope this comment doesn't distract from the original issue's focus on the
1. Imagine that we are building an application that allows users to supply the name of a new resource or collection (say, in a web-based collection browser). To implement this we naturally look to the xmldb module, whose functions typically take a "collection URI" which "can be specified either as a simple collection path or an XMLDB URI". We wonder: does it make a difference whether we pass the name supplied by the user directly to the xmldb functions for creating the resource/collection, or should we first pass this input through a URI-encoding function like xmldb:encode-uri? Is there a best practice or fool-proof method for taking a name and creating a resource with this name?
1a. Let's say the user wants to create a collection, "A加B", whose name happens to contain a Chinese character. If you create this collection by passing this "raw input" to the function, via
1b. Let's say the user wants to create a collection, "A+B", "A=B", or "A@B". Unlike case 1a, xmldb:create-collection creates different resources depending on whether you give it raw or encoded strings. Without pre-encoding the names, the function creates collections with the raw, original values. But if you pre-encode the names, the result is a second set of resources: "/db/A%2BB", "/db/A%3DB", or "/db/A%40B", respectively.
1c. Let's say the user wants to create a collection, "My Project 2018" (note the space characters). Unlike cases 1a and 1b, xmldb:create-collection raises an error when you pass it a value that contains a space. The only way to create this collection is to pre-encode the name.
1d. Let's say the user wants to create a collection, "A/B" (note the slash). Unlike cases 1a, 1b, and 1c, xmldb:create-collection actually creates two collections! It creates collection "A" and a sub-collection "B". Now, slashes in resource names are generally a bad idea, but eXist allows them through without raising an error, whether you pre-encode the collection name or not.
In conclusion, these cases suggest that eXist sometimes silently performs URI-encoding and sometimes doesn't. Depending on the characters used, pre-encoding a collection name may produce identical or different results, and not pre-encoding may raise errors or produce unexpected outcomes. These differences make it difficult to create applications that handle characters consistently, and they confuse users who run into the resulting inconsistencies or errors.
Lastly, an observation: of all of eXist's interfaces, I am surprised to say that WebDAV is by far the best. Its handling of user-provided resource and collection names is consistent and predictable. WebDAV clients let users create resources using the characters they can type, and see and interact with the resources as they created them. Behind my probing here is my desire to bring this level of consistency to eXide. I'd also like to see this consistency across all these interfaces, so users who upload a document via WebDAV can find it and rename it with the Java admin client and edit it with eXide - with no 404s or mangled characters. |
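One way to picture a single, predictable rule for cases 1a to 1d is a strict encoder for one path segment: the name is converted to UTF-8 and every byte outside RFC 3986's unreserved set is percent-encoded. This is only a sketch of one possible policy. SegmentEncoder and encodeSegment are hypothetical names, and this is not a description of what xmldb:encode-uri does.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; one possible encoding policy, not eXist's own.
final class SegmentEncoder {

    // RFC 3986 "unreserved" characters, which never need escaping.
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    // Percent-encodes every UTF-8 byte of the name that is not unreserved.
    static String encodeSegment(String name) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80 && UNRESERVED.indexOf((char) v) >= 0) {
                sb.append((char) v);
            } else {
                sb.append('%').append(String.format("%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The four kinds of names from cases 1a to 1d (\u52A0 is the Chinese character in 1a).
        String[] names = {"A\u52A0B", "A+B", "My Project 2018", "A/B"};
        for (String name : names) {
            System.out.println(encodeSegment(name));
        }
    }
}
```

Under this policy a slash inside a name becomes %2F instead of splitting the name into two collections, and a space becomes %20. A laxer policy could leave "+", "=", and "@" literal, since RFC 3986 permits them in path segments; the point of the sketch is only that the same rule is applied to every character.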
the webdav extension fully relies on the encoding/decoding feature of the 3rd party library |
I've just been re-reading @shabanovd's comments about this issue of resource names in past threads. What he says suggests that perhaps we shouldn't be looking one character at a time (like
So if eXist already stores paths as UTF-8 strings, why are we still encoding and decoding just to address resources within eXist? We should just be able to use any UTF-8 character that is compatible with http://tools.ietf.org/html/rfc3986, right?
So what does everyone say, shall we fix this, and end the end-user headaches for once and for all? Similarly, @hungerburg advocates for a wholesale review of this subsystem:
If it's ok, I'll add this to our agenda for the next community call. It would be great to have some discussion and settle on a path to a solution. |
From the notes of the April 23, 2018 eXist-db Community Call:
|
See now also the thread on exist-open at https://exist-open.markmail.org/thread/3wnkr5amss7r456g. Tested with eXist-db Version 5.3.0-SNAPSHOT under MacOS Catalina. Details:
|
See also #3795 |
What is the problem
The + character is a reserved character according to RFC3986, and as such should be "protected from normalization." eXist's xmldb module's functions that deal with database URIs incorrectly normalize this character, treating it as if it were a percent-encoded octet, i.e., treating the + character in an xmldb URI as a character that should be decoded and turned into a space character.
This causes a problem in XQuery applications like eXide where the application must be able to reliably display the URI-decoded form of a resource (its "name"), while also reliably addressing its URI (as a "key") for resource operations like delete and rename.
For example, given a resource in the database created via xmldb:store("/db", "A+B.xml", <foo/>), we can successfully address this document via doc("/db/A+B.xml"), but xmldb:decode("A+B.xml") returns A B.xml instead of A+B.xml.
While this issue is about handling of the + character, it is also about a larger issue: users have reported unpredictable behavior involving characters in resource names, and would benefit from a clear and straightforward description of eXist's xmldb URI scheme, explaining how characters in resource names are treated, and placed in the eXist documentation. This statement should cite any relevant specs/standards for URI encoding that eXist adheres to, and where (if at all) eXist diverges from these standards.
Clarity about eXist's xmldb URI scheme would also let us achieve greater consistency in eXist's various interfaces and built-in apps/tools. For example, while using WebDAV clients (e.g., oXygen and Transmit) to create collections like tést or 你好 causes the resulting collection list to display tést and 你好, doing the same in the Java admin client causes the collection list to display t%C3%A9st and %E4%BD%A0%E5%A5%BD. This suggests that the Java admin client is performing the correct encoding of the resource name into a URI when it is stored/created, but it is failing to decode the resource URI back into its name when displaying it.
But for now, let's focus on the treatment of + in eXist's xmldb URI-related functions.
What did you expect
According to the function documentation for xmldb:decode and xmldb:decode-uri (see http://exist-db.org/exist/apps/fundocs/view.html?uri=http://exist-db.org/xquery/xmldb#decode.1), these functions do the following to the supplied input:
Since A+B contains no percent-encoded octets, calling these decode functions should return the original value, A+B. Instead, eXist returns A B.
Let's probe the more restrictive function, xmldb:decode-uri(), which takes an xs:anyURI. XQuery's definition of xs:anyURI can be found at https://www.w3.org/TR/xpath-datamodel-31/#namespace-names, which references https://www.w3.org/TR/xmlschema-2/#anyURI and https://www.w3.org/TR/xmlschema11-2/#anyURI; the former cites https://tools.ietf.org/html/rfc3986#section-2.2, which defines + as a reserved character:
I take this to mean that + should not be encoded or decoded; it should be protected from normalization.
Describe how to reproduce or add a test
1. Simple test:
The following query should return true() x2, but instead returns false() x2.
2. Complex test:
eXist-db-4.1.0-SNAPSHOT+201804130403.dmg
/db/test and upload the file to eXist via WebDAV
xmldb:get-child-resources("/db/test") on the collection containing the file will return the following result: "eXist-db-4.1.0-SNAPSHOT+201804130403.dmg"
xmldb:get-child-resources("/db/test") ! xmldb:decode-uri(.) will return: "eXist-db-4.1.0-SNAPSHOT 201804130403.dmg"
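The behaviour this issue expects from the decode functions can be sketched in Java as a decoder that resolves only percent-encoded octets and leaves every other character, including "+", untouched. This is an illustration under stated assumptions, not eXist's implementation: PercentDecoder is a hypothetical name, and rejecting malformed escapes with an exception is a choice made for the sketch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; illustrates the expected behaviour, not eXist's code.
final class PercentDecoder {

    // Decodes %XX escapes as UTF-8 bytes; all other characters pass through.
    static String decode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            if (s.charAt(i) == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("invalid escape at index " + i);
                }
                out.write((hi << 4) | lo);
                i += 3;
            } else {
                // Copy the literal character (as UTF-8) without interpreting it.
                int cp = s.codePointAt(i);
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                out.write(utf8, 0, utf8.length);
                i += Character.charCount(cp);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("eXist-db-4.1.0-SNAPSHOT+201804130403.dmg"));
        System.out.println(decode("A%2BB"));
    }
}
```

With a decoder of this shape, the file name from the complex test comes back unchanged, while names such as t%C3%A9st still decode to their displayable form.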
Context information