In various places related to expression evaluation, we say that if the collection attribute is true, then the context item is undefined and all of the documents that appear on the port providing the context item are available in the default collection.
What we don't say is that if two or more documents that appear on that port have the same document-uri(), only one of them can appear in the collection. And we have tests in the test suite that rely on violating this constraint. The constraint isn't ours, it's in XPath:
For every document node D that is in the target of a mapping in available collections, or that is the root of a tree containing such a node, the document-uri property of D must either be absent, or must be a URI U such that available documents contains a mapping from U to D.
That's not the clearest prose in the world, but I think it establishes that there is a single mapping from U to D, so you can't have more than one document with the same document-uri() (or, consequently, the same document more than once).
I expect we should clarify this in an erratum.
The question is: should it be an error to attempt to construct a collection that contains two documents with the same document-uri(), or should we say that the implementation must avoid this by including only one such document? I think the latter is better, since the user may have no obvious way to fix the error. But it does potentially change the behavior of existing pipelines.
A "filter out duplicates" step might be something to add.
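To make the two options concrete, here is a minimal sketch in Python of how a collection could be built from the documents on a port. Everything in it is hypothetical: `Doc`, `build_collection`, and the `on_duplicate` parameter are illustrative names, not part of XProc or any implementation, and keeping the first document for each URI is just one possible policy. Documents whose document-uri() is absent are always kept, since the XPath constraint only applies when the property is present.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a document on a port."""
    uri: Optional[str]  # models document-uri(); None means the property is absent
    content: str


def build_collection(docs: Iterable[Doc], on_duplicate: str = "keep-first") -> List[Doc]:
    """Build a default collection with at most one document per URI.

    on_duplicate="keep-first": silently drop later documents with a repeated URI.
    on_duplicate="error":      raise ValueError on the first repeated URI.
    """
    if on_duplicate not in ("keep-first", "error"):
        raise ValueError(f"unknown policy: {on_duplicate}")

    seen = set()
    result = []
    for doc in docs:
        if doc.uri is None:
            # No document-uri, so the uniqueness constraint does not apply.
            result.append(doc)
            continue
        if doc.uri in seen:
            if on_duplicate == "error":
                raise ValueError(f"duplicate document-uri in collection: {doc.uri}")
            continue  # keep-first: skip the later duplicate
        seen.add(doc.uri)
        result.append(doc)
    return result
```

With the "keep-first" policy the pipeline keeps running but may silently see fewer documents than appeared on the port, which is the behavior change mentioned above; with "error" the user is told about the clash but may have no easy way to resolve it.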
ndw changed the title from "Collections cannot contain duplicates" to "Collections must not contain duplicates" on Apr 29, 2024
All hope is not lost. Mike observes that the definition of fn:collection says explicitly:
There is no requirement that any nodes in the result should be in document order, nor is there a requirement that the result should contain no duplicates.
So maybe my reading of the definition of "available collections" was too narrow.