Collections must not contain duplicates #1094

Open
ndw opened this issue Apr 29, 2024 · 1 comment

ndw commented Apr 29, 2024

In various places related to expression evaluation, we say that if the collection attribute is true, then the context item is undefined and all of the documents that appear on the port providing the context item are available in the default collection.

What we don't say is that if two or more documents that appear on that port have the same document-uri(), only one of them can appear in the collection. And we have tests in the test suite that rely on violating this constraint. The constraint isn't ours; it's in XPath:

For every document node D that is in the target of a mapping in available collections, or that is the root of a tree containing such a node, the document-uri property of D must either be absent, or must be a URI U such that available documents contains a mapping from U to D.

That's not the clearest prose in the world, but I think it establishes that there is a single mapping from U to D, so you can't have more than one document with the same document-uri() (or, consequently, the same document more than once).

I expect we should clarify this in an erratum.

The question is: should it be an error to attempt to construct a collection that contains two documents with the same document-uri(), or should we say that the implementation must avoid this by including only one such document? I think the latter is better, since the user may have no obvious way to fix the error. But it does potentially change the behavior of existing pipelines.

A "filter out duplicates" step might be something to add.
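The two options raised above (reject the collection, or silently keep only one document per document-uri()) can be sketched in Python. This is a hypothetical model, not an XProc or XPath API: `Doc`, `build_collection`, the `policy` parameter, and `DuplicateDocumentUri` are invented for illustration, and the "first document wins" rule under deduplication is an assumption, since the issue doesn't say which duplicate should survive. Documents whose document-uri() is absent are always kept, because the quoted XPath constraint only applies when the property is present.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a document appearing on the context port."""
    document_uri: Optional[str]  # None models an absent document-uri()
    content: str


class DuplicateDocumentUri(Exception):
    """Hypothetical error raised under the 'error' policy."""


def build_collection(docs: Iterable[Doc], policy: str = "dedupe") -> List[Doc]:
    """Build a default collection in which no two documents share a document-uri().

    policy="dedupe": keep the first document seen for each URI (an assumption)
                     and drop later ones.
    policy="error":  raise as soon as a URI repeats.
    Documents with an absent document-uri() are always kept.
    """
    if policy not in ("dedupe", "error"):
        raise ValueError(f"unknown policy: {policy!r}")
    seen = set()
    result: List[Doc] = []
    for doc in docs:
        uri = doc.document_uri
        if uri is not None:
            if uri in seen:
                if policy == "error":
                    raise DuplicateDocumentUri(uri)
                continue  # dedupe: skip the later document with the same URI
            seen.add(uri)
        result.append(doc)
    return result


# Usage: two documents on the port share file:///a.xml.
port = [
    Doc("file:///a.xml", "first a"),
    Doc("file:///b.xml", "b"),
    Doc("file:///a.xml", "second a"),
    Doc(None, "no uri"),
]
print([d.content for d in build_collection(port)])
# ['first a', 'b', 'no uri']
```

With `policy="error"` the same input raises `DuplicateDocumentUri("file:///a.xml")`, which illustrates the concern above: the pipeline author may have no obvious way to fix the input. The sketch preserves input order, although the fn:collection definition does not require any particular order.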

ndw changed the title from "Collections cannot contain duplicates" to "Collections must not contain duplicates" on Apr 29, 2024

ndw commented Apr 30, 2024

All hope is not lost. Mike observes that the definition of fn:collection says explicitly:

There is no requirement that any nodes in the result should be in document order, nor is there a requirement that the result should contain no duplicates.

So maybe my reading of the definition of "available collections" was too narrow.
