Fix reduce view row collation with unicode equivalent keys #3783
@jcoglan, what do you think of this PR?
What behaviour does this produce for the examples in #3773? In that issue we have two stored view rows whose keys have different bytes, representing different codepoints, but which can be considered equal under Unicode normalisation.
When these keys are retrieved from the same shard, they're considered equal when reducing, but not when they're fetched from different shards. This meant that:
What behaviour does this patch give for these scenarios? Would it be possible to write tests for them?
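To make the #3773 scenario concrete, here is a minimal Python illustration. It assumes the two keys are the precomposed `é` (U+00E9) and `e` followed by a combining acute accent (U+0065 U+0301); the actual keys in the issue may differ. It only shows that such keys differ byte-for-byte yet are canonically equivalent.

```python
import unicodedata

# Two hypothetical view keys: precomposed "é" vs "e" + combining acute accent.
key_a = "\u00e9"
key_b = "e\u0301"

# Different codepoints, and therefore different UTF-8 bytes...
bytes_differ = key_a.encode("utf-8") != key_b.encode("utf-8")

# ...yet equal once both are normalised to NFC.
nfc_equal = unicodedata.normalize("NFC", key_a) == unicodedata.normalize("NFC", key_b)

print(bytes_differ, nfc_equal)  # True True
```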
For clarity, I'm referring to the queries performed by these requests:
It seems to me that we want either:
And we want these results to be independent of
It looks like #3773 (comment) is consistent with the first case I mentioned above. Do we mind that the results contain unnormalised keys? In @nickva's example:
The first result has
CouchDB currently doesn't normalize JSON keys in views: not when updating the view, and not the start/end keys or key dicts when querying. Perhaps we should, but I think that's a larger decision, as it would involve compatibility with existing views. CouchDB relies on Unicode comparisons only.
@nickva I think you're right: normalising keys in output could be a significant behaviour change, so it's something that requires more thought and planning rather than something to roll into this fix 👍
Previously, view reduce collation with keys relied on the keys in the rows returned from the view shards exactly matching (=:=) the keys specified in the args. However, when multiple rows compare equal under the Unicode collator, that may not be the case. So when rows are fetched from the row dict by key, they should be matched using the same collation algorithm as the one used on the view shards.
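The lookup change described in this commit message can be sketched in Python. This is not the actual CouchDB code (which is Erlang): `collate_eq`, `fetch_exact` and `fetch_collated` are hypothetical names, and NFC-normalised equality stands in for the Unicode collator the view shards use. That stand-in is an assumption that only models canonically equivalent strings, not full collation.

```python
import unicodedata
from typing import Any, Callable, Dict, Optional

def collate_eq(a: str, b: str) -> bool:
    # Hypothetical stand-in for the shard-side Unicode collator: treat keys as
    # equal when their NFC normal forms match. Real collation is much richer.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def fetch_exact(row_dict: Dict[str, Any], key: str) -> Optional[Any]:
    # Old behaviour: the stored key must be identical to the requested key,
    # much like Erlang's =:= comparison.
    return row_dict.get(key)

def fetch_collated(row_dict: Dict[str, Any], key: str,
                   eq: Callable[[str, str], bool] = collate_eq) -> Optional[Any]:
    # New behaviour (sketch): return the first stored row whose key compares
    # equal to the requested key under the same collation the shards use.
    for stored_key, row in row_dict.items():
        if eq(stored_key, key):
            return row
    return None

# A shard returned the row under the decomposed form "e" + U+0301...
row_dict = {"e\u0301": {"value": 2}}
# ...while the query args asked for the precomposed U+00E9.
requested = "\u00e9"

print(fetch_exact(row_dict, requested))              # None
print(fetch_collated(row_dict, requested)["value"])  # 2
```

The linear scan keeps the sketch short. A real implementation would probably avoid scanning every row for each requested key, but the point here is only that the match uses the collator's notion of equality rather than exact term equality.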