GEODE-9004: Fix issues in queries targeting a Map field#6200
GEODE-9004: Fix issues in queries targeting a Map field#6200albertogpz wants to merge 4 commits intoapache:developfrom
Conversation
Queries in which map fields are involved
in the condition, sometimes do not return the same entries
if map indexes are used, compared to when map indexes
are not used.
Diferences were found when the condition on the
map field was of type '!=' and also when the condition
on the map field was comparing the value with null.
Example1:
Running a query with the following condition:
positions['SUN'] = null
in a region with the following entries:
entry1.positions=null
entry2.positions={'a'=>'b'}
entry3.positions={'SUN'=>null}
- will return entry1 and entry3 if there are no
indexes defined.
- will return no entries if the following index is defined:
positions['SUN','c']
- will return entry3 if the following index is defined:
positions[*]
Example2:
Running a query with the following condition:
positions['SUN'] != '3'
in a region with the following entries:
entry1.positions=null
entry2.positions={'a'=>'b'}
entry3.positions={'SUN'=>'4'}
entry4.positions={'SUN'=>'3'}
entry5.positions={'SUN'=>null}
- will return all entries except for entry4 if:
there are no indexes defined.
- will return entry3 if the following index is defined:
positions['SUN','c']
- will return entry3 and entry5 if the following index is defined:
positions[*]
In order to have the same results for these
queries no matter if indexes are used,
the following changes have been made:
- When running queries, the result of getting the
value from a Map that does not contain the key used
to access the value will be
considered UNDEFINED instead of null.
This will imply that queries where the condition is
positions['SUN'] = null will not return entries
that do not have the 'SUN' key in the positions Map.
- Queries with map indexes will not be able to use
the indexes if the query is of type '!=' against
any of the indexed map fields.
Example:
for index positions['SUN', 'ERIC'] or
positions[*]
the following query will not use indexes:
positions['SUN'] != '44'
These changes must be documented accordingly
in the corresponding indexes section.
|
The PR description says that "These changes must be documented accordingly in the corresponding indexes section." Is there a GEODE ticket for this docs work? If not, one should be created, or the docs changes included as part of this PR. |
I agree, Donal. My plan is to update the documentation as part of this PR if the solution is accepted. Given that the outcome of some queries without indexes change and that indexes will not be used with queries using "!=" with Map indexes, I want to have first the blessing from the community and only then update the documentation. |
|
I'm not a query expert, so I don't understand what the reasoning is behind map indexes no longer being used for queries using |
With the current implementation, map indexes are used (if defined) in queries that have a condition using "!=" against a value in the map field indexed. The problem is that these queries offer different outcome if the index is defined vs when the index is not defined. |
DonalEvans
left a comment
There was a problem hiding this comment.
A few requests for code clean-up/clarifying comments. The content of the changes looks okay to me, but the conceptual changes need someone with a better understanding of the expected behaviour of queries and indexes.
| p6.positions.put("ERIC", "hey"); | ||
| region.put(6, p6); | ||
|
|
||
| // One more with null map |
There was a problem hiding this comment.
Is it necessary to have two region entries with null maps? p2 also has the same null positions field.
There was a problem hiding this comment.
Not really. I have changed a bit some of the entries to cover more cases and removed having two region entries with null maps.
| assertThat(keys).isEqualTo(4); | ||
| assertThat(mapIndexKeys).isEqualTo(0); | ||
| assertThat(values).isEqualTo(7); | ||
| assertThat(uses).isEqualTo(4); |
There was a problem hiding this comment.
It's not immediately obvious where these values are coming from. Woudl it be possible to add a comment explaining why they are what they are?
There was a problem hiding this comment.
I have added comments about these and the other. Nevertheless, there is no documentation on them so I am not 100% sure that they are working correctly. Anyhow, it is not something changed by this PR.
| assertThat(keys).isEqualTo(3); | ||
| assertThat(mapIndexKeys).isEqualTo(1); | ||
| assertThat(values).isEqualTo(3); | ||
| assertThat(uses).isEqualTo(2); |
There was a problem hiding this comment.
See above comment about it being unclear where these values are coming from.
| assertThat(keys).isEqualTo(5); | ||
| assertThat(mapIndexKeys).isEqualTo(4); | ||
| assertThat(values).isEqualTo(5); | ||
| assertThat(uses).isEqualTo(2); |
There was a problem hiding this comment.
See above comment about it being unclear where these values are coming from.
|
|
||
| @Test | ||
| public void testQueriesForValueInMapFieldWithoutIndex() throws Exception { | ||
| region = |
There was a problem hiding this comment.
Multiple warnings in this file can be cleaned up by making region a Region<Object, Object> where it's declared.
ea1b37f to
9e2c7a6
Compare
…exes and added comments in test cases
9e2c7a6 to
1784eac
Compare
| // positions map takes for each key (null not included) | ||
| // for each entry in the region: | ||
| // "something", null, "nothing", "more", "hey", "tip" | ||
| assertThat(keys).isEqualTo(7); |
There was a problem hiding this comment.
This value is now in disagreement with the comment above, as the list of values includes only 6 values. Either the explanation in the comment needs to be changed, or the value being returned here is incorrect and there is a bug somewhere. Also, it is a little confusing in terms of what "null not included" is referring to, since the list of values includes null.
| // The number of values must be equal to the number of values the | ||
| // positions map takes for each key (null not included) | ||
| // for each entry in the region: | ||
| // "something", null, "nothing", "more", "empty", "hey", "more", "tip" | ||
| assertThat(values).isEqualTo(8); |
There was a problem hiding this comment.
This comment is a little confusing in terms of what "null not included" is referring to, since the list of values includes null.
|
|
||
| // Both RangeIndex should be used | ||
| assertEquals(100 /* Execution time */, keyIndexStats.getTotalUses()); | ||
| // Index should not be used |
There was a problem hiding this comment.
Could this comment (and others in this class) be expanded slightly to explain why the index should not be used here? I am assuming it's because the query uses !=, but it would be good to make it explicit.
DonalEvans
left a comment
There was a problem hiding this comment.
Approved with the caveat that someone with deeper knowledge of queries and indexes should sign off on this, as it represents a potentially significant change in behaviour.
|
We had a small impromtu discussion with the query engine team and some research on how NoSQL treats null (or nil in some worlds)
Hence we think that the current behavior i.e. querying without indexes is the right result and we should try to make the indexes return the same result. |
|
@albertogpz Here is what I believe should be happening.
Please let me know if you have any questions on the expected behavior. The overall behavior of the query on map should be in-line with non-map fields. The usage of index should not be avoided; unless the results are inconsistent with non-index query results. Instead of avoiding/blocking the use of index, it will be good to address the issues with indexed queries and make the behavior consistent. Can you confirm above requirements are met in this PR. |
agingade
left a comment
There was a problem hiding this comment.
Can you please respond to my comment. For that reason I am setting this PR as request changes.
|
@agingade
This is what I tried to fix with this PR and the test new cases now show that results are the same with and without indexes while prior to the PR, they returned different results.
So, you mean if positions is not null but it only contains a mapping for the 'SUN" key, positions["ERIC"] should return null instead of UNDEFINED?
ok. That should not have been changed.
Currently if you have an entry for which positions has the following mappings: {"SUN" => null} and another entry for which positions has the following mapping: {"ERICSSON" => "3"} querying for positions["SUN"] = null will return both entries.
I tried to support the use of indexes in all queries with my PR but I found no way to do it with != queries using an index of type: positions[*] (index on all keys) or positions["SUN", "ERICSSON"].
|
@nabarunnag Thanks for your comments. If @agingade agrees (I will check his answers after having provided mine to his comments) I will look into this possibility. |
@nabarunnag and @agingade : I have created a new pull request with a solution in which I keep the current behavior of the queries without indexes: #6279 |
Queries in which map fields are involved
in the condition, sometimes do not return the same entries
if map indexes are used, compared to when map indexes
are not used.
Differences were found when the condition on the
map field was of type '!=' and also when the condition
on the map field was comparing the value with null.
Example1:
Running a query with the following condition:
positions['SUN'] = null
in a region with the following entries:
entry1.positions=null
entry2.positions={'a'=>'b'}
entry3.positions={'SUN'=>null}
indexes defined.
positions['SUN','c']
positions[*]
Example2:
Running a query with the following condition:
positions['SUN'] != '3'
in a region with the following entries:
entry1.positions=null
entry2.positions={'a'=>'b'}
entry3.positions={'SUN'=>'4'}
entry4.positions={'SUN'=>'3'}
entry5.positions={'SUN'=>null}
there are no indexes defined.
positions['SUN','c']
positions[*]
In order to have the same results for these
queries no matter if indexes are used,
the following changes have been made:
When running queries, the result of getting the
value from a Map that does not contain the key used
to access the value will be
considered UNDEFINED instead of null.
This will imply that queries where the condition is
positions['SUN'] = null will not return entries
that do not have the 'SUN' key in the positions Map.
Queries with map indexes will not be able to use
the indexes if the query is of type '!=' against
any of the indexed map fields.
Example:
for index positions['SUN', 'ERIC'] or
positions[*]
the following query will not use indexes:
positions['SUN'] != '44'
These changes must be documented accordingly
in the corresponding indexes section.
Thank you for submitting a contribution to Apache Geode.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
Has your PR been rebased against the latest commit within the target branch (typically
develop)?Is your initial contribution a single, squashed commit?
Does
gradlew buildrun cleanly?Have you written or updated unit tests to verify your changes?
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
Note:
Please ensure that once the PR is submitted, check Concourse for build issues and
submit an update to your PR as soon as possible. If you need help, please send an
email to dev@geode.apache.org.