Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include/Exclude Filtering Behavior #4491

Closed
RobCherry opened this issue Dec 17, 2013 · 8 comments
Closed

Include/Exclude Filtering Behavior #4491

RobCherry opened this issue Dec 17, 2013 · 8 comments
Assignees

Comments

@RobCherry
Copy link
Contributor

Currently there is a bug in elasticsearch (#4047) where empty objects are not stored in _source when an include/exclude list is present.

This is because elasticsearch aggressively removes empty objects from the _source.

For example, if I have an object

{ 'data': { 'key': 'value' } }

the following filters will all result in removing data from _source: excludes = ['data'], excludes = ['data.*'], excludes = ['data.key'], excludes = ['*.key'], includes = ['data.other']

I believe this behavior is incorrect. I think that we should only remove an object if it is explicitly excluded (excludes = ['data'], excludes = ['data.*']) or if no elements are included (includes = ['other_data.*']). For situations where the object is referenced in an includes list but there is no match, I think the object should remain as an empty object (includes = ['data.other']`).

This use case makes more sense if we are talking about some nested object that is indexed...

Example:

{
  "name": "John Doe",
  "identifiers": {
    "ssn": "987-65-4320",
    "facebook_uid": "12345",
    "twitter_uid": "54321"
  }

and excludes = [ "*.ssn"] would drop the entire identifiers object if the only key was ssn for that object, even if we want the empty identifiers object to remain under all circumstances. We have the same problem with includes = ["*.instagram_uid"].

@bleskes
Copy link
Contributor

bleskes commented Dec 18, 2013

Thanks for opening it. One quick comment about:

We have the same problem with includes = ["*.instagram_uid"].

This is tricky - in your example it makes sense but the implications are that we'd have to leave all other objects intact - which will potentially result in a lot of empty objects (trees). At least for this case I'd use include [ "identifiers", "*.instagram_uid" .

@ghost ghost assigned bleskes Dec 18, 2013
@RobCherry
Copy link
Contributor Author

Fair enough. Using includes = [ "identifiers", "*.instagram_uid"] would be correct in that case.

bleskes pushed a commit to bleskes/elasticsearch that referenced this issue Jan 14, 2014
When excluding '*.f1' from `{ "obj": { "f1": 1, "f2": 2 } }` XContentMapValues.filter returns `{ "obj": { "f2": 2}}`. When run on `{ "obj": { "f1" : 1 }}` we should return `{ "obj": { }}` to maintain object structure. People currently need to always check whether `obj` is there or not.

Closes elastic#4715
Closes elastic#4047
Related to elastic#4491
bleskes pushed a commit that referenced this issue Jan 14, 2014
When excluding '*.f1' from `{ "obj": { "f1": 1, "f2": 2 } }` XContentMapValues.filter returns `{ "obj": { "f2": 2}}`. When run on `{ "obj": { "f1" : 1 }}` we should return `{ "obj": { }}` to maintain object structure. People currently need to always check whether `obj` is there or not.

Closes #4715
Closes #4047
Related to #4491
@bleskes
Copy link
Contributor

bleskes commented Jan 14, 2014

Hi Rob,

I've implemented a solution that is not exactly what you ask for but I feel it's simpler to reason & understand how it works. Effectively it makes excludes maintain document structure above the point of exclusion. You can see it here: #4715

I used part of the tests you wrote as a basis so I included your name in the commit to give you credit for the work.

Can you please check if you still feel this case is needed and if not close it?

Thx,
Boaz

RobCherry added a commit to RobCherry/elasticsearch that referenced this issue Jan 14, 2014
@RobCherry
Copy link
Contributor Author

Ignore that commit for now, it is not actually testing what I wanted to test. I will update it tomorrow.

@RobCherry
Copy link
Contributor Author

@bleskes

What you're describing as the proposed change

maintain document structure above the point of exclusion

seems like a great solution.

I was originally going to add some additional tests to ensure that document structure is maintained both with and without includes, but then I noticed that testNotOmittingObjectWithNestedExcludedObject was covering most of what I wanted.

RobCherry added a commit to RobCherry/elasticsearch that referenced this issue Jan 15, 2014
@bleskes
Copy link
Contributor

bleskes commented Jan 16, 2014

@RobCherry happy you like it. Can we close this issue then?

@RobCherry
Copy link
Contributor Author

Yes

@bleskes
Copy link
Contributor

bleskes commented Jan 16, 2014

cool. Once again, thx for your help?.

brusic pushed a commit to brusic/elasticsearch that referenced this issue Jan 19, 2014
When excluding '*.f1' from `{ "obj": { "f1": 1, "f2": 2 } }` XContentMapValues.filter returns `{ "obj": { "f2": 2}}`. When run on `{ "obj": { "f1" : 1 }}` we should return `{ "obj": { }}` to maintain object structure. People currently need to always check whether `obj` is there or not.

Closes elastic#4715
Closes elastic#4047
Related to elastic#4491
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants