Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw parsing exception if terms filter or query has more than one field #5137

Merged
merged 1 commit into from Oct 12, 2014

Conversation

imotov
Copy link
Contributor

@imotov imotov commented Feb 16, 2014

Closes #5014

@jpountz
Copy link
Contributor

jpountz commented Feb 17, 2014

The TermsFilter constructor that we use indeed only supports filtering on a single field but there is another constructor that takes a list of terms (instead of a list of BytesRefs) and supports filtering based on terms that come from different fields. Maybe we could use it?

@s1monw
Copy link
Contributor

s1monw commented Feb 18, 2014

+1 to fixing the terms filter parser to allow different fields!

@kimchy
Copy link
Member

kimchy commented Feb 18, 2014

Is it really a common use case, that justifies complicating the terms filter? compared to having a terms filter that is then OR'ed using the regular query DSL?

@s1monw
Copy link
Contributor

s1monw commented Feb 18, 2014

I don't think it complicates it at all. to me it just does what you would expect.

@kimchy
Copy link
Member

kimchy commented Feb 18, 2014

maybe we can see an example of how this would look DSL wise, and decide based on that. No problem with adding it if it makes sense.

@s1monw
Copy link
Contributor

s1monw commented Feb 18, 2014

to me it woudl just look like this:

{
    "constant_score" : {
        "filter" : {
            "terms" : { 
                   "user" : ["kimchy", "elasticsearch"],
                   "company" : ["elasticsearch"]
           }
        }
    }
}

@kimchy
Copy link
Member

kimchy commented Feb 18, 2014

That can work, though it will be the first filter that works across fields in ES, and I personally find the lack of boolean logic between fields strange (not very evident if its AND or OR). Would love to hear other people thoughts.

@dadoonet
Copy link
Member

Naive answer: writing filter like this makes me think that we want that both Terms filters to match.
So to me, it's a AND here.

@imotov
Copy link
Contributor Author

imotov commented Feb 19, 2014

We are handling boolean logic to some degree through execution mode. So, perhaps, we can extend it to the case of multiple fields. It might become messy though.

@s1monw
Copy link
Contributor

s1monw commented Feb 19, 2014

so to me the confusion is in the name. It's called terms_filter and a term is a (field, value) tuple. I don't see how multiple fields make anything more complex or messy. You have N terms and you wanna fitler on them nothing changes if you have multiple fields they are still N terms? Exec mode is still the same you have a set of terms that's it.

@imotov
Copy link
Contributor Author

imotov commented Feb 20, 2014

@s1monw it becomes more complicated because before it was one dimensional (multiple values). Now it's two dimensional: (multiple fields and multiple values). Your simplification (which is natural for you since it follows Lucene design) is only one possible way to interpret the query. It also doesn't fit the current query syntax nicely since you don't specify this query as:

{
    "constant_score" : {
        "filter" : {
            "terms" : [{
                "field": "user",
                "value": "kimchy"
            }, {
                "field": "user",
                "value": "elasticsearch"
            }, {
                "field": "company",
                "value": "elasticsearch"
            }]
        }
    }
}

@s1monw
Copy link
Contributor

s1monw commented Feb 20, 2014

fair enough - I thought my example was simple and consistent:

{
    "constant_score" : {
        "filter" : {
            "terms" : { 
                   "user" : ["kimchy", "elasticsearch"],
                   "company" : ["elasticsearch"]
           }
        }
    }
}

but apparently it isn't... I don't have strong feelings but it feels to me it's the wrong thing todo to throw an exception....

@lordaugustus
Copy link

A wildcard form like this
"tag.*" : ["sushi", "booyah"]
would be useful for people storing inner fields that are not predefined, like
"tag.en", "tag.de".
Maybe it is not a good schema design for multilingual data? I am not sure, but the proposed change would definitely be helpful in this case.

@clintongormley
Copy link

I'm torn on this. This is a frequent mistake that users make; they clearly expect something like this to work.

I agree completely with @kimchy that it is not obvious whether the logic is AND or OR. And I agree with @dadoonet that the most likely interpretation is as an AND, so:

{
    "constant_score" : {
        "filter" : {
            "terms" : { 
                   "user" : ["kimchy", "elasticsearch"],
                   "company" : ["elasticsearch"]
           }
        }
    }
}

... would be interpreted as:

user IN ["kimchy","elasticsearch"] AND company = 'elasticsearch'

But then there is no easy way to specify that the AND should be an OR. In the example of wildcarded field names, the user would expect an OR:

{ "terms": { "user.*": ["kimchy","elasticsearch"] }}

In fact, looking at the current terms query & filter, there is already room for parameter/fieldname clashes, eg:

{ "terms": {
    "execution": "bool",
    "field": ["values"..."]
}}

So you can't have a field called execution, _cache, _cache_key, _name or minimum_match. I'm OK with disallowing fields starting with _, but I'm not crazy about having execution and minimum_match at the same level as fieldname. In other clauses we handle this with, eg:

{ "regexp": {
    "field": {
        "value": "xx",
        "flags": "..."
}}

We could do:

{ "terms": {
    "fieldname": {
        "execution": "bool",
        "values": ["foo","bar"]
    },
    "other_fieldname": ["one","two"]
}}

But that still doesn't give us place to include the overall AND/OR logic for multiple fields, unless we add a top-level _operator flag. Of course, elsewhere in the DSL that flag is called operator, which makes this inconsistent.

One way to do it would be to have different filter names, eg

{ "any": { "user.*": ["kimchy", "elasticsearch"]}}
{ "all": {
    "user" : ["kimchy", "elasticsearch"],
    "company" : ["elasticsearch"]
}

(We're missing terms in the name, but any_terms is wrong because it is really terms_in_any_field)

And really, that's not very far from:

{ "and": [
    { "terms": { "user": ["kimchy", "elasticsearch" ]}},
    { "term": { "company": "elasticsearch" }}
]}

...but it doesn't handle wildcards in the field name.

So, lots of rumination without any solutions. What I dislike about all of these suggestions is that they complicate the terms filter and require the user to remember things (what's the default operator? etc)

My inclination would be to leave things as they are, except that we throw an exception if more than one field name is specified.

@clintongormley
Copy link

Revisiting this PR: changing the behaviour of the terms filter/query is beyond the scope of the current work, and this PR accurately reflects the situation as it is today. I think we should review and merge.

@GaelTadh
Copy link
Contributor

LGTM but also looks like this is the current behavior. Did this PR already get merged ? If so can it be closed ?

@s1monw
Copy link
Contributor

s1monw commented Oct 10, 2014

I agree with @GaelTadh lets get this in @imotov

@imotov imotov force-pushed the issue-5014-invalid-terms-filter branch from 11ec18d to 28a7d63 Compare October 12, 2014 20:02
@imotov imotov merged commit 28a7d63 into elastic:master Oct 12, 2014
@jpountz jpountz removed the review label Oct 21, 2014
@clintongormley clintongormley changed the title Throw parsing exception if terms filter or query has more than one field Query DSL: Throw parsing exception if terms filter or query has more than one field Nov 3, 2014
@clintongormley clintongormley changed the title Query DSL: Throw parsing exception if terms filter or query has more than one field Throw parsing exception if terms filter or query has more than one field Jun 7, 2015
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018
@imotov imotov deleted the issue-5014-invalid-terms-filter branch May 1, 2020 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories v1.4.0 v1.5.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filter terms without logical groups
8 participants