Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggester: Phrase suggest option to limit suggestions to exising phrases #3482

Closed
radev opened this issue Aug 12, 2013 · 26 comments
Closed

Suggester: Phrase suggest option to limit suggestions to exising phrases #3482

radev opened this issue Aug 12, 2013 · 26 comments
Assignees

Comments

@radev
Copy link

radev commented Aug 12, 2013

When using phrase suggest API to provide "Did you mean ?" corrections it would be nice to include only suggestions that would return results.

So returned phrase must exist at least in one document in the index.

@ghost ghost assigned s1monw Aug 12, 2013
@s1monw
Copy link
Contributor

s1monw commented Aug 12, 2013

Thanks for opening this. do you feel like attaching a pullrequest? I'd be happy to help you sketching out the functionality here!

@nik9000
Copy link
Member

nik9000 commented Aug 12, 2013

So I had a similar problem that came from filters: I was getting phrases that exist in the index but were filtered out on subsequent searches with the same filter set. I've since mostly worked around the problem by splitting my index along the most common filter. So if it isn't that much more work to get it to include filters that'd be great. If OTOH, you do something like only return suggestions that match an n-gram then by all means just do that and ignore me. I'd still use it and it'd probably be faster in the end.

@s1monw
Copy link
Contributor

s1monw commented Aug 12, 2013

folks, I don't think we can filter the process of drawing candidates etc. since performance will suffer badly. What I can imagine is to execute a match query with each suggestoin that is returned to make a decision to drop them or not. Yet, this will allow for filtering for sure but it will only be a helper to prune the result list. if this is ok for you guys I think we can certainly do that!

@s1monw
Copy link
Contributor

s1monw commented Aug 29, 2013

@nik9000 do you wanna take a look at this. I won't be able to do in the near future.

@nik9000
Copy link
Member

nik9000 commented Aug 29, 2013

I can take a look sometime in the next few days, yeah. We just deployed our elasticsearch software to a much larger group of users yesterday so I'm getting a bunch of high priority bugs that aren't (yet) this. What I'll do is file this issue as a medium priority one on my side and pick it up when I've cleared everything higher.

So what'd be most useful for me would be to have an api like this:

curl -XPOST 'localhost:9200/_search' -d {
  "suggest" : {
    "text" : "Xor the Got-Jewel",
    "simple_phrase" : {
      "phrase" : {
        "field" : "body",
        "size" : 5,
        "shard_size": 10,
        "confidence": 2.0,
        "filter_replace_string": "{}",
        "filter": {
          "bool" : {
                "must" : {
                    "query": { "match_phrase" : { "body" : "{}", "slop": 3 } }
                },
                "must_not" : {
                    "range" : {
                        "age" : { "from" : 10, "to" : 20 }
                    }
                },
                "should" : [
                    {
                        "term" : { "tag" : "sometag" }
                    },
                    {
                        "term" : { "tag" : "sometagtag" }
                    }
                ]
            }
        }
      }
    }
  }
}

I can see how this could be a performance problem but caching should kick in and help with most of the filters. Also, if you set your confidence nice and high you might not have to do this too many time.

And another thing, you'd have to crank up the shard_size value or you might end up filtering out all the suggestions that come from the shards.

@nik9000
Copy link
Member

nik9000 commented Sep 6, 2013

Now that I'm digging into this I like this API better

curl -XPOST 'localhost:9200/_search' -d {
  "suggest" : {
    "text" : "Xor the Got-Jewel",
    "simple_phrase" : {
      "phrase" : {
        "field" : "body",
        "size" : 5,
        "shard_size": 10,
        "confidence": 2.0,
        "filter": {
          "bool" : {
                "must_not" : {
                    "range" : {
                        "age" : { "from" : 10, "to" : 20 }
                    }
                },
                "should" : [
                    {
                        "term" : { "tag" : "sometag" }
                    },
                    {
                        "term" : { "tag" : "sometagtag" }
                    }
                ]
            }
        }
      }
    }
  }
}```
Internally I'd build a bool filter containing a phrase_match against the field against which we generate suggestions and the filter you passes in.  This is less fiddly to code and allows some simple syntax shortcuts:
```bash
curl -XPOST 'localhost:9200/_search' -d {
  "suggest" : {
    "text" : "Xor the Got-Jewel",
    "simple_phrase" : {
      "phrase" : {
        "field" : "body",
        "size" : 5,
        "shard_size": 10,
        "confidence": 2.0,
        "filter": "yes"
      }
    }
  }
}```
which would add the phrase_match and
```bash
curl -XPOST 'localhost:9200/twitter/_search?pretty=true' -d '
{
    "query" : {
        "term" : { "message" : "something" }
    },
    "filter" : {
        "term" : { "tag" : "green" }
    }
    "suggest" : {
        "text" : "Xor the Got-Jewel",
        "simple_phrase" : {
            "phrase" : {
                "field" : "body",
                "size" : 5,
                "shard_size": 10,
                "confidence": 2.0,
                "filter": "query"
            }
        }
    }
}'

which would go and get the filters from the top level query.

I'm still concerned about how slow this might be.

@nik9000
Copy link
Member

nik9000 commented Sep 14, 2013

So I finally have something for this that kinda works. It doesn't fully work and I'd like some guidance. I can't post the code right now because I'm traveling. Such is life. Any way, I'd like giluidance on two things:

  1. Since there is no good way to parse the request during the reduce phase I'm parsing what I need on the shards and sending it back. Is that right/normal?
  2. I can make queries during the reduce phase and block, waiting for them to return pretty easy. That seems to work on when I start the server but badly in tests, presumably because they have a lower bound on tr thread pool. Is it OK to make blocking calls in the reduce step? Anything else seems like a pretty invasive change for this feature which doesn't have a ton of traction.

nik9000 added a commit to nik9000/elasticsearch that referenced this issue Nov 4, 2013
This implementation has a bunch of problems that'll need to be worked
before it is a valid candidate for merging.  I don't have time to rebase
it right now but would still love the feedback on problem.  The ones I
remember:

1.  It performs the filtering by blocking the suggesting thread.
2.  Because there is no "exists" query type it uses a limit.  I now know
that isn't ass efficient as just using a count but it might be worth
implementing an exists query type for it any way.
3.  It feels like there are a lot of plumbing changes required for this
feature.  My guess is that is because I'm going about it wrong.  This
correlates with #1 pretty well.
4.  I have to wrap the filter through the map nodes and parse it during
the reduce step.  That feels silly.

Closes elastic#3482
@s1monw
Copy link
Contributor

s1monw commented Nov 13, 2013

moved over to 0.90.8 I will look at this soon as well

@s1monw s1monw added v1.2.0 and removed v1.1.0 labels Mar 20, 2014
@timbunce
Copy link

Any news on this, or the similar #2842?

@s1monw
Copy link
Contributor

s1monw commented Apr 25, 2014

I think now that we have templates we can implement this much simpler - I think we should revisit it soon, thanks for pinging @timbunce

@s1monw s1monw added v1.3.0 and removed v1.2.0 labels May 12, 2014
@brupm
Copy link

brupm commented Jun 20, 2014

👍

@brupm
Copy link

brupm commented Jun 27, 2014

This would be awesome to have. I ended up having to do it outside ES.

  1. Get the suggestions
  2. Perform another ES search to validate the existence
  3. Present only existing phases to users

@s1monw s1monw assigned areek and unassigned s1monw Jun 27, 2014
@clintongormley
Copy link

@areek something you could take a look at?

@areek
Copy link
Contributor

areek commented Jul 1, 2014

@clintongormley already started looking into it!

@areek
Copy link
Contributor

areek commented Jul 7, 2014

After looking into it, I have come up with the following API for the suggestion filter option:

curl -XPOST 'localhost:9200/_search' -d {
    "suggest": {
        "text": "Xor the Got-Jewel",
        "simple_phrase": {
            "phrase": {
                "field": "body",
                "size": 5,
                "shard_size": 10,
                "confidence": 2,
                "filter": {
                    "match": {
                        "body": "{{suggestion}}"
                    }
                }
            }
        }
    }
}

The filter option above is just a query template with the magic variable "suggestion", which will be populated once phrase suggestions are made.
For the first iteration, I intend to make local queries for the filter instead of hitting all the shards and also restrict the size of the suggestions to a maximum of 20, when filter option is used.

areek added a commit to areek/elasticsearch that referenced this issue Jul 7, 2014
…ching any documents for a given query

The newly added filter option will let the user provide a template query which will be executed for every
phrase suggestions generated to ensure that the suggestion matches at least one document for the query.
The filter query is only executed on the local node for now. When the new filter option is used, the
size of the suggestion is restricted to 20.

Closes elastic#3482
@clintongormley
Copy link

@areek Just a note: match is a query, not a filter.

@areek
Copy link
Contributor

areek commented Jul 8, 2014

@clintongormley Currently the filter param takes queries! The reason the name is filter, is to indicate that the query is used to filter out the suggestions after being generated

@nik9000
Copy link
Member

nik9000 commented Jul 8, 2014

Its confusing to take queries and call it filter, I think. Why not make it
take filters?

On Tue, Jul 8, 2014 at 10:26 AM, Areek Zillur notifications@github.com
wrote:

@clintongormley https://github.com/clintongormley Currently the filter
param takes queries! The reason the name is filter, is to indicate that the
query is used to filter out the suggestions after being generated


Reply to this email directly or view it on GitHub
#3482 (comment)
.

@clintongormley
Copy link

Agree with @nik9000 about confusing. filter means filters, not queries. In that case should be called query to be consistent with other APIs.

@areek
Copy link
Contributor

areek commented Jul 8, 2014

It does seem confusing, the updated API looks like the following:

curl -XPOST 'localhost:9200/_search' -d {
    "suggest": {
        "text": "Xor the Got-Jewel",
        "simple_phrase": {
            "phrase": {
                "field": "body",
                "size": 5,
                "shard_size": 10,
                "confidence": 2,
                "filter": {
                    "template": {
                        "body": "{{suggestion}}"
                    },
                    "preference": "_only_local"
                }
            }
        }
    }
}

The change was mainly due to adding the preference param in the filter object. Calling it query would also confuse users at least with respect to the reason for providing such an option in the suggest API. Thoughts?

@clintongormley
Copy link

@areek I assume that any query can be run there? it's not limited to just template queries? Also, if the user wants to use a template query with other parameters, I assume they can just specify params and the suggestion value will be added to any manually specified params?

Either way, filter is the wrong name for that section. Wherever we have filter or *_filter, it means that we are in "filter context" and only filter clauses can be accepted.

I'd be perfectly happy just renaming it to query, but if you want something slightly more descriptive then why not use post_query? We use post_filter in the top level of search requests to say: "this filter will be applied to the hits array after the query has been run and after the aggregations have been calculated". This functionality seems similar.

@s1monw
Copy link
Contributor

s1monw commented Jul 9, 2014

what about:

"collate" : {
  "filter|query" : { ... }   
  "preference": "_only_local"
}

then folks can pick if we should parse it as a query or not. The template engine doesn't care really it's just string replacements.... we can document it that it is passed through mustache.

@areek
Copy link
Contributor

areek commented Jul 9, 2014

@s1monw I like collate, I will go ahead and change it to that format and document the use of mustache under the covers.

@clintongormley all your assumptions are correct, except that the params support is not there, the reason for the template is to allow for the magic suggestion variable.

@clintongormley
Copy link

@areek i'm thinking that users might want to use a template query here. Couldn't you just check if there are params already defined, and merge suggestion variable into them?

@areek
Copy link
Contributor

areek commented Jul 10, 2014

@clintongormley I will end up doing that, thanks for suggesting.

@areek
Copy link
Contributor

areek commented Jul 10, 2014

So the updated API looks like the following:

curl -XPOST 'localhost:9200/_search' -d {
    "suggest": {
        "text": "Xor the Got-Jewel",
        "simple_phrase": {
            "phrase": {
                "field": "body",
                "size": 5,
                "shard_size": 10,
                "confidence": 2,
                "collate": {
                    "query": {
                        "{{field_name}}": "{{suggestion}}"
                    },
                    "preference": "_primary",
                    "params": {"field_name": "title"}
                }
            }
        }
    }
}

collate can also take a filter instead of query (as suggested by @s1monw) and optional params can be used, to inject the template query or filter with additional params (as suggested by @clintongormley).

@s1monw s1monw added v1.4.0 and removed v1.3.0 labels Jul 14, 2014
@areek areek closed this as completed in 7634389 Jul 14, 2014
@s1monw s1monw added v1.3.0 and removed v1.4.0 labels Jul 14, 2014
areek added a commit that referenced this issue Jul 14, 2014
The newly added collate option will let the user provide a template query/filter which will be executed for every phrase suggestions generated to ensure that the suggestion matches at least one document for the filter/query.
The user can also add routing preference `preference` to route the collate query/filter and additional `params` to inject into the collate template.

Closes #3482
@clintongormley clintongormley changed the title Phrase suggest option to limit suggestions to exising phrases Suggester: Phrase suggest option to limit suggestions to exising phrases Jul 16, 2014
@clintongormley clintongormley added the :Search/Suggesters "Did you mean" and suggestions as you type label Jun 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants