Phrase suggest direct generator possibly not obeying min_word_len 0.90 #3037

Closed
jtreher opened this issue May 14, 2013 · 10 comments

Comments

@jtreher

jtreher commented May 14, 2013

I ran into an issue where the phrase suggester does not seem to generate candidates for words shorter than the default min_word_len of four, even with min_word_len set to 0, 1, 2, or 3. When I run a term suggest, the term comes back as expected.

Here is a gist reproducing the issue:
https://gist.github.com/jtreher/5577747
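The filtering described above can be modeled in a few lines. This is a hypothetical sketch, not Elasticsearch or Lucene code: the names candidates, levenshtein, and DICTIONARY are illustrative, the four-word dictionary is made up, and it uses plain Levenshtein distance (the real generator may count transpositions differently). It only shows how min_word_len (default 4), prefix_len (default 1), and max_edits each narrow the candidates for "ice".

```python
# Hypothetical model of term-level candidate generation. Not Elasticsearch code.


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def candidates(term, dictionary, min_word_len=4, prefix_len=1, max_edits=2):
    """Return dictionary words that could replace `term` under the given bounds."""
    # Terms shorter than min_word_len are never corrected at all.
    if len(term) < min_word_len:
        return []
    prefix = term[:prefix_len]
    return sorted(
        word for word in dictionary
        if word != term                          # skip the input term itself
        and word.startswith(prefix)              # must share the first prefix_len chars
        and levenshtein(term, word) <= max_edits
    )


DICTIONARY = {"ice", "iced", "nice", "tea"}  # made-up index terms
print(candidates("ice", DICTIONARY))                                # []
print(candidates("ice", DICTIONARY, min_word_len=0))                # ['iced']
print(candidates("ice", DICTIONARY, min_word_len=0, prefix_len=0))  # ['iced', 'nice']
```

With the defaults, "ice" is too short to be corrected at all, which matches the reported symptom; lowering min_word_len alone still hides "nice", because prefix_len requires the first character to match.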

@clintongormley

The parameter is prefix_len, not prefix_leng:

@clintongormley

    curl -XPOST '127.0.0.1:9200/test/_suggest?pretty' -d'{
      "did_you_mean": {
        "text": "ice",
        "term": {
          "field": "name",
          "max_edits": 2,
          "suggest_mode": "always",
          "min_word_len": 0,
          "prefix_len": 0
        }
      }
    }'
    {
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "failed" : 0
      },
      "did_you_mean" : [ {
        "text" : "ice",
        "offset" : 0,
        "length" : 3,
        "options" : [ {
          "text" : "iced",
          "score" : 0.6666666,
          "freq" : 1
        } ]
      } ]
    }

@jtreher
Author

jtreher commented May 15, 2013

@clintongormley While I did have a typo in the term suggest, the phrase suggest example is valid and demonstrates the issue. Could you reopen?

To clarify: the gist demonstrates that the term suggest returns the term "iced", but I believe the phrase suggest's candidate generator never provides "iced" for the phrase suggest to consider, because of the word length.
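The phrase suggester can only score phrases built from the candidates its generators produce for each token, so a term the generator drops can never appear in a corrected phrase. A heavily simplified, hypothetical model of that combination step (no language-model scoring over the shingle field; phrase_candidates and the two generator functions are illustrative names, not Elasticsearch APIs):

```python
from itertools import product


def phrase_candidates(text, generate):
    """Combine per-token candidates into whole-phrase candidates.

    Each token keeps itself as an option, so a phrase can correct one
    word while leaving the others unchanged.
    """
    per_token = [[token] + generate(token) for token in text.split()]
    return [" ".join(combo) for combo in product(*per_token)]


def blocked_generator(token):
    # Models the reported behaviour: "ice" yields no candidates at all.
    return []


def working_generator(token):
    # Models the expected behaviour: "ice" yields "iced", as the term suggest does.
    return {"ice": ["iced"]}.get(token, [])


print(phrase_candidates("ice tea", blocked_generator))  # ['ice tea']
print(phrase_candidates("ice tea", working_generator))  # ['ice tea', 'iced tea']
```

However good the scoring is, "iced tea" cannot win if it is never in the candidate set.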

@clintongormley

@jtreher Sorry, I got that completely wrong. I'll reopen.

I'm seeing the same thing you're seeing.

@s1monw ?

ghost assigned s1monw May 15, 2013
@s1monw
Contributor

s1monw commented May 15, 2013

This seems to be a bug in the min_doc_freq smoothing. The good news is that it only happens if your query term has freq = 1 and the replacement also has freq = 1, so in practice this might not be an issue. I will have a fix soon; in the meantime, this should help:

    curl -XPOST 'localhost:9200/test/_suggest?pretty=true' -d '{
      "text": "ice tea",
      "did_you_mean": {
        "phrase": {
          "field": "name_shingled",
          "gram_size": 3,
          "direct_generator": [
            {
              "field": "name",
              "max_edits": 2,
              "suggest_mode": "always",
              "min_word_len": 0,
              "prefix_len": 0,
              "min_doc_freq": 1.0
            }
          ]
        }
      }
    }'

s1monw closed this as completed in 8235b89 May 15, 2013
s1monw added a commit that referenced this issue May 15, 2013
Using an automatically detected 'min_doc_freq' when the suggest mode is set to
'always' is counterintuitive. If we always suggest, ignore the frequency and
set the threshold frequency to 0 to allow all possible candidates to be drawn if
they are within the given bounds.

Closes #3037
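The commit message describes the fix as a simple rule. A hypothetical sketch of that rule: the function names, the strict greater-than comparison, and the auto-detected value of 1.0 are assumptions chosen for illustration, not Elasticsearch's implementation. The candidate frequency of 1 matches the freq reported for "iced" in the term suggest response above.

```python
# Hypothetical sketch of the rule in the commit message: with suggest_mode
# "always", ignore the auto-detected min_doc_freq and use 0 instead.
# Function names and the strict ">" comparison are assumptions.


def threshold_frequency(suggest_mode: str, auto_detected: float) -> float:
    """Return the doc-frequency threshold a candidate must exceed."""
    if suggest_mode == "always":
        return 0.0  # draw every candidate that is within the edit bounds
    return auto_detected  # other modes keep the automatically detected value


def draw_candidates(candidate_freqs: dict, suggest_mode: str, auto_detected: float) -> list:
    """Keep candidates whose document frequency exceeds the threshold."""
    threshold = threshold_frequency(suggest_mode, auto_detected)
    return sorted(term for term, df in candidate_freqs.items() if df > threshold)


freqs = {"iced": 1}  # "iced" appears in a single document
print(draw_candidates(freqs, "always", auto_detected=1.0))   # ['iced']
print(draw_candidates(freqs, "popular", auto_detected=1.0))  # []
```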
@jtreher
Author

jtreher commented May 15, 2013

@s1monw Is there any chance that max_term_freq is also not being obeyed with "always"? While this patch fixed my test case, I actually have a situation where "ice" appears thousands of times and "iced" several hundred times. I see "iced" come back from the term suggest, but it's as though the phrase suggest never gets it.

@s1monw
Contributor

s1monw commented May 15, 2013

max_term_freq is the maximum threshold (default: 0.01f) on the number of documents a query term can appear in and still receive suggestions. This means you won't get any candidates for "ice" at all if its freq exceeds max_term_freq. Maybe I am not understanding your question correctly?
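The max_term_freq cutoff can be made concrete with made-up numbers. In this hypothetical sketch the function names, index size, and document counts are illustrative; reading values below 1 as a fraction of the index and values of 1 or more as an absolute count follows the Elasticsearch term suggester documentation, while rounding the fractional limit up is an assumption.

```python
import math

# Hypothetical model of max_term_freq: a value below 1 is read as a fraction
# of the documents in the index, a value of 1 or more as an absolute count.
# Rounding the fractional limit up (math.ceil) is an assumption.


def max_allowed_doc_freq(max_term_freq: float, num_docs: int) -> float:
    if max_term_freq < 1:
        return math.ceil(max_term_freq * num_docs)
    return max_term_freq


def gets_suggestions(term_df: int, max_term_freq: float, num_docs: int) -> bool:
    """A query term found in more documents than the limit gets no candidates."""
    return term_df <= max_allowed_doc_freq(max_term_freq, num_docs)


num_docs = 100_000  # hypothetical index size
ice_df = 3_000      # hypothetical: "ice" appears in thousands of documents
print(gets_suggestions(ice_df, 0.01, num_docs))   # False (limit: 1000 docs)
print(gets_suggestions(ice_df, 0.999, num_docs))  # True  (limit: 99900 docs)
```

Under this model, a common query term like "ice" is cut off by the default before any candidate is drawn, and only a much higher max_term_freq lets it through.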

@jtreher
Author

jtreher commented May 16, 2013

@s1monw I have to set max_term_freq in the term suggest to 0.999 (99.9%) for the term to show up. However, when I do the same in the phrase suggest, it's as though the candidate is never generated.

@jtreher
Author

jtreher commented Jun 7, 2013

@s1monw This did fix the issue I had. It seems to respect max_term_freq now, with 0.9.1.

@s1monw
Contributor

s1monw commented Jun 7, 2013

@jtreher I think I did!

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015

3 participants