Implement "interesting terms" in More Like This handler #1412

Closed
dcramer opened this issue Oct 19, 2011 · 10 comments
Labels
:Search/Search Search-related issues that do not fall into other categories

Comments

@dcramer

dcramer commented Oct 19, 2011

Solr implements "interesting terms" for MLT queries: http://wiki.apache.org/solr/MoreLikeThisHandler

"One of: "list", "details", "none" -- this will show what "interesting" terms are used for the MoreLikeThis query. These are the top tf/idf terms. NOTE: if you select 'details', this shows you the term and boost used for each term. Unless mlt.boost=true all terms will have boost=1.0"

@dadoonet
Member

+1 nice feature

@vinhphu1711

+1 I'm excited to see this feature. Still no milestone yet?

@dcroley

dcroley commented Jul 6, 2012

+1 This would be really useful for my project too.

@tkholopkin

+100 Must have, especially for projects migrating from Lucene to Elasticsearch.

@mohsinh
Contributor

mohsinh commented Oct 7, 2013

+1 for getting a list of "Interesting Terms"

@alexanderjmitchell

+1 this would be really useful

@alexksikes
Contributor

There are two things that we may wish to achieve with "interesting terms" and More Like This.

  1. Return the selected interesting terms that formed the More Like This query. These could be returned as an ordered list of (term, tf-idf, score), where score is the boosted score if boosting is enabled.

  2. Return the matched interesting terms of each document returned from the response.

The former is, I think, what Solr does with "mlt.interestingTerms", while the latter could be achieved with "explain". However, "explain" is meant for debugging, so its output is very verbose. It may be more desirable to simply return a list of matched interesting terms. Any thoughts?

I haven't tested the "mlt.interestingTerms" Solr feature. Does anyone know its exact behavior and output?
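
To make item 1 concrete, here is a minimal Python sketch of picking the top tf-idf terms of a document as an ordered list of (term, score). It is only an illustration: the function name `interesting_terms`, the toy corpus, and the smoothed idf formula `1 + ln(N / (1 + df))` are assumptions made for the example and are not taken from the Solr or Elasticsearch implementation.

```python
import math
from collections import Counter

def interesting_terms(doc_tokens, corpus, max_terms=3):
    """Return the top tf-idf terms of one document as (term, score) pairs.

    Hypothetical helper: the scoring is a simplified tf-idf, not the
    formula used by Lucene's MoreLikeThis.
    """
    n_docs = len(corpus)
    # Document frequency: number of corpus documents containing each term.
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))
    # Term frequency within the input document.
    tf = Counter(doc_tokens)
    scored = []
    for term, freq in tf.items():
        idf = 1.0 + math.log(n_docs / (1.0 + df[term]))  # assumed smoothing
        scored.append((term, freq * idf))
    # Highest score first; ties are broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:max_terms]

# Toy corpus of already-tokenized documents (made up for the example).
corpus = [
    ["the", "terminator", "cyborg", "kyle"],
    ["the", "matrix", "hacker"],
    ["the", "cyborg", "returns"],
]
top = interesting_terms(corpus[0], corpus, max_terms=2)
print(top)
```

Terms that occur in every document (here "the") get the lowest score, which is why they would not be selected as "interesting".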

@piyushrai

+1 This is almost a must-have if one wants to move More Like This (with boosting) from Solr to Elasticsearch. Any updates on this?

alexksikes added a commit to alexksikes/elasticsearch that referenced this issue Mar 18, 2015
For Fuzzy Queries:

```
GET /imdb/movies/_validate/query?explain=true
{
  "query": {
    "fuzzy": {
      "actors": "kyle"
    }
  }
}

Response:

{
   ...
   "explanations": [
      {
         "index": "imdb",
         "valid": true,
         "explanation": "filtered(actors:eyle^0.75 actors:kale^0.75 actors:kayle^0.75 ... )->cache(_type:movies)"
      }
   ]
}
```

For More Like This:

```
GET /imdb/movies/_validate/query?explain=true
{
  "query": {
    "more_like_this": {
      "like": {
        "_id": "88247"
      }
    }
  }
}

Response:

{
   ...
   "explanations": [
      {
         "index": "imdb",
         "valid": true,
         "explanation": "filtered((((title:terminator^3.71334 plot:kyle^1.0604408 plot:cyborg^1.0863208 ... )~2)) -ConstantScore(_uid:movies#88247))->cache(_type:movies)"
      }
   ]
}
```

Relates to elastic#1412
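
The boosted terms in an explanation like the one above can be pulled out with a small amount of client-side parsing. The following Python sketch is one possible way to do it; the helper name `parse_boosted_terms` and the regular expression are assumptions made for this example, and the explanation string is a rendering of the rewritten query rather than a stable, documented format, so the pattern may need adjusting for other versions or query types.

```python
import re

# Matches clauses of the form "field:term^boost", e.g. "plot:kyle^1.0604408".
# Clauses without a boost (such as "_uid:movies#88247") are not matched.
CLAUSE = re.compile(r"(\w+):(\w+)\^(\d+(?:\.\d+)?)")

def parse_boosted_terms(explanation):
    """Hypothetical helper: extract (field, term, boost) tuples."""
    return [(field, term, float(boost))
            for field, term, boost in CLAUSE.findall(explanation)]

# Explanation string copied from the More Like This response above,
# including the "..." where the original output was elided.
explanation = (
    "filtered((((title:terminator^3.71334 plot:kyle^1.0604408 "
    "plot:cyborg^1.0863208 ... )~2)) -ConstantScore(_uid:movies#88247))"
    "->cache(_type:movies)"
)

terms = parse_boosted_terms(explanation)
# Order by boost, highest first, to get a ranked list of interesting terms.
ranked = sorted(terms, key=lambda item: item[2], reverse=True)
for field, term, boost in ranked:
    print(field, term, boost)
```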
alexksikes added a commit that referenced this issue Apr 13, 2015
…d queries

This commit adds a `rewrite` parameter to the validate API in order to show
how the given query is re-written into primitive queries. For example, an MLT
query is re-written into a disjunction of the selected terms. Other use cases
include the `fuzzy`, `common_terms`, or `match` query, especially with a
`cutoff_frequency` parameter. Note that the explanation is given for a
single randomly chosen shard only, so the output may vary from one shard to
another.

Relates #1412
Closes #10147
@alexksikes
Contributor

closed by #10147

@sabi0
Contributor

sabi0 commented Jul 7, 2017

Sorry for bringing this old issue up.
I couldn't find a way to determine the "interesting terms" for an MLT query with an ad-hoc document.
_validate/query?explain=true returns just "filtered(like:[...here goes my text...]))".
Is such a feature available? I figured it out - just had to add &rewrite=true.
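
For anyone landing here with the same question, this is roughly what the request looks like when built from Python. It is a sketch under assumptions: the host, the `imdb` index, the `plot` field, and the helper name `build_validate_request` are placeholders, the exact shape of the `like` clause may differ between Elasticsearch versions, and the code only builds the URL and body without sending anything.

```python
import json
from urllib.parse import urlencode

def build_validate_request(index, fields, like_text,
                           host="http://localhost:9200"):
    """Hypothetical helper: build URL and JSON body for a validate call.

    Both explain=true and rewrite=true are set, as described above.
    """
    params = urlencode({"explain": "true", "rewrite": "true"})
    url = f"{host}/{index}/_validate/query?{params}"
    body = {
        "query": {
            "more_like_this": {
                "fields": fields,   # fields to take terms from (placeholder)
                "like": like_text,  # ad-hoc text instead of a document id
            }
        }
    }
    return url, json.dumps(body)

url, body = build_validate_request(
    "imdb", ["plot"], "a cyborg is sent back in time"
)
print(url)
print(body)
```

The printed URL and body can then be sent with any HTTP client.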

@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :More Like This labels Feb 13, 2018