Significant_terms agg: added option for a backgroundFilter #5944

Closed
wants to merge 3 commits into from

markharwood
Contributor

Allows for a narrowed background context in analysis of term frequencies.
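To show what a narrowed background means, here is a minimal in-memory sketch. It is not Elasticsearch code, and the `Doc` record, the predicates and the method names are all hypothetical. The foreground is the set of documents matched by the query. The background is either every document (the default) or only the documents that match a filter. The sketch counts how often a term appears in each set.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class BackgroundCounts {
    // Hypothetical toy document: a category plus a set of tags.
    record Doc(String category, Set<String> tags) {}

    // Number of in-scope documents that contain the term.
    static long freq(List<Doc> docs, Predicate<Doc> scope, String term) {
        return docs.stream().filter(scope).filter(d -> d.tags().contains(term)).count();
    }

    // Number of documents in scope.
    static long size(List<Doc> docs, Predicate<Doc> scope) {
        return docs.stream().filter(scope).count();
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("music", Set.of("guitar", "jazz")),
                new Doc("music", Set.of("guitar", "rock")),
                new Doc("music", Set.of("piano", "jazz")),
                new Doc("sport", Set.of("football")),
                new Doc("sport", Set.of("tennis")));

        Predicate<Doc> foreground = d -> d.tags().contains("jazz");  // the query's results
        Predicate<Doc> wholeIndex = d -> true;                        // default background
        Predicate<Doc> musicOnly = d -> d.category().equals("music"); // narrowed background

        String term = "guitar";
        System.out.printf("foreground %d/%d, background(all) %d/%d, background(music) %d/%d%n",
                freq(docs, foreground, term), size(docs, foreground),
                freq(docs, wholeIndex, term), size(docs, wholeIndex),
                freq(docs, musicOnly, term), size(docs, musicOnly));
        // prints: foreground 1/2, background(all) 2/5, background(music) 2/3
    }
}
```

Compared with the whole index, "guitar" looks slightly over-represented among the jazz documents (50% vs 40%). Compared with a music-only background, it is under-represented (50% vs about 67%), so it would no longer count as significant.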

…background context for analysis of term frequencies
// term may be missing from the background, so for the purposes of this calculation
// we assume a value of 1 for our calculations which avoids returning an "infinity" result
supersetFreq = 1;
}
Contributor

Should we use additive smoothing instead, so that this special case cannot occur?

Contributor Author

I think 0 is the only special case that needs treatment. It means there is no evidence at all, and without this adjustment the returned score is infinity. Any value > 0 is at least "some evidence", so I don't think we should tamper with it.
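The sketch below shows why a zero background frequency needs special handling. With a narrowed background, a term can appear in the foreground but never in the background. It is a hedged illustration and not the code from this PR. It assumes a JLH-style score, meaning the absolute change in popularity multiplied by the relative change. The class and method names are hypothetical. When `supersetFreq == 0`, the relative change divides by zero, and Java's `double` arithmetic turns that into `Infinity`. Substituting 1 keeps the score finite.

```java
class JlhSketch {
    // JLH-style significance: (fg% - bg%) * (fg% / bg%). Illustrative sketch only.
    static double score(long subsetFreq, long subsetSize, long supersetFreq, long supersetSize,
                        boolean guardZero) {
        if (subsetSize == 0 || supersetSize == 0) {
            return 0;
        }
        if (guardZero && supersetFreq == 0) {
            // Term absent from the background: treat it as seen once so the ratio stays finite.
            supersetFreq = 1;
        }
        double fg = (double) subsetFreq / subsetSize;
        double bg = (double) supersetFreq / supersetSize;
        double absoluteChange = fg - bg;
        if (absoluteChange <= 0) {
            return 0; // no more popular in the foreground than in the background
        }
        return absoluteChange * (fg / bg);
    }

    public static void main(String[] args) {
        // Term in 5 of 10 foreground docs, but in 0 of 1000 docs of a narrowed background.
        System.out.println(score(5, 10, 0, 1000, false)); // prints Infinity
        System.out.println(score(5, 10, 0, 1000, true));  // finite, roughly 249.5
    }
}
```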

Contributor

I was thinking more of something like Laplace smoothing, which we already use, e.g., in the phrase suggester: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html#_smoothing_models
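For comparison, Laplace (add-one) smoothing adds a pseudo-count to every observation instead of special-casing zero. This minimal sketch makes two assumptions. First, smoothing is applied only to the background probability. Second, each document is a two-outcome event (it either contains the term or it doesn't), which is why the denominator has `2k`. The class, the methods and the parameter `k` are hypothetical and are not taken from the phrase suggester or from this PR.

```java
class SmoothedBackground {
    // Add-k estimate (Laplace when k = 1) of P(document contains term) in the background.
    // Two outcomes per document, hence 2k in the denominator. Requires k > 0.
    static double smoothedProbability(long freq, long size, double k) {
        return (freq + k) / (size + 2 * k);
    }

    // JLH-style score using the smoothed background probability: no zero special case needed.
    static double score(long subsetFreq, long subsetSize, long supersetFreq, long supersetSize, double k) {
        if (subsetSize == 0) {
            return 0;
        }
        double fg = (double) subsetFreq / subsetSize;
        double bg = smoothedProbability(supersetFreq, supersetSize, k);
        double absoluteChange = fg - bg;
        return absoluteChange <= 0 ? 0 : absoluteChange * (fg / bg);
    }

    public static void main(String[] args) {
        // Finite even though the term never occurs in the background.
        System.out.println(score(5, 10, 0, 1000, 1.0));
        // Smoothing also shifts non-zero counts, which the author preferred to leave alone.
        System.out.println(smoothedProbability(10, 1000, 1.0) + " vs raw " + 10 / 1000.0);
    }
}
```

The second line shows the trade-off the author raised. Smoothing changes every background estimate (here 11/1002 instead of 10/1000), not just the zero case.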

Contributor

This can probably be done in a different issue though.

@jpountz
Contributor

jpountz commented Apr 30, 2014

The patch looks good. Can we also have a test that makes sure that this agg finds the right terms when a background filter is set?

"tags" : {
"significant_terms" : {
"field" : "tag",
"backgroundFilter": {
Member

All our examples use the underscore notation. Can we change this to background_filter?

Added test for background filter tuning out selected words

if (BACKGROUND_FILTER.match(currentFieldName)) {
    filter = context.queryParserService().parseInnerFilter(parser).filter();
}
Contributor

Can you add an else branch that fails the parsing if an object with a key other than BACKGROUND_FILTER is found?
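Here is a minimal sketch of the requested behaviour. It uses a plain `Map` in place of Elasticsearch's streaming parser. The class, the method names and the use of `IllegalArgumentException` are hypothetical stand-ins, and the real parser would raise its own parse exception. The sketch also assumes the field name should match both `background_filter` and `backgroundFilter`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SignificantTermsParserSketch {
    // Hypothetical name matching that accepts both the underscore and camelCase spellings.
    static boolean matchesBackgroundFilter(String fieldName) {
        return fieldName.equals("background_filter") || fieldName.equals("backgroundFilter");
    }

    // Returns the background filter among the object-valued fields; rejects any other key.
    static Object parseObjectFields(Map<String, Object> objectFields) {
        Object filter = null;
        for (Map.Entry<String, Object> field : objectFields.entrySet()) {
            if (matchesBackgroundFilter(field.getKey())) {
                filter = field.getValue();
            } else {
                // The requested else branch: fail loudly instead of silently ignoring the object.
                throw new IllegalArgumentException(
                        "Unknown object field [" + field.getKey() + "] in significant_terms");
            }
        }
        return filter;
    }

    public static void main(String[] args) {
        Map<String, Object> ok = new LinkedHashMap<>();
        ok.put("background_filter", Map.of("term", Map.of("category", "music")));
        System.out.println(parseObjectFields(ok));

        Map<String, Object> typo = new LinkedHashMap<>();
        typo.put("backround_filter", Map.of()); // misspelled key is now rejected
        try {
            parseObjectFields(typo);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```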

@jpountz
Contributor

jpountz commented May 12, 2014

@markharwood I just did another review and left a couple of minor comments. Other than that, it looks good to me. It's nice that the documentation makes clear that, although convenient, this option can be very slow.
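One plausible reason for the slowdown is stated here as an assumption, not as a description of the merged code. Without a filter, a term's background frequency can be read as a single precomputed document count. With a filter, it has to be computed for every candidate term by intersecting that term's matching documents with the filter's matches. The sketch models both cases with `java.util.BitSet`, and all names are hypothetical.

```java
import java.util.BitSet;
import java.util.Map;

class BackgroundFrequencySketch {
    // Unfiltered background: a precomputed per-term document count, so a single lookup.
    static int unfilteredFreq(Map<String, Integer> docFreqs, String term) {
        return docFreqs.getOrDefault(term, 0);
    }

    // Filtered background: intersect the term's postings with the filter's matches.
    // The cost grows with the postings size and is paid again for every candidate term.
    static int filteredFreq(BitSet termPostings, BitSet filterMatches) {
        BitSet both = (BitSet) termPostings.clone(); // clone so the postings are not modified
        both.and(filterMatches);
        return both.cardinality();
    }

    public static void main(String[] args) {
        BitSet guitar = new BitSet();
        guitar.set(0);
        guitar.set(1);
        guitar.set(7);
        BitSet musicFilter = new BitSet();
        musicFilter.set(0, 3); // docs 0, 1, 2 (toIndex is exclusive)

        System.out.println(unfilteredFreq(Map.of("guitar", guitar.cardinality()), "guitar")); // 3
        System.out.println(filteredFreq(guitar, musicFilter)); // 2
    }
}
```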

@jpountz
Contributor

jpountz commented May 12, 2014

LGTM

markharwood added a commit that referenced this pull request May 13, 2014
… background context for analysis of term frequencies

Closes #5944
@s1monw s1monw removed the review label Jun 18, 2014
5 participants