PercentageScore heuristic for significant_terms #9747

Closed

markharwood
Contributor

Provides a simple “per-capita” scoring measure for use on higher-frequency terms.

Closes #9720

@markharwood
Contributor Author

@brwe This seemed a simple but potentially common heuristic so I added a native implementation rather than relying on a scripted solution. I added a cautionary note in the reference docs about use on fields with low-frequency terms after some trials on real data.
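For readers following along, here is a minimal sketch of what a “per-capita” percentage score could look like. It assumes the score is the number of sample (subset) documents containing a term divided by the number of documents containing that term overall (superset). The class name `PercentageScoreSketch` and the zero guard are illustrative assumptions, not the code in this PR.

```java
// Illustrative sketch only: the class name and zero guard are assumptions,
// not the implementation merged in this pull request.
final class PercentageScoreSketch {

    // Assumed formula: fraction of all documents containing the term
    // (supersetFreq) that also appear in the sample (subsetFreq).
    static double score(long subsetFreq, long subsetSize, long supersetFreq, long supersetSize) {
        if (supersetFreq == 0) {
            return 0.0; // assumption: avoid dividing by zero for unseen terms
        }
        return (double) subsetFreq / (double) supersetFreq;
    }

    public static void main(String[] args) {
        // Higher-frequency term: 40 of its 400 occurrences fall in the sample.
        System.out.println(score(40, 200, 400, 100_000)); // prints 0.1
        // Low-frequency term: its single occurrence falls in the sample.
        System.out.println(score(1, 200, 1, 100_000));    // prints 1.0
    }
}
```

The second call hints at why caution is advised for fields with low-frequency terms: under this assumed formula, a term seen only once can reach the maximum score on a single matching document.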

@@ -42,10 +42,10 @@ protected void checkFrequencyValidity(long subsetFreq, long subsetSize, long sup
             throw new ElasticsearchIllegalArgumentException("Frequencies of subset and superset must be positive in " + scoreFunctionName + ".getScore()");
         }
         if (subsetFreq > subsetSize) {
-            throw new ElasticsearchIllegalArgumentException("subsetFreq > subsetSize, in JLHScore.score(..)");
+            throw new ElasticsearchIllegalArgumentException("subsetFreq > subsetSize, in " + scoreFunctionName + ".score(..)");
Contributor

Do we need the .score(..) here? All the user needs to know is which score function failed. The method name is probably not relevant to the user in the message, and it will be in the stack trace for a dev investigating anyway.
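A rough sketch of what this suggestion might look like, with the method suffix dropped from both messages. It uses the standard `IllegalArgumentException` as a stand-in for `ElasticsearchIllegalArgumentException`, and the class name `FrequencyCheckSketch` and the exact conditions are assumptions made for illustration.

```java
// Illustrative sketch only: IllegalArgumentException stands in for
// ElasticsearchIllegalArgumentException, and the class name is hypothetical.
final class FrequencyCheckSketch {

    // Validates the frequencies and names only the score function in the message;
    // the failing method is still visible in the stack trace.
    static void checkFrequencyValidity(long subsetFreq, long subsetSize,
                                       long supersetFreq, long supersetSize,
                                       String scoreFunctionName) {
        if (subsetFreq < 0 || subsetSize < 0 || supersetFreq < 0 || supersetSize < 0) {
            throw new IllegalArgumentException(
                    "Frequencies of subset and superset must be positive in " + scoreFunctionName);
        }
        if (subsetFreq > subsetSize) {
            throw new IllegalArgumentException("subsetFreq > subsetSize, in " + scoreFunctionName);
        }
    }

    public static void main(String[] args) {
        try {
            checkFrequencyValidity(10, 5, 20, 100, "PercentageScore");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```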

…t_terms aggregation provides simple “per-capita” type measures.

Closes #9720
@jpountz
Contributor

jpountz commented Feb 20, 2015

LGTM

@clintongormley

Merged in 4d26492

@clintongormley clintongormley changed the title New aggregations feature - “PercentageScore” heuristic for significant_terms PercentageScore heuristic for significant_terms Jun 6, 2015
Successfully merging this pull request may close these issues.

significant_terms should provide a simple percentage scoring heuristic