Aggs enhancement - allow Include/Exclude clauses to use array of terms #7529

Closed
wants to merge 1 commit into from
@@ -449,67 +449,10 @@ WARNING: Use of background filters will slow the query as each term's postings m
===== Filtering Values

It is possible (although rarely required) to filter the values for which buckets will be created. This can be done using the `include` and
`exclude` parameters which are based on regular expressions. This functionality mirrors the features
offered by the `terms` aggregation.
`exclude` parameters which are based on a regular expression string or arrays of exact terms. This functionality mirrors the features
described in the <<search-aggregations-bucket-terms-aggregation,terms aggregation>> documentation.


[source,js]
--------------------------------------------------
{
"aggs" : {
"tags" : {
"significant_terms" : {
"field" : "tags",
"include" : ".*sport.*",
"exclude" : "water_.*"
}
}
}
}
--------------------------------------------------

In the above example, buckets will be created for all the tags that have the word `sport` in them, except those starting
with `water_` (so the tag `water_sports` will not be aggregated). The `include` regular expression determines which
values are "allowed" to be aggregated, while the `exclude` determines the values that should not be aggregated. When
both are defined, the `exclude` has precedence: the `include` is evaluated first and only then the `exclude`.
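
The precedence rule described above can be sketched in plain Java. This is only an illustration, not the Elasticsearch implementation: the class `RegexIncludeExcludeSketch` and its `filter` method are hypothetical names, and the sketch assumes a term must match the whole pattern (as `Matcher.matches()` does) rather than a substring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Elasticsearch: models include/exclude regex filtering.
class RegexIncludeExcludeSketch {

    // Keeps the terms that fully match `include` (when given) and do not fully match
    // `exclude` (when given). Either pattern may be null, meaning "not specified".
    static List<String> filter(List<String> terms, String include, String exclude) {
        Pattern includePattern = include == null ? null : Pattern.compile(include);
        Pattern excludePattern = exclude == null ? null : Pattern.compile(exclude);
        List<String> accepted = new ArrayList<String>();
        for (String term : terms) {
            // Step 1: the include pattern decides what is allowed in at all.
            if (includePattern != null && !includePattern.matcher(term).matches()) {
                continue;
            }
            // Step 2: the exclude pattern is applied afterwards, so it wins on conflict.
            if (excludePattern != null && excludePattern.matcher(term).matches()) {
                continue;
            }
            accepted.add(term);
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("sports", "water_sports", "motorsport", "music");
        // prints [sports, motorsport]
        System.out.println(filter(tags, ".*sport.*", "water_.*"));
    }
}
```

Here `water_sports` passes the include step but is dropped by the exclude step, which is what "exclude has precedence" means in practice.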

The regular expressions are based on the Java(TM) http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html[Pattern],
and as such, it is also possible to pass in flags that determine how the compiled regular expression will work:

[source,js]
--------------------------------------------------
{
"aggs" : {
"tags" : {
"terms" : {
"field" : "tags",
"include" : {
"pattern" : ".*sport.*",
"flags" : "CANON_EQ|CASE_INSENSITIVE" <1>
},
"exclude" : {
"pattern" : "water_.*",
"flags" : "CANON_EQ|CASE_INSENSITIVE"
}
}
}
}
}
--------------------------------------------------

<1> the flags are concatenated using the `|` character as a separator

The possible flags that can be used are:
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#CANON_EQ[`CANON_EQ`],
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#CASE_INSENSITIVE[`CASE_INSENSITIVE`],
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#COMMENTS[`COMMENTS`],
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#DOTALL[`DOTALL`],
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#LITERAL[`LITERAL`],
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#MULTILINE[`MULTILINE`],
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNICODE_CASE[`UNICODE_CASE`],
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNICODE_CHARACTER_CLASS[`UNICODE_CHARACTER_CLASS`] and
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNIX_LINES[`UNIX_LINES`]

===== Execution hint

There are two mechanisms by which terms aggregations can be executed: either by using field values directly in order to aggregate
@@ -418,7 +418,7 @@ Generating the terms using a script:
==== Filtering Values

It is possible to filter the values for which buckets will be created. This can be done using the `include` and
`exclude` parameters which are based on regular expressions.
`exclude` parameters which are based on regular expression strings or arrays of exact values.

[source,js]
--------------------------------------------------
@@ -477,6 +477,29 @@ http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNICODE_CA
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNICODE_CHARACTER_CLASS[`UNICODE_CHARACTER_CLASS`] and
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNIX_LINES[`UNIX_LINES`]

For matching based on exact values the `include` and `exclude` parameters can simply take an array of
strings that represent the terms as they are found in the index:

[source,js]
--------------------------------------------------
{
"aggs" : {
"JapaneseCars" : {
"terms" : {
"field" : "make",
"include" : ["mazda", "honda"]
}
},
"ActiveCarManufacturers" : {
"terms" : {
"field" : "make",
"exclude" : ["rover", "jensen"]
}
}
}
}
--------------------------------------------------
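
The exact-value form amounts to set membership rather than pattern matching. The following is a minimal sketch of that idea in plain Java, not the code from this change: `ExactTermFilterSketch` and `filter` are hypothetical names, and the sample list of makes is invented for illustration. Matching is case-sensitive, so the supplied strings must equal the terms exactly as they are stored in the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Elasticsearch: models include/exclude with exact terms.
class ExactTermFilterSketch {

    // Keeps the terms contained in `include` (when given) and not contained in
    // `exclude` (when given). Either array may be null, meaning "not specified".
    static List<String> filter(List<String> terms, String[] include, String[] exclude) {
        Set<String> includeSet = include == null ? null : new HashSet<String>(Arrays.asList(include));
        Set<String> excludeSet = exclude == null ? null : new HashSet<String>(Arrays.asList(exclude));
        List<String> accepted = new ArrayList<String>();
        for (String term : terms) {
            if (includeSet != null && !includeSet.contains(term)) {
                continue; // not in the allowed set
            }
            if (excludeSet != null && excludeSet.contains(term)) {
                continue; // explicitly excluded
            }
            accepted.add(term);
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Invented sample data standing in for the indexed values of the `make` field.
        List<String> makes = Arrays.asList("mazda", "honda", "rover", "ford", "jensen");
        // "JapaneseCars": prints [mazda, honda]
        System.out.println(filter(makes, new String[] {"mazda", "honda"}, null));
        // "ActiveCarManufacturers": prints [mazda, honda, ford]
        System.out.println(filter(makes, null, new String[] {"rover", "jensen"}));
    }
}
```

A hash set keeps each lookup at constant time, which is one plausible reason to prefer exact terms over a long alternation regex when the list of values is known up front.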

==== Multi-field terms aggregation

The `terms` aggregation does not support collecting terms from multiple fields
@@ -19,6 +19,7 @@

package org.elasticsearch.search.aggregations.bucket.terms;

import org.elasticsearch.ElasticsearchIllegalArgumentException;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.search.aggregations.Aggregator;
import org.elasticsearch.search.aggregations.Aggregator.SubAggCollectionMode;
@@ -43,6 +44,8 @@ public class TermsBuilder extends ValuesSourceAggregationBuilder<TermsBuilder> {
private String executionHint;
private SubAggCollectionMode collectionMode;
private Boolean showTermDocCountError;
private String[] includeTerms = null;
private String[] excludeTerms = null;

/**
* Sole constructor.
@@ -101,10 +104,24 @@ public TermsBuilder include(String regex) {
* @see java.util.regex.Pattern#compile(String, int)
*/
public TermsBuilder include(String regex, int flags) {
if (includeTerms != null) {
throw new ElasticsearchIllegalArgumentException("include clause must be an array of strings or a regex, not both");
}
this.includePattern = regex;
this.includeFlags = flags;
return this;
}

/**
* Define a set of terms that should be aggregated.
*/
public TermsBuilder include(String[] terms) {
if (includePattern != null) {
throw new ElasticsearchIllegalArgumentException("include clause must be an array of strings or a regex, not both");
}
this.includeTerms = terms;
return this;
}

/**
* Define a regular expression that will filter out terms that should be excluded from the aggregation. The regular
@@ -123,10 +140,25 @@ public TermsBuilder exclude(String regex) {
* @see java.util.regex.Pattern#compile(String, int)
*/
public TermsBuilder exclude(String regex, int flags) {
if (excludeTerms != null) {
throw new ElasticsearchIllegalArgumentException("exclude clause must be an array of strings or a regex, not both");
}
this.excludePattern = regex;
this.excludeFlags = flags;
return this;
}

/**
* Define a set of terms that should not be aggregated.
*/
public TermsBuilder exclude(String[] terms) {
if (excludePattern != null) {
throw new ElasticsearchIllegalArgumentException("exclude clause must be an array of strings or a regex, not both");
}
this.excludeTerms = terms;
return this;
}


/**
* When using scripts, the value type indicates the types of the values the script is generating.
@@ -189,6 +221,9 @@ protected XContentBuilder doInternalXContent(XContentBuilder builder, Params par
if (collectionMode != null) {
builder.field(Aggregator.COLLECT_MODE.getPreferredName(), collectionMode.parseField().getPreferredName());
}
if (includeTerms != null) {
builder.array("include", includeTerms);
}
if (includePattern != null) {
if (includeFlags == 0) {
builder.field("include", includePattern);
@@ -199,6 +234,9 @@ protected XContentBuilder doInternalXContent(XContentBuilder builder, Params par
.endObject();
}
}
if (excludeTerms != null) {
builder.array("exclude", excludeTerms);
}
if (excludePattern != null) {
if (excludeFlags == 0) {
builder.field("exclude", excludePattern);
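
The guard that `TermsBuilder` gains in this change (a clause is either a regex or an array of terms, never both) can be tried in isolation with a stripped-down stand-in. This is a sketch under stated assumptions: `IncludeClauseSketch` is a hypothetical class, it throws the standard `IllegalArgumentException` instead of `ElasticsearchIllegalArgumentException`, it ignores regex flags, and its `toJson` method is a simplified stand-in for the `XContentBuilder` output that does no string escaping.

```java
// Hypothetical stand-in for the include half of TermsBuilder; not the real class.
class IncludeClauseSketch {
    private String includePattern;
    private String[] includeTerms;

    // Regex form: rejected if an array of terms was already supplied.
    IncludeClauseSketch include(String regex) {
        if (includeTerms != null) {
            throw new IllegalArgumentException(
                    "include clause must be an array of strings or a regex, not both");
        }
        this.includePattern = regex;
        return this;
    }

    // Exact-terms form: rejected if a regex was already supplied.
    IncludeClauseSketch include(String[] terms) {
        if (includePattern != null) {
            throw new IllegalArgumentException(
                    "include clause must be an array of strings or a regex, not both");
        }
        this.includeTerms = terms;
        return this;
    }

    // Renders the clause as a JSON fragment (no escaping; illustration only).
    String toJson() {
        if (includeTerms != null) {
            StringBuilder sb = new StringBuilder("\"include\" : [");
            for (int i = 0; i < includeTerms.length; i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append('"').append(includeTerms[i]).append('"');
            }
            return sb.append(']').toString();
        }
        if (includePattern != null) {
            return "\"include\" : \"" + includePattern + "\"";
        }
        return "";
    }

    public static void main(String[] args) {
        IncludeClauseSketch clause = new IncludeClauseSketch().include(new String[] {"mazda", "honda"});
        // prints "include" : ["mazda", "honda"]
        System.out.println(clause.toJson());
        try {
            clause.include(".*sport.*"); // mixing both forms is rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the check runs in each setter, the conflict is reported at the call that introduces it rather than later, when the request is serialized.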