Skip to content

Grouping Parameter (Spider)

JoeWinter edited this page Feb 18, 2015 · 2 revisions

[Table of Contents](https://github.com/dell-oss/Doradus/wiki/Spider Databases: Table-of-Contents) | Previous | Next
Spider Aggregate Queries: Grouping Parameter Overview


When a grouping parameter is provided, it causes the aggregate query to compute multiple values, one per group value as defined by the grouping expression. Either of two methods can be used to pass a grouping parameter depending upon whether or not composite grouping is desired. A wide range of grouping expressions are allowed as described in the following sections.

Global Aggregates: No Grouping Parameter

Without a grouping parameter, an aggregate query returns a single value: the metric function computed across all selected objects. Consider the following aggregate query URI REST command:

GET /Msgs/Message/_aggregate?m=MAX(Size)

This aggregate query returns the largest Size value among all messages. The response in XML is:

<results>
	<aggregate metric="MAX(Size)"/>
	<value>16796009</value>
</results>

In JSON:

{"results": {
	"aggregate": {"metric": "MAX(Size)"},
	"value": "16796009"
}}

As shown, an aggregate element lists the parameters used for the aggregate query. In this case, only a metric parameter was provided. For a global aggregate, the metric value is provided in the value element.

When the grouping field is multi-valued with respect to the perspective table, the metric is applied across all values for each perspective object. For example, in this query:

GET /Msgs/Message/_aggregate?m=COUNT(Tags)

If Tags is an MV scalar, all values for each message are counted, so the total returned may be more than the number of objects. Furthermore, an object related to a perspective object may be processed more than once. Consider this query:

GET /Msgs/Message/_aggregate?m=COUNT(ExternalRecipients.MessageAddress.Domain)

Some messages are likely to have multiple external recipients linked to the same domain. Therefore, ExternalRecipients.MessageAddress.Domain will count the same domain multiple times for those messages.

Single-level Grouping

When the grouping parameter consists of a single grouping field, objects are divided into sets based on the distinct values found for the grouping field. A separate metric value is computed for each group. For example, this aggregate query uses a single-valued scalar as the grouping field:

GET /Msgs/Message/_aggregate?m=MAX(Size)&f=Tags

The Tags field logically partitions objects into groups: one for each field value. Each object is included in each group for which is has a Tags value. If an object has no Tags value, it is placed in a (null) group. The maximum Size is then computed for each group. A typical result in XML is shown below:

<results>
	<aggregate metric="MAX(Size)" group="Tags"/>
	<totalobjects>6030</totalobjects>
	<summary>16796009</summary>
	<groups>
		<group>
			<metric>4875</metric>
			<field name="Tags">(null)</field>
		</group>
		<group>
			<metric>16796009</metric>
			<field name="Tags">AfterHours</field>
		</group>
		<group>
			<metric>16796009</metric>
			<field name="Tags">Customer</field>
		</group>
	</groups>
</results>

In JSON:

{"results": {
	"aggregate": {"metric": "MAX(Size)", "group": "Tags"},
	"totalobjects": "6030",
	"summary": "16796009",
	"groups": [
		{"group": {
			"metric": "4875",
			"field": {"Tags": "(null)"}
		}},
		{"group": {
			"metric": "16796009",
			"field": {"Tags": "AfterHours"}
		}},
		{"group": {
			"metric": "16796009",
			"field": {"Tags":"Customer"}
		}}
	]
}}

For grouped aggregate queries, the results element contains a groups element, which contains one group for each group value. Each group contains the field name and value for that group and the corresponding metric value. The totalobjects value computes the number of objects selected in the computation. The summary value computes the metric value across all selected objects independent of groups. For the AVERAGE function, this provides the true average, not the average of averages.

Grouping Field Aliases

By default, aggregate query group results use the fully-qualified path name of each grouping field. You can shorten the output result by using an alias for each grouping field. The alias name is used instead of the fully-qualified name. For example:

GET /Msgs/Message/_aggregate?m=COUNT(*)&f=TOP(3,Sender.Person.Name)

This query produces a result such as the following:

<results>
	<aggregate metric="COUNT(*)" group="TOP(3,Sender.Person.Name)"/>
	<totalobjects>6030</totalobjects>
	<summary>6030</summary>
	<totalgroups>186</totalgroups>
	<groups>
		<group>
			<metric>5256</metric>
			<field name="Sender.Person.Name">(null)</field>
		</group>
		<group>
			<metric>82</metric>
			<field name="Sender.Person.Name">Quest Support</field>
		</group>
		<group>
			<metric>80</metric>
			<field name="Sender.Person.Name">spb_setupbuilder</field>
		</group>
	</groups>
</results>

The fully-qualified field name Sender.Person.Name is used for the field parameter in each group element. An alias can be substituted for a grouping field by appending AS *name* to the grouping field. Example:

GET /Msgs/Message/_aggregate?m=COUNT(*)&f=TOP(3,Sender.Person.Name) AS Name

The alias Name is used in the output as shown below:

<results>
	<aggregate metric="COUNT(*)" group="TOP(3,Sender.Person.Name)"/>
	<totalobjects>6030</totalobjects>
	<summary>6030</summary>
	<totalgroups>186</totalgroups>
	<groups>
		<group>
			<metric>5256</metric>
			<field name="Name">(null)</field>
		</group>
		<group>
			<metric>82</metric>
			<field name="Name">Quest Support</field>
		</group>
		<group>
			<metric>80</metric>
			<field name="Name">spb_setupbuilder</field>
		</group>
	</groups>
</results>

When many groups are returned and/or the grouping field name is long, the alias name can significantly shorten the output results. An alias name can also improve the identity of the groups.

Multi-level Grouping

The grouping parameter can list multiple grouping expressions to form multi-level grouping. Each grouping expression must be a path from the perspective table to a link or scalar field. Below is an example multi-level aggregate query:

GET /Msgs/Message/_aggregate?m=COUNT(*)&f=TRUNCATE(SendDate,DAY),Tags

In this example, TRUNCATE(SendDate,DAY) is the top-level grouping field and Tags is the second-level grouping field. The query creates groups based on the cross-product of all grouping field values, and a metric is computed for each combination for which at least one object has a value. A perspective object is included in the metric computation for each group for which it has actual values. A summary value is computed for each non-leaf group, and a totalobjects value is computed for the top-level group. An example result in XML is shown below:

<results>
	<aggregate metric="COUNT(*)" group="TRUNCATE(SendDate,DAY),Tags"/>
	<totalobjects>6030</totalobjects>
	<summary>6030</summary>
	<groups>
		<group>
			<summary>4752</summary>
			<field name="SendDate">2010-07-17 00:00:00</field>
			<groups>
				<group>
					<metric>1</metric>
					<field name="Tags">(null)</field>
				</group>
				<group>
					<metric>4751</metric>
					<field name="Tags">AfterHours</field>
				</group>
				<group>
					<metric>1524</metric>
					<field name="Tags">Customer</field>
				</group>
			</groups>
		</group>
		<group>
			<summary>1278</summary>
			<field name="SendDate">2010-07-18 00:00:00</field>
			<groups>
				<group>
					<metric>1278</metric>
					<field name="Tags">AfterHours</field>
				</group>
				<group>
					<metric>700</metric>
					<field name="Tags">Customer</field>
				</group>
			</groups>
		</group>
	</groups>
</results>

The same response in JSON:

{"results": {
	"aggregate": {"metric": "COUNT(*)", "group": "TRUNCATE(SendDate,DAY),Tags"},
	"totalobjects": "6030",
	"summary": "6030",
	"groups": [
		{"group": {
			"summary": "4752",
			"field": {"SendDate":"2010-07-17 00:00:00"},
			"groups": [
				{"group": {
					"metric": "1",
					"field": {"Tags": "(null)"}
				}},
				{"group": {
					"metric": "4751",
					"field": {"Tags":"AfterHours"}
				}},
				{"group": {
					"metric": "1524",
					"field": {"Tags":"Customer"}
				}}
			]
		}},
		{"group": {
			"summary": "1278",
			"field": {"SendDate": "2010-07-18 00:00:00"},
			"groups": [
				{"group": {
					"metric": "1278",
					"field": {"Tags":"AfterHours"}
				}},
				{"group": {
					"metric": "700",
					"field": {"Tags":"Customer"}
				}}
			]
		}}
	]
}}

Each non-leaf group has an inner groups element containing its corresponding lower-level group elements and a summary element that provides a group-level metric value. Leaf-level groups have a metric element. This structure is recursive if there are more than two grouping levels.

Group Fields as Grouping Parameters

Doradus Spider allows a group field to be used as a grouping expression if all of its leaf fields are links. For example, in the example schema the group field Participants contains links Sender, InternalRecipients, and ExternalRecipients, all pointing to the Participant table. If the Participants field is used as a grouping parameter, the values (object IDs) for all three links are combined into a set for each perspective object. The object is included in the metric computation for each group for which it has a value. If a perspective object has no values for any of the links, it is included in the (null) group.

Link paths and WHERE filters can be used on the group field with the same syntax as allowed for links. For example:

f=Participants.Person.Name
f=Participants.WHERE(ReceiptDate > 2014-02-01).Person.Name
f=Participants.Person.WHERE(Department:Sales).Name

In the first example, the combined Person.Name values of each link (ExternalRecipients, InternalRecipients, and Sender) form the grouping sets; each perspective object is included in each set in which it participates. If a perspective object has no Participants.Person.Name values, it is included in the null group.

The second and third examples using the same grouping parameter: Participants.Person.Name. However, the presence of the WHERE filter causes nulls to be handled differently:

  • In the second example, a perspective object is simply skipped if it has no participants or none of its participants are selected by the expression ReceiptDate > 2014-02-01. When a perspective is selected by the WHERE expression but it is has no Person.Name values, it is included in the (null) group.

  • In the third example, a perspective object is skipped if it has no participants, no participants have Person values, or no Person.Department field includes the term Sales. Only if a perspective object is chosen but has no Name values is it included in the (null) group.

Composite Grouping

Doradus Spider supports a special grouping feature called composite grouping. It is meaningful only for aggregate queries with 2 or more grouping levels. It causes the metric function(s) to be computed for parent (non-leaf) groups in addition to leaf-most groups. The extra computations are returned as composite results within the corresponding parent groups.

Composite grouping is requested by using a special grouping parameter instead of the normal grouping parameter. For example, consider the following URI aggregate query:

GET /Msgs/Message/_aggregate?m=COUNT(*)&f=TRUNCATE(SendDate,DAY),Tags

Composite grouping can be requested for this 2-level query by using the &cf parameter instead of &f:

GET /Msgs/Message/_aggregate?m=COUNT(*)&cf=TRUNCATE(SendDate,DAY),Tags

A typical result for this 2-level aggregate query in XML is shown below:

<results>
	<aggregate metric="COUNT(*)" group="TRUNCATE(SendDate,DAY),Tags"/>
	<totalobjects>6032</totalobjects>
	<summary>6032</summary>
	<groups>
		<group>
			<summary>4753</summary>
			<field name="SendDate">2010-07-17 00:00:00</field>
			<groups>
				<group>
					<metric>1</metric>
					<field name="Tags">(null)</field>
				</group>
				<group>
					<metric>4752</metric>
					<field name="Tags">AfterHours</field>
				</group>
				<group>
					<metric>1524</metric>
					<field name="Tags">Customer</field>
				</group>
			</groups>
		</group>
		<group>
			<summary>1279</summary>
			<field name="SendDate">2010-07-18 00:00:00</field>
			<groups>
				<group>
					<metric>1279</metric>
					<field name="Tags">AfterHours</field>
				</group>
				<group>
					<metric>701</metric>
					<field name="Tags">Customer</field>
				</group>
			</groups>
		</group>
		<group composite="true">
			<field name="SendDate">*</field>
			<groups>
				<group>
					<metric>1</metric>
					<field name="Tags">(null)</field>
				</group>
				<group>
					<metric>6031</metric>
					<field name="Tags">AfterHours</field>
				</group>
				<group>
					<metric>2225</metric>
					<field name="Tags">Customer</field>
				</group>
			</groups>
		</group>
	/groups>
</results>

The same response in JSON is shown below:

{"results": {
	"aggregate": {"metric": "COUNT(*)", "group": "TRUNCATE(SendDate,DAY),Tags"},
	"totalobjects": "6032",
	"summary": "6032",
	"groups": [
		{"group": {
			"summary": "4753",
			"field": {"SendDate": "2010-07-17 00:00:00"},
			"groups": [
				{"group": {
					"metric": "1",
					"field": {"Tags": "(null)"}
				}},
				{"group": {
					"metric": "4752",
					"field": {"Tags": "AfterHours"}
				}},
				{"group": {
					"metric": "1524",
					"field": {"Tags": "Customer"}
				}}
			]
		}},
		{"group": {
			"summary": "1279",
			"field": {"SendDate": "2010-07-18 00:00:00"},
			"groups": [
				{"group": {
					"metric": "1279",
					"field": {"Tags": "AfterHours"}
				}},
				{"group": {
					"metric": "701",
					"field": {"Tags": "Customer"}
				}}
			]
		}},
		{"group": {
			"composite": "true",
			"field": {"SendDate": "*"},
			"groups": [
				{"group": {
					"metric": "1",
					"field": {"Tags": "(null)"}
				}},
				{"group": {
					"metric": "6031",
					"field": {"Tags": "AfterHours"}
				}},
				{"group": {
					"metric": "2225",
					"field": {"Tags":"Customer"}
				}}
			]
		}}
	]
}}

As shown, composite grouping produces an extra group for non-leaf grouping levels. This group is marked with a composite property of true, and the value for its field element is "*". *Within the composite group, lower-level metric groups are provided for each lower-level grouping field value, however, these lower-level metrics are computed across all objects at the composite grouping level. In the example above, the composite group computes the metric function (COUNT(*)) for all second-level groups (Tags) across all first-level group values (SendDate).

Composite grouping is only meaningful for multi-level grouping.

Compound Grouping: GROUP Sets

Doradus Spider allows the aggregate query grouping parameter to consist of multiple grouping sets. Each grouping set is enclosed in a GROUP function; multiple grouping sets are separated by commas. This feature is known as compound grouping. The general syntax is:

GROUP(<expression 1>),GROUP(<expression 2>),...,GROUP(<expression n>)

Each <expression n> parameter must use one of the following forms:

  • A "*" can be used to compute a global aggregate (i.e., GROUP(*)). The metric function is computed for all selected objects just as in an aggregate query with no grouping parameter. The GROUP(*) function should be specified at most once since there is only one metric value for a global aggregate.

  • A single-level grouping expression, consisting of a single scalar field or a field path (e.g., GROUP(Tags)).

  • A multi-level grouping expression, consisting of a comma-separated list of scalar fields and/or field paths (e.g., GROUP(TRUNCATE(SendDate,DAY),Tags)).

Each single- and multi-level grouping expression must be relative to the perspective table. The same set of objects selected by the aggregate query is passed to each grouping set, and separate metric computations are performed for each grouping set. Aggregate queries that use compound grouping perform a single pass through the selected objects and computes multiple grouping sets at the same time.

Consider this aggregate query:

GET /Msgs/Message/_aggregate?m=MAX(Size)
	&cf=GROUP(*),GROUP(TRUNCATE(SendDate,WEEK)),GROUP(TOP(2,TERMS(Subject)),Tags)
	&q=SendDate > 2013-10-15

This compound grouping aggregate query selects messages whose SendDate is >= 2013-10-15, and it computes the following:

  • The maximum Size value of selected messages (GROUP(*)).

  • The maximum Size of selected messages grouped by SendDate truncated to WEEK granularity (GROUP(TRUNCATE(SendDate,WEEK))).

  • The maximum Size of selected messages grouped first by the top 2 terms used in the Subject field and then by the Tags field (GROUP(TOP(2,TERMS(Subject)),Tags)). Because composite grouping was requested (&cf), this multi-level grouping expression uses the composite grouping technique.

Compound aggregate queries compute all grouping sets in a single pass. The query above returns XML results such as the following:

<results>
	<aggregate metric="MAX(Size)" query="SendDate > 2009-10-15"
		group="GROUP(*),GROUP(TRUNCATE(SendDate,WEEK)),GROUP(TOP(2,TERMS(Subject)),Tags)"/>
	<totalobjects>6032</totalobjects>
	<groupsets>
		<groupset>
			<value>16796009</value>
		</groupset>
		<groupset group="TRUNCATE(SendDate,WEEK)">
			<summary>16796009</summary>
			<groups>
				<group>
					<field name="SendDate">2010-07-12 00:00:00</field>
					<metric>965230</metric>
				</group>
				...
			</groups>
		</groupset>
		<groupset group="TOP(2,TERMS(Subject)),Tags">
			<summary>16796009</summary>
			<totalgroups>15267</totalgroups>
			<groups>
				<group>
					<summary>16796009</summary>
					<field name="Subject">scalepan</field>
					<groups>
						<group>
							<metric>16796009</metric>
							<field name="Tags">AfterHours</field>
						</group>
						...
					</groups>
				</group>
				...
				<group composite="true">
					<field name="Subject">*</field>
					<groups>
						<group>
							<metric>7317</metric>
							<field name="Tags">(null)</field>
						</group>
						...
					</groups>
				</group>
			</groups>
		</groupset>
	</groupsets>
</results>

In JSON:

{"results": {
	"aggregate": {
		"group": "GROUP(*),GROUP(TRUNCATE(SendDate,WEEK)),GROUP(TOP(2,TERMS(Subject)),Tags)",
		"query": "SendDate > 2009-10-15",
		"metric":"MAX(Size)"
	},
	"totalobjects": "6032",
	"groupsets": [
		{"groupset": {
			"value": "16796009"
		}},
		{"groupset": {
			"group": "TRUNCATE(SendDate,WEEK)",
			"summary": "16796009",
			"groups": [
				{"group": {
					"field": {"SendDate": "2010-07-12 00:00:00"},
					"metric": "16796009"
				}},
				...
			]
		}},
		{"groupset": {
			"group": " TOP(2,TERMS(Subject)),Tags ",
			"summary": "16796009",
			"totalgroups": "15267",
			"groups": [
				{"group": {
					"summary": "16796009",
					"field": {"Subject": "scalepan"},
					"groups": [
						{"group": {
							"metric": "16796009",
							"field": {"Tags": "AfterHours"}
						}},
						{"group": {
							"metric": "16796009",
							"field": {"Tags": "Customer"}
						}}
					]
				}},
				...
				{"group": {
					"composite": "true",
					"field": {"Origin": "*"},
					"groups": [
						{"group": {
							"metric": "7317",
							"field": {"Tags": "(null)"}
						}},
						...
					]
				}}
			]
		}}
	]
}}

Notable aspects of a compound group result:

  • As with all aggregate queries, the outer results element contains an aggregate element that confirms the aggregate query parameters.

  • The results element also contains a groupsets element, which contains one groupset element per grouping set, that is, for each GROUP function.

  • The contents of each groupset element follows the format applicable for global, single-level, or multi-level aggregate queries, except that they do not contain an aggregate element.

  • As with all grouped aggregate queries, each groupset and each non-leaf group contains a summary value.

  • Only multi-level grouping sets can contain a composite group, denoted by a composite=true element and a field value of "*".

Compound/Multi-metric Grouping Results

Multi-metric aggregate queries can also use compound grouping. That is, a single aggregate query can specify multiple metric functions in the &m parameter and multiple GROUP functions in the &f or &cf parameter. Such queries compute multiple metric functions, like a multi-metric query, and provide multiple grouping operations for each metric, all in a single query.

The results of compound/multi-metric queries are returned using groupset elements for each combination of metric function and GROUP function. If there are m metric functions and n GROUP functions, the result will contain m x n groupset elements. Each groupset identifies the metric and grouping parameter for which it provides results.

As an example, the following compound/multi-metric aggregate query has 3 metric functions and 2 GROUP functions:

GET /Msgs/Message/_aggregate?m=COUNT(*),MAX(Size),AVERAGE(Size)
	&cf=GROUP(TOP(2,Tags),Subject),GROUP(TRUNCATE(SendDate,DAY))

This means the result will contain 6 groupset elements, as shown in the following XML outline:

<results>
	<aggregate metric="COUNT(*),MAX(Size),AVERAGE(Size)" 
		group="GROUP(TOP(2,Tags),Subject),GROUP(TRUNCATE(SendDate,DAY))"/>
	<groupsets>
		<groupset group="TOP(2,Tags),Subject" metric="COUNT(*)">...</groupset>
		<groupset group="TRUNCATE(SendDate,DAY)" metric="COUNT(*)">...</groupset>
		<groupset group="TOP(2,Tags),Subject" metric="MAX(Size)">...</groupset>
		<groupset group="TRUNCATE(SendDate,DAY)" metric="MAX(Size)">...</groupset>
		<groupset group="TOP(2,Tags),Subject" metric="AVERAGE(Size)">...</groupset>
		<groupset group="TRUNCATE(SendDate,DAY)" metric="AVERAGE(Size)">...</groupset>
	</groupsets>
</results>

In JSON:

{"results": {
	"aggregate": {
		"group": "GROUP(TOP(2,Tags),Origin),GROUP(TRUNCATE(SendDate,DAY))",
		"metric": "COUNT(*),MAX(Size),AVERAGE(Size)"
	},
	"groupsets": [
		{"groupset": {"group": "TOP(2,Subject),Origin", "metric": "COUNT(*)", ...}},
		{"groupset": {"group": "TRUNCATE(SendDate,DAY)", "metric": "COUNT(*)", ...}},
		{"groupset": {"group": "TOP(2,Subject),Origin", "metric": "MAX(Size)", ...}},
		{"groupset": {"group": "TRUNCATE(SendDate,DAY)", "metric": "MAX(Size)", ...},
		{"groupset": {"group": "TOP(2,Subject),Origin", "metric": "AVERAGE(Size)", ...}},
		{"groupset": {"group": "TRUNCATE(SendDate,DAY)", "metric": "AVERAGE(Size)", ...}}
	]
}}

Though not shown here, each groupset will contain groups, summary, and totalgroups elements as required by each GROUP function. If the query requests composite grouping (&cf), a groupset with multi-level grouping will contain a composite group for non-leaf groups.

Special Grouping Functions

This section describes special functions that provide enhanced behavior for the grouping parameter.

BATCH Function

The BATCH function divides a scalar field’s values into specific ranges. Each range becomes a grouping field value, and objects contribute to the metric computation for the ranges for which it has values. The BATCH function’s first value must be a scalar field. The remaining values must be literal values compatible with the field’s type (text, timestamp, or numeric), and they must be given in ascending order. Example:

GET /Msgs/Message/_aggregate?m=COUNT(*)&f=BATCH(Size,100,1000,10000,100000)

This query counts messages grouped by specific ranges of the Size field. The ranges are divided at the given literal values: 100, 1000, 10000, and 100000. The lowest value implicitly creates an extra less than group; the highest value is open-ended and creates a greater than or equal to group. The query in the example above defines the following 5 groups:

Group 1: Size < 100
Group 2: Size >= 100 AND Size < 1000
Group 3: Size >= 1000 AND Size < 10000
Group 4: Size >= 10000 AND Size < 100000
Group 5: Size >= 100000

The example above returns a result such as the following:

<results>
	<aggregate metric="COUNT(Size)" group="BATCH(Size,100,1000,10000,100000)"/>
	<groups>
		<group>
			<field name="Size">&lt;100</field>
			<metric>0</metric>
		</group>
		<group>
			<field name="Size">100-1000</field>
			<metric>125</metric>
		</group>
		<group>
			<field name="Size">1000-10000</field>
			<metric>4651</metric>
		</group>
		<group>
			<field name="Size">10000-100000</field>
			<metric>1149</metric>
		</group>
		<group>
			<field name="Size">&gt;=100000</field>
			<metric>105</metric>
		</group>
	</groups>
	<summary>6030</summary>
</results>

In JSON:

{"results": {
	"aggregate": {"group": "BATCH(Size,100,1000,10000,100000)", "metric": "COUNT(Size)"},
	"groups": [
		{"group": {
			"field": {"Size": "<100"},
			"metric": "0"
		}},
		{"group": {
			"field": {"Size": "100-1000"},
			"metric": "125"
		}},
		{"group": {
			"field": {"Size": "1000-10000"},
			"metric": "4651"
		}},
		{"group": {
			"field": {"Size":"10000-100000"},
			"metric": "1149"
		}},
		{"group": {
			"field": {"Size": ">=100000"},
			"metric": "105"
		}}
	],
	"summary":"6030"
}}

As shown above in the <100 group, if no selected object has a value that falls into one of the specified groups, that group is still returned: the group’s metric is 0 for the COUNT function and empty for all other metric functions. As with all grouped aggregate queries, a summary value is returned that applies the metric function across all groups.

INCLUDE and EXCLUDE Functions

In an aggregate query, normally all values of a scalar grouping field are used to create groups. For example:

GET /Email/Message/_aggregate?f=Tags&...

All values of the Tags field for selected objects are used to create grouping fields. To eliminate specific values from being used for grouping—without affecting the selection of the owning object—the EXCLUDE function can be used:

GET /Email/Message/_aggregate?f=Tags.EXCLUDE('Confidential, 'Internal')&...

When the grouping field is a text field, the values passed to the EXCLUDE function are whole, case-insensitive values—not terms—and must be enclosed in quotes. In the example above, groups matching the value Confidential or Internal or case variations of these are excluded. The values used for text scalars can contain wildcards ? and *. For example:

GET /Email/Message/_aggregate?f=Tags.EXCLUDE('*sam?')&...

This aggregate query excludes all groups that end with the sam*x*, where x is any letter, or case variations of this sequence.

To generate only groups that match specific scalar values—without affecting the selection of the owning object—the INCLUDE function can be used:

GET /Email/Message/_aggregate?f=Tags.INCLUDE('Confidential, 'Internal')&...

The only groups generated are those matching Confidential and Internal and case variations of these; all other values are skipped. Again, when the grouping field is a text scalar, the value must be enclosed in quotes, and it can contain wildcards ? and *.

The values passed to INCLUDE and EXCLUDE must be compatible with the corresponding scalar type field: integers for integer or long fields, Booleans for boolean fields, etc. Additionally, the keyword NULL (uppercase) can be used to include or exclude the (null) group normally generated when at least one object has a null value for the grouping field. Example:

GET /Email/Message/_aggregate?f=Tags.EXCLUDE(NULL)&...

TERMS Function

Groups can be created from the terms used within a specific field. The general format of the TERMS function is:

TERMS(<field name> [, (<stop term 1> <stop term 2> ...)])

Any predefined scalar field can be used, but TERMS is most effective with text fields. The optional stop term list is a list of terms, enclosed in parentheses, that are excluded from the unique set of terms found within the specified field.

For example, the following request fetches the COUNT of messages grouped by terms found within the Subject field for a particular sender. For brevity, the TERMS function is wrapped by the TOP function to limit the results to the five groups with the highest counts:

GET /Msgs/Message/_aggregate?m=COUNT(*)&q=Sender.MessageAddress.Person.Name:Support
	&f=TOP(5,TERMS(Subject))

Similar to the BATCH function, the TERMS function creates dynamic groups from a text field based on the terms it uses. To do this, as objects matching the query parameter (if any) are found, the field passed to TERMS is parsed into alphanumeric terms, and a group is created for each unique term. Each contributes to the group metric computation for each term it contains. If a term appears multiple times within a field (e.g., "plan for a plan"), the object is only counted once. An example result in XML is shown below:

<results>
	<aggregate query="Sender.MessageAddress.Person.Name:Support" metric="COUNT(*)"
		group="TOP(5,TERMS(Subject))"/>
	<groups>
		<group>
			<field name="Subject">dilemmas</field>
			<metric>14</metric>
		</group>
		<group>
			<field name="Subject">unchary</field>
			<metric>14</metric>
		</group>
		<group>
			<field name="Subject">sundae</field>
			<metric>14</metric>
		</group>
		<group>
			<field name="Subject">tillage</field>
			<metric>14</metric>
		</group>
		<group>
			<field name="Subject">infernal</field>
			<metric>14</metric>
		</group>
	</groups>
	<summary>82</summary>
	<totalgroups>85</totalgroups>
</results>

In JSON:

{"results": {
	"aggregate": {
		"group": "TOP(5,TERMS(Subject))",
		"metric": "COUNT(*)",
		"query": "Sender.MessageAddress.Person.Name:Support"
	},
	"groups": [
		{"group": {
			"field": {"Subject": "dilemmas"},
			"metric": "14"
		}},
		{"group": {
			"field": {"Subject": "unchary"},
			"metric": "14"
		}},
		{"group": {
			"field": {"Subject": "sundae"},
			"metric": "14"
		}},
		{"group": {
			"field": {"Subject": "tillage"},
			"metric": "14"
		}},
		{"group": {
			"field": {"Subject": "infernal"},
			"metric": "14"
		}}
	],
	"summary": "82",
	"totalgroups": "85"
}}

If the terms "sundae" and “tillage” were considered uninteresting, they could be eliminated from the results by listing them as stop terms in a second parenthetical parameter to the TERMS function:

GET /Msgs/Message/_aggregate?m=COUNT(*)&q=Sender.MessageAddress.Person.Name='Quest Support'
	&f=TOP(5,TERMS(Subject,(sundae tillage)))

TOP and BOTTOM Functions

By default, all group values are returned at each grouping level, and the groups are returned in ascending order of the grouping field value. The TOP and BOTTOM functions can be used to return groups in the order of the metric value. Optionally, they can also be used to limit the number of groups returned to the highest or lowest metric values. The TOP and BOTTOM functions wrap a grouping field expression and specify a limit parameter. For example:

GET /Msgs/Message/_aggregate?m=SUM(Size)&f=TOP(3,InternalRecipients.Person.Name)

The first parameter to TOP/BOTTOM is the limit value; the second parameter is the grouping field expression. This aggregate query sums the Size field of message objects, grouped by internal recipient names, but it only returns the groups with the three highest SUM values. Typical results for the example above in XML:

<results>
	<aggregate metric="SUM(Size)" group="TOP(3,InternalRecipients.Person.Name)"/>
	<totalobjects>6030</totalobjects>
	<summary>190643320</summary>
	<totalgroups>836</totalgroups>
	<groups>
		<group>
			<metric>103198352</metric>
			<field name="InternalRecipients.Person.Name">(null)</field>
		</group>
		<group>
			<metric>20060808</metric>
			<field name="InternalRecipients.Person.Name">Marc Bourauel</field>
		</group>
		<group>
			<metric>16798901</metric>
			<field name="InternalRecipients.Person.Name">Nina Cantauw</field>
		</group>
	</groups>
</results>

In JSON:

{"results": {
	"aggregate": {
		"metric": "SUM(Size)",
		"group": "TOP(3,InternalRecipients.Person.Name)"
	},
	"totalobjects": "6030",
	"summary": "190643320",
	"totalgroups": "836",
	"groups": [
		{"group": {
		"metric": "103198352",
		"field": {"InternalRecipients.Person.Name": "(null)"}
		}},
		{"group": {
		"metric": "20060808",
		"field": {"InternalRecipients.Person.Name": "Marc Bourauel"}
		}},
		{"group": {
		"metric": "16798901",
		"field": {"InternalRecipients.Person.Name": "Nina Cantauw"}
		}}
	]
}}

The BOTTOM parameter works the same way but returns the groups with the lowest metric values. When either the TOP or BOTTOM function is used, the total number of groups that were actually computed is returned in the element totalgroups, as shown above.

When the limit parameter is 0, all groups are returned, but they are returned in metric-computation order. Using 0 for the limit parameter essentially means unlimited.

When the aggregate query has multiple grouping fields, a TOP or BOTTOM function can be used with each grouping field. In secondary groups, TOP and BOTTOM return groups whose metric values are computed relative to their parent groups. Below is an example aggregate query 2-level grouping using TOP for the outer level and BOTTOM for the inner level:

GET /Msgs/Message/_aggregate?m=COUNT(*)&f=TOP(3,Sender.Person),BOTTOM(2,TRUNCATE(SendDate,DAY))`

FIRST and LAST Functions

The FIRST and LAST functions are similar to the TOP and BOTTOM functions except that a limited set of groups are returned based on the group names instead of the metric values.

GET /Email/Person/_aggregate?q=*&m=COUNT(*)&f=FIRST(10,LastName)&range=0

This query returns the first 10 LastName groups alphabetically. LAST returns the last groups (descending sort). This should help with aggregate queries that must be “paged”.

TRUNCATE Function

The TRUNCATE function truncates a timestamp field to a given granularity, yielding a value that can be used as a grouping field. Before the timestamp field is truncated, the TRUNCATE function can optionally shift the value to another time first. The syntax for the function is:

TRUNCATE(<timestamp field>, <precision> [, <time shift>])

For example:

GET /Msgs/Message/_aggregate?m=COUNT(*)&f=TRUNCATE(SendDate,DAY,GMT-2)
	&q=SendDate >= "2010-07-17"

This query finds all messages whose SendDate is >= "2010-07-17". For each one, it subtracts 2 hours from the SendDate value and then truncates ("rounds down") to the nearest day. The count of all objects for each modified timestamp is computed in a separate group.

The value must be one of the following mnemonics:

Precision Meaning
SECOND The milliseconds component of the timestamp is set to 0.
MINUTE The milliseconds and seconds components are set to 0.
HOUR The milliseconds, seconds, and minutes components are set to 0.
DAY All time components are set to 0.
WEEK All time components are set to 0, and the date components are set to the Monday of the calendar week in which the timestamp falls (as defined by ISO 8601). For example 2010-01-02 is truncated to the week 2009-12-28.
MONTH All time components are set to 0, and the day component is set to 1.
QUARTER All time components are set to 0, the day component is set to 1, and the month component is rounded "down" to January, April, July, or October.
YEAR All time components are set to 0 and the day and month components are set to 1.

The optional parameter adds or subtracts a specific amount to each object’s timestamp value before truncating it to the requested granularity. Optionally, the parameter can be quoted in single or double quotes. The syntax of the parameter is:

<timezone> | <GMT offset>

Where uses the same format as the NOW function:

GMT<sign><hours>[:<minutes>]

The meaning of each format is summarized below:

  • <timezone>: A timezone abbreviation (e.g., "PST") or name (e.g., "America/Los_Angeles") can be given. Each object’s timestamp value is assumed to be in GMT (UTC) time and adjusted by the necessary amount to reflect the equivalent value in the given timezone. The allowable values for a abbreviation or name are those recognized by the Java function java.util.TimeZone.getAvailableIDs().

  • GMT+<hour> or GMT-<hour>: The term GMT followed by a plus or minus signed followed by an integer hour value adjust each object’s timestamp up or down by the given number of hours.

  • GMT+<hour>:<minute> or GMT-<hour>:<minute>: This is the same as the previous format except that each object’s timestamp is adjusted up or down by the given hour and minute value.

Note that in the GMT versions, the sign (‘+’ or ‘-‘) is required, and in URIs, the ‘+’ sign must be escaped as %2B.

The timestamp field passed to the TRUNCATE function can belong to the perspective table, or it can be at the end of a field path (e.g., TRUNCATE(Messages.SendDate)).

When a grouping field uses the TRUNCATE function, the truncated value is used for the field value within each group. An example in XML is shown below:

<results>
	<aggregate query="SendDate >= 2010-07-17" metric="COUNT(*)" group="TRUNCATE(SendDate,HOUR)"/>
	<groups>
		<group>
			<field name="SendDate">2010-07-17 00:00:00</field>
			<metric>5</metric>
		</group>
		<group>
			<field name="SendDate">2010-07-17 01:00:00</field>
			<metric>4</metric>
		</group>
		<group>
			<field name="SendDate">2010-07-17 02:00:00</field>
			<metric>4</metric>
		</group>
		...
	</groups>
	<summary>6030</summary>
</results>

In JSON:

{"results": {
	"aggregate": {
		"group": "TRUNCATE(SendDate,HOUR)",
		"metric": "COUNT(*)",
		"query": "SendDate >= 2010-07-17"
	},
	"groups": [
		{"group": {
			"field": {"SendDate": "2010-07-17 00:00:00"},
			"metric": "5"
		}},
		{"group": {
			"field": {"SendDate": "2010-07-17 01:00:00"},
			"metric": "4"
		}},
		{"group": {
			"field": {"SendDate": "2010-07-17 02:00:00"},
			"metric": "4"
		}},
		...
	]
}}

UPPER and LOWER Functions

When a text field is used as a grouping field in an aggregate query, actual field values are used to form each group value. For example, in this query:

.../_aggregate?m=COUNT(*)&f=Extension&...

If the field Extension has identical but differently-cased values such as ".jpg" and “.JPG”, a group is created for each one and the metric function (COUNT) is applied to each one.

When a text field is used as the grouping field, values can be case-normalized as they are used as grouping field values. This can be done with the UPPER and LOWER functions, which translate each text field accordingly as it is sorted into aggregated groups. Example:

.../_aggregate?m=COUNT(*)&f=LOWER(Extension)&...

This causes the Extension field to be down-cased before it is sorted into its metric group. Hence, both values ".jpg" and “.JPG” are counted in a single group.

WHERE Function

The WHERE function can be used to provide filtering on a path used in a grouping expression. Most importantly, it can be used for multi-clause expressions that are bound to the same objects. To illustrate why the WHERE clause is needed and how it is used, here’s an example.

Suppose we want to count messages grouped by the domain name of each message’s recipients, but we only want recipients that received the message after a certain date and the recipient’s address is considered external. As an example, this aggregate query won’t work:

// Doesn’t do what we want
GET /Msgs/Message/_aggregate?m=COUNT(*)
	&f=Recipients.MessageAddress.Domain.Name
	&q=Recipients.ReceiptDate > "2014-01-01" AND Recipients.MessageAddress.Domain.IsInternal=false

This query doesn’t work because it selects messages for which at least one recipient’s ReceiptDate is > "2014-01-01", and at least one recipient has an external domain. Every such message is then counted in all of its Recipients.MessageAddress.Domain.Name values, even for those that don’t really qualify.

Using the WHERE filter for query expressions, we could bind the two query clauses to the same Recipients instances. But this query still doesn’t work:

// Still not what we want
GET /Msgs/Message/_aggregate?m=COUNT(*)
	&f=Recipients.MessageAddress.Domain.Name
	&q=Recipients.WHERE(ReceiptDate > "2014-01-01" AND MessageAddress.Domain.IsInternal=false)

This causes the correct objects to be selected, but it still counts them in all Recipients.MessageAddress.Domain.Name groups, not just those found with the query expression.

For this scenario, we can use the WHERE function in the grouping parameter instead of the query parameter. In a grouping parameter, the WHERE function filters out group values we don’t want. And, when the object selection criteria lies solely in the choice of groups, we don’t need a separate query parameter. The solution to the previous problem can be expressed as follows:

GET /Msgs/Message/_aggregate?m=COUNT(*)
	&f=Recipients.WHERE(ReceiptDate > "2014-01-01" AND 
		MessageAddress.Domain.IsInternal=false).MessageAddress.Domain.Name

The grouping field is still Recipients.MessageAddress.Domain.Name, but the WHERE function inserted after Recipients filters values used for grouping. The first field in each WHERE clause (ReceiptDate and MessageAddress) must be members of the same table as Recipients, thereby filtering the recipients in some manner. In this case, only recipients whose ReceiptDate is > "2014-01-01" and whose MessageAddress.Domain.IsInternal is false. Groups are created by domains of recipients that match those constraints, and only objects within those group values are counted.

But wait! It gets better! The WHERE function can be applied to multiple components of the same grouping path as long as each subquery is qualified to the path component to which it is attached. Exploiting this, we can factor out the redundant specification of MessageAddress.Domain with this shorter but equivalent expression:

GET /Msgs/Message/_aggregate?m=COUNT(*)
	&f=Recipients.WHERE(ReceiptDate > "2014-01-01").Address.Domain.WHERE(IsInternal=false).Name

Neat, yes?

Clone this wiki locally