Skip to content

Grouping Parameter (OLAP)

JoeWinter edited this page Feb 26, 2015 · 10 revisions

[Table of Contents](https://github.com/dell-oss/Doradus/wiki/OLAP Databases: Table-of-Contents) | [Previous](https://github.com/dell-oss/Doradus/wiki/Metric-Parameter for Aggregate Queries (OLAP)) | Next
OLAP Aggregate Queries: Grouping Parameter Overview


When a grouping parameter is provided, it causes the aggregate query to compute multiple values, one per *group value* as defined by the grouping expression. A wide range of grouping expressions are allowed as described in the following sections.

Global Aggregates: No Grouping Parameter

Without a grouping parameter, an aggregate query returns a single value: the metric function computed across all selected objects. Consider the following aggregate query URI REST command:

GET /Email/Message/_aggregate?m=MAX(Size)&range=0

This aggregate query returns the largest Size value among all messages in all shards. The response in XML is:

<results>
	<aggregate metric="MAX(Size)"/>
	<value>16796009</value>
</results>

In JSON:

{"results": {
	"aggregate": {"metric": "MAX(Size)"},
	"value": "16796009"
}}

As shown, an aggregate element lists the parameters used for the aggregate query. In this case, only a metric parameter was provided. For a global aggregate, the metric value is provided in the value element.

When the grouping field is multi-valued with respect to the perspective table, the metric is applied across all values for each perspective object. For example, in this query:

GET /Email/Message/_aggregate?m=COUNT(Tags)&range=0

If Tags is an MV scalar, all values for each message are counted, so the total returned may be more than the number of objects. Furthermore, an object related to a perspective object may be processed more than once. Consider this query:

GET /Email/Message/_aggregate?m=COUNT(ExternalRecipients.MessageAddress.Domain)&shards=2014-06-01

Some messages are likely to have multiple external recipients linked to the same domain. Therefore, ExternalRecipients.MessageAddress.Domain will count the same domain multiple times for those messages.

Single-level Grouping

When the grouping parameter consists of a single grouping field, objects are divided into sets based on the distinct values found for the grouping field. A separate metric value is computed for each group. For example, this aggregate query uses a single-valued scalar as the grouping field:

GET /Email/Message/_aggregate?m=MAX(Size)&f=Tags&range=0

The Tags field logically partitions objects into groups: one for each field value. Each object is included in each group for which is has a Tags value. If an object has no Tags value, it is placed in a (null) group. The maximum Size is then computed for each group. A typical result in XML is shown below:

<results>
	<aggregate metric="MAX(Size)" group="Tags"/>
	<totalobjects>6030</totalobjects>
	<summary>16796009</summary>
	<groups>
		<group>
			<metric>4875</metric>
			<field name="Tags">(null)</field>
		</group>
		<group>
			<metric>16796009</metric>
			<field name="Tags">AfterHours</field>
		</group>
		<group>
			<metric>16796009</metric>
			<field name="Tags">Customer</field>
		</group>
	</groups>
</results>

In JSON:

{"results": {
	"aggregate": {"metric": "MAX(Size)", "group": "Tags"},
	"totalobjects": "6030",
	"summary": "16796009",
	"groups": [
		{"group": {
			"metric": "4875",
			"field": {"Tags": "(null)"}
		}},
		{"group": {
			"metric": "16796009",
			"field": {"Tags": "AfterHours"}
		}},
		{"group": {
			"metric": "16796009",
			"field": {"Tags":"Customer"}
		}}
	]
}}

For grouped aggregate queries, the results element contains a groups element, which contains one group for each group value. Each group contains the field name and value for that group and the corresponding metric value. The totalobjects value computes the number of objects selected in the computation. The summary value computes the metric value across all selected objects independent of groups. For the AVERAGE function, this provides the true average, not the average of averages.

Grouping Field Aliases

By default, aggregate query group results use the fully-qualified path name of each grouping field. You can shorten the output result by using an alias for each grouping field. The alias name is used instead of the fully-qualified name. For example:

GET /Email/Message/_aggregate?m=COUNT(*)&f=TOP(3,Sender.Person.Name)&range=0

This query produces a result such as the following:

<results>
	<aggregate metric="COUNT(*)" group="TOP(3,Sender.Person.Name)"/>
	<totalobjects>6030</totalobjects>
	<summary>6030</summary>
	<totalgroups>186</totalgroups>
	<groups>
		<group>
			<metric>5256</metric>
			<field name="Sender.Person.Name">(null)</field>
		</group>
		<group>
			<metric>82</metric>
			<field name="Sender.Person.Name">Quest Support</field>
		</group>
		<group>
			<metric>80</metric>
			<field name="Sender.Person.Name">spb_setupbuilder</field>
		</group>
	</groups>
</results>

The fully-qualified field name Sender.Person.Name is used for the field parameter in each group element. An alias can be substituted for a grouping field by appending AS *name* or .AS(*name*) to the grouping field. The two syntaxes are equivalent. The AS function’s parameter can be an unquoted term or a quoted string. Example:

GET /Email/Message/_aggregate?m=COUNT(*)&f=TOP(3,Sender.Person.Name) AS Name&range=0
GET /Email/Message/_aggregate?m=COUNT(*)&f=TOP(3,Sender.Person.Name).AS(Name)&range=0

With either form, the alias Name is used in the output as shown below:

<results>
	<aggregate metric="COUNT(*)" group="TOP(3,Sender.Person.Name)"/>
	<totalobjects>6030</totalobjects>
	<summary>6030</summary>
	<totalgroups>186</totalgroups>
	<groups>
		<group>
			<metric>5256</metric>
			<field name="Name">(null)</field>
		</group>
		<group>
			<metric>82</metric>
			<field name="Name">Quest Support</field>
		</group>
		<group>
			<metric>80</metric>
			<field name="Name">spb_setupbuilder</field>
		</group>
	</groups>
</results>

When many groups are returned and/or the grouping field name is long, the alias name can significantly shorten the output results. An alias name can also improve the identity of the groups.

Multi-level Grouping

The grouping parameter can list multiple grouping expressions to form multi-level grouping. Each grouping expression must be a path from the perspective table to a link or scalar field. Below is an example multi-level aggregate query:

GET /Email/Message/_aggregate?m=COUNT(*)&f=TRUNCATE(SendDate,DAY),Tags&range=0

In this example, TRUNCATE(SendDate,DAY) is the top-level grouping field and Tags is the second-level grouping field. The query creates groups based on the cross-product of all grouping field values, and a metric is computed for each combination for which at least one object has a value. A perspective object is included in the metric computation for each group for which it has actual values. A summary value is computed for each non-leaf group, and a totalobjects value is computed for the top-level group. An example result in XML is shown below:

<results>
	<aggregate metric="COUNT(*)" group="TRUNCATE(SendDate,DAY),Tags"/>
	<totalobjects>6030</totalobjects>
	<summary>6030</summary>
	<groups>
		<group>
			<summary>4752</summary>
			<field name="SendDate">2010-07-17 00:00:00</field>
			<groups>
				<group>
					<metric>1</metric>
					<field name="Tags">(null)</field>
				</group>
				<group>
					<metric>4751</metric>
					<field name="Tags">AfterHours</field>
				</group>
				<group>
					<metric>1524</metric>
					<field name="Tags">Customer</field>
				</group>
			</groups>
		</group>
		<group>
			<summary>1278</summary>
			<field name="SendDate">2010-07-18 00:00:00</field>
			<groups>
				<group>
					<metric>1278</metric>
					<field name="Tags">AfterHours</field>
				</group>
				<group>
					<metric>700</metric>
					<field name="Tags">Customer</field>
				</group>
			</groups>
		</group>
	</groups>
</results>

The same response in JSON:

{"results": {
	"aggregate": {"metric": "COUNT(*)", "group": "TRUNCATE(SendDate,DAY),Tags"},
	"totalobjects": "6030",
	"summary": "6030",
	"groups": [
		{"group": {
			"summary": "4752",
			"field": {"SendDate":"2010-07-17 00:00:00"},
			"groups": [
				{"group": {
					"metric": "1",
					"field": {"Tags": "(null)"}
				}},
				{"group": {
					"metric": "4751",
					"field": {"Tags":"AfterHours"}
				}},
				{"group": {
					"metric": "1524",
					"field": {"Tags":"Customer"}
				}}
			]
		}},
		{"group": {
			"summary": "1278",
			"field": {"SendDate": "2010-07-18 00:00:00"},
			"groups": [
				{"group": {
					"metric": "1278",
					"field": {"Tags":"AfterHours"}
				}},
				{"group": {
					"metric": "700",
					"field": {"Tags":"Customer"}
				}}
			]
		}}
	]
}}

Each non-leaf group has an inner groups element containing its corresponding lower-level group elements and a summary element that provides a group-level metric value. Leaf-level groups have a metric element. This structure is recursive if there are more than two grouping levels.

Group by Timestamp Subfield

Doradus OLAP supports grouping by timestamp subfields. For example:

.../Message/_aggregate?q=*&m=COUNT(*)&f=TOP(3,SendDate.HOUR)&range=0

SendDate is a timestamp field belonging to the Message table; the subfield HOUR extracts the hour-of-day component of the timestamp, which is a value between 0 and 23. Since SendDate.HOUR is wrapped in a TOP function, the grouping parameter produces the 3 top-most hours. In other words, this aggregate query finds the 3 hours of the day in which the most messages are sent.

Grouping by a timestamp subfield is different than using the TRUNCATE function. Grouping by a subfield extracts a timestamp component, so there will never be more values than the component allows (e.g., 12 months in a year, 24 hours in a day). In comparison, TRUNCATE(SendDate,HOUR) rounds down (truncates) each SendDate value to its hour precision, but it will produce a value for every year-month-day-hour combination found among selected objects.

Special Grouping Functions

This section describes special functions that provide enhanced behavior for the grouping parameter.

BATCH Function

The BATCH function divides a scalar field’s values into specific ranges. Each range becomes a grouping field value, and objects contribute to the metric computation for the ranges for which it has values. The BATCH function’s first value must be a scalar field. The remaining values must be literal values compatible with the field’s type (text, timestamp, or numeric), and they must be given in ascending order. Example:

GET /Email/Message/_aggregate?m=COUNT(*)&f=BATCH(Size,100,1000,10000,100000)&shards=2014-06-01

This query counts messages grouped by specific ranges of the Size field from the shard named 2014-06-01. The ranges are divided at the given literal values: 100, 1000, 10000, and 100000. The lowest value implicitly creates an extra less than group; the highest value is open-ended and creates a greater than or equal to group. The query in the example above defines the following 5 groups:

Group 1: Size < 100
Group 2: Size >= 100 AND Size < 1000
Group 3: Size >= 1000 AND Size < 10000
Group 4: Size >= 10000 AND Size < 100000
Group 5: Size >= 100000

The example above returns a result such as the following:

<results>
	<aggregate metric="COUNT(Size)" group="BATCH(Size,100,1000,10000,100000)"/>
	<groups>
		<group>
			<field name="Size">&lt;100</field>
			<metric>0</metric>
		</group>
		<group>
			<field name="Size">100-1000</field>
			<metric>125</metric>
		</group>
		<group>
			<field name="Size">1000-10000</field>
			<metric>4651</metric>
		</group>
		<group>
			<field name="Size">10000-100000</field>
			<metric>1149</metric>
		</group>
		<group>
			<field name="Size">&gt;=100000</field>
			<metric>105</metric>
		</group>
	</groups>
	<summary>6030</summary>
</results>

In JSON:

{"results": {
	"aggregate": {"group": "BATCH(Size,100,1000,10000,100000)", "metric": "COUNT(Size)"},
	"groups": [
		{"group": {
			"field": {"Size": "<100"},
			"metric": "0"
		}},
		{"group": {
			"field": {"Size": "100-1000"},
			"metric": "125"
		}},
		{"group": {
			"field": {"Size": "1000-10000"},
			"metric": "4651"
		}},
		{"group": {
			"field": {"Size":"10000-100000"},
			"metric": "1149"
		}},
		{"group": {
			"field": {"Size": ">=100000"},
			"metric": "105"
		}}
	],
	"summary":"6030"
}}

As shown above in the <100 group, if no selected object has a value that falls into one of the specified groups, that group is still returned: the group’s metric is 0 for the COUNT function and empty for all other metric functions. As with all grouped aggregate queries, a summary value is returned that applies the metric function across all groups.

INCLUDE and EXCLUDE Functions

In an aggregate query, normally all values of a scalar grouping field are used to create groups. For example:

GET /Email/Message/_aggregate?f=Tags&...

All values of the Tags field for selected objects are used to create grouping fields. To eliminate specific values from being used for grouping—without affecting the selection of the owning object—the EXCLUDE function can be used:

GET /Email/Message/_aggregate?f=Tags.EXCLUDE('Confidential, 'Internal')&...

When the grouping field is a text field, the values passed to the EXCLUDE function are whole, case-insensitive values—not terms—and must be enclosed in quotes. In the example above, groups matching the value Confidential or Internal or case variations of these are excluded. The values used for text scalars can contain wildcards ? and *. For example:

GET /Email/Message/_aggregate?f=Tags.EXCLUDE('*sam?')&...

This aggregate query excludes all groups that end with the sam*x*, where x is any letter, or case variations of this sequence.

To generate only groups that match specific scalar values—without affecting the selection of the owning object—the INCLUDE function can be used:

GET /Email/Message/_aggregate?f=Tags.INCLUDE('Confidential, 'Internal')&...

The only groups generated are those matching Confidential and Internal and case variations of these; all other values are skipped. Again, when the grouping field is a text scalar, the value must be enclosed in quotes, and it can contain wildcards ? and *.

The values passed to INCLUDE and EXCLUDE must be compatible with the corresponding scalar type field: integers for integer or long fields, Booleans for boolean fields, etc. Additionally, the keyword NULL (uppercase) can be used to include or exclude the (null) group normally generated when at least one object has a null value for the grouping field. Example:

GET /Email/Message/_aggregate?f=Tags.EXCLUDE(NULL)&...

INCLUDE and EXCLUDE can be used in the same grouping expression. Because the functions are processed in order from left-to-right, it is most useful to use INCLUDE first, to select a overly-broad set of values, and then use EXCLUDE to prune-out unwanted values. INCLUDE and EXCLUDE can also be used in conjunction with other grouping functions, as shown in this example:

GET /Email/Person/_aggregate?m=COUNT(*)&f=TOP(10,Name.INCLUDE(aaron*).EXCLUDE(*Murphy))&range=0

This query returns a COUNT of the TOP 10 most-used Name values, but it only INCLUDEs names that begin with “aaron”. Within those names, it EXCLUDEs names that end with “murphy”.

TERMS Function

Groups can be created from the terms used within a specific field. The general format of the TERMS function is:

TERMS(<field name> [, (<stop term 1> <stop term 2> ...)])

Any predefined scalar field can be used, but TERMS is most effective with text fields. The optional stop term list is a list of terms, enclosed in parentheses, that are excluded from the unique set of terms found within the specified field.

For example, the following request fetches the COUNT of messages grouped by terms found within the Subject field for a particular sender. For brevity, the TERMS function is wrapped by the TOP function to limit the results to the five groups with the highest counts:

GET /Email/Message/_aggregate?m=COUNT(*)&q=Sender.MessageAddress.Person.Name:Support
&f=TOP(5,TERMS(Subject))&shards=2014-06-01

Similar to the BATCH function, the TERMS function creates dynamic groups from a text field based on the terms it uses. To do this, as objects matching the query parameter (if any) are found, the field passed to TERMS is parsed into alphanumeric terms, and a group is created for each unique term. Each contributes to the group metric computation for each term it contains. If a term appears multiple times within a field (e.g., "plan for a plan"), the object is only counted once. An example result in XML is shown below:

<results>
	<aggregate query="Sender.MessageAddress.Person.Name:Support" metric="COUNT(*)"
		group="TOP(5,TERMS(Subject))"/>
	<groups>
		<group>
			<field name="Subject">dilemmas</field>
			<metric>14</metric>
		</group>
		<group>
			<field name="Subject">unchary</field>
			<metric>14</metric>
		</group>
		<group>
			<field name="Subject">sundae</field>
			<metric>14</metric>
		</group>
		<group>
			<field name="Subject">tillage</field>
			<metric>14</metric>
		</group>
		<group>
			<field name="Subject">infernal</field>
			<metric>14</metric>
		</group>
	</groups>
	<summary>82</summary>
	<totalgroups>85</totalgroups>
</results>

In JSON:

{"results": {
	"aggregate": {
		"group": "TOP(5,TERMS(Subject))",
		"metric": "COUNT(*)",
		"query": "Sender.MessageAddress.Person.Name:Support"
	},
	"groups": [
		{"group": {
			"field": {"Subject": "dilemmas"},
			"metric": "14"
		}},
		{"group": {
			"field": {"Subject": "unchary"},
			"metric": "14"
		}},
		{"group": {
			"field": {"Subject": "sundae"},
			"metric": "14"
		}},
		{"group": {
			"field": {"Subject": "tillage"},
			"metric": "14"
		}},
		{"group": {
			"field": {"Subject": "infernal"},
			"metric": "14"
		}}
	],
	"summary": "82",
	"totalgroups": "85"
}}

If the terms "sundae" and “tillage” were considered uninteresting, they could be eliminated from the results by listing them as stop terms in a second parenthetical parameter to the TERMS function:

GET /Email/Message/_aggregate?m=COUNT(*)&q=Sender.MessageAddress.Person.Name='Quest Support'
&f=TOP(5,TERMS(Subject,(sundae tillage)))&shards=2014-06-01

TOP and BOTTOM Functions

By default, all group values are returned at each grouping level, and the groups are returned in ascending order of the grouping field value. The TOP and BOTTOM functions can be used to return groups in the order of the metric value. Optionally, they can also be used to limit the number of groups returned to the highest or lowest metric values. The TOP and BOTTOM functions wrap a grouping field expression and specify a limit parameter. For example:

GET /Email/Message/_aggregate?m=SUM(Size)&f=TOP(3,InternalRecipients.Person.Name)&range=0

The first parameter to TOP/BOTTOM is the limit value; the second parameter is the grouping field expression. This aggregate query sums the Size field of message objects, grouped by internal recipient names, but it only returns the groups with the three highest SUM values. Typical results for the example above in XML:

<results>
	<aggregate metric="SUM(Size)" query="*" group="TOP(3,InternalRecipients.Person.Name)"/>
	<totalobjects>6030</totalobjects>
	<summary>190643320</summary>
	<totalgroups>836</totalgroups>
	<groups>
		<group>
			<metric>97744099</metric>
			<field name="InternalRecipients.Person.Name">(null)</field>
		</group>
		<group>
			<metric>20060808</metric>
			<field name="InternalRecipients.Person.Name">Marc Bourauel</field>
		</group>
		<group>
			<metric>16798901</metric>
			<field name="InternalRecipients.Person.Name">Nina Cantauw</field>
		</group>
	</groups>
</results>

In JSON:

{"results": {
	"aggregate": {
		"metric": "SUM(Size)",
		"query": "*",
		"group": "TOP(3,InternalRecipients.Person.Name)"
	},
	"totalobjects": "6030",
	"summary": "190643320",
	"totalgroups": "836",
	"groups": [
		{"group": {
			"metric": "97744099",
			"field": {"InternalRecipients.Person.Name": "(null)"}
		}},
		{"group": {
			"metric": "20060808",
			"field": {"InternalRecipients.Person.Name": "Marc Bourauel"}
		}},
		{"group": {
			"metric": "16798901",
			"field": {"InternalRecipients.Person.Name": "Nina Cantauw"}
		}}
	]
}}

The BOTTOM parameter works the same way but returns the groups with the lowest metric values. When either the TOP or BOTTOM function is used, the total number of groups that were actually computed is returned in the element totalgroups, as shown above.

Some numeric metric expressions can produce a result that is not a valid number, such as when division by zero occurs. Such non-numeric values include "not-a-number" (NaN), positive infinity, and negative infinity. All such invalid numeric results sort to the end (last group) for both TOP and BOTTOM.

When the limit parameter is 0, all groups are returned, but they are returned in metric-computation order. Using 0 for the limit parameter essentially means unlimited.

When the aggregate query has multiple grouping fields, a TOP or BOTTOM function can be used with each grouping field. In secondary groups, TOP and BOTTOM return groups whose metric values are computed relative to their parent groups. Below is an example aggregate query 2-level grouping using TOP for the outer level and BOTTOM for the inner level:

GET /Email/Message/_aggregate?m=COUNT(*)&f=TOP(3,Sender.Person),BOTTOM(2,TRUNCATE(SendDate,DAY))
&shards=2014-06-01

TRUNCATE Function

The TRUNCATE function truncates a timestamp field to a given granularity, yielding a value that can be used as a grouping field. Before the timestamp field is truncated, the TRUNCATE function can optionally shift the value to another time first. The syntax for the function is:

TRUNCATE(<timestamp field>, <precision> [, <time shift>])

For example:

GET /Email/Message/_aggregate?m=COUNT(*)&f=TRUNCATE(SendDate,DAY,GMT-2)
&q=SendDate >= "2010-07-17"&shards=2014-06-01

This query finds all messages whose SendDate is >= "2010-07-17" in the shard named 2014-06-01. For each one, it subtracts 2 hours from the SendDate value and then truncates ("rounds down") to the nearest day. The count of all objects for each modified timestamp is computed in a separate group.

The value must be one of the following mnemonics:

Precision Meaning
SECOND The milliseconds component of the timestamp is set to 0.
MINUTE The milliseconds and seconds components are set to 0.
HOUR The milliseconds, seconds, and minutes components are set to 0.
DAY All time components are set to 0.
WEEK All time components are set to 0, and the date components are set to the Monday of the calendar week in which the timestamp falls (as defined by ISO 8601). For example 2010-01-02 is truncated to the week 2009-12-28.
MONTH All time components are set to 0, and the day component is set to 1.
QUARTER All time components are set to 0, the day component is set to 1, and the month component is rounded "down" to January, April, July, or October.
YEAR All time components are set to 0 and the day and month components are set to 1.

The optional parameter adds or subtracts a specific amount to each object’s timestamp value before truncating it to the requested granularity. Optionally, the parameter can be quoted in single or double quotes. The syntax of the parameter is:

<timezone> | <GMT offset>

Where uses the same format as the NOW function:

GMT<sign><hours>[:<minutes>]

The meaning of each format is summarized below:

  • <timezone>: A timezone abbreviation (e.g., "PST") or name (e.g., "America/Los_Angeles") can be given. Each object’s timestamp value is assumed to be in GMT (UTC) time and adjusted by the necessary amount to reflect the equivalent value in the given timezone. The allowable values for a abbreviation or name are those recognized by the Java function java.util.TimeZone.getAvailableIDs().

  • GMT+<hour> or GMT-<hour>: The term GMT followed by a plus or minus signed followed by an integer hour value adjust each object’s timestamp up or down by the given number of hours.

  • GMT+<hour>:<minute> or GMT-<hour>:<minute>: This is the same as the previous format except that each object’s timestamp is adjusted up or down by the given hour and minute value.

Note that in the GMT versions, the sign (‘+’ or ‘-‘) is required, and in URIs, the ‘+’ sign must be escaped as %2B.

The timestamp field passed to the TRUNCATE function can belong to the perspective table, or it can be at the end of a field path (e.g., TRUNCATE(Messages.SendDate)).

When a grouping field uses the TRUNCATE function, the truncated value is used for the field value within each group. An example in XML is shown below:

<results>
	<aggregate query="SendDate >= 2010-07-17" metric="COUNT(*)" group="TRUNCATE(SendDate,HOUR)"/>
	<groups>
		<group>
			<field name="SendDate">2010-07-17 00:00:00</field>
			<metric>5</metric>
		</group>
		<group>
			<field name="SendDate">2010-07-17 01:00:00</field>
			<metric>4</metric>
		</group>
		<group>
			<field name="SendDate">2010-07-17 02:00:00</field>
			<metric>4</metric>
		</group>
		...
	</groups>
	<summary>6030</summary>
</results>

In JSON:

{"results": {
	"aggregate": {
		"group": "TRUNCATE(SendDate,HOUR)",
		"metric": "COUNT(*)",
		"query": "SendDate >= 2010-07-17"
	},
	"groups": [
		{"group": {
			"field": {"SendDate": "2010-07-17 00:00:00"},
			"metric": "5"
		}},
		{"group": {
			"field": {"SendDate": "2010-07-17 01:00:00"},
			"metric": "4"
		}},
		{"group": {
			"field": {"SendDate": "2010-07-17 02:00:00"},
			"metric": "4"
		}},
		...
	]
}}

When the precision parameter is DAY, WEEK, MONTH, QUARTER, or YEAR, the timestamp values in group names are returned as dates only. For example:

GET /Email/Message/_aggregate?m=COUNT(*)&f=TRUNCATE(SendDate,WEEK)&range=0

This returns a response such as the following:

<results>
    <aggregate metric="COUNT(*)" query="*" group="TRUNCATE(SendDate,WEEK)"/>
    <totalobjects>6030</totalobjects>
    <summary>6030</summary>
    <totalgroups>2</totalgroups>
    <groups>
        <group>
            <metric>5972</metric>
            <field name="TRUNCATE(SendDate,WEEK)">2010-07-19</field>
        </group>
        <group>
            <metric>58</metric>
            <field name="TRUNCATE(SendDate,WEEK)">2010-07-26</field>
        </group>
    </groups>
</results>

UPPER and LOWER Functions

When a text field is used as a grouping field in an aggregate query, actual field values are used to form each group value. For example, in this query:

.../_aggregate?m=COUNT(*)&f=Extension&...

If the field Extension has identical but differently-cased values such as ".jpg" and “.JPG”, a group is created for each one and the metric function (COUNT) is applied to each one.

When a text field is used as the grouping field, values can be case-normalized as they are used as grouping field values. This can be done with the UPPER and LOWER functions, which translate each text field accordingly as it is sorted into aggregated groups. Example:

.../_aggregate?m=COUNT(*)&f=LOWER(Extension)&...

This causes the Extension field to be down-cased before it is sorted into its metric group. Hence, both values ".jpg" and “.JPG” are counted in a single group.

WHERE Function

The WHERE function can be used to provide filtering on a path used in a grouping expression. Most importantly, it can be used for multi-clause expressions that are bound to the same objects. To illustrate why the WHERE clause is needed and how it is used, here’s an example.

Suppose we want to count messages grouped by the domain name of each message’s recipients, but we only want recipients that received the message after a certain date and the recipient’s address is considered external. As an example, this aggregate query won’t work:

// Doesn’t do what we want
GET /Email/Message/_aggregate?m=COUNT(*)
	&f=Recipients.MessageAddress.Domain.Name
	&q=Recipients.ReceiptDate > "2014-01-01" AND Recipients.MessageAddress.Domain.IsInternal=false
	&shards=...

This query doesn’t work because it selects messages for which at least one recipient’s ReceiptDate is > "2014-01-01", and at least one recipient has an external domain. Every such message is then counted in all of its Recipients.MessageAddress.Domain.Name values, even for those that don’t really qualify.

Using the WHERE filter for query expressions, we could bind the two query clauses to the same Recipients instances. But this query still doesn’t work:

// Still not what we want
GET /Email/Message/_aggregate?m=COUNT(*)
	&f=Recipients.MessageAddress.Domain.Name
	&q=Recipients.WHERE(ReceiptDate > "2014-01-01" AND MessageAddress.Domain.IsInternal=false)
	&shards=...

This causes the correct objects to be selected, but it still counts them in all Recipients.MessageAddress.Domain.Name groups, not just those found with the query expression.

For this scenario, we can use the WHERE function in the grouping parameter instead of the query parameter. In a grouping parameter, the WHERE function filters out group values we don’t want. And, when the object selection criteria lies solely in the choice of groups, we don’t need a separate query parameter. The solution to the previous problem can be expressed as follows:

GET /Email/Message/_aggregate?m=COUNT(*)
	&f=Recipients.WHERE(ReceiptDate > "2014-01-01" AND 	
		MessageAddress.Domain.IsInternal=false).MessageAddress.Domain.Name
	&shards=...

The grouping field is still Recipients.MessageAddress.Domain.Name, but the WHERE function inserted after Recipients filters values used for grouping. The first field in each WHERE clause (ReceiptDate and MessageAddress) must be members of the same table as Recipients, thereby filtering the recipients in some manner. In this case, only recipients whose ReceiptDate is > "2014-01-01" and whose MessageAddress.Domain.IsInternal is false. Groups are created by domains of recipients that match those constraints, and only objects within those group values are counted.

But wait! It gets better! The WHERE function can be applied to multiple components of the same grouping path as long as each subquery is qualified to the path component to which it is attached. Exploiting this, we can factor out the redundant specification of MessageAddress.Domain with this shorter but equivalent expression:

GET /Email/Message/_aggregate?m=COUNT(*)
&f=Recipients.WHERE(ReceiptDate > "2014-01-01").Address.Domain.WHERE(IsInternal=false).Name
&shards=...

Outer WHERE Function

The WHERE filter usually follows a link name to select related objects connected via that link. However, a link path can begin when a WHERE filter, in which case it selects perspective objects. For example:

GET /Email/Person/_aggregate?range=0&m=COUNT(*)&f=WHERE(Department:sales).Office

In this query, the grouping field is Office. But because it is prefixed with a WHERE filter, the filter is applied to perspective objects (Person). Following an outer WHERE filter, the scope of the link path remains at the query perspective, consequently the subsequent link path or additional WHERE filters must be applicable to the query perspective.

When the grouping field is used with a grouping function such as BATCH, TERMS, or TOP, the outer WHERE field is always specified as a prefix to the grouping field. Examples:

GET /Email/Person/_aggregate?range=0&m=COUNT(*)&f=TOP(6,WHERE(Department:sales).Office)
GET /Email/Message/_aggregate?range=0&m=COUNT(*)
	&f=BATCH(WHERE(Tags=AfterHours).Size,1000,10000,100000)
GET /Email/Message/_aggregate?range=0&m=COUNT(*)&f=TOP(5,TERMS(WHERE(Size>100000).Subject))
GET /Email/Message/_aggregate?range=0&m=COUNT(*)
	&f=TERMS(WHERE(Size>100000).Subject),WHERE(Sender.Person.LastName:Cuthill).Tags

If present, the query parameter (&q) is used first to select objects, the total number of which is reflected in the <totalobjects> element in the query results. Each outer WHERE function is used to further filter objects passed to the calculations of the corresponding aggregate groups.

FIRST and LAST Functions

The FIRST and LAST functions are similar to the TOP and BOTTOM functions except that a limited set of groups are returned based on the group names instead of the metric values.

GET /Email/Person/_aggregate?q=*&m=COUNT(*)&f=FIRST(10,LastName)&range=0

This query returns the first 10 LastName groups alphabetically. LAST returns the last groups (descending sort). This should help with aggregate queries that must be “paged”.

Flattened Multi-level Grouping

An aggregate query with multi-level grouping is returned as a nested document. When the TOP, BOTTOM, FIRST, or LAST functions are used, the limit value is normally applied to the corresponding grouping level only. For example:

GET /Email/Message/_aggregate?range=0&m=COUNT(*)
    &f=TOP(10,Sender.Person.LastName),Sender.Person.FirstName

This query returns the first 10 Sender.Person.LastName groups, but it returns all FirstName subgroups for each LastName group. The query can be flattened by adding the REST parameter &flat=true:

GET /Email/Message/_aggregate?range=0&m=COUNT(*)
    &f=TOP(10,Sender.Person.LastName),Sender.Person.FirstName&flat=true

This returns only the top 10 LastName/FirstName group pairs. This allows a more predictable result size, e.g., when the results will be displayed in a web page.

####SETS Function The SETS function allows aggregate query groups to be created from arbitrary query expressions. The general syntax is:

SETS(<query 1> [AS <name>], <query 2> [AS <name>], ..., <query n> [AS <name>]) [AS <name>])

Where each is a DQL query expression that selects a set of objects from the perspective table. The queries can overlap such that multiple expressions can select the same objects. For example, the following query selects messages with overlapping Size values:

GET /Email/Message/_aggregate?range=0&m=COUNT(*)
&f=SETS(Size < 1000, Size > 100 AND Size < 100000, Size > 10000)

This returns a result such as the following:

<results>
    <aggregate metric="COUNT(*)" query="*" group="SETS(Size < 1000, Size > 100 AND Size < 100000, Size > 10000)"/>
    <totalobjects>6030</totalobjects>
    <summary>6030</summary>
    <totalgroups>3</totalgroups>
    <groups>
        <group>
            <metric>125</metric>
            <field name="SETS(Size < 1000, Size > 100 AND Size < 100000, Size > 10000)">Size IN {null - 1000}</field>
        </group>
        <group>
            <metric>5925</metric>
            <field name="SETS(Size < 1000, Size > 100 AND Size < 100000, Size > 10000)">(Size IN {100 - null}) AND (Size IN {null - 100000})</field>
        </group>
        <group>
            <metric>1254</metric>
            <field name="SETS(Size < 1000, Size > 100 AND Size < 100000, Size > 10000)">Size IN {10000 - null}</field>
        </group>
    </groups>
</results>

By default, each group uses the corresponding query expression in both the field name and value. The query expression text may not match the original query text exactly since expressions may be transformed internally. A custom field name can be used by using the AS syntax after the SETS function. Example:

GET /Email/Message/_aggregate?range=0&m=COUNT(*)
&f=SETS(Size < 1000, Size > 100 AND Size < 100000, Size > 10000) AS Sizes

This creates the simpler output format:

<results>
    <aggregate metric="COUNT(*)" query="*" group="SETS(Size < 1000, Size > 100 AND Size < 100000, Size > 10000) AS Sizes"/>
    <totalobjects>6030</totalobjects>
    <summary>6030</summary>
    <totalgroups>3</totalgroups>
    <groups>
        <group>
            <metric>125</metric>
            <field name="Sizes">Size IN {null - 1000}</field>
        </group>
        <group>
            <metric>5925</metric>
            <field name="Sizes">(Size IN {100 - null}) AND (Size IN {null - 100000})</field>
        </group>
        <group>
            <metric>1254</metric>
            <field name="Sizes">Size IN {10000 - null}</field>
        </group>
    </groups>
</results>

The field values can also be customized by using the AS syntax after each query expression. For example:

GET /Email/Message/_aggregate?range=0&m=COUNT(*)
&f=SETS(Size < 1000 AS Littles, Size > 100 AND Size < 100000 AS Middles,
Size > 10000 AS Biggies) AS Sizes

This creates an even simpler format:

<results>
    <aggregate metric="COUNT(*)" query="*" group="SETS(Size < 1000 AS Littles, Size > 100 AND Size < 100000 AS Middles, Size > 10000 AS Biggies) AS Sizes"/>
    <totalobjects>6030</totalobjects>
    <summary>6030</summary>
    <totalgroups>3</totalgroups>
    <groups>
        <group>
            <metric>125</metric>
            <field name="Sizes">Littles</field>
        </group>
        <group>
            <metric>5925</metric>
            <field name="Sizes">Middles</field>
        </group>
        <group>
            <metric>1254</metric>
            <field name="Sizes">Biggies</field>
        </group>
    </groups>
</results>

Note that if objects are selected by the aggregate query but not included in any query expressions defined by the SETS function, those objects are automatically included in a null set. For example:

GET /Email/Message/_aggregate? range=0&m=COUNT(*)
&f=SETS(Size < 10 AS BittyEmails, Size > 100000 AS MegaEmails) AS Sizes

Messages that don’t fit either of the two query expressions are included in a group named (null) as shown below:

<results>
    <aggregate metric="COUNT(*)" query="*" group="SETS(Size < 10 AS BittyEmails, Size > 100000 AS MegaEmails) AS Sizes"/>
    <totalobjects>6030</totalobjects>
    <summary>6030</summary>
    <totalgroups>3</totalgroups>
    <groups>
        <group>
            <metric>5925</metric>
            <field name="Sizes">(null)</field>
        </group>
        <group>
            <metric>0</metric>
            <field name="Sizes">BittyEmails</field>
        </group>
        <group>
            <metric>105</metric>
            <field name="Sizes">MegaEmails</field>
        </group>
    </groups>
</results>

As shown, the majority of messages (5925) did not match either query expression and were therefore included in the (null) group. No messages matches the BittyEmails query expression, hence the metric value for that group is 0. Note that the query expressions do not need to use the same fields. Each query expression can use any criteria to create its groups. Example:

GET /Email/Message/_aggregate?m=COUNT(*)&range=0
&f=SETS(Size < 1000, Tags=Customer, Sender.Person.Department:support)
Clone this wiki locally