/
aggregation.rst
847 lines (622 loc) · 25.6 KB
/
aggregation.rst
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
.. highlight:: psql
.. _aggregation:
===========
Aggregation
===========
When :ref:`selecting data <sql_dql_aggregation>` from CrateDB, you can use an
`aggregate function`_ to calculate a single summary value for one or more
columns.
For example::
cr> SELECT count(*) FROM locations;
+----------+
| count(*) |
+----------+
| 13 |
+----------+
SELECT 1 row in set (... sec)
Here, the :ref:`count(*) <aggregation-count-star>` function computes the result
across all rows.
Aggregate :ref:`functions <gloss-function>` can be used with the
:ref:`sql_dql_group_by` clause. When used like this, an aggregate function
returns a single summary value for each grouped collection of column values.
For example::
cr> SELECT kind, count(*) FROM locations GROUP BY kind;
+-------------+----------+
| kind | count(*) |
+-------------+----------+
| Galaxy | 4 |
| Star System | 4 |
| Planet | 5 |
+-------------+----------+
SELECT 3 rows in set (... sec)
.. TIP::
Aggregation works across all the rows that match a query or on all matching
rows in every distinct group of a ``GROUP BY`` statement. Aggregating
``SELECT`` statements without ``GROUP BY`` will always return one row.
.. rubric:: Table of contents
.. contents::
:local:
.. _aggregation-expressions:
Aggregate expressions
=====================
An *aggregate expression* represents the application of an :ref:`aggregate
function <aggregation-functions>` across rows selected by a query. Besides the
function signature, :ref:`expressions <gloss-expression>` might contain
supplementary clauses and keywords.
The synopsis of an aggregate expression is one of the following::
aggregate_function ( * ) [ FILTER ( WHERE condition ) ]
aggregate_function ( [ DISTINCT ] expression [ , ... ] ) [ FILTER ( WHERE condition ) ]
Here, ``aggregate_function`` is a name of an aggregate function and
``expression`` is a column reference, :ref:`scalar function <scalar-functions>`
or literal.
If ``FILTER`` is specified, then only the rows that met the
:ref:`sql_dql_where_clause` condition are supplied to the aggregate function.
The optional ``DISTINCT`` keyword is only supported by aggregate functions
that explicitly mention its support. Please refer to existing
:ref:`limitations <aggregation-limitations>` for further information.
The aggregate expression form that uses a ``wildcard`` instead of an
``expression`` as a function argument is supported only by the ``count(*)``
aggregate function.
.. _aggregation-functions:
Aggregate functions
===================
.. _aggregation-arbitrary:
``arbitrary(column)``
---------------------
The ``arbitrary`` aggregate function returns a single value of a column.
Which value it returns is not defined.
Its return type is the type of its parameter column and can be ``NULL`` if the
column contains ``NULL`` values.
Example::
cr> select arbitrary(position) from locations;
+---------------------+
| arbitrary(position) |
+---------------------+
| ... |
+---------------------+
SELECT 1 row in set (... sec)
::
cr> select arbitrary(name), kind from locations
... where name != ''
... group by kind order by kind desc;
+-...-------------+-------------+
| arbitrary(name) | kind |
+-...-------------+-------------+
| ... | Star System |
| ... | Planet |
| ... | Galaxy |
+-...-------------+-------------+
SELECT 3 rows in set (... sec)
An example use case is to group a table with many rows per user by ``user_id``
and get the ``username`` for every group, that means every user. This works as
rows with same ``user_id`` have the same ``username``. This method performs
better than grouping on ``username`` as grouping on number types is generally
faster than on strings. The advantage is that the ``arbitrary`` function does
very little to no computation as for example ``max`` aggregate function would
do.
.. _aggregation-any-value:
``any_value(column)``
---------------------
``any_value`` is an alias for :ref:`arbitrary <aggregation-arbitrary>`.
Example::
cr> select any_value(x) from unnest([1, 1]) t (x);
+--------------+
| any_value(x) |
+--------------+
| 1 |
+--------------+
SELECT 1 row in set (... sec)
.. _aggregation-array-agg:
``array_agg(column)``
---------------------
The ``array_agg`` aggregate function concatenates all input values into an
array.
::
cr> SELECT array_agg(x) FROM (VALUES (42), (832), (null), (17)) as t (x);
+---------------------+
| array_agg(x) |
+---------------------+
| [42, 832, null, 17] |
+---------------------+
SELECT 1 row in set (... sec)
.. SEEALSO::
:ref:`aggregation-string-agg`
.. _aggregation-avg:
``avg(column)``
---------------
The ``avg`` and ``mean`` aggregate function returns the arithmetic mean, the
*average*, of all values in a column that are not ``NULL``. It accepts all
numeric, timestamp and interval types as single argument. For ``numeric``
argument type the return type is ``numeric``, for ``interval`` argument type the
return type is ``interval`` and for other argument type the return type is
``double``.
Example::
cr> select avg(position), kind from locations
... group by kind order by kind;
+---------------+-------------+
| avg(position) | kind |
+---------------+-------------+
| 3.25 | Galaxy |
| 3.0 | Planet |
| 2.5 | Star System |
+---------------+-------------+
SELECT 3 rows in set (... sec)
The ``avg`` aggregation on the ``bigint`` column might result in a precision
error if sum of elements exceeds 2^53::
cr> select avg(t.val) from
... (select unnest([9223372036854775807, 9223372036854775807]) as val) t;
+-----------------------+
| avg(val) |
+-----------------------+
| 9.223372036854776e+18 |
+-----------------------+
SELECT 1 row in set (... sec)
To address the precision error of the avg aggregation, we cast the aggregation
column to the ``numeric`` data type::
cr> select avg(t.val :: numeric) from
... (select unnest([9223372036854775807, 9223372036854775807]) as val) t;
+---------------------------+
| avg(cast(val AS numeric)) |
+---------------------------+
| 9223372036854775807 |
+---------------------------+
SELECT 1 row in set (... sec)
.. _aggregation-avg-distinct:
``avg(DISTINCT column)``
~~~~~~~~~~~~~~~~~~~~~~~~
The ``avg`` aggregate function also supports the ``distinct`` keyword. This
keyword changes the behaviour of the function so that it will only average the
number of distinct values in this column that are not ``NULL``::
cr> select
... avg(distinct position) AS avg_pos,
... count(*),
... date
... from locations group by date
... order by 1 desc, count(*) desc;
+---------+----------+---------------+
| avg_pos | count(*) | date |
+---------+----------+---------------+
| 4.0 | 1 | 1367366400000 |
| 3.6 | 8 | 1373932800000 |
| 2.0 | 4 | 308534400000 |
+---------+----------+---------------+
SELECT 3 rows in set (... sec)
::
cr> select avg(distinct position) AS avg_pos from locations;
+---------+
| avg_pos |
+---------+
| 3.5 |
+---------+
SELECT 1 row in set (... sec)
.. _aggregation-count:
``count(column)``
-----------------
In contrast to the :ref:`aggregation-count-star` function the ``count``
function used with a column name as parameter will return the number of rows
with a non-``NULL`` value in that column.
Example::
cr> select count(name), count(*), date from locations group by date
... order by count(name) desc, count(*) desc;
+-------------+----------+---------------+
| count(name) | count(*) | date |
+-------------+----------+---------------+
| 7 | 8 | 1373932800000 |
| 4 | 4 | 308534400000 |
| 1 | 1 | 1367366400000 |
+-------------+----------+---------------+
SELECT 3 rows in set (... sec)
.. _aggregation-count-distinct:
``count(DISTINCT column)``
~~~~~~~~~~~~~~~~~~~~~~~~~~
The ``count`` aggregate function also supports the ``distinct`` keyword. This
keyword changes the behaviour of the function so that it will only count the
number of distinct values in this column that are not ``NULL``::
cr> select
... count(distinct kind) AS num_kind,
... count(*),
... date
... from locations group by date
... order by num_kind, count(*) desc;
+----------+----------+---------------+
| num_kind | count(*) | date |
+----------+----------+---------------+
| 1 | 1 | 1367366400000 |
| 3 | 8 | 1373932800000 |
| 3 | 4 | 308534400000 |
+----------+----------+---------------+
SELECT 3 rows in set (... sec)
::
cr> select count(distinct kind) AS num_kind from locations;
+----------+
| num_kind |
+----------+
| 3 |
+----------+
SELECT 1 row in set (... sec)
.. _aggregation-count-star:
``count(*)``
~~~~~~~~~~~~
This aggregate function simply returns the number of rows that match the query.
``count(columName)`` is also possible, but currently only works on a primary
key column. The semantics are the same.
The return value is always of type ``bigint``.
::
cr> select count(*) from locations;
+----------+
| count(*) |
+----------+
| 13 |
+----------+
SELECT 1 row in set (... sec)
``count(*)`` can also be used on group by queries::
cr> select count(*), kind from locations group by kind order by kind asc;
+----------+-------------+
| count(*) | kind |
+----------+-------------+
| 4 | Galaxy |
| 5 | Planet |
| 4 | Star System |
+----------+-------------+
SELECT 3 rows in set (... sec)
.. _aggregation-geometric-mean:
``geometric_mean(column)``
--------------------------
The ``geometric_mean`` aggregate function computes the geometric mean, a mean
for positive numbers. For details see: `Geometric Mean`_.
``geometric mean`` is defined on all numeric types and on timestamp. It always
returns double values. If a value is negative, all values were null or we got
no value at all ``NULL`` is returned. If any of the aggregated values is ``0``
the result will be ``0.0`` as well.
.. CAUTION::
Due to java double precision arithmetic it is possible that any two
executions of the aggregate function on the same data produce slightly
differing results.
Example::
cr> select geometric_mean(position), kind from locations
... group by kind order by kind;
+--------------------------+-------------+
| geometric_mean(position) | kind |
+--------------------------+-------------+
| 2.6321480259049848 | Galaxy |
| 2.6051710846973517 | Planet |
| 2.213363839400643 | Star System |
+--------------------------+-------------+
SELECT 3 rows in set (... sec)
.. _aggregation-hyperloglog-distinct:
``hyperloglog_distinct(column, [precision])``
---------------------------------------------
The ``hyperloglog_distinct`` aggregate function calculates an approximate count
of distinct non-null values using the `HyperLogLog++`_ algorithm.
The return value data type is always a ``bigint``.
The first argument can be a reference to a column of all
:ref:`data-types-primitive`. :ref:`data-types-container` and
:ref:`data-types-geo` are not supported.
The optional second argument defines the used ``precision`` for the
`HyperLogLog++`_ algorithm. This allows to trade memory for accuracy, valid
values are ``4`` to ``18``. A precision of ``4`` uses approximately ``16``
bytes of memory. Each increase in precision doubles the memory requirement. So
precision ``5`` uses approximately ``32`` bytes, up to ``262144`` bytes for
precision ``18``.
The default value for the ``precision`` which is used if the second argument is
left out is ``14``.
Examples::
cr> select hyperloglog_distinct(position) from locations;
+--------------------------------+
| hyperloglog_distinct(position) |
+--------------------------------+
| 6 |
+--------------------------------+
SELECT 1 row in set (... sec)
::
cr> select hyperloglog_distinct(position, 4) from locations;
+-----------------------------------+
| hyperloglog_distinct(position, 4) |
+-----------------------------------+
| 6 |
+-----------------------------------+
SELECT 1 row in set (... sec)
.. _aggregation-mean:
``mean(column)``
----------------
An alias for :ref:`aggregation-avg`.
.. _aggregation-min:
``min(column)``
---------------
The ``min`` aggregate function returns the smallest value in a column that is
not ``NULL``. Its single argument is a column name and its return value is
always of the type of that column.
Example::
cr> select min(position), kind
... from locations
... where name not like 'North %'
... group by kind order by min(position) asc, kind asc;
+---------------+-------------+
| min(position) | kind |
+---------------+-------------+
| 1 | Planet |
| 1 | Star System |
| 2 | Galaxy |
+---------------+-------------+
SELECT 3 rows in set (... sec)
::
cr> select min(date) from locations;
+--------------+
| min(date) |
+--------------+
| 308534400000 |
+--------------+
SELECT 1 row in set (... sec)
``min`` returns ``NULL`` if the column does not contain any value but ``NULL``.
It is allowed on columns with primitive data types. On ``text`` columns it will
return the lexicographically smallest.
::
cr> select min(name), kind from locations
... group by kind order by kind asc;
+------------------------------------+-------------+
| min(name) | kind |
+------------------------------------+-------------+
| Galactic Sector QQ7 Active J Gamma | Galaxy |
| | Planet |
| Aldebaran | Star System |
+------------------------------------+-------------+
SELECT 3 rows in set (... sec)
.. _aggregation-max:
``max(column)``
---------------
It behaves exactly like ``min`` but returns the biggest value in a column that
is not ``NULL``.
Some Examples::
cr> select max(position), kind from locations
... group by kind order by kind desc;
+---------------+-------------+
| max(position) | kind |
+---------------+-------------+
| 4 | Star System |
| 5 | Planet |
| 6 | Galaxy |
+---------------+-------------+
SELECT 3 rows in set (... sec)
::
cr> select max(position) from locations;
+---------------+
| max(position) |
+---------------+
| 6 |
+---------------+
SELECT 1 row in set (... sec)
::
cr> select max(name), kind from locations
... group by kind order by max(name) desc;
+-------------------+-------------+
| max(name) | kind |
+-------------------+-------------+
| Outer Eastern Rim | Galaxy |
| Bartledan | Planet |
| Altair | Star System |
+-------------------+-------------+
SELECT 3 rows in set (... sec)
.. _aggregation-max_by:
``max_by(returnField, searchField)``
------------------------------------
Returns the value of ``returnField`` where ``searchField`` has the highest
value.
If there are ties for ``searchField`` the result is non-deterministic and can be
any of the ``returnField`` values of the ties.
``NULL`` values in the ``searchField`` don't count as max but are skipped.
An Example::
cr> SELECT max_by(mountain, height) FROM sys.summits;
+--------------------------+
| max_by(mountain, height) |
+--------------------------+
| Mont Blanc |
+--------------------------+
SELECT 1 row in set (... sec)
.. _aggregation-min_by:
``min_by(returnField, searchField)``
------------------------------------
Returns the value of ``returnField`` where ``searchField`` has the lowest
value.
If there are ties for ``searchField`` the result is non-deterministic and can be
any of the ``returnField`` values of the ties.
``NULL`` values in the ``searchField`` don't count as min but are skipped.
An Example::
cr> SELECT min_by(mountain, height) FROM sys.summits;
+--------------------------+
| min_by(mountain, height) |
+--------------------------+
| Puy de Rent |
+--------------------------+
SELECT 1 row in set (... sec)
.. _aggregation-stddev:
``stddev(column)``
------------------
The ``stddev`` aggregate function computes the `Standard Deviation`_ of the
set of non-null values in a column. It is a measure of the variation of data
values. A low standard deviation indicates that the values tend to be near the
mean.
``stddev`` is defined on all numeric types and on timestamp. It always returns
``double precision`` values. If all values were null or we got no value at all
``NULL`` is returned.
Example::
cr> select stddev(position), kind from locations
... group by kind order by kind;
+--------------------+-------------+
| stddev(position) | kind |
+--------------------+-------------+
| 1.920286436967152 | Galaxy |
| 1.4142135623730951 | Planet |
| 1.118033988749895 | Star System |
+--------------------+-------------+
SELECT 3 rows in set (... sec)
.. CAUTION::
Due to java double precision arithmetic it is possible that any two
executions of the aggregate function on the same data produce slightly
differing results.
.. _aggregation-string-agg:
``string_agg(column, delimiter)``
---------------------------------
The ``string_agg`` aggregate function concatenates the input values into a
string, where each value is separated by a delimiter.
If all input values are null, null is returned as a result.
::
cr> select string_agg(col1, ', ') from (values('a'), ('b'), ('c')) as t;
+------------------------+
| string_agg(col1, ', ') |
+------------------------+
| a, b, c |
+------------------------+
SELECT 1 row in set (... sec)
.. SEEALSO::
:ref:`aggregation-array-agg`
.. _aggregation-percentile:
``percentile(column, {fraction | fractions})``
----------------------------------------------
The ``percentile`` aggregate function computes a `Percentile`_ over numeric
non-null values in a column.
Percentiles show the point at which a certain percentage of observed values
occur. For example, the 98th percentile is the value which is greater than 98%
of the observed values. The result is defined and computed as an interpolated
weighted average. According to that it allows the median of the input data to
be defined conveniently as the 50th percentile.
The :ref:`function <gloss-function>` expects a single fraction or an array of
fractions and a column name. Independent of the input column data type the
result of ``percentile`` always returns a ``double precision``. If the value at
the specified column is ``null`` the row is ignored. Fractions must be double
precision values between 0 and 1. When supplied a single fraction, the function
will return a single value corresponding to the percentile of the specified
fraction::
cr> select percentile(position, 0.95), kind from locations
... group by kind order by kind;
+----------------------------+-------------+
| percentile(position, 0.95) | kind |
+----------------------------+-------------+
| 6.0 | Galaxy |
| 5.0 | Planet |
| 4.0 | Star System |
+----------------------------+-------------+
SELECT 3 rows in set (... sec)
When supplied an array of fractions, the function will return an array of
values corresponding to the percentile of each fraction specified::
cr> select percentile(position, [0.0013, 0.9987]) as perc from locations;
+------------+
| perc |
+------------+
| [1.0, 6.0] |
+------------+
SELECT 1 row in set (... sec)
When a query with ``percentile`` function won't match any rows then a null
result is returned.
To be able to calculate percentiles over a huge amount of data and to scale out
CrateDB calculates approximate instead of accurate percentiles. The algorithm
used by the percentile metric is called `TDigest`_. The accuracy/size trade-off
of the algorithm is defined by a single compression parameter which has a
constant value of ``100``. However, there are a few guidelines to keep in mind
in this implementation:
- Extreme percentiles (e.g. 99%) are more accurate.
- For small sets, percentiles are highly accurate.
- It is difficult to generalize the exact level of accuracy, as it depends
on your data distribution and volume of data being aggregated.
.. _aggregation-sum:
``sum(column)``
---------------
Returns the sum of a set of numeric input values that are not ``NULL``.
Depending on the argument type a suitable return type is chosen. For
``interval`` argument types the return type is ``interval``. For ``real`` and
``double precision`` argument types the return type is equal to the argument
type. For ``byte``, ``smallint``, ``integer`` and ``bigint`` the return type
changes to ``bigint``. If the range of ``bigint`` values (-2^64 to 2^64-1) gets
exceeded an ``ArithmeticException`` will be raised.
::
cr> select sum(position), kind from locations
... group by kind order by sum(position) asc;
+---------------+-------------+
| sum(position) | kind |
+---------------+-------------+
| 10 | Star System |
| 13 | Galaxy |
| 15 | Planet |
+---------------+-------------+
SELECT 3 rows in set (... sec)
::
cr> select sum(position) as position_sum from locations;
+--------------+
| position_sum |
+--------------+
| 38 |
+--------------+
SELECT 1 row in set (... sec)
::
cr> select sum(name), kind from locations group by kind order by sum(name) desc;
SQLParseException[Cannot cast value `North West Ripple` to type `byte`]
If the ``sum`` aggregation on a numeric data type with the fixed length can
potentially exceed its range it is possible to handle the overflow by casting
the :ref:`function <gloss-function>` argument to the :ref:`numeric type
<type-numeric>` with an arbitrary precision.
.. Hidden: create user visits table
cr> CREATE TABLE uservisits (id integer, count bigint)
... CLUSTERED INTO 1 SHARDS
... WITH (number_of_replicas = 0);
CREATE OK, 1 row affected (... sec)
.. Hidden: insert into uservisits table
cr> INSERT INTO uservisits VALUES (1, 9223372036854775807), (2, 10);
INSERT OK, 2 rows affected (... sec)
.. Hidden: refresh uservisits table
cr> REFRESH TABLE uservisits;
REFRESH OK, 1 row affected (... sec)
The ``sum`` aggregation on the ``bigint`` column will result in an overflow
in the following aggregation query::
cr> SELECT sum(count)
... FROM uservisits;
ArithmeticException[long overflow]
To address the overflow of the sum aggregation on the given field, we cast
the aggregation column to the ``numeric`` data type::
cr> SELECT sum(count::numeric)
... FROM uservisits;
+-----------------------------+
| sum(cast(count AS numeric)) |
+-----------------------------+
| 9223372036854775817 |
+-----------------------------+
SELECT 1 row in set (... sec)
.. Hidden: refresh uservisits table
cr> DROP TABLE uservisits;
DROP OK, 1 row affected (... sec)
.. _aggregation-variance:
``variance(column)``
--------------------
The ``variance`` aggregate function computes the `Variance`_ of the set of
non-null values in a column. It is a measure about how far a set of numbers is
spread. A variance of ``0.0`` indicates that all values are the same.
``variance`` is defined on all numeric types and on timestamp. It returns a
``double precision`` value. If all values were null or we got no value at all
``NULL`` is returned.
Example::
cr> select variance(position), kind from locations
... group by kind order by kind desc;
+--------------------+-------------+
| variance(position) | kind |
+--------------------+-------------+
| 1.25 | Star System |
| 2.0 | Planet |
| 3.6875 | Galaxy |
+--------------------+-------------+
SELECT 3 rows in set (... sec)
.. CAUTION::
Due to java double precision arithmetic it is possible that any two
executions of the aggregate function on the same data produce slightly
differing results.
.. _aggregation-limitations:
Limitations
===========
- ``DISTINCT`` is not supported with aggregations on :ref:`sql_joins`.
- Aggregate functions can only be applied to columns with a :ref:`plain index
<sql_ddl_index_plain>`, which is the default for all :ref:`primitive type
<data-types-primitive>` columns.
.. _Aggregate function: https://en.wikipedia.org/wiki/Aggregate_function
.. _Geometric Mean: https://en.wikipedia.org/wiki/Geometric_mean
.. _HyperLogLog++: https://research.google.com/pubs/pub40671.html
.. _Percentile: https://en.wikipedia.org/wiki/Percentile
.. _Standard Deviation: https://en.wikipedia.org/wiki/Standard_deviation
.. _TDigest: https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf
.. _Variance: https://en.wikipedia.org/wiki/Variance