Skip to content

Commit

Permalink
[FLINK-7599][docs] Updated MATCH_RECOGNIZE documentation with aggrega…
Browse files Browse the repository at this point in the history
…tions
  • Loading branch information
dawidwys committed Dec 16, 2018
1 parent 258a708 commit 19fcf65
Showing 1 changed file with 61 additions and 5 deletions.
66 changes: 61 additions & 5 deletions docs/dev/table/streaming/match_recognize.md
Original file line number Diff line number Diff line change
Expand Up @@ -211,6 +211,65 @@ If a condition is not defined for a pattern variable, a default condition will b

For a more detailed explanation about expressions that can be used in those clauses, please have a look at the [event stream navigation](#pattern-navigation) section.

### Aggregations

Aggregations can be used in `DEFINE` and `MEASURES` clauses. Both [built-in]({{ site.baseurl }}/dev/table/sql.html#built-in-functions) and custom [user defined]({{ site.baseurl }}/dev/table/udfs.html) functions are supported.

Aggregate functions are applied to each subset of rows mapped to a match. In order to understand how those subsets are evaluated have a look at the [event stream navigation](#pattern-navigation) section.

The task of the following example is to find the longest period of time for which the average price of a ticker did not go below certain threshold. It shows how expressible `MATCH_RECOGNIZE` can become with aggregations.
This task can be performed with the following query:

{% highlight sql %}
SELECT *
FROM Ticker
MATCH_RECOGNIZE (
PARTITION BY symbol
ORDER BY rowtime
MEASURES
FIRST(A.rowtime) AS start_tstamp,
LAST(A.rowtime) AS end_tstamp,
AVG(A.price) AS avgPrice
ONE ROW PER MATCH
AFTER MATCH SKIP TO FIRST B
PATTERN (A+ B)
DEFINE
A AS AVG(A.price) < 15
) MR;
{% endhighlight %}

Given this query and following input values:

{% highlight text %}
symbol rowtime price tax
====== ==================== ======= =======
'ACME' '01-Apr-11 10:00:00' 12 1
'ACME' '01-Apr-11 10:00:01' 17 2
'ACME' '01-Apr-11 10:00:02' 13 1
'ACME' '01-Apr-11 10:00:03' 16 3
'ACME' '01-Apr-11 10:00:04' 25 2
'ACME' '01-Apr-11 10:00:05' 2 1
'ACME' '01-Apr-11 10:00:06' 4 1
'ACME' '01-Apr-11 10:00:07' 10 2
'ACME' '01-Apr-11 10:00:08' 15 2
'ACME' '01-Apr-11 10:00:09' 25 2
'ACME' '01-Apr-11 10:00:10' 30 1
{% endhighlight %}

The query will accumulate events as part of the pattern variable `A` as long as the average price of them does not exceed `15`. For example, such a limit exceeding happens at `01-Apr-11 10:00:04`.
The following period exceeds the average price of `15` again at `01-Apr-11 10:00:10`. Thus the results for said query will be:

{% highlight text %}
symbol start_tstamp end_tstamp avgPrice
========= ================== ================== ============
ACME 01-APR-11 10:00:00 01-APR-11 10:00:03 14.5
ACME 01-APR-11 10:00:04 01-APR-11 10:00:09 13.5
{% endhighlight %}

<span class="label label-info">Note</span> Aggregations can be applied to expressions, but only if they reference a single pattern variable. Thus `SUM(A.price * A.tax)` is a valid one, but `AVG(A.price * B.tax)` is not.

<span class="label label-danger">Attention</span> `DISTINCT` aggregations are not supported.

Defining a Pattern
------------------

Expand Down Expand Up @@ -548,8 +607,6 @@ The table consists of the following columns:

As can be seen in the table, the first row is mapped to pattern variable `A` and subsequent rows are mapped to pattern variable `B`. However, the last row does not fulfill the `B` condition because the sum over all mapped rows `SUM(price)` and the sum over all rows in `B` exceed the specified thresholds.

<span class="label label-danger">Attention</span> Please note that aggregations such as `SUM` are not supported yet. They are only used for explanation here.

### Logical Offsets

_Logical offsets_ enable navigation within the events that were mapped to a particular pattern variable. This can be expressed
Expand Down Expand Up @@ -780,8 +837,6 @@ FROM Ticker

The query returns the sum of the prices of all rows mapped to `A` and the first and last timestamp of the overall match.

<span class="label label-danger">Attention</span> Please note that aggregations such as `SUM` are not supported yet. They are only used for explanation here.

The query will produce different results based on which `AFTER MATCH` strategy was used:

##### `AFTER MATCH SKIP PAST LAST ROW`
Expand Down Expand Up @@ -902,5 +957,6 @@ Unsupported features include:
* `SUBSET` - which allows creating logical groups of pattern variables and using those groups in the `DEFINE` and `MEASURES` clauses.
* Physical offsets - `PREV/NEXT`, which indexes all events seen rather than only those that were mapped to a pattern variable(as in [logical offsets](#logical-offsets) case).
* Extracting time attributes - there is currently no possibility to get a time attribute for subsequent time-based operations.
* Aggregates - one cannot use aggregates in `MEASURES` nor `DEFINE` clauses.
* `MATCH_RECOGNIZE` is supported only for SQL. There is no equivalent in the Table API.
* Aggregations:
* distinct aggregations are not supported. Moreover if specified it will be silently dropped unfortunately.

0 comments on commit 19fcf65

Please sign in to comment.