[FLINK-7599] Updated MATCH_RECOGNIZE documentation with aggregations
dawidwys committed Nov 28, 2018
1 parent e03d131 commit 05710d0
Showing 1 changed file with 62 additions and 5 deletions.
67 changes: 62 additions & 5 deletions docs/dev/table/streaming/match_recognize.md
@@ -211,6 +211,66 @@ If a condition is not defined for a pattern variable, a default condition will b

For a more detailed explanation about expressions that can be used in those clauses, please have a look at the [event stream navigation](#pattern-navigation) section.

### Scalar & Aggregate functions

Scalar and aggregate functions can be used in the `MEASURES` and `DEFINE` clauses, both [built-in]({{ site.baseurl }}/dev/table/sql.html#built-in-functions) and [user-defined]({{ site.baseurl }}/dev/table/udfs.html) ones.

Aggregate functions are applied to the subset of rows mapped to a match. To understand how these subsets are evaluated, have a look at the [event stream navigation](#pattern-navigation) section.
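
As a rough sketch, the query below combines built-in aggregate functions (`MAX`, `SUM`) with a built-in scalar function (`ABS`) in the `MEASURES` clause. It assumes the `Ticker` table with the `symbol`, `rowtime`, `price` and `tax` columns used throughout this page; the threshold of 20 and the measure names are illustrative assumptions only.

{% highlight sql %}
SELECT *
FROM Ticker
    MATCH_RECOGNIZE (
        PARTITION BY symbol
        ORDER BY rowtime
        MEASURES
            -- aggregates are computed over the rows mapped to A within one match
            MAX(A.price) AS maxPrice,
            SUM(A.tax) AS totalTax,
            -- scalar function applied to the first and last row mapped to A
            ABS(LAST(A.price) - FIRST(A.price)) AS priceChange
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (A+ B)
        DEFINE
            -- B has no condition, so the default condition applies to it
            A AS A.price < 20
    ) MR;
{% endhighlight %}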

The following example finds the longest period of time for which the average price of a ticker stayed below a certain threshold. It shows how expressive `MATCH_RECOGNIZE` can become with aggregations.
This task can be performed with the following query:

{% highlight sql %}
SELECT *
FROM Ticker
    MATCH_RECOGNIZE (
        PARTITION BY symbol
        ORDER BY rowtime
        MEASURES
            FIRST(A.rowtime) AS start_tstamp,
            LAST(A.rowtime) AS end_tstamp,
            AVG(A.price) AS avgPrice
        ONE ROW PER MATCH
        AFTER MATCH SKIP TO FIRST B
        PATTERN (A+ B)
        DEFINE
            A AS AVG(A.price) < 15
    ) MR;
{% endhighlight %}

Given this query and the following input values:

{% highlight text %}
symbol         rowtime         price    tax
======  ====================  ======= =======
'ACME'  '01-Apr-11 10:00:00'   12      1
'ACME'  '01-Apr-11 10:00:01'   17      2
'ACME'  '01-Apr-11 10:00:02'   13      1
'ACME'  '01-Apr-11 10:00:03'   16      3
'ACME'  '01-Apr-11 10:00:04'   25      2
'ACME'  '01-Apr-11 10:00:05'   2       1
'ACME'  '01-Apr-11 10:00:06'   4       1
'ACME'  '01-Apr-11 10:00:07'   10      2
'ACME'  '01-Apr-11 10:00:08'   15      2
'ACME'  '01-Apr-11 10:00:09'   25      2
'ACME'  '01-Apr-11 10:00:10'   30      1
{% endhighlight %}

The query accumulates events as part of the pattern variable `A` as long as their average price stays below 15. This stops being the case at `01-Apr-11 10:00:04`. The next such period, which starts at that point, exceeds an average price of 15 at `01-Apr-11 10:00:10`. Thus the results for this query are:

{% highlight text %}
symbol       start_tstamp       end_tstamp          avgPrice
=========  ==================  ==================  ============
ACME       01-APR-11 10:00:00  01-APR-11 10:00:03     14.5
ACME       01-APR-11 10:00:04  01-APR-11 10:00:09     13.5
{% endhighlight %}

It is important to keep in mind how aggregates behave when no rows were mapped to a certain pattern variable. Every aggregate except `COUNT` produces `null` in that case, whereas `COUNT` produces 0.
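
A minimal sketch of this rule, assuming the `Ticker` table from the example above and a pattern in which `A` is optional. The thresholds and the measure names `cntA`, `avgA` and `priceB` are made up for illustration.

{% highlight sql %}
SELECT *
FROM Ticker
    MATCH_RECOGNIZE (
        PARTITION BY symbol
        ORDER BY rowtime
        MEASURES
            -- number of rows mapped to A in this match
            COUNT(A.price) AS cntA,
            -- average over the rows mapped to A in this match
            AVG(A.price) AS avgA,
            B.price AS priceB
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        -- A* allows a match in which no row is mapped to A
        PATTERN (A* B)
        DEFINE
            A AS A.price < 10,
            B AS B.price >= 10
    ) MR;
{% endhighlight %}

For a match in which `B` is matched right away and no row is mapped to `A`, the rule above means that `cntA` is 0 and `avgA` is `null`.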

<span class="label label-danger">Attention</span> `DISTINCT` aggregations are not supported. Moreover, the `DISTINCT` modifier is silently dropped if it is specified for an aggregation!

Defining a Pattern
------------------

@@ -482,8 +542,6 @@ The table consists of the following columns:

As can be seen in the table, the first row is mapped to pattern variable `A` and subsequent rows are mapped to pattern variable `B`. However, the last row does not fulfill the `B` condition because the sum over all mapped rows `SUM(price)` and the sum over all rows in `B` exceed the specified thresholds.

<span class="label label-danger">Attention</span> Please note that aggregations such as `SUM` are not supported yet. They are only used for explanation here.

### Logical Offsets

_Logical offsets_ enable navigation within the events that were mapped to a particular pattern variable. This can be expressed
@@ -714,8 +772,6 @@ FROM Ticker

The query returns the sum of the prices of all rows mapped to `A` and the first and last timestamp of the overall match.

<span class="label label-danger">Attention</span> Please note that aggregations such as `SUM` are not supported yet. They are only used for explanation here.

The query will produce different results based on which `AFTER MATCH` strategy was used:

##### `AFTER MATCH SKIP PAST LAST ROW`
@@ -837,5 +893,6 @@ Unsupported features include:
* `SUBSET` - which allows creating logical groups of pattern variables and using those groups in the `DEFINE` and `MEASURES` clauses.
* Physical offsets - `PREV/NEXT`, which index all events seen rather than only those that were mapped to a pattern variable (as in the [logical offsets](#logical-offsets) case).
* Extracting time attributes - there is currently no possibility to get a time attribute for subsequent time-based operations.
* Aggregates - one cannot use aggregates in `MEASURES` nor `DEFINE` clauses.
* `MATCH_RECOGNIZE` is supported only for SQL. There is no equivalent in the Table API.
* Aggregations:
  * `DISTINCT` aggregations are not supported. Moreover, if the `DISTINCT` modifier is specified, it is silently dropped.
