[BEAM-1011] Add triggers content to the programming guide #204

Closed
wants to merge 4 commits into from

melap commented Apr 6, 2017

R: @kennknowles
Triggers content + minor changes to the windowing section + a small amount of white space cleanup

asfbot commented Apr 6, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Test/352/
--none--

asfbot commented Apr 6, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/394/

Jenkins built the site at commit id dc5368a with Jekyll and staged it here. Happy reviewing.

Note that any previous site has been deleted. This staged site will be automatically deleted after its TTL expires. Push any commit to the pull request branch or re-trigger the build to get it staged again.

kennknowles (Member) left a comment

Really great. Just little nits. And at some point we should do something about that grammar. I think some eng should maybe re-interpret the categorization.

@@ -1398,7 +1398,7 @@ If your `PCollection` is bounded (the size is fixed), you can assign all the ele
%}
```

-### Time skew, data lag, and late data
+### <a name="watermarks-late-data"></a>Watermarks and late data
Member

Just curious - could we put the text in the <a> tag so that it is easy for users to link to a section? I feel like we can use CSS to make the visual and interactive behavior otherwise identical?

Author

Yeah, this could definitely be improved. I'm not familiar enough with CSS to do this offhand and don't want to block on it, but I'll add this to my list of things to take a look at. I tested moving it in to see how bad it looks if done without CSS for now, and well, it looks pretty bad!


#### Handling Late Data

If a pipeline wants data that arrives after the watermark passes the end of the window, you can apply an *allowed lateness* when you set your windowing configuration. This gives your trigger the opportunity to react to the late data. If allowed lateness is set, the default trigger will emit new results immediately whenever late data arrives.
Member

"If you want your pipeline to process data that arrives..." ?

Author

done

asfbot commented Apr 7, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Test/355/
--none--

asfbot commented Apr 7, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Test/356/
--none--

melap (Author) commented Apr 7, 2017

Thanks for your comments, I incorporated your suggestion and fixed a couple erroneous forced bold section headers. @bjchambers - can you take a look at the triggers BNF (very last section of this PR)? Kenn suggested you might have some thoughts about fixing it up.

asfbot commented Apr 7, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/397/

Jenkins built the site at commit id cfb23e5 with Jekyll and staged it here. Happy reviewing.

asfbot commented Apr 7, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/398/

Jenkins built the site at commit id 0b46ba0 with Jekyll and staged it here. Happy reviewing.

davorbonaci (Member)

@kennknowles, @bjchambers, any comments perhaps?

@@ -1456,4 +1458,202 @@ An example might be if your pipeline reads log records from an input file, and e

## <a name="triggers"></a>Working with triggers

> **Note:** This guide is still in progress. There is an open issue to finish the guide ([BEAM-193](https://issues.apache.org/jira/browse/BEAM-193))
> **NOTE:** This content applies only to the Beam SDK for Java. The Beam SDK for Python does not support triggers.

At a high level, I think the content is correct, but we may be better served by moving the more concrete examples earlier -- I feel like this first paragraph is very abstract "must determine when to emit" etc.

Author

I moved some stuff around to try and improve this. PTAL and see what you think once it's staged.


Beam provides a number of pre-built triggers that you can set for your `PCollection`s:

* **Event time-based triggers**. These triggers operate on the event time, as indicated by the timestamp on each data element. Beam's default trigger is event time-based.

Would "event-time triggers" be better than "event time-based triggers" (associate more closely with the event or processing time, since that is the key difference)?


The `AfterWatermark` trigger operates on *event time*. The `AfterWatermark` trigger emits the contents of a window after the [watermark](#watermarks-late-data) passes the end of the window, based on the timestamps attached to the data elements. The watermark is a global progress metric, and is Beam's notion of input completeness within your pipeline at any given point. `AfterWatermark.pastEndOfWindow()` *only* fires when the watermark passes the end of the window.

In addition, you can use `.withEarlyFirings(trigger)` and `.withLateFirings(trigger)` to configure triggers that fire if your pipeline receives data before or after the end of the window.

Examples might make this clearer?

Author

done


To set a window to accumulate the panes that are produced when the trigger fires, invoke`.accumulatingFiredPanes()` when you set the trigger. To set a window to discard fired panes, invoke `.discardingFiredPanes()`.

Let's look an an example that uses a `PCollection` with fixed-time windowing and a data-based trigger. This is something you might do if, for example, each window represented a ten-minute running average, but you wanted to display the current value of the average in a UI more frequently than every ten minutes. We'll assume the following conditions:

"an an" -> "at an"

Author

done

The following grammar describes the various ways that you can combine triggers into composite triggers:

```
TRIGGER ::=

I'd be tempted to just get rid of the grammar. I don't think it adds much, and each language may realize the grammar differently.

Author

done

melap (Author) commented Apr 24, 2017

Made changes from Ben's feedback. Unfortunately it looks like Jenkins may not be doing any staging/testing at the moment though?

asfbot commented Apr 25, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Website_Stage/433/

Jenkins built the site at commit id c44c456 with Jekyll and staged it here. Happy reviewing.

melap (Author) commented Apr 25, 2017

bjchambers left a comment

Looks good!


**Triggers** allow you to change this default behavior and specify when Beam emits the aggregated results for each window (referred to as a *pane*). At a high level, triggers provide two additional capabilities compared to simply outputting at the end of a window:
You can set triggers for your `PCollection`s to change this default behavior and specify when to emit the aggregated results for each window. Beam provides a number of pre-built triggers that you can set:

"and specify when to emit" may be redundant with the same statement in the paragraph above, but seems fine either way.

Author

you're right, removed

* **Event time triggers**. These triggers operate on the event time, as indicated by the timestamp on each data element. Beam's default trigger is event time-based.
* **Processing time triggers**. These triggers operate on the processing time -- the time when the data element is processed at any given stage in the pipeline.
* **Data-driven triggers**. These triggers operate by examining the data as it arrives in each window, and firing when that window has received a certain number of data elements.
* **Composite triggers**. These triggers combine multiple triggers in some logical way.

"in some logical way" -> "in various ways"?


* **Event time triggers**. These triggers operate on the event time, as indicated by the timestamp on each data element. Beam's default trigger is event time-based.
* **Processing time triggers**. These triggers operate on the processing time -- the time when the data element is processed at any given stage in the pipeline.
* **Data-driven triggers**. These triggers operate by examining the data as it arrives in each window, and firing when that window has received a certain number of data elements.

maybe "and firing when that data meets a certain property. Currently, these only support firing after a specific number of elements."
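
To make the count-based firing in the bullet and comment above concrete, here is a minimal plain-Java sketch. It is not Beam code and makes no claim about how a runner implements triggers. It models a trigger that fires each time a window has received a fixed number of new elements, and shows how the emitted panes differ between the accumulating and discarding modes that `.accumulatingFiredPanes()` and `.discardingFiredPanes()` select. The class `PaneModeSketch`, the method `panes`, and the sample values are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a count-based trigger on a single window (not a Beam API).
final class PaneModeSketch {

    /**
     * Returns the panes emitted when the trigger fires after every
     * `fireEvery` newly arrived elements.
     *
     * @param accumulating true keeps already-fired elements in later panes,
     *                     false discards them after each firing
     */
    static List<List<Integer>> panes(List<Integer> elements, int fireEvery, boolean accumulating) {
        List<List<Integer>> fired = new ArrayList<>();
        List<Integer> buffer = new ArrayList<>();
        int sinceLastFiring = 0;
        for (int element : elements) {
            buffer.add(element);
            sinceLastFiring++;
            if (sinceLastFiring == fireEvery) {
                // Emit a snapshot of the buffer as one pane.
                fired.add(new ArrayList<>(buffer));
                sinceLastFiring = 0;
                if (!accumulating) {
                    // Discarding mode: the next pane starts empty.
                    buffer.clear();
                }
            }
        }
        // Elements that arrived after the last firing are not emitted in this model.
        return fired;
    }
}
```

With the elements 1 through 6 and a firing every 3 elements, accumulating mode yields `[1, 2, 3]` followed by `[1, 2, 3, 4, 5, 6]`, while discarding mode yields `[1, 2, 3]` followed by `[4, 5, 6]`.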

* **Data-driven triggers**. These triggers operate by examining the data as it arrives in each window, and firing when that window has received a certain number of data elements.
* **Composite triggers**. These triggers combine multiple triggers in some logical way.

At a high level, triggers provide two additional capabilities compared to simply outputting at the end of a window:

* Triggers allow Beam to emit early results, before all the data in a given window has arrived.

For example, after a certain amount of time or data has arrived.

* **Data-driven triggers**. These triggers operate by examining the data as it arrives in each window, and firing when that window has received a certain number of data elements.
* **Composite triggers**. These triggers combine multiple triggers in some logical way.

At a high level, triggers provide two additional capabilities compared to simply outputting at the end of a window:

* Triggers allow Beam to emit early results, before all the data in a given window has arrived.
* Triggers allow processing of late data.

by triggering after the event-time watermark has passed the end of the window
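
The early, on-time, and late behavior discussed in this thread can be sketched in plain Java, without the Beam SDK. This is a simplified model under stated assumptions: the watermark is sampled only when an element arrives, the trigger fires once for every element that arrives before or after the watermark passes the end of the window, and data that arrives once the allowed lateness has also elapsed is dropped. `PaneTimingSketch`, `events`, and the string labels are hypothetical names, not Beam APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of pane timing for one window (not a Beam API).
final class PaneTimingSketch {

    /**
     * @param windowEnd       end of the window in event time
     * @param allowedLateness how long after windowEnd late data is still accepted
     * @param watermarks      watermark observed at each element's arrival (non-decreasing)
     * @return one label per firing or dropped element, in arrival order
     */
    static List<String> events(long windowEnd, long allowedLateness, long[] watermarks) {
        List<String> out = new ArrayList<>();
        boolean onTimeFired = false;
        for (long watermark : watermarks) {
            if (watermark >= windowEnd && !onTimeFired) {
                // The watermark has passed the end of the window: on-time firing.
                out.add("ON_TIME");
                onTimeFired = true;
            }
            if (watermark < windowEnd) {
                out.add("EARLY");    // speculative result before the window is complete
            } else if (watermark < windowEnd + allowedLateness) {
                out.add("LATE");     // late data within the allowed lateness
            } else {
                out.add("DROPPED");  // too late: the window has expired in this model
            }
        }
        return out;
    }
}
```

For a window ending at 100 with an allowed lateness of 30, arrivals observed at watermarks 10, 50, 110, and 140 produce `EARLY, EARLY, ON_TIME, LATE, DROPPED`.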

melap (Author) commented Apr 25, 2017

Thanks, made more updates

kennknowles (Member)

LGTM

asfgit closed this in bb4221b Apr 27, 2017
melap deleted the triggers branch May 2, 2017 22:59
robertwb pushed a commit to robertwb/incubator-beam that referenced this pull request Jun 5, 2018
robertwb pushed a commit to robertwb/incubator-beam that referenced this pull request Jun 5, 2018
melap pushed a commit to apache/beam that referenced this pull request Jun 20, 2018
5 participants