6 changes: 3 additions & 3 deletions docs/dev/datastream_api.md
@@ -357,7 +357,7 @@ windowedStream.reduce (new ReduceFunction<Tuple2<String,Integer>() {
The example function, when applied on the sequence (1,2,3,4,5),
folds the sequence into the string "start-1-2-3-4-5":</p>
{% highlight java %}
windowedStream.fold("start-", new FoldFunction<Integer, String>() {
windowedStream.fold("start", new FoldFunction<Integer, String>() {
public String fold(String current, Integer value) {
return current + "-" + value;
}
@@ -1324,7 +1324,7 @@ File-based:

*IMPORTANT NOTES:*

1. If the `watchType` is set to `FileProcessingMode.PROCESS_CONTINUOUSLY`, when a file is modified, its contents are re-processed entirely. This can brake the "exactly-once" semantics, as appending data at the end of a file will lead to **all** its contents being re-processed.
1. If the `watchType` is set to `FileProcessingMode.PROCESS_CONTINUOUSLY`, when a file is modified, its contents are re-processed entirely. This can break the "exactly-once" semantics, as appending data at the end of a file will lead to **all** its contents being re-processed.

2. If the `watchType` is set to `FileProcessingMode.PROCESS_ONCE`, the source scans the path **once** and exits, without waiting for the readers to finish reading the file contents. Of course the readers will continue reading until all file contents are read. Closing the source leads to no more checkpoints after that point. This may lead to slower recovery after a node failure, as the job will resume reading from the last checkpoint.
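
As a rough illustration of how these modes are selected (a sketch only — the exact `readFile` overload and the `TextInputFormat` setup are assumptions, not part of this change):

{% highlight java %}
// Sketch: continuously monitor a directory, re-reading files when they change.
// The readFile(inputFormat, path, watchType, interval) overload is assumed here.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

TextInputFormat format = new TextInputFormat(new Path("file:///tmp/input"));

DataStream<String> lines = env.readFile(
    format,
    "file:///tmp/input",
    FileProcessingMode.PROCESS_CONTINUOUSLY, // or PROCESS_ONCE for a single scan
    1000L);                                  // scan interval in milliseconds
{% endhighlight %}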

@@ -1382,7 +1382,7 @@ File-based:

*IMPORTANT NOTES:*

1. If the `watchType` is set to `FileProcessingMode.PROCESS_CONTINUOUSLY`, when a file is modified, its contents are re-processed entirely. This can brake the "exactly-once" semantics, as appending data at the end of a file will lead to **all** its contents being re-processed.
1. If the `watchType` is set to `FileProcessingMode.PROCESS_CONTINUOUSLY`, when a file is modified, its contents are re-processed entirely. This can break the "exactly-once" semantics, as appending data at the end of a file will lead to **all** its contents being re-processed.

2. If the `watchType` is set to `FileProcessingMode.PROCESS_ONCE`, the source scans the path **once** and exits, without waiting for the readers to finish reading the file contents. Of course the readers will continue reading until all file contents are read. Closing the source leads to no more checkpoints after that point. This may lead to slower recovery after a node failure, as the job will resume reading from the last checkpoint.

9 changes: 5 additions & 4 deletions docs/dev/libs/cep.md
@@ -98,7 +98,7 @@ val result: DataStream[Alert] = patternStream.select(createAlert(_))
</div>
</div>

Note that we use use Java 8 lambdas in our Java code examples to make them more succinct.
Note that we use Java 8 lambdas in our Java code examples to make them more succinct.

## The Pattern API

@@ -521,10 +521,11 @@ def flatSelectFn(pattern : mutable.Map[String, IN], collector : Collector[OUT])

### Handling Timed Out Partial Patterns

Whenever a pattern has a window length associated via the `within` key word, it is possible that partial event patterns will be discarded because they exceed the window length.
In order to react to these timeout events the `select` and `flatSelect` API calls allow to specify a timeout handler.
Whenever a pattern has a window length associated via the `within` keyword, it is possible that partial event patterns will be discarded because they exceed the window length.
In order to react to these timeout events the `select` and `flatSelect` API calls allow a timeout handler to be specified.
This timeout handler is called for each partial event pattern which has timed out.
The timeout handler receives all so far matched events of the partial pattern and the timestamp when the timeout was detected.
The timeout handler receives all the events that have been matched so far by the pattern, and the timestamp when the timeout was detected.
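
As a rough sketch of the shape such a handler can take with the Java API (`Event`, `TimeoutAlert`, and `Alert` are placeholder types, and the two-argument `select` variant returning an `Either` is assumed):

{% highlight java %}
// Sketch only: Event, TimeoutAlert and Alert are placeholder types for this example.
DataStream<Event> input = ...; // placeholder input stream

Pattern<Event, ?> pattern = Pattern.<Event>begin("start")
    .next("end")
    .within(Time.seconds(10));

PatternStream<Event> patternStream = CEP.pattern(input, pattern);

// The first function handles timed out partial patterns, the second one complete matches.
DataStream<Either<TimeoutAlert, Alert>> result = patternStream.select(
    new PatternTimeoutFunction<Event, TimeoutAlert>() {
        @Override
        public TimeoutAlert timeout(Map<String, Event> partialPattern, long timeoutTimestamp) {
            return new TimeoutAlert(partialPattern.get("start"), timeoutTimestamp);
        }
    },
    new PatternSelectFunction<Event, Alert>() {
        @Override
        public Alert select(Map<String, Event> matchedPattern) {
            return new Alert(matchedPattern.get("start"), matchedPattern.get("end"));
        }
    });
{% endhighlight %}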


<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
17 changes: 8 additions & 9 deletions docs/dev/state.md
@@ -73,20 +73,19 @@ active key (i.e. the key of the input element).
It is important to keep in mind that these state objects are only used for interfacing
with state. The state is not necessarily stored inside but might reside on disk or somewhere else.
The second thing to keep in mind is that the value you get from the state
depend on the key of the input element. So the value you get in one invocation of your
user function can be different from the one you get in another invocation if the key of
the element is different.
depends on the key of the input element. So the value you get in one invocation of your
user function can differ from the value in another invocation if the keys involved are different.

To get a state handle you have to create a `StateDescriptor` this holds the name of the state
To get a state handle you have to create a `StateDescriptor`. This holds the name of the state
(as we will later see you can create several states, and they have to have unique names so
that you can reference them), the type of the values that the state holds and possibly
that you can reference them), the type of the values that the state holds, and possibly
a user-specified function, such as a `ReduceFunction`. Depending on what type of state you
want to retrieve you create one of `ValueStateDescriptor`, `ListStateDescriptor` or
`ReducingStateDescriptor`.
want to retrieve, you create either a `ValueStateDescriptor`, a `ListStateDescriptor` or
a `ReducingStateDescriptor`.
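
For instance, a descriptor for a simple counter state might be created like this (a minimal sketch; the constructor taking a name, a class, and a default value is assumed):

{% highlight java %}
// Sketch: a ValueStateDescriptor for a Long-valued state named "count" with default 0L.
ValueStateDescriptor<Long> countDescriptor =
    new ValueStateDescriptor<>("count", Long.class, 0L);
{% endhighlight %}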

State is accessed using the `RuntimeContext`, so it is only possible in *rich functions*.
Please see [here]({{ site.baseurl }}/apis/common/#specifying-transformation-functions) for
information about that but we will also see an example shortly. The `RuntimeContext` that
information about that, but we will also see an example shortly. The `RuntimeContext` that
is available in a `RichFunction` has these methods for accessing state:

* `ValueState<T> getState(ValueStateDescriptor<T>)`
@@ -147,7 +146,7 @@ env.fromElements(Tuple2.of(1L, 3L), Tuple2.of(1L, 5L), Tuple2.of(1L, 7L), Tuple2

This example implements a poor man's counting window. We key the tuples by the first field
(in the example all have the same key `1`). The function stores the count and a running sum in
a `ValueState`, once the count reaches 2 it will emit the average and clear the state so that
a `ValueState`. Once the count reaches 2 it will emit the average and clear the state so that
we start over from `0`. Note that this would keep a different state value for each different input
key if we had tuples with different values in the first field.
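
A sketch of such a function might look as follows (an illustration, not the exact code from the page; the descriptor constructor with a default value and `TypeHint` are assumptions):

{% highlight java %}
// Sketch of a "poor man's counting window": emits the average of every two values per key.
public class CountWindowAverage extends RichFlatMapFunction<Tuple2<Long, Long>, Tuple2<Long, Long>> {

    // Keeps (number of elements seen, running sum) for the current key.
    private transient ValueState<Tuple2<Long, Long>> sum;

    @Override
    public void open(Configuration config) {
        ValueStateDescriptor<Tuple2<Long, Long>> descriptor =
            new ValueStateDescriptor<>(
                "average",                                                  // state name
                TypeInformation.of(new TypeHint<Tuple2<Long, Long>>() {}), // state type
                Tuple2.of(0L, 0L));                                        // default value
        sum = getRuntimeContext().getState(descriptor);
    }

    @Override
    public void flatMap(Tuple2<Long, Long> input, Collector<Tuple2<Long, Long>> out) throws Exception {
        Tuple2<Long, Long> current = sum.value();
        current.f0 += 1;           // count
        current.f1 += input.f1;    // running sum
        sum.update(current);

        if (current.f0 >= 2) {
            out.collect(Tuple2.of(input.f0, current.f1 / current.f0));
            sum.clear();           // start over from the default value
        }
    }
}
{% endhighlight %}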

6 changes: 3 additions & 3 deletions docs/dev/state_backends.md
@@ -41,16 +41,16 @@ chosen **State Backend**.

Out of the box, Flink bundles these state backends:

- *MemoryStateBacked*
- *MemoryStateBackend*
- *FsStateBackend*
- *RocksDBStateBackend*

If nothing else is configured, the system will use the MemoryStateBacked.
If nothing else is configured, the system will use the MemoryStateBackend.
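
A different backend can be chosen per job on the execution environment; a minimal sketch (the HDFS path is a placeholder, and `setStateBackend` on the `StreamExecutionEnvironment` is assumed):

{% highlight java %}
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Placeholder path: keep checkpointed state on a file system instead of the JobManager heap.
env.setStateBackend(new FsStateBackend("hdfs://namenode:40010/flink/checkpoints"));
{% endhighlight %}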


### The MemoryStateBackend

The *MemoryStateBacked* holds data internally as objects on the Java heap. Key/value state and window operators hold hash tables
The *MemoryStateBackend* holds data internally as objects on the Java heap. Key/value state and window operators hold hash tables
that store the values, triggers, etc.

Upon checkpoints, this state backend will snapshot the state and send it as part of the checkpoint acknowledgement messages to the
4 changes: 2 additions & 2 deletions docs/dev/windows.md
@@ -111,7 +111,7 @@ which we could process the aggregated elements.
### Tumbling Windows

A *tumbling windows* assigner assigns elements to fixed length, non-overlapping windows of a
specified *window size*.. For example, if you specify a window size of 5 minutes, the window
specified *window size*. For example, if you specify a window size of 5 minutes, the window
function will get 5 minutes worth of elements in each invocation.

<img src="{{ site.baseurl }}/fig/tumbling-windows.svg" class="center" style="width: 80%;" />
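
For example, a keyed stream could be split into five-minute tumbling windows roughly like this (a sketch; the sample elements and the processing-time assigner are assumptions):

{% highlight java %}
// Sketch: five-minute tumbling processing-time windows over a keyed tuple stream.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<Tuple2<String, Long>> input = env.fromElements(
    Tuple2.of("a", 1L), Tuple2.of("b", 2L), Tuple2.of("a", 3L));

DataStream<Tuple2<String, Long>> sums = input
    .keyBy(0)
    .window(TumblingProcessingTimeWindows.of(Time.minutes(5)))
    .sum(1); // each window emits the per-key sum of the second tuple field
{% endhighlight %}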
@@ -381,7 +381,7 @@ a concatenation of all the `Long` fields of the input.

### WindowFunction - The Generic Case

Using a `WindowFunction` provides most flexibility, at the cost of performance. The reason for this
Using a `WindowFunction` provides the most flexibility, at the cost of performance. The reason for this
is that elements cannot be incrementally aggregated for a window and instead need to be buffered
internally until the window is considered ready for processing. A `WindowFunction` gets an
`Iterable` containing all the elements of the window being processed. The signature of
23 changes: 12 additions & 11 deletions docs/quickstart/run_example_quickstart.md
@@ -26,12 +26,12 @@ under the License.
* This will be replaced by the TOC
{:toc}

In this guide we will start from scratch and go from setting up a Flink project and running
In this guide we will start from scratch and go from setting up a Flink project to running
a streaming analysis program on a Flink cluster.

Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to
read this channel in Flink and count the number of bytes that each user edits within
a given window of time. This is easy enough to implement in a few minutes using Flink but it will
a given window of time. This is easy enough to implement in a few minutes using Flink, but it will
give you a good foundation from which to start building more complex analysis programs on your own.

## Setting up a Maven Project
@@ -125,21 +125,21 @@ public class WikipediaAnalysis {
}
{% endhighlight %}

I admit it's very bare bones now but we will fill it as we go. Note, that I'll not give
The program is very basic now, but we will fill it in as we go. Note that I'll not give
import statements here since IDEs can add them automatically. At the end of this section I'll show
the complete code with import statements if you simply want to skip ahead and enter that in your
editor.

The first step in a Flink program is to create a `StreamExecutionEnvironment`
(or `ExecutionEnvironment` if you are writing a batch job). This can be used to set execution
parameters and create sources for reading from external systems. So let's go ahead, add
parameters and create sources for reading from external systems. So let's go ahead and add
this to the main method:

{% highlight java %}
StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
{% endhighlight %}

Next, we will create a source that reads from the Wikipedia IRC log:
Next we will create a source that reads from the Wikipedia IRC log:

{% highlight java %}
DataStream<WikipediaEditEvent> edits = see.addSource(new WikipediaEditsSource());
@@ -149,7 +149,7 @@ This creates a `DataStream` of `WikipediaEditEvent` elements that we can further
the purposes of this example we are interested in determining the number of added or removed
bytes that each user causes in a certain time window, let's say five seconds. For this we first
have to specify that we want to key the stream on the user name, that is to say that operations
on this should take the key into account. In our case the summation of edited bytes in the windows
on this stream should take the user name into account. In our case the summation of edited bytes in the windows
should be per unique user. For keying a Stream we have to provide a `KeySelector`, like this:

{% highlight java %}
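// Sketch of the key selector (illustration only; it assumes WikipediaEditEvent exposes
// the user name via getUser() — the original snippet continues past this diff hunk):
KeyedStream<WikipediaEditEvent, String> keyedEdits = edits
    .keyBy(new KeySelector<WikipediaEditEvent, String>() {
        @Override
        public String getKey(WikipediaEditEvent event) {
            return event.getUser();
        }
    });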
@@ -165,8 +165,8 @@ KeyedStream<WikipediaEditEvent, String> keyedEdits = edits
This gives us a Stream of `WikipediaEditEvent` that has a `String` key, the user name.
We can now specify that we want to have windows imposed on this stream and compute a
result based on elements in these windows. A window specifies a slice of a Stream
on which to perform a computation. They are required when performing an aggregation
computation on an infinite stream of elements. In our example we will say
on which to perform a computation. Windows are required when computing aggregations
on an infinite stream of elements. In our example we will say
that we want to aggregate the sum of edited bytes for every five seconds:

{% highlight java %}
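// Sketch of the windowed aggregation (illustration only; the timeWindow shortcut and a
// fold that sums WikipediaEditEvent.getByteDiff() per user are assumptions here):
DataStream<Tuple2<String, Long>> result = keyedEdits
    .timeWindow(Time.seconds(5))
    .fold(new Tuple2<>("", 0L), new FoldFunction<WikipediaEditEvent, Tuple2<String, Long>>() {
        @Override
        public Tuple2<String, Long> fold(Tuple2<String, Long> acc, WikipediaEditEvent event) {
            acc.f0 = event.getUser();
            acc.f1 += event.getByteDiff();
            return acc;
        }
    });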
@@ -276,9 +276,10 @@ similar to this:
The number in front of each line tells you on which parallel instance of the print sink the output
was produced.

This should get you started with writing your own Flink programs. You can check out our guides
about [basic concepts]{{{ site.baseurl }}/apis/common/index.html} and the
[DataStream API]{{{ site.baseurl }}/apis/streaming/index.html} if you want to learn more. Stick
This should get you started with writing your own Flink programs. To learn more
you can check out our guides
about [basic concepts]({{ site.baseurl }}/apis/common/index.html) and the
[DataStream API]({{ site.baseurl }}/apis/streaming/index.html). Stick
around for the bonus exercise if you want to learn about setting up a Flink cluster on
your own machine and writing results to [Kafka](http://kafka.apache.org).
