diff --git a/docs/_includes/note.html b/docs/_includes/note.html new file mode 100644 index 0000000000000..a0ac41357bcbb --- /dev/null +++ b/docs/_includes/note.html @@ -0,0 +1,23 @@ + + +
Takes one element and produces one element. A map function that doubles the values of the input stream:
{% highlight java %} @@ -65,7 +65,7 @@ dataStream.map(new MapFunctionTakes one element and produces zero, one, or more elements. A flatmap function that splits sentences to words:
{% highlight java %} @@ -82,7 +82,7 @@ dataStream.flatMap(new FlatMapFunctionEvaluates a boolean function for each element and retains those for which the function returns true.
A filter that filters out zero values:
@@ -98,7 +98,7 @@ dataStream.filter(new FilterFunction
Logically partitions a stream into disjoint partitions. All records with the same key are assigned to the same partition. Internally, keyBy() is implemented with hash partitioning. There are different ways to specify keys.
@@ -119,7 +119,7 @@ dataStream.keyBy(value -> value.f0) // Key by the first element of a Tuple
A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and
emits the new value.
@@ -139,7 +139,7 @@ keyedStream.reduce(new ReduceFunction
Rolling aggregations on a keyed data stream. The difference between min and minBy is that min returns the minimum value, whereas minBy returns @@ -159,7 +159,7 @@ keyedStream.maxBy("key");
Windows can be defined on already partitioned KeyedStreams. Windows group the data in each key according to some characteristic (e.g., the data that arrived within the last 5 seconds). @@ -171,7 +171,7 @@ dataStream.keyBy(value -> value.f0).window(TumblingEventTimeWindows.of(Time.seco
Windows can be defined on regular DataStreams. Windows group all the stream events according to some characteristic (e.g., the data that arrived within the last 5 seconds). @@ -184,7 +184,7 @@ dataStream.windowAll(TumblingEventTimeWindows.of(Time.seconds(5))); // Last 5 se
Applies a general function to the window as a whole. Below is a function that manually sums the elements of a window.
Note: If you are using a windowAll transformation, you need to use an AllWindowFunction instead.
@@ -218,7 +218,7 @@ allWindowedStream.apply (new AllWindowFunctionApplies a functional reduce function to the window and returns the reduced value.
{% highlight java %} @@ -231,7 +231,7 @@ windowedStream.reduce (new ReduceFunctionAggregates the contents of a window. The difference between min and minBy is that min returns the minimum value, whereas minBy returns @@ -251,7 +251,7 @@ windowedStream.maxBy("key");
Union of two or more data streams creating a new stream containing all the elements from all the streams. Note: If you union a data stream with itself you will get each element twice in the resulting stream.
@@ -261,7 +261,7 @@ dataStream.union(otherStream1, otherStream2, ...);Join two data streams on a given key and a common window.
{% highlight java %} @@ -273,7 +273,7 @@ dataStream.join(otherStream)Join two elements e1 and e2 of two keyed streams with a common key over a given time interval, so that e1.timestamp + lowerBound <= e2.timestamp <= e1.timestamp + upperBound
{% highlight java %} @@ -288,7 +288,7 @@ keyedStream.intervalJoin(otherKeyedStream)Cogroups two data streams on a given key and a common window.
{% highlight java %} @@ -300,7 +300,7 @@ dataStream.coGroup(otherStream)"Connects" two data streams retaining their types. Connect allowing for shared state between the two streams.
@@ -313,7 +313,7 @@ ConnectedStreamsSimilar to map and flatMap on a connected data stream
{% highlight java %} @@ -346,7 +346,7 @@ connectedStreams.flatMap(new CoFlatMapFunction
Creates a "feedback" loop in the flow, by redirecting the output of one operator
@@ -354,7 +354,6 @@ connectedStreams.flatMap(new CoFlatMapFunction Takes one element and produces one element. A map function that doubles the values of the input stream: Takes one element and produces zero, one, or more elements. A flatmap function that splits sentences to words: Evaluates a boolean function for each element and retains those for which the function returns true.
A filter that filters out zero values:
@@ -423,7 +422,7 @@ dataStream.filter { _ != 0 }
Logically partitions a stream into disjoint partitions, each partition containing elements of the same key.
Internally, this is implemented with hash partitioning. See keys on how to specify keys.
@@ -435,7 +434,7 @@ dataStream.keyBy(_._1) // Key by the first element of a Tuple
A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and
emits the new value.
@@ -449,7 +448,7 @@ keyedStream.reduce { _ + _ }
Rolling aggregations on a keyed data stream. The difference between min
and minBy is that min returns the minimum value, whereas minBy returns
@@ -469,7 +468,7 @@ keyedStream.maxBy("key")
Windows can be defined on already partitioned KeyedStreams. Windows group the data in each
key according to some characteristic (e.g., the data that arrived within the last 5 seconds).
@@ -481,7 +480,7 @@ dataStream.keyBy(_._1).window(TumblingEventTimeWindows.of(Time.seconds(5))) // L
Windows can be defined on regular DataStreams. Windows group all the stream events
according to some characteristic (e.g., the data that arrived within the last 5 seconds).
@@ -494,7 +493,7 @@ dataStream.windowAll(TumblingEventTimeWindows.of(Time.seconds(5))) // Last 5 sec
Applies a general function to the window as a whole. Below is a function that manually sums the elements of a window. Note: If you are using a windowAll transformation, you need to use an AllWindowFunction instead. Applies a functional reduce function to the window and returns the reduced value. Aggregates the contents of a window. The difference between min
and minBy is that min returns the minimum value, whereas minBy returns
@@ -537,7 +536,7 @@ windowedStream.maxBy("key")
Union of two or more data streams creating a new stream containing all the elements from all the streams. Note: If you union a data stream
with itself you will get each element twice in the resulting stream. Join two data streams on a given key and a common window. Cogroups two data streams on a given key and a common window. "Connects" two data streams retaining their types, allowing for shared state between
the two streams. Similar to map and flatMap on a connected data stream
Creates a "feedback" loop in the flow, by redirecting the output of one operator
@@ -608,7 +607,6 @@ connectedStreams.flatMap(
continuously update a model. The following code starts with a stream and applies
the iteration body continuously. Elements that are greater than 0 are sent back
to the feedback channel, and the rest of the elements are forwarded downstream.
- See iterations for a complete description.
{% highlight java %}
initialStream.iterate {
iteration => {
@@ -648,7 +646,7 @@ is not supported by the API out-of-the-box. To use this feature, you should use
Takes one element and produces one element. A map function that doubles the values of the input stream: Takes one element and produces zero, one, or more elements. A flatmap function that splits sentences to words: Evaluates a boolean function for each element and retains those for which the function returns true.
A filter that filters out zero values:
@@ -681,7 +679,7 @@ data_stream.filter(lambda x: x != 0)
Logically partitions a stream into disjoint partitions. All records with the same key are assigned to the same partition. Internally, keyBy() is implemented with hash partitioning. There are different ways to specify keys. This transformation returns a KeyedStream, which is, among other things, required to use keyed state. A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and
emits the new value.
@@ -711,7 +709,7 @@ data_stream.key_by(lambda x: x[1]).reduce(lambda a, b: (a[0] + b[0], b[1]))
Union of two or more data streams creating a new stream containing all the elements from all the streams. Note: If you union a data stream
with itself you will get each element twice in the resulting stream. "Connects" two data streams retaining their types, allowing for shared state between
the two streams. Similar to map and flatMap on a connected data stream Selects a subset of fields from the tuples
{% highlight java %}
@@ -811,7 +809,7 @@ DataStream Selects a subset of fields from the tuples
{% highlight python %}
@@ -848,7 +846,7 @@ via the following functions.
Uses a user-defined Partitioner to select the target task for each element.
@@ -860,7 +858,7 @@ dataStream.partitionCustom(partitioner, 0);
Partitions elements randomly according to a uniform distribution.
@@ -871,7 +869,7 @@ dataStream.shuffle();
Partitions elements round-robin, creating equal load per partition. Useful for performance
@@ -883,7 +881,7 @@ dataStream.rebalance();
Partitions elements, round-robin, to a subset of downstream operations. This is
@@ -927,7 +925,7 @@ dataStream.rescale();
Broadcasts elements to every partition.
@@ -955,7 +953,7 @@ dataStream.broadcast();
Uses a user-defined Partitioner to select the target task for each element.
@@ -967,7 +965,7 @@ dataStream.partitionCustom(partitioner, 0)
Partitions elements randomly according to a uniform distribution.
@@ -978,7 +976,7 @@ dataStream.shuffle()
Partitions elements round-robin, creating equal load per partition. Useful for performance
@@ -990,7 +988,7 @@ dataStream.rebalance()
Partitions elements, round-robin, to a subset of downstream operations. This is
@@ -1035,7 +1033,7 @@ dataStream.rescale()
Broadcasts elements to every partition.
@@ -1063,7 +1061,7 @@ dataStream.broadcast()
Uses a user-defined Partitioner to select the target task for each element.
{% highlight python %}
@@ -1074,7 +1072,7 @@ data_stream.partition_custom(lambda key, num_partition: key % partition, lambda
Partitions elements randomly according to a uniform distribution.
@@ -1085,7 +1083,7 @@ data_stream.shuffle()
Partitions elements round-robin, creating equal load per partition. Useful for performance
@@ -1097,7 +1095,7 @@ data_stream.rebalance()
Partitions elements, round-robin, to a subset of downstream operations. This is
@@ -1141,7 +1139,7 @@ data_stream.rescale()
Broadcasts elements to every partition.
diff --git a/docs/fig/datastream-example-job-graph.svg b/docs/fig/datastream-example-job-graph.svg
new file mode 100644
index 0000000000000..3482b638d9bb5
--- /dev/null
+++ b/docs/fig/datastream-example-job-graph.svg
@@ -0,0 +1,21 @@
+
+
+
+
-
Map
+
DataStream → DataStreamMap
DataStream → DataStream
-
FlatMap
+
DataStream → DataStreamFlatMap
DataStream → DataStream
-
Filter
+
DataStream → DataStreamFilter
DataStream → DataStream
-
KeyBy
+
DataStream → KeyedStreamKeyBy
DataStream → KeyedStream
-
Reduce
+
KeyedStream → DataStreamReduce
KeyedStream → DataStream
-
Aggregations
+
KeyedStream → DataStreamAggregations
KeyedStream → DataStream
-
Window
+
KeyedStream → WindowedStreamWindow
KeyedStream → WindowedStream
-
WindowAll
+
DataStream → AllWindowedStreamWindowAll
DataStream → AllWindowedStream
-
Window Apply
+
WindowedStream → DataStream
AllWindowedStream → DataStreamWindow Apply
WindowedStream → DataStream
AllWindowedStream → DataStream
-
Window Reduce
+
WindowedStream → DataStreamWindow Reduce
WindowedStream → DataStream
-
Aggregations on windows
+
WindowedStream → DataStreamAggregations on windows
WindowedStream → DataStream
-
Union
+
DataStream* → DataStreamUnion
DataStream* → DataStream
-
Window Join
+
DataStream,DataStream → DataStreamWindow Join
DataStream,DataStream → DataStream
-
Window CoGroup
+
DataStream,DataStream → DataStreamWindow CoGroup
DataStream,DataStream → DataStream
-
Connect
+
DataStream,DataStream → ConnectedStreamsConnect
DataStream,DataStream → ConnectedStreams
-
CoMap, CoFlatMap
+
ConnectedStreams → DataStreamCoMap, CoFlatMap
ConnectedStreams → DataStream
-
Iterate
+
DataStream → IterativeStream → DataStreamIterate
DataStream → IterativeStream → DataStream
-
Map
+
DataStream → DataStreamMap
DataStream → DataStream
-
FlatMap
+
DataStream → DataStreamFlatMap
DataStream → DataStream
-
Filter
+
DataStream → DataStreamFilter
DataStream → DataStream
-
KeyBy
+
DataStream → KeyedStreamKeyBy
DataStream → KeyedStream
-
Reduce
+
KeyedStream → DataStreamReduce
KeyedStream → DataStream
-
Union
+
DataStream* → DataStreamUnion
DataStream* → DataStream
-
Connect
+
DataStream,DataStream → ConnectedStreamsConnect
DataStream,DataStream → ConnectedStreams
-
CoMap, CoFlatMap
+
ConnectedStreams → DataStreamCoMap, CoFlatMap
ConnectedStreams → DataStream
- Project
+
DataStream → DataStreamProject
DataStream → DataStream
-
Project
+
DataStream → DataStreamProject
DataStream → DataStream
-
Custom partitioning
+
DataStream → DataStreamCustom partitioning
DataStream → DataStream
-
Random partitioning
+
DataStream → DataStreamRandom partitioning
DataStream → DataStream
-
Rebalancing (Round-robin partitioning)
+
DataStream → DataStreamRebalancing (Round-robin partitioning)
DataStream → DataStream
-
Rescaling
+
DataStream → DataStreamRescaling
DataStream → DataStream
-
Broadcasting
+
DataStream → DataStreamBroadcasting
DataStream → DataStream
-
Custom partitioning
+
DataStream → DataStreamCustom partitioning
DataStream → DataStream
-
Random partitioning
+
DataStream → DataStreamRandom partitioning
DataStream → DataStream
-
Rebalancing (Round-robin partitioning)
+
DataStream → DataStreamRebalancing (Round-robin partitioning)
DataStream → DataStream
-
Rescaling
+
DataStream → DataStreamRescaling
DataStream → DataStream
-
Broadcasting
+
DataStream → DataStreamBroadcasting
DataStream → DataStream
-
Custom partitioning
+
DataStream → DataStreamCustom partitioning
DataStream → DataStream
-
Random partitioning
+
DataStream → DataStreamRandom partitioning
DataStream → DataStream
-
Rebalancing (Round-robin partitioning)
+
DataStream → DataStreamRebalancing (Round-robin partitioning)
DataStream → DataStream
-
Rescaling
+
DataStream → DataStreamRescaling
DataStream → DataStream
- Broadcasting
+
DataStream → DataStreamBroadcasting
DataStream → DataStream