From 58505770a4d5ebf3ca0f7e4bb69d8b4a2bb5f6a8 Mon Sep 17 00:00:00 2001 From: Hung Chang Date: Sat, 16 Aug 2014 05:57:33 +0200 Subject: [PATCH 01/15] Update java_api_guide.md with parallel --- docs/java_api_guide.md | 40 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/docs/java_api_guide.md b/docs/java_api_guide.md index 76dd3322bbf19..470610f5bbfe3 100644 --- a/docs/java_api_guide.md +++ b/docs/java_api_guide.md @@ -1143,6 +1143,46 @@ You have the choice to implement either {% gh_link /flink-core/src/main/java/org [Back to top](#top) +
+Generic Operator Methods +--------- + +This section describes all methods that are available for all operators. + +### Parallelism + +`Parallelism` specifies the amount of parallel instances that each operator executes. All operators could be setted to the same amount of parallel instances, or they can be configurated individually. + +

+Parallelism is used as follows. All operators are executed by three parallel instances : +

+ +```java +int degreeOfParallelism = 3; +ExecutionEnvironment.setDefaultLocalParallelism(degreeOfParallelism); +final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + +``` +

+You are able to set parallelism for each operator, in [WordCount](#example) the parallelism for each operator can be configurated as follows : +

+ +```java +DataSet> counts = +// split up the lines in pairs (2-tuples) containing: (word,1) +text.flatMap(new Tokenizer()) +// group by the tuple field "0" and sum up tuple field "1" +//set this operator's parralleism to "5" +.groupBy(0).sum(1).setParallelism(5); + + +// set this operator's parrallelism to 2 +counts.print().setParallelism(2); + +``` + +[Back to top](#top) +
Execution Plans --------------- From 4f12b3cb841062bc84016dadc963062fa24efa15 Mon Sep 17 00:00:00 2001 From: Hung Chang Date: Tue, 19 Aug 2014 14:38:15 +0800 Subject: [PATCH 02/15] Improve writing and test # anchor --- docs/java_api_guide.md | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/docs/java_api_guide.md b/docs/java_api_guide.md index 470610f5bbfe3..9888af5bca6b1 100644 --- a/docs/java_api_guide.md +++ b/docs/java_api_guide.md @@ -1154,17 +1154,16 @@ This section describes all methods that are available for all operators. `Parallelism` specifies the amount of parallel instances that each operator executes. All operators could be setted to the same amount of parallel instances, or they can be configurated individually.

-Parallelism is used as follows. All operators are executed by three parallel instances : +Parallelism is used as follows. By this all operators are executed by three parallel instances in [WordCount](#example) :

```java int degreeOfParallelism = 3; ExecutionEnvironment.setDefaultLocalParallelism(degreeOfParallelism); final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); - ```

-You are able to set parallelism for each operator, in [WordCount](#example) the parallelism for each operator can be configurated as follows : +You are able to set parallelism for each operator, in [WordCount](#top) the parallelism for each operator can be configurated as follows :

```java @@ -1175,10 +1174,8 @@ text.flatMap(new Tokenizer()) //set this operator's parralleism to "5" .groupBy(0).sum(1).setParallelism(5); - // set this operator's parrallelism to 2 counts.print().setParallelism(2); - ``` [Back to top](#top) From 6605208132b951d4eaf615bcb2297dfc5a7ba43b Mon Sep 17 00:00:00 2001 From: Hung Chang Date: Tue, 19 Aug 2014 14:39:51 +0800 Subject: [PATCH 03/15] to correct anchor(#) --- docs/java_api_guide.md | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/docs/java_api_guide.md b/docs/java_api_guide.md index 9888af5bca6b1..f486baf8f7338 100644 --- a/docs/java_api_guide.md +++ b/docs/java_api_guide.md @@ -1153,18 +1153,16 @@ This section describes all methods that are available for all operators. `Parallelism` specifies the amount of parallel instances that each operator executes. All operators could be setted to the same amount of parallel instances, or they can be configurated individually. -

Parallelism is used as follows. By this all operators are executed by three parallel instances in [WordCount](#example) : -

```java int degreeOfParallelism = 3; ExecutionEnvironment.setDefaultLocalParallelism(degreeOfParallelism); final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); ``` -

-You are able to set parallelism for each operator, in [WordCount](#top) the parallelism for each operator can be configurated as follows : -

+ +You are able to set parallelism for each operator, in [WordCount](#example) the parallelism for each operator can be configurated as follows : + ```java DataSet> counts = From 8307afc02934c8a10ff29db2d353b8aaf0605a2f Mon Sep 17 00:00:00 2001 From: Hung Chang Date: Wed, 20 Aug 2014 22:37:10 +0800 Subject: [PATCH 04/15] improve section layout of parallel exectution --- docs/java_api_guide.md | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/docs/java_api_guide.md b/docs/java_api_guide.md index f486baf8f7338..2a45a63de7fff 100644 --- a/docs/java_api_guide.md +++ b/docs/java_api_guide.md @@ -396,8 +396,10 @@ DataSet> out = in.project(2,0).types(String.class, Integ -[Back to top](#top) +You can configure each transformation for its [parallelism](#parallelism) by setParallelism(), and each transformation's name by name(). You can do the same for operators of data sources and data sinks. + +[Back to Top](#top)
Defining Keys @@ -1143,17 +1145,15 @@ You have the choice to implement either {% gh_link /flink-core/src/main/java/org [Back to top](#top) -
-Generic Operator Methods +
+Parallel Execution --------- -This section describes all methods that are available for all operators. +This section describes the detail of `parallellism` in Flink. Parallelism specifies the amount of parallel instances execute the program. All operators executed could be setted to the same amount of parallel instances, or they can be configurated individually. There are three levels of parallelism in Flink which are Operator, Execution Environment and System-level. -### Parallelism +###Execution Environment's level -`Parallelism` specifies the amount of parallel instances that each operator executes. All operators could be setted to the same amount of parallel instances, or they can be configurated individually. - -Parallelism is used as follows. By this all operators are executed by three parallel instances in [WordCount](#example) : +Parallelism at Execution Environment level is used as follows. By this all operators are executed by three parallel instances in [WordCount](#example) : ```java int degreeOfParallelism = 3; @@ -1161,6 +1161,7 @@ ExecutionEnvironment.setDefaultLocalParallelism(degreeOfParallelism); final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); ``` +###Operator's level You are able to set parallelism for each operator, in [WordCount](#example) the parallelism for each operator can be configurated as follows : @@ -1176,6 +1177,8 @@ text.flatMap(new Tokenizer()) counts.print().setParallelism(2); ``` +For setting parallelism at system level, see [system level](http://flink.incubator.apache.org/docs/0.6-SNAPSHOT/config.html#common-options) + [Back to top](#top)
From ffd7330cd43132a822614ca47c5e54ebbd3d4ffc Mon Sep 17 00:00:00 2001 From: Hung Chang Date: Wed, 20 Aug 2014 22:39:19 +0800 Subject: [PATCH 05/15] Correct spelling error --- docs/java_api_guide.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/java_api_guide.md b/docs/java_api_guide.md index 2a45a63de7fff..842b11ecc603f 100644 --- a/docs/java_api_guide.md +++ b/docs/java_api_guide.md @@ -1145,13 +1145,13 @@ You have the choice to implement either {% gh_link /flink-core/src/main/java/org [Back to top](#top) -
+
Parallel Execution --------- -This section describes the detail of `parallellism` in Flink. Parallelism specifies the amount of parallel instances execute the program. All operators executed could be setted to the same amount of parallel instances, or they can be configurated individually. There are three levels of parallelism in Flink which are Operator, Execution Environment and System-level. +This section describes the detail of `parallelism` in Flink. Parallelism specifies the amount of parallel instances execute the program. All operators executed could be setted to the same amount of parallel instances, or they can be configurated individually. There are three levels of parallelism in Flink which are Operator, Execution Environment and System-level. -###Execution Environment's level +###Execution Environment's Level Parallelism at Execution Environment level is used as follows. By this all operators are executed by three parallel instances in [WordCount](#example) : @@ -1161,7 +1161,7 @@ ExecutionEnvironment.setDefaultLocalParallelism(degreeOfParallelism); final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); ``` -###Operator's level +###Operator's Level You are able to set parallelism for each operator, in [WordCount](#example) the parallelism for each operator can be configurated as follows : From 8b1e971a4c13a4e6614da3ba3e14e06a992a6293 Mon Sep 17 00:00:00 2001 From: Hung Chang Date: Wed, 20 Aug 2014 22:40:53 +0800 Subject: [PATCH 06/15] parallelism upper case error --- docs/java_api_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/java_api_guide.md b/docs/java_api_guide.md index 842b11ecc603f..aff1eb7005d5b 100644 --- a/docs/java_api_guide.md +++ b/docs/java_api_guide.md @@ -1145,7 +1145,7 @@ You have the choice to implement either {% gh_link /flink-core/src/main/java/org [Back to top](#top) -
+
Parallel Execution --------- From 5fd62a920f5b1489a0f9a6bedc0e9d5953d81201 Mon Sep 17 00:00:00 2001 From: Hung Chang Date: Wed, 20 Aug 2014 22:43:07 +0800 Subject: [PATCH 07/15]
error --- docs/java_api_guide.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/java_api_guide.md b/docs/java_api_guide.md index aff1eb7005d5b..0c056992773c8 100644 --- a/docs/java_api_guide.md +++ b/docs/java_api_guide.md @@ -1144,7 +1144,6 @@ You have the choice to implement either {% gh_link /flink-core/src/main/java/org [Back to top](#top) -
Parallel Execution --------- From e408af3a1fff969ee9f8792db126a1901afe1680 Mon Sep 17 00:00:00 2001 From: Hung Chang Date: Thu, 21 Aug 2014 09:20:14 +0800 Subject: [PATCH 08/15] spelling correction --- docs/java_api_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/java_api_guide.md b/docs/java_api_guide.md index 0c056992773c8..a12e89a18f226 100644 --- a/docs/java_api_guide.md +++ b/docs/java_api_guide.md @@ -1148,7 +1148,7 @@ You have the choice to implement either {% gh_link /flink-core/src/main/java/org Parallel Execution --------- -This section describes the detail of `parallelism` in Flink. Parallelism specifies the amount of parallel instances execute the program. All operators executed could be setted to the same amount of parallel instances, or they can be configurated individually. There are three levels of parallelism in Flink which are Operator, Execution Environment and System-level. +This section describes the detail of `parallelism` in Flink. Parallelism specifies the amount of parallel instances execute the program. All operators executed could be setted to the same amount of parallel instances, or they can be configurated individually. There are three levels of parallelism in Flink which are Operator, Execution Environment and System level. ###Execution Environment's Level From dfecc4b7bb2ab93ded1547cb4a85af213a48e5ab Mon Sep 17 00:00:00 2001 From: Hung Chang Date: Tue, 2 Sep 2014 22:11:06 +0800 Subject: [PATCH 09/15] Changes according to 8/31 --- docs/java_api_guide.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/java_api_guide.md b/docs/java_api_guide.md index a12e89a18f226..4bc18700af4d9 100644 --- a/docs/java_api_guide.md +++ b/docs/java_api_guide.md @@ -397,7 +397,7 @@ DataSet> out = in.project(2,0).types(String.class, Integ -You can configure each transformation for its [parallelism](#parallelism) by setParallelism(), and each transformation's name by name(). You can do the same for operators of data sources and data sinks. +You can configure each transformation for its [parallelism](#parallelism) by `setParallelism()`, and each transformation's name by `name()`. You can do the same for data sources and data sinks. [Back to Top](#top) @@ -1148,11 +1148,11 @@ You have the choice to implement either {% gh_link /flink-core/src/main/java/org Parallel Execution --------- -This section describes the detail of `parallelism` in Flink. Parallelism specifies the amount of parallel instances execute the program. All operators executed could be setted to the same amount of parallel instances, or they can be configurated individually. There are three levels of parallelism in Flink which are Operator, Execution Environment and System level. +This section describes the usage of `parallelism` in Flink. Parallelism specifies the amount of parallel instances execute the program and all operators executed could be setted to the same amount of parallel instances, or they can be configurated individually. Specifically, a typical program runs as `Data Source -> Map -> Reduce -> Data Sink`. Therefore, you can deicide the number of parallel instances in Data Source, Map, Reduce and Data Sink. Based on different levels, parallelism in Flink can be distinguilished as Operator, Execution Environment and System level. The detail will be described as follows. ###Execution Environment's Level -Parallelism at Execution Environment level is used as follows. By this all operators are executed by three parallel instances in [WordCount](#example) : +Parallelism at Execution Environment level is used by `setDefaultLocalParallelism()` as follows. By this all operators are executed by three parallel instances in [WordCount](#example) : ```java int degreeOfParallelism = 3; @@ -1161,7 +1161,7 @@ final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); ``` ###Operator's Level -You are able to set parallelism for each operator, in [WordCount](#example) the parallelism for each operator can be configurated as follows : +You are able to set parallelism for each operator by `setParallelism()`, in [WordCount](#example) the parallelism for each operator can be configurated as follows ```java @@ -1176,7 +1176,9 @@ text.flatMap(new Tokenizer()) counts.print().setParallelism(2); ``` -For setting parallelism at system level, see [system level](http://flink.incubator.apache.org/docs/0.6-SNAPSHOT/config.html#common-options) +###System's Level +The default parallelism of all the jobs and their execution enviroment and operators are configured by `parallelization.degreee.default` +The default parallelism for all jobs in a setup can be configured using the `parallelization.degreee.default` parameter in `conf/flink-conf.yaml`. See [system level](http://flink.incubator.apache.org/docs/0.6-incubating/config.html#common-options) [Back to top](#top) From dfb944b2498160a61aa8222c370147799a8ac1c9 Mon Sep 17 00:00:00 2001 From: Hung Chang Date: Tue, 2 Sep 2014 22:12:57 +0800 Subject: [PATCH 10/15] writing error --- docs/java_api_guide.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/java_api_guide.md b/docs/java_api_guide.md index 4bc18700af4d9..6f168e2c6da64 100644 --- a/docs/java_api_guide.md +++ b/docs/java_api_guide.md @@ -1177,8 +1177,7 @@ counts.print().setParallelism(2); ``` ###System's Level -The default parallelism of all the jobs and their execution enviroment and operators are configured by `parallelization.degreee.default` -The default parallelism for all jobs in a setup can be configured using the `parallelization.degreee.default` parameter in `conf/flink-conf.yaml`. See [system level](http://flink.incubator.apache.org/docs/0.6-incubating/config.html#common-options) +The default parallelism of all the jobs and their execution enviroment and operators are configured by `parallelization.degreee.default` in `conf/flink-conf.yaml`. See [system level](http://flink.incubator.apache.org/docs/0.6-incubating/config.html#common-options) [Back to top](#top) From 793e3452160e3980962a294fa7c9c725b7ee7708 Mon Sep 17 00:00:00 2001 From: Hung Chang Date: Tue, 2 Sep 2014 22:15:23 +0800 Subject: [PATCH 11/15] Update java_api_guide.md --- docs/java_api_guide.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/java_api_guide.md b/docs/java_api_guide.md index 6f168e2c6da64..951d92e0b9c2d 100644 --- a/docs/java_api_guide.md +++ b/docs/java_api_guide.md @@ -1145,10 +1145,10 @@ You have the choice to implement either {% gh_link /flink-core/src/main/java/org [Back to top](#top)
-Parallel Execution +Parallelism --------- -This section describes the usage of `parallelism` in Flink. Parallelism specifies the amount of parallel instances execute the program and all operators executed could be setted to the same amount of parallel instances, or they can be configurated individually. Specifically, a typical program runs as `Data Source -> Map -> Reduce -> Data Sink`. Therefore, you can deicide the number of parallel instances in Data Source, Map, Reduce and Data Sink. Based on different levels, parallelism in Flink can be distinguilished as Operator, Execution Environment and System level. The detail will be described as follows. +This section describes the usage of `parallelism` in Flink. Parallelism specifies the amount of parallel instances execute the program and all operators executed could be setted to the same amount of parallel instances, or they can be configurated individually. Specifically, a typical program runs as `Data Source -> Map -> Reduce -> Data Sink`. Therefore, you can deicide the number of parallel instances in Data Source, Map, Reduce and Data Sink. Based on different levels, parallelism in Flink can be distinguilished as Operator, Execution Environment and System level. The detail is as follows. ###Execution Environment's Level From 5a52c8d18c534d6ff63cdda338c4e37f8ca4faef Mon Sep 17 00:00:00 2001 From: Hung Chang Date: Fri, 5 Sep 2014 15:44:42 +0800 Subject: [PATCH 12/15] Version_Stable --- docs/java_api_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/java_api_guide.md b/docs/java_api_guide.md index 951d92e0b9c2d..6b087d650902d 100644 --- a/docs/java_api_guide.md +++ b/docs/java_api_guide.md @@ -1177,7 +1177,7 @@ counts.print().setParallelism(2); ``` ###System's Level -The default parallelism of all the jobs and their execution enviroment and operators are configured by `parallelization.degreee.default` in `conf/flink-conf.yaml`. See [system level](http://flink.incubator.apache.org/docs/0.6-incubating/config.html#common-options) +The default parallelism of all the jobs and their execution enviroment and operators are configured by `parallelization.degreee.default` in `conf/flink-conf.yaml`. See [system level](http://flink.incubator.apache.org/docs/{{site.FLINK_VERSION_STABLE }}/config.html) [Back to top](#top) From ad47ef25d0606a4866a9069e4098ba508ca2d56d Mon Sep 17 00:00:00 2001 From: Hung Chang Date: Fri, 5 Sep 2014 15:51:38 +0800 Subject: [PATCH 13/15] Version_stable --- docs/java_api_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/java_api_guide.md b/docs/java_api_guide.md index 6b087d650902d..22a3d0bc0aef9 100644 --- a/docs/java_api_guide.md +++ b/docs/java_api_guide.md @@ -1177,7 +1177,7 @@ counts.print().setParallelism(2); ``` ###System's Level -The default parallelism of all the jobs and their execution enviroment and operators are configured by `parallelization.degreee.default` in `conf/flink-conf.yaml`. See [system level](http://flink.incubator.apache.org/docs/{{site.FLINK_VERSION_STABLE }}/config.html) +The default parallelism of all the jobs and their execution enviroment and operators are configured by `parallelization.degreee.default` in `conf/flink-conf.yaml`. See [system level](http://flink.incubator.apache.org/docs/{{site.FLINK_VERSION_STABLE}}/config.html) [Back to top](#top) From 4f5281958c56b0ba368cbe2ff295ff09ee8b1283 Mon Sep 17 00:00:00 2001 From: Hung Chang Date: Fri, 5 Sep 2014 16:00:58 +0800 Subject: [PATCH 14/15] site.baseurl.config --- docs/java_api_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/java_api_guide.md b/docs/java_api_guide.md index 22a3d0bc0aef9..736ba846a59bb 100644 --- a/docs/java_api_guide.md +++ b/docs/java_api_guide.md @@ -1177,7 +1177,7 @@ counts.print().setParallelism(2); ``` ###System's Level -The default parallelism of all the jobs and their execution enviroment and operators are configured by `parallelization.degreee.default` in `conf/flink-conf.yaml`. See [system level](http://flink.incubator.apache.org/docs/{{site.FLINK_VERSION_STABLE}}/config.html) +The default parallelism of all the jobs and their execution enviroment and operators are configured by `parallelization.degreee.default` in `conf/flink-conf.yaml`. See [system level]({{site.baseurl}}/config.html) [Back to top](#top) From 6362e2383c2729a94c35b48cccb86397b9c70b8f Mon Sep 17 00:00:00 2001 From: Hung Chang Date: Fri, 5 Sep 2014 16:11:22 +0800 Subject: [PATCH 15/15] writing improvement --- docs/java_api_guide.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/java_api_guide.md b/docs/java_api_guide.md index 736ba846a59bb..51f33fbb12dee 100644 --- a/docs/java_api_guide.md +++ b/docs/java_api_guide.md @@ -1148,7 +1148,7 @@ You have the choice to implement either {% gh_link /flink-core/src/main/java/org Parallelism --------- -This section describes the usage of `parallelism` in Flink. Parallelism specifies the amount of parallel instances execute the program and all operators executed could be setted to the same amount of parallel instances, or they can be configurated individually. Specifically, a typical program runs as `Data Source -> Map -> Reduce -> Data Sink`. Therefore, you can deicide the number of parallel instances in Data Source, Map, Reduce and Data Sink. Based on different levels, parallelism in Flink can be distinguilished as Operator, Execution Environment and System level. The detail is as follows. +This section describes the usage of `parallelism` in Flink. Parallelism specifies the amount of parallel instances execute the program. All operators executed could be setted to the same amount of parallel instances, or they can be configurated individually. A typical program runs as `Data Source -> Map -> Reduce -> Data Sink` and you can deicide the number of parallel instances in Data Source, Map, Reduce and Data Sink. Based on different levels, parallelism in Flink can be distinguilished as Operator, Execution Environment and System level. The detail is as follows. ###Execution Environment's Level @@ -1161,7 +1161,7 @@ final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); ``` ###Operator's Level -You are able to set parallelism for each operator by `setParallelism()`, in [WordCount](#example) the parallelism for each operator can be configurated as follows +You are able to set parallelism for each operator by `setParallelism()`, in [WordCount](#example) the parallelism for each operator can be configurated as follows : ```java