diff --git a/docs/_blog/2020-07-01-announcing-the-release-of-apache-samza--1.5.0.md b/docs/_blog/2020-07-01-announcing-the-release-of-apache-samza--1.5.0.md new file mode 100644 index 0000000000..ca1a5029fb --- /dev/null +++ b/docs/_blog/2020-07-01-announcing-the-release-of-apache-samza--1.5.0.md @@ -0,0 +1,144 @@ +--- +layout: blog +title: Announcing the release of Apache Samza 1.5.0 +icon: git-pull-request +authors: + - name: Bharath Kumarasubramanian + website: + image: +excerpt_separator: +--- + + + +# **Announcing the release of Apache Samza 1.5.0** + + + + +**IMPORTANT NOTE**: As noted in the last release, this release contains **backward-incompatible changes to Samza job submission**. Details can be found in [SEP-23: Simplify Job Runner](https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner). + +We are thrilled to announce the release of Apache Samza 1.5.0. + +Today, Samza forms the backbone of hundreds of real-time production applications across a multitude of companies, such as LinkedIn, Slack, and Redfin, among many others. Samza provides leading support for large-scale stateful stream processing with: + +* First-class support for local state (with the RocksDB store). This allows a stateful application to scale up to 1.1 million events/sec on a single machine with SSD. + +* Support for incremental checkpointing of state instead of full snapshots. This enables Samza to scale to applications with very large state. + +* A fully asynchronous programming model that makes parallelizing remote calls efficient and effortless. + +* A high-level API for expressing complex stream processing pipelines in a few lines of code. + +* The Beam Samza Runner, which marries Beam’s best-in-class support for event-time based windowed processing and sophisticated triggering with Samza’s stable and scalable stateful processing model. + +* A fully pluggable model for input sources (e.g. Kafka, Kinesis, DynamoDB streams etc.) 
and output systems (HDFS, Kafka, ElastiCache etc.). + +* A Table API that provides a common abstraction for accessing remote or local databases and allows developers to “join” an input event stream with such a Table. + +* A flexible deployment model for running applications in any hosting environment and with cluster managers other than YARN. + +### New Features, Upgrades and Bug Fixes: +This release brings the following features, upgrades, and capabilities (highlights): + +#### Samza Container Placement +The Container Placements API lets you move or restart one or more containers (either active or standby) of your cluster-based applications from one host to another without restarting the application. You can use these APIs to build maintenance, balancing, and remediation tools. + +#### Simplify Job Runner & Configs +Job Runner now simply submits the Samza job to the YARN ResourceManager (RM) without executing any user code; job planning happens on the ClusterBasedJobCoordinator instead. This simplified workflow addresses security requirements where job submission must be isolated from the execution of user code, as well as operational pain points where deployment could fail at multiple places. + +The full list of JIRAs addressed in this release can be found [here](https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20and%20fixVersion%20in%20(1.5)). + +### Upgrading your application to Apache Samza 1.5.0 +ConfigFactory is deprecated because Job Runner no longer loads the full job config. Instead, ConfigLoaderFactory is introduced; it is executed on the ClusterBasedJobCoordinator to fetch the full job config. +If you are using the default PropertiesConfigFactory, simply switching to the default PropertiesConfigLoaderFactory will work. If you are using a custom ConfigFactory, please create a counterpart that implements ConfigLoaderFactory.
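For a job that used the default PropertiesConfigFactory, the switch mostly comes down to adding config-loader settings to the job's properties file. The bash sketch below is a minimal illustration, not the official migration tool. It assumes the key names `job.config.loader.factory` and `job.config.loader.properties.path`, the class name `org.apache.samza.config.loaders.PropertiesConfigLoaderFactory`, and an in-package path of the form `./__package/config/...`, in the style of the hello-samza 1.5 configs; please verify them against the SEP-23 reference. The helper name `add_loader_config` and the file names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append config-loader settings to a job properties file
# so that the full job config can be fetched on the AM instead of by Job Runner.
# Key and class names are assumptions; verify them against SEP-23.
set -euo pipefail

add_loader_config() {
  local props_file="$1"       # properties file to update (appended to)
  local path_in_package="$2"  # location of that file inside the extracted job tarball
  cat >> "$props_file" <<EOF
job.config.loader.factory=org.apache.samza.config.loaders.PropertiesConfigLoaderFactory
job.config.loader.properties.path=${path_in_package}
EOF
}
```

For example, `add_loader_config deploy/samza/config/my-job.properties ./__package/config/my-job.properties` (hypothetical paths) appends both keys. The path points inside the package because, in the 1.5 flow, the loader runs on the AM against the extracted tarball rather than on the submitting host.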
+ +Configs related to job submission must be provided explicitly to Job Runner, since it no longer loads the full job config. These configs include: + +* Configs directly related to job submission, such as yarn.package.path, job.name, etc. +* Configs needed by the config loader on the AM (ApplicationMaster) to fetch the job config, such as the path to the properties file in the tarball. All such configs have the job.config.loader.properties prefix. +* Configs that users would like to override. + +The full list of job submission configurations can be found [here](https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner#SEP23:SimplifyJobRunner-References). + +#### Usage Instructions +When submitting a job, the following command +{% highlight bash %} +deploy/samza/bin/run-app.sh \ + --config yarn.package.path= \ + --config job.name= +{% endhighlight %} +can be simplified to +{% highlight bash %} +deploy/samza/bin/run-app.sh \ + --config-path=/path/to/submission/properties/file/submission.properties +{% endhighlight %} +where submission.properties contains +{% highlight jproperties %} +yarn.package.path= +job.name= +{% endhighlight %} + +#### Rollback Instructions +In case of a problem with Samza 1.5, users can roll back to Samza 1.4 and keep the old startup flow using _config-path_ and _config-factory_.
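#### Deriving submission.properties (sketch)
If you already maintain a full job properties file, one way to build submission.properties for the 1.5 flow is to filter out only the submission-related keys. The bash sketch below assumes the submission set consists of the examples named in the upgrade notes (yarn.package.path, job.name, and keys under a job.config.loader. prefix). That is only a subset of the full list in the SEP-23 reference, and the helper name `extract_submission_config` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the submission-related subset of a full job
# properties file. The key pattern covers only the examples in this post
# (yarn.package.path, job.name, job.config.loader.*); extend it with the
# other keys from the SEP-23 reference and any overrides you need.
set -euo pipefail

extract_submission_config() {
  local full_config="$1"
  local status=0
  # Keep only lines whose key matches the assumed submission keys. Assumes plain
  # key=value lines (no spaces around '='); comments and other keys are dropped.
  grep -E '^(yarn\.package\.path|job\.name|job\.config\.loader\.[A-Za-z0-9._-]+)=' "$full_config" || status=$?
  # grep exits 1 when no line matches: treat that as an empty result, but
  # propagate real errors such as a missing file (exit status 2).
  [ "$status" -le 1 ] || return "$status"
}
```

A typical use would be `extract_submission_config deploy/samza/config/my-job.properties > submission.properties` (hypothetical path), followed by `run-app.sh --config-path=...` as shown in the usage instructions. Filtering by an explicit pattern keeps unrelated job settings out of the submission step, which matches the intent that Job Runner only sees what it needs to submit the job.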
+ +### Simplify Job Runner & Configs +[SAMZA-2488](https://issues.apache.org/jira/browse/SAMZA-2488) Add JobCoordinatorLaunchUtil to handle common logic when launching job coordinator + +[SAMZA-2471](https://issues.apache.org/jira/browse/SAMZA-2471) Simplify CommandLine + +[SAMZA-2458](https://issues.apache.org/jira/browse/SAMZA-2458) Update ProcessJobFactory and ThreadJobFactory to load full job config + +[SAMZA-2453](https://issues.apache.org/jira/browse/SAMZA-2453) Update ClusterBasedJobCoordinator to support Beam jobs + +[SAMZA-2441](https://issues.apache.org/jira/browse/SAMZA-2441) Update ApplicationRunnerMain#ApplicationRunnerCommandLine not to load local file + +[SAMZA-2420](https://issues.apache.org/jira/browse/SAMZA-2420) Update CommandLine to use config loader for local config file + +### Container Placement API +[SAMZA-2402](https://issues.apache.org/jira/browse/SAMZA-2402) Tie Container placement service and Container placement handler and validate placement requests + +[SAMZA-2379](https://issues.apache.org/jira/browse/SAMZA-2379) Support Container Placements for job running in degraded state + +[SAMZA-2378](https://issues.apache.org/jira/browse/SAMZA-2378) Container Placements support for Standby containers enabled jobs + + +### Bug Fixes +[SAMZA-2515](https://issues.apache.org/jira/browse/SAMZA-2515) Thread safety for Kafka consumer in KafkaConsumerProxy + +[SAMZA-2511](https://issues.apache.org/jira/browse/SAMZA-2511) Handle container-stop-fail in case of standby container failover + +[SAMZA-2510](https://issues.apache.org/jira/browse/SAMZA-2510) Incorrect shutdown status due to race between runloop thread and process callback thread + +[SAMZA-2506](https://issues.apache.org/jira/browse/SAMZA-2506) Inconsistent end of stream semantics in SystemStreamPartitionMetadata + +[SAMZA-2464](https://issues.apache.org/jira/browse/SAMZA-2464) Container shuts down when task fails to remove old state checkpoint dirs + 
+[SAMZA-2468](https://issues.apache.org/jira/browse/SAMZA-2468) Standby container needs to respond to shutdown request + +### Other Improvements +[SAMZA-2519](https://issues.apache.org/jira/browse/SAMZA-2519) Support duplicate timer registration + +[SAMZA-2508](https://issues.apache.org/jira/browse/SAMZA-2508) Use cytodynamics classloader to launch job container + +[SAMZA-2478](https://issues.apache.org/jira/browse/SAMZA-2478) Add new metrics to track key and value size of records written to RocksDb + +[SAMZA-2462](https://issues.apache.org/jira/browse/SAMZA-2462) Adding metric for container thread pool size + +### Source Downloads +A source download of Samza 1.5.0 is available [here](https://dist.apache.org/repos/dist/release/samza/1.5.0/) and in Apache’s Maven repository. See Samza’s download [page](https://samza.apache.org/startup/download/) for details and Samza’s feature preview for new features. diff --git a/docs/_config.yml b/docs/_config.yml index da3f72cfaa..c62607f1f1 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -25,7 +25,7 @@ exclude: [_docs] baseurl: http://samza.apache.org version: latest # this is the version you will go if you click 'switch version' in "latest" pages.
-latest-release: '1.4.0' +latest-release: '1.5.0' collections: menu: output: false diff --git a/docs/_menu/index.html b/docs/_menu/index.html index dc74b8fda1..243bc713b7 100644 --- a/docs/_menu/index.html +++ b/docs/_menu/index.html @@ -12,6 +12,8 @@ items_attributes: 'data-documentation="/learn/documentation/version/"' - menu_title: Releases items: + - menu_title: 1.5.0 + url: '/releases/1.5.0' - menu_title: 1.4.0 url: '/releases/1.4.0' - menu_title: 1.3.1 diff --git a/docs/_releases/1.5.0.md b/docs/_releases/1.5.0.md new file mode 100644 index 0000000000..614a86ff4c --- /dev/null +++ b/docs/_releases/1.5.0.md @@ -0,0 +1,135 @@ +--- +version: '1.5.0' +order: 150 +layout: page +menu_title: '1.5' +title: Apache Samza 1.5 [Docs] +--- + + +**IMPORTANT NOTE**: As noted in the last release, this release contains **backward-incompatible changes to Samza job submission**. Details can be found in [SEP-23: Simplify Job Runner](https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner). + +We are thrilled to announce the release of Apache Samza 1.5.0. + +Today, Samza forms the backbone of hundreds of real-time production applications across a multitude of companies, such as LinkedIn, Slack, and Redfin, among many others. Samza provides leading support for large-scale stateful stream processing with: + +* First-class support for local state (with the RocksDB store). This allows a stateful application to scale up to 1.1 million events/sec on a single machine with SSD. + +* Support for incremental checkpointing of state instead of full snapshots. This enables Samza to scale to applications with very large state. + +* A fully asynchronous programming model that makes parallelizing remote calls efficient and effortless. + +* A high-level API for expressing complex stream processing pipelines in a few lines of code.
+ +* The Beam Samza Runner, which marries Beam’s best-in-class support for event-time based windowed processing and sophisticated triggering with Samza’s stable and scalable stateful processing model. + +* A fully pluggable model for input sources (e.g. Kafka, Kinesis, DynamoDB streams etc.) and output systems (HDFS, Kafka, ElastiCache etc.). + +* A Table API that provides a common abstraction for accessing remote or local databases and allows developers to “join” an input event stream with such a Table. + +* A flexible deployment model for running applications in any hosting environment and with cluster managers other than YARN. + +### New Features, Upgrades and Bug Fixes: +This release brings the following features, upgrades, and capabilities (highlights): + +#### Samza Container Placement +The Container Placements API lets you move or restart one or more containers (either active or standby) of your cluster-based applications from one host to another without restarting the application. You can use these APIs to build maintenance, balancing, and remediation tools. + +#### Simplify Job Runner & Configs +Job Runner now simply submits the Samza job to the YARN ResourceManager (RM) without executing any user code; job planning happens on the ClusterBasedJobCoordinator instead. This simplified workflow addresses security requirements where job submission must be isolated from the execution of user code, as well as operational pain points where deployment could fail at multiple places. + +The full list of JIRAs addressed in this release can be found [here](https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20and%20fixVersion%20in%20(1.5)). + +### Upgrading your application to Apache Samza 1.5.0 +ConfigFactory is deprecated because Job Runner no longer loads the full job config. Instead, ConfigLoaderFactory is introduced; it is executed on the ClusterBasedJobCoordinator to fetch the full job config.
+If you are using the default PropertiesConfigFactory, simply switching to the default PropertiesConfigLoaderFactory will work. If you are using a custom ConfigFactory, please create a counterpart that implements ConfigLoaderFactory. + +Configs related to job submission must be provided explicitly to Job Runner, since it no longer loads the full job config. These configs include: + +* Configs directly related to job submission, such as yarn.package.path, job.name, etc. +* Configs needed by the config loader on the AM (ApplicationMaster) to fetch the job config, such as the path to the properties file in the tarball. All such configs have the job.config.loader.properties prefix. +* Configs that users would like to override. + +The full list of job submission configurations can be found [here](https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner#SEP23:SimplifyJobRunner-References). + +#### Usage Instructions +When submitting a job, the following command +{% highlight bash %} +deploy/samza/bin/run-app.sh \ + --config yarn.package.path= \ + --config job.name= +{% endhighlight %} +can be simplified to +{% highlight bash %} +deploy/samza/bin/run-app.sh \ + --config-path=/path/to/submission/properties/file/submission.properties +{% endhighlight %} +where submission.properties contains +{% highlight jproperties %} +yarn.package.path= +job.name= +{% endhighlight %} + +#### Rollback Instructions +In case of a problem with Samza 1.5, users can roll back to Samza 1.4 and keep the old startup flow using _config-path_ and _config-factory_.
+ +### Simplify Job Runner +[SAMZA-2488](https://issues.apache.org/jira/browse/SAMZA-2488) Add JobCoordinatorLaunchUtil to handle common logic when launching job coordinator + +[SAMZA-2471](https://issues.apache.org/jira/browse/SAMZA-2471) Simplify CommandLine + +[SAMZA-2458](https://issues.apache.org/jira/browse/SAMZA-2458) Update ProcessJobFactory and ThreadJobFactory to load full job config + +[SAMZA-2453](https://issues.apache.org/jira/browse/SAMZA-2453) Update ClusterBasedJobCoordinator to support Beam jobs + +[SAMZA-2441](https://issues.apache.org/jira/browse/SAMZA-2441) Update ApplicationRunnerMain#ApplicationRunnerCommandLine not to load local file + +[SAMZA-2420](https://issues.apache.org/jira/browse/SAMZA-2420) Update CommandLine to use config loader for local config file + +### Container Placement API +[SAMZA-2402](https://issues.apache.org/jira/browse/SAMZA-2402) Tie Container placement service and Container placement handler and validate placement requests + +[SAMZA-2379](https://issues.apache.org/jira/browse/SAMZA-2379) Support Container Placements for job running in degraded state + +[SAMZA-2378](https://issues.apache.org/jira/browse/SAMZA-2378) Container Placements support for Standby containers enabled jobs + + +### Bug Fixes +[SAMZA-2515](https://issues.apache.org/jira/browse/SAMZA-2515) Thread safety for Kafka consumer in KafkaConsumerProxy + +[SAMZA-2511](https://issues.apache.org/jira/browse/SAMZA-2511) Handle container-stop-fail in case of standby container failover + +[SAMZA-2510](https://issues.apache.org/jira/browse/SAMZA-2510) Incorrect shutdown status due to race between runloop thread and process callback thread + +[SAMZA-2506](https://issues.apache.org/jira/browse/SAMZA-2506) Inconsistent end of stream semantics in SystemStreamPartitionMetadata + +[SAMZA-2464](https://issues.apache.org/jira/browse/SAMZA-2464) Container shuts down when task fails to remove old state checkpoint dirs + 
+[SAMZA-2468](https://issues.apache.org/jira/browse/SAMZA-2468) Standby container needs to respond to shutdown request + +### Other Improvements +[SAMZA-2519](https://issues.apache.org/jira/browse/SAMZA-2519) Support duplicate timer registration + +[SAMZA-2508](https://issues.apache.org/jira/browse/SAMZA-2508) Use cytodynamics classloader to launch job container + +[SAMZA-2478](https://issues.apache.org/jira/browse/SAMZA-2478) Add new metrics to track key and value size of records written to RocksDb + +[SAMZA-2462](https://issues.apache.org/jira/browse/SAMZA-2462) Adding metric for container thread pool size + +### Source Downloads +A source download of Samza 1.5.0 is available [here](https://dist.apache.org/repos/dist/release/samza/1.5.0/) and in Apache’s Maven repository. See Samza’s download [page](https://samza.apache.org/startup/download/) for details and Samza’s feature preview for new features. diff --git a/docs/archive/index.html b/docs/archive/index.html index 46dc25ab4f..11fe0924c2 100644 --- a/docs/archive/index.html +++ b/docs/archive/index.html @@ -27,6 +27,14 @@

Latest Release

  • Hello Samza
  • +

    1.5 Release

    + + +

    1.4 Release

      diff --git a/docs/learn/tutorials/versioned/hello-samza-high-level-yarn.md b/docs/learn/tutorials/versioned/hello-samza-high-level-yarn.md index b0fc9fe998..46f762fb08 100644 --- a/docs/learn/tutorials/versioned/hello-samza-high-level-yarn.md +++ b/docs/learn/tutorials/versioned/hello-samza-high-level-yarn.md @@ -63,7 +63,7 @@ Then, you can continue w/ the following command in hello-samza project: {% highlight bash %} mvn clean package mkdir -p deploy/samza -tar -xvf ./target/hello-samza-1.4.0-SNAPSHOT-dist.tar.gz -C deploy/samza +tar -xvf ./target/hello-samza-1.6.0-SNAPSHOT-dist.tar.gz -C deploy/samza {% endhighlight %} ### Run a Samza Application diff --git a/docs/learn/tutorials/versioned/hello-samza-high-level-zk.md b/docs/learn/tutorials/versioned/hello-samza-high-level-zk.md index aa139e0674..ae1d9c1661 100644 --- a/docs/learn/tutorials/versioned/hello-samza-high-level-zk.md +++ b/docs/learn/tutorials/versioned/hello-samza-high-level-zk.md @@ -59,7 +59,7 @@ With the environment setup complete, let us move on to building the hello-samza {% highlight bash %} mvn clean package mkdir -p deploy/samza -tar -xvf ./target/hello-samza-1.4.0-SNAPSHOT-dist.tar.gz -C deploy/samza +tar -xvf ./target/hello-samza-1.6.0-SNAPSHOT-dist.tar.gz -C deploy/samza {% endhighlight %} We are now all set to deploy the application locally. 
diff --git a/docs/learn/tutorials/versioned/samza-rest-getting-started.md b/docs/learn/tutorials/versioned/samza-rest-getting-started.md index d1a8b4b2d4..bbeba717be 100644 --- a/docs/learn/tutorials/versioned/samza-rest-getting-started.md +++ b/docs/learn/tutorials/versioned/samza-rest-getting-started.md @@ -48,7 +48,7 @@ Run the following commands: {% highlight bash %} cd samza-rest/build/distributions/ mkdir -p deploy/samza-rest -tar -xvf ./samza-rest_2.11-1.4.0-SNAPSHOT.tgz -C deploy/samza-rest +tar -xvf ./samza-rest_2.11-1.6.0-SNAPSHOT.tgz -C deploy/samza-rest {% endhighlight %} #### Configure the Installations Path diff --git a/docs/startup/download/index.md b/docs/startup/download/index.md index 4a3322bbf4..41aa4402da 100644 --- a/docs/startup/download/index.md +++ b/docs/startup/download/index.md @@ -31,6 +31,7 @@ Starting from 2016, Samza will begin requiring JDK8 or higher. Please see [this Samza tools package contains command line tools that user can run to use Samza and it's input/output systems. + * [samza-tools_2.11-1.5.0.tgz](http://www-us.apache.org/dist/samza/1.5.0/samza-tools_2.11-1.5.0.tgz) * [samza-tools_2.11-1.4.0.tgz](http://www-us.apache.org/dist/samza/1.4.0/samza-tools_2.11-1.4.0.tgz) * [samza-tools_2.11-1.3.1.tgz](http://www-us.apache.org/dist/samza/1.3.1/samza-tools_2.11-1.3.1.tgz) * [samza-tools_2.11-1.3.0.tgz](http://www-us.apache.org/dist/samza/1.3.0/samza-tools_2.11-1.3.0.tgz) @@ -41,6 +42,7 @@ Starting from 2016, Samza will begin requiring JDK8 or higher. 
Please see [this ### Source Releases + * [samza-sources-1.5.0.tgz](http://www.apache.org/dyn/closer.lua/samza/1.5.0) * [samza-sources-1.4.0.tgz](http://www.apache.org/dyn/closer.lua/samza/1.4.0) * [samza-sources-1.3.1.tgz](http://www.apache.org/dyn/closer.lua/samza/1.3.1) * [samza-sources-1.3.0.tgz](http://www.apache.org/dyn/closer.lua/samza/1.3.0) @@ -73,12 +75,12 @@ A Maven-based Samza project can pull in all required dependencies Samza dependen org.apache.samza samza-api - 1.4.0 + 1.5.0 org.apache.samza samza-core_2.11 - 1.4.0 + 1.5.0 runtime @@ -86,37 +88,37 @@ A Maven-based Samza project can pull in all required dependencies Samza dependen samza-shell dist tgz - 1.4.0 + 1.5.0 runtime org.apache.samza samza-yarn_2.11 - 1.4.0 + 1.5.0 runtime org.apache.samza samza-kv_2.11 - 1.4.0 + 1.5.0 runtime org.apache.samza samza-kv-rocksdb_2.11 - 1.4.0 + 1.5.0 runtime org.apache.samza samza-kv-inmemory_2.11 - 1.4.0 + 1.5.0 runtime org.apache.samza samza-kafka_2.11 - 1.4.0 + 1.5.0 runtime {% endhighlight %} diff --git a/docs/startup/hello-samza/versioned/index.md b/docs/startup/hello-samza/versioned/index.md index 4b986ca672..8f0add35f8 100644 --- a/docs/startup/hello-samza/versioned/index.md +++ b/docs/startup/hello-samza/versioned/index.md @@ -63,7 +63,7 @@ Then, you can continue w/ the following command in hello-samza project: {% highlight bash %} mvn clean package mkdir -p deploy/samza -tar -xvf ./target/hello-samza-1.5.0-SNAPSHOT-dist.tar.gz -C deploy/samza +tar -xvf ./target/hello-samza-1.6.0-SNAPSHOT-dist.tar.gz -C deploy/samza {% endhighlight %} ### Run a Samza Job diff --git a/gradle.properties b/gradle.properties index cb5da70c7c..6fb78db8ee 100644 --- a/gradle.properties +++ b/gradle.properties @@ -15,7 +15,7 @@ # specific language governing permissions and limitations # under the License. 
group=org.apache.samza -version=1.5.0-SNAPSHOT +version=1.6.0-SNAPSHOT scalaSuffix=2.11 # after changing this value, run `$ ./gradlew wrapper` and commit the resulting changed files