Merged
132 changes: 132 additions & 0 deletions docs/_blog/2020-03-17-announcing-the-release-of-apache-samza--1.4.0.md
@@ -0,0 +1,132 @@
---
layout: blog
title: Announcing the release of Apache Samza 1.4.0
icon: git-pull-request
authors:
- name: Cameron Lee
website:
image:
excerpt_separator: <!--more-->
---

<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# **Announcing the release of Apache Samza 1.4.0**


<!--more-->

**IMPORTANT NOTE**: We may introduce **backward-incompatible changes to Samza job submission** in the upcoming 1.5 release. Details can be found in [SEP-23: Simplify Job Runner](https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner).

We are thrilled to announce the release of Apache Samza 1.4.0.

Today, Samza forms the backbone of hundreds of real-time production applications across a multitude of companies, such as LinkedIn, Slack, and Redfin, among many others. Samza provides leading support for large-scale stateful stream processing with:

* First-class support for local state (with a RocksDB store). This allows a stateful application to scale up to 1.1 million events/sec on a single machine with an SSD.

* Support for incremental checkpointing of state instead of full snapshots. This enables Samza to scale to applications with very large state.

* A fully asynchronous programming model that makes parallelizing remote calls efficient and effortless.

* A high-level API for expressing complex stream processing pipelines in a few lines of code.

* A Beam Samza Runner that marries Beam’s best-in-class support for EventTime-based windowed processing and sophisticated triggering with Samza’s stable and scalable stateful processing model.

* A fully pluggable model for input sources (e.g. Kafka, Kinesis, DynamoDB streams etc.) and output systems (HDFS, Kafka, ElastiCache etc.).

* A Table API that provides a common abstraction for accessing remote or local databases and allows developers to “join” an input event stream with such a Table.

* Flexible deployment model for running the applications in any hosting environment and with cluster managers other than YARN.

### New Features, Upgrades and Bug Fixes:
This release brings the following features, upgrades, and capabilities (highlights):

* Improvements regarding management and monitoring of local state

* Improvements to the Samza SQL API

* New system producer for Azure blob storage

* Bug fixes

The full list of JIRAs addressed in this release can be found [here](https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20and%20fixVersion%20in%20(1.4)).

### Upgrading your application to Apache Samza 1.4.0
If an application is being upgraded to Samza 1.4, please note the following usage changes.

* The samza-autoscaling module is no longer supported, and the module has been removed.

### State
[SAMZA-2386](https://issues.apache.org/jira/browse/SAMZA-2386) Get store names should return correct store names in the presence of side inputs

[SAMZA-2324](https://issues.apache.org/jira/browse/SAMZA-2324) Adding KV store metrics for rocksdb

[SAMZA-2416](https://issues.apache.org/jira/browse/SAMZA-2416) Adding null-check before incrementing metrics for bytesSerialized

[SAMZA-2397](https://issues.apache.org/jira/browse/SAMZA-2397) Samza rocksdb metrics do not emit values after Samza version >= 1.1

[SAMZA-2447](https://issues.apache.org/jira/browse/SAMZA-2447) Checkpoint dir removal should only search in valid store dirs

### SQL
[SAMZA-2362](https://issues.apache.org/jira/browse/SAMZA-2362) Include the ScalarUDF implementations with the configured package prefix in ReflectionBasedUdfResolver.

[SAMZA-2375](https://issues.apache.org/jira/browse/SAMZA-2375) Samza-sql: Store udf original name for display purposes

[SAMZA-2376](https://issues.apache.org/jira/browse/SAMZA-2376) Samza-sql: Samza sql should handle sql statements with trailing semi-colon (;)

[SAMZA-2396](https://issues.apache.org/jira/browse/SAMZA-2396) Support dynamic addition of jars in ReflectionUdfResolver.

[SAMZA-2415](https://issues.apache.org/jira/browse/SAMZA-2415) Samza-Sql: Fix AvroRelConverter to only consider cached schema while populating SamzaSqlRelRecord for all the nested records.

[SAMZA-2425](https://issues.apache.org/jira/browse/SAMZA-2425) Samza-sql: support subquery in joins

[SAMZA-2455](https://issues.apache.org/jira/browse/SAMZA-2455) Validate the argument types in SamzaSQL UDF on execution planning phase

### Azure system producer
[SAMZA-2421](https://issues.apache.org/jira/browse/SAMZA-2421) Add SystemProducer for Azure Blob Storage

### Job coordinator dependency isolation (experimental)
[SAMZA-2332](https://issues.apache.org/jira/browse/SAMZA-2332) [AM isolation] YarnJob should pass new command and additional environment variables for AM deployment

[SAMZA-2333](https://issues.apache.org/jira/browse/SAMZA-2333) [AM isolation] Use cytodynamics classloader to launch job coordinator

### Bug fixes

[SAMZA-2334](https://issues.apache.org/jira/browse/SAMZA-2334) ProxyGrouper selection based on Host Affinity not whether job is stateful

[SAMZA-2372](https://issues.apache.org/jira/browse/SAMZA-2372) Null pointer exception in LocalApplicationRunner

[SAMZA-2443](https://issues.apache.org/jira/browse/SAMZA-2443) Upgrade Jetty version to prevent AM file descriptor leak

[SAMZA-2446](https://issues.apache.org/jira/browse/SAMZA-2446) Invoke onCheckpoint only for registered SSPs

[SAMZA-2463](https://issues.apache.org/jira/browse/SAMZA-2463) Duplicate firings of processing timers

[SAMZA-2461](https://issues.apache.org/jira/browse/SAMZA-2461) Fix Concurrent Modification Exception in InMemorySystem

### Other improvements
[SAMZA-2364](https://issues.apache.org/jira/browse/SAMZA-2364) Include the localized resource lib directory in the classpath of SamzaContainer

Clean up unused org.apache.samza.autoscaling module

[SAMZA-2444](https://issues.apache.org/jira/browse/SAMZA-2444) JobModel save in CoordinatorStreamStore resulting flush for each message

[SAMZA-2452](https://issues.apache.org/jira/browse/SAMZA-2452) Adding internal autosizing related configs

### Source downloads
A source download of Samza 1.4.0 is available [here](https://dist.apache.org/repos/dist/release/samza/1.4.0/), and is also available in Apache’s Maven repository. See Samza’s download [page](https://samza.apache.org/startup/download/) for details and Samza’s feature preview for new features.
2 changes: 1 addition & 1 deletion docs/_config.yml
@@ -25,7 +25,7 @@ exclude: [_docs]
baseurl: http://samza.apache.org
version: latest
# this is the version you will go if you click 'switch version' in "latest" pages.
latest-release: '1.3.0'
latest-release: '1.4.0'
collections:
menu:
output: false
2 changes: 2 additions & 0 deletions docs/_menu/index.html
@@ -12,6 +12,8 @@
items_attributes: 'data-documentation="/learn/documentation/version/"'
- menu_title: Releases
items:
- menu_title: 1.4.0
url: '/releases/1.4.0'
- menu_title: 1.3.1
url: '/releases/1.3.1'
- menu_title: 1.3.0
123 changes: 123 additions & 0 deletions docs/_releases/1.4.0.md
@@ -0,0 +1,123 @@
---
version: '1.4'
order: 140
layout: page
menu_title: '1.4'
title: Apache Samza 1.4 <a href="/learn/documentation/1.4.0/"> [Docs] </a>
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

**IMPORTANT NOTE**: We may introduce **backward-incompatible changes to Samza job submission** in the upcoming 1.5 release. Details can be found in [SEP-23: Simplify Job Runner](https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner).

We are thrilled to announce the release of Apache Samza 1.4.0.

Today, Samza forms the backbone of hundreds of real-time production applications across a multitude of companies, such as LinkedIn, Slack, and Redfin, among many others. Samza provides leading support for large-scale stateful stream processing with:

* First-class support for local state (with a RocksDB store). This allows a stateful application to scale up to 1.1 million events/sec on a single machine with an SSD.

* Support for incremental checkpointing of state instead of full snapshots. This enables Samza to scale to applications with very large state.

* A fully asynchronous programming model that makes parallelizing remote calls efficient and effortless.

* A high-level API for expressing complex stream processing pipelines in a few lines of code.

* A Beam Samza Runner that marries Beam’s best-in-class support for EventTime-based windowed processing and sophisticated triggering with Samza’s stable and scalable stateful processing model.

* A fully pluggable model for input sources (e.g. Kafka, Kinesis, DynamoDB streams etc.) and output systems (HDFS, Kafka, ElastiCache etc.).

* A Table API that provides a common abstraction for accessing remote or local databases and allows developers to “join” an input event stream with such a Table.

* Flexible deployment model for running the applications in any hosting environment and with cluster managers other than YARN.

### New Features, Upgrades and Bug Fixes:
This release brings the following features, upgrades, and capabilities (highlights):

* Improvements regarding management and monitoring of local state

* Improvements to the Samza SQL API

* New system producer for Azure blob storage

* Bug fixes

The full list of JIRAs addressed in this release can be found [here](https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20and%20fixVersion%20in%20(1.4)).

### Upgrading your application to Apache Samza 1.4.0
If an application is being upgraded from Samza 1.3 to Samza 1.4, please note the following usage changes.

* The samza-autoscaling module is no longer supported, and the module has been removed.

### State
[SAMZA-2386](https://issues.apache.org/jira/browse/SAMZA-2386) Get store names should return correct store names in the presence of side inputs

[SAMZA-2324](https://issues.apache.org/jira/browse/SAMZA-2324) Adding KV store metrics for rocksdb

[SAMZA-2416](https://issues.apache.org/jira/browse/SAMZA-2416) Adding null-check before incrementing metrics for bytesSerialized

[SAMZA-2397](https://issues.apache.org/jira/browse/SAMZA-2397) Samza rocksdb metrics do not emit values after Samza version >= 1.1

[SAMZA-2447](https://issues.apache.org/jira/browse/SAMZA-2447) Checkpoint dir removal should only search in valid store dirs

### SQL
[SAMZA-2362](https://issues.apache.org/jira/browse/SAMZA-2362) Include the ScalarUDF implementations with the configured package prefix in ReflectionBasedUdfResolver.

[SAMZA-2375](https://issues.apache.org/jira/browse/SAMZA-2375) Samza-sql: Store udf original name for display purposes

[SAMZA-2376](https://issues.apache.org/jira/browse/SAMZA-2376) Samza-sql: Samza sql should handle sql statements with trailing semi-colon (;)

[SAMZA-2396](https://issues.apache.org/jira/browse/SAMZA-2396) Support dynamic addition of jars in ReflectionUdfResolver.

[SAMZA-2415](https://issues.apache.org/jira/browse/SAMZA-2415) Samza-Sql: Fix AvroRelConverter to only consider cached schema while populating SamzaSqlRelRecord for all the nested records.

[SAMZA-2425](https://issues.apache.org/jira/browse/SAMZA-2425) Samza-sql: support subquery in joins

[SAMZA-2455](https://issues.apache.org/jira/browse/SAMZA-2455) Validate the argument types in SamzaSQL UDF on execution planning phase

### Azure system producer
[SAMZA-2421](https://issues.apache.org/jira/browse/SAMZA-2421) Add SystemProducer for Azure Blob Storage

### Job coordinator dependency isolation (experimental)
[SAMZA-2332](https://issues.apache.org/jira/browse/SAMZA-2332) [AM isolation] YarnJob should pass new command and additional environment variables for AM deployment

[SAMZA-2333](https://issues.apache.org/jira/browse/SAMZA-2333) [AM isolation] Use cytodynamics classloader to launch job coordinator

### Bug fixes

[SAMZA-2334](https://issues.apache.org/jira/browse/SAMZA-2334) ProxyGrouper selection based on Host Affinity not whether job is stateful

[SAMZA-2372](https://issues.apache.org/jira/browse/SAMZA-2372) Null pointer exception in LocalApplicationRunner

[SAMZA-2443](https://issues.apache.org/jira/browse/SAMZA-2443) Upgrade Jetty version to prevent AM file descriptor leak

[SAMZA-2446](https://issues.apache.org/jira/browse/SAMZA-2446) Invoke onCheckpoint only for registered SSPs

[SAMZA-2463](https://issues.apache.org/jira/browse/SAMZA-2463) Duplicate firings of processing timers

[SAMZA-2461](https://issues.apache.org/jira/browse/SAMZA-2461) Fix Concurrent Modification Exception in InMemorySystem

### Other improvements
[SAMZA-2364](https://issues.apache.org/jira/browse/SAMZA-2364) Include the localized resource lib directory in the classpath of SamzaContainer

Clean up unused org.apache.samza.autoscaling module

[SAMZA-2444](https://issues.apache.org/jira/browse/SAMZA-2444) JobModel save in CoordinatorStreamStore resulting flush for each message

[SAMZA-2452](https://issues.apache.org/jira/browse/SAMZA-2452) Adding internal autosizing related configs

### Source downloads
A source download of Samza 1.4.0 is available [here](https://dist.apache.org/repos/dist/release/samza/1.4.0/), and is also available in Apache’s Maven repository. See Samza’s download [page](https://samza.apache.org/startup/download/) for details and Samza’s feature preview for new features.
8 changes: 8 additions & 0 deletions docs/archive/index.html
@@ -27,6 +27,14 @@ <h4 id="latest">Latest Release</h4>
<li><a href="../startup/hello-samza/latest">Hello Samza</a></li>
</ul>

<h4 id="1.4">1.4 Release</h4>

<ul class="documentation-list">
<li><a href="../learn/documentation/1.4.0">Documentation</a></li>
<li><a href="../learn/tutorials/1.4.0">Tutorials</a></li>
<li><a href="../startup/hello-samza/1.4.0">Hello Samza</a></li>
</ul>

<h4 id="1.3">1.3 Release</h4>

<ul class="documentation-list">
18 changes: 10 additions & 8 deletions docs/startup/download/index.md
@@ -31,6 +31,7 @@ Starting from 2016, Samza will begin requiring JDK8 or higher. Please see [this

The Samza tools package contains command-line tools that users can run to work with Samza and its input/output systems.

* [samza-tools_2.11-1.4.0.tgz](http://www-us.apache.org/dist/samza/1.4.0/samza-tools_2.11-1.4.0.tgz)
* [samza-tools_2.11-1.3.1.tgz](http://www-us.apache.org/dist/samza/1.3.1/samza-tools_2.11-1.3.1.tgz)
* [samza-tools_2.11-1.3.0.tgz](http://www-us.apache.org/dist/samza/1.3.0/samza-tools_2.11-1.3.0.tgz)
* [samza-tools_2.11-1.2.0.tgz](http://www-us.apache.org/dist/samza/1.2.0/samza-tools_2.11-1.2.0.tgz)
@@ -40,6 +41,7 @@ Starting from 2016, Samza will begin requiring JDK8 or higher. Please see [this

### Source Releases

* [samza-sources-1.4.0.tgz](http://www.apache.org/dyn/closer.lua/samza/1.4.0)
* [samza-sources-1.3.1.tgz](http://www.apache.org/dyn/closer.lua/samza/1.3.1)
* [samza-sources-1.3.0.tgz](http://www.apache.org/dyn/closer.lua/samza/1.3.0)
* [samza-sources-1.2.0.tgz](http://www.apache.org/dyn/closer.lua/samza/1.2.0)
@@ -71,50 +73,50 @@ A Maven-based Samza project can pull in all required dependencies Samza dependencies this XML block
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-api</artifactId>
<version>1.3.1</version>
<version>1.4.0</version>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-core_2.11</artifactId>
<version>1.3.1</version>
<version>1.4.0</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-shell</artifactId>
<classifier>dist</classifier>
<type>tgz</type>
<version>1.3.1</version>
<version>1.4.0</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-yarn_2.11</artifactId>
<version>1.3.1</version>
<version>1.4.0</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-kv_2.11</artifactId>
<version>1.3.1</version>
<version>1.4.0</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-kv-rocksdb_2.11</artifactId>
<version>1.3.1</version>
<version>1.4.0</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-kv-inmemory_2.11</artifactId>
<version>1.3.1</version>
<version>1.4.0</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-kafka_2.11</artifactId>
<version>1.3.1</version>
<version>1.4.0</version>
<scope>runtime</scope>
</dependency>
{% endhighlight %}
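
The hunk above bumps the same version string in eight places. A project consuming these artifacts could instead keep the version in a single Maven property. This is a minimal sketch only: the property name `samza.version` is a hypothetical choice for illustration and is not defined by the Samza documentation.

```xml
<!-- Hypothetical property name (an assumption); declared once in the project's pom.xml. -->
<properties>
  <samza.version>1.4.0</samza.version>
</properties>

<!-- Each Samza dependency then refers to the property instead of a literal version. -->
<dependency>
  <groupId>org.apache.samza</groupId>
  <artifactId>samza-api</artifactId>
  <version>${samza.version}</version>
</dependency>
```

With this layout, a later upgrade would touch only the one property value rather than every `<version>` element.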