Merge state backend async commit by dxichen · Pull Request #1513 · apache/samza

dxichen · 2021-08-09T17:57:23Z

Update feature branch state-backend-async-commit to include latest changes from master

…ts other than byte[] to SystemProducer (apache#1488) * SAMZA-2646: Minor refactor of StreamAppender to support sending formats other than byte[] to SystemProducer * correct method name * fix build failure

… java (apache#1496) Issues: For Samza on Kubernetes, we might need to update/extend this class, so converting to Java in order to reduce the Scala touchpoints. Changes: 1. Converted ShellCommandBuilder to Java 2. Updated unit tests for ShellCommandBuilder to add some more code coverage API changes and usage/upgrade instructions: N/A

Issues: Some code was added for SEP-24 some time ago (apache#1172, apache#1173), but we are not moving forward with SEP-24 because it does not cleanly handle certain use cases. Since we don't need this code, it should get removed. Changes: 1. Removed unused flows related to job coordinator dependency isolation. 2. Removed unused classloader separation utils. API changes and usage/upgrade instructions: Removed some configs and environment variables related to split deployment, but the feature wasn't complete, so those shouldn't be used by Samza jobs anyways.

…ter to gauge (apache#1497) Description: some of the AM HA metrics introduced in PR apache#1455 are counters. these get incremented correctly but go back to 0 and can not be measured a little later in time. Changes: making these counters as gauges.

…s not used (apache#1494) Issues: There is currently only one concrete FaultDomainManager implementation, and it is for YARN usage. It is also specified as the default in code. However, for non-YARN environments (such as Kubernetes), we shouldn't include YARN dependencies, so we need another concrete implementation. For now, we can just add a very simple implementation as a placeholder. Changes: Added SingleFaultDomainManager which is a simple impl of FaultDomainManager which only has a single fault domain. This isn't wired in anywhere, but a job can configure it as the FaultDomainManager if desirable. Any features which depend on fault domain awareness won't work (e.g. using standby containers with cluster-manager.fault-domain-aware.standby.enabled = true won't work), but it can still be a useful placeholder when standby containers are not needed. API changes and usage/upgrade instructions: Set the config cluster-manager.fault-domain-manager.factory to org.apache.samza.clustermanager. SingleFaultDomainManagerFactory to use this new fault domain manager.

Issues: Currently, the port for the job coordinator url (for accessing job model) is always dynamically allocated. In some cases, it is helpful to be able to hardcode a port. Changes: When JobModelManager is setting up the HttpServer which serves the coordinator url, read a config to set the port. API changes: Set the config cluster-manager.jobcoordinator.url.port to the port number to use for the coordinator url. This is backwards compatible because the default is to use a value of 0 (to dynamically allocate an unused port), and a value of 0 was the value used before this change was made. If the specified port is unavailable, then an exception will be thrown at startup. Therefore, this configuration should only be used when a specific port is needed and it is known that the port is not already in use. In many cases, using dynamic port allocations (i.e. not setting cluster-manager.jobcoordinator.url.port) will be the best way to go, since it will prevent failures due to port conflicts. See apache#1499 for more details about use cases.

…e#1495) Issues: In environments without an external metrics system, it can sometimes be hard to access any metrics from a Samza job. It can be useful to have a MetricsReporter which logs metrics so that they can be accessed in a simple way with minimal dependencies. This can be used to help verify certain flows are working. An example case for using this is when testing Samza on a bare-bones Kubernetes cluster in minikube. Changes: Added LoggingMetricsReporter which periodically logs metrics to a log file. API changes and usage/upgrade instructions: Add a new metrics reporter in the config which uses org.apache.samza.metrics.reporter.LoggingMetricsReporterFactory. See apache#1495 for an example of configuration.

…pache#1498) Issue: Currently, the pathing.jar file used to specify the classpath for run-class.sh is written directly into the top-level working directory of the app. When using container images, there may be some predefined permissions set on the working directory. This may cause permission issues with being able to write the pathing.jar file. Changes: Create a separate directory for writing classpath-related files (manifest.txt, pathing.jar) and write the pathing.jar to that separate directory. Update the classpath arguments to use the new location of the pathing.jar. API changes, usage/upgrade instructions: N/A See apache#1498 for more context about a use case for this change.

…pache#1500) Expose RocksDB maxOpenFiles setting as Samza parameter

…-changelog SAMZA-398: Remove force NONE compression for changelog topic producer

…ache#1502)

…mProducer (apache#1507) * SAMZA-2660: Add authentication via TokenCredential for AzureBlobSystemProducer * remove unnecessary changes in AzureBlobConfig * fix failures * add java doc and to configurations.md

…marks for high-level (apache#1506) Symptom: End-of-stream and watermarks are not properly propagated through Samza when side inputs are used. This prevents many tests from using the TestRunner framework, since the TestRunner framework relies on having tasks shut themselves down based on end-of-stream messages. Being able to use TestRunner is helpful because it significantly decreases test times. Cause: OperatorImplGraph builds EndOfStreamStates and WatermarkStates objects with all of the input SSPs from the job model. That includes side-input SSPs. However, high-level operator tasks aren't given messages from side-input SSPs, so high-level operators should not need to include handling for end-of-stream and watermarks. The result of this issue is that end-of-stream and watermark handling tries to include side-inputs but never updates those states, which can result in not exiting properly (end-of-stream) and not correctly calculating watermarks. Changes: 1. Pass set of SSPs excluding side-inputs to high-level operators so that they don't read directly from the task model which does have side-inputs. High-level operators will then handle end-of-stream and watermark propagation without considering side-input SSPs. 2. Change InMemoryManager to only use IncomingMessageEnvelope.END_OF_STREAM_OFFSET when the taskName in the EndOfStreamMessage is null. This prevents the issue with SAMZA-2300 which causes end-of-stream messages to not get properly get aggregated and then broadcast to all partitions (see SAMZA-2300 for more details). Some existing tests would fail without this change. 3. Add unique app.id in TestRunner for each test. This helps prevents clashes between different tests. For example, ControlMessageSender has a static cache keyed by stream id of intermediate streams, and multiple tests could end up using the same key in that cache. By using a unique app id, the intermediate streams are unique, so multiple tests won't use the same key in the cache. API changes (impacts testing framework only): 1. The default app.id used for tests executed by TestRunner is set to the "in-memory scope", which is a string that is randomly generated for each test. Before this change, the app.id was not set. 2. InMemoryManager only uses IncomingMessageEnvelope.END_OF_STREAM_OFFSET when the EndOfStreamMessage has a null taskName. Before this change, InMemoryManager used IncomingMessageEnvelope.END_OF_STREAM_OFFSET for all EndOfStreamMessages. Upgrade/usage instructions: 1. If tests are written using TestRunner, and those tests rely on app.id being unset, then those will need to be updated to use/read the new app.id. It isn't expected to be a common use case that tests rely on app.id. 2. If the in-memory system is being used (which includes tests written using TestRunner), and it is expected that the in-memory system sets END_OF_STREAM_OFFSET for messages when the taskName is non-null, then that usage will need to be removed. The taskName is intended for use by intermediate streams, so it shouldn't be used outside of Samza internals anyways.

… InterruptedException is thrown (apache#1511) * Samza-2665: AzureBlob SystemProducer: stop block upload retrying when InterruptedException is thrown * address comment: use String.format

perkss and others added 17 commits April 17, 2021 12:11

SAMZA-398: Remove force NONE compression for changelog topic producer

9d9ebc7

SAMZA-2646: Minor refactor of StreamAppender to support sending forma…

8ecd445

…ts other than byte[] to SystemProducer (apache#1488) * SAMZA-2646: Minor refactor of StreamAppender to support sending formats other than byte[] to SystemProducer * correct method name * fix build failure

[SAMZA-2655] Expose RocksDB maxOpenFiles setting as Samza parameter (a…

23627cc

…pache#1500) Expose RocksDB maxOpenFiles setting as Samza parameter

Merge pull request apache#1492 from perkss/SAMZA-398-allow-compressed…

d5f5494

…-changelog SAMZA-398: Remove force NONE compression for changelog topic producer

KafkaChangelogKeySerde should support deserializing multiple keys (ap…

08931ce

…ache#1502)

SAMZA-1885: Enable parallel task execution for build (apache#1504)

1f0f8bd

SAMZA-2660: Add authentication via TokenCredential for AzureBlobSyste…

337628e

…mProducer (apache#1507) * SAMZA-2660: Add authentication via TokenCredential for AzureBlobSystemProducer * remove unnecessary changes in AzureBlobConfig * fix failures * add java doc and to configurations.md

Samza-2665: AzureBlob SystemProducer: stop block upload retrying when…

8fa07fe

… InterruptedException is thrown (apache#1511) * Samza-2665: AzureBlob SystemProducer: stop block upload retrying when InterruptedException is thrown * address comment: use String.format

Merge branch 'master' into state-backend-async-commit

e975b86

prateekm merged commit c95d828 into apache:state-backend-async-commit Aug 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge state backend async commit#1513

Merge state backend async commit#1513
prateekm merged 17 commits intoapache:state-backend-async-commitfrom
dxichen:merge-state-backend-async-commit

dxichen commented Aug 9, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Comments

Conversation

dxichen commented Aug 9, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Comments