Merge state backend async commit#1513
Merged
prateekm merged 17 commits intoapache:state-backend-async-commitfrom Aug 9, 2021
Merged
Merge state backend async commit#1513prateekm merged 17 commits intoapache:state-backend-async-commitfrom
prateekm merged 17 commits intoapache:state-backend-async-commitfrom
Conversation
…ts other than byte[] to SystemProducer (apache#1488) * SAMZA-2646: Minor refactor of StreamAppender to support sending formats other than byte[] to SystemProducer * correct method name * fix build failure
… java (apache#1496) Issues: For Samza on Kubernetes, we might need to update/extend this class, so converting to Java in order to reduce the Scala touchpoints. Changes: 1. Converted ShellCommandBuilder to Java 2. Updated unit tests for ShellCommandBuilder to add some more code coverage API changes and usage/upgrade instructions: N/A
Issues: Some code was added for SEP-24 some time ago (apache#1172, apache#1173), but we are not moving forward with SEP-24 because it does not cleanly handle certain use cases. Since we don't need this code, it should get removed. Changes: 1. Removed unused flows related to job coordinator dependency isolation. 2. Removed unused classloader separation utils. API changes and usage/upgrade instructions: Removed some configs and environment variables related to split deployment, but the feature wasn't complete, so those shouldn't be used by Samza jobs anyways.
…ter to gauge (apache#1497) Description: some of the AM HA metrics introduced in PR apache#1455 are counters. these get incremented correctly but go back to 0 and can not be measured a little later in time. Changes: making these counters as gauges.
…s not used (apache#1494) Issues: There is currently only one concrete FaultDomainManager implementation, and it is for YARN usage. It is also specified as the default in code. However, for non-YARN environments (such as Kubernetes), we shouldn't include YARN dependencies, so we need another concrete implementation. For now, we can just add a very simple implementation as a placeholder. Changes: Added SingleFaultDomainManager which is a simple impl of FaultDomainManager which only has a single fault domain. This isn't wired in anywhere, but a job can configure it as the FaultDomainManager if desirable. Any features which depend on fault domain awareness won't work (e.g. using standby containers with cluster-manager.fault-domain-aware.standby.enabled = true won't work), but it can still be a useful placeholder when standby containers are not needed. API changes and usage/upgrade instructions: Set the config cluster-manager.fault-domain-manager.factory to org.apache.samza.clustermanager. SingleFaultDomainManagerFactory to use this new fault domain manager.
Issues: Currently, the port for the job coordinator url (for accessing job model) is always dynamically allocated. In some cases, it is helpful to be able to hardcode a port. Changes: When JobModelManager is setting up the HttpServer which serves the coordinator url, read a config to set the port. API changes: Set the config cluster-manager.jobcoordinator.url.port to the port number to use for the coordinator url. This is backwards compatible because the default is to use a value of 0 (to dynamically allocate an unused port), and a value of 0 was the value used before this change was made. If the specified port is unavailable, then an exception will be thrown at startup. Therefore, this configuration should only be used when a specific port is needed and it is known that the port is not already in use. In many cases, using dynamic port allocations (i.e. not setting cluster-manager.jobcoordinator.url.port) will be the best way to go, since it will prevent failures due to port conflicts. See apache#1499 for more details about use cases.
…e#1495) Issues: In environments without an external metrics system, it can sometimes be hard to access any metrics from a Samza job. It can be useful to have a MetricsReporter which logs metrics so that they can be accessed in a simple way with minimal dependencies. This can be used to help verify certain flows are working. An example case for using this is when testing Samza on a bare-bones Kubernetes cluster in minikube. Changes: Added LoggingMetricsReporter which periodically logs metrics to a log file. API changes and usage/upgrade instructions: Add a new metrics reporter in the config which uses org.apache.samza.metrics.reporter.LoggingMetricsReporterFactory. See apache#1495 for an example of configuration.
…pache#1498) Issue: Currently, the pathing.jar file used to specify the classpath for run-class.sh is written directly into the top-level working directory of the app. When using container images, there may be some predefined permissions set on the working directory. This may cause permission issues with being able to write the pathing.jar file. Changes: Create a separate directory for writing classpath-related files (manifest.txt, pathing.jar) and write the pathing.jar to that separate directory. Update the classpath arguments to use the new location of the pathing.jar. API changes, usage/upgrade instructions: N/A See apache#1498 for more context about a use case for this change.
…pache#1500) Expose RocksDB maxOpenFiles setting as Samza parameter
…-changelog SAMZA-398: Remove force NONE compression for changelog topic producer
…mProducer (apache#1507) * SAMZA-2660: Add authentication via TokenCredential for AzureBlobSystemProducer * remove unnecessary changes in AzureBlobConfig * fix failures * add java doc and to configurations.md
…marks for high-level (apache#1506) Symptom: End-of-stream and watermarks are not properly propagated through Samza when side inputs are used. This prevents many tests from using the TestRunner framework, since the TestRunner framework relies on having tasks shut themselves down based on end-of-stream messages. Being able to use TestRunner is helpful because it significantly decreases test times. Cause: OperatorImplGraph builds EndOfStreamStates and WatermarkStates objects with all of the input SSPs from the job model. That includes side-input SSPs. However, high-level operator tasks aren't given messages from side-input SSPs, so high-level operators should not need to include handling for end-of-stream and watermarks. The result of this issue is that end-of-stream and watermark handling tries to include side-inputs but never updates those states, which can result in not exiting properly (end-of-stream) and not correctly calculating watermarks. Changes: 1. Pass set of SSPs excluding side-inputs to high-level operators so that they don't read directly from the task model which does have side-inputs. High-level operators will then handle end-of-stream and watermark propagation without considering side-input SSPs. 2. Change InMemoryManager to only use IncomingMessageEnvelope.END_OF_STREAM_OFFSET when the taskName in the EndOfStreamMessage is null. This prevents the issue with SAMZA-2300 which causes end-of-stream messages to not get properly get aggregated and then broadcast to all partitions (see SAMZA-2300 for more details). Some existing tests would fail without this change. 3. Add unique app.id in TestRunner for each test. This helps prevents clashes between different tests. For example, ControlMessageSender has a static cache keyed by stream id of intermediate streams, and multiple tests could end up using the same key in that cache. By using a unique app id, the intermediate streams are unique, so multiple tests won't use the same key in the cache. API changes (impacts testing framework only): 1. The default app.id used for tests executed by TestRunner is set to the "in-memory scope", which is a string that is randomly generated for each test. Before this change, the app.id was not set. 2. InMemoryManager only uses IncomingMessageEnvelope.END_OF_STREAM_OFFSET when the EndOfStreamMessage has a null taskName. Before this change, InMemoryManager used IncomingMessageEnvelope.END_OF_STREAM_OFFSET for all EndOfStreamMessages. Upgrade/usage instructions: 1. If tests are written using TestRunner, and those tests rely on app.id being unset, then those will need to be updated to use/read the new app.id. It isn't expected to be a common use case that tests rely on app.id. 2. If the in-memory system is being used (which includes tests written using TestRunner), and it is expected that the in-memory system sets END_OF_STREAM_OFFSET for messages when the taskName is non-null, then that usage will need to be removed. The taskName is intended for use by intermediate streams, so it shouldn't be used outside of Samza internals anyways.
… InterruptedException is thrown (apache#1511) * Samza-2665: AzureBlob SystemProducer: stop block upload retrying when InterruptedException is thrown * address comment: use String.format
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Update feature branch
state-backend-async-committo include latest changes from master