Provide support for graceful startup / shutdown #891
In case it's helpful: My Axon 2.4 application implements graceful shutdown. Obviously Axon 4 is different, but the sequence I used mixes shutdown of Axon and non-Axon components in an attempt to make the shutdown process finish as quickly as possible without losing any data. This sequence assumes a distributed command bus but local event delivery.
To make this work flawlessly, I had to add wrappers for DistributedCommandBus, SimpleCommandBus, and SimpleEventBus, and I had to make a small change to DistributedCommandBus itself.

DistributedCommandBus: Changed it to throw an exception if an asynchronous command can't be sent to the cluster. (This is the behavior described in the original class's Javadoc, but it didn't actually throw.)

DistributedCommandBus wrapper: Normally, delegates to DistributedCommandBus. But during shutdown, if there are no nodes available to handle a command (in other words, if this is the last node in the cluster to be shut down) and the command is asynchronous, the command gets wrapped in an event and sent to the event scheduler for later delivery. If the command has a callback then there's no choice but to dispatch it immediately. We're deferring newly-published events in that case as described above, so this is safe even after the local event bus and saga managers are shut down.

SimpleCommandBus wrapper: Normally, delegates to SimpleCommandBus. But after the load factor has been set to 0 during the shutdown sequence, it instead delegates to the DistributedCommandBus wrapper so commands can be sent to other nodes. This is mostly needed to handle cases where the outgoing load factor update and an incoming command hit the network at the same time; that's not super common but happens reproducibly while the cluster is under heavy load. If we don't do this, and instead just process commands locally during shutdown, it's a correctness issue because the current node will no longer be the correct target for the command, and we thus might end up running commands for the same aggregate concurrently with another node.

SimpleEventBus wrapper: Normally, delegates to SimpleEventBus. During shutdown, implements the "defer event delivery by pushing events to the event scheduler" behavior described above. Event ordering is maintained by bumping the scheduling delay up by 1ms each time an event is deferred.
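The 1ms ordering trick for deferred events can be sketched as follows. This is a minimal illustration, not the Axon 2.4 code: a plain `ScheduledExecutorService` stands in for the event scheduler, and `DeferringEventPublisher`, `startShutdown` and `publish` are hypothetical names.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical SimpleEventBus-style wrapper; the ScheduledExecutorService stands in for Axon's event scheduler.
class DeferringEventPublisher {
    private final ScheduledExecutorService scheduler;
    private final Consumer<Object> delivery;
    // Delay (in ms) for the next deferred event; grows by 1 ms per deferred event.
    private final AtomicLong nextDelayMillis = new AtomicLong();
    private volatile boolean shuttingDown;

    DeferringEventPublisher(ScheduledExecutorService scheduler, Consumer<Object> delivery) {
        this.scheduler = scheduler;
        this.delivery = delivery;
    }

    void startShutdown() {
        shuttingDown = true;
    }

    void publish(Object event) {
        if (!shuttingDown) {
            delivery.accept(event); // normal path: deliver immediately
            return;
        }
        // Each deferred event gets a strictly larger delay than the previous one,
        // so events published from one thread fire in publication order.
        long delay = nextDelayMillis.incrementAndGet();
        scheduler.schedule(() -> delivery.accept(event), delay, TimeUnit.MILLISECONDS);
    }
}
```

Because "now + delay" strictly increases between consecutive calls from the same thread, the scheduler's trigger times preserve publication order. Events published concurrently from different threads have no defined order anyway.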
For reference, Spring Integration has a multi-phase Orderly Shutdown process: https://docs.spring.io/spring-integration/docs/5.1.6.RELEASE/reference/html/#jmx-shutdown
Allow auto-start of components to be configured when using Spring. Sometimes applications need to run upgrades before components are started.
Introduce the lifecycle package, containing the StartHandler and ShutdownHandler annotations. A utility class should be provided to define a set of common phases, as well as an exception dedicated to a failing lifecycle handler method. #891
Adjust the Configuration API to introduce onStart/onShutdown methods which take the LifecycleHandler functional interface. The collections of start and shutdown handlers should become TreeMaps keyed by phase, so that handlers are ordered by phase. A failure during start should result in throwing a LifecycleHandlerInvocationException and initiating the shutdown process. During shutdown, failures should be logged. Lastly, init handlers no longer need to be phased, and start/shutdown handlers added out of order should be given precedence. #891
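The phased start/shutdown behavior described in this commit can be sketched with two TreeMaps. This is a simplified illustration: `LifecycleHandler` and `LifecycleHandlerInvocationException` are local stand-ins (Axon's real `LifecycleHandler` signature may differ), `PhasedLifecycle` is hypothetical, and running shutdown phases in descending order is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.logging.Level;
import java.util.logging.Logger;

// Simplified stand-in for the LifecycleHandler functional interface.
@FunctionalInterface
interface LifecycleHandler {
    void run() throws Exception;
}

class LifecycleHandlerInvocationException extends RuntimeException {
    LifecycleHandlerInvocationException(String message, Throwable cause) {
        super(message, cause);
    }
}

class PhasedLifecycle {
    private static final Logger LOGGER = Logger.getLogger(PhasedLifecycle.class.getName());

    // TreeMap keys are phases, so iteration follows phase order.
    private final TreeMap<Integer, List<LifecycleHandler>> startHandlers = new TreeMap<>();
    private final TreeMap<Integer, List<LifecycleHandler>> shutdownHandlers = new TreeMap<>();

    void onStart(int phase, LifecycleHandler handler) {
        startHandlers.computeIfAbsent(phase, p -> new ArrayList<>()).add(handler);
    }

    void onShutdown(int phase, LifecycleHandler handler) {
        shutdownHandlers.computeIfAbsent(phase, p -> new ArrayList<>()).add(handler);
    }

    void start() {
        for (Map.Entry<Integer, List<LifecycleHandler>> phase : startHandlers.entrySet()) {
            for (LifecycleHandler handler : phase.getValue()) {
                try {
                    handler.run();
                } catch (Exception e) {
                    // A failing start handler aborts startup and triggers the shutdown process.
                    shutdown();
                    throw new LifecycleHandlerInvocationException(
                            "Start handler failed in phase " + phase.getKey(), e);
                }
            }
        }
    }

    void shutdown() {
        // Assumption: shutdown walks the phases in reverse (highest phase first).
        for (Map.Entry<Integer, List<LifecycleHandler>> phase : shutdownHandlers.descendingMap().entrySet()) {
            for (LifecycleHandler handler : phase.getValue()) {
                try {
                    handler.run();
                } catch (Exception e) {
                    // Failures during shutdown are only logged, so later handlers still run.
                    LOGGER.log(Level.WARNING, "Shutdown handler failed in phase " + phase.getKey(), e);
                }
            }
        }
    }
}
```

Using the phase as the TreeMap key makes ordering a property of the data structure rather than of registration order, which is what allows modules to register handlers in any order.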
The LifecycleHandlerInspector will inspect the instances created in the Component for StartHandler/ShutdownHandler annotated methods without parameters. If those are present, they'll be registered to the Configuration's onStart/onShutdown methods. #891
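The inspection step can be sketched with plain reflection. The annotations below are local stand-ins for StartHandler/ShutdownHandler with an assumed `phase` attribute, and `LifecycleHandlerInspectorSketch` and `ExampleConnector` are hypothetical. The two callbacks play the role of the Configuration's onStart/onShutdown.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Local stand-ins for the StartHandler/ShutdownHandler annotations; `phase` is an assumed attribute.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface StartHandler {
    int phase();
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ShutdownHandler {
    int phase();
}

final class LifecycleHandlerInspectorSketch {
    private LifecycleHandlerInspectorSketch() {
    }

    // Registers every public, parameterless annotated method of `component` through the callbacks.
    static void registerLifecycleHandlers(Object component,
                                          BiConsumer<Integer, Runnable> onStart,
                                          BiConsumer<Integer, Runnable> onShutdown) {
        for (Method method : component.getClass().getMethods()) {
            if (method.getParameterCount() != 0) {
                continue; // only methods without parameters qualify
            }
            StartHandler start = method.getAnnotation(StartHandler.class);
            if (start != null) {
                onStart.accept(start.phase(), () -> invoke(method, component));
            }
            ShutdownHandler shutdown = method.getAnnotation(ShutdownHandler.class);
            if (shutdown != null) {
                onShutdown.accept(shutdown.phase(), () -> invoke(method, component));
            }
        }
    }

    private static void invoke(Method method, Object target) {
        try {
            method.invoke(target);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("Lifecycle handler " + method.getName() + " failed", e);
        }
    }
}

// Example component: connect() and disconnect() are registered; configure(String) is skipped.
class ExampleConnector {
    final List<String> events = new ArrayList<>();

    @StartHandler(phase = 0)
    public void connect() {
        events.add("connected");
    }

    @ShutdownHandler(phase = 0)
    public void disconnect() {
        events.add("disconnected");
    }

    @StartHandler(phase = 10)
    public void configure(String setting) {
        events.add("configured " + setting);
    }
}
```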
The Configurer should no longer allow phased registration of a command/query handler, as the phase is now defined on a StartHandler/ShutdownHandler. #891
The ModuleConfiguration currently takes part in the start/shutdown cycle directly, by providing start(), shutdown() and phase() methods. All implementations of ModuleConfiguration should instead add LifecycleHandlers in the initialize method, keeping the Component and LifecycleHandlerInspector in charge of this task #891
As an implementation of the ModuleConfiguration, the AggregateConfigurer should no longer have start/shutdown methods, but should instead register the aggregate's command handlers as lifecycle handlers with the configuration directly. Also, the configureAggregate method can delegate to the registerModule method, since the DefaultConfigurer has no need to keep both a modules and an aggregateConfigurers collection #891
As a ModuleConfiguration implementation, the EventProcessingModule should register start/shutdown handlers in the initialize method. Due to a discrepancy with Axon's Spring config, we add a start handler to create the EventProcessors in the earliest phase. #891
Remove all remaining implementations of ModuleConfiguration#start() and ModuleConfiguration#shutdown() in favor of adding lifecycle handlers to the Configuration upon initialization of the module #891
Align the AxonConfiguration with the new LifecycleHandler API #891
None of the API changes should have any impact on the existing tests. However, Spring's wiring logic combined with the SpringBeanParameterResolverFactory caused issues. Additionally, tests should be introduced to cover the changed lifecycle logic in the DefaultConfigurer, and the init-ordering tests in the DefaultConfigurerTest class should be removed #891
As suggested in issue #713, it would be beneficial to have some additional debug logging during configuration, startup and shutdown. As the lifecycle handler approach is being revised in #891, adding logging along the way is relatively trivial. Hence, debug statements should be added when a component is created and configured, when a module configuration is configured, and when start/shutdown handlers are called, including which phase of the cycle is active. #891 & #713
Introduce a phase dedicated to starting up and shutting down components which deal with instructions for components #891
Change the shutdown phase of the AxonServerConnectionManager to be the last. Introduce a dedicated phase for this, to be explicit about its use case #891
Introduce the ShutdownLatch class to be used to wait until a defined set of operations has completed #891
Drop the shutdown logic from the Axon Server Event Store, as it will stop on its own once all incoming channels for commands and queries have been closed #891
The shutdown process within the AxonServerCommandBus can be streamlined:
- Use the ShutdownLatch instead of the boolean shuttingDown and the list of completable futures. This should be cleaner and more efficient
- Drop the disconnectAsync call entirely, as the disconnect operation will be pretty quick
- Change the wait period on the CommandProcessor to first wait 5 seconds, and after that only 30 seconds more
#891
The shutdown process within the AxonServerQueryBus can be streamlined:
- Use the ShutdownLatch instead of the boolean shuttingDown and the list of completable futures. This should be cleaner and more efficient
- Drop the disconnectAsync call entirely, as the disconnect operation will be pretty quick
- Change the wait period on the QueryProcessor to first wait 5 seconds, and after that only 30 seconds more
#891
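One reading of the "first wait 5 seconds, then only 30 seconds more" rule, sketched with a plain `ExecutorService` standing in for the CommandProcessor/QueryProcessor executor. `ProcessorShutdown` is a hypothetical helper, and falling back to `shutdownNow()` after the second wait is an assumption, not something the commit states.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

final class ProcessorShutdown {
    private static final Logger LOGGER = Logger.getLogger(ProcessorShutdown.class.getName());

    private ProcessorShutdown() {
    }

    // Returns true if the executor terminated on its own, false if it had to be forced.
    static boolean shutdownGracefully(ExecutorService executor, Duration firstWait, Duration extraWait)
            throws InterruptedException {
        executor.shutdown(); // stop accepting new tasks; already-queued tasks still run
        if (executor.awaitTermination(firstWait.toMillis(), TimeUnit.MILLISECONDS)) {
            return true;
        }
        LOGGER.warning("Processor still busy after " + firstWait + ", waiting up to " + extraWait + " more");
        if (executor.awaitTermination(extraWait.toMillis(), TimeUnit.MILLISECONDS)) {
            return true;
        }
        executor.shutdownNow(); // assumption: interrupt whatever is still running as a last resort
        return false;
    }
}
```

A caller would typically use `shutdownGracefully(executor, Duration.ofSeconds(5), Duration.ofSeconds(30))`. The short first wait covers the common case, and the warning makes a slow drain visible before the longer wait kicks in.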
Remove the manually registered shutdown hook of the AxonServerConnectionManager, as this is now dealt with through the LifecycleHandlerInspector #891
- Remove the generic wildcard on ParameterResolverFactory for backwards compatibility
- Rename the shutdown handlers on the DistributedCommandBus and CommandBusConnector, and adjust their phasing
- Update the ShutdownLatch to use an ActivityHandle to end a registered activity. Rename methods and adjust the Javadoc accordingly
- Create a dedicated ShutdownInProgressException to be thrown by the ShutdownLatch when registerActivity is called on a closing latch
- Add distinct command and query outbound/inbound phases
- Ensure the "activity" is registered before there is any possibility of deregistering it. Adjust this behaviour for both the AxonServerCommandBus and AxonServerQueryBus
- Remove all subscriptions in the shutdownDispatching phase instead of the disconnect phase for the AxonServerCommandBus and AxonServerQueryBus
- Fix the await termination time in the AxonServerCommandBus and AxonServerQueryBus
- Remove the SmartLifecycle implementation of the EventHandlerRegistrar entirely
#891
- Be more precise in the Javadoc
- Add an ifShuttingDown(String) method to overload the ifShuttingDown(Supplier<Exception>) method
- Make ActivityHandle#end idempotent
- Fix a bug in the ActivityHandle#end method; the firstInvocation check wasn't used
- Change the usage of ifShuttingDown in the AxonServer command and query bus
#891
Invoking start/shutdown handlers currently waits indefinitely. Introducing a timeout is thus necessary to ensure start/shutdown does not become a never-ending process #891
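A minimal sketch of bounding the wait on a single lifecycle handler, assuming the handler returns a `CompletableFuture` (the actual `LifecycleHandler` signature in this change may differ). `TimedHandlerInvocation` and its `Outcome` enum are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class TimedHandlerInvocation {
    enum Outcome { COMPLETED, FAILED, TIMED_OUT }

    private TimedHandlerInvocation() {
    }

    // Waits at most `timeout` for the handler's future. On a timeout the handler is not
    // cancelled; we only stop waiting so the lifecycle can move on to the next handler.
    static Outcome invoke(Supplier<CompletableFuture<?>> handler, long timeout, TimeUnit unit) {
        CompletableFuture<?> result = handler.get();
        try {
            result.get(timeout, unit);
            return Outcome.COMPLETED;
        } catch (TimeoutException e) {
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            return Outcome.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return Outcome.FAILED;
        }
    }
}
```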
- Introduce an initialize method, so that the latch can be started again
- Use an atomic reference of a CompletableFuture as the latch, to be thread safe
- Make sure that initiateShutdown can complete the latch immediately in case no activities are present
#891
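The ShutdownLatch behavior described across these commits (ActivityHandle, idempotent end, ShutdownInProgressException, initiateShutdown completing immediately when idle, and an AtomicReference-held CompletableFuture) can be sketched roughly as follows. This is an illustration under those stated requirements, not Axon's actual class; method bodies and the counter-based design are assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class ShutdownInProgressException extends IllegalStateException {
    ShutdownInProgressException(String message) {
        super(message);
    }
}

class ShutdownLatch {
    private final AtomicInteger activities = new AtomicInteger();
    // null while running; holds the future handed out by initiateShutdown() once closing.
    private final AtomicReference<CompletableFuture<Void>> latch = new AtomicReference<>();

    // Re-opens the latch so it can be used again after a previous shutdown.
    void initialize() {
        latch.set(null);
    }

    boolean isShuttingDown() {
        return latch.get() != null;
    }

    void ifShuttingDown(String message) {
        if (isShuttingDown()) {
            throw new ShutdownInProgressException(message);
        }
    }

    ActivityHandle registerActivity() {
        // Count first, then check: a concurrent initiateShutdown() can never miss this activity.
        activities.incrementAndGet();
        if (isShuttingDown()) {
            activityEnded();
            throw new ShutdownInProgressException("Cannot register an activity: shutdown in progress");
        }
        return new ActivityHandle();
    }

    CompletableFuture<Void> initiateShutdown() {
        CompletableFuture<Void> candidate = new CompletableFuture<>();
        if (!latch.compareAndSet(null, candidate)) {
            return latch.get(); // already closing: hand out the same future
        }
        if (activities.get() == 0) {
            candidate.complete(null); // nothing running, so complete immediately
        }
        return candidate;
    }

    private void activityEnded() {
        if (activities.decrementAndGet() == 0) {
            CompletableFuture<Void> current = latch.get();
            if (current != null) {
                current.complete(null); // last activity finished during shutdown
            }
        }
    }

    class ActivityHandle {
        private final AtomicBoolean ended = new AtomicBoolean();

        // Idempotent: only the first call decrements the activity count.
        void end() {
            if (ended.compareAndSet(false, true)) {
                activityEnded();
            }
        }
    }
}
```

Incrementing before checking the shutdown flag is what makes "register the activity before any possibility of deregistering" safe. An activity that loses the race is counted, observed, and immediately ended, so it completes the latch instead of being silently lost.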
Add StartHandler annotated methods to initialize the ShutdownLatch #891
Ensure the stream observers are completed upon a disconnect and unsubscribeAll #891
Keep using the shutdownNow method for the existing cases. On a disconnect, perform a shutdown followed by awaitTermination instead, ensuring the stream(s) are not closed abruptly when the ShutdownHandler is called #891
Use a thread-safe list to hold the lifecycle handlers being started, as handlers can be registered for the phase that is active at that point in time #891
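Why the list must tolerate registration during a phase can be sketched as follows. This is a minimal illustration, assuming handlers added mid-phase should still run in that phase. `PhaseRunner` is hypothetical, and `CopyOnWriteArrayList` is one possible thread-safe list, not necessarily the one Axon uses.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class PhaseRunner {
    // Thread-safe: other threads may register handlers while a phase is running.
    private final List<Runnable> handlers = new CopyOnWriteArrayList<>();

    void register(Runnable handler) {
        handlers.add(handler);
    }

    void runPhase() {
        // Index-based loop re-reads size() on every iteration, so handlers appended
        // while the phase is running are invoked too. A for-each loop over a
        // CopyOnWriteArrayList would iterate a snapshot and miss them.
        for (int i = 0; i < handlers.size(); i++) {
            handlers.get(i).run();
        }
    }
}
```

With a plain `ArrayList` and a for-each loop, the same late registration would throw a `ConcurrentModificationException`.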
Add an onInit method through which users can add handlers that should be called before the start phases begin #891
Slightly expand the logging in the Component to state the actual instance being instantiated #891
With Spring wiring, components are not always guaranteed to be instantiated through the Component class, so there is no certainty that the LifecycleHandlerInspector is invoked. Hence, for some of the components in a Spring environment, we should enforce an init handler in the earliest possible phase, which simply pulls the object from the Configuration #891
[#891] Graceful Startup and Shutdown API
Currently, the shutdown sequence in the configuration API shuts down all components practically simultaneously. This may cause problems when requests are still being handled while components (like async command buses) are being shut down.
Instead, ‘edge’ components should block incoming calls and wait for running invocations to finish.
Edge components are
When those components have completed the running requests, the prepare-shutdown sequence of other components may start. In this phase, async components need to empty their processing queues. They should not block new tasks, as other async processes may still provide tasks to complete.
Finally, components are shut down, rejecting any new requests and closing resources.
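The three phases above can be sketched as a small contract plus a sequencer. This is a hypothetical model of the proposal, not an Axon API. `GracefulComponent`, `ShutdownSequence` and `QueueingProcessor` are illustrative names, and the processor drains its queue synchronously for simplicity.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical three-phase contract modelled on the description above.
interface GracefulComponent {
    // Phase 1 (edge components only): block incoming calls and wait for running invocations.
    default void blockIncomingAndAwait() {
    }

    // Phase 2: drain processing queues while still accepting new tasks.
    default void prepareShutdown() {
    }

    // Phase 3: reject any new requests and close resources.
    void shutdown();
}

final class ShutdownSequence {
    private ShutdownSequence() {
    }

    // Each phase completes for all components before the next phase starts.
    static void run(List<? extends GracefulComponent> components) {
        components.forEach(GracefulComponent::blockIncomingAndAwait);
        components.forEach(GracefulComponent::prepareShutdown);
        components.forEach(GracefulComponent::shutdown);
    }
}

// Example async component: drained in phase 2, and only refuses work in phase 3.
final class QueueingProcessor implements GracefulComponent {
    private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
    private volatile boolean closed;

    void submit(Runnable task) {
        if (closed) {
            throw new IllegalStateException("Processor has been shut down");
        }
        queue.add(task);
    }

    @Override
    public void prepareShutdown() {
        // Tasks executed here may submit follow-up tasks; the loop picks those up too.
        Runnable task;
        while ((task = queue.poll()) != null) {
            task.run();
        }
    }

    @Override
    public void shutdown() {
        closed = true;
    }
}
```

Separating "drain" from "reject" is the key design point. If a queue-backed component rejected tasks during phase 2, a follow-up task produced by another component's drain would be lost.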