Provide support for graceful startup / shutdown #891

Closed
abuijze opened this issue Nov 9, 2018 · 3 comments · Fixed by #1345
Labels: Priority 1: Must · Status: Resolved · Type: Feature

Comments

@abuijze
Member

abuijze commented Nov 9, 2018

Currently, the shutdown sequence in the Configuration API shuts down all components practically simultaneously. This may cause problems when requests are still being handled while components (like async command buses) are being shut down.
Instead, ‘edge’ components should block incoming calls and wait for running invocations to finish.
Edge components are:

  • incoming connectors to the distributed command bus
  • command bus (an extra method could be made available on the configuration to provide an "edge command bus". This wraps the regular command bus and allows blocking incoming calls on shutdown)
  • tracking processors
  • connectors to 3rd party messaging systems (e.g. RabbitMQ)

When those components have completed the running requests, the prepare-shutdown sequence of other components may start. In this phase, async components need to empty their processing queues. They should not block new tasks, as other async processes may still provide tasks to complete.

Finally, components are shut down, rejecting any new requests and closing resources.
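
The three stages described above could be sketched roughly as follows. This is only an illustration of the proposed ordering, not Axon code: `GracefulComponent`, `RecordingComponent` and `PhasedShutdown` are hypothetical names, and a real implementation would actually wait for in-flight work instead of just recording the call.

```java
import java.util.List;

// Hypothetical contract for this sketch; not an Axon interface.
interface GracefulComponent {
    String name();
    boolean isEdge();
    void blockIncomingAndAwait(); // edge only: stop accepting calls, wait for running invocations
    void prepareShutdown();       // non-edge: drain queues while still accepting new tasks
    void shutdown();              // everyone: reject new requests and close resources
}

// Minimal component that only records which steps were invoked on it.
final class RecordingComponent implements GracefulComponent {
    private final String name;
    private final boolean edge;
    private final List<String> log;

    RecordingComponent(String name, boolean edge, List<String> log) {
        this.name = name;
        this.edge = edge;
        this.log = log;
    }

    public String name() { return name; }
    public boolean isEdge() { return edge; }
    public void blockIncomingAndAwait() { log.add("block:" + name); }
    public void prepareShutdown() { log.add("prepare:" + name); }
    public void shutdown() { log.add("shutdown:" + name); }
}

final class PhasedShutdown {
    static void run(List<GracefulComponent> components) {
        // Stage 1: edge components block incoming calls and finish running invocations.
        for (GracefulComponent c : components) {
            if (c.isEdge()) {
                c.blockIncomingAndAwait();
            }
        }
        // Stage 2: the remaining components empty their processing queues.
        for (GracefulComponent c : components) {
            if (!c.isEdge()) {
                c.prepareShutdown();
            }
        }
        // Stage 3: all components reject new requests and close resources.
        for (GracefulComponent c : components) {
            c.shutdown();
        }
    }
}
```

The point of the ordering is that nothing is closed until every edge component has stopped feeding new work into the system.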

@abuijze abuijze added this to the Release 4.1 milestone Nov 16, 2018
@abuijze abuijze added Type: Enhancement and Priority 2: Should labels Nov 16, 2018
@sgrimm-sg

In case it's helpful: My Axon 2.4 application implements graceful shutdown. Obviously Axon 4 is different, but the sequence I used mixes shutdown of Axon and non-Axon components in an attempt to make the shutdown process finish as quickly as possible without losing any data.

This sequence assumes a distributed command bus but local event delivery.

  1. Stop the event scheduler from polling for / delivering events. (But new events can still be scheduled.)
  2. Set the distributed command bus load factor to 0, but stay connected to the cluster. Block until the load factor change is reflected in the consistent hash.
  3. Shut down the application's HTTP service, waiting until any in-flight HTTP requests are finished. This happens after step 2 because we want any commands issued by in-flight requests to be sent to other nodes in the cluster, if any; if we did it before step 2 then in-flight requests could generate more work to do locally and the shutdown would take longer to finish.
  4. Wait for application-managed worker threads to finish.
  5. Set a flag in the event bus (we wrap the event bus in a custom class for this purpose) that causes newly-generated events to be deferred using the event scheduler rather than processed immediately. Obviously this won't be needed for tracking event processors, but some kind of deferral mechanism still will be for subscribing processors.
  6. Wait for any events that are already in flight or already queued to finish processing.
  7. Shut down all the saga managers and the event bus.
  8. Wait for the command bus to finish handling all in-flight or queued commands.
  9. Shut down the command bus.
  10. Shut down the write side of the event scheduler.

To make this work flawlessly, I had to add wrappers for DistributedCommandBus, SimpleCommandBus, and SimpleEventBus, and I had to make a small change to DistributedCommandBus itself.

DistributedCommandBus: Changed it to throw an exception if an asynchronous command can't be sent to the cluster. (This is the behavior described in the original class's Javadoc, but it didn't actually throw.)

DistributedCommandBus wrapper: Normally, delegates to DistributedCommandBus. But during shutdown, if there are no nodes available to handle a command (in other words, if this is the last node in the cluster to be shut down) and the command is asynchronous, the command gets wrapped in an event and sent to the event scheduler for later delivery. If the command has a callback then there's no choice but to dispatch it immediately. We're deferring newly-published events in that case as described above, so this is safe even after the local event bus and saga managers are shut down.

SimpleCommandBus wrapper: Normally, delegates to SimpleCommandBus. But after the load factor has been set to 0 during the shutdown sequence, it instead delegates to the DistributedCommandBus wrapper so commands can be sent to other nodes. This is mostly needed to handle cases where the outgoing load factor update and an incoming command hit the network at the same time; that's not super common but happens reproducibly while the cluster is under heavy load. If we don't do this, and instead just process commands locally during shutdown, it's a correctness issue because the current node will no longer be the correct target for the command, and we thus might end up running commands for the same aggregate concurrently with another node.

SimpleEventBus wrapper: Normally, delegates to SimpleEventBus. During shutdown, implements the "defer event delivery by pushing events to the event scheduler" behavior described above. Event ordering is maintained by bumping the scheduling delay up by 1ms each time an event is deferred.
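
As a rough illustration of the event bus wrapper idea, the sketch below defers events to a scheduler once a shutdown flag is set and bumps the delay by 1 ms per deferred event to keep ordering. It is not the code from the Axon 2.4 application described above: `DeferringEventPublisher` is a hypothetical name, the `Consumer` and `BiConsumer` stand in for the wrapped event bus and the event scheduler, and the 1 ms starting delay is an assumption.

```java
import java.time.Duration;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical wrapper; the two functional interfaces stand in for the
// real event bus and event scheduler.
final class DeferringEventPublisher {
    private final Consumer<Object> immediateDelivery;
    private final BiConsumer<Object, Duration> scheduler;
    private boolean deferring;
    private long nextDelayMillis = 1; // assumed starting delay

    DeferringEventPublisher(Consumer<Object> immediateDelivery,
                            BiConsumer<Object, Duration> scheduler) {
        this.immediateDelivery = immediateDelivery;
        this.scheduler = scheduler;
    }

    // Called once during the shutdown sequence.
    synchronized void startDeferring() {
        deferring = true;
    }

    synchronized void publish(Object event) {
        if (deferring) {
            // Each deferred event gets a delay 1 ms longer than the previous one,
            // so the scheduler delivers them in publication order later on.
            scheduler.accept(event, Duration.ofMillis(nextDelayMillis++));
        } else {
            immediateDelivery.accept(event);
        }
    }
}
```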

@smcvb smcvb modified the milestones: Release 4.1, Release 4.1.1 Feb 21, 2019
@smcvb smcvb modified the milestones: Release 4.1.1, Release 4.2 Mar 26, 2019
@smcvb smcvb removed this from the Release 4.2 milestone Jun 25, 2019
@srnm

srnm commented Jun 26, 2019

For reference, Spring Integration has a multi-phase Orderly Shutdown process: https://docs.spring.io/spring-integration/docs/5.1.6.RELEASE/reference/html/#jmx-shutdown

@smcvb smcvb added this to the Release 4.3 milestone Jul 22, 2019
@m1l4n54v1c m1l4n54v1c added Priority 1: Must and removed Priority 2: Should labels Sep 15, 2019
@m1l4n54v1c m1l4n54v1c changed the title from "Provide support for graceful shutdown" to "Provide support for graceful startup / shutdown" Sep 15, 2019
@m1l4n54v1c
Member

Allow auto-start of components to be configured in case of Spring. Sometimes, applications need to execute upgrades before components should be started.

@smcvb smcvb self-assigned this Jan 14, 2020
@smcvb smcvb added the Status: In Progress label Jan 14, 2020
smcvb added a commit that referenced this issue Feb 6, 2020
Introduce the lifecycle package, containing a StartHandler and
ShutdownHandler annotation. A utility class should be provided to
contain some phases, as well as an exception dedicated to a failing
lifecycle handler method.

#891
smcvb added a commit that referenced this issue Feb 6, 2020
Adjust the Configuration API to introduce an onStart/onShutdown which
takes in the LifecycleHandler functional interface. The collections of
start and shutdown handlers should become TreeMaps taking in the phase
as the ordering parameter. A failure during start should result in
throwing a LifecycleHandlerInvocationException and initiations of the
shutdown process. During shutdown the failure should be logged.
Lastly, init handlers no longer need to be phased, and start/shutdown
handlers added out of order should be given precedence.

#891
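
The phase-ordered registration described in this commit message might look roughly like the sketch below. It is a simplified illustration, not the framework's Configuration API: `PhasedLifecycle` is a hypothetical class, handlers are plain `Runnable`s, an `IllegalStateException` stands in for the dedicated exception, and running shutdown handlers in descending phase order is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of phase-ordered start and shutdown handlers.
final class PhasedLifecycle {
    // Start handlers run from the lowest phase to the highest.
    private final Map<Integer, List<Runnable>> startHandlers = new TreeMap<>();
    // Assumption: shutdown handlers run in the opposite order.
    private final Map<Integer, List<Runnable>> shutdownHandlers =
            new TreeMap<>(Comparator.reverseOrder());
    // Stands in for a logger so the sketch has no dependencies.
    final List<String> loggedFailures = new ArrayList<>();

    void onStart(int phase, Runnable handler) {
        startHandlers.computeIfAbsent(phase, p -> new ArrayList<>()).add(handler);
    }

    void onShutdown(int phase, Runnable handler) {
        shutdownHandlers.computeIfAbsent(phase, p -> new ArrayList<>()).add(handler);
    }

    void start() {
        for (List<Runnable> handlers : startHandlers.values()) {
            for (Runnable handler : handlers) {
                try {
                    handler.run();
                } catch (RuntimeException e) {
                    // A failing start handler initiates shutdown, then the failure is rethrown.
                    shutdown();
                    throw new IllegalStateException("Start handler failed", e);
                }
            }
        }
    }

    void shutdown() {
        for (List<Runnable> handlers : shutdownHandlers.values()) {
            for (Runnable handler : handlers) {
                try {
                    handler.run();
                } catch (RuntimeException e) {
                    // Failures during shutdown are recorded and the sequence continues.
                    loggedFailures.add(String.valueOf(e.getMessage()));
                }
            }
        }
    }
}
```
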
smcvb added a commit that referenced this issue Feb 6, 2020
The LifecycleHandlerInspector will inspect the instances created in the
Component for StartHandler/ShutdownHandler annotated methods without
parameters. If those are present, they'll be registered to the
Configuration's onStart/onShutdown methods.

#891
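
A minimal sketch of how such an inspector could find annotated, parameterless methods through reflection is shown below. All names are hypothetical stand-ins chosen for this example (`OnStart`, `AnnotatedHandlerScanner`, `SampleConnector`), and the `BiConsumer` represents whatever registration method the configuration exposes; the real annotations and inspector may differ.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.function.BiConsumer;

// Hypothetical annotation for this sketch; must be retained at runtime to be visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface OnStart {
    int phase();
}

final class AnnotatedHandlerScanner {
    // Registers every parameterless @OnStart method of the component under its declared phase.
    static void registerStartHandlers(Object component, BiConsumer<Integer, Runnable> registry) {
        for (Method method : component.getClass().getDeclaredMethods()) {
            OnStart annotation = method.getAnnotation(OnStart.class);
            if (annotation == null || method.getParameterCount() != 0) {
                continue; // only annotated methods without parameters qualify
            }
            method.setAccessible(true);
            registry.accept(annotation.phase(), () -> {
                try {
                    method.invoke(component);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Lifecycle handler failed", e);
                }
            });
        }
    }
}

// Example component with one annotated lifecycle method.
final class SampleConnector {
    boolean started;

    @OnStart(phase = 5)
    void connect() {
        started = true;
    }
}
```
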
smcvb added a commit that referenced this issue Feb 6, 2020
The Configurer should no longer allow phased registration of a
command/query handler, as the phase is now defined on a
StartHandler/ShutdownHandler.

#891
smcvb added a commit that referenced this issue Feb 6, 2020
The ModuleConfiguration currently partakes in the start/shutdown cycle
entirely, by providing a start(), shutdown() and phase() method. All
impls of the ModuleConfiguration should however add LifecycleHandlers in
 the initialize method instead, to keep the Component +
 LifecycleHandlerInspector in charge of this task

#891
smcvb added a commit that referenced this issue Feb 6, 2020
As an implementation of the ModuleConfiguration, the AggregateConfigurer
 should no longer have a start/shutdown method, but instead register the
 aggregates command handlers as lifecycle handlers to the configuration
 directly. Also, the configureAggregate method can be delegated through
 to the registerModule method, since the DefaultConfigurer has no need
 to keep both a modules and aggregateConfigurers collection

#891
smcvb added a commit that referenced this issue Feb 6, 2020
As a ModuleConfiguration implementation, the EventProcessingModule
should register start/shutdown handlers in the initialize method. Due to
 a discrepancy with Axon's Spring config, we add a start handler to
 create the EventProcessors in the earliest phase.

#891
smcvb added a commit that referenced this issue Feb 6, 2020
Remove all remaining implementations of ModuleConfiguration#start() and
ModuleConfiguration#shutdown() in favor of adding lifecycle handlers to
the Configuration upon initialization of the module

#891
smcvb added a commit that referenced this issue Feb 6, 2020
Align the AxonConfiguration with the new LifecycleHandler API

#891
smcvb added a commit that referenced this issue Feb 6, 2020
All the API changes should not have any impact on the existing tests.
However, Spring's wiring logic combined with the
SpringBeanParameterResolverFactory caused issues. Additionally,
tests should be introduced to cover the changed lifecycle logic in the
DefaultConfigurer, as well as the removal of init-ordering tests in the
DefaultConfigurerTest class

#891
smcvb added a commit that referenced this issue Feb 13, 2020
As is suggested in issue #713, it would be beneficial to have some
additional debug logging during configuration, start up and shutdown. As
 the lifecycle handler approach is being revised in #891, adding logging
 along the way is relatively trivial. Hence, debug statements should be
 added when a component is created and configured, when a module
 configuration is configured, when start/shutdown handlers are being
  called and in which phase of the cycle we are.

#891 & #713
smcvb added a commit that referenced this issue Feb 13, 2020
Slight logging improvements

#891
smcvb added a commit that referenced this issue Feb 14, 2020
Introduce a phase which is dedicated for starting up and shutting down
components which deal with instructions of components

#891
smcvb added a commit that referenced this issue Feb 14, 2020
Change the shutdown phase of the AxonServerConnectionManager to be the
last. Introduce a dedicated phase for this to be overly specific about
its use case

#891
smcvb added a commit that referenced this issue Feb 14, 2020
Introduce the ShutdownLatch class to be used to wait until a defined set
 of operations has completed

#891
smcvb added a commit that referenced this issue Feb 14, 2020
Drop shutdown logic from the Axon Server Event Store, as this will stop
on its own once all incoming channels for commands and queries have
been closed off

#891
smcvb added a commit that referenced this issue Feb 14, 2020
The shutdown process within the AxonServerCommandBus can be streamlined:
-Use the ShutdownLatch instead of the boolean shuttingDown and the list of
completable futures. This should be cleaner and more efficient
-Drop the disconnectAsync call entirely, as the disconnect operation
will be pretty quick
-Change the wait period on the CommandProcessor to first wait 5 seconds,
 and after that only 30 seconds more

#891
smcvb added a commit that referenced this issue Feb 14, 2020
The shutdown process within the AxonServerQueryBus can be streamlined:
-Use the ShutdownLatch instead of the boolean shuttingDown and the list of
completable futures. This should be cleaner and more efficient
-Drop the disconnectAsync call entirely, as the disconnect operation
will be pretty quick
-Change the wait period on the QueryProcessor to first wait 5 seconds,
 and after that only 30 seconds more

#891
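
The "wait a short period first, then a longer one" approach from the two commits above can be sketched with a plain `ExecutorService`. This is an illustration only: `TwoStageAwait` is a hypothetical helper, the wait periods are parameters rather than the 5 and 30 seconds mentioned in the commits, and falling back to `shutdownNow()` when both waits expire is an assumption.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper showing a two-stage wait for an executor to finish its work.
final class TwoStageAwait {
    // Returns true if all submitted tasks completed within the two wait periods.
    static boolean shutdownGracefully(ExecutorService executor,
                                      long firstWaitMillis,
                                      long secondWaitMillis) {
        executor.shutdown(); // stop accepting new tasks, keep running queued ones
        try {
            if (executor.awaitTermination(firstWaitMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
            // A real implementation would likely log here that shutdown is taking longer.
            if (executor.awaitTermination(secondWaitMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
        }
        executor.shutdownNow(); // assumption: give up and interrupt remaining tasks
        return false;
    }
}
```
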
smcvb added a commit that referenced this issue Feb 14, 2020
Remove manual shutdown hook introduction of the
AxonServerConnectionManager as this is now dealt with through the
LifecycleHandlerInspector

#891
smcvb added a commit that referenced this issue Feb 17, 2020
-Remove generic wildcard mention on ParameterResolverFactory for
backwards compatibility
-Rename shutdown handlers on the DistributedCommandBus and
CommandBusConnector, as well as adjusting the phasing.
-Update the ShutdownLatch to use an ActivityHandle to end a registered
activity. Rename methods and adjust the Javadoc accordingly
-Create a dedicated ShutdownInProgressException to be thrown by the
ShutdownLatch in case registerActivity is called on a closing latch
-Add distinct command and query outbound/inbound phases
-Ensure the "activity" is registered prior to any possibility of
deregistering. Adjust this behaviour for both the AxonServerCommandBus
and AxonServerQueryBus
-Remove all subscriptions in the shutdownDispatching phase instead of
the disconnect phase for the AxonServerCommandBus and AxonServerQueryBus
-Fix await termination time in the AxonServerCommandBus and
AxonServerQueryBus
-Remove the SmartLifecycle implementation of the EventHandlerRegistrar
entirely

#891
smcvb added a commit that referenced this issue Feb 18, 2020
-Be more precise in the Javadoc
-Add ifShuttingDown(String) method to overload the ifShuttingDown
(Supplier<Exception>) method
-Make ActivityHandle#end idempotent
-Fix bug in ActivityHandle#end method; firstInvocation check wasn't used
-Change usage of ifShuttingDown in the AxonServer command and query bus

#891
smcvb added a commit that referenced this issue Feb 18, 2020
The invocation of start/shutdown handlers will now wait indefinitely.
Introducing a timeout is thus necessary to ensure start/shutdown does
not become a never-ending process

#891
smcvb added a commit that referenced this issue Feb 18, 2020
-Introduce an initialize method, so that the latch can be started again
-Use an atomic reference of a CompletableFuture as the latch to be
thread safe
-Make sure that initiateShutdown can complete the latch immediately in
case no activities are present

#891
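
Pulling together the behaviour described across the latch-related commits (register an activity, end it through an idempotent handle, reject registrations once shutdown has started, complete immediately when nothing is running, re-initialize afterwards), a simplified latch could look like this. It is a sketch under assumptions, not the actual ShutdownLatch: `ActivityLatch` and its `Handle` are hypothetical names, it uses `synchronized` rather than an atomic reference, and it throws `IllegalStateException` where the commits mention a dedicated exception.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified latch that completes once all registered activities have ended.
final class ActivityLatch {
    private int activeCount;
    private CompletableFuture<Void> closing; // null while the latch is open

    synchronized Handle registerActivity() {
        if (closing != null) {
            throw new IllegalStateException("Shutdown in progress");
        }
        activeCount++;
        return new Handle();
    }

    // Starts closing the latch; the returned future completes when no activities remain.
    synchronized CompletableFuture<Void> initiateShutdown() {
        if (closing == null) {
            closing = new CompletableFuture<>();
            if (activeCount == 0) {
                closing.complete(null); // nothing is running, so complete right away
            }
        }
        return closing;
    }

    // Reopens the latch so it can be used again after a restart.
    synchronized void initialize() {
        closing = null;
    }

    private synchronized void activityEnded() {
        activeCount--;
        if (activeCount == 0 && closing != null) {
            closing.complete(null);
        }
    }

    final class Handle {
        private final AtomicBoolean ended = new AtomicBoolean();

        // Idempotent: only the first call decrements the activity count.
        void end() {
            if (ended.compareAndSet(false, true)) {
                activityEnded();
            }
        }
    }
}
```

A caller would typically register an activity before dispatching a message and call `end()` in the completion callback, so that `initiateShutdown()` only completes once in-flight messages are done.
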
smcvb added a commit that referenced this issue Feb 18, 2020
Add StartHandler annotated methods to initialize the ShutdownLatch

#891
smcvb added a commit that referenced this issue Feb 18, 2020
Ensure the stream observers are completed upon a disconnect and
unsubscribeAll

#891
smcvb added a commit that referenced this issue Feb 18, 2020
Use the shutdownNow method as is for existing cases. On a disconnect, we
 can do a shutdown + await termination process. Thus ensure we do not
 harshly close the stream(s) when the ShutdownHandler is called

#891
smcvb added a commit that referenced this issue Feb 18, 2020
Use a thread-safe list to start lifecycle handlers in, as handlers can be
 registered in the phase being active at that point in time

#891
smcvb added a commit that referenced this issue Feb 18, 2020
Add an onInit method through which users can add handlers which should
be called prior to starting the start-phase

#891
smcvb added a commit that referenced this issue Feb 19, 2020
Slight logging expansion on the Component to state the actual instance
being instantiated

#891
smcvb added a commit that referenced this issue Feb 19, 2020
Through Spring wiring we aren't always ensured the components are
instantiated through the Component class. Thus, no certainty the
LifecycleHandlerInspector is invoked. Hence we should ensure that for
some of the components in a Spring environment we enforce an init-handler
 to be added in the earliest phase possible, which simply pulls the
 object from the Configuration

#891
smcvb added a commit that referenced this issue Feb 19, 2020
Rename onInit to onInitialize

#891
smcvb added a commit that referenced this issue Feb 19, 2020
[#891] Graceful Start up and Shutdown API
@smcvb smcvb added Status: Resolved and removed Status: In Progress labels Feb 19, 2020