Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lifecycle stage refactor to ensure proper start and stop ordering of servers and announcements #7234

Merged
merged 2 commits into from
Mar 12, 2019

Conversation

clintropolis
Copy link
Member

@clintropolis clintropolis commented Mar 11, 2019

This fixes an issue introduced by #7215. HttpServerInventoryView uses the 'discovery' announcement so would correctly identify servers that disappear, but AbstractCuratorServerInventoryView watches another announcement that stops later, causing the 'server disappeared' event to happen after jetty has stopped.

This PR resolves this issue by slightly reworking the Lifecycle.Stage enumeration and renaming them to be more relevant to their intended usage, to quote the javadocs:

Conceptually, the stages have the following purposes:

  • Stage.INIT: Currently, this stage is used exclusively for log4j initialization, since almost everything needs logging and it should be the last thing to shutdown. Any sort of bootstrapping object that provides something that should be initialized before nearly all other Lifecycle objects could also belong here (if it doesn't need logging during start or stop).
  • Stage.NORMAL: This is the default stage. Most objects will probably make the most sense to be registered at this level, with the exception of any form of server or service announcements
  • Stage.SERVER: This lifecycle stage is intended for all 'server' objects, and currently only contains the Jetty module, but any sort of 'server' that expects most Lifecycle objects to be initialized by the time it starts, and still available at the time it stops can logically live in this stage.
  • Stage.ANNOUNCEMENTS: Any object which announces to a cluster this servers location belongs in this stage. By being last, we can be sure that all servers are initialized before we advertise the endpoint locations, and also can be sure that we un-announce these advertisements prior to the Stage.SERVER objects stop.

This has the very nice behavior that all announcements can be unannounced prior to stopping the server, meaning druid.server.http.unannouncePropagationDelay should apply to all announcements to give adequate time for the rest of the cluster to notice all announcements are gone.

Logs of shutdown after this PR for curator based segment management:

historical:
2019-03-11T22:03:59,561 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle - Lifecycle [module] running shutdown hook
2019-03-11T22:03:59,563 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS]
2019-03-11T22:03:59,565 INFO [Thread-60] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannouncing [DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/historical', host='localhost', bindOnHost=false, port=-1, plaintextPort=8083, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='HISTORICAL', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=10000000000, type=historical, priority=0}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}].
2019-03-11T22:03:59,565 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/internal-discovery/HISTORICAL/localhost:8083]
2019-03-11T22:03:59,587 INFO [Thread-60] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannounced [DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/historical', host='localhost', bindOnHost=false, port=-1, plaintextPort=8083, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='HISTORICAL', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=10000000000, type=historical, priority=0}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}].
2019-03-11T22:03:59,587 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.announcement.Announcer.stop()] on object[org.apache.druid.curator.announcement.Announcer@11a3a45f].
2019-03-11T22:03:59,587 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - Stopping announcer
2019-03-11T22:03:59,589 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/localhost:8083]
2019-03-11T22:03:59,592 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/listeners/lookups/__default/http:localhost:8083]

broker (curator segment discovery):
2019-03-11T22:03:59,592 INFO [ServerInventoryView-0] org.apache.druid.client.BatchServerInventoryView - Server Disappeared[DruidServerMetadata{name='localhost:8083', hostAndPort='localhost:8083', hostAndTlsPort='null', maxSize=10000000000, tier='_default_tier', type=historical, priority=0}]
2019-03-11T22:03:59,593 INFO [BrokerServerView-0] org.apache.druid.sql.calcite.schema.DruidSchema - Removed all metadata for dataSource[wikiticker].

historical:
2019-03-11T22:03:59,595 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/segments/localhost:8083/localhost:8083_historical__default_tier_2019-03-11T22:03:25.234Z_faed3053e88042fcac4096395c3a58d70]
2019-03-11T22:03:59,602 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER]
2019-03-11T22:03:59,602 INFO [Thread-60] org.apache.druid.server.initialization.jetty.JettyServerModule - Waiting 15000 ms for unannouncement to propagate.
2019-03-11T22:04:14,603 INFO [Thread-60] org.apache.druid.server.initialization.jetty.JettyServerModule - Stopping Jetty Server...
2019-03-11T22:04:14,614 INFO [Thread-60] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@1efac5b9{HTTP/1.1,[http/1.1]}{0.0.0.0:8083}
2019-03-11T22:04:14,614 INFO [Thread-60] org.eclipse.jetty.server.session - node0 Stopped scavenging
2019-03-11T22:04:14,616 INFO [Thread-60] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@7f0f84d4{/,null,UNAVAILABLE}
2019-03-11T22:04:14,619 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [NORMAL]
2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.server.listener.announcer.ListenerResourceAnnouncer.stop()] on object[org.apache.druid.query.lookup.LookupResourceListenerAnnouncer@115ca7de].
2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.server.listener.announcer.ListenerResourceAnnouncer - Unannouncing start time on [/druid/listeners/lookups/__default/http:localhost:8083]
2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.query.lookup.LookupReferencesManager.stop()] on object[org.apache.druid.query.lookup.LookupReferencesManager@5ad1904f].
2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.query.lookup.LookupReferencesManager - LookupReferencesManager is stopping.
2019-03-11T22:04:14,620 INFO [LookupReferencesManager-MainThread] org.apache.druid.query.lookup.LookupReferencesManager - Lookup Management loop exited, Lookup notices are not handled anymore.
2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.query.lookup.LookupReferencesManager - Closing lookup [simple-wiki]
2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.server.lookup.namespace.cache.CacheScheduler - Closing namespace [UriExtractionNamespace{uri=file:/Users/clint/workspace/data/druid/lookups/wiki-simple.json, uriPrefix=null, namespaceParseSpec=ObjectMapperFlatDataParser{}, fileRegex='null', pollPeriod=PT5M}] : org.apache.druid.server.lookup.namespace.cache.CacheScheduler$EntryImpl@3ec01ef5
2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.query.lookup.LookupReferencesManager - LookupReferencesManager is stopped.
2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.discovery.DruidLeaderClient.stop()] on object[org.apache.druid.discovery.DruidLeaderClient@642a16aa].
2019-03-11T22:04:14,620 INFO [Thread-60] org.apache.druid.discovery.DruidLeaderClient - Stopped.
2019-03-11T22:04:14,621 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[org.apache.druid.curator.discovery.ServerDiscoverySelector@77f991c].
2019-03-11T22:04:14,622 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider.stop()] on object[org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider@6968c1d6].
2019-03-11T22:04:14,623 INFO [Thread-60] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - stopping
2019-03-11T22:04:14,623 INFO [Thread-60] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - stopped
2019-03-11T22:04:14,623 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.http.client.NettyHttpClient.stop()] on object[org.apache.druid.java.util.http.client.NettyHttpClient@27abb6ca].
2019-03-11T22:04:14,641 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.server.coordination.ZkCoordinator.stop()] on object[org.apache.druid.server.coordination.ZkCoordinator@6413d7e7].
2019-03-11T22:04:14,641 INFO [Thread-60] org.apache.druid.server.coordination.ZkCoordinator - Stopping ZkCoordinator for [DruidServerMetadata{name='localhost:8083', hostAndPort='localhost:8083', hostAndTlsPort='null', maxSize=10000000000, tier='_default_tier', type=historical, priority=0}]
2019-03-11T22:04:14,641 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.server.coordination.SegmentLoadDropHandler.stop()] on object[org.apache.druid.server.coordination.SegmentLoadDropHandler@6c479fdf].
2019-03-11T22:04:14,641 INFO [Thread-60] org.apache.druid.server.coordination.SegmentLoadDropHandler - Stopping...
2019-03-11T22:04:14,642 INFO [Thread-60] org.apache.druid.server.coordination.SegmentLoadDropHandler - Stopped.
2019-03-11T22:04:14,642 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.metrics.MonitorScheduler.stop()] on object[org.apache.druid.java.util.metrics.MonitorScheduler@3f5156a6].
2019-03-11T22:04:14,642 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.client.cache.CaffeineCache.close() throws java.io.IOException] on object[org.apache.druid.client.cache.CaffeineCache@2b867cd3].
2019-03-11T22:04:14,644 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.emitter.service.ServiceEmitter.close() throws java.io.IOException] on object[ServiceEmitter{serviceDimensions={service=druid/historical, host=localhost:8083, version=}, emitter=LoggingEmitter{log=Logger{name=[org.apache.druid.java.util.emitter.core.LoggingEmitter], class[class org.apache.logging.slf4j.Log4jLogger]}, level=INFO}}].
2019-03-11T22:04:14,644 INFO [Thread-60] org.apache.druid.java.util.emitter.core.LoggingEmitter - Close: started [false]
2019-03-11T22:04:14,644 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.emitter.core.LoggingEmitter.close()] on object[LoggingEmitter{log=Logger{name=[org.apache.druid.java.util.emitter.core.LoggingEmitter], class[class org.apache.logging.slf4j.Log4jLogger]}, level=INFO}].
2019-03-11T22:04:14,644 INFO [Thread-60] org.apache.druid.curator.CuratorModule - Stopping Curator
2019-03-11T22:04:14,645 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
2019-03-11T22:04:14,654 INFO [Thread-60] org.apache.zookeeper.ZooKeeper - Session: 0x10003528b320020 closed
2019-03-11T22:04:14,654 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x10003528b320020
2019-03-11T22:04:14,654 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [INIT]
2019-03-11T22:04:14,655 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner.stop()] on object[org.apache.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner@2d760326].

and for http segment management:

historical:
2019-03-11T22:07:29,006 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle - Lifecycle [module] running shutdown hook
2019-03-11T22:07:29,008 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS]
2019-03-11T22:07:29,010 INFO [Thread-60] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannouncing [DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/historical', host='localhost', bindOnHost=false, port=-1, plaintextPort=8083, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='HISTORICAL', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=10000000000, type=historical, priority=0}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}].
2019-03-11T22:07:29,010 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/internal-discovery/HISTORICAL/localhost:8083]
2019-03-11T22:07:29,034 INFO [Thread-60] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannounced [DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/historical', host='localhost', bindOnHost=false, port=-1, plaintextPort=8083, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='HISTORICAL', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=10000000000, type=historical, priority=0}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}].
2019-03-11T22:07:29,034 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.announcement.Announcer.stop()] on object[org.apache.druid.curator.announcement.Announcer@5a4dda2].
2019-03-11T22:07:29,034 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - Stopping announcer
2019-03-11T22:07:29,036 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/localhost:8083]

broker (http segment discovery):
2019-03-11T22:07:29,032 INFO [CuratorDruidNodeDiscoveryProvider-ListenerExecutor] org.apache.druid.client.HttpServerInventoryView - Server[localhost:8083] disappeared.
2019-03-11T22:07:29,032 INFO [CuratorDruidNodeDiscoveryProvider-ListenerExecutor] org.apache.druid.server.coordination.ChangeRequestHttpSyncer - Stopped ChangeRequestHttpSyncer[http://localhost:8083/_1552342037182].
2019-03-11T22:07:29,038 INFO [BrokerServerView-0] org.apache.druid.sql.calcite.schema.DruidSchema - Removed all metadata for dataSource[wikiticker].

historical:
2019-03-11T22:07:29,039 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/listeners/lookups/__default/http:localhost:8083]
2019-03-11T22:07:29,042 INFO [Thread-60] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/segments/localhost:8083/localhost:8083_historical__default_tier_2019-03-11T22:07:05.488Z_02242ea50c0f44fda31d2379ca1db7260]
2019-03-11T22:07:29,045 INFO [Thread-60] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER]
2019-03-11T22:07:29,045 INFO [Thread-60] org.apache.druid.server.initialization.jetty.JettyServerModule - Waiting 15000 ms for unannouncement to propagate.
2019-03-11T22:07:44,049 INFO [Thread-60] org.apache.druid.server.initialization.jetty.JettyServerModule - Stopping Jetty Server...
2019-03-11T22:07:44,061 INFO [Thread-60] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@7e5efcab{HTTP/1.1,[http/1.1]}{0.0.0.0:8083}

@fjy fjy added this to the 0.15.0 milestone Mar 12, 2019
Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LG but I am confused about what's going on with the DerivativeDataSourceManager.

@@ -65,7 +65,7 @@
* Read and store derivatives information from dataSource table frequently.
* When optimize query, DerivativesManager offers the information about derivatives.
*/
@ManageLifecycleLast
@ManageLifecycleAnnouncements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this in @ManageLifecycleAnnouncements? It doesn't seem to be doing announcements?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I'll see if I can dig up why it was marked @ManageLifecycleLast prior to this PR...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no reason in the initial PR, it was already in the first commit and not an explicit response to review afaict. Looking over what it is doing, it doesn't seem like it would need to be in the last stage, I will move it to just @ManageLifecycle

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, moved it to Lifecycle.Stage.NORMAL, thanks for review!

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't tell if this approach is clunky or elegant. Let's go with elegant :)

@fjy fjy merged commit 4d3987c into apache:master Mar 12, 2019
@clintropolis clintropolis deleted the jetty-lifecycle-stuff-pt2 branch March 12, 2019 18:43
gianm pushed a commit to implydata/druid-public that referenced this pull request Mar 14, 2019
…servers and announcements (apache#7234)

* lifecycle stage refactor to ensure proper ordering of servers and announcements

* move DerivativeDataSourceManager to Lifecycle.Stage.NORMAL
gianm pushed a commit to implydata/druid-public that referenced this pull request Apr 8, 2019
…servers and announcements (apache#7234)

* lifecycle stage refactor to ensure proper ordering of servers and announcements

* move DerivativeDataSourceManager to Lifecycle.Stage.NORMAL
gianm pushed a commit to implydata/druid-public that referenced this pull request Apr 14, 2019
…servers and announcements (apache#7234)

* lifecycle stage refactor to ensure proper ordering of servers and announcements

* move DerivativeDataSourceManager to Lifecycle.Stage.NORMAL
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants