Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-10232 - Add timeouts to the channel CacheClientChannelInitialize… #6221

Closed
wants to merge 4 commits into from

Conversation

thenatog
Copy link
Contributor

…r for the NettyDistributedMapCacheClient.

Summary

NIFI-10232

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 8
    • JDK 11
    • JDK 17

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

@thenatog
Copy link
Contributor Author

I believe there may be another issue related to the DistributedMapCache which is causing NiFi to hang on shutdown, so I will investigate the cause of that.. this is what is showing in the logs with a flow only containing the server and client controller services and nothing else:

2022-07-18 19:12:46,945 INFO [main] org.apache.nifi.bootstrap.Command Apache NiFi has accepted the Shutdown Command and is shutting down now
2022-07-18 19:12:47,229 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [60804] shutdown in progress...
2022-07-18 19:12:49,256 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [60804] shutdown in progress...
2022-07-18 19:12:51,283 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [60804] shutdown in progress...
2022-07-18 19:12:53,310 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [60804] shutdown in progress...
2022-07-18 19:12:55,341 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [60804] shutdown in progress...
2022-07-18 19:12:57,404 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [60804] shutdown in progress...
2022-07-18 19:12:59,430 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [60804] shutdown in progress...
2022-07-18 19:13:01,458 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [60804] shutdown in progress...
2022-07-18 19:13:03,485 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [60804] shutdown in progress...
2022-07-18 19:13:05,514 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [60804] shutdown in progress...
2022-07-18 19:13:07,540 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [60804] shutdown in progress...
2022-07-18 19:13:07,565 WARN [main] org.apache.nifi.bootstrap.Command NiFi PID [60804] shutdown not completed after 20 seconds: Killing process
2022-07-18 19:13:07,601 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [60804] shutdown completed

@thenatog
Copy link
Contributor Author

I was trying to track down the issue of NiFi not shutting down cleanly when a user disables/enables the DistributedMapCacheClientService, but I did not come up with anything definitive. I've added some debug logging for future troubleshooting, and noted that it seems that NiFi does appear to be shutting down fine, but for whatever reason the bootstrap service itself is not being notified.

The log in NiFi shows:
2022-07-22 13:08:40,008 INFO [pool-2-thread-2] org.apache.nifi.BootstrapListener Received SHUTDOWN request from Bootstrap

2022-07-22 13:08:40,009 INFO [pool-2-thread-2] org.apache.nifi.NiFi Application Server shutdown started
2022-07-22 13:08:40,013 INFO [pool-2-thread-2] o.eclipse.jetty.server.AbstractConnector Stopped ServerConnector@1788cb61{SSL, (ssl, http/1.1)}{127.0.0.1:8443}
2022-07-22 13:08:40,013 INFO [pool-2-thread-2] org.eclipse.jetty.server.session node0 Stopped scavenging
2022-07-22 13:08:40,018 INFO [pool-2-thread-2] o.e.jetty.server.handler.ContextHandler Stopped o.e.j.w.WebAppContext@27e44e9c{nifi-error,/,null,STOPPED}{./work/nar/extensions/nifi-server-nar-1.17.0-SNAPSHOT.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-error-1.17.0-SNAPSHOT.war}
2022-07-22 13:08:40,023 INFO [pool-2-thread-2] o.e.jetty.server.handler.ContextHandler Stopped o.e.j.w.WebAppContext@9408857{nifi-docs,/nifi-docs,null,STOPPED}{./work/nar/extensions/nifi-server-nar-1.17.0-SNAPSHOT.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-docs-1.17.0-SNAPSHOT.war}
2022-07-22 13:08:40,028 INFO [pool-2-thread-2] o.e.jetty.server.handler.ContextHandler Stopped o.e.j.w.WebAppContext@30508066{nifi-content-viewer,/nifi-content-viewer,null,STOPPED}{./work/nar/extensions/nifi-server-nar-1.17.0-SNAPSHOT.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-content-viewer-1.17.0-SNAPSHOT.war}
2022-07-22 13:08:40,036 INFO [pool-2-thread-2] o.a.n.w.c.ApplicationStartupContextListener Flow Service shutdown started
2022-07-22 13:08:40,042 INFO [pool-2-thread-2] o.apache.nifi.controller.FlowController Initiated graceful shutdown of flow controller...waiting up to 10 seconds
2022-07-22 13:08:40,048 DEBUG [pool-2-thread-2] o.a.n.d.c.s.m.DistributedMapCacheServer DistributedMapCacheServer[id=26971b7b-0182-1000-8d1c-ab19ea5eade7] Shutting down distributed cache server
2022-07-22 13:08:40,048 INFO [pool-2-thread-2] o.a.n.d.c.s.map.StandardMapCacheServer Shutting down StandardMapCacheServer
2022-07-22 13:08:40,159 INFO [pool-2-thread-2] o.a.n.d.c.s.m.DistributedMapCacheServer DistributedMapCacheServer[id=26971b7b-0182-1000-8d1c-ab19ea5eade7] Stopped Cache Server Port [4557]
2022-07-22 13:08:40,159 DEBUG [pool-2-thread-2] o.a.n.d.c.s.m.DistributedMapCacheServer DistributedMapCacheServer[id=26971b7b-0182-1000-8d1c-ab19ea5eade7] Successfully shut down distributed cache server
2022-07-22 13:08:40,159 DEBUG [pool-2-thread-2] o.a.n.d.c.c.DistributedMapCacheClientService DistributedMapCacheClientService[id=2697027b-0182-1000-d10e-d28c48c2cbb9] Disabling Map Cache Client Service
2022-07-22 13:08:40,199 ERROR [pool-2-thread-2] o.a.n.d.c.c.DistributedMapCacheClientService DistributedMapCacheClientService[id=2697027b-0182-1000-d10e-d28c48c2cbb9] Failed to contact remote server when closing
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:4557
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
2022-07-22 13:08:40,204 INFO [pool-2-thread-2] o.apache.nifi.controller.FlowController Controller has been terminated successfully.
2022-07-22 13:08:40,207 INFO [pool-2-thread-2] o.a.n.w.c.ApplicationStartupContextListener Flow Service shutdown completed
2022-07-22 13:08:40,207 INFO [pool-2-thread-2] o.e.j.s.handler.ContextHandler._nifi_api Closing Spring root WebApplicationContext
2022-07-22 13:08:40,273 INFO [pool-2-thread-2] o.e.jetty.server.handler.ContextHandler Stopped o.e.j.w.WebAppContext@646d810b{nifi-api,/nifi-api,null,STOPPED}{./work/nar/extensions/nifi-server-nar-1.17.0-SNAPSHOT.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-api-1.17.0-SNAPSHOT.war}
2022-07-22 13:08:40,376 INFO [pool-2-thread-2] o.e.jetty.server.handler.ContextHandler Stopped o.e.j.w.WebAppContext@33e3bd2e{nifi,/nifi,null,STOPPED}{./work/nar/extensions/nifi-server-nar-1.17.0-SNAPSHOT.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-ui-1.17.0-SNAPSHOT.war}
2022-07-22 13:08:40,397 INFO [pool-2-thread-2] o.e.jetty.server.handler.ContextHandler Stopped o.e.j.w.WebAppContext@3711c71c{/nifi-jolt-transform-json-ui-1.17.0-SNAPSHOT,/nifi-jolt-transform-json-ui-1.17.0-SNAPSHOT,null,STOPPED}{./work/nar/extensions/nifi-standard-nar-1.17.0-SNAPSHOT.nar-unpacked/NAR-INF/bundled-dependencies/nifi-jolt-transform-json-ui-1.17.0-SNAPSHOT.war}
2022-07-22 13:08:40,408 INFO [pool-2-thread-2] o.e.jetty.server.handler.ContextHandler Stopped o.e.j.w.WebAppContext@1e495414{/nifi-update-attribute-ui-1.17.0-SNAPSHOT,/nifi-update-attribute-ui-1.17.0-SNAPSHOT,null,STOPPED}{./work/nar/extensions/nifi-update-attribute-nar-1.17.0-SNAPSHOT.nar-unpacked/NAR-INF/bundled-dependencies/nifi-update-attribute-ui-1.17.0-SNAPSHOT.war}
2022-07-22 13:08:40,413 INFO [pool-2-thread-2] o.e.jetty.server.handler.ContextHandler Stopped o.e.j.w.WebAppContext@72c4a3aa{nifi-standard-content-viewer,/nifi-standard-content-viewer-1.17.0-SNAPSHOT,null,STOPPED}{./work/nar/extensions/nifi-standard-nar-1.17.0-SNAPSHOT.nar-unpacked/NAR-INF/bundled-dependencies/nifi-standard-content-viewer-1.17.0-SNAPSHOT.war}
2022-07-22 13:08:40,414 INFO [pool-2-thread-2] org.apache.nifi.nar.NarAutoLoaderTask Stopping NAR Auto-loader
2022-07-22 13:08:40,414 INFO [pool-2-thread-2] org.apache.nifi.nar.NarAutoLoader NAR Auto-Loader stopped
2022-07-22 13:08:40,415 INFO [pool-2-thread-2] o.a.n.f.r.CompositeExternalResourceProviderService External Resource Provider Service is stopped
2022-07-22 13:08:40,415 INFO [pool-2-thread-2] org.apache.nifi.NiFi Application Server shutdown completed

and the bootstrap log on the console shows:

2022-07-22 13:08:39,991 INFO [main] o.a.n.b.NotificationServiceManager Successfully loaded the following 0 services: []
2022-07-22 13:08:39,994 INFO [main] org.apache.nifi.bootstrap.RunNiFi Registered no Notification Services for Notification Type NIFI_STARTED
2022-07-22 13:08:39,994 INFO [main] org.apache.nifi.bootstrap.RunNiFi Registered no Notification Services for Notification Type NIFI_STOPPED
2022-07-22 13:08:39,994 INFO [main] org.apache.nifi.bootstrap.RunNiFi Registered no Notification Services for Notification Type NIFI_DIED
2022-07-22 13:08:40,009 INFO [main] org.apache.nifi.bootstrap.Command Apache NiFi has accepted the Shutdown Command and is shutting down now
2022-07-22 13:08:40,284 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [75270] shutdown in progress...
2022-07-22 13:08:42,299 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [75270] shutdown in progress...
2022-07-22 13:08:44,312 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [75270] shutdown in progress...
2022-07-22 13:08:46,328 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [75270] shutdown in progress...
2022-07-22 13:08:48,353 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [75270] shutdown in progress...
2022-07-22 13:08:50,365 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [75270] shutdown in progress...
2022-07-22 13:08:52,387 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [75270] shutdown in progress...
2022-07-22 13:08:54,402 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [75270] shutdown in progress...
2022-07-22 13:08:56,415 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [75270] shutdown in progress...
2022-07-22 13:08:58,431 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [75270] shutdown in progress...
2022-07-22 13:09:00,446 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [75270] shutdown in progress...
2022-07-22 13:09:00,459 WARN [main] org.apache.nifi.bootstrap.Command NiFi PID [75270] shutdown not completed after 20 seconds: Killing process
2022-07-22 13:09:00,472 INFO [main] org.apache.nifi.bootstrap.Command NiFi PID [75270] shutdown completed
2022-07-22 13:09:00,990 INFO [main] org.apache.nifi.bootstrap.RunNiFi Status File no longer exists. Will not restart NiFi

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvements @thenatog! Adding OnShutdown to DistributedCacheServer and DistributedCacheMapClientService should improve standard behavior, and applying the communications timeout to write and idle operations looks good. +1 merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants