[🐛 Bug]: Grid 4 - Selenium hub crashes after a while #1439

Closed
lszasz opened this issue Oct 27, 2021 · 16 comments

lszasz commented Oct 27, 2021

What happened?

We've been using Selenium Grid 3 for years, running small tests frequently. We have a hub and 10 nodes on the same machine, which we start and stop once a day. After upgrading to Grid 4, the tests started failing after about 40 minutes, most probably due to memory problems. We can reproduce this consistently.
Could you please look into it?

memory-usage

Command used to start Selenium Grid with Docker

version: '3'
services:
  seleniumhub:
    image: selenium/hub
    container_name: seleniumhub
    ports:
      - "4442:4442"
      - "4443:4443"
      - "4444:4444"
    volumes:
      - /dev/shm:/dev/shm
    environment:
      - GRID_TIMEOUT=340
      - GRID_BROWSER_TIMEOUT=320

  chrome:
    image: selenium/node-chrome
    shm_size: 512m
    depends_on:
      - seleniumhub
    volumes:
      - /tmp/automation:/home/seluser/Downloads
      - /dev/shm:/dev/shm
    environment:
      - SCREEN_WIDTH=1920
      - SCREEN_HEIGHT=1080
      - TZ=Europe/Bucharest
      - SE_EVENT_BUS_HOST=seleniumhub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_MAX_SESSIONS=1

Relevant log output

09:25:13.536 INFO [GridModel.setAvailability] - Switching node 7bf35e99-1aef-4bc6-9b3d-54f3c1d41db9 (uri: http://172.23.0.10:5555) from DOWN to UP
09:25:15.410 INFO [GridModel.setAvailability] - Switching node 2d9bb0bc-6489-4f86-a0a5-7f4ea279e8b0 (uri: http://172.23.0.9:5555) from DOWN to UP
09:25:16.639 INFO [GridModel.setAvailability] - Switching node 9ac8adaf-1d78-4289-b65a-0d42117a1345 (uri: http://172.23.0.6:5555) from DOWN to UP
09:25:16.812 INFO [GridModel.setAvailability] - Switching node aa57fc51-923e-47d9-a545-132f22b39083 (uri: http://172.23.0.8:5555) from DOWN to UP
09:25:38.474 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "3bcb3c95d05b2ecbd26ac31e78caec8c","eventTime": 1635240338457300715,"eventName": "exception","attributes": {"exception.message": "Unable to execute request for an existing session: NettyHttpHandler request execution error","exception.stacktrace": "java.lang.RuntimeException: NettyHttpHandler request execution error\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:83)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:119)\n\tat org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)\n\tat org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:110)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.router.Router.execute(Router.java:91)\n\tat org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat 
org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$NestedRoute.handle(Route.java:270)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\nCaused by: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Request timeout to 172.23.0.7\u002f172.23.0.7:5555 after 180000 ms\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022)\n\tat org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:66)\n\t... 
35 more\nCaused by: java.util.concurrent.TimeoutException: Request timeout to 172.23.0.7\u002f172.23.0.7:5555 after 180000 ms\n\tat org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)\n\tat org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)\n\tat io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:669)\n\tat io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:744)\n\tat io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:469)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\t... 1 more\n","exception.type": "java.lang.RuntimeException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.router.HandleSession","http.host": "clj-heartbeat01:4444","http.method": "POST","http.request_content_length": "369","http.scheme": "HTTP","http.target": "\u002fsession\u002f56e701c17e041d6a58aea67d9357bdf3\u002felement","http.user_agent": "selenium\u002f3.141.59 (java windows)","session.id": "56e701c17e041d6a58aea67d9357bdf3"}}

09:25:38.489 INFO [GridModel.setAvailability] - Switching node 6ba5452d-7e1d-49d8-a6f2-d647f5254c05 (uri: http://172.23.0.4:5555) from DOWN to UP
09:25:38.519 INFO [GridModel.setAvailability] - Switching node 51cfe841-ee7c-4b31-a3d3-518fe4c6c3ff (uri: http://172.23.0.11:5555) from DOWN to UP
09:25:38.536 INFO [GridModel.setAvailability] - Switching node 368f0d70-e29a-4a74-b5ae-3bc7fc25a7ca (uri: http://172.23.0.12:5555) from DOWN to UP
09:25:38.555 INFO [GridModel.setAvailability] - Switching node 6ac6dec8-89c3-46fe-a141-cd0d62bae768 (uri: http://172.23.0.3:5555) from DOWN to UP
09:25:38.690 INFO [GridModel.setAvailability] - Switching node 77469386-1e9f-4f45-ae38-d4105d7888c2 (uri: http://172.23.0.5:5555) from DOWN to UP
09:25:38.708 INFO [GridModel.setAvailability] - Switching node 46495380-18e2-4a18-b1ba-6409b8af671f (uri: http://172.23.0.7:5555) from DOWN to UP
09:25:43.570 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "edb6138ff49821c43611d17b6e18b44f","eventTime": 1635240343568906501,"eventName": "exception","attributes": {"exception.message": "Unable to execute request for an existing session: java.net.ConnectException: Connection refused: \u002f172.23.0.7:5555","exception.stacktrace": "java.io.UncheckedIOException: java.net.ConnectException: Connection refused: \u002f172.23.0.7:5555\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:80)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:119)\n\tat org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)\n\tat org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:110)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.router.Router.execute(Router.java:91)\n\tat org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat 
org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$NestedRoute.handle(Route.java:270)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\nCaused by: java.net.ConnectException: Connection refused: \u002f172.23.0.7:5555\n\tat org.asynchttpclient.netty.channel.NettyConnectListener.onFailure(NettyConnectListener.java:179)\n\tat org.asynchttpclient.netty.channel.NettyChannelConnector$1.onFailure(NettyChannelConnector.java:108)\n\tat org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:28)\n\tat 
org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:20)\n\tat io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)\n\tat io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)\n\tat io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)\n\tat io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)\n\tat io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)\n\tat io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609)\n\tat io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)\n\tat io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321)\n\tat io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\t... 
1 more\nCaused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: \u002f172.23.0.7:5555\nCaused by: java.net.ConnectException: Connection refused\n\tat java.base\u002fsun.nio.ch.SocketChannelImpl.checkConnect(Native Method)\n\tat java.base\u002fsun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)\n\tat io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)\n\tat io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\n","exception.type": "java.io.UncheckedIOException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.router.HandleSession","http.host": "clj-heartbeat01:4444","http.method": "POST","http.request_content_length": "205","http.scheme": "HTTP","http.target": "\u002fsession\u002f56e701c17e041d6a58aea67d9357bdf3\u002felement","http.user_agent": "selenium\u002f3.141.59 (java windows)","session.id": "56e701c17e041d6a58aea67d9357bdf3"}}

09:31:38.314 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "cb34f0f1e67ebde733f21638f2b070ac","eventTime": 1635240698313527177,"eventName": "exception","attributes": {"exception.message": "Unable to execute request for an existing session: java.net.ConnectException: Connection refused: \u002f172.23.0.6:5555","exception.stacktrace": "java.io.UncheckedIOException: java.net.ConnectException: Connection refused: \u002f172.23.0.6:5555\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:80)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)\n\tat org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:119)\n\tat org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:92)\n\tat org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:110)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.router.Router.execute(Router.java:91)\n\tat org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat 
org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$NestedRoute.handle(Route.java:270)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\nCaused by: java.net.ConnectException: Connection refused: \u002f172.23.0.6:5555\n\tat org.asynchttpclient.netty.channel.NettyConnectListener.onFailure(NettyConnectListener.java:179)\n\tat org.asynchttpclient.netty.channel.NettyChannelConnector$1.onFailure(NettyChannelConnector.java:108)\n\tat org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:28)\n\tat 
org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:20)\n\tat io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)\n\tat io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)\n\tat io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)\n\tat io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)\n\tat io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)\n\tat io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609)\n\tat io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)\n\tat io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321)\n\tat io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\t... 
1 more\nCaused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: \u002f172.23.0.6:5555\nCaused by: java.net.ConnectException: Connection refused\n\tat java.base\u002fsun.nio.ch.SocketChannelImpl.checkConnect(Native Method)\n\tat java.base\u002fsun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)\n\tat io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)\n\tat io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\n","exception.type": "java.io.UncheckedIOException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.router.HandleSession","http.host": "clj-heartbeat01:4444","http.method": "POST","http.request_content_length": "233","http.scheme": "HTTP","http.target": "\u002fsession\u002ff68cd24799bcf753161802302c2c8044\u002felement","http.user_agent": "selenium\u002f3.141.59 (java windows)","session.id": "f68cd24799bcf753161802302c2c8044"}}

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x000000076f800000, 873463808, 0) failed; error='Not enough space' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 873463808 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /tmp/hs_err_pid10.log
2021-10-26 09:31:38,543 INFO exited: selenium-grid-hub (exit status 1; not expected)
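
To make the size of the failed allocation easier to read, here is a minimal Python sketch that pulls the numbers out of the `os::commit_memory` warning above and converts the byte count to MiB. The function names are illustrative and not part of Selenium or the JVM; the regular expression only assumes the message layout shown in this log.

```python
import re

# Matches the HotSpot warning printed when committing reserved memory fails.
COMMIT_RE = re.compile(
    r"os::commit_memory\((0x[0-9a-fA-F]+), (\d+), \d+\) failed; "
    r"error='([^']*)' \(errno=(\d+)\)"
)


def parse_commit_failure(line):
    """Return (address, size_bytes, error, errno), or None if the line does not match."""
    match = COMMIT_RE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), match.group(3), int(match.group(4))


def to_mib(size_bytes):
    """Convert a byte count to mebibytes."""
    return size_bytes / (1024 * 1024)


line = (
    "OpenJDK 64-Bit Server VM warning: INFO: "
    "os::commit_memory(0x000000076f800000, 873463808, 0) failed; "
    "error='Not enough space' (errno=12)"
)
address, size_bytes, error, errno_code = parse_commit_failure(line)
print(f"{to_mib(size_bytes):.0f} MiB requested at {address}: {error} (errno {errno_code})")
```

For the log above this reports a single request of 833 MiB. On Linux, errno 12 is ENOMEM, meaning the operating system refused the mapping rather than the Java heap being exhausted.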

Operating System

CentOS

Docker Selenium version (tag)

selenium/hub:4.0.0-20211025 and node-chrome:95.0-chromedriver-95.0-20211025

lszasz changed the title from "Grid 4 - Selenium hub crashes after a while" to "[🐛 Bug]: Grid 4 - Selenium hub crashes after a while" on Oct 28, 2021

Member

diemol commented Nov 15, 2021

Can you share how you start the whole Grid? You mention you have 10 Chrome nodes but I only see one in the script above.

Also, what resources do you have available in the VM where you are running all the containers?
Is that graphic showing the VM resources or just a container?

What tests could I use to reproduce the issue?

Author

lszasz commented Nov 16, 2021

I use the following commands to start the grid:
docker-compose scale seleniumhub=1
docker-compose scale chrome=10

I have 16GB memory on the VM, the same as with Grid 3, which does not run out of memory. The graphic shows the VM's resources.
I'm not sure what tests you could use to reproduce this issue. At the moment I don't have tests that I could share.
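
A rough back-of-the-envelope sketch of why 16GB may be tighter than it looks. It assumes, without confirmation from this thread, that each of the 11 containers runs one JVM with default settings, that no container memory limit is set, and that HotSpot sizes its default maximum heap at 25% of the memory it can see. Browser processes are not counted, and "16GB" is treated as 16 GiB. Grid 3 would have been subject to the same defaults, so this does not by itself explain the regression; it only shows that the default heap ceilings do not bound total usage.

```python
GIB = 1024 ** 3


def default_max_heap(visible_memory_bytes, max_ram_percentage=25.0):
    """Heap ceiling of one JVM under the assumed default of 25% of visible memory."""
    return visible_memory_bytes * max_ram_percentage / 100.0


def worst_case_heap(visible_memory_bytes, jvm_count):
    """Combined heap ceiling if every JVM grew to its assumed default maximum."""
    return default_max_heap(visible_memory_bytes) * jvm_count


vm_memory = 16 * GIB   # VM size from the comment above, read as GiB (assumption)
jvm_count = 1 + 10     # one hub plus ten Chrome nodes

per_jvm = default_max_heap(vm_memory)
combined = worst_case_heap(vm_memory, jvm_count)
print(f"per-JVM heap ceiling: {per_jvm / GIB:.1f} GiB")
print(f"combined heap ceiling: {combined / GIB:.1f} GiB on a {vm_memory / GIB:.0f} GiB VM")
```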


ssfxate commented Dec 9, 2021

I ran into a similar condition with 50 Chrome nodes and used a workaround.

However, efficiency is lower than it was with version 3, at around 60%.

@Nagam-Naidu

Getting the same error with Grid 4 and 30 Chrome nodes.

18:17:01.472 INFO [LocalSessionMap.lambda$new$0] - Deleted session from local session map, Id: 5244b021d05b636b1b7577388380aef9
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x000000074cb00000, 378535936, 0) failed; error='Not enough space' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 378535936 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /tmp/hs_err_pid12.log
2021-12-14 18:18:05,985 INFO exited: selenium-grid-hub (exit status 1; not expected)
Trapped SIGTERM/SIGINT/x so shutting down supervisord...
2021-12-14 19:52:24,025 WARN received SIGTERM indicating exit request
Shutdown complete


Nagam-Naidu commented Dec 16, 2021

The setup runs in Docker Swarm mode; the docker-compose.yaml is below.
There are 35 Chrome nodes in total, and we run 30 test cases in parallel.

Running UI tests to generate load.

Below are the VM configurations.

VM instance config:
slave 1, 2: r5.large, 16.0 GiB, 2 vCPUs

docker-compose.yaml

version: "3"

networks:
  main:
    driver: overlay
services:
  hub:
    #image: selenium/hub:latest
    image: selenium/hub:latest
    ports:
      - "4442:4442"
      - "4443:4443"
      - "4444:4444"
    restart: always
    networks:
      - main
    deploy:
      mode: replicated
      replicas: 1
      labels:
        selenium.grid.type: "hub"
        selenium.grid.hub: "true"
      restart_policy:
        condition: none
      placement:
        constraints: [node.role == worker]
  chrome:
    #image: selenium/node-chrome:91.0
    image: selenium/node-chrome:latest
    #image: selenium/node-chrome:3.141.59-20210713
    shm_size: 2gb
    entrypoint: >
      bash -c '
        export IP_ADDRESS=$$(ip addr show eth0 | grep "inet\b" | awk '"'"'{print $$2}'"'"' | awk -F/ '"'"'{print $$1}'"'"' | head -1) &&
        SE_OPTS="--host $$IP_ADDRESS" /opt/bin/entry_point.sh'
    volumes:
      - /dev/urandom:/dev/random
      - /dev/shm:/dev/shm
      - /tmp/:/home/seluser/Downloads
    environment:
      - SE_EVENT_BUS_HOST=hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - HUB_PORT_4444_TCP_ADDR=hub
      - HUB_PORT_4444_TCP_PORT=4444
      - NODE_MAX_SESSION=1
      - SCREEN_WIDTH=1920
      - SCREEN_HEIGHT=1024
    restart: always
    networks:
      - main
    deploy:
      mode: replicated
      replicas: 1
      labels:
        selenium.grid.type: "node"
        selenium.grid.node: "true"
        selenium.grid.node.type: "chrome"
      restart_policy:
        condition: none
      placement:
        constraints: [node.role == worker]
  chrome:
    #image: selenium/node-chrome:91.0
    #image: selenium/node-chrome:latest
    image: selenium/node-chrome:latest
    shm_size: 2gb
    entrypoint: >
      bash -c '
        export IP_ADDRESS=$$HOSTNAME &&
        SE_OPTS="--host $$IP_ADDRESS" /opt/bin/entry_point.sh'
    volumes:
      - /dev/shm:/dev/shm
      - /dev/urandom:/dev/random
      - /tmp/:/home/seluser/Downloads
    environment:
      - SE_EVENT_BUS_HOST=hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - HUB_PORT_4444_TCP_ADDR=hub
      - HUB_PORT_4444_TCP_PORT=4444
      - NODE_MAX_SESSION=1
      - SCREEN_WIDTH=1920
      - SCREEN_HEIGHT=1024
    restart: always
    networks:
      - main
    depends_on:
      - hub
    deploy:
      mode: replicated
      replicas: 1
      labels:
        selenium.grid.type: "node"
        selenium.grid.node: "true"
        selenium.grid.node.type: "chrome"
      restart_policy:
        condition: none
      placement:
        constraints: [node.role == worker]


shamsheerd commented Jan 3, 2022

Hello,
Do we have any update on this issue? I am also stuck moving to Grid 4 (latest): selenium-hub crashes and the test runs become orphaned. I checked all the blogs pertaining to this issue but could not find any answers.
We are using a Docker setup on a Linux host (EC2 c5a.xlarge). We create this infrastructure before the test run and destroy it after the run using Azure DevOps releases. The host has 7 containers: 1 hub and 6 nodes. Each node hosts 5 Chrome browser instances.
The smoke suite (160 tests) works fine with no issues, but when running the regression suite (4k tests), the hub crashes within 5-10 minutes of the run. Below are the logs for reference.

Images:

selenium/hub:latest
selenium/node-chrome:latest

Error Message:
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x000000072b800000, 253755392, 0) failed; error='Not enough space' (errno=12)

There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (mmap) failed to map 253755392 bytes for committing reserved memory.
An error report file with more information is saved as:
/package/hs_err_pid231.log

--------------- S Y S T E M ---------------

OS:DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.3 LTS"
uname:Linux 4.14.252-195.483.amzn2.x86_64 #1 SMP Mon Nov 1 20:58:46 UTC 2021 x86_64
OS uptime: 0 days 4:29 hours
libc:glibc 2.31 NPTL 2.31
rlimit (soft/hard): STACK 10240k/10240k , CORE 0k/0k , NPROC infinity/infinity , NOFILE 65536/65536 , AS infinity/infinity , CPU infinity/infinity , DATA infinity/infinity , FSIZE infinity/infinity , MEMLOCK infinity/infinity
load average:48.25 19.78 8.02

/proc/meminfo:
MemTotal: 32118168 kB
MemFree: 206532 kB
MemAvailable: 98824 kB
Buffers: 0 kB
Cached: 1056524 kB
SwapCached: 0 kB
Active: 29841680 kB
Inactive: 999368 kB
Active(anon): 29785500 kB
Inactive(anon): 943916 kB
Active(file): 56180 kB
Inactive(file): 55452 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 2836 kB
Writeback: 4 kB
AnonPages: 29784372 kB
Mapped: 607036 kB
Shmem: 944572 kB
Slab: 632380 kB
SReclaimable: 195116 kB
SUnreclaim: 437264 kB

>> docker stats

CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
b8190aa56f4d node2 112.60% 3.732GiB / 30.63GiB 12.19% 159MB / 8.39MB 3.6MB / 246MB 645
90970faaa538 node6 270.06% 3.547GiB / 30.63GiB 11.58% 188MB / 9.02MB 28.8MB / 282MB 631
1e22994b3be5 node1 64.47% 3.69GiB / 30.63GiB 12.05% 153MB / 8.22MB 11.3MB / 266MB 654
302889ac63c0 node3 187.68% 4.019GiB / 30.63GiB 13.12% 195MB / 9.65MB 11.2MB / 311MB 625
2e85967517ac node5 442.81% 3.594GiB / 30.63GiB 11.73% 174MB / 8.54MB 10.7MB / 260MB 693
76f1c58ad9da node4 58.89% 3.724GiB / 30.63GiB 12.16% 154MB / 8.11MB 4.85MB / 255MB 681
fb0752c0b64a selenium-hub 26.73% 7.307GiB / 30.63GiB 23.86% 182MB / 254MB 5.29MB / 18.9MB 275
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
b8190aa56f4d node2 -- -- / -- -- -- -- --
90970faaa538 node6 -- -- / -- -- -- -- --
1e22994b3be5 node1 -- -- / -- -- -- -- --
302889ac63c0 node3 -- -- / -- -- -- -- --
2e85967517ac node5 -- -- / -- -- -- -- --
76f1c58ad9da node4 -- -- / -- -- -- -- --
fb0752c0b64a selenium-hub -- -- / -- -- -- -- --
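For anyone comparing numbers, here is a minimal sketch that totals the MEM USAGE column from the `docker stats` snapshot above. The sample text is copied from that output; the function name `parse_stats` and the regular expression are my own and only cover the row layout shown here. On this snapshot the seven containers add up to roughly 29.6 GiB of the 30.63 GiB shown as the limit, with the hub alone holding about 7.3 GiB.

```python
import re

# Snapshot copied from the docker stats output above (one data frame plus one "--" row).
STATS = """\
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
b8190aa56f4d node2 112.60% 3.732GiB / 30.63GiB 12.19% 159MB / 8.39MB 3.6MB / 246MB 645
90970faaa538 node6 270.06% 3.547GiB / 30.63GiB 11.58% 188MB / 9.02MB 28.8MB / 282MB 631
1e22994b3be5 node1 64.47% 3.69GiB / 30.63GiB 12.05% 153MB / 8.22MB 11.3MB / 266MB 654
302889ac63c0 node3 187.68% 4.019GiB / 30.63GiB 13.12% 195MB / 9.65MB 11.2MB / 311MB 625
2e85967517ac node5 442.81% 3.594GiB / 30.63GiB 11.73% 174MB / 8.54MB 10.7MB / 260MB 693
76f1c58ad9da node4 58.89% 3.724GiB / 30.63GiB 12.16% 154MB / 8.11MB 4.85MB / 255MB 681
fb0752c0b64a selenium-hub 26.73% 7.307GiB / 30.63GiB 23.86% 182MB / 254MB 5.29MB / 18.9MB 275
b8190aa56f4d node2 -- -- / -- -- -- -- --
"""

# Conversion factors to GiB for the binary units docker stats prints.
UNIT_TO_GIB = {"B": 1 / 1024**3, "KiB": 1 / 1024**2, "MiB": 1 / 1024, "GiB": 1.0}

# Assumed row layout: id, name, cpu%, "used / limit", mem%. Other columns are ignored.
ROW = re.compile(
    r"^(?P<id>[0-9a-f]{12})\s+(?P<name>\S+)\s+(?P<cpu>[\d.]+)%\s+"
    r"(?P<used>[\d.]+)(?P<used_unit>[KMG]iB|B)\s*/\s*"
    r"(?P<limit>[\d.]+)(?P<limit_unit>[KMG]iB|B)\s+(?P<mem_pct>[\d.]+)%"
)


def parse_stats(text):
    """Return one dict per container row; header and "--" rows are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue
        rows.append({
            "name": match["name"],
            "cpu_pct": float(match["cpu"]),
            "used_gib": float(match["used"]) * UNIT_TO_GIB[match["used_unit"]],
            "limit_gib": float(match["limit"]) * UNIT_TO_GIB[match["limit_unit"]],
        })
    return rows


rows = parse_stats(STATS)
total_gib = sum(row["used_gib"] for row in rows)
largest = max(rows, key=lambda row: row["used_gib"])
print(f"{len(rows)} containers use {total_gib:.3f} GiB in total; largest: {largest['name']}")
```

The same function could be fed the output of `docker stats --no-stream`, assuming the column layout matches the one pasted here.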

Do let me know if any further info is required to troubleshoot the issue.

TIA.


ezedonovan commented Jan 31, 2022

Hi all,
Seems that I am having a similar issue on my end.
The grid is running on an AWS EB instance of type c5.2xlarge with a total of 8 Chrome nodes. The c5.2xlarge has 8 vCPUs and 16 GB of memory.

My docker-compose file looks pretty much like the one below (just replace {i} with the node number):

version: '2'
services:
  selenium-hub:
    image: selenium/hub:4.1.1-20220121
    ports:
      - "80:4444"
      - "4442:4442"
      - "4443:4443"
  chrome_{i}:
    image: selenium/node-chrome:4.1.1-20220121
    shm_size: 1500mb
    mem_limit: 1500mb
    memswap_limit: 1500mb
    restart: always
    depends_on:
      - selenium-hub
    environment:
      - SE_NODE_SESSION_TIMEOUT=120
      - SE_NODE_OVERRIDE_MAX_SESSIONS=true
      - SE_NODE_MAX_SESSIONS=1
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - START_XVFB=false
    ports:
      - "690{i}:5900"
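Since the file above is a template with a {i} placeholder, here is a minimal sketch of how the full file for N nodes could be generated. The service definitions are copied from the compose file above; the function name `render_compose` and the 1 to 9 limit are my own assumptions (the limit exists only because the "690{i}" port pattern stops producing valid port numbers at two digits).

```python
# Hub section copied from the compose file above.
HUB_SECTION = """\
version: '2'
services:
  selenium-hub:
    image: selenium/hub:4.1.1-20220121
    ports:
      - "80:4444"
      - "4442:4442"
      - "4443:4443"
"""

# Node section copied from the compose file above; {i} is filled in per node.
NODE_TEMPLATE = """\
  chrome_{i}:
    image: selenium/node-chrome:4.1.1-20220121
    shm_size: 1500mb
    mem_limit: 1500mb
    memswap_limit: 1500mb
    restart: always
    depends_on:
      - selenium-hub
    environment:
      - SE_NODE_SESSION_TIMEOUT=120
      - SE_NODE_OVERRIDE_MAX_SESSIONS=true
      - SE_NODE_MAX_SESSIONS=1
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - START_XVFB=false
    ports:
      - "690{i}:5900"
"""


def render_compose(node_count):
    """Return the compose file text with node_count Chrome node services."""
    # Hypothetical guard: "690{i}" would exceed the valid port range for i >= 10.
    if not 1 <= node_count <= 9:
        raise ValueError("node_count must be between 1 and 9 for the 690{i} port pattern")
    nodes = "".join(NODE_TEMPLATE.format(i=i) for i in range(1, node_count + 1))
    return HUB_SECTION + nodes


compose_text = render_compose(8)  # 8 Chrome nodes, as in the setup described above
print(compose_text)
```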

The problem

After some time I find that:

  • According to docker stats, the nodes settle at a constant memory usage of 99%. They never reach or exceed 100%, so the restart policy is never triggered
  • In the UI, the queue keeps filling up
  • I'm forced to restart all the containers

While the nodes are running well, I see the following in the output of docker logs {id}. It occurs many times.

Starting ChromeDriver 97.0.4692.71 (adefa7837d02a07a604c1e6eff0b3a09422ab88d-refs/branch-heads/4692@{#1247}) on port 33725
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
[1643659937.243][SEVERE]: bind() failed: Cannot assign requested address (99)
20:12:20.295 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "a89d97356cc1a87529ca5e4e8ffc8211","eventTime": 1643659940249171116,"eventName": "exception","attributes": {"driver.url": "http:\u002f\u002flocalhost:33725","exception.message": "Error while creating session with the driver service. Stopping driver service: Could not start a new session. Response code 500. Message: unknown error: Chrome failed to start: crashed.\n  (chrome not reachable)\n  (The process started from chrome location \u002fusr\u002fbin\u002fgoogle-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)\nBuild info: version: '4.1.1', revision: 'e8fcc2cecf'\nSystem info: host: '6485cc164d7a', ip: '172.20.0.9', os.name: 'Linux', os.arch: 'amd64', os.version: '4.14.256-197.484.amzn2.x86_64', java.version: '11.0.13'\nDriver info: driver.version: unknown","exception.stacktrace": "org.openqa.selenium.SessionNotCreatedException: Could not start a new session. Response code 500. Message: unknown error: Chrome failed to start: crashed.\n  (chrome not reachable)\n  (The process started from chrome location \u002fusr\u002fbin\u002fgoogle-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)\nBuild info: version: '4.1.1', revision: 'e8fcc2cecf'\nSystem info: host: '6485cc164d7a', ip: '172.20.0.9', os.name: 'Linux', os.arch: 'amd64', os.version: '4.14.256-197.484.amzn2.x86_64', java.version: '11.0.13'\nDriver info: driver.version: unknown\n\tat org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:126)\n\tat org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:84)\n\tat org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:62)\n\tat org.openqa.selenium.grid.node.config.DriverServiceSessionFactory.apply(DriverServiceSessionFactory.java:131)\n\tat org.openqa.selenium.grid.node.config.DriverServiceSessionFactory.apply(DriverServiceSessionFactory.java:65)\n\tat org.openqa.selenium.grid.node.local.SessionSlot.apply(SessionSlot.java:143)\n\tat org.openqa.selenium.grid.node.local.LocalNode.newSession(LocalNode.java:314)\n\tat org.openqa.selenium.grid.node.NewNodeSession.execute(NewNodeSession.java:52)\n\tat org.openqa.selenium.remote.http.Route$TemplatizedRoute.handle(Route.java:192)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.security.RequiresSecretFilter.lambda$apply$0(RequiresSecretFilter.java:64)\n\tat org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:86)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.node.Node.execute(Node.java:240)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\n","exception.type": "org.openqa.selenium.SessionNotCreatedException","logger": "org.openqa.selenium.grid.node.config.DriverServiceSessionFactory","session.capabilities": "{\"browserName\": \"chrome\",\"goog:chromeOptions\": {\"extensions\": [  ],  \"args\": [    \"-headless\"    ]    },    \"pageLoadStrategy\": \"normal\"    }\n"}}

I'll post back the logs for when the memory usage reaches 99% and nodes become unusable.

Thank you,
Ezequiel


fescobar commented Mar 9, 2022

In my case, using Kubernetes on EKS (AWS), my tests start to fail after 1/2 days because they can't connect to the grid.
My only workaround is to redeploy everything at least once a day.
Even when running a single node instance (1 session), I can reproduce an infinite loop in the hub (not sure where it is coming from).
I think this issue is related to this one: #1503 (comment)

Author

lszasz commented Apr 20, 2022

I have retested this issue using the latest versions, selenium-hub [4.1.3] and node-chrome [100.0]. Selenium Grid still crashes, although I did not see any out-of-memory exceptions. The grid UI shows a loading screen and no new sessions are created. Tests that are already running hang.

The logs are full of this warning:

12:25:44.475 WARN [SeleniumSpanExporter$1.lambda$export$0] - {"traceId": "b28ea890a1158419b6992cfb23166459","eventTime": 1650457544475246842,"eventName": "exception","attributes": {"exception.message": "Unable to execute request for an existing session: Unable to find session with ID: 1782072244ee3c083cc772587ffaa25e\nBuild info: version: '4.1.3', revision: '7b1ebf28ef'\nSystem info: host: 'e751198e2355', ip: '172.28.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-514.26.2.el7.x86_64', java.version: '11.0.14.1'\nDriver info: driver.version: unknown","exception.stacktrace": "org.openqa.selenium.NoSuchSessionException: Unable to find session with ID: 1782072244ee3c083cc772587ffaa25e\nBuild info: version: '4.1.3', revision: '7b1ebf28ef'\nSystem info: host: 'e751198e2355', ip: '172.28.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-514.26.2.el7.x86_64', java.version: '11.0.14.1'\nDriver info: driver.version: unknown\n\tat org.openqa.selenium.grid.sessionmap.local.LocalSessionMap.get(LocalSessionMap.java:129)\n\tat org.openqa.selenium.grid.router.HandleSession.lambda$loadSessionId$4(HandleSession.java:158)\n\tat io.opentelemetry.context.Context.lambda$wrap$2(Context.java:224)\n\tat org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:121)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:373)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.grid.router.Router.execute(Router.java:91)\n\tat org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat 
org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$NestedRoute.handle(Route.java:270)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:336)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:68)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:64)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base\u002fjava.lang.Thread.run(Thread.java:829)\n","exception.type": "org.openqa.selenium.NoSuchSessionException","http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.router.HandleSession","http.host": "clj-heartbeat01:4444","http.method": "POST","http.request_content_length": "754","http.scheme": "HTTP","http.target": "\u002fsession\u002f1782072244ee3c083cc772587ffaa25e\u002felement","http.user_agent": "selenium\u002f3.141.59 (java windows)","session.id": "1782072244ee3c083cc772587ffaa25e"}}

image

Author

lszasz commented Apr 22, 2022

I have worked out how to reproduce the issue from my last comment, where the grid UI gets stuck on the loading screen.

Steps to repro:

  1. Run a test in debug mode in which a driver is initialized.
  2. Pause the test and wait until the session is deleted from the grid.
  3. Resume the test and search for an element (this seems to reproduce only for some XPaths).
    Actual: The grid UI shows the loading screen and no new sessions are created. The grid stops working until the hub is restarted.
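For reference, step 3 boils down to a find-element request against a session the grid no longer knows about, which matches the POST to /session/{id}/element in the warning above. Below is a minimal sketch of that request at the HTTP level. The hub URL, the session ID placeholder and the function name are hypothetical; the payload shape follows the W3C WebDriver find-element command, and I have not confirmed that this raw request alone triggers the hang (the linked project uses a real Selenium client).

```python
import json
import urllib.request


def build_find_element_request(hub_url, session_id, xpath):
    """Build (but do not send) a W3C find-element request for the given session."""
    payload = json.dumps({"using": "xpath", "value": xpath}).encode("utf-8")
    return urllib.request.Request(
        url=f"{hub_url.rstrip('/')}/session/{session_id}/element",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


# Hypothetical values: a local hub and an ID of a session that was already deleted.
request = build_find_element_request(
    "http://localhost:4444/", "stale-session-id", "//div[@id='example']"
)
print(request.get_method(), request.full_url)

# Sending is left commented out because it needs a running hub:
# urllib.request.urlopen(request, timeout=10)
```

Keeping the request construction separate from the send makes it easy to inspect the URL and body first, and to vary the XPath length, since the problem seemed to depend on the XPath.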

I have added a project which you can use to reproduce the issue since it's not that straightforward.
https://github.com/lszasz/selenium-test
@diemol Please let me know if you need more details.

Member

diemol commented Apr 26, 2022

@lszasz thank you for the GitHub repository.

Sadly, I am not able to reproduce the issue with it. However, the UI looks similar to what happens in #10485. Are you running a security scanning tool or something similar?

Member

diemol commented Apr 26, 2022

I spoke too soon: it seems I was able to reproduce it. Please disregard my previous comment; I will let you know if more details are needed.

Member

diemol commented Apr 27, 2022

The fix for this will be released in the next couple of days.

@fescobar

Please @diemol let us know when you release this fix.

Member

diemol commented Apr 29, 2022

It was already released.

@fescobar

Thanks @diemol

elgatov pushed a commit to elgatov/selenium that referenced this issue Jun 27, 2022
This is needed when a client sends a request, and we
reply right away (e.g. session not found), but we do
not read the whole request content.

If we do not read the whole request content, Netty
will wait until it does, and then it blocks. Which
is not ideal since no more requests can be processed.

Now, we close the pipeline when the client disconnects.

Fixes SeleniumHQ#10485
Fixes SeleniumHQ/docker-selenium#1439
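
The commit message above describes a server that replies early (e.g. session not found) without having read the whole request body. As a rough analogy only, here is a minimal Python standard-library sketch of the alternative of draining the body before replying on a keep-alive connection. This is not the Selenium or Netty code (the actual fix, per the commit message, closes the pipeline when the client disconnects); the handler name, the path and the `demo` helper are hypothetical.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DrainingHandler(BaseHTTPRequestHandler):
    """Always answers 404, but reads the full request body first."""

    protocol_version = "HTTP/1.1"  # keep-alive, so unread body bytes would matter

    def do_POST(self):
        # Consume the whole body even though the reply does not depend on it.
        remaining = int(self.headers.get("Content-Length", 0))
        while remaining > 0:
            chunk = self.rfile.read(min(remaining, 65536))
            if not chunk:
                break  # client went away before sending everything
            remaining -= len(chunk)
        body = json.dumps({"value": {"error": "invalid session id"}}).encode("utf-8")
        self.send_response(404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def demo():
    """Send two POSTs over one connection and return the two status codes."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), DrainingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    statuses = []
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        payload = json.dumps({"using": "xpath", "value": "//div"}).encode("utf-8")
        for _ in range(2):
            conn.request("POST", "/session/stale/element", body=payload,
                         headers={"Content-Type": "application/json"})
            response = conn.getresponse()
            response.read()  # finish the response so the connection can be reused
            statuses.append(response.status)
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return statuses


if __name__ == "__main__":
    print(demo())
```

The second request succeeding on the same connection is the point of the sketch: because the first body was fully consumed, no leftover bytes sit in front of the next request.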
@github-actions github-actions bot locked and limited conversation to collaborators Apr 30, 2023