When a Windows client is killed, Grid nodes can leave sessions/browsers open indefinitely #2332
Comments
@jessehudl bump - I'd like to see a PR raised for this (submitted for 3.x), assuming it is still an issue.
@jessehudl
This has some good ideas (and code) for killing remaining/crashed browsers before each test run:
Sorry guys, I took another job, so I wasn't getting these notifications; they were tied to my old email. @cgoldberg: the change I proposed, and implemented at my now-former job, was to remove the session deletions on the hub in the cases where the client disconnects unexpectedly. When the client session was terminated or killed, the hub would delete the session on the hub only, not on the node. This left sessions and browsers open indefinitely on the nodes, causing resource exhaustion and slowness. The comment I quoted in the original post seems to confirm this is intentional, in
So it removes sessions from the Grid but not the node. I'm not sure what the intended use case for this is, which is why I opened the discussion. Perhaps it was simply the benefit of freeing the Grid session immediately, but I don't see why no attempt would be made to free the session on the node. Cleaning up the session on both the hub and the node might be an even better fix than what I proposed. I think I did investigate that approach, but didn't see an easy way to kill the session on the node from this area of the code. This was the first time I ever looked at the Grid code, and I'm no longer working on this or using Selenium Grid regularly. Selenium Grid Extras is fine as an add-on, but if we could fix this in the core code, why not? Maybe this is a bug that the Extras is working around? I have no problem opening a PR for the fix I originally proposed if @mach6 agrees it's the best approach.
I tried reproducing this tonight with v3.3.1 and was unable to. It appears that this scenario is covered by
Running the same repro steps outlined in the OP I see the hub detect the client gone first as usual (the node is again not affected):
However the node does now detect a client timeout and closes the session (and the browser):
I also suspect commit 705be71 fixed this issue, though I'm not certain. I think this issue can be closed, as it no longer occurs on v3.3.1, unless the fix should be backported to 2.x.
Checked again in 3.8.1: the issue seems to be fixed. Yes, the hub starts all pending sessions, but the node kills them by timeout, one by one.
Meta -
OS: Windows
Selenium Version: 2.53
Expected Behavior -
When a client has disconnected for any reason, an attempt is always made to clean up the session on the remote node, at the latest when the idle timeout expires.
Actual Behavior -
If the Selenium Grid hub detects a client "is gone" or "Socket timed out", it terminates/frees the session on the Grid, but no attempt is made to delete the session on the remote node. This is handled in RequestHandler.java:#L131-136:
registry.terminate() specifies this in the docstring:
This means sessions are left on the node and are never cleaned up, which can leave dozens of web browsers running and eventually lead to resource exhaustion on the node and slow or failing tests. maxSession does not matter in this case.
We see this in our environment when our continuous integration server (TeamCity) kills a build on a Windows agent during a Selenium test, either manually or because the build overran its allowed run time.
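To make the difference between the actual and expected behavior concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not the Grid's code: the `Hub` and `Node` classes, their methods, and the `leaked_browsers` helper are all hypothetical names, and time is passed in explicitly instead of using real timers.

```python
class Node:
    """Hypothetical stand-in for a Grid node: tracks running browser sessions."""

    def __init__(self):
        self.browsers = set()

    def start(self, session_id):
        self.browsers.add(session_id)

    def stop(self, session_id):
        self.browsers.discard(session_id)


class Hub:
    """Hypothetical stand-in for the hub's session registry."""

    def __init__(self, node, idle_timeout):
        self.node = node
        self.idle_timeout = idle_timeout
        self.sessions = {}  # session id -> time of last client activity

    def new_session(self, session_id, now):
        self.sessions[session_id] = now
        self.node.start(session_id)

    def client_gone(self, session_id, terminate_immediately):
        if terminate_immediately:
            # Reported behavior: the hub forgets the session,
            # but the node is never told to stop the browser.
            self.sessions.pop(session_id, None)
        # Otherwise the session stays registered and is left to the idle sweep.

    def sweep(self, now):
        # Idle-timeout cleanup: stops the browser on the node as well.
        for session_id, last_seen in list(self.sessions.items()):
            if now - last_seen >= self.idle_timeout:
                self.node.stop(session_id)
                del self.sessions[session_id]


def leaked_browsers(terminate_immediately):
    """Return the browsers still running on the node after the idle timeout."""
    node = Node()
    hub = Hub(node, idle_timeout=10)
    hub.new_session("s1", now=0)
    hub.client_gone("s1", terminate_immediately)
    hub.sweep(now=11)
    return sorted(node.browsers)
```

In this model, `leaked_browsers(True)` leaves `"s1"` running on the node because the sweep no longer knows about the session, while `leaked_browsers(False)` returns an empty list because the idle sweep still sees the session and stops it on the node.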
Steps to reproduce -
I can only reproduce this using Windows as a client. I am not sure exactly why; there may be something different in the way it handles connections. I imagine it's possible on other operating systems, but not as easy to reproduce.
Since this is a Grid hub specific issue, there are a few steps:
Start a grid server with an idle timeout of 10 seconds:
Start a Grid node on the same machine:
On a Windows machine from command prompt run a script equivalent to this (replace localhost with the Grid IP, if different):
While the script is running, close the terminal window (do not press Ctrl-C or kill the process).
You should see this in the log:
This means the session is freed on the Grid, but the session is left running on the node. You can confirm this by browsing to your node's URL (e.g. http://localhost:5555/wd/hub/static/resource/hub.html).
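If you would rather check from a script than a browser, something like the following may work. It is a rough sketch using only the Python standard library, and it assumes the node answers the JSON Wire Protocol "list sessions" command at `/wd/hub/sessions`; I have not verified that against every Grid version. The function names are hypothetical, and the default URL is the example node address from above.

```python
import json
from urllib.request import urlopen


def session_ids(payload):
    """Extract session ids from a JSON Wire Protocol GET /sessions response body."""
    value = json.loads(payload).get("value") or []
    return [entry["id"] for entry in value]


def list_node_sessions(node_url="http://localhost:5555"):
    """Ask the node which sessions it still has open (endpoint path is an assumption)."""
    with urlopen(node_url + "/wd/hub/sessions", timeout=5) as response:
        return session_ids(response.read().decode("utf-8"))
```

Calling `list_node_sessions()` after the client is gone and the idle timeout has passed should return a non-empty list if the session was left behind on the node.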
Solutions -
I have a (very small) commit on my forked branch which removes the session.terminate() calls when handling these types of client disconnects - jessehudl@fe5fafb
Removing the session.terminate() calls allows the Grid to clean up the sessions like it usually would, after the idle timeout has expired. If you re-run the "Steps to reproduce", you will see that after the client is gone, the session is still cleaned up after 10 seconds and the browser is closed on the node.
This is working well for us internally, but I wanted to gain some insight from the core devs before submitting a PR: