-
-
Notifications
You must be signed in to change notification settings - Fork 8.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[grid] Unable to find session with ID when Node drain-after-session-count
#15370
base: trunk
Are you sure you want to change the base?
Conversation
Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
CI Feedback 🧐(Feedback updated until commit df987e2)A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
|
@@ -77,8 +76,7 @@ public class NodeServer extends TemplateGridServerCommand { | |||
private final AtomicBoolean nodeRegistered = new AtomicBoolean(false); | |||
private Node node; | |||
private EventBus bus; | |||
private final Thread shutdownHook = | |||
new Thread(() -> bus.fire(new NodeRemovedEvent(node.getStatus()))); | |||
private final Thread shutdownHook = new Thread(() -> node.drain()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After checking the code briefly, I see that NodeRemovedEvent
is processed in several places. I don't dislike shutting down the Node by draining it. We need to make sure we respect the current logic, which is to check where NodeRemovedEvent
is used and adapt to it, or fire a NodeRemovedEvent
when the Node drains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also checked, in drain()
it will fire 2 events are NodeDrainStarted
and NodeDrainComplete
Get back NodeServer
, it also listen on event NodeDrainComplete
for System.exit(0)
Do you think this event cycle is safe enough?
IMHO, this will not fix #15347, will report details later in the original issue. |
It might not fix 15347. However, sometimes I can reproduce with a strict condition, something like
|
Without looking deeper into this i would assume the following might happen:
Are you able to run a patched version of the Hub? If so, just remove the |
@VietND96 could you share the hub logs when this happens? |
Hub logs that I could collect for one case
At between, you can see gaps 2 seconds, that is the wait time (by time.sleep in python) before calling |
Could you share more of the logs, it is allways good to see what happened some time before and concurrently by other sessions. |
I attched full logs for id e33534f26859916b894c37fcb5da30df |
Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>
drain-after-session-count
@joerg1985, do you think currentSessions.cleanUp() in selenium/java/src/org/openqa/selenium/grid/node/local/LocalNode.java Lines 1018 to 1023 in f779ad0
I think Node drain behavior should update Node status to DRAINING but still waiting for sessions to be completed, session only stops on demand (client call quit() ), or session gets timed out.
|
Yes, It should only perform allready outstanding maintenance operations. When reading the caffeine issues, there are alot of 'there is no guarantee for this and that' statements. |
User description
Thanks for contributing to Selenium!
A PR well described will help maintainers to quickly review and merge it
Before submitting your PR, please check our contributing guidelines.
Avoid large PRs, help reviewers by making them as simple and short as possible.
Motivation and Context
LocalSessionMap
starts removing sessions on the eventNodeRemovedEvent
Check all places where Eventbus fires that event, there is
NodeServer
atshutdownHook
Not sure when the httpd server gets restarted due to a glitch (where
LocalNode
might still be running), this event causesLocalSessionMap
removing the session unexpectedly which leads to the issue. -> Do we need to remove this shutdownHook in NodeServer? Let theLocalNode
do thedrain()
procedure for shutdown.Currently, I updated it to
node.drain()
to trigger the LocalNodedrain()
, where updating Node status toDRAINING
and waiting for pending tasks to be completed before shutting down completely.Types of changes
Checklist
PR Type
Bug fix
Description
Replaced
NodeRemovedEvent
withnode.drain()
inNodeServer
shutdown hook.Ensures proper session handling during node shutdown.
Prevents unexpected session removal during server glitches or restarts.
Changes walkthrough 📝
NodeServer.java
Updated NodeServer shutdown hook for graceful shutdown
java/src/org/openqa/selenium/grid/node/httpd/NodeServer.java
NodeRemovedEvent
firing in the shutdown hook.node.drain()
call to handle node draining during shutdown.