Skip to content

Conversation

@VietND96
Copy link
Member

@VietND96 VietND96 commented Jan 18, 2026

User description

🔗 Related Issues

💥 What does this PR do?

reserveSlot() with write lock is exclusive - only one thread can hold it. All other threads must wait, even though selectSlot() is just reading data. Now, use read lock for slot selection (allows concurrent reads), write lock only for actual reservation (brief, atomic). The procedure would be

  1. selectSlot() is a read-only operation - it queries the snapshot, doesn't modify it
  2. Multiple threads can select slots in parallel - they may pick the same slot, but that's OK
  3. reserve() uses write lock briefly - only during the actual reservation
  4. LocalGridModel.reserve() now validates slot is free - so only one thread succeeds per slot
  5. Failed reservations try next slot - from their pre-selected list

Improve

  • Higher throughput - concurrent session requests don't block each other during selection
  • Lower latency - threads spend less time waiting for locks

When multiple concurrent session requests arrive, the Grid can send multiple requests to the same Node slot, causing the Node to reject with "Max session count reached" even when slots should be available. (point 2 - they may pick the same slot).
Few reasons: The Distributor maintains a snapshot of slot states, but multiple threads could:

  1. Select the same slot during concurrent reads
  2. All successfully "reserved" the slot (no validation of current slot state)
  3. All send requests to the Node, overwhelming it
  • LocalGridModel.java - Prevent Double Reservation
    Only the first thread to acquire write lock can reserve a slot. Subsequent threads see session != null and try the next slot.
  • SessionSlot.java - Thread Safety & Logging
    volatile ensures visibility across threads. Logging moved before stop for better debugging.
  • HandleSession.java - Fix TOCTOU Race in HTTP Client Cleanup
    computeIfPresent provides atomic check-and-remove, preventing race where inUse could be incremented between check and removal.

🔧 Implementation Notes

💡 Additional Considerations

🔄 Types of changes

  • Cleanup (formatting, renaming)
  • Bug fix (backwards compatible)
  • New feature (non-breaking change which adds functionality and tests!)
  • Breaking change (fix or feature that would cause existing functionality to change)

PR Type

Bug fix


Description

  • Optimize locking strategy in slot selection using read locks for concurrent reads

  • Prevent double reservation by filtering already-reserved slots during selection

  • Add volatile modifier to ensure thread-safe visibility of session state

  • Fix TOCTOU race condition in HTTP client cleanup using atomic operations


Diagram Walkthrough

flowchart LR
  A["Concurrent Session Requests"] -->|"Read Lock"| B["Slot Selection"]
  B -->|"Write Lock"| C["Atomic Reservation"]
  C -->|"Check Session == null"| D["Reserve Slot"]
  D -->|"Success"| E["Session Created"]
  D -->|"Already Reserved"| F["Try Next Slot"]
  G["HTTP Client Cleanup"] -->|"computeIfPresent"| H["Atomic Check-Remove"]
Loading

File Walkthrough

Relevant files
Bug fix
LocalDistributor.java
Optimize locking strategy for concurrent slot selection   

java/src/org/openqa/selenium/grid/distributor/local/LocalDistributor.java

  • Split locking strategy: use read lock for slot selection, write lock
    only for reservation
  • Reduces contention by allowing concurrent reads of available slots
  • Slot selection now happens outside write lock critical section
  • Multiple threads can select same slot but only one reserves it
    successfully
+24/-17 
LocalGridModel.java
Prevent double reservation with session state check           

java/src/org/openqa/selenium/grid/distributor/local/LocalGridModel.java

  • Add filter to check slot.getSession() == null during reservation
  • Prevents reserving already-occupied slots when multiple threads race
  • Change log level from WARNING to FINE for expected race condition case
  • Improve log message clarity to indicate slot already reserved
+7/-4     
SessionSlot.java
Add volatile modifier and improve logging in SessionSlot 

java/src/org/openqa/selenium/grid/node/local/SessionSlot.java

  • Add volatile modifier to currentSession field for thread-safe
    visibility
  • Reorder logging in stop() method to log before session stops for
    better debugging
  • Improve log messages with session ID for better traceability
  • Ensure memory visibility across threads when session state changes
+5/-3     
HandleSession.java
Fix TOCTOU race in HTTP client cleanup with atomic operations

java/src/org/openqa/selenium/grid/router/HandleSession.java

  • Replace iterator-based cleanup with computeIfPresent for atomic
    operations
  • Fix TOCTOU race condition where inUse could be incremented between
    check and removal
  • Simplify logic using functional approach with forEach and
    computeIfPresent
  • Remove unused Iterator import
+26/-23 

Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>
@selenium-ci selenium-ci added B-grid Everything grid and server related C-java Java Bindings labels Jan 18, 2026
@qodo-code-review
Copy link
Contributor

qodo-code-review bot commented Jan 18, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🟡
🎫 #1234
🔴 Fix regression where click() in Selenium 2.48.x does not trigger JavaScript in a link’s
href (works in 2.47.1).
Ensure behavior is correct when using Firefox 42.0 (as reported).
Provide/validate with the referenced reproduction case (linked videos/test case).
🟡
🎫 #5678
🔴 Investigate and fix repeated ChromeDriver instantiation errors showing ConnectFailure
(Connection refused) after the first instance.
Ensure correct behavior on Ubuntu 16.04.4 / Chrome 65 / ChromeDriver 2.35 / Selenium 3.9.0
as reported.
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit context: New logging around slot selection does not include any actor/user identity (if applicable)
so it may be insufficient as an audit trail depending on how Grid requests are
authenticated and attributed.

Referred Code
LOG.log(
    getDebugLogLevel(),
    String.format("No slots found for request %s and capabilities %s", requestId, caps));
return null;

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Potential info exposure: The new debug log message prints full Capabilities (caps) which may contain internal
details or sensitive values, and it is unclear if this log can reach user-accessible
outputs.

Referred Code
LOG.log(
    getDebugLogLevel(),
    String.format("No slots found for request %s and capabilities %s", requestId, caps));
return null;

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Capabilities logged: The new log line includes %s for caps, and Selenium capabilities can include potentially
sensitive configuration (e.g., extension data, binary paths, custom metadata), so logging
them verbatim may violate secure logging requirements.

Referred Code
LOG.log(
    getDebugLogLevel(),
    String.format("No slots found for request %s and capabilities %s", requestId, caps));
return null;

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Unredacted external input: Capabilities originate from external clients and are now logged without
sanitization/redaction, so additional review is needed to ensure no secrets or
user-provided sensitive data can be persisted to logs.

Referred Code
LOG.log(
    getDebugLogLevel(),
    String.format("No slots found for request %s and capabilities %s", requestId, caps));
return null;

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link
Contributor

qodo-code-review bot commented Jan 18, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid potential concurrent modification exceptions
Suggestion Impact:The commit removed the forEach/computeIfPresent pattern and implemented conditional removal via httpClients.entrySet().removeIf(), including closing stale clients before removal.

code diff:

-          // Use computeIfPresent for atomic check-and-remove to avoid TOCTOU race condition
-          // where inUse could be incremented between check and removal
-          httpClients.forEach(
-              (uri, entry) -> {
-                httpClients.computeIfPresent(
-                    uri,
-                    (key, currentEntry) -> {
-                      // Atomically check conditions and remove if stale
-                      if (currentEntry.inUse.get() != 0) {
-                        // the client is currently in use
-                        return currentEntry;
-                      } else if (!currentEntry.lastUse.isBefore(staleBefore)) {
-                        // the client was recently used
-                        return currentEntry;
-                      } else {
-                        // the client has not been used for a while, remove it from the cache
-                        try {
-                          currentEntry.httpClient.close();
-                        } catch (Exception ex) {
-                          LOG.log(Level.WARNING, "failed to close a stale httpclient", ex);
-                        }
-                        return null; // removes entry from map
-                      }
-                    });
+          // Use removeIf for safe and efficient removal from ConcurrentHashMap
+          httpClients.entrySet().removeIf(
+              entry -> {
+                CacheEntry cacheEntry = entry.getValue();
+                if (cacheEntry.inUse.get() != 0) {
+                  // the client is currently in use
+                  return false;
+                }
+                if (!cacheEntry.lastUse.isBefore(staleBefore)) {
+                  // the client was recently used
+                  return false;
+                }
+                // the client has not been used for a while, close and remove it
+                try {
+                  cacheEntry.httpClient.close();
+                } catch (Exception ex) {
+                  LOG.log(Level.WARNING, "failed to close a stale httpclient", ex);
+                }
+                return true;
               });

Replace the forEach and nested computeIfPresent calls with
Map.entrySet().removeIf() to safely and efficiently remove entries from the
httpClients map, avoiding potential concurrent modification issues.

java/src/org/openqa/selenium/grid/router/HandleSession.java [121-145]

-// Use computeIfPresent for atomic check-and-remove to avoid TOCTOU race condition
-// where inUse could be incremented between check and removal
-httpClients.forEach(
-    (uri, entry) -> {
-      httpClients.computeIfPresent(
-          uri,
-          (key, currentEntry) -> {
-            // Atomically check conditions and remove if stale
-            if (currentEntry.inUse.get() != 0) {
-              // the client is currently in use
-              return currentEntry;
-            } else if (!currentEntry.lastUse.isBefore(staleBefore)) {
-              // the client was recently used
-              return currentEntry;
-            } else {
-              // the client has not been used for a while, remove it from the cache
-              try {
-                currentEntry.httpClient.close();
-              } catch (Exception ex) {
-                LOG.log(Level.WARNING, "failed to close a stale httpclient", ex);
-              }
-              return null; // removes entry from map
+httpClients
+    .entrySet()
+    .removeIf(
+        entry -> {
+          CacheEntry currentEntry = entry.getValue();
+          if (currentEntry.inUse.get() == 0 && currentEntry.lastUse.isBefore(staleBefore)) {
+            try {
+              currentEntry.httpClient.close();
+            } catch (Exception ex) {
+              LOG.log(Level.WARNING, "failed to close a stale httpclient", ex);
             }
-          });
-    });
+            return true; // remove this entry
+          }
+          return false; // keep this entry
+        });

[Suggestion processed]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that modifying a map within a forEach loop is problematic and proposes using removeIf, which is the idiomatic and safer approach for conditional removal from a collection while iterating.

Medium
Learned
best practice
Make stop idempotent and safe

Use reserved.compareAndSet(true, false) to make stop idempotent and capture
currentSession into a local variable before stopping to avoid races/NPEs.

java/src/org/openqa/selenium/grid/node/local/SessionSlot.java [129-146]

 public void stop(SessionClosedReason reason) {
-  if (isAvailable()) {
+  if (!reserved.compareAndSet(true, false)) {
     return;
   }
 
-  SessionId id = currentSession.getId();
+  ActiveSession session = currentSession;
+  if (session == null) {
+    return;
+  }
+
+  SessionId id = session.getId();
   LOG.info(String.format("Stopping session %s (reason: %s)", id, reason));
   try {
-    currentSession.stop();
+    session.stop();
     LOG.info(String.format("Session stopped successfully: %s", id));
   } catch (Exception e) {
     LOG.log(Level.WARNING, String.format("Unable to cleanly close session %s", id), e);
+  } finally {
+    currentSession = null;
+    connectionCounter.set(0);
+    bus.fire(new SessionClosedEvent(id, reason));
   }
-  currentSession = null;
-  connectionCounter.set(0);
-  release();
-  bus.fire(new SessionClosedEvent(id, reason));
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

__

Why:
Relevant best practice - Make lifecycle/concurrency-sensitive cleanup idempotent and race-safe by using atomic state transitions before operating on shared resources.

Low
  • Update

Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>
@VietND96 VietND96 force-pushed the session-distribution branch from 67d8350 to 5f84284 Compare January 18, 2026 09:41
@VietND96 VietND96 merged commit 691fb1c into trunk Jan 18, 2026
37 checks passed
@VietND96 VietND96 deleted the session-distribution branch January 18, 2026 10:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

B-grid Everything grid and server related C-java Java Bindings Review effort 3/5

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants