-
Notifications
You must be signed in to change notification settings - Fork 12k
Description
Before Creating the Bug Report
-
I found a bug, not just asking a question, which should be created in GitHub Discussions.
-
I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.
-
I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.
Runtime platform environment
All platforms
RocketMQ version
develop branch (latest)
JDK Version
JDK 8+
Describe the Bug
In MQClientInstance.java, the brokerVersionTable (a ConcurrentHashMap) is accessed using a non-atomic check-then-act pattern in multiple locations:
Location 1 - sendHeartbeatToBroker() (line ~657):
if (!this.brokerVersionTable.containsKey(brokerName)) {
this.brokerVersionTable.put(brokerName, new ConcurrentHashMap<>(4));
}
this.brokerVersionTable.get(brokerName).put(addr, version);Location 2 - sendHeartbeatToAllBrokerV2() (line ~737):
Same pattern as Location 1.
Location 3 - findBrokerVersion() (line ~1304):
if (this.brokerVersionTable.containsKey(brokerName)) {
if (this.brokerVersionTable.get(brokerName).containsKey(brokerAddr)) {
return this.brokerVersionTable.get(brokerName).get(brokerAddr);
}
}Between containsKey() and get(), another thread could remove the entry, causing a NullPointerException when the result of get() is dereferenced.
Notably, a similar operation in the same class at line ~814 already uses the correct pattern:
ConcurrentHashMap<String, Integer> inner = MQClientInstance.this.brokerVersionTable
.computeIfAbsent(brokerName, k -> new ConcurrentHashMap<>(4));
inner.put(brokerAddr, version);Steps to Reproduce
The race condition occurs under concurrent heartbeat sending and broker disconnection. When a broker is removed from the version table concurrently with a heartbeat send or version lookup, a NullPointerException can occur.
What Did You Expect to See?
Thread-safe access to brokerVersionTable using atomic operations like computeIfAbsent and local variable caching.
What Did You See Instead?
Non-atomic check-then-act pattern that can result in NullPointerException under concurrent access.
Additional Context
Related to previously reported #8743 (closed by stale bot without fix).