srmmanager: fix race condition in LoginBrokerSubscriber
Motivation:

The `LoginBrokerSubscriber` class is responsible for maintaining a list of doors. It is currently used by the `srmmanager`, `frontend` and `httpd` services.

Instances of `LoginBrokerSubscriber` receive updates from doors in the form of `LoginBrokerInfo` objects. These are immutable objects that describe the door's current status.

`LoginBrokerSubscriber` maintains three distinct mappings that hold the `LoginBrokerInfo` objects. The `doorsByIdentity` map holds all `LoginBrokerInfo` objects. There are also two by-protocol mappings (one for doors supporting writing/uploading; the other for doors supporting reading/downloading). In addition, there is an ordered queue to support automatic removal of `LoginBrokerInfo` objects if no update is heard from a door for "a while".

Only `srmmanager` uses the two by-protocol maps. The `frontend` and `httpd` services use only the `doorsByIdentity` mapping.

The `LoginBrokerSubscriber` code assumes that any `LoginBrokerInfo` object that exists in either of the two by-protocol mappings also exists in the `doorsByIdentity` mapping. However, when accepting information from a door, the `LoginBrokerInfo` object is first added to the `doorsByIdentity` mapping before it is added to the by-protocol mappings. This violates that assumption for "a short while".

Note that the `srmmanager` cell can process incoming messages with multiple threads and that some doors may send multiple `LoginBrokerInfo` objects in rapid succession, particularly on start-up. Also, `DelayQueue` uses locking to prevent concurrent updates.

In the current code, when receiving a `LoginBrokerInfo` object, the object is added to the `doorsByIdentity` mapping (retaining any existing object so it may be removed), then added to the expiry queue, then the `ByProtocolMap` mappings are updated.
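
As an illustration only, the state and update order described above might look like the following Java sketch. The names `DoorInfo`, `ExpiryEntry`, `DoorRegistrySketch` and `onUpdate` are hypothetical and are not dCache classes; the sketch also assumes that every door supports both reading and writing over a single protocol.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified stand-in for LoginBrokerInfo (immutable). */
final class DoorInfo {
    final String identity; // which door sent this update
    final String protocol; // e.g. "https"

    DoorInfo(String identity, String protocol) {
        this.identity = identity;
        this.protocol = protocol;
    }
}

/** Expiry-queue entry: becomes available once its time-to-live has elapsed. */
final class ExpiryEntry implements Delayed {
    final DoorInfo info;
    private final long deadlineNanos;

    ExpiryEntry(DoorInfo info, long ttlMillis) {
        this.info = info;
        this.deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(ttlMillis);
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(deadlineNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
}

/** Sketch of the three mappings, the expiry queue and the described update order. */
final class DoorRegistrySketch {
    final Map<String, DoorInfo> doorsByIdentity = new ConcurrentHashMap<>();
    final Map<String, Set<DoorInfo>> readDoorsByProtocol = new ConcurrentHashMap<>();
    final Map<String, Set<DoorInfo>> writeDoorsByProtocol = new ConcurrentHashMap<>();
    final DelayQueue<ExpiryEntry> expiryQueue = new DelayQueue<>();

    void onUpdate(DoorInfo info, long ttlMillis) {
        // 1. Publish in the identity map, keeping the replaced object (if any).
        DoorInfo previous = doorsByIdentity.put(info.identity, info);
        // 2. Schedule expiry; DelayQueue takes an internal lock, so this may block.
        expiryQueue.add(new ExpiryEntry(info, ttlMillis));
        // 3. Update both by-protocol maps: add the new object, then drop the old one.
        add(readDoorsByProtocol, info);
        add(writeDoorsByProtocol, info);
        if (previous != null) {
            remove(readDoorsByProtocol, previous);
            remove(writeDoorsByProtocol, previous);
        }
    }

    static void add(Map<String, Set<DoorInfo>> map, DoorInfo info) {
        map.computeIfAbsent(info.protocol, p -> ConcurrentHashMap.newKeySet()).add(info);
    }

    static boolean remove(Map<String, Set<DoorInfo>> map, DoorInfo info) {
        Set<DoorInfo> doors = map.get(info.protocol);
        return doors != null && doors.remove(info);
    }
}
```

On a single thread this order is harmless: after two updates from the same door, only the newer object remains in all three mappings. Each step is individually thread-safe, but nothing makes the three steps atomic as a group.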
Under `DelayQueue` lock contention, and when `door-1` sends two `LoginBrokerInfo` objects (`info-A` and `info-B`) in rapid succession, the following may happen:

1. Thread-A puts `LoginBrokerInfo` `info-A` (from `door-1`) into `doorsByIdentity`.
2. Thread-A attempts to add `info-A` into the `DelayQueue`; this blocks.
3. Thread-B puts `LoginBrokerInfo` `info-B` (from `door-1`) into `doorsByIdentity`. The put call returns `info-A`: stale info that should be removed.
4. Thread-B attempts to add `info-B` into the `DelayQueue`; this blocks.
5. Thread-B's call to `DelayQueue#add` is processed.
6. Thread-B adds `info-B` to the two `ByProtocolMap` mappings.
7. Thread-B attempts to remove `info-A` from the two `ByProtocolMap` mappings. This does nothing because Thread-A hasn't yet added `info-A`.
8. Thread-A's call to `DelayQueue#add` is processed.
9. Thread-A adds `info-A` to the two `ByProtocolMap` mappings.

The result is that `info-A` exists in the two `ByProtocolMap` mappings but not in the `doorsByIdentity` mapping.

The `info-A` object does not appear in the output of the `lb ls` admin command because it does not exist in `doorsByIdentity`; however, `info-A` continues to affect TURL selection as it exists in the `ByProtocolMap` mappings.

When it is time for `info-A` to expire (in the `DelayQueue`), the removal fails because `info-A` does not exist in the `doorsByIdentity` mapping. The code assumes that, if a `LoginBrokerInfo` object doesn't exist in `doorsByIdentity`, then it also doesn't exist in either of the two `ByProtocolMap` mappings, so there's nothing to remove.

Modification:

Update the order in which a `LoginBrokerInfo` object is added to the maps. This guarantees that any object in either of the two `ByProtocolMap` mappings also exists in `doorsByIdentity`.

Add a safety feature where the code will attempt to remove an expired `LoginBrokerInfo` object from the two `ByProtocolMap` mappings even if it doesn't exist in `doorsByIdentity`. Warn if this actually does something.

Result:

A race condition is fixed that, if triggered, results in a memory leak.
This leak can also affect the TURLs returned by SrmManager, where out-of-date information about doors is used.

Target: master
Requires-notes: yes
Requires-book: no
Request: 7.1
Request: 7.0
Request: 6.2
Request: 6.1
Request: 6.0
Request: 5.2
Closes: #5972
Patch: https://rb.dcache.org/r/13113/
Acked-by: Lea Morschel
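
For illustration, the interleaving from the Motivation section can be replayed deterministically on a single thread, which makes the leaked state easy to inspect. The Java sketch below is a hypothetical simplification and not the code from the patch: `RaceReplaySketch` and its methods are invented names, strings stand in for `LoginBrokerInfo` objects, one set stands in for both `ByProtocolMap` mappings, and the blocking `DelayQueue#add` calls are represented only by the ordering of the statements.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class RaceReplaySketch {
    final Map<String, String> doorsByIdentity = new HashMap<>();
    // Assumption: a single set stands in for both by-protocol mappings.
    final Set<String> byProtocol = new HashSet<>();

    /** Replays the nine steps of the problematic interleaving in order. */
    void replayInterleaving() {
        // Thread-A: put info-A, then block on the expiry queue.
        String replacedByA = doorsByIdentity.put("door-1", "info-A"); // null: nothing replaced
        // Thread-B: put info-B; the put returns the stale info-A.
        String replacedByB = doorsByIdentity.put("door-1", "info-B");
        // Thread-B: queue add completes; by-protocol is updated.
        byProtocol.add("info-B");
        byProtocol.remove(replacedByB); // no-op: info-A has not been added yet
        // Thread-A: queue add completes; info-A is added with nothing left to remove it.
        byProtocol.add("info-A");
        if (replacedByA != null) {
            byProtocol.remove(replacedByA);
        }
    }

    /** Expiry as described for the old code: skip by-protocol unless the identity map matched. */
    boolean expireStrict(String door, String info) {
        if (doorsByIdentity.remove(door, info)) {
            byProtocol.remove(info);
            return true;
        }
        return false;
    }

    /** Expiry with the safety feature; returns true if a stray entry had to be cleaned up. */
    boolean expireWithSafetyNet(String door, String info) {
        boolean known = doorsByIdentity.remove(door, info);
        boolean removed = byProtocol.remove(info);
        boolean stray = removed && !known;
        if (stray) {
            System.err.println("Removed " + info + " from by-protocol maps although it was"
                    + " missing from doorsByIdentity");
        }
        return stray;
    }
}
```

After `replayInterleaving()`, `doorsByIdentity` holds only `info-B` while `byProtocol` holds both objects. In this sketch `expireStrict("door-1", "info-A")` leaves `info-A` behind, whereas `expireWithSafetyNet("door-1", "info-A")` removes it and logs a warning.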