[keyring-controller] Lock controller mutex on write operations #4182
} finally {
releaseLock();
if (!this.#controllerOperationMutex.isLocked()) {
throw new Error(KeyringControllerError.ControllerLockRequired);
This should ensure that all calls to `#updateVault` are done after locking on the controller mutex, enforcing mutually exclusive operations in mutable methods
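The guard described in this comment can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the controller's actual code: `TinyMutex`, `VaultOwner`, `addAccount`, and the error message are hypothetical stand-ins, and only the idea of checking `isLocked()` before touching the vault comes from the diff above.

```typescript
// Hypothetical minimal mutex, standing in for the controller's real one.
class TinyMutex {
  private locked = false;
  private waiters: Array<() => void> = [];

  isLocked(): boolean {
    return this.locked;
  }

  // Resolves with a release function once the caller holds the lock.
  acquire(): Promise<() => void> {
    return new Promise<() => void>((resolve) => {
      const grant = (): void => {
        this.locked = true;
        let released = false;
        resolve(() => {
          if (released) {
            return;
          }
          released = true;
          const next = this.waiters.shift();
          if (next) {
            next(); // hand the lock directly to the next waiter
          } else {
            this.locked = false;
          }
        });
      };
      if (this.locked) {
        this.waiters.push(grant);
      } else {
        grant();
      }
    });
  }
}

// Hypothetical class; all names here are illustrative only.
class VaultOwner {
  readonly mutex = new TinyMutex();
  vaultWrites = 0;

  // Internal write step: refuses to run unless the lock is currently held.
  updateVault(): void {
    if (!this.mutex.isLocked()) {
      throw new Error('Controller lock required');
    }
    this.vaultWrites += 1;
  }

  // Public write operation: takes the lock, writes, then always releases.
  async addAccount(): Promise<void> {
    const release = await this.mutex.acquire();
    try {
      this.updateVault();
    } finally {
      release();
    }
  }
}
```

Note that a check of this kind only verifies that the lock is held by someone, not that the current caller is the holder, so it catches forgotten locking rather than proving exclusive access.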
Makes sense, but it might be worth encoding the knowledge that we have about the bug somewhere, perhaps in the docs for `withControllerLock` or somewhere else.
/**
 * Lock the controller mutex before executing the given function,
 * and release it after the function is resolved or after an
 * error is thrown.
Is it worth explaining why this method exists and what problem it is intended to solve?
Definitely! added some details in ab38237
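As a rough illustration of what the doc comment under discussion describes, a helper of this kind could look like the sketch below. It assumes a simple promise-based mutex (`TinyMutex`, hypothetical) and a free function rather than a private class method; `demo` and the labels are illustrative. The point is that two write operations started concurrently run one after the other, so the vault always reflects the latest completed operation.

```typescript
// Hypothetical minimal mutex, standing in for the controller's real one.
class TinyMutex {
  private locked = false;
  private waiters: Array<() => void> = [];

  isLocked(): boolean {
    return this.locked;
  }

  acquire(): Promise<() => void> {
    return new Promise<() => void>((resolve) => {
      const grant = (): void => {
        this.locked = true;
        let released = false;
        resolve(() => {
          if (released) {
            return;
          }
          released = true;
          const next = this.waiters.shift();
          if (next) {
            next(); // hand the lock directly to the next waiter
          } else {
            this.locked = false;
          }
        });
      };
      if (this.locked) {
        this.waiters.push(grant);
      } else {
        grant();
      }
    });
  }
}

const controllerMutex = new TinyMutex();

// Lock before running fn; release whether fn resolves or throws.
async function withControllerLock<T>(fn: () => Promise<T>): Promise<T> {
  const release = await controllerMutex.acquire();
  try {
    return await fn();
  } finally {
    release();
  }
}

// Starts two write operations at once and records the order of their steps.
async function demo(): Promise<string[]> {
  const log: string[] = [];
  const writeOperation = (label: string): Promise<void> =>
    withControllerLock(async () => {
      log.push(`${label}:start`);
      await Promise.resolve(); // yield, so an unlocked version would interleave
      log.push(`${label}:end`);
    });
  await Promise.all([writeOperation('a'), writeOperation('b')]);
  return log;
}
```

With the lock in place, `demo()` records `a:start`, `a:end`, `b:start`, `b:end`; without it, the two operations would interleave at the `await`.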
I was thinking that this would cause a deadlock if anyone listened for state change events, then called a write operation in response, e.g.
But no, this should be OK because all write operations are async. The triggered operation would just wait for the first to complete. Is that correct?
@Gudahtt In your example, if This is also part of the reason why methods and actions like
Hmm. Event publishing is synchronous, so I do not see how an
In this example test case the
In either case, perhaps it is warranted to add an explicit test case for that, to capture the current behavior?
Ah I see what you are saying. It would wait for the lock to be released, and just be executed afterwards 🤔
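The scenario discussed in this thread can be sketched as follows. This is an illustration under assumptions, not a test from the PR: `TinyMutex`, `withLock`, `firstWrite`, `secondWrite`, and the listener array are all hypothetical. A listener reacts to a synchronously published event by starting another write operation; because that operation is async, it simply queues on the mutex and runs once the first operation releases it.

```typescript
// Hypothetical minimal mutex, standing in for the controller's real one.
class TinyMutex {
  private locked = false;
  private waiters: Array<() => void> = [];

  acquire(): Promise<() => void> {
    return new Promise<() => void>((resolve) => {
      const grant = (): void => {
        this.locked = true;
        let released = false;
        resolve(() => {
          if (released) {
            return;
          }
          released = true;
          const next = this.waiters.shift();
          if (next) {
            next(); // hand the lock directly to the next waiter
          } else {
            this.locked = false;
          }
        });
      };
      if (this.locked) {
        this.waiters.push(grant);
      } else {
        grant();
      }
    });
  }
}

async function demo(): Promise<string[]> {
  const mutex = new TinyMutex();
  const events: string[] = [];
  const listeners: Array<() => void> = [];
  const triggered: Array<Promise<void>> = [];

  const withLock = async (fn: () => Promise<void>): Promise<void> => {
    const release = await mutex.acquire();
    try {
      await fn();
    } finally {
      release();
    }
  };

  const secondWrite = (): Promise<void> =>
    withLock(async () => {
      events.push('second:run');
    });

  const firstWrite = (): Promise<void> =>
    withLock(async () => {
      events.push('first:start');
      // Synchronous publish: listeners run right now, while the lock is held.
      listeners.forEach((listener) => listener());
      events.push('first:end');
    });

  // The listener starts a write but nothing in firstWrite waits for it.
  listeners.push(() => {
    triggered.push(secondWrite());
  });

  await firstWrite();
  await Promise.all(triggered);
  return events;
}
```

In this sketch the recorded order is `first:start`, `first:end`, `second:run`. A deadlock would only appear if the first operation awaited the triggered operation while still holding the lock.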
With the introduction of this new lock, is the vault mutex still useful? It now seems redundant to me. Not a blocker - we can remove it in a later PR, but if we can remove it we should, because it would simplify the controller a great deal and eliminate the chances of a new deadlock involving that mutex being introduced in the future.
Hmm. It would protect against two concurrent write operations to the vault I guess, as sequential changes would be fine. I get the impression that the lock was originally added to prevent concurrent vault operations caused by concurrent calls to write operations, rather than internal calls. I see how it could be a useful protection measure internally, but it comes at a non-trivial cost in complexity and deadlock risk, and I don't see a reason to suspect concurrent write operations are especially likely to occur. We have no similar protection against concurrent keyring updates, which would be just as problematic.
Good point! I added assertions for all those methods 9780406
True, there's a high cost in terms of additional complexity. Perhaps we can remove the vault lock before #4192
Thanks! It looks like there is still one method missing this assertion:
One pending non-blocking suggestion (about adding one last lock assertion), but overall this LGTM!
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Changed

- **BREAKING**: `getAccounts` return type changed from `Promise<string>` to `string` ([#4182](https://github.com/MetaMask/core/pull/4182))
We could consider preserving `async` on `getAccounts`, to avoid this breaking change. But I am OK with this changing, it's very easy to accommodate this change.
Good point, it's very easy to accommodate but we'll have to refactor a lot of code already when releasing the complete set of atomic/exclusive operations on clients. Since it's functionally equivalent I'd say that we can come back to this later and remove the `async`.
I reverted the `async` removal part in 14ab92e
Hmm, I'm thinking that if we intend to release #4192 on clients along with these changes, then we'll have to release #4199 too - which includes a bunch of breaking changes already, so changing `getAccounts` too should not change much in terms of update complexity since we'll have to release a major version anyway
Agreed! That is why I was ambivalent about this
@@ -1540,6 +1570,8 @@ export class KeyringController extends BaseController<
 * when initializing the controller
 */
async #addQRKeyring(): Promise<QRKeyring> {
this.#assertControllerMutexIsLocked();
It makes sense that some methods have steps that need to be performed atomically and exclusively — they can only run as a whole once the mutex is unlocked — but it's interesting to me that we would go the other way and restrict some methods from being performed outside of a mutex (they have to be a step in an atomic method). Why is that? Or am I not understanding this correctly?
The main reason for that is to avoid the addition of methods in the future that could unintentionally call these methods internally without locking: if it's true that all KeyringController mutable operations must be atomic, then it should be fair to assert that all mutable methods (including `#` methods) must be behind the mutex.
this.messagingSystem.publish(`${name}:lock`);
this.messagingSystem.publish(`${name}:lock`);
I notice that for `removeAccount`, the `accountRemoved` is emitted after the lock is released, but in this case the `lock` event is emitted before the lock is released. Do these need to behave the same or does it matter?
Good point! I moved that out of the lock in `removeAccount` for two reasons mainly:
1. I initially had a deadlock concern for side effects running while the mutex was still locked, but this has been proven not to be the case
2. In the subsequent PR for atomic operations `removeAccount` will be wrapped in `this.#persistOrRollback`, and since the state and vault update will be delegated to the wrapper, this function would emit the `:accountRemoved` event before removing the account from the state, so a consumer listening would receive the event while having the controller on its original state
3. (Related to (2)) `setLocked` does not persist to the vault, it just clears the keyrings array, so it will not be wrapped in the `this.#persistOrRollback` function
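The ordering concern in the second reason can be illustrated with the sketch below. Everything in it is hypothetical (`SketchController`, its `persistOrRollback`, the `emitInsideWrapper` flag, and the state shape), since the real wrapper belongs to a later PR. It only assumes that the wrapper commits the new state after the wrapped operation returns, which is why an event published inside the operation would be observed against the old state.

```typescript
type SketchState = { accounts: string[] };
type SketchListener = (removed: string, stateSeen: SketchState) => void;

// Hypothetical controller; it does not mirror the real KeyringController API.
class SketchController {
  state: SketchState = { accounts: ['0xa', '0xb'] };
  private keyringAccounts: string[] = ['0xa', '0xb'];
  private listeners: SketchListener[] = [];

  subscribe(listener: SketchListener): void {
    this.listeners.push(listener);
  }

  private publish(removed: string): void {
    this.listeners.forEach((listener) => listener(removed, this.state));
  }

  // Assumed behavior: run the operation, then commit the new state;
  // on error, restore the previous keyring contents and rethrow.
  private async persistOrRollback(operation: () => Promise<void>): Promise<void> {
    const backup = this.keyringAccounts.slice();
    try {
      await operation();
      this.state = { accounts: this.keyringAccounts.slice() };
    } catch (error) {
      this.keyringAccounts = backup;
      throw error;
    }
  }

  async removeAccount(address: string, emitInsideWrapper: boolean): Promise<void> {
    await this.persistOrRollback(async () => {
      this.keyringAccounts = this.keyringAccounts.filter((a) => a !== address);
      if (emitInsideWrapper) {
        this.publish(address); // state has not been committed yet
      }
    });
    if (!emitInsideWrapper) {
      this.publish(address); // state already reflects the removal
    }
  }
}
```

A listener subscribed to the first variant would still see both accounts in `state`, while one subscribed to the second variant would see the account already removed.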
Good point! I also added it to
This reverts commit 5d8bd35.
## Explanation

This PR is an intermediate refactor needed for #4192. Since operations are eventually rolled back, the internal method will not be able to return the last controller state, risking returning a stale one. Moreover, function returns make more sense now.

This PR needs these changes to be merged **first**:

- [x] #4182

## References

* Related to #4192

## Changelog

### `@metamask/keyring-controller`

- **BREAKING**: Change various `KeyringController` methods so they no longer return the controller state
  - Changed `addNewAccount` return type to `Promise<string>`
  - Changed `addNewAccountWithoutUpdate` return type to `Promise<string>`
  - Changed `createNewVaultAndKeychain` return type to `Promise<void>`
  - Changed `createNewVaultAndRestore` return type to `Promise<void>`
  - Changed `importAccountWithStrategy` return type to `Promise<string>`
  - Changed `removeAccount` return type to `Promise<void>`
  - Changed `setLocked` return type to `Promise<void>`
  - Changed `submitEncryptionKey` return type to `Promise<void>`
  - Changed `submitPassword` return type to `Promise<void>`

## Checklist

- [ ] I've updated the test suite for new or updated code as appropriate
- [ ] I've updated documentation (JSDoc, Markdown, etc.) for new or updated code as appropriate
- [ ] I've highlighted breaking changes using the "BREAKING" category above as appropriate

---------

Co-authored-by: Elliot Winkler <elliot.winkler@gmail.com>
## Explanation

Part of `KeyringController` responsibilities is ensuring each operation is [mutually exclusive](#4182) and [atomic](#4192), updating keyring instances and the vault (or rolling them back) in the same mutex lock. However, the ability of clients to have direct access to a keyring instance represents a loophole, as with the current implementation they don’t have to comply with the rules enforced by the controller: we should provide a way for clients to interact with a keyring instance through safeguards provided by KeyringController.

The current behavior is this one:

1. Client obtains a keyring instance through `getKeyringForAccount`
2. Client interacts with the instance
3. Client calls `persistAllKeyrings`

We should, instead, have something like this:

1. Client calls a `withKeyring` method, passing a _keyring selector_ and a callback
2. KeyringController selects the keyring instance and calls the callback with it, after locking the controller operation mutex
3. Client interacts with the keyring instance safely, inside the callback
4. KeyringController, after the callback execution, internally updates the vault or rolls back changes in case of error, and then releases the mutex lock

## References

* Fixes #4198
* Related to #4192

## Changelog

### `@metamask/keyring-controller`

- **ADDED**: Added `withKeyring` method
- **DEPRECATED**: Deprecated `persistAllKeyrings` method
  - Use `withKeyring` instead

## Checklist

- [ ] I've updated the test suite for new or updated code as appropriate
- [ ] I've updated documentation (JSDoc, Markdown, etc.) for new or updated code as appropriate
- [ ] I've highlighted breaking changes using the "BREAKING" category above as appropriate
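The proposed `withKeyring` flow from the Explanation above could be sketched roughly as follows. This is not the actual implementation: the selector shape (`{ type }`), the `SketchKeyring` type, the JSON string standing in for the encrypted vault, and the snapshot-based rollback are all assumptions made for illustration. Only the sequence of steps (lock, select, run the callback, persist or roll back, release) comes from the description.

```typescript
// Hypothetical minimal mutex, standing in for the controller's real one.
class TinyMutex {
  private locked = false;
  private waiters: Array<() => void> = [];

  isLocked(): boolean {
    return this.locked;
  }

  acquire(): Promise<() => void> {
    return new Promise<() => void>((resolve) => {
      const grant = (): void => {
        this.locked = true;
        let released = false;
        resolve(() => {
          if (released) {
            return;
          }
          released = true;
          const next = this.waiters.shift();
          if (next) {
            next(); // hand the lock directly to the next waiter
          } else {
            this.locked = false;
          }
        });
      };
      if (this.locked) {
        this.waiters.push(grant);
      } else {
        grant();
      }
    });
  }
}

// Assumed keyring shape, for illustration only.
type SketchKeyring = { type: string; accounts: string[] };

class SketchKeyringController {
  private mutex = new TinyMutex();
  private keyrings: SketchKeyring[] = [{ type: 'HD Key Tree', accounts: ['0xa'] }];
  vault = JSON.stringify(this.keyrings); // stand-in for the encrypted vault

  isLocked(): boolean {
    return this.mutex.isLocked();
  }

  async withKeyring<T>(
    selector: { type: string },
    operation: (keyring: SketchKeyring) => Promise<T>,
  ): Promise<T> {
    const release = await this.mutex.acquire(); // lock before selecting
    const snapshot = JSON.stringify(this.keyrings);
    try {
      const keyring = this.keyrings.filter((k) => k.type === selector.type)[0];
      if (!keyring) {
        throw new Error('Keyring not found');
      }
      const result = await operation(keyring); // client code runs under the lock
      this.vault = JSON.stringify(this.keyrings); // persist on success
      return result;
    } catch (error) {
      this.keyrings = JSON.parse(snapshot); // roll back on failure
      throw error;
    } finally {
      release();
    }
  }
}
```

A client would then call something like `controller.withKeyring({ type: 'HD Key Tree' }, async (keyring) => { ... })` instead of fetching the instance and calling `persistAllKeyrings` itself.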
## Explanation

Since all changes to the state of any keyring held by KeyringController should result in an update to the vault, we need to ensure that what we save into the vault is the result of the latest executed operation. In other words, we need to:

Visually:

To do this, this PR adds the `KeyringController.#withControllerLock` helper function, to be used on any write operation that can potentially mutate the controller state (e.g. any function that calls `this.persistAllKeyrings` at the end).

## References

## Changelog

### `@metamask/keyring-controller`

- `KeyringController` method calls that change state are now mutually exclusive

## Checklist