Skip to content

[keyring-controller] Atomic Operations#4192

Merged
mikesposito merged 17 commits intomainfrom
feat/keyring-controller-atomic-operations
Apr 26, 2024
Merged

[keyring-controller] Atomic Operations#4192
mikesposito merged 17 commits intomainfrom
feat/keyring-controller-atomic-operations

Conversation

@mikesposito
Copy link
Member

@mikesposito mikesposito commented Apr 19, 2024

Explanation

This PR makes KeyringController operations atomic, with the addition of #persistOrRollback. Each of these operations will be executed mutually exclusively, and before each of them, a snapshot of the current keyrings state is kept and restored in case of errors in the operation execution.

More details on why this is needed here

This PR needs these changes to be merged first:

References

Changelog

@metamask/keyring-controller

  • FIXED: All mutable operations are now atomic: each method will roll back keyring instances in case of errors

Checklist

  • I've updated the test suite for new or updated code as appropriate
  • I've updated documentation (JSDoc, Markdown, etc.) for new or updated code as appropriate
  • I've highlighted breaking changes using the "BREAKING" category above as appropriate

@mikesposito mikesposito force-pushed the feat/keyring-controller-atomic-operations branch 2 times, most recently from 234fcdf to 13dfb62 Compare April 19, 2024 13:00
@mikesposito mikesposito force-pushed the feat/keyring-controller-atomic-operations branch from 77261a2 to 5819589 Compare April 19, 2024 14:11
);
expect(initialSeedWord).toBeDefined();
expect(initialState).toBe(controller.state);
expect(initialState).toStrictEqual(controller.state);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to understand why this started failing with these changes, but this was suggested by jest

@mikesposito
Copy link
Member Author

mikesposito commented Apr 22, 2024

We should remove return this.#getMemStore() from all function returns, since now it would return a stale version of the state (or alternatively, find a solution for those)

Edit: Refactored here #4199

@mikesposito mikesposito force-pushed the feat/keyring-controller-lock-v2 branch from 029f2c0 to ab38237 Compare April 22, 2024 08:43
@mikesposito mikesposito force-pushed the feat/keyring-controller-atomic-operations branch from 81d5e29 to 6d13430 Compare April 22, 2024 08:59
@mikesposito mikesposito changed the base branch from feat/keyring-controller-lock-v2 to refactor/remove-get-mem-state April 22, 2024 09:19
@mikesposito mikesposito force-pushed the feat/keyring-controller-atomic-operations branch 2 times, most recently from 808ec49 to 871c5fe Compare April 22, 2024 09:32
@mikesposito mikesposito changed the title [keyring-controller]: Atomic Operations [keyring-controller] Atomic Operations Apr 22, 2024
Comment on lines +1660 to +1648
for (const serializedKeyring of serializedKeyrings) {
await this.#restoreKeyring(serializedKeyring);
}
Copy link
Member Author

@mikesposito mikesposito Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restoring keyrings without enforcing the order (await Promise.all()) always represented a risk, that most likely never turned into an actual bug for this reason:

While async operations are handed off from the main thread until they are resolved (e.g. net calls, encryption), deserialize operations on our default set of keyrings are marked as "async" but none of them is a real async operation, so, likely, they always get executed on the main thread synchronously (and sequentially) - async/await won't give JS multithreading powers.

We use a for loop instead of Promise.all here to also eliminate the risk, in case one of the keyrings used has a truly async operation in deserialize.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, init is also called during keyring restore, but since the HD and Simple keyrings don't have an asynchronous init (they don't have an init method at all), the above statement should remain valid

Comment on lines +2074 to +2088
/**
* Execute the given function after acquiring the controller lock
* and save the keyrings to state after it, or rollback to their
* previous state in case of error.
*
* @param fn - The function to execute.
* @returns The result of the function.
*/
async #persistOrRollback<T>(fn: MutuallyExclusiveCallback<T>): Promise<T> {
return this.#withControllerLock(async ({ releaseLock }) => {
const currentSerializedKeyrings = await this.#getSerializedKeyrings();

try {
const callbackResult = await fn({ releaseLock });
// State is committed only if the operation is successful
await this.#updateVault();

return callbackResult;
} catch (e) {
// Keyrings are cleared and restored to their previous state
await this.#clearKeyrings({ skipStateUpdate: true });
await this.#restoreSerializedKeyrings(currentSerializedKeyrings);

throw e;
}
});
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Serializing and restoring (in case it's needed) should be the most reliable way to roll back all keyrings to a previous state.

Copy link
Member Author

@mikesposito mikesposito Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would have been something like:

const backup = this.#keyrings.slice();
// --
this.#keyrings = backup

But this doesn't work for several reasons:

  1. The shallow copies of keyring instances in backup would be different instances than the original ones (which is ok); but if a keyring instance has a class variable with an array of objects, the shallow copy of that instance would have a class variable array with references to the same objects of the original. This is the case with the HD Keyring, so a rollback would not work there because changing the originals would also change the backups
  2. An eventual QRKeyring event subscription from the controller would be left invalid in case of rollback, since the controller would be subscribed to the original instance instead of the shallow copy
  3. Having two living instances of a keyring could create unwanted scenarios (e.g. two keyring instances that communicate with the same device)
  4. Shallow copies could not be initialized properly

Comment on lines 901 to 907
async persistAllKeyrings(): Promise<boolean> {
return this.#withControllerLock(async () => this.#updateVault());
return this.#persistOrRollback(async () => true);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ensures that all vault update operations triggered by clients can also fail safely

@mikesposito mikesposito force-pushed the refactor/remove-get-mem-state branch 2 times, most recently from dc3165d to d9b5134 Compare April 24, 2024 15:56
@mikesposito mikesposito force-pushed the feat/keyring-controller-atomic-operations branch from 871c5fe to 6bc84be Compare April 24, 2024 16:02
Base automatically changed from refactor/remove-get-mem-state to main April 24, 2024 17:13
mikesposito added a commit that referenced this pull request Apr 24, 2024
## Explanation

This PR is an intermediate refactor needed for #4192.

Since operations are eventually rolled back, the internal method will
not be able to return the last controller state, risking returning a
stale one. Moreover, function returns make more sense now.

This PR needs these changes to be merged **first**:
- [x] #4182

<!--
Thanks for your contribution! Take a moment to answer these questions so
that reviewers have the information they need to properly understand
your changes:

* What is the current state of things and why does it need to change?
* What is the solution your changes offer and how does it work?
* Are there any changes whose purpose might not obvious to those
unfamiliar with the domain?
* If your primary goal was to update one package but you found you had
to update another one along the way, why did you do so?
* If you had to upgrade a dependency, why did you do so?
-->

## References

<!--
Are there any issues that this pull request is tied to? Are there other
links that reviewers should consult to understand these changes better?

For example:

* Fixes #12345
* Related to #67890
--
* Related to #4192

## Changelog

<!--
If you're making any consumer-facing changes, list those changes here as
if you were updating a changelog, using the template below as a guide.

(CATEGORY is one of BREAKING, ADDED, CHANGED, DEPRECATED, REMOVED, or
FIXED. For security-related issues, follow the Security Advisory
process.)

Please take care to name the exact pieces of the API you've added or
changed (e.g. types, interfaces, functions, or methods).

If there are any breaking changes, make sure to offer a solution for
consumers to follow once they upgrade to the changes.

Finally, if you're only making changes to development scripts or tests,
you may replace the template below with "None".
-->

### `@metamask/keyring-controller`

- **BREAKING**: Change various `KeyringController` methods so they no
longer return the controller state
  - Changed `addNewAccount` return type to `Promise<string>` 
- Changed `addNewAccountWithoutUpdate` return type to `Promise<string>`
  - Changed `createNewVaultAndKeychain` return type to `Promise<void>`
  - Changed `createNewVaultAndRestore` return type to `Promise<void>`
  - Changed `importAccountWithStrategy` return type to `Promise<string>`
  - Changed `removeAccount` return type to `Promise<void>`
  - Changed `setLocked` return type to `Promise<void>`
  - Changed `submitEncryptionKey` return type to `Promise<void>`
  - Changed `submitPassword` return type to `Promise<void>`

## Checklist

- [ ] I've updated the test suite for new or updated code as appropriate
- [ ] I've updated documentation (JSDoc, Markdown, etc.) for new or
updated code as appropriate
- [ ] I've highlighted breaking changes using the "BREAKING" category
above as appropriate

---------

Co-authored-by: Elliot Winkler <elliot.winkler@gmail.com>
@mikesposito mikesposito force-pushed the feat/keyring-controller-atomic-operations branch from 6bc84be to c3e6a34 Compare April 24, 2024 17:14
@mikesposito mikesposito marked this pull request as ready for review April 24, 2024 17:14
@mikesposito mikesposito requested a review from a team April 24, 2024 17:14
Gudahtt
Gudahtt previously approved these changes Apr 26, 2024
Copy link
Member

@Gudahtt Gudahtt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

return await fn({ releaseLock });
} catch (e) {
// Keyrings and password are restored to their previous state
await this.#restoreSerializedKeyrings(currentSerializedKeyrings);
Copy link
Member Author

@mikesposito mikesposito Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm realising now that this probably does not restore the QR keyring event subscription, because that's a specific step of submitPassword and submitEncryptionKey 🤔

We need to move the subscription to the internal restore method, or in #newKeyring

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it makes sense for that to go in the internal restore method

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving it in #newKeyring allowed me to also remove some superfluous code, since the subscription was also created in #addQRKeyring: 36347ad

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, right because #newKeyring is called inside of the restore method. Makes sense!

@Gudahtt Gudahtt dismissed their stale review April 26, 2024 11:53

Need to preserve QR keyring events

Comment on lines -1554 to -1553
const accounts = await qrKeyring.getAccounts();
await this.#checkForDuplicate(KeyringTypes.qr, accounts);

this.#keyrings.push(qrKeyring as unknown as EthKeyring<Json>);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also done in #newKeyring

Comment on lines -1980 to -1957
// getAccounts also validates the accounts for some keyrings
await keyring.getAccounts();
this.#keyrings.push(keyring);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also done in #newKeyring

async #newKeyring(
type: string,
data: unknown,
persist = false,
Copy link
Member Author

@mikesposito mikesposito Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

persist is now always true, so we can remove the last argument

@mikesposito mikesposito force-pushed the feat/keyring-controller-atomic-operations branch from 36347ad to 20fd344 Compare April 26, 2024 12:22
@mikesposito mikesposito merged commit 5536420 into main Apr 26, 2024
@mikesposito mikesposito deleted the feat/keyring-controller-atomic-operations branch April 26, 2024 12:39
mikesposito added a commit that referenced this pull request Apr 30, 2024
## Explanation

Part of `KeyringController` responsibilies is ensuring each operation is
[mutually exclusive](#4182) and
[atomic](#4192), updating keyring
instances and the vault (or rolling them back) in the same mutex lock.

However, the ability of clients to have direct access to a keyring
instance represents a loophole, as with the current implementation they
don’t have to comply with the rules enforced by the controller: we
should provide a way for clients to interact with a keyring instance
through safeguards provided by KeyringController.

The current behavior is this one:
1. Client obtains a keyring instance through `getKeyringForAccount`
2. Client interacts with the instance
3. Client calls `persistAllKeyrings` 

We should, instead, have something like this:
1. Client calls a `withKeyring` method, passing a _keyring selector_ and
a callback
2. KeyringController selects the keyring instance and calls the callback
with it, after locking the controller operation mutex
3. Client interacts with the keyring instance safely, inside the
callback
4. KeyringController, after the callback execution, internally updates
the vault or rolls back changes in case of error, and then releases the
mutex lock

## References

<!--
Are there any issues that this pull request is tied to? Are there other
links that reviewers should consult to understand these changes better?

For example:

* Fixes #12345
* Related to #67890
-->
* Fixes #4198 
* Related to #4192 

## Changelog

<!--
If you're making any consumer-facing changes, list those changes here as
if you were updating a changelog, using the template below as a guide.

(CATEGORY is one of BREAKING, ADDED, CHANGED, DEPRECATED, REMOVED, or
FIXED. For security-related issues, follow the Security Advisory
process.)

Please take care to name the exact pieces of the API you've added or
changed (e.g. types, interfaces, functions, or methods).

If there are any breaking changes, make sure to offer a solution for
consumers to follow once they upgrade to the changes.

Finally, if you're only making changes to development scripts or tests,
you may replace the template below with "None".
-->

### `@metamask/keyring-controller`

- **ADDED**: Added `withKeyring` method
- **DEPRECATED**: Deprecated `persistAllKeyrings` method
  - Use `withKeyring` instead

## Checklist

- [ ] I've updated the test suite for new or updated code as appropriate
- [ ] I've updated documentation (JSDoc, Markdown, etc.) for new or
updated code as appropriate
- [ ] I've highlighted breaking changes using the "BREAKING" category
above as appropriate
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants