@mcmire
Contributor

@mcmire mcmire commented Nov 18, 2025

Explanation

NetworkController has the ability to analyze an RPC endpoint and capture its availability status (that is, whether the controller is able to make successful requests to the endpoint). However, this step occurs automatically only once, when the RPC endpoint's network is switched to, and so any changes in status while the network is being used will not be reflected in state. This problem can be mitigated by periodically calling lookupNetwork manually, but this is awkward, and usage of this method should be kept in check so as not to create too many requests.

Ideally, the controller should keep track of network status as requests are made. This commit implements this change by hooking into network client events added in a previous commit.

Note that this PR does not remove lookupNetwork or touch the existing behavior for this method. So with these changes there are now two strategies at play for updating the network status. This should be okay for the time being, although we should look to refactor this in the future.
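As a rough illustration of the idea described above (not the actual NetworkController code), the sketch below subscribes to the three chain-level events named in this PR and writes the matching status into a metadata map. `SketchMessenger`, `trackStatusFromEvents`, the `{ networkClientId }` payload shape, and the string status values are assumptions made for the example.

```typescript
// Illustrative sketch only; not the actual NetworkController implementation.
type SketchStatus = 'unknown' | 'available' | 'degraded' | 'unavailable';

type ChainEventName =
  | 'NetworkController:rpcEndpointChainAvailable'
  | 'NetworkController:rpcEndpointChainDegraded'
  | 'NetworkController:rpcEndpointChainUnavailable';

// Assumption: each chain-level event identifies the network client it concerns.
type ChainEventPayload = { networkClientId: string };
type ChainEventHandler = (payload: ChainEventPayload) => void;

// Hypothetical stand-in for the messenger: a minimal publish/subscribe bus.
class SketchMessenger {
  private handlers: Partial<Record<ChainEventName, ChainEventHandler[]>> = {};

  subscribe(eventName: ChainEventName, handler: ChainEventHandler): void {
    const existing = this.handlers[eventName] ?? [];
    this.handlers[eventName] = existing.concat(handler);
  }

  publish(eventName: ChainEventName, payload: ChainEventPayload): void {
    const handlers = this.handlers[eventName] ?? [];
    handlers.forEach((handler) => handler(payload));
  }
}

type SketchNetworksMetadata = Record<string, { status: SketchStatus }>;

const STATUS_BY_EVENT: Record<ChainEventName, SketchStatus> = {
  'NetworkController:rpcEndpointChainAvailable': 'available',
  'NetworkController:rpcEndpointChainDegraded': 'degraded',
  'NetworkController:rpcEndpointChainUnavailable': 'unavailable',
};

// Subscribe once; afterwards every chain-level event updates the stored status.
function trackStatusFromEvents(
  messenger: SketchMessenger,
  networksMetadata: SketchNetworksMetadata,
): void {
  (Object.keys(STATUS_BY_EVENT) as ChainEventName[]).forEach((eventName) => {
    messenger.subscribe(eventName, ({ networkClientId }) => {
      networksMetadata[networkClientId] = { status: STATUS_BY_EVENT[eventName] };
    });
  });
}

const sketchMessenger = new SketchMessenger();
const sketchMetadata: SketchNetworksMetadata = { mainnet: { status: 'unknown' } };
trackStatusFromEvents(sketchMessenger, sketchMetadata);

sketchMessenger.publish('NetworkController:rpcEndpointChainDegraded', {
  networkClientId: 'mainnet',
});
console.log(sketchMetadata['mainnet']?.status); // "degraded"
```

In this sketch no polling or manual `lookupNetwork` call is needed: the status changes only when an event arrives.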

References

Progresses https://consensyssoftware.atlassian.net/browse/WPC-99.

Checklist

  • I've updated the test suite for new or updated code as appropriate
  • I've updated documentation (JSDoc, Markdown, etc.) for new or updated code as appropriate
  • I've communicated my changes to consumers by updating changelogs for packages I've changed, highlighting breaking changes as necessary
  • I've prepared draft pull requests for clients and consumer packages to resolve any breaking changes

Note

Automatically updates network status metadata in response to chain-level RPC events; introduces Degraded status, refactors metadata update API, and adds comprehensive provider tests.

  • NetworkController:
    • Auto-subscribes to NetworkController:rpcEndpointChainUnavailable, ...ChainDegraded, and ...ChainAvailable to update networksMetadata[networkClientId].status in real time.
    • Refactors #updateMetadataForNetwork to accept a metadata object and updates all call sites (#lookupGivenNetwork, #lookupSelectedNetwork).
  • Constants:
    • Add NetworkStatus.Degraded and clarify enum semantics.
  • Tests:
    • Add tests/NetworkController.provider.test.ts covering status transitions (available ↔ degraded ↔ unavailable), retries, failover, and recovery.
    • Move/export withController helper to tests/helpers.ts and remove inline duplicate.
  • Changelog:
    • Document automatic status updates via chain-level RPC events.

Written by Cursor Bugbot for commit e623211. This will update automatically on new commits.

In a future commit we will introduce changes to `network-controller` so
that it will keep track of the status of each network as requests are
made. These updates to `createServicePolicy` assist with that.

See the changelog for a list of changes to the `ServicePolicy` API.

Besides the changes listed there, the tests for `createServicePolicy`
have been refactored slightly so that they are easier to maintain in the
future.

In a future commit we will introduce changes to `network-controller` so
that it will keep track of the status of each network as requests are
made. This commit paves the way for this to happen by redefining the
existing RPC endpoint-related events that NetworkController produces.

Currently, when requests are made through the network clients that
NetworkController exposes, three events are published:

- `NetworkController:rpcEndpointDegraded`
  - Published when enough successive retriable errors are encountered
    while making a request to an RPC endpoint that the maximum number of
    retries is reached.
- `NetworkController:rpcEndpointUnavailable`
  - Published when enough successive errors are encountered while making
    a request to an RPC endpoint that the underlying circuit breaks.
- `NetworkController:rpcEndpointRequestRetried`
  - Published when a request is retried (mainly used for testing).

It's important to note that in the context of the RPC failover feature,
an "RPC endpoint" can encompass multiple URLs, so the above events fire
for each individual URL rather than once for the endpoint as a whole.

While these events are useful for reporting metrics on RPC endpoints, in
order to effectively be able to update the status of a network, we need
events that are less granular and are guaranteed not to fire multiple
times in a row. We also need a new event.

Now the list of events looks like this:

- `NetworkController:rpcEndpointInstanceDegraded`
  - The same as `NetworkController:rpcEndpointDegraded` before.
- `NetworkController:rpcEndpointInstanceUnavailable`
  - The same as `NetworkController:rpcEndpointUnavailable` before.
- `NetworkController:rpcEndpointInstanceRetried`
  - Renamed from `NetworkController:rpcEndpointRequestRetried`.
- `NetworkController:rpcEndpointDegraded`
  - Similar to `NetworkController:rpcEndpointInstanceDegraded`, but
    won't be published again if the RPC endpoint is already in a
    degraded state.
- `NetworkController:rpcEndpointUnavailable`
  - Published when all of the circuits underlying all of the URLs for an
    RPC endpoint have broken (none of the URLs are available). Won't be
    published again if the RPC endpoint is already in an unavailable
    state.
- `NetworkController:rpcEndpointAvailable`
  - A new event. Published the first time a successful request is made
    to one of the URLs for an RPC endpoint, or following a degraded or
    unavailable status.
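The "won't be published again" behaviour of the endpoint-level events can be pictured as a small state machine layered over the per-URL signals. The following is a minimal sketch under stated assumptions: `EndpointEventDeduper`, its method names, and the example URLs are hypothetical and are not taken from the NetworkController source.

```typescript
// Illustrative sketch only; names and structure are assumptions.
type EndpointEventName =
  | 'rpcEndpointAvailable'
  | 'rpcEndpointDegraded'
  | 'rpcEndpointUnavailable';
type EndpointState = 'unknown' | 'available' | 'degraded' | 'unavailable';

class EndpointEventDeduper {
  private state: EndpointState = 'unknown';
  // Tracks, per URL, whether that URL's circuit is currently broken.
  private readonly circuitBrokenByUrl: Record<string, boolean> = {};
  private readonly publish: (eventName: EndpointEventName) => void;

  constructor(urls: string[], publish: (eventName: EndpointEventName) => void) {
    urls.forEach((url) => {
      this.circuitBrokenByUrl[url] = false;
    });
    this.publish = publish;
  }

  // A request to one of the URLs succeeded.
  onInstanceSucceeded(url: string): void {
    this.circuitBrokenByUrl[url] = false;
    this.transition('available', 'rpcEndpointAvailable');
  }

  // The maximum number of retries was reached for one of the URLs.
  onInstanceDegraded(): void {
    this.transition('degraded', 'rpcEndpointDegraded');
  }

  // The circuit for one URL broke; the endpoint as a whole is unavailable
  // only once every URL's circuit has broken.
  onInstanceUnavailable(url: string): void {
    this.circuitBrokenByUrl[url] = true;
    const allBroken = Object.keys(this.circuitBrokenByUrl).every(
      (key) => this.circuitBrokenByUrl[key] === true,
    );
    if (allBroken) {
      this.transition('unavailable', 'rpcEndpointUnavailable');
    }
  }

  // Publish only when the state actually changes.
  private transition(next: EndpointState, eventName: EndpointEventName): void {
    if (this.state === next) {
      return;
    }
    this.state = next;
    this.publish(eventName);
  }
}

const publishedEndpointEvents: EndpointEventName[] = [];
const primaryUrl = 'https://primary.example';
const failoverUrl = 'https://failover.example';
const deduper = new EndpointEventDeduper([primaryUrl, failoverUrl], (eventName) => {
  publishedEndpointEvents.push(eventName);
});

deduper.onInstanceSucceeded(primaryUrl); // first success: available
deduper.onInstanceDegraded(); // degraded
deduper.onInstanceDegraded(); // already degraded: nothing published
deduper.onInstanceUnavailable(primaryUrl); // failover still up: nothing published
deduper.onInstanceUnavailable(failoverUrl); // all circuits broken: unavailable
deduper.onInstanceUnavailable(failoverUrl); // already unavailable: nothing published
deduper.onInstanceSucceeded(failoverUrl); // recovery: available
console.log(publishedEndpointEvents);
```

Seven per-URL signals collapse into four endpoint-level events here, which is the property a status field needs: one event per real transition.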
NetworkController has the ability to analyze an RPC endpoint and capture
its availability status (that is, whether the controller is able to make
successful requests to the endpoint). However, this step occurs
automatically only once, when the RPC endpoint's network is switched to,
and so any changes in status while the network is being used will not
be reflected in state. This problem can be mitigated by periodically
calling `lookupNetwork` manually, but this is awkward, and usage of this
method should be kept in check so as not to create too many requests.

Ideally, the controller should keep track of network status as requests
are made. This commit implements this change by hooking into events
published by the network client and added in a previous commit.
@mcmire mcmire force-pushed the update-network-status-live branch from f91d5e1 to dabf9bd Compare November 18, 2025 21:59
@mcmire
Contributor Author

mcmire commented Nov 18, 2025

@metamaskbot publish-previews

@github-actions
Contributor

Preview builds have been published. See these instructions for more information about preview builds.

Full list of packages and versions:
{
  "@metamask-previews/account-tree-controller": "3.0.0-preview-dabf9bd8",
  "@metamask-previews/accounts-controller": "34.0.0-preview-dabf9bd8",
  "@metamask-previews/address-book-controller": "7.0.0-preview-dabf9bd8",
  "@metamask-previews/analytics-controller": "0.0.0-preview-dabf9bd8",
  "@metamask-previews/announcement-controller": "8.0.0-preview-dabf9bd8",
  "@metamask-previews/app-metadata-controller": "2.0.0-preview-dabf9bd8",
  "@metamask-previews/approval-controller": "8.0.0-preview-dabf9bd8",
  "@metamask-previews/assets-controllers": "88.0.0-preview-dabf9bd8",
  "@metamask-previews/base-controller": "9.0.0-preview-dabf9bd8",
  "@metamask-previews/bridge-controller": "60.1.0-preview-dabf9bd8",
  "@metamask-previews/bridge-status-controller": "60.1.0-preview-dabf9bd8",
  "@metamask-previews/build-utils": "3.0.4-preview-dabf9bd8",
  "@metamask-previews/chain-agnostic-permission": "1.2.2-preview-dabf9bd8",
  "@metamask-previews/claims-controller": "0.2.0-preview-dabf9bd8",
  "@metamask-previews/composable-controller": "12.0.0-preview-dabf9bd8",
  "@metamask-previews/controller-utils": "11.15.0-preview-dabf9bd8",
  "@metamask-previews/core-backend": "4.0.0-preview-dabf9bd8",
  "@metamask-previews/delegation-controller": "1.0.0-preview-dabf9bd8",
  "@metamask-previews/earn-controller": "10.0.0-preview-dabf9bd8",
  "@metamask-previews/eip-5792-middleware": "2.0.0-preview-dabf9bd8",
  "@metamask-previews/eip-7702-internal-rpc-middleware": "0.1.0-preview-dabf9bd8",
  "@metamask-previews/eip1193-permission-middleware": "1.0.2-preview-dabf9bd8",
  "@metamask-previews/ens-controller": "18.0.0-preview-dabf9bd8",
  "@metamask-previews/error-reporting-service": "3.0.0-preview-dabf9bd8",
  "@metamask-previews/eth-block-tracker": "14.0.0-preview-dabf9bd8",
  "@metamask-previews/eth-json-rpc-middleware": "21.0.0-preview-dabf9bd8",
  "@metamask-previews/eth-json-rpc-provider": "5.0.1-preview-dabf9bd8",
  "@metamask-previews/foundryup": "1.0.1-preview-dabf9bd8",
  "@metamask-previews/gas-fee-controller": "25.0.0-preview-dabf9bd8",
  "@metamask-previews/gator-permissions-controller": "0.4.0-preview-dabf9bd8",
  "@metamask-previews/json-rpc-engine": "10.1.1-preview-dabf9bd8",
  "@metamask-previews/json-rpc-middleware-stream": "8.0.8-preview-dabf9bd8",
  "@metamask-previews/keyring-controller": "24.0.0-preview-dabf9bd8",
  "@metamask-previews/logging-controller": "7.0.0-preview-dabf9bd8",
  "@metamask-previews/message-manager": "14.0.0-preview-dabf9bd8",
  "@metamask-previews/messenger": "0.3.0-preview-dabf9bd8",
  "@metamask-previews/multichain-account-service": "3.0.0-preview-dabf9bd8",
  "@metamask-previews/multichain-api-middleware": "1.2.4-preview-dabf9bd8",
  "@metamask-previews/multichain-network-controller": "2.0.0-preview-dabf9bd8",
  "@metamask-previews/multichain-transactions-controller": "6.0.0-preview-dabf9bd8",
  "@metamask-previews/name-controller": "9.0.0-preview-dabf9bd8",
  "@metamask-previews/network-controller": "25.0.0-preview-dabf9bd8",
  "@metamask-previews/network-enablement-controller": "3.1.0-preview-dabf9bd8",
  "@metamask-previews/notification-services-controller": "20.0.0-preview-dabf9bd8",
  "@metamask-previews/permission-controller": "12.1.0-preview-dabf9bd8",
  "@metamask-previews/permission-log-controller": "5.0.0-preview-dabf9bd8",
  "@metamask-previews/phishing-controller": "15.0.1-preview-dabf9bd8",
  "@metamask-previews/polling-controller": "15.0.0-preview-dabf9bd8",
  "@metamask-previews/preferences-controller": "21.0.0-preview-dabf9bd8",
  "@metamask-previews/profile-sync-controller": "26.0.0-preview-dabf9bd8",
  "@metamask-previews/rate-limit-controller": "7.0.0-preview-dabf9bd8",
  "@metamask-previews/remote-feature-flag-controller": "2.0.0-preview-dabf9bd8",
  "@metamask-previews/sample-controllers": "3.0.0-preview-dabf9bd8",
  "@metamask-previews/seedless-onboarding-controller": "6.1.0-preview-dabf9bd8",
  "@metamask-previews/selected-network-controller": "25.0.0-preview-dabf9bd8",
  "@metamask-previews/shield-controller": "2.1.0-preview-dabf9bd8",
  "@metamask-previews/signature-controller": "36.0.0-preview-dabf9bd8",
  "@metamask-previews/subscription-controller": "4.2.2-preview-dabf9bd8",
  "@metamask-previews/token-search-discovery-controller": "4.0.0-preview-dabf9bd8",
  "@metamask-previews/transaction-controller": "61.3.0-preview-dabf9bd8",
  "@metamask-previews/transaction-pay-controller": "6.0.0-preview-dabf9bd8",
  "@metamask-previews/user-operation-controller": "40.0.0-preview-dabf9bd8"
}

@cryptodev-2s cryptodev-2s force-pushed the update-network-controller-rpc-endpoint-events branch from 81a987c to 2c35648 Compare November 25, 2025 23:00
Add the same undefined check that exists in onBreak to ensure type safety
and prevent publishing events with undefined error values.

Use CockatielFailureReason type instead of generic object type for better
type safety and clarity.

Capture the chain status before calling service.request() to prevent
spurious onBreak emissions. The onDegraded handler can fire synchronously
during service.request() and change the status from Unavailable to Degraded
before the catch block checks it, causing incorrect onBreak events when
recovery attempts fail.

Revert the previous fix that captured previousStatus before the request.
Checking the current status (this.#status) is correct because it accounts
for status changes that may occur during the request from other services
in the chain. The original check prevents duplicate onBreak emissions
when the chain is already Unavailable.

The test 'calls onAvailable when a service becomes degraded by responding
slowly, and then recovers' was not actually simulating a slow response,
so it was only testing initial availability, not recovery from degraded state.

Changes:
- Add clock.tick(DEFAULT_DEGRADED_THRESHOLD + 1) to first mock to simulate slow response
- Add onDegraded listener to verify degradation actually occurred
- Add assertions to verify both onDegraded and onAvailable are called
- Add assertion to verify call order (degradation before recovery)
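To make the test fix above concrete, here is a minimal, self-contained sketch of the idea: a fake clock is advanced past a threshold during the first request so that the "degraded" path is actually exercised before the recovery. This is not the repository's test; `SketchClock`, `executeAndClassify`, and the threshold value are hypothetical stand-ins for the fake timers and `DEFAULT_DEGRADED_THRESHOLD` mentioned in the commit message.

```typescript
// Illustrative sketch only; the threshold value here is an assumption.
const SKETCH_DEGRADED_THRESHOLD = 5000;

// Minimal manual clock, standing in for fake timers in a test.
class SketchClock {
  private currentTime = 0;

  now(): number {
    return this.currentTime;
  }

  tick(milliseconds: number): void {
    this.currentTime += milliseconds;
  }
}

type SketchListeners = { onDegraded: () => void; onAvailable: () => void };

// Runs a request and reports it as degraded if it took longer than the threshold.
function executeAndClassify(
  clock: SketchClock,
  request: () => void,
  listeners: SketchListeners,
): void {
  const startedAt = clock.now();
  request();
  const elapsed = clock.now() - startedAt;
  if (elapsed > SKETCH_DEGRADED_THRESHOLD) {
    listeners.onDegraded();
  } else {
    listeners.onAvailable();
  }
}

const sketchClock = new SketchClock();
const listenerCalls: string[] = [];
const sketchListeners: SketchListeners = {
  onDegraded: () => {
    listenerCalls.push('onDegraded');
  },
  onAvailable: () => {
    listenerCalls.push('onAvailable');
  },
};

// First request: simulate a slow response by advancing the clock past the threshold.
executeAndClassify(
  sketchClock,
  () => sketchClock.tick(SKETCH_DEGRADED_THRESHOLD + 1),
  sketchListeners,
);
// Second request: completes without advancing the clock, so it counts as recovered.
executeAndClassify(sketchClock, () => undefined, sketchListeners);

console.log(listenerCalls); // [ 'onDegraded', 'onAvailable' ]
```

Without the `tick` in the first request, only the `onAvailable` path would run twice, which is the gap the original test had.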
…vents

- Remove primaryEndpointUrl from event type definitions for onBreak, onDegraded, and onAvailable
- Remove primaryEndpointUrl from event emissions in RpcServiceChain
- Update event listener type signatures to not include primaryEndpointUrl
- Update all test expectations to remove primaryEndpointUrl from assertions
- Update create-network-client.ts to remove primaryEndpointUrl from event handlers
- Note: onService* methods still include primaryEndpointUrl as they were not changed
- Remove endpointUrl from onBreak, onDegraded, and onAvailable events in RpcServiceChain
- Update type definitions to exclude endpointUrl using ExcludeCockatielEventData
- Update event emissions to exclude endpointUrl from chain-level events
- Update NetworkController event types to remove endpointUrl from chain-level events (rpcEndpointChainDegraded, rpcEndpointChainAvailable, rpcEndpointChainUnavailable)
- Update event handlers in create-network-client.ts to not destructure endpointUrl
- Update all test assertions to remove endpointUrl from chain-level event expectations
- Remove unused rpcUrl parameters from test functions
- Align all chain-level events to not include endpointUrl (consistent with unavailable event)
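For readers unfamiliar with the pattern, the sketch below shows one generic way to derive a chain-level payload that omits `endpointUrl` from an instance-level payload, at both the type level and at runtime. It uses plain `Omit` and is not the repository's `ExcludeCockatielEventData` helper, whose definition is not shown here; `InstanceEventPayload` and `withoutEndpointUrl` are hypothetical names.

```typescript
// Illustrative sketch only; payload fields other than endpointUrl are assumptions.
type InstanceEventPayload = { endpointUrl: string; error?: unknown };

// Chain-level payloads carry everything except the URL of the individual service.
type ChainLevelPayload = Omit<InstanceEventPayload, 'endpointUrl'>;

function withoutEndpointUrl<Payload extends { endpointUrl: string }>(
  payload: Payload,
): Omit<Payload, 'endpointUrl'> {
  // Drop endpointUrl and keep the remaining fields untouched.
  const { endpointUrl: _endpointUrl, ...rest } = payload;
  return rest;
}

const instancePayload: InstanceEventPayload = {
  endpointUrl: 'https://failover.example',
  error: new Error('request failed'),
};
const chainPayload: ChainLevelPayload = withoutEndpointUrl(instancePayload);

console.log('endpointUrl' in chainPayload); // false
```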
Base automatically changed from update-network-controller-rpc-endpoint-events to main November 27, 2025 14:35
@cryptodev-2s cryptodev-2s marked this pull request as ready for review November 27, 2025 19:10
@cryptodev-2s cryptodev-2s requested review from a team as code owners November 27, 2025 19:10
@cryptodev-2s cryptodev-2s requested a review from Gudahtt November 27, 2025 19:47
- Document that NetworkController now automatically subscribes to chain-level RPC events
- Updates network status metadata in real-time when events are published
- Enables real-time network status updates without explicit lookupNetwork calls
);
});

it('does not transition the status of a network client from "degraded" the first time a failover is activated but it does not return a 2xx response', async () => {
Member

Nit: I found this difficult to understand because of the awkward "but" and the double-negative. Here's an attempt at simplifying:

Suggested change
- it('does not transition the status of a network client from "degraded" the first time a failover is activated but it does not return a 2xx response', async () => {
+ it('does not transition the status of a network client from "degraded" the first time a failover is activated if it returns a non-2xx response', async () => {

Contributor

fixed here e623211

);
});

it('does not transition the status of a network client from "degraded" the first time a failover is activated but requests are slow to complete', async () => {
Member

Nit: Another attempted simplification:

Suggested change
- it('does not transition the status of a network client from "degraded" the first time a failover is activated but requests are slow to complete', async () => {
+ it('does not transition the status of a network client from "degraded" the first time a failover is activated if requests are slow to complete', async () => {

Contributor

fixed here e623211

- Update test descriptions to use 'if' instead of 'but' for better readability
- Make test intent clearer with more concise wording
@Gudahtt
Member

Gudahtt commented Nov 27, 2025

> Note that this PR does not remove lookupNetwork or touch the existing behavior for this method. So with these changes there are now two strategies at play for updating the network status. This should be okay for the time being, although we should look to refactor this in the future.

I don't think leaving lookupNetwork here helps us. It just makes the network status harder to understand. But still, dealing with it in a later PR sounds good to me. It doesn't seem like it would make the status less accurate than it is today at least.

Plus I'd hesitate to remove it completely without finding a more elegant way of dealing with the EIP1559 compatibility check.

Edit: Maybe in a future PR, we could consider updating lookupNetwork to just update the EIP1559 compatibility, and rely on the network client to indirectly also update the network status if appropriate.

Member

@Gudahtt Gudahtt left a comment

LGTM!

@cryptodev-2s cryptodev-2s added this pull request to the merge queue Nov 27, 2025
@cryptodev-2s
Contributor

> > Note that this PR does not remove lookupNetwork or touch the existing behavior for this method. So with these changes there are now two strategies at play for updating the network status. This should be okay for the time being, although we should look to refactor this in the future.
>
> I don't think leaving lookupNetwork here helps us. It just makes the network status harder to understand. But still, dealing with it in a later PR sounds good to me. It doesn't seem like it would make the status less accurate than it is today at least.
>
> Plus I'd hesitate to remove it completely without finding a more elegant way of dealing with the EIP1559 compatibility check.

Yes, that's correct; we previously discussed this with Elliot. The plan was to remove it in a future breaking change. Currently, it's only used in Mobile for the banner, which I am in the process of removing and replacing with the new WIP NetworkStatusController. Once that is complete, it will no longer be used. On Extension, it is still present but has no meaningful impact.

Merged via the queue into main with commit 5d24e81 Nov 27, 2025
277 checks passed
@cryptodev-2s cryptodev-2s deleted the update-network-status-live branch November 27, 2025 20:10
cryptodev-2s added a commit to MetaMask/metamask-mobile that referenced this pull request Nov 28, 2025
Remove the force resolution for @metamask/network-controller and add
ts-expect-error comment to handle type mismatch with NetworkStatus enum.

The latest version (v27.0.0+) adds NetworkStatus.Degraded enum value
which causes type mismatches in our codebase.

See: MetaMask/core#7186
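
One plausible way a new status value surfaces as a type error in consumer code is an exhaustive `switch`. The sketch below is only an illustration: the mobile code is not shown in this thread, `ConsumerNetworkStatus` is a hypothetical string-union stand-in for the `NetworkStatus` enum, and its full set of members is an assumption.

```typescript
// Illustrative sketch only; a stand-in for the real NetworkStatus enum.
type ConsumerNetworkStatus = 'unknown' | 'available' | 'degraded' | 'unavailable';

function describeNetworkStatus(status: ConsumerNetworkStatus): string {
  switch (status) {
    case 'available':
      return 'Network is available';
    case 'degraded':
      // Removing this case makes the `never` assignment below fail to compile,
      // which is how a newly added status value shows up in consumer code.
      return 'Network is degraded';
    case 'unavailable':
      return 'Network is unavailable';
    case 'unknown':
      return 'Network status is unknown';
    default: {
      const unhandled: never = status;
      throw new Error(`Unhandled network status: ${String(unhandled)}`);
    }
  }
}

console.log(describeNetworkStatus('degraded')); // "Network is degraded"
```

Handling the new value explicitly, as in this sketch, is an alternative to suppressing the error with a `ts-expect-error` comment.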