Keep network statuses up to date as requests are made #7186

mcmire · 2025-11-18T21:55:52Z

Explanation

NetworkController has the ability to analyze an RPC endpoint and capture its availability status (that is, whether the controller is able to make successful requests to the endpoint). However, this step occurs automatically only once, when the RPC endpoint's network is switched to, and so any changes in status while the network is being used will not be reflected in state. This problem can be mitigated by periodically calling lookupNetwork manually, but this is awkward, and usage of this method should be kept in check so as not to create too many requests.

Ideally, the controller should keep track of network status as requests are made. This commit implements this change by hooking into network client events added in a previous commit.

Note that this PR does not remove lookupNetwork or touch the existing behavior for this method. So with these changes there are now two strategies at play for updating the network status. This should be okay for the time being, although we should look to refactor this in the future.

References

Progresses https://consensyssoftware.atlassian.net/browse/WPC-99.

Checklist

I've updated the test suite for new or updated code as appropriate
I've updated documentation (JSDoc, Markdown, etc.) for new or updated code as appropriate
I've communicated my changes to consumers by updating changelogs for packages I've changed, highlighting breaking changes as necessary
I've prepared draft pull requests for clients and consumer packages to resolve any breaking changes

Note

Automatically updates network status metadata in response to chain-level RPC events; introduces Degraded status, refactors metadata update API, and adds comprehensive provider tests.

NetworkController:
- Auto-subscribes to NetworkController:rpcEndpointChainUnavailable, ...ChainDegraded, and ...ChainAvailable to update networksMetadata[networkClientId].status in real time.
- Refactors #updateMetadataForNetwork to accept a metadata object and updates all call sites (#lookupGivenNetwork, #lookupSelectedNetwork).
Constants:
- Add NetworkStatus.Degraded and clarify enum semantics.
Tests:
- Add tests/NetworkController.provider.test.ts covering status transitions (available ↔ degraded ↔ unavailable), retries, failover, and recovery.
- Move/export withController helper to tests/helpers.ts and remove inline duplicate.
Changelog:
- Document automatic status updates via chain-level RPC events.
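The subscription described in the Note above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual NetworkController code. Several pieces are assumptions made only for this example: the `SimpleMessenger` class, the quasi-enum string values for `NetworkStatus`, and the `{ networkClientId }` payload shape. The real controller uses the MetaMask messenger and its own state types.

```typescript
// Quasi-enum stand-in for NetworkStatus (string values are assumptions).
const NetworkStatus = {
  Unknown: 'unknown',
  Available: 'available',
  Degraded: 'degraded',
  Unavailable: 'unavailable',
} as const;
type NetworkStatus = (typeof NetworkStatus)[keyof typeof NetworkStatus];

type ChainEventName =
  | 'NetworkController:rpcEndpointChainAvailable'
  | 'NetworkController:rpcEndpointChainDegraded'
  | 'NetworkController:rpcEndpointChainUnavailable';

// Assumed payload shape: just enough to find the affected network client.
type ChainEventPayload = { networkClientId: string };
type ChainEventHandler = (payload: ChainEventPayload) => void;

// Tiny stand-in for a real messenger (hypothetical).
class SimpleMessenger {
  readonly #handlers = new Map<ChainEventName, ChainEventHandler[]>();

  subscribe(event: ChainEventName, handler: ChainEventHandler): void {
    this.#handlers.set(event, [...(this.#handlers.get(event) ?? []), handler]);
  }

  publish(event: ChainEventName, payload: ChainEventPayload): void {
    for (const handler of this.#handlers.get(event) ?? []) {
      handler(payload);
    }
  }
}

// The status that each chain-level event implies.
const STATUS_BY_EVENT: Record<ChainEventName, NetworkStatus> = {
  'NetworkController:rpcEndpointChainAvailable': NetworkStatus.Available,
  'NetworkController:rpcEndpointChainDegraded': NetworkStatus.Degraded,
  'NetworkController:rpcEndpointChainUnavailable': NetworkStatus.Unavailable,
};

type NetworksMetadata = Record<string, { status: NetworkStatus }>;

// Subscribe once to each chain-level event and write the implied status to
// networksMetadata[networkClientId].status as events arrive.
function subscribeToChainEvents(
  messenger: SimpleMessenger,
  networksMetadata: NetworksMetadata,
): void {
  for (const event of Object.keys(STATUS_BY_EVENT) as ChainEventName[]) {
    messenger.subscribe(event, ({ networkClientId }) => {
      networksMetadata[networkClientId] = {
        ...networksMetadata[networkClientId],
        status: STATUS_BY_EVENT[event],
      };
    });
  }
}
```

With this wiring, publishing `NetworkController:rpcEndpointChainDegraded` with `{ networkClientId: 'mainnet' }` sets `networksMetadata.mainnet.status` to `'degraded'`. No explicit `lookupNetwork` call is needed.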

Written by Cursor Bugbot for commit e623211.

In a future commit we will introduce changes to `network-controller` so that it will keep track of the status of each network as requests are made. These updates to `createServicePolicy` assist with that. See the changelog for a list of changes to the `ServicePolicy` API. Besides the changes listed there, the tests for `createServicePolicy` have been refactored slightly so that they are easier to maintain in the future.

In a future commit we will introduce changes to `network-controller` so that it will keep track of the status of each network as requests are made. This commit paves the way for this to happen by redefining the existing RPC endpoint-related events that NetworkController produces.

Currently, when requests are made through the network clients that NetworkController exposes, three events are published:

- `NetworkController:rpcEndpointDegraded` - Published when enough successive retriable errors are encountered while making a request to an RPC endpoint that the maximum number of retries is reached.
- `NetworkController:rpcEndpointUnavailable` - Published when enough successive errors are encountered while making a request to an RPC endpoint that the underlying circuit breaks.
- `NetworkController:rpcEndpointRequestRetried` - Published when a request is retried (mainly used for testing).

It's important to note that in the context of the RPC failover feature, an "RPC endpoint" can actually encompass multiple URLs, so the above events can fire for any one of those URLs. These events are useful for reporting metrics on RPC endpoints. To update the status of a network effectively, however, we need events that are less granular and are guaranteed not to fire multiple times in a row. We also need a new event. Now the list of events looks like this:

- `NetworkController:rpcEndpointInstanceDegraded` - The same as `NetworkController:rpcEndpointDegraded` before.
- `NetworkController:rpcEndpointInstanceUnavailable` - The same as `NetworkController:rpcEndpointUnavailable` before.
- `NetworkController:rpcEndpointInstanceRetried` - Renamed from `NetworkController:rpcEndpointRequestRetried`.
- `NetworkController:rpcEndpointDegraded` - Similar to `NetworkController:rpcEndpointInstanceDegraded`, but won't be published again if the RPC endpoint is already in a degraded state.
- `NetworkController:rpcEndpointUnavailable` - Published when the circuits underlying all of the URLs for an RPC endpoint have broken (none of the URLs are available). Won't be published again if the RPC endpoint is already in an unavailable state.
- `NetworkController:rpcEndpointAvailable` - A new event. Published the first time a successful request is made to one of the URLs for an RPC endpoint, or following a degraded or unavailable status.
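The endpoint-level events differ from the per-URL ("instance") events in two ways. First, the tracker remembers the last published status and publishes only on a transition. Second, it treats the endpoint as unavailable only once every URL's circuit has broken. A minimal sketch follows, assuming a hypothetical `EndpointStatusTracker` that is told when each URL succeeds, degrades, or breaks. It is not the `RpcServiceChain` implementation, and the method names are illustrative only.

```typescript
type EndpointStatus = 'unknown' | 'available' | 'degraded' | 'unavailable';

// Hypothetical tracker for one RPC endpoint that may span several URLs
// (a primary plus failovers). It publishes only on status transitions.
class EndpointStatusTracker {
  #status: EndpointStatus = 'unknown';
  readonly #urls: readonly string[];
  readonly #brokenUrls = new Set<string>();
  readonly #publish: (status: EndpointStatus) => void;

  constructor(urls: readonly string[], publish: (status: EndpointStatus) => void) {
    this.#urls = urls;
    this.#publish = publish;
  }

  get status(): EndpointStatus {
    return this.#status;
  }

  // A URL answered successfully: it is no longer broken, and the endpoint
  // becomes available unless it already is.
  recordSuccess(url: string): void {
    this.#brokenUrls.delete(url);
    this.#transitionTo('available');
  }

  // A URL exhausted its retries or responded slowly.
  recordDegraded(_url: string): void {
    this.#transitionTo('degraded');
  }

  // A URL's circuit broke. Only when every URL is broken is the whole
  // endpoint considered unavailable.
  recordBreak(url: string): void {
    this.#brokenUrls.add(url);
    if (this.#urls.every((u) => this.#brokenUrls.has(u))) {
      this.#transitionTo('unavailable');
    }
  }

  #transitionTo(next: EndpointStatus): void {
    if (this.#status === next) {
      return; // Already in this state: do not publish again.
    }
    this.#status = next;
    this.#publish(next);
  }
}
```

Consider two URLs. Repeated successes or repeated degradations publish once. Breaking only the first URL publishes nothing. Breaking the second as well publishes `unavailable`, and a later success on either URL publishes `available` again.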

mcmire · 2025-11-18T22:01:31Z

@metamaskbot publish-previews

github-actions · 2025-11-18T22:05:58Z

Preview builds have been published. See these instructions for more information about preview builds.

Full list of packages and versions:

{
  "@metamask-previews/account-tree-controller": "3.0.0-preview-dabf9bd8",
  "@metamask-previews/accounts-controller": "34.0.0-preview-dabf9bd8",
  "@metamask-previews/address-book-controller": "7.0.0-preview-dabf9bd8",
  "@metamask-previews/analytics-controller": "0.0.0-preview-dabf9bd8",
  "@metamask-previews/announcement-controller": "8.0.0-preview-dabf9bd8",
  "@metamask-previews/app-metadata-controller": "2.0.0-preview-dabf9bd8",
  "@metamask-previews/approval-controller": "8.0.0-preview-dabf9bd8",
  "@metamask-previews/assets-controllers": "88.0.0-preview-dabf9bd8",
  "@metamask-previews/base-controller": "9.0.0-preview-dabf9bd8",
  "@metamask-previews/bridge-controller": "60.1.0-preview-dabf9bd8",
  "@metamask-previews/bridge-status-controller": "60.1.0-preview-dabf9bd8",
  "@metamask-previews/build-utils": "3.0.4-preview-dabf9bd8",
  "@metamask-previews/chain-agnostic-permission": "1.2.2-preview-dabf9bd8",
  "@metamask-previews/claims-controller": "0.2.0-preview-dabf9bd8",
  "@metamask-previews/composable-controller": "12.0.0-preview-dabf9bd8",
  "@metamask-previews/controller-utils": "11.15.0-preview-dabf9bd8",
  "@metamask-previews/core-backend": "4.0.0-preview-dabf9bd8",
  "@metamask-previews/delegation-controller": "1.0.0-preview-dabf9bd8",
  "@metamask-previews/earn-controller": "10.0.0-preview-dabf9bd8",
  "@metamask-previews/eip-5792-middleware": "2.0.0-preview-dabf9bd8",
  "@metamask-previews/eip-7702-internal-rpc-middleware": "0.1.0-preview-dabf9bd8",
  "@metamask-previews/eip1193-permission-middleware": "1.0.2-preview-dabf9bd8",
  "@metamask-previews/ens-controller": "18.0.0-preview-dabf9bd8",
  "@metamask-previews/error-reporting-service": "3.0.0-preview-dabf9bd8",
  "@metamask-previews/eth-block-tracker": "14.0.0-preview-dabf9bd8",
  "@metamask-previews/eth-json-rpc-middleware": "21.0.0-preview-dabf9bd8",
  "@metamask-previews/eth-json-rpc-provider": "5.0.1-preview-dabf9bd8",
  "@metamask-previews/foundryup": "1.0.1-preview-dabf9bd8",
  "@metamask-previews/gas-fee-controller": "25.0.0-preview-dabf9bd8",
  "@metamask-previews/gator-permissions-controller": "0.4.0-preview-dabf9bd8",
  "@metamask-previews/json-rpc-engine": "10.1.1-preview-dabf9bd8",
  "@metamask-previews/json-rpc-middleware-stream": "8.0.8-preview-dabf9bd8",
  "@metamask-previews/keyring-controller": "24.0.0-preview-dabf9bd8",
  "@metamask-previews/logging-controller": "7.0.0-preview-dabf9bd8",
  "@metamask-previews/message-manager": "14.0.0-preview-dabf9bd8",
  "@metamask-previews/messenger": "0.3.0-preview-dabf9bd8",
  "@metamask-previews/multichain-account-service": "3.0.0-preview-dabf9bd8",
  "@metamask-previews/multichain-api-middleware": "1.2.4-preview-dabf9bd8",
  "@metamask-previews/multichain-network-controller": "2.0.0-preview-dabf9bd8",
  "@metamask-previews/multichain-transactions-controller": "6.0.0-preview-dabf9bd8",
  "@metamask-previews/name-controller": "9.0.0-preview-dabf9bd8",
  "@metamask-previews/network-controller": "25.0.0-preview-dabf9bd8",
  "@metamask-previews/network-enablement-controller": "3.1.0-preview-dabf9bd8",
  "@metamask-previews/notification-services-controller": "20.0.0-preview-dabf9bd8",
  "@metamask-previews/permission-controller": "12.1.0-preview-dabf9bd8",
  "@metamask-previews/permission-log-controller": "5.0.0-preview-dabf9bd8",
  "@metamask-previews/phishing-controller": "15.0.1-preview-dabf9bd8",
  "@metamask-previews/polling-controller": "15.0.0-preview-dabf9bd8",
  "@metamask-previews/preferences-controller": "21.0.0-preview-dabf9bd8",
  "@metamask-previews/profile-sync-controller": "26.0.0-preview-dabf9bd8",
  "@metamask-previews/rate-limit-controller": "7.0.0-preview-dabf9bd8",
  "@metamask-previews/remote-feature-flag-controller": "2.0.0-preview-dabf9bd8",
  "@metamask-previews/sample-controllers": "3.0.0-preview-dabf9bd8",
  "@metamask-previews/seedless-onboarding-controller": "6.1.0-preview-dabf9bd8",
  "@metamask-previews/selected-network-controller": "25.0.0-preview-dabf9bd8",
  "@metamask-previews/shield-controller": "2.1.0-preview-dabf9bd8",
  "@metamask-previews/signature-controller": "36.0.0-preview-dabf9bd8",
  "@metamask-previews/subscription-controller": "4.2.2-preview-dabf9bd8",
  "@metamask-previews/token-search-discovery-controller": "4.0.0-preview-dabf9bd8",
  "@metamask-previews/transaction-controller": "61.3.0-preview-dabf9bd8",
  "@metamask-previews/transaction-pay-controller": "6.0.0-preview-dabf9bd8",
  "@metamask-previews/user-operation-controller": "40.0.0-preview-dabf9bd8"
}

Capture the chain status before calling service.request() to prevent spurious onBreak emissions. The onDegraded handler can fire synchronously during service.request() and change the status from Unavailable to Degraded before the catch block checks it, causing incorrect onBreak events when recovery attempts fail.

Revert the previous fix that captured previousStatus before the request. Checking the current status (this.#status) is correct because it accounts for status changes that may occur during the request from other services in the chain. The original check prevents duplicate onBreak emissions when the chain is already Unavailable.
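The two commits above hinge on when the status is read inside the `catch` block. A minimal sketch follows, assuming a hypothetical `ChainSketch` whose `markDegraded` stands in for an `onDegraded` notification that can arrive while a request is in flight. It illustrates the current-status check only and is not the `RpcServiceChain` code.

```typescript
type ChainStatus = 'available' | 'degraded' | 'unavailable';

// Hypothetical, heavily simplified chain. For brevity every failed attempt
// "breaks" the chain; a real circuit breaker waits for several failures.
class ChainSketch {
  #status: ChainStatus = 'available';
  readonly #onBreak: () => void;

  constructor(onBreak: () => void) {
    this.#onBreak = onBreak;
  }

  get status(): ChainStatus {
    return this.#status;
  }

  // Stands in for an onDegraded notification from another service in the
  // chain, which may arrive while a request is still in flight.
  markDegraded(): void {
    this.#status = 'degraded';
  }

  async request<T>(attempt: () => Promise<T>): Promise<T> {
    try {
      const result = await attempt();
      this.#status = 'available';
      return result;
    } catch (error) {
      // Read the current status, not a snapshot taken before `attempt` ran.
      // This picks up changes made during the request, and it still skips a
      // duplicate onBreak when the chain is already unavailable.
      if (this.#status !== 'unavailable') {
        this.#status = 'unavailable';
        this.#onBreak();
      }
      throw error;
    }
  }
}
```

Starting from `available`, a failing request fires `onBreak` once. A second failure while already `unavailable` fires nothing. Suppose `markDegraded()` runs during a later request that then fails. The current-status check treats this as a fresh Degraded to Unavailable transition and fires `onBreak` again, which is the behavior the revert keeps.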

The test 'calls onAvailable when a service becomes degraded by responding slowly, and then recovers' was not actually simulating a slow response, so it was only testing initial availability, not recovery from a degraded state. Changes:

- Add clock.tick(DEFAULT_DEGRADED_THRESHOLD + 1) to the first mock to simulate a slow response
- Add an onDegraded listener to verify that degradation actually occurred
- Add assertions to verify that both onDegraded and onAvailable are called
- Add an assertion to verify the call order (degradation before recovery)
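The fix works by advancing a fake clock past the degraded threshold inside the first mocked response, so that the request is measured as slow. A minimal sketch of that idea with a hand-rolled clock follows. `FakeClock`, `createTimedRequester`, and the threshold value are hypothetical. The real test advances a fake-timer clock with `clock.tick` and uses the library's `DEFAULT_DEGRADED_THRESHOLD`.

```typescript
// Hypothetical threshold; the real DEFAULT_DEGRADED_THRESHOLD may differ.
const DEGRADED_THRESHOLD_MS = 5_000;

// Minimal manual clock standing in for a fake-timer clock in tests.
class FakeClock {
  #now = 0;

  now(): number {
    return this.#now;
  }

  tick(ms: number): void {
    this.#now += ms;
  }
}

type AvailabilityListeners = {
  onDegraded: () => void;
  onAvailable: () => void;
};

// Wraps requests so that each one is timed on `clock`. A request slower than
// the threshold counts as degraded and a faster one as available. A
// listener fires only when that classification changes.
function createTimedRequester(clock: FakeClock, listeners: AvailabilityListeners) {
  let state: 'unknown' | 'available' | 'degraded' = 'unknown';

  return async function timedRequest<T>(work: () => Promise<T>): Promise<T> {
    const start = clock.now();
    const result = await work();
    const next = clock.now() - start > DEGRADED_THRESHOLD_MS ? 'degraded' : 'available';
    if (next !== state) {
      state = next;
      if (next === 'degraded') {
        listeners.onDegraded();
      } else {
        listeners.onAvailable();
      }
    }
    return result;
  };
}
```

In the test's terms, the first mocked request calls `clock.tick(DEGRADED_THRESHOLD_MS + 1)` before resolving, so `onDegraded` fires. A second, fast request then fires `onAvailable`. Checking that order confirms recovery rather than mere initial availability.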

…vents

- Remove primaryEndpointUrl from event type definitions for onBreak, onDegraded, and onAvailable
- Remove primaryEndpointUrl from event emissions in RpcServiceChain
- Update event listener type signatures to not include primaryEndpointUrl
- Update all test expectations to remove primaryEndpointUrl from assertions
- Update create-network-client.ts to remove primaryEndpointUrl from event handlers
- Note: onService* methods still include primaryEndpointUrl, as they were not changed

- Remove endpointUrl from onBreak, onDegraded, and onAvailable events in RpcServiceChain
- Update type definitions to exclude endpointUrl using ExcludeCockatielEventData
- Update event emissions to exclude endpointUrl from chain-level events
- Update NetworkController event types to remove endpointUrl from chain-level events (rpcEndpointChainDegraded, rpcEndpointChainAvailable, rpcEndpointChainUnavailable)
- Update event handlers in create-network-client.ts to not destructure endpointUrl
- Update all test assertions to remove endpointUrl from chain-level event expectations
- Remove unused rpcUrl parameters from test functions
- Align all chain-level events to not include endpointUrl (consistent with the unavailable event)
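Dropping `endpointUrl` and `primaryEndpointUrl` from the chain-level payloads is mostly a type-level change. A minimal sketch of how a helper in the spirit of `ExcludeCockatielEventData` could be expressed with TypeScript's built-in `Omit` follows. The `ServiceEventData` shape and the `ExcludeEventData` and `toChainEventData` names are hypothetical and do not reflect the real definitions.

```typescript
// Assumed shape of the data a per-service event carries (illustrative only).
type ServiceEventData = {
  endpointUrl: string;
  primaryEndpointUrl: string;
  error?: Error;
};

// Hypothetical helper: removes the given keys from an event payload type.
type ExcludeEventData<Data, Keys extends keyof Data> = Omit<Data, Keys>;

// Chain-level payload: no endpointUrl or primaryEndpointUrl.
type ChainEventData = ExcludeEventData<
  ServiceEventData,
  'endpointUrl' | 'primaryEndpointUrl'
>;

// Strips the URL fields at runtime so they cannot leak into chain-level
// events, matching the narrowed type.
function toChainEventData(data: ServiceEventData): ChainEventData {
  const {
    endpointUrl: _endpointUrl,
    primaryEndpointUrl: _primaryEndpointUrl,
    ...rest
  } = data;
  return rest;
}
```

Deriving the chain-level type from the per-service type keeps the two in sync. Adding a field to the service payload automatically adds it to the chain payload unless it is explicitly excluded.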

packages/network-controller/src/NetworkController.ts

- Document that NetworkController now automatically subscribes to chain-level RPC events
- Update network status metadata in real time when these events are published
- Enable real-time network status updates without explicit lookupNetwork calls

Gudahtt · 2025-11-27T19:57:44Z

packages/network-controller/tests/NetworkController.provider.test.ts

+    );
+  });
+
+  it('does not transition the status of a network client from "degraded" the first time a failover is activated but it does not return a 2xx response', async () => {


Nit: I found this difficult to understand because of the awkward "but" and the double-negative. Here's an attempt at simplifying:

Suggested change

it('does not transition the status of a network client from "degraded" the first time a failover is activated but it does not return a 2xx response', async () => {

it('does not transition the status of a network client from "degraded" the first time a failover is activated if it returns a non-2xx response', async () => {

fixed here e623211

Gudahtt · 2025-11-27T19:58:06Z

packages/network-controller/tests/NetworkController.provider.test.ts

+    );
+  });
+
+  it('does not transition the status of a network client from "degraded" the first time a failover is activated but requests are slow to complete', async () => {


Nit: Another attempted simplification:

Suggested change

it('does not transition the status of a network client from "degraded" the first time a failover is activated but requests are slow to complete', async () => {

it('does not transition the status of a network client from "degraded" the first time a failover is activated if requests are slow to complete', async () => {

fixed here e623211

Gudahtt · 2025-11-27T20:02:53Z

Note that this PR does not remove lookupNetwork or touch the existing behavior for this method. So with these changes there are now two strategies at play for updating the network status. This should be okay for the time being, although we should look to refactor this in the future.

I don't think leaving lookupNetwork here helps us. It just makes the network status harder to understand. But still, dealing with it in a later PR sounds good to me. It doesn't seem like it would make the status less accurate than it is today at least.

Plus I'd hesitate to remove it completely without finding a more elegant way of dealing with the EIP1559 compatibility check.

Edit: Maybe in a future PR, we could consider updating lookupNetwork to just update the EIP1559 compatibility, and rely on the network client to indirectly also update the network status if appropriate.

Gudahtt

LGTM!

cryptodev-2s · 2025-11-27T20:07:14Z

Note that this PR does not remove lookupNetwork or touch the existing behavior for this method. So with these changes there are now two strategies at play for updating the network status. This should be okay for the time being, although we should look to refactor this in the future.

I don't think leaving lookupNetwork here helps us. It just makes the network status harder to understand. But still, dealing with it in a later PR sounds good to me. It doesn't seem like it would make the status less accurate than it is today at least.

Plus I'd hesitate to remove it completely without finding a more elegant way of dealing with the EIP1559 compatibility check.

Yes, that’s correct; we previously discussed this with Elliot. The plan was to remove it in a future breaking change. Currently, it’s only used in Mobile for the banner, which I am in the process of removing and replacing with the new WIP NetworkStatusController. Once that is complete, it will no longer be used. On Extension, it is still present but has no meaningful impact.

Remove the force resolution for @metamask/network-controller and add a ts-expect-error comment to handle a type mismatch with the NetworkStatus enum. The latest version (v27.0.0+) adds a NetworkStatus.Degraded enum value, which causes type mismatches in our codebase. See: MetaMask/core#7186
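Once the suppression is removed, a consumer might handle the new `Degraded` member explicitly with an exhaustive switch. Then any future member would surface as a compile-time error rather than a silent fallthrough. A minimal sketch follows. The quasi-enum mirrors `NetworkStatus`, but its member list beyond `Degraded` and its string values are assumptions, and `bannerMessage` is a hypothetical UI helper.

```typescript
// Quasi-enum mirroring NetworkStatus member names (values are assumptions).
const NetworkStatus = {
  Unknown: 'unknown',
  Available: 'available',
  Degraded: 'degraded',
  Unavailable: 'unavailable',
  Blocked: 'blocked',
} as const;
type NetworkStatus = (typeof NetworkStatus)[keyof typeof NetworkStatus];

// Compile-time guard: the call fails to type-check if a status is unhandled.
function assertNever(value: never): never {
  throw new Error(`Unhandled network status: ${String(value)}`);
}

// Hypothetical helper mapping each status to an optional banner message.
function bannerMessage(status: NetworkStatus): string | null {
  switch (status) {
    case NetworkStatus.Available:
    case NetworkStatus.Unknown:
      return null;
    case NetworkStatus.Degraded:
      return 'This network is responding slowly.';
    case NetworkStatus.Unavailable:
      return 'This network is currently unavailable.';
    case NetworkStatus.Blocked:
      return 'This network is blocked.';
    default:
      return assertNever(status);
  }
}
```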

mcmire added 21 commits November 14, 2025 14:45

Fix tests

6a3cff1

Add more tests

c08f398

No need for getLastInnerFailureReason

5e0e3e1

Fix an issue with onAvailable

e2eba7a

Reduce the diff

246b2b5

Fix tests

199bb79

Use a quasi-enum for the availability status

ff6d832

Fix test

fa66813

Remove this comment

0da865b

Add 'degraded' status

b3909af

Use similar terminology as in createServicePolicy

6b628d7

Merge branch 'update-create-service-policy' into update-network-controller-rpc-endpoint-events

4a3985a

Adjust createServicePolicy as well

2d38446

Adjust createServicePolicy as well

3d8da80

Merge branch 'update-create-service-policy' into update-network-controller-rpc-endpoint-events

7860897

Update some of the terminology

f67839a

Update more of the terminology

110cb0b

RpcEndpointUnvailable -> RpcEndpointUnavailable

b16597a

mcmire force-pushed the update-network-status-live branch from f91d5e1 to dabf9bd Compare November 18, 2025 21:59

mcmire added 6 commits November 19, 2025 11:41

Merge branch 'main' into update-network-controller-rpc-endpoint-events

137c3d7

Address Cursor comment

cb498e3

Make the RPC endpoint event tests more realistic, and address Cursor feedback

cdd8491

Reword JSDoc for events, remove endpointUrl from chainUnavailable event

73017d1

Merge branch 'main' into update-network-controller-rpc-endpoint-events

7de4def

Use the primaryEndpointUrl from the Cockatiel event in the messenger event

026ce00

cryptodev-2s force-pushed the update-network-controller-rpc-endpoint-events branch from 81a987c to 2c35648 Compare November 25, 2025 23:00

cryptodev-2s added 9 commits November 26, 2025 17:03

docs: clarify event payload changes in network-controller changelog

30a9486

fix: add undefined check for error in onServiceBreak handler

4e7a37a

Add the same undefined check that exists in onBreak to ensure type safety and prevent publishing events with undefined error values.

refactor: improve type safety for getError function

61bdbbc

Use CockatielFailureReason type instead of generic object type for better type safety and clarity.
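The two commits above (the undefined check in the onServiceBreak handler and the stricter type for `getError`) can be sketched together. This assumes Cockatiel describes a failure as either `{ error: Error }` or `{ value: ... }`. The local `FailureReasonSketch` type, this `getError`, and `handleServiceBreak` are illustrative stand-ins whose real signatures are not shown here.

```typescript
// Local stand-in for Cockatiel's failure reason: either a thrown error or an
// unsuccessful return value.
type FailureReasonSketch<ReturnValue> = { error: Error } | { value: ReturnValue };

// Narrow the union instead of treating the reason as a generic object.
function getError<ReturnValue>(
  reason: FailureReasonSketch<ReturnValue>,
): Error | undefined {
  return 'error' in reason ? reason.error : undefined;
}

// Hypothetical break handler: publishes only when there is an actual error,
// so no event is ever published with an undefined error value.
function handleServiceBreak<ReturnValue>(
  reason: FailureReasonSketch<ReturnValue>,
  publish: (error: Error) => void,
): void {
  const error = getError(reason);
  if (error === undefined) {
    return;
  }
  publish(error);
}
```

A reason carrying `{ error }` is published. A reason carrying only `{ value }`, for example a bad return value, is skipped.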

Fix typo: correct tertiary endpoint URL in test

916a0e2

- Change tertiaryEndpointUrl from 'https://second.endpoint' to 'https://third.endpoint'

Base automatically changed from update-network-controller-rpc-endpoint-events to main November 27, 2025 14:35

Merge branch 'update-network-controller-rpc-endpoint-events' into update-network-status-live

47a5270

cryptodev-2s marked this pull request as ready for review November 27, 2025 19:10

cryptodev-2s requested review from a team as code owners November 27, 2025 19:10

Merge branch 'main' into update-network-status-live

4b56af2

cryptodev-2s requested a review from Gudahtt November 27, 2025 19:47

Gudahtt reviewed Nov 27, 2025

packages/network-controller/src/NetworkController.ts

Gudahtt reviewed Nov 27, 2025

test: improve test descriptions for clarity

e623211

- Update test descriptions to use 'if' instead of 'but' for better readability
- Make test intent clearer with more concise wording

cryptodev-2s enabled auto-merge November 27, 2025 20:02

Gudahtt approved these changes Nov 27, 2025

cryptodev-2s added this pull request to the merge queue Nov 27, 2025

Merged via the queue into main with commit 5d24e81 Nov 27, 2025
277 checks passed

cryptodev-2s deleted the update-network-status-live branch November 27, 2025 20:10

4 participants
