-
-
Notifications
You must be signed in to change notification settings - Fork 256
Keep network statuses up to date as requests are made #7186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In a future commit we will introduce changes to `network-controller` so that it will keep track of the status of each network as requests are made. These updates to `createServicePolicy` assist with that. See the changelog for a list of changes to the `ServicePolicy` API. Besides the changes listed there, the tests for `createServicePolicy` have been refactored slightly so that it is easier to maintain in the future.
In a future commit we will introduce changes to `network-controller` so
that it will keep track of the status of each network as requests are
made. This commit paves the way for this to happen by redefining the
existing RPC endpoint-related events that NetworkController produces.
Currently, when requests are made through the network clients that
NetworkController exposes, three events are published:
- `NetworkController:rpcEndpointDegraded`
- Published when enough successive retriable errors are encountered
while making a request to an RPC endpoint that the maximum number of
retries is reached.
- `NetworkController:rpcEndpointUnavailable`
- Published when enough successive errors are encountered while making
a request to an RPC endpoint that the underlying circuit breaks.
- `NetworkController:rpcEndpointRequestRetried`
- Published when a request is retried (mainly used for testing).
It's important to note that in the context of the RPC failover feature,
an "RPC endpoint" can actually encompass multiple URLs, so the above
events actually fire for any URL.
While these events are useful for reporting metrics on RPC endpoints, in
order to effectively be able to update the status of a network, we need
events that are less granular and are guaranteed not to fire multiple
times in a row. We also need a new event.
Now the list of events looks like this:
- `NetworkController:rpcEndpointInstanceDegraded`
- The same as `NetworkController:rpcEndpointDegraded` before.
- `NetworkController:rpcEndpointInstanceUnavailable`
- The same as `NetworkController:rpcEndpointInstanceDegraded` before.
- `NetworkController:rpcEndpointInstanceRetried`
- Renamed from `NetworkController:rpcEndpointRequestRetried`.
- `NetworkController:rpcEndpointDegraded`
- Similar to `NetworkController:rpcEndpointInstanceDegraded`, but
won't be published again if the RPC endpoint is already in a
degraded state.
- `NetworkController:rpcEndpointUnavailable`
- Published when all of the circuits underlying all of the URLs for an
RPC endpoint have broken (none of the URLs are available). Won't be
published again if the RPC endpoint is already in an unavailable
state.
- `NetworkController:rpcEndpointAvailable`
- A new event. Published the first time a successful request is made
to one of the URLs for an RPC endpoint, or following a degraded or
unavailable status.
…oller-rpc-endpoint-events
…oller-rpc-endpoint-events
NetworkController has the ability to analyze an RPC endpoint and capture its availability status (that is, whether the controller is able to make successful requests to the endpoint). However, this step occurs automatically only once, when the RPC endpoint's network is switched to, and so any changes in status while the network is being used will not be reflected in state. This problem can be mitigated by periodically calling `lookupNetwork` manually, but this is awkward, and usage of this method should be kept in check so as not to create too many requests. Ideally, the controller should keep track of network status as requests are made. This commit implements this change by hooking into events published by the network client and added in a previous commit.
f91d5e1 to
dabf9bd
Compare
|
@metamaskbot publish-previews |
|
Preview builds have been published. See these instructions for more information about preview builds. Expand for full list of packages and versions. |
81a987c to
2c35648
Compare
Add the same undefined check that exists in onBreak to ensure type safety and prevent publishing events with undefined error values.
Use CockatielFailureReason type instead of generic object type for better type safety and clarity.
Capture the chain status before calling service.request() to prevent spurious onBreak emissions. The onDegraded handler can fire synchronously during service.request() and change the status from Unavailable to Degraded before the catch block checks it, causing incorrect onBreak events when recovery attempts fail.
Revert the previous fix that captured previousStatus before the request. Checking the current status (this.#status) is correct because it accounts for status changes that may occur during the request from other services in the chain. The original check prevents duplicate onBreak emissions when the chain is already Unavailable.
The test 'calls onAvailable when a service becomes degraded by responding slowly, and then recovers' was not actually simulating a slow response, so it was only testing initial availability, not recovery from degraded state. Changes: - Add clock.tick(DEFAULT_DEGRADED_THRESHOLD + 1) to first mock to simulate slow response - Add onDegraded listener to verify degradation actually occurred - Add assertions to verify both onDegraded and onAvailable are called - Add assertion to verify call order (degradation before recovery)
…vents - Remove primaryEndpointUrl from event type definitions for onBreak, onDegraded, and onAvailable - Remove primaryEndpointUrl from event emissions in RpcServiceChain - Update event listener type signatures to not include primaryEndpointUrl - Update all test expectations to remove primaryEndpointUrl from assertions - Update create-network-client.ts to remove primaryEndpointUrl from event handlers - Note: onService* methods still include primaryEndpointUrl as they were not changed
- Remove endpointUrl from onBreak, onDegraded, and onAvailable events in RpcServiceChain - Update type definitions to exclude endpointUrl using ExcludeCockatielEventData - Update event emissions to exclude endpointUrl from chain-level events - Update NetworkController event types to remove endpointUrl from chain-level events (rpcEndpointChainDegraded, rpcEndpointChainAvailable, rpcEndpointChainUnavailable) - Update event handlers in create-network-client.ts to not destructure endpointUrl - Update all test assertions to remove endpointUrl from chain-level event expectations - Remove unused rpcUrl parameters from test functions - Align all chain-level events to not include endpointUrl (consistent with unavailable event)
- Change tertiaryEndpointUrl from 'https://second.endpoint' to 'https://third.endpoint'
…ate-network-status-live
- Document that NetworkController now automatically subscribes to chain-level RPC events - Updates network status metadata in real-time when events are published - Enables real-time network status updates without explicit lookupNetwork calls
| ); | ||
| }); | ||
|
|
||
| it('does not transition the status of a network client from "degraded" the first time a failover is activated but it does not return a 2xx response', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I found this difficult to understand because of the awkward "but" and the double-negative. Here's an attempt at simplifying:
| it('does not transition the status of a network client from "degraded" the first time a failover is activated but it does not return a 2xx response', async () => { | |
| it('does not transition the status of a network client from "degraded" the first time a failover is activated if it returns a non-2xx response', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed here e623211
| ); | ||
| }); | ||
|
|
||
| it('does not transition the status of a network client from "degraded" the first time a failover is activated but requests are slow to complete', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Another attempted simplification:
| it('does not transition the status of a network client from "degraded" the first time a failover is activated but requests are slow to complete', async () => { | |
| it('does not transition the status of a network client from "degraded" the first time a failover is activated if requests are slow to complete', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed here e623211
- Update test descriptions to use 'if' instead of 'but' for better readability - Make test intent clearer with more concise wording
I don't think leaving Plus I'd hesitate to remove it completely without finding a more elegant way of dealing with the EIP1559 compatibility check. Edit: Maybe in a future PR, we could consider updating |
Gudahtt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Yes, that’s correct, we previously discussed this with Elliot. The plan was to remove it in a future breaking change. Currently, it’s only used in Mobile for the banner, which I am in the process of removing and replacing through the new WIP |
Remove the force resolution for @metamask/network-controller and add ts-expect-error comment to handle type mismatch with NetworkStatus enum. The latest version (v27.0.0+) adds NetworkStatus.Degraded enum value which causes type mismatches in our codebase. See: MetaMask/core#7186
Explanation
NetworkController has the ability to analyze an RPC endpoint and capture its availability status (that is, whether the controller is able to make successful requests to the endpoint). However, this step occurs automatically only once, when the RPC endpoint's network is switched to, and so any changes in status while the network is being used will not be reflected in state. This problem can be mitigated by periodically calling
lookupNetworkmanually, but this is awkward, and usage of this method should be kept in check so as not to create too many requests.Ideally, the controller should keep track of network status as requests are made. This commit implements this change by hooking into network client events added in a previous commit.
Note that this PR does not remove
lookupNetworkor touch the existing behavior for this method. So with these changes there are now two strategies at play for updating the network status. This should be okay for the time being, although we should look to refactor this in the future.References
Progresses https://consensyssoftware.atlassian.net/browse/WPC-99.
Checklist
Note
Automatically updates network status metadata in response to chain-level RPC events; introduces Degraded status, refactors metadata update API, and adds comprehensive provider tests.
NetworkController:rpcEndpointChainUnavailable,...ChainDegraded, and...ChainAvailableto updatenetworksMetadata[networkClientId].statusin real time.#updateMetadataForNetworkto accept ametadataobject and updates all call sites (#lookupGivenNetwork,#lookupSelectedNetwork).NetworkStatus.Degradedand clarify enum semantics.tests/NetworkController.provider.test.tscovering status transitions (available ↔ degraded ↔ unavailable), retries, failover, and recovery.withControllerhelper totests/helpers.tsand remove inline duplicate.Written by Cursor Bugbot for commit e623211. This will update automatically on new commits. Configure here.