Conversation

Contributor

@mcmire mcmire commented Nov 14, 2025

Explanation

In a future commit we will introduce changes to network-controller so that it will keep track of the status of each network as requests are made. These updates to createServicePolicy assist with that. See the changelog for more.

Besides this, the tests for createServicePolicy have been refactored slightly so that they are easier to maintain in the future.

References

Progresses https://consensyssoftware.atlassian.net/browse/WPC-99.

You can see how these changes will be used in the next PR: #7166

Checklist

  • I've updated the test suite for new or updated code as appropriate
  • I've updated documentation (JSDoc, Markdown, etc.) for new or updated code as appropriate
  • I've communicated my changes to consumers by updating changelogs for packages I've changed, highlighting breaking changes as necessary
  • I've prepared draft pull requests for clients and consumer packages to resolve any breaking changes

Note

Adds getCircuitState, onAvailable, and reset to ServicePolicy, exports Cockatiel types, and updates logic/tests to support availability tracking and circuit state introspection.

  • controller-utils:
    • ServicePolicy API:
      • Add getCircuitState() to expose underlying circuit state.
      • Add onAvailable event for first success and post-recovery success.
      • Add reset() to close the circuit and reset breaker counters.
    • Behavior/Internals:
      • Track availability status and emit onAvailable/onDegraded appropriately.
      • Update onBreak to mark unavailable; wire ConsecutiveBreaker for reset.
    • Exports:
      • Export CockatielEventEmitter and CockatielFailureReason; re-export via index.
    • Tests:
      • Expand/refactor tests to cover onAvailable, getCircuitState, reset, and timing cases; update export snapshot.
    • Docs:
      • Update CHANGELOG.md with new methods and exports.

Written by Cursor Bugbot for commit e597d0b.

@mcmire mcmire force-pushed the update-create-service-policy branch from 237b741 to 0a9dc27 on November 14, 2025 21:40
In a future commit we will introduce changes to `network-controller` so
that it will keep track of the status of each network as requests are
made. These updates to `createServicePolicy` assist with that.

See the changelog for a list of changes to the `ServicePolicy` API.

Besides the changes listed there, the tests for `createServicePolicy`
have been refactored slightly so that they are easier to maintain in
the future.
@mcmire mcmire force-pushed the update-create-service-policy branch from 0a9dc27 to cca8f81 on November 14, 2025 21:46
@mcmire mcmire marked this pull request as ready for review November 17, 2025 16:50
@mcmire mcmire requested a review from a team as a code owner November 17, 2025 16:50
Member

@Gudahtt Gudahtt left a comment

Looks great! I'll continue the review tomorrow; the only thing I have left is the tests.


const promise = policy.execute(mockService);
// It's safe not to await this promise; adding it to the promise queue
// is enough to prevent this test from running indefinitely.
Member

Nit: Not sure about this justification. Floating promises are problematic for reasons other than keeping the test process alive / leaking memory. They also can cause test failures in the un-awaited code to appear in later tests.

I see this comment is pre-existing in many tests though, so best not to address this in this PR anyway.

Contributor Author

Makes sense. This is an old pattern. In newer tests, I tend to use this pattern:

policy.onRetry(() => {
  clock.next();
});

When I rewrite this file I'll go through and make sure to use this pattern instead.
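A self-contained sketch of that pattern follows. `FakeClock` stands in for Sinon's fake timers and `executeWithRetries` is a stand-in for `policy.execute`; all names here are illustrative, not the actual API:

```typescript
// Sketch: instead of leaving the execute() promise floating, the test
// registers an onRetry listener that advances the (fake) clock past
// each backoff, so the returned promise can simply be awaited.
class FakeClock {
  private pending: Array<() => void> = [];

  schedule(callback: () => void): void {
    this.pending.push(callback);
  }

  next(): void {
    this.pending.shift()?.();
  }
}

async function executeWithRetries(
  service: () => Promise<string>,
  clock: FakeClock,
  maxRetries: number,
  onRetry: () => void,
): Promise<string> {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await service();
    } catch (error) {
      if (attempt >= maxRetries) {
        throw error;
      }
      // Backoff: wait until the clock fires the scheduled timer.
      const backoff = new Promise<void>((resolve) => clock.schedule(resolve));
      onRetry(); // the test's listener calls clock.next() here
      await backoff;
    }
  }
}
```

With the listener set to `() => clock.next()`, each retry's backoff resolves immediately and no promise is left floating, so failures surface in the test that caused them.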

@@ -215,6 +229,29 @@ describe('createServicePolicy', () => {

expect(onDegradedListener).not.toHaveBeenCalled();
});

it('does not call onAvailable listeners', async () => {
Member

Hmm. This behavior is a bit questionable actually. If a request "fails" but was filtered out by the policy, it means there is no outage: the service is working as intended (it "failed successfully") and is likely available.

Member

Similarly, it does seem fair to call the service degraded if it fails really slowly.

Member

I guess we can leave this change for a separate PR, if we want to make it, since it would be more of a departure from how it works today.

Contributor Author

Sure, we can consider this for another PR.

expect(onDegradedListener).toHaveBeenCalledWith({ error });
});

it('does not call onAvailable listeners', async () => {
Member

@Gudahtt Gudahtt Nov 18, 2025

Nit: These tests seem unnecessarily exhaustive, and this test case is a good example of this.

It's great that we're testing that the default number of retries works as expected, and that a custom number of retries works as expected. But why do we need to test this both with the default max consecutive failures and with a custom max consecutive failures? And do we really need to test all four combinations of these conditions with both an "always failing" service and a "failing repeatedly until it finally succeeds" service? Do we really expect one to fail if the other succeeds?

This unit test suite seems to be taking the extreme approach of covering every possible combination of conditions. This is certainly the only way to be absolutely certain the code works as expected, but we rarely write tests this exhaustive for good reason: it's really expensive/time-consuming.

It is enough to test expected behaviour once. We don't need to test that behaviour is identical across scenarios that don't impact that behaviour.

Contributor Author

Thank you for pointing this out. I agree that the exhaustive strategy I took with this file is hurting more than helping at this point. I think I just need to take a fresh approach and try to keep it simple. I'll think more about how I can do that.

describe('wrapping a service that succeeds at first and then fails enough to break the circuit', () => {
describe.each([
{
desc: `the default max number of consecutive failures (${DEFAULT_MAX_CONSECUTIVE_FAILURES})`,
Member

Nit: As mentioned in the other comment, I don't think we need to test the effectiveness of this option in every single test scenario.

Member

@Gudahtt Gudahtt left a comment

LGTM!

I had some suggestions for the test suite, but most of them were related to pre-existing patterns.

{
desc: 'a custom circuit break duration',
circuitBreakDuration: DEFAULT_CIRCUIT_BREAK_DURATION,
optionsWithCircuitBreakDuration: {

Bug: Circuit Break: Default Overrides Custom

The circuitBreakDuration variable is set to DEFAULT_CIRCUIT_BREAK_DURATION instead of 5_000, which contradicts the description "a custom circuit break duration" and will cause the test to use an incorrect wait time (30 minutes instead of 5 seconds) when calling clock.tick().
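The fix Bugbot is suggesting would look something like this (the `5_000` value comes from the report above; the surrounding keys are copied from the quoted test fixture):

```typescript
{
  desc: 'a custom circuit break duration',
  circuitBreakDuration: 5_000,
  optionsWithCircuitBreakDuration: {
    circuitBreakDuration: 5_000,
  },
},
```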


Contributor Author

Ugh, I thought I fixed this already. The report is correct, but the test still passes, so I'm inclined to leave it as-is for now.

@mcmire mcmire added this pull request to the merge queue Nov 18, 2025
Merged via the queue into main with commit 04fa775 Nov 18, 2025
485 of 539 checks passed
@mcmire mcmire deleted the update-create-service-policy branch November 18, 2025 16:02