
Make sure Zebra uses poll_ready and Buffer reservations correctly #1593

Closed
7 of 9 tasks
teor2345 opened this issue Jan 14, 2021 · 3 comments · Fixed by #1965
Labels
A-docs Area: Documentation C-design Category: Software design work

teor2345 commented Jan 14, 2021

Motivation

The constraints imposed by the tower::Buffer and tower::Batch implementations are:

  1. poll_ready must be called, and must return Poll::Ready, before each call
  2. Once we've reserved a buffer slot, we always get Poll::Ready from a buffer, regardless of the current readiness of the buffer or its underlying service
  3. The Buffer/Batch capacity limits the number of concurrently waiting tasks. Once this limit is reached, further tasks will block, awaiting a free reservation.
  4. Some tasks can depend on other tasks before they resolve. (For example: block validation.) If there are task dependencies, the Buffer/Batch capacity must be larger than the maximum number of concurrently waiting tasks, or Zebra could deadlock (hang).
  5. To avoid deadlocks, we should acquire slots in a consistent order, using the ready! macro, rather than polling them all simultaneously
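Constraints 1-3 can be illustrated with a toy reservation counter. This is a hypothetical, synchronous stand-in for tower::Buffer's slot semantics, not its real (async, channel-based) implementation:

```rust
use std::task::Poll;

// Minimal stand-in for a tower::Buffer's reservation slots
// (illustrative only; the real Buffer is async and channel-based).
struct MockBuffer {
    capacity: usize,
    reserved: usize,
}

impl MockBuffer {
    fn new(capacity: usize) -> Self {
        MockBuffer { capacity, reserved: 0 }
    }

    // Like Buffer::poll_ready: reserves a slot if one is free.
    fn poll_ready(&mut self) -> Poll<()> {
        if self.reserved < self.capacity {
            self.reserved += 1; // the slot is held until `call`
            Poll::Ready(())
        } else {
            Poll::Pending // further tasks wait for a free reservation
        }
    }

    // Like Buffer::call: consumes the reservation made by poll_ready.
    fn call(&mut self) {
        assert!(self.reserved > 0, "call without a poll_ready reservation");
        self.reserved -= 1;
    }
}

fn main() {
    let mut buffer = MockBuffer::new(2);
    assert_eq!(buffer.poll_ready(), Poll::Ready(()));
    assert_eq!(buffer.poll_ready(), Poll::Ready(()));
    // Capacity reached: a third task waits.
    assert_eq!(buffer.poll_ready(), Poll::Pending);
    // Using a reservation frees a slot for the waiting task.
    buffer.call();
    assert_eq!(buffer.poll_ready(), Poll::Ready(()));
}
```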

These constraints are mitigated by:

  • the timeouts on network messages, block downloads and block verification, which restart verification if it hangs
  • the Buffer/Batch reservation release when the response future is returned by the buffer/batch, even if that future doesn't complete
    • in general, we should move as much work into futures as possible, unless the design requires sequential calls
  • larger Buffer/Batch bounds

Tasks

Here's a deadlock avoidance example from #1735:

        // We acquire checkpoint readiness before block readiness, to avoid an unlikely
        // hang during the checkpoint to block verifier transition. If the checkpoint and
        // block verifiers are contending for the same buffer/batch, we want the checkpoint
        // verifier to win, so that checkpoint verification completes, and block verification
        // can start. (Buffers and batches have multiple slots, so this contention is unlikely.)
        use futures::ready;
        // The chain verifier holds one slot in each verifier, for each concurrent task.
        // Therefore, any shared buffers or batches polled by these verifiers should double
        // their bounds. (For example, the state service buffer.)
        ready!(self
            .checkpoint
            .poll_ready(cx)
            .map_err(VerifyChainError::Checkpoint))?;
        ready!(self.block.poll_ready(cx).map_err(VerifyChainError::Block))?;
        Poll::Ready(Ok(()))

Outdated

The rest of this ticket is outdated, but might be useful for writing the RFC.

Obsolete Tasks

  • Review existing Service implementations and calls, and prioritise these fixes. Check for code that:
    • uses Service or ServiceExt
    • calls poll_ready or call on a Service
  • Open separate tickets for each crate/Service/area of work
  • Turn this ticket text into a Zebra RFC, using examples from Fix poll_ready usage in ChainVerifier #1700 (obsoleted)

Motivation

Symptoms

Zebra sometimes hangs (#1435) for long periods of time. These hangs seem to happen in the peer set, but they could be caused by bugs elsewhere in Zebra.

Analysis

Zebra uses Tower services extensively. The tower::Buffer documentation says that every poll_ready and call should happen in a matched pair, or the Buffer will run out of reservation slots and hang:

When Buffer's implementation of poll_ready returns Poll::Ready, it reserves a slot in the channel for the forthcoming call. However, if this call doesn't arrive, this reserved slot may be held up for a long time.
https://docs.rs/tower/0.4.3/tower/buffer/struct.Buffer.html#method.new

But Zebra often calls Buffer::poll_ready (or ready_and), gets an Ok result, then never uses the Buffer reservation in a Buffer::call.

This bug can also be transitive:

  1. an Inner Zebra service correctly follows every Buffer::poll_ready with a Buffer::call
  2. an Outer Zebra service uses Inner::poll_ready, but doesn't follow it with an Inner::call
  3. when combined, the services have a bug: Outer::poll_ready reserves slots via the Inner service's Buffer::poll_ready, but it is never followed by an Inner::call, so the Buffer slot stays reserved, waiting for an Inner::call that never happens

This issue also affects Zebra's own tower-batch, which has a very similar implementation to tower::Buffer. See #1668 for details.

Requirements

Therefore, all Zebra services must follow poll_ready with call on all of the services they use, so that any Buffers or Batches lower down in the service stack are used correctly. (Since Zebra or its dependencies may add Buffers or Batches in future changes, this requirement applies even if they are not currently in the service stack.)

As of January 2021, Zebra often breaks these requirements using the following anti-patterns:

  • poll_ready:
    • check whether all the services we might use for any request are ready, by calling each service's poll_ready
    • if only some services are ready, we add a reservation to their Buffers, but return Poll::Pending
    • later, callers that received Poll::Pending retry our poll_ready
    • we add more Buffer reservations, without any calls to use those reservations, filling up the Buffers of the ready services
  • call:
    • only call the services used for this specific kind of request
    • if any services are unused, their Buffers fill up with reservations
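These anti-patterns can be made concrete with a toy simulation. The names and structure below are hypothetical, and the reservation counters stand in for tower::Buffers; the point is that reserving in every service while calling only one leaks slots:

```rust
use std::task::Poll;

// Toy reservation counters standing in for two buffered services
// (illustrative only; names and structure are hypothetical).
struct Slots { capacity: usize, reserved: usize }

impl Slots {
    fn new(capacity: usize) -> Self { Slots { capacity, reserved: 0 } }
    fn poll_ready(&mut self) -> Poll<()> {
        if self.reserved < self.capacity { self.reserved += 1; Poll::Ready(()) }
        else { Poll::Pending }
    }
    fn call(&mut self) { self.reserved -= 1; }
}

// Anti-pattern: reserve a slot in every service, but only call one.
fn handle_request(block_verifier: &mut Slots, tx_verifier: &mut Slots) -> Poll<()> {
    // poll_ready reserves a slot in *both* services...
    if block_verifier.poll_ready().is_pending() { return Poll::Pending; }
    if tx_verifier.poll_ready().is_pending() { return Poll::Pending; }
    // ...but this kind of request only needs the block verifier.
    block_verifier.call();
    Poll::Ready(())
}

fn main() {
    let mut blocks = Slots::new(10);
    let mut txs = Slots::new(3);
    for _ in 0..3 {
        assert_eq!(handle_request(&mut blocks, &mut txs), Poll::Ready(()));
    }
    // The unused service's buffer is now full of leaked reservations,
    // so the fourth request hangs (Pending forever).
    assert_eq!(handle_request(&mut blocks, &mut txs), Poll::Pending);
}
```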

Design

Overall Summary

Zebra uses 3 different design patterns for service readiness:

  • simple wrapper services forward poll_ready and call unconditionally
  • complex services use ServiceExt::oneshot to make sure poll_ready and call happen together
    • other code which calls into services also uses the oneshot pattern
  • if a complex service needs to transmit backpressure, it uses a ReadyCache

Simple Wrapper Services

Simple wrapper services can forward poll_ready and call when a Wrapper service:

  • only ever calls one Inner service, and
  • calls the Inner service:
    • immediately and unconditionally,
    • for every request, regardless of its content, and
    • for every request type.

We can't use this pattern if Inner service call happens:

  • in a returned future (which might be dropped or delayed), or
  • after code that returns early on errors - check for the ? operator.

Implementation:

  • Wrapper::poll_ready:
    • first, check any readiness conditions for the Wrapper service, returning immediately if it is not ready or errored
    • if the Wrapper service is ready, check the readiness of the Inner service and return the result, converting the error types as needed
  • Wrapper::call:
    • first, use the Inner::call method directly
    • add a comment saying that Inner::call must be called immediately, unconditionally, and regardless of any errors
    • to avoid code changes which add errors before Inner::call, wrap it in a function that can not return an error

Simple services transmit backpressure directly from Inner::poll_ready, so a Buffer is optional.
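A sketch of the forwarding pattern, using a simplified synchronous Service trait defined in the example itself (a stand-in for tower::Service, so the shape of the pattern is visible without the async machinery; the Inner and Wrapper types are hypothetical):

```rust
use std::task::Poll;

// Simplified, synchronous stand-in for tower::Service.
trait Service<Req> {
    type Response;
    fn poll_ready(&mut self) -> Poll<Result<(), String>>;
    fn call(&mut self, req: Req) -> Self::Response;
}

struct Inner;
impl Service<u32> for Inner {
    type Response = u32;
    fn poll_ready(&mut self) -> Poll<Result<(), String>> { Poll::Ready(Ok(())) }
    fn call(&mut self, req: u32) -> u32 { req + 1 }
}

// A simple wrapper: forwards poll_ready and call unconditionally.
struct Wrapper { inner: Inner }

impl Service<u32> for Wrapper {
    type Response = u32;
    fn poll_ready(&mut self) -> Poll<Result<(), String>> {
        // 1. Check the Wrapper's own readiness conditions here (none in
        //    this sketch), returning early if it is not ready or errored.
        // 2. Forward readiness to the Inner service.
        self.inner.poll_ready()
    }
    fn call(&mut self, req: u32) -> u32 {
        // Inner::call must happen immediately, unconditionally, and
        // before any code that could return early on errors.
        self.inner.call(req)
    }
}

fn main() {
    let mut svc = Wrapper { inner: Inner };
    assert!(matches!(svc.poll_ready(), Poll::Ready(Ok(()))));
    assert_eq!(svc.call(41), 42);
}
```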

Complex Services

We need to follow every poll_ready with an immediate call when an Outer service:

  • calls multiple inner services,
  • calls different inner services for different request contents,
  • calls different inner services for different request types,
  • calls inner services in a future (which can be dropped or delayed),
  • only calls an inner service under some conditions, or
  • calls an inner service after code that returns early on errors - check for the ? operator.

Implementation:

  • Outer::poll_ready:
    • check any readiness conditions for the Outer service and return the result
    • ignore readiness for inner services until we get the specific request in Outer::call
  • Outer::call:
    • use an inner service via service.clone().oneshot(request) (oneshot is from tower::ServiceExt)
    • the inner services need to be wrapped in Buffers so they are cloneable

When non-service components call into service code, they should also use this oneshot pattern. (Non-services can't transmit backpressure, so the other patterns can't be used for them).
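The oneshot pattern can be modelled synchronously. The `oneshot` helper below is a hypothetical stand-in for tower::ServiceExt::oneshot (which awaits readiness rather than looping), and the BufferedInner and outer_call names are invented for illustration; the point is that cloning and oneshot-ing makes the reservation and the call inseparable, even on conditional paths:

```rust
use std::task::Poll;

// Stand-in for a cloneable, buffered inner service.
#[derive(Clone)]
struct BufferedInner { factor: u32 }

impl BufferedInner {
    fn poll_ready(&mut self) -> Poll<Result<(), String>> { Poll::Ready(Ok(())) }
    fn call(&mut self, req: u32) -> Result<u32, String> { Ok(req * self.factor) }
}

// Stand-in for tower::ServiceExt::oneshot: waits for readiness, then
// calls immediately, so a reservation is never left dangling.
fn oneshot(mut svc: BufferedInner, req: u32) -> Result<u32, String> {
    loop {
        match svc.poll_ready() {
            Poll::Ready(Ok(())) => return svc.call(req),
            Poll::Ready(Err(e)) => return Err(e),
            Poll::Pending => continue, // a real implementation awaits here
        }
    }
}

// An Outer service that only uses the inner service for some requests.
fn outer_call(inner: &BufferedInner, req: u32) -> Result<u32, String> {
    if req == 0 {
        return Ok(0); // this branch never touches the inner service
    }
    oneshot(inner.clone(), req)
}

fn main() {
    let inner = BufferedInner { factor: 2 };
    assert_eq!(outer_call(&inner, 0), Ok(0));
    assert_eq!(outer_call(&inner, 21), Ok(42));
}
```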

Complex Backpressure

If a complex service needs to transmit backpressure from one or more inner services, it can:

  • wrap each inner service it wants to transmit backpressure from in a ReadyCache
  • Outer::poll_ready:
    • first, check any readiness conditions for the Outer service, returning immediately if it is not ready or errored
    • if the Outer service is ready, check the readiness of the backpressure inner services using ReadyCache::poll_pending and return the result, converting the error types as needed
  • Outer::call:
    • use backpressure inner services via ReadyCache::call_ready
    • use any other inner services via service.clone().oneshot(request)

Outer must not call ReadyCache::check_ready (or ReadyCache::check_ready_index), because they use poll_ready on services that are already ready. This ensures that each inner service has at most one outstanding Buffer reservation in the ReadyCache.

It's also possible to store multiple services in a ReadyCache. But Zebra services typically have inner services with different types, so this requires type erasure, which complicates the design.

See https://docs.rs/tower/0.4.3/tower/ready_cache/cache/struct.ReadyCache.html
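A toy model of the ReadyCache discipline described above (the CachedService type and poll_pending function are hypothetical simplifications, not tower's real API): only poll services that are not already ready, so each one holds at most one outstanding reservation.

```rust
use std::task::Poll;

// Toy buffered service that tracks its outstanding reservations.
struct CachedService { ready: bool, reserved: usize }

impl CachedService {
    fn new() -> Self { CachedService { ready: false, reserved: 0 } }
    fn poll_ready(&mut self) -> Poll<()> {
        self.reserved += 1; // each poll_ready takes a buffer slot
        self.ready = true;
        Poll::Ready(())
    }
    fn call(&mut self) { self.reserved -= 1; self.ready = false; }
}

// Like ReadyCache::poll_pending: skip services that are already ready,
// so re-polling never double-reserves.
fn poll_pending(services: &mut [CachedService]) -> Poll<()> {
    for svc in services.iter_mut().filter(|s| !s.ready) {
        if svc.poll_ready().is_pending() {
            return Poll::Pending;
        }
    }
    Poll::Ready(())
}

fn main() {
    let mut services = vec![CachedService::new(), CachedService::new()];
    // Polling twice does not double-reserve: ready services are skipped.
    assert!(poll_pending(&mut services).is_ready());
    assert!(poll_pending(&mut services).is_ready());
    assert!(services.iter().all(|s| s.reserved == 1));
    // Like call_ready: consuming the single reservation.
    services[0].call();
    assert_eq!(services[0].reserved, 0);
}
```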

Stop Using call_all

call_all leaks a buffer reservation every time the inner service returns Poll::Pending: https://github.com/tower-rs/tower/blob/master/tower/src/util/call_all/common.rs#L112

Buffers return Poll::Pending when they are full:
https://github.com/tower-rs/tower/blob/master/tower/src/buffer/service.rs#L119
https://github.com/tower-rs/tower/blob/master/tower/src/semaphore.rs#L55

So we should stop using call_all, because buffers can always return Poll::Pending.

For quick requests, we can just use a loop inside the returned future. Correctness and simplicity are more important than a possible small performance gain from running tasks concurrently (as long as they are quick).

If the requests are slow, or have dependencies on each other, we should spawn each request, and store their JoinHandles in a FuturesUnordered. We might also want to wrap each request future in a Timeout, so we don't hang on very slow tasks or missing dependencies.
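A synchronous sketch of the quick-request replacement for call_all (the Verifier type is a hypothetical stand-in with a reservation counter; in Zebra this loop would live inside the returned future and await readiness instead of spinning):

```rust
use std::task::Poll;

// Toy buffered service with a reservation counter.
struct Verifier { capacity: usize, reserved: usize }

impl Verifier {
    fn new(capacity: usize) -> Self { Verifier { capacity, reserved: 0 } }
    fn poll_ready(&mut self) -> Poll<()> {
        if self.reserved < self.capacity { self.reserved += 1; Poll::Ready(()) }
        else { Poll::Pending }
    }
    fn call(&mut self, req: u32) -> u32 { self.reserved -= 1; req + 1 }
}

// Instead of call_all: loop over the requests, pairing each
// poll_ready with an immediate call, so no reservation leaks.
fn process_all(svc: &mut Verifier, requests: &[u32]) -> Vec<u32> {
    let mut responses = Vec::new();
    for &req in requests {
        while svc.poll_ready().is_pending() {
            // a real future would yield here until the buffer has space
        }
        responses.push(svc.call(req));
    }
    responses
}

fn main() {
    let mut svc = Verifier::new(1);
    let responses = process_all(&mut svc, &[1, 2, 3]);
    assert_eq!(responses, vec![2, 3, 4]);
    // Every reservation was used by a matching call.
    assert_eq!(svc.reserved, 0);
}
```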

Buffers

A service should be provided wrapped in a Buffer if:

  • it is a complex service,
  • it has multiple callers, or
  • it has a single caller that calls it multiple times concurrently.

Services might also have other reasons for using a Buffer. These reasons should be documented.

Choosing Buffer Bounds

Zebra's Buffer bounds should be set to the maximum number of concurrent requests, plus 1:

it's advisable to set bound to be at least the maximum number of concurrent requests the Buffer will see
https://docs.rs/tower/0.4.3/tower/buffer/struct.Buffer.html#method.new

The extra slot protects us from future changes that add an extra caller, or extra concurrency.

As a general rule, Zebra Buffers should all have at least 3 slots (2 + 1), because most Zebra services can be called concurrently by:

  • the sync service, and
  • the inbound service.
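The bound rule above is just arithmetic, but stating it as a function makes the intent explicit (the function name is invented for illustration):

```rust
// Bound rule from this section: the maximum number of concurrent
// callers, plus one spare slot to protect against future changes
// that add an extra caller or extra concurrency.
fn buffer_bound(max_concurrent_callers: usize) -> usize {
    max_concurrent_callers + 1
}

fn main() {
    // Most Zebra services: the sync service plus the inbound service.
    assert_eq!(buffer_bound(2), 3);
}
```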

Services might also have other reasons for a larger bound. These reasons should be documented.

We should minimise Buffer lengths for services whose requests or responses contain Blocks (or other large data items, such as Transaction vectors). A long Buffer full of Blocks can significantly increase memory usage.

Long Buffers can also increase request latency. Latency isn't a concern for Zebra's core use case as node software, but it might be an issue if wallets, exchanges, or block explorers want to use Zebra.

Implementation Tasks

We should review every Zebra service, and all the code that calls Zebra services, and make sure it uses one of the design patterns listed in this ticket. In particular, we should search for the following strings:

  • poll_ready
  • ready_and
  • call
  • any other Service or ServiceExt functions that call poll_ready

We should stop using ServiceExt::call_all, because it has a bug that leaks service reservations, which can fill up buffers, causing hangs.

Underlying Abstraction Failure

These kinds of issues occur because Zebra wants:

  • a one-time setup for the service pipeline
  • with a single instance of each service
  • delivering reliable microservices in one process
  • with compile-time or runtime errors for service misuse
  • and monitoring tools for service health

But Tower's canonical use case is:

  • unreliable service requests over the network
  • discarding and re-launching failed services
  • with multiple instances of each service

And Tower's design and tooling is a work-in-progress:

  • it's easy to write complex service code - simplification APIs are hard to discover or missing
  • a lack of compile-time checks, runtime panics, or runtime tracing for service misuse
  • few monitoring tools for service health

In addition, Zebra's definition of service misuse doesn't quite match Tower's.

For example, we might want to panic rather than hanging if a service's buffer fills up. Typically, networked services are de-prioritised, restarted or replaced if they are slower than other services. But Zebra only creates a single instance of each service at startup.

Stream Abstraction

Stream is immune to these issues, because its poll_next function returns the underlying result when it is ready.

Open Questions

Batch Verification and Blocking Tasks

Do batch verification, CPU-bound services, and other potentially blocking tasks need larger Buffers?
Do the service layers above blocking services need larger Buffers, up to the point where concurrency is introduced?

For example, some desktop machines have 64 cores and 128 threads:
https://www.amd.com/en/products/cpu/amd-ryzen-threadripper-pro-3995wx

(I haven't seen any CPU-limiting issues on my 18-core / 36-thread machine, so this question isn't a high priority.)

Alternative Designs

Here are some other designs for readiness:

  • make all the Buffers bigger - just delays the hangs
  • submit a Request::Null to services we don't use - makes the API really messy
  • do complex readiness handling - makes it easier for unrelated changes to introduce subtle hang bugs
  • use ready_and().await?.call(request) - more complex than clone().oneshot(request), and both ways require a Buffer anyway
  • a complex service forwarding to a single service that is always used, without a ReadyCache - easier to get wrong
  • just drop and re-create services when they hang - Zebra isn't really designed like this, infrequent operations often lead to subtle bugs, and reloading services could interfere with client code, or lead to privacy leaks in network traffic (note that hangs can also lead to privacy leaks)

Here are some other designs for backpressure:

  • add an outer Buffer for backpressure - only works if blocking or slow tasks happen directly in the call, rather than the returned future, because the Buffer slot is released after the future is returned

Related Issues

This work is part of fixing the hangs in #1435.

Follow-Up Tasks

The content of this ticket should be turned into a design RFC, with example code based on the implementation of this ticket. But we need to check that it works first.

We should keep the current backpressure implementations as much as possible, but review them as a separate task (#1618).

@teor2345 teor2345 added C-bug Category: This is a bug A-rust Area: Updates to Rust code labels Jan 14, 2021
@teor2345 teor2345 added this to the 2021 Sprint 1 milestone Jan 14, 2021
@teor2345 teor2345 added this to No Estimate in Effort Affinity Grouping via automation Jan 14, 2021
@teor2345 teor2345 self-assigned this Jan 14, 2021
@teor2345 teor2345 added this to To Do in 🦓 via automation Jan 14, 2021
@teor2345 teor2345 moved this from No Estimate to M - 5 in Effort Affinity Grouping Jan 14, 2021
teor2345 added a commit to teor2345/zebra that referenced this issue Jan 22, 2021
Uses the `ServiceExt::oneshot` design pattern from ZcashFoundation#1593.
teor2345 added a commit to teor2345/zebra that referenced this issue Jan 25, 2021
ServiceExt::call_all leaks Tower::Buffer reservations, so we can't use
it in Zebra.

Instead, use a loop in the returned future.

See ZcashFoundation#1593 for details.
@mpguerra mpguerra modified the milestones: 2021 Sprint 1, 2021 Sprint 2 Jan 25, 2021
@mpguerra mpguerra linked a pull request Jan 26, 2021 that will close this issue
@teor2345

@mpguerra #1620 only fixes one instance of #1593; there could be more.

@teor2345 teor2345 removed a link to a pull request Jan 29, 2021
teor2345 added a commit that referenced this issue Feb 2, 2021
Uses the `ServiceExt::oneshot` design pattern from #1593.
teor2345 added a commit that referenced this issue Feb 2, 2021
ServiceExt::call_all leaks Tower::Buffer reservations, so we can't use
it in Zebra.

Instead, use a loop in the returned future.

See #1593 for details.
@teor2345 teor2345 added the C-design Category: Software design work label Feb 5, 2021
@teor2345

teor2345 commented Feb 5, 2021

The remaining work in this ticket is routine RFC documentation.

@teor2345

The remaining design work in this ticket can be done as part of any future sprint.

@mpguerra mpguerra removed the S-needs-triage Status: A bug report needs triage label Feb 18, 2021
@teor2345 teor2345 mentioned this issue Mar 30, 2021
@mpguerra mpguerra modified the milestones: 2021 Sprint 7, 2021 Sprint 6 Apr 7, 2021
@teor2345 teor2345 changed the title Make sure all poll_ready reservations are actually used in a call Make sure Zebra uses poll_ready and Buffer reservations correctly Apr 18, 2021
@mpguerra mpguerra removed this from the 2021 Sprint 7 milestone Apr 19, 2021
🦓 automation moved this from To Do to Done May 4, 2021
Effort Affinity Grouping automation moved this from S - 3 to Done May 4, 2021