
Make sure Zebra uses poll_ready and Buffer reservations correctly #1593

Closed
7 of 9 tasks
teor2345 opened this issue Jan 14, 2021 · 3 comments · Fixed by #1965
Labels
A-docs Area: Documentation C-design Category: Software design work

teor2345 commented Jan 14, 2021

Motivation

The constraints imposed by the tower::Buffer and tower::Batch implementations are:

  1. poll_ready must be called, and must return Poll::Ready, before each call
  2. Once we've reserved a buffer slot, we always get Poll::Ready from a buffer, regardless of the current readiness of the buffer or its underlying service
  3. The Buffer/Batch capacity limits the number of concurrently waiting tasks. Once this limit is reached, further tasks will block, awaiting a free reservation.
  4. Some tasks can depend on other tasks before they resolve. (For example: block validation.) If there are task dependencies, the Buffer/Batch capacity must be larger than the maximum number of concurrently waiting tasks, or Zebra could deadlock (hang).
  5. To avoid deadlocks, we should acquire slots in a consistent order, using the ready! macro, rather than polling them all simultaneously
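Constraints 1-3 can be illustrated with a toy reservation counter. This is a hypothetical, synchronous stand-in for tower::Buffer's slot semantics, not its real (async, channel-based) implementation:

```rust
use std::task::Poll;

// Minimal stand-in for a tower::Buffer's reservation slots
// (illustrative only; the real Buffer is async and channel-based).
struct MockBuffer {
    capacity: usize,
    reserved: usize,
}

impl MockBuffer {
    fn new(capacity: usize) -> Self {
        MockBuffer { capacity, reserved: 0 }
    }

    // Like Buffer::poll_ready: reserves a slot if one is free.
    fn poll_ready(&mut self) -> Poll<()> {
        if self.reserved < self.capacity {
            self.reserved += 1; // the slot is held until `call`
            Poll::Ready(())
        } else {
            Poll::Pending // further tasks wait for a free reservation
        }
    }

    // Like Buffer::call: consumes the reservation made by poll_ready.
    fn call(&mut self) {
        assert!(self.reserved > 0, "call without a poll_ready reservation");
        self.reserved -= 1;
    }
}

fn main() {
    let mut buffer = MockBuffer::new(2);
    assert_eq!(buffer.poll_ready(), Poll::Ready(()));
    assert_eq!(buffer.poll_ready(), Poll::Ready(()));
    // Capacity reached: a third task waits.
    assert_eq!(buffer.poll_ready(), Poll::Pending);
    // Using a reservation frees a slot for the waiting task.
    buffer.call();
    assert_eq!(buffer.poll_ready(), Poll::Ready(()));
}
```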

These constraints are mitigated by:

  • the timeouts on network messages, block downloads and block verification, which restart verification if it hangs
  • the Buffer/Batch reservation release when the response future is returned by the buffer/batch, even if that future doesn't complete
    • in general, we should move as much work into futures as possible, unless the design requires sequential calls
  • larger Buffer/Batch bounds

Tasks

Here's a deadlock avoidance example from #1735:

        // We acquire checkpoint readiness before block readiness, to avoid an unlikely
        // hang during the checkpoint to block verifier transition. If the checkpoint and
        // block verifiers are contending for the same buffer/batch, we want the checkpoint
        // verifier to win, so that checkpoint verification completes, and block verification
        // can start. (Buffers and batches have multiple slots, so this contention is unlikely.)
        use futures::ready;
        // The chain verifier holds one slot in each verifier, for each concurrent task.
        // Therefore, any shared buffers or batches polled by these verifiers should double
        // their bounds. (For example, the state service buffer.)
        ready!(self
            .checkpoint
            .poll_ready(cx)
            .map_err(VerifyChainError::Checkpoint))?;
        ready!(self.block.poll_ready(cx).map_err(VerifyChainError::Block))?;
        Poll::Ready(Ok(()))

Outdated

The rest of this ticket is outdated, but might be useful for writing the RFC.

Obsolete Tasks

  • Review existing Service implementations and calls, and prioritise these fixes. Check for code that:
    • uses Service or ServiceExt
    • calls poll_ready or call on a Service
  • Open separate tickets for each crate/Service/area of work
  • Turn this ticket text into a Zebra RFC, using examples from Fix poll_ready usage in ChainVerifier #1700 (obsoleted)

Motivation

Symptoms

Zebra sometimes hangs (#1435) for long periods of time. These hangs seem to happen in the peer set, but they could be caused by bugs elsewhere in Zebra.

Analysis

Zebra uses Tower services extensively. The tower::Buffer documentation says that every poll_ready and call should happen in a matched pair, or the Buffer will run out of reservation slots and hang:

When Buffer's implementation of poll_ready returns Poll::Ready, it reserves a slot in the channel for the forthcoming call. However, if this call doesn't arrive, this reserved slot may be held up for a long time.
https://docs.rs/tower/0.4.3/tower/buffer/struct.Buffer.html#method.new

But Zebra often calls Buffer::poll_ready (or ready_and), gets an Ok result, then never uses the Buffer reservation in a Buffer::call.

This bug can also be transitive:

  1. an Inner Zebra service correctly follows every Buffer::poll_ready with a Buffer::call
  2. an Outer Zebra service uses Inner::poll_ready, but doesn't follow it with an Inner::call
  3. when combined, the services have a bug: Outer::poll_ready reserves slots via the Inner service's Buffer::poll_ready, but it is never followed by an Inner::call, so the Buffer slot stays reserved, waiting for an Inner::call that never happens

This issue also affects Zebra's own tower-batch, which has a very similar implementation to tower::Buffer. See #1668 for details.

Requirements

Therefore, all Zebra services must follow poll_ready with call on all of the services they use, so that any Buffers or Batches lower down in the service stack are used correctly. (Since Zebra or its dependencies may add Buffers or Batches in future changes, this requirement applies even if they are not currently in the service stack.)

As of January 2021, Zebra often breaks these requirements using the following anti-patterns:

  • poll_ready:
    • check whether all the services we might use for any request are ready, by calling each service's poll_ready
    • if only some services are ready, we add a reservation to their Buffers, but return Poll::Pending
    • later, callers that received Poll::Pending retry our poll_ready
    • we add more Buffer reservations, without any calls to use those reservations, filling up the Buffers of the ready services
  • call:
    • only call the services used for this specific kind of request
    • if any services are unused, their Buffers fill up with reservations
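These anti-patterns can be made concrete with a toy simulation. The names and structure below are hypothetical, and the reservation counters stand in for tower::Buffers; the point is that reserving in every service while calling only one leaks slots:

```rust
use std::task::Poll;

// Toy reservation counters standing in for two buffered services
// (illustrative only; names and structure are hypothetical).
struct Slots { capacity: usize, reserved: usize }

impl Slots {
    fn new(capacity: usize) -> Self { Slots { capacity, reserved: 0 } }
    fn poll_ready(&mut self) -> Poll<()> {
        if self.reserved < self.capacity { self.reserved += 1; Poll::Ready(()) }
        else { Poll::Pending }
    }
    fn call(&mut self) { self.reserved -= 1; }
}

// Anti-pattern: reserve a slot in every service, but only call one.
fn handle_request(block_verifier: &mut Slots, tx_verifier: &mut Slots) -> Poll<()> {
    // poll_ready reserves a slot in *both* services...
    if block_verifier.poll_ready().is_pending() { return Poll::Pending; }
    if tx_verifier.poll_ready().is_pending() { return Poll::Pending; }
    // ...but this kind of request only needs the block verifier.
    block_verifier.call();
    Poll::Ready(())
}

fn main() {
    let mut blocks = Slots::new(10);
    let mut txs = Slots::new(3);
    for _ in 0..3 {
        assert_eq!(handle_request(&mut blocks, &mut txs), Poll::Ready(()));
    }
    // The unused service's buffer is now full of leaked reservations,
    // so the fourth request hangs (Pending forever).
    assert_eq!(handle_request(&mut blocks, &mut txs), Poll::Pending);
}
```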

Design

Overall Summary

Zebra uses 3 different design patterns for service readiness:

  • simple wrapper services forward poll_ready and call unconditionally
  • complex services use ServiceExt::oneshot to make sure poll_ready and call happen together
    • other code which calls into services also uses the oneshot pattern
  • if a complex service needs to transmit backpressure, it uses a ReadyCache

Simple Wrapper Services

Simple wrapper services can forward poll_ready and call when a Wrapper service:

  • only ever calls one Inner service, and
  • calls the Inner service:
    • immediately and unconditionally,
    • for every request, regardless of its content, and
    • for every request type.

We can't use this pattern if Inner service call happens:

  • in a returned future (which might be dropped or delayed), or
  • after code that returns early on errors - check for the ? operator.

Implementation:

  • Wrapper::poll_ready:
    • first, check any readiness conditions for the Wrapper service, returning immediately if it is not ready or errored
    • if the Wrapper service is ready, check the readiness of the Inner service and return the result, converting the error types as needed
  • Wrapper::call:
    • first, use the Inner::call method directly
    • add a comment saying that Inner::call must be called immediately, unconditionally, and regardless of any errors
    • to avoid code changes which add errors before Inner::call, wrap it in a function that can not return an error

Simple services transmit backpressure directly from Inner::poll_ready, so a Buffer is optional.
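A sketch of the forwarding pattern, using a simplified synchronous Service trait defined in the example itself (a stand-in for tower::Service, so the shape of the pattern is visible without the async machinery; the Inner and Wrapper types are hypothetical):

```rust
use std::task::Poll;

// Simplified, synchronous stand-in for tower::Service.
trait Service<Req> {
    type Response;
    fn poll_ready(&mut self) -> Poll<Result<(), String>>;
    fn call(&mut self, req: Req) -> Self::Response;
}

struct Inner;
impl Service<u32> for Inner {
    type Response = u32;
    fn poll_ready(&mut self) -> Poll<Result<(), String>> { Poll::Ready(Ok(())) }
    fn call(&mut self, req: u32) -> u32 { req + 1 }
}

// A simple wrapper: forwards poll_ready and call unconditionally.
struct Wrapper { inner: Inner }

impl Service<u32> for Wrapper {
    type Response = u32;
    fn poll_ready(&mut self) -> Poll<Result<(), String>> {
        // 1. Check the Wrapper's own readiness conditions here (none in
        //    this sketch), returning early if it is not ready or errored.
        // 2. Forward readiness to the Inner service.
        self.inner.poll_ready()
    }
    fn call(&mut self, req: u32) -> u32 {
        // Inner::call must happen immediately, unconditionally, and
        // before any code that could return early on errors.
        self.inner.call(req)
    }
}

fn main() {
    let mut svc = Wrapper { inner: Inner };
    assert!(matches!(svc.poll_ready(), Poll::Ready(Ok(()))));
    assert_eq!(svc.call(41), 42);
}
```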

Complex Services

We need to follow every poll_ready with an immediate call when an Outer service:

  • calls multiple inner services,
  • calls different inner services for different request contents,
  • calls different inner services for different request types,
  • calls inner services in a future (which can be dropped or delayed),
  • only calls an inner service under some conditions, or
  • calls an inner service after code that returns early on errors - check for the ? operator.

Implementation:

  • Outer::poll_ready:
    • check any readiness conditions for the Outer service and return the result
    • ignore readiness for inner services until we get the specific request in Outer::call
  • Outer::call:
    • use an inner service via service.clone().oneshot(request) (oneshot is from tower::ServiceExt)
    • the inner services need to be wrapped in Buffers so they are cloneable

When non-service components call into service code, they should also use this oneshot pattern. (Non-services can't transmit backpressure, so the other patterns can't be used for them).
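The oneshot pattern can be modelled synchronously. The `oneshot` helper below is a hypothetical stand-in for tower::ServiceExt::oneshot (which awaits readiness rather than looping), and the BufferedInner and outer_call names are invented for illustration; the point is that cloning and oneshot-ing makes the reservation and the call inseparable, even on conditional paths:

```rust
use std::task::Poll;

// Stand-in for a cloneable, buffered inner service.
#[derive(Clone)]
struct BufferedInner { factor: u32 }

impl BufferedInner {
    fn poll_ready(&mut self) -> Poll<Result<(), String>> { Poll::Ready(Ok(())) }
    fn call(&mut self, req: u32) -> Result<u32, String> { Ok(req * self.factor) }
}

// Stand-in for tower::ServiceExt::oneshot: waits for readiness, then
// calls immediately, so a reservation is never left dangling.
fn oneshot(mut svc: BufferedInner, req: u32) -> Result<u32, String> {
    loop {
        match svc.poll_ready() {
            Poll::Ready(Ok(())) => return svc.call(req),
            Poll::Ready(Err(e)) => return Err(e),
            Poll::Pending => continue, // a real implementation awaits here
        }
    }
}

// An Outer service that only uses the inner service for some requests.
fn outer_call(inner: &BufferedInner, req: u32) -> Result<u32, String> {
    if req == 0 {
        return Ok(0); // this branch never touches the inner service
    }
    oneshot(inner.clone(), req)
}

fn main() {
    let inner = BufferedInner { factor: 2 };
    assert_eq!(outer_call(&inner, 0), Ok(0));
    assert_eq!(outer_call(&inner, 21), Ok(42));
}
```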

Complex Backpressure

If a complex service needs to transmit backpressure from one or more inner services, it can:

  • wrap each inner service it wants to transmit backpressure from in a ReadyCache
  • Outer::poll_ready:
    • first, check any readiness conditions for the Outer service, returning immediately if it is not ready or errored
    • if the Outer service is ready, check the readiness of the backpressure inner services using ReadyCache::poll_pending and return the result, converting the error types as needed
  • Outer::call:
    • use backpressure inner services via ReadyCache::call_ready
    • use any other inner services via service.clone().oneshot(request)

Outer must not call ReadyCache::check_ready (or ReadyCache::check_ready_index), because they use poll_ready on services that are already ready. This ensures that each inner service has at most one outstanding Buffer reservation in the ReadyCache.

It's also possible to store multiple services in a ReadyCache. But Zebra services typically have inner services with different types, so this requires type erasure, which complicates the design.

See https://docs.rs/tower/0.4.3/tower/ready_cache/cache/struct.ReadyCache.html
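A toy model of the ReadyCache discipline described above (the CachedService type and poll_pending function are hypothetical simplifications, not tower's real API): only poll services that are not already ready, so each one holds at most one outstanding reservation.

```rust
use std::task::Poll;

// Toy buffered service that tracks its outstanding reservations.
struct CachedService { ready: bool, reserved: usize }

impl CachedService {
    fn new() -> Self { CachedService { ready: false, reserved: 0 } }
    fn poll_ready(&mut self) -> Poll<()> {
        self.reserved += 1; // each poll_ready takes a buffer slot
        self.ready = true;
        Poll::Ready(())
    }
    fn call(&mut self) { self.reserved -= 1; self.ready = false; }
}

// Like ReadyCache::poll_pending: skip services that are already ready,
// so re-polling never double-reserves.
fn poll_pending(services: &mut [CachedService]) -> Poll<()> {
    for svc in services.iter_mut().filter(|s| !s.ready) {
        if svc.poll_ready().is_pending() {
            return Poll::Pending;
        }
    }
    Poll::Ready(())
}

fn main() {
    let mut services = vec![CachedService::new(), CachedService::new()];
    // Polling twice does not double-reserve: ready services are skipped.
    assert!(poll_pending(&mut services).is_ready());
    assert!(poll_pending(&mut services).is_ready());
    assert!(services.iter().all(|s| s.reserved == 1));
    // Like call_ready: consuming the single reservation.
    services[0].call();
    assert_eq!(services[0].reserved, 0);
}
```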

Stop Using call_all

call_all leaks a buffer reservation every time the inner service returns Poll::Pending: https://github.com/tower-rs/tower/blob/master/tower/src/util/call_all/common.rs#L112

Buffers return Poll::Pending when they are full:
https://github.com/tower-rs/tower/blob/master/tower/src/buffer/service.rs#L119
https://github.com/tower-rs/tower/blob/master/tower/src/semaphore.rs#L55

So we should stop using call_all, because buffers can always return Poll::Pending.

For quick requests, we can just use a loop inside the returned future. Correctness and simplicity are more important than a possible small performance gain from running tasks concurrently (as long as they are quick).

If the requests are slow, or have dependencies on each other, we should spawn each request, and store their JoinHandles in a FuturesUnordered. We might also want to wrap each request future in a Timeout, so we don't hang on very slow tasks or missing dependencies.
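A synchronous sketch of the quick-request replacement for call_all (the Verifier type is a hypothetical stand-in with a reservation counter; in Zebra this loop would live inside the returned future and await readiness instead of spinning):

```rust
use std::task::Poll;

// Toy buffered service with a reservation counter.
struct Verifier { capacity: usize, reserved: usize }

impl Verifier {
    fn new(capacity: usize) -> Self { Verifier { capacity, reserved: 0 } }
    fn poll_ready(&mut self) -> Poll<()> {
        if self.reserved < self.capacity { self.reserved += 1; Poll::Ready(()) }
        else { Poll::Pending }
    }
    fn call(&mut self, req: u32) -> u32 { self.reserved -= 1; req + 1 }
}

// Instead of call_all: loop over the requests, pairing each
// poll_ready with an immediate call, so no reservation leaks.
fn process_all(svc: &mut Verifier, requests: &[u32]) -> Vec<u32> {
    let mut responses = Vec::new();
    for &req in requests {
        while svc.poll_ready().is_pending() {
            // a real future would yield here until the buffer has space
        }
        responses.push(svc.call(req));
    }
    responses
}

fn main() {
    let mut svc = Verifier::new(1);
    let responses = process_all(&mut svc, &[1, 2, 3]);
    assert_eq!(responses, vec![2, 3, 4]);
    // Every reservation was used by a matching call.
    assert_eq!(svc.reserved, 0);
}
```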

Buffers

A service should be provided wrapped in a Buffer if:

  • it is a complex service,
  • it has multiple callers, or
  • it has a single caller that calls it multiple times concurrently.

Services might also have other reasons for using a Buffer. These reasons should be documented.

Choosing Buffer Bounds

Zebra's Buffer bounds should be set to the maximum number of concurrent requests, plus 1:

it's advisable to set bound to be at least the maximum number of concurrent requests the Buffer will see
https://docs.rs/tower/0.4.3/tower/buffer/struct.Buffer.html#method.new

The extra slot protects us from future changes that add an extra caller, or extra concurrency.

As a general rule, Zebra Buffers should all have at least 3 slots (2 + 1), because most Zebra services can be called concurrently by:

  • the sync service, and
  • the inbound service.
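The bound rule above is just arithmetic, but stating it as a function makes the intent explicit (the function name is invented for illustration):

```rust
// Bound rule from this section: the maximum number of concurrent
// callers, plus one spare slot to protect against future changes
// that add an extra caller or extra concurrency.
fn buffer_bound(max_concurrent_callers: usize) -> usize {
    max_concurrent_callers + 1
}

fn main() {
    // Most Zebra services: the sync service plus the inbound service.
    assert_eq!(buffer_bound(2), 3);
}
```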

Services might also have other reasons for a larger bound. These reasons should be documented.

We should minimise Buffer lengths for services whose requests or responses contain Blocks (or other large data items, such as Transaction vectors). A long Buffer full of Blocks can significantly increase memory usage.

Long Buffers can also increase request latency. Latency isn't a concern for Zebra's core use case as node software, but it might be an issue if wallets, exchanges, or block explorers want to use Zebra.

Implementation Tasks

We should review every Zebra service, and all the code that calls Zebra services, and make sure it uses one of the design patterns listed in this ticket. In particular, we should search for the following strings:

  • poll_ready
  • ready_and
  • call
  • any other Service or ServiceExt functions that call poll_ready

We should stop using ServiceExt::call_all, because it has a bug that leaks service reservations, which can fill up buffers, causing hangs.

Underlying Abstraction Failure

These kinds of issues occur because Zebra wants:

  • a one-time setup for the service pipeline
  • with a single instance of each service
  • delivering reliable microservices in one process
  • with compile-time or runtime errors for service misuse
  • and monitoring tools for service health

But Tower's canonical use case is:

  • unreliable service requests over the network
  • discarding and re-launching failed services
  • with multiple instances of each service

And Tower's design and tooling is a work-in-progress:

  • it's easy to write complex service code - simplification APIs are hard to discover or missing
  • a lack of compile-time checks, runtime panics, or runtime tracing for service misuse
  • few monitoring tools for service health

In addition, Zebra's definition of service misuse doesn't quite match Tower's.

For example, we might want to panic rather than hanging if a service's buffer fills up. Typically, networked services are de-prioritised, restarted or replaced if they are slower than other services. But Zebra only creates a single instance of each service at startup.

Stream Abstraction

Stream is immune to these issues, because its poll_next function returns the underlying result when it is ready.

Open Questions

Batch Verification and Blocking Tasks

Do batch verification, CPU-bound services, and other potentially blocking tasks need larger Buffers?
Do the service layers above blocking services need larger Buffers, up to the point where concurrency is introduced?

For example, some desktop machines have 64 cores and 128 threads:
https://www.amd.com/en/products/cpu/amd-ryzen-threadripper-pro-3995wx

(I haven't seen any CPU-limiting issues on my 18-core / 36-thread machine, so this question isn't a high priority.)

Alternative Designs

Here are some other designs for readiness:

  • make all the Buffers bigger - just delays the hangs
  • submit a Request::Null to services we don't use - makes the API really messy
  • do complex readiness handling - makes it easier for unrelated changes to introduce subtle hang bugs
  • use ready_and().await?.call(request) - more complex than clone().oneshot(request), and both ways require a Buffer anyway
  • a complex service forwarding to a single service that is always used, without a ReadyCache - easier to get wrong
  • just drop and re-create services when they hang - Zebra isn't really designed like this, infrequent operations often lead to subtle bugs, and reloading services could interfere with client code, or lead to privacy leaks in network traffic (note that hangs can also lead to privacy leaks)

Here are some other designs for backpressure:

  • add an outer Buffer for backpressure - only works if blocking or slow tasks happen directly in the call, rather than the returned future, because the Buffer slot is released after the future is returned

Related Issues

This work is part of fixing the hangs in #1435.

Follow-Up Tasks

The content of this ticket should be turned into a design RFC, with example code based on the implementation of this ticket. But we need to check that it works first.

We should keep the current backpressure implementations as much as possible, but review them as a separate task (#1618).

@teor2345 teor2345 added C-bug Category: This is a bug A-rust Area: Updates to Rust code labels Jan 14, 2021
@teor2345 teor2345 added this to the 2021 Sprint 1 milestone Jan 14, 2021
@teor2345 teor2345 added this to No Estimate in Effort Affinity Grouping via automation Jan 14, 2021
@teor2345 teor2345 self-assigned this Jan 14, 2021
@teor2345 teor2345 added this to To Do in 🦓 via automation Jan 14, 2021
@teor2345 teor2345 moved this from No Estimate to M - 5 in Effort Affinity Grouping Jan 14, 2021
teor2345 added a commit to teor2345/zebra that referenced this issue Jan 22, 2021
Uses the `ServiceExt::oneshot` design pattern from ZcashFoundation#1593.
teor2345 added a commit to teor2345/zebra that referenced this issue Jan 25, 2021
ServiceExt::call_all leaks Tower::Buffer reservations, so we can't use
it in Zebra.

Instead, use a loop in the returned future.

See ZcashFoundation#1593 for details.
@mpguerra mpguerra modified the milestones: 2021 Sprint 1, 2021 Sprint 2 Jan 25, 2021
@mpguerra mpguerra linked a pull request Jan 26, 2021 that will close this issue
@teor2345

@mpguerra #1620 only fixes one instance of #1593; there could be more.

@teor2345 teor2345 removed a link to a pull request Jan 29, 2021
teor2345 added a commit that referenced this issue Feb 2, 2021
Uses the `ServiceExt::oneshot` design pattern from #1593.
teor2345 added a commit that referenced this issue Feb 2, 2021
ServiceExt::call_all leaks Tower::Buffer reservations, so we can't use
it in Zebra.

Instead, use a loop in the returned future.

See #1593 for details.
@teor2345 teor2345 added the C-design Category: Software design work label Feb 5, 2021
@teor2345

teor2345 commented Feb 5, 2021

The remaining work in this ticket is routine RFC documentation.

@teor2345

The remaining design work in this ticket can be done as part of any future sprint.

@mpguerra mpguerra removed the S-needs-triage Status: A bug report needs triage label Feb 18, 2021
@teor2345 teor2345 mentioned this issue Mar 30, 2021
@mpguerra mpguerra modified the milestones: 2021 Sprint 7, 2021 Sprint 6 Apr 7, 2021
@teor2345 teor2345 changed the title Make sure all poll_ready reservations are actually used in a call Make sure Zebra uses poll_ready and Buffer reservations correctly Apr 18, 2021
@mpguerra mpguerra removed this from the 2021 Sprint 7 milestone Apr 19, 2021
🦓 automation moved this from To Do to Done May 4, 2021
Effort Affinity Grouping automation moved this from S - 3 to Done May 4, 2021