Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move the metrics recorder to a blocking thread to avoid hangs #2201

Closed
Tracked by #3322 ...
teor2345 opened this issue May 26, 2021 · 2 comments
Closed
Tracked by #3322 ...

Move the metrics recorder to a blocking thread to avoid hangs #2201

teor2345 opened this issue May 26, 2021 · 2 comments
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug I-slow Problems with performance or responsiveness I-usability Zebra is hard to understand or use

Comments

@teor2345
Copy link
Contributor

teor2345 commented May 26, 2021

Scheduling

We should only do this ticket if we have fixed the other Zebra hang bugs, and Zebra still hangs frequently.

This is unlikely to be causing our hang issues, so we should do it last.

Is your feature request related to a problem? Please describe.

Zebra runs the metrics recorder on an async thread, which can block:

  • async code running concurrently in the same task, and
  • async code from other tasks on the same thread.

Describe the solution you'd like

Instead, Zebra should:

  • use tokio's spawn_blocking function to run this task, and
  • make all callers into async functions (or access the executor directly)

https://docs.rs/tokio/1.6.0/tokio/task/fn.spawn_blocking.html

Describe alternatives you've considered

We can use tokio's block_in_place function. This function only blocks concurrent code in the same task, which is better, but still not great.

We could also use block_in_place as we transition code.

Additional context

Blocking functions might be the source of hangs or slowdowns in the inbound service, or state service, or unrelated tasks.

@teor2345 teor2345 added C-enhancement Category: This is an improvement S-needs-triage Status: A bug report needs triage A-rust Area: Updates to Rust code C-bug Category: This is a bug I-slow Problems with performance or responsiveness I-usability Zebra is hard to understand or use P-Medium and removed C-enhancement Category: This is an improvement labels May 26, 2021
@teor2345
Copy link
Contributor Author

teor2345 commented Jun 2, 2021

@mpguerra this change is high-risk, we should only do it after NU5 activation, and after upgrading to tokio 1.0

@mpguerra mpguerra removed the S-needs-triage Status: A bug report needs triage label Jun 4, 2021
@teor2345 teor2345 added P-Low and removed P-Medium labels Jan 6, 2022
@teor2345 teor2345 changed the title Move the metrics recorder to a blocking thread Move the metrics recorder to a blocking thread to avoid hangs Jan 6, 2022
@teor2345 teor2345 added P-Optional and removed P-Low labels Jan 6, 2022
@teor2345
Copy link
Contributor Author

teor2345 commented Mar 1, 2022

This doesn't seem to cause problems in practice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rust Area: Updates to Rust code C-bug Category: This is a bug I-slow Problems with performance or responsiveness I-usability Zebra is hard to understand or use
Projects
None yet
Development

No branches or pull requests

2 participants