
Bias UDP send loop towards the shutdown signal #372

Merged
GeorgeHahn merged 3 commits into main from george/fix-udp-shutdown on Nov 2, 2022

Conversation

GeorgeHahn (Contributor)

What does this PR do?

We've observed an issue where the UDP generator will run forever, ignoring the shutdown signal. This PR biases the UDP generation task towards polling the shutdown signal to avoid this.
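
For context, a minimal sketch of the biased-select pattern, assuming a tokio watch channel stands in for lading's shutdown signal (the real shutdown type and loop body differ):

use tokio::{net::UdpSocket, sync::watch};

// Sketch only: `shutdown`, `sock`, and `block` are illustrative names.
// `biased;` makes select! poll branches in declaration order instead of at
// random, so the shutdown branch is checked on every iteration even though
// the send branch is effectively always ready.
async fn send_loop(
    sock: UdpSocket,
    block: &[u8],
    mut shutdown: watch::Receiver<bool>,
) -> std::io::Result<()> {
    loop {
        tokio::select! {
            biased;
            _ = shutdown.changed() => return Ok(()),
            res = sock.send(block) => {
                res?;
            }
        }
    }
}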

Motivation

Stop lading processes from spinning on UDP forever.

Related issues

N/A

Additional Notes

We may consider refactoring shutdown to abort async tasks rather than signaling and expecting cooperation.
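
As a sketch of what that could look like, assuming plain tokio task handles (the names below are hypothetical, not lading's current shutdown machinery):

// Hypothetical sketch: cancel the generator task via its JoinHandle
// instead of asking it to observe a signal cooperatively.
async fn abort_based_shutdown() {
    let generator = tokio::spawn(async {
        loop {
            // ... build and send a UDP block ...
            tokio::task::yield_now().await; // abort only lands at await points
        }
    });

    // On shutdown:
    generator.abort();
    // Awaiting the handle observes the cancellation.
    assert!(generator.await.unwrap_err().is_cancelled());
}

Worth noting: abort() only takes effect at an await point, so a loop that never yields (like the one this PR fixes) would defeat abort as well.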

@GeorgeHahn GeorgeHahn added the bug Something isn't working label Nov 1, 2022
@GeorgeHahn GeorgeHahn requested a review from a team November 1, 2022 04:10
GeorgeHahn (Contributor, Author)

This is pretty wild. Even rewritten as its own async task, the UDP sender causes other tasks on the runtime not to be polled. a699c70 works around the issue by adding an explicit yield point. This is on my radar to min-repro and file upstream after further investigation.


To repro with lading:

  • Set up a local lading target (I'm using an agent regression container)
  • Add a log output to the UDP send loop
  • Add an independent task that logs periodically (see the sketch after this list)
  • Test with & without the yield point
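
A hypothetical version of that periodic logging task (illustrative, not lading code); while the runtime is healthy it logs once a second, and it goes silent when the UDP loop starves the scheduler:

// Spawn alongside the generator under test.
tokio::spawn(async {
    let mut interval = tokio::time::interval(std::time::Duration::from_secs(1));
    loop {
        interval.tick().await;
        tracing::info!("heartbeat: runtime is still polling other tasks");
    }
});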

Here's the lading config I'm using:

generator:
  - udp:
      seed: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
             59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131]
      addr: "127.0.0.1:10000"
      variant: "json"
      bytes_per_second: "500 Mb"
      block_sizes: ["1Kb"]
      maximum_prebuild_cache_size_bytes: "256 Mb"

blackhole:
  - http:
      binding_addr: "127.0.0.1:9091"

blt (Collaborator) commented Nov 2, 2022

> Even rewritten as its own async task, the UDP sender causes other tasks on the runtime not to be polled. a699c70 works around the issue by adding an explicit yield point.

Dang. Getting upstream consideration of this would be excellent. I have wondered now and again whether we might profitably split lading's internals up so that each component runs on a single-threaded executor on top of an OS thread we manage from startup. We'd get better isolation, at least, for things like this. Non-trivial change, but it's crossed my mind more than once.

@@ -149,6 +149,7 @@ impl Udp {
                 "UDP packet too large (over 65507 B)"
             );
             if let Some(sock) = &connection {
+                tokio::task::yield_now().await;
blt (Collaborator)

I guess at-speed we always have bytes ready, considering the size of the max payload. Hrm.

GeorgeHahn (Contributor, Author)

What's wild to me is that we have cores - 1 tokio worker threads sitting idle while this is happening. This should be very interesting to min-repro and dig into.

My initial thought is that we only have one task active. Maybe tokio tries to squeeze in its scheduling at a low priority after workers go idle. That could lead to exactly this situation.
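
For what it's worth, a hypothetical shape for that minimal reproduction; whether it actually reproduces will depend on the tokio version and scheduler internals, so treat it as a starting point:

// Busy async loop whose await points are always immediately ready. Per the
// behavior observed in this PR, the heartbeat task can go unpolled until
// the send loop yields explicitly.
#[tokio::main]
async fn main() {
    tokio::spawn(async {
        loop {
            tokio::time::sleep(std::time::Duration::from_secs(1)).await;
            println!("heartbeat");
        }
    });

    let sock = tokio::net::UdpSocket::bind("127.0.0.1:0").await.unwrap();
    sock.connect("127.0.0.1:10000").await.unwrap();
    let buf = [0u8; 1024];
    loop {
        // tokio::task::yield_now().await; // the a699c70 workaround
        let _ = sock.send(&buf).await; // ignore ICMP-driven errors
    }
}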

@GeorgeHahn GeorgeHahn requested a review from blt November 2, 2022 16:10
GeorgeHahn (Contributor, Author)

@blt Sorry for the review re-request; I missed your comments earlier.

GeorgeHahn (Contributor, Author)

> Dang. Getting upstream consideration of this would be excellent. I have wondered now and again whether we might profitably split lading's internals up so that each component runs on a single-threaded executor on top of an OS thread we manage from startup. We'd get better isolation, at least, for things like this. Non-trivial change, but it's crossed my mind more than once.

This issue has me thinking about the same thing. Either a separate OS thread or another tokio runtime on its own pool of threads.

blt (Collaborator) commented Nov 2, 2022

> > Dang. Getting upstream consideration of this would be excellent. I have wondered now and again whether we might profitably split lading's internals up so that each component runs on a single-threaded executor on top of an OS thread we manage from startup. We'd get better isolation, at least, for things like this. Non-trivial change, but it's crossed my mind more than once.
>
> This issue has me thinking about the same thing. Either a separate OS thread or another tokio runtime on its own pool of threads.

There's a lot to be said for the performance of a shared-nothing program with its tasks broken down per thread, each running on a single OS thread, assuming we know the threads are dedicated to a busy bit of work.
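
One shape that idea could take, as a rough sketch (the spawn_component helper is hypothetical, not anything lading has today): each component's future runs on its own OS thread with a current-thread tokio runtime, so a non-yielding loop in one component cannot starve another component's tasks.

// Hypothetical helper: one OS thread plus one single-threaded runtime per
// component, trading work stealing for isolation.
fn spawn_component<F>(name: &str, fut: F) -> std::thread::JoinHandle<()>
where
    F: std::future::Future<Output = ()> + Send + 'static,
{
    std::thread::Builder::new()
        .name(name.to_owned())
        .spawn(move || {
            tokio::runtime::Builder::new_current_thread()
                .enable_all()
                .build()
                .expect("failed to build runtime")
                .block_on(fut);
        })
        .expect("failed to spawn thread")
}

// Usage: spawn_component("udp-generator", async move { /* send loop */ });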

@GeorgeHahn GeorgeHahn merged commit ac6c9e6 into main Nov 2, 2022
@GeorgeHahn GeorgeHahn deleted the george/fix-udp-shutdown branch November 2, 2022 16:29