-
Notifications
You must be signed in to change notification settings - Fork 375
fix(nr_large): use synchronous sockets with timeout and spawn_blocking #8547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(nr_large): use synchronous sockets with timeout and spawn_blocking #8547
Conversation
|
I think this solution should be reliable with enough blocking threads, i.e. if there are more blocking threads than there are nodes to connect to in parallel. Though, when looking at the definition of |
basvandijk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though, when looking at the definition of block_on, I see that we use a maximum of 16 blocking threads, even though the default is 512 and the docs explicitely say not to set this number too low. Is there a reason why we chose such a small number?
I don't know but I see it was introduced in 82380b3.
Thank you. Do you think it could be acceptable to bump this number to something like 64 or 128? Again, the default is 512. |
Done in a separate PR |
As mentioned in #8547 (comment), the `tokio` [docs](https://docs.rs/tokio/latest/tokio/runtime/struct.Builder.html#method.max_blocking_threads:~:text=It%E2%80%99s%20recommended%20to,.) explicitly say not to set the `max_blocking_threads` of a Runtime too low. So let's just use the default of 512. Considering that these threads are not always active it's fine to have more of them in case they're needed.
…8595) Similarly to #8547, this PR spawns the execution of the bash script itself as a blocking task since it also uses blocking I/O. Unfortunately, I could not simply write ```rust tokio::task::spawn_blocking(move || self.block_on_bash_script_from_session(&session, &script)) .await .expect("Executing bash script task panicked") ``` because I also had to clone `self`, which is not `Clone`. Removing the `self` parameter from `block_on_bash_script_from_session` (which is not used) would not be practical either as we would have to adapt every call site.
nr_largeis still flaky. The last fix was not enough to fix the problem. It looks like rebuilding a socket from scratch after a timeout was the correct approach (see #6823), so I reverted most changes of the last fix, and instead simply wrapped the blocking operation with a.spawn_blocking, because I suspect the test started to become flaky because some tasks started to block others.