Buck2 hammers remote execution as hard as it can (which is a good thing). In my testing I was running every service in the same process (which is not how it should be done in production). This hit the maximum number of open files and then deadlocked, because every thread was waiting for another thread to release a file.
This only happens when reading from one file and writing to another (i.e. CAS(file) -> worker(file)).
In testing Buck2, it turns out it hammers the scheduler with jobs
and assumes the scheduler can keep up (which is good). In local testing,
where all services run on the same machine, it deadlocked while
performing the following operations:
1. Open file1 for reading
2. Open file2 for writing
3. Stream file1 -> file2
Since we allow users to limit the number of files open at any given
time, this deadlocked: file1 was held open while waiting for file2
to open, which in turn was waiting for some file to be closed. With
Buck2 saturating the scheduler, every open-file slot was held by a
task waiting on another slot, so no file was ever closed.
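The failure mode above can be illustrated with a minimal permit counter standing in for the open-file limit (names like `OpenFileLimiter` are hypothetical, not the project's actual API). With a limit of 1, a copy task takes a permit for file1, then waits forever for a second permit for file2; since every task does the same, no permit is ever released:

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical permit counter standing in for a `max_open_files` limit.
#[derive(Clone)]
struct OpenFileLimiter {
    permits: Arc<Mutex<usize>>,
}

impl OpenFileLimiter {
    fn new(max_open_files: usize) -> Self {
        Self { permits: Arc::new(Mutex::new(max_open_files)) }
    }

    /// Non-blocking acquire: returns true if a permit was taken.
    fn try_acquire(&self) -> bool {
        let mut p = self.permits.lock().unwrap();
        if *p > 0 {
            *p -= 1;
            true
        } else {
            false
        }
    }

    /// Return a permit to the pool.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
    }
}

fn main() {
    let limiter = OpenFileLimiter::new(1);
    // Copy task opens file1 for reading: takes the only permit.
    assert!(limiter.try_acquire());
    // It now needs a second permit to open file2 for writing.
    // A blocking acquire here would wait forever, because the task
    // itself holds the permit it is waiting on -- the deadlock above.
    assert!(!limiter.try_acquire());
    // Only by releasing the first permit can progress be made.
    limiter.release();
    assert!(limiter.try_acquire());
}
```

The fix described below takes exactly this escape hatch: release the held file descriptor instead of blocking on the second one.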
In most production systems this is not an issue because the CAS
is separated from the workers, but it could still happen on the
workers if `max_open_files` is set too low.
To get around this issue, `ResumeableFileSlot` is introduced. It
lets callers use a timeout and call `.close_file()` on the slot,
and the next time the struct is used it will re-open the file.
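A minimal sketch of that idea, assuming the slot tracks its offset so a re-opened file resumes where it left off (the struct name matches the source; the field names and `ensure_open` helper are illustrative, not the actual implementation):

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::PathBuf;

/// Sketch of a file handle that can be closed mid-stream (freeing the
/// OS file descriptor) and transparently re-opened on next use.
struct ResumeableFileSlot {
    path: PathBuf,
    offset: u64,
    file: Option<File>,
}

impl ResumeableFileSlot {
    fn new(path: PathBuf) -> Self {
        Self { path, offset: 0, file: None }
    }

    /// Release the file descriptor so another task can open a file.
    fn close_file(&mut self) {
        self.file = None; // dropping the File closes the fd
    }

    /// Lazily (re)open the file and seek back to where we left off.
    fn ensure_open(&mut self) -> io::Result<&mut File> {
        if self.file.is_none() {
            let mut f = File::open(&self.path)?;
            f.seek(SeekFrom::Start(self.offset))?;
            self.file = Some(f);
        }
        Ok(self.file.as_mut().unwrap())
    }

    /// Read into `buf`, tracking the offset so a later re-open resumes.
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.ensure_open()?.read(buf)?;
        self.offset += n as u64;
        Ok(n)
    }
}
```

With this shape, a caller that times out waiting for a second file descriptor can `close_file()` its slot, let another task make progress, and resume the stream later instead of deadlocking.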
Related: #222. Closes #238.