[bug] kill/crash conan process leaves a lock file containing count -1 #11033
Comments
It is a well known fact that a crash will leave the locks there, and a Yes, we have considered different alternatives. Any daemon-based solution has been completely discarded: it is extremely challenging to maintain such functionality, especially across platforms (Windows, Linux, Mac, Sun, FreeBSD...), and it is too much effort for the value. There are also some multi-platform issues involved, for example Windows reusing PIDs. In addition, many tasks can take a long time, from minutes to almost hours (like building a large library), which means that any approach based on timeouts will eventually fail for some users.
For Conan 2.0 we have completely removed concurrency for the moment. It was challenging enough to implement the multi-revision cache and get rid of the short-paths implementation. We will eventually go back to the cache concurrency issue, but most likely after 2.0 (what we call 2.X). We might try to leverage DB sync capabilities, or a new readers-writer implementation that apparently has been contributed to the fasteners library, which might get rid of the problem by managing it at the system level (instead of at the app level, which is what we are doing now).
Hi @memsharded! All of your points are valid and good ideas. You are correct that PIDs could get re-used. Let me know if I understood correctly the flow you are describing:
Wouldn't the following avoid all of these problems?
Even if we skip check No. 2 and only implement No. 3 (in case someone uses Conan other than via the command-line conan), we are just "making the fix incomplete", but we are still making things WAY better here. We would still identify an "abandoned lock" in the 99% of cases where the same PID has not yet been re-used by ANY process that is not us. For most busy machines (multiple builds per day/week), I think PID re-use will be a much rarer issue than "any killed conan in the past leaves locks forever". WDYT? P.S.
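To make the PID re-use concern concrete, here is a rough sketch of one way a lock could tell "same PID, same process" apart from "same PID, different process": record the owner's start time next to its PID. This is only an illustration and not necessarily one of the numbered checks mentioned above. The helper names (process_start_ticks, lock_owner_record, owner_still_running) and the record layout are hypothetical, none of this is Conan code, and it assumes Linux with a readable /proc.

```python
import os


def process_start_ticks(pid):
    """Return the start time of `pid` in clock ticks since boot, or None.

    Assumption: Linux with a readable /proc. On other systems, or if the
    process does not exist, this returns None.
    """
    try:
        with open(f"/proc/{pid}/stat", "rb") as f:
            data = f.read().decode("ascii", "replace")
    except OSError:
        return None
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces, so only split what follows the last ')'.
    fields = data[data.rfind(")") + 2:].split()
    # fields[0] is field 3 (state); starttime is field 22, i.e. index 19.
    return int(fields[19])


def lock_owner_record():
    """Hypothetical content a lock file could store about its owner."""
    pid = os.getpid()
    return {"pid": pid, "start": process_start_ticks(pid)}


def owner_still_running(record):
    """True only if the PID exists AND has the recorded start time."""
    current = process_start_ticks(record["pid"])
    if current is None:
        return False  # no such process (or /proc unavailable)
    return current == record["start"]
```

If the recorded PID has been handed to an unrelated process, its start time will differ and the lock can be treated as abandoned. A multi-platform version would need a different source for the start time, which is exactly the maintenance cost discussed above.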
Closing in favor of #15840 where the Conan 2 cache concurrency future progress will be reported.
Environment Details (include every applicable attribute)
17:21:17 Microsoft (R) Build Engine version 16.9.0+5e4b48a27 for .NET Framework
Steps to reproduce (Include if Applicable)
Logs (Executed commands with output) (Include/Attach if Applicable)
Basically - even running conan remove --locks doesn't help.
Is anyone aware of such a bug?
Executive summary
Conan dying mid-operation sometimes leaks lock files, and even conan remove --locks doesn't fix it (it is also too late anyway, since jobs stay stuck indefinitely until someone intervenes). I think a simple "pid stored in the lock file" can prevent such infinite deadlocks.
Bonus - we avoid the sketchy conan remove --locks command, which doesn't even help here. Our only workaround currently is to delete the lock file itself directly.
I'm thinking of the Linux Daemon style classic "If the pid is not alive - just take over the file".
WDYT?
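To illustrate what I mean by "if the pid is not alive - just take over the file", here is a minimal sketch. It is not how Conan's locks are implemented: the lock file format (just the owner's PID as text) and the names pid_alive and try_acquire are made up for the example. It is POSIX-only (os.kill(pid, 0) must not be used this way on Windows), it ignores PID re-use, and there is a small race if two processes detect the same stale lock at the same moment.

```python
import os


def pid_alive(pid):
    """POSIX-only probe: signal 0 checks for existence without signalling."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def try_acquire(lock_path):
    """Return True if this process now owns the lock at `lock_path`."""
    for _ in range(2):  # second pass runs only after removing a stale lock
        try:
            # O_EXCL makes creation atomic: exactly one creator succeeds.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                with open(lock_path) as f:
                    owner = int(f.read().strip())
            except (OSError, ValueError):
                return False  # unreadable or half-written: treat as held
            if pid_alive(owner):
                return False  # a live process holds the lock
            try:
                os.remove(lock_path)  # owner is gone: drop the stale lock
            except FileNotFoundError:
                pass  # someone else already cleaned it up
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return True
    return False
```

With something like this, a killed process would no longer block later jobs forever: the next caller sees that the recorded PID is dead and takes the lock over, without anyone having to delete the lock file by hand.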