slave: use system rmdir on windows #540
Conversation
I certainly want to support Windows better. How did you determine that calling rmdir (at most twice) is more reliable than what we have now? The fact that you have to retry once suggests to me that it might be more reliable with your usage patterns, but not generally. Also, …
Hey Tom, I tried a couple of things. The main trouble there is that Windows, at the filesystem level, when deleting a file, actually only flips a flag, and only deletes the file once all file descriptors on it are closed. So, if a service (name any antivirus, for instance) is browsing your file system at that moment, rmdirRecursive will fail whatever you try, as removing the parent directory will not succeed, that directory not being empty. Trying another time will often succeed, as the service (or whatever was browsing at that time) will likely have moved on to another file in the meantime. We often had trouble with the clobber functionality of the Mercurial source step (on 64-bit); this, running for some months now, seems to be pretty reliable. Calling 'rmdir' in itself, instead of the previously complicated function, maybe does not really change things, but as we call 'rm -rf' on Linux, I thought we could better unify everything, as the current one did not work anyway. I did not look at trying to factor things; can you point me to some places where that could be done? Regards, On 1 Oct. 2012, at 21:46, Tom Prince notifications@github.com wrote: …
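The unification described here — calling the platform's own deletion tool on both systems — might look roughly like the sketch below. This is not the actual patch in the PR; the function name `rmdir_recursive` and the exact command invocations are assumptions for illustration.

```python
import subprocess
import sys

def rmdir_recursive(path):
    """Remove a directory tree via the platform's native tool.

    Hypothetical sketch of the unification described above:
    'rm -rf' on POSIX, 'rmdir /S /Q' on Windows. Returns the
    tool's exit code (0 on success).
    """
    if sys.platform == "win32":
        # /S removes the whole tree, /Q suppresses the confirmation prompt.
        cmd = ["cmd", "/c", "rmdir", "/S", "/Q", path]
    else:
        cmd = ["rm", "-rf", path]
    return subprocess.call(cmd)
```

The point of shelling out is that the OS tool already encodes each platform's quirks, rather than reimplementing the traversal in Python.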
Benoît Allard notifications@github.com writes:
Certainly retrying can cause things to succeed, but it isn't clear that …
How much of the improvement is from switching to rmdir, and how much is …
Tom
Ok, fine, I don't know either how much of the improvement comes from each modification. One of the troubles I had is that the old rmdirRecursive was simply crashing with an exception:
That shouldn't happen, so I thought that it was old and buggy and that I could simply replace it with rmdir, as it was anyway only used on Windows. Not trying to justify myself, just explaining my chain of thought. What's the way forward now?
There are lots of inconclusive discussions of this sort of thing around the web. In fact, some point to our implementation of rmdirRecursive. This code has been around for a while, and works well on a variety of Windows systems against a variety of common issues. I'd rather improve it to cover the failure case you're seeing than replace it and start that years-long refinement process over. The most obvious solution is to turn off virus scanning, or disable it for the slave's basedir. It might be worth adding this suggestion to the docs. Assuming that's impossible, I suspect that the solution here is delayed retries, retrying indefinitely as long as progress is being made. So, given the list of entries in a directory, keep a list of the entries for which you encounter an error, and re-try those, say, 0.1s and 1s later, while working on other entries in the interim.
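The fixed-delay retry idea for a single entry could be sketched as follows. This is a deliberately simplified, blocking version — a real implementation would keep deleting other entries while waiting instead of sleeping inline — and the function name and exact delay values are assumptions, not buildbot's actual API.

```python
import os
import time

# Delay values follow the suggestion above: retry at 0.1s, then 1s.
RETRY_DELAYS = (0.1, 1.0)

def remove_with_retries(path):
    """Try to delete one file, retrying after short fixed delays.

    Returns True if the file was removed, False if every attempt
    failed (e.g. another process still holds the file open).
    """
    attempts = [0.0] + list(RETRY_DELAYS)  # the first try has no delay
    for delay in attempts:
        time.sleep(delay)
        try:
            os.remove(path)
            return True
        except OSError:
            continue
    return False
```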
Just got the error on another slave (also Win7 x64), so I can show the full traceback:
This error is something we have been working on for a long time already. It was one of Harry's first assignments, to get used to buildbot before starting with real development. And he tried a couple of retry approaches before we decided to switch to the system rmdir. I just mean to say that this solution evolved a lot in the past. Enough about the past; let's look forward. @djmitche: You proposed to re-try with delay. Do you mean in a …
In this context, I don't think it's necessary to delay asynchronously, particularly if, as I suggested, other directory entries are deleted while waiting for the retry delay to complete. If the algorithm runs out of work, then a …

In a bit more detail, I'd design the algorithm as follows: first, convert the recursive algorithm to an iterative algorithm in the customary way, using a stack to hold incomplete work (in this case, directory entries that have yet to be deleted). Aside from this, keep a queue (perhaps a heap?) of delayed tasks, each tagged with the time of the next try and the number of retries. Before popping a work item off the stack (or maybe every 100th pop?), check the queue for timed-out delayed tasks, and perform any that are due. If the work stack is empty, but the queue is not, sleep until the next item in the queue is due. Note that when that item executes successfully, it may place more items on the stack. Each directory entry should have a small, finite number of retries (I suggested two, one at 0.1s and one at 1s, above). That way, the algorithm can perform lots of retries, without retrying a single obstinate file too many times.
Wow, step back for a minute. That's the best we can come up with for removing directories recursively on Windows? YUCK!
Using …
@tardyp - do you want to pick this back up? If not (and that's fine), let's close the pull req.
This can always be picked back up in a new pull request.
see also http://trac.buildbot.net/ticket/2885
And try twice if the first attempt does not work.
This seems to be the only way of getting rid of the annoying trouble when some files are not deleted, making the traversal cleanup fail.