slave: use system rmdir on windows #540

benallard · 2012-10-01T11:29:03Z

And try twice if the first time is not working.

This seems to be the only way of getting rid of annoying trouble when some files are not deleted, making the traversal cleanup fail.

…eded)

tomprince · 2012-10-01T19:46:48Z

I certainly want to support windows better.

How did you determine that calling rmdir (at most twice) is more reliable than what we have now. The fact that you have to retry once seems to me like it might be more reliable with your usage patterns, but not generally.

Also, rmdirRecursive is used in a few places on the slave, so it would be better to factor the logic into a single place.

benallard · 2012-10-01T21:38:37Z

Hey Tom,

I tried a couple of things, the main trouble there is that windows (on the fs level) when deleting a file, actually only flip a switch, that only delete the file when all the file descriptors are closed. So, if a service (name any antivirus for instance) is browsing your file system at this moment, the rmdirRecursive will fail whatever you try as removing the parent directory will not succeed, this one not being empty. Trying another time will often succeed as the service (or whatever was browsing at that time) will surely have changed file in the meantime.

We often had trouble with the clobber functionality of the mercurial source step (on 64-bits), this, running for some months now seems to be pretty reliable ... Calling 'rmdir' in itself, instead if the previously complicated function maybe does not really change things, but as we call 'rm -rf' on Linux, I thought we could better unify everything, as the current one did not worked anyway.

I did not looked at trying to factor things, can you point me to some places where that could be done ?

Regards,
Benoît.

On 1 okt. 2012, at 21:46, Tom Prince notifications@github.com wrote:

I certainly want to support windows better?

How did you determine that calling rmdir (at most twice) is more reliable than what we have now. The fact that you have to retry once seems to me like it might be more reliable with your usage patterns, but not generally.

Also, rmdirRecursive is used in a few places on the slave, so it would be better to factor the logic into a single place.

—
Reply to this email directly or view it on GitHub.

tomprince · 2012-10-03T01:53:06Z

Benoît Allard notifications@github.com writes:

I tried a couple of things, ...

Certainly retrying can cause things to succeed, but it isn't clear that
retrying once is better than retying multiple times. (I'm not suggesting
you rewrite this to retry multiple times.)

We often had trouble with the clobber functionality of the mercurial source step (on 64-bits), this, running for some months now seems to be pretty reliable ... Calling 'rmdir' in itself, instead if the previously complicated function maybe does not really change things, but as we call 'rm -rf' on Linux, I thought we could better unify everything, as the current one did not worked anyway.

How much of the improvement is from switching to rmdir, and how much is
from retrying? I worry that this might make things more reliable for
you, and less reliable for someone else. Without some analysis of why/if
this is better, I am relucant to change things around based on anecdotal
evidence.

I did not looked at trying to factor things, can you point me to some places where that could be done ?

git grep rmdirRecursive and git grep 'rm.*-rf'.

Tom

benallard · 2012-10-04T08:59:58Z

Ok, fine, I don't know either how much of the improvements come from each modification.

One of the trouble I had is that the old rmdirRecursive was simply crashing with an exception:

exceptions.WindowsError: [Error 145] The directory is not empty: u'c:\...'

That should'nt happen, so I thought that it was old and buggy ans that I could simply replace it with rmdir as it was anyway only used on windows. Not trying to justify myself, just explaining my chain of thought.

What's the way forward now ?

tomprince · 2012-10-04T22:28:59Z

It wouldn't suprise me if rmdir is more reliable, but I don't want to
just randomly change things (maybe @Callek can chime in here).

Retrying just feels like it is papering over the issue, but maybe that
is the best that can be done one windows. (@djmitche: Thoughts?)

djmitche · 2012-10-05T14:39:48Z

There are lots of inconclusive discussions of this sort of thing around the web. In fact, some point to our implementation of rmdirRecursive. One of the things rmdirRecursive gets right is setting permissions while removing. I don't think rmdir does so.

This code has been around for a while, and works well on a variety of Windows systems against a variety of common issues. I'd rather improve it to cover the failure case you're seeing, than replace it and start that years-long refinement process over.

The most obvious solution is to turn off virus scanning, or disable it for the slave's basedir. It might be worth adding this suggestion to the docs. Assuming that's impossible, I suspect that the solution here is delayed retries, retrying infinitely as long as progress is being made. So, given the list of entries in a directory, keep a list of the entries for which you encounter an error, and when you encountered that error, then re-try those, say, 0.1s and 1s later, while working on other entries in the interim.

benallard · 2012-10-24T08:57:58Z

Just got the error on another slave (also Win7 x64), so I can show the full traceback:

exception from rmdirRecursive
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 524, in __bootstrap
    self.__bootstrap_inner()
  File "C:\Python27\lib\threading.py", line 551, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
--- <exception caught here> ---
  File "C:\Python27\lib\site-packages\twisted\python\threadpool.py", line 167, in _worker
    result = context.call(ctx, function, *args, **kwargs)
  File "C:\Python27\lib\site-packages\twisted\python\context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "C:\Python27\lib\site-packages\twisted\python\context.py", line 81, in callWithContext
    return func(*args,**kw)
  File "C:\Python27\lib\site-packages\buildslave\commands\utils.py", line 91, in rmdirRecursive
    rmdirRecursive(full_name)
  File "C:\Python27\lib\site-packages\buildslave\commands\utils.py", line 91, in rmdirRecursive
    rmdirRecursive(full_name)
  File "C:\Python27\lib\site-packages\buildslave\commands\utils.py", line 91, in rmdirRecursive
    rmdirRecursive(full_name)
  File "C:\Python27\lib\site-packages\buildslave\commands\utils.py", line 91, in rmdirRecursive
    rmdirRecursive(full_name)
  File "C:\Python27\lib\site-packages\buildslave\commands\utils.py", line 91, in rmdirRecursive
    rmdirRecursive(full_name)
  File "C:\Python27\lib\site-packages\buildslave\commands\utils.py", line 91, in rmdirRecursive
    rmdirRecursive(full_name)
  File "C:\Python27\lib\site-packages\buildslave\commands\utils.py", line 91, in rmdirRecursive
    rmdirRecursive(full_name)
  File "C:\Python27\lib\site-packages\buildslave\commands\utils.py", line 91, in rmdirRecursive
    rmdirRecursive(full_name)
  File "C:\Python27\lib\site-packages\buildslave\commands\utils.py", line 91, in rmdirRecursive
    rmdirRecursive(full_name)
  File "C:\Python27\lib\site-packages\buildslave\commands\utils.py", line 91, in rmdirRecursive
    rmdirRecursive(full_name)
  File "C:\Python27\lib\site-packages\buildslave\commands\utils.py", line 96, in rmdirRecursive
    os.rmdir(dir)
exceptions.WindowsError: [Error 145] The directory is not empty: u'C:\\Users\\buildbot\\buildslave\\slave_win_inst_installer_pack\\build\\.hg\\store\\data\\long\\long\\long\\long\\long\\long\\directory'
program finished with exit code -1

This error is something we are working on for a long time already, It was one of Harry's first assignment to get used to buildbot before starting with real development ... And he tried a couple of retry stuff before we decide to switch to the system rmdir. Too bad we did not documented the progresses, but I think there is an old email where he asks for advices over there. Not saying he tried smart stuff though, he did not knew how to write a line of python at that time.

I just mean to say that this solution evolved a lot in the past.

Enough about the past, let's look forward ; @djmitche: You proposed to re-try with delay. Do you mean in a deffered way ? Are you thinking of something like callLater or such ? Can you elaborate ?

djmitche · 2012-10-30T00:42:57Z

In this context, I don't think it's necessary to delay asynchronously, particularly if, as I suggested, other directory entries are deleted while waiting for the retry delay to complete. If the algorithm runs out of work, then a sleep() is OK.

In a bit more detail, I'd design the algorithm as follows: first, convert the recursive algorithm to an iterative algorithm in the customary way, using a stack to hold incomplete work (in this case, directory entries that have yet to be deleted).

Aside from this, keep a queue (perhaps a heap?) of delayed tasks, each tagged with the time of the next try and the number of retries. Before popping a work item off the stack (or maybe every 100th pop?), check the queue for timed-out delayed tasks, and perform any that are due. If the work stack is empty, but the queue is not, sleep until the next item in the queue is due. Note that when that item executes successfully, it may place more items on the stack.

Each directory entry should have a small, finite number of retries (I suggested two, one at 0.1s and one at 1s, above). That way, the algorithm can perform lots of retries, without retrying a single obstinate file too many times.

djmitche · 2013-01-19T22:48:36Z

Wow, step back for a minute. That's the best we can come up with for removing directories recursively on Windows? YUCK!

tomprince · 2013-01-20T05:42:18Z

Using @inlineCallbacks and deferLater(reactor, delay, lambda: None) instead of sleep will prevent this stalling other builds.

djmitche · 2013-11-17T17:24:14Z

@tardyp - do you want to pick this back up? If not (and that's fine), let's close the pull req.

djmitche · 2014-04-07T01:25:07Z

This can always be picked back up in a new pull request..

djmitche · 2014-09-03T13:31:13Z

see also http://trac.buildbot.net/ticket/2885

Benoit Allard added 2 commits September 28, 2012 20:43

Rewrite the rmdir function on Windows: call system rmdir (twice if ne…

6762f45

…eded)

Merge remote-tracking branch 'upstream/master' into doublermdir

da7c590

thezoggy mentioned this pull request Apr 14, 2013

WindowsError is a subclass of OSError and need not be excepted midgetspy/Sick-Beard#654

Merged

djmitche closed this Apr 7, 2014

benallard deleted the doublermdir branch October 20, 2014 19:11

ciarancourtney mentioned this pull request Jul 24, 2019

[buildbot-worker] git clean may fail on windows with 'warning: failed to remove ***: Directory not empty' #4916

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

slave: use system rmdir on windows #540

slave: use system rmdir on windows #540

benallard commented Oct 1, 2012

tomprince commented Oct 1, 2012

benallard commented Oct 1, 2012

tomprince commented Oct 3, 2012

benallard commented Oct 4, 2012

tomprince commented Oct 4, 2012

djmitche commented Oct 5, 2012

benallard commented Oct 24, 2012

djmitche commented Oct 30, 2012

djmitche commented Jan 19, 2013

tomprince commented Jan 20, 2013

djmitche commented Nov 17, 2013

djmitche commented Apr 7, 2014

djmitche commented Sep 3, 2014

slave: use system rmdir on windows #540

slave: use system rmdir on windows #540

Conversation

benallard commented Oct 1, 2012

tomprince commented Oct 1, 2012

benallard commented Oct 1, 2012

tomprince commented Oct 3, 2012

benallard commented Oct 4, 2012

tomprince commented Oct 4, 2012

djmitche commented Oct 5, 2012

benallard commented Oct 24, 2012

djmitche commented Oct 30, 2012

djmitche commented Jan 19, 2013

tomprince commented Jan 20, 2013

djmitche commented Nov 17, 2013

djmitche commented Apr 7, 2014

djmitche commented Sep 3, 2014