Error while building tensorflow 0.11.0 - cache (directory not empty) #1970

Closed
rohan589 opened this Issue Oct 20, 2016 · 13 comments

rohan589 commented Oct 20, 2016

I'm trying to install tensorflow 0.11.0 by running

./configure

I get an error saying:

ERROR: /home/abc/.cache/bazel/_bazel_abc/235fe154e0/server (Directory not empty).

I'm not sure if they are related, but before the error message, I also get a warning saying:

WARNING: Output base '/home/abc/.cache/bazel/_bazel_abc/235fe154e0' is on NFS.     
This may lead to surprising failures and undetermined behavior.

I have no clue what the error message means, but if I try running ./configure right after this error message, I get another message saying:

/home/rkohli1/.cache/bazel/_bazel_rkohli1/235fe154e0a4c7e0c0527cd185fe6b6b/server/
.nfs00000000820050bd00000e9e (Device or resource busy).

At this point, I just tried deleting the entire .cache folder (I had to first kill a process which was preventing me from deleting it). I tried running configure with the --expunge_async flag as well but it doesn't help. It takes me back to the first error message.

Not sure if it's relevant, but I'm trying to install TensorFlow with GPU support, using CUDA 8.0 and cuDNN 5.

I raised this issue on Stack Overflow (http://stackoverflow.com/questions/40144776/tensorflow-installation-error-directory-not-empty), and someone pointed out that it's due to a bug in Bazel. Please correct me if I'm wrong.

zhang-jian Oct 21, 2016

I am also having the same issue at the moment.

elenacuoco Oct 21, 2016

Same error message also for me.

lematt1991 Oct 21, 2016

Also having the same issue.

zhang-jian Oct 21, 2016

Not sure if this is correct, but after I made the following change in the configure file:

function bazel_clean_and_fetch() {
  # bazel clean --expunge currently doesn't work on Windows
  # TODO(pcloudy): Re-enable it after bazel clean --expunge is fixed.
  if ! is_windows; then
    #bazel clean --expunge
    bazel clean --expunge_async
  fi
  bazel fetch //tensorflow/...
}

I can install tensorflow 0.11 from source, with

  • bazel 0.3.1
  • cuDNN 5
  • Cuda 8.0

Jian

ulfjack Dec 1, 2016

Contributor

+meteorcloudy

damienmg Dec 12, 2016

Contributor

Wait, there are several different issues conflated in this one:

  1. NFS mount points are known to be problematic. Use --output_base to move Bazel's cache directory off the NFS mount point.
  2. This is a known issue on Windows, and @meteorcloudy is working on fixing it IIUC.

Anyway, removing the Bazel cache should always fix the issue.

Closing this issue. Please open a specific one for your use case if neither of these cases applies to you.
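
To check whether point 1 applies before building, one option is to ask the filesystem which type backs Bazel's cache directory. The sketch below assumes GNU coreutils `stat` on Linux; the helper names `fs_type` and `is_on_nfs` are made up for illustration and are not part of Bazel or TensorFlow.

```shell
# Print the filesystem type that backs the path given as $1.
# Assumes GNU coreutils stat (-f: filesystem info, -c %T: type name).
fs_type() {
  stat -f -c %T "$1" 2>/dev/null
}

# Succeed if the path given as $1 is on an NFS mount.
is_on_nfs() {
  case "$(fs_type "$1")" in
    nfs*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Bazel's default cache location from the warning in this thread.
if is_on_nfs "$HOME/.cache/bazel"; then
  echo "Bazel cache is on NFS; consider moving it to a local disk."
fi
```

If the directory does not exist yet, `fs_type` prints nothing and the check simply reports "not NFS", so it may be safer to test `$HOME` itself on a fresh machine.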

@damienmg damienmg closed this Dec 12, 2016

AIROBOTAI Jan 17, 2017

Hi @damienmg, I encountered the same issue. As you suggested, I ran ./configure --output_base=/temp/cache_bazel; however, I still got the warning below during configuration:

WARNING: Output base '/home/AIJ/.cache/bazel/_bazel_AIJ/aa61f742fcd63eed03445cc6cf85534c' is on NFS. This may lead to surprising failures and undetermined behavior.

Does this mean the output base has not changed to the folder I specified, i.e. /temp/cache_bazel? And what should I do to make Bazel's cache directory local?

Thanks!

sfincke Jan 20, 2017

After having the same problems as @AIROBOTAI, I finally hacked ./configure into submission. In 'bazel_clean_and_fetch', I added '--output_base TARGET_DIRECTORY' to both 'bazel ... clean' and 'bazel ... fetch'.
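
A minimal sketch of what such an edit could look like, not the exact change made above. It assumes `--output_base` is passed as a Bazel startup option, that is, before the `clean` or `fetch` command. The `OUTPUT_BASE` and `BAZEL` variables and the `/tmp/bazel_output_base` path are hypothetical, and the Windows check from the original function is left out for brevity.

```shell
# Hypothetical local directory; any path on a non-NFS disk should do.
OUTPUT_BASE="${OUTPUT_BASE:-/tmp/bazel_output_base}"
# Bazel binary to call; overridable so the function can be dry-run.
BAZEL="${BAZEL:-bazel}"

bazel_clean_and_fetch() {
  # Startup options such as --output_base go before the command name.
  "$BAZEL" --output_base="$OUTPUT_BASE" clean --expunge_async
  "$BAZEL" --output_base="$OUTPUT_BASE" fetch //tensorflow/...
}

# Dry run that only prints the arguments Bazel would receive:
#   BAZEL=echo bazel_clean_and_fetch
```

Later `bazel build` calls would presumably need the same startup option, since a different output base means a different Bazel server and cache.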

yselivonchyk Jan 26, 2017

I was trying to build TF 0.12 from source with Bazel on NFS.

None of the suggestions above worked for me:

  1. editing the configure file and adding --output_base did not work for fetch
  2. everything from that thread resulted in the same NFS warning and the same issue with Bazel's cache

This solution seems to help:
http://stackoverflow.com/questions/40144776/tensorflow-installation-error-directory-not-empty

Solution:
edit the configure file and replace
bazel clean --expunge
with
bazel clean --expunge_async

PiranjaF Feb 24, 2017

I attempted both solutions suggested by @sfincke and @yselivonchyk, but without luck. Finally, I managed to change the cache location by running TEST_TMPDIR=/tmp/bazel/ ./configure, which solved the issue.

This environment variable sets the overall cache directory, as described here: https://bazel.build/versions/master/docs/output_directories.html.

yselivonchyk Apr 10, 2017

Update:

I tried the same thing again: compiling the latest version of TF with Bazel on an NFS file system, after changing the configure file to use "bazel clean --expunge_async".
It did not work. After some time, at a random step, the server just hangs. NFS consumes the full network capacity while the process does nothing. Most astonishingly, even kill -9 does not stop the process.

So I would not recommend doing that on NFS unless you are free to restart your servers.

I tried some Bazel commands to use a custom cache location, but that did not work either.

fvisin May 9, 2017

I confirm #1970 (comment) works for me as well!

PiranjaF May 26, 2017

For anyone still having issues with this, note that TEST_TMPDIR=/tmp/bazel/ should be set before every command related to compilation. For instance, bazel clean --expunge_async should become TEST_TMPDIR=/tmp/bazel/ bazel clean --expunge_async, using your chosen temp directory.
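
One way to avoid forgetting the prefix is a small wrapper, sketched here under the assumption that every build-related command is launched through it. The function name `with_local_cache` and the `LOCAL_BAZEL_TMP` variable are hypothetical; only `TEST_TMPDIR` itself comes from this thread.

```shell
# Hypothetical local scratch directory; any path on a non-NFS disk should do.
LOCAL_BAZEL_TMP="${LOCAL_BAZEL_TMP:-/tmp/bazel/}"

# Run the given command with TEST_TMPDIR pointing at the local directory.
# The subshell keeps the variable from leaking into the calling shell.
with_local_cache() {
  mkdir -p "$LOCAL_BAZEL_TMP"
  ( export TEST_TMPDIR="$LOCAL_BAZEL_TMP"; "$@" )
}

# Intended usage (commented out so the sketch runs without TensorFlow or Bazel):
#   with_local_cache ./configure
#   with_local_cache bazel clean --expunge_async
```

Alternatively, a plain `export TEST_TMPDIR=/tmp/bazel/` at the top of the shell session has the same effect for every later command in that session.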
