
Errno::ENOENT - No such file or directory - bs_fetch:atomic_write_cache_file:rename #177

Closed
Ferdy89 opened this issue Jul 12, 2018 · 20 comments

@Ferdy89 Ferdy89 commented Jul 12, 2018

Hi! We've recently started to use Bootsnap (1.3.0) and we just noticed this error pop up in our Sentry account for our staging server:

Errno::ENOENT
No such file or directory - bs_fetch:atomic_write_cache_file:rename

bootsnap/compile_cache/iseq.rb in fetch at line 37
bootsnap/compile_cache/iseq.rb in load_iseq at line 37
bootsnap/load_path_cache/core_ext/kernel_require.rb in require at line 21
bootsnap/load_path_cache/core_ext/kernel_require.rb in block in require_with_bootsnap_lfi at line 21
bootsnap/load_path_cache/loaded_features_index.rb in register at line 65
bootsnap/load_path_cache/core_ext/kernel_require.rb in require_with_bootsnap_lfi at line 20
bootsnap/load_path_cache/core_ext/kernel_require.rb in require at line 29

Both the caller and the required file are from gems. We run Rails on a single-threaded/single-process Linux environment inside Docker. So far it has only happened once.

I don't have enough knowledge about Bootsnap to dive deeper into the issue, but I'm happy to provide any other information I can. However, we haven't been able to reproduce it.

Thank you!

@burke burke (Member) commented Mar 5, 2019

(sorry for the extreme delay in response)

I'm looking through the code and there are two reasons I can think of for this.

Basically what's happening is:

  1. We want to write out /a/b/c, so we write out /a/b/c.tmp.12384, creating /a/b if necessary.
  2. We move /a/b/c.tmp.12384 to /a/b/c.

The error (ENOENT) is coming from step 2.

This means one of two things:

  1. /a/b/c.tmp.12384 didn't exist; or
  2. /a/b didn't exist.

The two reasons I can think of for this happening are:

  1. A path over 1000 characters long made us create the temporary path incorrectly; or
  2. The tempfile was somehow deleted during the execution of this function.
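The write-then-rename pattern described above can be sketched in plain Ruby (a minimal illustration only; bootsnap's real implementation is the C code in bs_fetch, and the path names here are hypothetical):

```ruby
require "fileutils"

# Minimal sketch of an atomic cache write: write to a temporary sibling
# first, then rename it into place.
def atomic_write(path, data)
  FileUtils.mkdir_p(File.dirname(path))  # step 1: create /a/b if necessary
  tmp = "#{path}.tmp.#{Process.pid}"     # temporary sibling, e.g. /a/b/c.tmp.12384
  File.binwrite(tmp, data)               # write contents to the temp file
  File.rename(tmp, path)                 # step 2: rename is atomic on POSIX
end
```

If either the temp file or the parent directory disappears between the two steps, the rename raises Errno::ENOENT, which matches the reported error.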

Anyone who's experiencing this issue: do you have extremely long paths? Or do you have some kind of frequently-running cruft-cleanup task?

@Ferdy89 Ferdy89 (Author) commented Mar 6, 2019

Thank you for looking into this! I looked through our logs and we're still seeing this issue sporadically (fortunately, still only in our staging environment!).

Here are some data points from the most recent occurrences:

  • It sometimes happens when loading fairly long paths, but not extremely long ones; ours are about 127 characters long
  • We don't have any cruft-cleanup task (that I know of)
  • Very often the error happens very early in the life of the Docker container, particularly while hitting our /healthcheck endpoint from the load balancer (which means the container never even gets hit by real requests)

I hope this helps. Let me know if there's anything I can do to gather better information for you 👍

@Ferdy89 Ferdy89 (Author) commented Mar 7, 2019

Another data point: some of the failures seem to happen when the require is triggered by Ruby's autoload.

@ethicalhack3r ethicalhack3r commented Apr 9, 2019

I've been getting this error too. Using docker-compose.

/usr/local/bundle/gems/bootsnap-1.4.3/lib/bootsnap/compile_cache/iseq.rb:37:in `fetch': No such file or directory - bs_fetch:atomic_write_cache_file:chmod (Errno::ENOENT)

@adamdullenty adamdullenty commented Apr 26, 2019

@ethicalhack3r @Ferdy89 are either of you hitting this error when running in delegated configuration? https://docs.docker.com/docker-for-mac/osxfs-caching/
We just switched to this to fix some local performance issues and started seeing this error, so it may be related.

@ethicalhack3r ethicalhack3r commented May 6, 2019

@adamdullenty I don't believe our configuration uses delegated

@dirkdk dirkdk commented May 6, 2019

We do use delegated in our Docker config and get this error.

@chamnap chamnap commented May 13, 2019

I faced this issue as well. It happens when the tmp directory is empty and both the app and Sidekiq are loading at the same time in my Docker container. Once bootsnap's cache exists, the issue seems to be gone.

@ethicalhack3r ethicalhack3r commented May 13, 2019

Yea, so our issue was two machines that were trying to load the /app/tmp directory at the same time; we solved it with the following docker-compose config:

volumes:
      - .:/app
      # don't mount tmp directory
      - /app/tmp

@EnziinSystem EnziinSystem commented Jan 24, 2020

Yea, so our issue was two machines that were trying to load the /app/tmp directory at the same time; we solved it with the following docker-compose config:

volumes:
      - .:/app
      # don't mount tmp directory
      - /app/tmp

I deploy the Rails app to Docker and I don't mount the tmp directory, but the problem still occurs.

@wflanagan wflanagan commented Mar 26, 2020

I tried the above fix as well, and it doesn't work for me either.

@cblunt cblunt commented Mar 27, 2020

Not sure if this will help, but I found the same issue running a Rails app in Docker. The app has two containers (web and worker), and I think trying to run them at the same time was causing issues.

So far, I've managed to work around it by removing the bootsnap cache files and restarting the containers separately:

$ docker-compose down
$ rm -rf tmp/cache/bootsnap-*
$ docker-compose up -d web # wait for app to boot
$ docker-compose up -d worker

@ruslanshelar ruslanshelar commented May 11, 2020

I have the same problem. The solutions above do not help.

@brentgreeff brentgreeff commented May 17, 2020

@cblunt - I have a similar issue.

I changed bootsnap's cache dir:

Bootsnap.setup(
  cache_dir: '/bootsnap_cache'
)

I added named volumes:

volumes:
  rails_bootsnap_cache:
  bg_bootsnap_cache:

Then I added the named volumes to the services:

services:
  rails:
    build: .
    volumes:
      - rails_bootsnap_cache:/bootsnap_cache
  bg:
    build: .
    volumes:
      - bg_bootsnap_cache:/bootsnap_cache

For some reason I'm now getting bootsnap cache files in both the project dir and in /bootsnap_cache, but it seems to work.

@lukaso lukaso commented May 17, 2020

I think there's possibly a race condition in bootsnap.c in the function atomic_write_cache_file (which, despite the name, makes a number of separate file-manipulation calls, so it may not actually be atomic). It might be worth adding some debug output showing the file names involved and which part of the rename is failing, to make this easier to diagnose.

https://github.com/Shopify/bootsnap/blob/master/ext/bootsnap/bootsnap.c#L546-L554

Our incident has been caused by these calls (and happens very rarely):

time bundle exec rake webdrivers:chromedriver:update &
time bundle exec rake assets:precompile &

Somehow the two calls are interfering with each other (very occasionally). We will test a workaround of adding a sleep of a few seconds in between (assuming it is indeed a race condition rather than something else).
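The race lukaso suspects can be demonstrated in plain Ruby: if two processes ever compute the same temporary name (a hypothetical scenario, e.g. identical PIDs inside separate containers), the second rename fails exactly as reported. A minimal single-process sketch of that interleaving:

```ruby
require "tmpdir"

error_class = nil
Dir.mktmpdir do |dir|
  tmp = File.join(dir, "c.tmp.1") # hypothetical colliding temp name
  dst = File.join(dir, "c")

  # Both "processes" write the same temp name:
  File.binwrite(tmp, "cache")
  # "Process 1" renames first; the temp file is gone afterwards:
  File.rename(tmp, dst)
  # "Process 2" then attempts the same rename and hits the reported error:
  begin
    File.rename(tmp, dst)
  rescue Errno::ENOENT => e
    error_class = e.class
  end
end
error_class # Errno::ENOENT
```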

@samcdavid samcdavid commented May 19, 2020

For the time being, a combination of:

Not sure if this will help, but I found the same issue running a Rails app in Docker. The app has two containers (web and worker), and I think trying to run them at the same time was causing issues.

So far, I've managed to work around it by removing the bootsnap cache files and restarting the containers separately:

$ docker-compose down
$ rm -rf tmp/cache/bootsnap-*
$ docker-compose up -d web # wait for app to boot
$ docker-compose up -d worker

and

Yea, so our issue was two machines that were trying to load the /app/tmp directory at the same time; we solved it with the following docker-compose config:

volumes:
- .:/app
# don't mount tmp directory
- /app/tmp

worked for me. My gut tells me that @lukaso is on to something with the race condition, but this workaround is OK for now.

Kovah added a commit to Kovah/KVH-Tools that referenced this issue May 24, 2020
abicky added a commit to abicky/bootsnap that referenced this issue Jun 28, 2020
mkstemp(3) ensures that a unique file is created, but in the
previous implementation, there's a possibility that a process
uses the temporary file created by another process if mkstemp(3)
fails to create a file due to EEXIST. That has the same risk as
Shopify#174.

This commit will also resolve
Shopify#177 if the cause is
that multiple processes try to create a file with the same name
at the same time.
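The mkstemp(3)-based approach this commit describes can be sketched in Ruby with Tempfile, which gives the same O_EXCL uniqueness guarantee (a sketch under that assumption, not bootsnap's actual C implementation):

```ruby
require "tempfile"

# Sketch of an atomic cache write that cannot share a temp file between
# processes: Tempfile.create opens the file with O_EXCL, like mkstemp(3),
# so each caller gets a unique temporary name.
def atomic_write(path, data)
  tmp = Tempfile.create(File.basename(path), File.dirname(path))
  tmp.binmode
  tmp.write(data)
  tmp.close
  File.rename(tmp.path, path) # publish atomically; the temp name was unique
end
```

With unique temp names, two concurrent writers can no longer rename each other's temp file out from under themselves.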
@burke burke closed this in #309 Jul 14, 2020

@matti matti commented Jul 16, 2020

Now this is closed, but no release has been made.

jivdhaliwal added a commit to ministryofjustice/staff-device-dns-dhcp-admin that referenced this issue Aug 21, 2020
We were getting the following error when running tests

An error occurred while loading ./spec/controllers/subnets_controller_spec.rb.
Failure/Error: require File.expand_path("../config/environment", __dir__)

Errno::ENOENT:
  No such file or directory - bs_fetch:atomic_write_cache_file:chmod

See Shopify/bootsnap#177
jivdhaliwal added a commit to ministryofjustice/staff-device-dns-dhcp-admin that referenced this issue Aug 24, 2020

@joshuapinter joshuapinter commented Jan 20, 2021

Just to add to this, we're seeing this when using parallel_tests gem to run our test suite in parallel. Not seeing it on our development machines but we are seeing it quite frequently on our CI Mac Pro that has 24 threads and is parallelizing the test suite with 24 test runners.

Based on this, I would assume some kind of race condition is happening, like @lukaso mentioned above. I have had to disable bootsnap in our CI environment for now, which isn't great because we were seeing a 50% decrease in test times when using it, thanks to the large number of threads spinning up the Rails environment for testing.
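One mitigation consistent with the shared-cache-directory theory would be to give each parallel_tests worker its own cache directory. This is a hypothetical sketch, not an officially recommended fix; it assumes only bootsnap's documented cache_dir: option and parallel_tests' TEST_ENV_NUMBER variable:

```ruby
# config/boot.rb -- hypothetical sketch: isolate each parallel_tests worker's
# bootsnap cache so concurrent workers never race on the same cache files.
require "bootsnap"

# parallel_tests sets TEST_ENV_NUMBER to "", "2", "3", ... per worker.
worker = ENV.fetch("TEST_ENV_NUMBER", "")
Bootsnap.setup(
  cache_dir: File.expand_path("../tmp/cache/bootsnap#{worker}", __dir__)
)
```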

@casperisfine casperisfine (Contributor) commented Apr 14, 2021

Ref: #353; more investigation happened there, and at this point we believe it is caused by a race condition when multiple processes use the same cache directory.

That issue also includes an experimental fix; I'd appreciate it if people suffering from this problem could try it and report back.

@ouaziz ouaziz commented Apr 24, 2021

Just restart the Docker engine.
