Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use realpaths in Cache @path_obj #136

Merged
merged 2 commits into from Apr 5, 2018
Merged

Use realpaths in Cache @path_obj #136

merged 2 commits into from Apr 5, 2018

Conversation

ojab
Copy link
Contributor

@ojab ojab commented Feb 25, 2018

If gems are loaded from symlinked directory, $LOAD_PATH has mixed symlinked
and real paths, while subsequent requires are resolved to real path, so we can
end up in situation when we rerequire the same gem with different path, require
loads this gem again and things go BOOM. for example:

From: /…/src/bootsnap/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb @ line 18 Kernel#require:

    15: def require(path)
    16:   if resolved = Bootsnap::LoadPathCache.load_path_cache.find(path)
    17:     binding.pry if path == 'set'
 => 18:     require_without_cache(resolved)
    19:   else
    20:     raise Bootsnap::LoadPathCache::CoreExt.make_load_error(path)
    21:   end
    22: rescue Bootsnap::LoadPathCache::ReturnFalse
    23:   return false
    24: rescue Bootsnap::LoadPathCache::FallbackScan
    25:   require_without_cache(path)
    26: end

[1] pry(main)> path
=> "set"
[2] pry(main)> Set.new.method(:to_s).source_location
=> ["/real_path/.rvm/rubies/ruby-2.5.0/lib/ruby/2.5.0/set.rb", 629]
[3] pry(main)> resolved
=> "/symlinked_path/.rvm/rubies/ruby-2.5.0/lib/ruby/2.5.0/set.rb"
[4] pry(main)> require_without_cache(resolved)
/symlinked_path/.rvm/rubies/ruby-2.5.0/lib/ruby/2.5.0/set.rb:625: warning: already initialized constant Set::InspectKey
/real_path/.rvm/rubies/ruby-2.5.0/lib/ruby/2.5.0/set.rb:625: warning: previous definition of InspectKey was here
=> true
[5] pry(main)> Set.new.method(:to_s).source_location
=> ["/symlinked_path/.rvm/rubies/ruby-2.5.0/lib/ruby/2.5.0/set.rb", 629]
[6] pry(main)>

@ojab
Copy link
Contributor Author

ojab commented Feb 25, 2018

Hopefully this fixes a bunch of already initialized constant issues that exists today, at least it fixes rails start for me.

  • I'm not sure if it's MRI bug (and $LOAD_PATH should always be real paths in the first place)
  • I have no idea how it can be tested
  • AFAIU all subsequent requires are resolved to real path, so we only need this on $LOAD_PATH, but I can be wrong

Please advise how should I proceed with this PR.

@ojab
Copy link
Contributor Author

ojab commented Feb 25, 2018

#54
#75
#112
#114
#122
#123
#134

Maybe the same issue with different symptoms:
#86
#103
#106

@burke
Copy link
Member

burke commented Feb 26, 2018

IIRC realpath (or the C equivalent thereof) is normally called after the point at which we've hooked require, so this would cause mismatches in some cases. (additionally, calling realpath is frustratingly expensive, though this isn't as big a deal for this initialization case).

@ojab
Copy link
Contributor Author

ojab commented Feb 26, 2018

Hmm, could you elaborate where how exactly we can have mismatch? AFAIU files are always resolved using realpath, but $LOAD_PATH sometimes contains symlinked paths.
So far it fixes issue for some people (#123 (comment), #112 (comment), #75 (comment), myself included).

I understand that realpath is expensive, but alternative right now is disabling bootsnap completely :(.

@ojab
Copy link
Contributor Author

ojab commented Feb 26, 2018

And of course I'm happy to rework this PR if other solution is more suitable, but unfortunately I haven't found such solution.

@burke
Copy link
Member

burke commented Feb 26, 2018

The main case in which this is an issue is:

  1. The user has the application source at some path on disk, say, /Users/me/src/github.com/org/app
  2. The user symlinks the application somewhere else, say, /Users/me/app
  3. The application adds (e.g.) lib to the $LOAD_PATH
  4. The application uses a combination of require and require_relative to load files from lib.

For example, let's say there's a file lib/util.rb, which is required sometimes with require_relative and sometimes with require.

One file might require util.rb as require_relative '../lib/util.rb'; another as require 'util'.

If bootsnap is configured to resolve indexed paths to their realpath, then the first invocation would resolve to /Users/me/src/github.com/org/app/lib/util.rb, while the second would resolve to /Users/me/app/lib/util.rb, and the file would be double-loaded.

Here's some exploration of this issue, though it doesn't precisely reproduce the case described here:

require 'fileutils'

$info = ->(indent, msg) do
  STDERR.puts(format("%*s\x1b[1;34m%% \x1b[1;37m%s\x1b[0m", indent, "", msg))
end

$real_app_root      = 'realpath'
$symlinked_app_root = 'expanded_path'
$load_path_entry    = 'expanded_path/lib'
$some_file          = 'expanded_path/lib/some-file.rb'

at_exit do
  File.unlink($some_file)           rescue nil
  FileUtils.rmdir($load_path_entry) rescue nil
  File.unlink($symlinked_app_root)  rescue nil
  FileUtils.rmdir($real_app_root)   rescue nil
end

FileUtils.mkdir($real_app_root)
FileUtils.ln_s($real_app_root, $symlinked_app_root)
FileUtils.mkdir($load_path_entry)
File.write($some_file, 'puts "\x1b[1;31m!\x1b[0;31m evaluated contents\x1b[0m"')

$LOAD_PATH.unshift(File.expand_path($load_path_entry))

alias :req :require
def require(f)
  $info.(0, "require(#{f.inspect})")
  req(f)
end

def (RubyVM::InstructionSequence).load_iseq(f)
  $info.(2, "load_iseq(#{f.inspect})")
end

require 'some-file'
puts $LOADED_FEATURES.grep(/some-file/)

require_relative './expanded_path/lib/some-file'
puts $LOADED_FEATURES.grep(/some-file/)

require_relative './realpath/lib/some-file'
puts $LOADED_FEATURES.grep(/some-file/)

It might be possible to work around this by also hooking require_relative to also realpath its argument.

@ojab
Copy link
Contributor Author

ojab commented Feb 26, 2018

Idea to override $LOAD_PATH modifying methods and make all paths realpaths on insert/whatever sounds too crazy?

@ojab
Copy link
Contributor Author

ojab commented Feb 26, 2018

Forget it, doesn't help here.
Looks like we can't do this without realpath overhead on each require_relative :/
I'll try to think more about it and write some code this weekend.

@burke
Copy link
Member

burke commented Feb 26, 2018

I think it might not be completely crazy to add that overhead to require_relative.

One other possibility would be to take the realpath of the directory containing the target of require_relative and cache that, joining the basename to the resolved/cached dirname. If we wanted to be extra safe, we could lstat the target and, if the file itself is a symlink, run a full realpath on that file to resolve it.

@burke
Copy link
Member

burke commented Feb 26, 2018

Something like this might do the trick:

require 'English'

module RealpathCache
  @cache = {}

  class << self
    def call(path)
      call_rec(File.expand_path(path))
    end

    private

    def call_rec(path)
      if found = @cache[path]
        return found
      end

      # base case: FS root
      return path if File.dirname(path) == path

      if File.lstat(path).symlink?
        link_target = File.readlink(path)
        link_base   = File.dirname(path)
        expanded    = File.expand_path(link_target, link_base)
        return @cache[path] = call_rec(expanded)
      end

      File.join(
        call_rec(File.dirname(path)),
        File.basename(path)
      )
    end
  end
end

if __FILE__ == $PROGRAM_NAME
  puts RealpathCache.call(ARGV.first)
end

@ojab ojab force-pushed the master branch 2 times, most recently from 085fa8b to ecf7444 Compare March 10, 2018 08:48
@ojab
Copy link
Contributor Author

ojab commented Mar 10, 2018

Sorry for delay (been lazy/other stuff).
Pushed version that should correctly handle require_relative w/ symlinks.
Basically it's doing the same stuff that ruby does in require_relative under the hood, but using real directory instead of possible symlink.

Do we care that required file itself can be a symlink?

@ojab ojab closed this Mar 10, 2018
@ojab ojab reopened this Mar 10, 2018
@ojab
Copy link
Contributor Author

ojab commented Mar 10, 2018

If we wanted to be extra safe, we could lstat the target and, if the file itself is a symlink, run a full realpath on that file to resolve it.

ah, looks like we care.
But we don't have actual filename (i. e. with extension), so how can we know if file.rb or file.so is actually required? Something like CACHED_EXTENSIONS.each { |ext| File.exist?("#{file}#{ext}" } doesn't look ok for me ._.

@rlue
Copy link

rlue commented Mar 23, 2018

Quick question, fellas — is there any way that this is related to this Ruby bug, require_relative and require should be compatible with each other when symlinks are used? It must not be the root cause, because it looks like fixes were merged before the release of v2.5.0, but the revision that fixed it sounds like it should be awful relevant:

Revision 62440

file.c: rb_check_realpath

* file.c (rb_check_realpath): returns real path which has no
  symbolic links.  similar to rb_realpath except for returning
  Qnil if any parts did not exist.

load.c: real path to load

* load.c (rb_construct_expanded_load_path): expand load paths to
  real paths to get rid of duplicate loading from symbolic-linked
  directories.  [Feature #10222]

Changes were committed to Ruby trunk September of last year (5754f15 & b6d3927), but I'm having trouble making out how exactly this patches the require and require_relative methods.

@ojab
Copy link
Contributor Author

ojab commented Mar 29, 2018

We're hijacking require/require_relative to load cached versions, so we're basically doing the same thing here ourself.

@ojab
Copy link
Contributor Author

ojab commented Mar 29, 2018

Added tests for RealpathCache and fixed an issue when required file is a symlink itself.
Anything else should be done to get it merged?

@ojab
Copy link
Contributor Author

ojab commented Mar 29, 2018

Pushed fixed version.

@yskkin
Copy link

yskkin commented Mar 30, 2018

As I commented in #147 (comment), I still have an error for my case.

@ojab
Copy link
Contributor Author

ojab commented Mar 30, 2018

Pushed version that iterates over ['', *CACHED_EXTENSIONS] to find required file.

@yskkin please check if it works for you.

@yskkin
Copy link

yskkin commented Mar 30, 2018

It works. 🎉 Thank you @ojab!

@burke burke merged commit d9ec6c2 into Shopify:master Apr 5, 2018
@burke
Copy link
Member

burke commented Apr 5, 2018

I've pushed this as 1.3.0.beta, but it broke Shopify boot. I'll try to dig into the issue.

@wpolicarpo
Copy link

@ojab I have the same issue here when I try to start the application in a symlink'ed folder (like mina and capistrano do).

@burke 1.3.0.beta also broke the normal usage (no symlinks) for me.

@ojab
Copy link
Contributor Author

ojab commented Apr 5, 2018

@wpolicarpo could you create a new issue with description/backtrace and cc me?

@@ -9,7 +9,7 @@ def initialize(store, path_obj, development_mode: false)
@development_mode = development_mode
@store = store
@mutex = defined?(::Mutex) ? ::Mutex.new : ::Thread::Mutex.new # TODO: Remove once Ruby 2.2 support is dropped.
@path_obj = path_obj
@path_obj = path_obj.map { |f| File.exist?(f) ? File.realpath(f) : f }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is the problem! @path_obj needs to continue to be an actual reference to $LOAD_PATH or to ActiveSupport::Dependencies.autoload_paths, and any change to it needs to trigger a corresponding event in ChangeObserver.

One way to do this here would be to:

        path_obj.map! { |f| File.exist?(f) ? File.realpath(f) : f }
        @path_obj = path_obj

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be best too, at this point, to either:

  1. Make the ChangeObserver understand map!; or
  2. Make this line blow up if the ChangeObserver has bound to this object`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ouch, understood, will take a look.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS ChangeObserver already understands map!, so .map! and we should be fine?
Any hints how it can be reproduced locally?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh great then.

To reproduce, I guess:

  1. load bootsnap
  2. Add a new dir to the load path via, say, $LOAD_PATH.unshift
  3. Observe that bootsnap will not load features from this path since the ChangeObserver hook did not fire.

An easy way to catch this is to just log or whatever from the hooked method in ChangeObserver.

@ojab
Copy link
Contributor Author

ojab commented Apr 5, 2018

@wpolicarpo could you try gem 'bootsnap', github: 'ojab/bootsnap', require: false?

@burke
Copy link
Member

burke commented Apr 5, 2018

1.3.0.beta2 is pushed. Problem seems fixed for me.

@wpolicarpo
Copy link

@ojab it's working fine now. Thanks!

@gathuku
Copy link

gathuku commented Nov 9, 2019

Am having this issue any help.

 Unable to load application: I18n::InvalidLocaleData: can not load translations from /usr/local/bundle/gems/responders-3.0.0/lib/responders/locales/en.yml: #<NameError: uninitialized constant Bootsnap::CompileCache::ISeq>

casperisfine pushed a commit that referenced this pull request Feb 7, 2022
It was added in #136 to handle
a very specific scenario:

```
/app/lib/ - real directory
/symlink/ - symlink to /app/

```
casperisfine pushed a commit that referenced this pull request Feb 7, 2022
It was added in #136 to handle
a very specific scenario:

```
/app/lib/ - real directory
/symlink/ - symlink to /app/

```
casperisfine pushed a commit that referenced this pull request Feb 7, 2022
It was added in #136 to handle
a very specific scenario:

```
/app/lib/ - real directory
/symlink/ - symlink to /app/

```
casperisfine pushed a commit that referenced this pull request Feb 7, 2022
Ref: #402
Ref: https://bugs.ruby-lang.org/issues/10222
Ref: ruby/ruby@5754f15
Ref: ruby/ruby@b6d3927

Up to Ruby 2.3, `require` would resolve symlinks, but `require_relative` wouldn't:

```ruby
require 'fileutils'
FileUtils.mkdir_p("realpath")
File.write("realpath/a.rb", "p :a_loaded")
File.symlink("realpath", "symlink") rescue nil

$LOAD_PATH.unshift(File.realpath(__dir__) + "/symlink")
require "a.rb" # load symlink/a.rb in 2.3 and older, load realpath/a.rb on 2.4 and newer
require_relative "realpath/a.rb" # noop on 2.4+
```

This would easily cause double loading issue when `require` and `require_relative`
were mixed, but was fixed in 2.4 (https://bugs.ruby-lang.org/issues/10222).

The problem is that `Bootsnap` kinda negated this fix, because `realpath()`
wouldn't be applied to absolute paths:

```ruby
require 'fileutils'
FileUtils.mkdir_p("realpath")
File.write("realpath/a.rb", "p :a_loaded")
File.symlink("realpath", "symlink") rescue nil

$LOAD_PATH.unshift(File.realpath(__dir__) + "/symlink")
require File.expand_path("symlink/a.rb") # load symlink/a.rb in 3.0 and older, load realpath/a.rb on 3.1 and newer
require_relative "realpath/a.rb" # noop on 3.1+
```

And for performance reasons, Bootsnap tried really hard not to call `realpath`,
as it's a syscall, instead it used `expand_path`, which is entirely in use
space and doesn't reach to the file system. So if you had a `symlink` in
`$LOAD_PATH`, `bootcsnap` would perpetuate this bug, which led to the
addition of #136.

This was ultimately fixed in Ruby 3.1 (https://bugs.ruby-lang.org/issues/17885),
now `realpath` is applied even on absolute paths.

While `realpath` is indeed expensive, I think the performance impact is ok if
we only call it for `$LOAD_PATH` members, rather than for all requirable files.
So if you have X gems, it's going to be more or less X `realpath` calls.

It would stay a problem if a gem actually contained symlinks and used
`require_relative`, but it's quite the stretch, and with 3.1 now
handling it, it's not worth keeping such workaround.

See: #402
casperisfine pushed a commit that referenced this pull request Feb 7, 2022
It was added in #136 to handle
a very specific scenario:

```
/app/lib/ - real directory
/symlink/ - symlink to /app/

```
casperisfine pushed a commit that referenced this pull request Feb 7, 2022
It was added in #136 to handle
a very specific scenario:

```
/app/lib/ - real directory
/symlink/ - symlink to /app/

```
casperisfine pushed a commit that referenced this pull request Feb 7, 2022
It was added in #136 to handle
a very specific scenario:

```
/app/lib/ - real directory
/symlink/ - symlink to /app/

```
casperisfine pushed a commit that referenced this pull request Feb 7, 2022
Ref: #402
Ref: https://bugs.ruby-lang.org/issues/10222
Ref: ruby/ruby@5754f15
Ref: ruby/ruby@b6d3927

Up to Ruby 2.3, `require` would resolve symlinks, but `require_relative` wouldn't:

```ruby
require 'fileutils'
FileUtils.mkdir_p("realpath")
File.write("realpath/a.rb", "p :a_loaded")
File.symlink("realpath", "symlink") rescue nil

$LOAD_PATH.unshift(File.realpath(__dir__) + "/symlink")
require "a.rb" # load symlink/a.rb in 2.3 and older, load realpath/a.rb on 2.4 and newer
require_relative "realpath/a.rb" # noop on 2.4+
```

This would easily cause double loading issue when `require` and `require_relative`
were mixed, but was fixed in 2.4 (https://bugs.ruby-lang.org/issues/10222).

The problem is that `Bootsnap` kinda negated this fix, because `realpath()`
wouldn't be applied to absolute paths:

```ruby
require 'fileutils'
FileUtils.mkdir_p("realpath")
File.write("realpath/a.rb", "p :a_loaded")
File.symlink("realpath", "symlink") rescue nil

$LOAD_PATH.unshift(File.realpath(__dir__) + "/symlink")
require File.expand_path("symlink/a.rb") # load symlink/a.rb in 3.0 and older, load realpath/a.rb on 3.1 and newer
require_relative "realpath/a.rb" # noop on 3.1+
```

And for performance reasons, Bootsnap tried really hard not to call `realpath`,
as it's a syscall, instead it used `expand_path`, which is entirely in use
space and doesn't reach to the file system. So if you had a `symlink` in
`$LOAD_PATH`, `bootcsnap` would perpetuate this bug, which led to the
addition of #136.

This was ultimately fixed in Ruby 3.1 (https://bugs.ruby-lang.org/issues/17885),
now `realpath` is applied even on absolute paths.

While `realpath` is indeed expensive, I think the performance impact is ok if
we only call it for `$LOAD_PATH` members, rather than for all requirable files.
So if you have X gems, it's going to be more or less X `realpath` calls.

It would stay a problem if a gem actually contained symlinks and used
`require_relative`, but it's quite the stretch, and with 3.1 now
handling it, it's not worth keeping such workaround.

See: #402
casperisfine pushed a commit that referenced this pull request Feb 8, 2022
Ref: #402
Ref: https://bugs.ruby-lang.org/issues/10222
Ref: ruby/ruby@5754f15
Ref: ruby/ruby@b6d3927

Up to Ruby 2.3, `require` would resolve symlinks, but `require_relative` wouldn't:

```ruby
require 'fileutils'
FileUtils.mkdir_p("realpath")
File.write("realpath/a.rb", "p :a_loaded")
File.symlink("realpath", "symlink") rescue nil

$LOAD_PATH.unshift(File.realpath(__dir__) + "/symlink")
require "a.rb" # load symlink/a.rb in 2.3 and older, load realpath/a.rb on 2.4 and newer
require_relative "realpath/a.rb" # noop on 2.4+
```

This would easily cause double loading issue when `require` and `require_relative`
were mixed, but was fixed in 2.4 (https://bugs.ruby-lang.org/issues/10222).

The problem is that `Bootsnap` kinda negated this fix, because `realpath()`
wouldn't be applied to absolute paths:

```ruby
require 'fileutils'
FileUtils.mkdir_p("realpath")
File.write("realpath/a.rb", "p :a_loaded")
File.symlink("realpath", "symlink") rescue nil

$LOAD_PATH.unshift(File.realpath(__dir__) + "/symlink")
require File.expand_path("symlink/a.rb") # load symlink/a.rb in 3.0 and older, load realpath/a.rb on 3.1 and newer
require_relative "realpath/a.rb" # noop on 3.1+
```

And for performance reasons, Bootsnap tried really hard not to call `realpath`,
as it's a syscall, instead it used `expand_path`, which is entirely in use
space and doesn't reach to the file system. So if you had a `symlink` in
`$LOAD_PATH`, `bootcsnap` would perpetuate this bug, which led to the
addition of #136.

This was ultimately fixed in Ruby 3.1 (https://bugs.ruby-lang.org/issues/17885),
now `realpath` is applied even on absolute paths.

While `realpath` is indeed expensive, I think the performance impact is ok if
we only call it for `$LOAD_PATH` members, rather than for all requirable files.
So if you have X gems, it's going to be more or less X `realpath` calls.

It would stay a problem if a gem actually contained symlinks and used
`require_relative`, but it's quite the stretch, and with 3.1 now
handling it, it's not worth keeping such workaround.

See: #402
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants