$LOAD_PATH is being manipulated, causing Psych double loads #2068

Closed
tenderlove opened this Issue Aug 17, 2012 · 28 comments

Comments

Projects
None yet
8 participants
Contributor

tenderlove commented Aug 17, 2012

Hi! If someone installs psych as a gem (on the system), then requires psych in app code, psych will get double loaded. I think this is because when the gem command is called, RubyGems will add the gem to the $LOAD_PATH. I think bundler is removing psych from the load path:

$ gem list psych

*** LOCAL GEMS ***

psych (1.3.4, 1.3.3, 1.3.2)
$ irb
irb(main):001:0> require 'fileutils'
=> true
irb(main):002:0> FileUtils.touch 'Gemfile'
=> ["Gemfile"]
irb(main):003:0> require 'bundler'
=> true
irb(main):004:0> Bundler.setup
=> "GEM\n  specs:\n\nPLATFORMS\n  ruby\n\nDEPENDENCIES\n"
irb(main):005:0> $LOAD_PATH.grep(/psych/)
=> []
irb(main):006:0> defined?(Psych)
=> "constant"
irb(main):007:0> Psych::VERSION
=> "1.3.4"
irb(main):008:0>

You can see that the gem version of psych is required, but the load path for the gem is not in $LOAD_PATH. Compare to vanilla rubygems usage:

irb(main):001:0> gem 'psych'
=> true
irb(main):002:0> require 'psych'
=> true
irb(main):003:0> Psych::VERSION
=> "1.3.4"
irb(main):004:0> $LOAD_PATH.grep(/psych/)
=> ["/Users/aaron/.local/lib/ruby/gems/2.0.0/gems/psych-1.3.4/lib"]
irb(main):005:0>

In the bundler case, if someone requires psych in their app code and they have the psych gem installed on the system, they'll get an exception:

irb(main):001:0> require 'fileutils'
=> true
irb(main):002:0> FileUtils.touch 'Gemfile'
=> ["Gemfile"]
irb(main):003:0> require 'bundler'
=> true
irb(main):004:0> Bundler.setup
=> "GEM\n  specs:\n\nPLATFORMS\n  ruby\n\nDEPENDENCIES\n"
irb(main):005:0> require 'psych'
/Users/aaron/.local/lib/ruby/2.0.0/x86_64-darwin11.4.0/psych.bundle: warning: already initialized constant ANY
/Users/aaron/.local/lib/ruby/2.0.0/x86_64-darwin11.4.0/psych.bundle: warning: already initialized constant UTF8
/Users/aaron/.local/lib/ruby/2.0.0/x86_64-darwin11.4.0/psych.bundle: warning: already initialized constant UTF16LE
/Users/aaron/.local/lib/ruby/2.0.0/x86_64-darwin11.4.0/psych.bundle: warning: already initialized constant UTF16BE

[snip]

TypeError: superclass mismatch for class Mark
    from /Users/aaron/.local/lib/ruby/2.0.0/psych/parser.rb:33:in `<class:Parser>'
    from /Users/aaron/.local/lib/ruby/2.0.0/psych/parser.rb:32:in `<module:Psych>'
    from /Users/aaron/.local/lib/ruby/2.0.0/psych/parser.rb:1:in `<top (required)>'
    from /Users/aaron/.local/lib/ruby/2.0.0/psych.rb:7:in `require'
    from /Users/aaron/.local/lib/ruby/2.0.0/psych.rb:7:in `<top (required)>'
    from (irb):5:in `require'
    from (irb):5
    from /Users/aaron/.local/bin/irb:12:in `<main>'
irb(main):006:0> 

This happens because one version of psych is already loaded (the gem version), and the require to psych in the app code tries to load the stdlib version.

An easy workaround is to add psych to the Gemfile:

irb(main):001:0> File.open('Gemfile', 'w') { |f| f.puts "gem 'psych'" }
=> nil
irb(main):002:0> require 'bundler'
=> true
irb(main):003:0> Bundler.setup
=> "GEM\n  specs:\n    psych (1.3.4)\n\nPLATFORMS\n  ruby\n\nDEPENDENCIES\n  psych\n"
irb(main):004:0> require 'psych'
=> false
irb(main):005:0>

I'm not sure what bundler can or should do about this problem, but I think that as Ruby's stdlib becomes more gemified, this may become a larger issue. Thanks!

Owner

indirect commented Aug 18, 2012

The only option that has occurred to me so far is to special-case the stdlib gems so that Bundler doesn't remove them from the load path. I'm not really sure if that's an improvement or not. :P @tenderlove, do you know if there is an authoritative way to get a list of stdlib gems, from either Ruby or Rubygems?

On Aug 17, 2012, at 11:27 AM, Aaron Patterson notifications@github.com wrote:

Hi! If someone installs psych as a gem (on the system), then requires psych in app code, psych will get double loaded. I think this is because when the gem command is called, RubyGems will add the gem to the $LOAD_PATH. I think bundler is removing psych from the load path:

$ gem list psych

*** LOCAL GEMS ***

psych (1.3.4, 1.3.3, 1.3.2)
$ irb
irb(main):001:0> require 'fileutils'
=> true
irb(main):002:0> FileUtils.touch 'Gemfile'
=> ["Gemfile"]
irb(main):003:0> require 'bundler'
=> true
irb(main):004:0> Bundler.setup
=> "GEM\n specs:\n\nPLATFORMS\n ruby\n\nDEPENDENCIES\n"
irb(main):005:0> $LOAD_PATH.grep(/psych/)
=> []
irb(main):006:0> defined?(Psych)
=> "constant"
irb(main):007:0> Psych::VERSION
=> "1.3.4"
irb(main):008:0>
You can see that the gem version of psych is required, but the load path for the gem is not in $LOAD_PATH. Compare to vanilla rubygems usage:

irb(main):001:0> gem 'psych'
=> true
irb(main):002:0> require 'psych'
=> true
irb(main):003:0> Psych::VERSION
=> "1.3.4"
irb(main):004:0> $LOAD_PATH.grep(/psych/)
=> ["/Users/aaron/.local/lib/ruby/gems/2.0.0/gems/psych-1.3.4/lib"]
irb(main):005:0>
In the bundler case, if someone requires psych in their app code and they have the psych gem installed on the system, they'll get an exception:

irb(main):001:0> require 'fileutils'
=> true
irb(main):002:0> FileUtils.touch 'Gemfile'
=> ["Gemfile"]
irb(main):003:0> require 'bundler'
=> true
irb(main):004:0> Bundler.setup
=> "GEM\n specs:\n\nPLATFORMS\n ruby\n\nDEPENDENCIES\n"
irb(main):005:0> require 'psych'
/Users/aaron/.local/lib/ruby/2.0.0/x86_64-darwin11.4.0/psych.bundle: warning: already initialized constant ANY
/Users/aaron/.local/lib/ruby/2.0.0/x86_64-darwin11.4.0/psych.bundle: warning: already initialized constant UTF8
/Users/aaron/.local/lib/ruby/2.0.0/x86_64-darwin11.4.0/psych.bundle: warning: already initialized constant UTF16LE
/Users/aaron/.local/lib/ruby/2.0.0/x86_64-darwin11.4.0/psych.bundle: warning: already initialized constant UTF16BE

[snip]

TypeError: superclass mismatch for class Mark
from /Users/aaron/.local/lib/ruby/2.0.0/psych/parser.rb:33:in <class:Parser>' from /Users/aaron/.local/lib/ruby/2.0.0/psych/parser.rb:32:inmodule:Psych'
from /Users/aaron/.local/lib/ruby/2.0.0/psych/parser.rb:1:in <top (required)>' from /Users/aaron/.local/lib/ruby/2.0.0/psych.rb:7:inrequire'
from /Users/aaron/.local/lib/ruby/2.0.0/psych.rb:7:in <top (required)>' from (irb):5:inrequire'
from (irb):5
from /Users/aaron/.local/bin/irb:12:in `

'
irb(main):006:0>
This happens because one version of psych is already loaded (the gem version), and the require to psych in the app code tries to load the stdlib version.

An easy workaround is to add psych to the Gemfile:

irb(main):001:0> File.open('Gemfile', 'w') { |f| f.puts "gem 'psych'" }
=> nil
irb(main):002:0> require 'bundler'
=> true
irb(main):003:0> Bundler.setup
=> "GEM\n specs:\n psych (1.3.4)\n\nPLATFORMS\n ruby\n\nDEPENDENCIES\n psych\n"
irb(main):004:0> require 'psych'
=> false
irb(main):005:0>
I'm not sure what bundler can or should do about this problem, but I think that as Ruby's stdlib becomes more gemified, this may become a larger issue. Thanks!


Reply to this email directly or view it on GitHub.

lacker commented Aug 24, 2012

I think it would be an improvement to fix this for the psych gem. It seems like a particular problem for psych because bundler itself requires psych. It means a default 1.9.2-p180 install that uses bundler has a memory-leaking version of psych that cannot be easily upgraded because using a more recent version of the psych gem via the gemfile doesn't work. And there's already some special-cased code to handle the psych dependency in bundler/psyched_yaml.rb anyways. So, would it really hurt that much to add a bit more weird special-casing for psych?

charity commented Aug 24, 2012

This has definitely bitten us in a big way. I'd love to see it get fixed.

Owner

indirect commented Aug 24, 2012

So, what would count as "fixed" for you guys?

On Aug 24, 2012, at 2:54 PM, charity notifications@github.com wrote:

This has definitely bitten us in a big way. I'd love to see it get fixed.


Reply to this email directly or view it on GitHub.

lacker commented Aug 24, 2012

AFAICT, if you run your code with bundle exec under ruby 1.9.2-p180, this bug makes it impossible to use an upgraded version of psych via the gemfile, because the broken psych in the stdlib gets loaded first. Any solution that lets me upgrade the psych version via the gemfile would count as "fixed" for me.

This is causing other people to do strange workarounds, too - see

http://nerdd.dk/posts/YAML-load-considered-harmful
http://www.spacevatican.org/2012/1/26/memory-leak-in-yaml-on-ruby-1-9-2/

Contributor

tenderlove commented Aug 24, 2012

@lacker can you try installing the gem on your system and adding it to the Gemfile?

Contributor

tenderlove commented Oct 24, 2012

So, I think I have a fix for this, but it is many layers of onions to fix. I spoke with @drbrain, and the only time rubygems needs to load YAML is for gem installation. It seems bundler would be the same, e.g. during bundle exec or Bundler.setup you shouldn't need YAML.

First, we need this pull request merged to rubygems. It delays loading yaml until we actually need to parse a YAML gemspec.

We need to delete this file, and just call Gem.load_yaml but only when we're about to use YAML. So, this must be done at runtime, not when the file is required. If yaml is lazily loaded, then Rubygems will only load YAML on gem installation (presumably the same is true for bundler).

We need to figure out what this is for and remove it. Calling Gem.configuration will load up YAML. The comment seems to indicate a work-around for something, but I'm not sure what.

If we run against the patched Rubygems, remove the psyched_yaml.rb file, and remove that call to configuration, it fixes this issue. Here is a video to prove it.

Owner

indirect commented Oct 24, 2012

Unfortunately, Bundler has to load the .bundle/config file, which is YAML. :( The file holds all settings, and so I think sometimes needs to be read as part of the resolve process, for example if you use bundle config local.foo. Maybe we can figure out a way around that?

On Oct 23, 2012, at 5:24 PM, Aaron Patterson notifications@github.com wrote:

So, I think I have a fix for this, but it is many layers of onions to fix. I spoke with @drbrain, and the only time rubygems needs to load YAML is for gem installation. It seems bundler would be the same, e.g. during bundle exec or Bundler.setup you shouldn't need YAML.

First, we need this pull request merged to rubygems. It delays loading yaml until we actually need to parse a YAML gemspec.

We need to delete this file, and just call Gem.load_yaml but only when we're about to use YAML. So, this must be done at runtime, not when the file is required. If yaml is lazily loaded, then Rubygems will only load YAML on gem installation (presumably the same is true for bundler).

We need to figure out what this is for and remove it. Calling Gem.configuration will load up YAML. The comment seems to indicate a work-around for something, but I'm not sure what.

If we run against the patched Rubygems, remove the psyched_yaml.rb file, and remove that call to configuration, it fixes this issue. Here is a video to prove it.


Reply to this email directly or view it on GitHub.

Contributor

tenderlove commented Oct 24, 2012

Does the resolve process need to happen when you're doing bundle exec? I thought that was the point of the lockfile.

Owner

indirect commented Oct 24, 2012

The resolver runs so you can add a gem and then run bundle exec without running bundle install first. :/

On Oct 23, 2012, at 5:36 PM, Aaron Patterson notifications@github.com wrote:

Does the resolve process need to happen when you're doing bundle exec? I thought that was the point of the lockfile.


Reply to this email directly or view it on GitHub.

Contributor

tenderlove commented Oct 24, 2012

Well, we can do some tricks to avoid reading the file. If we have to read the file, you could:

  1. Open a pipe
  2. fork
  3. In the child process, load yaml
  4. In the child process, read the config
  5. In the child process, marshal the config and write it to the pipe
  6. Parent process reads from the pipe

This means the parent process will not be infected with YAML. :-)

Owner

indirect commented Oct 24, 2012

Hah! Maybe we will, at that. Honestly, given how simple the Bundler config YAML is, I bet we could also just parse it with a single regex. :P

I will try to take a look at how we use YAML and see if we can avoid loading YAML at boot. @tenderlove, do you think it would work to keep psyched_yaml.rb and only load it if Gem.load_yaml is missing? That at least seems like it would maintain backwards compatibility...

Contributor

tenderlove commented Oct 24, 2012

Rather than keep the file around, we should just backport Gem.load_yaml. e.g.

unless Gem.respond_to?(:load_yaml)
  def Gem.load_yaml
    ...
  end
end

Though, since rubygems eagerly loads YAML (in some cases) this can't be fixed without people upgrading rubygems.

Owner

indirect commented Oct 25, 2012

Yeah, backporting it sounds like the way to go. IIRC, the file exists because there were some cases where Bundler would load Syck before an app tried to load Psych, and things would go badly wrong. That case is solved by psyched_yaml.rb — this issue, of double loads, definitely seems to require upgrading rubygems to solve.

Contributor

tenderlove commented Oct 26, 2012

Hi, I want to be able to release new versions of Psych without worrying that people on bundler will have problems. I don't have time to fix this bundler issue, so I'm putting a bounty of $100 on this ticket.

I've spelled out mostly what needs to happen here.

Whoever gets a patch in that fixes this issue will get the bounty. I'll award the bounty when I can:

  • Upgrade Rubygems
  • Upgrade Bundler

and have the following steps work with Ruby 1.9.3-p194 (it's important that it's p194):

Step 1: Install psych as a gem.

Psych version 1.3.2 ships with 1.9.3-p194. First install a newer gem:

$ gem install psych -v 1.3.4

You can verify the older version is installed in stdlib like this:

$ ruby -rpsych -e'puts Psych::VERSION'
1.3.2
$ ruby -e'gem "psych"; require "psych"; puts Psych::VERSION'
1.3.4

Step 2: Touch a Gemfile

$ touch Gemfile

Step 3: Require the stdlib version of psych after Bundler.setup

This shouldn't raise an exception, but it does. I would expect the output to be "1.3.2":

$ ruby -rbundler -e'Bundler.setup; require "psych"; p Psych::VERSION'

Step 4: Add psych to the Gemfile and have the latest version used after Bundler.setup:

$ echo "gem 'psych'" > Gemfile
$ bundle
$ ruby -rbundler -e'Bundler.setup; require "psych"; p Psych::VERSION'

The output should be 1.3.4.

I've put a test here to help automate the process.

When I can update rubygems and update bundler (official releases, not beta releases), and successfully perform these operations, I'll award the bounty.

wilkie commented Oct 27, 2012

Hmm. Intriguing. Both rubygems and bundler use #gem to install the latest Psych:

https://github.com/carlhuda/bundler/blob/master/lib/bundler/psyched_yaml.rb#L3
https://github.com/rubygems/rubygems/blob/master/lib/rubygems.rb#L544

But, Bundler then removes any gem paths from $LOAD_PATH:

https://github.com/carlhuda/bundler/blob/master/lib/bundler/shared_helpers.rb#L63

Why does it do that? What's the point? That's obviously going to make it impossible to require 'psych' without it being in the Gemfile. By being in the Gemfile, it just redoes the work and no harm done... that's not really a safe thing to bet upon (apparently.) If it is going to use 'gem' to load a library from a gem path, then it is an implied dependency. It shouldn't remove that path from $LOAD_PATH...

That is, I'm curious, why require a library and then remove it from the load path? It's now fair game to re-require it haphazardly. Am I missing something?

Owner

indirect commented Oct 27, 2012

Well, the first part of the problem (I suspect) is that Bundler shouldn't remove Psych from the load path if it has required it. But the second part (and what Aaron wants, afaict) is for Bundler to not require Psych at all, so that a later version of Psych listed in the gemfile can be required by the application.

In the end, this may require grabbing the Psych version out of the lock and requiring that version, if there is no way to get around needing a YAML parser to run Bundler.setup.

On Oct 27, 2012, at 12:46 PM, Dave Wilkinson II notifications@github.com wrote:

Hmm. Intriguing. Both rubygems and bundler use #gem to install the latest Psych:

https://github.com/carlhuda/bundler/blob/master/lib/bundler/psyched_yaml.rb#L3
https://github.com/rubygems/rubygems/blob/master/lib/rubygems.rb#L544

But, Bundler then removes any gem paths from $LOAD_PATH:

https://github.com/carlhuda/bundler/blob/master/lib/bundler/shared_helpers.rb#L63

Why does it do that? What's the point? That's obviously going to make it impossible to require 'psych' without it being in the Gemfile. By being in the Gemfile, it just redoes the work and no harm done... that's not really a safe thing to bet upon (apparently.) If it is going to use 'gem' to load a library from a gem path, then it is an implied dependency. It shouldn't remove that path from $LOAD_PATH...

That is, I'm curious, why require a library and then remove it from the load path? It's now fair game to re-require it haphazardly. Am I missing something?


Reply to this email directly or view it on GitHub.

wilkie commented Oct 27, 2012

The first part is messy, but easily solved by adding an exception for Gem::Specification.find_by_name('psych').gem_dir in that clean_load_path method.

For that second part, you'd have to not only read the Gemfile and require the 'psych' that the app wants, but also note that if it is not in the Gemfile, you'd have to explicitly load the stdlib but not using 'gem' but just require 'psych'.

Since rubygems also uses #gem in load_yaml, you'd have to also change it there or require it very early so rubygems doesn't do it again, otherwise it will use the latest if installed. If you remove the 'gem "psych"' line from bundler (and force it to require 'psych' alone), it requires 1.3.2 and then rubygems does gem 'psych' adding it to the load path again and then requires 1.3.4, which oddly doesn't explode horribly, but pushed out a bunch of constant redefinition warnings and redefines the constants to a new version. But that's also really terrible... because both psychs have been "required" and are both in $"

All of these #gem and require 'psych's happen on: require 'bundler'

So, the solution would be to patch rubygems to not call #gem if defined?(Psych) since there is no point and the result of adding the gem to the load path when it has already loaded the psych from stdlib is disastrous. Then, in Bundler, detect the version of psych in the Gemfile and if such a thing exists, load this early in psyched_yaml.rb. It should also probably drop-out if Psych is defined, just in case, because the same issue would occur.

Or, better yet, remove psych dependency altogether from both or just from bundler and workaround rubygems requiring it. But, how feasible are either of those options?

Owner

indirect commented Dec 3, 2012

I have implemented lazy-loading of Psych in Bundler. This means that on rubies with the Psych gem installed, any Gemfile lines that require 'psych' will be honored. For example:

[andre ✘ ~/sw/gems/bundler-testcases/psych]$ ruby -v
ruby 1.9.3p327 (2012-11-10 revision 37606) [x86_64-darwin12.2.1]
[andre ✘ ~/sw/gems/bundler-testcases/psych]$ gem -v
1.8.24
[andre ✘ ~/sw/gems/bundler-testcases/psych]$ dbundle -v
Bundler version 1.3.0.pre
[andre ✘ ~/sw/gems/bundler-testcases/psych]$ ruby -e "require 'psych'; p Psych::VERSION"
"1.3.4"
[andre ✘ ~/sw/gems/bundler-testcases/psych]$ cat Gemfile
source :rubygems

gem 'psych', '1.3.3'
[andre ✘ ~/sw/gems/bundler-testcases/psych]$ bundle exec ruby -e "require 'psych'; p Psych::VERSION"
"1.3.3"

According to @tenderlove, this satisfies the conditions of the bounty, so the bounty is now closed.

It should also mean that Psych can go back to releasing bugfixes as gems. :)

@indirect indirect closed this Dec 3, 2012

Contributor

sunaku commented Dec 26, 2012

Has this bugfix been released yet? I tried bundler 1.3.0.pre.3 (released December 21, 2012) but it doesn't seem to solve this problem in a Rails 3.2 app.

Owner

indirect commented Dec 26, 2012

It is only fixed when paired with the latest rubygems.

On Dec 26, 2012, at 12:23 PM, "Suraj N. Kurapati" notifications@github.com wrote:

Has this bugfix been released yet? I tried bundler 1.3.0.pre.3 (released December 21, 2012) but it doesn't seem to solve this problem in a Rails 3.2 app.


Reply to this email directly or view it on GitHub.

Contributor

sunaku commented Dec 26, 2012

I upgraded to the latest rubygems, but still no luck. 🐛

Owner

indirect commented Dec 26, 2012

You didn't put psych in your Gemfile.

On Dec 26, 2012, at 2:06 PM, "Suraj N. Kurapati" notifications@github.com wrote:

I upgraded to the latest rubygems, but still no luck.


Reply to this email directly or view it on GitHub.

Contributor

sunaku commented Dec 26, 2012

Thanks! That did the trick. However, this doesn't seem like a complete solution to me because even though my dummy rails app doesn't need or use psych at all, I have to add psych to its Gemfile nevertheless just to pacify this bug. 😓

ncri commented Jan 19, 2013

Agree with sunaku, i don't like having to add psych for apps which don't need it...

Owner

indirect commented Jan 20, 2013

@ncri @sunaku I think you guys are experiencing a different bug than the one that this ticket opened with. The OP has verified that the bug he reported is resolved. Can you open a new ticket with data requested in ISSUES and reproduction steps for your issue? Thanks.

@indirect indirect reopened this Jan 20, 2013

@indirect indirect closed this Jan 20, 2013

Did this fix get merged into bundler 1.2.x?

Owner

indirect commented Feb 25, 2013

@joevandyk I don't think the refactoring needed to support lazy-loading YAML is present in 1.2.

@renatocn renatocn referenced this issue in locaweb/locawebstyle Mar 18, 2013

Closed

Problemas no Gemfile #3

EdinaVath pushed a commit to EdinaVath/suspenders that referenced this issue Dec 25, 2013

Fix Bundler / Psych double loads
A certain combination of Ruby, Bundler, & Psych causes errors such

    `<class:Parser>': superclass mismatch for class Mark (TypeError)

bundler/bundler#2068

Including Psych in the bundle fixes the issue.

EdinaVath pushed a commit to EdinaVath/suspenders that referenced this issue Dec 25, 2013

Remove Psych from Gemfile
The reason this was here was to resolve Bundler / Psych double loads

    `<class:Parser>': superclass mismatch for class Mark (TypeError)

bundler/bundler#2068

The issue has been fixed in Rubygems >= 2.0 and Bundler >= 1.3.0, which
we should all be using now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment