Gem Mirror Support (with specs) #2366

Closed
wants to merge 5 commits into
from

7 participants

@danielsdeleo

This is based on #2062; I rebased that patch on master, made an attempt to refactor the code to match conventions I found elsewhere in the source, and added specs.

Questions

I found it difficult to write this cleanly given the way the code is currently factored, but it might just be my inexperience with the project. In particular:

  • RubygemsMirror class does very little other than provide a static location to replace URIs with the desired mirror. Should it exist?
  • URI normalization is duplicated. Is there a good single location?
  • Validation of mirror URIs occurs in Settings, which contains no other validation code. Is there a more desirable location for this?
  • Is there a better solution for case-sensitive URIs?

Finally, the contributing doc doesn't explain what heading to use in the CHANGELOG when you are adding the first feature after a release, so I skipped that to avoid future merge annoyances.

mkristian and others added some commits Aug 13, 2012
@mkristian mkristian add mirror support. each uri to a repository can be replaced by an
mirror uri via the bundler configuration. add mirrors like this
$ bundle config mirror.http://rubygems.org http://localhost:8081/nexus/content/repositories/rubygems.org
dd0ea8b
@danielsdeleo danielsdeleo move public methods to top
seems in line with conventions in rest of project
726db98
@danielsdeleo danielsdeleo Move gem mirror code to be consistent with other features
Factors the gem mirror code similarly to the local git repos feature.

Questions:
* RubygemsMirror class does very little other than provide a static
  location to replace URIs with the desired mirror. Should it exist?
* URI normalization is duplicated. Is there a good single location?
* Validation of mirror URIs occurs in Settings, which contains no other
  validation code. Is there a more desirable location for this?
* Is there a better solution for case-sensitive URIs?
b30c539
@danielsdeleo danielsdeleo add specs for gem mirror support d78ee7d
@danielsdeleo danielsdeleo add more exposition to mirror config manpage e035405
@mkristian mkristian referenced this pull request Mar 7, 2013
Closed

add mirror support. #2062

@indirect
Bundler member

So on the one hand, I really like this idea. On the other hand I'm worried about the potential for security implications, things getting crazy out of sync, and it no longer being clear if two developers (or dev and prod for that matter) are actually getting the same gems. Any thoughts there?

@danielsdeleo

First of all, a bit about my motivation/use case: I work at Opscode, makers of Chef. Since Chef is an open source project, I want the Gemfile to fetch gems from rubygems.org by default, so that developers outside of Opscode can easily work on the project, run the tests, etc. We now offer full stack "omnibus" builds of Chef, which are system packages (deb/rpm/etc) that we build via an internal Ci cluster. Being able to install gems from a mirror would give me:

  1. The ability to run builds if rubygems.org is down. We were blocked from doing a lot of work when it was down for a couple days after the security breach.
  2. Improved build stability. rubygems.org has been really stable recently, but it hasn't always been, and I have no control over its stability in the future. Network errors fetching gems are a big productivity drag.
  3. Improved build performance. I can run my mirror on the same network as my Ci cluster to make bundle install run very fast.
  4. I can isolate my network from the internet so that my production systems don't need to make network calls to the outside world.

I don't think gems getting out of sync should be an issue as long as you follow the recommended workflow. I was playing around with simulating this feature using alternate Gemfiles and eval, and I got into a situation where my Gemfile.lock had a version of active support that wasn't present on my mirror. In this situation, bundler rightly complained and exited with an error (IMO that's a bug with my custom rubygems mirroring code and not bundler).

It's of course possible that users might replace gems on their mirror so that, e.g., activesupport 3.2.14 has different code on their mirror than on rubygems.org. By the same token, a user could bundle install and then edit the files in the gem that got installed, which is of course not bundler's fault or problem.

Could you elaborate on your security concerns? The 90% use case for this feature is running a straight-up mirror of rubygems.org for security/performance/policy reasons. In this, the user has explicitly opted-in to managing their own gem server, which implies responsibility for the maintenance and security of that system.

@hone
Bundler member

I'm +1 this idea. pypi has support for mirroring as well. I believe @indirect's only concern here is dev/prod parity if the mirrors are out of sync. I'm not sure bundler should penalize people for that case or maybe we can deal with it then if it really becomes an issue. I think bundler can make the assumption here if you say X is a mirror of Y then bundler will treat them as interchangeable. /cc @kennethreitz

@mkristian

as the main main contributor to https://github.com/sonatype/nexus-ruby-support I started that PR. anyways I will put some explanations what that "mirror" can do: is runs on nexus from sonatype and nexus started as proxy/mirror for java jar as maven artifacts. so this plugin adds the ability to host your own ruby repository as well adding a proxy to existing gem repos. on top of that you can create "virtual" repositories which basically merge the gems from a given number of gem repos (proxy or hosted ones). it also adds the bundler API to each of these repos BUT bundler can use those repo only if you add the respective source to your Gemfile. but when I check out a project from github (or so) then I would like to use it as is without changing the Gemfile.

with gem command it is possible since here I can just add an extra source or what I did replace the rubygems.org source with that nexus-repos url. the nexus server can setup proxies so they continue to serve whatever there is cached if case the source server is down, it can cache volatile files (specs.4.8.gz, ...) for a day or an hour or not at all, to stay reasonably up-to-date.

those nexus server for java is used by travis or buildhive for jar files and I am sure the nexus ruby plugin could be an option for gems as well if bundler would have some mirror support. that is probably true for all CI installation where which run extensively.

and I am sure bundler support for mirrors would increase the chance to get that nexus-ruby-plugin into the core nexus server.

I am using a monkeypatch bundler for quite some time now on a daily base and I appreciate the 'lesser' traffic when I switch to jruby and back to ruby. so I personally hope to see mirror support in bundler some day (soon) ;)

@indirect
Bundler member

The security issue is people potentially asking a mirror for (say) rails 4, and instead getting a gem that the mirror says is rails 4 but actually contains trojan code of some kind.

Probably the best way to work around that problem is to make sure that the mirror URL is printed prominently? Not totally sure.

@mkristian

well, is the url really sufficient to determine the trust into the gems I download. even with rubygems.org I need to trust the DNS (and that the MITM can not generate SSL certificates with trusted CA).

but as I mentioned before the only mirrors I know are in use are local or company wide mirrors. and the problem of trust I have already with rubygems itself since I can easily setup an "mirror" for the gem command.

of course printing the used url helps, verifying the SSL certifcate (assume that is standard by now in ruby) helps, if rubygems would offer some integrity "check" that would help here and rubygems itself (something like a gem with public key for verifiying a gpg fingerprint of the gem).

but the "trust" issue only pops up if there are public mirrors - are there any known public mirrors ?

@tedpennings

+1 to this PR. I'm in exactly the same position as @danielsdeleo - I work at a very large enterprise that's currently having issues with Ruby builds because we depend on the remote Rubygems.org repo. As a general practice, our build and deploy systems are not permitted to reach the internet. Most require extensive configuration and firewalled off in network enclaves. Many others are simply not permitted to reach the internet at all.

We solve this problem in other areas with internal mirrors for things like Maven Central (Java/JVM languages) and Red Hat's yum repositories. Internal mirrors are the preferred approach as they're entirely owned and controlled internally, so the artifacts in them can be inventoried, scanned for malware and considered for licensing. We're hoping to fix this issue for Ruby by using an internal mirror and the project from @mkristian .

I think accepting this PR and pushing a new minor release of Bundler would go a long way to helping Ruby support at large enterprises that are unable to pull artifacts directly from the internet.

@gnufied gnufied commented on the diff Aug 7, 2013
lib/bundler/rubygems_mirror.rb
@@ -0,0 +1,19 @@
+module Bundler
+ class RubygemsMirror
+
+ def self.to_uri(uri)
+ # NOTE: implementation of Settings forces case insensitivity, which
+ # breaks case sensitive URIs (like file paths). We therefore need to do
+ # lookups on downcased URIs.
+ lookup_uri = URI(uri.to_s.downcase)
+ mirrors[lookup_uri] || uri
+ end
+
+ private
+
+ def self.mirrors
+ Bundler.settings.gem_mirrors
@gnufied
Bundler member
gnufied added a line comment Aug 7, 2013

Do we really need this method here?

@danielsdeleo
danielsdeleo added a line comment Aug 7, 2013

Depends on how you feel about the "Law of Demeter" and long method chains.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@gnufied gnufied commented on the diff Aug 7, 2013
lib/bundler/settings.rb
@@ -149,5 +160,15 @@ def load_config(config_file)
end
end
+ # TODO: duplicates Rubygems#normalize_uri
+ # TODO: is this the correct place to validate mirror URIs?
+ def normalize_uri(uri)
+ uri = uri.to_s
+ uri = "#{uri}/" unless uri =~ %r[/\Z]
+ uri = URI(uri)
+ raise ArgumentError, "Gem mirror sources must be absolute URIs (configured: #{mirror_source})" unless uri.absolute?
@gnufied
Bundler member
gnufied added a line comment Aug 7, 2013

We generally prefer exceptions in Bundler namespace.

@gnufied
Bundler member
gnufied added a line comment Aug 7, 2013

Ha ha. but perhaps we can improve on that. The problem is, Exceptions outside Bundler::BundlerError tree IIRC throw Unfortunately a fatal error blah blah.

@danielsdeleo
danielsdeleo added a line comment Aug 7, 2013

I didn't mean that as a criticism.

This patch is the only significant time I've spent in the bundler code base and I've done what I can to match the conventions and idioms I could identify. I'd have preferred not to introduce any duplication, but I didn't see any way to do so without doing a deeper refactoring which isn't something I'm comfortable doing on a new-to-me code base.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@danielsdeleo

@indirect I'm not too hot on adding a warning message that a mirror is being used. For unattended installs, no one will be looking at the output. In cases where users regularly use mirrors, the user will be habituated to seeing a message about a mirror being used, so they likely won't notice a message informing them that a different mirror is being used (supposing an attacker changed the mirror to a trojaned one).

In any case, I'm having difficulty imagining a scenario where a malicious entity has control over a user's shell environment or bundler config but doesn't have a more direct route of attack than changing bundler's gem source to a mirror.

@gnufied
Bundler member

Rough +1 from me. One thing which I have been thinking is - Do we expect the mirror to support Bundler API? (which is separate from rubygems api). Will supporting mirrors result in overhead from supporting different index formats?

@danielsdeleo

@gnufied I think it's reasonable to expect mirrors to support the bundler API. https://github.com/geminabox/geminabox does, and I found it not too difficult to implement in my own gem mirror server (https://github.com/danielsdeleo/gem-mirror-tools)

@indirect
Bundler member

Silently making network requests to one server while printing the address of a different server sounds like a debugging nightmare. As long as it's clear where the request is going to, I think supporting this is reasonable.

@danielsdeleo

@indirect that makes sense.

I'm going on vacation for a bit, if anyone on this thread (or not) has time to finish this up, I'd appreciate it, otherwise I'll look into it when I get back.

@xaviershay

Re mirror being malicious, it's the same attack as compromising rubygems, with the same fix: signed gems.

This change would help me out at work.

@indirect
Bundler member

I am +1 on this change in principle. I would like to make sure that the output makes it clear what URL the gems are being fetched from, if not rubygems.org. Open to merging this as soon as it's cleaned up.

@indirect
Bundler member

merged to master! thanks, everybody.

@indirect indirect closed this Oct 18, 2013
@danielsdeleo danielsdeleo deleted the danielsdeleo:gem-mirror branch Oct 18, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment