Add metadata cache #2206

Closed
p opened this Issue Dec 7, 2012 · 30 comments

p commented Dec 7, 2012

reactor% bundle install
Fetching gem metadata from http://rubygems.org/.......
Fetching gem metadata from http://rubygems.org/..

This happens every time I run the command. If the first attempt failed and I run install again 10 seconds later, it fetches metadata again.

For the vast majority of people, fetching metadata more than once per day is unnecessary.

Contributor

rohit commented Dec 7, 2012

But such a cache would be invalidated if within those 10 seconds you changed the Gemfile. Right?

I'm guessing a user-configurable option that defaults to 60 seconds would be a good idea. Also maybe store a hash of the Gemfile in Gemfile.lock so you know whether it has changed since the last run. That way you don't have to hit the gem store every time. You'd also have to warn the user that the cache is being used. I'm also guessing there will be many more tricky issues involved, so it won't be feasible to do this before the 2.0 release? :)

</end_random_thoughts>
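rohit's suggestion can be sketched roughly as below. This assumes a digest of the Gemfile is stored next to the cached metadata; Bundler does not record such a digest, and `gemfile_digest` and `can_reuse_cached_metadata` are hypothetical names. The 60-second default TTL is the value proposed in the comment.

```python
import hashlib
import time

# Hypothetical default TTL taken from the suggestion above (meant to be user-configurable).
DEFAULT_TTL_SECONDS = 60


def gemfile_digest(gemfile_text: str) -> str:
    """Return a SHA-256 hex digest of the Gemfile contents."""
    return hashlib.sha256(gemfile_text.encode("utf-8")).hexdigest()


def can_reuse_cached_metadata(gemfile_text, stored_digest, fetched_at,
                              now=None, ttl=DEFAULT_TTL_SECONDS):
    """Reuse cached metadata only if the Gemfile is unchanged AND the cache is still fresh."""
    if now is None:
        now = time.time()
    unchanged = gemfile_digest(gemfile_text) == stored_digest
    fresh = (now - fetched_at) < ttl
    return unchanged and fresh


gemfile = 'source "https://rubygems.org"\ngem "rails"\n'
digest = gemfile_digest(gemfile)
print(can_reuse_cached_metadata(gemfile, digest, fetched_at=1000.0, now=1030.0))  # True
print(can_reuse_cached_metadata(gemfile + 'gem "rack"\n', digest,
                                fetched_at=1000.0, now=1030.0))                    # False (Gemfile edited)
print(can_reuse_cached_metadata(gemfile, digest, fetched_at=1000.0, now=1100.0))  # False (TTL expired)
```

A real implementation would also print the warning rohit mentions whenever the cached path is taken.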

Owner

indirect commented Dec 7, 2012

Yep. :) We tried to store a hash of the Gemfile in the lock, and it turns out that Bundler has to resolve the lock twice, with different constraints, each time you run install... so it's a pretty tough problem, unfortunately.

p commented Dec 7, 2012

> But such a cache would be invalidated if within those 10 seconds you changed the Gemfile. Right?

No. Whatever it is that you retrieve from the network, don't retrieve that again. If you need something else just fetch the new bits.
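The "never fetch the same thing twice, only fetch the new bits" idea amounts to an incremental cache keyed by gem name. In this sketch `fetch_many` is a hypothetical stand-in for one network request returning metadata for the requested names; it is not Bundler's actual fetcher.

```python
from typing import Callable, Dict, Iterable, List, Optional


class IncrementalMetadataCache:
    """Remember every gem's metadata once fetched; only request names not seen before."""

    def __init__(self, fetch_many: Callable[[List[str]], Dict[str, object]]):
        self._fetch_many = fetch_many          # hypothetical network call
        self._entries: Dict[str, object] = {}

    def get(self, names: Iterable[str]) -> Dict[str, Optional[object]]:
        wanted = list(dict.fromkeys(names))    # de-duplicate while keeping order
        missing = [n for n in wanted if n not in self._entries]
        if missing:
            # Only the names we have never seen go over the network.
            self._entries.update(self._fetch_many(missing))
        # None means the server returned nothing for that name.
        return {n: self._entries.get(n) for n in wanted}


requests_made = []

def fake_fetch(names):
    requests_made.append(list(names))
    return {n: {"name": n, "versions": ["1.0.0"]} for n in names}


cache = IncrementalMetadataCache(fake_fetch)
cache.get(["rails", "rack"])
cache.get(["rails", "rack", "nokogiri"])  # second run after adding one gem to the Gemfile
print(requests_made)  # [['rails', 'rack'], ['nokogiri']]
```

The trade-off indirect raises next is visible here: a version of `rails` pushed after the first fetch would never be seen, because `rails` is never requested again.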

Owner

indirect commented Dec 7, 2012

This would be fairly complicated and prone to potential breakage. How would you handle new gem releases? Many people expect to be able to push a new gem and then immediately run bundle install to install it.

p commented Dec 7, 2012

Assuming a universe of packages where existing packages do not change:

Let's start with installing dependencies from a lock file. This should result in exactly the same set of packages every time. If I install from a lock file and then obtain another lock file, there is no need to fetch metadata that was already fetched because it can't be different.

Now, installing dependencies without a lock file. In this case the exact set of installed packages may differ depending on when the installation is done. But there is also no expectation that any particular version of a package with a floating version specification is used.
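The distinction between the two cases can be sketched as follows, under p's stated assumption that a published (name, version) pair never changes. Exact pins, as found in a lock file, can then be served from cache indefinitely, while floating requirements such as `~> 1.4` could match a newer release and need a freshness check. The regular expression and function names are illustrative, not Bundler's requirement parser.

```python
import re
from typing import List, Tuple

# Assumption from the discussion: once published, a (name, version) pair never changes.
# A bare or "="-prefixed version number is treated as an exact pin, as in a lock file.
EXACT_PIN = re.compile(r"^=?\s*\d+(\.\d+)*$")


def needs_freshness_check(requirement: str) -> bool:
    """Exact pins can be served from cache forever; anything else might match a newer release."""
    return EXACT_PIN.match(requirement.strip()) is None


def split_requirements(reqs: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Partition (gem, requirement) pairs into cache-forever and needs-revalidation groups."""
    pinned = [name for name, req in reqs if not needs_freshness_check(req)]
    floating = [name for name, req in reqs if needs_freshness_check(req)]
    return pinned, floating


print(split_requirements([("rails", "3.2.11"), ("rack", "~> 1.4"), ("json", ">= 0")]))
# (['rails'], ['rack', 'json'])
```

Pre-release versions such as `1.0.0.rc1` fall through to the floating group here, which errs on the side of revalidating.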

The vast majority of people do not need to install a package that was published today. This is because the vast majority of people are users. A package that was published today could not have been tested by the authors of the software that users are trying to install (which lists said package as a dependency). Therefore, users don't have a burning desire to get today's version of this package.

For those times when people change published packages, which is a very Wrong Thing to do, users can install affected packages without using metadata cache.

> Many people expect to be able to push a new gem and then immediately run bundle install to install it.

I won't argue against "many people" but I will maintain that "many people" who install software they publish are still a vast minority compared to everyone using ruby. How many people do you suppose download/install rails on the day it is released vs over the lifetime of each release? What is the number of people who can push a new rails gem vs the number of people installing rails?

This is very simple. If you publish gems, don't use metadata cache. If you install software on the day it is released, don't use metadata cache. These two categories are a vast minority of people.

Owner

indirect commented Dec 8, 2012

Granting every single thing that you just said, there is pretty much no point in caching the data.

Bundler requests gem metadata directly from the Bundler-specific dependencies API. If we were to cache every single piece of gem metadata we had ever seen before, we would still have to ask the API if there were newer versions of any gem. The current median response size for an API request is under 10 KB. Optimizing that down to 1 KB would have no noticeable effect on the speed of bundle install, because the requests would all still need to be made anyway, and the requests themselves are what take up the majority of the time.
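A back-of-the-envelope model shows why shrinking responses barely helps when per-request latency dominates. Only the 10 KB and 1 KB sizes come from the comment above; the request count, round-trip time, bandwidth, and the assumption that requests run one after another are illustrative, not measurements of Bundler or rubygems.org.

```python
def install_network_time(num_requests: int, round_trip_s: float,
                         bytes_per_response: int, bandwidth_bytes_per_s: float) -> float:
    """Total time for sequential requests: fixed latency plus transfer time, per request."""
    per_request = round_trip_s + bytes_per_response / bandwidth_bytes_per_s
    return num_requests * per_request


# Assumed numbers: 20 requests, 200 ms round trip, 1 MB/s of bandwidth.
before = install_network_time(20, 0.2, 10_000, 1_000_000)  # 10 KB responses
after = install_network_time(20, 0.2, 1_000, 1_000_000)    # 1 KB responses
print(f"{before:.2f}s -> {after:.2f}s ({(before - after) / before:.1%} faster)")
# 4.20s -> 4.02s (4.3% faster)
```

Under these assumptions a tenfold reduction in payload saves about 4% of the time, whereas skipping the 20 round trips entirely is what a metadata cache would actually buy.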

Furthermore, your assertion that users don't want the newest version of Rails with the security fix on the day that it is released... does not match up with the reality I have experienced. There have already been upwards of ten shouty tickets opened just because some users have to wait up to 10 or 15 minutes for newly pushed gems to be found by Bundler, and that has only been the case for two or three weeks. Because of that, I don't think defaulting to day-old gems is realistic.

p commented Dec 13, 2012

> we would still have to ask the API if there were newer versions of any gem

This is probably where we are misunderstanding each other. No, you wouldn't. If you have enough information locally to perform the installation, and the local information is within cache ttl, you don't request anything from the network.

> There have already been upwards of ten shouty tickets opened just because some users have to wait up to 10 or 15 minutes for newly pushed gems to be found by Bundler

I would characterize this as a vocal minority, but, ok.

> Because of that, I don't think defaulting to day-old gems is realistic.

How about an option, off by default, for those of us who know what we are doing? That way both camps should be happy.
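The rule p describes (skip the network when the local data covers everything needed and is younger than the TTL), combined with an opt-in switch that is off by default, might look like this. `CachedIndex`, `must_hit_network`, and `use_metadata_cache` are hypothetical names, not Bundler options, and the one-day TTL is an assumption based on the "once per day" remark at the top of the thread.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

ONE_DAY = 24 * 60 * 60


@dataclass
class CachedIndex:
    """Locally stored gem metadata plus the time it was fetched."""
    fetched_at: float
    gems: Dict[str, object] = field(default_factory=dict)


def must_hit_network(needed: Set[str], cache: Optional[CachedIndex],
                     use_metadata_cache: bool = False,
                     ttl_seconds: float = ONE_DAY,
                     now: Optional[float] = None) -> bool:
    """Decide whether an install has to contact the server.

    `use_metadata_cache` is a hypothetical opt-in flag; when it is off (the default),
    the existing always-fetch behaviour is unchanged.
    """
    if not use_metadata_cache or cache is None:
        return True
    if now is None:
        now = time.time()
    expired = now - cache.fetched_at >= ttl_seconds
    incomplete = not needed.issubset(cache.gems.keys())  # any gem we know nothing about?
    return expired or incomplete


cache = CachedIndex(fetched_at=0.0, gems={"rails": {}, "rack": {}})
print(must_hit_network({"rails"}, cache, now=3600.0))                              # True (cache not enabled)
print(must_hit_network({"rails"}, cache, use_metadata_cache=True, now=3600.0))     # False
print(must_hit_network({"nokogiri"}, cache, use_metadata_cache=True, now=3600.0))  # True
```

Keeping the flag off by default addresses indirect's concern: anyone who expects freshly pushed gems to appear immediately sees no change in behaviour.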

Note: the CPU part of the installation process should also be optimized. My guess is that you are not serializing metadata efficiently, which results in constantly reparsing the entire world's worth of gemspecs or something like that. To draw an analogy, compare what yum does with what apt does. The point is: please don't use the huge runtime of the actual package installation as a reason to also burn time on needless network I/O.

p commented Dec 13, 2012

Alternatively, I could fetch the entire metadata for all packages at once (manually), like rubygems did/does, and use that local copy as long as it exists and is fresh. This would also solve the problem of different parts of the metadata having different expiration times.
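A sketch of the whole-index approach, assuming a hypothetical `download_full_index` callable that returns the complete metadata as a JSON-serializable dict. The function name and the JSON file format are illustrative; they are not how RubyGems or Bundler actually store their indexes. Because the snapshot has a single timestamp (the file's modification time), every entry expires at the same moment.

```python
import json
import os
import tempfile
import time
from typing import Callable, Dict


def load_index_snapshot(path: str, download_full_index: Callable[[], Dict],
                        max_age_seconds: float = 24 * 60 * 60) -> Dict:
    """Use the local snapshot if it exists and is fresh; otherwise download everything."""
    try:
        age = time.time() - os.path.getmtime(path)
        if age < max_age_seconds:
            with open(path, encoding="utf-8") as f:
                return json.load(f)
    except FileNotFoundError:
        pass  # no snapshot yet

    index = download_full_index()  # hypothetical: one request for all gem metadata
    # Write to a temporary file in the same directory, then rename it into place,
    # so an interrupted download never leaves a half-written snapshot behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(index, f)
    os.replace(tmp_path, path)
    return index
```

Calling `load_index_snapshot("index.json", download)` twice within a day triggers only one download; editing the Gemfile in between costs nothing extra as long as every gem it names is in the snapshot.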

Owner

indirect commented Dec 13, 2012

Please feel free to investigate this approach, but the short version is that when we tried this, it completely broke our users' expectations of how bundle install would work, especially after they edited their Gemfiles.

p commented Dec 17, 2012

I don't expect to be able to get anywhere within any sort of a reasonable amount of time, sorry.

Owner

indirect commented Dec 17, 2012

If you end up with time to work on it, I am happy to discuss the upsides and downsides with you, and explain why we made the decisions that we did. It would definitely be wonderful to make things faster.

aspiers commented Mar 4, 2013

@p is absolutely right to have requested this.

At the moment I am running bundle install multiple times per day, due to (a) adding new gems to projects and (b) testing automated setup of a test environment with a dedicated BUNDLE_PATH. I absolutely do not care if the upstream metadata changes during this period, I just want a reasonably recent version of the gems, as fast as possible. Having to download the same metadata each time is an annoying waste of time. If I could use metadata which is a day old in order to eliminate this 10-20 second delay, then that's a great trade-off in my book.

This is not an uncommon use case, and it is certainly more common than the case where a gem author uploads a new version and immediately wants to bundle install it from upstream (since regular developers outnumber gem authors by several orders of magnitude). But even if the latter were more common than the former, it would still be worth implementing a metadata cache, since any remotely sensible implementation would make it optional. The only debate is whether it should be on by default. I'd say probably not, because of the stated concerns about confusing users who are used to the existing behaviour.

Owner

indirect commented Mar 4, 2013

@aspiers why don't you just run bundle install --local?

aspiers commented Mar 4, 2013

@indirect because that won't install new gems.

Owner

indirect commented Mar 4, 2013

Wait -- you have new gems, but you don't want Bundler to check with the server to get info about them? I'm confused :/

aspiers commented Mar 4, 2013

Correct, because that info (i.e. gem metadata) should already be cached locally. That's the whole point of this issue - to implement a metadata cache.

Owner

indirect commented Mar 4, 2013

Right. There are some possible changes to rubygems coming that would make this much easier, and if that happens Bundler will effectively get this "for free".

In the meantime, back to my offer to work with anyone who wants to implement this. :)

aspiers commented Mar 4, 2013

Sounds awesome, I can wait I guess ;-)

Contributor

xaviershay commented Aug 20, 2013

No next steps on this ticket. @indirect is actively working on making this better; small patches are welcome in the meantime.

@xaviershay xaviershay closed this Aug 20, 2013

aspiers commented Aug 20, 2013

Why was this closed?? It's not resolved, and without an open ticket it could get forgotten.

Contributor

xaviershay commented Aug 20, 2013

  1. I don't consider it a bug, see https://groups.google.com/forum/#!topic/ruby-bundler/IxIF5pcANTk
  2. Good ideas don't get forgotten.
  3. We're already working on features that will make this better.

aspiers commented Aug 20, 2013

  1. I see. Agreed, it's a feature - so the right thing to do is to add an entry to the bundler-features issue tracker.
  2. Well, that's highly debatable, and it depends on who is trying to remember them ;-) But regardless, there is value in having it in the issue tracker so that newcomers to the project can see it (and who knows, even help implement it...)
  3. Cool :-) But what do you mean by "better"? Either the metadata gets cached, or it doesn't. I find it hard to imagine a half-way house.

Owner

indirect commented Aug 20, 2013

@aspiers I am currently working on the new index (with cached metadata) directly, funded by a grant from Ruby Central. I have to send them progress reports every other week, and completing that task is a condition of my receiving the grant money. It's not going to get forgotten. :)

aspiers commented Aug 20, 2013

@indirect Awesome! :) I still think it's best practice to track progress in an issue though. But it's up to you ...

nickl- commented Sep 12, 2013

@indirect Congrats on the grant! Wheeee \o/

I'm with @aspiers on keeping the issue open. Where else can we track your progress? Are those weekly updates published somewhere?

I may just borrow what you're doing for aero too if you don't mind =)

Good luck!

Owner

indirect commented Sep 12, 2013

Nick, we've been posting about our work to the ruby-bundler google group, which also acts as the Bundler mailing list.

nickl- commented Sep 18, 2013

Who doesn't get enough mail already?!? Don't underestimate the benefit of Google search listings.

Try: ruby-bundler cache meta data

See what I mean? That was even using the keyword you suggested. You're not going to beat the SEO you're already receiving here.

@indirect Open the issue again.

P.S. Linking to where the progress updates are posted won't hurt either =)

Owner

indirect commented Sep 18, 2013

If you would like to read about our progress as we work to create a completely new server and client gem index format (that will be cached, as well as being more efficient in other ways), you are welcome to check out the monthly Bundler team update posts on the Bundler mailing list.

This issue tracker is explicitly for bugs in bundler. A lack of metadata caching is not a bug, and so this ticket will stay closed, and I won't reply to it again.

aspiers commented Sep 18, 2013

@nickl- is right, a mailing list is nowhere near granular enough to be able to subscribe to individual topics without getting spammed by a load of stuff of no relevance or interest. And there's a reason GitHub called it an issue tracker rather than a bug tracker... oh well. I won't follow your progress then; I'll just look forward to hopefully noticing at some point in the future that bundler magically no longer downloads metadata over and over again for no good reason. Good luck with your work on this! I'm now unsubscribing from this issue.

nickl- commented Sep 25, 2013

@aspiers I thought about it, and we don't really need an "open" ticket to duplicate the updates here. Whoever sees them first can just repost; no skin off my back.

September update

repost: https://groups.google.com/forum/#!topic/ruby-bundler/DDhKQCxgwN4

Hello and welcome to the monthly bundler update!

Here you can find what the core team has been up to and what we are planning for the future. We had a great month getting ready for the 1.4 release, making progress on new index improvements, and the RSoC team hit their stride with a heap of patches, UI polish, and documentation contributions.

Last month

  • Andre: Finished a working implementation of the new index format for versions and dependencies.
  • Hemant: Thread safety fixes for parallel install [1].
  • Hemant: Reviewed changes around the handling of file descriptors in Ruby 2.
  • Jessica: Mentoring and guidance for new contributors.
  • Jen: A plethora of http://bundler.io improvements, making it more useful and making key information easier to find.
  • Jen & Joice: More helpful CLI messages, such as when running bundle outdated with an invalid gem or bundling gems from git when git isn't available.
  • Joyce: Included the old version of an updated gem in the output of bundle update.
  • Larry: Testing serving gems from fastly.com. Initial results are positive [2].
  • Terrence: Added support for specifying a Ruby patch level in the Gemfile [3].
  • Terrence: Basic prototype for serving the new index format.
  • Xavier: Triaged all outstanding old issues; 41 are still current.

Next month

  • Andre: Running the Bundler resolver using only objects created from the new index format.
  • Larry & Terrence: Updating bundler-api to the new index format.
  • Xavier: Get 1.4 out the door.

Activity Summary

https://github.com/bundler/bundler/pulse/monthly
14 PRs merged, 4 new outstanding.
71 closed issues, 4 new outstanding.
25 authors, 80 commits.

Cheers,
Xavier and the Bundler team

@aspiers aspiers referenced this issue in mitchellh/vagrant on May 27, 2014: winrm dependency causes libvirt plugin to fail #3897 (Closed)