Skip to content
This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Cache requires across installed packages #3761

Merged
merged 89 commits into from Oct 15, 2014
Merged

Conversation

kevinsawicki
Copy link
Contributor

The Problem

Atom packages installed to ~/.atom/packages do not share any requires with core or the packages bundled with Atom.

For example, imagine a package has a dependency on underscore 1.7.

When that package does a require('underscore'), underscore will be required and loaded out of the packages node_modules folder even if Atom core has this same dependency and has already required underscore@1.7.0. Two copies of the exact same module are required and loaded instead of them sharing that exact same dependency.

So if underscore 1.7 takes 10 milliseconds to require and 10 packages you've installed are using it, Atom will spend 100 milliseconds at startup requiring 10 identical copies, instead of requiring it once and sharing that copy among the 10 packages.

This has led package authors to do the following performance hacks:

require(path.join(atom.getLoadSettings().resourcePath, 'node_modules', 'underscore'))

This requires it directly out of core and so it will only loaded once since the full path is given to require.

This hack has the downside of packages breaking when Atom core upgrades or removes dependencies.

The Solution

Patch node's Module to allow packages to share modules.

So if a package is dependent on underscore >= 1.5 and underscore 1.7 has already been loaded by core or another package, that already required underscore version will be used instead of loading a new copy out of the package's node_modules folder.

This is accomplished by doing the following:

  • Precompute all the dependencies a package ships with when it is installed so Atom can track which version of modules are available across all packages and core.
  • Precompute what modules each folder in a package depends on. That way when any file is required in that folder, Atom can immediately know what module versions it needs and return a version from the cache if a compatible version has already been required.

So by patching Module._resolveFilename, Atom can find a compatible version of a module already loaded without stat-ing or reading any files on disk.

It instead uses the precomputed dependency graph to be able to immediately resolve a module name and version to the path of a compatible version that has already been required.

Packages won't have to do anything to take advantage of this, it will be automatically detected during apm install and handled transparently.

Packages will still always get a version of the module that satisfies the dependency range listed in their package.json file.

The apm changes are in atom/apm#200

The Results

To verify that this approach improved startup time I decided to measure the load and activate time for several packages. These packages were chosen because they are all dependent on at least one module and they have similar dependencies to the modules bundled with Atom and/or each other.

Before

Package Load/Activate Time in Milliseconds
atom-color-highlight 176
hex 12
minimap 97
minimap-color-highlight 108
minimap-git-diff 17
minimap-selection 4
Total 414

After

Package Load/Activate Time in Milliseconds
atom-color-highlight 31
hex 2
minimap 69
minimap-color-highlight 7
minimap-git-diff 4
minimap-selection 3
Total 116

/cc @abe33 since I used a bunch of your packages in my testing

Closes #3672

@kevinsawicki kevinsawicki changed the title Cache requires across installed package Cache requires across installed packages Oct 8, 2014
@abe33
Copy link
Contributor

abe33 commented Oct 8, 2014

That's a shocking improvement!

I'm not very familiar with that specific feature in npm, but doesn't that sound like peer dependencies?
I was wondering if I could use that to get rid of the oniguruma compatibility issues I get when node version is upgraded, but this is fine too, and doesn't require changes in packages so people will immediately feel the boost in startup time.

On a side note, I'm currently in the process of refactoring pigments (used in abe33/atom-color-highlight) to use the native RegExp class. The performances so far are noticeably better, even if there's some adjustments to make for really large files containing hundreds of colors (synchronous parsing is less than ideal in this case). And the package won't load oniguruma so it should have improved startup, but it doesn't matter anymore.


loadDependencies(childPath, rootPath, rootMetadata, moduleCache)

undefined
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most of our code uses return in these instances.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, yeah, I feel like I've seen both around, and even sometimes null, I'll switch to return and update the CONTRIBUTING.md

@kevinsawicki
Copy link
Contributor Author

I'm not very familiar with that specific feature in npm, but doesn't that sound like peer dependencies?

Sort of, but this way has a nice side-effect that installing any two packages where dependencies overlap will result in a single require per common dependency. I don't think peerDependencies was intended for this kind of reuse.

@mark-hahn
Copy link
Contributor

Will there be only one module initialization per Atom load? If so then the Module object will be shared by all requiring modules, right? I may be confused but I thought that many times the module being used would contain state meant for a single user module. The state could be in a closure or a property such as a class property. There is a potential for breakage here.

@kevinsawicki
Copy link
Contributor Author

I may be confused but I thought that many times the module being used would contain state meant for a single user module.

Modules are shared already, all of the node built-ins for example.

Each time you do require('fs') you get back the exact same object, no matter which file you require it from.

@mark-hahn
Copy link
Contributor

all of the node built-ins for example.

Yes but they were designed with the knowledge they would be used that way.

Each time you do require('fs') you get back the exact same object

Yes, that it what I am concerned about. The module object has only been
shared in single applications in the past where that is expected. It is
sometimes counted on.

I apologize if I'm causing FUD. I know I will need to refactor some of my
modules but it will be worth it for the great load-time improvement. The
time spent on refactoring will be quickly regained by the faster loading.
:-)

On Wed, Oct 8, 2014 at 2:27 PM, Kevin Sawicki notifications@github.com
wrote:

I may be confused but I thought that many times the module being used
would contain state meant for a single user module.

Modules are shared already, all of the node built-ins for example.

Each time you do require('fs') you get back the exact same object, no
matter which file you require it from.


Reply to this email directly or view it on GitHub
#3761 (comment).

@kevinsawicki
Copy link
Contributor Author

The module object has only been shared in single applications in the past where that is expected.

I don't think this is accurate. For example there is the npm dedupe command which moves modules around for the specific reason to allow the same module exports to be shared. The fact that this is a first-class construct in npm makes me believe that module exports should be assumed to be shared.

When you require a module, npm looks in roughly 10 directories for it, so module locations shouldn't be assumed and therefore you can't assume you are the only one using a module.

@mark-hahn
Copy link
Contributor

Having different apps load the npm module from the same directory is not
related to my concern. I am only concerned about the shared module object
and shared closures after loading the module. This is definitely a kind of
"leak" although it probably won't be a big problem. I am looking through
my modules now. So far I've only come up with performance problems.

On Wed, Oct 8, 2014 at 3:18 PM, Kevin Sawicki notifications@github.com
wrote:

The module object has only been shared in single applications in the past
where that is expected.

I don't think this is accurate. For example there is the npm dedupe
https://www.npmjs.org/doc/cli/npm-dedupe.html command which moves
modules around for the specific reason to allow the same module exports to
be shared. The fact that this is a first-class construct in npm makes me
believe that module exports should be assumed to be shared.

When you require a module, npm looks in roughly 10 directories for it, so
module locations shouldn't be assumed and therefore you can't assume you
are the only one using any module.


Reply to this email directly or view it on GitHub
#3761 (comment).

@lee-dohm
Copy link
Contributor

lee-dohm commented Oct 9, 2014

Amazing performance improvement! Great work!

@mark-hahn
Copy link
Contributor

After going through a number of modules I'd like to retract my concern. I haven't been able to find a module that will break. Most use the idiom of returning a class to instantiate which gives everyone their own state object. Even ones that use functional-style closures instead of OO are creating separate closures. I also realize Atom can be considered one app and there are many other ways for packages to collide.

I still stand by the technical arguments but I'm confident conflicts will be rare. I'm sorry to bother everyone.

@kevinsawicki
Copy link
Contributor Author

I'm sorry to bother everyone.

No worries, the more 👀, 👂, and 💭 on a change as drastic as this, the better.

@philipgiuliani
Copy link
Contributor

Amazing!! :) Its getting merged before the next release?

@kevinsawicki
Copy link
Contributor Author

Its getting merged before the next release?

Not 100% sure, yet, we may cut another release today in which case it wouldn't be in there.

But it should be merged and released next week hopefully.

I'm going a little further and now that there is a strategy for quickly resolving module paths, I'm investigating some speedups for requiring relative paths (i.e require('./my-view') since those paths can also be precomputed and cached during the build.

I'll post an update with my findings in the next couple hours, just wrapping it up now.

@kevinsawicki
Copy link
Contributor Author

After looking into speeding up relative paths requires such as require('./my-view') I've tweaked the cache to use a precomputed map of absolute paths so relative paths can be resolved without stat-ing the filesystem.

Node currently handles this in their the Module._findPath function. It loops over a list of possible file extensions and stats them to see if they exist. By using an in-memory cache, it greatly reduces the number of stat calls and time spent stat-ing.

Before

Calls to Module._findPath: 531
Time spent in Module._findPath: 144ms

After

Calls to Module._findPath: 2
Time spent in Module._findPath: 3ms

@benogle
Copy link
Contributor

benogle commented Oct 9, 2014

SHIIIIIIIIT 🎉

@benogle
Copy link
Contributor

benogle commented Oct 9, 2014

What's total startup time now?

@50Wliu
Copy link
Contributor

50Wliu commented Oct 10, 2014

😮 Wow. So excited for this-I can't wait to experience the difference first hand! 😄

registered: false
resourcePath: null

# isAbsolute is inlined from fs-plus so that fs-plust itself can be required
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo? (fs-plust)

@lee-dohm
Copy link
Contributor

After

Calls to Module._findPath: 2
Time spent in Module._findPath: 3ms

Holy-oke Massachusetts Batman!

@thomasjo
Copy link
Contributor

Niiiiiiice! 💖

@kevinsawicki
Copy link
Contributor Author

So another thing in this PR, I've adjusted the window load time calculation to be more accurate.

Previously the window load time shown in the console and time cop was measured in the bootstrap script so it didn't include parsing the load settings, requiring coffeescript, requiring atom shell modules, etc.

Now it does so it should now correctly represent the time spent between the onload handler being called and the window being ready to use.

@philipgiuliani
Copy link
Contributor

Im already very curious if i can feel a speed improvement after the release :) 🙊 💥

kevinsawicki added a commit that referenced this pull request Oct 15, 2014
Cache requires across installed packages
@kevinsawicki kevinsawicki merged commit b2b4860 into master Oct 15, 2014
@kevinsawicki kevinsawicki deleted the ks-require-cache branch October 15, 2014 21:00
@kevinsawicki
Copy link
Contributor Author

Okay, this will be in the next Atom release, 0.138

@50Wliu
Copy link
Contributor

50Wliu commented Oct 15, 2014

💯 👍 🎆 🎉 💨

@kevinsawicki
Copy link
Contributor Author

So apm install now adds cache details to the installed package's package.json file.

You can run apm rebuild-module-cache to generate this cache information for all installed packages.

This will be available once Atom 0.138 is out since it ships with apm 0.104

@RPDeshaies
Copy link

🏆 My load time passed from ~22 seconds to 3.64 seconds. Awesome

@magpie514
Copy link

👍 It has improved a lot on my machine as well. Now takes as long as the IDE I used to use, like two seconds, after a fresh boot. Excellent work!

@JoshCheek
Copy link

Noticably faster. This is great, ty!

@postcasio
Copy link
Contributor

👏 so much faster!

@langri-sha
Copy link

Seeing ~40% speedups on a commodity SSD. 💨 significant!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reuse modules across installed packages