Introduce scheme to `npm dedupe` npm dependencies of Atom packages #8217

Open
bolinfest opened this Issue Aug 4, 2015 · 9 comments

Projects

None yet

7 participants

@bolinfest
Contributor

Currently, the organization of the ~/.atom/packages directory makes it impossible to dedupe dependencies, potentially causing a lot of bloat.

Specifically, in the case of Nuclide, we have our nuclide-installer package that is used to install our collection of Nuclide packages. In the most recent release of Nuclide (v0.0.26), there are 17 such packages to install: https://github.com/facebooknuclideapm/nuclide-installer/blob/v0.0.26/lib/config.json.

The way nuclide-installer works is that it starts up and then runs apm install for each package in that config.json file. On my machine (where ~/.npm was already populated with all of the dependencies used by these packages), it took nuclide-installer 6 minutes to install all 17 Atom packages and the resulting ~/.atom/packages directory was 903M! That's enormous! This is bad for several reasons:

  • 903M is an order of magnitude more disk space than is necessary if dependencies could be de-duped.
  • The installation process is much slower than it has to be.
  • Much more JavaScript code has to be loaded at runtime.
  • I suspect that the JavaScript in ~/.atom/packages is not loaded as efficiently as the JavaScript in /Applications/Atom.app/Contents/Resources/app.asar, putting third-party code at even more of a disadvantage.

I suspect that the only thing I can do right now without any changes to Atom is have ~/.atom/nuclide-installer create a special directory structure under itself. Maybe something like:

~/.atom/nuclide-installer/node_modules/
~/.atom/nuclide-installer/node_modules/atom_packages/

where npm packages common to multiple Nuclide Atom packages would be written to ~/.atom/nuclide-installer/node_modules/ (obviously there could only be one version per common dependency) and then each Atom package would be in its own directory under ~/.atom/nuclide-installer/node_modules/atom_packages/. Under that scheme, only an npm dependency that already had an entry under ~/.atom/nuclide-installer/node_modules/ would also have to exist under ~/.atom/nuclide-installer/node_modules/atom_packages/<ATOM PACKAGE>/node_modules if a different version were required.

Once everything was in place, nuclide-installer would run apm link for each entry in ~/.atom/nuclide-installer/node_modules/atom_packages.

While I'm rubber ducking, I would probably add another directory of indirection so that the directory structure would be parameterized by version number:

~/.atom/nuclide-installer/v0.0.26/node_modules/
~/.atom/nuclide-installer/v0.0.26/node_modules/atom_packages/

This way, nuclide-installer could be writing out v0.0.27 and would not switch over until all of the files were in place.

This is really important for Nuclide, so I'm willing to hack this into nuclide-installer, but it would be nice if there were a way for Atom to support this natively so that others could also provide package bundles efficiently for Atom.

@paulcbetts
Contributor

I suspect that the JavaScript in ~/.atom/packages is not loaded as efficiently as the JavaScript in /Applications/Atom.app/Contents/Resources/app.asar, putting third-party code at even more of a disadvantage.

This is definitely true, especially on Windows, calling CreateFile on like 82023402343x JS files is pretty bad for perf

@nathansobo
Contributor

We could de-dupe modules common to all packages by storing them in ~/.atom/packages/node_modules. Of course, this will only work when all packages share a common version. I don't see a great way of automatically grouping packages into directories as you're doing for the special case of Nuclide, however.

This really makes me wonder about doing something that isn't based on the file system in terms of reusing required modules across the entire application. The problem with such a change is it breaks the simple semantics of Node's require system, though we've already gone a step toward that with asar bundles.

@bolinfest
Contributor

Hmm, if we had:

~/.atom/packages/hyperclick

as a symlink to (that was created via apm link):

~/.atom/packages/nuclide-installer/v0.0.27/node_modules/atom_packages/hyperclick

would it look in ~/.atom/packages/nuclide-installer/v0.0.27/node_modules for dependencies? Maybe we would have to use ~/.atom/packages/node_modules or ~/.atom/node_modules after all.

The thing I dislike about that is that while in the process of going from v0.0.26 to v0.0.27, the versions of our nuclide- packages in ~/.atom/node_modules would not match what was needed by the Atom packages during that transition. If that process fails, then things are in a bad state.

@bolinfest
Contributor

@nathansobo Is there any way that third-party packages can leverage asar bundles to get some sort of benefit?

@nathansobo
Contributor

@nathansobo Is there any way that third-party packages can leverage asar bundles to get some sort of benefit?

I haven't investigated but it sounds promising. @kevinsawicki would have more information about what it would take to create an asar bundle for multiple or individual packages.

@methodbox

Why don't you check for an existing npm package that matches during your install? If the version check passes, use the existing module, if not run apm install.

I've also run into similar issues where I need to use different versions of some packages and the same on others. I created a symlink between folders so the program didn't know it was checking another folder.

@bolinfest
Contributor

@methodbox The problem is that if you are using another package's npm dependency, and that package upgrades itself, it could break you, no?

@UnsungHero97

any updates?

@dhaupin
dhaupin commented Dec 30, 2015

Not trying to beat a known horse (src install should work, right), but I'm running into install time far longer than 6 mins due to capped I/O when trying nuclide-installer. Although it hasn't fully installed yet, its happening on Windows 10 via Atom GUI. I started the install at aprx ~10:20am, and now it's a little over halfway done at ~11:40am. Considering its mentioned that there are only 17 packages within that meta plugin, this seems wonked.

"Evented I/O for V8 JavaScript (32 bit)" seems to be the culprit using 100% disk. Im seeing it cap out above 6 MB/s with a sustained CPU between 30-80%. Considering this is a spinner drive (100+gb free), its very much affecting the performance of the whole PC.

Albeit it non SSD, this is a 64bit 8gb 3k machine, it's not like it's too underpowered to run an editor. I concur it's all the files in nuclide-installer meta, although perhaps Evented I/O running 32bit instead of 64bit could compound that effect.

evented

EDIT: after installing, Atom crashes out totally spikes I/O way higher peaking 15 MB/s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment