
cache bodies to avoid reparsing #186

Open · wants to merge 5 commits into base: master
Conversation

3 participants
wonnage commented Mar 6, 2019

Attempt to address #185 by caching JSON.parse results

ljharb (Collaborator) left a comment:

What's the actual performance difference here?

Parsing JSON is very fast, and if any changes are going to be made solely because of performance, they'll need extensive benchmarks.

@@ -25,6 +25,8 @@ var defaultIsDir = function isDirectory(dir) {
return stat.isDirectory();
};

var pkgCache = {};

ljharb (Collaborator) commented Mar 6, 2019:

This introduces state into what was previously a pure module. I'm not sure this is a good idea.

wonnage (Author) commented Mar 6, 2019:

Hmm, I could pull this logic into a separate module; we basically just need a memoized version of JSON.parse.
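A minimal sketch of what such a module might look like (the module name and shape are hypothetical, not part of this PR):

```javascript
// memoized-parse.js — hypothetical standalone module: a memoized
// JSON.parse keyed on the raw file contents. An illustration of the
// idea discussed above, not the PR's actual code.
var cache = new Map();

function parseCached(body) {
    // Trade-off: all callers share one parsed object, so a caller
    // that mutates the result would poison the cache for everyone.
    if (!cache.has(body)) {
        cache.set(body, JSON.parse(body));
    }
    return cache.get(body);
}

module.exports = parseCached;
```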

ljharb (Collaborator) commented Mar 6, 2019:

Another alternative is to use require instead of readFile, which will leverage the require cache (although it loses the ability for a package.json file to be modified during the process).

Victor Lin added some commits Mar 6, 2019

wonnage (Author) commented Mar 6, 2019:

@ljharb I added some instrumentation to the PR and ran the following script in the airbnb repo:

https://gist.github.com/wonnage/358fdac4d49fda83b2d9796f0212acd8

This resolves a path 10k times, which seems to be approximately on the order of the number of modules in our webpack build.

I also made a small change to hash the contents instead of using the raw file contents as the cache key. The cached version is 6x faster. Parse cost probably scales with package.json size (we have a 48 KB package.json file).

⇒  time node benchmark.js
5200.934117999997
node benchmark.js  6.37s user 0.84s system 100% cpu 7.189 total
⇒  time node benchmark.js
5234.587008999995
node benchmark.js  6.42s user 0.87s system 99% cpu 7.290 total
⇒  time node benchmark.js
5195.409848000002
node benchmark.js  6.35s user 0.86s system 99% cpu 7.211 total
⇒  time CACHE=1 node benchmark.js
842.385424000001
CACHE=1 node benchmark.js  2.10s user 0.82s system 101% cpu 2.885 total
⇒  time CACHE=1 node benchmark.js
819.7484239999997
CACHE=1 node benchmark.js  2.02s user 0.79s system 101% cpu 2.757 total
⇒  time CACHE=1 node benchmark.js
835.9143549999982
CACHE=1 node benchmark.js  2.08s user 0.82s system 101% cpu 2.868 total
⇒
lib/sync.js (Outdated)
@@ -101,6 +105,20 @@ module.exports = function (x, options) {
}
}

function parsePkg1(body) {
var key = crypto.createHash('md5').update(body).digest('hex');

lencioni commented Mar 6, 2019:
I think md4 would be sufficient here, and also be faster.

Victor Lin added a commit

wonnage (Author) commented Mar 6, 2019:

I ran some more tests using the Benchmark npm module:

https://gist.github.com/wonnage/e33c9cb1fe9587236eb1ec11afd43772

uncached x 1,142 ops/sec ±5.57% (80 runs sampled)
cached x 4,022 ops/sec ±3.03% (83 runs sampled)
Fastest is cached
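The gist uses the Benchmark npm module for statistically sound sampling; a rough stdlib-only approximation of the same comparison looks like this (absolute numbers will differ from the run above):

```javascript
// Crude timing comparison: raw JSON.parse vs. a string-keyed cache.
// A sketch of the idea, not the gist's actual benchmark.
var body = JSON.stringify({ name: 'demo', version: '1.0.0', dependencies: { benchmark: '^2.1.4' } });
var cache = {};

function timeMs(fn) {
    var start = process.hrtime.bigint();
    for (var i = 0; i < 100000; i++) { fn(); }
    return Number(process.hrtime.bigint() - start) / 1e6;
}

var uncached = timeMs(function () { JSON.parse(body); });
var cached = timeMs(function () {
    if (!cache[body]) { cache[body] = JSON.parse(body); }
    return cache[body];
});

console.log('uncached: ' + uncached.toFixed(1) + ' ms');
console.log('cached:   ' + cached.toFixed(1) + ' ms');
```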
Victor Lin added a commit

ljharb (Collaborator) commented Mar 7, 2019:

OK, so in that example it's 4x faster; whether it's 4x or 6x, what savings does this amount to in real time?
