babel-register cache grows infinitely and breaks v8 #5667
Hey @jamietre! We really appreciate you taking the time to report an issue. If you need any help, or just have general Babel or JavaScript questions, we have a vibrant Slack community.
@jamietre There are definitely a number of issues with the caching as it currently exists. I'm not sure if this is why you're running into this issue, but using Babel 6, all projects, run in all environments (NODE_ENV=mocha/development/production), will share a single file. Splitting up the files by environment happened in #5411, and using a location specific to each project was added in #5669. These should "solve" the issue in practice (for example, at my job we have a ton of modules, some of them very large, and these fixes solved the immediate issue without having to delete .babel.json every so often). These are of course just stopgaps and won't completely address the issue; you'll still be able to recreate it if you really try.

I get the impression there probably won't be much work on improving the cache in any major way until a decision is made about how to unify it with the babel-loader caching, and I think there is some desire to standardize around a caching strategy that can be used by other open source libs like ava. Here's some background: #5372.

In the short term, here's what we've done at work to stop the bleeding... turns out deleting your .babel.json file every couple of days wasn't a satisfactory suggestion for most folks ;)
```js
const findCacheDir = require('find-cache-dir');

const env = process.env.BABEL_ENV || process.env.NODE_ENV || 'development';
process.env.BABEL_CACHE_PATH =
  process.env.BABEL_CACHE_PATH ||
  findCacheDir({ name: 'babel-register', thunk: true })(`${env}.json`);

require('babel-register')({
  // ...your babel-register options
});
```

Then just require this file instead of babel-register. Good luck!
Seems like the try/catch around the JSON parse that's meant to avoid such issues doesn't work 🤔
In my situation, changing caching to per-environment would not likely make much difference given how few environments we use. Changing it per-project will probably help, though the reality is that most of the weight comes from a single really big project with a large number of tests. So this might slow the bleeding a bit, but even just a couple of days after I deleted my cache, it's back up to 80 megabytes. It only took 200 megabytes to break v8.

The goal of a more holistic approach to caching makes sense, but since this could clearly take a while to resolve, are you open to considering other options in the meantime? What about storing metadata only in the memory cache and reading/writing the cache entries as individual files (perhaps arbitrarily grouped into folders to make fs indexing more efficient for large numbers of files) in the cache directory? This would not take much effort to implement and would eliminate the large memory demands of the cache. A quick Google shows a number of caching packages in the ecosystem with some street cred that could be leveraged...
@xtuc Given that the cache was created by a process that had enough memory, I'd be sort of surprised if a cache was ever large enough to cause issues on its own. My guess is the problems occur because loading the cache has to happen before the majority of the process runs, so it's possible there just isn't much memory left for the actual execution.

@jamietre I've tried implementing the separate-files approach here: #5211, but I couldn't come up with any convincing benchmarks. I ended up scrapping it to do the more obvious improvements, like separating sections of the cache that would never be used at the same time. I don't think it would actually help your total memory usage at this point, once you've separated projects/envs. It might help with the long JSON load time, though. I also had/have a couple of PRs that would pave the way for plugging in custom/external caches: #5402 / #5439, though I think I've convinced myself it's worth standardizing around something before exposing any cache-related APIs. I don't really know what goes into a decision like that, but maybe helping on that front would be the way to go.
The failure happens with native V8.

Edit: actually, I remember that when it came to a head, I discovered while trying to trace the problem that I couldn't even run an empty mocha test, or run mocha with a glob that matched nothing. So GC/mocha has nothing to do with it. I didn't see the results of your benchmarking in that thread, but I am interested; however, I saw @xtuc's comment in #5211.

The answer to this seems pretty clear. I'd be extremely surprised if you couldn't prove this out with benchmarks. After I blew away my ~200 MB cache and ran once to rebuild it, my performance improved by about tenfold. When this first started happening I realized I could use
...actually, a good implementation of this would not even require loading metadata at all: you could just keep the cached data in a path parallel to the input file. The first time you need something, if you don't have it in memory yet, just try to read it from the calculated file path ;) This should in theory completely eliminate startup overhead, and dramatically reduce the shutdown overhead of writing changed entries.
@jamietre I didn't mean to give the impression I had done a lot of work to profile it. I wouldn't be surprised if you could show significant improvements. I just sort of ran out of steam, and couldn't figure out a general benchmark setup that wasn't super specific to my work environment. I certainly agree with the approach though. Good luck!
Oh, sorry, I misunderstood; I thought you meant you had done some profiling and didn't find much benefit! So what do you think: is this the kind of thing that would be accepted? I would love to write the code (if some existing community package doesn't already do something similar), if there's some agreement that it's a worthwhile effort and would be merged.
Thanks for the investigation on this, @jamietre! We also had this issue come up on our build servers. Our solution was to change our automated build to use the
FYI in babel 7, the cache should go in
This is causing me endless frustration with the multitude of tools still using Babel 6. It takes 60s for
@hzoo My job has given me the rest of the week to upgrade to Babel 7 and try to improve the caching for our use cases. Outside of just creating benchmarks and actually implementing a faster cache, are there things we need to keep in mind? I know there was some talk of using the same caching logic as another project (ava or jest or other?). Is there still a desire to share that logic, or is it acceptable to have a Babel-specific implementation? Currently these are the things I'm going to try:
I'd love some feedback on these ideas. We're hoping to get something done and submitted for review by the end of the week. Thanks!
@pwmckenna I've been running into a similar situation where I'd like to have a somewhat "portable" cache for babel, in order to be able to reuse a pre-built cache in multiple environments. It might be an edge case, but I'm wondering if you have found a solution to deal with the absolute paths in the cache file? Also, I couldn't find your PR for the custom cache. Can somebody point me towards any of that kind of work, or any Babel feature/setting in place since then that I might have missed, that would allow the cache to use relative paths in order to become portable? I'd also like to say that I'd happily chime in on helping with this feature if there is need. Thanks
I would hope it's immediately obvious to anyone that one big fat JSON file is not a viable long-term strategy for fast startup.
Can we safely delete this file?
Since the state of the Code

WARNING: This is not a proper fork. I know. Bad. Please see notes below.

How to test it?

The quick-and-dirty approach is to:

Of course this is just a hacky, temporary solution. If there is interest, I can put together a patch and even a small patch script.

Some notes

Big warning

I am in a hurry, so I just copy+pasted a mix of original (

However, if someone helps with setting up a FORK and adds tests, and the team agrees with the approach, I am sure we can get the PR out within an hour. Any feedback is welcome.
I tried the above, @Domiii:

yarn run v1.22.15

Why you should do it regularly:
As you can see from that verbose message, it found that the file was cached, but the original was modified, which requires parsing it again. Caches only provide performance benefits if input files are not changed between two consecutive executions.
@Domiii thanks for your reply.

How can I fix this issue?
My hacky fix for this is to change the JSON.stringify call to drop the indentation argument:

```diff
- serialised = JSON.stringify(data, null, " ");
+ serialised = JSON.stringify(data);
```

When I deleted
I do not know if it's the reduction in size specifically, or if node/v8 takes a different code path that is more or less likely to fail depending on the added arguments. If this change reduces the chance of issues, I believe it's a good short-term fix, because the cache does not need to be stored in a human-readable format. I've opened PR #14300 to address this.
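For a rough sense of what dropping the indent argument buys, here is a small synthetic comparison; the entry shape below is made up for illustration and is not the real .babel.json layout.

```javascript
// Illustration of the #14300 change: serializing without an indent argument
// produces a strictly smaller string for the same data.
const entries = {};
for (let i = 0; i < 1000; i++) {
  entries[`file-${i}.js`] = { code: "var x = 1;", map: null, mtime: 1650000000000 };
}

const pretty = JSON.stringify(entries, null, " "); // old behavior (indented)
const compact = JSON.stringify(entries);           // new behavior

console.log(pretty.length, compact.length); // compact is noticeably smaller
```

The parsed data is identical either way; only the serialized byte count (and the transient string held in memory during the write) changes.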
You can use gzip to compress it, too.
The issue is memory usage. The cache object, which grows indefinitely, is turned into JSON in memory to be persisted to disk. Once past a certain size, it exhausts the JS heap, causing a fatal exception. gzip would likely add to the memory usage, since the problematic JSON would still be entirely present in memory.
When serializing 255 MB of text with the
A 255 MB cache is big enough, so it's a good short-term solution, unless we are going to rewrite the caching system soon.
```js
// Run with: node --expose-gc bench.js (otherwise global.gc is undefined
// and the gc calls are skipped)
const v8 = require('v8');

console.time("init");
var a = '"'.repeat(Math.pow(2, 28) - 16 - 1);
var b = 'a'.repeat(Math.pow(2, 28) - 16 - 1);
v8.deserialize(v8.serialize(a)); // Force the strings to be initialized.
v8.deserialize(v8.serialize(b));
console.timeEnd("init");
global.gc?.();

console.log("used_heap_size", v8.getHeapStatistics().used_heap_size / 1024 / 1024);

console.time("v8");
v8.deserialize(v8.serialize(a));
console.timeEnd("v8");
global.gc?.();

console.time("JSON");
JSON.parse(JSON.stringify(a));
console.timeEnd("JSON");
global.gc?.();

console.time("v8");
v8.deserialize(v8.serialize(b));
console.timeEnd("v8");
global.gc?.();

console.time("JSON");
JSON.parse(JSON.stringify(b));
console.timeEnd("JSON");
```

It looks like the performance boost is amazing! :)
Just to circle back to this: I can report that my rewrite works very well. I have been using it in Dbux without any problems thus far.

Features

Problems that need fixing
I tried to run your sample code with the value

```js
const v8 = require('v8');

console.time("init");
var l = Math.pow(2, 22) - 16 - 1;
var a = Array.from({ length: l }, () => ({ x: 0 }));
v8.deserialize(v8.serialize(a)); // Force the object to be initialized.
console.timeEnd("init");
global.gc?.();

console.log("used_heap_size", v8.getHeapStatistics().used_heap_size / 1024 / 1024);

console.time("v8");
v8.deserialize(v8.serialize(a));
console.timeEnd("v8");
global.gc?.();

console.time("JSON");
JSON.parse(JSON.stringify(a));
console.timeEnd("JSON");
```
Nice test! Obviously v8 has high performance for large strings and low performance for objects.

```js
const v8 = require('v8');

console.time("init");
var a; // = '"'.repeat(Math.pow(2, 28) - 16 - 1);
var b; // = 'a'.repeat(Math.pow(2, 28) - 16 - 1);
a = Array.from({ length: 1024 * 1024 }, () => ({ x: 'a'.repeat(128) }));
b = Array.from({ length: 1024 * 128 }, () => ({ x: 'a'.repeat(1024) }));
a = Object.assign({}, a);
b = Object.assign({}, b);
v8.deserialize(v8.serialize(a)); // Force the objects to be initialized.
v8.deserialize(v8.serialize(b));
console.timeEnd("init");
```
The benchmarking should be done on an actual 100 MB or larger cache file; any synthetic dataset depends on assumptions. I can't provide a sample cache file because it's for a closed-source program.
Can the new solution please store results file-by-file, rather than everything in one big blob? E.g. using some simple magic like this (it works):

```js
function makeCacheFilename(opts) {
  const srcFilename = opts.filename;
  if (!isSubPathOf(srcFilename, cacheRoot)) {
    // eslint-disable-next-line max-len
    console.warn(`[@babel/register] Could not cache results for file "${srcFilename}" because it is outside of sourceRoot ${cacheRoot}. Please set accurate "sourceRoot" in your babel config manually.`);
    return null;
  }
  const relativePath = path.relative(cacheRoot, srcFilename);
  // (can also store in *.js format, to make it more readable)
  return path.resolve(cacheDir, getEnvName(), relativePath) + '.babel.json';
}
```

(PS: it would probably also be a good idea to not cache very small inputs at all? But, then again, the speed tradeoff (cache vs. transform again) depends on the complexity of the plugins involved.)
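The PS above could look something like this; `MIN_CACHE_BYTES` is an arbitrary illustrative threshold (not measured), and the transform/cache parameters are hypothetical stand-ins, not babel-register APIs.

```javascript
// Sketch: skip the cache for tiny sources, where reading a cache file may
// cost as much as just transforming again.
const MIN_CACHE_BYTES = 1024; // illustrative cutoff, not a tuned value

function transformMaybeCached(source, transform, cache, key) {
  if (source.length < MIN_CACHE_BYTES) return transform(source); // not worth caching
  if (!cache.has(key)) cache.set(key, transform(source));
  return cache.get(key);
}
```

As the PS notes, whether the cutoff pays off depends on plugin complexity: with expensive plugins, even tiny files may be worth caching.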
@jlennox On average, having a cache file as large as 100 MB is unrealistic if you cache files individually. Sure, some libraries might have several MB in a single file, but that's the exception, especially since most of the files in

My points:
Choose one: is this a bug report or feature request? A bug.

Expected Behavior

By default, babel-register creates a cache in the user's home directory, .babel.json. This cache appears to be unmanaged, based on looking at babel-register/lib/cache.js. The cache should manage itself to avoid growing to an extremely large size.

Current Behavior

I started experiencing v8 crashes when running mocha tests using --compilers js:babel-core/register as below:

I traced these to babel-register/lib/cache.js code calling JSON.stringify on the cache object. My .babel.json was over 200 megabytes. Deleting it immediately resolved the problem.

Possible Solution

Context

Prevents inline transpilation from working properly, and performance suffers significantly as the cache size grows and each operation requires reading/writing a huge file.

Because it's very difficult to trace the source of v8 crashes, this is a rather insidious bug. There is at least one other bug report in a random package that is almost certainly this issue: caolan/async#1311

This would primarily become an issue for people running large test suites using babel-register in a single environment that is never purged (e.g. a dev workstation). Even though it may not manifest very often, there are certainly performance and stability implications for a large number of users from never pruning the cache.

Your Environment

Windows 10
Node 6.10.2
npm 4.2.0
Babel 6.18.2