Skip to content

file cache (useful for build system interoperability). #94

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

file cache (useful for build system interoperability). #94

wants to merge 2 commits into from

Conversation

insidewhy
Copy link
Contributor

With this change I can add a cache entry for source file content without being forced to cache the deps.

If my build system already has all of the file data in memory (indeed using a streaming build system the data may not even exist on the filesystem) then it would be nice if module-deps could be told about this data so it doesn't have to read it from the filesystem.

You can do this by adding a cache entry, however cache entries must also contain the deps, the build system only has the source content not the deps. With this 10 character change the best of both worlds can be had.

@jmm
Copy link
Collaborator

jmm commented Jul 23, 2015

@ohjames Thanks for the suggestion. I too have been contemplating how to better support that scenario. Isn't this change going to make it so that it still tries to read the file from disk though (sorry if I skimmed it too quickly and misread)? I haven't looked into it too much yet, but the approach I was going to explore was documenting the pipeline better and being able to pass in row objects with a source property. That may just work right now, albeit undocumented.

@insidewhy
Copy link
Contributor Author

Isn't this change going to make it so that it still tries to read the file from disk though

Nah it'll work check https://github.com/substack/module-deps/blob/master/index.js#L184

I was going to explore was documenting the pipeline better and being able to pass in row objects with a source property. That may just work right now, albeit undocumented.

This would require modifying the way browserify interacts with module-deps, which I believe would also break watchify/browserify-incremental etc. The caching solution used at the moment is clearly less than ideal but there are many external libraries that depend on it.

I'd definitely like to see it fixed in an ideal world though! For now though it would be really lovely if you could accept this change and push out a new release, as some people would really like to be able to use fast incremental browserify rebuilds via the sigh buildsystem and this patch would allow such a plugin to be written.

@insidewhy
Copy link
Contributor Author

To be clear I'd add a cache entry with only the source entry, this would prevent the read from the filesystem. However the cache entry would not have a deps field. Usually this would crash module-deps, with this change the deps would be parsed from the source. I would then hook back into browserify to update the cache and add the deps entry when it is available.

@insidewhy
Copy link
Contributor Author

You might also consider passing the cache entry to fromSource which would then update the cache's deps entry for you, preventing the nasty "hook back into browserify" mojo watchify et al do. I can show how this would work in a follow up commit if you want, it would only be about 50 characters worth of code.

@insidewhy
Copy link
Contributor Author

Okay I've added the "cache deps update" commit to this PR, I think it would be really neat. Can always be easily removed.

@insidewhy
Copy link
Contributor Author

With this API change you can have this greatness:

var browserify = require('browserify');
var b = browserify({ cache: {} });
b.cache['./main.js'] = { source: mainSourceCodeThatDependsOnOtherStuff };
b.cache['./other-stuff.js'] = { source: otherStuffSourceCode };
b.add('./main.js');
b.bundle().pipe(process.stdout);
console.log(b.cache['./main.js'].deps); // will now log "{ './other-stuff.js': ... }"
/// ... time passes ... ///
b.cache['./main.js'] = { source: newMainSourceCode };
b.bundle().pipe(process.stdout); // re-uses cached other-stuff.js

Wouldn't that be wonderful for streaming build systems!

@jmm
Copy link
Collaborator

jmm commented Jul 23, 2015

@ohjames Thanks for the additional info. I'll look at this closely when I get the chance (or someone else might beat me to it).

@Globegitter
Copy link

Really would appreciate for this patch to land. Useful for some work I am planning to do.

@zertosh
Copy link
Member

zertosh commented Jul 26, 2015

I like this since it's actually a bug fix, given that the code for reading a cached file already exists. But why the change in fromSource? Also, this PR needs a test and docs making it clear that the cache is for pre-transformed source.

@insidewhy
Copy link
Contributor Author

@zertosh The change in fromSource is to update the cache entry's deps if the file already has a cache entry. Without this it's necessary for callees to hook into the deps pipeline to update the cache entries themselves which kinda seems like unnecessary work and a bit spaghetti-like.

Or maybe I misunderstood something, I've been working this stuff out mainly from reading the watchify source code as the documentation in the code is quite sparse.

@zertosh
Copy link
Member

zertosh commented Jul 26, 2015

@ohjames Got it. It seemed like that was the point. However, it's important for module-deps not to alter the cache since it treated as an immutable input throughout. If you need to invalidate/change it, then having a separate plugin doing what watchify does is perfectly reasonable.

@insidewhy
Copy link
Contributor Author

Watchify hooks into the cache to update the deps like this:

b.pipeline.get('deps').push(through.obj(function(row, enc, next) {
    var file = row.expose ? b._expose[row.id] : row.file;
    b.cache[file] = {
        source: row.source,
        deps: xtend({}, row.deps)
    };
    this.push(row);
    next();
}));

Is that the best way of doing it? I feel suspicious of that code.

@zertosh
Copy link
Member

zertosh commented Jul 26, 2015

@ohjames that's how plugins work. what do you find suspicious?

@insidewhy
Copy link
Contributor Author

The bit:

var file = row.expose ? b._expose[row.id] : row.file;

Isn't the _ prefix in _expose indicating a "private" member? I guess it also feels a bit weird how the cache is stored there in the opts of the browserify/b object without being used by it (other than being passed along to module-deps).

@zertosh
Copy link
Member

zertosh commented Jul 26, 2015

Ah, that line is to deal with exposed modules and the watcher file path. As for the cache being in the opts, that's the convention for passing down options to the various modules that make up browserify. But now we're really digressing from this PR.

Like I said earlier, your PR seems perfectly reasonable sans the fromSource changes, and the need for a test and docs.

@insidewhy
Copy link
Contributor Author

Thanks for the info.

Ah, that line is to deal with exposed modules and the watcher file path.

Okay I don't understand it but I guess I'll use the same fragment of code to update the cache in sigh.

@insidewhy
Copy link
Contributor Author

Okay I removed the cache entry updating. As for the test, it seems all the cache*.js tests are copy-paste jobs with minor changes, so is it okay if I do the same?

@zertosh
Copy link
Member

zertosh commented Jul 26, 2015

@ohjames Borrow as much as you want from the tests. But it's important that the test for this ensures that the transforms are being applied to the cached source.

@insidewhy
Copy link
Contributor Author

@zertosh Added the test. It's a modification of the cache.js test only the deps entry is now missing with the source entry in the cache being changed so that the parsed source will reveal the dependency that was in the cache in the original test.

I'm guessing that since this code only provides a minor extension to the feature already tested in cache.js that this modification should be equally acceptable as the cache.js test, if I've missed something let me know.

@zertosh
Copy link
Member

zertosh commented Jul 26, 2015

@ohjames The existing cache tests don't worry about transformations because the cache is assumed to be the final output of module-deps. Your change does something completely different. Your change is pulling the source from the cache and running it through the transforms. So, your test should make sure that this true.

@insidewhy
Copy link
Contributor Author

Huh? All my change does is fix a crash that occurs when you have a cache entry with a source and no deps, it doesn't provide anything that the cache.js doesn't already test besides this. I'll show you the output if I run this test without the change:

test/cache_no_deps.js ................................. 0/1 310ms
  uses cache entry containing source but no deps
  not ok TypeError: Object.keys called on non-object
    at:
      file: null
      line: .nan
      column: .nan
      native: true
      function: Function.keys
    test: uses cache entry containing source but no deps
    message: 'TypeError: Object.keys called on non-object'
    stack: |
      Function.keys (native)
      index.js:362:66
      onresolve (index.js:177:14)
      node_modules/browser-resolve/index.js:254:13
      onfile (node_modules/resolve/lib/async.js:51:21)
      onex (node_modules/resolve/lib/async.js:93:22)
      node_modules/resolve/lib/async.js:24:18
      FSReqWrap.oncomplete (fs.js:95:15)

If the requirement you've stated for my test is necessary then could you help me understand why it isn't also a requirement of the cache.js test?

These are the differences between the two tests:

11c11
<     foo: 'notreal foo',
---
>     foo: 'require("./bar");',
17,18c17
<     source: sources.foo,
<     deps: { './bar': files.bar }
---
>     source: sources.foo

@insidewhy
Copy link
Contributor Author

You can verify the cached source code has passed the transform stages by the fact that the dependency is parsed from the cached code, I guess it might not be too hard to show the transform stuff on top but then I feel like maybe the test is testing more than it has to, I dunno it's up to you though.

@zertosh
Copy link
Member

zertosh commented Jul 26, 2015

Your change, although small, has huge implications wrt transforms.

When there is a deps, the code jumps straight to fromDeps. At this point, source is the post-transformed source – this is fine, this is how it's always been. When there is no deps, the code continues to readFile. Here disk access is skipped because the source is cached, but then it's passed through self.getTransforms. There it gets the transforms applied.

So, with deps the source is the post-transformed source, without deps the source is the pre-transformed source. Huge difference.

@insidewhy
Copy link
Contributor Author

Thanks for taking the time to explain it, I see, that makes it sound like my change makes the single cache data structure a dual mode thing, changing its meaning depending on whether the second deps field exists or not. That seems like some really overly-complicated code.

I think a better solution is needed than this then I guess.

@zertosh
Copy link
Member

zertosh commented Jul 26, 2015

@ohjames I'm cool with a fileCache option that would be used by readFile to avoid disk access (instead of cache). The keys are the file paths, and the value is the source – no need for an object with source. Does that work for you?

@insidewhy
Copy link
Contributor Author

@zertosh yeah I reckon that sounds like a better solution, I was just adding an inputSource parameter to the cache (so you would use either inputSource or source and deps) but I think what you suggested sounds simpler.

@insidewhy
Copy link
Contributor Author

Okay cool I've added that. The test confirms that dependencies are parsed from the code in the fileCache but not that transforms are applied to it. I can try to work out how to do that though...

@insidewhy insidewhy changed the title Allow caching file content without caching deps (useful for build system interoperability). file cache (useful for build system interoperability). Jul 26, 2015
@zertosh
Copy link
Member

zertosh commented Jul 26, 2015

Cool. There's no need to preserve the old behavior in readFile since it's essentially broken. There is no way to ever reach readFile if a file is in cache. So this should be sufficient:

diff --git a/index.js b/index.js
index 53fe143e..24a3e436 100644
--- a/index.js
+++ b/index.js
@@ -27,6 +27,7 @@ function Deps (opts) {

     this.basedir = opts.basedir || process.cwd();
     this.cache = opts.cache;
+    this.fileCache = opts.fileCache;
     this.pkgCache = opts.packageCache || {};
     this.pkgFileCache = {};
     this.pkgFileCachePending = {};
@@ -181,8 +182,8 @@ Deps.prototype.resolve = function (id, parent, cb) {
 Deps.prototype.readFile = function (file, id, pkg) {
     var self = this;
     var tr = through();
-    if (this.cache && this.cache[file]) {
-        tr.push(this.cache[file].source);
+    if (this.fileCache && this.fileCache[file]) {
+        tr.push(this.fileCache[file]);
         tr.push(null);
         return tr;
     }

As for the test, you can follow what's done in tr_fn.js. Sorry for all of this, I hope I'm not coming off as if I'm trying to give you a hard time.

@insidewhy
Copy link
Contributor Author

You're cool ;) 🐤 Sorry for being so slow and confused, I will blame having the flu.

@zertosh
Copy link
Member

zertosh commented Jul 27, 2015

@ohjames nah - I appreciate your effort :)

@insidewhy
Copy link
Contributor Author

Okay I've added the transform stuff to the tests, hopefully should be done now!

@insidewhy
Copy link
Contributor Author

Oops left in sourceCache fixed now.

@zertosh
Copy link
Member

zertosh commented Jul 27, 2015

Looks good. Just needs a quick entry in the README. Something like "opts.fileCache - an object mapping filenames to raw source to avoid reading from disk"

@insidewhy
Copy link
Contributor Author

Done!

@insidewhy
Copy link
Contributor Author

Not quite sure when I change the name of the file from bar to bat the test fails though:

test/file_cache.js .................................... 0/1 371ms
  uses file cache
  not ok Error: Cannot find module './bat' from '/home/james/projects/module-deps/test/files'

I thought it shouldn't be looking for files in the filesystem due to the fileCache so unsure why that module error is thrown.

@zertosh
Copy link
Member

zertosh commented Jul 27, 2015

I'm guessing that you changed 'require("./bar"); var tongs;', to 'require("./bat"); var tongs;',? If so, then that error makes sense because there is no bat on disk nor in the cache. Or am I misunderstanding you?

Also, I just noticed that the indentation for the return block in the test's transform has an extra space.

@insidewhy
Copy link
Contributor Author

I changed everything to bat including the fileCache entry that currently exists for bar. The test output reflects the fileCache entry for bar as it stands, so it's using the cache... so unsure why it throws when the file doesn't also exist on the disk.

@insidewhy
Copy link
Contributor Author

I think maybe line 173 also needs to be changed to use the fileCache.

@zertosh
Copy link
Member

zertosh commented Jul 27, 2015

Ah, so when parseDeps parses foo.js, it uses node-resolve which needs bat to exist. I don't see a way around this.

@insidewhy
Copy link
Contributor Author

I'm getting the error from browserResolve, I guess the node-resolve would also be a problem if I managed to work around browserResolve. Ummmmmmmmmmmmmm 😴 💤

@insidewhy
Copy link
Contributor Author

This second commit allows bar to be bat...

@insidewhy
Copy link
Contributor Author

It needed one more change and now I can also change foo to fop. So maybe this is good enough now?

@zertosh
Copy link
Member

zertosh commented Jul 27, 2015

This second commit allows bar to be bat...

Nope. That only works for files that are siblings of the requiree.

I don't see a way for the resolver to use the paths from fileCache. It's one thing to avoid reading the file from the disk, it's another thing to change the resolver so it doesn't do stats. Especially since browserify passes it's own resolver into module-deps.

@insidewhy
Copy link
Contributor Author

Ah well, guess browserify is inherently tied to the filesystem for now then.

@insidewhy insidewhy closed this Jul 27, 2015
@zertosh
Copy link
Member

zertosh commented Jul 27, 2015

@ohjames I liked https://github.com/ohjames/module-deps/commit/c86adc9d16001c7e553308fc27ca2ba8991fb752 – I was already coming up with uses for it.

You can pass your own resolve function into browserify/module-deps and do whatever you want in there with fileCache.

@insidewhy
Copy link
Contributor Author

You can pass your own resolve function into browserify/module-deps and do whatever you want in there with fileCache.

So I might be able to get a build system to interact with browserify using in-memory files only? I can remove the second commit if you think there's any use for the first one.

@zertosh
Copy link
Member

zertosh commented Jul 27, 2015

So I might be able to get a build system to interact with browserify using in-memory files only?

Yeah. You might not even need to implement the resolver, or even wrap it. browser-resolve just reads the browser field in the package.json and then delegates to resolve. resolve takes a readFile function in the options which is used to read the package.json (to get the main field for example) – I don't know if this one is useful to you or not. It also takes an isFile, which is used to determine if a file exists. That one might be useful to you. Unfortunately, module-deps doesn't pass those to resolve. If you can make your caching system work by using those options, then I'll add the code below here

if (opts.isFile) parent.isFile = opts.isFile;
if (opts.readFile) parent.readFile = opts.readFile;

I can remove the second commit if you think there's any use for the first one.

Yes please 😄 And fix the test indentation :P

@insidewhy
Copy link
Contributor Author

Cool I reset the second commit, I already fixed the test indentation. For some reason github won't let me "Reopen" though.

@zertosh
Copy link
Member

zertosh commented Jul 27, 2015

For some reason github won't let me "Reopen" though.

weird. I guess just open a new PR and reference this one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants