incremental builds with rdmd #170
Conversation
This won't work in the general case. DMD still emits template instances to the first module specified on the command line if it can't figure out where they should really go. Right now, incremental compilation is only guaranteed to work if you pass the exact same list of modules every time.
Is this still true? Is there a self-contained test case for this?

Bah. Do you have an example of when dmd can't figure it out?
```d
        string workDir, string objDir, in string[string] myDeps,
        string[] compilerFlags, bool addStubMain)

private int link(in string fullExe, in string[] objects,
    in string[] compilerFlags, in string workDir)
{
    version (Windows)
        fullExe = fullExe.defaultExtension(".exe");
```
This won't compile on Windows any more, because `fullExe` is an `in` parameter.
fixed
Force-pushed from e9a7a5b to 9341f8c.
@klickverbot "Whatever the matters are, we will fix them" [1]. [1] http://forum.dlang.org/post/mkv452$1tdo$4@digitalmars.com
What about using a hash based on the files instead of using a timestamp?
This hadn't crossed my mind. Spontaneous thoughts about it:
Added some tests. I'm not super happy with …
The file is only recompiled if the content has changed, even if the timestamp has been updated.
Summary of the process:

1. Build a dependency graph (getDependencies).
2. For every source file, figure out when the last change happened to the file itself or to any of its dependencies (getTimes).
3. Split the source files into two groups:
   * files that need to be (re-)compiled go into toCompile;
   * files for which an older object file can be reused go into toReuse.
4. Compile all files in toCompile in one compiler call (compile).
5. Link all new and old objects together in another compiler call (link).
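A minimal sketch of steps 2 and 3 in D, assuming a flat file-to-dependencies map. The names `newestTime` and `partition` are mine, not rdmd's, and a real implementation would have to walk the dependency graph transitively rather than only looking at direct dependencies:

```d
// Hedged sketch only -- not rdmd's actual code.
import std.algorithm : max;
import std.datetime : SysTime;
import std.file : exists, timeLastModified;
import std.path : baseName, buildPath, setExtension;

// Newest modification time of a source file and its direct dependencies.
// (A full implementation would consider transitive dependencies too.)
SysTime newestTime(string source, const string[][string] deps)
{
    auto t = source.timeLastModified;
    foreach (dep; deps.get(source, null))
        t = max(t, dep.timeLastModified);
    return t;
}

// Split sources into files to (re-)compile and object files to reuse.
void partition(const string[][string] deps, string objDir,
               out string[] toCompile, out string[] toReuse)
{
    foreach (source, _; deps)
    {
        immutable obj = buildPath(objDir, source.baseName.setExtension(".o"));
        if (!obj.exists || obj.timeLastModified < newestTime(source, deps))
            toCompile ~= source;  // object missing or stale: recompile
        else
            toReuse ~= obj;       // object still up to date: reuse it
    }
}
```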
Also move the temporary directory for the tests to /tmp/rdmd_test. We generate quite a few files and we don't want to overwrite other people's stuff.
I suggest you actually try to find a collision even in (technically "broken") MD5. It doesn't happen except with considerable effort, compute power, and time spent on finding a collision. Note that the git we are using is built on the assumption that a SHA-1 hash won't have a collision within one repo. In fact, most often people use the first 8 characters of a hash to check out some commit.
I believe computing a checksum of the sources should be fairly cheap; sources rarely exceed a few MB in total. Phobos is about 9 MB with all the fluff, and the generated tables in std/internal tick in at 1.6 MB. Speed depends on the CPU but is in the range of 200+ MB/s on circa-2011 hardware.
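For illustration, a hedged sketch of what content hashing could look like in D with std.digest.md5. The names `contentChanged` and `hashCache` are invented for this sketch and do not exist in rdmd:

```d
// Hedged sketch only -- not rdmd's actual code.
import std.digest.md5 : md5Of;
import std.digest : toHexString;
import std.file : read;

// A file only counts as changed when its content hash differs from the
// hash recorded for the previous build, even if the timestamp moved.
bool contentChanged(string source, ref string[string] hashCache)
{
    auto hash = md5Of(cast(const(ubyte)[]) read(source)).toHexString.idup;
    const prev = source in hashCache;
    if (prev !is null && *prev == hash)
        return false;        // same content: no recompile needed
    hashCache[source] = hash;
    return true;
}
```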
Force-pushed from 0acc3c9 to 61f248b.
Heads up: I think I've fucked things up with getTimes. I'm going to look at that again and will likely change things.
The biggest bottleneck is going to be reading all source code from disk on every build. Very few build systems make the decision to use checksums because it is usually simply unnecessary. It only makes sense in situations with non-trivial caching systems (where a …).
Update: I now think everything's fine. I may have been staring at this for too long. Getting the newest timestamp of a file and its dependencies, and then comparing that to the object file - that's sound, isn't it?
```d
// Validate extensions of extra files (--extra-file)
foreach (immutable f; extraFiles)
{
    if (![".d", ".di", objExt].canFind(f.extension))
```
```d
if (!only(".d", ".di", objExt).canFind(f.extension))
```
```d
    immutable rootDir = dirName(rootModule);
    return deps;
}
```
At 220 lines, this seems to be quite a long function. I think it should be broken into smaller ones.
This is a bit heavier than what I'd hoped for what it does, but I found no obvious bilgewater. It does raise an eyebrow that a 1 KLOC program needs a …

Regarding the discussion of using hashes instead of timestamps: that's a fine idea (git and others do it), but it's outside the charter of this PR and may be implemented independently of it. Please focus reviews on the design and implementation of this PR.

@aG0aep6G: did you run speed measurements?
Followed Andrei's suggestions or replied when I didn't. |
Not extensively. I did time a project of mine (about 8 kloc). A toy example where an imported file takes long to compile works nicely: 3 s for a full build, 0.2 s when only main was touched.
Here is an example of what David referred to:
Module imp.d is meant as an import from a library like std.algorithm, so it is not compiled with the other files. Making an example where it is compiled as well is a bit harder, but not really. Then remove the commented line in main.d and recompile with …

In the first run, Struc!int is built into main.obj, so it's gone after being removed from main.d. I think an easier approach to incremental compilation would be for dmd to support replacing existing symbols in a library. It accumulates all template instances ever used, though, so it might grow indefinitely…
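The example files themselves did not survive the copy. Below is a hedged reconstruction of the kind of setup Rainer seems to describe; the two files are shown together in one block, and their contents (including the commented-out line) are guesses, not the original code:

```d
// Hedged reconstruction, not the original example.

// --- imp.d: stands in for a library module like std.algorithm;
// in the scenario above it is imported, not compiled with the others.
module imp;
struct Struc(T) { T value; }

// --- main.d: the only module on the command line, so dmd emits the
// Struc!int instance into main.obj.
module main;
import imp;
//void use() { Struc!int s; }  // Removing this line and recompiling drops
                               // Struc!int from main.obj, while a stale
                               // object file from an earlier build may
                               // still reference it and fail to link.
void main() {}
```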
That's true. |
No. DMD, when presented with multiple modules on a command line, builds exactly ONE combined object file, not one per module. Templates that the compiler does not see instantiated by one of the imported modules are instantiated and inserted into that object file. Oh, I see what you're doing, you're using …
So... we're kind of stuck here. @WalterBright what's the way out?
The problem is that it is generated into only one of the object files and might disappear there with source code changes, while another object file still depends on it. `-allinst` won't help.
Would there be a problem if the compiler emitted the symbols into all object files? Doesn't LDC do that?
@jacob-carlborg: LDC (mostly) uses the same symbol emission strategy as DMD.
Basically what happens with separate compilation, no? Sticking with Rainer's example, this works:
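The command sequence itself was lost in the copy; what follows is a hedged sketch of per-module separate compilation with dmd, using the hypothetical imp.d/main.d file names from Rainer's example rather than the original commands:

```sh
# Compile each module on its own. With one module per invocation, every
# template instance a module needs is emitted into that module's own
# object file, so no instance can silently vanish from a shared object.
dmd -c imp.d -ofimp.o
dmd -c main.d -ofmain.o

# Link the objects together.
dmd imp.o main.o -ofapp
```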
Yes. At the risk of sounding like a broken record, incremental compilation right now is only guaranteed to work if any given module is only compiled as part of one list of source files, which cannot be changed as long as you want to reuse other object files. Only ever compiling a single module at once trivially satisfies this criterion.
Compiling multiple object files in one go is fundamentally broken, and we're unlikely to fix that because we decided to emit as few template instances as possible rather than to always emit all of them into every object file.
Inspired by Andrei's call for "scaling up rdmd" (forum post, issue 14654).
But this doesn't implement Andrei's idea. Instead it does what Jacob Carlborg and I thought might be a better approach: compile all outdated files, and only those, with one compiler invocation producing multiple object files. Seems to work fine.
See commit message for a summary of the process.
I'm planning to add some tests to rdmd_test tomorrow. But code-wise I'm happy with it now; ready to get destroyed.