New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Hash dependent build #2044
Conversation
the method creates a list of all files the project depends on
Method BuildGenerator.isUpToDate replaced by interface BuildCache and its descendants. BuildCache childs can be used to change build cache behaviour.
…ttings file value
Thanks for your pull request and interest in making D better, @drug007! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment. |
Thank you, this is awesome work. One quick comment: Now only the following CIs are used:
Previously, we used multiple CI services (as none was sufficient on its own back then). Later we stopped using them, but their respectful config files were left in the repo. I corrected this in this pull request: #2042 Can you edit the commits in your branch (with interactive git rebase) to remove |
My plan is:
Of course I'll get rid of appveyor and travis traces. I had seen your commit just before I created the PR. |
This reverts commit 4ab453b.
Use default_ over absence. Append an underscore to it. That is what the D style guidelines says. I would not recommend sha1 as your "cheap" option. If for some reason you need to go cheaper on the hash (unlikely since time will be the cheapest option possible), you can't get much cheaper than a FNV. sha256 is probably a sensible default. But in saying that, we don't need a cryptographic hash, just a checksum (which is what FNV is good at). private {
enum FNV_Prime_64 = (2 ^^ 40) + (2 ^^ 8) + 0xb3;
enum FNV_Offset_Basis_64 = 0xcbf29ce484222325;
}
/**
* Performs a Fowler–Noll–Vo 1a hash
*
* Params:
* data = The data to hash
* start = Start hash (for incremental hashing)
*
* Returns:
* The hash
*/
ulong fnv_64_1a(ubyte[] data, ulong start = FNV_Offset_Basis_64) @safe nothrow @nogc {
ulong hash = start;
foreach (b; data) {
hash ^= b;
hash *= FNV_Prime_64;
}
return hash;
} |
Then user should use |
I don't use Dub much, but I don't understand a few things about this PR:
I can understand hashing source code if you're implementing a caching system such as ccache. However, dub does not cache previous builds. Such functionality would probably be better implemented at a lower layer as well, so that it can be used by build systems or dependency management strategies other than Dub.
When discussing a similar enhancement to rund, there was a similar concern that hashing would introduce a noticeable performance overhead. Benchmarking showed that to be false. There is also the opportunity to speed up hashing by doing it in parallel; but, as already mentioned, there is no reason to require a cryptographically secure hash in this situation. |
It's a request
There was a discussion what hash algorithm should be used. As for me there is no difference, just after my changes implementing different algorithms is rather tirvial. |
@rikkimax Ok, now I use |
@CyberShadow should we use one of existing caching systems like ccache/distcc or invent another one? What do you think? |
I read the description and looked at the sources linked in the description. None of them explain or demonstrate a use case that would be applicable here. If the goal is to try to move towards reproducible builds, this doesn't really help with that. A functional build system makes sense, but Dub already has strong dependency management, so I don't see how it helps with that either. I think Symmetry needs to clarify or re-evaluate the request.
The first order of business is to have a clearly defined goal, with some user stories and definitions of what would pass as a MVP. But, to answer your question... again, it depends on the goals. Dub is already a highly D-specific build system, and the difficulty of integrating it with other build systems is a known weakness. If we were to consider that as part of the overarching problem that is attempted at being solved, then evaluating integration with existing caching systems would be worthwhile. If not, then a simple D-specific implementation that's easy to set up and has no external dependencies would probably be more welcome. |
Maybe |
The goal is to not do a rebuild when you have already built identical files before. It's a build performance aid. It's not a full solution for all build-performance related woes, but it does provide a useful tool. |
As discussed above, this doesn't do that. Dub can only cache one build (per repository and build options). |
Yes, this doesn't do that, but this can be implemented. And I asked feedback exactly to get know what should be implemented and what not. |
Sorry, I wasn't clear. When I said
I meant that the goal is that when none of the files in a dub package have changed contents, but they have changed timestamp, no rebuild is done. I understand that |
Use case: to prevent rebuilds of dependencies (not in |
Note for others: I have talked to @drug007 separately and any misunderstanding seems resolved. The original understanding that was the basis for the work thus far was correct and it seems my attempts to explain it just caused confusion, sorry. I'll let @drug007 discuss the last bits needed. Tdlr; everything is mostly fine |
Closed because of #2077 |
Preface
This is the full implementation that passes tests but the last commits should be polished to get ready to review. To do this I need the feedback in general.
Description
Now
dub
uses timestamps of all source files to determine if the project should be rebuild. It compares timestamp of every source file to timestamp of the target file of the project. If any of source files has more recent timestamp than the target file dub rightly considered that the target file is old and rebuild the project. Call this modetime dependent build
. The PR adds other mode, so calledhash dependent build
that uses the hash sum of a file instead of its timestamp, i.e. in this mode dub checks if the hash sum of the file has been changed comparing to the previous value on the last successful build, if so then rebuild is triggered. If no previous value exists rebuild is triggered too, of course.How to use the feature
The new mode can be set using command line option, environment var or settings file. Option has four values:
absence
means there is no explicit value set so default one will be used (currently it is time dependent build, the current behavior). Unfortunatelydefault
is keyword in D soabsence
was used.absence
lets you explicitly reset the option to undefined value and can be useful in automated workflow.time
means time dependent buildsha1
andsha256
mean hash dependent build, using sha1 and sha256 algorithm respectively.To set the option using command line interface use
hash
argument:To set the option using the environment variable use
DUB_HASH_KIND
var:To set the option using settings file add this:
to your system wide, user wide or project specific setting file, for example
dub.settings.json
file in the project root. Now the project will be built using sha256 algorithm every time. But you can override it using command line or env var:Option priority
Options set by different ways have different priority (strong to weak):
I implemented both sha1 and sha256 because sha1 is less secure than sha256 but faster so user can select what (s)he wants, some users discussed that in the D forum. Also other algorithms can be added too, especially those implemented in Phobos can be added easily.
Tests are provided but I didn't know that Actions was preferable way to run tests and appveyour and travis were used for this. I'll remake that.
Hash sum of source files are not used in calculating the hash sum of the build, so if you revert your changes dub rebuilds the project. This feature was requested but it means creating artifacts on every changes user made in source files and can take much disk space, so it is not implemented (yet).
Questions
Currently I have the following questions about the feature and its implementation:
hash
orhash kind
term are good, probably something likeBuildCachePolicy
is more appropriated?