SnoopCompile bot #615
Move out from the src folder
Co-Authored-By: Ian Butterworth <contact@ianbtw.com>
Force-pushed from 563ec9e to e1b2471
There is an issue with the compatibility bounds of packages on Julia 1.2, which fails with `ERROR: Unsatisfiable requirements detected for package CuArrays [3a865a2d]`: https://github.com/aminya/Zygote.jl/runs/616498612?check_suite_focus=true#step:4:132 I will remove Julia 1.2 from the bot.
So, what are the consequences of merging this? People will get precompiled files when updating Zygote?
Yes, once you merge this PR, the bot will run and will make another PR like this. After you merge that PR, Julia will cache the inference results for the common operations of Zygote. In the future, the same precompilation signatures can be used by PackageCompiler to make an almost static Zygote. Not only for …
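For context, the generated precompile files are plain Julia source along these lines. This is a sketch of the style SnoopCompile emits; the two signatures shown here are made up for illustration, not taken from the actual PR:

```julia
# Sketch of a SnoopCompile-style generated precompile file.
# The signatures below are hypothetical examples.
function _precompile_()
    # Only run the statements while Julia is generating the .ji cache file.
    ccall(:jl_generating_output, Cint, ()) == 1 || return nothing
    precompile(Tuple{typeof(Zygote.gradient), Function, Float64})
    precompile(Tuple{typeof(Zygote.pullback), Function, Vector{Float64}})
end
```

The package then includes this file and calls `_precompile_()` at the end of its module, so the inference results for those call signatures are stored in the package's precompile cache.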
So the real gain is larger than what appears in the benchmark? Is it reasonable to trigger the bot for each PR merged? What about doing it only when tagging a new release, so as to avoid much maintenance burden?
Is it a .ji file Julia can load directly? Are there security concerns we should be aware of when shipping precompiled files not produced on the local machine?
This is as much for my own education as anything else.
If this is successful, I would like to explore doing a similar build for Flux |
Yes. I plan to make a PackageCompiler bot that will use these to make static system images.
It is better to run it every time, because as functions change, new precompilation is needed. To make drastic changes to the API, one can temporarily bypass the precompilation by setting this to false:

To minimize the number of PRs created, for bigger repositories I recommend using:

```yaml
on:
  push:
    branches:
      - 'master'
  # pull_request: # commented out for big repositories, so the bot runs only when people push/merge something to master.
```
Yes, the precompilations are calls to … You can see a sample bot PR here:
IMO, it is better than a local machine, since we cover different OSes and Julia versions; usually, you cannot run your script on that many configurations locally. The bot reruns the tests internally in its benchmark stage and will error if anything goes wrong. You will also check the CI before merging.
I will create a PR for Flux too.
If you want to run the bot only before publishing, you can remove the following altogether:

```yaml
# Edit based on your repository.
on:
  push:
    branches:
      # - 'master'
  # pull_request: # commented out for big repositories
```

and use something like:

```yaml
on:
  watch:
    types: [started]
```

Based on the following post, the workflow will be executed each time you star the repo (or unstar and star it again). You may do this only before registering. When registration is done, you can set …
ok, this is nicer. I wonder if we can achieve something essentially maintenance-free. E.g. reduce the publishing process to the following comment on the last commit:
and then
on the SnoopCompile commit
Comments on pull requests will not trigger Registrator, as it is disabled. Please try commenting on a commit or issue.
If you want a commit-message trigger, you can do the following:

```yaml
on:
  push:
    branches:
      - 'master'
jobs:
  SnoopCompile:
    if: "contains(github.event.head_commit.message, '[precompile]')"
  Skip:
    if: "!contains(github.event.head_commit.message, '[precompile]')"
```

However, you need to merge the PR and start the registration. When registration is done, set the … Also: it is possible to make the PR merge itself; however, I don't think that is safe. Human review is needed in this process.
Error while trying to register: Register Failed

Error while trying to register: Register Failed
One option would be to have a release branch: there are no precompile files on master, but on each tag we build a set of precompile files, commit them on that branch, and register. I'd also like to understand more clearly how this affects (cold/hot) package load time and 'time to first gradient'. Last time I tried this kind of thing it didn't achieve much, so we should clearly understand how it'll help before adding the maintenance burden.
That works too. But I think setting …
As I mentioned previously, the gain is very high when we create a sysimage (I am planning to make a bot for that too), because we can store compiled functions. For the raw improvement in the dynamic case (where Julia only stores inference results), there is already a benchmark implemented. I also created an issue to track other possibilities for benchmarking.
We should definitely not add this for the benefit it brings when creating a sysimg. If you're doing that, it's really easy to just call …
That is not the case. We are not doing this only for the sysimage; using the current timing tools we still get performance improvements. Using a simple …
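The sysimage route discussed here can be sketched with PackageCompiler. This is a hedged sketch, not the bot's actual invocation: the output path and the name of the precompile-statements file are assumptions.

```julia
using PackageCompiler

# Sketch: bake Zygote, together with a file of precompile statements,
# into a custom system image. Paths below are hypothetical.
create_sysimage([:Zygote];
    sysimage_path = "ZygoteSys.so",
    precompile_statements_file = "precompile_Zygote.jl")
```

Starting Julia with `julia --sysimage ZygoteSys.so` would then load Zygote with those signatures already compiled, which is where the large gains mentioned above come from.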
Right, an improvement of 2% for the time it takes to load Zygote and take a gradient? If it was a 2× difference it might be worth considering.
So what? 13× more precompile signatures are clearly not improving the performance by 13×. What matters is that the Zygote compiler itself is cached, not that everything it has ever compiled is cached. And it seems hard to get that benefit, with the current Julia compiler, unless you call …
@MikeInnes Currently, only Zygote is using its precompile signatures (SnoopCompile generates many more for other packages that are called). Once every package starts to use them, this benefit will show more.
If other packages decide this is worthwhile, great, we'll take that improvement. But the marginal impact from this patch will still itself be a very small improvement and we're unlikely to consider it worth the additional hassle.
Running the benchmark gives:

```
┌ Info: Precompile Deactivated Benchmark
│ # for using Zygote:
└ Inference time (ms): 23654.842

@timev result (this has some noise):
391.374697 seconds (628.72 M allocations: 31.522 GiB, 4.16% gc time)
elapsed time (ns): 391374697041
gc time (ns): 16295209095
bytes allocated: 33846029551
pool allocs: 628513544
non-pool GC allocs: 190530
malloc() calls: 11983
realloc() calls: 85
GC pauses: 503
full collections: 9
```

```
┌ Info: Precompile Activated Benchmark
└ Inference time (ms): 22565.431

@timev result (this has some noise):
382.785964 seconds (605.11 M allocations: 30.360 GiB, 3.93% gc time)
elapsed time (ns): 382785964109
gc time (ns): 15042733929
bytes allocated: 32598547445
pool allocs: 604912736
non-pool GC allocs: 184276
malloc() calls: 11737
realloc() calls: 81
GC pauses: 467
full collections: 6
```

https://github.com/aminya/Zygote.jl/runs/646094164?check_suite_focus=true#step:6:123

This is something that I would consider. @MikeInnes
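For reference, a timing of this shape can be reproduced locally with `@timev` from Base. This is only a stand-in sketch: the actual benchmark above times Zygote's full test workloads, while the expression differentiated here is an invented example.

```julia
using Zygote

# Sketch: measure time-to-first-gradient in a fresh session.
# @timev prints the detailed breakdown (allocations, GC pauses, etc.)
# seen in the benchmark output above.
@timev Zygote.gradient(x -> 3x^2 + 2x + 1, 5.0)
```

Because most of the cost is compilation, the measurement is only meaningful on the first call in a fresh Julia process.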
As things stand right now, I think this has to be entirely maintenance-free to be worth doing.
If you want you can have a release branch, and register from that branch. |
Thanks a lot for looking into this. Given the cost/benefit ratio I don't think it makes sense for us at the moment, though. |
It uses the whole test suite of Zygote (except the CUDA and Abstract FFT tests) to generate OS- and version-specific precompilation sentences.

Running the tests shows that we get:
- 10 s raw improvement
- 23 M fewer allocations
- 1.16 GiB less memory allocated

Tests: the CI also passes for the generated PR: https://travis-ci.com/github/aminya/Zygote.jl/builds/162029060

cc: @ianshmean
related: #607

The changes made in the tests are just two simple checks for the existence of the `SnoopCompile_ENV` variable. The GitHub diff isn't working properly here.

Ready for review
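A minimal sketch of what such a guard can look like, assuming `SnoopCompile_ENV` is set by the bot while it snoops (the exact condition and file names in the PR may differ):

```julia
# Sketch: skip CUDA-dependent tests while the SnoopCompile bot is running.
# The environment variable name comes from the PR; the guarded include
# is a hypothetical example.
if get(ENV, "SnoopCompile_ENV", "false") != "true"
    # include("cuda.jl")  # run CUDA tests only outside the bot
end
```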