Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SnoopCompile bot #615

Closed
wants to merge 15 commits into from
Closed

SnoopCompile bot #615

wants to merge 15 commits into from

Conversation

aminya
Copy link

@aminya aminya commented Apr 24, 2020

It uses the whole test suit of Zygote (except Cuda and Abstract FFT tests) for generating OS-Version specific precompilation sentences.

In running the tests shows that we get:

  • 10s raw improvement
  • 23 M less memory usage
  • 1.16GiB less memory allocation.

Tests:

cc: @ianshmean

related: #607

The changes made in the tests are just adding two simple checks for existance of the SnoopCompile_ENV. GitHub diff isn't working properly here.

Ready for review

@aminya aminya mentioned this pull request Apr 24, 2020
@aminya aminya force-pushed the SnoopCompile branch 4 times, most recently from 563ec9e to e1b2471 Compare April 24, 2020 19:45
@aminya
Copy link
Author

aminya commented Apr 24, 2020

There is an issue with the compatibility bounds of packages on Julia 1.2, which blocks add Zygote and its dependencies.

ERROR: Unsatisfiable requirements detected for package CuArrays [3a865a2d]:

https://github.com/aminya/Zygote.jl/runs/616498612?check_suite_focus=true#step:4:132

I will remove Julia 1.2 from the bot.

@CarloLucibello
Copy link
Member

So, what are the consequences of merging this? People will get precompiled files when updating Zygote?
Another question: I see there is a benchmark script, what are the results?

@aminya
Copy link
Author

aminya commented Apr 30, 2020

So, what are the consequences of merging this? People will get precompiled files when updating Zygote?

Yes, once you merge this PR, the bot will run and will make another PR like this. After you merge that PR, Julia will cache the inference result for the common operations of the Zygote.

In the future, the same precompilation signatures can be used by PackageCompiler to make an almost static Zygote. Not only for using Zygote but also for calling its functions.

Another question: I see there is a benchmark script, what are the results?
The current benchmark only measures the inference time:

 Benchmarking the inference time of `using Zygote`
┌ Info: *******************
│ Benchmark Started
└ *******************
┌ Info: Precompile Deactivated Inference Benchmark
└ ------------------------
[ Info: 170.793 ms
┌ Info: Precompile Activated Inference Benchmark
└ ------------------------
[ Info: 159.906 ms
┌ Info: *******************
│ Benchmark Finished
└ *******************
Benchmarking the inference time of `using Zygote` & basic function test
┌ Info: *******************
│ Benchmark Started
└ *******************
┌ Info: Precompile Deactivated Inference Benchmark
└ ------------------------
[ Info: 19043.853 ms
┌ Info: Precompile Activated Inference Benchmark
└ ------------------------
[ Info: 18622.542 ms
┌ Info: *******************
│ Benchmark Finished
└ *******************

@CarloLucibello
Copy link
Member

CarloLucibello commented Apr 30, 2020

So the real gain is larger than what appears in the benchmark?

Is it reasonable to trigger bot for each PR merged? What about doing it only when tagging a new release, so to avoid much maintenance burden?

@DhairyaLGandhi
Copy link
Member

Is it a ji file Julia can load directly?

Are there security concerns we should be aware of of shipping precompiled files not produced on the local machine?

@DhairyaLGandhi
Copy link
Member

This is for my own education as anything else

@DhairyaLGandhi
Copy link
Member

If this is successful, I would like to explore doing a similar build for Flux

@aminya
Copy link
Author

aminya commented Apr 30, 2020

So the real gain is larger than what appears in the benchmark?

Yes. I plan to make a PackageCompiler bot, that will use these to make static system images.

Is it reasonable to trigger bot for each PR merged? What about doing it only when tagging a new release, so to avoid much maintenance burden?

It is better to run it every time because as functions change, a new precompilation is needed. To make drastic changes to the API, one can temporarily bypass the precompilation by setting this to false:
https://github.com/aminya/Zygote.jl/pull/4/files#diff-2270ce88ed8290c3b808f5ea76728946R1

To minimize the number of PRs created, for bigger repositories I recommend to use:

on:
  push:
    branches:
       - 'master'
  # pull_request:  # comment for big repositories

which runs only when people push/merge something to master.

Is it a ji file Julia can load directly?

Yes, the precompilations are calls to Base.precompile. Many packages already use this. For example, Plots.jl.

You can see a sample bot PR here:
https://github.com/aminya/Zygote.jl/pull/4/files

Are there security concerns we should be aware of shipping precompiled files not produced on the local machine?

IMO, it is better than a local machine. Since we cover different OS and Julia versions. Usually, you can run your script on different configurations. The bot reruns the tests internally in its benchmark stage and will error if anything goes wrong. You will also check the CI before merging,

@aminya
Copy link
Author

aminya commented Apr 30, 2020

If this is successful, I would like to explore doing a similar build for Flux

I will create a PR for Flux too.

@aminya
Copy link
Author

aminya commented Apr 30, 2020

If you want to run the bot only before publishing you can remove the following altogether
https://github.com/FluxML/Zygote.jl/pull/615/files#diff-0b4f3d0a2175dd678c65e865e174454aR4

# Edit based on your repository.
on:
  push:
    branches:
      # - 'master'
  # pull_request:  # comment for big repositories

and use something like

on:
  watch
    types: [started]

Based on the following post, the workflow will be executed each time you star, or unstar and star again the repo.
https://dev.to/s_abderemane/manual-trigger-with-github-actions-279e

You may do this only before registering. When registration is done, you can set should_precompile to false again to let people develop without any precompilation happening.
https://github.com/aminya/Zygote.jl/pull/4/files#diff-2270ce88ed8290c3b808f5ea76728946R1

@CarloLucibello
Copy link
Member

ok, this is nicer. I wonder if we can achieve something essentially maintenance-free. E.g. reduce the publishing process to the following comment on the last commit:

@SnoopCompile precompile()  # precompile, automatically merge, set should_precompile=false

and then

@JuliaRegistrator register()

on the SnoopCompile commit

@JuliaRegistrator
Copy link

Comments on pull requests will not trigger Registrator, as it is disabled. Please try commenting on a commit or issue.

@aminya
Copy link
Author

aminya commented Apr 30, 2020

ok, this is nicer. I wonder if we can achieve something essentially maintenance-free. E.g. reduce the publishing process to the following comment on the last commit:

@SnoopCompile precompile()  # precompile, automatically merge, set should_precompile=false

and then

@JuliaRegistrator register()

on the SnoopCompile commit

If you want a commit message trigger, you can do the following

on:
  push:
    branches:
       - 'master'
jobs:
  SnoopCompile:
    if: "contains(github.event.head_commit.message, '[precompile]')"
  Skip:
    if: "!contains(github.event.head_commit.message, '[precompile]')"

However, you need to merge the PR and start the registration. When registration is done, set the should_precompile to false. Automating the last part requires interfacing with JuliaRegsitrator, which I don't think I can implement any time soon.

Also: it is possible to make the PR merge itself, however, I don't think that is safe. Human review is needed in this process.

@JuliaRegistrator
Copy link

Error while trying to register: Register Failed
@aminya, it looks like you are not a publicly listed member/owner in the parent organization (FluxML).
If you are a member/owner, you will need to change your membership to public. See GitHub Help

1 similar comment
@JuliaRegistrator
Copy link

Error while trying to register: Register Failed
@aminya, it looks like you are not a publicly listed member/owner in the parent organization (FluxML).
If you are a member/owner, you will need to change your membership to public. See GitHub Help

@JuliaRegistrator
Copy link

Error while trying to register: Register Failed
@aminya, it looks like you are not a publicly listed member/owner in the parent organization (FluxML).
If you are a member/owner, you will need to change your membership to public. See GitHub Help

@MikeInnes
Copy link
Member

One option would be to have a release branch; there are no precompile files on master, but on each tag we build a set of precompile files, commit on that branch, and register.

I'd also like to more clearly understand how this affects (cold/hot) package load time and 'time to first gradient'. Last time I tried this kind of thing it didn't achieve much, so we should clearly understand how it'll help before adding the maintenance burden.

@aminya
Copy link
Author

aminya commented May 5, 2020

One option would be to have a release branch; there are no precompile files on master, but on each tag we build a set of precompile files, commit on that branch, and register.

That works too. But I think setting should_precompile = false after registration is easier than creating another branch. It is up to you. If you are already planning for such a branch, then we can go for it.

I'd also like to more clearly understand how this affects (cold/hot) package load time and 'time to first gradient'. Last time I tried this kind of thing it didn't achieve much, so we should clearly understand how it'll help before adding the maintenance burden.

As I mentioned previously, the gain is very high when we create a sysimage (I am planning to make a bot for that too). Because we can store compiled functions.

For the raw improvements in a dynamic way (that Julia only stores inference result), there is already a benchmark implemented.

I also created an issue to track other possibilities for benchmarking.
timholy/SnoopCompile.jl#78

@MikeInnes
Copy link
Member

We should definitely not add this for the benefit when creating a sysimg. If you're doing that it's really easy to just call gradient and all this stuff will be compiled.

@aminya
Copy link
Author

aminya commented May 5, 2020

We should definitely not add this for the benefit when creating a sysimg.

That is not the case. We are not doing this only for sysimage. Using current timing tools we still get performance improvements.

If you're doing that it's really easy to just call gradient and all this stuff will be compiled.

Using a simple gradient example, only 90 precompile signatures were generated. But now 1190 signatures are generated.
https://github.com/aminya/Zygote.jl/pull/4/files

@MikeInnes
Copy link
Member

Right, an improvement of 2% for the time it takes to load zygote and take a gradient? If it was a 2x difference it might be worth considering.

Using a simple gradient example, only 90 precompile signatures were generated. But now 1190 signatures are generated.

So what? 13× more precompile signatures are clearly not improving the performance by 13×. It matters that the Zygote compiler itself is cached, not that everything it has ever compiled is cached. And it seems hard to get that benefit, with the current Julia compiler, unless you call gradient at the time that native code is being generated.

@aminya
Copy link
Author

aminya commented May 5, 2020

@MikeInnes Currently, only Zygote is using its precompile signatures (SnoopCompile generates many more for other packages that are called). Once every package starts to use them, this number will show itself more.

@MikeInnes
Copy link
Member

If other packages decide this is worthwhile, great, we'll take that improvement. But the marginal impact from this patch will still itself be a very small improvement and we're unlikely to consider it worth the additional hassle.

@aminya
Copy link
Author

aminya commented May 5, 2020

Running @timev in running the tests shows that we get:

  • 10s raw improvement
  • 23 M less memory usage
  • 1.16GiB less memory allocation.
│ Precompile Deactivated Benchmark
# for using Zygote:
└ Inference time (ms): 	23654.842
@timev result (This has some noise).
391.374697 seconds (628.72 M allocations: 31.522 GiB, 4.16% gc time)
elapsed time (ns): 391374697041
gc time (ns):      16295209095
bytes allocated:   33846029551
pool allocs:       628513544
non-pool GC allocs:190530
malloc() calls:    11983
realloc() calls:   85
GC pauses:         503
full collections:  9
│ Precompile Activated Benchmark
┌ Info: 
└ Inference time (ms): 	22565.431
@timev result (This has some noise).
382.785964 seconds (605.11 M allocations: 30.360 GiB, 3.93% gc time)
elapsed time (ns): 382785964109
gc time (ns):      15042733929
bytes allocated:   32598547445
pool allocs:       604912736
non-pool GC allocs:184276
malloc() calls:    11737
realloc() calls:   81
GC pauses:         467
full collections:  6

https://github.com/aminya/Zygote.jl/runs/646094164?check_suite_focus=true#step:6:123

This is something that I would consider. @MikeInnes

@CarloLucibello
Copy link
Member

As things stand right now, I think this has to be entirely maintenance-free to be worth doing

@aminya
Copy link
Author

aminya commented May 21, 2020

As things stand right now, I think this has to be entirely maintenance-free to be worth doing

If you want you can have a release branch, and register from that branch.

@MikeInnes
Copy link
Member

Thanks a lot for looking into this. Given the cost/benefit ratio I don't think it makes sense for us at the moment, though.

@MikeInnes MikeInnes closed this May 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants