SnoopCompile bot #615

aminya · 2020-04-24T16:37:53Z

It uses the whole test suit of Zygote (except Cuda and Abstract FFT tests) for generating OS-Version specific precompilation sentences.

In running the tests shows that we get:

10s raw improvement
23 M less memory usage
1.16GiB less memory allocation.

Tests:

A successful run: https://github.com/aminya/Zygote.jl/actions/runs/95945010
A successful PR created by the bot: https://github.com/aminya/Zygote.jl/pull/4/files which has precompilation sentences for Windows, Linux, macOS for Julia 1.4 and Julia 1.3.
The CI also passes for that generated PR: https://travis-ci.com/github/aminya/Zygote.jl/builds/162029060

cc: @ianshmean

related: #607

The changes made in the tests are just adding two simple checks for existance of the SnoopCompile_ENV. GitHub diff isn't working properly here.

Ready for review

Move out from the src folder Co-Authored-By: Ian Butterworth <contact@ianbtw.com>

Co-Authored-By: Ian Butterworth <contact@ianbtw.com>

aminya · 2020-04-24T19:50:07Z

There is an issue with the compatibility bounds of packages on Julia 1.2, which blocks add Zygote and its dependencies.

ERROR: Unsatisfiable requirements detected for package CuArrays [3a865a2d]:

https://github.com/aminya/Zygote.jl/runs/616498612?check_suite_focus=true#step:4:132

I will remove Julia 1.2 from the bot.

CarloLucibello · 2020-04-30T09:19:39Z

So, what are the consequences of merging this? People will get precompiled files when updating Zygote?
Another question: I see there is a benchmark script, what are the results?

aminya · 2020-04-30T09:38:37Z

So, what are the consequences of merging this? People will get precompiled files when updating Zygote?

Yes, once you merge this PR, the bot will run and will make another PR like this. After you merge that PR, Julia will cache the inference result for the common operations of the Zygote.

In the future, the same precompilation signatures can be used by PackageCompiler to make an almost static Zygote. Not only for using Zygote but also for calling its functions.

Another question: I see there is a benchmark script, what are the results?
The current benchmark only measures the inference time:

 Benchmarking the inference time of `using Zygote`
┌ Info: *******************
│ Benchmark Started
└ *******************
┌ Info: Precompile Deactivated Inference Benchmark
└ ------------------------
[ Info: 170.793 ms
┌ Info: Precompile Activated Inference Benchmark
└ ------------------------
[ Info: 159.906 ms
┌ Info: *******************
│ Benchmark Finished
└ *******************
Benchmarking the inference time of `using Zygote` & basic function test
┌ Info: *******************
│ Benchmark Started
└ *******************
┌ Info: Precompile Deactivated Inference Benchmark
└ ------------------------
[ Info: 19043.853 ms
┌ Info: Precompile Activated Inference Benchmark
└ ------------------------
[ Info: 18622.542 ms
┌ Info: *******************
│ Benchmark Finished
└ *******************

CarloLucibello · 2020-04-30T09:55:52Z

So the real gain is larger than what appears in the benchmark?

Is it reasonable to trigger bot for each PR merged? What about doing it only when tagging a new release, so to avoid much maintenance burden?

DhairyaLGandhi · 2020-04-30T10:02:53Z

Is it a ji file Julia can load directly?

Are there security concerns we should be aware of of shipping precompiled files not produced on the local machine?

DhairyaLGandhi · 2020-04-30T10:05:39Z

This is for my own education as anything else

DhairyaLGandhi · 2020-04-30T10:07:32Z

If this is successful, I would like to explore doing a similar build for Flux

aminya · 2020-04-30T10:12:46Z

So the real gain is larger than what appears in the benchmark?

Yes. I plan to make a PackageCompiler bot, that will use these to make static system images.

Is it reasonable to trigger bot for each PR merged? What about doing it only when tagging a new release, so to avoid much maintenance burden?

It is better to run it every time because as functions change, a new precompilation is needed. To make drastic changes to the API, one can temporarily bypass the precompilation by setting this to false:
https://github.com/aminya/Zygote.jl/pull/4/files#diff-2270ce88ed8290c3b808f5ea76728946R1

To minimize the number of PRs created, for bigger repositories I recommend to use:

on:
  push:
    branches:
       - 'master'
  # pull_request:  # comment for big repositories

which runs only when people push/merge something to master.

Is it a ji file Julia can load directly?

Yes, the precompilations are calls to Base.precompile. Many packages already use this. For example, Plots.jl.

You can see a sample bot PR here:
https://github.com/aminya/Zygote.jl/pull/4/files

Are there security concerns we should be aware of shipping precompiled files not produced on the local machine?

IMO, it is better than a local machine. Since we cover different OS and Julia versions. Usually, you can run your script on different configurations. The bot reruns the tests internally in its benchmark stage and will error if anything goes wrong. You will also check the CI before merging,

aminya · 2020-04-30T10:18:25Z

If this is successful, I would like to explore doing a similar build for Flux

I will create a PR for Flux too.

aminya · 2020-04-30T13:09:56Z

If you want to run the bot only before publishing you can remove the following altogether
https://github.com/FluxML/Zygote.jl/pull/615/files#diff-0b4f3d0a2175dd678c65e865e174454aR4

# Edit based on your repository.
on:
  push:
    branches:
      # - 'master'
  # pull_request:  # comment for big repositories

and use something like

on:
  watch
    types: [started]

Based on the following post, the workflow will be executed each time you star, or unstar and star again the repo.
https://dev.to/s_abderemane/manual-trigger-with-github-actions-279e

You may do this only before registering. When registration is done, you can set should_precompile to false again to let people develop without any precompilation happening.
https://github.com/aminya/Zygote.jl/pull/4/files#diff-2270ce88ed8290c3b808f5ea76728946R1

CarloLucibello · 2020-04-30T13:39:49Z

ok, this is nicer. I wonder if we can achieve something essentially maintenance-free. E.g. reduce the publishing process to the following comment on the last commit:

@SnoopCompile precompile()  # precompile, automatically merge, set should_precompile=false

and then

@JuliaRegistrator register()

on the SnoopCompile commit

JuliaRegistrator · 2020-04-30T13:39:51Z

Comments on pull requests will not trigger Registrator, as it is disabled. Please try commenting on a commit or issue.

aminya · 2020-04-30T13:56:18Z

ok, this is nicer. I wonder if we can achieve something essentially maintenance-free. E.g. reduce the publishing process to the following comment on the last commit:
@SnoopCompile precompile()  # precompile, automatically merge, set should_precompile=false
and then
@JuliaRegistrator register()
on the SnoopCompile commit

If you want a commit message trigger, you can do the following

on:
  push:
    branches:
       - 'master'
jobs:
  SnoopCompile:
    if: "contains(github.event.head_commit.message, '[precompile]')"

  Skip:
    if: "!contains(github.event.head_commit.message, '[precompile]')"

However, you need to merge the PR and start the registration. When registration is done, set the should_precompile to false. Automating the last part requires interfacing with JuliaRegsitrator, which I don't think I can implement any time soon.

Also: it is possible to make the PR merge itself, however, I don't think that is safe. Human review is needed in this process.

JuliaRegistrator · 2020-04-30T13:56:20Z

Error while trying to register: Register Failed
@aminya, it looks like you are not a publicly listed member/owner in the parent organization (FluxML).
If you are a member/owner, you will need to change your membership to public. See GitHub Help

JuliaRegistrator · 2020-04-30T14:13:45Z

Error while trying to register: Register Failed
@aminya, it looks like you are not a publicly listed member/owner in the parent organization (FluxML).
If you are a member/owner, you will need to change your membership to public. See GitHub Help

JuliaRegistrator · 2020-05-05T11:05:01Z

Error while trying to register: Register Failed
@aminya, it looks like you are not a publicly listed member/owner in the parent organization (FluxML).
If you are a member/owner, you will need to change your membership to public. See GitHub Help

MikeInnes · 2020-05-05T11:33:06Z

One option would be to have a release branch; there are no precompile files on master, but on each tag we build a set of precompile files, commit on that branch, and register.

I'd also like to more clearly understand how this affects (cold/hot) package load time and 'time to first gradient'. Last time I tried this kind of thing it didn't achieve much, so we should clearly understand how it'll help before adding the maintenance burden.

aminya · 2020-05-05T11:59:41Z

One option would be to have a release branch; there are no precompile files on master, but on each tag we build a set of precompile files, commit on that branch, and register.

That works too. But I think setting should_precompile = false after registration is easier than creating another branch. It is up to you. If you are already planning for such a branch, then we can go for it.

I'd also like to more clearly understand how this affects (cold/hot) package load time and 'time to first gradient'. Last time I tried this kind of thing it didn't achieve much, so we should clearly understand how it'll help before adding the maintenance burden.

As I mentioned previously, the gain is very high when we create a sysimage (I am planning to make a bot for that too). Because we can store compiled functions.

For the raw improvements in a dynamic way (that Julia only stores inference result), there is already a benchmark implemented.

I also created an issue to track other possibilities for benchmarking.
timholy/SnoopCompile.jl#78

MikeInnes · 2020-05-05T12:20:02Z

We should definitely not add this for the benefit when creating a sysimg. If you're doing that it's really easy to just call gradient and all this stuff will be compiled.

aminya · 2020-05-05T12:23:49Z

We should definitely not add this for the benefit when creating a sysimg.

That is not the case. We are not doing this only for sysimage. Using current timing tools we still get performance improvements.

If you're doing that it's really easy to just call gradient and all this stuff will be compiled.

Using a simple gradient example, only 90 precompile signatures were generated. But now 1190 signatures are generated.
https://github.com/aminya/Zygote.jl/pull/4/files

MikeInnes · 2020-05-05T12:35:56Z

Right, an improvement of 2% for the time it takes to load zygote and take a gradient? If it was a 2x difference it might be worth considering.

Using a simple gradient example, only 90 precompile signatures were generated. But now 1190 signatures are generated.

So what? 13× more precompile signatures are clearly not improving the performance by 13×. It matters that the Zygote compiler itself is cached, not that everything it has ever compiled is cached. And it seems hard to get that benefit, with the current Julia compiler, unless you call gradient at the time that native code is being generated.

aminya · 2020-05-05T12:45:55Z

@MikeInnes Currently, only Zygote is using its precompile signatures (SnoopCompile generates many more for other packages that are called). Once every package starts to use them, this number will show itself more.

MikeInnes · 2020-05-05T12:50:45Z

If other packages decide this is worthwhile, great, we'll take that improvement. But the marginal impact from this patch will still itself be a very small improvement and we're unlikely to consider it worth the additional hassle.

aminya · 2020-05-05T13:48:36Z

Running @timev in running the tests shows that we get:

10s raw improvement
23 M less memory usage
1.16GiB less memory allocation.

│ Precompile Deactivated Benchmark
# for using Zygote:
└ Inference time (ms): 	23654.842
@timev result (This has some noise).
391.374697 seconds (628.72 M allocations: 31.522 GiB, 4.16% gc time)
elapsed time (ns): 391374697041
gc time (ns):      16295209095
bytes allocated:   33846029551
pool allocs:       628513544
non-pool GC allocs:190530
malloc() calls:    11983
realloc() calls:   85
GC pauses:         503
full collections:  9

│ Precompile Activated Benchmark
┌ Info: 
└ Inference time (ms): 	22565.431
@timev result (This has some noise).
382.785964 seconds (605.11 M allocations: 30.360 GiB, 3.93% gc time)
elapsed time (ns): 382785964109
gc time (ns):      15042733929
bytes allocated:   32598547445
pool allocs:       604912736
non-pool GC allocs:184276
malloc() calls:    11737
realloc() calls:   81
GC pauses:         467
full collections:  6

https://github.com/aminya/Zygote.jl/runs/646094164?check_suite_focus=true#step:6:123

This is something that I would consider. @MikeInnes

CarloLucibello · 2020-05-21T17:28:54Z

As things stand right now, I think this has to be entirely maintenance-free to be worth doing

aminya · 2020-05-21T17:33:54Z

As things stand right now, I think this has to be entirely maintenance-free to be worth doing

If you want you can have a release branch, and register from that branch.

MikeInnes · 2020-05-22T12:26:55Z

Thanks a lot for looking into this. Given the cost/benefit ratio I don't think it makes sense for us at the moment, though.

aminya and others added 8 commits April 24, 2020 09:09

Remove precompile related code from src/Zygote.jl

2c48050

Rename precompile.jl to example_script.jl

1368636

Move out from the src folder Co-Authored-By: Ian Butterworth <contact@ianbtw.com>

snoopi scripts

fd94e05

Co-Authored-By: Ian Butterworth <contact@ianbtw.com>

SnoopCompile Github Action

0ac2b6f

using absolute path

f472b5c

rename bot branch to SnoopCompile_AutoPR

ff55078

using functions

f08c029

Multi-os Multi version

42f47f2

aminya mentioned this pull request Apr 24, 2020

Add SnoopCompile bot #606

Closed

[skip ci] enable only on master push

1fc279d

aminya force-pushed the SnoopCompile branch 4 times, most recently from 563ec9e to e1b2471 Compare April 24, 2020 19:45

using julia 1.3 and 1.2

dbbd4a7

aminya force-pushed the SnoopCompile branch from e1b2471 to dbbd4a7 Compare April 24, 2020 19:47

Remove 1.2.0

7813dbe

aminya force-pushed the SnoopCompile branch from beab8fa to 49559be Compare April 25, 2020 12:30

aminya added 2 commits April 25, 2020 21:33

Using the tests as example_script

5dc0dab

skip precompiling abstract FFT

fb0917d

aminya force-pushed the SnoopCompile branch from 9ffdad3 to fb0917d Compare April 26, 2020 02:34

update to the latest workflow file

951e4b3

aminya mentioned this pull request May 5, 2020

stop running functions in init #607

Merged

aminya mentioned this pull request May 5, 2020

SnoopCompile bot FluxML/Flux.jl#1163

Open

@Timev benchmark

ffd0d55

MikeInnes closed this May 22, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SnoopCompile bot #615

SnoopCompile bot #615

aminya commented Apr 24, 2020 •

edited

Loading

aminya commented Apr 24, 2020 •

edited

Loading

CarloLucibello commented Apr 30, 2020

aminya commented Apr 30, 2020

CarloLucibello commented Apr 30, 2020 •

edited

Loading

DhairyaLGandhi commented Apr 30, 2020

DhairyaLGandhi commented Apr 30, 2020

DhairyaLGandhi commented Apr 30, 2020

aminya commented Apr 30, 2020 •

edited

Loading

aminya commented Apr 30, 2020

aminya commented Apr 30, 2020

CarloLucibello commented Apr 30, 2020

JuliaRegistrator commented Apr 30, 2020

aminya commented Apr 30, 2020 •

edited

Loading

JuliaRegistrator commented Apr 30, 2020

JuliaRegistrator commented Apr 30, 2020

JuliaRegistrator commented May 5, 2020

MikeInnes commented May 5, 2020

aminya commented May 5, 2020

MikeInnes commented May 5, 2020

aminya commented May 5, 2020 •

edited

Loading

MikeInnes commented May 5, 2020

aminya commented May 5, 2020

MikeInnes commented May 5, 2020

aminya commented May 5, 2020 •

edited

Loading

CarloLucibello commented May 21, 2020

aminya commented May 21, 2020

MikeInnes commented May 22, 2020

SnoopCompile bot #615

SnoopCompile bot #615

Conversation

aminya commented Apr 24, 2020 • edited Loading

Ready for review

aminya commented Apr 24, 2020 • edited Loading

CarloLucibello commented Apr 30, 2020

aminya commented Apr 30, 2020

CarloLucibello commented Apr 30, 2020 • edited Loading

DhairyaLGandhi commented Apr 30, 2020

DhairyaLGandhi commented Apr 30, 2020

DhairyaLGandhi commented Apr 30, 2020

aminya commented Apr 30, 2020 • edited Loading

aminya commented Apr 30, 2020

aminya commented Apr 30, 2020

CarloLucibello commented Apr 30, 2020

JuliaRegistrator commented Apr 30, 2020

aminya commented Apr 30, 2020 • edited Loading

JuliaRegistrator commented Apr 30, 2020

JuliaRegistrator commented Apr 30, 2020

JuliaRegistrator commented May 5, 2020

MikeInnes commented May 5, 2020

aminya commented May 5, 2020

MikeInnes commented May 5, 2020

aminya commented May 5, 2020 • edited Loading

MikeInnes commented May 5, 2020

aminya commented May 5, 2020

MikeInnes commented May 5, 2020

aminya commented May 5, 2020 • edited Loading

CarloLucibello commented May 21, 2020

aminya commented May 21, 2020

MikeInnes commented May 22, 2020

aminya commented Apr 24, 2020 •

edited

Loading

aminya commented Apr 24, 2020 •

edited

Loading

CarloLucibello commented Apr 30, 2020 •

edited

Loading

aminya commented Apr 30, 2020 •

edited

Loading

aminya commented Apr 30, 2020 •

edited

Loading

aminya commented May 5, 2020 •

edited

Loading

aminya commented May 5, 2020 •

edited

Loading