Investigate possible optimizations in PassiveLogic internal Swift AD benchmark #69967
Comments
@BradLarson @asl Could you guys take a look at this?
It would be helpful if the testcase could be self-contained.
But judging from the rough sketch presented here: I won't bother about top-level (
In any case, it seems you're looking from the wrong direction. We do not want
In particular, it seems we can think about the following optimization: consider a … Essentially, this would look as if the autodiff code was run on the function with its nested function calls inlined.
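The idea above can be sketched at the Swift level. This is a minimal illustration, not the benchmark itself; the function names `inner` and `outer` are hypothetical, and it assumes a toolchain with the `_Differentiation` module available:

```swift
import _Differentiation

// Illustrative nested differentiable functions (names are made up).
@differentiable(reverse)
func inner(_ x: Double) -> Double { x * x }

@differentiable(reverse)
func outer(_ x: Double) -> Double { inner(x) + 3 * x }

// Reverse-mode AD emits a pullback closure for each call to `inner`,
// and `outer`'s pullback composes them. If `inner` were inlined before
// differentiation, a single flat pullback could be generated instead,
// with no intermediate closure allocations.
let (value, pullback) = valueWithPullback(at: 2.0, of: outer)
print(value, pullback(1.0))  // 10.0 7.0
```

Here `outer(2) = 2² + 3·2 = 10` and the derivative `2x + 3` evaluates to `7`, so the inlined and non-inlined forms must agree; only the closure-allocation behavior differs.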
Based on offline discussion with @asl we have the following work items.
@asl here's the full code for the benchmark we have been using internally.
As we discussed, I modified the current closure-spec optimization to handle "returned" closures, and the performance of the benchmark has improved: the reverse-to-forward ratio has been cut in half (a little more, actually). And I think that with better placement of the optimization in the pipeline, the performance can be improved further.
@jkshtj do you have a PR to look at?
I haven't been able to send out a PR yet, but I pushed the changes to my fork of the Swift repo, here. Please note that I added the optimization in a new pass just for prototyping purposes. This is not what I intend to do in the final change.
We want to investigate possible optimizations in one of our key internal benchmarks for Swift AD, through a combination of inlining and closure-specialization. We are specifically pointing out those 2 optimizations because they can help us get rid of the memory allocations made by Swift AD for creating pullback closures.
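To make the allocation concern concrete, here is a hand-written analogue of a compiler-generated VJP; this is a sketch with hypothetical names (`squareVJP`), not the benchmark code. The returned pullback captures `x`, which forces a heap allocation whenever the closure escapes:

```swift
// Hand-written analogue of a compiler-generated VJP; names are illustrative.
func squareVJP(_ x: Double) -> (value: Double, pullback: (Double) -> Double) {
    // The pullback closure captures `x`, so Swift must box it on the
    // heap when the closure escapes -- this is the kind of allocation
    // that inlining plus closure specialization can eliminate.
    return (x * x, { v in 2 * x * v })
}

let (y, pb) = squareVJP(3.0)
print(y, pb(1.0))  // 9.0 6.0
```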
The following code is representative of the structure of the benchmark. Other than the code shown in the functions, the size of the functions can be assumed to be the same, i.e., think of `// ...` as representing a constant number of inline differentiable operations. No control flow is involved. Using the existing compiler optimizations, the top-level function `foo` ends up looking something like:

What we instead want is one of the following outcomes.
1. VJP and pullback of `SM` are fully inlined into `foo`. This way the `partial_apply`s of intermediate pullbacks in `SM_VJP` should get constant-folded into the corresponding `apply`s of the pullbacks in `SM_PB`.
   1. Based on some experimentation, inlining benefits of 70 (currently 30) for VJP-like functions returning closures and 170 (currently 70) for PB-like functions receiving closures achieve this goal.
   2. Simply tweaking the cost-benefit analysis, however, might be an over-fitting solution and might not work in the general case.
2. `SM_VJP` is fully inlined into `foo`. This way, even if we cannot inline `SM_PB` into `foo`, we can specialize it to take the values closed over by the intermediate pullback closures instead of the intermediate pullback closures themselves.
   1. This should be a more generally useful optimization that works on a larger number of cases. It can be enabled by modifying the existing closure-spec optimization to handle callsites taking multiple closures.
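Outcome 2 can be illustrated at the Swift level. The real transformation operates on SIL `partial_apply`/`apply` instructions; this is only a conceptual sketch with hypothetical names (`consume`, `consumeSpecialized`): instead of passing the pullback closure itself, the caller passes the value it closed over, so no closure allocation is needed.

```swift
// Unspecialized shape: `consume` receives an opaque closure, so the
// caller must allocate it (boxing the captured value).
func consume(_ pb: (Double) -> Double) -> Double { pb(1.0) }

// Specialized shape (conceptual analogue of closure specialization):
// the callee takes the captured value `x` directly and calls a
// top-level pullback function, so no closure allocation is needed.
func pullbackFn(_ v: Double, capturedX x: Double) -> Double { 2 * x * v }
func consumeSpecialized(capturedX x: Double) -> Double {
    pullbackFn(1.0, capturedX: x)
}

let x = 3.0
print(consume { v in 2 * x * v })        // 6.0
print(consumeSpecialized(capturedX: x))  // 6.0
```

Both shapes compute the same result; specialization only changes how the captured state reaches the pullback body.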
What's clear?
What's unclear?