[AutoDiff] Implement the closure optimization that is specialized towards the linear map tuples / enums produced by autodiff. #68944

jkshtj · 2023-10-03T19:01:39Z

Subtask of #68901

Implement the closure optimization that is specialized towards the linear map tuples / enums produced by autodiff. In particular, as in the example from the parent issue, if we see a particular closure (partial_apply), then instead of storing the closure in the tuple, we store the closed-over values and move the partial_apply down to the apply site (there is no need to fold it there; existing passes already do this). We know the place of use because of the way the linear map tuples and branch tracing enums are generated.

asl · 2023-10-04T08:57:25Z

Copied testcase and example from #68901:

import _Differentiation
import Darwin

@differentiable(reverse)
func f(_ x: Float) -> Float {
  if (x > 0) {
    return sin(x) * cos(x)
  } else {
    return sin(x) + cos(x)
  }
}

So, for the case above we'd turn:

// foo(_:)
sil hidden [noinline] @$s6sincos3fooyS2fF : $@convention(thin) (Float) -> Float {
[global: read,write,copy,destroy,allocate,deinit_barrier]
// %0 "x"                                         // users: %26, %24, %16, %12, %3, %2, %1
bb0(%0 : $Float):
  debug_value %0 : $Float, let, name "x", argno 1 // id: %1
  debug_value %0 : $Float, let, name "x", argno 1 // id: %2
  %3 = struct_extract %0 : $Float, #Float._value  // users: %13, %9, %5
  %4 = float_literal $Builtin.FPIEEE32, 0x0 // 0  // user: %5
  %5 = builtin "fcmp_olt_FPIEEE32"(%4 : $Builtin.FPIEEE32, %3 : $Builtin.FPIEEE32) : $Builtin.Int1 // user: %7
  %6 = tuple ()                                   // users: %22, %8
  cond_br %5, bb1, bb2                            // id: %7

bb1:                                              // Preds: bb0
  %8 = enum $_AD__$s6sincos1fyS2fF_bb1__Pred__src_0_wrt_0, #_AD__$s6sincos1fyS2fF_bb1__Pred__src_0_wrt_0.bb0!enumelt, %6 : $() // user: %19
  %9 = builtin "int_sin_FPIEEE32"(%3 : $Builtin.FPIEEE32) : $Builtin.FPIEEE32 // user: %10
  %10 = struct $Float (%9 : $Builtin.FPIEEE32)    // user: %18
  // function_ref closure #1 in _vjpSin(_:)
  %11 = function_ref @$s16_Differentiation7_vjpSinySf5value_S2fc8pullbacktSfFS2fcfU_ : $@convention(thin) (Float, Float) -> Float // user: %12
  %12 = partial_apply [callee_guaranteed] %11(%0) : $@convention(thin) (Float, Float) -> Float // user: %19
  %13 = builtin "int_cos_FPIEEE32"(%3 : $Builtin.FPIEEE32) : $Builtin.FPIEEE32 // user: %14
  %14 = struct $Float (%13 : $Builtin.FPIEEE32)   // user: %18
  // function_ref closure #1 in _vjpCos(_:)
  %15 = function_ref @$s16_Differentiation7_vjpCosySf5value_S2fc8pullbacktSfFS2fcfU_ : $@convention(thin) (Float, Float) -> Float // user: %16
  %16 = partial_apply [callee_guaranteed] %15(%0) : $@convention(thin) (Float, Float) -> Float // user: %19
  // function_ref closure #1 in static Float._vjpMultiply(lhs:rhs:)
  %17 = function_ref @$sSf16_DifferentiationE12_vjpMultiply3lhs3rhsSf5value_Sf_SftSfc8pullbacktSf_SftFZSf_SftSfcfU_ : $@convention(thin) (Float, Float, Float) -> (Float, Float) // user: %18
  %18 = partial_apply [callee_guaranteed] %17(%14, %10) : $@convention(thin) (Float, Float, Float) -> (Float, Float) // user: %19
  %19 = tuple $(predecessor: _AD__$s6sincos1fyS2fF_bb1__Pred__src_0_wrt_0, @callee_guaranteed (Float) -> Float, @callee_guaranteed (Float) -> Float, @callee_guaranteed (Float) -> (Float, Float)) (%8, %12, %16, %18) // user: %20
  %20 = enum $_AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0, #_AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0.bb1!enumelt, %19 : $(predecessor: _AD__$s6sincos1fyS2fF_bb1__Pred__src_0_wrt_0, @callee_guaranteed (Float) -> Float, @callee_guaranteed (Float) -> Float, @callee_guaranteed (Float) -> (Float, Float)) // user: %21
  br bb3(%20 : $_AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0) // id: %21

bb2:                                              // Preds: bb0
  %22 = enum $_AD__$s6sincos1fyS2fF_bb2__Pred__src_0_wrt_0, #_AD__$s6sincos1fyS2fF_bb2__Pred__src_0_wrt_0.bb0!enumelt, %6 : $() // user: %29
  // function_ref closure #1 in _vjpSin(_:)
  %23 = function_ref @$s16_Differentiation7_vjpSinySf5value_S2fc8pullbacktSfFS2fcfU_ : $@convention(thin) (Float, Float) -> Float // user: %24
  %24 = partial_apply [callee_guaranteed] %23(%0) : $@convention(thin) (Float, Float) -> Float // user: %29
  // function_ref closure #1 in _vjpCos(_:)
  %25 = function_ref @$s16_Differentiation7_vjpCosySf5value_S2fc8pullbacktSfFS2fcfU_ : $@convention(thin) (Float, Float) -> Float // user: %26
  %26 = partial_apply [callee_guaranteed] %25(%0) : $@convention(thin) (Float, Float) -> Float // user: %29
  // function_ref closure #1 in static Float._vjpAdd(lhs:rhs:)
  %27 = function_ref @$sSf16_DifferentiationE7_vjpAdd3lhs3rhsSf5value_Sf_SftSfc8pullbacktSf_SftFZSf_SftSfcfU_ : $@convention(thin) (Float) -> (Float, Float) // user: %28
  %28 = thin_to_thick_function %27 : $@convention(thin) (Float) -> (Float, Float) to $@callee_guaranteed (Float) -> (Float, Float) // user: %29
  %29 = tuple $(predecessor: _AD__$s6sincos1fyS2fF_bb2__Pred__src_0_wrt_0, @callee_guaranteed (Float) -> Float, @callee_guaranteed (Float) -> Float, @callee_guaranteed (Float) -> (Float, Float)) (%22, %24, %26, %28) // user: %30
  %30 = enum $_AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0, #_AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0.bb2!enumelt, %29 : $(predecessor: _AD__$s6sincos1fyS2fF_bb2__Pred__src_0_wrt_0, @callee_guaranteed (Float) -> Float, @callee_guaranteed (Float) -> Float, @callee_guaranteed (Float) -> (Float, Float)) // user: %31
  br bb3(%30 : $_AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0) // id: %31

// %32                                            // user: %37
bb3(%32 : $_AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0): // Preds: bb1 bb2
  // function_ref pullback of f(_:)
  %33 = function_ref @$s6sincos1fyS2fFTJpSpSr : $@convention(thin) (Float, @owned _AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0) -> Float // user: %37
  %34 = integer_literal $Builtin.Int64, 1         // user: %35
  %35 = builtin "sitofp_Int64_FPIEEE32"(%34 : $Builtin.Int64) : $Builtin.FPIEEE32 // user: %36
  %36 = struct $Float (%35 : $Builtin.FPIEEE32)   // user: %37
  %37 = apply %33(%36, %32) : $@convention(thin) (Float, @owned _AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0) -> Float // user: %38
  return %37 : $Float                             // id: %38
} // end sil function '$s6sincos3fooyS2fF'

into:

enum _AD__$s6sincos1fyS2fF_bb0__Pred__src_0_wrt_0 {
}

enum _AD__$s6sincos1fyS2fF_bb1__Pred__src_0_wrt_0 {
  case bb0(())
}

enum _AD__$s6sincos1fyS2fF_bb2__Pred__src_0_wrt_0 {
  case bb0(())
}

enum _AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0 {
  case bb2((predecessor: _AD__$s6sincos1fyS2fF_bb2__Pred__src_0_wrt_0, Float, Float))
  case bb1((predecessor: _AD__$s6sincos1fyS2fF_bb1__Pred__src_0_wrt_0, Float, Float, (Float, Float)))
}


// foo(_:)
sil hidden [noinline] @$s6sincos3fooyS2fF : $@convention(thin) (Float) -> Float {
[global: read,write,copy,destroy,allocate,deinit_barrier]
// %0 "x"                                         // users: %26, %24, %16, %12, %3, %2, %1
bb0(%0 : $Float):
  debug_value %0 : $Float, let, name "x", argno 1 // id: %1
  debug_value %0 : $Float, let, name "x", argno 1 // id: %2
  %3 = struct_extract %0 : $Float, #Float._value  // users: %13, %9, %5
  %4 = float_literal $Builtin.FPIEEE32, 0x0 // 0  // user: %5
  %5 = builtin "fcmp_olt_FPIEEE32"(%4 : $Builtin.FPIEEE32, %3 : $Builtin.FPIEEE32) : $Builtin.Int1 // user: %7
  %6 = tuple ()                                   // users: %22, %8
  cond_br %5, bb1, bb2                            // id: %7

bb1:                                              // Preds: bb0
  %8 = enum $_AD__$s6sincos1fyS2fF_bb1__Pred__src_0_wrt_0, #_AD__$s6sincos1fyS2fF_bb1__Pred__src_0_wrt_0.bb0!enumelt, %6 : $() // user: %19
  %9 = builtin "int_sin_FPIEEE32"(%3 : $Builtin.FPIEEE32) : $Builtin.FPIEEE32 // user: %10
  %10 = struct $Float (%9 : $Builtin.FPIEEE32)    // user: %18
  %13 = builtin "int_cos_FPIEEE32"(%3 : $Builtin.FPIEEE32) : $Builtin.FPIEEE32 // user: %14
  %14 = struct $Float (%13 : $Builtin.FPIEEE32)
  %newt = tuple $(Float, Float) (%14, %10)
  %19 = tuple $(predecessor: _AD__$s6sincos1fyS2fF_bb1__Pred__src_0_wrt_0, Float, Float, (Float, Float)) (%8, %0, %0, %newt) // user: %20
  %20 = enum $_AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0, #_AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0.bb1!enumelt, %19 : $(predecessor: _AD__$s6sincos1fyS2fF_bb1__Pred__src_0_wrt_0, Float, Float, (Float, Float)) // user: %21
  br bb3(%20 : $_AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0) // id: %21

bb2:                                              // Preds: bb0
  %22 = enum $_AD__$s6sincos1fyS2fF_bb2__Pred__src_0_wrt_0, #_AD__$s6sincos1fyS2fF_bb2__Pred__src_0_wrt_0.bb0!enumelt, %6 : $() // user: %29
  %29 = tuple $(predecessor: _AD__$s6sincos1fyS2fF_bb2__Pred__src_0_wrt_0, Float, Float) (%22, %0, %0) // user: %30
  %30 = enum $_AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0, #_AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0.bb2!enumelt, %29 : $(predecessor: _AD__$s6sincos1fyS2fF_bb2__Pred__src_0_wrt_0, Float, Float) // user: %31
  br bb3(%30 : $_AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0) // id: %31

// %32                                            // user: %37
bb3(%32 : $_AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0): // Preds: bb1 bb2
  // function_ref pullback of f(_:)
  %33 = function_ref @$s6sincos1fyS2fFTJpSpSr : $@convention(thin) (Float, @owned _AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0) -> Float // user: %37
  %34 = integer_literal $Builtin.Int64, 1         // user: %35
  %35 = builtin "sitofp_Int64_FPIEEE32"(%34 : $Builtin.Int64) : $Builtin.FPIEEE32 // user: %36
  %36 = struct $Float (%35 : $Builtin.FPIEEE32)   // user: %37
  %37 = apply %33(%36, %32) : $@convention(thin) (Float, @owned _AD__$s6sincos1fyS2fF_bb3__Pred__src_0_wrt_0) -> Float // user: %38
  return %37 : $Float                             // id: %38
} // end sil function '$s6sincos3fooyS2fF'

with a corresponding transformation of the pullback.
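As a rough source-level illustration of the SIL above, the following is a hand-written model of the VJP and pullback of `f`, once with closures stored in the trace enum and once with only the captured values stored. All names (`TraceWithClosures`, `TraceWithValues`, `vjpClosures`, and so on) are made up for this sketch and are not compiler-generated symbols; `Foundation` is imported only for `sin`/`cos`.

```swift
import Foundation

// Before: each case stores pullback closures, each with its own context.
enum TraceWithClosures {
  case bb1(sinPB: (Float) -> Float, cosPB: (Float) -> Float,
           mulPB: (Float) -> (Float, Float))
  case bb2(sinPB: (Float) -> Float, cosPB: (Float) -> Float,
           addPB: (Float) -> (Float, Float))
}

// After: each case stores only the values the closures captured.
// The add pullback captures nothing, so bb2 has no third element.
enum TraceWithValues {
  case bb1(sinX: Float, cosX: Float, mul: (Float, Float))
  case bb2(sinX: Float, cosX: Float)
}

func vjpClosures(_ x: Float) -> (value: Float, trace: TraceWithClosures) {
  let s = sin(x), c = cos(x)
  let sinPB: (Float) -> Float = { v in v * cos(x) }   // captures x
  let cosPB: (Float) -> Float = { v in -v * sin(x) }  // captures x
  if x > 0 {
    let mulPB: (Float) -> (Float, Float) = { v in (v * c, v * s) }  // captures (c, s)
    return (s * c, TraceWithClosures.bb1(sinPB: sinPB, cosPB: cosPB, mulPB: mulPB))
  } else {
    let addPB: (Float) -> (Float, Float) = { v in (v, v) }  // captures nothing
    return (s + c, TraceWithClosures.bb2(sinPB: sinPB, cosPB: cosPB, addPB: addPB))
  }
}

func pullbackClosures(_ seed: Float, _ trace: TraceWithClosures) -> Float {
  switch trace {
  case let .bb1(sinPB, cosPB, mulPB):
    let (dS, dC) = mulPB(seed)
    return sinPB(dS) + cosPB(dC)
  case let .bb2(sinPB, cosPB, addPB):
    let (dS, dC) = addPB(seed)
    return sinPB(dS) + cosPB(dC)
  }
}

func vjpValues(_ x: Float) -> (value: Float, trace: TraceWithValues) {
  let s = sin(x), c = cos(x)
  if x > 0 {
    return (s * c, TraceWithValues.bb1(sinX: x, cosX: x, mul: (c, s)))
  } else {
    return (s + c, TraceWithValues.bb2(sinX: x, cosX: x))
  }
}

// The closure bodies now live at the use site, fed by the stored values.
func pullbackValues(_ seed: Float, _ trace: TraceWithValues) -> Float {
  switch trace {
  case let .bb1(sinX, cosX, mul):
    let (dS, dC) = (seed * mul.0, seed * mul.1)
    return dS * cos(sinX) + (-dC * sin(cosX))
  case let .bb2(sinX, cosX):
    return seed * cos(sinX) + (-seed * sin(cosX))
  }
}
```

Both variants perform the same arithmetic, so `pullbackClosures(1, vjpClosures(x).trace)` and `pullbackValues(1, vjpValues(x).trace)` should agree for any `x`; only the storage of the intermediate state differs.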

jkshtj · 2023-10-25T22:39:58Z

[Design] Autodiff Pullback Closure Specialization Optimization

What is it?

This optimization can help reduce the heap allocation costs associated with closures by eliminating the use of closures altogether at certain call sites.

Given a function call site, if the callee takes a closure as an input argument and then calls the closure in its body, we can eliminate the overhead of the closure context's heap allocation by:

- Cloning the original callee.
- Moving the closure creation (the partial_apply instruction) from the body of the caller to the body of the cloned callee (where the corresponding apply of the closure lives).
- Changing the cloned callee's signature to take the partially applied arguments instead of the closure.
- Modifying the apply site in the caller to call the specialized callee.
- Getting rid of the closure creation (the partial_apply now in the cloned callee) altogether via peephole optimizations.

Using this optimization, SIL code like this:

sil [noinline] @takes_closure : $@convention(thin) (Int, @owned @callee_owned (Int) -> Int) -> () {
bb0(%0: $Int, %1: $@callee_owned (Int) -> Int):
  %4 = apply %1(%0) : $@callee_owned (Int) -> Int
  %9999 = tuple()
  return %9999 : $()
}

sil [noinline] @partially_applies : $@convention(thin) (Int) -> () {
bb0(%0 : $Int):
  // `multiplies_two_ints` is a defined function
  %1 = function_ref @multiplies_two_ints : $@convention(thin) (Int, Int) -> Int
  // Create a closure out of `multiplies_two_ints` by partially 
  // applying one of the integer arguments.
  %2 = partial_apply [callee_guaranteed] %1(%0) : $@convention(thin) (Int, Int) -> Int
  %3 = convert_function %2 : $@callee_guaranteed (Int) -> Int to $@callee_owned (Int) -> Int
  // Pass the closure created out of `multiplies_two_ints` to `takes_closure`
  %4 = function_ref @takes_closure : $@convention(thin) (Int, @owned @callee_owned (Int) -> Int) -> ()
  %5 = apply %4(%0, %3) : $@convention(thin) (Int, @owned @callee_owned (Int) -> Int) -> ()
  %9999 = tuple()
  return %9999 : $()
}

Will look like this:

sil shared [noinline] @specialized_takes_closure : $@convention(thin) (Int, Int) -> () {
// %0                                             // user: %5
// %1                                             // user: %3
bb0(%0 : $Int, %1 : $Int):
  // function_ref multiplies_two_ints
  %2 = function_ref @multiplies_two_ints : $@convention(thin) (Int, Int) -> Int // user: %3
  // This partial apply can be optimized away altogether via peephole optimizations
  // since the corresponding apply is also in the same block at %5.
  %3 = partial_apply [callee_guaranteed] %2(%1) : $@convention(thin) (Int, Int) -> Int // user: %4
  %4 = convert_function %3 : $@callee_guaranteed (Int) -> Int to $@callee_owned (Int) -> Int // user: %5
  %5 = apply %4(%0) : $@callee_owned (Int) -> Int
  %6 = tuple ()                                   // user: %7
  return %6 : $()                                 // id: %7
}

sil [noinline] @partially_applies : $@convention(thin) (Int) -> () {
// %0                                             // users: %6, %6, %3
bb0(%0 : $Int):
  %1 = function_ref @$s13takes_closure19multiplies_two_intsSiTf1nc_n : $@convention(thin) (Int, Int) -> () // user: %6
  // `takes_closure` has been specialized and now takes the originally closed over
  // arguments instead of the closure.
  %2 = apply %1(%0, %0) : $@convention(thin) (Int, Int) -> ()
  %3 = tuple ()                                   // user: %9
  return %3 : $()                                 // id: %9
}
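The same before/after can be sketched at the Swift source level. This is only an analogy under stated assumptions: the function names mirror the SIL above but are hand-written, the functions return `Int` instead of `()` so the result is observable, and a non-escaping Swift closure like this one would not necessarily be heap-allocated in the first place.

```swift
func multipliesTwoInts(_ a: Int, _ b: Int) -> Int { a * b }

// Before: the callee receives a closure and calls it.
func takesClosure(_ x: Int, _ closure: (Int) -> Int) -> Int {
  closure(x)
}

func partiallyApplies(_ x: Int) -> Int {
  // Models `partial_apply %1(%0)`: the trailing argument is bound to `x`.
  takesClosure(x) { y in multipliesTwoInts(y, x) }
}

// After: the cloned callee takes the captured value instead of the closure.
func specializedTakesClosure(_ x: Int, _ captured: Int) -> Int {
  // The closure is now created next to its only use, where later
  // optimizations can fold it into a direct call.
  let closure: (Int) -> Int = { y in multipliesTwoInts(y, captured) }
  return closure(x)
}

func partiallyAppliesSpecialized(_ x: Int) -> Int {
  specializedTakesClosure(x, x)
}
```

For any input, `partiallyApplies(x)` and `partiallyAppliesSpecialized(x)` compute the same value, for example 49 for `x = 7`.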

See here for a high-level description of the general Swift closure specialization optimization.

Limitations of the general closure specialization optimization

The general closure specialization optimization operates under numerous restrictions. The ones most directly affecting AD are:

- Call sites with multiple closure arguments are not handled. (NOT handled in this design; this will be handled in the general closure specialization optimization.)
  - Pullbacks taking multiple intermediate closures are not optimized.
- Closures must be passed as arguments directly visible to the compiler. (Handled in this design.)
  - Pullbacks with control flow are not optimized, since the closures are hidden behind branch trace enums.

Design

This optimization is going to be implemented as a SILFunctionTransform.

Pre-optimization steps

Exit early if the function is in a module that does not import _Differentiation.

Optimization steps

Note - The steps below are a rough outline and are not set in stone.

Some terminology used below -

branch trace enum - An enum that records the execution flow of a differentiable function with control flow. Structurally, a branch trace enum's payloads contain other branch trace enums; together, the cases/variants of these enums form a graph of the execution flow of the function concerned.

chain of creation - The list of SSA values that culminate in the final branch trace enum.

node - Can be thought to have a one-to-one mapping to a case/variant of a branch trace enum.

Gather call-sites in a function
- Find all callsites where one of the arguments to the ApplyInst is an AD branch trace enum.
- Collect this callsite if the AD branch trace enum is non-trivial -- due to the enum wrapping intermediate pullback closures.
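One possible reading of "non-trivial" can be sketched over a toy model of the enum declarations. Everything here is hypothetical: `PayloadElement`, `TraceEnums`, and `isNonTrivial` are invented for illustration, and treating an enum as non-trivial when a predecessor enum in its payload is non-trivial is an assumption of this sketch, not something the proposal states.

```swift
// Toy model: one element of a case payload in a branch trace enum.
enum PayloadElement {
  case value(String)                // a plain value of the named type
  case closure(captures: [String])  // a pullback closure and its captured types
  case predecessor(String)          // the name of a predecessor's trace enum
}

// Enum name -> case name -> payload elements.
typealias TraceEnums = [String: [String: [PayloadElement]]]

// True if the enum, or (by assumption) any enum reachable through its
// predecessor payloads, stores a closure. `visited` guards against the
// recursive enums that loops would produce.
func isNonTrivial(_ name: String, in enums: TraceEnums,
                  visited: inout Set<String>) -> Bool {
  guard visited.insert(name).inserted, let cases = enums[name] else {
    return false
  }
  for payload in cases.values {
    for element in payload {
      switch element {
      case .closure:
        return true
      case .predecessor(let pred):
        if isNonTrivial(pred, in: enums, visited: &visited) { return true }
      case .value:
        break
      }
    }
  }
  return false
}

// The enums of the sin/cos example, with shortened hypothetical names.
let exampleEnums: TraceEnums = [
  "bb1Pred": ["bb0": []],
  "bb2Pred": ["bb0": []],
  "bb3Pred": [
    "bb1": [.predecessor("bb1Pred"), .closure(captures: ["Float"]),
            .closure(captures: ["Float"]),
            .closure(captures: ["Float", "Float"])],
    "bb2": [.predecessor("bb2Pred"), .closure(captures: ["Float"]),
            .closure(captures: ["Float"]), .closure(captures: [])],
  ],
]
```

Under this reading, the enum passed to the pullback (`bb3Pred`) is non-trivial, while `bb1Pred` and `bb2Pred` are trivial because their only payload is the empty tuple.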

The steps below apply to each call site of interest.

From the callsite, trace up the chain of creation of the branch trace enum, until you're at the top node.
Generate a specialized branch trace enum by tracing down the same chain of creation (No code will be modified at this point).
- At each point where we add a node to the trace
  - If the parent enum is non-trivial, i.e., one or more of its cases contain intermediate pullback closures
    - A specialized version of the enum will need to be created. Find out if such a version already exists and if not, create it.
    - If the current case being looked at has a non-trivial payload
      - Add a corresponding specialized case/variant to the specialized enum. This specialized case's payload should consist of the values that the original pullback closure was closing over, instead of the pullback closure itself.
      - If the payload of the original case also contains another branch trace enum, we should replace it with a specialized version, if one exists.
    - Else, add a corresponding trivial case to the specialized enum as is.
  - Else if the parent enum is trivial
    - Re-use the existing enum. No further changes required.
Clone the callee.
From the top of the chain of (branch trace enum) creation in the caller
- Modify all the branch trace enum creation steps to use the specialized branch trace enum.
- Move PartialApplyInst from the caller to the cloned callee.
  - Information regarding where this PartialApplyInst is copied to, in the cloned callee, can be derived from the original branch tracing enum.
- Ensure that any originally captured values that are now passed to the callee directly have their reference counts adjusted properly.
  - Insert an additional retain for each originally captured argument with reference semantics.
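The enum-specialization step above can be sketched on a toy model of the declarations. This is an illustration under assumptions, not the pass itself: `Element`, `EnumDecl`, `Specializer`, and the `_spec` suffix are hypothetical, the enums are assumed to be acyclic (no loops), and an enum is also specialized when one of its predecessor enums was, which goes slightly beyond the steps as written.

```swift
enum Element: Equatable {
  case value(String)                // a plain value of the named type
  case closure(captures: [String])  // a pullback closure and its captured types
  case predecessor(String)          // the name of a predecessor's trace enum
}

typealias EnumDecl = [String: [Element]]  // case name -> payload

struct Specializer {
  let original: [String: EnumDecl]
  private(set) var specialized: [String: EnumDecl] = [:]
  private(set) var renamed: [String: String] = [:]  // cache of finished enums

  init(original: [String: EnumDecl]) { self.original = original }

  // Returns the name of the enum to use in place of `name`.
  mutating func specialize(_ name: String) -> String {
    if let done = renamed[name] { return done }  // a version already exists
    guard let decl = original[name] else { return name }
    var newDecl: EnumDecl = [:]
    for (caseName, payload) in decl {
      var newPayload: [Element] = []
      for element in payload {
        switch element {
        case .closure(let captures):
          // Store the captured values instead of the closure; a closure
          // without captures contributes nothing to the payload.
          if captures.count == 1 {
            newPayload.append(.value(captures[0]))
          } else if captures.count > 1 {
            newPayload.append(
              .value("(" + captures.joined(separator: ", ") + ")"))
          }
        case .predecessor(let pred):
          newPayload.append(.predecessor(specialize(pred)))
        case .value:
          newPayload.append(element)
        }
      }
      newDecl[caseName] = newPayload
    }
    // Trivial enums come out unchanged and are reused as is.
    let newName = (newDecl == decl) ? name : name + "_spec"
    if newName != name { specialized[newName] = newDecl }
    renamed[name] = newName
    return newName
  }
}
```

Applied to the sin/cos example (with shortened enum names), the `bb1` case of the specialized enum would carry `Float, Float, (Float, Float)` and the `bb2` case `Float, Float`, while the trivial predecessor enums are reused unchanged.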

Post-optimization steps

Peephole optimizations to get rid of any partial applies moved to the cloned callee.
Peephole optimizations to get rid of any dead instructions in the caller or the cloned callee.

Location in optimization pipeline

Running right after the general closure specialization optimization's current position in the pipeline seems like a sane default to start out with.

Open questions for discussion

Q. What happens to old branch tracing enums?
Ans. They should likely stay around. Removing them might require modification/regeneration of the existing code that might still use them.

Q. What happens to the old pullback?
Ans. Same as above.

jkshtj · 2023-10-25T22:41:18Z

@asl Could you take a look at the brief design write up for our AD specific closure specialization optimization?

asl · 2023-10-27T12:06:47Z

Tagging @BradLarson as well

Overall, I think some steps / parts would likely need some clarification.

> Given a function call-site, if the callee takes a closure as an input argument and then calls the closure in its body, we can eliminate the overhead of the closure context's heap allocation by -

Note that in the AD case the closure is not passed as an input argument directly. Instead, it's buried deep inside a linear map tuple.

> Collect this callsite if the AD branch trace enum is non-trivial -- due to the enum wrapping intermediate pullback closures.

What does "branch trace enum is non-trivial" mean? Separate enums are created for each BB. So, if the last BB of the function does not have any calls / predecessors, the corresponding enum will be quite trivial (e.g. no tuple payload).

> From the callsite, trace up the chain of creation of the branch trace enum, until you're at the top node.

What does this mean? What is the "node" here? What is the "top node"?

> If the parent enum is non-trivial, i.e., one or more of its cases contain intermediate pullback closures

What do you mean by "parent enum"? Note that you need to operate on the function as a whole, since the cases / payloads of a basic block's branch trace enum are its predecessor basic blocks and the corresponding linear map tuples. Also, you need to build everything at once, in an RPOT manner; otherwise we'd end up with the same type lowering issues we already faced previously.
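To illustrate why a reverse post-order traversal (RPOT) is a natural fit here, the following minimal sketch computes a reverse post-order over the CFG of the sin/cos example. The `reversePostOrder` helper and the dictionary-based CFG are hypothetical stand-ins, not compiler APIs. For an acyclic CFG, every block appears after all of its predecessors, so a block's predecessor enums would already be built when the block itself is visited.

```swift
// Depth-first search that records blocks in post-order, then reverses it.
func reversePostOrder(from entry: String,
                      successors: [String: [String]]) -> [String] {
  var visited: Set<String> = []
  var postOrder: [String] = []

  func visit(_ block: String) {
    guard visited.insert(block).inserted else { return }
    for next in successors[block, default: []] {
      visit(next)
    }
    postOrder.append(block)  // emitted only after all successors
  }

  visit(entry)
  return Array(postOrder.reversed())
}

// CFG of the example: bb0 branches to bb1 and bb2, which both jump to bb3.
let exampleCFG: [String: [String]] = [
  "bb0": ["bb1", "bb2"],
  "bb1": ["bb3"],
  "bb2": ["bb3"],
]

let order = reversePostOrder(from: "bb0", successors: exampleCFG)
```

Here `order` starts with `bb0` and ends with `bb3`, with `bb1` and `bb2` in between.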

> Information regarding where this PartialApplyInst is copied to, in the cloned callee, can be derived from the original branch tracing enum.

Can you please expand on the "can be derived" part? It seems to be the most important thing here.

> Insert an additional retain for each originally captured argument with reference semantics.

What are "arguments" here? Arguments of the original function? I am confused. Do you have an example where this would be necessary?

In general, it would help if each of the steps were illustrated by a SIL example, so we can review the meaning of each step.

> They should likely stay around. Removing them might require modification/regeneration of the existing code that might still use them.

What "existing code" are you referring to? Certainly we can safely delete the unused enums.

> Ans. Same as above.

Again, we can safely remove unused code. The pullback function does not exist as a separate entity; it's an internal implementation detail. And if unused, it could be dropped entirely.

asl · 2023-11-01T06:43:50Z

We discussed the intended optimization and its scope with @jkshtj. He will prepare a refined proposal.

jkshtj · 2023-11-03T19:12:20Z

@asl @BradLarson I've revised the proposal. I've rewritten it as a gist here. Could you please take a look?

asl · 2023-11-14T05:53:13Z

Here are my comments:

> Pre-optimization steps

You also need to ensure that the pullback is a private function. This is currently always the case, but it's an implementation detail as of now.

> Exit early if the VJP has already been optimized.

How would you determine it?

> trivial payloads.

Again, you need to define what a trivial payload is.

> Top-level closures - these are directly closed over by the returned pullback.

Isn't this case covered by the generic closure specialization transformation? Why would we need to reimplement it?

> Generate specialized pullback and move code from VJP to it.

How would you handle various ownership-related things?

> VJPs themselves should not have been inlined.

There is some contradiction here. All of this is possible only if all nested VJPs are inlined into the top-level VJP. Otherwise you won't see these nested pullbacks (partial_applys).

> Considering the above points, this pass should run after inlining into VJPs

You definitely need to restart the pipeline after this transformation.

jkshtj changed the title ~~[Subtask 2] Investigate possible optimization opportunities for autodiff code with control flow~~ [AutoDiff] [Subtask 2] Investigate possible optimization opportunities for autodiff code with control flow Oct 3, 2023

jkshtj mentioned this issue Oct 3, 2023

[AutoDiff] Check the inlining cost / benefit model for autodiff-generated functions (we need to ensure they receive benefit bonus) #68945

Closed

AnthonyLatsis added the compiler, task, AutoDiff, and SILOptimizer labels Oct 4, 2023

asl changed the title ~~[AutoDiff] [Subtask 2] Investigate possible optimization opportunities for autodiff code with control flow~~ [AutoDiff] Implement the closure optimization that is specialized towards the linear map tuples / enums produced by autodiff. Oct 4, 2023

jkshtj mentioned this issue Oct 16, 2023

[AutoDiff] Modify inlining logic to award inlining benefits to VJPs #69212

Merged

AnthonyLatsis added the closures and expressions labels Oct 27, 2023

jkshtj self-assigned this Mar 27, 2024

jkshtj mentioned this issue Mar 27, 2024

Investigate possible optimization opportunities for autodiff code with control flow #68901

Open
