You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We note that we can take this idea of contracting chains further by eliminating branches.
In other words, a binary tree of depth 2, (with update steps at the leaves) requires (assuming all functions on edges are unique) 6 matrices, 2 intermediary vectors, and performs 6 matrix-vector multiplications. This could be re-written as an equivalent 1-deep 4-ary tree which requires just 4 matrices, 0 intermediary vectors, and performs 4 matrix-vector multiplications. This result cannot be improved upon, unless edge pruning of duplicate operations is performed.
This speed-up and memory improvement of an entire decapode in this fashion is unlikely to occur, since Op2s are likely to break this downwards tree-like property.
This issue would allow for the most speed-ups in downstream Decapodes simulations ATM, since most bottlenecks relating to operator efficiency are resolved.
We note that we can take this idea of contracting chains further by eliminating branches.
In other words, a binary tree of depth 2, (with update steps at the leaves) requires (assuming all functions on edges are unique) 6 matrices, 2 intermediary vectors, and performs 6 matrix-vector multiplications. This could be re-written as an equivalent 1-deep 4-ary tree which requires just 4 matrices, 0 intermediary vectors, and performs 4 matrix-vector multiplications. This result cannot be improved upon, unless edge pruning of duplicate operations is performed.
This speed-up and memory improvement of an entire decapode in this fashion is unlikely to occur, since Op2s are likely to break this downwards tree-like property.
Originally posted by @lukem12345 in AlgebraicJulia/Decapodes.jl#93 (comment)
The text was updated successfully, but these errors were encountered: