Document properties of computed gradients in cholesky and eigh #8312
Conversation
chainerx/_docs/routines.py (Outdated)

    Note:
        The computed gradient is a symmetric matrix and not an upper/lower
        triangular matrix.
I suggest:

    The computed gradient is a symmetric matrix, where the mathematical definition of Cholesky decomposition supersedes the definition of ``*potrf`` in LAPACK. More specifically, the gradient is computed as if the domain of the function is restricted to a Riemannian submanifold of the Euclidean space :math:`R^{n \times n}` (an open subset of the subspace of symmetric matrices, whose inner product structure is compatible with the Frobenius norm).

(``cholesky`` needs "an open subset of" for positive-definiteness, but ``eigh`` does not.)
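For illustration, here is a minimal sketch of what this property means in practice. It assumes chainerx's autograd API (``require_grad``, ``chainerx.backward``, ``chainerx.to_numpy``); the exact calls are illustrative and not taken from this PR:

    import numpy as np
    import chainerx

    # A symmetric positive-definite input.
    a = chainerx.array([[4.0, 2.0], [2.0, 3.0]]).require_grad()

    # Reduce the Cholesky factor to a scalar so it can be backpropagated.
    y = chainerx.linalg.cholesky(a).sum()
    chainerx.backward(y)

    g = chainerx.to_numpy(a.grad)
    # Under the convention documented here, the gradient is a full
    # symmetric matrix, not a lower/upper triangular one.
    print(np.allclose(g, g.T))  # expected: True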
How would you rephrase the documentation for ``eigh`` similar to this?
@masomatics helped me to improve the above.

    The computed gradient is a symmetric matrix. Our implementation of backpropagation is faithful to the mathematical definition of Cholesky decomposition. The forward computation with LAPACK does not really check if the input matrix is symmetric. We, however, do not really backpropagate this very process. Instead, the gradient is computed as if the function is restricted to a Riemannian submanifold of :math:`R^{n \times n}` consisting just of positive-definite symmetric matrices.

    How would you rephrase the documentation for ``eigh`` similar to this?

s/Cholesky decomposition/eigenvalue decomposition of symmetric matrices/
s/positive-definite//

I admit that it is superfluous to call the linear subspace Sym(n) of :math:`R^{n \times n}` a Riemannian submanifold of :math:`R^{n \times n}`.
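As a side remark, the projection behind this wording can be stated explicitly. The following is a sketch of the standard fact (not part of the PR text), using the Frobenius inner product mentioned above:

    % For f defined on (an open subset of) Sym(n), extended smoothly to a
    % neighborhood in R^{n \times n} with Euclidean gradient G, the gradient
    % on Sym(n) under the Frobenius inner product <A, B> = tr(A^T B) is the
    % orthogonal projection of G onto Sym(n):
    \nabla_{\mathrm{Sym}} f = \tfrac{1}{2}\bigl(G + G^{\mathsf{T}}\bigr)

This is why the returned gradient is symmetric rather than triangular.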
Thanks for the suggestion. Could you elaborate on what you mean by "We, however, do not really backpropagate this very process."? It sounded a bit unnatural to me.
A sentence like "LAPACK reconstructs a symmetric matrix from a triangular part of the input" should be inserted. However, the word "reconstruct" there does not mean copying in memory.
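For instance, here is a NumPy illustration of that implicit "reconstruction" (``numpy.linalg.cholesky`` wraps ``*potrf`` and, per its documented contract, reads only the lower triangle):

    import numpy as np

    a = np.array([[4.0, 2.0],
                  [2.0, 3.0]])
    b = a.copy()
    b[0, 1] = 99.0  # corrupt only the strict upper triangle

    # *potrf reads just the lower triangle, so the symmetric matrix is
    # "reconstructed" implicitly and both inputs yield the same factor.
    print(np.allclose(np.linalg.cholesky(a), np.linalg.cholesky(b)))  # True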
Updated the notes; please take a look at them.
chainerx/_docs/routines.py (Outdated)

    @@ -807,6 +807,13 @@ def _docs_linalg():
         Returns:
             :class:`~chainerx.ndarray`: Output array. Cholesky factor of ``a``.

         Note:
             This function assumes that ``a`` is symmetric and is undefined otherwise.
``undefined`` sounds like ``ValueError`` from the math perspective, or undefined behavior from the C++ perspective. I suggest:

    This function assumes that ``a`` is symmetric. The output of the forward/backward computation is unspecified otherwise.

IMO, the backward of ``eigh`` also returns an inconsistent value unless the forward input is symmetric.
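Here is a NumPy illustration of that inconsistency (``numpy.linalg.eigh`` reads only one triangle, ``UPLO='L'`` by default, so a non-symmetric input is silently reinterpreted):

    import numpy as np

    a = np.array([[1.0, 5.0],
                  [0.0, 1.0]])  # not symmetric

    w, v = np.linalg.eigh(a)  # only the lower triangle of ``a`` is read

    # The factors reconstruct the symmetrized lower triangle (here the
    # identity matrix), not ``a`` itself.
    print(np.allclose(v @ np.diag(w) @ v.T, a))  # False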
Addressed this too.
PTAL.
LGTM
Jenkins & flexCI, test this please.
Jenkins CI test (for commit d078f3e, target branch master) failed with status FAILURE.
The Jenkins failure is unrelated to the PR.
An alternative to #8306, keeping the current gradient behavior but documenting it instead.