
Document properties of computed gradients in cholesky and eigh #8312

Merged 4 commits into chainer:master on Oct 28, 2019

Conversation

hvy (Member) commented Oct 20, 2019:

An alternative to #8306, keeping the current gradient behavior but documenting it instead.

@hvy hvy added cat:document Documentation such as function documentations, comments and tutorials. ChainerX Related to ChainerX. labels Oct 20, 2019

Note:
The computed gradient is a symmetric matrix and not an upper/lower
triangular matrix.
Member commented:
I suggest:

The computed gradient is a symmetric matrix, where the mathematical definition of Cholesky decomposition supersedes the definition of ``*potrf`` in LAPACK. More specifically, the gradient is computed as if the domain of the function is restricted to a Riemannian submanifold of the Euclidean space :math:`R^{n \times n}` (an open subset of the subspace of symmetric matrices, whose inner product structure is compatible with the Frobenius norm).

(cholesky needs "an open subset of" for the positive-definiteness but eigh does not.)

hvy (Member, author) commented:

How would you rephrase the documentation for eigh similar to this?

toslunar (Member) commented Oct 21, 2019:

@masomatics helped me to improve the above.

The computed gradient is a symmetric matrix. Our implementation of backpropagation is faithful to the mathematical definition of Cholesky decomposition. The forward computation with LAPACK does not really check if the input matrix is symmetric. We, however, do not really backpropagate this very process. Instead, the gradient is computed as if the function is restricted to a Riemannian submanifold of R^{n \times n} consisting just of positive-definite symmetric matrices.

How would you rephrase the documentation for eigh similar to this?

s/Cholesky decomposition/eigenvalue decomposition of symmetric matrices/
s/positive-definite//

I admit that it is superfluous to call the linear subspace Sym(n) of R^{n×n} a Riemannian submanifold of R^{n×n}.
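The triangular-vs-symmetric distinction discussed above can be checked numerically. Below is a minimal sketch in plain NumPy (not Chainer/ChainerX code; the scalar function `f`, the seed, and the step size are illustrative assumptions): entry-wise finite differences through `np.linalg.cholesky` yield a lower-triangular gradient, because LAPACK's `*potrf` reads only one triangle of the input, while the gradient of the restriction to symmetric matrices is the symmetrization of that triangular matrix.

```python
import numpy as np

def f(a):
    # An arbitrary scalar function of the Cholesky factor.
    return np.linalg.cholesky(a).sum()

rng = np.random.default_rng(0)
n = 4
b = rng.standard_normal((n, n))
a = b @ b.T + n * np.eye(n)  # symmetric positive definite

eps = 1e-6
# Naive entry-wise central differences, treating every a[i, j] as an
# independent variable (the LAPACK *potrf view of the input).
g_tri = np.zeros_like(a)
for i in range(n):
    for j in range(n):
        da = np.zeros_like(a)
        da[i, j] = eps
        g_tri[i, j] = (f(a + da) - f(a - da)) / (2 * eps)

# np.linalg.cholesky reads only the lower triangle, so the naive
# gradient is lower triangular: perturbing upper entries does nothing.
print(np.allclose(np.triu(g_tri, 1), 0))  # True

# The gradient of the restriction to symmetric matrices is the
# symmetrization, which halves the off-diagonal entries.
g_sym = (g_tri + g_tri.T) / 2

# Check: for a symmetric perturbation S, the directional derivative
# equals the Frobenius inner product <g_sym, S>.
s = rng.standard_normal((n, n))
s = (s + s.T) / 2
lhs = (f(a + eps * s) - f(a - eps * s)) / (2 * eps)
rhs = (g_sym * s).sum()
print(np.allclose(lhs, rhs, atol=1e-5))  # True
```

The halved off-diagonal entries are exactly the compatibility condition with the Frobenius inner product mentioned in the suggested note.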

hvy (Member, author) commented:

Thanks for the suggestion. Could you elaborate on what you mean by "We, however, do not really backpropagate this very process."? It sounded a bit unnatural to me.

Member commented:

A sentence like "LAPACK reconstructs a symmetric matrix from a triangular part of the input" should be inserted. However, the word "reconstruct" there does not mean copying in memory.

hvy (Member, author) commented:

Updated the notes; please take a look at them.

@@ -807,6 +807,13 @@ def _docs_linalg():
Returns:
:class:`~chainerx.ndarray`: Output array. Cholesky factor of ``a``.

Note:
This function assumes that ``a`` is symmetric and is undefined otherwise.
Member commented:

The word "undefined" sounds like a ``ValueError`` from the math perspective, or undefined behavior from the C++ perspective. I suggest:

This function assumes that ``a`` is symmetric. The output of the forward/backward computation is unspecified otherwise.

IMO, the backward of eigh is also returning an inconsistent value unless the forward input is symmetric.
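The same kind of numerical check applies to eigh, matching the s/Cholesky/eigenvalue decomposition/ substitution suggested earlier. A sketch in plain NumPy (not Chainer/ChainerX code; the weight vector `w`, the seed, and the step size are illustrative assumptions): `np.linalg.eigvalsh` reads only the lower triangle of its input (UPLO='L' by default), so the entry-wise gradient is again triangular, and its symmetrization agrees with the mathematical gradient sum_k w_k v_k v_k^T of a weighted eigenvalue sum on the subspace of symmetric matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
b = rng.standard_normal((n, n))
a = (b + b.T) / 2           # symmetric input
w = rng.standard_normal(n)  # weights for a scalar function of the spectrum

def f(x):
    # np.linalg.eigvalsh (UPLO='L' by default) reads only the
    # lower triangle of x, like the LAPACK *syevd routines.
    return (w * np.linalg.eigvalsh(x)).sum()

eps = 1e-6
g = np.zeros_like(a)
for i in range(n):
    for j in range(n):
        da = np.zeros_like(a)
        da[i, j] = eps
        g[i, j] = (f(a + da) - f(a - da)) / (2 * eps)

# Entry-wise, the gradient is lower triangular: the upper entries are unused.
print(np.allclose(np.triu(g, 1), 0))  # True

# On the subspace of symmetric matrices, the gradient of sum_k w_k * lambda_k
# is the symmetric matrix sum_k w_k v_k v_k^T (assuming distinct eigenvalues).
vals, vecs = np.linalg.eigh(a)
g_sym = (vecs * w) @ vecs.T
print(np.allclose((g + g.T) / 2, g_sym, atol=1e-4))  # True
```

This also illustrates the "unspecified otherwise" wording: for a non-symmetric forward input, the triangular and symmetric views disagree entry by entry, so the backward value is only consistent when the input is symmetric.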

hvy (Member, author) commented:

Addressed this too.

hvy (Member, author) commented Oct 28, 2019:

PTAL.

toslunar (Member) left a review:

LGTM

toslunar (Member) commented:
Jenkins & flexCI, test this please.

@toslunar toslunar added the st:test-and-merge State indicating that pull request is approved by a reviewer and can be merged after CI passes. label Oct 28, 2019
chainer-ci (Member) commented:
Jenkins CI test (for commit d078f3e, target branch master) failed with status FAILURE.

@toslunar toslunar added this to the v7.0.0 milestone Oct 28, 2019
toslunar (Member) commented:

The Jenkins failure is unrelated to the PR.

@toslunar toslunar merged commit a96c85d into chainer:master Oct 28, 2019
@hvy hvy deleted the doc-linalg-symmetric-matrix-grads branch October 28, 2019 08:26