
linear_operator cat_rows performance improvement #93

Merged

Conversation

naefjo (Contributor) commented Mar 11, 2024

Hello :)

This PR addresses the computational bottlenecks observed in cornellius-gp/gpytorch#2468.

In cat_rows, the schur_root is converted to a dense operator using to_dense. As a result, the subsequent inversion of the root fails to exploit the structure of the operator and defaults to stable_pinverse, which uses a QR decomposition.
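For illustration, here is a minimal sketch of the general effect in plain PyTorch (not the library internals): a generic pseudo-inverse ignores structure, while a triangular root admits a cheap triangular solve.

```python
import torch

torch.manual_seed(0)
n = 1024
X = torch.randn(n, n)
A = torch.eye(n) + X @ X.T / n  # well-conditioned SPD matrix
L = torch.linalg.cholesky(A)    # lower-triangular root

# Dense path: treating L as a generic matrix forces a general pseudo-inverse
# (QR/SVD under the hood), analogous to what stable_pinverse does.
dense_inv = torch.linalg.pinv(L)

# Structured path: a triangular root admits a cheap triangular solve instead.
tri_inv = torch.linalg.solve_triangular(L, torch.eye(n), upper=False)
```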

The PR contains the following modifications:

  • Don't convert schur_root to a dense operator unless it is needed for tensor assignment.
  • In root_inv_decomposition, exploit the structure of the Cholesky-based inversion by casting the result to a TriangularLinearOperator instead of a DenseLinearOperator (see the sketch after this list).
  • Add an option to specify the matrix-size threshold at which the QR decomposition is performed on the CPU instead of the GPU, which is more in line with the observations made in the linked issue.
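As a rough sketch of the second bullet, using linear_operator's public classes (the actual diff may differ in its details):

```python
import torch
from linear_operator.operators import DenseLinearOperator, TriangularLinearOperator

A = 3.0 * torch.eye(4) + torch.ones(4, 4)
chol = torch.linalg.cholesky(A)

# Before: wrapping the Cholesky factor as a dense operator hides its
# structure, so downstream solves cannot use back-substitution.
root_dense = DenseLinearOperator(chol)

# After: declaring the factor triangular lets downstream solves exploit it.
root_tri = TriangularLinearOperator(chol, upper=False)
solution = root_tri.solve(torch.randn(4, 2))  # triangular solve, not a general one
```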

@Balandat (Collaborator)

Thanks for the contribution! Overall this makes sense to me. Would you be able to provide some benchmark results of this change relative to the previous implementation?

naefjo (Contributor, Author) commented Mar 17, 2024

Sure thing.
Timing cat_rows in isolation seems not to make any noticeable difference at all in a toy example I tried to cook up. However, it is very noticeable in gpytorch's get_fantasy_model. Here is a graph showing the computation times of gpytorch's get_fantasy_model method as a function of the number of data points, based on the basic example notebook. The updates were performed in a "batched" setting, i.e. 10 points at a time. Note that this is in conjunction with the changes from cornellius-gp/gpytorch#2494.

[Figure: wall-clock time of get_fantasy_model as a function of the number of data points, before vs. after this change]
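For reference, the timing loop was along these lines (an illustrative sketch based on the basic example notebook, not the exact benchmark code):

```python
import time
import torch
import gpytorch

# Minimal ExactGP as in the basic example notebook.
class GPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

train_x = torch.linspace(0, 1, 100).unsqueeze(-1)
train_y = torch.sin(6.28 * train_x.squeeze(-1)) + 0.1 * torch.randn(100)
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = GPModel(train_x, train_y, likelihood).eval()

# get_fantasy_model requires one prior prediction to populate the caches.
with torch.no_grad():
    model(torch.rand(5, 1))

# Add 10 points at a time and record the wall-clock time of each update.
timings = []
for step in range(20):
    new_x = torch.rand(10, 1)
    new_y = torch.sin(6.28 * new_x.squeeze(-1)) + 0.1 * torch.randn(10)
    start = time.perf_counter()
    with torch.no_grad():
        model = model.get_fantasy_model(new_x, new_y)
    timings.append(time.perf_counter() - start)
```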

BTW, the failing CI seems to be related to an updated version of mpmath downloaded here: 1.4.0 appears to have some breaking API changes. A quick Google search shows that other repos were affected as well: pytorch/pytorch#120995, NVIDIA/TensorRT-LLM#1145.

naefjo (Contributor, Author) commented Mar 17, 2024

Disregard my last comment about there being no difference in cat_rows. Apparently the matrices I was testing were not p.d. enough for Cholesky factorization, which led to root decompositions being performed with symeig, which in turn led to root inverses being computed with stable_pinverse in cat_rows. If the matrices are well-conditioned enough for Cholesky not to fail, the results look as follows:

[Figure: cat_rows timing comparison with well-conditioned matrices, before vs. after this change]
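For anyone reproducing this, a sketch of the kind of matrices that distinguish the two code paths (illustrative; the method strings are the public root_decomposition options):

```python
import torch
from linear_operator import to_linear_operator

torch.manual_seed(0)
n = 500
X = torch.randn(n, 2 * n)

# Well-conditioned SPD matrix: Cholesky succeeds, so the fast path is taken.
well_conditioned = to_linear_operator(X @ X.T / (2 * n) + torch.eye(n))
root_fast = well_conditioned.root_decomposition(method="cholesky")

# Nearly singular (rank-5) matrix: Cholesky fails on it, so the root
# decomposition falls back to symeig and the root inverse ends up in
# stable_pinverse.
nearly_singular = to_linear_operator(X[:, :5] @ X[:, :5].T)
root_slow = nearly_singular.root_decomposition(method="symeig")
```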

@Balandat (Collaborator)

Nice, this seems like a meaningful improvement. Thanks for the perf fix.

> BTW, the failing CI seems to be related to an updated version of mpmath

Yes, thanks; I've run into this with other libraries before. #94 pins the version to avoid this issue for now. Could you rebase on that change so we can run the tests?

@naefjo force-pushed the feature/online-learning-improvements branch from 716222d to 0a94f8b on March 18, 2024 at 13:54
@Balandat (Collaborator)

Not sure what is going on with the docs, but it's unrelated to this PR.

@Balandat merged commit a0a9c42 into cornellius-gp:main on Mar 18, 2024 (5 of 6 checks passed)