
mat4x4_invert implementation #52

Closed
pr0g opened this issue Dec 3, 2022 · 3 comments

pr0g commented Dec 3, 2022

Hi @datenwolf,

While noodling around on my own C math library, I stumbled across linmath.h and was very interested in your implementation of mat4x4_invert (see below).

linmath.h, lines 278 to 318 at 0538757:

LINMATH_H_FUNC void mat4x4_invert(mat4x4 T, mat4x4 const M)
{
    float s[6];
    float c[6];
    s[0] = M[0][0]*M[1][1] - M[1][0]*M[0][1];
    s[1] = M[0][0]*M[1][2] - M[1][0]*M[0][2];
    s[2] = M[0][0]*M[1][3] - M[1][0]*M[0][3];
    s[3] = M[0][1]*M[1][2] - M[1][1]*M[0][2];
    s[4] = M[0][1]*M[1][3] - M[1][1]*M[0][3];
    s[5] = M[0][2]*M[1][3] - M[1][2]*M[0][3];
    c[0] = M[2][0]*M[3][1] - M[3][0]*M[2][1];
    c[1] = M[2][0]*M[3][2] - M[3][0]*M[2][2];
    c[2] = M[2][0]*M[3][3] - M[3][0]*M[2][3];
    c[3] = M[2][1]*M[3][2] - M[3][1]*M[2][2];
    c[4] = M[2][1]*M[3][3] - M[3][1]*M[2][3];
    c[5] = M[2][2]*M[3][3] - M[3][2]*M[2][3];
    /* Assumes it is invertible */
    float idet = 1.0f/( s[0]*c[5]-s[1]*c[4]+s[2]*c[3]+s[3]*c[2]-s[4]*c[1]+s[5]*c[0] );
    T[0][0] = ( M[1][1] * c[5] - M[1][2] * c[4] + M[1][3] * c[3]) * idet;
    T[0][1] = (-M[0][1] * c[5] + M[0][2] * c[4] - M[0][3] * c[3]) * idet;
    T[0][2] = ( M[3][1] * s[5] - M[3][2] * s[4] + M[3][3] * s[3]) * idet;
    T[0][3] = (-M[2][1] * s[5] + M[2][2] * s[4] - M[2][3] * s[3]) * idet;
    T[1][0] = (-M[1][0] * c[5] + M[1][2] * c[2] - M[1][3] * c[1]) * idet;
    T[1][1] = ( M[0][0] * c[5] - M[0][2] * c[2] + M[0][3] * c[1]) * idet;
    T[1][2] = (-M[3][0] * s[5] + M[3][2] * s[2] - M[3][3] * s[1]) * idet;
    T[1][3] = ( M[2][0] * s[5] - M[2][2] * s[2] + M[2][3] * s[1]) * idet;
    T[2][0] = ( M[1][0] * c[4] - M[1][1] * c[2] + M[1][3] * c[0]) * idet;
    T[2][1] = (-M[0][0] * c[4] + M[0][1] * c[2] - M[0][3] * c[0]) * idet;
    T[2][2] = ( M[3][0] * s[4] - M[3][1] * s[2] + M[3][3] * s[0]) * idet;
    T[2][3] = (-M[2][0] * s[4] + M[2][1] * s[2] - M[2][3] * s[0]) * idet;
    T[3][0] = (-M[1][0] * c[3] + M[1][1] * c[1] - M[1][2] * c[0]) * idet;
    T[3][1] = ( M[0][0] * c[3] - M[0][1] * c[1] + M[0][2] * c[0]) * idet;
    T[3][2] = (-M[3][0] * s[3] + M[3][1] * s[1] - M[3][2] * s[0]) * idet;
    T[3][3] = ( M[2][0] * s[3] - M[2][1] * s[1] + M[2][2] * s[0]) * idet;
}
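
As far as I can tell from reading it, s[0..5] are the six 2×2 minors taken from the first two columns of M (linmath.h matrices are column-major) and c[0..5] are the matching minors from the last two columns, so the idet line is the Laplace expansion of the determinant along those two column pairs:

$$\det M = s_0 c_5 - s_1 c_4 + s_2 c_3 + s_3 c_2 - s_4 c_1 + s_5 c_0$$

with each entry of T being a signed 3×3 cofactor assembled from one matrix entry and one of these minors, scaled by 1/det.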

The implementation is quite a bit more efficient than the more traditional mat4x4 inverse functions I've seen and implemented. Is the implementation above derived from a Gauss-Jordan elimination approach? I'm just curious how you arrived at it and whether there are any corner cases to watch out for.
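
One corner case I can see is the /* Assumes it is invertible */ comment: for a singular M the idet line divides by zero. A sketch of the checked wrapper I'd reach for (the name and epsilon parameter are my own, not part of linmath.h):

```c
#include <math.h>
#include <stdbool.h>
#include "linmath.h"

/* Illustrative guard: recompute the determinant the same way
 * mat4x4_invert does and refuse to invert when it is tiny. */
static bool mat4x4_invert_checked(mat4x4 T, mat4x4 const M, float eps)
{
    float s0 = M[0][0]*M[1][1] - M[1][0]*M[0][1];
    float s1 = M[0][0]*M[1][2] - M[1][0]*M[0][2];
    float s2 = M[0][0]*M[1][3] - M[1][0]*M[0][3];
    float s3 = M[0][1]*M[1][2] - M[1][1]*M[0][2];
    float s4 = M[0][1]*M[1][3] - M[1][1]*M[0][3];
    float s5 = M[0][2]*M[1][3] - M[1][2]*M[0][3];
    float c0 = M[2][0]*M[3][1] - M[3][0]*M[2][1];
    float c1 = M[2][0]*M[3][2] - M[3][0]*M[2][2];
    float c2 = M[2][0]*M[3][3] - M[3][0]*M[2][3];
    float c3 = M[2][1]*M[3][2] - M[3][1]*M[2][2];
    float c4 = M[2][1]*M[3][3] - M[3][1]*M[2][3];
    float c5 = M[2][2]*M[3][3] - M[3][2]*M[2][3];
    float det = s0*c5 - s1*c4 + s2*c3 + s3*c2 - s4*c1 + s5*c0;
    if (fabsf(det) < eps)
        return false;        /* (nearly) singular: leave T untouched */
    mat4x4_invert(T, M);     /* redundant work, but keeps the sketch short */
    return true;
}
```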

I see it was added as part of this commit - 0f1050f - which mentions Gauss-Jordan elimination and the iterative Jacobi method (which I hadn't heard of before).

Thanks very much for your time,

Tom

pr0g (Author) commented Dec 3, 2022

(Please feel free to close this; it's not an issue, just a question.)

datenwolf (Owner) commented

This is an implementation of matrix inversion using the Cayley-Hamilton theorem:

https://en.wikipedia.org/wiki/Invertible_matrix#Cayley%E2%80%93Hamilton_method

https://en.wikipedia.org/wiki/Cayley%E2%80%93Hamilton_theorem#Determinant_and_inverse_matrix
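
For a 4×4 matrix, the method in those articles gives the closed form

$$A^{-1} = \frac{1}{\det A}\left(\tfrac{1}{6}\big((\operatorname{tr}A)^3 - 3\operatorname{tr}A\,\operatorname{tr}(A^2) + 2\operatorname{tr}(A^3)\big)I - \tfrac{1}{2}\big((\operatorname{tr}A)^2 - \operatorname{tr}(A^2)\big)A + (\operatorname{tr}A)\,A^2 - A^3\right)$$

(the n = 4 statement from the linked pages; the code above arrives at the same inverse through precomputed 2×2 minors rather than explicit matrix powers).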

However, I'd strongly argue that it's not very efficient in general. It works well enough for small matrices that fit nicely into the vector units of modern CPUs and contain enough common subexpressions that a decent compiler can make good use of the register set. Most importantly, it has deterministic run time and memory load/store behavior and no branches, which makes it trivial in terms of branch prediction and prefetching.

But overall, the total number of calculations grows exponentially, O(2^(m·n)), with the dimensions of the matrix. Curiously enough, this algorithm is well known among German secondary school students (at least in my time at school), where it was often used as an example of an easy-to-understand but badly scaling algorithm, and to motivate the use of computationally more efficient algorithms like the ones I mentioned in my commit message.

But as it turns out, for this particular special case of 4×4 matrices, which fit very well into the resources of modern CPUs, the computational expense does not weigh as heavily as memory access latency.

Generally speaking though, I strongly advise against using inverted matrices when the goal is solving a problem of the form Ax = b via x = inv(A)b and there is only a small number of xs. Instead, you should use a linear solver that munches through that one problem directly: it's usually more efficient and numerically more robust, and taking care of degenerate solutions is easier (such algorithms will also tell you how much the set of equations is underdetermined, and in which directions).
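
For concreteness, a minimal sketch of such a direct solve for the 4×4 case, using Gaussian elimination with partial pivoting (my own illustration, not part of linmath.h; note the plain row-major arrays here, unlike linmath.h's column-major mat4x4):

```c
#include <math.h>

/* Solve A x = b for a single right-hand side without forming inv(A).
 * Returns 0 on success, -1 if a pivot is zero (singular system). */
static int solve4(const float A[4][4], const float b[4], float x[4])
{
    float a[4][5]; /* augmented matrix [A | b], so A and b stay untouched */
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) a[r][c] = A[r][c];
        a[r][4] = b[r];
    }
    for (int k = 0; k < 4; ++k) {
        /* partial pivoting: bring the row with the largest |pivot| up */
        int p = k;
        for (int r = k + 1; r < 4; ++r)
            if (fabsf(a[r][k]) > fabsf(a[p][k])) p = r;
        if (a[p][k] == 0.0f) return -1; /* or compare against an epsilon */
        if (p != k)
            for (int c = k; c < 5; ++c) {
                float t = a[k][c]; a[k][c] = a[p][c]; a[p][c] = t;
            }
        /* eliminate everything below the pivot */
        for (int r = k + 1; r < 4; ++r) {
            float f = a[r][k] / a[k][k];
            for (int c = k; c < 5; ++c) a[r][c] -= f * a[k][c];
        }
    }
    /* back substitution */
    for (int r = 3; r >= 0; --r) {
        float s = a[r][4];
        for (int c = r + 1; c < 4; ++c) s -= a[r][c] * x[c];
        x[r] = s / a[r][r];
    }
    return 0;
}
```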

In computer graphics there's exactly one application of the inverse matrix, and that is the transformation of normal vectors in lighting calculations. In hindsight I should have named the function mat4x4_normaltransform and written it so that it returns the transpose of the inverse, just to nudge people toward the way I intended it to be used.
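
A sketch of what that could look like on top of linmath.h's existing mat4x4_invert and mat4x4_transpose (the function itself is hypothetical, as described above):

```c
#include "linmath.h"

/* Normals must be transformed by the transpose of the inverse. */
LINMATH_H_FUNC void mat4x4_normaltransform(mat4x4 T, mat4x4 const M)
{
    mat4x4 inv;
    mat4x4_invert(inv, M); /* still assumes M is invertible */
    mat4x4_transpose(T, inv);
}
```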

Cheers,
Wolfgang

pr0g (Author) commented Dec 3, 2022

Thank you very much for getting back to me so quickly @datenwolf and for all the additional details and context, much appreciated! I'll definitely be looking into the Cayley-Hamilton theorem you mentioned.

Completely understand this is only really useful in a few situations. The other thing I've used a full 4x4 matrix inverse for is taking a point in screen space and projecting it back into world space (usually onto the near clip plane), as sketched below. For nearly all other transforms you can use a transpose combined with flipping the translation or something similar; you usually don't need the full inverse.
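
A hedged sketch of that unprojection, assuming a precomputed inverse view-projection matrix and NDC coordinates in [-1, 1] (the function name is illustrative; mat4x4_mul_vec4 is linmath.h's):

```c
#include "linmath.h"

/* Map an NDC point back to world space through inv(P * V). */
static void unproject_ndc(vec3 world, mat4x4 const inv_view_proj,
                          float ndc_x, float ndc_y, float ndc_z)
{
    vec4 clip = { ndc_x, ndc_y, ndc_z, 1.f };
    vec4 h;
    mat4x4_mul_vec4(h, inv_view_proj, clip);
    /* perspective divide back to Cartesian coordinates */
    world[0] = h[0] / h[3];
    world[1] = h[1] / h[3];
    world[2] = h[2] / h[3];
}
```

(For the near clip plane case, ndc_z would be -1 in OpenGL's default depth convention.)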

It was just interesting to me because I'd essentially followed the seemingly more common approach (outlined here), which has quite a few more ops than the Cayley-Hamilton version you're using. In some quick benchmarks, the Cayley-Hamilton version takes about 12 ns on my machine, while the other approach takes closer to 20 ns, e.g. this implementation:

https://github.com/pr0g/as-c-math/blob/7bc78cffcb102a41bb9126ae340f94fd61ba8e84/as-ops.c#L798:L916
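
A minimal sketch of a timing harness along those lines, assuming POSIX clock_gettime (illustrative only, not the benchmark actually used):

```c
#include <stdio.h>
#include <time.h>
#include "linmath.h"

int main(void)
{
    mat4x4 m, inv;
    mat4x4_identity(m);
    m[3][0] = 1.f; m[3][1] = 2.f; m[3][2] = 3.f; /* simple translation */

    enum { N = 10000000 };
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; ++i) {
        mat4x4_invert(inv, m);
        m[3][0] = inv[3][0]; /* data dependency so the loop isn't optimized away */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per mat4x4_invert\n", ns / N);
    return 0;
}
```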

Thanks again for the helpful reply, all the best! 👍
