Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Affine3D #157

Merged
merged 24 commits into from Apr 12, 2021
Merged

Affine3D #157

merged 24 commits into from Apr 12, 2021

Conversation

emilk
Copy link
Contributor

@emilk emilk commented Apr 6, 2021

Part of #25. A simpler version of #156.

This PR introduces the Affine3D type, implemented as 3x Vec4 (NOTE: only f32 version included).

Banchmark results on my Intel MacBook (best of a few runs):

op Mat4 sse2 Affine3D sse2 Mat4 scalar Affine3D scalar
inverse 12 ns 9 ns 🥇 31 ns 16 ns 🥇
Self * Self 6.1 ns 4.2 ns 🥇 21 ns 14 ns 🥇
transform point3 2.6 ns 🥇 3.8 ns 3.8 ns 3.5 ns 🥇
transform vector3 2.5 ns 🥇 3.6 ns 3.3 ns 2.9 ns 🥇

( 🥇 is the fastest in each pair of columns)

It would be nice to speed up the sse2 vector transforms more, as those are probably among the most common operations to do with a matrix. I've tried several different approaches, but I can't get further than this. Unless someone has a better idea, I think we should just recommend turning your Affine3D into a Mat4 on SSE2 targets when doing a lot of mat * vec transforms.

It is also possible that for some platforms (in particular spirv) a 4x Vec3 may actually be faster.

@emilk emilk force-pushed the mat3x4-take2 branch 5 times, most recently from d500858 to 92b45ad Compare April 6, 2021 13:39
@emilk emilk marked this pull request as ready for review April 6, 2021 13:55
@emilk emilk marked this pull request as draft April 6, 2021 14:29
@emilk emilk force-pushed the mat3x4-take2 branch 3 times, most recently from a98fb57 to d525f64 Compare April 6, 2021 16:35
@emilk emilk marked this pull request as ready for review April 6, 2021 17:28
src/mat3x4.rs Outdated Show resolved Hide resolved
src/mat3x4.rs Outdated Show resolved Hide resolved
@bitshifter
Copy link
Owner

Thanks for all your work on this!

There are a couple of things I'm not sure about as it currently stands, on naming and being row major.

Naming wise I was thinking of making this a special type, e.g. Transform3D or Affine3D or something along those lines. There are a few reasons:

  • It doesn't tie you to an internal representation, e.g. 4x3 vs 3x4 or rows vs columns
  • Methods like inverse, determinant etc. don't make sense for non square matrices, to be fair I have helper methods on Mat4 like transform_point3 etc. that don't make mathematical sense either
  • It doesn't make sense to multiply two 3x4 matrices but these transforms be able to be multiplied.
  • Making it a special type makes it really clear that it contains a valid affine transform and nothing else

I'm not totally sure about using row major when everything else is column major, it might cause confusion when debugging if the internals behave differently to the other matrix types. For scalar code, the size would be the same for 4x3 or 3x4 and I suspect it wouldn't make a huge difference one way or another for scalar performance. At the moment the SSE2 code isn't benefiting from row major, although I think switching to Vec3A in the implementation would get it closer to the Mat4 performance. Another concern I have is I think accessing the translation and the basis vectors is a very common thing to do and this might be slower if the data is organised in rows, also direct access is nice. Also converting to a Mat4 or Mat3 would require a transpose with row major.

@emilk
Copy link
Contributor Author

emilk commented Apr 7, 2021

Naming wise I was thinking of making this a special type, e.g. Transform3D or Affine3D or something along those lines.

I agree. I think Affine3D is the better name here (since it is more specific).

I'm not totally sure about using row major when everything else is column major.

I made this choice because I wanted to represent the transform using 12xf32, to save memory (for better cache locality). If we don't care about that then 4xVec3A columns makes much more sense. What is "best" choice will depend a lot on the circumstance. The extra saved memory can make a big difference to decrease cache misses, but for applications where you can read data in order, it won't matter much at all.

I can make a competing 4xVec3A column major implementation so we can compare raw numbers.

@emilk
Copy link
Contributor Author

emilk commented Apr 7, 2021

I think if we rename this Affine3D we should also hide all functions related to columns and rows to make it clear that this is a higher abstraction.

I've started work on a 4xVec3A version, but am having some problems. Maybe it can wait a bit and come as a follow-up PR where we switch to a (potentially) faster implementation?

@bitshifter
Copy link
Owner

I think it makes sense to remove those methods. My approach has usually been to provide a pretty minimal amount of functionality and add things that people are requesting.

No problem skipping the 4xVec3A implementation for now if you are having problems with it.

@bitshifter
Copy link
Owner

One thing I was planning on experimenting with was loading scalar types into SIMD to perform math heavy operations, so it wouldn't just be things like Vec3A that are SIMD optimised. If doing this improve performance of Mat3 inverse and vector multiplication functions potentially an Affine3D type could just be a composition Mat3 and Vec3. It would be similar in essence to the existing transform type https://github.com/bitshifter/glam-rs/blob/master/src/transform.rs#L217-L224. This does depend a bit on the results of my experiments though :)

The SSE2 dot product is not super fast. The column major vector multiplication doesn't require dot product so it should be a lot faster if SIMD is available, see https://github.com/bitshifter/glam-rs/blob/master/src/core/sse2/matrix.rs#L406-L421.

@bitshifter
Copy link
Owner

Related to my comment above, I wrote a bit more detail about where I'm thinking of going with glam here #159. You don't really have to do anything with this information for this PR, just thought it might be useful to know what I'm thinking on the performance side of things.

@emilk emilk changed the title Mat3x4 (take2) Affine3D Apr 7, 2021
@emilk
Copy link
Contributor Author

emilk commented Apr 7, 2021

Current public interface:

Screen Shot 2021-04-07 at 15 25 33

@emilk
Copy link
Contributor Author

emilk commented Apr 8, 2021

I have a working 4xVec3A implementation now too, but so far it is slower at simd self*self than the 3xVec4 version. The transforming of points and vectors is faster though (as fast as for Mat4), so probably worth the trade. I'll do a separate PR for it with proper comparisons once this PR is merged.

src/quat.rs Outdated Show resolved Hide resolved
tests/support/mod.rs Outdated Show resolved Hide resolved
src/features/impl_serde.rs Outdated Show resolved Hide resolved
src/affine3d.rs Outdated Show resolved Hide resolved
src/affine3d.rs Outdated Show resolved Hide resolved
src/affine3d.rs Outdated Show resolved Hide resolved
src/affine3d.rs Outdated Show resolved Hide resolved
src/affine3d.rs Show resolved Hide resolved
@emilk emilk requested a review from bitshifter April 8, 2021 11:47
@bitshifter
Copy link
Owner

The weird clippy error was just a run of the mill warning due to the method not being used by the DQuat implementation. If you still wanted to use it internally you could make it #[allow(dead_code)].

I've been asked to do a release by another embarker, so I might hold off merging this until I've done that. If you wanted you could put it on a feature so it won't affect the release, or just wait. I want to hold off having this on by default until the column major PR lands. I don't expect it to take long to do a release.

@bitshifter
Copy link
Owner

The release has been done, so should be good to merge once the conflicts are resolved.

@emilk
Copy link
Contributor Author

emilk commented Apr 12, 2021

Rebased to fix merge conflicts in CHANGELOG.md

@bitshifter bitshifter merged commit 55f7cd4 into bitshifter:master Apr 12, 2021
@khyperia khyperia deleted the mat3x4-take2 branch June 1, 2021 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants