
glm_vec4_normalize #1224

Open
gottfriedleibniz opened this issue Jan 17, 2024 · 2 comments

Comments

@gottfriedleibniz

The current implementation of glm_vec4_normalize uses _mm_rsqrt_ps. While its relative error (<= 1.5*2^-12) is small, it can lead to inconsistent results when used in more complicated functions, as the imprecision is amplified.

For example, modifying test/gtx/gtx_matrix_factorisation.cpp to operate on floats instead of doubles: replacing glm::dmat with glm::mat and reducing both T const epsilon = ... values to 1E-3:

# Scalar:
└> g++ -I${GLM_PATH} gtx_matrix_factorisation.cpp
└> ./a.out ; echo $?
0

# SSE:
└> g++ -I${GLM_PATH} -DGLM_FORCE_SSE2 -DGLM_FORCE_DEFAULT_ALIGNED_GENTYPES gtx_matrix_factorisation.cpp
└> ./a.out ; echo $?
2

# Arm64+NEON (different box):
└> g++ -I${GLM_PATH} -DGLM_FORCE_NEON -DGLM_FORCE_DEFAULT_ALIGNED_GENTYPES gtx_matrix_factorisation.cpp
└> ./a.out ; echo $?
2

On instruction sets that support it, I see little issue with replacing _mm_rsqrt_ps with _mm_div_ps+_mm_sqrt_ps (or vdivq_f32+vsqrtq_f32 on A64). Newton iterations are not as much of a win as they once were on modern processors.

@christophe-lunarg

christophe-lunarg commented Feb 8, 2024

Would you say, @gottfriedleibniz, that limiting the _mm_rsqrt_ps implementation to a lowp version is good enough?
And use the _mm_div_ps+_mm_sqrt_ps implementation for mediump and highp.

@gottfriedleibniz
Author

Yeah, that seems fine.
