faster floor/ceil/round in pre SSE 4.1 cases #44

Open
jackmott opened this issue Jul 1, 2018 · 0 comments

jackmott commented Jul 1, 2018

If I am reading the code correctly, when only SSE2 is available (no SSE 4.1), Faster currently falls back to calling round()/floor() etc. on each individual lane via the fallback macro.

You may be able to use these methods instead:
http://dss.stephanierct.com/DevBlog/?p=8

Or Agner Fog has a different method in his vector library:
http://www.agner.org/optimize/vectorclass.zip

edit:
Agner's functions are slower but can handle floating point values that don't fit in an i32. The functions from the first link only handle values that do fit in an i32.
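
To illustrate the kind of approach that is limited to the i32 range, here is a minimal sketch of an SSE2-only floor: truncate through an i32 conversion, then subtract 1.0 in lanes where truncation rounded up. This is not necessarily the exact code from either link, and the name `floor_sse2` is hypothetical, not part of Faster. It assumes every input fits in an i32.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Floor of four f32 lanes using only SSE2 instructions.
/// Hypothetical helper; results are only meaningful when each
/// input's truncated value fits in an i32.
#[cfg(target_arch = "x86_64")]
pub fn floor_sse2(x: [f32; 4]) -> [f32; 4] {
    // SSE2 is part of the x86_64 baseline, so these intrinsics are
    // always available on this target.
    unsafe {
        let v = _mm_loadu_ps(x.as_ptr());
        // Truncate toward zero: f32 -> i32 -> f32.
        let t = _mm_cvtepi32_ps(_mm_cvttps_epi32(v));
        // Truncation rounds negative non-integers up, so t > v there.
        // The comparison yields an all-ones mask in those lanes.
        let rounded_up = _mm_cmpgt_ps(t, v);
        // Keep 1.0 only in the masked lanes, then subtract it.
        let fix = _mm_and_ps(rounded_up, _mm_set1_ps(1.0));
        let r = _mm_sub_ps(t, fix);
        let mut out = [0.0f32; 4];
        _mm_storeu_ps(out.as_mut_ptr(), r);
        out
    }
}

/// Scalar stand-in so the sketch still compiles on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn floor_sse2(x: [f32; 4]) -> [f32; 4] {
    [x[0].floor(), x[1].floor(), x[2].floor(), x[3].floor()]
}
```

ceil can be built the same way by adding 1.0 in lanes where the truncated value is less than the input. Values outside the i32 range need a separate path, which is the case Agner's version covers.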
