Allow building without AVX. #243

Merged 1 commit on Aug 4, 2021

@iSLC iSLC commented Aug 4, 2021

At the moment, compilation fails if AVX is not enabled, for two reasons. First, the source files that use AVX are not excluded from the list of files to be compiled. Second, there is no macro guard that stops the rest of the code from using them when AVX is not enabled.

This change excludes files containing Avx in their name from the list of files to be compiled, which fixes the compiler failure. It also adds a macro so that code from those files is not used once they are excluded, which fixes the linker failure that inevitably follows the first change.
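
The second half of the fix, the macro guard, can be sketched roughly as follows. This is only an illustration of the pattern: the macro `EXAMPLE_ENABLE_AVX2` and every function name here are hypothetical, since the actual identifiers added by this PR are not shown in the conversation.

```cpp
// Hypothetical names throughout; not the identifiers used by the PR.
// The build script is assumed to pass -DEXAMPLE_ENABLE_AVX2 only when the
// AVX source files are part of the build.

enum class SolverKind { Scalar, Avx2 };

#ifdef EXAMPLE_ENABLE_AVX2
// Defined in an AVX-only source file, which is excluded when AVX is off.
void SolveJointsAvx2(float* forces, int jointCount);
#endif

// Portable fallback that is always compiled.
inline void SolveJointsScalar(float* forces, int jointCount)
{
    for (int i = 0; i < jointCount; ++i) {
        forces[i] *= 0.5f; // placeholder work, not a real solver step
    }
}

// Reports which path the guard selects at compile time.
constexpr SolverKind ActiveSolver()
{
#ifdef EXAMPLE_ENABLE_AVX2
    return SolverKind::Avx2;
#else
    return SolverKind::Scalar;
#endif
}

// Single entry point: the AVX symbol is only referenced when it exists,
// so the linker never looks for code from an excluded file.
inline void SolveJoints(float* forces, int jointCount)
{
#ifdef EXAMPLE_ENABLE_AVX2
    SolveJointsAvx2(forces, jointCount);
#else
    SolveJointsScalar(forces, jointCount);
#endif
}
```

Without the guard, excluding the AVX file alone would leave an unresolved reference to `SolveJointsAvx2`, which is the linker failure described above.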

@JulioJerez JulioJerez merged commit d467189 into MADEAPPS:master Aug 4, 2021
@JulioJerez
Contributor

merged:
This was a good one, thanks.

Any reason why you would not want AVX2 on a GCC system?
The AVX2 solver is hugely faster than all the other solvers because it uses the 8-way SIMD lanes as 8 independent cores.
AVX2 provides instructions called gather and scatter that allow a register to be used as if it were a GPU multiprocessor.
They are slow in AVX2, but they are still faster than the equivalent version written in C code.

There is an overhead for transposing the data each tick, but this overhead is linear, so it is no worse than the number of iterations for calculating the joint forces. But with that overhead and the emulated gather and scatter, the SSE SoA version of the solver can only solve 4 joints per call. That makes the SSE SoA solver only marginally faster than the scalar solver, so it is there as a reference to start from when making new solvers.
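
To make the per-tick transpose concrete, here is a minimal sketch in plain C++ (no intrinsics) of turning array-of-structures joint data into 8-lane structure-of-arrays batches and copying results back. The struct layout, field names, and function names are all assumptions made for illustration; the engine's real data layout is not shown in this thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-joint data in array-of-structures (AoS) form.
struct JointAoS {
    float stiffness;
    float damping;
    float force;
};

constexpr std::size_t kLanes = 8; // floats in one 256-bit register

// One structure-of-arrays (SoA) batch: each field holds kLanes joints.
struct JointBatchSoA {
    std::array<float, kLanes> stiffness{};
    std::array<float, kLanes> damping{};
    std::array<float, kLanes> force{};
};

// Transposes AoS joints into SoA batches in one linear pass.
// Unused lanes in the last batch stay zero-initialized.
inline std::vector<JointBatchSoA> TransposeToSoA(const std::vector<JointAoS>& joints)
{
    std::vector<JointBatchSoA> batches((joints.size() + kLanes - 1) / kLanes);
    for (std::size_t i = 0; i < joints.size(); ++i) {
        JointBatchSoA& batch = batches[i / kLanes];
        const std::size_t lane = i % kLanes;
        batch.stiffness[lane] = joints[i].stiffness;
        batch.damping[lane] = joints[i].damping;
        batch.force[lane] = joints[i].force;
    }
    return batches;
}

// Copies solved forces from the SoA batches back to the AoS joints.
inline void ScatterForces(const std::vector<JointBatchSoA>& batches,
                          std::vector<JointAoS>& joints)
{
    assert(batches.size() * kLanes >= joints.size());
    for (std::size_t i = 0; i < joints.size(); ++i) {
        joints[i].force = batches[i / kLanes].force[i % kLanes];
    }
}
```

Both passes touch each joint once, which is the linear cost mentioned above; a SIMD solver would then process one whole batch per call.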

For the AVX2 solver the number of joints per call is 8, so when solving a large number of joints it is much faster than the others, almost twice as fast.

So if the engine has to solve, say, 2000 joints, the scalar solver will make 2000 * number of iterations calls.
The SSE SoA solver will make 2000 / 4 * number of iterations = 500 * number of iterations calls (but the transpose overhead is significant).
The AVX2 solver will make 2000 / 8 * number of iterations = 250 * number of iterations calls (now we see a substantial performance gain).
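
The call counts above can be reproduced with a small helper. This is a sketch under two assumptions that are not stated in the thread: the function name `SolverCalls` is made up, and a partially filled final batch is assumed to still cost one call (rounding up).

```cpp
#include <cstdint>

// Solver calls per tick when each call handles `lanes` joints.
// Assumption: a partially filled last batch still costs one call.
constexpr std::int64_t SolverCalls(std::int64_t joints,
                                   std::int64_t lanes,
                                   std::int64_t iterations)
{
    return ((joints + lanes - 1) / lanes) * iterations;
}

// Lane counts taken from the discussion: scalar = 1, SSE SoA = 4, AVX2 = 8.
constexpr std::int64_t kScalarLanes = 1;
constexpr std::int64_t kSseLanes = 4;
constexpr std::int64_t kAvx2Lanes = 8;
```

With 2000 joints and one iteration this gives 2000, 500, and 250 calls for the three lane counts. It counts calls only and says nothing about the transpose overhead or the cost of each call.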

I can only imagine what AVX-512 would do, since it has even more powerful swizzle and gather instructions.

Anyway, thanks for this patch.

@iSLC
Contributor Author

iSLC commented Aug 4, 2021

merged:
This was a good one, thanks.

Any reason why you would not want AVX2 on a GCC system?

There is absolutely no reason. I just happened to compile using the default options, which leave AVX2 OFF, and the build failed. I thought I should address that because, for example, you wouldn't have AVX on ARM or some other platforms, and it would make sense to be able to build in generic mode.

@JulioJerez
Contributor

JulioJerez commented Aug 4, 2021 via email
