Better Vectorization for GCC #7
Comments
How about even generating intrinsics calls? That way it will always work. That would be something I would like to play with.
Oh, that's what you meant, so, yeah, would like to look into it, if I find time...
The trouble with intrinsics is that they're tied to the architecture, so it seems messy to get something that works in general. Although, looking at https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html, they may be generic enough that all you need to specify is the width?
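To make the vector-extensions idea concrete, here is a minimal sketch, assuming GCC (or Clang, which accepts the same attribute). It is not taken from Blitz++; the names `v4d` and `add_arrays` are hypothetical. The only architecture-related choice is the vector width in bytes passed to `vector_size`; the compiler lowers the `+` to whatever SIMD instructions the target offers, or to scalar code if it offers none.

```cpp
#include <cstddef>
#include <cstring>

// GCC vector extension: a 32-byte vector, i.e. 4 doubles.
// The width is the only thing we specify; no architecture-specific intrinsics.
typedef double v4d __attribute__((vector_size(32)));

// Hypothetical helper (not part of Blitz++): out[i] = a[i] + b[i].
void add_arrays(const double* a, const double* b, double* out, std::size_t n) {
    constexpr std::size_t W = sizeof(v4d) / sizeof(double);  // lanes per vector
    std::size_t i = 0;
    for (; i + W <= n; i += W) {
        v4d va, vb;
        // memcpy avoids alignment and aliasing assumptions about the inputs.
        std::memcpy(&va, a + i, sizeof va);
        std::memcpy(&vb, b + i, sizeof vb);
        v4d vr = va + vb;  // element-wise add on all lanes
        std::memcpy(out + i, &vr, sizeof vr);
    }
    // Scalar tail for the elements that do not fill a whole vector.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Calling `add_arrays(a, b, out, 6)` processes the first four elements as one vector and the last two in the scalar tail. Whether this beats the compiler's own auto-vectorization would have to be measured per target.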
I think it should be possible to write specializations for the most important operations and architectures.
A likely relevant discussion about an attempt to replace every "#pragma ivdep" (intended for the Intel C++ compiler) with the corresponding "#pragma GCC ivdep" in Blitz++:
Patrik Jonsson wrote:
One more thing that can be added to the improvements list is better handling of vectorization. The last big update to blitz was when I added support for vectorization by making it more obvious to the compiler when arrays were contiguous. However, this relies on the compiler to do the actual vectorization. The Intel compiler was quite good at this, but as far as I remember gcc does not vectorize loops at all. Since the majority of users probably use gcc, this is a substantial disadvantage. If someone wanted to look into ways to add explicitly vectorized operations, that would greatly improve blitz's performance under gcc, I think. That does require diving deep into the guts of the expression template mechanism, though.
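For reference, a minimal sketch of what relying on the compiler looks like with the GCC spelling of the pragma, `#pragma GCC ivdep` (available since GCC 4.9). The function `scaled_add` is hypothetical and not part of Blitz++. The pragma asserts that the loop has no loop-carried dependencies, so the vectorizer does not need to add runtime overlap checks for `out`, `x`, and `y`.

```cpp
#include <cstddef>

// Hypothetical kernel (not part of Blitz++): out[i] = alpha * x[i] + y[i].
// The caller must guarantee that out does not overlap x or y; the pragma
// is a promise to the compiler, not something it verifies.
void scaled_add(double* out, const double* x, const double* y,
                double alpha, std::size_t n) {
#pragma GCC ivdep
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = alpha * x[i] + y[i];
    }
}
```

GCC's loop vectorizer is controlled by `-ftree-vectorize`, which `-O3` enables, and `-fopt-info-vec` reports which loops were vectorized, so it is possible to check whether a given Blitz++ loop actually benefits.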