Provide (user code) switchable vectorization support #11719
I instantly thought what you thought as well: "This is a bad idea because then the notion of what …"
@bangerth The main issue is really that having a default argument for the VectorizedArray template argument might lead to situations where one might miss instantiating code. Differing default arguments might also lead to API issues. But I think that with the latest change allowing us to "downgrade" vectorization (e.g., only using AVX2 instead of AVX-512 even though it is configured), we would not be facing any ABI issues: after all, the SIMD width is a template argument, so it is part of the API (and thus the ABI).

I just want to point out that I think this is a real issue here. We have gone to considerable effort and length to reliably package deal.II for Debian/Ubuntu and other Linux distributions, and I want binary distributions of the library to be "first class" citizens in terms of features as well.
This is quite an interesting topic.
This sounds reasonable, but I am not sure that the issue is limited to that. Furthermore, we are using

dealii/include/deal.II/numerics/vector_tools_project.templates.h, lines 202 to 203 in 79b124c

where we implicitly select the highest ISA. What would be the solution here?
I agree with the general outline regarding the steps. Overall, we are in a much better state than two years ago thanks to #8342 and some follow-up work we did there, because we support other vectorization variants in our ABI.
I think the problem is not so much the vectorization width (as that can be solved, as suggested by @tamiko above, by making sure all visible interfaces are instantiated appropriately), but rather the instruction-set support in general that we can expect: if we compile the deal.II library for an AVX-512 target, we cannot use the library on a machine without AVX-512 support, even without the code in question.

Of course, we could find some intermediate approach: if we ship different binaries, each with support for a specific instruction set (say, an AVX-512 and an AVX2 target, which should cover most Intel/AMD machines sold today), we could then allow users to go below the vectorization width compiled into deal.II in their codes and simply pick their favorite width.
That should not be a problem, as long as the user's CPU understands the instruction-set extension, because it does not leak to the outside world here; what we need to be careful about is that we instantiate all widths so that a user entering with a different default for the width gets valid code.
Just to give an example: we template even these classes here, which are pure consumers of …
I think what @tamiko has in mind is compiling the library with SSE2 (the minimal supported instruction set on x86_64) but allowing the user to use higher vectorization levels. I wonder whether that would make a substantial difference in practice. I am sure that some percentage of time is actually spent in user code and in inlined functions that would benefit from it, principally in the assembly of matrices or matrix-free operators, but surely a good amount of time is spent in library functions that would not benefit from this.

The idea of variants could be implemented either by the compiler (very expensive, because everything would have to be compiled more than once) or in the form of building multiple shared libraries that contain all code, and then …
Ah, that would mean we create a …
This involves more time than we have at the moment. Let's postpone this and revisit the issue during the workshop!
@tamiko @kronbichler I think you talked about this topic recently. Do we have a plan here? |
Our current configuration logic for SIMD vectorization support is as follows:

- We record the vectorization support detected at configure time in `DEAL_II_VECTORIZATION_WIDTH_IN_BITS` (the successor of `DEAL_II_VECTORIZATION_LEVEL`).
- In `base/vectorization.h`, if user code is compiled with a lower degree of vectorization support we throw an error.

Now all of this works really well when you compile the library and your application code for a specific hardware target, say a laptop/desktop, a small compute server, or a cluster. Unfortunately, it doesn't work that well for the binary deal.II library we ship with Debian/Ubuntu. Here, we have to compile the library in a very generic configuration so that it can run on all supported machines for an architecture.
It would be very nice to allow for "dynamically" chosen (i.e., at user-project configuration/compilation time) vectorization support, so that all binary-distributed deal.II versions would become "first class" vectorization citizens as well.
Achieving this should not be too difficult in principle; after all, most of our explicitly vectorized code is templated and compiled in user code. An approach might be to

- set `DEAL_II_VECTORIZATION_WIDTH_IN_BITS` dynamically in `config.h` depending on current vectorization support, and
- make `VectorizedArray` depend on that.

This has one drawback, though: compilation units might have a different notion of `VectorizedArray` depending on compiler flags.

Or am I having a fever dream and do we already support this by just using `VectorizedArray` with an explicit template width?

Opinions?
@kronbichler ping