VectorizedArray<float,2> does not exist. #16827

Closed
bangerth opened this issue Apr 1, 2024 · 6 comments

@bangerth
Member

bangerth commented Apr 1, 2024

The way I understand VectorizedArrayWidthSpecifier<T>::max_width to work is this: if, for example, a type T has a 4-element vector type but nothing larger, then max_width is set to 4, which implies that a vectorized type also exists for every smaller power of two. This is true for VectorizedArrayWidthSpecifier<double> as far as I can see: for some instruction sets, max_width is 4, but there are also vectorized arrays of width 2.

Not so for float: VectorizedArray<float,2> does not exist, regardless of instruction set. The problem is that only being able to query the max width leaves no way to figure out whether for width 2 a vectorized type actually exists.

How do we address this? Is it an oversight that that class was never implemented? Or do vector intrinsics just not exist for this case?

In reference to #16465 .

@tamiko
Member

tamiko commented Apr 1, 2024

On x86, the smallest vector instruction set available is SSE, which uses 128-bit registers that hold 4 floats (or 2 doubles).
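The lane counts follow directly from dividing the register width by the size of the scalar type. A minimal sketch of that arithmetic, assuming 32-bit float and 64-bit double as on x86; `lanes_per_register` is a hypothetical helper for illustration, not part of deal.II:

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper (not deal.II API): number of values of type T that
// fit into one SIMD register of the given width in bits.
template <typename T>
constexpr std::size_t lanes_per_register(const std::size_t register_bits)
{
  return register_bits / (sizeof(T) * CHAR_BIT);
}

// Assumes sizeof(float) == 4 and sizeof(double) == 8, as on x86.
static_assert(lanes_per_register<float>(128) == 4, "SSE holds 4 floats");
static_assert(lanes_per_register<double>(128) == 2, "SSE holds 2 doubles");
```

With a one-to-one mapping between array and register, a 2-float array would fill only half of the smallest available register, which is why no such type is provided.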

We could create such a truncated data type (i.e. only use 2 out of 4 packed floats), but on the other hand, we chose a one-to-one mapping between VectorizedArray and the underlying (register) storage so far.

On a related note, @kronbichler discussed some plans for making the larger VectorizedArray units available unconditionally by piecing together smaller packed doubles/floats on architectures that don't support full 512-bit (or 256-bit) registers. Not to mention the freshly introduced 267bit registers on Macs.

@bangerth
Member Author

bangerth commented Apr 1, 2024

Ah, bummer. I'll disable vectorization for the VectorizedArray<float,2> case for now, then. This seems to be an unusual case anyway, where it is perhaps not worth doubling memory consumption.

@bangerth bangerth added this to the Release 9.6 milestone Apr 2, 2024
@tamiko
Member

tamiko commented Apr 2, 2024

@bangerth How about we close this as not planned?

@bangerth
Member Author

bangerth commented Apr 2, 2024

Yes, that's reasonable. I wish there were a way to test whether a specific specialization exists, rather than just asking what the largest one is, but I can work with the current state.

@bangerth bangerth closed this as completed Apr 2, 2024
@tamiko
Member

tamiko commented Apr 2, 2024

@bangerth Would it help to manually specify all available variants in vectorization.h as a constexpr variable, so that one can branch on it with if constexpr (...)?

I mean, we are already defining all overloads there, so we could also add specializations of a type trait:

// primary template: by default, no vectorized type of this width exists
template <typename T, std::size_t size>
struct VectorizationAvailable {
  static constexpr bool value = false;
};

// and within specific `#ifdef` clauses:
template <> struct VectorizationAvailable<double, 4> {
  static constexpr bool value = true;
};

// etc.

That way you can simply select particular implementations with if constexpr (...) { } else if constexpr (...) { }.
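A slightly fuller sketch of this idea, written so that it compiles on its own with C++17. Everything here is an assumption for illustration: the trait name follows the proposal above, `usable_width` is a hypothetical helper, and the `__SSE2__` guard stands in for whatever `#ifdef` clauses vectorization.h would actually use. None of this is existing deal.II API.

```cpp
#include <cstddef>

// Hypothetical trait (not deal.II API): false unless a specialization
// below says that a vectorized type of this width exists.
template <typename T, std::size_t width>
struct VectorizationAvailable
{
  static constexpr bool value = false;
};

// Width 1 is the plain scalar fallback and is always available.
template <typename T>
struct VectorizationAvailable<T, 1>
{
  static constexpr bool value = true;
};

// Assumed guard: GCC and Clang define __SSE2__ when SSE2 is enabled.
#if defined(__SSE2__)
template <>
struct VectorizationAvailable<double, 2>
{
  static constexpr bool value = true;
};

template <>
struct VectorizationAvailable<float, 4>
{
  static constexpr bool value = true;
};
#endif

// Hypothetical helper: use the preferred width if a vectorized type
// exists for it, and fall back to the scalar width 1 otherwise.
template <typename T, std::size_t preferred_width>
constexpr std::size_t usable_width()
{
  if constexpr (VectorizationAvailable<T, preferred_width>::value)
    return preferred_width;
  else
    return 1;
}

// No specialization for <float, 2> exists on any platform in this sketch,
// so a caller asking for it always gets the scalar fallback.
static_assert(usable_width<float, 2>() == 1, "float x 2 is not available");
```

The point of the design is that the question "does width N exist for type T?" is answered per specialization, instead of being inferred from the maximum width.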

@bangerth
Member Author

bangerth commented Apr 2, 2024

Yes, something like that would work.

As I said, I need it in one place. It's perhaps not worth inventing heavy machinery just for that case. We can consider this if we ever need it again.
