Add proposed alignment specifier #33

Steve132

This patch demonstrates a proposal to add a new alignment tag that makes alignment safer and more user-friendly across platforms and facilitates generic code.

Specifically, when specifying alignment, the current options are only overaligned<N>, vector_aligned, or element_aligned.

The difficulty is that code using these tags cannot be written correctly in a cross-platform way. For example, take the following code:

float* data=(float*)aligned_alloc(16,64*sizeof(float)); //aligned_alloc takes (alignment, size); 64 floats is an illustrative size
stdx::native_simd<float> vec(data,stdx::vector_aligned);

This code is correct only on NEON and SSE. It is incorrect on AVX and AVX-512, whose native-size vectors require 32-byte and 64-byte alignment respectively.
Fixing it is impossible without changing the allocation code:

float* data=(float*)aligned_alloc(stdx::memory_alignment_v<stdx::native_simd<float>>,64*sizeof(float)); //64 floats is an illustrative size
stdx::native_simd<float> vec(data,stdx::vector_aligned);

But changing the allocation is not always possible, for example when the data buffer is allocated by a library.

Furthermore, with simd::copy_to there is no cross-platform way to write back to a variable defined on the stack:

std::array<float,64> get_vecdata(....){
	....
	std::array<float,64> output; //this is typically 16-byte aligned by default
	for(int i=0;i<64;i+=stdx::native_simd<float>::size())
	{
		stdx::native_simd<float> vec_result=...;
		vec_result.copy_to(&output[i],stdx::vector_aligned); //correct on NEON, SSE2, incorrect on AVX
		vec_result.copy_to(&output[i],stdx::element_aligned); //correct on all platforms, but slow.
	}
	return output;
}

Similarly, reading from a stack variable fails as well:

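float read_vecdata(const std::array<float,64>& data)
{
	stdx::native_simd<float> vec_result(&data[0],stdx::vector_aligned); //is it aligned on this platform? Maybe!
	return stdx::reduce(vec_result); //placeholder use of the loaded vector so the function returns a value
}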
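
The "Maybe!" above can at least be turned into a debug-time check. A minimal sketch, assuming a hypothetical helper named is_aligned (it is not part of stdx or of this patch):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of stdx: true when the address held by p
// is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

An assert(is_aligned(&data[0], 32)) before a vector_aligned load would catch the AVX case in debug builds, but it only detects the problem at run time; it does not choose the right load for you.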

So writing code that is correct on all platforms becomes impossible, and it is the programmer's responsibility to know which alignment requirements are satisfied on every target.
This might involve writing lots of ifdefs, just dropping back to the slow case, or resorting to template metaprogramming,
which is exactly the kind of code std::simd is supposed to prevent!
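
To make the ifdef workaround concrete, here is a rough sketch of what client code ends up writing today. The names native_float_alignment and vector_aligned_is_safe are hypothetical, and the 16-byte fallback is an assumption for SSE2/NEON-class targets; __AVX__ and __AVX512F__ are the usual GCC/Clang predefined macros.

```cpp
#include <cstddef>

// Hand-maintained table of native vector alignments, one entry per target.
#if defined(__AVX512F__)
constexpr std::size_t native_float_alignment = 64;
#elif defined(__AVX__)
constexpr std::size_t native_float_alignment = 32;
#else
constexpr std::size_t native_float_alignment = 16; // assumed: SSE2/NEON-class target
#endif

// Hypothetical helper: may a buffer with this guaranteed byte alignment be
// passed together with vector_aligned on the current target?
constexpr bool vector_aligned_is_safe(std::size_t buffer_alignment) {
    return buffer_alignment >= native_float_alignment;
}
```

Every new target or element type means another branch in this table, which is the maintenance burden the proposal is trying to remove.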

Writing generic code becomes even more difficult. Consider:

template<class T>
std::array<T,64> get_vecdata(....){
	....
	std::array<T,64> output; //this is typically max(16,alignof(T))-byte aligned by default
	for(int i=0;i<64;i+=stdx::native_simd<T>::size())
	{
		stdx::native_simd<T> vec_result=...;
		vec_result.copy_to(&output[i],stdx::vector_aligned);   //which platforms and types does this work on without causing an unaligned access?  Who knows!
	}
	return output;
}

This proposed patch fixes that problem. By giving the programmer a tag that states the exact alignment of the buffer, generic and cross-platform code become possible again.
Internally, the new "stdx::aligned" tag selects an aligned or an unaligned vector load at compile time, based on the target architecture, the element type of the vector being loaded, and the byte alignment passed in by the user. It also triggers a compile-time assertion if the given byte alignment isn't even element-aligned (a common mistake when working with structure types). This allows platform-agnostic, generic code that never causes a misaligned vector access, and it takes the burden of deciding what works and what doesn't off the client code.
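
The compile-time selection described above can be sketched with plain tag dispatch. This is not the code in the patch: toy_vec, aligned_tag, load_path and load are hypothetical names, the 8-lane (32-byte) vector is an assumption chosen to mimic AVX, and memcpy stands in for the real load instructions.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a native SIMD register of 8 floats (32 bytes).
struct toy_vec {
    static constexpr std::size_t lanes = 8;
    static constexpr std::size_t vector_alignment = lanes * sizeof(float);
    float lane[lanes];
};

// Hypothetical tag: N is the byte alignment the caller guarantees.
template <std::size_t N>
struct aligned_tag {
    static_assert(N != 0 && (N & (N - 1)) == 0, "alignment must be a power of two");
    static_assert(N >= alignof(float), "buffer is not even element-aligned");
    // True when the guarantee is strong enough for an aligned vector load.
    static constexpr bool use_aligned_load = N >= toy_vec::vector_alignment;
};

enum class load_path { aligned, unaligned };

struct load_result {
    toy_vec value;
    load_path path;
};

// Chosen when the guarantee covers the full vector alignment.
inline load_result load_impl(const float* p, std::true_type) {
    load_result r{};
    std::memcpy(r.value.lane, p, sizeof r.value.lane); // real code: aligned load instruction
    r.path = load_path::aligned;
    return r;
}

// Chosen otherwise; always safe for element-aligned pointers.
inline load_result load_impl(const float* p, std::false_type) {
    load_result r{};
    std::memcpy(r.value.lane, p, sizeof r.value.lane); // real code: unaligned load instruction
    r.path = load_path::unaligned;
    return r;
}

// The overload is picked from N alone, so the decision costs nothing at run time.
template <std::size_t N>
load_result load(const float* p, aligned_tag<N>) {
    return load_impl(p, std::integral_constant<bool, aligned_tag<N>::use_aligned_load>{});
}
```

With this toy 32-byte vector, load(ptr, aligned_tag<16>{}) takes the unaligned path and load(ptr, aligned_tag<32>{}) takes the aligned one, while aligned_tag<2> fails to compile for float.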

Example of reading from a pre-allocated buffer with a fixed alignment:

float* data=(float*)aligned_alloc(16,64*sizeof(float)); //aligned_alloc takes (alignment, size); 64 floats is an illustrative size
stdx::native_simd<float> vec(data,stdx::__proposal::aligned<16>);

Example of reading from a reference:

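float read_vecdata(const std::array<float,64>& data)
{
	stdx::native_simd<float> vec_result(&data[0],stdx::__proposal::aligned<alignof(data)>);
	return stdx::reduce(vec_result); //placeholder use of the loaded vector so the function returns a value
}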

Example of generic code writing back to an output:

template<class T>
std::array<T,64> get_vecdata(....){
	....
	std::array<T,64> output; //this is typically max(16,alignof(T))-byte aligned by default
	for(int i=0;i<64;i+=stdx::native_simd<T>::size())
	{
		stdx::native_simd<T> vec_result=...;
		vec_result.copy_to(&output[i],stdx::__proposal::aligned<alignof(output)>); 
	}
	return output;
}

This pull request is not meant to be merged as-is; the intent is for @mattkretz to incorporate the idea into the standards proposal and into the libstdc++ implementation.
