Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for macos universal2 (arm64+x86_64 build) #431

Merged
merged 5 commits into from
Dec 14, 2022

Conversation

t20100
Copy link
Contributor

@t20100 t20100 commented Dec 14, 2022

This PR proposes a fix to allow building c-blosc2 as part of a Python package that is built as a universal2 binary package on macos with enabled AVX2/SSE2 optimisation for the x86_64 target only, NEON for the arm target and ALTIVEC for ppc64le.

To achieve this, c-blosc2 should support defining SHUFFLE_*_ENABLED and be able to build all files (including *-avx2.c, *-sse2.c, *-neon.c and *-altivec.c) even when the compilation target does not support the given instruction set.

This is done by toggling SIMD instructions usage with both SHUFFLE_*_ENABLED and AVX2, SSE2, __ARM_NEON and ALTIVEC.

closes #430

related to Blosc/c-blosc#347
related to silx-kit/hdf5plugin#201

blosc/shuffle.h Show resolved Hide resolved
@t20100
Copy link
Contributor Author

t20100 commented Dec 14, 2022

I had trouble building on macos through Python (universal2 build using clang) with the _xgetbv function.

Commit 89f9335 fixes it for my case, but I'm really not sure it is the right thing to do.
Is this xgetbv intel only?
clang is not using #define _IMMINTRIN_H_INCLUDED but rather #define __IMMINTRIN_H in immintrin.h, is it related?

@FrancescAlted
Copy link
Member

I had trouble building on macos through Python (universal2 build using clang) with the _xgetbv function.

Commit 89f9335 fixes it for my case, but I'm really not sure it is the right thing to do. Is this xgetbv intel only?

It looks like it is, yes: https://www.felixcloutier.com/x86/xgetbv

clang is not using #define _IMMINTRIN_H_INCLUDED but rather #define __IMMINTRIN_H in immintrin.h, is it related?

Uh, not sure about this. But if that works for you, let's go with that.

@FrancescAlted
Copy link
Member

This looks promising! I'll merge soon. Thanks!

@t20100
Copy link
Contributor Author

t20100 commented Dec 14, 2022

It looks like it is, yes: https://www.felixcloutier.com/x86/xgetbv

Good! I'm not really comfortable with this kind of compilation issue.

clang is not using #define _IMMINTRIN_H_INCLUDED but rather #define __IMMINTRIN_H in immintrin.h, is it related?
Uh, not sure about this. But if that works for you, let's go with that.

It looks like it: https://github.com/llvm/llvm-project/blob/a274d62fecfc3f49065f3fcdcb9577637778e0bc/clang/lib/Headers/immintrin.h#L10-L11
Anyway, it is not the issue here.

@FrancescAlted FrancescAlted merged commit 318f799 into Blosc:main Dec 14, 2022
@FrancescAlted
Copy link
Member

It is in. I'll try to cut a release soon.

@t20100 t20100 deleted the universal2 branch December 14, 2022 17:57
@t20100
Copy link
Contributor Author

t20100 commented Dec 14, 2022

Great! Thanks a lot!

@FrancescAlted
Copy link
Member

I have been doing some profiling on a recent Intel CPU, and I discovered that latest version of C-Blosc2 do not use AVX2 anymore by default:

image

After some digging, the commit that started to not use AVX2 is: c603171. Before this, the profile was something like:

image

To reproduce the profilings:

$ perf record -F 99 -a -g -- bench/b2bench blosclz shuffle suite
$ perf report --sort comm,dso

when browsing the results, press 'e'xpand to see the details.

Any hint @t20100 ?

@t20100
Copy link
Contributor Author

t20100 commented Jan 9, 2023

From a look at the content of this PR, I don't see why c603171 would change anything... I'll test it.
BTW, did you look whether AVX2 is not compiled or is it not detected at runtime?

Without your tests, I would more suspect 89f9335.

@t20100
Copy link
Contributor Author

t20100 commented Jan 9, 2023

-mavx2|/arch:AVX2 is not set when compiling shuffle.c while __AVX2__ is used to define SHUFFLE_USE_AVX2:

#if defined(SHUFFLE_AVX2_ENABLED) && defined(__AVX2__)
#define SHUFFLE_USE_AVX2
#endif

@FrancescAlted
Copy link
Member

-mavx2|/arch:AVX2 is not set when compiling shuffle.c while __AVX2__ is used to define SHUFFLE_USE_AVX2:

#if defined(SHUFFLE_AVX2_ENABLED) && defined(__AVX2__)
#define SHUFFLE_USE_AVX2
#endif

IIRC it is not necessary to use -mavx2|/arch:AVX2 is not set when compiling shuffle.c. [xgetbv](https://www.felixcloutier.com/x86/xgetbv) was enough to detect the AVX2 presence and dispatch the correct workflow there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Build issue with universal2 on macos
2 participants