Export `-march=native` for Clang and prevent it from being included during cross compilation. #2416

SergioRAgostinho · 2018-09-06T22:41:15Z

Closes: #2256

Tried it on OS X and its adding the flag and the flag is being passed to the downstream target (I applied it after #2100). I also ran #2013 and it seems fine.

Rebase Merge after #2100 i.e., just consider the last commit.

svenevs · 2018-09-09T10:16:06Z

Hey I'm sorry I'm confused. Am I supposed to be testing this as-is or waiting for a rebase now that #2100 is merged? I was waiting for the rebase but now I'm not sure...it's 3:15am for me right now, got plans in the morning but i should be able to check this out tomorrow afternoon (~12-16 hrs from now).

SergioRAgostinho · 2018-09-09T11:59:41Z

Sorry for not being clear. The rebase merge comment was more for Sergey in case he wanted to merge this following #2100, since it includes commits from that PR. Now that #2100 is merged, I'll do the rebase manually.

Regardless of that and for future reference, you can always test it as is. The rebasing operations are usually just to have a clean history before the merge commits.

svenevs · 2018-09-10T03:34:35Z

Looks good to me!!!

Building PCL Itself

(at very endish there's a -march=native)

[1/407] /opt/spack/opt/spack/linux-fedora27-x86_64/gcc-7.3.1/llvm-6.0.1-ls7mvjhlg4qn37rp7p5v55ybnttpjtls/bin/clang++  -DPCLAPI_EXPORTS -DQT_CONCURRENT_LIB -DQT_CORE_LIB -DQT_GUI_LIB -DQT_OPENGL_LIB -DQT_WIDGETS_LIB -Dqh_QHpointer -I/opt/spack/opt/spack/linux-fedora27-x86_64/clang-6.0.1/flann-1.9.1-xbbwnltjq34m4oyd75g7zy7i3reams2w/include -I../recognition/include/pcl/recognition/3rdparty -I/opt/spack/opt/spack/linux-fedora27-x86_64/clang-6.0.1/libpng-1.6.34-ncs3zqm3niyngbtnrkzsh4rgkjuxbehy/include -I/opt/spack/opt/spack/linux-fedora27-x86_64/clang-6.0.1/qhull-2015.2-mf4i3x32ay5qm4ciulpkslycqay7sjfo/include -I/usr/include/qt5 -I/usr/include/qt5/QtCore -I/usr/lib64/qt5/./mkspecs/linux-g++ -I/usr/include/qt5/QtGui -I/usr/include/qt5/QtWidgets -I/usr/include/qt5/QtConcurrent -I/usr/include/qt5/QtOpenGL -Iinclude -I../common/include -isystem /opt/spack/opt/spack/linux-fedora27-x86_64/clang-6.0.1/eigen-3.3.4-bfkdaij4s2kp4xnhyet2umn5ontrxjzj/include/eigen3 -isystem /opt/spack/opt/spack/linux-fedora27-x86_64/clang-6.0.1/boost-1.68.0-4plhxwpwc7rtsia6u3jc6cs6y4cbcb4t/include -ftemplate-depth=1024 -Qunused-arguments -Wno-invalid-offsetof -march=native -msse4.2 -mfpmath=sse -fopenmp=libomp      -O3 -DNDEBUG -fPIC   -DBOOST_DISABLE_ASSERTS -DEIGEN_NO_DEBUG -MD -MT common/CMakeFiles/pcl_common.dir/src/correspondence.cpp.o -MF common/CMakeFiles/pcl_common.dir/src/correspondence.cpp.o.d -o common/CMakeFiles/pcl_common.dir/src/correspondence.cpp.o -c ../common/src/correspondence.cpp

AKA -march=native with clang@6.0.1 got in there as desired 💥

FWIW, it shouldn't matter too much, but this was with the following configuration:

$ cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/pcl/clang/6.0.1 -DWITH_CUDA=OFF -DWITH_VTK=OFF

Compilers verified:

gcc@7.3.1
clang@5.0.0
clang@6.0.1

Using PCL From CMake

I didn't actually create / test a project yet, but it should work out just fine.

sven:/opt/pcl> grep march gcc/7.3.1/share/pcl-1.8/PCLConfig.cmake 
gcc/7.3.1/share/pcl-1.8/PCLConfig.cmake:437:list(APPEND PCL_COMPILE_OPTIONS -march=native;-msse4.2;-mfpmath=sse)
sven:/opt/pcl> grep march clang/6.0.1/share/pcl-1.8/PCLConfig.cmake 
clang/6.0.1/share/pcl-1.8/PCLConfig.cmake:437:list(APPEND PCL_COMPILE_OPTIONS -march=native;-msse4.2;-mfpmath=sse)
sven:/opt/pcl> grep march clang/5.0.0/share/pcl-1.8/PCLConfig.cmake 
clang/5.0.0/share/pcl-1.8/PCLConfig.cmake:437:list(APPEND PCL_COMPILE_OPTIONS -march=native;-msse4.2;-mfpmath=sse)

AKA this should definitely be working as expected.

I am fixing compilation issues with intel right now, fixed the deprecated macros, working on mm_malloc right now, then the only other thing I'm gonna add is -xHost when Intel (or the Windows version if windows and intel). I expect to have this PR finished before I go to bed ;)

taketwo · 2018-09-10T13:34:35Z

Honestly, I don't think we need to check for GCC version being greater than 4.2. It is more than a decade old! And the oldest active Ubuntu LTS (which is 12.04) has 4.8. Thus I would propose to drop versions testing and reduce the script to:

if(CMAKE_COMPILER_IS_GNUCC OR CMAKE_COMPILER_IS_GNUCXX OR CMAKE_COMPILER_IS_CLANG)
    list(APPEND SSE_FLAGS "-march=native")
    message(STATUS "Using CPU native flags for SSE optimization: ${SSE_FLAGS}")
endif()

SergioRAgostinho · 2018-09-10T13:36:50Z

👍 I'll modify it tonight.

svenevs · 2018-09-10T16:02:37Z

In that event, I will open an alternate PR. The Intel stuff is ummm...despicable. They have some really inconvenient / arbitrary compiler flag rules.

In addition to fixing intel compilation, I also added AVX and AVX2 checks. Posting so you know it's inbound, I'll remove the GCC 4.2 check in that PR. I'm just waiting to check on all the compilers again to make sure it's legit (intel's compiler is actually really slow...).

I think you all will be pleased with the result though, it looks like a big diff but really not IMO. I should have participated in #2100 but I was trying to not get sucked in lol.

taketwo · 2018-09-10T16:13:39Z

In addition to fixing intel compilation, I also added AVX and AVX2 checks.

I suppose you are talking about those check_cxx_source_runs and setting HAVE_*_EXTENSIONS variables? Actually, do you have any idea why these are needed at all?

svenevs · 2018-09-10T16:22:49Z

I suppose you are talking about those check_cxx_source_runs and setting HAVE_*_EXTENSIONS variables?

Yup.

Actually, do you have any idea why these are needed at all?

Only a very fuzzy picture. Here's how I understand the gcc docs, inlining some for convenience with some emphasis

-march=cpu-type

Generate instructions for the machine type cpu-type. In contrast to -mtune=cpu-type, which merely tunes the generated code for the specified cpu-type, -march=cpu-type allows GCC to generate code that may not run at all on processors other than the one indicated. Specifying -march=cpu-type implies -mtune=cpu-type.

In contrast, at the bottom the giant listing of all the possible -m's (I refuse to do AVX512 btw, it's a nightmare).

These switches enable the use of instructions in the MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, ........

So while you may think that -march=native, it's not really a guarantee, so checking that the capabilities exist is still a good idea in my opinion. This blog post talks about some of the oddities.

To be clear, I'm not an expert, and definitely not 100%. I think adding -march=native -mavx2 won't hurt though, it covers when -march=native may not do what you want (???)

On the other hand, now that I take a second look, these also need to be guarded by if( NOT CMAKE_CROSSCOMPILING) don't they? There's no reason the target machine would have those.

svenevs · 2018-09-10T16:34:58Z

Sorry for double response. I forgot the most important. There's no -march=native for visual studio (though they do support /arch:AVX now), and the Intel one is actually only for Intel CPUs (can't use the -xHost flag for an AMD CPU).

After retesting the Linux ones I will test visual studio and this specific module, but I won't actually be testing building PCL itself on Windows.

taketwo · 2018-09-10T17:49:50Z

So while you may think that -march=native, it's not really a guarantee, so checking that the capabilities exist is still a good idea in my opinion. This blog post talks about some of the oddities.

Thanks for the post, interesting reading. Did you also check comments? This one provides a very reasonable explanation why the oddities that the author talks about actually make sense. That said, I don't think this post is relevant to our discussion since it's about -mtune :)

I think adding -march=native -mavx2 won't hurt though, it covers when -march=native may not do what you want (???)

GCC docs claim that using -march=native enables all instruction subsets supported by the local machine. It indeed does not hurt to add -mavx2 explicitly, but why would we do that? Are you aware of any cases when -march=native does not do what you want? (In terms of enabling instruction sets.)

On the other hand, now that I take a second look, these also need to be guarded by if( NOT CMAKE_CROSSCOMPILING) don't they? There's no reason the target machine would have those.

Certainly.

There's no -march=native for visual studio

I'm no Visual Studio user, but I think I've heard that -march=nativeish is the default behavior.

svenevs · 2018-09-10T18:11:04Z

Ok I proposed #2426 in a way that the changes to the file being discussed here can be blown away (kill the last commit) in favor of something less complex.

Yeah I choose to pretend -mtune does not exist xD

I will test and see what's going on with Windows. Apparently it used to be /arch:SSE2 for a long time as the default, but then I've found other things mentioning that /arch:SSE2 is invalid on a 64bit Visual Studio build (as indicated by the linked docs in the PR, 64bit only allows /arch:AVX and /arch:AVX2).

Maybe the best approach here is to have @SergioRAgostinho finish of this PR with the elimination of ancient checks #2416 (comment)

I could then open a different PR with just the first two commits there so that Intel officially works in the next release, and more time can be spent deciding wtf is actually going on with these darn flags. I'm stepping away for this for at least 24 hours because I'm starting to get too turned around. I will test Windows tomorrow. Sorry for stream-of-consciousness-posting here.......I'm completely frazzled

taketwo · 2018-09-10T18:31:56Z

Thanks for all the effort. I feel like merging this and then Intel parts of your PR would be the best plan for this release.

After that we can think about all the questions that have popped up and decide on the plan. Your PCL_SSE_COMPILER_ARCH_FLAG function will be a great starting point, maybe we won't need to do much more. (By the way, we may check how OpenCV deals with architecture issues, I expect them to have an advanced state-of-the-art CMake script.)

svenevs · 2018-09-10T18:46:26Z

Sounds good. I'll open a new PR with those two commits after this merges 👍

I'll take a closer look at theirs, it's scary. But I really need to solve this once and for all for my own library so PCL will benefit too (I stole a lot of code from here so I'm paying down my debts lol).

taketwo · 2018-09-10T18:50:27Z

I'll take a closer look at theirs, it's scary

Some of their stuff may be too advanced (for example, we don't have runtime dispatching). But we can certainly get inspired and borrow some functions/snippets.

I'm paying down my debts lol

Very welcome :)

SergioRAgostinho · 2018-09-10T21:23:41Z

In addition to fixing intel compilation, I also added AVX and AVX2 checks. Posting so you know it's inbound, I'll remove the GCC 4.2 check in that PR. I'm just waiting to check on all the compilers again to make sure it's legit (intel's compiler is actually really slow...).

It feels like this one was superseeded by #2426 correct?

taketwo · 2018-09-10T21:25:29Z

No:

Thanks for all the effort. I feel like merging this and then Intel parts of your PR would be the best plan for this release.

SergioRAgostinho · 2018-09-10T21:26:40Z

Ok. Is there anything preventing this one from being merged?

taketwo · 2018-09-10T21:32:39Z

Yes:

+1 I'll modify it tonight.

during cross compilation.

SergioRAgostinho · 2018-09-10T21:42:36Z

That should be it I believe. Travis seems to have died.

taketwo · 2018-09-10T23:56:09Z

Travis is alright, but GitHub does not show it for some reason.

svenevs · 2018-09-11T05:53:39Z

@SergioRAgostinho sorry for adding all the noise / making it confusing...

@taketwo I cherry-picked the two fixes needed for Intel unrelated to vectorization stuff in #2432 .

SergioRAgostinho added the module: cmake label Sep 6, 2018

SergioRAgostinho added this to the pcl-1.9.0 milestone Sep 6, 2018

SergioRAgostinho mentioned this pull request Sep 7, 2018

Fix SSE flags for Clang / AppleClang and Cross Compile #2256

Closed

SergioRAgostinho force-pushed the clang-native branch from 6fbe8a6 to 54b90da Compare September 9, 2018 12:02

Export -march=native for Clang and prevent it from being included

c71d2ce

during cross compilation.

SergioRAgostinho force-pushed the clang-native branch from 54b90da to c71d2ce Compare September 10, 2018 21:40

taketwo merged commit 88a7adc into PointCloudLibrary:master Sep 10, 2018

SergioRAgostinho deleted the clang-native branch September 11, 2018 10:06

SergioRAgostinho mentioned this pull request Sep 25, 2018

Change Log generation tool. Automates change log generation. #2396

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Export `-march=native` for Clang and prevent it from being included during cross compilation. #2416

Export `-march=native` for Clang and prevent it from being included during cross compilation. #2416

SergioRAgostinho commented Sep 6, 2018

svenevs commented Sep 9, 2018

SergioRAgostinho commented Sep 9, 2018 •

edited

Loading

svenevs commented Sep 10, 2018

taketwo commented Sep 10, 2018

SergioRAgostinho commented Sep 10, 2018

svenevs commented Sep 10, 2018

taketwo commented Sep 10, 2018

svenevs commented Sep 10, 2018

svenevs commented Sep 10, 2018

taketwo commented Sep 10, 2018

svenevs commented Sep 10, 2018

taketwo commented Sep 10, 2018

svenevs commented Sep 10, 2018

taketwo commented Sep 10, 2018

SergioRAgostinho commented Sep 10, 2018

taketwo commented Sep 10, 2018

SergioRAgostinho commented Sep 10, 2018

taketwo commented Sep 10, 2018

SergioRAgostinho commented Sep 10, 2018

taketwo commented Sep 10, 2018

svenevs commented Sep 11, 2018

Export -march=native for Clang and prevent it from being included during cross compilation. #2416

Export -march=native for Clang and prevent it from being included during cross compilation. #2416

Conversation

SergioRAgostinho commented Sep 6, 2018

svenevs commented Sep 9, 2018

SergioRAgostinho commented Sep 9, 2018 • edited Loading

svenevs commented Sep 10, 2018

Building PCL Itself

Using PCL From CMake

taketwo commented Sep 10, 2018

SergioRAgostinho commented Sep 10, 2018

svenevs commented Sep 10, 2018

taketwo commented Sep 10, 2018

svenevs commented Sep 10, 2018

svenevs commented Sep 10, 2018

taketwo commented Sep 10, 2018

svenevs commented Sep 10, 2018

taketwo commented Sep 10, 2018

svenevs commented Sep 10, 2018

taketwo commented Sep 10, 2018

SergioRAgostinho commented Sep 10, 2018

taketwo commented Sep 10, 2018

SergioRAgostinho commented Sep 10, 2018

taketwo commented Sep 10, 2018

SergioRAgostinho commented Sep 10, 2018

taketwo commented Sep 10, 2018

svenevs commented Sep 11, 2018

Export `-march=native` for Clang and prevent it from being included during cross compilation. #2416

Export `-march=native` for Clang and prevent it from being included during cross compilation. #2416

SergioRAgostinho commented Sep 9, 2018 •

edited

Loading