This repository has been archived by the owner on Aug 11, 2023. It is now read-only.

Segfault on TF tutorials #77

Closed
mirh opened this issue Dec 17, 2017 · 45 comments

mirh commented Dec 17, 2017

ComputeCpp Info (CE 0.5.0)
GLIBC version: 2.26
GLIBCXX: 20160609
This version of libstdc++ is supported.

Device Info:
Device 0:

  Device is supported                     : UNTESTED - Untested OS
  CL_DEVICE_NAME                          : Loveland
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 1800.11
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 
Device 1:

  Device is supported                     : UNTESTED - Untested OS
  CL_DEVICE_NAME                          : AMD E-350 Processor
  CL_DEVICE_VENDOR                        : AuthenticAMD
  CL_DRIVER_VERSION                       : 1800.11 (sse2)
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_CPU 

GDB backtrace with MNIST

#0  0x00007f91d284712d in ?? () from /usr/lib/libamdocl12cl64.so
#1  0x00007f91d1e67eb9 in ?? () from /usr/lib/libamdocl12cl64.so
#2  0x00007f91d1fe6e1d in ?? () from /usr/lib/libamdocl12cl64.so
#3  0x00007f91d1e635be in ?? () from /usr/lib/libamdocl12cl64.so
#4  0x00007f91d283978f in ?? () from /usr/lib/libamdocl12cl64.so
#5  0x00007f91d2839b83 in ?? () from /usr/lib/libamdocl12cl64.so
#6  0x00007f91d2839e3f in ?? () from /usr/lib/libamdocl12cl64.so
#7  0x00007f91d2839f7c in ?? () from /usr/lib/libamdocl12cl64.so
#8  0x00007f91d194fb0a in ?? () from /usr/lib/libamdocl12cl64.so
#9  0x00007f91d194fdb0 in ?? () from /usr/lib/libamdocl12cl64.so
#10 0x00007f91d195c60a in ?? () from /usr/lib/libamdocl12cl64.so
#11 0x00007f91d195e71c in ?? () from /usr/lib/libamdocl12cl64.so
#12 0x00007f91b52578c9 in aclCompile () from /usr/lib/libamdocl64.so
#13 0x00007f91b495a0c5 in ?? () from /usr/lib/libamdocl64.so
#14 0x00007f91b497e3dc in ?? () from /usr/lib/libamdocl64.so
#15 0x00007f91b492802f in ?? () from /usr/lib/libamdocl64.so
#16 0x00007f91b4938120 in ?? () from /usr/lib/libamdocl64.so
#17 0x00007f91b49190e0 in clBuildProgram () from /usr/lib/libamdocl64.so
#18 0x00007f91dc5b465b in clBuildProgram () from /usr/lib/libOpenCL.so.1
#19 0x00007f91dcb9af00 in cl::sycl::detail::program::build_current_program(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) ()
   from /opt/ComputeCpp-CE-0.5.0-Ubuntu-16.04-64bit/lib/libComputeCpp.so
#20 0x00007f91dcb9b26e in cl::sycl::detail::program::build(unsigned char const*, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) ()
   from /opt/ComputeCpp-CE-0.5.0-Ubuntu-16.04-64bit/lib/libComputeCpp.so
#21 0x00007f91dcb9b4dd in cl::sycl::detail::program::create_program_for_binary(unsigned char const*, int, std::shared_ptr<cl::sycl::detail::context>, bool) ()
   from /opt/ComputeCpp-CE-0.5.0-Ubuntu-16.04-64bit/lib/libComputeCpp.so
#22 0x00007f91dcb1da6e in cl::sycl::program::create_program_for_kernel_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned char const*, int, char const* const*, std::shared_ptr<cl::sycl::detail::context>, bool) ()
   from /opt/ComputeCpp-CE-0.5.0-Ubuntu-16.04-64bit/lib/libComputeCpp.so
#23 0x00007f91e162cdf7 in cl::sycl::program cl::sycl::program::create_program_for_kernel<tensorflow::functor::FillRandomKernel<tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float> > >(cl::sycl::context) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#24 0x00007f91e162c7fc in void cl::sycl::handler::parallel_for_impl<tensorflow::functor::FillRandomKernel<tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float> >, tensorflow::functor::FillPhiloxRandomKernel<tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float>, true> >(cl::sycl::detail::nd_range_base const&, tensorflow::functor::FillPhiloxRandomKernel<tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float>, true> const&) () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
@DuncanMcBain (Member)

Hi @mirh, thanks for the report. Unfortunately, without more information it's hard to say what's going wrong; essentially, all this tells us is that AMD's OpenCL implementation is crashing while trying to build the program. I know that at least one version of the driver crashes when you try to use doubles on a device that does not support them, but there could be other reasons. What does the device code look like in this example?

mirh (Author) commented Dec 17, 2017

It's the example you recommend here, nothing more.

znmeb commented Dec 17, 2017

@mirh what's your operating system and OpenCL library? On Linux, the Mesa OpenCL implementation "Clover" is quite buggy - I can only get OpenCL to work with the proprietary AMD code.

mirh (Author) commented Dec 17, 2017

Yeah, I absolutely know.
I'm using fglrx 15.9 here (which is the last one available for my gpu - lucky you).

mirh (Author) commented Dec 21, 2017

Great news, everyone!
Seems like I'm getting the very same "ending" crash trace for your gaussian-blur test (and only that, so I guess it's not that bad).
Could that help?

@DuncanMcBain (Member)

Ending? Could you elaborate, please?

mirh (Author) commented Dec 21, 2017

I meant that the instructions "on the top" of the stack trace were the same
(i.e. aclCompile followed by clBuildProgram).

@DuncanMcBain (Member)

Hi @mirh, sorry it's taken me a while to get back to you. Getting back up to speed: you get build failures in TensorFlow, and likewise in the gaussian blur sample? Some small changes were made over the holidays; could you maybe try them?

mirh (Author) commented Jan 5, 2018

Yes, exactly. No difference though.

@DuncanMcBain (Member)

Hmmmm... Depending on how much time you want to spend on this, you could start removing code from the kernel in the Gaussian Blur sample, to see if it ever passes compilation; that would at least let us know what sort of construct is causing the failures. I can understand if you'd rather not do that, though, as it is something of a slog.

mirh (Author) commented Jan 5, 2018

Commenting this out, then replacing elem in L129 with 10, made the program execute (it eventually produced a blank image in the end, but still)

@DuncanMcBain (Member)

So it doesn't like the maths functions? Huh, I wonder if we've changed something there...

So I found some AMD hardware and tried the Gaussian myself. It failed. Turns out, when I made changes to it last, I added this:

const auto pi = std::atan(1) * 4;

to calculate the value of pi as a const. The return type of that function is... double. This hardware doesn't support double (as I suspect is also true of yours), and the driver simply crashes when it encounters one. By changing the declaration to const float, the code compiles and the test passes. I'll push that fix.
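For reference, a minimal sketch of the type issue (the first `pi` line mirrors the sample's constant; everything else here is illustrative, not the SDK's actual code):

```cpp
#include <cmath>
#include <type_traits>

// std::atan(1) returns double, so `pi` is deduced as double and a 64-bit
// type leaks into the kernel; the float version keeps it 32-bit end to end.
const auto  pi       = std::atan(1) * 4;        // deduced as double
const float pi_float = std::atan(1.0f) * 4.0f;  // float throughout

static_assert(std::is_same<decltype(pi), const double>::value,
              "atan(int) promotes to double, and so does the product");
```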

It's possible that somewhere in TensorFlow we're using doubles - I don't really know the code at all, and I thought we were quite careful about that sort of thing. I can't look at this more tonight, but I might be able to look through the kernels on Monday to see if the same thing is happening in there! Thanks for your help tracking this down.

(If you're interested, you can use the "extract-ir" script in the SDK to see the SPIR code, which is where I tracked down the use of double).

@DuncanMcBain (Member)

OK, pushed here: 19be0cf

mirh (Author) commented Jan 5, 2018

Cool! Now all ctests work!
So the lack of double precision is just down to my/our ancient hardware (and crashing outright, instead of reporting an error or something, arguably is itself a bug in the driver).
But you shouldn't have to change the programs, I guess?
Either binary64 is found to bring no actually meaningful improvement and should just be 'converted' to binary32, or ComputeCpp should stop (or at least warn!) you at compilation.

@DuncanMcBain (Member)

As I mentioned, unfortunately on this older AMD driver that we have to use, it crashes outright when you use double anywhere in the code, even if it's not in the kernel you're trying to run. There's no way we can recover from that, unfortunately 😄

It might be useful to warn at the compiler level, but honestly I think it'd be far too noisy. Lots of hardware supports doubles, and lots of drivers don't segfault but instead report an error!
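On devices like these, application code can at least detect the missing capability up front. As a sketch (the helper below is hypothetical, not a ComputeCpp API; real code would fetch the string via clGetDeviceInfo with CL_DEVICE_EXTENSIONS, or in SYCL 1.2.1 simply call device::has_extension("cl_khr_fp64")):

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: scan a CL_DEVICE_EXTENSIONS string for fp64
// support. The extensions string is a space-separated token list, so an
// exact token match avoids false positives from longer extension names.
bool device_supports_fp64(const std::string& extensions) {
    std::istringstream tokens(extensions);
    std::string ext;
    while (tokens >> ext) {
        if (ext == "cl_khr_fp64" || ext == "cl_amd_fp64") {
            return true;
        }
    }
    return false;
}
```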

mirh (Author) commented Jan 5, 2018

You are right.
Maybe you should just make a wiki page, with every error symptom "linked" to its reported cause.
EDIT: inasmuch as this might give some other ideas

In other news:
Back to our original problem with TF, I don't *think* that code is supposed to have doubles at all.
So... I dunno, I guess you could check its samples yourself? 😃

@lukeiwanski

Doubles and halfs should be optional in TF.
@guoyejun started the process of making that happen in tensorflow/tensorflow#11545, as beignet exposed this problem.

However, it seems like the mentioned PR is stuck.

@jwlawson and I are introducing a config option that enables/disables half and double here: jwlawson/tensorflow@5ec5964

It should be in https://github.com/lukeiwanski/tensorflow/tree/dev/amd_gpu soon-ish.

After that happens, someone needs to go through all registered Ops and use the TF_CALL_SYCL_NUMBER_TYPES macro for registration.

@mirh, would you like to give it a go? :)

mirh (Author) commented Jan 17, 2018

I guess.
But given all the time I need to compile, I'd first hope for some degree of confidence in the thing working 🙃

@DuncanMcBain (Member)

In the end, while I intended to look at the SYCL files to see if double turned up in them, I didn't really have time. I still might be able to investigate, but certainly not today or tomorrow.

mirh (Author) commented Feb 1, 2018

Aaaand fixed. Thank you all.
(I mean, it still segfaults, but it's a totally different deal and stack trace)

mirh (Author) commented May 1, 2018

#17 0x00007fffb49190e0 in clBuildProgram () from /usr/lib/libamdocl64.so
#18 0x00007fffe0e8382f in clBuildProgram () from /usr/lib/libOpenCL.so.1
#19 0x00007fffe15fdc28 in cl::sycl::detail::program::build_current_program(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) () from /opt/ComputeCpp-CE-0.7.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#20 0x00007fffe15fdf62 in cl::sycl::detail::program::build(unsigned char const*, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) () from /opt/ComputeCpp-CE-0.7.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#21 0x00007fffe157b326 in cl::sycl::detail::context::create_program_for_binary(std::shared_ptr<cl::sycl::detail::context> const&, unsigned char const*, int, bool) () from /opt/ComputeCpp-CE-0.7.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#22 0x00007fffe157e12f in cl::sycl::program::create_program_for_kernel_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned char const*, int, char const* const*, std::shared_ptr<cl::sycl::detail::context>, bool) ()
   from /opt/ComputeCpp-CE-0.7.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#23 0x00007fffe85c3ea4 in cl::sycl::program cl::sycl::program::create_program_for_kernel<Eigen::TensorSycl::ExecExprFunctorKernel<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> > > >, utility::tuple::Tuple<Eigen::RangeAccess<(cl::sycl::access::mode)2>, Eigen::RangeAccess<(cl::sycl::access::mode)0> >, true> >(cl::sycl::context) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#24 0x00007fffe85c3670 in void cl::sycl::handler::parallel_for_impl<Eigen::TensorSycl::ExecExprFunctorKernel<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> > > >, utility::tuple::Tuple<Eigen::RangeAccess<(cl::sycl::access::mode)2>, Eigen::RangeAccess<(cl::sycl::access::mode)0> >, true>, Eigen::TensorSycl::ExecExprFunctorKernel<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> > > >, utility::tuple::Tuple<Eigen::RangeAccess<(cl::sycl::access::mode)2>, Eigen::RangeAccess<(cl::sycl::access::mode)0> >, true> >(cl::sycl::detail::nd_range_base const&, Eigen::TensorSycl::ExecExprFunctorKernel<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> > > >, utility::tuple::Tuple<Eigen::RangeAccess<(cl::sycl::access::mode)2>, Eigen::RangeAccess<(cl::sycl::access::mode)0> >, true> const&) () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#25 0x00007fffe85c25bf in Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const
    () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#26 0x00007fffe85c237f in cl::sycl::event cl::sycl::detail::command_group::submit_handler<Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}>(Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}, std::shared_ptr<cl::sycl::detail::queue>const&, cl::sycl::detail::standard_handler_tag) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#27 0x00007fffe85c219f in cl::sycl::event cl::sycl::queue::submit<Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}>(Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#28 0x00007fffe85c2021 in Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#29 0x00007fffe85c1e50 in std::_Function_handler<void (tensorflow::OpKernelContext*, tensorflow::Tensor const&, tensorflow::Tensor*), tensorflow::GetSyclCastFromUint8(tensorflow::DataType)::$_20>::_M_invoke(std::_Any_data const&, tensorflow::OpKernelContext*&&, tensorflow::Tensor const&, tensorflow::Tensor*&&) () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#30 0x00007fffe83e66bc in tensorflow::CastOpBase::Compute(tensorflow::OpKernelContext*) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#31 0x00007fffe229d06d in tensorflow::(anonymous namespace)::ExecutorState::Process(tensorflow::(anonymous namespace)::ExecutorState::TaggedNode, long long) () from /usr/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#32 0x00007fffe229dc58 in std::_Function_handler<void (), tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady(tensorflow::gtl::InlinedVector<tensorflow::(anonymous namespace)::ExecutorState::TaggedNode, 8> const&, tensorflow::(anonymous namespace)::ExecutorState::TaggedNodeReadyQueue*)::$_1>::_M_invoke(std::_Any_data const&) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#33 0x00007fffe22f32f2 in Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so

Aaaand... it's here again (ComputeCpp 0.7.0, TensorFlow 1.8). I'll try to downgrade some stuff and see what comes out. 0.6.1 doesn't compile with that.

mirh reopened this May 1, 2018
mirh (Author) commented May 4, 2018

Ok so... lukeiwanski/tensorflow@0fc77bd with 0.6.1 still segfaults...
While an older build I had, on lukeiwanski/tensorflow@9b6db88, is fine. I'll try to recompile this, to see whether the magic is some code changing on my side or not.

mirh (Author) commented May 7, 2018

Well, FML.
lukeiwanski/tensorflow@9b6db88 crashes in libamdocl64 (I don't have a stack trace, but I guess it's not very dissimilar from the reported issue).
And so does the slightly older lukeiwanski/tensorflow@d7bc636.

But... lukeiwanski/tensorflow@591d829 actually runs? Without even lukeiwanski/tensorflow#205 ?!
I mean, it fails (not a segfault!) after a good number of minutes, complaining about "Tensor had NaN values", but that seems a total cakewalk to take care of by comparison.

I'll now try to bisect the last handful of commits, but I cannot understand what's going on. Must build dependencies have changed in this last month or so, to change the code's behavior? Same commits, different results.
EDIT: mhh, thinking about it more, *I* did make one actual change myself: this time I have been disabling all possible goddamn configure switches, to speed up compilation.
Going to check that too, I guess...

@DuncanMcBain (Member)

Hi @mirh,
One of my colleagues had a good idea for what this issue might be - historically, some OpenCL implementations had issues with kernel names that were really long, like the ones found in Eigen. To that end, I think you could try checking out the tip of dev/amd_gpu and applying the following patch:

diff --git a/third_party/sycl/crosstool/CROSSTOOL.tpl b/third_party/sycl/crosstool/CROSSTOOL.tpl
index 3078b5b534..c62b5b93e1 100755
--- a/third_party/sycl/crosstool/CROSSTOOL.tpl
+++ b/third_party/sycl/crosstool/CROSSTOOL.tpl
@@ -175,6 +175,7 @@ toolchain {
   cxx_flag: "-DEIGEN_HAS_CXX11_MATH=1"
   cxx_flag: "-Wno-unused-variable"
   cxx_flag: "-Wno-unused-const-variable"
+  cxx_flag: "-sycl-compress-name"
 
   unfiltered_cxx_flag: "-Wno-builtin-macro-redefined"
   unfiltered_cxx_flag: "-D__DATE__=\"redacted\""

This patch will make the compiler output kernel names that are hashed versions of the "true" kernel name, which means they will have a fixed length. Looking at the repo at the commits you mention, this does seem to be consistently one of the changes. Thanks so much for persevering with this!
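The effect of the flag can be sketched like so (purely illustrative; ComputeCpp's real naming scheme is internal to compute++, and the prefix below is made up):

```cpp
#include <functional>
#include <sstream>
#include <string>

// Illustrative only: map an arbitrarily long mangled kernel name (like
// the Eigen monsters in the backtraces above) to a short fixed-length
// symbol, so a driver with kernel-name-length bugs never sees the original.
std::string compress_kernel_name(const std::string& mangled) {
    std::ostringstream symbol;
    symbol << "SYCL_" << std::hex << std::hash<std::string>{}(mangled);
    return symbol.str();
}
```

The key property is determinism: host and device code both derive the same short symbol from the same long name, so kernel lookup still works.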

mirh (Author) commented May 13, 2018

I went further on with bisecting, and indeed lukeiwanski/tensorflow@3cc8566 was the one making the difference.
On the other hand, when I tried the tip of dev/amd_gpu (plus that line) I was still getting crashes.

So... pressing on, to see whatever else it might be.

EDIT: fuck, I just noticed I put it in the *wrong* lines (local abi, instead of cross_target)

mirh (Author) commented May 14, 2018

Well, I just put it in both, and it still didn't work (with the tip).
So I guess I'll try, very slowly, to bisect wherever the hell that might have stopped working in turn?

@DuncanMcBain (Member)

If you have the time, that would be very useful! It's a shame that didn't work, I really thought we were onto something there...

mirh (Author) commented May 14, 2018

I'm still doing my testing, but are you sure that option goes in as a cxx_flag, rather than as a compiler_flag or linker_flag or something?

@DuncanMcBain (Member)

cxx_flags are passed to the compiler when compiling C++ source files (I say this because the compiler could be, for example, gcc -x c++ in some cases). It's definitely not a linker flag. In fact, that particular flag is specific to compute++, so no other compiler will be able to understand what it might mean.

I'd be surprised if there were issues, because lots of flags get added in the same way by bazel, and if they didn't get added you would see many more compile errors (or sometimes runtime errors - std::cerr messages about "missing kernels" would indicate that this had happened). Technically there's a way to check, but it's a little bit involved (a nontrivial change, with some thought attached, I'm afraid).

If you're willing to keep bisecting to find the time when it broke, that might be our new best chance!

mirh (Author) commented May 17, 2018

Ok so, updates...
For the moment, after quite some WTFing, I found it should be something between lukeiwanski/tensorflow@8410038 and lukeiwanski/tensorflow@8fd87cd that makes the holy -sycl-compress-name flag stop working.

mirh (Author) commented May 21, 2018

lukeiwanski/tensorflow@f850c60 is the breaking commit

@DuncanMcBain (Member)

Oh - maybe it's the half-type code. @lukeiwanski, halfs are a configurable option, right? Maybe we can turn them off...

@lukeiwanski

hmm, that's a good point - disabling halfs might not be enough... let me try something.

@lukeiwanski

@Rbiessy, could you take a look at this? I believe we need to add checks around the double and half cases based on the ./configure options.

Rbiessy (Contributor) commented May 23, 2018

That should do the trick: lukeiwanski/tensorflow#245
So now we are not compiling kernels in CastOp with half or with double if they are disabled. There may be other places where we make a similar mistake, though.
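Schematically, the fix amounts to gating the extra type instantiations behind build flags (the macro names below are illustrative assumptions, not the actual ones from the PR):

```cpp
#include <string>
#include <vector>

// Illustrative sketch: only the types enabled at ./configure time get a
// CastOp kernel compiled, so no double (or half) kernel is ever built for
// a driver that cannot handle it.
std::vector<std::string> cast_kernel_types() {
    std::vector<std::string> types = {"float", "int32"};
#ifdef TF_SYCL_HALF    // hypothetical configure-driven macro
    types.push_back("half");
#endif
#ifdef TF_SYCL_DOUBLE  // hypothetical configure-driven macro
    types.push_back("double");
#endif
    return types;
}
```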

@lukeiwanski

@mirh can you give it a spin at lukeiwanski/tensorflow@1e0dd42 ?

mirh (Author) commented May 24, 2018

It did it! (together with --sycl-compress-name)

mirh closed this as completed May 24, 2018
@DuncanMcBain (Member)

Haha, finally! Well done everyone & @Rbiessy, and thanks for sticking with it @mirh!

mirh (Author) commented May 25, 2018

> together with --sycl-compress-name

So... are you doing anything about that?

@DuncanMcBain (Member)

What do you mean? This option makes the compiler hash the name of the kernel and output that instead, because of some buggy drivers; that's all. It makes debugging harder, though, so we tend not to enable it by default.

mirh (Author) commented May 25, 2018

Makes sense.
Could it be auto-enabled for, I dunno, say, only AMD CL_DRIVER_VERSIONs below 2000?

@DuncanMcBain (Member)

Since it's a compiler flag, there's no way for the ComputeCpp runtime to identify which driver it is running on and change the flag accordingly.

mirh (Author) commented May 25, 2018

Lol, right.
Could you at least have a FAQ for this issue?
I don't know, like "python segmentation fault in libamdocl64.so"?

@DuncanMcBain (Member)

That's a good idea. We'll try to add it to the FAQ on our website - this issue is more likely to affect TensorFlow code, because of the frankly huge kernel names, but it could strike anywhere.

mirh (Author) commented Jun 7, 2018

Passing --cxxopt="-sycl-compress-name" on the command line also does it.
