This repository has been archived by the owner on Aug 11, 2023. It is now read-only.

Segfault on TF tutorials #77

Closed
mirh opened this issue Dec 17, 2017 · 45 comments

mirh commented Dec 17, 2017

ComputeCpp Info (CE 0.5.0)
GLIBC version: 2.26
GLIBCXX: 20160609
This version of libstdc++ is supported.

Device Info:
Device 0:

  Device is supported                     : UNTESTED - Untested OS
  CL_DEVICE_NAME                          : Loveland
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 1800.11
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 
Device 1:

  Device is supported                     : UNTESTED - Untested OS
  CL_DEVICE_NAME                          : AMD E-350 Processor
  CL_DEVICE_VENDOR                        : AuthenticAMD
  CL_DRIVER_VERSION                       : 1800.11 (sse2)
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_CPU 

GDB backtrace with MNIST

#0  0x00007f91d284712d in ?? () from /usr/lib/libamdocl12cl64.so
#1  0x00007f91d1e67eb9 in ?? () from /usr/lib/libamdocl12cl64.so
#2  0x00007f91d1fe6e1d in ?? () from /usr/lib/libamdocl12cl64.so
#3  0x00007f91d1e635be in ?? () from /usr/lib/libamdocl12cl64.so
#4  0x00007f91d283978f in ?? () from /usr/lib/libamdocl12cl64.so
#5  0x00007f91d2839b83 in ?? () from /usr/lib/libamdocl12cl64.so
#6  0x00007f91d2839e3f in ?? () from /usr/lib/libamdocl12cl64.so
#7  0x00007f91d2839f7c in ?? () from /usr/lib/libamdocl12cl64.so
#8  0x00007f91d194fb0a in ?? () from /usr/lib/libamdocl12cl64.so
#9  0x00007f91d194fdb0 in ?? () from /usr/lib/libamdocl12cl64.so
#10 0x00007f91d195c60a in ?? () from /usr/lib/libamdocl12cl64.so
#11 0x00007f91d195e71c in ?? () from /usr/lib/libamdocl12cl64.so
#12 0x00007f91b52578c9 in aclCompile () from /usr/lib/libamdocl64.so
#13 0x00007f91b495a0c5 in ?? () from /usr/lib/libamdocl64.so
#14 0x00007f91b497e3dc in ?? () from /usr/lib/libamdocl64.so
#15 0x00007f91b492802f in ?? () from /usr/lib/libamdocl64.so
#16 0x00007f91b4938120 in ?? () from /usr/lib/libamdocl64.so
#17 0x00007f91b49190e0 in clBuildProgram () from /usr/lib/libamdocl64.so
#18 0x00007f91dc5b465b in clBuildProgram () from /usr/lib/libOpenCL.so.1
#19 0x00007f91dcb9af00 in cl::sycl::detail::program::build_current_program(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) ()
   from /opt/ComputeCpp-CE-0.5.0-Ubuntu-16.04-64bit/lib/libComputeCpp.so
#20 0x00007f91dcb9b26e in cl::sycl::detail::program::build(unsigned char const*, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) ()
   from /opt/ComputeCpp-CE-0.5.0-Ubuntu-16.04-64bit/lib/libComputeCpp.so
#21 0x00007f91dcb9b4dd in cl::sycl::detail::program::create_program_for_binary(unsigned char const*, int, std::shared_ptr<cl::sycl::detail::context>, bool) ()
   from /opt/ComputeCpp-CE-0.5.0-Ubuntu-16.04-64bit/lib/libComputeCpp.so
#22 0x00007f91dcb1da6e in cl::sycl::program::create_program_for_kernel_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned char const*, int, char const* const*, std::shared_ptr<cl::sycl::detail::context>, bool) ()
   from /opt/ComputeCpp-CE-0.5.0-Ubuntu-16.04-64bit/lib/libComputeCpp.so
#23 0x00007f91e162cdf7 in cl::sycl::program cl::sycl::program::create_program_for_kernel<tensorflow::functor::FillRandomKernel<tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float> > >(cl::sycl::context) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#24 0x00007f91e162c7fc in void cl::sycl::handler::parallel_for_impl<tensorflow::functor::FillRandomKernel<tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float> >, tensorflow::functor::FillPhiloxRandomKernel<tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float>, true> >(cl::sycl::detail::nd_range_base const&, tensorflow::functor::FillPhiloxRandomKernel<tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float>, true> const&) () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
@DuncanMcBain (Member)

Hi @mirh, thanks for the report. Unfortunately, without more information it's hard to say what's going wrong; essentially, all this tells us is that AMD's OpenCL implementation is crashing while trying to build the program. I know that at least one version of the driver crashes when you try to use doubles on a device that does not support them, but there could be other reasons. What does the device code look like in this example?

mirh (Author) commented Dec 17, 2017

It's the example you recommend here, nothing more.

znmeb commented Dec 17, 2017

@mirh what's your operating system and OpenCL library? On Linux, the Mesa OpenCL implementation "Clover" is quite buggy - I can only get OpenCL to work with the proprietary AMD code.

mirh (Author) commented Dec 17, 2017

Yeah, I absolutely know.
I'm using fglrx 15.9 here (which is the last one available for my gpu - lucky you).

mirh (Author) commented Dec 21, 2017

Great news, everyone!
Seems like I'm getting the very same "ending" crash trace for your gaussian-blur test (and only that, so I guess it's not that bad).
Could that help?

@DuncanMcBain (Member)

Ending? Could you elaborate, please?

mirh (Author) commented Dec 21, 2017

I meant that the instructions "on the top" of the stack trace were the same
(i.e. aclCompile followed by clBuildProgram).

@DuncanMcBain (Member)

Hi @mirh, sorry it's taken me a while to get back to you. Getting back up to speed: you get build failures in TensorFlow, and likewise in the gaussian blur sample? Some small changes were made over the holidays; could you maybe try them?

mirh (Author) commented Jan 5, 2018

Yes, exactly. No difference though.

@DuncanMcBain (Member)

Hmmmm... Depending on how much time you want to spend on this, you could start removing code from the kernel in the Gaussian Blur sample, to see if it ever passes compilation; that would at least let us know what sort of construct is causing the failures. I can understand if you'd rather not do that, though, as it is something of a slog.

mirh (Author) commented Jan 5, 2018

Commenting this out, then replacing elem in L129 with 10, made the program execute (it eventually produced a blank image in the end, but still)

@DuncanMcBain (Member)

So it doesn't like the maths functions? Huh, I wonder if we've changed something there...

So I found some AMD hardware and tried the Gaussian myself. It failed. Turns out, when I made changes to it last, I added this:

const auto pi = std::atan(1) * 4;

to calculate the value of pi as a const. The return type of that function is... double. This hardware doesn't support double (as I suspect is also true of yours), and the driver simply crashes when it encounters one. By changing the declaration to const float, the code compiles and the test passes. I'll push that fix.
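For reference, a minimal sketch of the type issue (the first `pi` line mirrors the sample's constant; everything else here is illustrative, not the SDK's actual code):

```cpp
#include <cmath>
#include <type_traits>

// std::atan(1) returns double, so `pi` is deduced as double and a 64-bit
// type leaks into the kernel; the float version keeps it 32-bit end to end.
const auto  pi       = std::atan(1) * 4;        // deduced as double
const float pi_float = std::atan(1.0f) * 4.0f;  // float throughout

static_assert(std::is_same<decltype(pi), const double>::value,
              "atan(int) promotes to double, and so does the product");
```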

It's possible that somewhere in TensorFlow we're using doubles - I don't really know the code at all, and I thought we were quite careful about that sort of thing. I can't look at this more tonight, but I might be able to look through the kernels on Monday to see if the same thing is happening in there! Thanks for your help tracking this down.

(If you're interested, you can use the "extract-ir" script in the SDK to see the SPIR code, which is where I tracked down the use of double).

@DuncanMcBain (Member)

OK, pushed here: 19be0cf

mirh (Author) commented Jan 5, 2018

Cool! Now all ctests work!
So the lack of double precision is just down to my/our ancient hardware (and crashing outright, instead of reporting an error or something, arguably is itself a bug in the driver).
But you shouldn't have to change the programs, I guess?
Either binary64 is found to bring no actually meaningful improvement and should just be 'converted' to binary32, or ComputeCpp should stop (or at least warn!) you at compilation.

@DuncanMcBain (Member)

As I mentioned, unfortunately on this older AMD driver that we have to use, it crashes outright when you use double anywhere in the code, even if it's not in the kernel you're trying to run. There's no way we can recover from that, unfortunately 😄

It might be useful to warn at the compiler level, but honestly I think it'd be far too noisy. Lots of hardware supports doubles, and lots of drivers don't segfault but instead report an error!
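On devices like these, application code can at least detect the missing capability up front. As a sketch (the helper below is hypothetical, not a ComputeCpp API; real code would fetch the string via clGetDeviceInfo with CL_DEVICE_EXTENSIONS, or in SYCL 1.2.1 simply call device::has_extension("cl_khr_fp64")):

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: scan a CL_DEVICE_EXTENSIONS string for fp64
// support. The extensions string is a space-separated token list, so an
// exact token match avoids false positives from longer extension names.
bool device_supports_fp64(const std::string& extensions) {
    std::istringstream tokens(extensions);
    std::string ext;
    while (tokens >> ext) {
        if (ext == "cl_khr_fp64" || ext == "cl_amd_fp64") {
            return true;
        }
    }
    return false;
}
```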

mirh (Author) commented Jan 5, 2018

You are right.
Maybe you should just make a wiki page, with every error symptom "linked" to its reported cause.
EDIT: inasmuch as this might give some other ideas

In other news:
Back to our original problem with TF, I don't *think* that code is supposed to have doubles at all.
So... I dunno, I guess you could check its samples yourself? 😃

@lukeiwanski

Doubles and halfs should be optional in TF.
@guoyejun started the process of making that happen in tensorflow/tensorflow#11545, as beignet exposed this problem.

However, it seems like the mentioned PR is stuck.

@jwlawson and I are introducing a config option that enables/disables half and double here: jwlawson/tensorflow@5ec5964

It should be in https://github.com/lukeiwanski/tensorflow/tree/dev/amd_gpu soon-ish.

After that happens, someone needs to go through all registered Ops and use the TF_CALL_SYCL_NUMBER_TYPES macro for registration.

@mirh, would you like to give it a go? :)

mirh (Author) commented Jan 17, 2018

I guess.
But given all the time I need to compile, I'd first hope for some degree of confidence in the thing working 🙃

@DuncanMcBain (Member)

In the end, while I intended to look at the SYCL files to see if double turned up in them, I didn't really have time. I still might be able to investigate, but certainly not today or tomorrow.

mirh (Author) commented Feb 1, 2018

Aaaand fixed. Thank you all.
(I mean, it still segfaults, but it's a totally different deal and stack trace)

mirh (Author) commented May 1, 2018

#17 0x00007fffb49190e0 in clBuildProgram () from /usr/lib/libamdocl64.so
#18 0x00007fffe0e8382f in clBuildProgram () from /usr/lib/libOpenCL.so.1
#19 0x00007fffe15fdc28 in cl::sycl::detail::program::build_current_program(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) () from /opt/ComputeCpp-CE-0.7.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#20 0x00007fffe15fdf62 in cl::sycl::detail::program::build(unsigned char const*, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) () from /opt/ComputeCpp-CE-0.7.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#21 0x00007fffe157b326 in cl::sycl::detail::context::create_program_for_binary(std::shared_ptr<cl::sycl::detail::context> const&, unsigned char const*, int, bool) () from /opt/ComputeCpp-CE-0.7.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#22 0x00007fffe157e12f in cl::sycl::program::create_program_for_kernel_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned char const*, int, char const* const*, std::shared_ptr<cl::sycl::detail::context>, bool) ()
   from /opt/ComputeCpp-CE-0.7.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#23 0x00007fffe85c3ea4 in cl::sycl::program cl::sycl::program::create_program_for_kernel<Eigen::TensorSycl::ExecExprFunctorKernel<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> > > >, utility::tuple::Tuple<Eigen::RangeAccess<(cl::sycl::access::mode)2>, Eigen::RangeAccess<(cl::sycl::access::mode)0> >, true> >(cl::sycl::context) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#24 0x00007fffe85c3670 in void cl::sycl::handler::parallel_for_impl<Eigen::TensorSycl::ExecExprFunctorKernel<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> > > >, utility::tuple::Tuple<Eigen::RangeAccess<(cl::sycl::access::mode)2>, Eigen::RangeAccess<(cl::sycl::access::mode)0> >, true>, Eigen::TensorSycl::ExecExprFunctorKernel<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> > > >, utility::tuple::Tuple<Eigen::RangeAccess<(cl::sycl::access::mode)2>, Eigen::RangeAccess<(cl::sycl::access::mode)0> >, true> >(cl::sycl::detail::nd_range_base const&, Eigen::TensorSycl::ExecExprFunctorKernel<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> > > >, utility::tuple::Tuple<Eigen::RangeAccess<(cl::sycl::access::mode)2>, Eigen::RangeAccess<(cl::sycl::access::mode)0> >, true> const&) () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#25 0x00007fffe85c25bf in Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const
    () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#26 0x00007fffe85c237f in cl::sycl::event cl::sycl::detail::command_group::submit_handler<Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}>(Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}, std::shared_ptr<cl::sycl::detail::queue>const&, cl::sycl::detail::standard_handler_tag) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#27 0x00007fffe85c219f in cl::sycl::event cl::sycl::queue::submit<Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}>(Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#28 0x00007fffe85c2021 in Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#29 0x00007fffe85c1e50 in std::_Function_handler<void (tensorflow::OpKernelContext*, tensorflow::Tensor const&, tensorflow::Tensor*), tensorflow::GetSyclCastFromUint8(tensorflow::DataType)::$_20>::_M_invoke(std::_Any_data const&, tensorflow::OpKernelContext*&&, tensorflow::Tensor const&, tensorflow::Tensor*&&) () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#30 0x00007fffe83e66bc in tensorflow::CastOpBase::Compute(tensorflow::OpKernelContext*) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#31 0x00007fffe229d06d in tensorflow::(anonymous namespace)::ExecutorState::Process(tensorflow::(anonymous namespace)::ExecutorState::TaggedNode, long long) () from /usr/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#32 0x00007fffe229dc58 in std::_Function_handler<void (), tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady(tensorflow::gtl::InlinedVector<tensorflow::(anonymous namespace)::ExecutorState::TaggedNode, 8> const&, tensorflow::(anonymous namespace)::ExecutorState::TaggedNodeReadyQueue*)::$_1>::_M_invoke(std::_Any_data const&) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#33 0x00007fffe22f32f2 in Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so

Aaaand... it's here again (ComputeCpp 0.7.0, TensorFlow 1.8). I'll try to downgrade some stuff and see what comes out. 0.6.1 doesn't compile with that.

mirh reopened this May 1, 2018
mirh (Author) commented May 4, 2018

Ok so... lukeiwanski/tensorflow@0fc77bd with 0.6.1 still segfaults...
While an older build I had, on lukeiwanski/tensorflow@9b6db88, is fine. I'll try to recompile this, to see whether the magic is some code changing on my side or not.

mirh (Author) commented May 7, 2018

Well, FML.
lukeiwanski/tensorflow@9b6db88 crashes in libamdocl64 (I don't have a stack trace, but I guess it's not very dissimilar from the reported issue).
And so does the slightly older lukeiwanski/tensorflow@d7bc636.

But... lukeiwanski/tensorflow@591d829 actually runs? Without even lukeiwanski/tensorflow#205 ?!
I mean, it fails (not a segfault!) after a good number of minutes, complaining about "Tensor had NaN values", but that seems a total cakewalk to take care of by comparison.

I'll now try to bisect the last handful of commits, but I cannot understand what's going on. Must build dependencies have changed in this last month or so, to change the code's behavior? Same commits, different results.
EDIT: mhh, thinking about it more, *I* did make one actual change myself: this time I have been disabling all possible goddamn configure switches, to speed up compilation.
Going to check that too, I guess...

@DuncanMcBain (Member)

Hi @mirh,
One of my colleagues had a good idea for what this issue might be - historically, some OpenCL implementations had issues with kernel names that were really long, like the ones found in Eigen. To that end, I think you could try checking out the tip of dev/amd_gpu and applying the following patch:

diff --git a/third_party/sycl/crosstool/CROSSTOOL.tpl b/third_party/sycl/crosstool/CROSSTOOL.tpl
index 3078b5b534..c62b5b93e1 100755
--- a/third_party/sycl/crosstool/CROSSTOOL.tpl
+++ b/third_party/sycl/crosstool/CROSSTOOL.tpl
@@ -175,6 +175,7 @@ toolchain {
   cxx_flag: "-DEIGEN_HAS_CXX11_MATH=1"
   cxx_flag: "-Wno-unused-variable"
   cxx_flag: "-Wno-unused-const-variable"
+  cxx_flag: "-sycl-compress-name"
 
   unfiltered_cxx_flag: "-Wno-builtin-macro-redefined"
   unfiltered_cxx_flag: "-D__DATE__=\"redacted\""

This patch will make the compiler output kernel names that are hashed versions of the "true" kernel name, which means they will have a fixed length. Looking at the repo at the commits you mention, this does seem to be consistently one of the changes. Thanks so much for persevering with this!
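The effect of the flag can be sketched like so (purely illustrative; ComputeCpp's real naming scheme is internal to compute++, and the prefix below is made up):

```cpp
#include <functional>
#include <sstream>
#include <string>

// Illustrative only: map an arbitrarily long mangled kernel name (like
// the Eigen monsters in the backtraces above) to a short fixed-length
// symbol, so a driver with kernel-name-length bugs never sees the original.
std::string compress_kernel_name(const std::string& mangled) {
    std::ostringstream symbol;
    symbol << "SYCL_" << std::hex << std::hash<std::string>{}(mangled);
    return symbol.str();
}
```

The key property is determinism: host and device code both derive the same short symbol from the same long name, so kernel lookup still works.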

mirh (Author) commented May 13, 2018

I went further on with bisecting, and indeed lukeiwanski/tensorflow@3cc8566 was the one making the difference.
On the other hand, when I tried the tip of dev/amd_gpu (plus that line) I was still getting crashes.

So... pressing on, to see whatever else it might be.

EDIT: fuck, I just noticed I put it in the *wrong* lines (local abi, instead of cross_target)

mirh (Author) commented May 14, 2018

Well, I just put it in both, and it still didn't work (with the tip).
So I guess I'll try, very slowly, to bisect wherever the hell that might have stopped working in turn?

@DuncanMcBain (Member)

If you have the time, that would be very useful! It's a shame that didn't work, I really thought we were onto something there...

mirh (Author) commented May 14, 2018

I'm still doing my testing, but are you sure that option goes in as a cxx_flag, rather than as a compiler_flag or linker_flag or something?

@DuncanMcBain (Member)

cxx_flags are passed to the compiler when compiling C++ source files (I say this because the compiler could be, for example, gcc -x c++ in some cases). It's definitely not a linker flag. In fact, that particular flag is specific to compute++, so no other compiler will be able to understand what it might mean.

I'd be surprised if there were issues, because lots of flags get added in the same way by bazel, and if they didn't get added you would see many more compile errors (or sometimes runtime errors - std::cerr messages about "missing kernels" would indicate that this had happened). Technically there's a way to check, but it's a little bit involved (a nontrivial change, with some thought attached, I'm afraid).

If you're willing to keep bisecting to find the time when it broke, that might be our new best chance!

mirh (Author) commented May 17, 2018

Ok so, updates...
For the moment, after quite some WTFing, I found it should be something between lukeiwanski/tensorflow@8410038 and lukeiwanski/tensorflow@8fd87cd that makes the holy -sycl-compress-name flag stop working.

mirh (Author) commented May 21, 2018

lukeiwanski/tensorflow@f850c60 is the breaking commit

@DuncanMcBain (Member)

Oh - maybe it's the half-type code. @lukeiwanski, halfs are a configurable option, right? Maybe we can turn them off...

@lukeiwanski

hmm, that's a good point - disabling halfs might not be enough... let me try something.

@lukeiwanski

@Rbiessy, could you take a look at this? I believe we need to add checks around the double and half cases based on the ./configure options.

Rbiessy (Contributor) commented May 23, 2018

That should do the trick: lukeiwanski/tensorflow#245
So now we are not compiling kernels in CastOp with half or with double if they are disabled. There may be other places where we make a similar mistake, though.
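Schematically, the fix amounts to gating the extra type instantiations behind build flags (the macro names below are illustrative assumptions, not the actual ones from the PR):

```cpp
#include <string>
#include <vector>

// Illustrative sketch: only the types enabled at ./configure time get a
// CastOp kernel compiled, so no double (or half) kernel is ever built for
// a driver that cannot handle it.
std::vector<std::string> cast_kernel_types() {
    std::vector<std::string> types = {"float", "int32"};
#ifdef TF_SYCL_HALF    // hypothetical configure-driven macro
    types.push_back("half");
#endif
#ifdef TF_SYCL_DOUBLE  // hypothetical configure-driven macro
    types.push_back("double");
#endif
    return types;
}
```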

@lukeiwanski

@mirh can you give it a spin at lukeiwanski/tensorflow@1e0dd42 ?

mirh (Author) commented May 24, 2018

It did it! (together with --sycl-compress-name)

mirh closed this as completed May 24, 2018
@DuncanMcBain (Member)

Haha, finally! Well done everyone & @Rbiessy, and thanks for sticking with it @mirh!

mirh (Author) commented May 25, 2018

> together with --sycl-compress-name

So... are you doing anything about that?

@DuncanMcBain (Member)

What do you mean? This option makes the compiler hash the name of the kernel and output that instead, because of some buggy drivers; that's all. It makes debugging harder, though, so we tend not to enable it by default.

mirh (Author) commented May 25, 2018

Makes sense.
Could it be auto-enabled for, I dunno, say, only AMD CL_DRIVER_VERSIONs below 2000?

@DuncanMcBain (Member)

Since it's a compiler flag, there's no way for the ComputeCpp runtime to identify which driver it is running on and change the flag accordingly.

mirh (Author) commented May 25, 2018

Lol, right.
Could you at least have a FAQ for this issue?
I don't know, like "python segmentation fault in libamdocl64.so"?

@DuncanMcBain (Member)

That's a good idea. We'll try to add it to the FAQ on our website - this issue is more likely to affect TensorFlow code, because of the frankly huge kernel names, but it could strike anywhere.

mirh (Author) commented Jun 7, 2018

Passing --cxxopt="-sycl-compress-name" on the command line also does it.
