Instacrash with `fluid.mlpclassifier` when trying to `fit` something #364

rconstanzo · 2023-04-22T13:53:38Z

As mentioned on the Discourse thread, I got an (unrepeated) crash when trying to fit some data with `fluid.mlpclassifier`.

Attaching the isolated patch bit in question, along with the data/labels I was using at the time. Also the crash report.

This is, I believe, the crash-y bit:

12  fluid.libmanipulation         	       0x1320628dc Eigen::DenseStorage<double, -1, -1, -1, 1>::resize(long, long, long) + 80
13  fluid.libmanipulation         	       0x1322763d8 Eigen::internal::product_evaluator<Eigen::Product<Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1> const>, Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1> >, 0>, 8, Eigen::DenseShape, Eigen::DenseShape, double, double>::product_evaluator(Eigen::Product<Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1> const>, Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1> >, 0> const&) + 108
14  fluid.libmanipulation         	       0x132276048 void Eigen::internal::call_dense_assignment_loop<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Transpose<Eigen::CwiseBinaryOp<Eigen::internal::scalar_sum_op<double, double>, Eigen::Product<Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1> const>, Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1> >, 0> const, Eigen::Replicate<Eigen::Matrix<double, -1, 1, 0, -1, 1>, 1, -1> const> >, Eigen::internal::assign_op<double, double> >(Eigen::Matrix<double, -1, -1, 0, -1, -1>&, Eigen::Transpose<Eigen::CwiseBinaryOp<Eigen::internal::scalar_sum_op<double, double>, Eigen::Product<Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1> const>, Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1> >, 0> const, Eigen::Replicate<Eigen::Matrix<double, -1, 1, 0, -1, 1>, 1, -1> const> > const&, Eigen::internal::assign_op<double, double> const&) + 40
15  fluid.libmanipulation         	       0x132275b34 fluid::algorithm::NNLayer::forward(Eigen::Ref<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::Ref<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >) const + 136
16  fluid.libmanipulation         	       0x132275880 fluid::algorithm::MLP::forward(Eigen::Ref<Eigen::Array<double, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::Ref<Eigen::Array<double, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, long, long) const + 344
17  fluid.libmanipulation         	       0x1322743bc fluid::algorithm::SGD::train(fluid::algorithm::MLP&, fluid::FluidTensorView<double, 2ul>, fluid::FluidTensorView<double, 2ul>, long, long, double, double, double) + 2060
18  fluid.libmanipulation         	       0x13229ec8c fluid::client::mlpclassifier::MLPClassifierClient::fit(fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>) + 1748
19  fluid.libmanipulation         	       0x1322b29f8 auto fluid::client::makeMessage<fluid::client::MessageResult<double>, fluid::client::mlpclassifier::MLPClassifierClient, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const> >(char const*, fluid::client::MessageResult<double> (fluid::client::mlpclassifier::MLPClassifierClient::*)(fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>))::'lambda'(fluid::client::mlpclassifier::MLPClassifierClient&, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>)::operator()('lambda'(fluid::client::mlpclassifier::MLPClassifierClient&, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>), fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>) const + 96
20  fluid.libmanipulation         	       0x1322b27e0 fluid::client::Message<auto fluid::client::makeMessage<fluid::client::MessageResult<double>, fluid::client::mlpclassifier::MLPClassifierClient, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const> >(char const*, fluid::client::MessageResult<double> (fluid::client::mlpclassifier::MLPClassifierClient::*)(fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>))::'lambda'(fluid::client::mlpclassifier::MLPClassifierClient&, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>), fluid::client::MessageResult<double>, fluid::client::mlpclassifier::MLPClassifierClient, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const> >::operator()(auto fluid::client::makeMessage<fluid::client::MessageResult<double>, fluid::client::mlpclassifier::MLPClassifierClient, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const> >(char const*, fluid::client::MessageResult<double> (fluid::client::mlpclassifier::MLPClassifierClient::*)(fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>))::'lambda'(fluid::client::mlpclassifier::MLPClassifierClient&, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>), fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>, 
fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>) const + 80
21  fluid.libmanipulation         	       0x1322b255c _ZNK5fluid6client10MessageSetINSt3__15tupleIJNS0_7MessageIZNS0_11makeMessageINS0_13MessageResultIdEENS0_13mlpclassifier19MLPClassifierClientEJNS0_15SharedClientRefIKNS0_7dataset13DataSetClientEEENSA_IKNS0_8labelset14LabelSetClientEEEEEEDaPKcMT0_FT_DpT1_EEUlRS9_SE_SI_E_S7_S9_JSE_SI_EEENS4_IZNS5_INS6_IvEES9_JSE_NSA_ISG_EEEEESJ_SL_SR_EUlSS_SE_SW_E_SV_S9_JSE_SW_EEENS4_IZNS5_INS6_INS2_12basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEEEES9_JNS2_10shared_ptrIKNS0_13BufferAdaptorEEEEEESJ_SL_SR_EUlSS_S19_E_S15_S9_JS19_EEENS4_IZNS5_ISV_NS0_10DataClientINS8_17MLPClassifierDataEEEJEEESJ_SL_SR_EUlRS1E_E_SV_S1E_JEEENS4_IZNS0_11makeMessageINS6_IlEES1E_JEEESJ_SL_MSM_KFSN_SP_EEUlS1F_E_S1J_S1E_JEEES1N_NS4_IZNS5_INS6_INS3_IJNSZ_IcS11_N9foonathan6memory13std_allocatorIcNS_17FallbackAllocatorEEEEENS_11FluidTensorIlLm1EEEllddldEEEEES9_JS14_EEESJ_SL_SR_EUlSS_S14_E_S1X_S9_JS14_EEENS4_IZNS5_IS15_S1E_JEEESJ_SL_SR_EUlS1F_E_S15_S1E_JEEENS4_IZNS5_ISV_S1E_JS14_EEESJ_SL_SR_EUlS1F_S14_E_SV_S1E_JS14_EEES1Z_EEEE6invokeILm0EJRNS0_24NRTSharedInstanceAdaptorIS9_E12SharedClientERSE_RSI_EEEDcDpOT0_ + 144
22  fluid.libmanipulation         	       0x1322b1ef8 decltype(auto) fluid::client::NRTThreadingAdaptor<fluid::client::NRTSharedInstanceAdaptor<fluid::client::mlpclassifier::MLPClassifierClient> >::invoke<0ul, fluid::client::NRTThreadingAdaptor<fluid::client::NRTSharedInstanceAdaptor<fluid::client::mlpclassifier::MLPClassifierClient> >, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>&, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>&>(fluid::client::NRTThreadingAdaptor<fluid::client::NRTSharedInstanceAdaptor<fluid::client::mlpclassifier::MLPClassifierClient> >&, fluid::client::SharedClientRef<fluid::client::dataset::DataSetClient const>&, fluid::client::SharedClientRef<fluid::client::labelset::LabelSetClient const>&) + 360
23  fluid.libmanipulation         	       0x1322b17a0 void fluid::client::FluidMaxWrapper<fluid::client::NRTThreadingAdaptor<fluid::client::NRTSharedInstanceAdaptor<fluid::client::mlpclassifier::MLPClassifierClient> > >::invokeMessageImpl<0ul, 0ul, 1ul>(fluid::client::FluidMaxWrapper<fluid::client::NRTThreadingAdaptor<fluid::client::NRTSharedInstanceAdaptor<fluid::client::mlpclassifier::MLPClassifierClient> > >*, symbol*, long, atom*, std::__1::integer_sequence<unsigned long, 0ul, 1ul>) + 172
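
Frames 12 to 15 show `NNLayer::forward` building a product evaluator, which resizes (allocates) storage for a temporary matrix. Going by the expression types in frame 14, the layer appears to compute the transpose of (W^T * X^T + bias replicated across columns). Below is a minimal sketch of that shape arithmetic in plain Python. It is an illustration only, not the FluCoMa implementation, and `dense_forward` and the sample values are hypothetical:

```python
def dense_forward(x, w, b):
    """Dense layer without activation.

    x: batch x n_in, w: n_in x n_out, b: length n_out.
    Returns a batch x n_out list of lists.
    """
    n_in, n_out, batch = len(w), len(b), len(x)
    # Temporary of shape n_out x batch: (W^T * X^T) plus the bias
    # replicated across columns. This is the kind of intermediate
    # that has to be allocated before the result is written.
    tmp = [
        [sum(w[k][j] * x[i][k] for k in range(n_in)) + b[j] for i in range(batch)]
        for j in range(n_out)
    ]
    # Transpose back to batch x n_out.
    return [[tmp[j][i] for j in range(n_out)] for i in range(batch)]


# Hypothetical sample data: 3 points, 2 input dimensions, 4 output units.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w = [[1.0, 0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 0.0]]
b = [0.5, 0.5, 0.5, 0.5]
out = dense_forward(x, w, b)
print(len(out), len(out[0]))
```

The temporary grows with (layer width x batch size), so wide layers mean larger allocations at this point in the forward pass.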

crashbits.zip


rconstanzo · 2023-04-22T14:13:26Z

Got a second crash (also with a really long @hiddenlayers network).

Also, narrowed down when it happens. It seems the crash comes when I pause the training (i.e. toggle off the toggle in the loop) and then toggle it back on. I got the instacrash at the moment of toggling it back on.

crash2.zip

tremblap · 2023-04-23T12:14:22Z

few observations:

- it is a huge network. have you PCA'd the datasets first to reduce the number of dimensions? I'm saying this because I get the spinning wheel of death here when I start the patch, but no crash.
- the first crash seems Chromium-related
- the second crash is mem-alloc related but maybe not FluCoMa...

on my machine, there is no memory leak after running it for 15 minutes, and no crash - I thought of checking this since both crashes are linked to memory allocation... I also started and stopped it repeatedly, to no avail.
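
To put "huge network" and the allocation-related crashes in perspective, here is a rough Python sketch (not FluCoMa code) of how much memory the dense weights, biases and per-batch activations of an MLP would need. It assumes 8-byte doubles, as in the `Eigen::Matrix<double, ...>` frames of the trace. The function name and the layer sizes are hypothetical, and the estimate ignores gradients and any temporaries Eigen allocates:

```python
def mlp_forward_bytes(n_inputs, hidden_layers, n_outputs, batch_size, bytes_per_value=8):
    """Estimate bytes needed by a fully connected MLP for one forward pass."""
    sizes = [n_inputs, *hidden_layers, n_outputs]
    # One weight matrix per pair of consecutive layers.
    weights = sum(a * b for a, b in zip(sizes, sizes[1:]))
    # One bias vector per non-input layer.
    biases = sum(sizes[1:])
    # One activation matrix (batch_size x layer width) per non-input layer.
    activations = batch_size * sum(sizes[1:])
    return {
        "weights": weights * bytes_per_value,
        "biases": biases * bytes_per_value,
        "activations": activations * bytes_per_value,
    }


# Hypothetical sizes, not taken from the attached patch.
estimate = mlp_forward_bytes(
    n_inputs=100, hidden_layers=[500, 500, 500], n_outputs=10, batch_size=50
)
total_mib = sum(estimate.values()) / 2**20
print(f"{total_mib:.2f} MiB")
```

Weight storage scales with the product of consecutive layer widths, so a long list of wide hidden layers adds up quickly, and reducing the input dimensions shrinks the first weight matrix.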

so that brings us to how we can help you help us help you: are you set up for compilation? if so you could gain 2 things:

- when you are in dev mode, you could use objects that are a little more explicit when they crash (line numbers in code), which helps us volunteer coding people know where to look for problems in our ginormous code base
- when you are in gig mode, you could have super optimised versions of the objects, tailored for your actual hardware.

this comes at the expense of having to compile, but also maybe being confused by which version you are actually using. I have scripts to swap them in the OS, but that might not be exciting for you. in all cases, I'm happy to help.

anyway, as I am unable to reproduce, we are stalled. let us know if you find something more reproducible.

tremblap · 2023-04-23T13:02:45Z

now running for 45 minutes in 'test' compile mode, starting and stopping and resetting - still no crash.

rconstanzo · 2023-04-24T08:38:17Z

I'll see if I can get it to crash again.

Not saying the network is useful, I was just testing different structures to see what type/direction/style worked better (maybe changing structures often via the attrui is a component of this?).

It could just be a coincidence, but that seems unlikely given it happened twice with the same object/process.

tremblap · 2023-04-24T11:55:39Z

If you don't mind, try this version of the object (keep the other one you have for real-life use) so if it crashes we'll know better if it is fluid.verse-related and where from...
fluid.libmanipulation.mxo.zip

tremblap · 2023-05-04T07:28:25Z

@rconstanzo any more crashes with my magic custom compile?

rconstanzo · 2023-05-05T21:56:32Z

Was in the UK teaching, will give it a test now. Haven't gotten any new crashes since, though (but I haven't been testing super long network structures either).
