[PMEM] It will abort when using PMEM allocator in EV #211

Open
shanzhou2186 opened this issue May 11, 2022 · 0 comments


When the PMEM allocator is used with the WDL model, in either libpmem or memkind mode, the process aborts with the following error:
./tensorflow/core/framework/embedding/value_ptr.h:273] Unsupport FreqCounter in subclass of ValuePtrBase
Aborted (core dumped)

Here is the call stack:
#3 0x00001464e19d0f4e in tensorflow::ValuePtr::AddFreq (this=)
at ./tensorflow/core/framework/embedding/value_ptr.h:273
#4 0x00001464e19d6566 in tensorflow::NullableFilter<long long, float, tensorflow::EmbeddingVar<long long, float> >::LookupOrCreateWithFreq (this=0x145fb0105c90, key=, val=0x14609c00cac0, default_value_ptr=)
at ./tensorflow/core/framework/embedding/embedding_filter.h:526
#5 0x00001464e19c35cc in std::function<void (long long, float*, float*)>::operator()(long long, float*, float*) const (
__args#2=, __args#1=, __args#0=, this=0x146100083cb8)
at /usr/include/c++/7/bits/std_function.h:706
#6 tensorflow::KvResourceGatherOp<long long, float>::Compute(tensorflow::OpKernelContext*)::{lambda(long long, long long)#4}::operator()(long long, long long) const (limit=4, start=, __closure=0x146100083c80)
at tensorflow/core/kernels/kv_variable_ops.cc:413
#7 std::_Function_handler<void (long long, long long), tensorflow::KvResourceGatherOp<long long, float>::Compute(tensorflow::OpKernelContext*)::{lambda(long long, long long)#4}>::_M_invoke(std::_Any_data const&, long long&&, std::_Any_data const&) (
__functor=..., __args#0=, __args#1=) at /usr/include/c++/7/bits/std_function.h:316
#8 0x00001464d9948f1e in std::_Function_handler<void (long, long), tensorflow::thread::ThreadPool::ParallelFor(long long, long long, std::function<void (long long, long long)>)::{lambda(long, long)#1}>::_M_invoke(std::_Any_data const&, long&&, std::_Any_data const&) () from /home/zshan/deeprec-env/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#9 0x00001464d994f48f in tensorflow::thread::ThreadPool::ParallelFor(long long, long long, std::function<void (long long, long long)>) () from /home/zshan/deeprec-env/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#10 0x00001464d971fb52 in tensorflow::Shard(int, tensorflow::thread::ThreadPool*, long long, long long, std::function<void (long long, long long)>) ()
from /home/zshan/deeprec-env/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#11 0x00001464e19dce74 in tensorflow::KvResourceGatherOp<long long, float>::Compute (this=0x560ce5050590, c=)
at tensorflow/core/kernels/kv_variable_ops.cc:427
#12 0x00001464d98766a6 in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::PropagatorState>::BatchProcess(std::vector<tensorflow::PropagatorState::TaggedNode, std::allocator<tensorflow::PropagatorState::TaggedNode> >, int, long) ()
from /home/zshan/deeprec-env/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#13 0x00001464d9876a88 in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::PropagatorState>::Process(tensorflow::PropagatorState::TaggedNode, long) ()
from /home/zshan/deeprec-env/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#14 0x00001464d9876b5f in std::_Function_handler<void (), tensorflow::(anonymous namespace)::ExecutorState<tensorflow::PropagatorState>::RunTask<tensorflow::(anonymous namespace)::ExecutorState<tensorflow::PropagatorState>::ScheduleReady(absl::InlinedVector<tensorflow::PropagatorState::TaggedNode, 8ul, std::allocator<tensorflow::PropagatorState::TaggedNode> >, tensorflow::PropagatorState::TaggedNodeReadyQueue)::{lambda()#1}>(tensorflow::(anonymous namespace)::ExecutorState<tensorflow::PropagatorState>::ScheduleReady(absl::InlinedVector<tensorflow::PropagatorState::TaggedNode, 8ul, std::allocator<tensorflow::PropagatorState::TaggedNode> >, tensorflow::PropagatorState::TaggedNodeReadyQueue)::{lambda()#1}&&)::{lambda()#1}>::_M_invoke(std::_Any_data const&)
() from /home/zshan/deeprec-env/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#15 0x00001464d994bb4f in std::_Function_handler<void (), Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::ThreadPoolTempl(int, bool, tensorflow::thread::EigenEnvironment)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
from /home/zshan/deeprec-env/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#16 0x00001464d9948f78 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
from /home/zshan/deeprec-env/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.1
#17 0x00001464d83a9ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#18 0x0000146577a1a17a in start_thread () from /lib64/libpthread.so.0
#19 0x0000146576fbfdc3 in clone () from /lib64/libc.so.6
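
The log message together with frames #3 and #4 suggests that LookupOrCreateWithFreq calls AddFreq on a value pointer type that does not provide a frequency counter, so the base-class fallback is reached. The sketch below only illustrates that general pattern. Every class and function name in it is hypothetical and not taken from the DeepRec source, and it throws an exception where the real code logs a fatal error and aborts.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a value pointer base class (not DeepRec code).
// The default AddFreq fails, mirroring the "Unsupport FreqCounter" message.
class ValuePtrSketch {
 public:
  virtual ~ValuePtrSketch() = default;
  virtual void AddFreq() {
    throw std::logic_error("Unsupport FreqCounter in subclass of ValuePtrBase");
  }
  virtual int64_t GetFreq() const { return 0; }
};

// Hypothetical subclass that stores a counter and overrides AddFreq.
class CountingValuePtr : public ValuePtrSketch {
 public:
  void AddFreq() override { ++freq_; }
  int64_t GetFreq() const override { return freq_; }

 private:
  int64_t freq_ = 0;
};

// Hypothetical subclass without a counter. Assumption: a PMEM-backed
// value pointer may look like this, inheriting the failing default.
class PlainValuePtr : public ValuePtrSketch {};

// Mimics a lookup path that always bumps the frequency.
// Returns true if the counter was updated, false if the type lacks one.
bool LookupWithFreq(ValuePtrSketch& value) {
  try {
    value.AddFreq();
    return true;
  } catch (const std::logic_error&) {
    return false;
  }
}
```

Calling LookupWithFreq on a CountingValuePtr increments its counter, while calling it on a PlainValuePtr reaches the base-class fallback. If the PMEM value pointer type behaves like PlainValuePtr here, the abort would be explained either by a missing override or by a filter that should not call AddFreq for that type, but this has not been confirmed against the source.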
