Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Velox memory manager throw exception #4106

Closed
ulysses-you opened this issue Dec 19, 2023 · 9 comments · Fixed by #4111
Closed

Velox memory manager throw exception #4106

ulysses-you opened this issue Dec 19, 2023 · 9 comments · Fixed by #4111
Labels
bug Something isn't working triage

Comments

@ulysses-you
Copy link
Contributor

ulysses-you commented Dec 19, 2023

Backend

VL (Velox)

Bug description

It seems the issue is caused by facebookincubator/velox#7168

java.lang.RuntimeException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: The memory manager is not set
Retriable: False
Expression: instanceRef != nullptr
Function: getInstance
File: ../../velox/common/memory/Memory.cpp
Line: 129
Stack trace:
# 0  facebook::velox::VeloxException::VeloxException(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)
# 1  void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxRuntimeError, char const*>(facebook::velox::detail::VeloxCheckFailArgs const&, char const*)
# 2  facebook::velox::memory::MemoryManager::getInstance()
# 3  facebook::velox::memory::spillMemoryPool()
# 4  facebook::velox::exec::HashBuild::setupSpiller(facebook::velox::exec::SpillPartition*)
# 5  facebook::velox::exec::HashBuild::HashBuild(int, facebook::velox::exec::DriverCtx*, std::shared_ptr<facebook::velox::core::HashJoinNode const>)
# 6  std::_Function_handler<std::unique_ptr<facebook::velox::exec::Operator, std::default_delete<facebook::velox::exec::Operator> > (int, facebook::velox::exec::DriverCtx*), facebook::velox::exec::detail::makeConsumerSupplier(std::shared_ptr<facebook::velox::core::PlanNode const> const&)::{lambda(int, facebook::velox::exec::DriverCtx*)#3}>::_M_invoke(std::_Any_data const&, int&&, facebook::velox::exec::DriverCtx*&&)
# 7  facebook::velox::exec::DriverFactory::createDriver(std::unique_ptr<facebook::velox::exec::DriverCtx, std::default_delete<facebook::velox::exec::DriverCtx> >, std::shared_ptr<facebook::velox::exec::ExchangeClient>, std::function<int (int)>)
# 8  facebook::velox::exec::Task::createDriversLocked(unsigned int)
# 9  facebook::velox::exec::Task::next(folly::SemiFuture<folly::Unit>*)
# 10 gluten::WholeStageResultIterator::next()
# 11 Java_io_glutenproject_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 12 0x00007fc37211f4a8
image

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

@ulysses-you ulysses-you added bug Something isn't working triage labels Dec 19, 2023
@ulysses-you
Copy link
Contributor Author

cc @zhztheplayer @zhouyuan @PHILO-HE

@zhouyuan
Copy link
Contributor

@ulysses-you thanks, I ran int this in some unit tests with the default memory manager used. The TPC-H/DS runs ok w/o this patch.
https://github.com/oap-project/gluten/blob/main/cpp/velox/memory/VeloxMemoryManager.h#L84

@zhztheplayer do you see the RCA why the memory manager is not initialized?

thanks, -yuan

@zhztheplayer
Copy link
Member

Somewhere in Velox still requests for the default memory pool. In the case facebook::velox::memory::spillMemoryPool().

Another options is to see if we could pass a context pool for spilling use, but that can be another topic to me.

@zhouyuan
Copy link
Contributor

@zhztheplayer the spill memory pool should also use "the same" memory manager instance. I was wondering why tpch/ds does not fail if the memory manager singleton instance is not registered.

@zhztheplayer
Copy link
Member

@zhztheplayer the spill memory pool should also use "the same" memory manager instance. I was wondering why tpch/ds does not fail if the memory manager singleton instance is not registered.

It's because Velox still has some code calling a function named deprecatedDefaultMemoryManager which sets the memory manager if missing. Not sure why the function not called in this issue's case. @ulysses-you if you could help investigating? I see the memory manager is being set via the following call stack:

facebook::velox::memory::instance Memory.cpp:34
facebook::velox::memory::MemoryManager::deprecatedGetInstance Memory.cpp:107
facebook::velox::memory::deprecatedDefaultMemoryManager Memory.cpp:290
facebook::velox::core::QueryCtx::initPool QueryCtx.h:139
facebook::velox::core::QueryCtx::QueryCtx QueryCtx.cpp:36
Java_io_glutenproject_vectorized_PlanEvaluatorJniWrapper_nativeValidateWithFailureReason VeloxJniWrapper.cc:109

@ulysses-you
Copy link
Contributor Author

@zhztheplayer it seems the nativeValidateWithFailureReason is only called at driver ?

@zhztheplayer
Copy link
Member

@zhztheplayer it seems the nativeValidateWithFailureReason is only called at driver ?

Aha correct. I was testing with local mode. Probably we should add some for local-cluster mode later.

@FelixYBW
Copy link
Contributor

Is it a new error intorduced by rebase?

@ulysses-you
Copy link
Contributor Author

@FelixYBW it is intriduced since 12.18 rebase

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants