Guarantee failed: [get_safety_boundary() >= num_elements] #7005

Open
zedtux opened this issue Feb 1, 2022 · 2 comments

Comments

zedtux commented Feb 1, 2022

We run a single instance of RethinkDB 2.4.1~0bionic (CLANG 6.0.0 (tags/RELEASE_600/final)) on Ubuntu 18.04.6 LTS. It crashes at random: 3 times in January, 2 times in December, never in November, 2 times in October, and so forth.

Here are the crash logs:

2022-01-30T06:31:35.445342898 1725898.209731s error: Error in thread 3 in ./src/containers/shared_buffer.hpp at line 81:
2022-01-30T06:31:35.449719119 1725898.214078s error: Guarantee failed: [get_safety_boundary() >= num_elements]
2022-01-30T06:31:35.449799992 1725898.214159s error: Backtrace:
2022-01-30T06:31:36.585571618 1725899.349935s error: Sun Jan 30 06:31:35 2022\n\n1 [0x796330]: backtrace_t::backtrace_t() at ??:?\n2 [0x796fa6]: lazy_backtrace_formatter_t::lazy_backtrace_formatter_t() at ??:?\n3 [0x795d38]: format_backtrace[abi:cxx11](bool) at ??:?\n4 [0x7f951d]: report_fatal_error(char const*, int, char const*, ...) at ??:?\n5 [0x9ea90e]: ql::datum_get_element_offset(shared_buf_ref_t<char> const&, unsigned long) at ??:?\n6 [0x9f24d9]: ql::datum_t::unchecked_get_pair(unsigned long) const at ??:?\n7 [0x9ee1ca]: ql::datum_t::get_field(datum_string_t const&, ql::throw_bool_t) const at ??:?\n8 [0x9ee0f2]: ql::datum_t::is_ptype() const at ??:?\n9 [0x95a787]: ql::obj_or_seq_op_impl_t::eval_impl_dereferenced(ql::term_t const*, ql::scope_env_t*, ql::args_t*, scoped_ptr_t<ql::val_t> const&, std::function<scoped_ptr_t<ql::val_t> ()>) const at ??:?\n10 [0x95ca05]: ql::bracket_term_t::eval_impl(ql::scope_env_t*, ql::args_t*, ql::eval_flags_t) const at ??:?\n11 [0xa02c5b]: ql::op_term_t::term_eval(ql::scope_env_t*, ql::eval_flags_t) const at ??:?\n12 [0x85f55a]: ql::runtime_term_t::eval_on_current_stack(ql::scope_env_t*, ql::eval_flags_t) const at ??:?\n13 [0x85f712]: ql::runtime_term_t::eval(ql::scope_env_t*, ql::eval_flags_t) const at ??:?\n14 [0xa03230]: ql::op_term_t::maybe_grouped_data(ql::scope_env_t*, ql::argvec_t*, ql::eval_flags_t, counted_t<ql::grouped_data_t>*, scoped_ptr_t<ql::val_t>*) const at ??:?\n15 [0xa0274b]: ql::op_term_t::term_eval(ql::scope_env_t*, ql::eval_flags_t) const at ??:?\n16 [0x85f55a]: ql::runtime_term_t::eval_on_current_stack(ql::scope_env_t*, ql::eval_flags_t) const at ??:?\n17 [0x85f712]: ql::runtime_term_t::eval(ql::scope_env_t*, ql::eval_flags_t) const at ??:?\n18 [0x9a5632]: ql::reql_func_t::call(ql::env_t*, std::vector<ql::datum_t, std::allocator<ql::datum_t> > const&, ql::eval_flags_t) const at ??:?\n19 [0x9a7fe0]: ql::reql_func_t::filter_helper(ql::env_t*, ql::datum_t) const at ??:?\n20 [0x9a8ae7]: 
ql::func_t::filter_call(ql::env_t*, ql::datum_t, counted_t<ql::func_t const>) const at ??:?\n21 [0x92b6e2]: ql::filter_trans_t::lst_transform(ql::env_t*, std::vector<ql::datum_t, std::allocator<ql::datum_t> >*, std::function<ql::datum_t ()> const&) at ??:?\n22 [0x929425]: ql::ungrouped_op_t::operator()(ql::env_t*, std::map<ql::datum_t, std::vector<ql::datum_t, std::allocator<ql::datum_t> >, optional_datum_less_t, std::allocator<std::pair<ql::datum_t const, std::vector<ql::datum_t, std::allocator<ql::datum_t> > > > >*, std::function<ql::datum_t ()> const&) at ??:?\n23 [0x8b6bd6]: rget_cb_t::handle_pair(scoped_key_value_t&&, unsigned long, optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, concurrent_traversal_fifo_enforcer_signal_t) at ??:?\n24 [0xaa871e]: concurrent_traversal_adapter_t::handle_pair_coro(scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t) at ??:?\n25 [0xaa8825]: callable_action_instance_t<std::_Bind<void (concurrent_traversal_adapter_t::*(concurrent_traversal_adapter_t*, scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t))(scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t)> >::run_action() at ??:?\n26 [0x781af4]: coro_t::run() at ??:?
2022-01-30T06:31:36.587596867 1725899.351955s error: Exiting.

Restarting the DB solves the problem.

Is there anything I could do to help you track down this bug?

Contributor

srh commented Feb 7, 2022

The bug is most likely caused by corrupt data on disk. That could come from a hardware bitflip, or possibly from a RethinkDB bug that wrote incorrect data, either in the code that manages blocks on disk or in the code that serializes a ReQL object into a byte stream. It turns into a crash when a particular object gets referenced. Maybe it only occurs if a specific replica is used to retrieve the data.
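To illustrate how a single flipped bit can turn into a failed bounds check only when one particular object is read, here is a minimal sketch. It uses a made-up length-prefixed format, not RethinkDB's actual serialization, and `encode`, `decode`, and `flip_bit` are hypothetical names chosen for the example.

```python
import struct


def encode(values):
    """Toy format (an assumption for illustration): u32 length prefix, then UTF-8 bytes."""
    out = bytearray()
    for value in values:
        raw = value.encode("utf-8")
        out += struct.pack("<I", len(raw)) + raw
    return bytes(out)


def decode(buf):
    """Decode the toy format, checking bounds before every read."""
    values, pos = [], 0
    while pos < len(buf):
        if len(buf) - pos < 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        remaining = len(buf) - pos
        # Comparable in spirit to a "boundary >= num_elements" guarantee.
        if remaining < length:
            raise ValueError(f"bounds check failed: need {length}, have {remaining}")
        values.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    return values


def flip_bit(buf, byte_index, bit):
    """Return a copy of buf with one bit inverted, simulating on-disk corruption."""
    damaged = bytearray(buf)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)
```

Flipping one bit inside a length prefix makes the decoder ask for more bytes than the buffer holds, so the check fails, while every undamaged buffer still decodes normally. That would be consistent with a crash that only shows up when a specific document is touched.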

For your sake, it would be good to figure out what document has this problem. I guess the first step would be to figure out what query caused the crash. It seems to be a group-by query.

If this theory is correct, enumerating every object in the database would likely trigger the crash. If you can do this without crashing, in such a manner that the data gets retrieved from every replica, then this explanation is probably wrong.
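A rough sketch of such an enumeration, assuming the Python `rethinkdb` driver (2.4.x) is installed and the server listens on the default client port. The helper names `scan_all_documents` and `rethinkdb_sources` are made up for this example, and the sketch does not control which replica serves each read.

```python
def scan_all_documents(list_tables, read_table):
    """Read every document of every table and return per-table counts.

    list_tables() yields (db_name, table_name) pairs; read_table(db, table)
    returns an iterable of documents. Both are passed in so the loop does not
    depend on a particular driver.
    """
    counts = {}
    for db_name, table_name in list_tables():
        # Printed before reading: if the server dies mid-scan, the last
        # line printed names the table that was being read.
        print(f"scanning {db_name}.{table_name}", flush=True)
        seen = 0
        for _doc in read_table(db_name, table_name):
            seen += 1
        counts[(db_name, table_name)] = seen
    return counts


def rethinkdb_sources(host="localhost", port=28015):
    """Build the two callables on top of the rethinkdb Python driver (assumed installed)."""
    from rethinkdb import RethinkDB  # third-party package, imported lazily

    r = RethinkDB()
    conn = r.connect(host=host, port=port)

    def list_tables():
        for db_name in r.db_list().run(conn):
            if db_name == "rethinkdb":  # skip the system database
                continue
            for table_name in r.db(db_name).table_list().run(conn):
                yield db_name, table_name

    def read_table(db_name, table_name):
        # A plain table read returns a cursor over all documents.
        return r.db(db_name).table(table_name).run(conn)

    return list_tables, read_table
```

Against a live server this would be called as `scan_all_documents(*rethinkdb_sources())`. If the scan completes for every table, the corrupt-document theory becomes less likely. If the connection drops partway, the last table printed is where to look next.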

Reformatting the backtrace:


2022-01-30T06:31:35.449719119 1725898.214078s error: Guarantee failed: [get_safety_boundary() >= num_elements]
2022-01-30T06:31:35.449799992 1725898.214159s error: Backtrace:
2022-01-30T06:31:36.585571618 1725899.349935s error: Sun Jan 30 06:31:35 2022
1 [0x796330]: backtrace_t::backtrace_t() at ??:?
2 [0x796fa6]: lazy_backtrace_formatter_t::lazy_backtrace_formatter_t() at ??:?
3 [0x795d38]: format_backtrace[abi:cxx11](bool) at ??:?
4 [0x7f951d]: report_fatal_error(char const*, int, char const*, ...) at ??:?
5 [0x9ea90e]: ql::datum_get_element_offset(shared_buf_ref_t<char> const&, unsigned long) at ??:?
6 [0x9f24d9]: ql::datum_t::unchecked_get_pair(unsigned long) const at ??:?
7 [0x9ee1ca]: ql::datum_t::get_field(datum_string_t const&, ql::throw_bool_t) const at ??:?
8 [0x9ee0f2]: ql::datum_t::is_ptype() const at ??:?
9 [0x95a787]: ql::obj_or_seq_op_impl_t::eval_impl_dereferenced(ql::term_t const*, ql::scope_env_t*, ql::args_t*, scoped_ptr_t<ql::val_t> const&, std::function<scoped_ptr_t<ql::val_t> ()>) const at ??:?
10 [0x95ca05]: ql::bracket_term_t::eval_impl(ql::scope_env_t*, ql::args_t*, ql::eval_flags_t) const at ??:?
11 [0xa02c5b]: ql::op_term_t::term_eval(ql::scope_env_t*, ql::eval_flags_t) const at ??:?
12 [0x85f55a]: ql::runtime_term_t::eval_on_current_stack(ql::scope_env_t*, ql::eval_flags_t) const at ??:?
13 [0x85f712]: ql::runtime_term_t::eval(ql::scope_env_t*, ql::eval_flags_t) const at ??:?
14 [0xa03230]: ql::op_term_t::maybe_grouped_data(ql::scope_env_t*, ql::argvec_t*, ql::eval_flags_t, counted_t<ql::grouped_data_t>*, scoped_ptr_t<ql::val_t>*) const at ??:?
15 [0xa0274b]: ql::op_term_t::term_eval(ql::scope_env_t*, ql::eval_flags_t) const at ??:?
16 [0x85f55a]: ql::runtime_term_t::eval_on_current_stack(ql::scope_env_t*, ql::eval_flags_t) const at ??:?
17 [0x85f712]: ql::runtime_term_t::eval(ql::scope_env_t*, ql::eval_flags_t) const at ??:?
18 [0x9a5632]: ql::reql_func_t::call(ql::env_t*, std::vector<ql::datum_t, std::allocator<ql::datum_t> > const&, ql::eval_flags_t) const at ??:?
19 [0x9a7fe0]: ql::reql_func_t::filter_helper(ql::env_t*, ql::datum_t) const at ??:?
20 [0x9a8ae7]: ql::func_t::filter_call(ql::env_t*, ql::datum_t, counted_t<ql::func_t const>) const at ??:?
21 [0x92b6e2]: ql::filter_trans_t::lst_transform(ql::env_t*, std::vector<ql::datum_t, std::allocator<ql::datum_t> >*, std::function<ql::datum_t ()> const&) at ??:?
22 [0x929425]: ql::ungrouped_op_t::operator()(ql::env_t*, std::map<ql::datum_t, std::vector<ql::datum_t, std::allocator<ql::datum_t> >, optional_datum_less_t, std::allocator<std::pair<ql::datum_t const, std::vector<ql::datum_t, std::allocator<ql::datum_t> > > > >*, std::function<ql::datum_t ()> const&) at ??:?
23 [0x8b6bd6]: rget_cb_t::handle_pair(scoped_key_value_t&&, unsigned long, optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, concurrent_traversal_fifo_enforcer_signal_t) at ??:?
24 [0xaa871e]: concurrent_traversal_adapter_t::handle_pair_coro(scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t) at ??:?
25 [0xaa8825]: callable_action_instance_t<std::_Bind<void (concurrent_traversal_adapter_t::*(concurrent_traversal_adapter_t*, scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t))(scoped_key_value_t*, semaphore_acq_t*, fifo_enforcer_write_token_t, auto_drainer_t::lock_t)> >::run_action() at ??:?
26 [0x781af4]: coro_t::run() at ??:?
2022-01-30T06:31:36.587596867 1725899.351955s error: Exiting.
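For anyone who wants to reformat a similar log entry themselves, here is a small sketch. It assumes the backtrace arrives as one log entry with literal `\n` separators, as in the original report; `split_backtrace` and `parse_frame` are hypothetical helper names.

```python
import re

# One frame looks like: "5 [0x9ea90e]: some::symbol(args) at ??:?"
FRAME_RE = re.compile(r"^(\d+) \[(0x[0-9a-f]+)\]: (.*) at (\S+)$")


def split_backtrace(log_entry):
    """Split one backtrace log entry into lines.

    Real line breaks are removed first, because a copy/paste can wrap the
    entry in the middle of a frame. Then each literal backslash-n pair is
    treated as a separator.
    """
    joined = log_entry.replace("\n", "")
    parts = joined.replace("\\n", "\n").split("\n")
    return [part.strip() for part in parts if part.strip()]


def parse_frame(line):
    """Return (number, address, symbol, location), or None for non-frame lines."""
    match = FRAME_RE.match(line)
    if match is None:
        return None
    number, address, symbol, location = match.groups()
    return int(number), address, symbol, location
```

Pass only the text of the backtrace entry, not the whole log, since real line breaks are discarded before splitting.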

Author

zedtux commented Feb 7, 2022

Thank you @srh for your reply!

I will review our source code to look for a group-by query and see if I can do as you said. I'll get back to you.
