
[BugFix] fix large binary column overflow when column writer append #36281

Merged
merged 1 commit into from
Dec 2, 2023

Conversation

@luohaha (Contributor) commented Dec 1, 2023

Why I'm doing:
In BinaryColumn, we use `vector<uint32_t> _offset` to store each row's offset into `vector<uint8_t> _bytes`. But if `_bytes` grows larger than 4GB, the offsets exceed UINT32_MAX and overflow the `uint32`. To guard against this, PR #33521 added the check below; when the overflow actually happens, the check fails and the BE crashes.

```
template <typename T>
void BinaryColumnBase<T>::_build_slices() const {
    if constexpr (std::is_same_v<T, uint32_t>) {
        CHECK_LT(_bytes.size(), (size_t)UINT32_MAX) << "BinaryColumn size overflow";
    }
```

BE crash:
```
   @          0x5e50559 google::LogMessageFatal::~LogMessageFatal()
    @          0x27ecfaf starrocks::BinaryColumnBase<>::_build_slices()
    @          0x2757dd5 starrocks::BinaryColumnBase<>::raw_data()
    @          0x480a0da starrocks::ScalarColumnWriter::append()
    @          0x480a33c starrocks::StringColumnWriter::append()
    @          0x43a2806 starrocks::SegmentWriter::append_chunk()
    @          0x4a98da0 starrocks::VerticalRowsetWriter::add_columns()
    @          0x4a73ac7 starrocks::VerticalCompactionTask::_compact_data()
    @          0x4a74d0f starrocks::VerticalCompactionTask::_compact_column_group()
    @          0x4a75632 starrocks::VerticalCompactionTask::_vertical_compaction_data()
    @          0x4a76129 starrocks::VerticalCompactionTask::run_impl()
    @          0x4a6b6fc starrocks::CompactionTask::run()
    @          0x44d30d3 _ZNSt17_Function_handlerIFvvEZN9starrocks17CompactionManager9_scheduleEvEUlvE_E9_M_invokeERKSt9_Any_data
    @          0x4cc33d2 starrocks::ThreadPool::dispatch_thread()
    @          0x4cbde6a starrocks::Thread::supervise_thread()
    @     0x7f4658c6644b start_thread
    @     0x7f465803f40f __GI___clone
    @                0x0 (unknown)
```

What I'm doing:
Add a byte-size limit to dictionary speculation, so that speculation never batches a column large enough to overflow the `uint32` offsets.

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.2
    • 3.1
    • 3.0
    • 2.5

luohaha force-pushed the fix-bc-overflow branch 2 times, most recently from 9ead095 to 8758ff6 on December 1, 2023 at 22:24
Signed-off-by: luohaha <18810541851@163.com>

github-actions bot commented Dec 2, 2023

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)


github-actions bot commented Dec 2, 2023

[BE Incremental Coverage Report]

pass : 8 / 8 (100.00%)

file detail

path                                  covered_line  new_line  coverage  not_covered_line_detail
src/column/binary_column.cpp          1             1         100.00%   []
src/storage/rowset/column_writer.cpp  7             7         100.00%   []

@imay imay merged commit f06e2c8 into StarRocks:main Dec 2, 2023
38 checks passed

github-actions bot commented Dec 2, 2023

@Mergifyio backport branch-3.2

github-actions bot removed the 3.2 label Dec 2, 2023

github-actions bot commented Dec 2, 2023

@Mergifyio backport branch-3.1

github-actions bot removed the 3.1 label Dec 2, 2023

mergify bot commented Dec 2, 2023

backport branch-3.2

✅ Backports have been created


mergify bot commented Dec 2, 2023

backport branch-3.1

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Dec 2, 2023
…36281)

(cherry picked from commit f06e2c8)
mergify bot pushed a commit that referenced this pull request Dec 2, 2023
…36281)

(cherry picked from commit f06e2c8)
imay pushed a commit that referenced this pull request Dec 2, 2023
…backport #36281) (#36299)

Co-authored-by: Yixin Luo <18810541851@163.com>
imay pushed a commit that referenced this pull request Dec 2, 2023
…backport #36281) (#36300)

Co-authored-by: Yixin Luo <18810541851@163.com>
dingxin-tech pushed a commit to dingxin-tech/starrocks-odps that referenced this pull request Dec 12, 2023
…tarRocks#36281)

Signed-off-by: luohaha <18810541851@163.com>
Signed-off-by: 鼎昕 <zhangdingxin.zdx@alibaba-inc.com>
mergify bot added a commit that referenced this pull request Dec 12, 2023
…backport #36281) (#36299)

Co-authored-by: Yixin Luo <18810541851@163.com>
(cherry picked from commit 2af81c3)
@luohaha (Contributor, Author) commented Dec 12, 2023

@Mergifyio backport branch-3.0

@luohaha (Contributor, Author) commented Dec 12, 2023

@Mergifyio backport branch-2.5


mergify bot commented Dec 12, 2023

backport branch-3.0

✅ Backports have been created


mergify bot commented Dec 12, 2023

backport branch-2.5

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Dec 12, 2023
…36281)

(cherry picked from commit f06e2c8)
mergify bot pushed a commit that referenced this pull request Dec 12, 2023
…36281)

(cherry picked from commit f06e2c8)
wanpengfei-git pushed a commit that referenced this pull request Dec 12, 2023
…backport #36281) (#36901)

Co-authored-by: Yixin Luo <18810541851@163.com>
luohaha pushed a commit that referenced this pull request Dec 12, 2023
…backport #36281) (backport #36299) (#36900)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
trueeyu pushed a commit that referenced this pull request Mar 22, 2024
…36281)

(cherry picked from commit f06e2c8)
wanpengfei-git pushed a commit that referenced this pull request Mar 22, 2024
…backport #36281) (#36902)

Signed-off-by: Yixin Luo <18810541851@163.com>
Co-authored-by: Yixin Luo <18810541851@163.com>
trueeyu pushed a commit to trueeyu/starrocks that referenced this pull request Mar 26, 2024
…backport StarRocks#36281) (StarRocks#36902)

Signed-off-by: Yixin Luo <18810541851@163.com>
Co-authored-by: Yixin Luo <18810541851@163.com>