Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DuckDB crashed in duckdb::Utf8Proc::Analyze at third_party/utf8proc/utf8proc_wrapper.cpp:74 #7348

Closed
2 tasks done
fuboat opened this issue May 3, 2023 · 5 comments · Fixed by #7442
Closed
2 tasks done
Assignees

Comments

@fuboat
Copy link

fuboat commented May 3, 2023

What happens?

The DuckDB binary (/usr/local/bin/duckdb) crashed in duckdb::Utf8Proc::Analyze at third_party/utf8proc/utf8proc_wrapper.cpp:74.

To Reproduce

PoC:

BEGIN;
CREATE TABLE t1(a VARCHAR(256) PRIMARY KEY, b INTEGER) WITH (storage_parameter='heap');
INSERT INTO t1 VALUES('    4-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', 2+1);
INSERT INTO t1 VALUES('   34-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', 18);
INSERT INTO t1 SELECT b, b+1 FROM t1 WHERE b<5;
UPDATE t1 SET a = CONCAT(a, 'x') WHERE b%2=0;

ASAN report:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==2032597==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55c833e0d9a1 bp 0x7fff1082c3f0 sp 0x7fff1082c320 T0)
==2032597==The signal is caused by a READ memory access.
==2032597==Hint: address points to the zero page.
    #0 0x55c833e0d9a0 in duckdb::Utf8Proc::Analyze(char const*, unsigned long, duckdb::UnicodeInvalidReason*, unsigned long*) /root/duckdb_master/third_party/utf8proc/utf8proc_wrapper.cpp:74
    #1 0x55c8344bf7ef in duckdb::string_t::Verify() const /root/duckdb_master/src/common/types/string_type.cpp:15
    #2 0x55c834520c39 in duckdb::Vector::Verify(duckdb::Vector&, duckdb::SelectionVector const&, unsigned long) /root/duckdb_master/src/common/types/vector.cpp:1307
    #3 0x55c834523118 in duckdb::Vector::Verify(unsigned long) /root/duckdb_master/src/common/types/vector.cpp:1440
    #4 0x55c8350c5f73 in duckdb::ExpressionExecutor::Verify(duckdb::Expression const&, duckdb::Vector&, unsigned long) /root/duckdb_master/src/execution/expression_executor.cpp:137
    #5 0x55c8350c7608 in duckdb::ExpressionExecutor::Execute(duckdb::Expression const&, duckdb::ExpressionState*, duckdb::SelectionVector const*, unsigned long, duckdb::Vector&) /root/duckdb_master/src/execution/expression_executor.cpp:217
    #6 0x55c8350c4e9c in duckdb::ExpressionExecutor::ExecuteExpression(unsigned long, duckdb::Vector&) /root/duckdb_master/src/execution/expression_executor.cpp:105
    #7 0x55c8350c403b in duckdb::ExpressionExecutor::Execute(duckdb::DataChunk*, duckdb::DataChunk&) /root/duckdb_master/src/execution/expression_executor.cpp:76
    #8 0x55c835e68981 in duckdb::ExpressionExecutor::Execute(duckdb::DataChunk&, duckdb::DataChunk&) /root/duckdb_master/src/include/duckdb/execution/expression_executor.hpp:50
    #9 0x55c835de07fd in duckdb::Index::ExecuteExpressions(duckdb::DataChunk&, duckdb::DataChunk&) /root/duckdb_master/src/storage/index.cpp:78
    #10 0x55c835052337 in duckdb::ART::Delete(duckdb::IndexLock&, duckdb::DataChunk&, duckdb::Vector&) /root/duckdb_master/src/execution/index/art/art.cpp:519
    #11 0x55c835ddf5ba in duckdb::Index::Delete(duckdb::DataChunk&, duckdb::Vector&) /root/duckdb_master/src/storage/index.cpp:48
    #12 0x55c835bfe223 in operator() /root/duckdb_master/src/storage/table/row_group_collection.cpp:557
    #13 0x55c835c48eaa in Scan<duckdb::RowGroupCollection::RemoveFromIndexes(duckdb::TableIndexList&, duckdb::Vector&, duckdb::idx_t)::<lambda(duckdb::Index&)> > /root/duckdb_master/src/include/duckdb/storage/table/table_index_list.hpp:26
    #14 0x55c835bfee9f in duckdb::RowGroupCollection::RemoveFromIndexes(duckdb::TableIndexList&, duckdb::Vector&, unsigned long) /root/duckdb_master/src/storage/table/row_group_collection.cpp:556
    #15 0x55c835df22c0 in duckdb::LocalStorage::Delete(duckdb::DataTable&, duckdb::Vector&, unsigned long) /root/duckdb_master/src/storage/local_storage.cpp:476
    #16 0x55c835dd6b50 in duckdb::DataTable::Delete(duckdb::TableCatalogEntry&, duckdb::ClientContext&, duckdb::Vector&, unsigned long) /root/duckdb_master/src/storage/data_table.cpp:984
    #17 0x55c837b4fd53 in duckdb::PhysicalUpdate::Sink(duckdb::ExecutionContext&, duckdb::GlobalSinkState&, duckdb::LocalSinkState&, duckdb::DataChunk&) const /root/duckdb_master/src/execution/operator/persistent/physical_update.cpp:110
    #18 0x55c83577657a in duckdb::PipelineExecutor::ExecutePushInternal(duckdb::DataChunk&, unsigned long) /root/duckdb_master/src/parallel/pipeline_executor.cpp:104
    #19 0x55c8357774b8 in duckdb::PipelineExecutor::FlushCachingOperatorsPush() /root/duckdb_master/src/parallel/pipeline_executor.cpp:136
    #20 0x55c8357778d1 in duckdb::PipelineExecutor::PushFinalize() /root/duckdb_master/src/parallel/pipeline_executor.cpp:157
    #21 0x55c835775491 in duckdb::PipelineExecutor::Execute(unsigned long) /root/duckdb_master/src/parallel/pipeline_executor.cpp:61
    #22 0x55c835786cb2 in duckdb::PipelineTask::ExecuteTask(duckdb::TaskExecutionMode) /root/duckdb_master/src/parallel/pipeline.cpp:37
    #23 0x55c835754309 in duckdb::ExecutorTask::Execute(duckdb::TaskExecutionMode) /root/duckdb_master/src/parallel/executor_task.cpp:17
    #24 0x55c835760f99 in duckdb::Executor::ExecuteTask() /root/duckdb_master/src/parallel/executor.cpp:402
    #25 0x55c83531e511 in duckdb::ClientContext::ExecuteTaskInternal(duckdb::ClientContextLock&, duckdb::PendingQueryResult&) /root/duckdb_master/src/main/client_context.cpp:439
    #26 0x55c8353676a8 in duckdb::PendingQueryResult::ExecuteTaskInternal(duckdb::ClientContextLock&) /root/duckdb_master/src/main/pending_query_result.cpp:53
    #27 0x55c8353679a4 in duckdb::PendingQueryResult::ExecuteInternal(duckdb::ClientContextLock&) /root/duckdb_master/src/main/pending_query_result.cpp:58
    #28 0x55c835368004 in duckdb::PendingQueryResult::Execute() /root/duckdb_master/src/main/pending_query_result.cpp:70
    #29 0x55c83536a5da in duckdb::PreparedStatement::Execute(duckdb::vector<duckdb::Value, std::allocator<duckdb::Value> >&, bool) /root/duckdb_master/src/main/prepared_statement.cpp:77
    #30 0x55c833da8367 in duckdb_shell_sqlite3_print_duckbox /root/duckdb_master/tools/sqlite3_api_wrapper/sqlite3_api_wrapper.cpp:238
    #31 0x55c833d38864 in exec_prepared_stmt /root/duckdb_master/tools/shell/shell.c:12898
    #32 0x55c833d3be85 in shell_exec /root/duckdb_master/tools/shell/shell.c:13233
    #33 0x55c833d719fb in runOneSqlLine /root/duckdb_master/tools/shell/shell.c:20075
    #34 0x55c833d72f39 in process_input /root/duckdb_master/tools/shell/shell.c:20193
    #35 0x55c833d77a00 in main /root/duckdb_master/tools/shell/shell.c:21014
    #36 0x7f5a60b8f082 in __libc_start_main ../csu/libc-start.c:308
    #37 0x55c833cd970d in _start (/root/duckdb_master/bin/usr/local/bin/duckdb+0xd7e470d)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /root/duckdb_master/third_party/utf8proc/utf8proc_wrapper.cpp:74 in duckdb::Utf8Proc::Analyze(char const*, unsigned long, duckdb::UnicodeInvalidReason*, unsigned long*)
==2032597==ABORTING

OS:

Ubuntu 20.04 64bit

DuckDB Version:

v0.7.2-dev2867 aa20f17

DuckDB Client:

Binary (/usr/local/bin/duckdb)

Full Name:

Jingzhou Fu

Affiliation:

Wingtecher Lab of Tsinghua University

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree
@c8ef
Copy link

c8ef commented May 5, 2023

Seems that cannot reproduce on the latest branch: 97f3f09

@fuboat
Copy link
Author

fuboat commented May 5, 2023

Seems that cannot reproduce on the latest branch: 97f3f09

A little bit strange, it seems only reproducible when compiling with -DCMAKE_BUILD_TYPE=Debug. My steps in bash are the following:

git clone https://github.com/duckdb/duckdb.git duckdb_97f3f09
cd duckdb_97f3f09/
git checkout 97f3f09
mkdir bld
cd bld/
cmake .. -DCMAKE_BUILD_TYPE=Debug
make -j20
echo "BEGIN;                                 
CREATE TABLE t1(a VARCHAR(256) PRIMARY KEY, b INTEGER) WITH (storage_parameter='heap');
INSERT INTO t1 VALUES('    4-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', 2+1);
INSERT INTO t1 VALUES('   34-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', 18);
INSERT INTO t1 SELECT b, b+1 FROM t1 WHERE b<5;
UPDATE t1 SET a = CONCAT(a, 'x') WHERE b%2=0;" | ./duckdb

@xuke-hat
Copy link
Contributor

xuke-hat commented May 5, 2023

Seems that cannot reproduce on the latest branch: 97f3f09

I can reproduce it in the latest branch. Did you try debug build with asan?

Minimal reproduction:

BEGIN;
CREATE TABLE t1(a VARCHAR(10) PRIMARY KEY, b INTEGER);
INSERT INTO t1 VALUES('aa', 1);
INSERT INTO t1 VALUES('bb', 2);
INSERT INTO t1 SELECT b, b FROM t1;
UPDATE t1 SET a = a || b WHERE b%2=0;

The problem is it tried to delete 2nd and 4th tuple and reinsert, but the scan only returns the 1st and 2nd tuple, and then it tries to access 4th tuple, which is an invalid address.

@xuke-hat
Copy link
Contributor

xuke-hat commented May 5, 2023

I have read the code and the problem seems to be at RowGroupCollection::RemoveFromIndexes

It just use the row group that includes row_ids[0], is it correct? In this case row_ids[0] and row_ids[1] are in different row group.

May be we need to loop through all row group that includes row_id?

@Mytherin Mytherin self-assigned this May 10, 2023
Mytherin added a commit to Mytherin/duckdb that referenced this issue May 10, 2023
…y account for the case where the row identifiers might not all be present in the same row group
@Mytherin
Copy link
Collaborator

Thanks for the report! I have pushed a fix in #7442.

Indeed the issue is that the row_id access is out of range in RowGroupCollection::RemoveFromIndexes. This comes from an assumption that all row groups are completely filled until the maximum aside from the last row group - which is no longer the case since last release when we reworked our storage to allow for partially-full row groups. As a result of this sequence of queries there are two row groups (Row group A, count 2) -> (Row group B, count 1). This code then incorrectly assumed since there are 3 tuples in the table, fetching a vector from row group A would fetch all 3 tuples.

The fix is to correctly check if the tuples are present in the current row group, and fetch the next row group if required.

Mytherin added a commit that referenced this issue May 11, 2023
Fix #7348 - In RowGroupCollection::RemoveFromIndexes - correctly account for the case where the row identifiers might not all be present in the same row group
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants