INTERNAL Error: Invalid unicode (byte sequence mismatch) detected in segment statistics update #7263
Comments
This problem is not reliably reproducible: I ran the test suite 10 times and it occurred only once.
It looks to me like a problem in the benchmark itself: the data generator randomly produces invalid UTF-8 and inserts it into the database through the C++ API, which skips UTF-8 verification until the data reaches storage.
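To make the hypothesis above concrete, here is a minimal sketch of a check a data generator could run on each string before handing it to the database. `IsValidUtf8` and `CheckGenerated` are hypothetical helper names introduced for illustration; this is not DuckDB's own verifier, only a plain well-formedness test (lead and continuation bytes, overlong encodings, surrogates, range).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of DuckDB): true if `s` is well-formed UTF-8.
bool IsValidUtf8(const std::string &s) {
    const auto *p = reinterpret_cast<const unsigned char *>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = p[i];
        if (c < 0x80) { // single-byte ASCII
            ++i;
            continue;
        }
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if ((c & 0xE0) == 0xC0) {
            len = 2; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            len = 3; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            len = 4; cp = c & 0x07;
        } else {
            return false; // stray continuation byte or invalid lead byte
        }
        if (i + len > n) {
            return false; // sequence is cut off at the end of the input
        }
        for (std::size_t k = 1; k < len; ++k) {
            if ((p[i + k] & 0xC0) != 0x80) {
                return false; // expected a continuation byte
            }
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }
        // Reject overlong encodings, UTF-16 surrogates, and out-of-range values.
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000)) {
            return false;
        }
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            return false;
        }
        i += len;
    }
    return true;
}

// Hypothetical guard a generator could call on every value it produces.
void CheckGenerated(const std::string &value) {
    assert(IsValidUtf8(value) && "generator produced invalid UTF-8");
}
```

Running such a guard inside the generator would surface the bad value at the point where it is created, rather than later during the segment statistics update.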
Yes, that was my assumption too, which is why I reported this issue.
I have a hunch about what it could be. We could try running the benchmark with the build flag
Edit: cannot reproduce even with these flags
Reproducible parquet in #5882 (although not benchmark related).
I'm going to take a guess at the issue, from looking into it a bit. We create a stack allocated
string_t StringVector::AddStringOrBlob(Vector &vector, string_t data) {
    D_ASSERT(vector.GetType().InternalType() == PhysicalType::VARCHAR);
    if (data.IsInlined()) {
        // string will be inlined: no need to store in string heap
        return data;
    }
    if (!vector.auxiliary) {
        vector.auxiliary = make_buffer<VectorStringBuffer>();
    }
    D_ASSERT(vector.auxiliary->GetBufferType() == VectorBufferType::STRING_BUFFER);
    auto &string_buffer = (VectorStringBuffer &)*vector.auxiliary;
    return string_buffer.AddBlob(data);
}
This pointer is also not allocated, because it's supposed to be inlined, so we store the pointer in the Vector. But if
As a side note, shouldn't
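The follow-up commits in this thread mention that `area_code[0]` might be uninitialized. As a rough sketch of how that kind of bug produces random bytes, and how initialising the buffer avoids it, consider the hypothetical helper below. `MakeAreaCode` and its digit-filling loop are assumptions made for illustration and are not the benchmark's actual generator code.

```cpp
#include <cassert>
#include <string>

// Hypothetical generator helper (not the benchmark's actual code):
// formats `value` as a fixed-width, 3-character area code.
std::string MakeAreaCode(unsigned value) {
    // Initialise every byte up front. With a plain `char area_code[3];`
    // the loop below would leave area_code[0] (and possibly [1]) holding
    // indeterminate stack bytes whenever `value` has fewer than 3 digits.
    char area_code[3] = {'0', '0', '0'};
    for (int i = 2; i >= 0 && value > 0; --i) {
        area_code[i] = static_cast<char>('0' + value % 10);
        value /= 10;
    }
    std::string result(area_code, sizeof(area_code));
    // Sanity check: all three bytes are ASCII digits, hence valid UTF-8.
    for (char c : result) {
        assert(c >= '0' && c <= '9');
    }
    return result;
}
```

An indeterminate byte will usually happen to be harmless, which would be consistent with the failure showing up in only about one run out of ten.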
Problem noted by Thisj here: duckdb#7263
Problem noted by Thisj in duckdb#7263
Possible fix to duckdb#7263 (area_code[0] might be uninitialized)
Fix to duckdb#7263 (area_code[0] might be uninitialized otherwise)
What happens?
To Reproduce
Normal Performance Test, called by
OS:
Linux archlinux 6.2.12-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 20 Apr 2023 16:11:55 +0000 x86_64 GNU/Linux
DuckDB Version:
Latest Git
DuckDB Client:
Shell
Full Name:
Andreas Reichel
Affiliation:
manticore-projects Co. Ltd.
Have you tried this on the latest master branch?
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?