What happens?
Inserting a simple CSV that contains only string columns into a table deterministically corrupts the data (some values are replaced with empty strings) whenever DuckDB chooses dictionary compression for a segment.
(I found this by running
select * from pragma_storage_info('urls');
and checking the compression used by the segment that contains the corrupted data.)
After executing, e.g.,
SET force_compression = 'FSST';
and rebuilding the database from the same source CSV with the same query, the corruption disappears.
To Reproduce
The CSV and SQL files (content warning: they contain random URLs from the Internet):
https://gist.github.com/jaens/0be7a28adeec547e520ffcdc6dfc8d85
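To confirm the symptom without inspecting rows by hand, one option is to compare the number of empty cells per column in the source CSV and in a CSV exported back out of DuckDB. This is a minimal sketch, assuming the table maps column-for-column onto the source CSV and was exported with something like `COPY urls TO 'urls_export.csv' (HEADER);` (the export file name and the helper names below are hypothetical). Counting per column avoids relying on row order, which is not guaranteed to match the source.

```python
import csv
import io


def empty_counts(rows, header):
    """Count empty-string cells per column name (NULLs exported as empty count too)."""
    counts = {name: 0 for name in header}
    for row in rows:
        for name, value in zip(header, row):
            if value == "":
                counts[name] += 1
    return counts


def compare_empty_cells(source_text, exported_text):
    """Return {column: (empty_in_source, empty_in_export)} for every column
    whose number of empty cells grew after the round trip through DuckDB."""
    src = list(csv.reader(io.StringIO(source_text)))
    exp = list(csv.reader(io.StringIO(exported_text)))
    if src[0] != exp[0]:
        raise ValueError("column headers differ")
    header = src[0]
    before = empty_counts(src[1:], header)
    after = empty_counts(exp[1:], header)
    return {
        name: (before[name], after[name])
        for name in header
        if after[name] > before[name]
    }


def compare_files(source_path, exported_path):
    """Hypothetical convenience wrapper that reads both CSV files from disk."""
    with open(source_path, newline="", encoding="utf-8") as f_src, \
         open(exported_path, newline="", encoding="utf-8") as f_exp:
        return compare_empty_cells(f_src.read(), f_exp.read())


if __name__ == "__main__":
    source = "url,title\nhttps://a.example,A\nhttps://b.example,B\n"
    exported = "url,title\nhttps://a.example,\nhttps://b.example,B\n"
    print(compare_empty_cells(source, exported))  # {'title': (0, 1)}
```

A non-empty result (e.g. `compare_files("urls.csv", "urls_export.csv")`) points at the columns to check in `pragma_storage_info('urls')`; an empty result after rebuilding with `SET force_compression = 'FSST';` would be consistent with the behavior described above.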
OS:
Linux x64
DuckDB Version:
0.6.0 & master 9479be7
DuckDB Client:
CLI
Full Name:
Jaen
Affiliation:
none
Have you tried this on the latest master branch?
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?