v0.10.3 Bugfix Release
This is a bug fix release for various issues discovered after we released 0.10.2. There are no new major features, just bug fixes. Database files created by DuckDB v0.10.* or v0.9.* can be read by DuckDB v0.10.3.
Highlights
Even though this is "only" a bug fix release, there have been some major areas of work that warrant a separate mention:
- We have added a feature to update extensions using the UPDATE EXTENSIONS syntax #11677
- There have been some serious internal improvements around checkpointing. Most notably, checkpoints can now run while other connections are reading, and they no longer block new connections while checkpointing #11918. Also, FORCE CHECKPOINT no longer actively cancels transactions; it now waits until it can checkpoint #12061
- DuckDB now has native support for loading data from HuggingFace using the hf:// prefix #11831
- We have slightly changed NULL casting behaviour with the MAP type #11745
- The Java JDBC driver has been moved to its own repository: https://github.com/duckdb/duckdb-java #11873
- DuckDB now compiles cleanly with -Wconversion, and all conversions are actually checked #11716, #11673
What's Changed
- Add setting to control the maximum swap space by @Tishj in #10978
- [Python][Dev] Dynamically generate the Connection wrapper methods by @Tishj in #11202
- Fixes duckdb wasm by @carlopi in #11688
- Checked conversions between signed and unsigned integers by @hannes in #11673
- Bump Julia to v0.10.2 by @Mytherin in #11700
- Minor improvements to sql_reduce script by @Mytherin in #11701
- Properly avoid build-time dependency on Python by @carlopi in #11713
- Test dockerized compilation in Alpine:latest and Ubuntu:20.04 by @carlopi in #11708
- [COPY CSV] Enable TIMESTAMP_TZ formats by @Tishj in #11711
- Full conversion warnings / checks by @hannes in #11716
- [Safety] Add safety checks to shared_ptr access by @Tishj in #11696
- Remove bound_defaults from BoundCreateTableInfo by @Mytherin in #11721
- Improve mkdir error reporting by @Mytherin in #11723
- [Dev] Fix failing CI in Python SQLLogicTest Runner by @Tishj in #11724
- More docker tests, fix compilation up to C++23 standard by @carlopi in #11725
- Upload staging: from 'git describe --tags' to 'git log -1' by @carlopi in #11715
- Internal #1848: Window Progress by @hawkfish in #11702
- Remove BoundConstraint from the TableCatalogEntry by @Mytherin in #11735
- Implicit Cast for any Date/Timestamp by @pdet in #11733
- feat: rewrite which_secret() into a table function by @stephaniewang526 in #11726
- [Map] Rework MAP creation method behavior when input is NULL by @Tishj in #11730
- [Dev] Always use SQLStatement->Copy() when ALTERNATIVE_VERIFY is defined by @Tishj in #11732
- Reconstruct Error Messages for Flush Cast by @pdet in #11736
- Getting Rid of Value.TryCast in the CSV Sniffer by @pdet in #11717
- Fix Join order optimizer so that plan generation is always via the most current entry in the DP table. by @Tmonster in #11719
- fix(py): support DuckDBPyType#children for array and enum by @Mause in #11754
- Consider not null values when doing export database by @pdet in #11679
- Add missing space in error message by @szarnyasg in #11759
- Allow to build python packages without c++ sources by @carlopi in #11758
- No Mark to Semi join conversion in statistics propagation by @Tmonster in #11596
- Hive partitioned write: lazy partitioning initialization by @Mytherin in #11765
- Hive partitioning: avoid calling CreateDirectories for every flush, instead create the directory for a partition only when that partition is instantiated by @Mytherin in #11777
- [Parquet] Support reading the non-standard NULL ConvertedType by @Tishj in #11774
- Only store CSV Errors if we are doing rejects table, otherwise just ignore it. by @pdet in #11763
- CI: Add job for 'expected behavior' label by @szarnyasg in #11784
- Move recursive_query_csv.test to slow test by @pdet in #11770
- [StatementVerifier] Fix up issues in ToString implementations of classes derived from SQLStatement by @Tishj in #11625
- Hive partitioning: make OVERWRITE_OR_IGNORE remove files on local file systems by @Mytherin in #11787
- [ODBC] Add ODBC Test for Database Reconnection and Data Persistence by @maiadegraaf in #11783
- Correctly parse dollar-quoted strings in sqlite3_complete and linenoise by @Mytherin in #11789
- Add a configurable compression_level parameter to the parquet writer by @Mytherin in #11791
- Close file after file lock failure by @awitten1 in #11795
- Python: Add missing options to write_parquet by @jzavala-gonzalez in #11790
- [PythonDev] Fix up failing tests in CI by @Tishj in #11801
- Fix static bitpacking_width_t FindMinimumBitWidth(T *values, idx_t count) in class BitpackingPrimitives by @Lloyd-Pottiger in #11757
- Add note on CMAKE_BUILD_PARALLEL_LEVEL by @mlafeldt in #11808
- Elaborate on internal errors by @szarnyasg in #11816
- Fix #11756: Don't throw exception on CREATE UNIQUE INDEX IF NOT EXISTS if index already exists by @ewencp in #11821
- Python CI fixes: skip two tests by @carlopi in #11818
- Fix #11798 - lateral join parameters should not be visible in views by @Mytherin in #11825
- Fix #11804: make sure json_type can check null by @lnkuiper in #11807
- Fixing performance regression in [u]hugeint cast by @hannes in #11829
- [Dev] ClientContextWrapper yak shaving by @Tishj in #11830
- [Python] Add checkpoint method, improve shutdown experience by @Tishj in #11810
- [Benchmark] Enable benchmarking result collection by @Tishj in #11529
- [DependencyManager] Create dependencies between foreign key tables and primary key tables. by @Tishj in #11524
- [Python] Synchronize defaults of DuckDBPyRelation method fetch_df_chunk by @Tishj in #11834
- Internal #1888 TIMETZ Collation Keys by @hawkfish in #11861
- Removing old code that used to check if a buffer was the last buffer from the file handler by @pdet in #11846
- Use ToSQLString() in ConstantFilter for escaped filter output by @rcurtin in #11797
- [StatementVerifier] Add ToString for every remaining SQLStatement, is pure virtual now by @Tishj in #11788
- Pushdown Tables Types to CSV Scanner by @pdet in #11792
- [Python Dev] Fix shift between requirements-dev.txt and pyproject.toml before-test section by @Tishj in #11863
- Join order optimizer asan bug Follow up by @Tmonster in #11794
- BugFix: Introducing Delim Joins and Delim_Get(s) should respect positionally by @Tmonster in #11812
- Provide the native OID of PG type in pg_type by @goldmedal in #11746
- Move JDBC (Java) Driver to Separate Repo by @hannes in #11873
- Link Java client in issue template by @szarnyasg in #11877
- Change specificity of sniffed types to check time related types earlier by @pdet in #11878
- fix complex top n test case for constant vector verification by @Tmonster in #11882
- [Dev] Merge overloads for HUGEINT cast functions by @Tishj in #11879
- Make " default for quote and " default for escape by @pdet in #11880
- Set secret directory to a test directory when running sqllogictest by @Mytherin in #11885
- Bugfixes by @lnkuiper in #11785
- [Map] Rework interaction (entries, keys, values, extract) of NULL MAPs by @Tishj in #11745
- Add case when expression for grouping sets when collations are used. by @Tmonster in #11884
- Internal #11892: Interval Quarter Keyword by @hawkfish in #11898
- HTTP Logging by @lnkuiper in #11771
- [Dev] Use strings in the SQLLogicTest REQUIRE calls so they are visible with -s by @Tishj in #11714
- [Dev] Fix a SerializationException on CopyInfo by @Tishj in #11902
- MultiFileReader refactor by @samansmink in #11806
- Allow checkpoints to run while other connections are reading, and no longer block new connections while checkpointing by @Mytherin in #11918
- Allow converting TIMETZ to Arrow by @LoganDark in #11906
- Issue #11894: MIN/MAX_BY DECIMAL Casting by @hawkfish in #11912
- Issue #1917: WinNode 22 Compilation by @hawkfish in #11913
- [Relation] Add MaterializedRelation by @Tishj in #11835
- Enable purging of BufferPool pages based on time-since-last-unpinned by @jkub in #11441
- Correctly render duckbox for empty results by @Mytherin in #11920
- Always store transactions that had errors during the commit phase by @Mytherin in #11929
- More anonymous struct zapping in RE2 by @hannes in #11956
- Add the corrupt block location to the exception by @Vegetable26 in #11966
- Fix assertion in bitpacking by @nickgerrets in #11955
- [Python] Add CoalesceOperator to Python Expression API. by @Tishj in #11941
- CMake: Handle git failures on invalid inputs better by @carlopi in #11951
- Internal #2005: DISTINCT ORDER BY by @hawkfish in #11967
- Fix overlooked function argument rename that leads to seg faults. by @smonkewitz in #11969
- [Nightly] Block size test fixes by @taniabogatsch in #11972
- Optimizing InsertionSort by reducing the size of the comparison by @gitccl in #11964
- [Python] Keep referenced Python objects alive by @Tishj in #11761
- Move mysql_scanner into main duckdb CI by @carlopi in #11999
- Fix CURRENT_SETTING with a NULL string arg by @gitccl in #12015
- Issue #12009: APPROX_QUANTILE NULL List by @hawkfish in #12014
- Issue #12003: TIMESTAMP Stack Overflow by @hawkfish in #12012
- fix extension load error message grammar by @softprops in #11994
- [Python] Fix InternalException from scanning Polars DF with no columns by @Tishj in #11982
- Issue #11959: TIMESTAMPTZ >= DATE by @hawkfish in #11987
- More fixes for RE2 to pass CRAN tests by @hannes in #11978
- chore: update exception message by @stephaniewang526 in #11965
- Issue #12005: RESERVOIR_QUANTILE DECIMAL Binding by @hawkfish in #12013
- [Python] Grab the GIL in the destructor of PyFilesystem by @Tishj in #11980
- [Python] Make the NumPy module optional, not throwing if it's not installed by @Tishj in #11981
- Add support for HuggingFace to httpfs by @samansmink in #11831
- [Fix] lambda binding in ALTER TABLE statements by @taniabogatsch in #11976
- Distinguish between exact and case insensitive matching JSON keys in json_structure by @lnkuiper in #11948
- Rework index binding by @Maxxen in #11867
- Issue #11995: TIMESTAMP Rounding by @hawkfish in #12011
- Fix sample serialization by @Tmonster in #12025
- Correctly skipping errors when ignore_errors is set and we have columns with escaped values by @pdet in #12027
- Update comment to reflect correct data state post-compression by @wangxuqi in #12022
- Fix ordering issue with nested list type by @gitccl in #11937
- Adding Fix to properly pass timestamp/date formats in the relational API for CSV Files by @pdet in #12029
- Add more MultiFilereader features/hooks by @samansmink in #11984
- Rethrow serialization errors by @carlopi in #12030
- Move yyjson into core by @Maxxen in #11998
- Bugfixes + large allocation hardening by @Maxxen in #12028
- Ensure HT capacity is greater than lower bound by @lnkuiper in #12039
- Fix materialized CTE plan issue by @kryonix in #11874
- Fix some fuzzer issues by @hannes in #12043
- [Fix] Return NULL for deprecated getter calls in the C API by @taniabogatsch in #12035
- Grab checkpoint lock during storage metadata reads by @Mytherin in #12053
- Issue #12041: TIMETZ Parquet Nanoseconds by @hawkfish in #12052
- Parquet: Correctly return min/max string stats if empty by @lnkuiper in #12054
- Even more fuzzer fixes by @Maxxen in #12050
- [Fix] Silent constraint violation error when destroying the appender in the C API by @taniabogatsch in #12051
- Add "Tags" support to catalog entries by @Maxxen in #12044
- Rework FORCE CHECKPOINT - instead of actively cancelling transactions it now blocks until it can checkpoint by @Mytherin in #12061
- Aggregation bugfixes by @lnkuiper in #12055
- [Fix] Disable test for block size nightly run by @taniabogatsch in #12062
- Bind art index in local storage by @Maxxen in #12064
- Cast keys to VARCHAR before creating JSON from MAP by @lnkuiper in #12065
- [Python] Add pyspark hash and organize unit tests by @mariotaddeucci in #11935
- Check context.interrupted during force checkpoint by @Mytherin in #12068
- [Fix] Lazy WAL creation by @taniabogatsch in #12049
- Test docker images: improvement and connected fixes by @carlopi in #12026
- More fuzzer fixes by @hannes in #12045
- [Python] Add pyspark null functions by @mariotaddeucci in #11940
- CI fixes: unused variable & toolchain version by @carlopi in #12083
- Add autoloading for delta extension by @samansmink in #12063
- S3FileHandle Destructor should call Close() conditionally by @onderkalaci in #12031
- [Fix] Internal segment tree exception in on conflict clause by @taniabogatsch in #12084
- Remove ClientContext usage in Checkpoint Reader by @Mytherin in #12076
- Fixed Parquet crash on missing dictionary by @hannes in #12085
- [Fix] Add lambda binding to the HAVING binder by @taniabogatsch in #12070
- Decimal/Time implicit casting + Multi-Error store in Flush by @pdet in #11848
- [Testing Infra Fix] Make input data chunks immutable in the vector verification tests by @taniabogatsch in #12088
- Correctly rewrite correlated columns inside window functions by @Mytherin in #12087
- Fix #11780 - handle qualifications in ORDER BY of ARRAY clause by @Mytherin in #12090
- Nightly CI fixes by @Mytherin in #12093
- Change ExtensionOptimizer input by @Maxxen in #12094
- Fix for issue related to the execution of union by all from .sql in Python by @pdet in #12098
- yyjson bump version to 2020 by @carlopi in #12072
- [Dev] Collect CatalogEntry Dependencies during Binding by @Tishj in #11493
- Internal #2040: ICU Collation Serialisation by @hawkfish in #12077
- Run python tests in Pyodide build by @cpcloud in #11914
- Add support for type modifiers on extension types by @Maxxen in #12081
- Bump extensions by @carlopi in #12107
- fix huggingface credential_chain autoload issue by @samansmink in #12112
- Fix fuzzer issue 2690 by @lnkuiper in #12108
- Throw exception in case of WAL failure instead of only printing a message by @Mytherin in #12091
- Change type of columns from sniff_csv to list of structs by @pdet in #12099
- [Python][Dev] Skip statements with decorators (only if, skip if) in the Python SQLLogicTester by @Tishj in #12102
- Mark unspecialized C++ Append template as delete by @j1ah0ng in #12116
- SQLLogicTest - skip these tests now that we have dependencies between views by @Mytherin in #12118
- Correctly determine if we need to scan flat vectors in all cases - and add an enum to clarify code by @Mytherin in #12119
- Avoid signed integer overflow in sequence generation by @Mytherin in #12120
- Use Binder::BindCreateTableCheckpoint in WAL ReplayCreateTable by @Mytherin in #12121
- Avoid checking if wal is set directly and call GetWALSize instead - a WAL might be present even if wal is not set by @Mytherin in #12124
- Call StringVector::AddString here for when inlining is disabled by @Mytherin in #12125
- Minor fixes for vsize=2 tests by @Mytherin in #12126
- Internal #2078: Nested Nulls First by @hawkfish in #12131
- Bump extensions, part 2 by @carlopi in #12122
- Internal #2081: Window Distinct Reset by @hawkfish in #12130
- Read scan count once instead of once per vector to avoid issue where scan counts between vectors could become mis-aligned in concurrent scenarios by @Mytherin in #12135
- Extension Updating by @samansmink in #11677
- Move pyodide from repository_dispatch to NightlyTests.yml by @carlopi in #12153
- [Storage] Add storage_compatibility_version to control for what version the DB has to be serialized. by @Tishj in #12110
- Allow quotes to be escaped in JSON path by @lnkuiper in #12033
- [Python] Fix issue in the SQLLogicTestRunner implementation by @Tishj in #12155
- Higher memory limit for test by @lnkuiper in #12158
- Fix internal error of list_zip and map_concat by @gitccl in #12086
- fix row format of arrays larger than vector size with null by @Maxxen in #12143
- Issue #12136: Streaming Window Structs by @hawkfish in #12150
- Set max vector size to 128GB instead of 4GB by @Mytherin in #12144
- Pass prepared statement parameters to OnExecutePrepared callback by @Mytherin in #12156
- In string to list try_cast - set the target index to NULL, not the source index by @Mytherin in #12160
- More Nightly CI Fixes by @Mytherin in #12154
- Fixing unchecked malloc() calls in Parser and elsewhere by @hannes in #12162
- Modify the pandas analyzer code to always respect the sample size by @pdet in #12097
- Allow community extensions: add setting and keys by @carlopi in #12152
- Fixing parquet dictionary / data page offset bug by @hannes in #12109
- small fix to extension origin checks and direct installing over http by @samansmink in #12165
- [DependencyManager] Provide details in case of a DROP statement that needs CASCADE. by @Tishj in #12159
- Remove UnsafeNumericCast in create_sort_key by @Mytherin in #12168
- [Dev] enable_verification now serializes for compatibility version 'latest' by @Tishj in #12157
- [Relation] Disable creating a VIEW from a MaterializedRelation by @Tishj in #12163
- Move community keys to proper values by @carlopi in #12175
- Remove release assertions timeout by @Mytherin in #12176
- Internal #2095: Streaming Window Structs by @hawkfish in #12173
- [CSV Reader] Bug-fix related to skip parameter over vector size in the sniffer by @pdet in #12167
- Expression rewrite filter pushdown for dates by @Tmonster in #12056
- [Python] Throw if replacement scan is attempted on cross-connection DuckDBPyRelation by @Tishj in #12169
- [Fix] Correctly allocate the ARRAY target child vector in a MAP function by @taniabogatsch in #12111
- Remove java from CI invoker by @hannes in #12182
- Mark correct database as modified in CreateIndex by @Mytherin in #12183
Full Changelog: v0.10.2...v0.10.3