A pluggable storage engine that brings DuckDB's columnar analytical engine inside MariaDB Server.
Create a table with ENGINE=DuckDB and analytical queries against it are executed through DuckDB's columnar vectorized engine — no ETL pipelines, no separate cluster, no additional protocols. One server, one SQL interface, the familiar mariadb client.
- HTAP (Hybrid Transactional/Analytical Processing) — InnoDB handles OLTP, DuckDB handles analytics, both in the same database.
- Ad-hoc analytical queries — complex joins, aggregations, subqueries, and window functions over large datasets without exporting data to a separate system.
- Eliminating ETL complexity — no need for a dedicated analytical cluster or data movement pipelines; the analytical engine runs in-process.
TPC-H Scale Factor 10 (~10 GB raw data, ~60M rows in lineitem):
| Metric | Result |
|---|---|
| Data loading | 250 seconds |
| All 22 TPC-H queries | 3.7 seconds total |
DuckDB is an in-process analytical database. Its performance rests on three pillars:
- Columnar storage — minimizes unnecessary data reads.
- Vectorized execution — processes data in batches for maximum CPU cache efficiency.
- Parallelism — leverages all available processor cores.
Tables created with ENGINE=DuckDB store data in DuckDB's native format. Queries are translated and executed by the DuckDB engine. InnoDB and DuckDB tables coexist in the same database.
The engine is built as part of the MariaDB server tree. It lives under storage/duckdb/ and uses ExternalProject_Add to build upstream DuckDB v1.5.2 from source (git submodule at third_parties/duckdb/).
Until the patch is accepted into MariaDB upstream, clone one of the prepared branches from the server fork:
# Pick the branch matching your target MariaDB version
git clone --recurse-submodules -b bb-11.4-duckdb https://github.com/drrtuy/mdb-server.git mariadb-server
# or: -b 11.8-duckdb
# or: -b 12.3-duckdb
cd mariadb-server
# Install build dependencies (requires root)
./storage/duckdb/duckdb/build.sh -D
# Build and install
./storage/duckdb/duckdb/build.shRPM (Rocky/Fedora/Amazon Linux):
./storage/duckdb/duckdb/build.sh -pDEB (Debian/Ubuntu):
./storage/duckdb/duckdb/build.sh -pThe engine supports cross-engine joins — a single SELECT can combine DuckDB tables with tables from other engines (e.g. InnoDB). When the query planner detects a mix of engines, the select_handler does the following:
- Opens the non-DuckDB tables via MariaDB's handler API and registers them in a thread-local table registry.
- Derives a per-table predicate from the query's
WHERE(viamake_cond_for_table) and registers it alongside each external table. - Pushes the entire query — including all
WHERE,JOIN,GROUP BY, andORDER BYclauses — down to DuckDB as a single SQL statement. - DuckDB's replacement scan callback transparently redirects references to non-DuckDB tables to the
_mdb_scantable function.
The _mdb_scan table function does not loop over the handler directly. On its first call it lazily spawns a cooperative fiber (stack-switching runtime borrowed from libmariadb) running on a dedicated background THD. Inside the fiber it builds a synthetic, projection- and predicate-pushed
SELECT <needed columns> FROM <db>.<table> [WHERE <pushed predicate>]
and executes it through the full MariaDB pipeline (lex_start → parse_sql → mysql_execute_command). A custom select_result_interceptor (select_to_duckdb) converts each result row's Items into a DuckDB DataChunk; when a chunk fills (STANDARD_VECTOR_SIZE) the fiber yields back to DuckDB, which consumes the chunk and resumes the fiber for the next batch.
Because the external scan runs through mysql_execute_command, it uses the MariaDB optimizer and all available access paths (index range/ref access, etc.), and the pushed WHERE predicate is evaluated by MariaDB before rows reach DuckDB. On the DuckDB side, the full query text lets DuckDB's optimizer reorder joins, build hash tables, and run all aggregation/sorting over the streamed rows. A direct ha_rnd_next loop is retained only as a fallback when no fiber is used.
This means queries like the following just work:
SELECT d.id, d.amount, i.name
FROM analytics.orders d -- ENGINE=DuckDB
JOIN inventory.products i -- ENGINE=InnoDB
ON d.product_id = i.id
WHERE d.amount > 1000;DuckDB handles the join, aggregation, and sorting; InnoDB rows are produced on demand by a fiber-driven MariaDB query. No data copying or ETL is required.
- DECIMAL precision > 38 — DuckDB supports up to 38 digits; wider MariaDB DECIMALs will fail on DDL conversion.
- Some MariaDB functions are yet not pushdown-compatible —
GROUP_CONCAT(),DATE_FORMAT(),JSON_CONTAINS(),FOUND_ROWS(),LAST_INSERT_ID(), and a few others have no DuckDB equivalent or differ in syntax. Such queries fall back to MariaDB execution. - Strict GROUP BY — DuckDB rejects
SELECTcolumns not inGROUP BYand not aggregated, even when MariaDB'ssql_modeallows it. - XA transactions —
XA PREPAREis not supported by the engine. - Collations — MariaDB UCA-based collation rules are approximated via DuckDB's built-in
NOCASE/NOACCENTcollations for UTF-8 charsets; non-UTF8 charsets fall back to binary comparison. Seedocs/collation-mapping.mdfor the full mapping and known gaps. - Cross-engine scan is yet single-threaded — each external (non-DuckDB) table is produced by a single fiber-driven MariaDB query (
_mdb_scanreportsMaxThreads() == 1); only the DuckDB side of the query is parallelized. - ALTER COLUMN DROP DEFAULT — not propagated to DuckDB catalog.
- Timezone propagation —
TIMESTAMPcolumns are stored asTIMESTAMPTZ; timezone must be set consistently between MariaDB and DuckDB contexts to avoid shifts.
See docs/mariadb-duckdb-incompatibilities.md for a detailed compatibility matrix.
This project is licensed under the GNU General Public License v2. See COPYING for details.
DuckDB itself is licensed under the MIT License.
Alibaba and the AliSQL Project — A special thank you to Alibaba and their engineering team for open-sourcing AliSQL. AliSQL is a MySQL branch developed at Alibaba Group and extensively used in their production infrastructure. The December 2025 open-source release of AliSQL 8.0 included integration of DuckDB as a native storage engine, providing a valuable reference implementation and validating the viability of embedding DuckDB inside a MySQL-compatible server. The DuckDB Engine for MariaDB draws heavily on this experience.
The DuckDB Project — None of this would be possible without the remarkable work of the DuckDB team. DuckDB has grown from an academic research project into one of the most impressive analytical engines available today — fast, embeddable, dependency-free, and released under the permissive MIT License. Its clean C++ codebase and well-defined embedding API are what make integrations like this one feasible.