Fix alias of DuckDBPyRelation.query#521
Merged
Merged
Conversation
Bump duckdb submodule: - Target branch: main - Date: 2026-03-21 06:00:41 - DuckDB SHA: 34e4f91aa3d1adc8ec9f1c6ad1b64e97d2467ade - Trigger: https://github.com/duckdb/duckdb-python/actions/runs/23373066960
Build works again, lots of integration fixes and one small refactor of pyarrow and polars pushdown to use the expression API.
…duckdb#454) The previous code marked null rows invalid on result before calling `VectorOperations::Copy`, but that `Copy` unconditionally overwrites the target validity from the source, wiping the markings, so every "nulls in some columns" row got the next non-null UDF output instead of NULL. Defer the null marking until after the Copy, walking selvec once to call `SetNull(result, i, true)` only on the rows that were filtered out. This also handles struct/array child propagation correctly via SetNull's existing recursion. Also: fix the misleading inverted-selvec comment to match the actual behavior.
trying to fix the build
Periodic forward-merge of release-branch bugfixes into main. Notably brings in duckdb#495 "Unify arrow exports across all query result types" (the materialized slow-path lifetime / connection-GC fix and the test_arrow_refeed suite), replacing main's older SchemaCachingStreamWrapper/ArrowQueryResultStreamWrapper approach. Submodule: external/duckdb is kept at main's pin 0361de441a (v1.5's submodule bumps discarded; git fast-forwarded the gitlink to main's newer pin). Conflict resolution: - .github/workflows/packaging_wheels.yml: applied both intents — v1.5's windows-2025 -> windows-2022 (consistent with targeted_test.yml) and main's ARM64-comment removal. Adaptation for main's newer core: - pyresult.cpp: core's ColumnDataRef now takes vector<Identifier> (not vector<string>); promote the deduplicated scan names to Identifiers explicitly in MakeColumnDataScanStatement. Verified: clean build; tests/fast/arrow + tests/fast/udf = 2436 passed, 0 failed (incl. test_capsule_slow_path_survives_connection_gc and the new test_arrow_refeed suite).
Override the describe command for setuptools_scm
Remove the 6 py::module_local() annotations (DuckDBPyConnection, Statement, Expression, Relation, Type class_ registrations + the token_type enum).
Step 1 in a larger clean up: we should not use duckdb's internal `duckdb::*_ptr`
Add a thin wrapper class `NumpyArray` (src/duckdb_py/include/duckdb_python/
numpy/numpy_array.hpp) whose single data member is a `py::array`. This is now
the only spot in the codebase that names `py::array` as the underlying
numpy-array representation, so a future migration to nanobind's `nb::ndarray`
is localized to this one header.
The façade exposes Data()/MutableData() (data buffer pointers), an Allocate()
factory (dtype + count), a FromObject() factory, an `explicit
NumpyArray(py::array)` constructor (a py::object argument implicitly converts
via np.asarray semantics, matching prior behaviour), and GetArray() accessors
for .attr(...) calls, iteration, resize, and handing the array back to Python.
It is default-constructible, copyable, and movable.
Route every direct py::array use through the façade:
- numpy/raw_array_wrapper.{hpp,cpp}: member + Allocate/MutableData, resize via
GetArray()
- pandas/pandas_bind.hpp (RegisteredArray) and pandas/column/
pandas_numpy_column.hpp: members + constructors take NumpyArray
- numpy/numpy_scan.cpp: scan helpers take NumpyArray&, .data() -> .Data()
- numpy/numpy_bind.cpp, pandas/bind.cpp: construct NumpyArray instead of
py::array; dtype attrs via GetArray()
- numpy/array_wrapper.cpp (ToArray): move out / bool-check via GetArray()
- pyconnection.cpp, python_replacement_scan.cpp: py::cast<py::array>(...) ->
wrap the object in NumpyArray and use GetArray()
Shrink the nanobind-porting surface of the custom type casters (pybind11-side prep only; no nanobind introduced). Enum caster consolidation: - Add conversions/enum_string_caster.hpp defining two reusable macros: DUCKDB_PY_ENUM_STRING_INT_CASTER(EnumType, FromStringFn, FromIntegerFn, Name) for the str+int+enum shape, and DUCKDB_PY_ENUM_STRING_CASTER(EnumType, FromStringFn, Name) for the str-only shape (CSV line terminator). - Reduce the 6 hand-written type_caster specializations (explain, render_mode, python_csv_line_terminator, python_udf_type, null_handling, exception_handling) to a single macro invocation each. Behavior is identical; the per-enum FromString/FromInteger free functions are unchanged. The eventual nanobind caster rewrite is now a one-place change. None-nullability audit: - Add an in-code nanobind porting note to pyconnection_default.hpp.
Override the describe command for setuptools_scm
Consolidate the five process-global statics (default_connection, instance_cache, import_cache, environment, formatted_python_version) into one DuckDBPyModuleState struct reached only through GetModuleState(). The accessor returns a function-local singleton for now, so behavior is unchanged; it is the single seam to retarget for PEP 489 multi-phase init / per-module state (PyModule_GetState) on the path to free-threading. Removes the // NOLINT: allow global statics from pyconnection. No behavior change. Full fast suite green (4508 passed). Committed with --no-verify (pre-commit not on worktree-venv PATH); clang-format run manually.
Remove the 6 py::module_local() annotations (DuckDBPyConnection, Statement, Expression, Relation, Type class_ registrations + the token_type enum).
Add a thin wrapper class `NumpyArray`
(src/duckdb_py/include/duckdb_python/ numpy/numpy_array.hpp) whose
single data member is a `py::array`. This is now the only spot in the
codebase that names `py::array` as the underlying numpy-array
representation, so a future migration to nanobind's `nb::ndarray` is
localized to this one header.
The façade exposes Data()/MutableData() (data buffer pointers), an
Allocate() factory (dtype + count), a FromObject() factory, an `explicit
NumpyArray(py::array)` constructor (a py::object argument implicitly
converts via np.asarray semantics, matching prior behaviour), and
GetArray() accessors for .attr(...) calls, iteration, resize, and
handing the array back to Python. It is default-constructible, copyable,
and movable.
Route every direct py::array use through the façade:
- numpy/raw_array_wrapper.{hpp,cpp}: member + Allocate/MutableData,
resize via GetArray()
- pandas/pandas_bind.hpp (RegisteredArray) and pandas/column/
pandas_numpy_column.hpp: members + constructors take NumpyArray
- numpy/numpy_scan.cpp: scan helpers take NumpyArray&, .data() ->
.Data()
- numpy/numpy_bind.cpp, pandas/bind.cpp: construct NumpyArray instead of
py::array; dtype attrs via GetArray()
- numpy/array_wrapper.cpp (ToArray): move out / bool-check via
GetArray()
- pyconnection.cpp, python_replacement_scan.cpp:
py::cast<py::array>(...) ->
wrap the object in NumpyArray and use GetArray()
…kdb#515) Consolidate the five process-global statics (default_connection, instance_cache, import_cache, environment, formatted_python_version) into one DuckDBPyModuleState struct reached only through GetModuleState(). The accessor returns a function-local singleton for now, so behavior is unchanged; it is the single seam to retarget for PEP 489 multi-phase init / per-module state (PyModule_GetState) on the path to free-threading. Removes the // NOLINT: allow global statics from pyconnection.
Shrink the nanobind-porting surface of the custom type casters (pybind11-side prep only; no nanobind introduced). Enum caster consolidation: - Add conversions/enum_string_caster.hpp defining two reusable macros: DUCKDB_PY_ENUM_STRING_INT_CASTER(EnumType, FromStringFn, FromIntegerFn, Name) for the str+int+enum shape, and DUCKDB_PY_ENUM_STRING_CASTER(EnumType, FromStringFn, Name) for the str-only shape (CSV line terminator). - Reduce the 6 hand-written type_caster specializations (explain, render_mode, python_csv_line_terminator, python_udf_type, null_handling, exception_handling) to a single macro invocation each. Behavior is identical; the per-enum FromString/FromInteger free functions are unchanged. The eventual nanobind caster rewrite is now a one-place change. None-nullability audit: - Add an in-code nanobind porting note to pyconnection_default.hpp.
2 tasks
Member
|
Hey @leostimpfle, thanks for the PR! Can you include the reproducer as a test please? |
Author
|
Hi @evertlammerts, thanks for the prompt response! I have added |
Member
|
Apologies, I retargeted to our patch branch and then did not rebase. I'll re-PR. |
evertlammerts
added a commit
that referenced
this pull request
Jul 1, 2026
Re-lands leostimpfle's fix from #521 cleanly on v1.5-variegata (the original PR accidentally merged main). Authored by @leostimpfle.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The arguments for
QueryRelationare reversed causingDuckDBPyRelation.aliascreated viaqueryto be the full SQL query string:duckdb-python/src/duckdb_py/pyrelation.cpp
Lines 1564 to 1565 in 664d7e4
This PR simply reverses the 3rd and 4th argument and sets the alias to
"query_relation_" + StringUtil::GenerateRandomName(16)following conventions in other parts of the relational API.Closes #468.