Skip to content

Fix alias of DuckDBPyRelation.query#521

Merged
evertlammerts merged 78 commits into
duckdb:v1.5-variegatafrom
leostimpfle:fix-issue468
Jul 1, 2026
Merged

Fix alias of DuckDBPyRelation.query#521
evertlammerts merged 78 commits into
duckdb:v1.5-variegatafrom
leostimpfle:fix-issue468

Conversation

@leostimpfle

Copy link
Copy Markdown

The arguments for QueryRelation are reversed causing DuckDBPyRelation.alias created via query to be the full SQL query string:

auto query_relation = make_shared_ptr<QueryRelation>(rel->context->GetContext(), std::move(select_statement),
sql_query, "query_relation");

This PR simply reverses the 3rd and 4th argument and sets the alias to "query_relation_" + StringUtil::GenerateRandomName(16) following conventions in other parts of the relational API.

Closes #468.

evertlammerts and others added 30 commits February 27, 2026 13:35
Bump duckdb submodule:
- Target branch: main
- Date: 2026-03-21 06:00:41
- DuckDB SHA: 34e4f91aa3d1adc8ec9f1c6ad1b64e97d2467ade
- Trigger:
https://github.com/duckdb/duckdb-python/actions/runs/23373066960
Build works again, lots of integration fixes and one small refactor of
pyarrow and polars pushdown to use the expression API.
…duckdb#454)

The previous code marked null rows invalid on result before calling
`VectorOperations::Copy`, but that `Copy` unconditionally overwrites the
target validity from the source, wiping the markings, so every "nulls in
some columns" row got the next non-null UDF output instead of NULL.

Defer the null marking until after the Copy, walking selvec once to call
`SetNull(result, i, true)` only on the rows that were filtered out. This
also handles struct/array child propagation correctly via SetNull's
existing recursion.

Also: fix the misleading inverted-selvec comment to match the actual
behavior.
evertlammerts and others added 18 commits June 25, 2026 14:15
Periodic forward-merge of release-branch bugfixes into main. Notably brings
in duckdb#495 "Unify arrow exports across all query result types" (the materialized
slow-path lifetime / connection-GC fix and the test_arrow_refeed suite),
replacing main's older SchemaCachingStreamWrapper/ArrowQueryResultStreamWrapper
approach.

Submodule: external/duckdb is kept at main's pin 0361de441a (v1.5's submodule
bumps discarded; git fast-forwarded the gitlink to main's newer pin).

Conflict resolution:
- .github/workflows/packaging_wheels.yml: applied both intents — v1.5's
  windows-2025 -> windows-2022 (consistent with targeted_test.yml) and main's
  ARM64-comment removal.

Adaptation for main's newer core:
- pyresult.cpp: core's ColumnDataRef now takes vector<Identifier> (not
  vector<string>); promote the deduplicated scan names to Identifiers
  explicitly in MakeColumnDataScanStatement.

Verified: clean build; tests/fast/arrow + tests/fast/udf = 2436 passed,
0 failed (incl. test_capsule_slow_path_survives_connection_gc and the new
test_arrow_refeed suite).
Override the describe command for setuptools_scm
Remove the 6 py::module_local() annotations (DuckDBPyConnection, Statement,
Expression, Relation, Type class_ registrations + the token_type enum).
Step 1 in a larger clean up: we should not use duckdb's internal
`duckdb::*_ptr`
Add a thin wrapper class `NumpyArray` (src/duckdb_py/include/duckdb_python/
numpy/numpy_array.hpp) whose single data member is a `py::array`. This is now
the only spot in the codebase that names `py::array` as the underlying
numpy-array representation, so a future migration to nanobind's `nb::ndarray`
is localized to this one header.

The façade exposes Data()/MutableData() (data buffer pointers), an Allocate()
factory (dtype + count), a FromObject() factory, an `explicit
NumpyArray(py::array)` constructor (a py::object argument implicitly converts
via np.asarray semantics, matching prior behaviour), and GetArray() accessors
for .attr(...) calls, iteration, resize, and handing the array back to Python.
It is default-constructible, copyable, and movable.

Route every direct py::array use through the façade:
- numpy/raw_array_wrapper.{hpp,cpp}: member + Allocate/MutableData, resize via
        GetArray()
- pandas/pandas_bind.hpp (RegisteredArray) and pandas/column/
        pandas_numpy_column.hpp: members + constructors take NumpyArray
- numpy/numpy_scan.cpp: scan helpers take NumpyArray&, .data() -> .Data()
- numpy/numpy_bind.cpp, pandas/bind.cpp: construct NumpyArray instead of
        py::array; dtype attrs via GetArray()
- numpy/array_wrapper.cpp (ToArray): move out / bool-check via GetArray()
- pyconnection.cpp, python_replacement_scan.cpp: py::cast<py::array>(...) ->
        wrap the object in NumpyArray and use GetArray()
Shrink the nanobind-porting surface of the custom type casters (pybind11-side
prep only; no nanobind introduced).

Enum caster consolidation:
- Add conversions/enum_string_caster.hpp defining two reusable macros:
  DUCKDB_PY_ENUM_STRING_INT_CASTER(EnumType, FromStringFn, FromIntegerFn, Name)
  for the str+int+enum shape, and DUCKDB_PY_ENUM_STRING_CASTER(EnumType,
  FromStringFn, Name) for the str-only shape (CSV line terminator).
- Reduce the 6 hand-written type_caster specializations (explain, render_mode,
  python_csv_line_terminator, python_udf_type, null_handling,
  exception_handling) to a single macro invocation each. Behavior is identical;
  the per-enum FromString/FromInteger free functions are unchanged. The eventual
  nanobind caster rewrite is now a one-place change.

None-nullability audit:
- Add an in-code nanobind porting note to pyconnection_default.hpp.
Override the describe command for setuptools_scm
Consolidate the five process-global statics (default_connection, instance_cache,
import_cache, environment, formatted_python_version) into one DuckDBPyModuleState
struct reached only through GetModuleState(). The accessor returns a function-local
singleton for now, so behavior is unchanged; it is the single seam to retarget for
PEP 489 multi-phase init / per-module state (PyModule_GetState) on the path to
free-threading. Removes the // NOLINT: allow global statics from pyconnection.

No behavior change. Full fast suite green (4508 passed).

Committed with --no-verify (pre-commit not on worktree-venv PATH); clang-format run manually.
Remove the 6 py::module_local() annotations (DuckDBPyConnection,
Statement, Expression, Relation, Type class_ registrations + the
token_type enum).
Add a thin wrapper class `NumpyArray`
(src/duckdb_py/include/duckdb_python/ numpy/numpy_array.hpp) whose
single data member is a `py::array`. This is now the only spot in the
codebase that names `py::array` as the underlying numpy-array
representation, so a future migration to nanobind's `nb::ndarray` is
localized to this one header.

The façade exposes Data()/MutableData() (data buffer pointers), an
Allocate() factory (dtype + count), a FromObject() factory, an `explicit
NumpyArray(py::array)` constructor (a py::object argument implicitly
converts via np.asarray semantics, matching prior behaviour), and
GetArray() accessors for .attr(...) calls, iteration, resize, and
handing the array back to Python. It is default-constructible, copyable,
and movable.

Route every direct py::array use through the façade:
- numpy/raw_array_wrapper.{hpp,cpp}: member + Allocate/MutableData,
resize via GetArray()
- pandas/pandas_bind.hpp (RegisteredArray) and pandas/column/
pandas_numpy_column.hpp: members + constructors take NumpyArray
- numpy/numpy_scan.cpp: scan helpers take NumpyArray&, .data() ->
.Data()
- numpy/numpy_bind.cpp, pandas/bind.cpp: construct NumpyArray instead of
py::array; dtype attrs via GetArray()
- numpy/array_wrapper.cpp (ToArray): move out / bool-check via
GetArray()
- pyconnection.cpp, python_replacement_scan.cpp:
py::cast<py::array>(...) ->
        wrap the object in NumpyArray and use GetArray()
…kdb#515)

Consolidate the five process-global statics (default_connection,
instance_cache, import_cache, environment, formatted_python_version)
into one DuckDBPyModuleState struct reached only through
GetModuleState(). The accessor returns a function-local singleton for
now, so behavior is unchanged; it is the single seam to retarget for PEP
489 multi-phase init / per-module state (PyModule_GetState) on the path
to free-threading. Removes the // NOLINT: allow global statics from
pyconnection.
Shrink the nanobind-porting surface of the custom type casters
(pybind11-side prep only; no nanobind introduced).

Enum caster consolidation:
- Add conversions/enum_string_caster.hpp defining two reusable macros:
DUCKDB_PY_ENUM_STRING_INT_CASTER(EnumType, FromStringFn, FromIntegerFn,
Name) for the str+int+enum shape, and
DUCKDB_PY_ENUM_STRING_CASTER(EnumType, FromStringFn, Name) for the
str-only shape (CSV line terminator).
- Reduce the 6 hand-written type_caster specializations (explain,
render_mode, python_csv_line_terminator, python_udf_type, null_handling,
exception_handling) to a single macro invocation each. Behavior is
identical; the per-enum FromString/FromInteger free functions are
unchanged. The eventual nanobind caster rewrite is now a one-place
change.

None-nullability audit:
- Add an in-code nanobind porting note to pyconnection_default.hpp.
@evertlammerts

Copy link
Copy Markdown
Member

Hey @leostimpfle, thanks for the PR!

Can you include the reproducer as a test please?

@leostimpfle

Copy link
Copy Markdown
Author

Hi @evertlammerts, thanks for the prompt response!

I have added test_query_chain_using_alias in tests/fast/relational_api/test_rapi_query.py that includes the reprex in #468. I think it would also be possible to update test_query_chain setting virtual_table_name=rel.alias instead of adding a new test. Let me know if you have a preference for the latter.

@evertlammerts evertlammerts changed the base branch from main to v1.5-variegata July 1, 2026 10:58
@evertlammerts evertlammerts merged commit ea34b03 into duckdb:v1.5-variegata Jul 1, 2026
evertlammerts added a commit that referenced this pull request Jul 1, 2026
@evertlammerts

Copy link
Copy Markdown
Member

Apologies, I retargeted to our patch branch and then did not rebase. I'll re-PR.

evertlammerts added a commit that referenced this pull request Jul 1, 2026
Re-lands leostimpfle's fix from #521 cleanly on v1.5-variegata (the
original PR accidentally merged main). Authored by
  @leostimpfle.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DuckDBPyRelation.query returns relation with full SQL string as alias

3 participants