Remove struct implicit casting when names mismatch and prevent table initialization using ROW #8942

Merged
merged 15 commits into from Sep 22, 2023

@maiadegraaf maiadegraaf commented Sep 15, 2023

This PR is a slightly different, more succinct implementation of #8854 that achieves the same result.
Solves issues #3884, #7303, #7765, and #8301.


Structs can no longer be inserted unless the names match.
The following will now throw an error:

CREATE TABLE t1 (s STRUCT(d DOUBLE, v VARCHAR)); 
INSERT INTO t1 VALUES ({i: 20, v: 'foobar'}), ({i: 20, v: 50});

The ROW function is still supported, and unnamed structs are automatically cast to any struct.
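
As a rough illustration of the rule described above, here is a toy Python model. It is not DuckDB's cast code: `StructType` and `can_implicit_cast` are hypothetical names invented for this sketch, names are compared exactly, and whether the child types themselves can be cast is ignored.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StructType:
    """Toy stand-in for a struct type; names is None for an unnamed (ROW) struct."""
    types: Tuple[str, ...]
    names: Optional[Tuple[str, ...]] = None


def can_implicit_cast(source: StructType, target: StructType) -> bool:
    """Hypothetical check mirroring the rule in this PR description."""
    if len(source.types) != len(target.types):
        return False
    if source.names is None:
        # Unnamed structs are matched by position only.
        return True
    # Named structs must agree on every field name (exact match is a simplification).
    return source.names == target.names


target = StructType(("DOUBLE", "VARCHAR"), ("d", "v"))   # STRUCT(d DOUBLE, v VARCHAR)
named = StructType(("INTEGER", "VARCHAR"), ("i", "v"))   # {i: 20, v: 'foobar'}
unnamed = StructType(("INTEGER", "VARCHAR"))             # ROW(20, 'foobar')

print(can_implicit_cast(named, target))    # False: 'i' does not match 'd'
print(can_implicit_cast(unnamed, target))  # True: matched by position
```

In this model the named struct is rejected purely because of the `i` versus `d` mismatch, while the unnamed struct with the same child types is accepted.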


Additionally, tables can no longer be initialized using the ROW function. The following will now throw an exception:

CREATE TABLE t1 AS (
	SELECT ARRAY[
		(1, 'x'),
		(2, 'y'),
		(4, 's')
	] AS list
);

And should be rewritten as the following:

CREATE TABLE t1 (list STRUCT(a INT, b VARCHAR)[]);
INSERT INTO t1 VALUES (ARRAY[(1, 'x'), (2, 'y'), (4, 's')]);

@maiadegraaf maiadegraaf changed the title from "Remove struct implicit casting when names mismatch and prevent table initialized" to "Remove struct implicit casting when names mismatch and prevent table initialization using ROW" Sep 15, 2023

Tishj commented Sep 17, 2023

To not kill the usability of CREATE TABLE AS, it's also possible to use

statement ok
CREATE TABLE t1 AS (
	SELECT ARRAY[
		(1, 'x'),
		(2, 'y'),
		(4, 's')
	]::STRUCT(a int, b varchar)[] AS list
);

So it doesn't have to be a two-step process.

@github-actions github-actions bot marked this pull request as draft September 18, 2023 11:33

@taniabogatsch taniabogatsch left a comment

LGTM! I only have two minor questions.

Why are these changes included in this PR?

Because the spatial tests were failing, I had to update them and therefore add the most recent version of spatial to the CI run. @Maxxen had also created new functions in spatial, so those had to be added too.

@@ -75,6 +75,7 @@ def test_struct_type(self):
type = duckdb.struct_type({'a': BIGINT, 'b': BOOLEAN})
assert str(type) == 'STRUCT(a BIGINT, b BOOLEAN)'

# FIXME: create an unnamed struct when fields are provided as a list

Do you know if this is still the case? I think that we are using default names for the struct fields, as this is not the ROW function.

@Tishj Tishj Sep 18, 2023

import duckdb
from duckdb.value.constant import StructValue

duckdb.execute("""
	create table tbl (a STRUCT(a BIGINT, b VARCHAR, c BOOL))
""")

# Equivalent to ROW(42, 'bla', True)
duckdb.execute("insert into tbl select $col_a", {
	'col_a': StructValue((42, 'bla', True), [int, str, bool])
})

res = duckdb.table("tbl").fetchall()

print(res)

Currently this passes because we don't check for name equality, but after this PR it no longer works, because the provided struct value is considered "named".


Internally this calls struct_type(children) where children here is [int, str, bool]
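
To make the failure mode concrete, here is a small pure-Python sketch. It assumes that a list of children receives generated default names of the form v1, v2, ... (a `v2` default name does appear in a test result changed by this PR, but the exact naming scheme is an assumption here). `field_names` and `insert_allowed` are hypothetical helpers for illustration, not part of the duckdb package.

```python
from typing import Dict, List, Sequence, Union


def field_names(children: Union[Dict[str, type], Sequence[type]]) -> List[str]:
    """Names a struct type would carry for the given children (toy model)."""
    if isinstance(children, dict):
        return list(children.keys())
    # Assumption: list children receive generated names v1, v2, ...
    return [f"v{i}" for i in range(1, len(children) + 1)]


def insert_allowed(value_names: List[str], column_names: List[str]) -> bool:
    """After this PR, a named struct is only accepted when the names match."""
    return value_names == column_names


column = ["a", "b", "c"]  # STRUCT(a BIGINT, b VARCHAR, c BOOL)
from_list = field_names([int, str, bool])
from_dict = field_names({"a": int, "b": str, "c": bool})

print(from_list, insert_allowed(from_list, column))
print(from_dict, insert_allowed(from_dict, column))
```

Under these assumptions the list form ends up "named" with names that cannot match the column, which is why the FIXME asks for an unnamed struct in that case.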

@Mytherin Mytherin marked this pull request as ready for review September 18, 2023 12:42

@Mytherin Mytherin left a comment

Thanks for the PR! Looks good - some comments:

src/include/duckdb/common/types.hpp (outdated, resolved)
@@ -31,4 +31,4 @@ SELECT count(c0 ORDER BY 0) FROM (SELECT 2 EXCEPT SELECT 2) c0;
query I
SELECT mode((c0, 0)) FROM (SELECT 1 c0), (SELECT 2);
----
-{'c0': 1, 'v2': 0}
+{'c0': 1, '': 0}

I would say that () should always create an unnamed struct. That's the behavior in Postgres.

Perhaps we can also change the ToString of unnamed structs to leave out the name entirely, although that might be better done in a future PR.

I also thought about the ToString modification, but that would break round-tripping of the type string, which is something to consider.

@github-actions github-actions bot marked this pull request as draft September 20, 2023 14:15
@maiadegraaf maiadegraaf marked this pull request as ready for review September 20, 2023 15:55
@github-actions github-actions bot marked this pull request as draft September 21, 2023 07:44
@maiadegraaf maiadegraaf marked this pull request as ready for review September 21, 2023 09:22
@Mytherin Mytherin merged commit e19fee4 into duckdb:main Sep 22, 2023
48 of 50 checks passed
@Mytherin

Thanks! LGTM
