Add nested "union"-type #4966

Merged
merged 21 commits into from
Oct 27, 2022
Conversation

Maxxen
Member

@Maxxen Maxxen commented Oct 12, 2022

This PR adds a nested "tagged union" type, complete with casts and a minimal set of helper functions.
Unions are a nested type capable of holding one of multiple "alternative" values, much like the union in C. The difference is that these are "tagged": they always carry a discriminator "tag" that signals which alternative is currently held, even if the inner value itself is null. They are thus more similar to C++ std::variant, Rust's enum, or the "sum type" present in most functional languages.

Unions must always have at least one member, and while they can hold multiple alternatives of the same type, the tags must be unique. Unions can have at most 256 members.

Under the hood, unions are implemented on top of struct types and simply keep the "tag" as the first member of the struct.
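
As a rough mental model of the layout described above, here is a small Python sketch. It is not DuckDB's C++ implementation; `UnionType`, `as_struct`, and the tuple layout are hypothetical names and choices made for illustration. It encodes the stated constraints (at least one member, at most 256, unique tags) and stores the tag as the first field, followed by one slot per member.

```python
from dataclasses import dataclass
from typing import Any, Tuple

MAX_MEMBERS = 256  # limit stated in the PR description


@dataclass(frozen=True)
class UnionType:
    """Hypothetical model of UNION(tag1 T1, tag2 T2, ...)."""
    members: Tuple[Tuple[str, str], ...]  # (tag, type name) pairs

    def __post_init__(self) -> None:
        tags = [tag for tag, _ in self.members]
        if not 1 <= len(tags) <= MAX_MEMBERS:
            raise ValueError("a union needs between 1 and 256 members")
        if len(set(tags)) != len(tags):
            raise ValueError("union tags must be unique")


def as_struct(utype: UnionType, tag: str, value: Any) -> tuple:
    """Lay a union value out struct-style: tag index first, then one slot per member."""
    tags = [t for t, _ in utype.members]
    index = tags.index(tag)  # raises ValueError for an unknown tag
    # Only the slot of the held member carries the value; the rest stay None.
    slots = [value if i == index else None for i in range(len(tags))]
    return (index, *slots)
```

Because the tag is stored separately from the slots, the held member is still known when its value is null: `as_struct(u, "num", None)` keeps tag index 0 even though every slot is `None`.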

Example:

CREATE TABLE tbl1(u UNION(num INT, str VARCHAR));
INSERT INTO tbl1 VALUES (1), ('two'), ('three');
SELECT * FROM tbl1;
----
1
two
three

SELECT u.str FROM tbl1;
----
NULL
two
three

Functions

  • union_value(<tag> := <value>)
    Create a single-member union with a specific tag and a value. It accepts only a single keyword argument.

  • union_tag(<union>)
    Return the tag of the currently held member as an enum whose values are all the tags of the argument union.

  • union_extract(<union>, <tag>)
    Return the value corresponding to the member "tag", or NULL if that tag is not the currently held member. Much like structs, this is also available with the "dot syntax" and can be invoked with u.tag.
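
To make the behaviour of the three helpers concrete, the following Python sketch mimics them on a toy `UnionValue` type. The Python function names mirror the SQL functions, but `UnionValue` and everything else here are assumptions for illustration, not DuckDB's API or implementation.

```python
from typing import Any, NamedTuple, Optional


class UnionValue(NamedTuple):
    """Toy stand-in for a union value: the held tag plus its value."""
    tag: str
    value: Any


def union_value(**kwargs: Any) -> UnionValue:
    """Build a single-member union from exactly one keyword argument."""
    if len(kwargs) != 1:
        raise TypeError("union_value takes exactly one keyword argument")
    (tag, value), = kwargs.items()
    return UnionValue(tag, value)


def union_tag(u: UnionValue) -> str:
    """Return the tag of the member that is currently held."""
    return u.tag


def union_extract(u: UnionValue, tag: str) -> Optional[Any]:
    """Return the held value if `tag` is the current member, else None (NULL)."""
    return u.value if u.tag == tag else None
```

With `u = union_value(str="two")`, extracting `"str"` yields the value while extracting `"num"` yields `None`, which matches the NULL in the first row of the `SELECT u.str FROM tbl1` example.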

Casts

The casting rules are as follows:

Casting to unions

A type can always be implicitly cast to a union if:

  • the union contains a single member of the same type as the source type,
  • or one or more members that can be implicitly cast from the source type. If there are several, the member with the lowest cast cost is used.

If the cast is ambiguous, i.e. there are multiple members of the same type as the source, or multiple candidates with the same implicit casting cost, a binder error is raised.

To resolve the ambiguity, use the union_value(<tag>:=<value>) function to create a single-member union with the same tag as the desired target member, and perform a union-to-union cast instead.
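
The member-selection rule above can be sketched in Python as follows. The cost table, the type names, and `BinderError` are hypothetical stand-ins; DuckDB's real implicit-cast costs are not given in this PR description. The sketch only shows the shape of the rule: an exact match costs 0, the cheapest candidate wins, and a tie is an error.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class BinderError(Exception):
    """Hypothetical stand-in for the binder error mentioned above."""


# Illustrative implicit-cast costs; a missing pair means "no implicit cast".
IMPLICIT_CAST_COST: Dict[Tuple[str, str], int] = {
    ("INT", "BIGINT"): 1,
    ("INT", "DOUBLE"): 2,
}


def cast_cost(source: str, target: str) -> Optional[int]:
    """Cost of implicitly casting source to target, or None if impossible."""
    if source == target:
        return 0  # exact type match
    return IMPLICIT_CAST_COST.get((source, target))


def pick_member(source: str, members: Sequence[Tuple[str, str]]) -> str:
    """Return the tag of the union member that a `source` value is cast into."""
    candidates: List[Tuple[int, str]] = []
    for tag, member_type in members:
        cost = cast_cost(source, member_type)
        if cost is not None:
            candidates.append((cost, tag))
    if not candidates:
        raise BinderError(f"no member can be implicitly cast from {source}")
    best = min(cost for cost, _ in candidates)
    winners = [tag for cost, tag in candidates if cost == best]
    if len(winners) > 1:
        raise BinderError(f"ambiguous cast from {source}: {winners}")
    return winners[0]
```

Under these assumed costs, an `INT` cast into `UNION(big BIGINT, dbl DOUBLE)` selects `big`, while `UNION(a INT, b INT)` is rejected as ambiguous.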

Casting from unions

A union can always be implicitly cast to a type if:

  • the union contains a single member of the same type as the target type,
  • or one or more members that can be implicitly cast to the target type. If there are several, the lowest-cost cast is used.

If the cast is ambiguous, i.e. there are multiple members of the same type as the target, or multiple candidates with the same implicit casting cost, a binder error is raised. Use the union_extract function or dot syntax to select the desired member before casting.

Casting between unions

Unions can be cast to each other if the source type "saturates" the target type. In other words, every tag in the source union must be present in the target union, and the type of each matching tag must be implicitly castable from source to target. In short, the source's tags must be a subset of the target's, and the member types must be convertible.

Ok  | Source               | Target               | Comments
yes | UNION(a A, b B)      | UNION(a A, b B, c C) |
yes | UNION(a A, b B)      | UNION(a A, b C)      | if B can be implicitly cast to C
no  | UNION(a A, b B, c C) | UNION(a A, b B)      |
no  | UNION(a A, b B)      | UNION(a A, b C)      | if B can't be implicitly cast to C
no  | UNION(A, B, D)       | UNION(A, B, C)       |

This ensures that no actual values are "lost" between casts, but we might want to look into relaxing this further and simply pad missing or unconvertible members with NULL, allowing any union->union cast as long as the intersection of their members is not empty.
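
The subset rule for union-to-union casts can be expressed as a short check. This Python sketch is an illustration only: `can_cast_union` is a hypothetical helper, and the set of implicitly castable type pairs is passed in explicitly because the real cast rules live inside DuckDB.

```python
from typing import Sequence, Set, Tuple

Members = Sequence[Tuple[str, str]]  # (tag, type name) pairs


def can_cast_union(source: Members, target: Members,
                   implicit: Set[Tuple[str, str]]) -> bool:
    """True if every source tag exists in the target with a convertible type."""
    target_types = dict(target)
    for tag, source_type in source:
        if tag not in target_types:
            return False  # a source tag is missing from the target
        target_type = target_types[tag]
        if source_type != target_type and (source_type, target_type) not in implicit:
            return False  # matching tag, but no implicit cast between the types
    return True


# Examples modelled on the table above, assuming B can be implicitly cast to C.
implicit_casts = {("B", "C")}
ab = [("a", "A"), ("b", "B")]
abc = [("a", "A"), ("b", "B"), ("c", "C")]
print(can_cast_union(ab, abc, implicit_casts))                     # source is a subset
print(can_cast_union(abc, ab, implicit_casts))                     # tag c would be lost
print(can_cast_union(ab, [("a", "A"), ("b", "C")], implicit_casts))  # b: B to C
```

The three calls print `True`, `False`, and `True`. Passing an empty `implicit` set to the last call gives `False`, which corresponds to the fourth row of the table.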

Collaborator

@Mytherin Mytherin left a comment

Maybe put an assertion that the value is of the correct type and that the tag is in range?

@Alex-Monahan
Contributor

Howdy! Could you add the really helpful explanations you have put in this PR into the docs repo as well? Thanks!

Collaborator

@Mytherin Mytherin left a comment

Thanks for the PR! Looks great. Some comments below:

@Maxxen
Member Author

Maxxen commented Oct 12, 2022

> Howdy! Could you add the really helpful explanations you have put in this PR into the docs repo as well? Thanks!

@Alex-Monahan Sure! I'll make sure to write up some docs once this is merged!

@Maxxen
Member Author

Maxxen commented Oct 19, 2022

Updating the casting rules:
A union can be cast to a type if all the members can be implicitly cast to the target type.

Nvm, we're getting rid of this cast entirely; the equivalent can be done by extracting and coalescing instead.
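
One possible reading of "extracting and coalescing", sketched in Python: extract each member, convert it to the target type, and take the first non-null result. The helpers below are hypothetical models of `union_extract` and SQL's `COALESCE`, and the `num`/`str` members are borrowed from the earlier `tbl1` example; none of this is taken from the PR's code.

```python
from typing import Any, Optional, Tuple

UnionValue = Tuple[str, Any]  # (held tag, value), a toy model of a union value


def coalesce(*values: Any) -> Optional[Any]:
    """Return the first argument that is not None, like SQL COALESCE."""
    for v in values:
        if v is not None:
            return v
    return None


def union_extract(u: UnionValue, tag: str) -> Optional[Any]:
    """Return the held value if `tag` is the current member, else None."""
    held_tag, value = u
    return value if held_tag == tag else None


def union_to_text(u: UnionValue) -> Optional[str]:
    """Convert UNION(num INT, str VARCHAR) to text by extracting and coalescing."""
    num = union_extract(u, "num")
    num_as_text = None if num is None else str(num)  # cast only when present
    return coalesce(num_as_text, union_extract(u, "str"))
```

At most one extraction is non-null for any given value, so the order of the arguments to `coalesce` does not affect the result. A union holding a null value still maps to `None`.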

@hannes hannes added this to the 0.6.0 milestone Oct 24, 2022
@Maxxen
Member Author

Maxxen commented Oct 26, 2022

@Mytherin All is green!

@Mytherin Mytherin merged commit e4ee601 into duckdb:master Oct 27, 2022
@Mytherin
Collaborator

Thanks! Everything looks great :)
