Add support for registering custom casts (and types) through c api #13499

Merged
merged 8 commits into from
Aug 22, 2024

Conversation

Maxxen
Member

@Maxxen Maxxen commented Aug 21, 2024

This PR is a follow-up that contains #13490 but also adds new functions for registering custom cast functions.

These work similarly to scalar functions and provide similar capabilities, e.g. setting custom user data. Compared to implementing cast functions in C++, however, the C API also makes it possible to detect whether or not the cast is executed within a TRY_CAST. Additionally, setting the error message does not immediately throw an exception in a non-try cast (as would be the case if you used C++'s HandleCastError util). This allows C-based cast functions to clean up any temporary resources when an invalid cast input is encountered during a non-try cast, while still being able to short-circuit execution explicitly by returning false once they have set the error message and are ready to exit.
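
The control flow described above can be sketched as a small standalone model. It deliberately does not call the DuckDB C API: `model_cast_mode`, `model_cast_info`, and `model_cast_points` are hypothetical stand-ins invented for illustration, and returning true from a try cast that nulled some rows is an assumption of this model, not a statement about the real API. The sketch only shows the pattern: record the error, keep going in a try cast, and otherwise release temporary resources before returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the DuckDB C API types. */
typedef enum { MODEL_CAST_NORMAL, MODEL_CAST_TRY } model_cast_mode;

typedef struct {
    model_cast_mode mode; /* whether we run inside a TRY_CAST */
    const char *error;    /* first error message set, or NULL */
} model_cast_info;

/* Casts `count` strings of the form "x y z" into three ints per row.
 * Rows that fail to parse are marked invalid in `valid`.
 * Returns false only when a non-try cast hits an invalid row
 * (or when the temporary buffer cannot be allocated). */
static bool model_cast_points(model_cast_info *info, size_t count,
                              const char *const *input, int *output,
                              bool *valid) {
    assert(info != NULL && input != NULL && output != NULL && valid != NULL);

    /* Temporary resource that must be released on every exit path. */
    int *tmp = malloc(3 * sizeof *tmp);
    if (tmp == NULL) {
        info->error = "out of memory";
        return false;
    }

    bool ok = true;
    for (size_t i = 0; i < count; i++) {
        if (sscanf(input[i], "%d %d %d", &tmp[0], &tmp[1], &tmp[2]) == 3) {
            output[i * 3 + 0] = tmp[0];
            output[i * 3 + 1] = tmp[1];
            output[i * 3 + 2] = tmp[2];
            valid[i] = true;
        } else {
            valid[i] = false;
            if (info->error == NULL) {
                info->error = "could not parse point"; /* keep first error */
            }
            if (info->mode != MODEL_CAST_TRY) {
                ok = false; /* non-try cast: stop, but clean up first */
                break;
            }
            /* try cast: leave the row invalid and continue */
        }
    }

    free(tmp); /* setting the error did not unwind, so cleanup still runs */
    return ok;
}
```

Because setting the error does not unwind the stack, the single `free` at the end covers both the success path and the early exit.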

Collaborator

@Mytherin Mytherin left a comment

Thanks for the PR! Looks good - one comment:

        output_data[i * 3 + 2] = z;
    } else {
        // Error
        if (cast_mode == DUCKDB_CAST_TRY) {
Collaborator

Maybe we can have a helper function for this similar to how we handle this in the C++ functions, e.g.:

void (*duckdb_cast_function_set_row_error)(duckdb_function_info info, const char *error, idx_t index, duckdb_vector output);

Member Author

So this would basically mark the current row as invalid and set the error message (if not already set) in the same call? I worry that if the user is doing some sort of string formatting to create the error message, they will perform that work unnecessarily for each invalid row after the first one in a try cast. But maybe that's something they can guard against themselves, by checking the cast mode and passing nullptr for subsequent errors.

Collaborator

Yes, in the C++ layer templating takes care of that - but this would require some user handling if they want to avoid that cost. That being said I still think it's cleaner than having to essentially duplicate this code in every cast function - and also less error prone.
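
The trade-off discussed in this thread can be modelled in a few lines. This is a standalone sketch, not the DuckDB API: `model_info`, `model_set_row_error`, and `model_mark_invalid_rows` are hypothetical names, and the helper's behavior (invalidate the row, keep only the first non-NULL message) is an assumption taken from the description in the comments above. The caller formats the message once and passes NULL for later failures, which is the kind of user-side guard mentioned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the function info; NOT the DuckDB C API. */
typedef struct {
    bool has_error;
    char error[128]; /* first error message recorded */
} model_info;

/* Model of the proposed row-error helper: always invalidates the row,
 * and records the message only if it is non-NULL and none is set yet. */
static void model_set_row_error(model_info *info, const char *error,
                                size_t row, bool *validity) {
    assert(info != NULL && validity != NULL);
    validity[row] = false;
    if (error != NULL && !info->has_error) {
        snprintf(info->error, sizeof info->error, "%s", error);
        info->has_error = true;
    }
}

/* Caller-side guard: build the (possibly expensive) message only for the
 * first failing row. Returns how many times a message was formatted. */
static size_t model_mark_invalid_rows(model_info *info,
                                      const char *const *values,
                                      const bool *bad, size_t count,
                                      bool *validity) {
    size_t format_calls = 0;
    bool formatted = false;
    for (size_t i = 0; i < count; i++) {
        if (!bad[i]) {
            continue;
        }
        if (!formatted) {
            char buf[128];
            snprintf(buf, sizeof buf, "row %zu: cannot cast '%s'", i, values[i]);
            model_set_row_error(info, buf, i, validity);
            formatted = true;
            format_calls++;
        } else {
            /* Later failures: invalidate the row, skip the formatting. */
            model_set_row_error(info, NULL, i, validity);
        }
    }
    return format_calls;
}
```

In this model the helper removes the duplicated "invalidate and set error" code from each cast function, while the cost of formatting stays under the caller's control.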

Member Author

Alright, do I keep the original cast_function_set_error()?

Collaborator

I think that's fine yeah, there might be situations where we want to return errors from a try_cast as well.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft August 22, 2024 11:26
@Maxxen Maxxen marked this pull request as ready for review August 22, 2024 12:27
@Maxxen
Member Author

Maxxen commented Aug 22, 2024

@Mytherin addressed your feedback

@Mytherin Mytherin merged commit 862852f into duckdb:main Aug 22, 2024
39 checks passed
@Mytherin
Collaborator

Thanks!

github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Sep 7, 2024
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Sep 7, 2024
Merge pull request duckdb/duckdb#13499 from Maxxen/c-api-casts

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
@Tishj Tishj added the Needs Documentation Use for issues or PRs that require changes in the documentation label Sep 18, 2024
Labels
Needs Documentation Use for issues or PRs that require changes in the documentation Ready For Review