Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature suggestion: LAST_INSERT_UUID() #7547

Closed
gitasmus opened this issue Feb 28, 2024 · 11 comments
Closed

Feature suggestion: LAST_INSERT_UUID() #7547

gitasmus opened this issue Feb 28, 2024 · 11 comments
Assignees
Labels
customer issue enhancement New feature or request sql Issue with SQL

Comments

@gitasmus
Copy link

In his blog Tim pointed out the usefulness of UUIDs: https://www.dolthub.com/blog/2023-10-27-uuid-keys/

A minimal example could look as follows. Create table with a UUID as default primary key:

CREATE TABLE uuid_test(
    id BINARY(16) DEFAULT (UUID_TO_BIN(UUID())),
    some_text VARCHAR(255),
    PRIMARY KEY (id)
);

Next Insert a row without giving an ID:

INSERT INTO uuid_test (some_text) VALUES ('abc');

If we are interested in the last inserted ID

SELECT LAST_INSERT_ID();

does not return the UUID as it's return type is hard-wired to UNSIGNED BIGINT.

In analogy to LAST_INSERT_ID() it would thus be nice if dolt provided a feature that is also missing in MySQL: LAST_INSERT_UUID(), which would retur the last auto-created UUID.

The lack of such a LAST_INSERT_UUID() function could be circumvented in MySQL and dolt with a TRIGGER as described in
https://stackoverflow.com/questions/4687196/mysql-insert-id-or-something-like-that-to-return-last-mysql-uuid

A dedicated LAST_INSERT_UUID() function in dolt would IMHO be appropriate to the importance of UUIDs (and convenient), but would break the compatibility with MySQL. The alternative of implementing the return type of LAST_INSERT_ID() in a more general way to allow for ID to be UNSIGNED BIGINT, UUID() to be VARCHAR(36) and UUID_TO_BIN(UUID()) to be BINARY(16) could be even more logical but would cause even more compatibility confusion with MySQL.

Perhaps, it would be the better approach to ask for such a feature in MySQL first and then ask for the compatibility of dolt here ...

@timsehn
Copy link
Sponsor Contributor

timsehn commented Feb 28, 2024

This is a good feature request. I think it's fine for us to extend MySQL. We'll have someone implement this function today. Likely @fulghum.

@timsehn timsehn added enhancement New feature or request sql Issue with SQL labels Feb 28, 2024
@fulghum fulghum self-assigned this Feb 28, 2024
@fulghum
Copy link
Contributor

fulghum commented Feb 28, 2024

Thanks for the great feature request @gitasmus! 🙏 I'm getting started on this one now for you.

@fulghum
Copy link
Contributor

fulghum commented Feb 29, 2024

Quick update... I've got something rough working for this. The code for this is a little different from auto_increment, since auto_increment is a property on the schema and this is based on looking at the tuple values/expressions for each row of data being inserted.

There's a design question around whether nested expressions using UUID() should be captured – I'm currently leaning towards only capturing the UUID() result and exposing it through last_insert_uuid() if the value inserted in the cell is directly from the UUID() function without being further transformed. In other words, if it's a complex expression like CONCAT('foo-', UUID()) we wouldn't capture that value for last_insert_uuid(). My reasoning is that the captured UUID wouldn't be directly usable and capturing the result of the complex expression felt a little surprising and wouldn't even necessarily return the same type as UUID(). Let me know if that lines up with how you'd like to use the last_insert_uuid() function if you have any thoughts there.

I'll keep working on getting this cleaned up, finishing the tests, and getting a PR out.

@hannasty
Copy link

hannasty commented Feb 29, 2024

Hey @fulghum , thank you for your fast response and work on this topic. I'm working with @gitasmus and we are very impressed by how fast you guys react to our issues. I want to add that using complex expressions is really common when using UUID(), because it only returns a CHAR(36). But if you want to store the UUID as a BINARY(16) you need to call UUID_TO_BIN() also.

Would it be possible to just store the last generated UUID (e.g. in a session variable) whenever UUID() is called? This of course wouldn't necessarily only apply to inserts, so maybe you could name the procedure something like last_generated_uuid() instead of last_insert_uuid().

Another idea I had, was to supply a second procedure to generate UUIDs directly as a BINARY(16), maybe called UUID_BIN(). This way you could choose if you want a CHAR(36) or a BINARY(16) without the need of complex expressions.

@fulghum
Copy link
Contributor

fulghum commented Feb 29, 2024

Thanks for the feedback @hannasty. It's very helpful to hear more about how you're using the UUID() function and how you want to access the last generated value.

I like your idea of keeping this really simple and just having the UUID() function set a session variable with the last generated value. This makes the implementation a lot simpler, too (e.g. not restricting to only usage in INSERTs and following the same behavior as last_insert_id()).

Instead of a new function, I'm picturing just directly accessing the session variable, something like this:

insert into t values (UUID(), "data");
select @@last_generated_uuid;

I'll get started building that out and seeing how it looks.

@fulghum
Copy link
Contributor

fulghum commented Mar 1, 2024

I implemented this new approach today and spent some time chatting with a teammate about it. We like this feature idea, and we're debating the best interface to expose this information. One advantage of the session variable approach is that it has a very simple implementation. The biggest disadvantage we see is that it's not the same contract as last_insert_id(), specifically... it's not guaranteed to be a UUID that was generated for use in a key when a row was inserted. We think there is some benefit for matching the last_insert_id() contract for UUID and want to explore that approach a little bit more. One of the concerns was that if a UUID() function was invoked multiple times when inserting a row (e.g. for multiple columns), you wouldn't have a guarantee that we return the right identifier. Having some sort of identifier on the column (analogous to auto_increment) is one potential way to ensure we would return the right value.

I'll keep discussing the different approaches with the team. If you have thoughts/opinions, please share, it'd be very helpful for us to make sure we build the right thing for you. For example, it'd be helpful to hear if the callout on multiple UUID() function invocations is a potential concern for you or not.

@zachmu
Copy link
Member

zachmu commented Mar 1, 2024

Part of the issue here is that MySQL doesn't have an actual UUID type. Typically you store them as either the string representation (char(36)) or as the raw bytes (binary(16)). In line with the AUTO_INCREMENT feature, this implies two different column definitions that could use this new feature:

CREATE TABLE uuid_bytes(
    id BINARY(16) DEFAULT (UUID_TO_BIN(UUID())),
    some_text VARCHAR(255),
    PRIMARY KEY (id)
);
CREATE TABLE uuid_varchar(
    id CHAR(36) DEFAULT (UUID()),
    some_text VARCHAR(255),
    PRIMARY KEY (id)
);

The idea would be that if your table has a single primary key column matching the type and default of one of the above column definitions, then last_insert_uuid() would return the value of the ID column inserted by the last query. Because the function needs a consistent return type, this would probably always return the 36-character string form for both types of column definitions, and clients using the binary(16) definition would need to transform this string into the appropriate bytes.

Does that match your expectations for how such a feature should work?

@hannasty
Copy link

hannasty commented Mar 1, 2024

@fulghum

If you have thoughts/opinions, please share, it'd be very helpful for us to make sure we build the right thing for you. For example, it'd be helpful to hear if the callout on multiple UUID() function invocations is a potential concern for you or not.

In our case it would be no problem because we are generating UUIDs only for primary keys. But as you mentioned you could use UUID() more often when inserting a row. For example when working with the JSON type I can imagine that someone wants to generate UUIDs to identify JSON objects. In this case it wouldn't be clear which ID the session variable would contain. So you're right. This can lead to confusion.

@zachmu

Does that match your expectations for how such a feature should work?

Yes, absolutely. I think this solution would perfectly solve the issue mentioned above. Also it's completely fine that the return type would always be a string as long as the actual column could be a BINARY(16).

Thank you guys for your work on this!

@fulghum
Copy link
Contributor

fulghum commented Mar 2, 2024

Thanks @hannasty. I've got dolthub/go-mysql-server#2362 opened up for review now, with the last_insert_uuid() approach we just discussed. I think this interface turned out pretty nice! Thank you for helping provide feedback while we iterated on it. 🙏

We'll get this reviewed and released and keep you updated.

@fulghum
Copy link
Contributor

fulghum commented Mar 6, 2024

Just a quick update that we're getting really close on this one. We found an interesting edge case for auto_increment that our current structure couldn't support (#7565), so we took a slightly different approach for the auto UUID feature so that it won't have the same limitation. We'll get that reviewed tomorrow and will keep you updated on the progress.

@fulghum
Copy link
Contributor

fulghum commented Mar 6, 2024

We just released Dolt 1.35.1, which contains the new last_insert_uuid() function. Thank you @gitasmus and @hannasty for the awesome feature request! 🙏 This was a fun one to build, and I think it's going to be really useful for working with UUIDs.

I'll update our docs to include this function and we'll probably write a quick blog for it in the next month or so, too. Until then, you can also find the example usage and requirements for auto UUID columns in the 1.35.1 release notes.

Let us know if you have any issues using this new function or if there's anything else we can do to help you build with Dolt!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
customer issue enhancement New feature or request sql Issue with SQL
Projects
None yet
Development

No branches or pull requests

6 participants