
unique_rowid() function that allows smaller ranges #68505

Closed
jhatcher9999 opened this issue Aug 5, 2021 · 5 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@jhatcher9999
Collaborator

Is your feature request related to a problem? Please describe.
When migrating from another database to CockroachDB, it is common to have table keys based on INT4 with id generation done by sequences. When moving to CockroachDB, the best practice is to replace these int-based keys with UUIDs, since UUID values spread INSERT workloads across the keyspace. However, changing key data types from int to UUID is an expensive operation. A nice compromise is to use the unique_rowid() function to return int-based values that still have good spread. unique_rowid() creates values in the int8 range.

However, the maximum safe integer that can be passed through JSON (and therefore through most RESTful APIs) is 2^53 - 1. When you pass an int8 value (i.e., on the order of 2^63) that exceeds 2^53 - 1, JavaScript/JSON loses precision in the low-order digits. A workaround is to force these values to be passed as strings, but that is also painful.
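For illustration, a quick check in a SQL shell shows the gap between the two ranges (the unique_rowid() output below is only an example; actual values depend on the insert timestamp and node ID):

SELECT unique_rowid();
-- e.g. 693217463431823361 (illustrative value; well above 2^53)

SELECT (1::INT8 << 53) - 1;
-- 9007199254740991, i.e. JavaScript's Number.MAX_SAFE_INTEGER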

Describe the solution you'd like
I think it would be helpful to have a version of unique_rowid() that produces values in a smaller range. Ideally, this range could be specified through a maxValue parameter, but even a hard-coded unique_rowid_guaranteed_to_be_less_javascript_max_value() function would be a big step towards making many migrations easier.

Describe alternatives you've considered
Alternative 1: recode the whole app stack to use UUIDs
Alternative 2: recode the app stack to convert int8 values to strings for transmission through JSON layers
Alternative 3: use an "alternate key", where the table has a UUID PK plus a unique int4-based id field (with a sharded unique index). The PK is used to distribute the primary index, and the id is used to pass data through the existing API layers (a rough sketch follows below).
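For what it's worth, here is a rough sketch of what alternative 3 could look like. Table, column, and index names are made up, and the hash-sharded index syntax and required settings vary by CockroachDB version (older versions need experimental hash-sharded indexes enabled):

CREATE SEQUENCE orders_api_id_seq;

CREATE TABLE orders (
    id     UUID PRIMARY KEY DEFAULT gen_random_uuid(),          -- spreads the primary index across the keyspace
    api_id INT8 NOT NULL DEFAULT nextval('orders_api_id_seq'),  -- small int id that is safe to send through JSON
    body   JSONB
);

-- Sharded unique index so the sequential api_id values don't create a write hotspot.
CREATE UNIQUE INDEX orders_api_id_idx ON orders (api_id) USING HASH WITH BUCKET_COUNT = 8;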

Additional context
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER

@jhatcher9999 jhatcher9999 added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Aug 5, 2021
@yuzefovich yuzefovich added this to Triage in SQL Sessions - Deprecated via automation Aug 5, 2021
@blathers-crl blathers-crl bot added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label Aug 5, 2021
@ajwerner
Contributor

ajwerner commented Sep 2, 2021

Have you tried CACHE sequences? They allow you to amortize the cost of sequences pretty dramatically.

See #56954.
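For reference, a minimal sketch of what that might look like, assuming a version that includes cached sequences (names are illustrative):

CREATE SEQUENCE order_id_seq CACHE 256;  -- each session reserves 256 values at a time

CREATE TABLE orders_cached (
    id INT8 PRIMARY KEY DEFAULT nextval('order_id_seq'),
    j  INT
);

-- Most nextval() calls are served from the session-local cache, so only roughly
-- 1 in 256 inserts pays for a round trip to bump the sequence.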

@ajwerner
Contributor

ajwerner commented Sep 2, 2021

It's really hard to imagine losing 10 bits from unique_rowid and having it still maintain its uniqueness properties. As it stands, we don't have much width to play with as it is. Hence my suggestion above to use CACHE sequences.

@ajwerner
Contributor

ajwerner commented Sep 2, 2021

To demonstrate the power of cached sequences, see below: 380ms vs. 22.95s to insert 1000 rows.

root@localhost:26257/defaultdb> set serial_normalization = sql_sequence;
SET

Time: 0ms total (execution 0ms / network 0ms)

root@localhost:26257/defaultdb> create table sequence_uncached(i serial primary key, j int);
CREATE TABLE

Time: 197ms total (execution 197ms / network 0ms)

root@localhost:26257/defaultdb> insert into sequence_uncached(j) select * from generate_series(1, 1000);
INSERT 1000

Time: 22.950s total (execution 22.950s / network 0.000s)

root@localhost:26257/defaultdb> set serial_normalization = sql_sequence_cached;
SET

Time: 0ms total (execution 0ms / network 0ms)

root@localhost:26257/defaultdb> create table sequence_cached(i serial primary key, j int);
CREATE TABLE

Time: 198ms total (execution 198ms / network 0ms)

root@localhost:26257/defaultdb> insert into sequence_cached(j) select * from generate_series(1, 1000);
INSERT 1000

Time: 380ms total (execution 380ms / network 0ms)

@ajwerner
Contributor

ajwerner commented Sep 2, 2021

Honestly, I'm inclined to close this issue. I don't want to say it's impossible, but it's hard to imagine an approach that improves much on the cached sequences. One caveat with cached sequences is that the cache is per session rather than per node, so if there's a lot of connection churn, you might burn through sequence values relatively quickly. If that's a concern, feel free to file another issue.

@ajwerner ajwerner closed this as completed Sep 2, 2021
SQL Sessions - Deprecated automation moved this from Triage to Done Sep 2, 2021
@jhatcher9999
Collaborator Author

really helpful -- thanks Andrew
