
unique_rowid() function that allows smaller ranges #68505

Closed
jhatcher9999 opened this issue Aug 5, 2021 · 5 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@jhatcher9999
Collaborator

Is your feature request related to a problem? Please describe.
When migrating from another database to CockroachDB, it is common to have table keys based on INT4 with id generation done by sequences. When moving to CockroachDB, the best practice is to replace these int-based keys with UUIDs, since UUID values spread INSERT workloads across the keyspace. However, changing key data types from int to UUID is an expensive operation. A nice compromise is to use the unique_rowid() function to return int-based values that still have good spread. unique_rowid() creates values in the int8 range.

However, the maximum safe integer that can be passed through JSON (and therefore through most RESTful APIs) is 2^53 - 1. When you pass an int8 value (i.e., on the order of 2^63) that exceeds 2^53 - 1, JavaScript/JSON loses precision in the low-order digits. A workaround is to force these values to be passed as strings, but that is also painful.
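For illustration, a quick check in a SQL shell shows the gap between the two ranges (the unique_rowid() output below is only an example; actual values depend on the insert timestamp and node ID):

SELECT unique_rowid();
-- e.g. 693217463431823361 (illustrative value; well above 2^53)

SELECT (1::INT8 << 53) - 1;
-- 9007199254740991, i.e. JavaScript's Number.MAX_SAFE_INTEGER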

Describe the solution you'd like
I think it would be helpful to have a version of unique_rowid() that produces values in a smaller range. Ideally, this range could be specified through a maxValue parameter, but even a hard-coded unique_rowid_guaranteed_to_be_less_javascript_max_value() function would be a big step towards making many migrations easier.

Describe alternatives you've considered
Alternative 1: recode the whole app stack to use UUIDs
Alternative 2: recode the app stack to convert int8 values to strings for transmission through JSON layers
Alternative 3: use an "alternate key", where the table has a UUID PK plus a unique int4-based id field (with a sharded unique index). The PK is used to distribute the primary index, and the id is used to pass data through the existing API layers (a rough sketch follows below).
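For what it's worth, here is a rough sketch of what alternative 3 could look like. Table, column, and index names are made up, and the hash-sharded index syntax and required settings vary by CockroachDB version (older versions need experimental hash-sharded indexes enabled):

CREATE SEQUENCE orders_api_id_seq;

CREATE TABLE orders (
    id     UUID PRIMARY KEY DEFAULT gen_random_uuid(),          -- spreads the primary index across the keyspace
    api_id INT8 NOT NULL DEFAULT nextval('orders_api_id_seq'),  -- small int id that is safe to send through JSON
    body   JSONB
);

-- Sharded unique index so the sequential api_id values don't create a write hotspot.
CREATE UNIQUE INDEX orders_api_id_idx ON orders (api_id) USING HASH WITH BUCKET_COUNT = 8;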

Additional context
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER

@jhatcher9999 jhatcher9999 added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Aug 5, 2021
@yuzefovich yuzefovich added this to Triage in SQL Sessions - Deprecated via automation Aug 5, 2021
@blathers-crl blathers-crl bot added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label Aug 5, 2021
@ajwerner
Contributor

ajwerner commented Sep 2, 2021

Have you tried CACHE sequences? They allow you to amortize the cost of sequences pretty dramatically.

See #56954.
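For reference, a minimal sketch of what that might look like, assuming a version that includes cached sequences (names are illustrative):

CREATE SEQUENCE order_id_seq CACHE 256;  -- each session reserves 256 values at a time

CREATE TABLE orders_cached (
    id INT8 PRIMARY KEY DEFAULT nextval('order_id_seq'),
    j  INT
);

-- Most nextval() calls are served from the session-local cache, so only roughly
-- 1 in 256 inserts pays for a round trip to bump the sequence.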

@ajwerner
Contributor

ajwerner commented Sep 2, 2021

It's really hard to imagine losing 10 bits from unique_rowid and having it still maintain its uniqueness properties. As it stands, we don't have much width to play with as it is. Hence my suggestion above to use CACHE sequences.

@ajwerner
Contributor

ajwerner commented Sep 2, 2021

To demonstrate the power of cached sequences, see below: 380ms vs. 22.95s to insert 1000 rows.

root@localhost:26257/defaultdb> set serial_normalization = sql_sequence;
SET

Time: 0ms total (execution 0ms / network 0ms)

root@localhost:26257/defaultdb> create table sequence_uncached(i serial primary key, j int);
CREATE TABLE

Time: 197ms total (execution 197ms / network 0ms)

root@localhost:26257/defaultdb> insert into sequence_uncached(j) select * from generate_series(1, 1000);
INSERT 1000

Time: 22.950s total (execution 22.950s / network 0.000s)

root@localhost:26257/defaultdb> set serial_normalization = sql_sequence_cached;
SET

Time: 0ms total (execution 0ms / network 0ms)

root@localhost:26257/defaultdb> create table sequence_cached(i serial primary key, j int);
CREATE TABLE

Time: 198ms total (execution 198ms / network 0ms)

root@localhost:26257/defaultdb> insert into sequence_cached(j) select * from generate_series(1, 1000);
INSERT 1000

Time: 380ms total (execution 380ms / network 0ms)

@ajwerner
Contributor

ajwerner commented Sep 2, 2021

Honestly, I'm inclined to close this issue. I don't want to say it's impossible, but it's hard to imagine an approach that improves much on the cached sequences. One caveat with cached sequences is that the cache is per session rather than per node, so if there's a lot of connection churn, you might burn through sequence values relatively quickly. If that's a concern, feel free to file another issue.

@ajwerner ajwerner closed this as completed Sep 2, 2021
SQL Sessions - Deprecated automation moved this from Triage to Done Sep 2, 2021
@jhatcher9999
Collaborator Author

really helpful -- thanks Andrew
