Originally from: [tweet](https://twitter.com/samokhvalov/status/1730107298369171943), [LinkedIn post]().

---

# UUID v7 and partitioning (TimescaleDB)

> I post a new PostgreSQL "howto" article every day. Join me in this
> journey – [subscribe](https://twitter.com/samokhvalov/), provide feedback, share!

OK, you've asked – here it is, a draft recipe for using UUIDv7 and partitioning (we'll
use [@TimescaleDB](https://twitter.com/TimescaleDB)). It's not super elegant, might not be the best, and requires some
effort to reach efficient plans (with partition pruning involved). If you have alternative or improvement ideas in
mind – let me know.

We'll take the function by [@DanielVerite](https://twitter.com/DanielVerite) to generate UUIDv7 as the basis:

```sql
create or replace function uuid_generate_v7() returns uuid
as $$
  -- use random v4 uuid as starting point (which has the same variant we need)
  -- then overlay timestamp
  -- then set version 7 by flipping the 2 and 1 bit in the version 4 string
select encode(
  set_bit(
    set_bit(
      overlay(
        uuid_send(gen_random_uuid())
        placing substring(int8send(floor(extract(epoch from clock_timestamp()) * 1000)::bigint) from 3)
        from 1 for 6
      ),
      52, 1
    ),
    53, 1
  ),
  'hex')::uuid;
$$ language SQL volatile;
```
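As a quick optional sanity check, we can confirm that generated values really carry version 7 – the version field sits
in the high nibble of the 7th byte of the UUID:

```sql
-- the high nibble of byte 6 (0-based) of the 16-byte UUID holds the version;
-- for values produced by uuid_generate_v7() this should return 7
select get_byte(uuid_send(uuid_generate_v7()), 6) >> 4 as version;
```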

## Helper functions, UUIDv7 <-> timestamptz

Next, we'll create two functions:

1. `ts_to_uuid_v7` – generate a UUIDv7 value based on an arbitrary `timestamptz` value, and
2. `uuid_v7_to_ts` – extract the `timestamptz` value from an existing UUIDv7 value.

Note that this approach is not what the authors of the revised RFC 4122 (which will likely be finalized soon) would
encourage; see
[this discussion and the words](https://postgresql.org/message-id/flat/C80B8FDB-8D9E-48A2-82A2-48863987A1B1%40yandex-team.ru#074a05d31c9ce38bee2f8c8097877485)
by [@x4mmmmmm](https://twitter.com/x4mmmmmm):

> ... as far as I know, RFC discourages extracting timestamps from UUIDs.

Anyway, let's just do it:

```sql
create extension pgcrypto;

-- same construction as uuid_generate_v7(), but using the provided timestamp
-- instead of clock_timestamp()
create or replace function ts_to_uuid_v7(timestamptz) returns uuid
as $$
  select encode(
    set_bit(
      set_bit(
        overlay(
          uuid_send(gen_random_uuid())
          placing substring(int8send(floor(extract(epoch from $1) * 1000)::bigint) from 3)
          from 1 for 6
        ),
        52, 1
      ),
      53, 1
    ),
    'hex')::uuid;
$$ language SQL volatile;

-- read the first 48 bits of the UUID back as Unix time in milliseconds
create or replace function uuid_v7_to_ts(uuid_v7 uuid) returns timestamptz
as $$
  select
    to_timestamp(
      (
        'x' || substring(
          encode(uuid_send(uuid_v7), 'hex')
          from 1 for 12
        )
      )::bit(48)::bigint / 1000.0
    )::timestamptz;
$$ language sql;
```

Checking the functions:

```sql
test=# select now(), ts_to_uuid_v7(now() - interval '1y');
              now              |            ts_to_uuid_v7
-------------------------------+--------------------------------------
 2023-11-30 05:36:32.205093+00 | 0184c709-63cd-7bd1-99c3-a4773ab1e697
(1 row)

test=# select uuid_v7_to_ts('0184c709-63cd-7bd1-99c3-a4773ab1e697');
       uuid_v7_to_ts
----------------------------
 2022-11-30 05:36:32.205+00
(1 row)
```

Pretending that we haven't noticed the loss of microseconds, we continue.

> 🎯 **TODO:**
> 1) might there be cases where we need that precision?
> 2) timezones

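
A brief aside on the second TODO item: `timestamptz` values are absolute instants, so the session's `TimeZone` setting
only affects how the extracted value is displayed, not the instant encoded in the UUID. For example (reusing a UUID
that appears in the query plans later in this post):

```sql
-- the same UUID decodes to the same instant under any TimeZone setting;
-- only the textual representation differs
begin;
set local timezone = 'UTC';
select uuid_v7_to_ts('018c1ecb-d3b7-75b1-add9-62878b5152c7');  -- 2023-11-30 05:55:23.703+00
set local timezone = 'Asia/Tokyo';
select uuid_v7_to_ts('018c1ecb-d3b7-75b1-add9-62878b5152c7');  -- 2023-11-30 14:55:23.703+09
rollback;
```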
## Hypertable

Create a table where we'll store the ID as a UUID, but additionally have a `timestamptz` column – this column will be
used as the partitioning key when we convert the table to a partitioned table (a "hypertable" in TimescaleDB's
terminology):

```sql
create table my_table (
  id uuid not null
    default '00000000-0000-0000-0000-000000000000'::uuid,
  payload text,
  uuid_ts timestamptz not null default clock_timestamp() -- or now(), depending on goals
);
```

The default value `00000000-...00` for `id` is "fake" – it will always be replaced in a trigger, based on the
timestamp:

```sql
create or replace function t_update_uuid() returns trigger
as $$
begin
  if new.id is null or new.id = '00000000-0000-0000-0000-000000000000'::uuid then
    new.id := ts_to_uuid_v7(new.uuid_ts);
  end if;

  return new;
end;
$$ language plpgsql;

create trigger t_update_uuid_before_insert_update
before insert or update on my_table
for each row execute function t_update_uuid();
```

Now, use TimescaleDB partitioning:

```sql
create extension timescaledb;

select create_hypertable(
  relation := 'my_table',
  time_column_name := 'uuid_ts',
  -- !! very small interval is just for testing
  chunk_time_interval := '1 minute'::interval
);
```
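
To confirm the conversion worked, we can peek at TimescaleDB's informational view (an optional check; column names as
in TimescaleDB 2.x – no chunks exist yet, since we haven't inserted any data):

```sql
select hypertable_name, num_dimensions, num_chunks
from timescaledb_information.hypertables
where hypertable_name = 'my_table';
```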

## Test data – fill the chunks

And now insert some test data – some rows for the "past" and some "current" rows:

```sql
insert into my_table(payload, uuid_ts)
select random()::text, ts
from generate_series(
  timestamptz '2000-01-01 00:01:00',
  timestamptz '2000-01-01 00:05:00',
  interval '5 second'
) as ts;

insert into my_table(payload)
select random()::text
from generate_series(1, 10000);

vacuum analyze my_table;
```

Checking the structure of `my_table` in psql using `\d+`, we now see that multiple partitions ("chunks") were created
by TimescaleDB:

```sql
test=# \d+ my_table
...
Child tables: _timescaledb_internal._hyper_2_3_chunk,
              _timescaledb_internal._hyper_2_4_chunk,
              _timescaledb_internal._hyper_2_5_chunk,
              _timescaledb_internal._hyper_2_6_chunk,
              _timescaledb_internal._hyper_2_7_chunk,
              _timescaledb_internal._hyper_2_8_chunk,
              _timescaledb_internal._hyper_2_9_chunk
```
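
Alternatively (a small optional aside), TimescaleDB's `show_chunks()` function returns the same list without the rest
of the `\d+` output:

```sql
select show_chunks('my_table');
```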

## Test queries – partition pruning

Now we just need to remember that the timestamp column should always participate in queries, to let the planner deal
with as few partitions as possible – and knowing the `id` values, we can always reconstruct the timestamp values using
`uuid_v7_to_ts()`. (In the table definition above this column is called `uuid_ts`; the example plans below were
captured with it named `created_at`.)

```sql
test=# explain select * from my_table where created_at = uuid_v7_to_ts('00dc6ad0-9660-7b92-a95e-1d7afdaae659');
                                                         QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.14..8.16 rows=1 width=41)
   ->  Index Scan using _hyper_5_11_chunk_my_table_created_at_idx on _hyper_5_11_chunk  (cost=0.14..8.15 rows=1 width=41)
         Index Cond: (created_at = '2000-01-01 00:01:00+00'::timestamp with time zone)
(3 rows)

test=# explain select * from my_table
  where created_at >= uuid_v7_to_ts('018c1ecb-d3b7-75b1-add9-62878b5152c7')
  order by created_at desc limit 10;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..1.17 rows=10 width=41)
   ->  Custom Scan (ChunkAppend) on my_table  (cost=0.29..11.49 rows=126 width=41)
         Order: my_table.created_at DESC
         ->  Index Scan using _hyper_5_16_chunk_my_table_created_at_idx on _hyper_5_16_chunk  (cost=0.29..11.49 rows=126 width=41)
               Index Cond: (created_at >= '2023-11-30 05:55:23.703+00'::timestamp with time zone)
(5 rows)
```

– partition pruning is in play, although it will require a certain effort to have it in various queries. But it works.

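
For example, when only the UUID is known, a lookup can derive the timestamp from it, so the planner has something to
prune on. A sketch of that pattern, using the `uuid_ts` column from our table definition (the 1 ms range accounts for
the microseconds lost inside the UUID):

```sql
-- point lookup by UUIDv7 with chunk pruning: constrain the partitioning
-- column using the timestamp reconstructed from the UUID itself
select *
from my_table
where id = '018c1ecb-d3b7-75b1-add9-62878b5152c7'
  and uuid_ts >= uuid_v7_to_ts('018c1ecb-d3b7-75b1-add9-62878b5152c7')
  and uuid_ts < uuid_v7_to_ts('018c1ecb-d3b7-75b1-add9-62878b5152c7') + interval '1 millisecond';
```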

--------

## Postscript

Also read the following comment by [@jamessewell](https://twitter.com/jamessewell), originally posted
[here](https://twitter.com/jamessewell/status/1730125437903450129):

> If you update your `create_hypertable` call with:
>
> ```
> time_column_name => 'id'
> time_partitioning_func => 'uuid_v7_to_ts'
> ```
>
> Then you'll be able to drop the `uuid_ts` col and your trigger!
>
> ```sql
> SELECT * FROM my_table WHERE id = '018c1ecb-d3b7-75b1-add9-62878b5152c7';
> ```
>
> Will just work 🪄

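
For reference, a hedged sketch of what that variant might look like (not verified here; TimescaleDB may require the
partitioning function to be declared `immutable`, so `uuid_v7_to_ts` would likely need to be re-created with that
volatility instead of the default `volatile`):

```sql
-- sketch only: partition directly by the UUID column, deriving chunk
-- boundaries via uuid_v7_to_ts(); tiny chunk interval is for testing only.
-- assumes a freshly created, empty my_table (no uuid_ts column, no trigger)
select create_hypertable(
  relation := 'my_table',
  time_column_name := 'id',
  time_partitioning_func := 'uuid_v7_to_ts',
  chunk_time_interval := '1 minute'::interval
);
```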