adapter: Change id column in mz_databases and mz_schemas #18719

do-not-use-parker-timmerman · 2023-04-11T18:05:29Z

Motivation

This PR adds a known-desirable feature: #18506

This PR namespaces database and schema ids, by either "user" or "system, currently everything is "user". There are two major changes:

Update our stash types, DatabaseKey, SchemaKey, SchemaValue, and ItemValue, so they contain namespaces.
Update all of the builtin tables and views that depend on either database id or schema id.

To add a namespace to DatabaseKey and SchemaKey I created new DatabaseNamespace and SchemaNamespace types, which are optional. This is necessary to support serializing the existing entries, and the new entries. Another approach would be to make DatabaseKey and SchemaKey enums, and then write a custom implementation of Serialize and Deserialize, but after prototyping it, that seemed trickier to maintain.

Tips for reviewer

I'm mostly unsure about the migrations of the builtin tables. I believe it should "just work" since the fingerprint of each builtin should change, but if someone could confirm that would be great!

Checklist

This PR has adequate test coverage / QA involvement has been duly considered.
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.
If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
This PR includes the following user-facing behavior changes:

- add namespace to DatabaseKey, SchemaKey, and ItemValue - add stash migration - update Builtin tables and view - update sqllogictests - update testdrive

jkosh44

I had a couple of questions and comments. Also since we're adding a system variant to schemas, we may want to use it for the catalog schemas.

src/stash/src/lib.rs

src/sql/src/names.rs

src/adapter/src/catalog/storage.rs

jkosh44 · 2023-04-13T18:01:05Z

src/sql/src/names.rs

+/// A type that specifies how we serialize and deserialize a [`SchemaId`].
+///
+/// Note: JSON maps require their keys to be strings, and we use [`SchemaId`] as the key
+/// in some maps, so we need to serialize them as plain strings, which is what this type
+/// does.
+#[derive(Clone, Debug, Serialize, Deserialize)]
+#[serde(transparent)]
+struct SchemaIdJson(String);


Ideally this should belong in storage.rs or somewhere nearby, since this create and module don't normally care how things are stored on disk.

I have a slight preference (but only slight) to use

#[serde(into = "BTreeMap<String, _>")] #[serde(try_from = "BTreeMap<String, _>")]

where ever this is being used as a key if that means we can move this logic out of the sql create.

This is only necessary for dumping the catalog to a string. In terms of location, I did try putting it in storage.rs but it kind of felt out of place there since its not really related to how we persist things to disk, since it's only used for this one function. Instead of these custom impls, something else I tried was the following:

pub fn dump(&self) -> String { let database_by_str: BTreeMap<String, _> = self.database_by_id.iter() .map(|(key, value)| (key.to_string(), value.debug_json())) .collect(); serde_json::to_string(database_by_str).expect("serialization cannot fail") } impl Database { fn debug_json(&self) -> serde_json::Value { let schemas_by_str: BTreeMap<String, _> = self .schemas_by_id .iter() .map(|(key, value)| (key.to_string(), value)) .collect(); serde_json::json!({ name: self.name, id: self.id, schemas_by_id: schemas_by_str, schemas_by_name: self.schemas_by_name, owner_id: self.owner_id, }) } }

Maybe that's a bit better?

I slightly prefer your proposed approach compared to the current approach. The current approach forces us to always serialize SchemaId as a string while your proposed approach only serializes SchemaId as a string when it's required (in the dump function. I can imagine in the future if SchemaId is serialized in the stash then we would want to serialize it normally and not as a string.

This is all subjective though, so I'll leave the final decision up to you.

Updated to use the debug_json(...) approach! Since we only need this "special" case serialization for a single method, I think putting the special behavior in that method makes the most sense

src/sql/src/names.rs

maddyblue · 2023-04-13T20:48:12Z

src/adapter/src/catalog/storage.rs

+        // enums after this upgrade.
+        |txn: &mut Transaction<'_>, _now, _bootstrap_args| {
+            // Migrate all of our DatabaseKeys.
+            txn.databases.migrate(|key, value| {


The new struct fields and this migration can maybe be done in an easier way, if done before the big vec of migrations is run. You could write a stash function that calls iter_raw on these and attempts to marshal into the old struct. If successful, then you add a retraction to a batch and an addition with the new key. If the marshal into the old struct fails, it's already the new struct and should be skipped. The old structs would be copied and exist only within this migration function, and everything could then use the new enums elsewhere. Does that work? I think it'd simplify this.

Being able to have enum DatabaseKey { User(u64), System(u64) }, and a enum SchemaKey of the same, would definitely simplify things! My hesitation to adding custom migration logic like this, or custom serde impls was to prevent any complications when we later try to improve migrations, i.e. #17466. Although off the top of my head I don't think what you're proposing should complicate anything too much, so I don't really feel strongly one way or another, so I'll defer to you here on what you think is best :)

src/stash/src/lib.rs

Co-authored-by: Matt Jibson <matt.jibson@gmail.com>

do-not-use-parker-timmerman force-pushed the migration/mz_tables branch 4 times, most recently from 7227474 to 045fb9d Compare April 12, 2023 22:22

start

1d81b99

- add namespace to DatabaseKey, SchemaKey, and ItemValue - add stash migration - update Builtin tables and view - update sqllogictests - update testdrive

do-not-use-parker-timmerman force-pushed the migration/mz_tables branch from 045fb9d to 1d81b99 Compare April 12, 2023 22:27

do-not-use-parker-timmerman changed the title ~~[DRAFT] adapter: Change id column in mz_databases and mz_schemas~~ adapter: Change id column in mz_databases and mz_schemas Apr 12, 2023

do-not-use-parker-timmerman marked this pull request as ready for review April 12, 2023 23:20

do-not-use-parker-timmerman requested a review from a team as a code owner April 12, 2023 23:20

Fix upgrades

6f81b5b

do-not-use-parker-timmerman requested review from maddyblue and jkosh44 April 13, 2023 16:39

jkosh44 reviewed Apr 13, 2023

View reviewed changes

maddyblue reviewed Apr 13, 2023

View reviewed changes

review feedback, rename methods, remove footguns, add comments

f6c33ae

jkosh44 approved these changes Apr 14, 2023

View reviewed changes

get rid of the KeyJson types

9b89be9

maddyblue approved these changes Apr 15, 2023

View reviewed changes

src/stash/src/lib.rs Outdated Show resolved Hide resolved

teskje mentioned this pull request Apr 17, 2023

User/system replica IDs #18636

Merged

6 tasks

Update src/stash/src/lib.rs

efdcdd7

Co-authored-by: Matt Jibson <matt.jibson@gmail.com>

do-not-use-parker-timmerman enabled auto-merge (squash) April 17, 2023 13:12

do-not-use-parker-timmerman merged commit d6e4716 into MaterializeInc:main Apr 17, 2023

do-not-use-parker-timmerman mentioned this pull request Apr 17, 2023

Change id column in catalog tables mz_databases and mz_schemas to namespaced id #18506

Closed

do-not-use-parker-timmerman deleted the migration/mz_tables branch April 17, 2023 14:44

materialize-bot mentioned this pull request Apr 21, 2023

release: v0.52.0 required reviews #18907

Closed

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

adapter: Change id column in mz_databases and mz_schemas #18719

adapter: Change id column in mz_databases and mz_schemas #18719

do-not-use-parker-timmerman commented Apr 11, 2023 •

edited

Loading

jkosh44 left a comment

jkosh44 Apr 13, 2023

do-not-use-parker-timmerman Apr 14, 2023

jkosh44 Apr 14, 2023

do-not-use-parker-timmerman Apr 14, 2023

maddyblue Apr 13, 2023

do-not-use-parker-timmerman Apr 14, 2023

adapter: Change id column in mz_databases and mz_schemas #18719

adapter: Change id column in mz_databases and mz_schemas #18719

Conversation

do-not-use-parker-timmerman commented Apr 11, 2023 • edited Loading

Motivation

Tips for reviewer

Checklist

jkosh44 left a comment

Choose a reason for hiding this comment

jkosh44 Apr 13, 2023

Choose a reason for hiding this comment

do-not-use-parker-timmerman Apr 14, 2023

Choose a reason for hiding this comment

jkosh44 Apr 14, 2023

Choose a reason for hiding this comment

do-not-use-parker-timmerman Apr 14, 2023

Choose a reason for hiding this comment

maddyblue Apr 13, 2023

Choose a reason for hiding this comment

do-not-use-parker-timmerman Apr 14, 2023

Choose a reason for hiding this comment

do-not-use-parker-timmerman commented Apr 11, 2023 •

edited

Loading