adapter: Change id column in mz_databases and mz_schemas #18719
Conversation
I had a couple of questions and comments. Also since we're adding a system variant to schemas, we may want to use it for the catalog schemas.
src/sql/src/names.rs (outdated)
/// A type that specifies how we serialize and deserialize a [`SchemaId`].
///
/// Note: JSON maps require their keys to be strings, and we use [`SchemaId`] as the key
/// in some maps, so we need to serialize them as plain strings, which is what this type
/// does.
#[derive(Clone, Debug, Serialize, Deserialize)]
#[serde(transparent)]
struct SchemaIdJson(String);
Ideally this should belong in storage.rs or somewhere nearby, since this crate and module don't normally care how things are stored on disk.
I have a slight preference (but only slight) to use
#[serde(into = "BTreeMap<String, _>")]
#[serde(try_from = "BTreeMap<String, _>")]
wherever this is being used as a key, if that means we can move this logic out of the sql crate.
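The `#[serde(into = …)]`/`#[serde(try_from = …)]` attributes delegate to ordinary `From`/`TryFrom` conversions at the serialization boundary. A minimal, std-only sketch of the underlying conversion being suggested — using a hypothetical `SchemaId` newtype, not the real catalog type:

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for the real SchemaId; illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct SchemaId(u64);

// Converting to string keys only at the serialization boundary keeps the rest
// of the crate free of on-disk/JSON concerns.
fn to_string_keys(map: &BTreeMap<SchemaId, String>) -> BTreeMap<String, String> {
    map.iter()
        .map(|(SchemaId(id), v)| (id.to_string(), v.clone()))
        .collect()
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert(SchemaId(1), "public".to_string());
    map.insert(SchemaId(2), "information_schema".to_string());
    let by_str = to_string_keys(&map);
    assert_eq!(by_str.get("1"), Some(&"public".to_string()));
    println!("{by_str:?}");
}
```

With serde, the same conversion would be expressed as a `From<BTreeMap<SchemaId, V>> for BTreeMap<String, V>` impl that the `into`/`try_from` attributes invoke automatically.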
This is only necessary for dumping the catalog to a string. In terms of location, I did try putting it in storage.rs, but it kind of felt out of place there since it's not really related to how we persist things to disk; it's only used for this one function. Instead of these custom impls, something else I tried was the following:
pub fn dump(&self) -> String {
    let database_by_str: BTreeMap<String, _> = self
        .database_by_id
        .iter()
        .map(|(key, value)| (key.to_string(), value.debug_json()))
        .collect();
    serde_json::to_string(&database_by_str).expect("serialization cannot fail")
}

impl Database {
    fn debug_json(&self) -> serde_json::Value {
        let schemas_by_str: BTreeMap<String, _> = self
            .schemas_by_id
            .iter()
            .map(|(key, value)| (key.to_string(), value))
            .collect();
        serde_json::json!({
            "name": self.name,
            "id": self.id,
            "schemas_by_id": schemas_by_str,
            "schemas_by_name": self.schemas_by_name,
            "owner_id": self.owner_id,
        })
    }
}
Maybe that's a bit better?
I slightly prefer your proposed approach compared to the current approach. The current approach forces us to always serialize SchemaId as a string, while your proposed approach only serializes SchemaId as a string when it's required (in the dump function). I can imagine that in the future, if SchemaId is serialized in the stash, we would want to serialize it normally and not as a string.
This is all subjective though, so I'll leave the final decision up to you.
Updated to use the debug_json(...) approach! Since we only need this special-case serialization for a single method, I think putting the special behavior in that method makes the most sense.
// enums after this upgrade.
|txn: &mut Transaction<'_>, _now, _bootstrap_args| {
    // Migrate all of our DatabaseKeys.
    txn.databases.migrate(|key, value| {
The new struct fields and this migration can maybe be done in an easier way, if done before the big vec of migrations is run. You could write a stash function that calls iter_raw on these and attempts to marshal into the old struct. If successful, then you add a retraction to a batch and an addition with the new key. If the marshal into the old struct fails, it's already the new struct and should be skipped. The old structs would be copied and exist only within this migration function, and everything could then use the new enums elsewhere. Does that work? I think it'd simplify this.
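The suggested pass — try to marshal each raw entry into the old struct, and on success retract it and re-add it under the new key — can be sketched in std-only Rust. This models the stash as a plain `BTreeMap` and uses a parse check as a stand-in for "marshals into the old struct"; the names and the `u`-prefix encoding are illustrative, not the actual stash API:

```rust
use std::collections::BTreeMap;

// Hypothetical pre-migration pass. Old-format entries use a bare numeric key;
// anything else is assumed to already be in the new format and is skipped.
fn migrate_keys(collection: &mut BTreeMap<String, String>) {
    // "Marshal into the old struct": here, the key parses as a bare u64.
    let old: Vec<(String, String)> = collection
        .iter()
        .filter(|(key, _)| key.parse::<u64>().is_ok())
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();
    for (key, value) in old {
        collection.remove(&key); // retraction of the old-format entry
        collection.insert(format!("u{key}"), value); // addition under the new key
    }
}

fn main() {
    let mut dbs = BTreeMap::new();
    dbs.insert("1".to_string(), "materialize".to_string());
    dbs.insert("u2".to_string(), "already_migrated".to_string());
    migrate_keys(&mut dbs);
    assert!(dbs.contains_key("u1") && !dbs.contains_key("1"));
    assert!(dbs.contains_key("u2")); // new-format entries are untouched
    println!("{dbs:?}");
}
```

The appeal of running this before the main migration vec is that everything downstream only ever sees the new key shape, so the old structs can stay private to this one function.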
Being able to have enum DatabaseKey { User(u64), System(u64) }, and an enum SchemaKey of the same shape, would definitely simplify things! My hesitation about adding custom migration logic like this, or custom serde impls, was to prevent any complications when we later try to improve migrations, i.e. #17466. Although off the top of my head I don't think what you're proposing should complicate anything too much, so I don't feel strongly one way or the other; I'll defer to you here on what you think is best :)
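For concreteness, here is a std-only sketch of the proposed enum key with a string round-trip (the kind of conversion a custom serde impl would wrap). The `u`/`s` prefix encoding is an assumption for illustration, not the actual on-disk format:

```rust
use std::fmt;
use std::str::FromStr;

// Hypothetical namespaced key from the discussion; illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DatabaseKey {
    User(u64),
    System(u64),
}

impl fmt::Display for DatabaseKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DatabaseKey::User(id) => write!(f, "u{id}"),
            DatabaseKey::System(id) => write!(f, "s{id}"),
        }
    }
}

impl FromStr for DatabaseKey {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if let Some(digits) = s.strip_prefix('u') {
            digits.parse().map(DatabaseKey::User).map_err(|_| format!("bad id: {s}"))
        } else if let Some(digits) = s.strip_prefix('s') {
            digits.parse().map(DatabaseKey::System).map_err(|_| format!("bad id: {s}"))
        } else {
            Err(format!("unknown namespace: {s}"))
        }
    }
}

fn main() {
    let key = DatabaseKey::System(42);
    let s = key.to_string();
    assert_eq!(s, "s42");
    assert_eq!(s.parse::<DatabaseKey>().unwrap(), key);
    println!("round-tripped {key:?} as {s:?}");
}
```

Compared to an optional-namespace field, the enum makes the two namespaces impossible to confuse at the type level, at the cost of the one-time key migration discussed above.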
Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
Motivation
This PR adds a known-desirable feature: #18506
This PR namespaces database and schema IDs by either "user" or "system"; currently everything is "user". There are two major changes:

1. Updates DatabaseKey, SchemaKey, SchemaValue, and ItemValue so they contain namespaces.
2. Adds new DatabaseNamespace and SchemaNamespace types, which are optional, to hold the namespace in DatabaseKey and SchemaKey. This is necessary to support serializing both the existing entries and the new entries.

Another approach would be to make DatabaseKey and SchemaKey enums and then write a custom implementation of Serialize and Deserialize, but after prototyping it, that seemed trickier to maintain.

Tips for reviewer
I'm mostly unsure about the migrations of the builtin tables. I believe it should "just work" since the fingerprint of each builtin should change, but if someone could confirm that would be great!
Checklist
- This PR changes a $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and is therefore tagged with a T-proto label.