[ontology] extend ontology views to cover BuiltinLog / mz_introspection#36406
Conversation
8fcd2d1 to
a540be1
Compare
|
I think it would be good if @MaterializeInc/cluster also reviewed this, because I don't know this introspection stuff very well. |
|
I have a few comments for
|
| description: "Mapping from LIR node IDs to dataflow operator address ranges per worker.", | ||
| links: &const { | ||
| [OntologyLink { | ||
| name: "lir_of", |
There was a problem hiding this comment.
(This sounds like as if it were the LIR plan, but it's just an id, right?)
| [OntologyLink { | ||
| name: "address_of", | ||
| target: "dataflow_operator_per_worker", | ||
| properties: LinkProperties::fk("id", "id", Cardinality::ManyToOne), |
There was a problem hiding this comment.
Should this be an id, worker_id composite key? This seems to be an issue for all the _per_worker relations.
There was a problem hiding this comment.
Btw. there are some complicated join examples in the definition of mz_message_counts_per_worker:
JOIN received_cte USING (channel_id, from_worker_id, to_worker_id)
JOIN batch_sent_cte USING (channel_id, from_worker_id, to_worker_id)
JOIN batch_received_cte USING (channel_id, from_worker_id, to_worker_id)",
This also looks like we need some tricky composite keys, which can't be expressed by ForeignKey currently.
There was a problem hiding this comment.
Good point, I'll add extra keys!
| @@ -2349,6 +2695,18 @@ pub static MZ_MESSAGE_COUNTS_RECEIVED_RAW: LazyLock<BuiltinLog> = LazyLock::new( | |||
| oid: oid::LOG_MZ_MESSAGE_COUNTS_RECEIVED_RAW_OID, | |||
There was a problem hiding this comment.
I think all (or maybe only some of) the _raw relations have this weird thing where the actual payload is stuffed into the diff field, so you need to do weird grouped counts to get useful values. See e.g., the definition of received_cte. This can really mess up your day if you are not careful when querying one of these relations, so the ontology should definitely be made aware of this somehow.
(In that slack thread, it came up that we might even want to gate querying the raw stuff behind a session variable that needs to be explicitly turned on, to reduce the chances that people shoot themselves in the foot with queries that want to return 3049347435651 rows, like I did on my staging env.)
There was a problem hiding this comment.
My intuition is that for now we just remove the ontology from these ... if we think it is valuable we add more semantics later
Problem: The ontology views (mz_ontology_entity_types etc.) were generated by iterating builtins with the ontology type, but BuiltinLog entries were skipped. Therefore, the Compute introspection logs in mz_introspection had no representation in the ontology Solution: - Add ontology: Option<Ontology> field to BuiltinLog and annotate all 32 log entries with entity names, descriptions, FK/measures links, and semantic type annotations on GlobalId/MzTimestamp columns. - Extend generate_views() with a Builtin::Log arm Testing: - SLT smoke tests in mz_ontology.slt verify that per-worker entities appear in entity_types with correct keys and relations, that measures links resolve - Unit tests cover the Log inclusion/exclusion path in generate_views.
a540be1 to
2d11023
Compare
|
Thanks for all the comments!
That does make sense ... what do you think about adding a new UnenforcedForeignKey variant? That just has the slightly different semantic meaning
I have removed the optionality, good feedback.
mz_object_global_ids now uses ForeignKey so we should be able to track back to the mz_objects.
I fixed the docs to make it clear what the from/to relationship is
Updated the doc comments and switched mz_compute_dependencies.
The Measures link already records this information, so I left it out but we could add the foreign key too? |
antiguru
left a comment
There was a problem hiding this comment.
Seems fine! Spot-checked the ontology definition of the builtins, and seems to check out. Also, seems low risk if wrong.
| name: "global_id_of", | ||
| target: "compute_export_per_worker", | ||
| properties: LinkProperties::MapsTo { | ||
| source_column: "global_id", |
There was a problem hiding this comment.
Note that the global_id isn't guaranteed to exist on the other end, but it's not a foreign key, so probably ok?
| Builtin::MaterializedView(mv) => (mv.name, &mv.desc, mv.ontology.as_ref()), | ||
| Builtin::Source(s) => (s.name, &s.desc, s.ontology.as_ref()), | ||
| Builtin::Log(l) => { | ||
| desc_storage = l.variant.desc(); |
| @@ -17729,11 +18041,16 @@ mod tests { | |||
| // we still need to verify the string value matches a real column. | |||
| let mut bad_source_cols = Vec::new(); | |||
| for builtin in BUILTINS_STATIC.iter() { | |||
There was a problem hiding this comment.
Do you think it'd be possible to move the tests to a separate file? It's getting really hard to view this absolute unit of a file.
There was a problem hiding this comment.
Good point, IMO we should split this file up into things like mz_introspection.rs, pg_catalog.rs, mz_internal.rs and mz_catalog.rs.
This is a separate PR. I can do that separately.
7fcba25 to
419b6a4
Compare
Sounds fine to me!
Could |
builtin.rs had grown to 18k lines, making it difficult to navigate. (MaterializeInc#36406 (comment)) Split the builtin item definitions into five submodule files under src/catalog/src/builtin/, organized by schema: - pg_catalog.rs TYPE_* consts + pg_catalog views - mz_catalog.rs mz_catalog tables/views/sources - mz_internal.rs mz_internal views/tables/sources/indexes - mz_introspection.rs mz_introspection logs/views - information_schema.rs information_schema views builtin.rs now holds: type/trait/struct definitions, shared helper consts, roles/clusters, BUILTINS_STATIC, the BUILTINS pub mod, and lookup tables.
builtin.rs had grown to 18k lines, making it difficult to navigate. (MaterializeInc#36406 (comment)) Split the builtin item definitions into five submodule files under src/catalog/src/builtin/, organized by schema: - pg_catalog.rs TYPE_* consts + pg_catalog views - mz_catalog.rs mz_catalog tables/views/sources - mz_internal.rs mz_internal views/tables/sources/indexes - mz_introspection.rs mz_introspection logs/views - information_schema.rs information_schema views builtin.rs now holds: type/trait/struct definitions, shared helper consts, roles/clusters, BUILTINS_STATIC, the BUILTINS pub mod, and lookup tables.
builtin.rs had grown to 18k lines, making it difficult to navigate. (#36406 (comment)) Split the builtin item definitions into five submodule files under src/catalog/src/builtin/, organized by schema: - pg_catalog.rs TYPE_* consts + pg_catalog views - mz_catalog.rs mz_catalog tables/views/sources - mz_internal.rs mz_internal views/tables/sources/indexes - mz_introspection.rs mz_introspection logs/views - information_schema.rs information_schema views builtin.rs now holds: type/trait/struct definitions, shared helper consts, roles/clusters, BUILTINS_STATIC, the BUILTINS pub mod, and lookup tables.
Problem:
The ontology views (mz_ontology_entity_types etc.) were generated by iterating builtins with the ontology type, but BuiltinLog entries were skipped.
Therefore, the Compute introspection logs in mz_introspection had no representation in the ontology
Solution:
Testing: