
[ontology] extend ontology views to cover BuiltinLog / mz_introspection #36406

Merged
mtabebe merged 8 commits into MaterializeInc:main from mtabebe:ma/ontology/mz-introspection
May 8, 2026

@mtabebe
Contributor

@mtabebe mtabebe commented May 5, 2026

Problem:

The ontology views (mz_ontology_entity_types etc.) were generated by iterating builtins with the ontology type, but BuiltinLog entries were skipped.

Therefore, the Compute introspection logs in mz_introspection had no representation in the ontology.

Solution:

  • Add ontology: Option<Ontology> field to BuiltinLog and annotate all 32 log entries with entity names, descriptions, FK/measures links, and semantic type annotations on GlobalId/MzTimestamp columns.
  • Extend generate_views() with a Builtin::Log arm.

Testing:

  • SLT smoke tests in mz_ontology.slt verify that per-worker entities appear in entity_types with correct keys and relations, and that measures links resolve.
  • Unit tests cover the Log inclusion/exclusion path in generate_views.
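
The change described above can be pictured with a small, self-contained sketch. It is not the code from the PR: the types below are hypothetical, heavily trimmed stand-ins for the ones in src/catalog/src/builtin.rs, and `annotated_entities` is an invented helper that only mimics the shape of the match in generate_views().

```rust
// Hypothetical, simplified stand-ins for the catalog types; the real
// definitions carry many more fields (links, column annotations, ...).
#[derive(Debug, Clone, PartialEq)]
pub struct Ontology {
    pub entity_name: &'static str,
    pub description: &'static str,
}

pub struct BuiltinView {
    pub name: &'static str,
    pub ontology: Option<Ontology>,
}

pub struct BuiltinLog {
    pub name: &'static str,
    // The new optional field: `None` keeps a log out of the ontology views.
    pub ontology: Option<Ontology>,
}

pub enum Builtin {
    View(BuiltinView),
    Log(BuiltinLog),
}

/// Collects (relation name, entity name) pairs for every annotated builtin.
/// Invented helper, for illustration only.
pub fn annotated_entities(builtins: &[Builtin]) -> Vec<(&'static str, &'static str)> {
    builtins
        .iter()
        .filter_map(|builtin| {
            let (name, ontology) = match builtin {
                Builtin::View(v) => (v.name, v.ontology.as_ref()),
                // Without an arm like this one, logs never reach the ontology.
                Builtin::Log(l) => (l.name, l.ontology.as_ref()),
            };
            ontology.map(|o| (name, o.entity_name))
        })
        .collect()
}
```

Making the field an `Option` means a log can be left unannotated without special-casing it, which is roughly the "inclusion/exclusion path" the unit tests are said to cover.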

@mtabebe mtabebe marked this pull request as ready for review May 5, 2026 15:32
@mtabebe mtabebe requested a review from a team as a code owner May 5, 2026 15:32
@mtabebe mtabebe requested a review from aljoscha May 5, 2026 15:32
@mtabebe mtabebe marked this pull request as draft May 5, 2026 15:32
@mtabebe mtabebe removed the request for review from aljoscha May 5, 2026 15:33
@mtabebe mtabebe force-pushed the ma/ontology/mz-introspection branch from 8fcd2d1 to a540be1 Compare May 5, 2026 16:02
@mtabebe mtabebe requested a review from ggevay May 5, 2026 17:14
@mtabebe mtabebe marked this pull request as ready for review May 5, 2026 17:14
@ggevay
Contributor

ggevay commented May 6, 2026

I think it would be good if @MaterializeInc/cluster also reviewed this, because I don't know this introspection stuff very well.

@ggevay ggevay requested a review from a team May 6, 2026 12:41
@ggevay
Contributor

ggevay commented May 6, 2026

I have a few comments for main (i.e., the state before the PR):

  • cargo fmt seems to have given up in some places, see e.g. the Ontology { for mz_objects. I think the standard fix in these cases is to manually get the code into a state where cargo fmt is willing to operate on it.
  • LinkProperties:
    • ForeignKey's nullability still has the problem mentioned on the previous PR: I think foreign key traditionally means that you have referential integrity, i.e., when the column is not null then it is guaranteed to find a match. I think this is not true for the example mentioned there, sink_status_history, because a replica might go away, after which replica_id might no longer find a match. I think it needs some careful consideration how exactly to address this issue. E.g. one option would be to just drop these ForeignKey links, but that would lose valuable information: It's still a valid thing to do a join between these things, it's just that you need to make it an outer join. So, I guess we still want some kind of link, but I'm not sure what exactly to call this, if it's not exactly a foreign key.
    • MapsTo
      • Why are source_column and target_column optional?
      • MapsTo is on both mz_objects and mz_object_global_ids, but these look like totally different cases: mz_objects has a column, id, that can be mapped to global ids via another relation (mz_object_global_ids), while mz_object_global_ids is a relation that itself maps from catalog ids to global ids.
      • Also, I don't understand several things on the MapsTo link of mz_object_global_ids:
        • Why isn't the global_id column mentioned?
        • What is target_column here? Does target: "object" mean that the target is the thing that has entity_name: "object", which is mz_objects? But then to_type seems wrong: mz_object's id column is catalog id, not global id.
    • DependsOn
      • Is it intended that this is used only for mz_materialization_dependencies? The doc comment (not of DependsOn, but the doc comment of OntologyLink) gives mz_compute_dependencies as an example, but it's not actually used there.
      • What is the source entity? "source_column"/"target_column" makes it look as if it were the thing itself that has the annotation, but the doc comment is not consistent with this.
      • name: "materialization_depends_on" seems off. E.g. one could read it as "the source materialization depends on the target of this link", but this is not the case. This ties back to my earlier comment about OntologyLink::name not being defined precisely enough.
  • OntologyLink's doc comment has an overview of LinkProperties enum variants. Why not put that overview on LinkProperties instead?
  • mz_objects is missing some Unions, e.g. function and type, plus all the things inherited from mz_relations
  • mz_arrangement_sizes has a "Corresponds to" in column_comments, but no ForeignKey link. Is that intended?
  • There seems to be some confusion about whether the object_type for MVs is materialized-view or materialized view (dash example, space example).

Contributor

@ggevay ggevay left a comment


Unfortunately, the introspection relations have some complications that can't be modeled by the Ontology structs' current form, see comments below.

Comment thread src/catalog/src/builtin.rs Outdated
description: "Mapping from LIR node IDs to dataflow operator address ranges per worker.",
links: &const {
[OntologyLink {
name: "lir_of",
Contributor

@ggevay ggevay May 6, 2026


(This sounds as if it were the LIR plan, but it's just an id, right?)

Comment thread src/catalog/src/builtin.rs Outdated
[OntologyLink {
name: "address_of",
target: "dataflow_operator_per_worker",
properties: LinkProperties::fk("id", "id", Cardinality::ManyToOne),
Contributor


Should this be an id, worker_id composite key? This seems to be an issue for all the _per_worker relations.

Contributor

@ggevay ggevay May 6, 2026


Btw. there are some complicated join examples in the definition of mz_message_counts_per_worker:

JOIN received_cte USING (channel_id, from_worker_id, to_worker_id)
JOIN batch_sent_cte USING (channel_id, from_worker_id, to_worker_id)
JOIN batch_received_cte USING (channel_id, from_worker_id, to_worker_id)",

This also looks like we need some tricky composite keys, which can't be expressed by ForeignKey currently.
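
One way to picture the composite keys asked for here is a link that carries a list of column pairs instead of a single pair. The sketch below is an assumption made for illustration: `CompositeKey` and `join_condition` are hypothetical names and do not reflect how the ontology structs ended up modelling this.

```rust
/// Hypothetical composite-key link: each pair is (source column, target column).
pub struct CompositeKey {
    pub target: &'static str,
    pub columns: &'static [(&'static str, &'static str)],
}

impl CompositeKey {
    /// Renders the join predicate implied by the link, one equality per pair.
    pub fn join_condition(&self, source_alias: &str, target_alias: &str) -> String {
        self.columns
            .iter()
            .map(|(src, tgt)| format!("{source_alias}.{src} = {target_alias}.{tgt}"))
            .collect::<Vec<_>>()
            .join(" AND ")
    }
}
```

For a _per_worker relation, the pairs would be ("id", "id") and ("worker_id", "worker_id"), so a consumer following the link joins on both columns rather than on id alone. The three-column USING clause quoted above would fit the same shape with three pairs.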

Contributor Author


Good point, I'll add extra keys!

@@ -2349,6 +2695,18 @@ pub static MZ_MESSAGE_COUNTS_RECEIVED_RAW: LazyLock<BuiltinLog> = LazyLock::new(
oid: oid::LOG_MZ_MESSAGE_COUNTS_RECEIVED_RAW_OID,
Contributor

@ggevay ggevay May 6, 2026


I think all (or maybe only some of) the _raw relations have this weird thing where the actual payload is stuffed into the diff field, so you need to do weird grouped counts to get useful values. See e.g., the definition of received_cte. This can really mess up your day if you are not careful when querying one of these relations, so the ontology should definitely be made aware of this somehow.

(In that slack thread, it came up that we might even want to gate querying the raw stuff behind a session variable that needs to be explicitly turned on, to reduce the chances that people shoot themselves in the foot with queries that want to return 3049347435651 rows, like I did on my staging env.)
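
To make the "payload in the diff field" point concrete, here is a minimal simulation. It assumes each update in a _raw relation is a key plus a signed multiplicity; the `Update` struct and `message_counts` function are hypothetical and only mirror the grouped count described for received_cte.

```rust
use std::collections::BTreeMap;

/// One update as a raw log might expose it: key columns plus a signed
/// multiplicity. Hypothetical shape, for illustration only.
pub struct Update {
    pub channel_id: u64,
    pub from_worker_id: u64,
    pub to_worker_id: u64,
    /// The interesting number lives here, not in a regular column.
    pub diff: i64,
}

/// Sums the diffs per (channel, from, to) key, which is what a grouped
/// count over the raw relation computes.
pub fn message_counts(updates: &[Update]) -> BTreeMap<(u64, u64, u64), i64> {
    let mut counts = BTreeMap::new();
    for u in updates {
        *counts
            .entry((u.channel_id, u.from_worker_id, u.to_worker_id))
            .or_insert(0) += u.diff;
    }
    counts
}
```

Under this reading, selecting from the relation without grouping asks for one output row per unit of multiplicity, which is how a query can end up wanting to return trillions of rows.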

Contributor Author


My intuition is that for now we just remove the ontology from these ... if we think it is valuable we add more semantics later

Contributor


Ok!

mtabebe added 4 commits May 7, 2026 06:27
Problem:

The ontology views (mz_ontology_entity_types etc.) were generated
by iterating builtins with the ontology type, but BuiltinLog entries
were skipped.

Therefore, the Compute introspection logs in mz_introspection had no
representation in the ontology

Solution:
- Add ontology: Option<Ontology> field to BuiltinLog and annotate all
32 log entries with entity names, descriptions, FK/measures links, and
semantic type annotations on GlobalId/MzTimestamp columns.
- Extend generate_views() with a Builtin::Log arm

Testing:
- SLT smoke tests in mz_ontology.slt verify that per-worker entities
appear in entity_types with correct keys and relations, that measures links
resolve
- Unit tests cover the Log inclusion/exclusion path in generate_views.
@mtabebe mtabebe force-pushed the ma/ontology/mz-introspection branch from a540be1 to 2d11023 Compare May 7, 2026 16:45
@mtabebe
Contributor Author

mtabebe commented May 7, 2026

Thanks for all the comments!

  • ForeignKey's nullability still has the problem mentioned on the previous PR

That does make sense ... what do you think about adding a new UnenforcedForeignKey variant? That just has the slightly different semantic meaning

Why are source_column and target_column optional?

I have removed the optionality, good feedback.

MapsTo

mz_object_global_ids now uses ForeignKey, so we should be able to trace back to mz_objects.

What is source entity?

I fixed the docs to make it clear what the from/to relationship is

DependsOn

Updated the doc comments and switched mz_compute_dependencies.

mz_arrangement_sizes missing FK

The Measures link already records this information, so I left it out but we could add the foreign key too?
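
The UnenforcedForeignKey idea floated in this comment could look roughly like the sketch below. It is a guess at the shape, not the merged definition: the enum is trimmed to two variants, the field names are borrowed from the discussion, and `guarantees_match` is an invented helper.

```rust
/// Hypothetical, trimmed-down version of the link enum under discussion.
pub enum LinkProperties {
    /// Referential integrity holds: a non-null source value finds a match.
    ForeignKey {
        source_column: &'static str,
        target_column: &'static str,
    },
    /// Same join columns, but the target row may be gone, for example
    /// after a replica is dropped.
    UnenforcedForeignKey {
        source_column: &'static str,
        target_column: &'static str,
    },
}

impl LinkProperties {
    /// True when a non-null source value is guaranteed to find a target row.
    pub fn guarantees_match(&self) -> bool {
        match self {
            LinkProperties::ForeignKey { .. } => true,
            LinkProperties::UnenforcedForeignKey { .. } => false,
        }
    }
}
```

A consumer generating queries from the ontology could use the flag to choose an outer join whenever a match is not guaranteed, which keeps the link useful without claiming integrity it does not have.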

Member

@antiguru antiguru left a comment


Seems fine! Spot-checked the ontology definition of the builtins, and seems to check out. Also, seems low risk if wrong.

name: "global_id_of",
target: "compute_export_per_worker",
properties: LinkProperties::MapsTo {
source_column: "global_id",
Member


Note that the global_id isn't guaranteed to exist on the other end, but it's not a foreign key, so probably ok?

Builtin::MaterializedView(mv) => (mv.name, &mv.desc, mv.ontology.as_ref()),
Builtin::Source(s) => (s.name, &s.desc, s.ontology.as_ref()),
Builtin::Log(l) => {
desc_storage = l.variant.desc();
Member


💃

@@ -17729,11 +18041,16 @@ mod tests {
// we still need to verify the string value matches a real column.
let mut bad_source_cols = Vec::new();
for builtin in BUILTINS_STATIC.iter() {
Member


Do you think it'd be possible to move the tests to a separate file? It's getting really hard to view this absolute unit of a file.

Contributor Author


Good point, IMO we should split this file up into things like mz_introspection.rs, pg_catalog.rs, mz_internal.rs and mz_catalog.rs.

This is a separate PR. I can do that separately.


@mtabebe mtabebe requested a review from a team as a code owner May 7, 2026 20:48
@mtabebe mtabebe force-pushed the ma/ontology/mz-introspection branch from 7fcba25 to 419b6a4 Compare May 8, 2026 01:28
@ggevay
Contributor

ggevay commented May 8, 2026

what do you think about adding a new UnenforcedForeignKey variant? That just has the slightly different semantic meaning

Sounds fine to me!

DependsOn

Updated the doc comments and switched mz_compute_dependencies.

Could mz_object_dependencies and mz_object_transitive_dependencies use this as well?

@mtabebe mtabebe merged commit ac5e71a into MaterializeInc:main May 8, 2026
120 checks passed
mtabebe added a commit to mtabebe/materialize that referenced this pull request May 8, 2026
builtin.rs had grown to 18k lines, making it difficult to navigate.
(MaterializeInc#36406 (comment))
Split the builtin item definitions into five submodule files under
src/catalog/src/builtin/, organized by schema:

- pg_catalog.rs         TYPE_* consts + pg_catalog views
- mz_catalog.rs         mz_catalog tables/views/sources
- mz_internal.rs        mz_internal views/tables/sources/indexes
- mz_introspection.rs   mz_introspection logs/views
- information_schema.rs information_schema views

builtin.rs now holds: type/trait/struct definitions, shared helper consts,
roles/clusters, BUILTINS_STATIC, the BUILTINS pub mod, and lookup tables.
mtabebe added a commit that referenced this pull request May 8, 2026
builtin.rs had grown to 18k lines, making it difficult to navigate.
(#36406 (comment))

Split the builtin item definitions into five submodule files under
src/catalog/src/builtin/, organized by schema:

- pg_catalog.rs         TYPE_* consts + pg_catalog views
- mz_catalog.rs         mz_catalog tables/views/sources
- mz_internal.rs        mz_internal views/tables/sources/indexes
- mz_introspection.rs   mz_introspection logs/views
- information_schema.rs information_schema views

builtin.rs now holds: type/trait/struct definitions, shared helper
consts, roles/clusters, BUILTINS_STATIC, the BUILTINS pub mod, and
lookup tables.