
Add design doc for proposed changes to mz_internal #17836

Merged: 8 commits into MaterializeInc:main on Apr 19, 2023

Conversation

@umanwizard (Contributor) commented Feb 26, 2023

See rendered doc here for details.

@benesch (Member) left a comment


Really excited to see this work happening, @umanwizard!

Comment on lines 154 to 164
* All per-worker relations should have a corresponding "global" view
where the results are aggregated across workers in the way most
likely to be useful (for example, summing, taking the minimum, or
restricting to worker 0). Because these global views are more likely
to be useful than the underlying sources, they should have the basic
name (e.g., `mz_arrangement_sharing`). The underlying source should
have a name with `per_worker` inserted (e.g.,
`mz_per_worker_arrangement_sharing`). Furthermore, the per-worker
variant should be clearly less prominent in documentation (either at
the bottom of the page, or in a "see more" section that has to be
clicked to expand).
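For illustration, the proposed pairing might look roughly like the following sketch, reusing the example names from the doc (the per-worker column layout here is an assumption, not an actual catalog definition):

```sql
-- Hypothetical sketch: the per-worker source carries a worker_id column,
-- and the "global" view aggregates it away by summing across workers.
CREATE VIEW mz_arrangement_sharing AS
SELECT operator_id, sum(count) AS count
FROM mz_per_worker_arrangement_sharing
GROUP BY operator_id;
```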
Member

Strong agree on this.

Comment on lines 176 to 180
* All per-replica sources and views should have their names prepended
with some string (perhaps `clusterd_`, though I'm open to
suggestions) that makes it obvious which ones they are. Furthermore,
the limitations of per-replica relations should be clearly
documented.
Member

"clusterd" is not a user facing concept (outside of architecture discussion), but "cluster" is! Would strongly endorse calling these something with cluster in the name.

Cluster unification really did us a solid here, since we no longer need to figure out how to disambiguate between "storage" introspection views and "compute" introspection views.

Contributor

This is tough. I would have intuitively suggested that we prefix with cluster_replica, to signify that these relations are implicitly scoped by the targeted replica. However, we already have a bunch of mz_cluster_replica_* relations that are written by the controller and expose the global view (with a replica_id column):

@teskje (Contributor) commented Feb 27, 2023

FWIW, here is some prior discussion about the mz_cluster_replica_* naming in Slack.

Contributor Author

@benesch, I think it's fine to expose non-user-facing concepts in internal stuff. I mean, dataflow channels and operators aren't a user-facing concept either :)

I liked clusterd because it precisely describes where the logging happens and where the relations actually live (as opposed to being persisted). I'd also be okay with cluster_replica (and renaming all the stuff that already has it).

Contributor

It would be great to find something other than cluster_replica. mz_cluster_replicas is already a stable system table that we cannot rename, so using the very similar mz_cluster_replica as an introspection source prefix would be confusing.

Some brainstorm ideas, none of them great:

  • clusterd. Already mentioned above.
  • cluster_side. To avoid the internal name.
  • replica_side. A bit more specific/correct.
  • replica. Maybe harder to confuse with mz_cluster_replicas than cluster_replica would be, but still quite confusing. It's not obvious that leaving out the "cluster" would have significance.
  • compute. We will use the introspection sources for storage dataflows too, so this would not be referring to COMPUTE the layer, but to our computation fabric.
  • computation. Less loaded alternative to the above.
  • Using a prefix different from mz_, strawman: log_.
  • Using a schema different from mz_internal, strawman: mz_log.

Contributor

I like the idea of having special query syntax too. I think the reason we didn't really consider it before was that people pointed out that changing the SELECT syntax is scary. I don't remember the details though. I think @sploiselle might?

Another thing to consider is that SUBSCRIBEs can also target replicas, so if we introduce dedicated syntax, that needs to apply to subscribes as well.

For me, the value of having a specific prefix for log sources and views over them is not so much that you can immediately see that they are replica-specific, but rather that they were collected on the replica side, rather than on the environmentd side. That's a subtle difference, since the latter applies even if we lift the replica-targeting restriction in the future.

My thought was that we might want to collect the same information on both sides and need a way to express the distinction to the user. For example, today we have mz_cluster_replica_frontiers and mz_compute_frontiers, and it's not really clear how they differ. That said, I'm not sure mz_compute_frontiers actually provides much additional value over mz_cluster_replica_frontiers, so maybe we should just remove it and resolve the ambiguity that way? If it is never the case that we want to collect the same information from both the replicas and environmentd, then a per-replica prefix for relation names is not needed either.

Aside: another confusing point is that the terms "logs" and "sources" are often used interchangeably in this context (whereas "sources" from Storage's perspective are a rather different thing)

I've thought that maybe we should have an mz_logs table that's different from mz_sources. But I think right now the distinction between sources and tables is pretty muddled anyway, so there is opportunity for coming up with a holistic design separately.

Member

> @benesch, I think it's fine to expose non-user-facing concepts in internal stuff. I mean, dataflow channels and operators aren't a user-facing concept either :)

Dataflows, channels, and operators are user facing, in my view. Only for the most advanced of users, but they are concepts that must be understood to make sense of the introspection relations.

By contrast, I would like for clusterd to never leak into the user interface at all. If storaged or computed had leaked into the interface, for example, that would have sunk cluster unification.

> One idea I had was to simply refuse to allow replica-directed queries without special syntax, e.g. something like SELECT IN REPLICA (default.r1) * FROM mz_internal.mz_dataflow_channel_operators. This would also allow us to get rid of the cluster_replica session variable.
>
> Another reason why I like this idea is that replica-directed queries are one of the very few ways of introducing non-determinism in Materialize (the only other one I can think of is non-materializable functions). I.e., if the state of persist and catalogs, the selected timestamps, and the query issued are all held constant, the results are fully determined. However, replica-directed queries are an exception to this rule so it makes sense to force them to be queried with different syntax.

I wrote up thoughts about this in the past: #15195 (comment)

IMO, introducing special syntax specifically for replica-targeted queries would be a mistake. There are plenty of other session variables that quietly impact how your query works (e.g., database, search_path, cluster). I don't see why cluster_replica is more special than those session variables. The argument that it introduces non-determinism doesn't resonate for me, given that (as you say) SELECT mz_version() is also nondeterministic, and we don't make you use special syntax to type that.

As mentioned in the comment I linked above, no objection to extending all SQL statements to support setting arbitrary session variables.
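For reference, replica-directed introspection queries today go through session variables rather than dedicated syntax; a minimal sketch (the cluster and replica names are only examples):

```sql
-- Target a specific replica for subsequent introspection queries.
SET cluster = default;
SET cluster_replica = r1;

-- This now reads the targeted replica's introspection data.
SELECT * FROM mz_internal.mz_dataflow_operators LIMIT 10;
```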

> Not sure how good of an argument that is, but typing queries against introspection sources is already annoying because you have to prefix all of them with mz_internal. every time.

As of recently, you can say set search_path = mz_catalog, mz_internal, public, and you'll no longer need to prefix all mz_internal tables with the schema name.

> Aside: another confusing point is that the terms "logs" and "sources" are often used interchangeably in this context (whereas "sources" from Storage's perspective are a rather different thing)
>
> I've thought that maybe we should have an mz_logs table that's different from mz_sources. But I think right now the distinction between sources and tables is pretty muddled anyway, so there is opportunity for coming up with a holistic design separately.

Totally agree that we should separate the concepts of "logs" and "sources".

I've never liked the term "log" though. It gets confused with what most people think of as a log: lines of text written to stdout and stderr. As unwieldy as "introspection" is, it's free of baggage.

Member

The best idea I have is a prefix or suffix like active_replica. For example: mz_active_replica_dataflow_operators. It's wordy, though...

To come clean: pre-cluster unification I was opposed to ever using the term "replica" alone (vs. "cluster replica"). I've softened my stance there, since we no longer think we'll need to distinguish between "storage replicas" and "cluster replicas".

Contributor

> By contrast, I would like for clusterd to never leak into the user interface at all. If storaged or computed had leaked into the interface, for example, that would have sunk cluster unification.

Assuming that "user interface" here only concerns the introspection sources, why would that have been an issue? Backwards compatibility shouldn't be a concern, since we declare all the introspection sources and views as unstable.

> As unwieldy as "introspection" is, it's free of baggage.

Although we now have the mz_introspection cluster, which uses the term for "running SHOW commands and the like". Which also means that adding a separate mz_introspection schema for introspection sources and views to live in would be confusing.

Maybe there is no need to assign a specific prefix to introspection sources. AFAICS, there are two use-cases for a special prefix:

  1. Telling users which relations are per-replica and which are not. I think we can solve this with documentation, like having a separate section on the mz_internal catalog docs page that lists all the introspection sources. And hopefully we will end up with a solution that removes per-replica relations at some point.
  2. Differentiating between sources that provide the same/similar data but collected at different places, for example mz_compute_frontiers (collected on the replica) vs. mz_cluster_replica_frontiers (collected by the controller). I think in many of these cases, we can just get rid of one of the two relations. If there is a controller-exported relation exposing the same data as a replica-exported one, the replica-exported one is probably not necessary. There will of course be exceptions to this rule, but maybe those can be solved on a case-by-case basis.
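To make the distinction in point 2 concrete, a rough sketch of the two query shapes (the column names here are assumptions and may not match the actual catalog):

```sql
-- Replica-exported: implicitly scoped to whichever replica the session targets.
SELECT export_id, time
FROM mz_internal.mz_compute_frontiers;

-- Controller-exported: covers all replicas, with an explicit replica_id column.
SELECT replica_id, export_id, time
FROM mz_internal.mz_cluster_replica_frontiers;
```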

Member

>> By contrast, I would like for clusterd to never leak into the user interface at all. If storaged or computed had leaked into the interface, for example, that would have sunk cluster unification.
>
> Assuming that "user interface" here only concerns the introspection sources, why would that have been an issue? Backwards compatibility shouldn't be a concern, since we declare all the introspection sources and views as unstable.

Ah, sorry, I meant if the terms "storaged" or "computed" had leaked into any of the stable, public relations in mz_catalog.


(Not quite sure about this -- it's tricky to balance ease-of-use with
not blowing up the set of documented relations)

Member

I believe @frankmcsherry has a laundry list of controller-maintained introspection views that would be nice to have. Is this design document the right place to begin gathering those, or are those best left to a future design document?

Contributor Author

Yes, I'd love to have the laundry list!

Contributor Author

One thing Frank mentioned on Slack:

> In particular, it is hard to get end-to-end lag (from sources all the way through to indexes) though "easy" to get lag for one dataflow (its inputs to outputs).

We should definitely have this, if we can think of a good way to actually compute it: you need to know the transitive dependencies of each dataflow, which is fundamentally recursive, and we don't (yet) have recursive queries outside of unsafe mode. I'll check whether it's easy to get transitive dependencies in the code.
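If recursive queries do become generally available, the recursive piece could look roughly like this sketch. It assumes a dependency relation along the lines of mz_object_dependencies(object_id, referenced_object_id) and the WITH MUTUALLY RECURSIVE syntax; both are assumptions here, not settled decisions:

```sql
-- Hypothetical sketch: compute the transitive closure of object dependencies,
-- the fundamentally recursive step needed for end-to-end lag.
WITH MUTUALLY RECURSIVE
    reach (object_id text, dep_id text) AS (
        SELECT object_id, referenced_object_id
        FROM mz_internal.mz_object_dependencies
        UNION
        SELECT r.object_id, d.referenced_object_id
        FROM reach AS r
        JOIN mz_internal.mz_object_dependencies AS d ON r.dep_id = d.object_id
    )
SELECT * FROM reach;
```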

@teskje (Contributor) left a comment

Thanks for writing this up!


@umanwizard (Contributor Author)
I'm now thinking about whether we should also expose per-operator frontiers. I'm not sure why we don't have them; maybe the data volume was thought to be too high?

@teskje (Contributor) commented Feb 28, 2023

> I'm now thinking about whether we should also expose per-operator frontiers. I'm not sure why we don't have them; maybe the data volume was thought to be too high?

We have mz_dataflow_operator_reachability, which I think is basically the same as operator frontiers? We don't have documentation for it, so I'm not entirely sure, but at least the amount of data should be similar to that of operator frontier tracking.

@vmarcos (Contributor) left a comment

Thanks for taking this on! Overall, the suggestions here are really great and will help us a lot.


#### More views enriched with operator names, dataflow names, etc.

(Not quite sure about this -- it's tricky to balance ease-of-use with not blowing up the set of documented relations)
Contributor

Perhaps a start would be to co-design these additional views with the external introspection initiative? That way, the scope could be restricted to views that would make implementing external introspection significantly easier. If we think these would be optimized for UI use and might exceed the scope of mz_internal, one could think of separating them out in a different schema, for example.

Contributor Author

Indeed! I plan to continue updating this PR if we discover any possible new views through that effort.


## Open questions

* Are there further views that could be useful but that we don't have
Contributor

When calculating frontier lag, we usually have to write some tedious joins between mz_compute_frontiers and mz_compute_import_frontiers. Perhaps a ready-made view there would be pretty useful?
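A rough sketch of what such a ready-made view could look like (the view name is hypothetical and the column names are assumptions):

```sql
-- Hypothetical helper view: pair each export's output frontier with the
-- frontiers of its imports; the gap between the two is the frontier lag.
CREATE VIEW mz_compute_frontier_lags AS
SELECT
    f.export_id,
    i.import_id,
    f.time AS export_frontier,
    i.time AS import_frontier
FROM mz_internal.mz_compute_frontiers AS f
JOIN mz_internal.mz_compute_import_frontiers AS i
    ON f.export_id = i.export_id;
```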

@vmarcos (Contributor) commented Feb 28, 2023

Ah, one other comment that crossed my mind: Our treatment of SUBSCRIBEs in introspection sources / views could probably be improved. For example, in mz_compute_frontiers et al., we show SUBSCRIBEs, though the documentation is inconsistent about it. Also, I am not sure whether we have a list of the SUBSCRIBEs that are running along with their SQL statements? We can look at dataflows and do some string matching, which is not ideal (but that is what our documentation suggests here).
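For context, the string-matching workaround looks roughly like this (the dataflow naming convention assumed here may differ from what the docs describe):

```sql
-- Find dataflows whose names suggest they back a SUBSCRIBE.
SELECT id, name
FROM mz_internal.mz_dataflows
WHERE name LIKE '%subscribe%';
```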

@teskje (Contributor) commented Feb 28, 2023

> Our treatment of SUBSCRIBEs in introspection sources / views could probably be improved.

Maybe we should just add an mz_subscribes table? Subscribes don't have names, but we could put in (time, user, SQL) instead, to make them identifiable.

umanwizard and others added 2 commits February 28, 2023 09:18
Co-authored-by: Marcos Antonio Vaz Salles <vmarcos@materialize.com>
@teskje (Contributor) commented Mar 7, 2023

An mz_subscriptions table listing all active subscribes exists now: #17906.

@teskje (Contributor) commented Mar 10, 2023

> We have mz_dataflow_operator_reachability, which I think is basically the same as operator frontiers? We don't have documentation for it, so I'm not entirely sure, but at least the amount of data should be similar to that of operator frontier tracking.

I've checked this: mz_dataflow_operator_reachability doesn't give us operator frontiers directly. It gives us the capabilities held by each operator output and input. Frontiers can be derived from that, if you know the structure of the dataflow graph and the path summaries between individual operators. But a) we don't know the path summaries and b) we would surely not want to perform the work of deriving frontiers ourselves, even if we had all required information.

I think timely's reachability tracker could be made to log frontier updates as well as capability updates. If we want to show operator frontiers to users, this is the only feasible way I can think of.

@antiguru (Member) left a comment

This looks great!

I'm not sure we actually want to log per-operator frontiers. It's potentially a lot of data and we should first determine what the cost would be in relation to the benefit it would bring us.

  * Duration intervals vs nanoseconds timestamps
  * Histogram sum fields
  * _raw variants
@umanwizard umanwizard enabled auto-merge (squash) April 19, 2023 14:24
@umanwizard umanwizard merged commit 4f6e48a into MaterializeInc:main Apr 19, 2023