"External introspection" draft design doc #17858

Merged 2 commits on Apr 17, 2023

Conversation

Contributor
@umanwizard commented Feb 27, 2023

See rendered doc for details.

@umanwizard requested review from RobinClowers, teskje and a team on February 27, 2023 22:40
Contributor
@teskje left a comment

Looks good so far!

frontier across all replicas.

The view will incorporate all possible inputs and outputs of a
dataflow: indexes, MVs, sources, and sinks.
Contributor

Do you mean "subscribes" instead of "sinks"? It would be nice to also be able to reflect sink progress, but I don't think we can until we have logging for storage dataflows as well.

The global frontier lag visualizer will allow users to see at a glance
whether dataflows are falling behind. It will display the controller's
view of each source and export frontier, and color nodes if their
outputs lag significantly behind their inputs.
Contributor
Is the idea that the graph should display all dataflows in the system? We might want to provide ways to filter it, since the number of dataflows can grow very large. Alternatively, we could make the graph specific to a selected dataflow and show only its transitive dependencies and dependants.
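To make the lag computation concrete, here is a rough sketch of the kind of query the visualizer could run. It is illustrative only: it assumes a controller-level dependency relation shaped like `mz_object_dependencies` (with `object_id` and `referenced_object_id` columns) and a hypothetical controller-level frontier relation, called `mz_global_frontiers` here, exposing `(object_id, write_frontier)` with the frontier as a millisecond timestamp. Neither the relation names nor the columns are fixed by the design doc.

```sql
-- Sketch only: mz_global_frontiers is a hypothetical controller-level
-- relation with (object_id, write_frontier); all names are assumptions.
SELECT
    deps.object_id                            AS export_id,
    deps.referenced_object_id                 AS input_id,
    inp.write_frontier                        AS input_frontier,
    outp.write_frontier                       AS output_frontier,
    inp.write_frontier - outp.write_frontier  AS lag_ms
FROM mz_internal.mz_object_dependencies AS deps
JOIN mz_internal.mz_global_frontiers AS inp
  ON inp.object_id = deps.referenced_object_id
JOIN mz_internal.mz_global_frontiers AS outp
  ON outp.object_id = deps.object_id
-- The visualizer would color a node whose lag_ms exceeds some threshold.
ORDER BY lag_ms DESC;
```

Filtering, as suggested in the comment above, could then be a `WHERE` clause restricting the result to a selected dataflow and its transitive dependencies.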

* The optimized and physical query plans
* The ID of the dataflow (if it is not a simple peek) used to service
the query
* The SQL text of the query
Contributor
This is cool, because it lets users connect introspection data about subscribes with queries they have made. Right now they only see "this dataflow exports t1", but there is nowhere to look up what t1 is, because we don't have an mz_subscribes table.
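For example, a hypothetical `mz_subscribes` relation (which, as noted, does not exist today) would let a user resolve an exported ID like `t1` back to the session and SQL that created it. The columns below are purely illustrative assumptions:

```sql
-- Purely illustrative: mz_subscribes does not exist yet, and the columns
-- (id, session_id, sql) shown here are assumptions, not a proposal.
SELECT s.id, s.session_id, s.sql
FROM mz_internal.mz_subscribes AS s
WHERE s.id = 't1';
```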

Contributor
As pointed out in Slack, there are RBAC concerns here. However, there are probably already existing RBAC concerns for all of our other system tables. Worrying about row-level security for just this new table thus seems unlikely to realistically improve security without auditing all of the other system tables.

Recommendation: just move ahead with this without worrying about RBAC, and we will include in our RBAC design a proposal for how to handle all system tables that expose existence information there.

* The optimized and physical query plans
* The ID of the dataflow (if it is not a simple peek) used to service
the query
* The SQL text of the query
Contributor
We should probably exclude queries made by system users (mz_system and mz_introspection), to avoid noise.

* `mz_arrangement_sizes`
* `mz_dataflow_addresses`
* For the global frontier lag visualizer:
* `mz_object_dependencies`
Contributor
Should that be mz_{worker_}compute_dependencies instead? The difference between those two is that mz_object_dependencies connects catalog objects while mz_worker_compute_dependencies connects dataflows. Specifically:

* mz_object_dependencies contains views as dependencies of indexes, which don't have frontiers. mz_worker_compute_dependencies does not contain views, because they are inlined into the dataflows.
* mz_object_dependencies does not contain subscribes, because those are not stored in the catalog.

Contributor Author
The problem is that mz_worker_compute_dependencies is per-replica, so it's quite annoying to stitch it together for all replicas (and impossible to do in one query). A more fundamental issue is that if the replica is unhealthy, you may not be able to query the per-replica sources at all.

I was thinking we would write a query to filter out the things we don't care about in mz_object_dependencies (e.g. views and types).

However, your point about SUBSCRIBE not being present is true. Maybe we need to start collecting subscribe dependencies in a table.
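For illustration, here is a rough sketch of the filtering query described above, assuming `mz_object_dependencies` exposes `object_id`/`referenced_object_id` and that object types can be looked up in `mz_objects`; the exact type strings are assumptions:

```sql
-- Sketch only: keep dependency edges whose endpoints have frontiers,
-- dropping views, types, and other catalog-only objects.
SELECT d.object_id, d.referenced_object_id
FROM mz_internal.mz_object_dependencies AS d
JOIN mz_objects AS o_from ON o_from.id = d.object_id
JOIN mz_objects AS o_to   ON o_to.id   = d.referenced_object_id
WHERE o_from.type IN ('index', 'materialized-view', 'source', 'sink')
  AND o_to.type   IN ('index', 'materialized-view', 'source', 'sink');
```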

Contributor
> The problem is that mz_worker_compute_dependencies is per-replica, so it's quite annoying to stitch it together for all replicas (and impossible to do in one query). A more fundamental issue is that if the replica is unhealthy, you may not be able to query the per-replica sources at all.

Since all replicas are running the same dataflows, the contents of that relation should also be the same, unless we have bugs. In contrast to operator logging, dependency information is only removed when clusterd receives an AllowCompaction([]) command, which only happens after the adapter has told the controller to drop the dataflow. That information therefore stays in the logging sources of all replicas for as long as the corresponding catalog object exists, so we should be fine to query any replica for dependency information.

It is true that this stops working with unhealthy replicas. Persisted introspection would probably solve this, and we might be fine with enabling it for a low-traffic collection like this?

Otherwise, adding SUBSCRIBEs to mz_object_dependencies would work too, if the surfaces team doesn't think this table should only include catalog objects.

Contributor Author

> So we should be fine to query any replica for dependency information.

Yes, but if we have several clusters, we will still have to make several queries (one for each cluster), even though the replica we choose within each cluster is indeed arbitrary. We will also have to query mz_replicas first to get the list of replicas. So it does make things a bit more complicated.

Another possibility: we don't need to change mz_object_dependencies; we could add our own Compute-owned mz_subscribes table, as you suggested on the other PR.
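To illustrate why the per-replica route is more involved, the UI would roughly have to do something like the following for each cluster (treat the exact table, schema, and session-variable names here as assumptions of this sketch):

```sql
-- Sketch: first discover one arbitrary replica per cluster.
SELECT c.name AS cluster, r.name AS replica
FROM mz_clusters AS c
JOIN mz_cluster_replicas AS r ON r.cluster_id = c.id;

-- Then, per cluster, target one of its replicas and read the
-- replica-exported introspection source.
SET cluster = 'default';
SET cluster_replica = 'r1';
SELECT * FROM mz_internal.mz_worker_compute_dependencies;
```

A controller-level dependency collection would collapse this into a single query.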

Contributor
I think that having dataflow dependencies be a controller-level collection makes a lot of sense. The controller commits, to its clients, to instantiating these specific things with these specific dependencies. Even if it does crazy per-replica stuff, we could (imo should) present this information back consistently from the controller.

Rather than derive this from replicas (what if a cluster has no replicas?) I think we should just directly have the controller report it.

Contributor
> Yes, but if we have several clusters, we will still have to make several queries

Right, that would be annoying.

> I think that having dataflow dependencies be a controller-level collection makes a lot of sense.

This is already what we do to get a global view of the frontier information, so doing the same thing for dependency information would only make sense!

I wonder if this would make the replica-exported introspection collections redundant, since they contain the same information, just harder to reach.

* For the flamegraph views: same as the hierarchical dataflow
visualizer
* For the per-query metadata: New table to be created
(`mz_query_metadata`).
Contributor
Or just mz_queries?

Contributor Author
Good suggestion, thanks
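To pull the per-query metadata discussion together, here is a hypothetical shape for such an `mz_queries` relation, combining the fields listed in the doc (query ID, SQL text, plans, servicing dataflow) with the user and timestamp columns needed for filtering; none of these column names are part of the design:

```sql
-- Hypothetical schema, for illustration only; column names and types
-- are assumptions, not part of the design doc.
CREATE TABLE mz_internal.mz_queries (
    id             text,         -- query ID communicated back to the user
    sql            text,         -- SQL text of the query
    optimized_plan text,         -- optimized query plan
    physical_plan  text,         -- physical query plan
    dataflow_id    text,         -- dataflow servicing the query, if not a simple peek
    user_name      text,         -- who ran it (system users like mz_system excluded)
    finished_at    timestamptz   -- when it ran, so results can be ordered by time
);

-- Example lookup a user might run, assuming those columns exist:
SELECT id, sql, dataflow_id, finished_at
FROM mz_internal.mz_queries
WHERE user_name = current_user
ORDER BY finished_at DESC
LIMIT 10;
```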

## Testing and observability

(TBD -- will cover this in our meeting with Robin tomorrow and get an
overview from him of the testing strategy for this kind of feature)
Contributor
Just flagging that we'll want to make sure that we have good continuous testing for the features described here, as most of them rely either on our unstable introspection sources or our unstable EXPLAIN output.

Contributor Author
Yes, I agree. I haven't filled out this section yet because I'm not sure what frameworks/tools are used for UI testing now, but we definitely want good end-to-end test coverage.

# Conclusion and alternatives
[conclusion-and-alternatives]: #conclusion-and-alternatives

- I am unaware of any other possible designs
Contributor
An alternative to the current mz_query_metadata design would be letting users specify somehow which queries to record. That would allow them to control the amount of data that needs to be stored for the query log.

Contributor
Cockroach does both! It has a "here are the explain plans of things that have happened in the last hour" view, and also a button in their UI that records the full execution plan results the next time that exact statement is run.

it acceptable?
* How can we communicate query IDs back to the user for looking up the
query in the per-query metadata view? Is it acceptable to use
`NOTICE` messages in this case?
Contributor
@teskje Feb 28, 2023

If mz_query_metadata contains the query SQL and can be filtered by user and ordered by time, maybe that's sufficient for users to find their queries already?

Contributor Author
It is probably sufficient if you filter by user.

However, now I'm wondering if there is a privacy concern. If we store query metadata for all users in the same table, then anyone can query it and see what other users are doing. To give a Unix analogy, it's a bit like if your shell $HISTFILE were world-readable. We should check with Graham about whether this is acceptable.

Contributor
@frankmcsherry
I like where this is headed, and will read with more thoughtful comments later today. But I wanted to drop in one "principle", I guess, that I've been trying to follow: we should surface any introspectable data at the highest level of abstraction at which it exists. So, for example, the controller should surface anything that it is certain of and can express back to its clients: what the dependencies among IDs are, what their read and write frontiers are, what clusters exist, which replicas exist, which dataflows are installed on which clusters, and so on. All these things should be in-bounds for introspection, and ideally through the controller rather than through a replica. Replicas are then where we would find replica-specific stuff, like what the timely instance reports about its structure, how long operators take to run, arrangement sizes, messages transmitted, etc.

@antiguru mentioned this pull request Mar 19, 2023
@umanwizard merged commit 6fe2c54 into MaterializeInc:main on Apr 17, 2023