"External introspection" draft design doc #17858
"External introspection" draft design doc #17858
Conversation
Looks good so far!
> frontier across all replicas.
>
> The view will incorporate all possible inputs and outputs of a
> dataflow: indexes, MVs, sources, and sinks.

---

Do you mean "subscribes" instead of "sinks"? It would be nice to also be able to reflect sink progress, but I don't think we can until we have logging for storage dataflows as well.
> The global frontier lag visualizer will allow users to see at a glance
> whether dataflows are falling behind. It will display the controller's
> view of each source and export frontier, and color nodes if their
> outputs lag significantly behind their inputs.

---

Is the idea that the graph should display all dataflows in the system? We might want to provide ways to filter it, since the number of dataflows can grow very large. Alternatively, we could make the graph specific to a selected dataflow and show only its transitive dependencies and dependants.
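
A lag check of the kind the excerpt describes could be sketched roughly as follows, assuming controller-level frontier and dependency relations; all table and column names here are assumptions, not part of the design doc:

```sql
-- Hypothetical sketch: flag exports whose write frontier lags behind even
-- the slowest of their inputs by more than a threshold. All names assumed.
SELECT
  deps.object_id,
  min(inputs.write_frontier) - outputs.write_frontier AS lag_ms
FROM mz_internal.mz_object_dependencies deps
JOIN mz_internal.mz_frontiers outputs
  ON outputs.object_id = deps.object_id
JOIN mz_internal.mz_frontiers inputs
  ON inputs.object_id = deps.referenced_object_id
GROUP BY deps.object_id, outputs.write_frontier
HAVING min(inputs.write_frontier) - outputs.write_frontier > 30000; -- ~30s
```

The visualizer could color a node whenever such a row exists for it.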
> * The optimized and physical query plans
> * The ID of the dataflow (if it is not a simple peek) used to service
>   the query
> * The SQL text of the query

---

This is cool, because it lets users connect introspection data about subscribes with queries they have made. Right now they only see "this dataflow exports `t1`", but there is nowhere to look up what `t1` is because we don't have an `mz_subscribes` table.

---

As pointed out in Slack, there are RBAC concerns here. However, there are probably already existing RBAC concerns for all of our other system tables. Worrying about row-level security for just this new table thus seems unlikely to realistically improve security without auditing all of the other system tables.
Recommendation: just move ahead with this without worrying about RBAC, and we will include in our RBAC design a proposal for how to handle all system tables that expose existence information there.
> * The optimized and physical query plans
> * The ID of the dataflow (if it is not a simple peek) used to service
>   the query
> * The SQL text of the query

---

We should probably exclude queries made by system users (`mz_system` and `mz_introspection`), to avoid noise.
> * `mz_arrangement_sizes`
> * `mz_dataflow_addresses`
> * For the global frontier lag visualizer:
>   * `mz_object_dependencies`

---

Should that be `mz_{worker_}compute_dependencies` instead? The difference between those two is that `mz_object_dependencies` connects catalog objects while `mz_worker_compute_dependencies` connects dataflows. Specifically:

* `mz_object_dependencies` contains views as dependencies of indexes, which don't have frontiers.
* `mz_worker_compute_dependencies` does not contain views, because they are inlined into the dataflows.
* `mz_object_dependencies` does not contain subscribes, because those are not stored in the catalog.

---

The problem is that `mz_worker_compute_dependencies` is per-replica, so it's quite annoying to stitch it together for all replicas (and impossible to do in one query). A more fundamental issue is that if the replica is unhealthy, you may not be able to query the per-replica sources at all.

I was thinking we would write a query to filter out the things we don't care about in `mz_object_dependencies` (e.g. views and types).

However, your point about `SUBSCRIBE` not being present is true. Maybe we need to start collecting subscribe dependencies in a table.
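
The filtering query mentioned above might look roughly like this (a sketch only; the `mz_internal.mz_object_dependencies` column names and the exact set of excluded object types are assumptions):

```sql
-- Sketch: keep only dependency edges whose endpoints have frontiers,
-- dropping views and types. Column names are assumed.
SELECT d.object_id, d.referenced_object_id
FROM mz_internal.mz_object_dependencies d
JOIN mz_objects o_in  ON o_in.id  = d.referenced_object_id
JOIN mz_objects o_out ON o_out.id = d.object_id
WHERE o_in.type  NOT IN ('view', 'type')
  AND o_out.type NOT IN ('view', 'type');
```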

---

> The problem is that `mz_worker_compute_dependencies` is per-replica, so it's quite annoying to stitch it together for all replicas (and impossible to do in one query). A more fundamental issue is that if the replica is unhealthy, you may not be able to query the per-replica sources at all.

Since all replicas are running the same dataflows, the contents of that relation should also be the same, unless we have bugs. In contrast to operator logging, dependency information is only removed when clusterd receives an `AllowCompaction([])` command, which only happens when the adapter has told the controller to drop the dataflow, so that information stays in the logging sources of all replicas as long as the corresponding catalog object exists. So we should be fine to query any replica for dependency information.

It is true that this stops working with unhealthy replicas. Persisted introspection would probably solve this, and we might be fine with enabling it for a low-traffic collection like this?

Otherwise, adding SUBSCRIBEs to `mz_object_dependencies` would work too, if the surfaces team doesn't think this table should only include catalog objects.

---

> So we should be fine to query any replica for dependency information.

Yes, but if we have several clusters, we will still have to make several queries (one for each cluster), although indeed the replica we choose is arbitrary. We will also have to first query `mz_replicas` to get the list of replicas. So it does make things a bit more complicated.

Another possibility: we don't need to change `mz_object_dependencies`; we could add our own Compute-owned `mz_subscribes` table, as you suggested on the other PR.
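
For illustration, reading a replica-scoped relation means targeting one replica per cluster via session variables, roughly like this (the cluster and replica names here are hypothetical):

```sql
-- Sketch of the stitching problem: one round-trip per cluster, against an
-- arbitrarily chosen replica. Names are hypothetical.
SET cluster = prod;
SET cluster_replica = r1;
SELECT * FROM mz_internal.mz_worker_compute_dependencies;
-- ...then repeat with the next cluster and union the results client-side,
-- since it cannot be done in one query.
```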

---

I think that having dataflow dependencies be a controller-level collection makes a lot of sense. The controller commits, to its clients, to instantiating these specific things with these specific dependencies. Even if it does crazy per-replica stuff, we could (imo should) present this information back consistently from the controller.
Rather than derive this from replicas (what if a cluster has no replicas?) I think we should just directly have the controller report it.

---

> Yes, but if we have several clusters, we will still have to make several queries

Right, that would be annoying.

> I think that having dataflow dependencies be a controller-level collection makes a lot of sense.

This is already what we do to get a global view of the frontier information, so doing the same thing for dependency information would only make sense!

I wonder if this would make the replica-exported introspection collections redundant, since they contain the same information, just harder to reach.
> * For the flamegraph views: same as the hierarchical dataflow
>   visualizer
> * For the per-query metadata: New table to be created
>   (`mz_query_metadata`).

---

Or just `mz_queries`?

---

Good suggestion, thanks
> ## Testing and observability
>
> (TBD -- will cover this in our meeting with Robin tomorrow and get an
> overview from him of the testing strategy for this kind of feature)

---

Just flagging that we'll want to make sure that we have good continuous testing for the features described here, as most of them rely either on our unstable introspection sources or our unstable EXPLAIN output.

---

Yes, I agree. I haven't filled out this section yet because I'm not sure what frameworks/tools are used for UI testing now, but we definitely want good end-to-end test coverage.
> # Conclusion and alternatives
> [conclusion-and-alternatives]: #conclusion-and-alternatives
>
> - I am unaware of any other possible designs

---

An alternative to the current `mz_query_metadata` design would be letting users explicitly specify which queries to record. That would allow them to control the amount of data they would need to store for the query log.

---

Cockroach does both! It has a "here's the explain plans of things that have happened in the last hour" and also a button in their UI that records the full execution plan results of the next time that exact statement is run.
> it acceptable?
> * How can we communicate query IDs back to the user for looking up the
>   query in the per-query metadata view? Is it acceptable to use
>   `NOTICE` messages in this case?

---

If `mz_query_metadata` contains the query SQL and can be filtered by user and ordered by time, maybe that's sufficient for users to find their queries already?
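
Something like the following lookup would then suffice (a sketch only; `mz_query_metadata` does not exist yet, so its column names are assumptions):

```sql
-- Sketch: a user finds their own recent queries. Table and column names
-- are assumed, since mz_query_metadata is still a proposal.
SELECT id, sql_text, finished_at
FROM mz_query_metadata
WHERE executed_by = current_user
ORDER BY finished_at DESC
LIMIT 20;
```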

---

It is probably sufficient if you filter by user.

However, now I'm wondering if there is a privacy concern. If we store query metadata for all users in the same table, then anyone can query it and see what other users are doing. To give a Unix analogy, it's a bit like if your shell `$HISTFILE` was world-readable. We should check with Graham about whether this is acceptable.
I like where this is headed, and will read with more thoughtful comments later today. But I wanted to drop in one "principle" I've been trying to follow: we should surface any introspectable data at the highest level of abstraction for which it exists. So, e.g., the controller should surface anything that it is certain of and can express back to its clients: what the dependencies among ids are, what their read and write frontiers are, what clusters exist, which replicas exist, which dataflows are installed on which clusters, and so on. All of these things should be in-bounds for introspection, and ideally through the controller rather than through a replica. Replicas are then where we would find replica-specific stuff, like what the timely instance reports about its structure, how long operators are taking to run, arrangement sizes, messages transmitted, etc.
See rendered doc for details.