Further restrict `mz_introspection` cluster #18075

benesch · 2023-03-12T04:38:20Z

We've had quite a bit of confusion about the mz_introspection cluster lately:

- One customer ran a `SELECT count(*) FROM big_source` query on their mz_introspection cluster, which ran it out of CPU/memory. This is expected given the cluster's small size, but was not obvious to the customer.
- Our docs mention that mz_introspection has improved performance for introspection queries, but some internal folks worry this is easy to misinterpret.
- A few folks internally thought that mz_introspection was only usable for querying the system catalog.

We should chart a path forward here that reduces confusion. Here's a proposal: don't allow querying user objects on the mz_introspection cluster. Put another way: all queries that target mz_introspection may only reference objects whose name begins with mz_ or pg_.

This should make it very difficult to abuse the mz_introspection cluster. You can still construct a heinous join out of catalog tables that crashes the cluster, but that's adversarial. The proposed restriction will eliminate approximately all accidental misuse of the cluster, I think.
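To make the proposed rule concrete, here is a minimal sketch of the check in Python. It assumes, exactly as the proposal words it, that an object is a system object if and only if its name begins with `mz_` or `pg_`, and that the caller already knows every object a query depends on. `check_introspection_query`, `IntrospectionRestrictionError`, and the cluster name `my_cluster` are hypothetical names for illustration, not Materialize's actual coordinator code.

```python
from typing import Sequence

# Hypothetical names for illustration only; this is not Materialize's actual API.
INTROSPECTION_CLUSTER = "mz_introspection"
SYSTEM_PREFIXES = ("mz_", "pg_")


class IntrospectionRestrictionError(Exception):
    """Raised when a query on mz_introspection references a user object."""


def check_introspection_query(cluster: str, referenced_objects: Sequence[str]) -> None:
    """Reject queries targeting mz_introspection that reference user objects.

    Assumption: an object counts as a system object iff its name starts with
    "mz_" or "pg_", as the proposal states. `referenced_objects` holds the
    names of every object the query depends on.
    """
    if cluster != INTROSPECTION_CLUSTER:
        return  # Other clusters are unaffected by the proposal.
    user_objects = [name for name in referenced_objects if not name.startswith(SYSTEM_PREFIXES)]
    if user_objects:
        raise IntrospectionRestrictionError(
            f"queries on {cluster} may only reference mz_/pg_ objects; "
            f"found: {', '.join(user_objects)}"
        )


# A catalog query passes; the query that exhausted the cluster is rejected.
check_introspection_query("mz_introspection", ["mz_sources", "pg_class"])
check_introspection_query("my_cluster", ["big_source"])  # other clusters are fine
try:
    check_introspection_query("mz_introspection", ["big_source"])
except IntrospectionRestrictionError as err:
    print(err)
```

A name-prefix test is the simplest reading of the proposal; a real implementation would more likely ask the catalog whether each dependency is a system object rather than trusting names.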

h/t @sjwiesman for the proposal

Open questions:

- Does this break the dbt adapter? It uses mz_introspection for some metadata queries.
- Does this break the web UI? I think no, because the web UI only queries metadata at the moment. It seems reasonable to say that if the web UI wants to query user data, it has to ask the user for a cluster. (E.g., if the web UI gets a built-in SQL editor, or just a "preview data in this {source|table|view}" feature.)

If we decide to do this, the sooner the better. This is technically a backwards incompatible change, but one that's ok if we don't break any of the known existing tools.

Work items:

- Add the restriction to the coordinator.
- Update documentation.

cc @materializesupport @sjwiesman @jpepin @umanwizard @RobinClowers @necaris @doy-materialize


benesch · 2023-03-12T04:46:18Z

There were some alternatives proposed, like providing a larger mz_introspection cluster by default, or allowing users to size it up upon request. Wanted to explicitly consider those alternatives below!

- Providing a larger cluster to everyone is a no-go from ops. It's already too expensive to have this cluster on by default. We'd prefer to make this cluster even smaller by default, and perhaps make it autoscale to zero when idle.
- Allowing users to size up this cluster doesn't strike me as the right approach. We already allow users to create as many "introspection" clusters as they like via CREATE CLUSTER. This one that you get by default is really for our use, in the sense that the web UI can assume that it exists, and also our docs and support folks can assume that it exists. If they need to direct users to run a query that hits the system catalog, for example, they can instruct the user to run the query on mz_introspection, without worrying about disrupting the user's production cluster.

jpepin · 2023-03-12T07:29:51Z

> This one that you get by default is really for our use, in the sense that the web UI can assume that it exists, and also our docs and support folks can assume that it exists

If this is the central purpose for this cluster, then I agree that locking down its capabilities to avoid accidental misuse while preserving what the front end and support needs is ideal.

My primary concern with the use patterns we've seen against this cluster is that heavy usage from one user can negatively affect the entire organization's ability to interact with the console UI (manifesting in errors and slowness). It sounds like this proposal should mitigate that risk sufficiently.

One last thing to check is that this change wouldn't interfere with any of our observability tooling (I don't think it does, but worth verifying).

morsapaes · 2023-03-12T16:24:24Z

Whatever approach we settle on, I think #17609 would be the most effective way to prevent this situation while also improving the user experience (incl. removing the need for that annotation in SHOW command pages).

> Does this break the dbt adapter? It uses mz_introspection for some metadata queries.

The introspection queries dbt runs hit pg_ and mz_, so this wouldn't affect the adapter at all.

necaris · 2023-03-13T15:39:48Z

> This one that you get by default is really for our use, in the sense that the web UI can assume that it exists, and also our docs and support folks can assume that it exists

> If this is the central purpose for this cluster, then I agree that locking down its capabilities to avoid accidental misuse while preserving what the front end and support needs is ideal.

Question: is this the central use for this cluster? It's also the one used by default by mzadmin commands, and presumably the one Support needs for inspection, so my instinct is to be wary of locking it down too much. (The alternative I can see is to more widely promote use of mz_system internally, which also makes me nervous!)

benesch · 2023-03-13T16:27:03Z

> This one that you get by default is really for our use, in the sense that the web UI can assume that it exists, and also our docs and support folks can assume that it exists

> If this is the central purpose for this cluster, then I agree that locking down its capabilities to avoid accidental misuse while preserving what the front end and support needs is ideal.

> Question: is this the central use for this cluster? It's also the one used by default by mzadmin commands, and presumably the one Support needs for inspection, so my instinct is to be wary of locking it down too much. (The alternative I can see is to more widely promote use of mz_system internally, which also makes me nervous!)

Yes, this is the central purpose for this cluster: https://www.notion.so/materialize/System-clusters-26b20f5ecb93457a8164f1bd59650f91?pvs=4#229115f89e95421a943e4dc4cc23e659

necaris · 2023-03-13T16:29:20Z

> Yes, this is the central purpose for this cluster: https://www.notion.so/materialize/System-clusters-26b20f5ecb93457a8164f1bd59650f91?pvs=4#229115f89e95421a943e4dc4cc23e659

If this is the case then I'd be strongly 👍🏽 in favor of limiting mz_introspection as proposed to only target internal tables.

### Motivation

* This PR adds a known-desirable feature.
* Fixes #18075

This PR restricts which queries can run on the `mz_introspection` cluster: only queries that depend solely on system tables are allowed. As part of this change, we also update the dbt adapter to set the `auto_route_introspection_queries` session variable, so that all of its introspection queries run on the `mz_introspection` cluster regardless of which cluster is currently set.

### Checklist

- [ ] This PR has adequate test coverage / QA involvement has been duly considered.
- [ ] This PR has an associated up-to-date [design doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md), is a design doc ([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)), or is sufficiently small to not require a design.
- [ ] This PR evolves [an existing `$T ⇔ Proto$T` mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md) (possibly in a backwards-incompatible way) and therefore is tagged with a `T-proto` label.
- [ ] If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label ([example](https://github.com/MaterializeInc/cloud/pull/5021)).
- [ ] This PR includes the following [user-facing behavior changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note):

---------

Co-authored-by: morsapaes <marta.paes.moreira@gmail.com>
Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
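The routing half of the change can be sketched as follows. This is a minimal illustration under stated assumptions, not the merged implementation: only the `auto_route_introspection_queries` session variable and the `mz_introspection` cluster come from the thread, while `Session`, `is_introspection_query`, `target_cluster`, the cluster name `prod`, and the `mz_`/`pg_` prefix test are hypothetical. It also assumes that a query with no object dependencies (e.g. `SELECT 1`) stays on the session's cluster.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical sketch; only the session variable name comes from the PR description.
INTROSPECTION_CLUSTER = "mz_introspection"
SYSTEM_PREFIXES = ("mz_", "pg_")


@dataclass
class Session:
    cluster: str  # cluster currently set for the session
    auto_route_introspection_queries: bool = False


def is_introspection_query(referenced_objects: Sequence[str]) -> bool:
    # Assumption: a query qualifies if it references at least one object and
    # every referenced object is a system object (mz_/pg_ name prefix).
    return bool(referenced_objects) and all(
        name.startswith(SYSTEM_PREFIXES) for name in referenced_objects
    )


def target_cluster(session: Session, referenced_objects: Sequence[str]) -> str:
    """Pick the cluster a query runs on under the assumed routing rule."""
    if session.auto_route_introspection_queries and is_introspection_query(referenced_objects):
        return INTROSPECTION_CLUSTER
    return session.cluster


# A dbt-style session on a production cluster with auto-routing enabled.
dbt_session = Session(cluster="prod", auto_route_introspection_queries=True)
print(target_cluster(dbt_session, ["mz_tables"]))  # mz_introspection
print(target_cluster(dbt_session, ["orders"]))  # prod
```

Routing and the restriction complement each other in this model: the adapter's session can stay on the user's cluster for model queries, while its catalog queries land on `mz_introspection`, where they satisfy the new restriction.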

This was referenced Mar 13, 2023:

- Add way to identify if cluster load is coming from the console UI or end user queries #18094 (closed)
- [Epic] Improve user experience around clusters, schemas, and ad-hoc selects #18028 (closed)

do-not-use-parker-timmerman self-assigned this Mar 20, 2023.

This was referenced Mar 20, 2023:

- coord: restrict queries run on mz_introspection #18276 (closed)
- adapter: Restrict the queries that can run on mz_introspection #18312 (merged)

do-not-use-parker-timmerman closed this as completed in #18312 on Apr 18, 2023.

ParkMyCar assigned ParkMyCar and unassigned do-not-use-parker-timmerman on Apr 24, 2023.

jpepin mentioned this issue on May 25, 2023:

- Don't auto-route queries to unhealthy mz_introspection cluster #19482 (open)
