Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further restrict mz_introspection cluster #18075

Closed
2 tasks
benesch opened this issue Mar 12, 2023 · 6 comments · Fixed by #18312
Closed
2 tasks

Further restrict mz_introspection cluster #18075

benesch opened this issue Mar 12, 2023 · 6 comments · Fixed by #18312
Assignees

Comments

@benesch
Copy link
Member

benesch commented Mar 12, 2023

We've had quite a bit of confusion about the mz_introspection cluster lately:

  • One customer ran a SELECT count(*) FROM big_source query on their mz_introspection cluster, which ran it out of CPU/memory. This is expected, given its size, but was not obvious to the customer.
  • Our docs mention that mz_introspection has improved performance for introspection queries, but some internal folks worry this is easy to misinterpret.
  • A few folks internally thought that mz_introspection was only usable for querying the system catalog.

We should chart a path forward here that reduces confusion. Here's a proposal: don't allow querying user objects on the mz_introspection cluster. Put another way: all queries that target mz_introspection may only reference objects whose name begins with mz_ or pg_.

This should make it very difficult to abuse the mz_introspection cluster. You can still construct a heinous join out of catalog tables that crashes the cluster, but .. that's adversarial. The proposed restriction will eliminate approximately all accidental misuse of the cluster, I think.

h/t @sjwiesman for the proposal

Open questions:

  • Does this break the dbt adapter? It uses mz_introspection for some metadata queries.
  • Does this break the web UI? I think no, because the web UI only queries metadata at the moment. It seems reasonable to say that if the web UI wants to query user data, it has to ask the user for a cluster. (E.g., if the web UI gets a built-in SQL editor, or just a "preview data in this {source|table|view}" feature.)

If we decide to do this, the sooner the better. This is technically a backwards incompatible change, but one that's ok if we don't break any of the known existing tools.

Work items:

  • Add the restriction to the coordinator.
  • Update documentation.

cc @materializesupport @sjwiesman @jpepin @umanwizard @RobinClowers @necaris @doy-materialize

@benesch
Copy link
Member Author

benesch commented Mar 12, 2023

There were some alternatives proposed, like providing a larger mz_introspection cluster by default, or allowing users to size it up upon request. Wanted to explicitly consider those alternatives below!

  • Providing a larger cluster to everyone is a no-go from ops. It's already too expensive to have this cluster on default. We'd prefer to make this cluster even smaller by default—and perhaps make it autoscale to zero when idle.

  • Allowing users to size up this cluster doesn't strike me as the right approach. We already allow users to create as many "introspection" clusters as they like via CREATE CLUSTER. This one that you get by default is really for our use, in the sense that the web UI can assume that it exists, and also our docs and support folks can assume that it exists—if they need to direct users to run a query that hits the system catalog, for example, they can instruct the user to run the query on mz_introspection, without worrying about disrupting the user's production cluster.

@jpepin
Copy link
Contributor

jpepin commented Mar 12, 2023

This one that you get by default is really for our use, in the sense that the web UI can assume that it exists, and also our docs and support folks can assume that it exists

If this is the central purpose for this cluster, then I agree that locking down its capabilities to avoid accidental misuse while preserving what the front end and support needs is ideal.

My primary concern with the use patterns we've seen against this cluster is that heavy usage from one user can negatively affect the entire organization's ability to interact with the console UI (manifesting in errors and slowness). It sounds like this proposal should mitigate that risk sufficiently.

One last thing to check is that this change wouldn't interfere with any of our observability tooling (I don't think it does, but worth verifying).

@morsapaes
Copy link
Contributor

morsapaes commented Mar 12, 2023

Whatever approach we settle on, I think #17609 would be the most effective way to prevent this situation while also improving the user experience (incl. removing the need for that annotation in SHOW command pages).

Does this break the dbt adapter? It uses mz_introspection for some metadata queries.

The introspection queries dbt runs hit pg_ and mz_, so this wouldn't affect the adapter at all.

@necaris
Copy link
Contributor

necaris commented Mar 13, 2023

This one that you get by default is really for our use, in the sense that the web UI can assume that it exists, and also our docs and support folks can assume that it exists

If this is the central purpose for this cluster, then I agree that locking down its capabilities to avoid accidental misuse while preserving what the front end and support needs is ideal.

Question: is this the central use for this cluster? It's also the one used by default by mzadmin commands and presumably Support needs for inspection, so my instinct is to be wary of locking it down too much. (The alternative I can see is to more widely promote use of mz_system internally, which also makes me nervous!)

@benesch
Copy link
Member Author

benesch commented Mar 13, 2023

This one that you get by default is really for our use, in the sense that the web UI can assume that it exists, and also our docs and support folks can assume that it exists

If this is the central purpose for this cluster, then I agree that locking down its capabilities to avoid accidental misuse while preserving what the front end and support needs is ideal.

Question: is this the central use for this cluster? It's also the one used by default by mzadmin commands and presumably Support needs for inspection, so my instinct is to be wary of locking it down too much. (The alternative I can see is to more widely promote use of mz_system internally, which also makes me nervous!)

Yes, this is the central purpose for this cluster: https://www.notion.so/materialize/System-clusters-26b20f5ecb93457a8164f1bd59650f91?pvs=4#229115f89e95421a943e4dc4cc23e659

@necaris
Copy link
Contributor

necaris commented Mar 13, 2023

Yes, this is the central purpose for this cluster: https://www.notion.so/materialize/System-clusters-26b20f5ecb93457a8164f1bd59650f91?pvs=4#229115f89e95421a943e4dc4cc23e659

If this is the case then I'd be strongly 👍🏽 in favor of limiting mz_introspection as proposed to only target internal tables.

do-not-use-parker-timmerman added a commit that referenced this issue Apr 18, 2023
### Motivation
 * This PR adds a known-desirable feature.
   * Fixes #18075 

This PR restricts what queries can get run on the `mz_introspection`
cluster. Specifically only queries that depend on system tables can get
run. As part of this change, we also update the dbt-adapter to set the
`auto_route_introspection_queries` session var to make sure all of its
introspection queries automatically get run on the mz_introspection
cluster, regardless of what cluster is currently set.

<!--
Leave some tips for your reviewer, like:

    * The diff is much smaller if viewed with whitespace hidden.
    * [Some function/module/file] deserves extra attention.
* [Some function/module/file] is pure code movement and only needs a
skim.

Delete this section if no tips.
-->

### Checklist

- [ ] This PR has adequate test coverage / QA involvement has been duly
considered.
- [ ] This PR has an associated up-to-date [design
doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md),
is a design doc
([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)),
or is sufficiently small to not require a design.
  <!-- Reference the design in the description. -->
- [ ] This PR evolves [an existing `$T ⇔ Proto$T`
mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md)
(possibly in a backwards-incompatible way) and therefore is tagged with
a `T-proto` label.
- [ ] If this PR will require changes to cloud orchestration, there is a
companion cloud PR to account for those changes that is tagged with the
release-blocker label
([example](https://github.com/MaterializeInc/cloud/pull/5021)).
<!-- Ask in #team-cloud on Slack if you need help preparing the cloud
PR. -->
- [ ] This PR includes the following [user-facing behavior
changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note):
- <!-- Add release notes here or explicitly state that there are no
user-facing behavior changes. -->

---------

Co-authored-by: morsapaes <marta.paes.moreira@gmail.com>
Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment