Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize handling of impure functions #10445

Closed
benesch opened this issue Feb 4, 2022 · 5 comments · Fixed by #10449
Closed

Standardize handling of impure functions #10445

benesch opened this issue Feb 4, 2022 · 5 comments · Fixed by #10449
Labels
A-optimization Area: query optimization and transformation A-sql Area: SQL planning C-feature Category: new feature or request

Comments

@benesch
Copy link
Member

benesch commented Feb 4, 2022

Feature request

Today in Materialize we allow some impure functions: i.e., functions whose results are not wholly determined by their inputs.

Most impure functions are easy to spot because they take no arguments. Common examples include:

  • now()
  • current_user()
  • current_database()

We're fairly inconsistent in what we allow you to do with these functions. You can always use them outside of a view:

materialize=> select now(), current_user(), current_database();
            now             | current_user | current_database 
----------------------------+--------------+------------------
 2022-02-04 12:04:47.055+00 | materialize  | materialize
(1 row)

You can use current_user() and current_database() in a view:

materialize=> create view weird as select current_user(), current_database();
CREATE VIEW
materialize=> select * from weird;
 current_user | current_database 
--------------+------------------
 materialize  | materialize

But using now() in a view is banned:

materialize=> create view now as select now();
ERROR:  Cannot call function now(): now cannot be used in static queries; see: https://materialize.com/docs/sql/functions/now_and_mz_logical_timestamp/

This doesn't make a whole lot of sense. Look what happens if you query weird after a restart:

materialize=> select * from weird;
 current_user | current_database 
--------------+------------------
 mz_system    | materialize

It's changed!

I propose that instead we allow all impure functions in views, and instead reject creating indexes on any view that references an impure function, whether directly or via another view. This would fix:

  • downstream tools like materialize-dbt-utils that want to create non-materialized views that call now()
  • our own pg_catalog views, which want to filter based on current_database() and perhaps current_user() (sql: pg_catalog views do not limit to the current database #10391)
  • the current_user() changing to mz_system after a restart
@benesch benesch added C-feature Category: new feature or request A-sql Area: SQL planning A-optimization Area: query optimization and transformation labels Feb 4, 2022
@benesch
Copy link
Member Author

benesch commented Feb 4, 2022

@MaterializeInc/dataflow-optimization any qualms with the proposal here?

@aalexandrov
Copy link
Contributor

Not in theory.

I will need to dig deeper and scope out what work needs to be done so we can have all checks in place.

To recap, what you are proposing is to allow impure functions in CREATE VIEW statements for the purpose of being able to reference those views in one-off SELECT queries and other CREATE VIEW statements. What we need to do as part of the compilation process is to reject use of such views in CREATE MATERIALIZED VIEW statements. Is that correct?

@benesch
Copy link
Member Author

benesch commented Feb 4, 2022

Precisely! It'll look quite similar to mz_logical_timestamp, I think. The SQL planner will emit e.g. NullaryFunc::CurrentUser. Then it's on the coordinator to swap that for the actual current user, if executing a one-off query, or to reject the query entirely, if creatin ga materialization.

I'm not entirely certain, but it's possible that this all gets transformed away before the optimizer ever sees it.

@aalexandrov aalexandrov added this to Needs Triage in Compute via automation Feb 4, 2022
@aalexandrov aalexandrov added the needs-discussion Requires discussion before work proceeds label Feb 4, 2022
@aalexandrov
Copy link
Contributor

Marking this with needs-discussion and adding this to the "Optimization and Dataflow" project so we can triage this in the next refinement session. We will probably create a sub-issue tracked here of our part of the work.

@mjibson
Copy link
Contributor

mjibson commented Feb 4, 2022

This SGTM.

@aalexandrov aalexandrov removed the needs-discussion Requires discussion before work proceeds label Feb 14, 2022
@aalexandrov aalexandrov removed this from Needs Triage in Compute Feb 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-optimization Area: query optimization and transformation A-sql Area: SQL planning C-feature Category: new feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants