Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add REFRESH options for MVs -- parsing, purification, planning #23870

Merged
merged 2 commits into from Jan 3, 2024

Conversation

ggevay
Copy link
Contributor

@ggevay ggevay commented Dec 13, 2023

This is the parsing, purification, planning of REFRESH options on materialized views. This is the full PR, which includes the Compute part.

Design document.

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@ggevay ggevay mentioned this pull request Dec 13, 2023
5 tasks
@ggevay ggevay added the A-ADAPTER Topics related to the ADAPTER layer label Dec 13, 2023
@ggevay ggevay marked this pull request as ready for review December 13, 2023 20:50
@ggevay ggevay requested a review from a team as a code owner December 13, 2023 20:51
@ggevay ggevay requested a review from a team December 13, 2023 20:51
@ggevay ggevay requested review from a team as code owners December 13, 2023 20:51
@ggevay ggevay requested review from ParkMyCar and jkosh44 and removed request for ParkMyCar December 13, 2023 20:51
Copy link

shepherdlybot bot commented Dec 13, 2023

Risk Score:80 / 100 Bug Hotspots:13 Resilience Coverage:60%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag 🔍 Detected
  • (Required) Integration Test 🔍 Detected
  • (Required) Observability
  • (Required) QA Review
  • Unit Test
Risk Summary:

The pull request carries a high risk score of 80, which is significantly influenced by the predictors: the sum of bug reports in the files changed and the delta of executable lines. Historically, pull requests with these characteristics are 128% more likely to introduce a bug compared to the repository baseline. Additionally, there are 13 files modified that have shown a high frequency of bug fixes recently, which further contributes to the risk. The repository's predicted bug trend indicates a decrease but with a recent increase in riskier pull requests, while the observed bug trend shows a general decrease with occasional spikes in bug fixes.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:
What's This?

File Percentile
../catalog/builtin_table_updates.rs 95
../src/parser.rs 98
../catalog/state.rs 94
../catalog/migrate.rs 93
../session/vars.rs 98
../src/catalog.rs 100
../src/names.rs 91
../src/catalog.rs 93
../catalog/open.rs 93
../plan/statement.rs 94
../coord/command_handler.rs 94
../src/pure.rs 99
../statement/ddl.rs 97

src/sql-parser/src/ast/defs/statement.rs Outdated Show resolved Hide resolved
src/sql-parser/src/ast/defs/statement.rs Outdated Show resolved Hide resolved
src/sql-parser/src/parser.rs Outdated Show resolved Hide resolved
src/adapter/src/coord/command_handler.rs Outdated Show resolved Hide resolved
src/adapter/src/coord/command_handler.rs Show resolved Hide resolved
src/sql/src/plan/statement/ddl.rs Outdated Show resolved Hide resolved
Copy link
Member

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code here looks really quite solid. I left you a bunch of comments around parser structure and error message style that should be straightforward to fix and don't fundamentally alter the nature of the PR.

There are two harder comments:

  1. I'd love to try to simplify the purification plumbing—specifically, reusing the existing purify_statement method to avoid introducing materialized view-specific public API to the sql crate. ISTM this might be straightforward by not caring about non-EpochMillis timelines. Holler if that's not the case.
  2. Seeing Joe's comment about DST with intervals and your comment about monthly intervals makes me nervous about supporting REFRESH EVERY. I think we should have a quick discussion on the Slack, wherever makes sense, about whether we should change gears and try REFRESH ON SCHEDULE instead.

src/repr/src/lib.rs Outdated Show resolved Hide resolved
src/sql-parser/src/ast/defs/statement.rs Outdated Show resolved Hide resolved
test/sqllogictest/materialized_views.slt Outdated Show resolved Hide resolved
src/sql/src/plan/statement/ddl.rs Outdated Show resolved Hide resolved
src/sql/src/plan/statement/ddl.rs Outdated Show resolved Hide resolved
src/sql/src/pure.rs Show resolved Hide resolved
src/sql/src/pure.rs Show resolved Hide resolved
src/sql/src/pure.rs Show resolved Hide resolved
src/sql/src/pure.rs Outdated Show resolved Hide resolved
src/sql/src/plan/statement.rs Show resolved Hide resolved
src/sql/src/plan/statement/ddl.rs Outdated Show resolved Hide resolved
src/sql/src/plan/statement/ddl.rs Outdated Show resolved Hide resolved
src/repr/src/refresh_schedule.rs Outdated Show resolved Hide resolved
@ggevay ggevay force-pushed the refresh-every-parsing-planning branch 7 times, most recently from 909ca77 to 0814846 Compare December 20, 2023 13:42
src/adapter/src/catalog/migrate.rs Outdated Show resolved Hide resolved
src/adapter/src/coord/command_handler.rs Show resolved Hide resolved
src/sql-parser/src/ast/defs/statement.rs Outdated Show resolved Hide resolved
src/sql/src/catalog.rs Outdated Show resolved Hide resolved
src/sql/src/plan/with_options.rs Show resolved Hide resolved
src/sql/src/plan/with_options.rs Show resolved Hide resolved
@ggevay ggevay force-pushed the refresh-every-parsing-planning branch 2 times, most recently from fbf3aa0 to 7ed1020 Compare December 22, 2023 14:29
src/sql-parser/src/ast/defs/value.rs Show resolved Hide resolved
src/sql-parser/src/ast/defs/statement.rs Show resolved Hide resolved
src/sql-parser/src/ast/defs/statement.rs Outdated Show resolved Hide resolved
src/sql/src/plan/with_options.rs Show resolved Hide resolved
src/sql/src/plan/statement/ddl.rs Outdated Show resolved Hide resolved
src/sql/src/pure.rs Show resolved Hide resolved
benesch added a commit to benesch/materialize that referenced this pull request Dec 25, 2023
WITH option planning previously ignored interval modifiers; at variance
from how intervals were planned when used in expressions. This commit
refactors the interval planning code into a helper method that is shared
by both expression planning and WITH option planning.

This commit also introduces a new `TryFromValue` implementation
targeting `Interval`, to tee up the planning of the forthcoming
`REFRESH` option (MaterializeInc#23870), which wants the interval as an `Interval`
rather than as a `Duration`.
benesch added a commit to benesch/materialize that referenced this pull request Dec 25, 2023
WITH option planning previously ignored interval modifiers; at variance
from how intervals were planned when used in expressions. This commit
refactors the interval planning code into a helper method that is shared
by both expression planning and WITH option planning.

This commit also introduces a new `TryFromValue` implementation
targeting `Interval`, to tee up the planning of the forthcoming
`REFRESH` option (MaterializeInc#23870), which wants the interval as an `Interval`
rather than as a `Duration`.
src/sql/src/plan/statement/ddl.rs Outdated Show resolved Hide resolved
src/adapter/src/coord/command_handler.rs Outdated Show resolved Hide resolved
src/adapter/src/coord/command_handler.rs Show resolved Hide resolved
@jkosh44
Copy link
Contributor

jkosh44 commented Dec 26, 2023

I've completely lost track of the details for this PR. Is there any specific questions/things I'm supposed to look at?

@ggevay ggevay force-pushed the refresh-every-parsing-planning branch 5 times, most recently from a7260a9 to ae67d41 Compare January 2, 2024 19:01
@ggevay
Copy link
Contributor Author

ggevay commented Jan 2, 2024

This is ready for a hopefully final round of review.

@benesch, could you please check #23870 (comment) ?

@jkosh44, I think I addressed all your comments.

Thanks a lot for the reviews!

(New Nightly run: https://buildkite.com/materialize/nightlies/builds/5828)


use mz_repr::Timestamp;
use serde::{Deserialize, Serialize};
use std::time::Duration;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually separate the std imports into their own block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry, fixed!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super minor, but I think this is still outstanding!

Copy link
Contributor Author

@ggevay ggevay Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I resolved it locally, just haven't pushed it yet, because I was waiting for a Nightly.

Edit: Pushed it now.

src/sql-parser/src/ast/defs/statement.rs Show resolved Hide resolved
src/sql/src/plan/statement/ddl.rs Show resolved Hide resolved
// Note: Seeing no REFRESH options at all (not even REFRESH ON COMMIT) should be acceptable:
// even though purification inserts REFRESH ON COMMIT if no other REFRESH option was given,
// we can't rely on this behavior in planning, because this won't happen for old
// materialized views that were created before this feature was introduced.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't block on this, but .. won't the migration ensure that even old matviews get REFRESH ON COMMIT before they ever hit planning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, thx, I forgot to delete this comment when I added the migration. I'll delete it before merging the PR.

Comment on lines 698 to 742
let timeline_context =
match self.validate_timeline_context(resolved_ids.0.clone()) {
Ok(tc) => tc,
Err(e) => return ctx.retire(Err(e)),
};
let timeline = match timeline_context {
TimelineContext::TimelineDependent(timeline) => timeline,
TimelineContext::TimestampDependent
| TimelineContext::TimestampIndependent => {
// We default to EpochMilliseconds, similarly to `determine_timestamp_for`.
// Note that we didn't accurately decide whether we are TimestampDependent
// or TimestampIndependent, because for this we'd need to also check whether
// `query.contains_temporal()`, similarly to how `peek_stage_validate` does.
// However, this doesn't matter here, as we are just going to default to
// EpochMilliseconds in both cases.
Timeline::EpochMilliseconds
}
};
Some(self.get_timestamp_oracle(&timeline).read_ts().await)
// TODO: It might be good to take into account `least_valid_read` in addition to
// the oracle's `read_ts`, but there are two problems:
// 1. At this point, we don't know which indexes would be used. We could do an
// overestimation here by grabbing the ids of all indexes that are on ids
// involved in the query. (We'd have to recursively follow view definitions,
// similarly to `validate_timeline_context`.)
// 2. For a peak, when the `least_valid_read` is later than the oracle's
// `read_ts`, then the peak doesn't return before it completes at the chosen
// timestamp. However, for a CRATE MATERIALIZED VIEW statement, it's not clear
// whether we want to make it block until the chosen time. If it doesn't block,
// then the initial refresh wouldn't be linearized with the CREATE MATERIALIZED
// VIEW statement.
//
// Note: The Adapter is usually keeping a read hold of all objects at the oracle
// read timestamp, so `least_valid_read` usually won't actually be later than
// the oracle's `read_ts`. (see `Coordinator::advance_timelines`)
//
// Note 2: If we choose a timestamp here that is earlier than
// `least_valid_read`, that is somewhat bad, but not catastrophic: The only
// bad thing that happens is that we won't perform that refresh that was
// specified to be at `mz_now()` (which is usually the initial refresh)
// (similarly to how we don't perform refreshes that were specified to be in the
// past).
} else {
None
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, works for me.

Copy link
Contributor

@philip-stoev philip-stoev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the newly-introduced syntax needs to be in a Platform Checks test, so that its preservation across restarts and upgrades can be validated.

@ggevay
Copy link
Contributor Author

ggevay commented Jan 3, 2024

@philip-stoev, I'm planning to add more tests later (e.g. with the follow-up pull request that handles the Compute part of the new syntax). I'll make sure to include Platform Checks! Note that the feature is behind a feature flag, which will definitely be disabled for now, because this current PR implements only the Adapter part of the feature.

I'd like to merge this current PR as soon as possible, because I've been getting an insane amount of merge conflicts.

Edit: Discussed with Philip on slack.

@ggevay ggevay force-pushed the refresh-every-parsing-planning branch from a478f5f to 2b6291d Compare January 3, 2024 11:23
@ggevay ggevay dismissed philip-stoev’s stale review January 3, 2024 11:25

Will add tests later as explained here: #23870 (comment)

@ggevay ggevay enabled auto-merge (rebase) January 3, 2024 11:26
@ggevay
Copy link
Contributor Author

ggevay commented Jan 3, 2024

Merging! Thank you very much for all the reviews and discussions!

@ggevay ggevay merged commit 8f13eb8 into MaterializeInc:main Jan 3, 2024
68 checks passed
@ggevay ggevay mentioned this pull request Mar 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ADAPTER Topics related to the ADAPTER layer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants