New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add REFRESH options for MVs -- parsing, purification, planning #23870
Add REFRESH options for MVs -- parsing, purification, planning #23870
Conversation
MitigationsCompleting required mitigations increases Resilience Coverage.
Risk Summary:The pull request carries a high risk score of 80, which is significantly influenced by the predictors: the sum of bug reports in the files changed and the delta of executable lines. Historically, pull requests with these characteristics are 128% more likely to introduce a bug compared to the repository baseline. Additionally, there are 13 files modified that have shown a high frequency of bug fixes recently, which further contributes to the risk. The repository's predicted bug trend indicates a decrease but with a recent increase in riskier pull requests, while the observed bug trend shows a general decrease with occasional spikes in bug fixes. Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity. Bug Hotspots:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code here looks really quite solid. I left you a bunch of comments around parser structure and error message style that should be straightforward to fix and don't fundamentally alter the nature of the PR.
There are two harder comments:
- I'd love to try to simplify the purification plumbing—specifically, reusing the existing
purify_statement
method to avoid introducing materialized view-specific public API to thesql
crate. ISTM this might be straightforward by not caring about non-EpochMillis timelines. Holler if that's not the case. - Seeing Joe's comment about DST with intervals and your comment about monthly intervals makes me nervous about supporting
REFRESH EVERY
. I think we should have a quick discussion on the Slack, wherever makes sense, about whether we should change gears and tryREFRESH ON SCHEDULE
instead.
909ca77
to
0814846
Compare
fbf3aa0
to
7ed1020
Compare
WITH option planning previously ignored interval modifiers; at variance from how intervals were planned when used in expressions. This commit refactors the interval planning code into a helper method that is shared by both expression planning and WITH option planning. This commit also introduces a new `TryFromValue` implementation targeting `Interval`, to tee up the planning of the forthcoming `REFRESH` option (MaterializeInc#23870), which wants the interval as an `Interval` rather than as a `Duration`.
WITH option planning previously ignored interval modifiers; at variance from how intervals were planned when used in expressions. This commit refactors the interval planning code into a helper method that is shared by both expression planning and WITH option planning. This commit also introduces a new `TryFromValue` implementation targeting `Interval`, to tee up the planning of the forthcoming `REFRESH` option (MaterializeInc#23870), which wants the interval as an `Interval` rather than as a `Duration`.
I've completely lost track of the details for this PR. Is there any specific questions/things I'm supposed to look at? |
a7260a9
to
ae67d41
Compare
This is ready for a hopefully final round of review. @benesch, could you please check #23870 (comment) ? @jkosh44, I think I addressed all your comments. Thanks a lot for the reviews! (New Nightly run: https://buildkite.com/materialize/nightlies/builds/5828) |
src/expr/src/refresh_schedule.rs
Outdated
|
||
use mz_repr::Timestamp; | ||
use serde::{Deserialize, Serialize}; | ||
use std::time::Duration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually separate the std
imports into their own block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry, fixed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super minor, but I think this is still outstanding!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I resolved it locally, just haven't pushed it yet, because I was waiting for a Nightly.
Edit: Pushed it now.
src/sql/src/plan/statement/ddl.rs
Outdated
// Note: Seeing no REFRESH options at all (not even REFRESH ON COMMIT) should be acceptable: | ||
// even though purification inserts REFRESH ON COMMIT if no other REFRESH option was given, | ||
// we can't rely on this behavior in planning, because this won't happen for old | ||
// materialized views that were created before this feature was introduced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't block on this, but .. won't the migration ensure that even old matviews get REFRESH ON COMMIT
before they ever hit planning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, thx, I forgot to delete this comment when I added the migration. I'll delete it before merging the PR.
let timeline_context = | ||
match self.validate_timeline_context(resolved_ids.0.clone()) { | ||
Ok(tc) => tc, | ||
Err(e) => return ctx.retire(Err(e)), | ||
}; | ||
let timeline = match timeline_context { | ||
TimelineContext::TimelineDependent(timeline) => timeline, | ||
TimelineContext::TimestampDependent | ||
| TimelineContext::TimestampIndependent => { | ||
// We default to EpochMilliseconds, similarly to `determine_timestamp_for`. | ||
// Note that we didn't accurately decide whether we are TimestampDependent | ||
// or TimestampIndependent, because for this we'd need to also check whether | ||
// `query.contains_temporal()`, similarly to how `peek_stage_validate` does. | ||
// However, this doesn't matter here, as we are just going to default to | ||
// EpochMilliseconds in both cases. | ||
Timeline::EpochMilliseconds | ||
} | ||
}; | ||
Some(self.get_timestamp_oracle(&timeline).read_ts().await) | ||
// TODO: It might be good to take into account `least_valid_read` in addition to | ||
// the oracle's `read_ts`, but there are two problems: | ||
// 1. At this point, we don't know which indexes would be used. We could do an | ||
// overestimation here by grabbing the ids of all indexes that are on ids | ||
// involved in the query. (We'd have to recursively follow view definitions, | ||
// similarly to `validate_timeline_context`.) | ||
// 2. For a peak, when the `least_valid_read` is later than the oracle's | ||
// `read_ts`, then the peak doesn't return before it completes at the chosen | ||
// timestamp. However, for a CRATE MATERIALIZED VIEW statement, it's not clear | ||
// whether we want to make it block until the chosen time. If it doesn't block, | ||
// then the initial refresh wouldn't be linearized with the CREATE MATERIALIZED | ||
// VIEW statement. | ||
// | ||
// Note: The Adapter is usually keeping a read hold of all objects at the oracle | ||
// read timestamp, so `least_valid_read` usually won't actually be later than | ||
// the oracle's `read_ts`. (see `Coordinator::advance_timelines`) | ||
// | ||
// Note 2: If we choose a timestamp here that is earlier than | ||
// `least_valid_read`, that is somewhat bad, but not catastrophic: The only | ||
// bad thing that happens is that we won't perform that refresh that was | ||
// specified to be at `mz_now()` (which is usually the initial refresh) | ||
// (similarly to how we don't perform refreshes that were specified to be in the | ||
// past). | ||
} else { | ||
None | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, works for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the newly-introduced syntax needs to be in a Platform Checks test, so that its preservation across restarts and upgrades can be validated.
@philip-stoev, I'm planning to add more tests later (e.g. with the follow-up pull request that handles the Compute part of the new syntax). I'll make sure to include Platform Checks! Note that the feature is behind a feature flag, which will definitely be disabled for now, because this current PR implements only the Adapter part of the feature. I'd like to merge this current PR as soon as possible, because I've been getting an insane amount of merge conflicts. Edit: Discussed with Philip on slack. |
a478f5f
to
2b6291d
Compare
Will add tests later as explained here: #23870 (comment)
Merging! Thank you very much for all the reviews and discussions! |
This is the parsing, purification, planning of REFRESH options on materialized views. This is the full PR, which includes the Compute part.
Design document.
Motivation
Tips for reviewer
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.