conditionally create mz_system and mz_probe cluster replicas#31452
conditionally create mz_system and mz_probe cluster replicas#31452SangJunBak merged 4 commits intomainfrom
Conversation
d584a05 to
067349a
Compare
…lication factor This change allows configurable replication factors for builtin clusters during bootstrap. This will be useful for disabling certain clusters for self managed while also not breaking any of our test infra.
067349a to
2381138
Compare
2381138 to
1369e7f
Compare
|
@jubrad Unable to run . I'm guessing it's because I added |
1369e7f to
dba9f12
Compare
dba9f12 to
6993b53
Compare
jubrad
left a comment
There was a problem hiding this comment.
I'm not sure about the changes in src/orchestratord/src/controller/materialize/environmentd.rs
and some of the comments need to be cleaned up.
6993b53 to
4e4e92a
Compare
4e4e92a to
3a76dd4
Compare
src/adapter/src/catalog/open.rs
Outdated
| let cluster_config = match cluster_name { | ||
| name if name == mz_catalog::builtin::MZ_SYSTEM_CLUSTER.name => &self.system_cluster, | ||
| name if name == mz_catalog::builtin::MZ_CATALOG_SERVER_CLUSTER.name => { | ||
| &self.catalog_server_cluster | ||
| } | ||
| name if name == mz_catalog::builtin::MZ_PROBE_CLUSTER.name => &self.probe_cluster, | ||
| name if name == mz_catalog::builtin::MZ_SUPPORT_CLUSTER.name => &self.support_cluster, | ||
| name if name == mz_catalog::builtin::MZ_ANALYTICS_CLUSTER.name => { | ||
| &self.analytics_cluster | ||
| } | ||
| _ => { | ||
| return Err(mz_catalog::durable::CatalogError::Catalog( | ||
| SqlCatalogError::UnexpectedBuiltinCluster(cluster_name.to_owned()), | ||
| )) | ||
| } | ||
| }; |
There was a problem hiding this comment.
nit: it feels like this would be better written as if-else statements? If you want to prevent all of the Ok wrapping you can do:
let cluster_config = if cluster_name == "foo" {
&self.foo_cluster
} else {
return ...
}
| pub const SUPPORT_CLUSTER_DEFAULT_REPLICATION_FACTOR: u32 = 0; | ||
| pub const ANALYTICS_CLUSTER_DEFAULT_REPLICATION_FACTOR: u32 = 0; |
There was a problem hiding this comment.
While you're here, can you add a comment explaining why these have a default of 0? e.g. that they are ephemeral clusters we spin up only to scrape analytics or for support when debugging
| long, | ||
| env = "BOOTSTRAP_BUILTIN_SYSTEM_CLUSTER_REPLICATION_FACTOR", | ||
| default_value = SYSTEM_CLUSTER_DEFAULT_REPLICATION_FACTOR.to_string(), | ||
| value_parser = clap::value_parser!(u32).range(0..=1) |
There was a problem hiding this comment.
nit: I would maybe allow 0..=2 for these ranges, seems like it could maybe be useful in the future
| builtin_system_cluster_config: BootstrapBuiltinClusterConfig { | ||
| size: replica_size.clone(), | ||
| replication_factor: SYSTEM_CLUSTER_DEFAULT_REPLICATION_FACTOR, | ||
| }, | ||
| builtin_catalog_server_cluster_config: BootstrapBuiltinClusterConfig { | ||
| size: replica_size.clone(), | ||
| replication_factor: CATALOG_SERVER_CLUSTER_DEFAULT_REPLICATION_FACTOR, | ||
| }, | ||
| builtin_probe_cluster_config: BootstrapBuiltinClusterConfig { | ||
| size: replica_size.clone(), | ||
| replication_factor: PROBE_CLUSTER_DEFAULT_REPLICATION_FACTOR, | ||
| }, | ||
| builtin_support_cluster_config: BootstrapBuiltinClusterConfig { | ||
| size: replica_size.clone(), | ||
| replication_factor: SUPPORT_CLUSTER_DEFAULT_REPLICATION_FACTOR, | ||
| }, | ||
| builtin_analytics_cluster_config: BootstrapBuiltinClusterConfig { | ||
| size: replica_size.clone(), | ||
| replication_factor: ANALYTICS_CLUSTER_DEFAULT_REPLICATION_FACTOR, | ||
| }, |
There was a problem hiding this comment.
Specifying these 5 fields all the time feels brittle, e.g. easy to mix up the config values for two different clusters. A fix might be to use newtypes, e.g.
pub struct SystemClusterReplicationFactor(usize);
pub struct CatalogServerClusterReplicationFactor(usize);
...
But that seems quite tedious too. If you feel inspired thinking about how we can make this more succinct might be nice, but definitely not blocking!
There was a problem hiding this comment.
Specifying these 5 fields all the time feels brittle
hmm my Rust knowledge is kinda capped here! Like the cluster_config variables ? Would that require a new BootstrapBuiltinClusterConfig struct per <builtin cluster>ClusterReplicationFactor(usize)? Or could you do something like:
...
builtin_analytics_cluster_config: BootstrapBuiltinClusterConfig {
size: replica_size.clone(),
replication_factor: AnalyticsClusterReplicationFactor (ANALYTICS_CLUSTER_DEFAULT_REPLICATION_FACTOR),
},
If we had to make a new ...ClusterConfig struct per builtin cluster, i kinda feel like it makes each config less generic which might be bad? Might be helpful to go over this in person since I'm genuinely curious!
| config | ||
| .bootstrap_builtin_system_cluster_replication_factor | ||
| .as_ref() | ||
| .map(|replication_factor| { | ||
| format!("--bootstrap-builtin-system-cluster-replication-factor={replication_factor}") | ||
| }), | ||
| config | ||
| .bootstrap_builtin_probe_cluster_replication_factor | ||
| .as_ref() | ||
| .map(|replication_factor| format!("--bootstrap-builtin-probe-cluster-replication-factor={replication_factor}")), | ||
| config | ||
| .bootstrap_builtin_support_cluster_replication_factor | ||
| .as_ref() | ||
| .map(|replication_factor| format!("--bootstrap-builtin-support-cluster-replication-factor={replication_factor}")), | ||
| config | ||
| .bootstrap_builtin_analytics_cluster_replication_factor | ||
| .as_ref() | ||
| .map(|replication_factor| format!("--bootstrap-builtin-analytics-cluster-replication-factor={replication_factor}")), |
There was a problem hiding this comment.
Missing a config for the catalog_server replication factor, maybe that's intentional? It seems like it would be nice to have parity for all system clusters here
There was a problem hiding this comment.
Per Justin:
I think we should only include these options where it wouldn't be a complete footgun to set it to a value other than 1.
And I kinda agree with him here!
- modifies the allowed replication factor range for system clusters from 0-1 to 0-2, providing more flexibility in cluster configuration. - the code for retrieving builtin cluster configurations has been refactored to use a more concise if-else structure - Adds a comments for builtin clusters with replication factor 0
…lizeInc#31452) See commit messages for details <!-- Describe the contents of the PR briefly but completely. If you write detailed commit messages, it is acceptable to copy/paste them here, or write "see commit messages for details." If there is only one commit in the PR, GitHub will have already added its commit message above. --> * This PR adds a known-desirable feature. MaterializeInc/database-issues#8954 <!-- Which of the following best describes the motivation behind this PR? * This PR fixes a recognized bug. [Ensure issue is linked somewhere.] [Ensure issue is linked somewhere.] * This PR fixes a previously unreported bug. [Describe the bug in detail, as if you were filing a bug report.] * This PR adds a feature that has not yet been specified. [Write a brief specification for the feature, including justification for its inclusion in Materialize, as if you were writing the original feature specification.] * This PR refactors existing code. [Describe what was wrong with the existing code, if it is not obvious.] --> <!-- Leave some tips for your reviewer, like: * The diff is much smaller if viewed with whitespace hidden. * [Some function/module/file] deserves extra attention. * [Some function/module/file] is pure code movement and only needs a skim. Delete this section if no tips. --> - [ ] This PR has adequate test coverage / QA involvement has been duly considered. ([trigger-ci for additional test/nightly runs](https://trigger-ci.dev.materialize.com/)) - [ ] This PR has an associated up-to-date [design doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md), is a design doc ([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)), or is sufficiently small to not require a design. <!-- Reference the design in the description. --> - [ ] If this PR evolves [an existing `$T ⇔ Proto$T` mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md) (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label. - [ ] If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label ([example](MaterializeInc/cloud#5021)). <!-- Ask in #team-cloud on Slack if you need help preparing the cloud PR. --> - [ ] If this PR includes major [user-facing behavior changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note), I have pinged the relevant PM to schedule a changelog post.
…lizeInc#31452) See commit messages for details <!-- Describe the contents of the PR briefly but completely. If you write detailed commit messages, it is acceptable to copy/paste them here, or write "see commit messages for details." If there is only one commit in the PR, GitHub will have already added its commit message above. --> * This PR adds a known-desirable feature. MaterializeInc/database-issues#8954 <!-- Which of the following best describes the motivation behind this PR? * This PR fixes a recognized bug. [Ensure issue is linked somewhere.] [Ensure issue is linked somewhere.] * This PR fixes a previously unreported bug. [Describe the bug in detail, as if you were filing a bug report.] * This PR adds a feature that has not yet been specified. [Write a brief specification for the feature, including justification for its inclusion in Materialize, as if you were writing the original feature specification.] * This PR refactors existing code. [Describe what was wrong with the existing code, if it is not obvious.] --> <!-- Leave some tips for your reviewer, like: * The diff is much smaller if viewed with whitespace hidden. * [Some function/module/file] deserves extra attention. * [Some function/module/file] is pure code movement and only needs a skim. Delete this section if no tips. --> - [ ] This PR has adequate test coverage / QA involvement has been duly considered. ([trigger-ci for additional test/nightly runs](https://trigger-ci.dev.materialize.com/)) - [ ] This PR has an associated up-to-date [design doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md), is a design doc ([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)), or is sufficiently small to not require a design. <!-- Reference the design in the description. --> - [ ] If this PR evolves [an existing `$T ⇔ Proto$T` mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md) (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label. - [ ] If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label ([example](MaterializeInc/cloud#5021)). <!-- Ask in #team-cloud on Slack if you need help preparing the cloud PR. --> - [ ] If this PR includes major [user-facing behavior changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note), I have pinged the relevant PM to schedule a changelog post.
See commit messages for details
Motivation
https://github.com/MaterializeInc/database-issues/issues/8954
Tips for reviewer
Checklist
$T ⇔ Proto$Tmapping (possibly in a backwards-incompatible way), then it is tagged with aT-protolabel.