rfc: Cascading Zone Configs #30426
Conversation
Force-pushed from e37202e to fc768a3.
I've only skimmed this so far, but it looks directionally good.
Reviewable status: complete! 0 of 0 LGTMs obtained
docs/RFCS/20180919_cascading_zone_configs.md, line 91 at r1 (raw file):
by the Zone hierarchy above until they find a config with the field explicitly set. After [adding caching], we must ensure the Zone configs are fully set before being cached.
Didn't Spencer's caching PR just land? You're thinking of caching the fully resolved zone config, so it's probably worth calling out that the worst-case chain of inheritance won't be a performance concern.
Force-pushed from fc768a3 to 78dde63.
Looking really good!
Reviewed 1 of 1 files at r2.
docs/RFCS/20180919_cascading_zone_configs.md, line 91 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Didn't Spencer's caching PR just land? You're thinking of caching the fully resolved zone config, so it's probably worth calling out that the worst-case chain of inheritance won't be a performance concern.
I believe Ridwan's plan, to start, is to cache fully resolved database and table zone configs, but partial index/partition zone configs. So there's still a bit of a perf hit for subzones.
docs/RFCS/20180919_cascading_zone_configs.md, line 22 at r2 (raw file):
zone config by default and so also has a replication factor of `3`. Now the user updates the replication factor of the database to `5`, but this change does not apply to the table, as the change doesn't cascade down.
I think you're missing a step here. The user needs to install a zone config for the table that sets some field other than num_replicas. If there's no zone config for t, then the update to the default num_replicas will propagate.
docs/RFCS/20180919_cascading_zone_configs.md, line 81 at r2 (raw file):
Unless explicitly set, the fields in the zone config for each zone/subzone would inherit the value from its parent recursively. The `.default` fields are always explicitly set.
It's worth being aware that there are actually two defaults. The first is installed into `system.zones` when the cluster is bootstrapped—i.e., the moment the cluster comes into existence. The second is in-memory in the binary itself. If the default entry in `system.zones` is deleted, then the in-memory default is used instead. Ending up in this state is difficult—you have to manually run a `DELETE FROM system.zones`—but it's been seen in the wild. The new cascading should continue to do something sensible if the default in `system.zones` is missing or incomplete.
docs/RFCS/20180919_cascading_zone_configs.md, line 132 at r2 (raw file):
This change is acceptable (maybe it shouldn't be?) as the inherited replication factor is 5. However, it must now become illegal for the replication factor of the database to be set to less than 5 (as that change would propagate to the table, making its zone config invalid).
I think this is one of the biggest questions in this RFC. Do we want to validate a parent zone against all of its children, or do we want to live with the fact that it's possible to install an invalid zone config? I'm leaning towards the latter, but I'm curious what other folks think.
docs/RFCS/20180919_cascading_zone_configs.md, line 145 at r2 (raw file):
## Additional Considerations

If a table is a `SubzonePlaceholder`, all of it's fields must be unset except for `NumReplicas`
nit: "its"
docs/RFCS/20180919_cascading_zone_configs.md, line 158 at r2 (raw file):
What is the performance penalty of making the `ZoneConfig` fields nullable? The fields become pointers, but how much do the extra heap allocations affect performance? This needs to be measured.
Now that we have caching, I doubt it'll be noticeable.
Nice writeup, @ridwanmsharif!
docs/RFCS/20180919_cascading_zone_configs.md, line 22 at r2 (raw file):
Previously, benesch (Nikhil Benesch) wrote…
I think you're missing a step here. The user needs to install a zone config for the table that sets some field other than num_replicas. If there's no zone config for t, then the update to the default num_replicas will propagate.
Right, the copying of zone config from parent to child happens when the child's zone config is modified, not when the child itself (i.e. the table) is created.
docs/RFCS/20180919_cascading_zone_configs.md, line 132 at r2 (raw file):
Previously, benesch (Nikhil Benesch) wrote…
I think this is one of the biggest questions in this RFC. Do we want to validate a parent zone against all of its children, or do we want to live with the fact that it's possible to install an invalid zone config? I'm leaning towards the latter, but I'm curious what other folks think.
I disagree, I don't think we should live with that fact. Zone config changes are infrequent and important enough that we should have guard rails of some sort to prevent potentially catastrophic mistakes. Validating zone config changes against all children sounds pretty annoying to implement, but I think would be worth it.
Or alternatively, as @ridwanmsharif suggests for this case it'd be possible to prevent such broken configs by not allowing per-replica constraints without also explicitly setting the number of replicas in the same zone. If it's possible to avoid all such mistakes in that way, using localized requirements, I think that would be acceptable to most users. And as far as I can tell, the only real dependencies between different fields in the config are (1) this and (2) `RangeMinBytes` must be less than `RangeMaxBytes`. Forcing users to set both of those in tandem seems more annoying than for `NumReplicas` and per-replica constraints, but it seems better than the brokenness that accidentally having `RangeMinBytes >= RangeMaxBytes` could cause.
docs/RFCS/20180919_cascading_zone_configs.md, line 155 at r2 (raw file):
# Drawbacks

# Unresolved Questions
I think this RFC is also missing a story for how the user can unset a field in a zone such that it starts being inherited from its parent again.
docs/RFCS/20180919_cascading_zone_configs.md, line 158 at r2 (raw file):
Previously, benesch (Nikhil Benesch) wrote…
Now that we have caching, I doubt it'll be noticeable.
Yeah, with caching in place I'd be pretty curious about this as well. Zone configs are almost never changed in real clusters. Although being more explicit about it with booleans indicating which fields were set by the user has a nice clarity to it that could prevent future coding mistakes.
78dde63
to
bba995d
Compare
docs/RFCS/20180919_cascading_zone_configs.md, line 91 at r1 (raw file):
Previously, benesch (Nikhil Benesch) wrote…
I believe Ridwan's plan, to start, is to cache fully resolved database and table zone configs, but partial index/partition zone configs. So there's still a bit of a perf hit for subzones.
Yes, currently there's still going to be a lookup for `subzone`s inside a cached table zone. We could also cache the `subzone`s as mentioned later in the RFC.
docs/RFCS/20180919_cascading_zone_configs.md, line 22 at r2 (raw file):
Previously, a-robinson (Alex Robinson) wrote…
Right, the copying of zone config from parent to child happens when the child's zone config is modified, not when the child itself (i.e. the table) is created.
Done.
docs/RFCS/20180919_cascading_zone_configs.md, line 132 at r2 (raw file):
Previously, a-robinson (Alex Robinson) wrote…
I disagree, I don't think we should live with that fact. Zone config changes are infrequent and important enough that we should have guard rails of some sort to prevent potentially catastrophic mistakes. Validating zone config changes against all children sounds pretty annoying to implement, but I think would be worth it.
Or alternatively, as @ridwanmsharif suggests for this case it'd be possible to prevent such broken configs by not allowing per-replica constraints without also explicitly setting the number of replicas in the same zone. If it's possible to avoid all such mistakes in that way, using localized requirements, I think that would be acceptable to most users. And as far as I can tell, the only real dependencies between different fields in the config are (1) this and (2) `RangeMinBytes` must be less than `RangeMaxBytes`. Forcing users to set both of those in tandem seems more annoying than for `NumReplicas` and per-replica constraints, but it seems better than the brokenness that accidentally having `RangeMinBytes >= RangeMaxBytes` could cause.
For something as infrequent as zone config changes, I don't think having invalid ones installed is the best approach either. Validation is something we should do if we're not too worried about the time penalty that validating every child recursively would incur. The hacky solution of requiring `NumReplicas` when a constraint is set, and similarly requiring `RangeMinBytes` and `RangeMaxBytes` together, is one I'm fine with for now, that is, if validating children is an option we're not going to try.
docs/RFCS/20180919_cascading_zone_configs.md, line 155 at r2 (raw file):
Previously, a-robinson (Alex Robinson) wrote…
I think this RFC is also missing a story for how the user can unset a field in a zone such that it starts being inherited from its parent again.
Done. What do you think the syntax on the user's end for unsetting a value should look like, @a-robinson?
Very much a WIP. Modifies the `ZoneConfig` struct to make fields optional now. If fields are not set, they are looked for in the hierarchy. Refer to the RFC for more details. Issue: cockroachdb#28901 Implements most of RFC: cockroachdb#30426 Release note: None
docs/RFCS/20180919_cascading_zone_configs.md, line 132 at r2 (raw file):
If you want to try implementing the validation of children, I won't stop you :) It's strictly better behavior from a user's perspective assuming its error messages are understandable. Just be aware it'll take significantly more time and effort to get right.
If you have 1k tables, that's potentially tens of thousands of zone configs. That's going to make changing the default zone config quite painful. I know we don't have many clusters of that scale today, but I think it'd be a shame to introduce that bottleneck. @bdarnell do you have thoughts?
docs/RFCS/20180919_cascading_zone_configs.md, line 155 at r2 (raw file):
Previously, knz (kena) wrote…
I am ok with using yaml nulls for this purpose, but I'd like to point out that SQL has another idiomatic way to do this, already exemplified in other ALTER statements (e.g. to remove constraints on columns): `DROP`, as in `ALTER ... CONFIGURE ZONE DROP range_min_bytes`.
Yeah, the `cockroach zone` command is deprecated in v2.1 so I'm hopeful we can remove it entirely in v2.2. Then we don't need to worry about YAML at all. I like your suggested SQL a lot, Raphael.
docs/RFCS/20180919_cascading_zone_configs.md, line 132 at r2 (raw file):
Previously, benesch (Nikhil Benesch) wrote…
If you want to try implementing the validation of children, I won't stop you :) It's strictly better behavior from a user's perspective assuming its error messages are understandable. Just be aware it'll take significantly more time and effort to get right.
If you have 1k tables, that's potentially tens of thousands of zone configs. That's going to make changing the default zone config quite painful. I know we don't have many clusters of that scale today, but I think it'd be a shame to introduce that bottleneck. @bdarnell do you have thoughts?
It doesn't bother me too much if changing the default zone config has costs proportional to the number of zone configs in the system, although it would be best if that could be avoided. I agree that the best thing to do here is to treat the number of replicas and the constraints as a unit. (This is how zone configs originally worked: there wasn't an independent num_replicas setting; you would ask for three replicas by setting something like `constraints: [[], [], []]`. We added the `num_replicas` setting because it's much easier for the user to understand, although maybe we should decouple the user-facing config format from the underlying representation.)
Force-pushed from bba995d to d8b0191.
docs/RFCS/20180919_cascading_zone_configs.md, line 132 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
It doesn't bother me too much if changing the default zone config has costs proportional to the number of zone configs in the system, although it would be best if that could be avoided. I agree that the best thing to do here is to treat the number of replicas and the constraints as a unit. (This is how zone configs originally worked: there wasn't an independent num_replicas setting; you would ask for three replicas by setting something like `constraints: [[], [], []]`. We added the `num_replicas` setting because it's much easier for the user to understand, although maybe we should decouple the user-facing config format from the underlying representation.)
We're in agreement then. Treating constraints and replication factor as a unit, and range min and max bytes as another, is an acceptable way to deal with this problem.
docs/RFCS/20180919_cascading_zone_configs.md, line 5 at r3 (raw file):
Previously, a-robinson (Alex Robinson) wrote…
This is usually meant to point to the PR in which the RFC was discussed, i.e. this one (#30426), in case anyone wants to look up the discussion that happened around the design decisions.
Done.
Reviewed 1 of 1 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
docs/RFCS/20180919_cascading_zone_configs.md, line 155 at r2 (raw file):
Previously, benesch (Nikhil Benesch) wrote…
Yeah, the `cockroach zone` command is deprecated in v2.1 so I'm hopeful we can remove it entirely in v2.2. Then we don't need to worry about YAML at all. I like your suggested SQL a lot, Raphael.
SGTM. The SQL command still has a version that accepts YAML, but that version doesn't have to support unsetting a field.
Reviewed 1 of 1 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained
docs/RFCS/20180919_cascading_zone_configs.md, line 81 at r2 (raw file):
Previously, benesch (Nikhil Benesch) wrote…
It's worth being aware that there are actually two defaults. The first is installed into `system.zones` when the cluster is bootstrapped—i.e., the moment the cluster comes into existence. The second is in-memory in the binary itself. If the default entry in `system.zones` is deleted, then the in-memory default is used instead. Ending up in this state is difficult—you have to manually run a `DELETE FROM system.zones`—but it's been seen in the wild. The new cascading should continue to do something sensible if the default in `system.zones` is missing or incomplete.
I concur, this needs to be addressed. It's not even "difficult": it happens in tests! I had changed this inadvertently (to ignore the root case) and CI was not happy.
docs/RFCS/20180919_cascading_zone_configs.md, line 132 at r2 (raw file):
Previously, ridwanmsharif (Ridwan Sharif) wrote…
We're in agreement then. Treating constraints and replication factor as a unit, and range min and max bytes as another, is an acceptable way to deal with this problem.
Maybe add a note in the RFC with a summary of the discussion and the outcome.
Force-pushed from d8b0191 to f5661d0.
Adds an RFC to discuss proposals and design for cascading zone configs. Discusses changes to the `ZoneConfig` structure, why it is needed, the different ways we could change it and designs what the change should look like Release note: None
Force-pushed from f5661d0 to f59e534.
bors r+
30426: rfc: Cascading Zone Configs r=ridwanmsharif a=ridwanmsharif Adds an RFC to discuss proposals and design for cascading zone configs. Discusses changes to the `ZoneConfig` structure, why it is needed, the different ways we could change it and designs what the change should look like Issue: #28901 Release note: None Co-authored-by: Ridwan Sharif <ridwan@cockroachlabs.com>
Build succeeded
30611: sql, config: make zone configs cascade r=ridwanmsharif a=ridwanmsharif Modifies the `ZoneConfig` struct to make fields optional now. If fields are not set, they are looked for in the hierarchy. Refer to the RFC for more details. Issue: #28901 Implements most of RFC: #30426 TODO: - [x] Update `set_zone_config` to use the new structure and validate them - [x] Exhaustively test cascading zone configs Release note: Changes behavior of Zone Configs. Co-authored-by: Ridwan Sharif <ridwan@cockroachlabs.com>