fix: Fix how supervisor validation works for cascading reindexing templates in compaction supervisors#19223
Conversation
…s in compaction supervisors
| { | ||
| if (compactionConfig == null) { | ||
| return CompactionConfigValidationResult.failure("Cannot be null"); | ||
| } else if (compactionConfig instanceof CascadingReindexingTemplate) { |
There was a problem hiding this comment.
I think it'd be better to avoid the instanceof by putting the config in control of its own validation, such as by adding an interface method on DataSourceCompactionConfig like:
CompactionConfigValidationResult validate(ClusterCompactionConfig clusterConfig)
Btw, I know this code was pre-existing, but the existing path uses getLatestClusterConfig() to validate. Isn't that strange? It means that whether a CompactionSupervisorSpec is readable or not will depend on the current cluster configuration. If the cluster config changes then a currently-running supervisor could stop having a valid spec. Also, validation happens only on construction of the spec when it's deserialized, even if the cluster configuration later changes in a way that makes a previously-valid spec now-invalid.
Wouldn't it be better to validate when we run the spec in resetCompactionJobQueue, rather than on spec deserialization?
There was a problem hiding this comment.
Wouldn't it be better to validate when we run the spec in resetCompactionJobQueue, rather than on spec deserialization?
hmm this is a good point. I guess one downside is delayed feedback after spec submission 🤔 . And for folks who specify an engine in spec they actually don't depend on the cluster config so kind of sad to wait to get feedback. but probably even worse is to have a spec that was good quietly become invalid if the cluster config changes. so I suppose I've talked myself into moving it
There was a problem hiding this comment.
or maybe we just do it in both spots 😅
There was a problem hiding this comment.
It worries me to have validation dependent on dynamic configuration in the constructor of CompactionSupervisorSpec. What happens if dynamic configuration changes in such a way that a previously-valid spec is now invalid? Will it now fail to show up in the supervisor list API? It will be tough for the user to fix it if they can't see it.
I believe the ideal approach would be to validate on submission (in SupervisorResource#specPost) and at runtime (in resetCompactionJobQueue) but not on construction.
There was a problem hiding this comment.
Did some local testing to get clarity on behavior -
It will not fail to show up. if a spec goes from valid to invalid because of the dynamic config, the next time it is deserialized it will flip to INVALID_SPEC state but still shows up. I'm still taking this suggestion under advisement though. even if it isn't awful UX currently, I think you're on to something in suggesting we rethink the location of validtion
There was a problem hiding this comment.
@gianm ok I tinkered with things to get the validation out of the constructor and into POST time, plus other status/state checks for the supervisor. net it gets it out of only checking during deserialization and now is much more responsive in surfacing config changes that make a spec invalid/valid. before you would need a failover, service restart, etc. to get the status to reflect the change in the supervisor data. now it is pretty rapidly reflective of the actual state
let me know if this feels more in the right direction, thanks!
|
@capistrant @gianm whats the status for this PR? are we ready to merge? |
@cecemei I am working on addressing Gian's latest comments. I was on vacation last week, sorry for the delay. |
…compaction config constructor
gianm
left a comment
There was a problem hiding this comment.
Thanks, latest changes LGTM.
…plates in compaction supervisors (apache#19223) * Fix how supervisor validation works for cascading reindexing templates in compaction supervisors * some refactoring of compaction spec validation * Add a validation check again before creating jobs * Make compaction spec validation more dynamic and fresh compared to previous approach * Fixup a place validation wasn't beging called after removing it from compaction config constructor (cherry picked from commit 0e91c6f)
…plates in compaction supervisors (#19223) (#19334) * Fix how supervisor validation works for cascading reindexing templates in compaction supervisors * some refactoring of compaction spec validation * Add a validation check again before creating jobs * Make compaction spec validation more dynamic and fresh compared to previous approach * Fixup a place validation wasn't beging called after removing it from compaction config constructor (cherry picked from commit 0e91c6f)
Description
Spec validation during spec
POSTAdded
validateSpectoSupervisorSpec. It has a noop default implementation, which follows a similar validation pattern that exists withvalidateSpecUpdateToinSupervisorSpec. We call this when the spec is posted. This gives us the early feedback to client if they try to submit an invalid spec.regular spec validation
In earlier iterations of this validation refactoring, compaction supervisor specs were validated on deserialization (constructor validation). This status could become stale since the spec validity is tied to cluster dynamic config in addition to the spec for compaction. Now we are constantly validating the status of the spec, so if a change to cluster dynamic config change makes a spec invalid, it will be quickly and clearly reflected in the status of the supervisor that is now invalid. in the past the spec would need to be deserialized again for it to reflect a now invalid status.
Cascading reindexing validation changes
The cascading reindexing template doesn't fit well into the existing supervisor validation that exists for the other compaction supervisor types. Ended up refactoring validation of a spec into the compaction config interface itself. the older config types delegate to the same validation that has always been called for them. the cascading config leverages some of the same underlying code but only does a subset of validations due to the more dynamic nature of config creation.
Release note
fixing an issue in a never before released feature so no fix note needed.
We could mention a more correct validation of compaction supervisor specs who are utilizing the default compaction engine configured for the cluster. If the default engine changes making a spec invalid, we will now catch it before creating tasks with the invalid config.
Key changed/added classes in this PR
DataSourceCompactionConfigCatalogDataSourceCompactionConfigInlineSchemaDataSourceCompactionConfigCascadingReindexingTemplateSupervisorResourceSupervisorSpecThis PR has: