Skip to content

interface: define override apis#646

Closed
zhiying-lin wants to merge 4 commits into
Azure:mainfrom
zhiying-lin:override-api
Closed

interface: define override apis#646
zhiying-lin wants to merge 4 commits into
Azure:mainfrom
zhiying-lin:override-api

Conversation

@zhiying-lin
Copy link
Copy Markdown
Contributor

@zhiying-lin zhiying-lin commented Dec 21, 2023

Description of your changes

Introduced a cluster scope resource "Override" and "ResourceOverride".

Fixes #

I have:

  • Run make reviewable to ensure this PR is ready for review.

How has this code been tested

Special notes for your reviewer

@zhiying-lin zhiying-lin marked this pull request as draft December 21, 2023 09:02
@zhiying-lin zhiying-lin changed the title apis: define override apis interface: define override apis Dec 21, 2023
Comment thread apis/placement/v1beta1/commons.go Outdated
}

// OverrideSpec defines the desired state of the Override.
type OverrideSpec struct {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to make overridepolicy part of a placement to avoid creating N-M mappings between overrideds & placements.

You can think a placement as an application. Usually overrides are part of the application configuration.

They need to managed by the same developer. As an owner of the app, I definitely don't want others to arbitrarily override my app.

This is also how kubefed does it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like:

PlacementSpec {
  ...
  OverridePolicies []OverridePolicy 
}

OverridePolicy {
    // Target clusters.
    ClusterNames []string
    (or) ClusterSelector

    // Target resources.
    ClusterResourceSelectors []ClusterResourceSelector
    ResourceSelectors []ResourceSelector

    // Overrides.
    JSONPatchOverrides []JSONPatchOverride
  }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have discussed this option with Ryan and Patrick. We take the current option because of:

  • there is no relationship between CRP and override. The override is applying to the resources(s).
  • the apply result is determined by the resource itself. The value won't be changed if multiple CRPs select the same resource but use different override policies.

Comment thread apis/placement/v1beta1/clusterresourceplacement_types.go Outdated
Comment thread apis/placement/v1beta1/override_types.go Outdated
// If a namespace is selected, ALL the resources under the namespace are selected automatically.
// You can have 1-100 selectors.
// +optional
ClusterResourceSelectors []ClusterResourceSelector `json:"clusterResourceSelectors,omitempty"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it still make sense to apply the override to all the resources in a namespace?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason is to simplify the duplicate override configuration. For example, a user want to add a label to all of the resources under a namespace. In addition, we can keep it consistent with the CRP semantics.

Comment thread apis/placement/v1beta1/override_types.go Outdated
Comment thread apis/placement/v1beta1/override_types.go Outdated
Comment thread apis/placement/v1beta1/work_types.go Outdated

// Overrides defines a list of override applied to the resource before applying to the spoke cluster.
// +required
Overrides []Override `json:"overrides"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't seem to find where "Override" is defined

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the full structure defined in the override_types.go.

I was thinking of populating the object metadata, overridePolicies, applyMode and priority fields for the troubleshooting purpose.
It may be too much and could exceed the 1mb limit of the object.

Now I'm leaning toward creating a new struct to include the object name and its observedGeneration.

Comment thread apis/placement/v1beta1/work_types.go Outdated
Comment on lines +53 to +54
// WorkloadOverrides represents a list of overrides applied to the resources.
WorkloadOverrides []WorkloadOverride `json:"workloadOverrides,omitempty"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this? Which component is going to apply the overrides?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now hubagent will apply the override and then create the manifest. memberAgent need the applyMode info to know how to apply the resource, eg, createOnly or server-side apply.

In addition, this field provides the troubleshooting information.

@zhiying-lin zhiying-lin force-pushed the override-api branch 3 times, most recently from bf5eb65 to 755c026 Compare January 9, 2024 05:48
@zhiying-lin zhiying-lin force-pushed the override-api branch 6 times, most recently from 9ed13ed to 2e93ddf Compare January 19, 2024 09:35
Comment thread apis/placement/v1beta1/binding_types.go Outdated
Comment on lines +104 to +105
// - "True" means all the resources have been overridden successfully before applying to the target cluster or override
// is not needed if there is no override defined.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there situations that the customer wants to differentiate between these two cases (success vs no-op)?

// +kubebuilder:validation:Enum=CreateOnly;ServerSideApply;ServerSideApplyWithForceConflicts
// +kubebuilder:default=CreateOnly
// +optional
ApplyMode ApplyMode `json:"applyMode,omitempty"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there cases that this mode depends on the resources? For example, only forceConflicts on secrets but create only for deployment? I feel like this single field will need to be expand in the future so an envelope struct will probably make it more future proof.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be folded into rollout strategy instead of a first level parameter?

Comment on lines +112 to +124
type ApplyMode string

const (
// ApplyModeCreateOnly creates the resources on the target clusters and will fail if the resources exist on the target
// cluster.
ApplyModeCreateOnly ApplyMode = "CreateOnly"
// ApplyModeServerSideApply applies the resources on the target cluster using server-side apply.
// https://kubernetes.io/docs/reference/using-api/server-side-apply/
ApplyModeServerSideApply ApplyMode = "ServerSideApply"
// ApplyModeServerSideApplyWithForceConflicts applies the resources on the target cluster using server-side apply with
// force-conflicts option. It forces the operation to succeed when resolving the conflicts.
ApplyModeServerSideApplyWithForceConflicts ApplyMode = "ServerSideApplyWithForceConflicts"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is essentially expressing two different controls

  1. server side apply or not
  2. force apply or not
    I wonder if it would make more sense to expand this single mode into a struct so we can add more options in the future?

Comment on lines +85 to +86

ApplyMode ApplyMode `json:"applyMode,omitempty"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is where the actual apply mode can be folded into each resource.

Licensed under the MIT license.
*/

package v1beta1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new apis normally don't start as v1beta1

}

type ClusterResourceOverrideSnapshotSpec struct {
OverrideName string `json:"overrideName"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment on this field? Is this different from the label value "ClusterResourceOverrideTrackingLabel"?

// +genclient
// +genclient:nonNamespaced
// +kubebuilder:object:root=true
// +kubebuilder:resource:scope="Cluster",shortName=rss,categories={fleet,fleet-placement}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what should be the shortname?

// +kubebuilder:object:root=true
// +kubebuilder:resource:scope="Cluster",shortName=rss,categories={fleet,fleet-placement}
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:JSONPath=`.metadata.generation`,name="Gen",type=string
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this? Is it immutable?

Comment on lines +25 to +26
// - `ClusterResourceOverrideTrackingLabel` which points to its owner ClusterResourceOverride.
// - `IsLatestSnapshotLabel` which indicates whether the snapshot is the latest one.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a sub-index for multiple snapshot?


// The desired state of ClusterResourceOverrideSpec.
// +required
Spec ClusterResourceOverrideSpec `json:"spec"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any case we want a status field?

type OverriddenResourceSnapshotSpec struct {
TargetCluster string `json:"targetCluster"`

ClusterResourceSnapshotName string `json:"clusterResourceSnapshotName"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not too sure what does the "Cluster" mean in the field name. Will we add a new field "ResourceSnapshotName" if we introduce ResourcePlacemement?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comments?

}

type OverriddenResourceSnapshotSpec struct {
TargetCluster string `json:"targetCluster"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comments?


ClusterResourceSnapshotName string `json:"clusterResourceSnapshotName"`

ResourceOverrides []OverrideSnapshotIdentifier `json:"resourceOverrides"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we decided to not have priority for now? Are the elements in the array sorted by anything or just random?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, What exactly is this field for?

NumberOfSnapshots int `json:"numberOfSnapshots,omitempty"`

// Workload represents the manifest workload to be deployed on spoke cluster.
Workload WorkloadTemplate `json:"workload,omitempty"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, most of the fields in the status are actually known when the CR object is created instead of having another controller reconcile based on the spec. Hence, I wonder why those are in the status? I can see condition be part of status just to keep the convention.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or should we actually use the work generator to reconcile this object which makes the flow a lot more natural? I know the problem is that we won't be able to "keep the last known good state" but I am not sure if that's what a state driven system normally guarantees.

}

// ResourceOverrideSpec defines the desired state of the Override.
type ResourceOverrideSpec struct {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is pretty much the same as the ClusterResourceOverrideSpec. Is it good to use a common sub struct for all the fields except the resourceSelector?


ClusterResourceSnapshotName string `json:"clusterResourceSnapshotName"`

ResourceOverrides []OverrideSnapshotIdentifier `json:"resourceOverrides"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, What exactly is this field for?

@zhiying-lin zhiying-lin force-pushed the override-api branch 3 times, most recently from 5bec931 to e7aa1ca Compare January 26, 2024 12:35
@zhiying-lin
Copy link
Copy Markdown
Contributor Author

Created two separate PRs instead

weng271190436 pushed a commit that referenced this pull request May 28, 2026
* feat: harden property selector validation

Address the four validator-hardening TODOs from issue #646. Two were
stale (the placement validator already enforced them at the API
boundary); two are real and now have implementations + thorough tests.

What's new:

- Bundle operator and value validation into validateOperatorAndValues
  driven by a single supportedPropertyOperators table. Replaces the
  prior split validateOperator / validateValues. Preserves the
  existing "operator X requires exactly one value, got N" wording so
  existing tests stay stable, and only emits "exactly N values" if a
  future operator declares requiredValueCount > 1.

- Add per-property logical-contradiction detection
  (validateRequirementsConsistency). Split into named helpers per
  reviewer feedback: collectRequirementBounds, tightenLower /
  tightenUpper (with a strict-preferred tiebreak), checkEqVsNe,
  checkInterval, checkEqInsideBounds. Detects:
    * two Eq with different values,
    * Eq + Ne on the same value,
    * empty intervals from the most-restrictive lower vs the
      most-restrictive upper, including boundary cases such as
      Gt x + Lt x, Gt x + Lte x, Gte x + Lt x,
    * Eq value violating the most-restrictive lower or upper bound.
  More elaborate cases (sparse Ne exclusion sets, etc.) are left for
  the property-selector evaluator at scheduling time.

What's gone:

- Removed two stale TODOs in resource_selector_resolver.go (lines 526,
  556). Mutual-exclusiveness of Name vs LabelSelector and labelSelector
  validity are already enforced in pkg/utils/validator/placement.go
  (validatePlacement / validateLabelSelector). Replaced with comments
  pointing readers at the validator.

Tests:

- TestValidateOperatorAndValues (7 cases including unsupported
  operator, malformed quantity, count mismatch).
- TestValidateRequirementsConsistency (~20 cases: each contradiction
  variant plus negative controls and malformed-input skipping).
- TestRequirementBoundsTighten (4 cases for the "most restrictive
  wins" + tie-breaks).

Boy Scout: switched two interface{} to any in the Service nodePort
stripping block.

Fixes #646

Co-Authored-By: Claude Code <noreply@anthropic.com>
Signed-off-by: Yetkin Timocin <ytimocin@microsoft.com>

* Address feedback

Signed-off-by: Yetkin Timocin <ytimocin@microsoft.com>

* Address feedback

Consolidate the per-operator bounds-folding behaviour into the
supportedPropertyOperators registry so the spec table is the single
source of truth for operator validity, required value count, and how
each requirement folds into requirementBounds. collectRequirementBounds
no longer carries a switch; it dispatches via spec.applyToBounds.

- Add operatorSpec{ requiredValueCount, applyToBounds } and rewire
  validateOperatorAndValues and collectRequirementBounds to read from
  it. Drop the speculative multi-value error-wording branch in
  validateOperatorAndValues — no operator declares a count > 1, and a
  future multi-value operator can update the wording at the time.
- Extract (*requirementBounds).applyEq as a method so the spec table
  can reference it as (*requirementBounds).applyEq.
- Add TestApplyEq covering first call / second-equal / second-different
  branches, with postconditions that eqVal carries the expected value
  including preservation on the error path.
- Add cross-format consistency rows that exercise Cmp through the
  validator's own logic paths (Eq+Ne and Eq vs upper bound), not
  apimachinery primitives.
- Add TestSupportedPropertyOperatorsRegistryComplete to fail at test
  time if a future entry is added with a non-positive requiredValueCount
  or nil applyToBounds.

Signed-off-by: Yetkin Timocin <ytimocin@microsoft.com>

* fix: codespell typo (unparseable -> unparsable)

Signed-off-by: Yetkin Timocin <ytimocin@microsoft.com>

---------

Signed-off-by: Yetkin Timocin <ytimocin@microsoft.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants