Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kuadrant Policies Status conditions/states #9

Merged
merged 9 commits into from
Oct 17, 2023
Merged

Conversation

didierofrivia
Copy link
Contributor

@didierofrivia didierofrivia commented Feb 6, 2023

This RFC presents 2 options and each one could be applied/develop in order and would reflect valuable and accurate information with different degrees of acuity.
Given the feedback so far, it seems Option 2 is the winner:

Option 2
Based on GEP-713. In this case, besides the proposed Accepted PolicyType, the Enforced PolicyType would be added to reflect the final state of the policy, which means that the policy is showing the synced actual state of the Kuadrant services. The missing Failed PolicyType would be implicitly represented by the TargetNotFound and Invalid PolicyTypeReason.

Type Status Reason Message
Accepted True "Accepted" "KuadrantPolicy has been accepted"
False "Conflicted" "KuadrantPolicy is conflicted by [policy-ns/policy-name], ..."
False "Invalid" "KuadrantPolicy is invalid"
False "TargetNotFound" "KuadrantPolicy target [resource-name] was not found"
Enforced True "Enforced" "KuadrantPolicy has been successfully enforced"
False "PartiallyEnforced" "KuadrantPolicy has encountered some issues and has been partially applied"
False "Overridden" "KuadrantPolicy is overridden by [policy-ns/policy-name], ..."

@didierofrivia didierofrivia self-assigned this Feb 6, 2023
@didierofrivia didierofrivia force-pushed the rfc/policy-state branch 2 times, most recently from 8d8d139 to decdbbd Compare February 7, 2023 17:42
@didierofrivia didierofrivia changed the title [wip][rfc] RLP Status conditions/states RLP Status conditions/states Feb 8, 2023
@didierofrivia didierofrivia requested a review from a team February 8, 2023 14:19
@didierofrivia didierofrivia marked this pull request as ready for review February 8, 2023 14:19
@didierofrivia didierofrivia force-pushed the rfc/policy-state branch 3 times, most recently from e65d8d1 to a29d166 Compare February 9, 2023 14:17
@guicassolato guicassolato added RFC Request For Comments target/current labels Feb 9, 2023
rfcs/0000-rlp-status.md Outdated Show resolved Hide resolved

| Type | Status | Reason | Message |
|-------------|--------|-----------------------------|------------------------------------------------------------|
| Progressing | True | "PolicyCreated" | "RateLimitPolicy created" |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this simpler model. From this doc I feel it would read better.

Food for thought. I wondered if instead we need two types only

protected and unprotected ?

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- Should this proposal extend to the KAP as well?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think where possible if we can align the APIs it would make sense from a user exp point of view. Also interesting to think about KAP in the context of protected unprotected

Copy link
Collaborator

@maleck13 maleck13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally like the changes proposed. Although I would tend towards the simplified version and if it makes sense perhaps simplify a little more.

@maleck13
Copy link
Collaborator

maleck13 commented Feb 16, 2023

I also wonder if perhaps this should be rolled into RLP v2? @guicassolato

@alexsnaps alexsnaps mentioned this pull request Apr 13, 2023
12 tasks
Copy link
Collaborator

@maleck13 maleck13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are we doing with older RFCs like this, do we leave them open, is it still relevant?

@alexsnaps
Copy link
Member

what are we doing with older RFCs like this, do we leave them open, is it still relevant?

We discussed this in last week's planning meeting, we need to do a "update" pass on this one. Make sure all is still in alignment with the v2 policy efforts…


States rationale:

* `Created`: The initial state. It announces that the policy has successfully being created, the operator acknowledges it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have we considered using the same types as Gateway API

Accepted , Programmed for these states?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely worth considering, I'll update the RFC adding the Gateway Status like alternative


All conditions are top-level.

| Type | Status | Reason | Message |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think less types would help and it is a good idea to align with K8s or as mentioned above the gateway conditions

@didierofrivia didierofrivia changed the title RLP Status conditions/states Kuadrant Policies Status conditions/states Jul 27, 2023
reconciliation and healthiness with its Operator managed services, mainly in this case the Rate Limit service `Limitador`, and
the Auth service `Authorino`.

As a consequence, only misleading information is shared causing unexpected errors and flawed assumptions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is me just agreeing with the motivation of this PR...

Current implementation is definitely limited (if not entirely misleading), in the sense that it provides the perspective of the policy controller only, i.e., no more than one level deep.

Would you say the current term "Protected" should be interpreted as closer to "Reconciled" (from stage 1) or "Applied" (stage 0)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's closer to Applied, since the controller can go only that far

Comment on lines 94 to 102
| Created | True | "PolicyCreated" | "KuadrantPolicy created" | No |
| Updated | True | "PolicyUpdated" | "KuadrantPolicy has been updated" | No |
| Applied | True | "PolicyApplied" | "KuadrantPolicy has been successfully applied | Yes |
| | False | "PolicyNotApplied" | "KuadrantPolicy is invalid" | |
| Reconciled | True | "PolicyReconciled" | "KuadrantPolicy has been successfully reconciled" | Yes |
| | False | "PolicyNotReconciled" | "KuadrantPolicy failed reconciliation" | |
| Enforced | True | "PolicyEnforced" | "KuadrantPolicy has been successfully enforced" | Yes |
| | False | "PolicyPartiallyEnforced" | "KuadrantPolicy has encountered some issues and has been partially applied" | |
| | False | "PolicyNotEnforced" | "KuadrantPolicy has encountered an error and can't be applied" | |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reasons here don't seem to add much on top of the Status boolean value. I personally prefer the other option proposed further below.


**Conditions**

| Type | Status | Reason | Message | Top Level |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "Top Level" the same as "Final state"?

as states, which in the [Reference-level explanation](#reference-level-explanation) will be translated to the actual Status Conditions.

## Stage 1
**The state is defined by the application and validation of the Policy CR**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say validation, do you mean at controller level? It may sound an implementation details at first, but I believe this differentiation can add nuances to some aspects of the proposal.

There are maybe 4 ways (4 levels) of validation of Kubernetes custom resources.:

  1. Basic API spec validation (OAS validation)
  2. Kubernetes (v1.25+) validation rules
  3. Admission control (validating webhook service)
  4. Reconciliation loop

In the end, it’s going to be a combination of all the above, I imagine. Of course, policies that don’t pass in the first 3 levels of validation at the time they are first requested to be created will be rejected altogether.

Fair to say then that “Validation” here is essentially about level-4 validation (whose rules, BTW, often overlap a bit with the kind of think that is validated at level-3)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it basically is about the number 4 you mention above. I didn't mind anything below the admission control stage, but if we're gonna have our own set of rules for admission control, it might reflect this in the state

This first stage is a simple version where the operator only relies on itself, not checking the healthiness of its services
and just validating the Spec.

![](0000-rlp-status-assets/rlp_status_1.png)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess "Spec changed" only means policy "Updated" if the spec in the matter is the policy spec.

Other events that can affect the state of the policy include:

  • changes related to targeted resource (direct and indirect)
  • other (override) policies created

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct and more clear now with Policy V2, regarding the indirect changes

| | True | "PolicyServiceError" | "KuadrantPolicy has encountered has failed to enforce" |
| | False | "PolicyEnforced" | "KuadrantPolicy has been successfully enforced" |


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth including here a status field to provide further details about things as:

  • Routes / route rules protected by the policy (and by which policy rules) vs selectors that failed to attach to any route / route rule

...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely a good thing for a dashboard or an aggregated data resource that provides more insight about the Kuadrant policies overall state. Implementation wise, I can't foresee how possible it is to implement it per policy state, however, we could include some kind of brief informative summary or list of conflicting policies, network resources affected, etc...

| | False | "PolicyError" | "KuadrantPolicy has encountered an error" |
| Enforced | True | "PolicyEnforced" | "KuadrantPolicy has been successfully enforced" |
| | False | "PolicyPartiallyEnforced" | "KuadrantPolicy has encountered some issues and has been partially applied" |
| | False | "PolicyNotEnforced" | "KuadrantPolicy has encountered an error and can't be applied" |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an overlap between Enforced: False (PolicyNotEnforced) and Failed: True (PolicyReconciliationError) maybe?

What about specifying a more specific reason why the policy is not enforced? E.g.

Suggested change
| | False | "PolicyNotEnforced" | "KuadrantPolicy has encountered an error and can't be applied" |
| | False | "PolicyOverridden" | "KuadrantPolicy overridden by KuadrantPolicy policy-ns/policy-name" |

| Failed | True | "PolicyValidationError" | "KuadrantPolicy has failed to validate" |
| | True | "PolicyReconciliationError" | "KuadrantPolicy has encountered a reconciliation error" |
| | True | "PolicyServiceError" | "KuadrantPolicy has encountered has failed to enforce" |
| | False | "PolicyEnforced" | "KuadrantPolicy has been successfully enforced" |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the policy being overridden by another policy to be considered a fail? If it's not a fail, then we may need another reason here.

@maleck13
Copy link
Collaborator

maleck13 commented Aug 2, 2023

@didierofrivia this section of the policy attachment gep may be relevant https://gateway-api.sigs.k8s.io/geps/gep-713/#conditions

Copy link
Contributor

@guicassolato guicassolato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I'm not sure if we can model those exact state transitions depicted in the diagrams of the 3 stages, or if they serve rather as a guide to explain the general lifecycle of a policy (i.e., at documentation level only), I am quite happy with Option 3 proposed, and would love to move forward with that.

Another thing to watch in relation to this is how Gateway API's proposal for a PolicyBinding resource will evolve (and how we can contribute to that process!) Reflecting everything that may have happened only in the policy status sub-resource may end up simply not being feasible. Hence, if an alternative such as a policy binding resource (or kubectl plugin, or CLI tool, etc) exists, the policy conditions and reasons can be kept simple and high level, much along the lines of what's proposed here.

Nice job, @didierofrivia!

| | False | "Invalid" | "KuadrantPolicy is invalid" |
| | False | "TargetNotFound" | "KuadrantPolicy target [resource-name] was not found" |
| Enforced | True | "Enforced" | "KuadrantPolicy has been successfully enforced" |
| | False | "PartiallyEnforced" | "KuadrantPolicy has encountered some issues and has been partially applied" |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I like partially enforced. might be tempted to group that under reason Unknown this is similar then to how the gateways themselves reflect transient or uncertain state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we can see the information of partially enforced within the PolicyAffected type, let's change this one to Unknown

Copy link
Collaborator

@maleck13 maleck13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should focus around option 2 that reflects the GEP status concepts. @didierofrivia does it makes sense to update this with a "ruled out" section to keep a record of the other options but simplify the overall doc?

@didierofrivia didierofrivia merged commit b26f573 into main Oct 17, 2023
1 check passed
@didierofrivia didierofrivia deleted the rfc/policy-state branch October 17, 2023 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RFC Request For Comments target/current
Projects
No open projects
Status: To test
Development

Successfully merging this pull request may close these issues.

None yet

4 participants