Skip to content

Add PSA to block host field in probe/lifecycle handlers #4942

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 18, 2025

Conversation

tssurya
Copy link
Contributor

@tssurya tssurya commented Nov 3, 2024

  • One-line PR description: Add PSA to block host field in probe/lifecycle handlers
  • Other comments:

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 3, 2024
@k8s-ci-robot k8s-ci-robot requested review from deads2k and enj November 3, 2024 10:18
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 3, 2024
@aojea
Copy link
Member

aojea commented Nov 4, 2024

not familiar with the mechanics of this admission plugin, the trickiest part seems to handle the roll out of the feature, as some pods will be impacted by the new policy on upgrade. Things like a rolling update on a deployment comes to mind, where valid pods will created and will be impacted, despite the existing ones running will not.

From the network perspective +1 from me, most people should not use the host field, although there are case that is used for polling external endpoints and now it can not be removed without breaking existing workloads, but seems correct to warn these users about the risk they are running by doing that.

cc: @tallclair @liggitt

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 2, 2025
@tssurya tssurya marked this pull request as ready for review February 2, 2025 17:15
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 2, 2025
@tssurya
Copy link
Contributor Author

tssurya commented Feb 2, 2025

I have 3 more sections to update (unit, e2e, integration based on test grid links which since I was traveling was hard to load and access, will update those sections asap).. opening this up for reviews meanwhile

@deads2k PTAL if this is doable for 1.33 ?

@ibihim
Copy link
Contributor

ibihim commented Feb 24, 2025

@tssurya, @deads2k, we are passed enhancement freeze v1.33. What is your plan for this?

@tallclair
Copy link
Member

not familiar with the mechanics of this admission plugin, the trickiest part seems to handle the roll out of the feature, as some pods will be impacted by the new policy on upgrade. Things like a rolling update on a deployment comes to mind, where valid pods will created and will be impacted, despite the existing ones running will not.

pod security admission has a couple features to help with this. policies are versioned, so an admin can decouple policy upgrade from cluster upgrade. You can also dry-run policy changes against workload resources. https://kubernetes.io/docs/tasks/configure-pod-container/enforce-standards-namespace-labels/ has some of the details, but we should probably add task documentation specifically for the upgrade scenario.

@tssurya
Copy link
Contributor Author

tssurya commented Mar 1, 2025

@tssurya, @deads2k, we are passed enhancement freeze v1.33. What is your plan for this?

I had spoken with @deads2k .. IIUC I should send an email in early part of the release cycle first and get some reviews which I had missed... So we are targeting this for next release and as soon as the window opens I plan to send an email to mailing list for reviews once I update this KEP to a recent state.

@tssurya
Copy link
Contributor Author

tssurya commented Apr 26, 2025

@tssurya, @deads2k, we are passed enhancement freeze v1.33. What is your plan for this?

I had spoken with @deads2k .. IIUC I should send an email in early part of the release cycle first and get some reviews which I had missed... So we are targeting this for next release and as soon as the window opens I plan to send an email to mailing list for reviews once I update this KEP to a recent state.

Has the 1.34 KEP window opened? https://kubernetes.slack.com/archives/C0EN96KUY/p1745693668074409?thread_ts=1745431405.777719&cid=C0EN96KUY

All related code will land within the same single release


#### Deprecation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For deprecating the .Host field we should probably be reaching out to SIG projects that use it, there is at least one which appears to be used in production clusters (kops):

https://cs.k8s.io/?q=HTTPGetAction&i=fosho&literal=nope&files=&excludeFiles=%5Evendor&repos=kubernetes/kops

https://github.com/kubernetes/kops/blob/5dd2f468b46fda43f3a63ba1e6dc7c55c21919eb/nodeup/pkg/model/kube_apiserver.go#L603

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So they're deploying the apiserver as a host-network pod and specifying to use host: 127.0.0.1 for the probes, presumably because there may be a firewall blocking access to the healthz port on the public node IP. That seems reasonable. (Ah, actually, there are examples of this in k/k/cluster/ too.)

We could have the semantics of the deprecation/PSA be "you aren't allowed to specify host except in the case of specifying 127.0.0.1 or ::1 for a hostNetwork: true pod...

Copy link
Contributor Author

@tssurya tssurya Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For deprecating the .Host field we should probably be reaching out to SIG projects that use it, 

that's a good idea..

We could have the semantics of the deprecation/PSA be "you aren't allowed to specify host except in the case of specifying 127.0.0.1 or ::1 for a hostNetwork: true pod...

hmm we could but its still a bit mucky, i.e we either deprecate the field or we don't right? Is it OK to deprecate with exceptions :D

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm honestly not sure what the most reasonable option is here. cc @aojea @thockin @liggitt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecation doesn't mean broken and doesn't mean removed. My understanding of this change is

  1. the meaning of the .Host field is remaining the same
  2. the resource validation on the .Host field is remaining the same
  3. PodSecurityAdmission with both version locked and intent driven evaluation modes will enforce a more accurate restriction based on the actual impact of the field. Version locked namespaces have no change. Intent-based namespaces (latest) will get the most accurate enforcement

I don't see documenting the field as deprecated as critical to this change. I'm fine with this KEP without any documentation changes to the field. Marking it deprecated seems more appropriate to do in the KEP that is adding its replacement once it reaches stable.

I suggest making this part optional in the KEP and removing from goals. The documentation should still be updated to explain that usage is privileged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack I think for now, as mentioned by @deads2k I'm removing the deprecation from the enhancement scope.

@tssurya
Copy link
Contributor Author

tssurya commented Jun 7, 2025

thanks for the reviews (just got back from PTO this week, will address these comments asap.

@tssurya
Copy link
Contributor Author

tssurya commented Jun 10, 2025

I've resolved most of the comments from @BenTheElder and @liggitt : Diff for the new pushes can be seen here: https://github.com/kubernetes/enhancements/compare/8d4b17b02bb6725ebf8620657f208216c62e7627..02514fb6fb222958dbd7481cb44a5768dc74d1fe so that its easier to see what changed.

Only remaining one is - I think #4942 (comment) is a good point, however I'm not sure if deprecating with exception like that is common practice?

Comment on lines 265 to 267
###### How can a rollout or rollback fail? Can it impact already running workloads?

N/A
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fill this out with the impact to a workload resource (deployment) trying to create new pods with a PSA level set to latest after an upgrade.

Copy link
Contributor Author

@tssurya tssurya Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this to include this info.. on second thoughts I have mentioned this in L246 which probably now needs a tweak? cause on upgrades all pods get recreated so if encountered with the PSA set they would fail? -> updated that as well PTAL. cause existing pods if drained and brought up again during upgrades if PSA is enabled then they won't come up.. so my current sentence was a bit mis-leading

@tssurya
Copy link
Contributor Author

tssurya commented Jun 17, 2025

@deads2k I've addressed the comments, thanks for taking the time to review
See the diff here: https://github.com/kubernetes/enhancements/compare/02514fb6fb222958dbd7481cb44a5768dc74d1fe..c5130fb0298befbda391e0369ebd9884be9f04bb for easiness to see what changed with latest.

Copy link
Contributor

@deads2k deads2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the bit about deprecation and do it in the KEP adding the replacement. Keep the doc updates. This lgtm for PRR and sig.

All related code will land within the same single release


#### Deprecation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecation doesn't mean broken and doesn't mean removed. My understanding of this change is

  1. the meaning of the .Host field is remaining the same
  2. the resource validation on the .Host field is remaining the same
  3. PodSecurityAdmission with both version locked and intent driven evaluation modes will enforce a more accurate restriction based on the actual impact of the field. Version locked namespaces have no change. Intent-based namespaces (latest) will get the most accurate enforcement

I don't see documenting the field as deprecated as critical to this change. I'm fine with this KEP without any documentation changes to the field. Marking it deprecated seems more appropriate to do in the KEP that is adding its replacement once it reaches stable.

I suggest making this part optional in the KEP and removing from goals. The documentation should still be updated to explain that usage is privileged.

@tssurya
Copy link
Contributor Author

tssurya commented Jun 17, 2025

@deads2k
Copy link
Contributor

deads2k commented Jun 17, 2025

lgtm for both the sig and PRR. I see Jordan's lgtm above too

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 17, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, deads2k, tssurya

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 17, 2025
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 18, 2025
@deads2k
Copy link
Contributor

deads2k commented Jun 18, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 18, 2025
@k8s-ci-robot k8s-ci-robot merged commit 849e2af into kubernetes:master Jun 18, 2025
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Jun 18, 2025
@github-project-automation github-project-automation bot moved this from Backlog to Closed / Done in SIG Auth Jun 18, 2025
probes with the Host field set when using the (about to be deprecated) API.
This is implemented by [kubernetes PR 125271](https://github.com/kubernetes/kubernetes/pull/125271)
that does exactly that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops I forgot to actually write the localhost part that @BenTheElder and @danwinship pointed out here: #4942 (comment)
will do a small FUP PR to fix that up

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: Closed / Done
Development

Successfully merging this pull request may close these issues.

10 participants