
e2e/loadbalancer: added hairpin connection cases #1161


Open: wants to merge 1 commit into base: master


mtulio
Contributor

@mtulio mtulio commented Jun 16, 2025

What type of PR is this?

/kind bug
/kind failing-test

What this PR does / why we need it:

This PR implements the hairpin connection test cases and exposes an issue with NLBs using the internal scheme: the connection fails when the client tries to access a Service load balancer that is backed by a target hosted on the same node.

The hairpin connection failure occurs because the client IP preservation attribute is set to true (the default), and the Service does not provide an interface to prevent the issue.

The e2e test is set up to pass for now to avoid permanent failures in CI; the underlying problem is tracked in issue #1160.
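
To make the failure condition described above concrete, here is a minimal Go sketch of the predicate a hairpin test case might encode. It is not taken from the PR diff: the `connection` type, the `isHairpin` and `expectFailure` helpers, and the node names are all hypothetical, and the rule that only internal-scheme NLBs with client IP preservation are affected reflects the hypothesis stated in this description rather than verified behavior.

```go
package main

import "fmt"

// lbScheme mirrors the two NLB schemes discussed in this PR.
type lbScheme string

const (
	schemeInternal       lbScheme = "internal"
	schemeInternetFacing lbScheme = "internet-facing"
)

// connection is a hypothetical description of one client-to-LB attempt.
type connection struct {
	clientNode       string   // node running the client pod
	backendNode      string   // node the NLB forwards the flow to
	scheme           lbScheme // NLB scheme
	preserveClientIP bool     // target group client IP preservation
}

// isHairpin reports whether the client and the selected backend share a node.
func isHairpin(c connection) bool {
	return c.clientNode != "" && c.clientNode == c.backendNode
}

// expectFailure encodes the assumption from the PR description: a hairpin
// flow through an internal NLB fails while client IP preservation is on.
func expectFailure(c connection) bool {
	return isHairpin(c) && c.scheme == schemeInternal && c.preserveClientIP
}

func main() {
	cases := []connection{
		{clientNode: "node-a", backendNode: "node-a", scheme: schemeInternal, preserveClientIP: true},
		{clientNode: "node-a", backendNode: "node-b", scheme: schemeInternal, preserveClientIP: true},
		{clientNode: "node-a", backendNode: "node-a", scheme: schemeInternetFacing, preserveClientIP: true},
	}
	for _, c := range cases {
		fmt.Printf("client=%s backend=%s scheme=%s preserve=%t -> expect failure: %t\n",
			c.clientNode, c.backendNode, c.scheme, c.preserveClientIP, expectFailure(c))
	}
}
```

Keeping the condition in one small function would let a test assert the currently observed behavior now and flip the expectation in a single place once #1160 is addressed.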

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

These tests are important for increasing coverage of scenarios that the CCM declares as supported.

I also believe we can remove the hairpin cases for LBs with the internet-facing (public) scheme: the traffic would traverse a VPC gateway (IGW/NGW), which masquerades the real source IP, so those cases would not reproduce the problem we are trying to expose in #1160. Thoughts?

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 16, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hakman for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jun 16, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 16, 2025
@k8s-ci-robot
Contributor

Hi @mtulio. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 16, 2025
@mtulio
Contributor Author

mtulio commented Jun 16, 2025

Hi @kmala, would you mind stamping ok-to-test for these new jobs, please?

I also have a question in the description about the public cases. I used them at the beginning of my exploration, but it looks like we don't need them. LMK WDYT. Thanks

cc @elmiko

@mtulio mtulio changed the title from "e2e/loadbalancer: implement hairpin connection cases" to "e2e/loadbalancer: added hairpin connection cases" Jun 16, 2025
@kmala
Member

kmala commented Jun 16, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 16, 2025
@mtulio
Contributor Author

mtulio commented Jun 16, 2025

@mtulio: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cloud-provider-aws-e2e | 24a0041 | link | true | /test pull-cloud-provider-aws-e2e |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

CI infra issue.

/test pull-cloud-provider-aws-e2e

@mtulio mtulio force-pushed the e2e-hairpin branch 2 times, most recently from a623f3d to 5099f7f Compare June 16, 2025 16:07
@mtulio
Contributor Author

mtulio commented Jun 16, 2025

I am observing a permanent failure in CI: launching the cluster fails because it tries to use an image that is no longer available:
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/cloud-provider-aws/1161/pull-cloud-provider-aws-e2e/1934643980113285120#1:build-log.txt%3A751

s invalid: could not find Image for "099720109477/ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20250502.1"

Hi @kmala @elmiko, do you know whether this comes from the test framework, or whether it is possible to use a valid image in the CCM repo?

@mtulio
Contributor Author

mtulio commented Jun 17, 2025

An issue has been opened to track the CI problem: #1167

@mtulio
Contributor Author

mtulio commented Jun 18, 2025

/assign @elmiko @kmala

@mtulio
Contributor Author

mtulio commented Jun 18, 2025

Looks like pull-cloud-provider-aws-e2e has been running (and stuck) for the last 48h. I just stopped it and am trying to run it again:

/test pull-cloud-provider-aws-e2e

@mtulio
Contributor Author

mtulio commented Jun 19, 2025

/test pull-cloud-provider-aws-e2e-kubetest2

@mtulio
Contributor Author

mtulio commented Jun 20, 2025

Looks like #1167 has been resolved. I manually stopped the running job (42h+) and am triggering it again:

/test pull-cloud-provider-aws-e2e

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.