Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bgpv1: Don't use net package for addressing #25313

Merged

Conversation

YutaroHayakawa
Copy link
Member

@YutaroHayakawa YutaroHayakawa commented May 8, 2023

We're now discouraging using net package for addressing. Convert all
net.IP and net.IPNet to netip.Addr and netip.Prefix. During the
conversion, routerID resolution code is duplicated to two places. As a
part of the cleanup, I consolidated them into the single function.

Related: #24246

bgpv1: Don't use net package for addressing

@YutaroHayakawa YutaroHayakawa added the release-note/misc This PR makes changes that have no direct user impact. label May 8, 2023
@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/bgp-cplane/use-netip-prefix branch 4 times, most recently from a3d2752 to 041573b Compare May 8, 2023 14:02
@YutaroHayakawa YutaroHayakawa marked this pull request as ready for review May 8, 2023 14:33
@YutaroHayakawa YutaroHayakawa requested a review from a team as a code owner May 8, 2023 14:33
Copy link
Contributor

@harsimran-pabla harsimran-pabla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much awaited net/netip, amazing to see this change!

@YutaroHayakawa
Copy link
Member Author

YutaroHayakawa commented May 9, 2023

/test

Job 'Cilium-PR-K8s-1.25-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathServicesTest Checks E/W loadbalancing (ClusterIP, NodePort from inside cluster, etc) with L7 policy Tests NodePort with L7 Policy

Failure Output

FAIL: Request from testclient-rlzkz pod to service tftp://[fd04::11]:30278/hello failed

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.25-kernel-4.19/2049/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.25-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

@YutaroHayakawa
Copy link
Member Author

k8s-1.25-kernel-4.19: #25096
runtime: Looks like it checkouts old commit. I guess it didn't trigger in the last push? Let me manually rerun.

@YutaroHayakawa
Copy link
Member Author

/test-1.25-4.19

@YutaroHayakawa
Copy link
Member Author

/test-runtime

Copy link
Member

@dylandreimerink dylandreimerink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not know about this package until now, this is very nice!

@YutaroHayakawa
Copy link
Member Author

/test-runtime

@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/bgp-cplane/use-netip-prefix branch from 041573b to a50aca8 Compare May 9, 2023 14:38
@YutaroHayakawa
Copy link
Member Author

Fixed minor grammar mistakes in the commit message. Also wanted to try triggering the test-runtime with the right commit again.

@YutaroHayakawa
Copy link
Member Author

/test

@YutaroHayakawa
Copy link
Member Author

Very weird... GHA now started to pick older commits for other workflows: https://github.com/cilium/cilium/actions/runs/4927081052/jobs/8803538691?pr=25313

@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/bgp-cplane/use-netip-prefix branch from a50aca8 to 7a010ec Compare May 9, 2023 14:51
@YutaroHayakawa
Copy link
Member Author

Ok, I have no luck today. I don't know if this is due to the ongoing GH incident, but I'll retry tomorrow.

@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/bgp-cplane/use-netip-prefix branch from 7a010ec to c735864 Compare May 9, 2023 23:51
@YutaroHayakawa
Copy link
Member Author

YutaroHayakawa commented May 10, 2023

/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Unexpected missed tail calls

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/2186/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

Then please upload the Jenkins artifacts to that issue.

@YutaroHayakawa
Copy link
Member Author

Let me close and open this PR. Some workflow keeps grabbing the old commits.

@YutaroHayakawa
Copy link
Member Author

YutaroHayakawa commented May 12, 2023

Very weird... Seems like we cannot recover from here.

スクリーンショット 2023-05-12 17 09 36

@YutaroHayakawa
Copy link
Member Author

/test

@YutaroHayakawa
Copy link
Member Author

YutaroHayakawa commented May 15, 2023

Go-related checks / lint: https://github.com/cilium/cilium/actions/runs/4956257837/jobs/8866490366?pr=25313

  • GH got mad after the outage, and it no longer checkouts the correct latest changes

ClusterMesh: https://github.com/cilium/cilium/actions/runs/4956670271

  • 1.1.1.1 test is failing, which is known to be unstable

Travis: https://github.com/cilium/cilium/pull/25313/checks?check_run_id=13426991588

  • GH got mad after the outage, and it no longer checkouts the correct latest changes

k8s-1.26-kernel-net-next: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/2244/

  • Failing on the VM boot up.

runtime: https://jenkins.cilium.io/job/Cilium-PR-Runtime-net-next/6638/

  • GH got mad after the outage, and it no longer checkouts the correct latest changes

@YutaroHayakawa
Copy link
Member Author

/ci-multicluster

@YutaroHayakawa
Copy link
Member Author

/test-1.26-net-next

@YutaroHayakawa
Copy link
Member Author

ClusterMesh: Filed a new issue #25442

@YutaroHayakawa
Copy link
Member Author

/ci-multicluster

@YutaroHayakawa
Copy link
Member Author

For golang lint and precheck, I run these tests locally and they are passing.

$ git checkout yutaro/bgp-cplane/use-netip-prefix
$ make lint
docker run --rm -v `pwd`:/app -w /app docker.io/golangci/golangci-lint:v1.52.2@sha256:3d2f4240905054c7efa7f4e98ba145c12a16995bbc3e605300e21400a1665cb6 golangci-lint run

$ make precheck
contrib/scripts/check-logging-subsys-field.sh
contrib/scripts/check-fmt.sh
contrib/scripts/check-log-newlines.sh
contrib/scripts/check-test-tags.sh
contrib/scripts/check-assert-deep-equals.sh
contrib/scripts/lock-check.sh
contrib/scripts/check-viper.sh
contrib/scripts/custom-vet-check.sh
contrib/scripts/rand-check.sh

For runtime and Travis tests, I think it's safe to ignore them since they don't test the code path modified in this PR, and integration-test is passing.

@YutaroHayakawa YutaroHayakawa added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 15, 2023
@dylandreimerink
Copy link
Member

dylandreimerink commented May 15, 2023

The Travis test fails with the following error:

 GO     operator/cilium-operator
# github.com/cilium/cilium/pkg/bgpv1/manager
../pkg/bgpv1/manager/advertisements_reconciler.go:107:6: unknown field Net in struct literal of type "github.com/cilium/cilium/pkg/bgpv1/types".Advertisement
../pkg/bgpv1/manager/advertisements_reconciler.go:107:27: advrt.Net undefined (type "github.com/cilium/cilium/pkg/bgpv1/types".Advertisement has no field or method Net)
make[1]: *** [Makefile:14: cilium-agent] Error 1
make: *** [Makefile:59: daemon] Error 2

Same for the runtime test:

12:13:28      runtime: #15 72.06 CGO_ENABLED=0 go build -mod=vendor -ldflags '-X "github.com/cilium/cilium/pkg/version.ciliumVersion=1.14.0-dev 7e26f334d8 2023-05-12T08:30:47+00:00" -s -w -X "github.com/cilium/cilium/pkg/proxy.requiredEnvoyVersionSHA=4350471813b173839df78f7a1ea5d77b5cdf714b" -X "github.com/cilium/cilium/pkg/datapath/loader.DatapathSHA256=813a4a12226bba103827831c800082a316d36275abd90f07d726b23097b45bbb" ' -tags=osusergo,lockdebug  -o cilium-agent
12:13:28      runtime: #15 85.86 # github.com/cilium/cilium/pkg/bgpv1/manager
12:13:28      runtime: #15 85.86 ../pkg/bgpv1/manager/advertisements_reconciler.go:107:6: unknown field Net in struct literal of type "github.com/cilium/cilium/pkg/bgpv1/types".Advertisement
12:13:28      runtime: #15 85.86 ../pkg/bgpv1/manager/advertisements_reconciler.go:107:27: advrt.Net undefined (type "github.com/cilium/cilium/pkg/bgpv1/types".Advertisement has no field or method Net)
12:13:28      runtime: #15 96.14 make[1]: Leaving directory '/go/src/github.com/cilium/cilium/daemon'
12:13:28      runtime: #15 96.14 make[1]: *** [Makefile:14: cilium-agent] Error 1
12:13:28      runtime: #15 96.14 make: *** [Makefile:56: build-container] Error 2

That does not seem like a flake :(

@YutaroHayakawa
Copy link
Member Author

YutaroHayakawa commented May 15, 2023

@dylandreimerink That is a GH issue. It picks an old commit, as I mentioned in this comment (#25313 (comment)). In the latest commit, it is fixed.

https://github.com/cilium/cilium/pull/25313/files#diff-0bf546cea985047b084104fb137a2205d0060842e36f92bf5660ecbbe51cfe88L102-R107

If it still has that issue, every workflow compiles the source should also fail, but they don't.

@YutaroHayakawa FYI - I triggered rebase from the GH UI.

@aditighag
Copy link
Member

Screen Shot 2023-05-15 at 3 58 54 PM

Here is the snapshot of the test runs. Let's rebase and re-run the failing Travis and runtime tests. For go linters, Yutaro ran the tests locally.
We can fast track this PR without running the full CI once the basic test runs are completed.

@aditighag aditighag force-pushed the yutaro/bgp-cplane/use-netip-prefix branch from c735864 to 3019115 Compare May 15, 2023 23:00
@aditighag
Copy link
Member

@YutaroHayakawa There are still failures after rebase. PTAL. I'm removing the ready-to-merge label for now. Feel free to add it back with notes in case there are failures.

@aditighag aditighag removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 15, 2023
@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/bgp-cplane/use-netip-prefix branch 2 times, most recently from c735864 to f86c8c1 Compare May 16, 2023 01:25
We're now discouraging using net package for addressing. Convert all
net.IP and net.IPNet to netip.Addr and netip.Prefix. During the
conversion, I found routerID resolution code is duplicated to two
places. As a part of the cleanup, I consolidated them into the single
function.

Related: cilium#24246

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
@YutaroHayakawa YutaroHayakawa force-pushed the yutaro/bgp-cplane/use-netip-prefix branch from f86c8c1 to 6b5f1e9 Compare May 16, 2023 01:30
@YutaroHayakawa
Copy link
Member Author

YutaroHayakawa commented May 16, 2023

Ashame, I noticed I was stupid 🤦 I kept saying the failure was due to the GitHub issue, but it wasn't. The CI noticed line 107 of the advertisement_reconsiler.go has an issue that uses Advertisement.Net, which is renamed in this PR, but in the preview of this PR, it looks like this, and we don't have such a problem. So. I thought GitHub picked up the old commit, which had a similar issue (I forgot to rename some Net fields in the old commit).

スクリーンショット 2023-05-16 10 20 06

However, what I was missing was the CI rebases my changes on top of the latest main, and the latest main recently got a change in line 107 (a0254cd#diff-0bf546cea985047b084104fb137a2205d0060842e36f92bf5660ecbbe51cfe88R107).

My local branch was not rebased on top of the latest main. That's why retry never helped. Now I pushed the fix on top of the latest main, and now the CI recovered.

My bad 🤦 thanks @aditighag @dylandreimerink for stopping before I break the tree 🙇

@YutaroHayakawa
Copy link
Member Author

YutaroHayakawa commented May 16, 2023

/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Click to show.

Test Name

K8sDatapathServicesTest Checks N/S loadbalancing With host policy Tests NodePort

Failure Output

FAIL: Can not connect to service "tftp://192.168.56.12:32112/hello" from outside cluster (1/10)

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/2311/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

Then please upload the Jenkins artifacts to that issue.

@YutaroHayakawa
Copy link
Member Author

@YutaroHayakawa
Copy link
Member Author

YutaroHayakawa commented May 16, 2023

/test-1.26-net-next

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Click to show.

Test Name

K8sDatapathServicesTest Checks N/S loadbalancing With host policy Tests NodePort

Failure Output

FAIL: Can not connect to service "tftp://[fd04::11]:31279/hello" from outside cluster (1/10)

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/2313/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

Then please upload the Jenkins artifacts to that issue.

@YutaroHayakawa
Copy link
Member Author

k8s-1.26-kernel-net-next: Again failing with #25411

@YutaroHayakawa
Copy link
Member Author

/test-1.26-net-next

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 16, 2023
@aditighag aditighag merged commit a555148 into cilium:main May 16, 2023
58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/bgp ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants