
CI jobs for AKS & EKS are failing #1572

Closed
antoninbas opened this issue Nov 18, 2020 · 4 comments · Fixed by #1575 or #1585
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@antoninbas
Contributor

Describe the bug

The AKS and EKS jobs have both been failing for about a week.

I am creating a single issue because, even though I have not investigated the failures yet, the jobs started failing at the same time, which suggests a common root cause.

Versions:

  • Antrea version: ToT and probably 0.11
@antoninbas antoninbas added kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Nov 18, 2020
@Dyanngg
Contributor

Dyanngg commented Nov 18, 2020

Triggered a manual run and saw this in the Sonobuoy logs:

ubuntu@antrea-cloud-ci-vm:~$ sonobuoy logs
namespace="sonobuoy" pod="sonobuoy" container="kube-sonobuoy"
time="2020-11-18T22:04:29Z" level=info msg="Scanning plugins in ./plugins.d (pwd: /)"
time="2020-11-18T22:04:29Z" level=info msg="Scanning plugins in /etc/sonobuoy/plugins.d (pwd: /)"
time="2020-11-18T22:04:29Z" level=info msg="Directory (/etc/sonobuoy/plugins.d) does not exist"
time="2020-11-18T22:04:29Z" level=info msg="Scanning plugins in ~/sonobuoy/plugins.d (pwd: /)"
time="2020-11-18T22:04:29Z" level=info msg="Directory (~/sonobuoy/plugins.d) does not exist"
time="2020-11-18T22:04:59Z" level=error msg="could not get api group resources: Get \"https://10.100.0.1:443/api?timeout=32s\": dial tcp 10.100.0.1:443: i/o timeout"
time="2020-11-18T22:04:59Z" level=info msg="no-exit was specified, sonobuoy is now blocking"

@antoninbas
Contributor Author

Re-opening as the tests are still failing on both AKS & EKS.

@antoninbas antoninbas reopened this Nov 19, 2020
@antoninbas antoninbas assigned antoninbas and unassigned Dyanngg Nov 19, 2020
@antoninbas
Contributor Author

One finding is that the EKS CNI introspection API is no longer available, so the node-init script in https://github.com/vmware-tanzu/antrea/blob/master/build/yamls/antrea-eks-node-init.yml is broken.

We can use the contents of /var/run/aws-node/ipam.json to decide which containers to restart.

[ec2-user@ip-192-168-23-17 ~]$ sudo cat /var/run/aws-node/ipam.json
{"version":"vpc-cni-ipam/1","allocations":[{"networkName":"aws-cni","containerID":"720492b284d1193a485455f4a6a28e3df2ac56ea67de802daee99940b20a676f","ifName":"eth0","ipv4":"192.168.15.180"}]}

See awslabs/amazon-eks-ami#487

I will address this first.

@antoninbas
Contributor Author

The second issue is that AntreaProxy is not enabled in policyOnlyMode: https://github.com/vmware-tanzu/antrea/blob/3db240dab7a758f0199ada0eb333b0cf8fb442f2/cmd/antrea-agent/agent.go#L185-L192

In that mode, nodeConfig.PodIPv4CIDR and nodeConfig.PodIPv6CIDR are not set...

antoninbas added a commit to antoninbas/antrea that referenced this issue Nov 20, 2020
policyOnlyMode was broken since adding support for IPv6 clusters in the
code base. This is because the code used the Node's PodCIDR(s) to
determine which address family was supported, which doesn't work in
policyOnlyMode, for which IPAM is not the responsibility of Antrea.

Instead we now use the following rules:
* if there is an IPv4 PodCIDR for the Node, then v4 is supported
* otherwise, if policyOnlyMode is used and the Node's IP address
  (primary, as reported by K8s) is an IPv4 address then v4 is supported
Same rules for v6.

This may not work for dual-stack, but IIRC none of the cloud services
support IPv6 / dual-stack. There are also other issues for dual-stack
support in Antrea, which themselves depend on upstream issues.

Additionally, we include the following changes:
* the build/yamls/antrea-eks-node-init.yml manifest is updated to
  account for a breaking change in the AWS CNI introspection API.
* we pin the K8s conformance test image for all clouds to v1.18.5 to
  avoid issues with recently-added conformance tests.

Fixes antrea-io#1572
antoninbas added a commit to antoninbas/antrea that referenced this issue Nov 20, 2020 (same commit message)
antoninbas added a commit to antoninbas/antrea that referenced this issue Nov 20, 2020 (same commit message)
antoninbas added a commit that referenced this issue Nov 20, 2020 (same commit message)
antoninbas added a commit to antoninbas/antrea that referenced this issue Nov 20, 2020 (same commit message)
antoninbas added a commit that referenced this issue Nov 21, 2020 (same commit message)
antoninbas added a commit that referenced this issue Dec 23, 2020 (same commit message)