Validate guides for v1.13 release #23051

Closed
13 of 20 tasks
christarazi opened this issue Jan 11, 2023 · 38 comments
Labels
area/documentation Impacts the documentation, including textual changes, sphinx, or other doc generation code. release-blocker/1.13 This issue will prevent the release of the next version of Cilium.

Comments

@christarazi
Member

christarazi commented Jan 11, 2023

Below is a set of platforms, each with semi-randomly picked feature "getting started guide" links. The goal is to test each platform with at least one of the getting started guides. A secondary goal is to expose more people to the system.

Please test using https://docs.cilium.io/en/v1.13.0-rc4. For quick install guides, make sure to pass the version argument to the Cilium CLI to install the correct version.

Also take note of how much time the Quick Install took and how much time testing the feature itself took. Ideally the Quick Install should take < 15 minutes.
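For example, a minimal way to time the Quick Install while pinning the release candidate (a rough sketch; adjust to your environment):

$ time -p sh -c 'cilium install --version v1.13.0-rc4; cilium status --wait'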

Deadline: TBD

EKS (@lizrice)

GKE (@lizrice)

K3s (@tracypholmes)

Kind (@thebsdbox)

RKE (@raphink)

AKS BYOCNI (@jspaleta)

AKS Azure IPAM (@tommyp1ckles)

Openshift (@cmluciano)

@christarazi added the area/documentation and release-blocker/1.13 labels Jan 11, 2023
@thebsdbox
Contributor

Kind Quick Install - 163 seconds

Terminal Output

dan@kind:~$ time -p sh -c 'kind create cluster --config=kind-config.yaml; cilium install --version v1.13.0-rc4; cilium status --wait'
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.25.3) 🖼
 ✓ Preparing nodes 📦 📦 📦 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing StorageClass 💾
 ✓ Joining worker nodes 🚜
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a nice day! 👋
🔮 Auto-detected Kubernetes kind: kind
✨ Running "kind" validation checks
✅ Detected kind version "0.17.0"
ℹ️  Using Cilium version 1.13.0-rc4
🔮 Auto-detected cluster name: kind-kind
🔮 Auto-detected datapath mode: tunnel
🔮 Auto-detected kube-proxy has been installed
ℹ️  helm template --namespace kube-system cilium cilium/cilium --version 1.13.0-rc4 --set cluster.id=0,cluster.name=kind-kind,encryption.nodeEncryption=false,ipam.mode=kubernetes,kubeProxyReplacement=disabled,operator.replicas=1,serviceAccounts.cilium.name=cilium,serviceAccounts.operator.name=cilium-operator,tunnel=vxlan
ℹ️  Storing helm values file in kube-system/cilium-cli-helm-values Secret
🔑 Created CA in secret cilium-ca
🔑 Generating certificates for Hubble...
🚀 Creating Service accounts...
🚀 Creating Cluster roles...
🚀 Creating ConfigMap for Cilium version 1.13.0-rc4...
🚀 Creating Agent DaemonSet...
🚀 Creating Operator Deployment...
⌛ Waiting for Cilium to be installed and ready...
✅ Cilium was successfully installed! Run 'cilium status' to view installation health
    /¯¯\
 /¯¯\__/¯¯\    Cilium:         OK
 \__/¯¯\__/    Operator:       OK
 /¯¯\__/¯¯\    Hubble:         disabled
 \__/¯¯\__/    ClusterMesh:    disabled
    \__/

DaemonSet         cilium             Desired: 4, Ready: 4/4, Available: 4/4
Deployment        cilium-operator    Desired: 1, Ready: 1/1, Available: 1/1
Containers:       cilium             Running: 4
                  cilium-operator    Running: 1
Cluster Pods:     3/3 managed by Cilium
Image versions    cilium             quay.io/cilium/cilium:v1.13.0-rc4@sha256:32acd47fd9bea9c0045222ba5d27f5fe9ad06dabd572a80b870b1f0e68c0e928: 4
                  cilium-operator    quay.io/cilium/operator-generic:v1.13.0-rc4@sha256:19f612d4f1052e26edf33e26f60d64d8fb6caed9f03692b85b429a4ef5d175b2: 1
real 163.30
user 4.05
sys 1.70

@thebsdbox
Contributor

Kind HTTP Policy - 4 mins
(URL is wrong in the issue, https://docs.cilium.io/en/v1.13.0-rc4/security/http/#identity-aware-and-http-aware-policy-enforcement)

Terminal Output

dan@kind:~$ date
Thu Jan 12 13:52:20 UTC 2023
dan@kind:~$ kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.13.0-rc4/examples/minikube/http-sw-app.yaml
service/deathstar created
deployment.apps/deathstar created
pod/tiefighter created
pod/xwing created
dan@kind:~$ kubectl get pods,svc
NAME                             READY   STATUS              RESTARTS   AGE
pod/deathstar-54bb8475cc-qg9s9   0/1     ContainerCreating   0          8s
pod/deathstar-54bb8475cc-wkr5m   1/1     Running             0          8s
pod/tiefighter                   1/1     Running             0          8s
pod/xwing                        0/1     ContainerCreating   0          8s

NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/deathstar    ClusterIP   10.96.24.133   <none>        80/TCP    8s
service/kubernetes   ClusterIP   10.96.0.1      <none>        443/TCP   8m12s
dan@kind:~$ kubectl -n kube-system get pods -l k8s-app=cilium
kubectl -n kube-system exec cilium-1c2cz -- cilium endpoint list
NAME           READY   STATUS    RESTARTS   AGE
cilium-6vrsz   1/1     Running   0          8m3s
cilium-nxk7b   1/1     Running   0          8m3s
cilium-t8bv7   1/1     Running   0          8m3s
cilium-xbp2m   1/1     Running   0          8m3s
Error from server (NotFound): pods "cilium-1c2cz" not found
dan@kind:~$ kubectl -n kube-system exec cilium-6vrsz -- cilium endpoint list
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                                   IPv6   IPv4           STATUS
           ENFORCEMENT        ENFORCEMENT
387        Disabled           Disabled          1          k8s:node-role.kubernetes.io/control-plane                                           ready
                                                           k8s:node.kubernetes.io/exclude-from-external-load-balancers
                                                           reserved:host
1409       Disabled           Disabled          4          reserved:health                                                      10.244.0.230   ready
dan@kind:~$ $ kubectl exec xwing -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
-bash: $: command not found
dan@kind:~$  kubectl exec xwing -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
Ship landed
dan@kind:~$ kubectl exec tiefighter -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
Ship landed
dan@kind:~$ kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.13.0-rc4/examples/minikube/sw_l3_l4_policy.yaml
ciliumnetworkpolicy.cilium.io/rule1 created
dan@kind:~$ kubectl exec tiefighter -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
Ship landed
dan@kind:~$ kubectl exec xwing -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing


^C
dan@kind:~$
dan@kind:~$ kubectl -n kube-system exec cilium-6vrsz -- cilium endpoint list
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                                   IPv6   IPv4           STATUS
           ENFORCEMENT        ENFORCEMENT
387        Disabled           Disabled          1          k8s:node-role.kubernetes.io/control-plane                                           ready
                                                           k8s:node.kubernetes.io/exclude-from-external-load-balancers
                                                           reserved:host
1409       Disabled           Disabled          4          reserved:health                                                      10.244.0.230   ready
dan@kind:~$ kubectl -n kube-system exec cilium-nxk7b -- cilium endpoint list
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                                              IPv6   IPv4           STATUS
           ENFORCEMENT        ENFORCEMENT
617        Disabled           Disabled          1          reserved:host                                                                                  ready
692        Disabled           Disabled          2986       k8s:app.kubernetes.io/name=xwing                                                10.244.1.47    ready
                                                           k8s:class=xwing
                                                           k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default
                                                           k8s:io.cilium.k8s.policy.cluster=kind-kind
                                                           k8s:io.cilium.k8s.policy.serviceaccount=default
                                                           k8s:io.kubernetes.pod.namespace=default
                                                           k8s:org=alliance
1250       Enabled            Disabled          17862      k8s:app.kubernetes.io/name=deathstar                                            10.244.1.165   ready
                                                           k8s:class=deathstar
                                                           k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default
                                                           k8s:io.cilium.k8s.policy.cluster=kind-kind
                                                           k8s:io.cilium.k8s.policy.serviceaccount=default
                                                           k8s:io.kubernetes.pod.namespace=default
                                                           k8s:org=empire
1474       Disabled           Disabled          4          reserved:health                                                                 10.244.1.188   ready
dan@kind:~$ kubectl exec tiefighter -- curl -s -XPUT deathstar.default.svc.cluster.local/v1/exhaust-port
Panic: deathstar exploded

goroutine 1 [running]:
main.HandleGarbage(0x2080c3f50, 0x2, 0x4, 0x425c0, 0x5, 0xa)
        /code/src/github.com/empire/deathstar/
        temp/main.go:9 +0x64
main.main()
        /code/src/github.com/empire/deathstar/
        temp/main.go:5 +0x85
dan@kind:~$ kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.13.0-rc4/examples/minikube/sw_l3_l4_l7_policy.yaml
Warning: resource ciliumnetworkpolicies/rule1 is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
ciliumnetworkpolicy.cilium.io/rule1 configured
dan@kind:~$ kubectl exec tiefighter -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing
Ship landed
dan@kind:~$ kubectl exec tiefighter -- curl -s -XPUT deathstar.default.svc.cluster.local/v1/exhaust-port
Access denied
dan@kind:~$ kubectl exec xwing -- curl -s -XPOST deathstar.default.svc.cluster.local/v1/request-landing


^C
dan@kind:~$ date
Thu Jan 12 13:56:11 UTC 2023
dan@kind:~$

@tracypholmes

Will grab the K3s tasks and update once completed.

@lizrice
Member

lizrice commented Jan 13, 2023

EKS Quick Install - under 3 minutes once the cluster is created (which takes a lifetime)

EKS IPSec 😭

  1. URL is wrong in the issue, I am following this page

  2. Can't run cilium install as described in Enable Encryption In Cilium because it's already installed. Needs a note to say to uninstall existing installation first?

Terminal Output
temp ❯ cilium install --encryption ipsec
🔮 Auto-detected Kubernetes kind: EKS
ℹ️  Using Cilium version 1.12.5
🔮 Auto-detected cluster name: liz-25894-eu-west-1-eksctl-io
🔮 Auto-detected datapath mode: aws-eni
🔮 Auto-detected kube-proxy has been installed
ℹ️  helm template --namespace kube-system cilium cilium/cilium --version 1.12.5 --set cluster.id=0,cluster.name=liz-25894-eu-west-1-eksctl-io,egressMasqueradeInterfaces=eth0,encryption.enabled=true,encryption.nodeEncryption=false,encryption.type=ipsec,eni.enabled=true,ipam.mode=eni,kubeProxyReplacement=disabled,nodeinit.enabled=true,operator.replicas=1,serviceAccounts.cilium.name=cilium,serviceAccounts.operator.name=cilium-operator,tunnel=disabled
ℹ️  Storing helm values file in kube-system/cilium-cli-helm-values Secret
🚀 Creating ConfigMap for Cilium version 1.12.5...
🔥 Patching the "aws-node" DaemonSet to evict its pods...
🔑 Found CA in secret cilium-ca
🔑 Generating certificates for Hubble...
↩️ Rolling back installation...

Error: Unable to install Cilium: unable to create secret kube-system/hubble-server-certs: secrets "hubble-server-certs" already exists
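A possible workaround until the CLI handles this case (an untested sketch): uninstall the existing installation first, then re-run the install with encryption enabled and the version pinned:

$ cilium uninstall
$ cilium install --version v1.13.0-rc4 --encryption ipsec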
  3. IPsec appears to be enabled, but following the guide does not show traffic being encrypted. From the Troubleshooting section, KVstore connectivity may be the culprit? It's showing as "KVStore Ok Disabled" - if that's what it is, the guide needs to say how to set this up.
Status
root@ip-192-168-171-21:/home/cilium# cilium status
KVStore:                 Ok   Disabled
Kubernetes:              Ok   1.24+ (v1.24.8-eks-ffeb93d) [linux/amd64]
Kubernetes APIs:         ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:    Disabled
Host firewall:           Disabled
CNI Chaining:            none
Cilium:                  Ok   1.12.5 (v1.12.5-701acde)
NodeMonitor:             Listening for events on 2 CPUs with 64x4096 of shared memory
Cilium health daemon:    Ok
IPAM:                    IPv4: 4/18 allocated,
BandwidthManager:        Disabled
Host Routing:            Legacy
Masquerading:            IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status:       29/29 healthy
Proxy Status:            OK, ip 192.168.189.109, 0 redirects active on ports 10000-20000
Global Identity Range:   min 256, max 65535
Hubble:                  Ok   Current/Max Flows: 4095/4095 (100.00%), Flows/s: 56.82   Metrics: Disabled
Encryption:              IPsec
Cluster health:          2/2 reachable   (2023-01-13T18:49:43Z)
tcpdump not showing encrypted traffic
root@ip-192-168-171-21:/home/cilium# tcpdump -n -i eth0 esp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
[...nothing...]

but lots of traffic shown if esp not specified

root@ip-192-168-171-21:/home/cilium# tcpdump -n -i eth0
...
19:24:25.723022 IP 192.168.171.21.10250 > 192.168.175.207.56308: Flags [P.], seq 188911:189610, ack 53, win 473, options [nop,nop,TS val 3916837314 ecr 3000040077], length 699
19:24:25.723099 IP 192.168.175.207.56308 > 192.168.171.21.10250: Flags [.], ack 188911, win 2985, options [nop,nop,TS val 3000040077 ecr 3916837314], length 0
19:24:25.723193 IP 192.168.171.21.10250 > 192.168.175.207.56308: Flags [P.], seq 189610:189994, ack 53, win 473, options [nop,nop,TS val 3916837314 ecr 3000040077], length 384
19:24:25.723316 IP 192.168.175.207.56308 > 192.168.171.21.10250: Flags [.], ack 189994, win 2985, options [nop,nop,TS val 3000040078 ecr 3916837314], length 0
19:24:25.723356 IP 192.168.171.21.10250 > 192.168.175.207.56308: Flags [P.], seq 189994:190361, ack 53, win 473, options [nop,nop,TS val 3916837314 ecr 3000040078], length 367
19:24:25.723535 IP 192.168.171.21.10250 > 192.168.175.207.56308: Flags [P.], seq 190361:190391, ack 53, win 473, options [nop,nop,TS val 3916837314 ecr 3000040078], length 30
19:24:25.723574 IP 192.168.171.21.10250 > 192.168.175.207.56308: Flags [P.], seq 190391:190750, ack 53, win 473, options [nop,nop,TS val 3916837315 ecr 3000040078], length 359
19:24:25.723658 IP 192.168.175.207.56308 > 192.168.171.21.10250: Flags [.], ack 190391, win 2985, options [nop,nop,TS val 3000040078 ecr 3916837314], length 0
19:24:25.723765 IP 192.168.171.21.10250 > 192.168.175.207.56308: Flags [P.], seq 190750:190780, ack 53, win 473, options [nop,nop,TS val 3916837315 ecr 3000040078], length 30
19:24:25.723813 IP 192.168.171.21.10250 > 192.168.175.207.56308: Flags [P.], seq 190780:191155, ack 53, win 473, options [nop,nop,TS val 3916837315 ecr 3000040078], length 375
19:24:25.723936 IP 192.168.171.21.10250 > 192.168.175.207.56308: Flags [P.], seq 191155:191698, ack 53, win 473, options [nop,nop,TS val 3916837315 ecr 3000040078], length 543
19:24:25.723992 IP 192.168.175.207.56308 > 192.168.171.21.10250: Flags [P.], seq 53:76, ack 191155, win 2983, options [nop,nop,TS val 3000040078 ecr 3916837315], length 23
^C
1050 packets captured
1050 packets received by filter
0 packets dropped by kernel
  4. Under Validate the Setup the tcpdump command doesn't seem to match the output?

tcpdump -n -i eth0 esp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cilium_vxlan, link-type EN10MB (Ethernet), capture size 262144 bytes

@pchaigno
Member

Taking a look at the IPsec guide... cc @lizrice

URL is wrong in the issue, I am following this page

FYI, note you tested v1.12 and not v1.13-rc4. As per Chris's OP:

For quick install guides, make sure to pass the version argument to the Cilium CLI to install the correct version.

The issues you found probably also affect v1.13 though.


2. Can't run cilium install as described in Enable Encryption In Cilium because it's already installed. Needs a note to say to uninstall existing installation first?

We typically assume Cilium is uninstalled at the beginning of a guide. If we changed that guide to assume something else, we would need to change a lot of guides. Improving the CLI to not fail in this case is tracked at cilium/cilium-cli#205.


From the Troubleshooting section, KVstore connectivity may be the culprit? It's showing as "KVStore Ok Disabled" - if that's what it is, the guide needs to say how to set this up.

I don't think that matters (anymore?). I've sent #23135 to update the guide.

Under Validate the Setup the tcpdump command doesn't seem to match the output?

#23135 will fix that as well.


IPsec appears to be enabled, but following the guide does not show traffic being encrypted.

I was able to reproduce that. I had to wait a while before the tcpdump command showed anything. This seems to be caused by the buffer on stdout. I've also updated the guide in #23135 to account for that.
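If the stdout buffer is the cause, one way to see packets sooner while validating (a sketch; the actual wording in #23135 may differ) is to make tcpdump line-buffered with -l:

$ tcpdump -l -n -i eth0 esp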

@lizrice
Member

lizrice commented Jan 17, 2023

FYI, note you tested v1.12 and not v1.13-rc4

Ha, you're right, I forgot the argument when I had to uninstall and reinstall, sorry about that. Did it again, much improved by being able to run cilium encrypt status per the updates in #23135, thanks!

@jspaleta
Contributor

The provided links for AKS BYOCNI for "Cluster Mesh" and "Service Affinity" lead to 404s.

@joestringer
Member

@jspaleta you can find the clustermesh and service affinity pages from the new multi-cluster networking section: https://docs.cilium.io/en/v1.13.0-rc4/network/clustermesh/

@jspaleta
Contributor

AKS BYOCNI worksforme.

I recorded the validation testing live: https://www.twitch.tv/videos/1710361420

One connectivity test error that cleaned itself up on re-run.

@christarazi
Member Author

christarazi commented Jan 17, 2023

Apologies for all the 404s. I quickly created this issue without going through and checking the links. I assumed the docs structure didn't change much, but that was not true 🙂. They should all be fixed now.

@jspaleta
Contributor

As I'm running through these to validate, if I come up with an enhancement idea, should I just file it as a new issue?

@jspaleta
Contributor

AKS BYOCNI quick start states the default datapath will be Encapsulation.

But checking cilium config view shows tunnel disabled.

If I'm understanding this correctly, config view is telling me Cilium is in native routing datapath mode:

https://docs.cilium.io/en/v1.13.0-rc4/network/concepts/routing/#native-routing

This also impacts the steps needed to get Service Mesh running on AKS, as encap/native have slightly different preflight steps that need to be documented.

So is this a documentation bug, or should cilium install really default to tunnel: enabled on AKS BYOCNI?

@joestringer
Member

joestringer commented Jan 18, 2023

@jspaleta good spotting. Can you share the output from the Cilium install steps? This looks like a discrepancy between the docs and the behaviour of cilium-cli.

EDIT: At a glance, it looks like the BYOCNI instructions for helm (on this page) were updated the same way as the quick install, but cilium-cli was not changed to enable the byocni options by default. We may need to revisit the way that cilium-cli behaves when attempting to install fresh into an AKS environment.
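For reference, the helm-based BYOCNI install on that page boils down to roughly the following (a sketch from memory of the guide; the exact values may differ):

$ helm install cilium cilium/cilium --version 1.13.0-rc4 \
    --namespace kube-system \
    --set aksbyocni.enabled=true \
    --set nodeinit.enabled=true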

@lizrice
Member

lizrice commented Jan 18, 2023

GKE Quick Install - just under 30 min (of which cluster install ~6 min, connectivity test ~17min)

GKE Egress Gateway works fine, but I'm adding some troubleshooting tips to the guide to help anyone who mis-types a label like I did

@jspaleta
Contributor

@joestringer
Disregard the last comment. I had created a 2nd cluster using the cost reductions for the Isovalent account and forgot to add --no-plugin.

I reran the quickstart instructions and confirmed config view now shows tunnel: vxlan and ipam:cluster-pool.

But now I wonder, can we enhance the docs with a breadcrumb to help a user verify the AKS cluster was created with adequate settings prior to the Cilium CLI install? Maybe a Cilium CLI preflight mode that does the investigation and reports back what it would do for install settings? That sort of mode could probe the AKS setup and tell you if it's configured correctly for byocni or azure-ipam without attempting the install. hmmm.

@christarazi
Member Author

christarazi commented Jan 18, 2023

@jspaleta In the installation output, the CLI will say what sort of datapath mode it's configuring Cilium for. We could improve the docs to say to check for that specific line in the output.

That sort of Azure autodetection logic already exists in the Cilium CLI installation: https://github.com/cilium/cilium-cli/blob/03f744ff360e46030509904f89d7e4ffe3ac036f/install/autodetect.go#L84
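In the meantime, a quick way to double-check which datapath and IPAM modes actually got configured (same commands used elsewhere in this thread):

$ cilium config view | grep -E 'tunnel|ipam'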

@jspaleta
Contributor

Ran through the AKS Cluster Mesh specific instructions.

I got one error running the connectivity tests at the tail end of the instructions. The no-policies test has several failures in its sub-tests.

I re-ran the no-policies test by itself and still got a failure.

$ cilium connectivity test --context $CLUSTER1 --multi-cluster $CLUSTER2 --test "no-policies" --verbose
Monitor aggregation detected, will skip some flow validation steps
⌛ [jspaleta-22766] Waiting for deployments [client client2 echo-same-node] to become ready...
⌛ [jspaleta-26504] Waiting for deployments [echo-other-node] to become ready...
⌛ [jspaleta-22766] Waiting for CiliumEndpoint for pod cilium-test/client-755fb678bd-gnb2z to appear...
⌛ [jspaleta-22766] Waiting for CiliumEndpoint for pod cilium-test/client2-5b97d7bc66-k9mmt to appear...
⌛ [jspaleta-22766] Waiting for pod cilium-test/client-755fb678bd-gnb2z to reach DNS server on cilium-test/echo-same-node-64774c64d5-g9dxz pod...
⌛ [jspaleta-22766] Waiting for pod cilium-test/client2-5b97d7bc66-k9mmt to reach DNS server on cilium-test/echo-same-node-64774c64d5-g9dxz pod...
⌛ [jspaleta-22766] Waiting for pod cilium-test/client-755fb678bd-gnb2z to reach DNS server on cilium-test/echo-other-node-67b74b6685-9z59h pod...
⌛ [jspaleta-22766] Waiting for pod cilium-test/client2-5b97d7bc66-k9mmt to reach DNS server on cilium-test/echo-other-node-67b74b6685-9z59h pod...
⌛ [jspaleta-22766] Waiting for pod cilium-test/client-755fb678bd-gnb2z to reach default/kubernetes service...
⌛ [jspaleta-22766] Waiting for pod cilium-test/client2-5b97d7bc66-k9mmt to reach default/kubernetes service...
⌛ [jspaleta-22766] Waiting for CiliumEndpoint for pod cilium-test/echo-same-node-64774c64d5-g9dxz to appear...
⌛ [jspaleta-26504] Waiting for CiliumEndpoint for pod cilium-test/echo-other-node-67b74b6685-9z59h to appear...
⌛ [jspaleta-22766] Waiting for Service cilium-test/echo-other-node to become ready...
⌛ [jspaleta-22766] Waiting for Service cilium-test/echo-same-node to become ready...
ℹ️  Skipping IPCache check
🔭 Enabling Hubble telescope...
⚠️  Unable to contact Hubble Relay, disabling Hubble telescope and flow validation: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp [::1]:4245: connect: connection refused"
ℹ️  Expose Relay locally with:
   cilium hubble enable
   cilium hubble port-forward&
ℹ️  Cilium version: 1.13.0
🏃 Running tests...

...

Test Report
❌ 1/1 tests failed (18/48 actions), 31 tests skipped, 1 scenarios skipped:
Test [no-policies]:
  ❌ no-policies/pod-to-remote-nodeport/curl-0: cilium-test/client-755fb678bd-gnb2z (10.10.2.161) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies/pod-to-remote-nodeport/curl-1: cilium-test/client-755fb678bd-gnb2z (10.10.2.161) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies/pod-to-remote-nodeport/curl-2: cilium-test/client-755fb678bd-gnb2z (10.10.2.161) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies/pod-to-remote-nodeport/curl-3: cilium-test/client-755fb678bd-gnb2z (10.10.2.161) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies/pod-to-remote-nodeport/curl-4: cilium-test/client-755fb678bd-gnb2z (10.10.2.161) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies/pod-to-remote-nodeport/curl-7: cilium-test/client-755fb678bd-gnb2z (10.10.2.161) -> cilium-test/echo-same-node (echo-same-node:8080)
  ❌ no-policies/pod-to-remote-nodeport/curl-8: cilium-test/client-755fb678bd-gnb2z (10.10.2.161) -> cilium-test/echo-same-node (echo-same-node:8080)
  ❌ no-policies/pod-to-remote-nodeport/curl-9: cilium-test/client-755fb678bd-gnb2z (10.10.2.161) -> cilium-test/echo-same-node (echo-same-node:8080)
  ❌ no-policies/pod-to-remote-nodeport/curl-12: cilium-test/client2-5b97d7bc66-k9mmt (10.10.2.54) -> cilium-test/echo-same-node (echo-same-node:8080)
  ❌ no-policies/pod-to-remote-nodeport/curl-13: cilium-test/client2-5b97d7bc66-k9mmt (10.10.2.54) -> cilium-test/echo-same-node (echo-same-node:8080)
  ❌ no-policies/pod-to-remote-nodeport/curl-14: cilium-test/client2-5b97d7bc66-k9mmt (10.10.2.54) -> cilium-test/echo-same-node (echo-same-node:8080)
  ❌ no-policies/pod-to-remote-nodeport/curl-15: cilium-test/client2-5b97d7bc66-k9mmt (10.10.2.54) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies/pod-to-remote-nodeport/curl-16: cilium-test/client2-5b97d7bc66-k9mmt (10.10.2.54) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies/pod-to-remote-nodeport/curl-17: cilium-test/client2-5b97d7bc66-k9mmt (10.10.2.54) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies/pod-to-remote-nodeport/curl-18: cilium-test/client2-5b97d7bc66-k9mmt (10.10.2.54) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies/pod-to-remote-nodeport/curl-19: cilium-test/client2-5b97d7bc66-k9mmt (10.10.2.54) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies/pod-to-local-nodeport/curl-1: cilium-test/client-755fb678bd-gnb2z (10.10.2.161) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies/pod-to-local-nodeport/curl-2: cilium-test/client2-5b97d7bc66-k9mmt (10.10.2.54) -> cilium-test/echo-other-node (echo-other-node:8080)

@sayboras
Member

Enable datapath-only mTLS support and try it out.

The feature seems to be working as expected; please find the testing details below.

Create workload (using clustermesh example workload)
$ kg services
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP   61m
rebel-base   ClusterIP   10.107.228.37   <none>        80/TCP    59m

Create CNP policy without auth configuration
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "auth-policy"
spec:
  endpointSelector:
    matchLabels:
      name: x-wing
  egress:
  - toEndpoints:
    - matchLabels:
        name: rebel-base

Send the request from one pod to another

$ kubectl exec -ti deployment/x-wing -- curl 10.107.228.37
{"Galaxy": "Alderaan", "Cluster": "Cluster-1"}

# No Authentication log found
$ ksyslo ds/cilium | grep -is "Authentication type"
^C
Create CNP policy with auth configuration (e.g. null)
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "auth-policy"
spec:
  endpointSelector:
    matchLabels:
      name: x-wing
  egress:
  - toEndpoints:
    - matchLabels:
        name: rebel-base
    auth:
      type: "null"

Send the request from one pod to another

$ kubectl exec -ti deployment/x-wing -- curl 10.107.228.37
{"Galaxy": "Alderaan", "Cluster": "Cluster-1"}

# Authentication log found, allowed by default right now
$ ksyslo ds/cilium | grep -is "Authentication type"
level=debug msg="policy: Authentication type null required for identity 28335->764, tcp 10.244.0.234:38332->10.244.0.222:80" subsys=auth
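(For anyone copying the commands above: kg and ksyslo are shell aliases. Assuming they expand to the usual kubectl shortcuts, the non-aliased equivalents would be roughly:)

$ kubectl get services
$ kubectl -n kube-system logs ds/cilium -c cilium-agent | grep -is "Authentication type"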

@jspaleta
Contributor

jspaleta commented Jan 23, 2023

A little more color on the connectivity errors I'm seeing. I think there is a missing Azure-specific instruction at https://docs.cilium.io/en/v1.13.0-rc4/network/clustermesh/aks-clustermesh-prep/, but I'm not sure what we need to add.

Just noodling around with the peering create options.. and I think we need to enable gateway transit on the peering to get pod-to-local-nodeport tests passing.
I'm going to file a separate docs issue for this, as it appears to also impact v1.12.5
Added an issue for the test failure: #23266
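For anyone reproducing this, the peering option in question would be set when creating the peering, along these lines (a sketch with placeholder names; whether gateway transit is actually the fix is still under investigation):

$ az network vnet peering create \
    --resource-group "${CLUSTER1_RESOURCE_GROUP}" \
    --vnet-name "${CLUSTER1_VNET}" \
    --name cluster1-to-cluster2 \
    --remote-vnet "${CLUSTER2_VNET_ID}" \
    --allow-vnet-access \
    --allow-forwarded-traffic \
    --allow-gateway-transit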

@nbusseneau
Member

EDIT: At a glance, it looks like the BYOCNI instructions for helm (on this page) were updated the same way as the quick install, but cilium-cli was not changed to enable the byocni options by default. We may need to revisit the way that cilium-cli behaves when attempting to install fresh into an AKS environment.

The CLI autodetects the right configuration based on the actual cluster it's working against. This keeps with the CLI design idea that users get a transparent experience no matter the cluster or platform, hence why there are no explicit options documented to be passed ;)

@nbusseneau
Member

FWIW I'm proposing a refactor of AKS installation instructions in #23304.

@jspaleta
Contributor

FWIW I'm proposing a refactor of AKS installation instructions in #23304.

I'll run through the refactor to see if I see something different. But heads up: right now, following the BYOCNI AKS cluster mesh instructions, I'm hitting some rough edges.
With kube-proxy enabled in AKS (the default) I have nodeport connectivity tests failing (both Cilium v1.12.4 and v1.13.0-rc4).
With kube-proxy disabled in AKS (using Cilium kube-proxy replacement, attempted as a workaround for the nodeport test failures) I'm hitting different pod-to-pod connectivity test failures (just tested v1.12.5 so far; still need to check the prerelease).

There's a bug or two lurking here with AKS.

@jspaleta
Contributor

Good news!
I was able to validate the rebel-base/x-wing service affinity documentation using my BYOCNI AKS cluster mesh setup[1].

  1. Note: AKS with kube-proxy disabled and using Cilium kube-proxy replacement to avoid the nodeport connectivity test failures.

@jspaleta
Contributor

The connectivity test failures for AKS BYOCNI are now understood and are documentation/CLI tool bugs not specific to the AKS guidance. With that, I can confidently say the AKS BYOCNI guides are validated.

@nbusseneau
Member

I'll run through the refactor to see if I see something different.

It should be OK as the instructions are the same, just moved around in a different structure to encourage BYOCNI over Azure IPAM. A fresh read is more than welcome though, I'd like to get your opinion as a user :)

@tracypholmes

tracypholmes commented Jan 26, 2023

k3s Quick Install - ~7-8 mins including cluster creation (timestamps below include connectivity test and reinstall of correct version)

Feedback

  1. In the QuickStart "Create A Cluster" section, k3s isn't even mentioned. k3s actually isn't mentioned until AFTER the "Install the Cilium CLI" section.
  2. In the "Install Cilium" section, you have instructions that the person would need when spinning up the cluster (pretty much everything in the "Requirements Section" of the "Cilium Install" portion).
  3. The proper needed k3s instructions live in the "Advanced Installation" section of the docs - specifically the "Installation with K8s distributions" section.
  4. The Installation using K3s section provides more complete instructions. Although a couple of steps are in a slightly different order (cluster access and Cilium install are reversed in this section.)

Note: In all cases, you have the user export KUBECONFIG, however the kubeconfig file hasn't been created with the proper permissions (e.g. 644), so any attempt to access the cluster errors out with permission denied. ref: [1] [2]
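A sketch of what such a note could suggest (mirroring the install command in the terminal output below): either pass K3S_KUBECONFIG_MODE when installing k3s, or fix the file mode afterwards:

$ curl -sfL https://get.k3s.io | K3S_KUBECONFIG_MODE="644" INSTALL_K3S_EXEC='--flannel-backend=none --disable-network-policy' sh -
# or, after the fact:
$ sudo chmod 644 /etc/rancher/k3s/k3s.yaml
$ export KUBECONFIG=/etc/rancher/k3s/k3s.yaml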

Terminal Output

$ date
Thu Jan 26 07:07:34 UTC 2023

$ curl -sfL https://get.k3s.io | K3S_KUBECONFIG_MODE="644" INSTALL_K3S_EXEC=' --flannel-backend=none --disable-network-policy' sh -
[sudo] password for :
[INFO]  Finding release for channel stable
[INFO]  Using v1.25.5+k3s2 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.25.5+k3s2/sha256sum-amd64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.25.5+k3s2/k3s
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s


$ export KUBECONFIG=/etc/rancher/k3s/k3s.yaml

$ CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/master/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 25.4M  100 25.4M    0     0  8084k      0  0:00:03  0:00:03 --:--:-- 10.4M
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    92  100    92    0     0    473      0 --:--:-- --:--:-- --:--:--   473
cilium-linux-amd64.tar.gz: OK
cilium

$ cilium install --version v1.13.0-rc4
🔮 Auto-detected Kubernetes kind: K3s
ℹ️  Using Cilium version 1.13.0-rc4
🔮 Auto-detected cluster name: default
🔮 Auto-detected datapath mode: tunnel
⚠️ Unable to list kubernetes api resources, try --api-versions if needed: %!w(*fmt.wrapError=&{failed to list api resources: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request 0xc00062a008})
ℹ️  helm template --namespace kube-system cilium cilium/cilium --version 1.13.0-rc4 --set cluster.id=0,cluster.name=default,encryption.nodeEncryption=false,kubeProxyReplacement=disabled,operator.replicas=1,serviceAccounts.cilium.name=cilium,serviceAccounts.operator.name=cilium-operator,tunnel=vxlan
ℹ️  Storing helm values file in kube-system/cilium-cli-helm-values Secret
🔑 Created CA in secret cilium-ca
🔑 Generating certificates for Hubble...
🚀 Creating Service accounts...
🚀 Creating Cluster roles...
🚀 Creating ConfigMap for Cilium version 1.13.0-rc4...
🚀 Creating Agent DaemonSet...
🚀 Creating Operator Deployment...
⌛ Waiting for Cilium to be installed and ready...
✅ Cilium was successfully installed! Run 'cilium status' to view installation health

$ cilium status --wait
    /¯¯\
 /¯¯\__/¯¯\    Cilium:         OK
 \__/¯¯\__/    Operator:       OK
 /¯¯\__/¯¯\    Hubble:         disabled
 \__/¯¯\__/    ClusterMesh:    disabled
    \__/

DaemonSet         cilium             Desired: 1, Ready: 1/1, Available: 1/1
Deployment        cilium-operator    Desired: 1, Ready: 1/1, Available: 1/1
Containers:       cilium             Running: 1
                  cilium-operator    Running: 1
Cluster Pods:     5/5 managed by Cilium
Image versions    cilium             quay.io/cilium/cilium:v1.13.0-rc4@sha256:32acd47fd9bea9c0045222ba5d27f5fe9ad06dabd572a80b870b1f0e68c0e928: 1
                  cilium-operator    quay.io/cilium/operator-generic:v1.13.0-rc4@sha256:19f612d4f1052e26edf33e26f60d64d8fb6caed9f03692b85b429a4ef5d175b2: 1

$ date
Thu Jan 26 07:18:29 UTC 2023

@tommyp1ckles
Contributor

This probably relates to many platforms, but the cilium install step is a bit misleading: it currently installs v1.12 by default (as expected), but in the context of a versioned doc this behaviour will change over time.

Should we always have cilium install --version=xxx for each version to avoid this?

@tommyp1ckles
Contributor

tommyp1ckles commented Jan 26, 2023

Feedback

  • Cilium install doesn't specify a version, so this only works on the "latest" page. We ought to always specify the docs' Cilium version (note: for the purposes of this review, I added the version flag with 1.13-rc4).
  • CLI install commands get mangled by curl copy-paste; an easy fix is to move the URL generation into a variable and then pass that to curl (see the sketch after this list).
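Sketch of the variable approach, as used in the terminal output further down (macOS/arm64 shown; adjust OS and arch as needed):

$ CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/master/stable.txt)
$ CLI_ARCH=amd64
$ if [ "$(uname -m)" = "arm64" ]; then CLI_ARCH=arm64; fi
$ cli_addr=https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-darwin-${CLI_ARCH}.tar.gz{,.sha256sum}
$ curl -L --fail --remote-name-all $cli_addr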

Time Taken
Total: ~20 minutes
Without connectivity test: ~10 minutes

Terminal Output
~/Desktop/tmp » date                                                                                                                         tom@Toms-MacBook-Pro
Thu 26 Jan 2023 12:05:43 PST
------------------------------------------------------------------------------------------------------------------------------------------------------------------
~/Desktop/tmp » #!/bin/bash                                                                                                                  tom@Toms-MacBook-Pro

export NAME="$(whoami)-$RANDOM"
export AZURE_RESOURCE_GROUP="${NAME}-group"
az group create --name "${AZURE_RESOURCE_GROUP}" -l westus2

# Create AKS cluster
az aks create \
  --resource-group "${AZURE_RESOURCE_GROUP}" \
  --name "${NAME}" \
  --network-plugin azure \
  --node-count 2

# Get the credentials to access the cluster with kubectl
az aks get-credentials --resource-group "${AZURE_RESOURCE_GROUP}" --name "${NAME}"


{
  "id": "/subscriptions/22716d91-fb67-4a07-ac5f-d36ea49d6167/resourceGroups/tom-21889-group",
  "location": "westus2",
  "managedBy": null,
  "name": "tom-21889-group",
  "properties": {
    "provisioningState": "Succeeded"
  },
  "tags": null,
  "type": "Microsoft.Resources/resourceGroups"
}
The behavior of this command has been altered by the following extension: aks-preview
{
  "aadProfile": null,
  "addonProfiles": null,
  "agentPoolProfiles": [
    {
      "availabilityZones": null,
      "capacityReservationGroupId": null,
      "count": 2,
      "creationData": null,
      "currentOrchestratorVersion": "1.24.6",
      "enableAutoScaling": false,
      "enableCustomCaTrust": false,
      "enableEncryptionAtHost": false,
      "enableFips": false,
      "enableNodePublicIp": false,
      "enableUltraSsd": false,
      "gpuInstanceProfile": null,
      "hostGroupId": null,
      "kubeletConfig": null,
      "kubeletDiskType": "OS",
      "linuxOsConfig": null,
      "maxCount": null,
      "maxPods": 30,
      "messageOfTheDay": null,
      "minCount": null,
      "mode": "System",
      "name": "nodepool1",
      "nodeImageVersion": "AKSUbuntu-1804gen2containerd-2023.01.10",
      "nodeLabels": null,
      "nodePublicIpPrefixId": null,
      "nodeTaints": null,
      "orchestratorVersion": "1.24.6",
      "osDiskSizeGb": 128,
      "osDiskType": "Managed",
      "osSku": "Ubuntu",
      "osType": "Linux",
      "podSubnetId": null,
      "powerState": {
        "code": "Running"
      },
      "provisioningState": "Succeeded",
      "proximityPlacementGroupId": null,
      "scaleDownMode": null,
      "scaleSetEvictionPolicy": null,
      "scaleSetPriority": null,
      "spotMaxPrice": null,
      "tags": null,
      "type": "VirtualMachineScaleSets",
      "upgradeSettings": {
        "maxSurge": null
      },
      "vmSize": "Standard_DS2_v2",
      "vnetSubnetId": null,
      "windowsProfile": null,
      "workloadRuntime": "OCIContainer"
    }
  ],
  "apiServerAccessProfile": null,
  "autoScalerProfile": null,
  "autoUpgradeProfile": null,
  "azureMonitorProfile": null,
  "azurePortalFqdn": "tom-21889-tom-21889-group-22716d-8cc9f018.portal.hcp.westus2.azmk8s.io",
  "creationData": null,
  "currentKubernetesVersion": "1.24.6",
  "disableLocalAccounts": false,
  "diskEncryptionSetId": null,
  "dnsPrefix": "tom-21889-tom-21889-group-22716d",
  "enableNamespaceResources": null,
  "enablePodSecurityPolicy": false,
  "enableRbac": true,
  "extendedLocation": null,
  "fqdn": "tom-21889-tom-21889-group-22716d-8cc9f018.hcp.westus2.azmk8s.io",
  "fqdnSubdomain": null,
  "guardrailsProfile": null,
  "httpProxyConfig": null,
  "id": "/subscriptions/22716d91-fb67-4a07-ac5f-d36ea49d6167/resourcegroups/tom-21889-group/providers/Microsoft.ContainerService/managedClusters/tom-21889",
  "identity": {
    "principalId": "6603011b-a288-4163-a17a-502da9ad1c5b",
    "tenantId": "625cda75-62dd-470e-a554-7313877ff03c",
    "type": "SystemAssigned",
    "userAssignedIdentities": null
  },
  "identityProfile": {
    "kubeletidentity": {
      "clientId": "dca23190-0091-4b2e-89c8-f2088a69c292",
      "objectId": "c6e5ee78-b203-41a4-b087-61c8a66e748a",
      "resourceId": "/subscriptions/22716d91-fb67-4a07-ac5f-d36ea49d6167/resourcegroups/MC_tom-21889-group_tom-21889_westus2/providers/Microsoft.ManagedIdentity/userAssignedIdentities/tom-21889-agentpool"
    }
  },
  "ingressProfile": null,
  "kubernetesVersion": "1.24.6",
  "linuxProfile": {
    "adminUsername": "azureuser",
    "ssh": {
      "publicKeys": [
        {
          "keyData": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDY0fElKst9A5z7gGUdWHLfGIJ9qEgU4wEeJQ/ysQHtHSYt6u3+dG+bd4IdpQWqzBEp6E691WLYXYFyZ82kSpUDQ/r93PVzIDBCujTjQVYIUeL8lK8MaRTkpkFN3hfJLPqNOr1Lzd3tQKQ96aVaEBAekUrTsUhrJemeLabjyJJntkb2AOOPLwK6kUpcDxHAeSYcEpo7sbHIlpR4RSsBv6VGkrQVlRvX3RRjRejh48sgta8dY2vYZC1kpK53EPK3WLLyxmD6oEJ2SQVZbms7fmY+5SQv1/RwHYqfP0CEONDi5CDd14aF41Iky0XVpaVnU4o5o6MND3uY29WVjV6hoJFuFwDDjBS7efqWAzrnSGBqLexkBJNk3WGoGBqzmnDedg6vaTcNc2cBdutSkU/2Wsoofpjfrkd3UzpuQYo6lh3iHiRZLTjxlFXMRMUzZPi4vxwUeNAQGJmQxLPltsnZmDjl7ZiwKzy4caBdv+NSDibjT+aY9LaEDEnICoiD5OV3sh8= tom@Toms-MacBook-Pro.local\n"
        }
      ]
    }
  },
  "location": "westus2",
  "maxAgentPools": 100,
  "name": "tom-21889",
  "networkProfile": {
    "dnsServiceIp": "10.0.0.10",
    "dockerBridgeCidr": "172.17.0.1/16",
    "ipFamilies": [
      "IPv4"
    ],
    "kubeProxyConfig": null,
    "loadBalancerProfile": {
      "allocatedOutboundPorts": null,
      "backendPoolType": "nodeIPConfiguration",
      "effectiveOutboundIPs": [
        {
          "id": "/subscriptions/22716d91-fb67-4a07-ac5f-d36ea49d6167/resourceGroups/MC_tom-21889-group_tom-21889_westus2/providers/Microsoft.Network/publicIPAddresses/fe5de8ba-cc6a-4f84-abc7-09661ec9b698",
          "resourceGroup": "MC_tom-21889-group_tom-21889_westus2"
        }
      ],
      "enableMultipleStandardLoadBalancers": null,
      "idleTimeoutInMinutes": null,
      "managedOutboundIPs": {
        "count": 1,
        "countIpv6": null
      },
      "outboundIPs": null,
      "outboundIpPrefixes": null
    },
    "loadBalancerSku": "Standard",
    "natGatewayProfile": null,
    "networkMode": null,
    "networkPlugin": "azure",
    "networkPluginMode": null,
    "networkPolicy": null,
    "outboundType": "loadBalancer",
    "podCidr": null,
    "podCidrs": null,
    "serviceCidr": "10.0.0.0/16",
    "serviceCidrs": [
      "10.0.0.0/16"
    ]
  },
  "nodeResourceGroup": "MC_tom-21889-group_tom-21889_westus2",
  "oidcIssuerProfile": {
    "enabled": false,
    "issuerUrl": null
  },
  "podIdentityProfile": null,
  "powerState": {
    "code": "Running"
  },
  "privateFqdn": null,
  "privateLinkResources": null,
  "provisioningState": "Succeeded",
  "publicNetworkAccess": null,
  "resourceGroup": "tom-21889-group",
  "securityProfile": {
    "azureKeyVaultKms": null,
    "defender": null,
    "imageCleaner": null,
    "nodeRestriction": null,
    "workloadIdentity": null
  },
  "servicePrincipalProfile": {
    "clientId": "msi",
    "secret": null
  },
  "sku": {
    "name": "Basic",
    "tier": "Free"
  },
  "storageProfile": {
    "blobCsiDriver": null,
    "diskCsiDriver": {
      "enabled": true,
      "version": "v1"
    },
    "fileCsiDriver": {
      "enabled": true
    },
    "snapshotController": {
      "enabled": true
    }
  },
  "systemData": null,
  "tags": null,
  "type": "Microsoft.ContainerService/ManagedClusters",
  "windowsProfile": {
    "adminPassword": null,
    "adminUsername": "azureuser",
    "enableCsiProxy": true,
    "gmsaProfile": null,
    "licenseType": null
  },
  "workloadAutoScalerProfile": {
    "keda": null,
    "verticalPodAutoscaler": null
  }
}
The behavior of this command has been altered by the following extension: aks-preview
Merged "tom-21889" as current context in /Users/tom/.kube/config
------------------------------------------------------------------------------------------------------------------------------------------------------------------
~/Desktop/tmp » CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/master/stable.txt)                          tom@Toms-MacBook-Pro
CLI_ARCH=amd64
if [ "$(uname -m)" = "arm64" ]; then CLI_ARCH=arm64; fi
cli_addr=https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-darwin-${CLI_ARCH}.tar.gz{,.sha256sum}
curl -L --fail --remote-name-all $cli_addr
shasum -a 256 -c cilium-darwin-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-darwin-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-darwin-${CLI_ARCH}.tar.gz{,.sha256sum}


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 25.5M  100 25.5M    0     0  6003k      0  0:00:04  0:00:04 --:--:-- 7193k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    93  100    93    0     0    240      0 --:--:-- --:--:-- --:--:--   240
cilium-darwin-arm64.tar.gz: OK
Password:
x cilium
------------------------------------------------------------------------------------------------------------------------------------------------------------------
~/Desktop/tmp » cilium install --azure-resource-group "${AZURE_RESOURCE_GROUP}" --version=v1.13.0-rc4                                        tom@Toms-MacBook-Pro
🔮 Auto-detected Kubernetes kind: AKS
✨ Running "AKS" validation checks
✅ Detected az binary
ℹ️  Using Cilium version 1.13.0-rc4
🔮 Auto-detected cluster name: tom-21889
✅ Derived Azure subscription ID 22716d91-fb67-4a07-ac5f-d36ea49d6167 from subscription cilium-dev
✅ Derived Azure AKS node resource group MC_tom-21889-group_tom-21889_westus2 from resource group tom-21889-group
🔮 Auto-detected datapath mode: azure
🔮 Auto-detected kube-proxy has been installed
🚀 Creating Azure Service Principal for Cilium Azure operator...
✅ Created Azure Service Principal for Cilium Azure operator with App ID 2886d82a-0bac-432c-86c6-346eb774c3b1 and Tenant ID 625cda75-62dd-470e-a554-7313877ff03c
ℹ️  Its RBAC privileges are restricted to the AKS node resource group MC_tom-21889-group_tom-21889_westus2
ℹ️  helm template --namespace kube-system cilium cilium/cilium --version 1.13.0-rc4 --set azure.clientID=2886d82a-0bac-432c-86c6-346eb774c3b1,azure.clientSecret=ag68Q~PKyc_h1zedLiDpya06UnVV8CnjhcAEUbZv,azure.enabled=true,azure.resourceGroup=MC_tom-21889-group_tom-21889_westus2,azure.subscriptionID=22716d91-fb67-4a07-ac5f-d36ea49d6167,azure.tenantID=625cda75-62dd-470e-a554-7313877ff03c,bpf.masquerade=false,cluster.id=0,cluster.name=tom-21889,enableIPv4Masquerade=false,enableIPv6Masquerade=false,encryption.nodeEncryption=false,ipam.mode=azure,kubeProxyReplacement=disabled,nodeinit.enabled=true,operator.replicas=1,serviceAccounts.cilium.name=cilium,serviceAccounts.operator.name=cilium-operator,tunnel=disabled
ℹ️  Storing helm values file in kube-system/cilium-cli-helm-values Secret
🔑 Generated AKS secret cilium-azure
🔑 Created CA in secret cilium-ca
🔑 Generating certificates for Hubble...
🚀 Creating Service accounts...
🚀 Creating Cluster roles...
🚀 Creating ConfigMap for Cilium version 1.13.0-rc4...
🚀 Creating AKS Node Init DaemonSet...
🚀 Creating Agent DaemonSet...
🚀 Creating Operator Deployment...
⌛ Waiting for Cilium to be installed and ready...
♻️  Restarting unmanaged pods...
♻️  Restarted unmanaged pod kube-system/coredns-autoscaler-5655d66f64-87xdg
♻️  Restarted unmanaged pod kube-system/konnectivity-agent-6b8cfb864c-hqjt2
♻️  Restarted unmanaged pod kube-system/konnectivity-agent-6b8cfb864c-njxdh
♻️  Restarted unmanaged pod kube-system/metrics-server-8655f897d8-rkxks
♻️  Restarted unmanaged pod kube-system/metrics-server-8655f897d8-xqf5f
✅ Cilium was successfully installed! Run 'cilium status' to view installation health
------------------------------------------------------------------------------------------------------------------------------------------------------------------
~/Desktop/tmp » cilium status --wait                                                                                                         tom@Toms-MacBook-Pro
    /¯¯\
 /¯¯\__/¯¯\    Cilium:         OK
 \__/¯¯\__/    Operator:       OK
 /¯¯\__/¯¯\    Hubble:         disabled
 \__/¯¯\__/    ClusterMesh:    disabled
    \__/

DaemonSet         cilium             Desired: 2, Ready: 2/2, Available: 2/2
Deployment        cilium-operator    Desired: 1, Ready: 1/1, Available: 1/1
Containers:       cilium             Running: 2
                  cilium-operator    Running: 1
Cluster Pods:     7/7 managed by Cilium
Image versions    cilium             quay.io/cilium/cilium:v1.13.0-rc4@sha256:32acd47fd9bea9c0045222ba5d27f5fe9ad06dabd572a80b870b1f0e68c0e928: 2
                  cilium-operator    quay.io/cilium/operator-azure:v1.13.0-rc4@sha256:105bccc4b486fd242f05c06e21e9928255906e2c6c5ace63c833c4d2a1371e0c: 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------
~/Desktop/tmp » cilium connectivity test                                                                                                     tom@Toms-MacBook-Pro
ℹ️  Monitor aggregation detected, will skip some flow validation steps
✨ [tom-21889] Creating namespace cilium-test for connectivity check...
✨ [tom-21889] Deploying echo-same-node service...
✨ [tom-21889] Deploying DNS test server configmap...
✨ [tom-21889] Deploying same-node deployment...
✨ [tom-21889] Deploying client deployment...
✨ [tom-21889] Deploying client2 deployment...
✨ [tom-21889] Deploying echo-other-node service...
✨ [tom-21889] Deploying other-node deployment...
⌛ [tom-21889] Waiting for deployments [client client2 echo-same-node] to become ready...
⌛ [tom-21889] Waiting for deployments [echo-other-node] to become ready...
⌛ [tom-21889] Waiting for CiliumEndpoint for pod cilium-test/client-755fb678bd-rpjz9 to appear...
⌛ [tom-21889] Waiting for CiliumEndpoint for pod cilium-test/client2-5b97d7bc66-5gmn8 to appear...
⌛ [tom-21889] Waiting for pod cilium-test/client-755fb678bd-rpjz9 to reach DNS server on cilium-test/echo-same-node-64774c64d5-58z2n pod...
⌛ [tom-21889] Waiting for pod cilium-test/client2-5b97d7bc66-5gmn8 to reach DNS server on cilium-test/echo-same-node-64774c64d5-58z2n pod...
⌛ [tom-21889] Waiting for pod cilium-test/client2-5b97d7bc66-5gmn8 to reach DNS server on cilium-test/echo-other-node-67b74b6685-h89jf pod...
⌛ [tom-21889] Waiting for pod cilium-test/client-755fb678bd-rpjz9 to reach DNS server on cilium-test/echo-other-node-67b74b6685-h89jf pod...
⌛ [tom-21889] Waiting for pod cilium-test/client-755fb678bd-rpjz9 to reach default/kubernetes service...
⌛ [tom-21889] Waiting for pod cilium-test/client2-5b97d7bc66-5gmn8 to reach default/kubernetes service...
⌛ [tom-21889] Waiting for CiliumEndpoint for pod cilium-test/echo-other-node-67b74b6685-h89jf to appear...
⌛ [tom-21889] Waiting for CiliumEndpoint for pod cilium-test/echo-same-node-64774c64d5-58z2n to appear...
⌛ [tom-21889] Waiting for Service cilium-test/echo-other-node to become ready...
⌛ [tom-21889] Waiting for Service cilium-test/echo-same-node to become ready...
⌛ [tom-21889] Waiting for NodePort 10.224.0.33:31473 (cilium-test/echo-same-node) to become ready...
⌛ [tom-21889] Waiting for NodePort 10.224.0.33:30513 (cilium-test/echo-other-node) to become ready...
⌛ [tom-21889] Waiting for NodePort 10.224.0.4:30513 (cilium-test/echo-other-node) to become ready...
⌛ [tom-21889] Waiting for NodePort 10.224.0.4:31473 (cilium-test/echo-same-node) to become ready...
ℹ️  Skipping IPCache check
🔭 Enabling Hubble telescope...
⚠️  Unable to contact Hubble Relay, disabling Hubble telescope and flow validation: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp [::1]:4245: connect: connection refused"
ℹ️  Expose Relay locally with:
   cilium hubble enable
   cilium hubble port-forward&
ℹ️  Cilium version: 1.13.0
🏃 Running tests...
[=] Test [no-policies]
................................
[=] Test [allow-all-except-world]
..............
[=] Test [client-ingress]
..
[=] Test [all-ingress-deny]
........
[=] Test [all-egress-deny]
................
[=] Test [all-entities-deny]
........
[=] Test [cluster-entity]
..
[=] Test [host-entity]
....
[=] Test [echo-ingress]
....
[=] Test [client-ingress-icmp]
..
[=] Test [client-egress]
....
[=] Test [client-egress-expression]
....
[=] Test [client-egress-to-echo-service-account]
....
[=] Test [to-entities-world]
......
[=] Test [to-cidr-1111]
....
[=] Test [echo-ingress-from-other-client-deny]
......
[=] Test [client-ingress-from-other-client-icmp-deny]
......
[=] Test [client-egress-to-echo-deny]
......
[=] Test [client-ingress-to-echo-named-port-deny]
....
[=] Test [client-egress-to-echo-expression-deny]
....
[=] Test [client-egress-to-echo-service-account-deny]
....
[=] Test [client-egress-to-cidr-deny]
....
[=] Test [client-egress-to-cidr-deny-default]
....
[=] Test [health]
..
[=] Test [echo-ingress-l7]
............
[=] Test [echo-ingress-l7-named-port]
............
[=] Test [client-egress-l7-method]
............
[=] Test [client-egress-l7]
..........
[=] Test [client-egress-l7-named-port]
..........
[=] Test [dns-only]
..........
[=] Test [to-fqdns]
........

✅ All 31 tests (228 actions) successful, 0 tests skipped, 1 scenarios skipped.
------------------------------------------------------------------------------------------------------------------------------------------------------------------
~/Desktop/tmp » date                                                                                                                         tom@Toms-MacBook-Pro
Thu 26 Jan 2023 12:25:09 PST

@tommyp1ckles
Contributor

tommyp1ckles commented Jan 27, 2023

Host Firewall

Feedback

This ones fine, but I think the doc as a whole could use some work:

  • It wasn't clear that I had to replace the eth config flag with my own eth device, which caused some crashes on my first attempt.
  • No mention of using cilium status --wait, yet the article says "At this point, the Cilium-managed nodes are ready to enforce network policies.", which is likely not true at that point.
  • Same issue with the labelling of the k8s node: I just copy-pasted the command and got the wrong result. It should be more explicit that "your node name goes here".
  • I had to figure out on my own how to induce the audit monitor logs (i.e. exec into a pod and try to ssh into the firewalled node). Initially I even made the mistake of trying the ssh from a non-cluster entity and got unexpected results.
  • Copy-pasting is easy to mess up since we mix commands and output; it's annoying to have to copy and paste individual lines, so let's break these up.
  • (minor) No guidance on which environment to try this in. Really, that shouldn't matter, but I think it would be helpful to mention a well-worn path for someone following along. For example: "this could work on any cluster but we recommend kind ...".
  • Not sure if the policy is correct: the doc says "It allows communications from outside the cluster only on port TCP/22." but the policy only has fromEntities: ["cluster"]? (The policy in question is sketched just after this list.)
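
For context, here is a minimal sketch of a host policy with the shape being discussed. It is an illustration rather than a copy of the guide's manifest; the policy name and the node-access: ssh node label are placeholders:

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  # Placeholder name; host policies are cluster-scoped and select nodes.
  name: host-fw-sketch
spec:
  nodeSelector:
    matchLabels:
      node-access: ssh
  ingress:
    # Allow all ingress originating inside the cluster.
    - fromEntities:
        - cluster
    # Allow TCP/22 from any source, including outside the cluster; once a
    # host policy selects a node, other ingress to the host is denied.
    - toPorts:
        - ports:
            - port: "22"
              protocol: TCP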

@pchaigno
Member

@tommyp1ckles Are you planning to send a PR?

For example: "this could work on any cluster but we recommend kind ...".

I'd expect users to usually come with a specific env. in mind. But regardless, it's not something we do in any guide AFAIK.

Not sure if the policy is correct: the doc says "It allows communications from outside the cluster only on port TCP/22." but the policy only has fromEntities: ["cluster"]?

Those two things don't seem to contradict each other 🤔 The second allows anything from the cluster; the first says that if something comes from outside the cluster, it will only be allowed on port TCP/22.

@tommyp1ckles
Contributor

@pchaigno Yup, I'm putting together some changes.

tommyp1ckles added a commit to tommyp1ckles/cilium that referenced this issue Jan 28, 2023
Some terminals automatically escape characters for arguments.

Addresses: cilium#23051

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
tommyp1ckles added a commit to tommyp1ckles/cilium that referenced this issue Jan 28, 2023
* Add options for using cilium-cli or helm.
* Ensure cilium is ready prior to proceeding.

Addresses: cilium#23051

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
@cmluciano
Member

Update: Checking on OLM releases for the RC so that I can test OKD.

@raphink
Member

raphink commented Feb 13, 2023

RKE doc links to a broken anchor. PR: #23706

@raphink
Member

raphink commented Feb 13, 2023

RKE install worked well.

@raphink
Member

raphink commented Feb 13, 2023

RKE toEntities/kube-apiserver test failed, most likely because this is a one-node cluster (the kube-apiserver runs on the host, so traffic to it is identified as the host entity rather than kube-apiserver):

  • created a pod in the default namespace
  • curl -k https://kubernetes.default.svc.cluster.local:443/
  • added a default deny policy to the default namespace, curl fails now
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: default-deny
  namespace: default
spec:
  endpointSelector: {}
  ingress:
    - {}
  egress:
    - toEndpoints:
        - matchLabels:
            io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
          rules:
            dns:
              - matchPattern: "*"
  • added a toEntities: [kube-apiserver] egress policy to the default namespace, curl still fails
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: kubeapi-server
  namespace: default
spec:
  endpointSelector: {}
  egress:
    - toEntities:
      - kube-apiserver
  • hubble says:
Feb 13 11:21:37.934: default/test:55964 (ID:15350) <> 164.90.237.91:6443 (host) Policy denied DROPPED (TCP Flags: SYN)
  • added host to the list of allowed egress entities, curl passes now
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: kubeapi-server
  namespace: default
spec:
  endpointSelector: {}
  egress:
    - toEntities:
      - kube-apiserver
      - host

@raphink
Member

raphink commented Feb 14, 2023

Better fix for the RKE doc link: #23728

@cmluciano
Member

With regard to OpenShift testing, I was able to bring up the cluster but hit a few snags with cilium-olm.

1.6763978282953136e+09	ERROR	helm.controller	Release failed	{"namespace": "cilium", "name": "cilium", "apiVersion": "cilium.io/v1alpha1", "kind": "CiliumConfig", "release": "cilium", "error": "failed to install release: rendered manifests contain a resource that already exists. Unable to continue with install: could not get information about the resource Role \"cilium-config-agent\" in namespace \"cilium\": roles.rbac.authorization.k8s.io \"cilium-config-agent\" is forbidden: User \"system:serviceaccount:cilium:cilium-olm\" cannot get resource \"roles\" in API group \"rbac.authorization.k8s.io\" in the namespace \"cilium\": RBAC: role.rbac.authorization.k8s.io \"leader-election\" not found"}

I manually patched the ClusterRole for cilium-olm to include the ability to manage roles and rolebindings:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2023-02-14T16:02:06Z"
  name: cilium-cilium-olm
  resourceVersion: "24707"
  uid: 8a51cc46-657d-484a-839d-629ee196e146
rules:
- apiGroups:
  - security.openshift.io
  resourceNames:
  - hostnetwork
  resources:
  - securitycontextconstraints
  verbs:
  - use
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterroles
  - clusterrolebindings
  - roles
  - rolebindings
  verbs:
  - create
  - get
  - patch
  - update
  - delete
  - list
  - watch

This triggered the following error:

I0214 18:36:08.422128       1 request.go:601] Waited for 1.047810343s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/storage.k8s.io/v1beta1?timeout=32s
1.676399770037502e+09	ERROR	helm.controller	Release failed	{"namespace": "cilium", "name": "cilium", "apiVersion": "cilium.io/v1alpha1", "kind": "CiliumConfig", "release": "cilium", "error": "failed to install release: clusterroles.rbac.authorization.k8s.io \"cilium-operator\" is forbidden: user \"system:serviceaccount:cilium:cilium-olm\" (groups=[\"system:serviceaccounts\" \"system:serviceaccounts:cilium\" \"system:authenticated\"]) is attempting to grant RBAC permissions not currently held:\n{APIGroups:[\"\"], Resources:[\"services/status\"], Verbs:[\"patch\"]}"}
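
Based on the permission named in that second error (and matching the later cilium-olm fix referenced further down), the cilium-cilium-olm ClusterRole presumably also needs a rule along these lines (a sketch, not the final fix):

# Extra rule covering the "services/status" patch permission the error
# reports as not currently held by the cilium-olm service account.
- apiGroups:
  - ""
  resources:
  - services/status
  verbs:
  - patch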

@nbusseneau
Member

nbusseneau commented Feb 15, 2023

@cmluciano We probably need to fix this in the manifests then. cc @nathanjsweet for an opinion.

michi-covalent added a commit to cilium/cilium-olm that referenced this issue Mar 1, 2023
- KPR probe mode has been removed in v1.13. Use the default setting
  for running the CNI tests.
- Update cilium-olm cluster role. Cilium v1.13 needs access to role,
  rolebindings, and service/status.

Ref: cilium/cilium#23051 (comment)
Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
michi-covalent added a commit to cilium/cilium-olm that referenced this issue Mar 2, 2023
- KPR probe mode has been removed in v1.13. Use the default setting
  for running the CNI tests.
- Update cilium-olm cluster role. Cilium v1.13 needs access to role,
  rolebindings, and service/status.

Ref: cilium/cilium#23051 (comment)
Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>