Pod networking #25

Closed
anton-johansson opened this issue Jan 7, 2019 · 17 comments

Contributor

anton-johansson commented Jan 7, 2019

I know pod networking isn't something that should be handled by this repository, but I have a minor question. I've only just gotten into pod networking. Until now I've worked with a single worker node, so the bridge supplied by this repository has worked wonders.

This repository supplies a bridge using the subnet 10.19.0.0/16.
This repository installs kube-controller-manager with the argument --cluster-cidr=10.19.0.0/16, which I believe is in charge of assigning IP addresses to each new pod.

I've now installed Flannel, which should be used as an overlay network for proper pod networking. Its configuration uses the subnet 10.244.0.0/16 (the default in the flannel repository).

My question: How am I supposed to tell my cluster to use the Flannel subnet instead? One idea would be to have an Ansible parameter for this, but I'm kind of clueless here. Is that the right way to go? Or am I missing something? :)
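
A minimal sketch of the mismatch described above, using only the two subnets quoted in this comment. The helper name `same_pod_range` is hypothetical and exists only for illustration; it does not come from this repository or from Flannel.

```python
import ipaddress

# Values quoted in the comment above.
bridge_subnet = ipaddress.ip_network("10.19.0.0/16")     # bridge and --cluster-cidr
flannel_network = ipaddress.ip_network("10.244.0.0/16")  # Flannel default


def same_pod_range(a, b):
    """Return True when both components are configured with the same range."""
    return a == b


# The two ranges are different and do not even overlap, so an address
# handed out from one can never be valid in the other.
print(same_pod_range(bridge_subnet, flannel_network))  # False
print(bridge_subnet.overlaps(flannel_network))         # False
```

The point of the check is only that the two components disagree about the pod range; which one should change is the question asked above.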

anton-johansson commented Jan 7, 2019

I think I've got this all wrong. Don't mind this for now, let me wrap my head around this a bit more.

Owner

amimof commented Jan 7, 2019

Yes, you would need to change --cluster-cidr on kube-controller-manager so that pods get IP addresses in the range of Flannel's subnet. I think the --pod-cidr parameter on kubelet is never used and can be removed. It would be a good idea to add a cluster_cidr Ansible variable.

amimof referenced this issue Jan 8, 2019: Cluster cidr var #28 (merged)

amimof commented Jan 8, 2019

Added the cluster_cidr variable. It looks like changing the CIDR at runtime is harder than I thought:
kubernetes/kubernetes#50305

amimof closed this in #28 Jan 8, 2019

anton-johansson commented Jan 8, 2019

I'm struggling to set the cluster CIDR at all. I've set it to 10.244.0.0/16 as I mentioned above, and the logs of kube-controller-manager look correct:

Jan 08 09:42:12 k8s-master-01 kube-controller-manager[2178]: I0108 09:42:12.004863    2178 range_allocator.go:310] Set node k8s-worker-02 PodCIDR to 10.244.0.0/24
Jan 08 09:42:12 k8s-master-01 kube-controller-manager[2178]: I0108 09:42:12.059812    2178 range_allocator.go:310] Set node k8s-worker-01 PodCIDR to 10.244.2.0/24
Jan 08 09:42:12 k8s-master-01 kube-controller-manager[2178]: I0108 09:42:12.311693    2178 range_allocator.go:310] Set node k8s-worker-03 PodCIDR to 10.244.1.0/24

But when creating deployments, Pods still get IP addresses in the "old" CIDR, like 10.19.0.9 and 10.19.0.13.
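
A small sketch of the arithmetic behind the log lines and the symptom above. It assumes one /24 block per node, as the log output shows, and uses only the addresses quoted in this comment. The helper name `outside_cluster` is hypothetical.

```python
import ipaddress

cluster_cidr = ipaddress.ip_network("10.244.0.0/16")

# The first three /24 blocks of the cluster CIDR, i.e. the blocks that
# appear as node PodCIDRs in the log lines above.
node_blocks = list(cluster_cidr.subnets(new_prefix=24))[:3]
print([str(block) for block in node_blocks])
# ['10.244.0.0/24', '10.244.1.0/24', '10.244.2.0/24']


def outside_cluster(pod_ips, cidr):
    """Return the pod IPs that do not belong to the given cluster CIDR."""
    return [ip for ip in pod_ips if ipaddress.ip_address(ip) not in cidr]


# Both observed pod IPs fall outside the new cluster CIDR.
print(outside_cluster(["10.19.0.9", "10.19.0.13"], cluster_cidr))
# ['10.19.0.9', '10.19.0.13']
```

So the node allocations are consistent with the new CIDR, while the pod addresses must be coming from somewhere else.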

I'll get the latest changes from master and see if it works better.

amimof commented Jan 8, 2019

@anton-johansson
Have a look in /etc/cni/net.d/10-bridge.conf:

{
  "cniVersion": "0.3.1",
  "name": "bridge",
  "type": "bridge",
  "bridge": "cnio0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "ranges": [
      [{"subnet": "10.19.0.0/16"}]
    ],
    "routes": [{"dst": "0.0.0.0/0"}]
  }
}
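
A hedged sketch of how the configuration above could be checked against the cluster CIDR. It embeds only the `ipam` part of the file shown above. The function names `ipam_subnets` and `mismatched` are hypothetical helpers, not part of CNI or this repository.

```python
import ipaddress
import json

# Trimmed copy of the bridge configuration shown above (ipam section only).
BRIDGE_CONF = """
{
  "name": "bridge",
  "type": "bridge",
  "ipam": {
    "type": "host-local",
    "ranges": [
      [{"subnet": "10.19.0.0/16"}]
    ]
  }
}
"""


def ipam_subnets(conf_text):
    """Return every subnet listed under ipam.ranges in a CNI config."""
    conf = json.loads(conf_text)
    ranges = conf.get("ipam", {}).get("ranges", [])
    return [ipaddress.ip_network(r["subnet"]) for group in ranges for r in group]


def mismatched(conf_text, cluster_cidr):
    """Return the configured subnets that are not inside cluster_cidr."""
    cluster = ipaddress.ip_network(cluster_cidr)
    return [str(s) for s in ipam_subnets(conf_text) if not s.subnet_of(cluster)]


print(mismatched(BRIDGE_CONF, "10.244.0.0/16"))  # ['10.19.0.0/16']
```

On a node, the text would be read from /etc/cni/net.d/10-bridge.conf instead of the embedded string.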

anton-johansson commented Jan 8, 2019

Ah, that is probably it. Should the subnet of the bridge be the same as the Flannel one? I think that'll happen with your latest changes, which I haven't picked up yet. Let me give it a go.

anton-johansson commented Jan 8, 2019

Actually, I'm gonna wait until #23 is done. :)

anton-johansson commented Jan 8, 2019

Failed create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "999fd02c43b32f1930384ec5abe15d0fa7177593868e1982bd47f9c29083f783": failed to set bridge addr: "cni0" already has an IP address different from 10.244.0.1/16

It does not seem to like it when both Flannel and the Bridge use the same subnet, which makes sense. Maybe the idea is to remove the Bridge in favor of Flannel?

amimof commented Jan 8, 2019

Are you trying on a fresh install or have you changed the CIDR afterwards? Apparently that causes issues.

anton-johansson commented Jan 8, 2019

I'm doing fresh installations now, to make sure I get things right.

Here's my Flannel configuration:

---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        { 
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }

... where cni-conf.json is moved to /etc/cni/net.d/10-flannel.conflist by an init container.

I think I'm missing something regarding how the bridge plays together with Flannel (or other overlay networks).
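
One thing that may be relevant here: as far as I know, when several CNI configuration files sit in the same directory, the first one in lexicographic order by file name is the one that gets used. Treat that as an assumption to verify for your kubelet and container runtime. Under that assumption, the sketch below shows which of the two files mentioned in this thread would win. The helper name `first_cni_config` and the list of accepted extensions are illustrative.

```python
def first_cni_config(filenames):
    """Return the config file that sorts first by name, or None if there is none.

    Assumption: only .conf, .conflist and .json files are considered, and the
    first one in lexicographic order is selected.
    """
    candidates = [f for f in filenames if f.endswith((".conf", ".conflist", ".json"))]
    return min(candidates) if candidates else None


# "10-bridge..." sorts before "10-flannel..." because "b" < "f".
print(first_cni_config(["10-flannel.conflist", "10-bridge.conf"]))  # 10-bridge.conf
print(first_cni_config(["10-flannel.conflist"]))                    # 10-flannel.conflist
```

If that ordering applies, the bridge configuration would take precedence over Flannel's for as long as both files are present in /etc/cni/net.d.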

amimof commented Jan 8, 2019

You are right. The 10-bridge.conflist on the nodes essentially sets up pod networking, which the README clearly states this repo does not do :) Should the cni role only install CNI and not generate its configuration?

anton-johansson commented Jan 8, 2019

I think one of either:

  • Keep the repository's purpose clear and just install CNI, not its configuration (like you said)
  • Install CNI, and generate the bridge configuration unless a parameter is given. This way, it's very easy to get started with this repository (as long as you're using a single worker node).

I think the first one is better. But maybe we need examples instead, so people can get started more easily. The bridge configuration is a good example for single nodes. Maybe examples for DNS and Ingress are a good idea too? They wouldn't "ruin" the purpose of the repository.

About my issue: Is it enough to just remove the CNI configuration (10-bridge.conflist)? I removed it from roles/cni/tasks/main.yml, but I'm still getting the same error:

Failed create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "beeb5054209efb30daa09fd5895f49ddbc207027364b1f83d37d4ba626f10d13": failed to set bridge addr: "cni0" already has an IP address different from 10.244.0.1/24

I must be doing something wrong, though. I'm not sure where cni0 is coming from, if not from the bridge configuration.

anton-johansson commented Jan 8, 2019

Okay, some progress.

anton@k8s-worker-01:~# ifconfig
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.19.0.1  netmask 255.255.0.0  broadcast 0.0.0.0
        inet6 fe80::782e:22ff:fec2:3eed  prefixlen 64  scopeid 0x20<link>
        ether 0a:58:0a:13:00:01  txqueuelen 1000  (Ethernet)
        RX packets 22295  bytes 2180289 (2.1 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 28191  bytes 4354602 (4.3 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

        ...

The old bridge, cni0, is still there even after I clean my cluster using cleanup.yml. Rebooting the machines after cleaning up solved it. Now my pods get the correct IP addresses.
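
A rough sketch of the check that the error message implies. It assumes the bridge is expected to carry the first address of the node's subnet, which matches the 10.244.0.1 addresses in the errors quoted earlier in this thread. The helper names `expected_gateway` and `bridge_is_stale` are hypothetical.

```python
import ipaddress


def expected_gateway(subnet):
    """Return the first address of the subnet, with the subnet's prefix length."""
    net = ipaddress.ip_network(subnet)
    return ipaddress.ip_interface(f"{net.network_address + 1}/{net.prefixlen}")


def bridge_is_stale(current_addr, subnet):
    """True when the bridge still carries an address from a different setup."""
    return ipaddress.ip_interface(current_addr) != expected_gateway(subnet)


# cni0 still had 10.19.0.1 with netmask 255.255.0.0 (see the ifconfig output
# above), while the new configuration wanted 10.244.0.1/24.
print(expected_gateway("10.244.0.0/24"))                 # 10.244.0.1/24
print(bridge_is_stale("10.19.0.1/16", "10.244.0.0/24"))  # True
```

This only restates the comparison in the error; it does not remove the leftover interface, which in this thread was cleared by rebooting.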

But should we re-open this issue so we can fix the CNI configuration?

amimof commented Jan 8, 2019

Ah, nice find! Yes, go ahead

anton-johansson commented Jan 8, 2019

I can't 😂

amimof reopened this Jan 8, 2019

amimof commented Jan 8, 2019

Do you think that cluster_cidr should default to 10.244.0.0/16? If so, existing clusters would need to be re-installed, or alternatively their owners would need to add cluster_cidr=10.19.0.0/16 to their inventory.

anton-johansson commented Jan 8, 2019

I don't have an opinion on that really. For me, personally, it does not matter. I don't mind having cluster_cidr=10.244.0.0/16 in my inventory. I don't mind changing my flannel subnet to 10.19.0.0/16 either for that matter. :)

Is there any kind of standard or common CIDR that most people use that would be wise to default to?

amimof closed this in #29 Jan 8, 2019
