
Kube-Scheduler Support for managing node's available IP addresses #26

Closed
liwenwu-amazon opened this issue Jan 29, 2018 · 15 comments
Labels: feature request, stale



liwenwu-amazon commented Jan 29, 2018

Kube-Scheduler Support for managing node's available IPv4 addresses

Problem

As described in issue #18, kube-scheduler is not aware of the number of available IPv4 addresses on a node. Kube-Scheduler can schedule a Pod to run on a node even after the node has exhausted all of its IPv4 addresses.

Proposal

Here we propose to use Kubernetes extended resources (supported since Kubernetes 1.8) to enable kube-scheduler support for managing a node's available IPv4 addresses.

  • We will define vpc.amazonaws.com/ipv4 as a node-level extended resource
  • A pod which is not using hostNetwork mode needs to specify the following in one of its containers:
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: myimage
    resources:
      requests:
        vpc.amazonaws.com/ipv4: 1
      limits:
        vpc.amazonaws.com/ipv4: 1

Solution

eni-ip-controller is a new component which runs in a Kubernetes cluster. It watches Kubernetes node resources. When a new node joins the cluster, eni-ip-controller updates the API server with the node's vpc.amazonaws.com/ipv4 capacity (the number of available IPv4 addresses on the new node). Here is the workflow:

[workflow diagram: eni-ip-controller]

Here is an HTTP request that advertises 15 "vpc.amazonaws.com/ipv4" resources on node k8s-node-1

curl --header "Content-Type: application/json-patch+json" \
--request PATCH \
--data '[{"op": "add", "path": "/status/capacity/vpc.amazonaws.com~1ipv4", "value": "15"}]' \
http://k8s-master:8080/api/v1/nodes/k8s-node-1/status 
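
The same patch can also be sent through kubectl proxy when the API server is not reachable directly (a sketch following the upstream extended-resources documentation; the node name and address count are examples):

kubectl proxy &   # open a local proxy to the API server on its default port 8001
curl --header "Content-Type: application/json-patch+json" \
--request PATCH \
--data '[{"op": "add", "path": "/status/capacity/vpc.amazonaws.com~1ipv4", "value": "15"}]' \
http://localhost:8001/api/v1/nodes/k8s-node-1/status

Note that "~1" is the JSON-Patch escape for "/", so the path above refers to status.capacity["vpc.amazonaws.com/ipv4"].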

Pod Admission Control

Since "vpc.amazonaws.com/ipv4" is NOT a standard resource like "cpu" or "memory", if a Pod does NOT have "vpc.amazonaws.com/ipv4" specified, the Pod can consume an ENI IPv4 resource on a node without accounted in the scheduler.

Here are a few options to solve this:

Using Taint

Use taints so that a pod which does NOT have a "vpc.amazonaws.com/ipv4" resource limit specified will be evicted or not scheduled. This is accomplished by:

  • tainting all nodes in the cluster with vpc-ipv4=true
kubectl taint nodes <node-name> vpc-ipv4=true:NoSchedule
kubectl taint nodes <node-name> vpc-ipv4=true:NoExecute
  • specifying the following tolerations in all Pods:
tolerations:
- key: "vpc-ipv4"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
- key: "vpc-ipv4"
  operator: "Equal"
  value: "true"
  effect: "NoExecute"
  
Using ExtendedResourceToleration

Kubernetes 1.9 introduces the ExtendedResourceToleration admission controller, which can automatically add tolerations for such taints.
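
If this route is taken, the admission controller has to be enabled on the API server, and the taint key becomes the extended resource name itself, since ExtendedResourceToleration only adds tolerations for taints whose keys match extended resources requested by the pod. A rough sketch, assuming the vpc.amazonaws.com/ipv4 resource from above:

# enable the admission controller (the flag is --admission-control on 1.9, --enable-admission-plugins on 1.10+)
kube-apiserver --enable-admission-plugins=...,ExtendedResourceToleration ...
# taint nodes using the extended resource name as the taint key
kubectl taint nodes <node-name> vpc.amazonaws.com/ipv4=true:NoSchedule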

Using Kubernetes Admission Control (initializer)

A Kubernetes Initializer can be used to always inject a "vpc.amazonaws.com/ipv4" resource request into a pod spec. Here is the workflow:

[workflow diagram: eni-ip initializer]
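
For illustration only, an InitializerConfiguration registering such an initializer might look roughly like the following (the Initializers feature was alpha at the time and was later removed in favor of mutating admission webhooks; the initializer name below is hypothetical):

apiVersion: admissionregistration.k8s.io/v1alpha1
kind: InitializerConfiguration
metadata:
  name: vpc-ipv4
initializers:
  # hypothetical initializer served by eni-ip-controller; it would add the
  # vpc.amazonaws.com/ipv4 request and limit to every pod it initializes
  - name: ipv4.vpc.amazonaws.com
    rules:
      - apiGroups: ["*"]
        apiVersions: ["*"]
        resources: ["pods"]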

Alternative Solutions Considered:

Using a Kubernetes Custom Scheduler

Kubernetes custom schedulers allow us to build a new scheduler that schedules all pods in our cluster.

  • Pros: our custom scheduler has full control over how to schedule pods onto nodes based on the number of available IPv4 addresses
  • Cons: our custom scheduler would also have to reimplement all other existing scheduler features, and it would not benefit from future enhancements to the default scheduler.

Tainting the node as unschedulable when running out of IPv4 addresses

When L-IPAM runs out of IPv4 addresses, it can taint its own node as unschedulable.

  • Pros: L-IPAM runs on the node and has direct knowledge of when it runs out of IPv4 addresses
  • Cons: there is a race condition where the scheduler could still schedule a pod onto a node before the node's L-IPAM finishes tainting it as unschedulable.

Using a scheduler extender

The Kubernetes scheduler extender mechanism allows us to implement a separate "scheduler extender" process that the standard Kubernetes scheduler calls out to as a final pass when making scheduling decisions (a configuration sketch follows the pros and cons below).

  • Pros: full control over whether to schedule a pod onto a node based on the number of available IPv4 addresses
  • Cons: it is a performance hit, since the scheduler must call out twice to external webhooks when scheduling each pod. Also, this extender MUST manage and perform its own accounting of the number of available IPv4 addresses on each node.
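
For reference, the extender is wired into the default scheduler through its policy configuration; a minimal sketch (the extender service URL and path are hypothetical) might look like:

{
  "kind": "Policy",
  "apiVersion": "v1",
  "extenders": [
    {
      "urlPrefix": "http://eni-ip-extender.kube-system.svc:8080/scheduler",
      "filterVerb": "filter",
      "enableHttps": false,
      "nodeCacheCapable": false
    }
  ]
}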
@bsalamat

Extended resources are already supported by the scheduler. Since you plan to use extended resources, no changes in the scheduler are needed.

@liwenwu-amazon

Thank you Bobby! Is there any plan in the future to allow extended resources to be specified at the Pod level?

@bsalamat

It is already possible for pods (containers) to request extended resources. Please see this: https://kubernetes.io/docs/tasks/configure-pod-container/extended-resource/


vishh commented Feb 15, 2018

It is not currently possible to specify pod-level resources. What you can do is build a scheduler extender which will count a single IP per pod even if the pod itself doesn't request that resource. This becomes an extended predicate (and possibly priority) function. You can expose the number of IPs per node by running a separate controller that patches node objects.

@Huang-Wei

@liwenwu-amazon one question here: is the latest EKS still using the "k8s extended resource" design to ensure a pod can be scheduled properly and get a valid VPC IP?

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: myimage
    resources:
      requests:
        vpc.amazonaws.com/ipv4: 1
      limits:
        vpc.amazonaws.com/ipv4: 1

If so, from the user's perspective, does the user still need to specify vpc.amazonaws.com/ipv4 in the YAML spec, or does EKS already make this transparent to the end user?


ofiliz commented Jun 2, 2018

@Huang-Wei Not yet. When we add that feature, we'll also make it transparent so that you won't need to request the resource in your spec.

Currently kubelet advertises pod capacity to match the IPv4 private address capacity. Either way, you don't need to do anything.


labria commented Oct 21, 2018

@ofiliz

Currently kubelet advertises pod capacity to match the IPv4 private address capacity

Can you please link me to docs/code about that? It seems that this is not true in my cluster, and I want to try to figure out why.


deiwin commented Oct 22, 2018

The EKS AMI bootstrap script sets kubelet's --max-pods argument. The exact value is based on the instance type used.
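
For example, an m5.large supports 3 ENIs with 10 IPv4 addresses each, and the bootstrap script derives the limit roughly as ENIs * (addresses per ENI - 1) + 2 (a sketch; the authoritative values come from the eni-max-pods table shipped with the EKS AMI):

# m5.large: 3 ENIs * (10 IPv4 addresses - 1 primary per ENI) + 2 = 29
kubelet ... --max-pods=29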


labria commented Oct 22, 2018

@deiwin I see, thank you. I'm using a kops-created cluster, not EKS, so not using those images.

@gabegorelick

Any progress on implementing this?


mogren commented Aug 16, 2019

Closing in favor of aws/containers-roadmap#398


github-actions bot commented Sep 2, 2023

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions bot added the stale label Sep 2, 2023
jdn5126 removed the stale label Sep 12, 2023
@cskinfill

I would love to see this feature.


This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions bot added the stale label Dec 31, 2023

Issue closed due to inactivity.

github-actions bot closed this as not planned Jan 15, 2024