Multicluster-Scheduler

Multicluster-scheduler is a system of Kubernetes controllers—a scheduler and its agents—that intelligently schedules workloads across clusters. It differs from Federation v2 in several ways:

  • Multi-cluster workloads can be declared in any member cluster and/or in the scheduler's control plane (the Kubernetes API of the cluster hosting the scheduler), allowing greater architectural flexibility, e.g., you don't have to give users direct access to the scheduler's cluster.
  • The agents push observations to, and pull scheduling decisions from, the scheduler's control plane. The scheduler reconciles scheduling decisions with observations, but never calls the Kubernetes APIs of the member clusters.

Multicluster-scheduler implements a basic spread scheduler, which can be replaced by more advanced implementations leveraging pod, node, and node-pool observations. For example, Admiralty's hybrid and multicloud scheduler as a service, built upon multicluster-scheduler, supports strategies including burst-to-cloud and real-time arbitrage (cost optimization).

How it Works

Multicluster-scheduler includes custom resource definitions (CRDs) for:

  • node pools, which hold min/max node counts and pricing information;
  • multi-cluster workloads (e.g., multi-cluster deployments), which are multicluster-scheduler's user-facing API;
  • observations, images of observed node pools, nodes, pods, and multi-cluster workloads;
  • and decisions, images of desired single-cluster workloads.

Note: For now, multicluster-scheduler defines its own multi-cluster workload API, but we're considering integrations with Federation v2's API and/or Crossplane.

Node pools and multi-cluster workloads are defined in each member cluster, whereas observations and decisions are only defined in the scheduler's control plane.
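
For illustration, a node pool object could look roughly like the sketch below. The group/version and field names are assumptions made for the sake of the example (check the CRDs installed with the agent for the actual schema); the point is that min/max node counts and pricing live on the object:

apiVersion: multicluster.admiralty.io/v1alpha1 # assumed group/version
kind: NodePool
metadata:
  name: default
spec:
  minNodes: 1             # hypothetical field: minimum node count
  maxNodes: 10            # hypothetical field: maximum node count
  pricePerNodeHour: 0.10  # hypothetical field: pricing information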

The custom resources are controlled by two managers:

  • The agent, deployed in each member cluster, manages three controllers:
    • The node pool controller automatically creates node pool objects in the agent's cluster. In GKE and AKS, it uses the cloud.google.com/gke-nodepool or agentpool label, respectively; in the absence of those labels, a default node pool object is created. Min/max node counts and pricing information can be updated by the user, or controlled by other tools. Custom node pool objects can also be created using label selectors.
    • The observations controller, a multi-cluster controller, watches node pools, nodes, pods, and multi-cluster workloads in the agent's cluster and reconciles corresponding observations in the scheduler's control plane.
    • The decisions controller, another multi-cluster controller, watches scheduling decisions in the scheduler's control plane and reconciles corresponding single-cluster workloads in the agent's cluster.
  • The scheduler, which can be deployed in any cluster, manages an eponymous controller that watches observations and makes scheduling decisions in the scheduler's cluster. For example, it creates one or several deployment decisions from a multi-cluster deployment observation, using node pool, node, and pod observations to inform those decisions. The scheduler doesn't push anything to the member clusters.

Getting Started

We assume that you are a cluster admin for two clusters, associated with, e.g., the contexts "cluster1" and "cluster2" in your kubeconfig. We're going to install a basic scheduler in cluster1 and agents in cluster1 and cluster2, but the scripts can easily be adapted for other configurations. Then, we will deploy a multi-cluster NGINX.

CLUSTER1=cluster1 # change me
CLUSTER2=cluster2 # change me

Note: with Admiralty's hybrid and multicloud scheduler as a service, which is basically a managed scheduler with advanced features, you only need to install the agent and a Secret. To install the full multicluster-scheduler, please read on.

Installation

Scheduler

Choose a cluster to host the scheduler, download the basic scheduler's manifest and install it:

SCHEDULER_CLUSTER_NAME="$CLUSTER1"
RELEASE_URL=https://github.com/admiraltyio/multicluster-scheduler/releases/download/v0.1.0
kubectl config use-context "$SCHEDULER_CLUSTER_NAME"
kubectl apply -f "$RELEASE_URL/scheduler.yaml"
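
Before moving on, you can check that the scheduler pods are running. The manifest's namespace isn't spelled out here, so a simple grep across namespaces does the job:

kubectl get pods --all-namespaces | grep scheduler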

Federation

In the same cluster as the scheduler, create a namespace for each cluster federation and, in it, a service account and role binding for each member cluster. The scheduler's cluster can be a member too.

FEDERATION_NAMESPACE=foo
kubectl create namespace "$FEDERATION_NAMESPACE"
MEMBER_CLUSTER_NAMES=("$CLUSTER1" "$CLUSTER2") # Add as many member clusters as you want.
for CLUSTER_NAME in "${MEMBER_CLUSTER_NAMES[@]}"; do
    kubectl create serviceaccount "$CLUSTER_NAME" \
        --namespace "$FEDERATION_NAMESPACE"
    kubectl create rolebinding "$CLUSTER_NAME" \
        --namespace "$FEDERATION_NAMESPACE" \
        --serviceaccount "$FEDERATION_NAMESPACE:$CLUSTER_NAME" \
        --clusterrole multicluster-scheduler-member
done
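
Optionally, verify that a service account and role binding now exist for each member cluster:

kubectl --namespace "$FEDERATION_NAMESPACE" get serviceaccounts
kubectl --namespace "$FEDERATION_NAMESPACE" get rolebindings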

Multicluster-Service-Account

If you're already running multicluster-service-account in each member cluster, and the scheduler's cluster is known to them as $SCHEDULER_CLUSTER_NAME, you can skip this step and install the agents. Otherwise, read on.

Download the multicluster-service-account manifest and install it in each member cluster:

MCSA_RELEASE_URL=https://github.com/admiraltyio/multicluster-service-account/releases/download/v0.2.0
for CLUSTER_NAME in "${MEMBER_CLUSTER_NAMES[@]}"; do
    kubectl --context "$CLUSTER_NAME" apply -f "$MCSA_RELEASE_URL/install.yaml"
done

Then, download the kubemcsa binary and run the bootstrap command to allow member clusters to import service accounts from the scheduler's cluster:

OS=linux # or darwin (i.e., OS X) or windows
ARCH=amd64 # if you're on a different platform, you must know how to build from source
curl -Lo kubemcsa "$MCSA_RELEASE_URL/kubemcsa-$OS-$ARCH"
chmod +x kubemcsa
sudo mv kubemcsa /usr/local/bin

for CLUSTER_NAME in "${MEMBER_CLUSTER_NAMES[@]}"; do
    kubemcsa bootstrap "$CLUSTER_NAME" "$SCHEDULER_CLUSTER_NAME"
done

Agent

Download the agent manifest and install it in each member cluster:

curl -LO "$RELEASE_URL/agent.yaml"
for CLUSTER_NAME in "${MEMBER_CLUSTER_NAMES[@]}"; do
    sed -e "s/SCHEDULER_CLUSTER_NAME/$SCHEDULER_CLUSTER_NAME/g" \
        -e "s/FEDERATION_NAMESPACE/$FEDERATION_NAMESPACE/g" \
        -e "s/CLUSTER_NAME/$CLUSTER_NAME/g" \
        agent.yaml > "agent-$CLUSTER_NAME.yaml"
    kubectl --context "$CLUSTER_NAME" apply -f "agent-$CLUSTER_NAME.yaml"
done

Check that node pool objects have been created in the agents' clusters and observations appear in the scheduler's control plane:

for CLUSTER_NAME in "${MEMBER_CLUSTER_NAMES[@]}"; do
    kubectl --context "$CLUSTER_NAME" get nodepools
done
kubectl --namespace "$FEDERATION_NAMESPACE" get nodepoolobservations
kubectl --namespace "$FEDERATION_NAMESPACE" get nodeobservations
kubectl --namespace "$FEDERATION_NAMESPACE" get podobservations
# or
kubectl --namespace "$FEDERATION_NAMESPACE" get observations # by category

Example

Deploy NGINX as a MulticlusterDeployment in any of the member clusters, e.g., cluster2:

cat <<EOF | kubectl --context "$CLUSTER2" apply -f -
apiVersion: multicluster.admiralty.io/v1alpha1
kind: MulticlusterDeployment
metadata:
  name: nginx
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
EOF

Check that an observation of the multi-cluster deployment has been created in the scheduler's control plane, as well as one deployment decision for each member cluster, hence one deployment in each member cluster:

kubectl --namespace "$FEDERATION_NAMESPACE" get multiclusterdeploymentobservations
kubectl --namespace "$FEDERATION_NAMESPACE" get deploymentdecisions
for CLUSTER_NAME in "${MEMBER_CLUSTER_NAMES[@]}"; do
    kubectl --context "$CLUSTER_NAME" get deployments
done
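
With the basic spread scheduler, the 5 requested replicas should be split across the member clusters (e.g., 3 and 2 with two clusters; the exact split is up to the scheduler). For example, you can compare the per-cluster replica counts:

for CLUSTER_NAME in "${MEMBER_CLUSTER_NAMES[@]}"; do
    kubectl --context "$CLUSTER_NAME" get deployment nginx -o jsonpath='{.spec.replicas}{"\n"}'
done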

Bring Your Own Scheduler

You can easily implement your own multi-cluster scheduler using the Scheduler interface. For now, only deployments are supported. Here's a basic manager scaffolding, where the custom scheduler struct, which implements the Scheduler interface, is passed as an argument to the controller:

package main

import (
  "log"

  "admiralty.io/multicluster-controller/pkg/cluster"
  "admiralty.io/multicluster-controller/pkg/manager"
  "admiralty.io/multicluster-scheduler/pkg/apis/multicluster/v1alpha1"
  "admiralty.io/multicluster-scheduler/pkg/controllers/schedule"
  "admiralty.io/multicluster-service-account/pkg/config"
  appsv1 "k8s.io/api/apps/v1"
  corev1 "k8s.io/api/core/v1"
  _ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
  "k8s.io/sample-controller/pkg/signals"
)

func main() {
  cfg, _, err := config.ConfigAndNamespace()
  if err != nil {
    log.Fatalf("cannot load config: %v", err)
  }
  cl := cluster.New("", cfg, cluster.Options{})

  co, err := schedule.NewController(cl, &scheduler{})
  if err != nil {
    log.Fatalf("cannot create scheduler controller: %v", err)
  }

  m := manager.New()
  m.AddController(co)

  if err := m.Start(signals.SetupSignalHandler()); err != nil {
    log.Fatalf("while or after starting manager: %v", err)
  }
}

type scheduler struct {
  // ... e.g., structured observations
}

func (s *scheduler) SetNodePool(np *v1alpha1.NodePool) {
  // ... e.g., store to inform scheduling decisions
  // Note: the controller has added a ClusterName to the object's metadata.
}

func (s *scheduler) SetNode(n *corev1.Node) {
  // ... e.g., store to inform scheduling decisions
  // Note: the controller has added a ClusterName to the object's metadata.
}

func (s *scheduler) SetPod(p *corev1.Pod) {
  // ... e.g., store to inform scheduling decisions
  // Note: the controller has added a ClusterName to the object's metadata.
}

func (s *scheduler) Schedule(mcd *v1alpha1.MulticlusterDeployment) ([]*appsv1.Deployment, error) {
  // ... Given a MulticlusterDeployment, produce a slice of Deployments,
  // where the ClusterName in each object's metadata MUST be provided.
}
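
As a concrete (if naive) starting point, here is a minimal sketch of a spread strategy filling in the Schedule stub above. It assumes the scheduler struct keeps track of the cluster names it has observed (e.g., via SetNodePool) and that MulticlusterDeployment's spec mirrors appsv1.DeploymentSpec; both are assumptions for illustration, not part of the documented API:

// Requires importing "fmt" and metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
// in addition to the imports above.
func (s *scheduler) Schedule(mcd *v1alpha1.MulticlusterDeployment) ([]*appsv1.Deployment, error) {
  clusters := s.observedClusterNames() // hypothetical helper over data stored by SetNodePool et al.
  if len(clusters) == 0 {
    return nil, fmt.Errorf("no member clusters observed yet")
  }
  total := int32(1)
  if mcd.Spec.Replicas != nil { // assumes the spec mirrors appsv1.DeploymentSpec
    total = *mcd.Spec.Replicas
  }
  var ds []*appsv1.Deployment
  for i, c := range clusters {
    // Spread replicas as evenly as possible across clusters.
    n := total / int32(len(clusters))
    if int32(i) < total%int32(len(clusters)) {
      n++ // distribute the remainder one replica at a time
    }
    ds = append(ds, &appsv1.Deployment{
      ObjectMeta: metav1.ObjectMeta{
        Namespace:   mcd.Namespace,
        Name:        mcd.Name,
        ClusterName: c, // tells the decisions controller which cluster this deployment targets
      },
      Spec: appsv1.DeploymentSpec{
        Replicas: &n,
        Selector: mcd.Spec.Selector,
        Template: mcd.Spec.Template,
      },
    })
  }
  return ds, nil
}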

API Reference

https://godoc.org/admiralty.io/multicluster-scheduler/

or

go get admiralty.io/multicluster-scheduler
godoc -http=:6060

then http://localhost:6060/pkg/admiralty.io/multicluster-scheduler/