Alnair Installation Guide
Zhaobo edited this page Jan 16, 2023
1. Prepare two Linux nodes (at least one GPU node as worker) and install Docker Engine.
- In `/etc/docker/daemon.json`, add `{"exec-opts": ["native.cgroupdriver=systemd"]}`, then run `systemctl daemon-reload` and `systemctl restart docker`. This is necessary because kubelet's default cgroup driver is systemd, while Docker defaults to cgroupfs; kubelet cannot start when the two drivers differ.
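For a master node without GPUs, a minimal `/etc/docker/daemon.json` after this step could look like the following (the nvidia runtime entries shown further below are only needed on GPU workers):

```json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
```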
2. On the master node, install the 1.23 versions of kubeadm, kubelet, and kubectl, then create the cluster with kubeadm:

```shell
apt-mark unhold kubeadm && apt-get install --allow-downgrades kubeadm=1.23.0-00 && apt-mark hold kubeadm
apt-mark unhold kubelet && apt-get install --allow-downgrades kubelet=1.23.0-00 && apt-mark hold kubelet
apt-mark unhold kubectl && apt-get install --allow-downgrades kubectl=1.23.0-00 && apt-mark hold kubectl
kubeadm init --pod-network-cidr=10.244.0.0/16
```
- Pass `--pod-network-cidr` to `kubeadm init`; otherwise the Flannel installation later fails.
- After Kubernetes 1.24, you also need to install cri-dockerd. Docker Engine does not implement the CRI, which is a requirement for a container runtime to work with Kubernetes, so an additional service, cri-dockerd, has to be installed. cri-dockerd is based on the legacy built-in Docker Engine support that was removed from the kubelet in version 1.24.
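For Kubernetes 1.24 or later, the init call above would additionally point kubeadm at the cri-dockerd socket. A sketch, assuming cri-dockerd is installed as a service and listening on its default socket path:

```shell
# Kubernetes >= 1.24 with Docker Engine: tell kubeadm to use cri-dockerd
# instead of the (removed) built-in dockershim.
kubeadm init --pod-network-cidr=10.244.0.0/16 \
  --cri-socket unix:///var/run/cri-dockerd.sock
```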
3. On the master node, install the Flannel network plugin.
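One common way to install Flannel is to apply its manifest from the flannel repository (this assumes kubectl is already configured for the cluster; pinning a released manifest version rather than `master` is advisable):

```shell
# Flannel's default pod network is 10.244.0.0/16, matching the
# --pod-network-cidr passed to kubeadm init above.
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
```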
5. On the worker node, install nvidia-docker2 (`apt-get install nvidia-docker2`) and set the NVIDIA runtime as Docker's default runtime in `/etc/docker/daemon.json`:
```json
{
  "default-runtime": "nvidia",
  "exec-opts": ["native.cgroupdriver=systemd"],
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```
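After editing the file, Docker must be restarted for the new default runtime to take effect; a quick way to verify:

```shell
sudo systemctl daemon-reload
sudo systemctl restart docker
# Should report "Default Runtime: nvidia"
docker info | grep -i 'default runtime'
```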
6. Join the worker node to the cluster; the join command can be printed on the master node with `kubeadm token create --print-join-command`.
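The printed command is then run on the worker node. It has the following shape (the address, token, and hash below are placeholders, not real values):

```shell
# Run on the worker node; values come from the master's
# `kubeadm token create --print-join-command` output.
kubeadm join <master-ip>:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>
```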
7. On the master, install the NVIDIA device plugin. If kubectl has not been configured yet, first copy the admin kubeconfig so kubectl can reach the cluster:

```shell
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
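One common way to deploy the device plugin is to apply the static manifest from the NVIDIA/k8s-device-plugin repository (the version and manifest path below are assumptions; check the repository README for the current URL):

```shell
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.12.2/nvidia-device-plugin.yml
# Afterwards, GPUs should appear as an allocatable resource on the worker:
kubectl describe nodes | grep nvidia.com/gpu
```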