Name:               416a100
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=k3s
                    beta.kubernetes.io/os=linux
                    gpu=on
                    k3s.io/hostname=416a100
                    k3s.io/internal-ip=192.168.2.145
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=416a100
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=true
                    node.kubernetes.io/instance-type=k3s
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"a2:0a:a5:6d:d7:7e"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.2.145
                    hami.io/mutex.lock: 2024-05-13T13:04:17Z
                    hami.io/node-handshake: Requesting_2024.05.20 11:44:46
                    hami.io/node-nvidia-register:
                      GPU-b7c4eb59-dd76-ca5a-8482-56fd796b0a75,10,40960,100,NVIDIA-NVIDIA A100-PCIE-40GB,0,true:GPU-ec7d894f-bb24-dc73-1adb-17806ec68749,10,4096...
                    k3s.io/node-args: ["server","--docker"]
                    k3s.io/node-config-hash: 6DWNFWQMIJPJNOOSYKGNXGNN7DPG53Z77PAPIQ56XNVT2UPS3TFA====
                    k3s.io/node-env: {"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/cba07c8500bccabd42d9215a6af6b01181cb6ca5755d12ae1e4e02b27b50bafa"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 09 May 2024 15:48:32 +0800
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  416a100
  AcquireTime:     <unset>
  RenewTime:       Tue, 21 May 2024 11:41:41 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  NetworkUnavailable   False   Tue, 21 May 2024 00:17:41 +0800   Tue, 21 May 2024 00:17:41 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Tue, 21 May 2024 11:41:24 +0800   Thu, 09 May 2024 15:48:31 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Tue, 21 May 2024 11:41:24 +0800   Thu, 09 May 2024 15:48:31 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Tue, 21 May 2024 11:41:24 +0800   Thu, 09 May 2024 15:48:31 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Tue, 21 May 2024 11:41:24 +0800   Tue, 21 May 2024 00:17:52 +0800   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  192.168.2.145
  Hostname:    416a100
Capacity:
  cpu:                80
  ephemeral-storage:  459819088Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             263739228Ki
  nvidia.com/gpu:     0
  pods:               110
Allocatable:
  cpu:                80
  ephemeral-storage:  447312008456
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             263739228Ki
  nvidia.com/gpu:     0
  pods:               110
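For reference, the `hami.io/node-nvidia-register` annotation in the node description packs one GPU per colon-separated entry, and the node carries it even though `nvidia.com/gpu` is reported as 0. Below is a minimal Python sketch for decoding it. The field names (UUID, split count, memory in MiB, core percentage, model, NUMA node, health flag) are assumptions based on how the first entry reads, not something the output itself confirms, and `RegisteredGpu` and `parse_register_annotation` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegisteredGpu:
    # Field names are assumptions; the annotation itself carries no labels.
    uuid: str
    split_count: int
    memory_mib: int
    core_percent: int
    model: str
    numa_node: int
    healthy: bool


def parse_register_annotation(value: str) -> List[RegisteredGpu]:
    """Split the annotation into per-GPU entries and decode each one."""
    gpus = []
    for entry in value.split(":"):
        fields = entry.strip().split(",")
        if len(fields) != 7:
            # Skip the empty trailing piece and any entry cut off by "..."
            continue
        uuid, count, mem, core, model, numa, health = fields
        gpus.append(
            RegisteredGpu(uuid, int(count), int(mem), int(core),
                          model, int(numa), health == "true")
        )
    return gpus


# First entry copied from the annotation; the second entry is truncated in the
# pasted output and is deliberately left out rather than guessed.
sample = (
    "GPU-b7c4eb59-dd76-ca5a-8482-56fd796b0a75,10,40960,100,"
    "NVIDIA-NVIDIA A100-PCIE-40GB,0,true:"
)

for gpu in parse_register_annotation(sample):
    status = "healthy" if gpu.healthy else "unhealthy"
    print(gpu.uuid, gpu.model, f"{gpu.memory_mib} MiB", status)
```

Running this against the full, untruncated annotation value (for example, taken from `kubectl get node 416a100 -o yaml`) should show whether all four cards were registered by HAMi, which helps separate a registration problem from a capacity-reporting problem.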
2. nvidia-smi
nvidia-smi
Tue May 21 11:44:30 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-PCIE-40GB          Off |   00000000:36:00.0 Off |                    0 |
| N/A   42C    P0             46W /  250W |      13MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          Off |   00000000:37:00.0 Off |                    0 |
| N/A   42C    P0             45W /  250W |      13MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA RTX A6000               Off |   00000000:9D:00.0 Off |                  Off |
| 30%   37C    P8             22W /  300W |      14MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA RTX A6000               Off |   00000000:9E:00.0 Off |                  Off |
| 30%   36C    P8             28W /  300W |      14MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2480      G   /usr/lib/xorg/Xorg                              4MiB |
|    1   N/A  N/A      2480      G   /usr/lib/xorg/Xorg                              4MiB |
|    2   N/A  N/A      2480      G   /usr/lib/xorg/Xorg                              4MiB |
|    3   N/A  N/A      2480      G   /usr/lib/xorg/Xorg                              4MiB |
3. sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Tue May 21 03:44:52 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-PCIE-40GB          Off |   00000000:36:00.0 Off |                    0 |
| N/A   42C    P0             46W /  250W |      13MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          Off |   00000000:37:00.0 Off |                    0 |
| N/A   42C    P0             45W /  250W |      13MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA RTX A6000               Off |   00000000:9D:00.0 Off |                  Off |
| 30%   37C    P8             22W /  300W |      14MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA RTX A6000               Off |   00000000:9E:00.0 Off |                  Off |
| 30%   35C    P8             27W /  300W |      14MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
4. HAMi pods:
kubectl get pods -n kube-system
NAME                                      READY   STATUS      RESTARTS   AGE
helm-install-traefik-nzsh4                0/1     Completed   0          11d
svclb-traefik-cpwxf                       2/2     Running     40         11d
metrics-server-7b4f8b595-5kn69            1/1     Running     21         11d
local-path-provisioner-64d457c485-nccpm   1/1     Running     20         11d
coredns-5d69dc75db-q7rxn                  1/1     Running     20         11d
traefik-5dd496474-rxmr2                   1/1     Running     20         11d
nvidia-device-plugin-daemonset-jg762      1/1     Running     0          5m50s
hami-device-plugin-nv5gs                  2/2     Running     0          4m43s
hami-scheduler-757847d79f-n7dbf           2/2     Running     0          4m43s