On a single machine with two GPUs, the scheduler shows pods bound to different GPUs, but in practice they are all scheduled onto one GPU #184

Open
1003111014 opened this issue Aug 25, 2022 · 1 comment


1003111014 commented Aug 25, 2022

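The scheduler log screenshots show that allocations were made on both GPUs, but checking GPU memory with nvidia-smi shows that everything was actually scheduled onto the first GPU.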
screenshot-20220825-100950
screenshot-20220825-101009
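The nvidia-smi check described above can be scripted so that the scheduler's view and the driver's view are compared side by side. The following is a minimal sketch, not part of the original report: it assumes the output comes from `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits`, that the scheduler's figures are in MiB, and it uses made-up sample readings (`SAMPLE_OUTPUT`), since the screenshots are not transcribed here. The function names and the idle threshold are hypothetical.

```python
import csv
import io


def parse_memory_used(csv_text):
    """Parse `index, memory.used` rows (MiB) into {gpu_index: used_mib}."""
    used = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines
            continue
        index, mem = (field.strip() for field in row)
        used[int(index)] = int(mem)
    return used


def allocated_but_idle(scheduler_used, actual_used, idle_threshold_mib=50):
    """Return GPU indices the scheduler counts as in use but the driver reports as idle.

    idle_threshold_mib is an arbitrary cut-off chosen for this sketch.
    """
    return [
        idx
        for idx, allocated in sorted(scheduler_used.items())
        if allocated > 0 and actual_used.get(idx, 0) < idle_threshold_mib
    ]


# Scheduler bookkeeping as printed in this issue's log: getUsedGPUs: map[0:1000 1:4512]
SCHEDULER_USED = {0: 1000, 1: 4512}

# Hypothetical nvidia-smi readings, for illustration only.
SAMPLE_OUTPUT = "0, 5400\n1, 3\n"

if __name__ == "__main__":
    actual = parse_memory_used(SAMPLE_OUTPUT)
    print("GPUs allocated by the scheduler but idle:", allocated_but_idle(SCHEDULER_USED, actual))
```

With real readings substituted for the sample string, a non-empty result would correspond to the symptom reported here: memory booked on a GPU that no container is actually using.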

screenshot-20220825-100303
getUsedGPUs: map[0:1000 1:4512] in node worker2, and devs map[1:0xc420629b00 0:0xc420629ae0]
[ info ] 2022/08/25 01:57:36 nodeinfo.go:431: try to find unhealthy node unhealthy-gpu-worker2
[ info ] 2022/08/25 01:57:36 nodeinfo.go:397: available GPU list map[0:6979 1:3467] before removing unhealty GPUs
[ info ] 2022/08/25 01:57:36 nodeinfo.go:402: available GPU list map[0:6979 1:3467] after removing unhealty GPUs
[ debug ] 2022/08/25 01:57:36 nodeinfo.go:162: AvailableGPUs: map[0:6979 1:3467] in node worker2
[ info ] 2022/08/25 01:57:36 gpushare-predicate.go:31: The pod p-0011eac6-1f84-11ed-b00d-ee11b204c376-78dfc9cd48-npk9b in the namespace ai-model can be scheduled on worker2
[ info ] 2022/08/25 01:57:36 routes.go:93: gpusharingfilter extenderFilterResult = {"Nodes":null,"NodeNames":["worker2"],"FailedNodes":{},"Error":""}
[ debug ] 2022/08/25 01:57:36 routes.go:162: /gpushare-scheduler/filter response=&{0xc4203820a0 0xc420402300 0xc420e45b80 0x565cc0 true false false false 0xc420e45d00 {0xc4202fe000 map[Content-Type:[application/json]] false false} map[Content-Type:[application/json]] true 66 -1 200 false false [] 0 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 0 0 0] [0 0 0] 0xc42037ad90 0}
[ debug ] 2022/08/25 01:57:36 routes.go:160: /gpushare-scheduler/bind request body = &{0xc420e78b20 <nil> <nil> false true {0 0} false false false 0x69c120}
[ debug ] 2022/08/25 01:57:36 routes.go:121: gpusharingBind ExtenderArgs ={p-0011eac6-1f84-11ed-b00d-ee11b204c376-78dfc9cd48-npk9b ai-model cb2866b3-bb73-47ca-898e-1652d41d79c0 worker2}
[ info ] 2022/08/25 01:57:36 cache.go:160: GetNodeInfo() uses the existing nodeInfo for worker2
[ debug ] 2022/08/25 01:57:36 cache.go:162: node worker2 with devices map[0:0xc420629ae0 1:0xc420629b00]
[ info ] 2022/08/25 01:57:36 nodeinfo.go:184: Allocate() ----Begin to allocate GPU for gpu mem for pod p-0011eac6-1f84-11ed-b00d-ee11b204c376-78dfc9cd48-npk9b in ns ai-model----
[ info ] 2022/08/25 01:57:36 
nodeinfo.go:423: getAllGPUs: map[0:7979 1:7979] in node worker2, and dev map[0:0xc420629ae0 1:0xc420629b00] [ debug ] 2022/08/25 01:57:36 deviceinfo.go:42: GetUsedGPUMemory() podMap map[7c8baa1c-38d3-426c-b613-6673ef6cdd03:&Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:p-f407d20e-239d-11ed-8a9a-b2a44a4d350f-8556669ddf-w9crj,GenerateName:p-f407d20e-239d-11ed-8a9a-b2a44a4d350f-8556669ddf-,Namespace:ai-model,SelfLink:,UID:7c8baa1c-38d3-426c-b613-6673ef6cdd03,ResourceVersion:87201592,Generation:0,CreationTimestamp:2022-08-24 11:43:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{app: f407d20e-239d-11ed-8a9a-b2a44a4d350f,pod-template-hash: 8556669ddf,},Annotations:map[string]string{ALIYUN_COM_GPU_MEM_ASSIGNED: true,ALIYUN_COM_GPU_MEM_ASSUME_TIME: 1661341380164154462,ALIYUN_COM_GPU_MEM_DEV: 7979,ALIYUN_COM_GPU_MEM_IDX: 0,ALIYUN_COM_GPU_MEM_POD: 1000,cni.projectcalico.org/podIP: 10.42.1.242/32,cni.projectcalico.org/podIPs: 10.42.1.242/32,},OwnerReferences:[{apps/v1 ReplicaSet p-f407d20e-239d-11ed-8a9a-b2a44a4d350f-8556669ddf 2c580ca3-1b79-46f3-a869-15dd1fedc74c 0xc4206ba7a0 0xc4206ba7a1}],Finalizers:[],ClusterName:,Initializers:nil,},Spec:PodSpec{Volumes:[{default-token-qg7xz {nil nil nil nil nil SecretVolumeSource{SecretName:default-token-qg7xz,Items:[],DefaultMode:*420,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}],Containers:[{c-f407d20e-239d-11ed-8a9a-b2a44a4d350f reg.tx.com/ai/offline_function_tensorrt/clamp:amd-2.6 [] [] [{port-7070 0 7070 TCP }] [] [{MINIO_ENDPOINT minio.minio:9000 nil} {MINIO_ACCESS_KEY AKIAIOSFODNN7EXAMGTFXXX nil} {MINIO_SECRET_KEY wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAXXXX nil} {MINIO_BUCKET_NAME aibucket nil} {MINIO_SECURE False nil} {DATA_CALLBACK_URL http://ai-service.lis-test:8000/ai/model/process/data nil} {NVIDIA_VISIBLE_DEVICES all nil}] {map[aliyun.com/gpu-mem:{{1 3} {<nil>} 1k DecimalSI}] map[aliyun.com/gpu-mem:{{1 3} 
{<nil>} 1k DecimalSI}]} [{default-token-qg7xz true /var/run/secrets/kubernetes.io/serviceaccount <n il>}] [] nil nil nil /dev/termination-log File IfNotPresent nil false false false}],RestartPolicy:Always,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:worker2,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[],},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[{node.kubernetes.io/not-ready Exists NoExecute 0xc4206ba7b0} {node.kubernetes.io/unreachable Exists NoExecute 0xc4206ba7b8}],HostAliases:[],PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[],},Status:PodStatus{Phase:Pending,Conditions:[{Initialized True 0001-01-01 00:00:00 +0000 UTC 2022-08-24 11:43:00 +0000 UTC } {Ready False 0001-01-01 00:00:00 +0000 UTC 2022-08-24 11:43:00 +0000 UTC ContainersNotReady containers with unready status: [c-f407d20e-239d-11ed-8a9a-b2a44a4d350f]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2022-08-24 11:43:00 +0000 UTC ContainersNotReady containers with unready status: [c-f407d20e-239d-11ed-8a9a-b2a44a4d350f]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2022-08-24 11:43:00 +0000 UTC }],Message:,Reason:,HostIP:192.168.3.15,PodIP:,StartTime:2022-08-24 11:43:00 +0000 UTC,ContainerStatuses:[{c-f407d20e-239d-11ed-8a9a-b2a44a4d350f {ContainerStateWaiting{Reason:ContainerCreating,Message:,} nil nil} {nil nil nil} false 0 reg.tx.com/ai/offline_function_tensorrt/clamp:amd-2.6 }],QOSClass:BestEffort,InitContainerStatuses:[],NominatedNodeName:,},}], and its address is 0xc420629ae0 [ debug ] 2022/08/25 01:57:36 pod.go:107: pod 
p-f407d20e-239d-11ed-8a9a-b2a44a4d350f-8556669ddf-w9crj in ns ai-model with status Pending has GPU Mem 1000 [ debug ] 2022/08/25 01:57:36 deviceinfo.go:42: GetUsedGPUMemory() podMap map[2c38f29c-ecb0-4991-99e5-f6daf12e8e43:&Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:p-3bdd5ad6-1d54-11ed-bf40-dabc71e5c302-669f6bd48-bhvkr,GenerateName:p-3bdd5ad6-1d54-11ed-bf40-dabc71e5c302-669f6bd48-,Namespace:ai-model,SelfLink:,UID:2c38f29c-ecb0-4991-99e5-f6daf12e8e43,ResourceVersion:87185439,Generation:0,CreationTimestamp:2022-08-24 11:08:30 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{app: 3bdd5ad6-1d54-11ed-bf40-dabc71e5c302,pod-template-hash: 669f6bd48,},Annotations:map[string]string{ALIYUN_COM_GPU_MEM_ASSIGNED: true,ALIYUN_COM_GPU_MEM_ASSUME_TIME: 1661339310868533391,ALIYUN_COM_GPU_MEM_DEV: 7979,ALIYUN_COM_GPU_MEM_IDX: 1,ALIYUN_COM_GPU_MEM_POD: 1000,cni.projectcalico.org/podIP: 10.42.1.231/32,cni.projectcalico.org/podIPs: 10.42.1.231/32,},OwnerReferences:[{apps/v1 ReplicaSet p-3bdd5ad6-1d54-11ed-bf40-dabc71e5c302-669f6bd48 7d75cebd-fd97-4f53-a892-01655fba71f2 0xc42069ab30 0xc42069ab31}],Finalizers:[],ClusterName:,Initializers:nil,},Spec:PodSpec{Volumes:[{default-token-qg7xz {nil nil nil nil nil SecretVolumeSource{SecretName:default-token-qg7xz,Items:[],DefaultMode:*420,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}],Containers:[{c-3bdd5ad6-1d54-11ed-bf40-dabc71e5c302 reg.tx.com/ai/offline_function_tensorrt/bird_nest:amd-2.2 [] [] [{port-7070 0 7070 TCP }] [] [{MINIO_ENDPOINT minio.minio:9000 nil} {MINIO_ACCESS_KEY AKIAIOSFODNN7EXAMGTFXXX nil} {MINIO_SECRET_KEY wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAXXXX nil} {MINIO_BUCKET_NAME aibucket nil} {MINIO_SECURE False nil} {DATA_CALLBACK_URL http://ai-service.lis-test:8000/ai/model/process/data nil} {NVIDIA_VISIBLE_DEVICES all nil}] {map[aliyun.com/gpu-mem:{{1 3} {<nil>} 1k DecimalSI}] map[aliyun.com/gpu-mem:{{1 3} 
{<nil>} 1k DecimalSI}]} [{default-token-qg7xz true /var/run/secrets/kubernetes.io/serviceaccount <n il>}] [] nil nil nil /dev/termination-log File IfNotPresent nil false false false}],RestartPolicy:Always,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:worker2,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[],},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[{node.kubernetes.io/not-ready Exists NoExecute 0xc42069ab40} {node.kubernetes.io/unreachable Exists NoExecute 0xc42069ab48}],HostAliases:[],PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[],},Status:PodStatus{Phase:Pending,Conditions:[{Initialized True 0001-01-01 00:00:00 +0000 UTC 2022-08-24 11:08:30 +0000 UTC } {Ready False 0001-01-01 00:00:00 +0000 UTC 2022-08-24 11:08:30 +0000 UTC ContainersNotReady containers with unready status: [c-3bdd5ad6-1d54-11ed-bf40-dabc71e5c302]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2022-08-24 11:08:30 +0000 UTC ContainersNotReady containers with unready status: [c-3bdd5ad6-1d54-11ed-bf40-dabc71e5c302]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2022-08-24 11:08:30 +0000 UTC }],Message:,Reason:,HostIP:192.168.3.15,PodIP:,StartTime:2022-08-24 11:08:30 +0000 UTC,ContainerStatuses:[{c-3bdd5ad6-1d54-11ed-bf40-dabc71e5c302 {ContainerStateWaiting{Reason:ContainerCreating,Message:,} nil nil} {nil nil nil} false 0 reg.tx.com/ai/offline_function_tensorrt/bird_nest:amd-2.2 }],QOSClass:BestEffort,InitContainerStatuses:[],NominatedNodeName:,},} 
030a837d-cdc8-49ff-9b4b-1267bd6540a2:&Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:p-427f8ad8-1d56-11ed-bf40-dabc71e5c302-7777bdd858-rg894,GenerateName:p-427f8ad8-1d56-11ed-bf40-dabc71e5c302-7777bdd858-,Namespace:ai-model,SelfLink: ,UID:030a837d-cdc8-49ff-9b4b-1267bd6540a2,ResourceVersion:87185550,Generation:0,CreationTimestamp:2022-08-24 11:08:42 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{app: 427f8ad8-1d56-11ed-bf40-dabc71e5c302,pod-template-hash: 7777bdd858,},Annotations:map[string]string{ALIYUN_COM_GPU_MEM_ASSIGNED: true,ALIYUN_COM_GPU_MEM_ASSUME_TIME: 1661339322298410169,ALIYUN_COM_GPU_MEM_DEV: 7979,ALIYUN_COM_GPU_MEM_IDX: 1,ALIYUN_COM_GPU_MEM_POD: 1000,cni.projectcalico.org/podIP: 10.42.1.232/32,cni.projectcalico.org/podIPs: 10.42.1.232/32,},OwnerReferences:[{apps/v1 ReplicaSet p-427f8ad8-1d56-11ed-bf40-dabc71e5c302-7777bdd858 1ff29496-f9aa-4f20-b10a-07cbc2c52924 0xc422290c00 0xc422290c01}],Finalizers:[],ClusterName:,Initializers:nil,},Spec:PodSpec{Volumes:[{default-token-qg7xz {nil nil nil nil nil SecretVolumeSource{SecretName:default-token-qg7xz,Items:[],DefaultMode:*420,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}],Containers:[{c-427f8ad8-1d56-11ed-bf40-dabc71e5c302 reg.tx.com/ai/offline_function_tensorrt/rust:amd-2.2 [] [] [{port-7070 0 7070 TCP }] [] [{MINIO_ENDPOINT minio.minio:9000 nil} {MINIO_ACCESS_KEY AKIAIOSFODNN7EXAMGTFXXX nil} {MINIO_SECRET_KEY wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAXXXX nil} {MINIO_BUCKET_NAME aibucket nil} {MINIO_SECURE False nil} {DATA_CALLBACK_URL http://ai-service.lis-test:8000/ai/model/process/data nil} {NVIDIA_VISIBLE_DEVICES all nil}] {map[aliyun.com/gpu-mem:{{1 3} {<nil>} 1k DecimalSI}] map[aliyun.com/gpu-mem:{{1 3} {<nil>} 1k DecimalSI}]} [{default-token-qg7xz true /var/run/secrets/kubernetes.io/serviceaccount <nil>}] [] nil nil nil /dev/termination-log File IfNotPresent nil false false 
false}],RestartPolicy:Always,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:worker2,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinu xOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[],},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[{node.kubernetes.io/not-ready Exists NoExecute 0xc422290c10} {node.kubernetes.io/unreachable Exists NoExecute 0xc422290c18}],HostAliases:[],PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[],},Status:PodStatus{Phase:Pending,Conditions:[{Initialized True 0001-01-01 00:00:00 +0000 UTC 2022-08-24 11:08:42 +0000 UTC } {Ready False 0001-01-01 00:00:00 +0000 UTC 2022-08-24 11:08:42 +0000 UTC ContainersNotReady containers with unready status: [c-427f8ad8-1d56-11ed-bf40-dabc71e5c302]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2022-08-24 11:08:42 +0000 UTC ContainersNotReady containers with unready status: [c-427f8ad8-1d56-11ed-bf40-dabc71e5c302]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2022-08-24 11:08:42 +0000 UTC }],Message:,Reason:,HostIP:192.168.3.15,PodIP:,StartTime:2022-08-24 11:08:42 +0000 UTC,ContainerStatuses:[{c-427f8ad8-1d56-11ed-bf40-dabc71e5c302 {ContainerStateWaiting{Reason:ContainerCreating,Message:,} nil nil} {nil nil nil} false 0 reg.tx.com/ai/offline_function_tensorrt/rust:amd-2.2 }],QOSClass:BestEffort,InitContainerStatuses:[],NominatedNodeName:,},} 
79ee0235-6df0-4d9c-9099-bf5512da3171:&Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:binpack-1-6f4d6d4ff5-bw2b6,GenerateName:binpack-1-6f4d6d4ff5-,Namespace:default,SelfLink:,UID:79ee0235-6df0-4d9c-9099-bf5512da3171,ResourceVersion:86588075,Generation:0,CreationTimestamp:2022-08-23 11:13:24 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{app: binpack-1,pod-template-hash: 6f4d6d4ff5,},Annotations:map[string]string{ALIYUN_COM_GPU_MEM_ASSIGNED: true,ALIYUN_COM_GPU_MEM_ASSUME_TIME: 1661255100544398707,ALIYUN_COM_GPU_MEM_DEV: 7979,ALIYUN_COM_GPU_MEM_IDX: 1,ALIYUN_COM_GPU_MEM_POD: 512,cattle.io/timestamp: 2022-04-26T07:39:18Z,field.cattle.io/ports: [[{"containerPort":7070,"dnsName":"binpack-1","hostPort":0,"kind":"ClusterIP","name":"7070tcp02","protocol":"TCP","sourcePort":0}]],},OwnerReferences:[{apps/v1 ReplicaSet binpack-1-6f4d6d4ff5 e8b1ff38-189a-4147-8627-102a7dc80272 0xc4210bd580 0xc4210bd581}],Finalizers:[],ClusterName:,Initializers:nil,},Spec:PodSpec{Volumes:[{default-token-mjrc5 {nil nil nil nil nil SecretVolumeSource{SecretName:default-token-mjrc5,Items:[],DefaultMode:*420,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}],Containers:[{binpack-1 reg.tx.com/ai/offline_function_tensorrt/light:amd-2.8 [] [] [{7070tcp02 0 7070 TCP }] [] [{CGPU_DISABLE true nil} {DATA_CALLBACK_URL http://ai-service.lis-dev:8000/ai/model/process/data nil} {FLASK_PORT 7070 nil} {KAFKA_IP 172.16.2.88:9092 nil} {KAFKA_TOPIC_RECIEVE ai.train.report nil} {KAFKA_TOPIC_SEND ai.train nil} {MINIO_ACCESS_KEY AKIAIOSFODNN7EXAMGTF nil} {MINIO_BUCKET_NAME aibucket nil} {MINIO_ENDPOINT minio-dev.bot-patrol.com nil} {MINIO_SECRET_KEY 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 nil} {MINIO_SECURE False nil} {NVIDIA_VISIBLE_DEVICES all nil}] {map[aliyun.com/gpu-mem:{{512 0} {<nil>} 512 DecimalSI}] map[aliyun.com/gpu-mem:{{512 0} {<nil>} 512 DecimalSI}]} [{default-token-mjrc5 true 
/var/run/secrets/kubernetes.io/serviceaccount <nil>}] [] nil nil nil /dev/termination-log File IfNotPresent SecurityContext{Capabilities:&Capabilities{Add:[],Drop:[],},Privileged:nil,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,} false false false}],RestartPolicy:Always,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:worker2,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FS Group:nil,RunAsGroup:nil,Sysctls:[],},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[{node.kubernetes.io/not-ready Exists NoExecute 0xc4210bd590} {node.kubernetes.io/unreachable Exists NoExecute 0xc4210bd598}],HostAliases:[],PriorityClassName:,Priority:*0,DNSConfig:&PodDNSConfig{Nameservers:[],Searches:[],Options:[],},ShareProcessNamespace:nil,ReadinessGates:[],},Status:PodStatus{Phase:Pending,Conditions:[{Initialized True 0001-01-01 00:00:00 +0000 UTC 2022-08-23 11:45:00 +0000 UTC } {Ready False 0001-01-01 00:00:00 +0000 UTC 2022-08-23 11:45:00 +0000 UTC ContainersNotReady containers with unready status: [binpack-1]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2022-08-23 11:45:00 +0000 UTC ContainersNotReady containers with unready status: [binpack-1]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2022-08-23 11:45:00 +0000 UTC }],Message:,Reason:,HostIP:192.168.3.15,PodIP:,StartTime:2022-08-23 11:45:00 +0000 UTC,ContainerStatuses:[{binpack-1 {ContainerStateWaiting{Reason:ContainerCreating,Message:,} nil nil} {nil nil nil} false 0 reg.tx.com/ai/offline_function_tensorrt/light:amd-2.8 }],QOSClass:BestEffort,InitContainerStatuses:[],NominatedNodeName:,},} 
cb9d1880-5022-4a30-9dd7-e6e047675497:&Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:p-73862152-1d4a-11ed-bf40-dabc71e5c302-954b9cbd5-zfwsx,GenerateName:p-73862152-1d4a-11ed-bf40-dabc71e5c302-954b9cbd5-,Namespace:ai-model,SelfLink:,UID:cb9d1880-5022-4a30-9dd7-e6e047675497,ResourceVersion:87122660,Generation:0,CreationTimestamp:2022-08-24 08:38:20 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{app: 73862152-1d4a-11ed-bf40-dabc71e5c302,pod-template-hash: 954b9cbd5,},Annotations:map[string]string{ALIYUN_COM_GPU_MEM_ASSIGNED: true,ALIYUN_COM_GPU_MEM_ASSUME_TIME: 1661330300863721341,ALIYUN_COM_GPU_MEM_DEV: 7979,ALIYUN_COM_GPU_MEM_IDX: 1,ALIYUN_COM_GPU_MEM_POD: 1000,},OwnerReference s:[{apps/v1 ReplicaSet p-73862152-1d4a-11ed-bf40-dabc71e5c302-954b9cbd5 ca34bd09-1d44-4d88-a484-6d22cf477940 0xc420738ea0 0xc420738ea1}],Finalizers:[],ClusterName:,Initializers:nil,},Spec:PodSpec{Volumes:[{default-token-qg7xz {nil nil nil nil nil SecretVolumeSource{SecretName:default-token-qg7xz,Items:[],DefaultMode:*420,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}],Containers:[{c-73862152-1d4a-11ed-bf40-dabc71e5c302 reg.tx.com/ai/offline_function_tensorrt/meter_sf6:amd-2.2 [] [] [{port-7070 0 7070 TCP }] [] [{MINIO_ENDPOINT minio.minio:9000 nil} {MINIO_ACCESS_KEY AKIAIOSFODNN7EXAMGTFXXX nil} {MINIO_SECRET_KEY wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAXXXX nil} {MINIO_BUCKET_NAME aibucket nil} {MINIO_SECURE False nil} {DATA_CALLBACK_URL http://ai-service.lis-test:8000/ai/model/process/data nil} {NVIDIA_VISIBLE_DEVICES all nil}] {map[aliyun.com/gpu-mem:{{1 3} {<nil>} 1k DecimalSI}] map[aliyun.com/gpu-mem:{{1 3} {<nil>} 1k DecimalSI}]} [{default-token-qg7xz true /var/run/secrets/kubernetes.io/serviceaccount <nil>}] [] nil nil nil /dev/termination-log File IfNotPresent nil false false 
false}],RestartPolicy:Always,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:worker2,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[],},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[{node.kubernetes.io/not-ready Exists NoExecute 0xc420738eb0} {node.kubernetes.io/unreachable Exists NoExecute 0xc420738eb8}],HostAliases:[],PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[],},Status:PodStatus{Phase:Pending,Conditions:[{Initialized True 0001-01-01 00:00:00 +0000 UTC 202 08:38:20 +0000 UTC } {Ready False 0001-01-01 00:00:00 +0000 UTC 2022-08-24 08:38:20 +0000 UTC ContainersNotReady containers with unready status: [c-73862152-1d4a-11ed-bf40-dabc71e5c302]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2022-08-24 08:38:20 +0000 UTC ContainersNotReady containers with unready status: [c-73862152-1d4a-11ed-bf40-dabc71e5c302]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2022-08-24 08:38:20 +0000 UTC }],Message:,Reason:,HostIP:192.168.3.15,PodIP:,StartTime:2022-08-24 08:38:20 +0000 UTC,ContainerStatuses:[{c-73862152-1d4a-11ed-bf40-dabc71e5c302 {ContainerStateWaiting{Reason:ContainerCreating,Message:,} nil nil} {nil nil nil} false 0 reg.tx.com/ai/offline_function_tensorrt/meter_sf6:amd-2.2 }],QOSClass:BestEffort,InitContainerStatuses:[],NominatedNodeName:,},} 
a2b53218-05cf-4d93-a8e3-d3b1945e1058:&Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:p-7425d432-1d49-11ed-bf40-dabc71e5c302-7c8b4b458f-4qnx7,GenerateName:p-7425d432-1d49-11ed-bf40-dabc71e5c302-7c8b4b458f-,Namespace:ai-model,SelfLink:,UID:a2b53218-05cf-4d93-a8e3-d3b1945e1058,ResourceVersion:87122965,Generation:0,CreationTimestamp:2022-08-24 08:38:55 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{app: 7425d432-1d49-11ed-bf40-dabc71e5c302,pod-template-hash: 7c8b4b458f,},Annotations:map[string]string{ALIYUN_COM_GPU_MEM_ASSIGNED: true,ALIYUN_COM_GPU_MEM_ASSUME_TIME: 1661330335577141303,ALIYUN_COM_GPU_MEM_DEV: 7979,ALIYUN_COM_GPU_MEM_IDX: 1,ALIYUN_COM_GPU_MEM_POD: 1000,cni.projectcalico.org/podIP: 10.42.1.230/32,cni.projectcalico.org/podIPs: 10.42.1.230/32,},OwnerReferences:[{apps/v1 ReplicaSet p-7425d432-1d49-11ed-bf40-dabc71e5c302-7c8b4b458f eaf357e4-6727-46ab-8222-eb8cbfc8c3ac 0xc420adfc20 0xc420adfc21}],Finalizers:[],ClusterName:,Initializers:nil,},Spec:PodSpec{Volumes:[{default-token-qg7xz {nil nil nil nil nil SecretVolumeSource{SecretName:default-token-qg7xz,Items:[],DefaultMode:*420,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil ni l nil nil}}],Containers:[{c-7425d432-1d49-11ed-bf40-dabc71e5c302 reg.tx.com/ai/offline_function_tensorrt/respirator:amd-2.2 [] [] [{port-7070 0 7070 TCP }] [] [{MINIO_ENDPOINT minio.minio:9000 nil} {MINIO_ACCESS_KEY AKIAIOSFODNN7EXAMGTFXXX nil} {MINIO_SECRET_KEY wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAXXXX nil} {MINIO_BUCKET_NAME aibucket nil} {MINIO_SECURE False nil} {DATA_CALLBACK_URL http://ai-service.lis-test:8000/ai/model/process/data nil} {NVIDIA_VISIBLE_DEVICES all nil}] {map[aliyun.com/gpu-mem:{{1 3} {<nil>} 1k DecimalSI}] map[aliyun.com/gpu-mem:{{1 3} {<nil>} 1k DecimalSI}]} [{default-token-qg7xz true /var/run/secrets/kubernetes.io/serviceaccount <nil>}] [] nil nil nil /dev/termination-log File IfNotPresent nil false false 
false}],RestartPolicy:Always,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:worker2,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[],},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[{node.kubernetes.io/not-ready Exists NoExecute 0xc420adfc30} {node.kubernetes.io/unreachable Exists NoExecute 0xc420adfc38}],HostAliases:[],PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[],},Status:PodStatus{Phase:Pending,Conditions:[{Initialized True 0001-01-01 00:00:00 +0000 UTC 2022-08-24 08:38:55 +0000 UTC } {Ready False 0001-01-01 00:00:00 +0000 UTC 2022-08-24 08:38:55 +0000 UTC ContainersNotReady containers with unready status: [c-7425d432-1d49-11ed-bf40-dabc71e5c302]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2022-08-24 08:38:55 +0000 UTC ContainersNotReady containers with unready status: [c-7425d432-1d49-11ed-bf40-dabc71e5c302]} {PodSche duled True 0001-01-01 00:00:00 +0000 UTC 2022-08-24 08:38:55 +0000 UTC }],Message:,Reason:,HostIP:192.168.3.15,PodIP:,StartTime:2022-08-24 08:38:55 +0000 UTC,ContainerStatuses:[{c-7425d432-1d49-11ed-bf40-dabc71e5c302 {ContainerStateWaiting{Reason:ContainerCreating,Message:,} nil nil} {nil nil nil} false 0 reg.tx.com/ai/offline_function_tensorrt/respirator:amd-2.2 }],QOSClass:BestEffort,InitContainerStatuses:[],NominatedNodeName:,},}], and its address is 0xc420629b00 [ debug ] 2022/08/25 01:57:36 pod.go:107: pod binpack-1-6f4d6d4ff5-bw2b6 in ns default with status Pending has GPU Mem 512 [ debug ] 2022/08/25 01:57:36 pod.go:107: pod p-73862152-1d4a-11ed-bf40-dabc71e5c302-954b9cbd5-zfwsx in ns ai-model with status 
Pending has GPU Mem 1000
[ debug ] 2022/08/25 01:57:36 pod.go:107: pod p-7425d432-1d49-11ed-bf40-dabc71e5c302-7c8b4b458f-4qnx7 in ns ai-model with status Pending has GPU Mem 1000
[ debug ] 2022/08/25 01:57:36 pod.go:107: pod p-3bdd5ad6-1d54-11ed-bf40-dabc71e5c302-669f6bd48-bhvkr in ns ai-model with status Pending has GPU Mem 1000
[ debug ] 2022/08/25 01:57:36 pod.go:107: pod p-427f8ad8-1d56-11ed-bf40-dabc71e5c302-7777bdd858-rg894 in ns ai-model with status Pending has GPU Mem 1000
[ info ] 2022/08/25 01:57:36 nodeinfo.go:413: getUsedGPUs: map[0:1000 1:4512] in node worker2, and devs map[0:0xc420629ae0 1:0xc420629b00]
[ info ] 2022/08/25 01:57:36 nodeinfo.go:431: try to find unhealthy node unhealthy-gpu-worker2
[ info ] 2022/08/25 01:57:36 nodeinfo.go:397: available GPU list map[0:6979 1:3467] before removing unhealty GPUs
[ info ] 2022/08/25 01:57:36 nodeinfo.go:402: available GPU list map[0:6979 1:3467] after removing unhealty GPUs
[ info ] 2022/08/25 01:57:36 nodeinfo.go:321: reqGPU for pod p-0011eac6-1f84-11ed-b00d-ee11b204c376-78dfc9cd48-npk9b in ns ai-model: 1000
[ info ] 2022/08/25 01:57:36 nodeinfo.go:322: AvailableGPUs: map[0:6979 1:3467] in node worker2
[ info ] 2022/08/25 01:57:36 nodeinfo.go:372: Find candidate dev id 0 for pod p-0011eac6-1f84-11ed-b00d-ee11b204c376-78dfc9cd48-npk9b in ns ai-model successfully.
[ info ] 2022/08/25 01:57:36 nodeinfo.go:188: Allocate() 1. Allocate GPU ID 0 to pod p-0011eac6-1f84-11ed-b00d-ee11b204c376-78dfc9cd48-npk9b in ns ai-model.----
[ info ] 2022/08/25 01:57:36 nodeinfo.go:227: Allocate() 2. 
Try to bind pod p-0011eac6-1f84-11ed-b00d-ee11b204c376-78dfc9cd48-npk9b in ai-model namespace to node with &Binding{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:p-0011eac6-1f84-11ed-b00d-ee11b204c376-78dfc9cd48-npk9b,GenerateName:,Namespace:,SelfLink:,UID:cb2866b3-bb73-47ca-898e-1652d41d79c0,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Target:ObjectReference{Kind:Node,Namespace:,Name:worker2,UID:,APIVersion:,ResourceVersion:,FieldPath:,},} [ info ] 2022/08/25 01:57:36 controller.go:297: Need to update pod name p-0011eac6-1f84-11ed-b00d-ee11b204c376-78dfc9cd48-npk9b in ns ai-model and old status is Pending, new status is Pending; its old annotation map[] and new annotation map[ALIYUN_COM_GPU_MEM_IDX:0 ALIYUN_COM_GPU_MEM_POD:1000 ALIYUN_COM_GPU_MEM_ASSIGNED:false ALIYUN_COM_GPU_MEM_ASSUME_TIME:1661392656240237376 ALIYUN_COM_GPU_MEM_DEV:7979] [ info ] 2022/08/25 01:57:36 nodeinfo.go:241: Allocate() 3. 
Try to add pod p-0011eac6-1f84-11ed-b00d-ee11b204c376-78dfc9cd48-npk9b in ns ai-model to dev 0 [ debug ] 2022/08/25 01:57:36 deviceinfo.go:57: dev.addPod() Pod p-0011eac6-1f84-11ed-b00d-ee11b204c376-78dfc9cd48-npk9b in ns ai-model with the GPU ID 0 will be added to device map [ debug ] 2022/08/25 01:57:36 deviceinfo.go:64: dev.addPod() after updated is map[cb2866b3-bb73-47ca-898e-1652d41d79c0:&Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:p-0011eac6-1f84-11ed-b00d-ee11b204c376-78dfc9cd48-npk9b,GenerateName:p-0011eac6-1f84-11ed-b00d-ee11b204c376-78dfc9cd48-,Namespace:ai-model,SelfLink:,UID:cb2866b3-bb73-47ca-898e-1652d41d79c0,ResourceVersion:87560153,Generation:0,CreationTimestamp:2022-08-25 01:57:36 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{app: 0011eac6-1f84-11ed-b00d-ee11b204c376,pod-template-hash: 78dfc9cd48,},Annotations:map[string]string{ALIYUN_COM_GPU_MEM_ASSIGNED: false,ALIYUN_COM_GPU_MEM_ASSUME_TIME: 1661392656240237376,ALIYUN_COM_GPU_MEM_DEV: 7979,ALIYUN_COM_GPU_MEM_IDX: 0,ALIYUN_COM_GPU_MEM_POD: 1000,},OwnerReferences:[{apps/v1 ReplicaSet p-0011eac6-1f84-11ed-b00d-ee11b204c376-78dfc9cd48 577b8c42-2fa6-4bfc-9eb2-4658deff6bea 0xc420eda3f7 0xc420eda3f8}],Finalizers:[],ClusterName:,Initializers:nil,},Spec:PodSpec{Volumes:[{default-token-qg7xz {nil nil nil nil nil SecretVolumeSource{SecretName:default-token-qg7xz,Items:[],DefaultMode:*420,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}],Containers:[{c-0011eac6-1f84-11ed-b00d-ee11b204c376 reg.tx.com/ai/offline_function_tensorrt/toggle_switch:amd-2.3 [] [] [{port-7070 0 7070 TCP }] [] [{MINIO_ENDPOINT minio.minio:9000 nil} {MINIO_ACCESS_KEY AKIAIOSFODNN7EXAMGTFXXX nil} {MINIO_SECRET_KEY wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAXXXX nil} {MINIO_BUCKET_NAME aibucket nil} {MINIO_SECURE False nil} {DATA_CALLBACK_URL http://ai-service.lis-test:8000/ai/model/process/data nil} {NVIDIA_VISIBLE_DEVICES all 
nil}] {map[aliyun.com/gpu-mem:{{1 3} {<nil>} 1k DecimalSI}] map[aliyun.com/gpu-mem:{{1 3} {<nil>} 1k DecimalSI}]} [{default-token-qg7xz true /var/run/secrets/kubernetes.io/serviceaccount <nil>}] [] nil nil nil /dev/termination-log File IfNotPresent nil false false false}],RestartPolicy:Always,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[],},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[{node.kubernetes.io/not-ready Exists NoExecute 0xc420edab40} {node.kubernetes.io/unreachable Exists NoExecute 0xc420edab60}],HostAliases:[],PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[],},Status:PodStatus{Phase:Pending,Conditions:[],Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[],QOSClass:BestEffort,InitContainerStatuses:[],NominatedNodeName:,},} 7c8baa1c-38d3-426c-b613-6673ef6cdd03:&Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:p-f407d20e-239d-11ed-8a9a-b2a44a4d350f-8556669ddf-w9crj,GenerateName:p-f407d20e-239d-11ed-8a9a-b2a44a4d350f-8556669ddf-,Namespace:ai-model,SelfLink:,UID:7c8baa1c-38d3-426c-b613-6673ef6cdd03,ResourceVersion:87201592,Generation:0,CreationTimestamp:2022-08-24 11:43:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{app: f407d20e-239d-11ed-8a9a-b2a44a4d350f,pod-template-hash: 8556669ddf,},Annotations:map[string]string{ALIYUN_COM_GPU_MEM_ASSIGNED: true,ALIYUN_COM_GPU_MEM_ASSUME_TIME: 1661341380164154462,ALIYUN_COM_GPU_MEM_DEV: 7979,ALIYUN_COM_GPU_MEM_IDX: 0,ALIYUN_COM_GPU_MEM_POD: 
1000,cni.projectcalico.org/podIP: 10.42.1.242/32,cni.projectcalico.org/podIPs: 10.42.1.242/32,},OwnerReferences:[{apps/v1 ReplicaSet p-f407d20e-239d-11ed-8a9a-b2a44a4d350f-8556669ddf 2c580ca3-1b79-46f3-a869-15dd1fedc74c 0xc4206ba7a0 0xc4206ba7a1}],Finalizers:[],ClusterName:,Initializers:nil,},Spec:PodSpec{Volumes:[{default-token-qg7xz {nil n il nil nil nil SecretVolumeSource{SecretName:default-token-qg7xz,Items:[],DefaultMode:*420,Optional:nil,} nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil}}],Containers:[{c-f407d20e-239d-11ed-8a9a-b2a44a4d350f reg.tx.com/ai/offline_function_tensorrt/clamp:amd-2.6 [] [] [{port-7070 0 7070 TCP }] [] [{MINIO_ENDPOINT minio.minio:9000 nil} {MINIO_ACCESS_KEY AKIAIOSFODNN7EXAMGTFXXX nil} {MINIO_SECRET_KEY wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAXXXX nil} {MINIO_BUCKET_NAME aibucket nil} {MINIO_SECURE False nil} {DATA_CALLBACK_URL http://ai-service.lis-test:8000/ai/model/process/data nil} {NVIDIA_VISIBLE_DEVICES all nil}] {map[aliyun.com/gpu-mem:{{1 3} {<nil>} 1k DecimalSI}] map[aliyun.com/gpu-mem:{{1 3} {<nil>} 1k DecimalSI}]} [{default-token-qg7xz true /var/run/secrets/kubernetes.io/serviceaccount <nil>}] [] nil nil nil /dev/termination-log File IfNotPresent nil false false false}],RestartPolicy:Always,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:worker2,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[],},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[{node.kubernetes.io/not-ready Exists NoExecute 0xc4206ba7b0} {node.kubernetes.io/unreachable Exists NoExecute 
0xc4206ba7b8}],HostAliases:[],PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[],},Status:PodStatus{Phase:Pending,Conditions:[{Initialized True 0001-01-01 00:00:00 +0000 UTC 2022-08-24 11:43:00 +0000 UTC } {Ready False 0001-01-01 00:00:00 +0000 UTC 2022-08-24 11:43:00 +0000 UTC ContainersNotReady containers with unready status: [c-f407d20e-239d-11ed-8a9a-b2a44a4d350f]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2022-08-24 11:43:00 +0000 UTC ContainersNotReady containers with unready status: [c-f407d20e-239d-11ed-8a9a-b2a44a4d350f]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2022-08-24 11:43:00 +0000 UTC }],Message:,Reason:,HostIP:192.168.3.15,PodIP:,StartTime:2022-08-24 11:43:00 +0000 UTC,ContainerStatuses:[{c-f407d20e-239d-11ed-8a9a-b2a44a4d350f {ContainerStateWaiting{Reason:ContainerCreating,Message:,} nil nil} {nil nil nil} false 0 reg.tx.com/ai/offline_function_tensorrt/clamp:amd-2.6 }],QOSClass:BestEffort,InitContainerStatuses:[],NominatedNodeName:,},}], and its address is 0xc420629ae0
[ info ] 2022/08/25 01:57:36 nodeinfo.go:252: Allocate() ----End to allocate GPU for gpu mem for pod p-0011eac6-1f84-11ed-b00d-ee11b204c376-78dfc9cd48-npk9b in ns ai-model----
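
The per-GPU figures in the log above can be cross-checked by hand from the pod annotations it prints. The sketch below is an illustration only, not the scheduler's own code: it copies each pod's `ALIYUN_COM_GPU_MEM_IDX` and `ALIYUN_COM_GPU_MEM_POD` values and the per-device total from `getAllGPUs` out of the log, then recomputes the used and available maps. The names `ASSIGNED_PODS`, `used_per_gpu` and `available_per_gpu` are hypothetical.

```python
from collections import defaultdict

# Per-device total, from "getAllGPUs: map[0:7979 1:7979]" in the log.
GPU_TOTAL = {0: 7979, 1: 7979}

# (pod name, ALIYUN_COM_GPU_MEM_IDX, ALIYUN_COM_GPU_MEM_POD) for the pods
# already assigned on worker2 before the new pod is bound, copied from the log.
ASSIGNED_PODS = [
    ("p-f407d20e-239d-11ed-8a9a-b2a44a4d350f-8556669ddf-w9crj", 0, 1000),
    ("p-3bdd5ad6-1d54-11ed-bf40-dabc71e5c302-669f6bd48-bhvkr", 1, 1000),
    ("p-427f8ad8-1d56-11ed-bf40-dabc71e5c302-7777bdd858-rg894", 1, 1000),
    ("binpack-1-6f4d6d4ff5-bw2b6", 1, 512),
    ("p-73862152-1d4a-11ed-bf40-dabc71e5c302-954b9cbd5-zfwsx", 1, 1000),
    ("p-7425d432-1d49-11ed-bf40-dabc71e5c302-7c8b4b458f-4qnx7", 1, 1000),
]


def used_per_gpu(pods):
    """Sum the requested GPU memory per device index."""
    used = defaultdict(int)
    for _name, idx, mem in pods:
        used[idx] += mem
    return dict(used)


def available_per_gpu(pods, totals):
    """Subtract the per-device usage from the per-device total."""
    used = used_per_gpu(pods)
    return {idx: total - used.get(idx, 0) for idx, total in totals.items()}


if __name__ == "__main__":
    print("used:", used_per_gpu(ASSIGNED_PODS))
    print("available:", available_per_gpu(ASSIGNED_PODS, GPU_TOTAL))
```

The recomputed values agree with the `getUsedGPUs: map[0:1000 1:4512]` and `available GPU list map[0:6979 1:3467]` lines, so the scheduler's bookkeeping is internally consistent with the annotations; the discrepancy reported in this issue is between that bookkeeping and what nvidia-smi shows on the node.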
