Description
#978 added an extra check for the PodUnschedulable condition, for cases where pod events could not reliably report problems with the workspace deployment. That fix was made obsolete by 255d699.
However, I have since noticed that if you add the FailedScheduling event to the list of ignoredUnrecoverableEvents in the DWOC (DevWorkspaceOperatorConfig), the PodUnschedulable condition is still detected and the workspace fails.
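To make the expected behavior concrete, here is a minimal Go sketch of a condition check that honors ignoredUnrecoverableEvents. It is not the operator's actual code: `PodCondition` is a trimmed stand-in for `corev1.PodCondition`, `checkPodUnschedulable` is a hypothetical name, and the sketch assumes that listing FailedScheduling should also suppress the check on the `PodScheduled=False` / `Unschedulable` pod condition.

```go
package main

import "fmt"

// PodCondition is a trimmed-down stand-in for corev1.PodCondition,
// keeping only the fields this sketch inspects.
type PodCondition struct {
	Type    string // e.g. "PodScheduled"
	Status  string // "True", "False" or "Unknown"
	Reason  string // e.g. "Unschedulable"
	Message string // scheduler message, e.g. "0/1 nodes are available: ..."
}

// failedSchedulingEvent is the event reason users list in ignoredUnrecoverableEvents.
const failedSchedulingEvent = "FailedScheduling"

// contains reports whether s is present in list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// checkPodUnschedulable (hypothetical) reports whether the workspace should be
// failed because its pod cannot be scheduled, and the message to show.
func checkPodUnschedulable(conditions []PodCondition, ignoredEvents []string) (bool, string) {
	// Assumption: ignoring the FailedScheduling event also disables the
	// condition-based check, so the workspace is left to time out instead.
	if contains(ignoredEvents, failedSchedulingEvent) {
		return false, ""
	}
	for _, c := range conditions {
		if c.Type == "PodScheduled" && c.Status == "False" && c.Reason == "Unschedulable" {
			return true, "Pod is unschedulable: " + c.Message
		}
	}
	return false, ""
}

func main() {
	conds := []PodCondition{{
		Type:    "PodScheduled",
		Status:  "False",
		Reason:  "Unschedulable",
		Message: "0/1 nodes are available: 1 Insufficient cpu.",
	}}

	// Nothing ignored: the workspace is failed immediately.
	failed, msg := checkPodUnschedulable(conds, nil)
	fmt.Println(failed, msg)

	// FailedScheduling ignored: no immediate failure.
	failed, _ = checkPodUnschedulable(conds, []string{"FailedScheduling"})
	fmt.Println(failed)
}
```

The early return on the ignore list is the point of the sketch: both the event-based and the condition-based paths would consult the same ignoredUnrecoverableEvents setting, so they cannot disagree.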
How To Reproduce
- Add FailedScheduling to the ignoredUnrecoverableEvents in the DWOC:
```diff
apiVersion: controller.devfile.io/v1alpha1
config:
  routing:
    clusterHostSuffix: 192.168.39.246.nip.io
    defaultRoutingClass: basic
  workspace:
+   ignoredUnrecoverableEvents:
+     - FailedScheduling
    imagePullPolicy: Always
```
- Start a workspace that causes a FailedScheduling event, e.g. by requesting more CPU than the cluster can provide:
```yaml
kind: DevWorkspace
apiVersion: workspace.devfile.io/v1alpha2
metadata:
  name: theia-next-high-cpu
spec:
  started: true
  template:
    projects:
      - name: web-nodejs-sample
        git:
          remotes:
            origin: "https://github.com/che-samples/web-nodejs-sample.git"
    components:
      - name: theia
        plugin:
          uri: https://che-plugin-registry-main.surge.sh/v3/plugins/eclipse/che-theia/latest/devfile.yaml
          components:
            - name: theia-ide
              container:
                env:
                  - name: THEIA_HOST
                    value: 0.0.0.0
                memoryRequest: 2Gi
                memoryLimit: 16Gi
                cpuRequest: 4000m
                cpuLimit: 8000m
    commands:
      - id: say-hello
        exec:
          component: theia-ide
          commandLine: echo "Hello from $(pwd)"
          workingDir: ${PROJECTS_ROOT}/project/app
```
- Observe that the workspace still fails, reporting that the pod is unschedulable:
```
$ kubectl get dw -n $NAMESPACE -w
NAME                  DEVWORKSPACE ID             PHASE    INFO
theia-next-high-cpu   workspace544c789045e040d0   Failed   Pod is unschedulable: 0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
```
Expected behavior
The workspace should not fail immediately; it should time out instead.