Description
Note: This happens only with some container images. If I change the container image to quay.io/wto/web-terminal-tooling:next, the workspace reaches the Running state.
When the DevWorkspaceOperatorConfig is configured with a config.workspace.postStartTimeout (e.g., 5m), a DevWorkspace with a postStart event referencing a command fails to start and enters the Failing phase. The pod for the workspace enters a CrashLoopBackOff state due to a FailedPostStartHook.
This issue does not occur if the postStartTimeout is removed from the configuration.
Example DevWorkspaceOperatorConfig snippet:
config:
  workspace:
    postStartTimeout: 5m
Here is the DevWorkspace I was trying to create. It has a very simple postStart hook that should succeed:
apiVersion: workspace.devfile.io/v1alpha2
kind: DevWorkspace
metadata:
  name: working-post-start-ws
  annotations:
    controller.devfile.io/debug-start: "true"
spec:
  started: true
  template:
    components:
      - name: tools
        container:
          image: quay.io/wto/web-terminal-tooling:latest
          sourceMapping: /projects
          command: [ "tail" ]
          args: [ "-f", "/dev/null" ]
    commands:
      - id: failing-command
        exec:
          commandLine: |
            echo "Execuet poststart ls"
            ls -lt
          component: tools
    events:
      postStart:
        - failing-command
However, after creation, this DevWorkspace goes into the following state:
NAMESPACE             NAME                    DEVWORKSPACE ID             PHASE     INFO
openshift-operators   working-post-start-ws   workspacef36328c1632b4957   Failing   Error creating DevWorkspace deployment: Detected unrecoverable event FailedPostStartHook: [postStart hook] failed with an unknown error (see pod events or container logs for more details)

NAME                                        READY   STATUS             RESTARTS      AGE
workspacef36328c1632b4957-8588d4d77-rpgkh   0/1     CrashLoopBackOff   5 (64s ago)   5m26s
# Rendered lifecycle.postStart in pod spec
image: quay.io/wto/web-terminal-tooling:latest
imagePullPolicy: Always
lifecycle:
  postStart:
    exec:
      command:
        - /bin/sh
        - -c
        - |
          {
            # This script block ensures its exit code is preserved
            # while its stdout and stderr are tee'd.
            _script_to_run() {
              export POSTSTART_TIMEOUT_DURATION="300"
              export POSTSTART_KILL_AFTER_DURATION="5"
              _TIMEOUT_COMMAND_PART=""
              _WAS_TIMEOUT_USED="false" # Use strings "true" or "false" for shell boolean
              if command -v timeout >/dev/null 2>&1; then
                echo "[postStart hook] Executing commands with timeout: ${POSTSTART_TIMEOUT_DURATION} seconds, kill after: ${POSTSTART_KILL_AFTER_DURATION} seconds" >&2
                _TIMEOUT_COMMAND_PART="timeout --preserve-status --kill-after=${POSTSTART_KILL_AFTER_DURATION} ${POSTSTART_TIMEOUT_DURATION}"
                _WAS_TIMEOUT_USED="true"
              else
                echo "[postStart hook] WARNING: 'timeout' utility not found. Executing commands without timeout." >&2
              fi
              # Execute the user's script
              ${_TIMEOUT_COMMAND_PART} /bin/sh -c 'set -e
          echo "Execuet poststart ls"
          ls -lt
          '
              exit_code=$?
              # Check the exit code based on whether timeout was attempted
              if [ "$_WAS_TIMEOUT_USED" = "true" ]; then
                if [ $exit_code -eq 143 ]; then # 128 + 15 (SIGTERM)
                  echo "[postStart hook] Commands terminated by SIGTERM (likely timed out after ${POSTSTART_TIMEOUT_DURATION}s). Exit code 143." >&2
                elif [ $exit_code -eq 137 ]; then # 128 + 9 (SIGKILL)
                  echo "[postStart hook] Commands forcefully killed by SIGKILL (likely after --kill-after ${POSTSTART_KILL_AFTER_DURATION}s expired). Exit code 137." >&2
                elif [ $exit_code -ne 0 ]; then # Catches any other non-zero exit code
                  echo "[postStart hook] Commands failed with exit code $exit_code." >&2
                else
                  echo "[postStart hook] Commands completed successfully within the time limit." >&2
                fi
              else
                if [ $exit_code -ne 0 ]; then
                  echo "[postStart hook] Commands failed with exit code $exit_code (no timeout)." >&2
                else
                  echo "[postStart hook] Commands completed successfully (no timeout)." >&2
                fi
              fi
              exit $exit_code
            }
            _script_to_run
          } 1> >(tee -a "/tmp/poststart-stdout.txt") 2> >(tee -a "/tmp/poststart-stderr.txt" >&2)
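The exit-code branches in the rendered hook can be tried outside a pod. The following is a minimal sketch, not the operator's code: `classify_exit` is a hypothetical helper name, the 1-second durations are illustrative, and it assumes a GNU coreutils `timeout` (other implementations may not accept `--preserve-status`). With GNU coreutils, the timed-out `sleep` should be classified as `sigterm`.

```shell
# Hypothetical helper: maps an exit status to the same four cases
# the rendered postStart hook distinguishes.
classify_exit() {
  case "$1" in
    0)   echo "success" ;;
    143) echo "sigterm" ;;   # 128 + 15 (SIGTERM)
    137) echo "sigkill" ;;   # 128 + 9 (SIGKILL)
    *)   echo "failed" ;;
  esac
}

# Illustrative run: let `sleep 5` hit a 1-second timeout.
# Skipped entirely when no `timeout` utility is on PATH.
if command -v timeout >/dev/null 2>&1; then
  status=0
  timeout --preserve-status --kill-after=1 1 sleep 5 || status=$?
  classify_exit "$status"
fi
```

Because `--preserve-status` makes `timeout` return the status of the command it ran, a command stopped by SIGTERM surfaces as 143 rather than the default 124.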
The issue appears to depend on the image used in the DevWorkspace spec. Here are a few observations:
- quay.io/wto/web-terminal-tooling:next works
- quay.io/wto/web-terminal-tooling:latest doesn't work
- quay.io/devfile/universal-developer-image:latest works
- quay.io/devfile/universal-developer-image:ubi8-latest doesn't work
I checked that the timeout utility is present in all of these images. I'm not sure whether this is caused by a configuration mistake on my side or by an actual issue.
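One thing that may be worth checking per image is the shell rather than `timeout`. This is a hypothesis, not something confirmed above: the rendered hook runs under `/bin/sh -c` and ends with `1> >(tee ...)`, which is process substitution. That syntax is not part of POSIX sh, so whether `/bin/sh` accepts it can differ between images. A minimal sketch of such a check, where `supports_procsub` is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds if the given shell binary accepts the
# process-substitution redirect used at the end of the rendered hook.
supports_procsub() {
  "$1" -c 'true 1> >(cat)' >/dev/null 2>&1
}

# Report the result for the shells commonly present in an image.
for candidate in /bin/sh /bin/bash; do
  if [ -x "$candidate" ]; then
    if supports_procsub "$candidate"; then
      echo "$candidate: process substitution accepted"
    else
      echo "$candidate: process substitution rejected"
    fi
  fi
done
```

Running this inside each of the four images (for example, by starting a container from the image with a shell as its entrypoint) and comparing the `/bin/sh` line would show whether the working and failing images differ on this point.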