5 changes: 5 additions & 0 deletions controllers/workspace/devworkspace_controller.go
@@ -323,12 +323,17 @@ func (r *DevWorkspaceReconciler) Reconcile(ctx context.Context, req ctrl.Request
}
}

postStartDebugTrapSleepDuration := ""
if workspace.Annotations[constants.DevWorkspaceDebugStartAnnotation] == "true" {
postStartDebugTrapSleepDuration = workspace.Config.Workspace.ProgressTimeout
Contributor


To be honest, I don't like using ProgressTimeout for this purpose.
On the other hand, the only alternative I can think of is a hard-coded constant.

Member Author


Actually, I use ProgressTimeout to stay consistent with how the Debug annotation already behaves when the main component fails.

In that case, we do not scale down the failing workspace until the failure timeout has elapsed:

// If debug annotation is present, leave the deployment in place to let users
// view logs.
if workspace.Annotations[constants.DevWorkspaceDebugStartAnnotation] == "true" {
if isTimeout, err := checkForFailingTimeout(workspace); err != nil {

Inside checkForFailingTimeout, we parse ProgressTimeout:

timeout, err := time.ParseDuration(workspace.Config.Workspace.ProgressTimeout)

}
devfilePodAdditions, err := containerlib.GetKubeContainersFromDevfile(
&workspace.Spec.Template,
workspace.Config.Workspace.ContainerSecurityContext,
workspace.Config.Workspace.ImagePullPolicy,
workspace.Config.Workspace.DefaultContainerResources,
workspace.Config.Workspace.PostStartTimeout,
postStartDebugTrapSleepDuration,
)
if err != nil {
return r.failWorkspace(workspace, fmt.Sprintf("Error processing devfile: %s", err), metrics.ReasonBadRequest, reqLogger, &reconcileStatus), nil
8 changes: 8 additions & 0 deletions docs/additional-configuration.adoc
@@ -348,6 +348,14 @@ The DevWorkspace Operator sets the `volumeMounts` by default for config files, m
## Debugging a failing workspace
Normally, when a workspace fails to start, the deployment will be scaled down and the workspace will be stopped in a `Failed` state. This can make it difficult to debug misconfiguration errors, so the annotation `controller.devfile.io/debug-start: "true"` can be applied to DevWorkspaces to leave resources for failed workspaces on the cluster. This allows viewing logs from workspace containers.

The annotation also enables a debug mode for `postStart` lifecycle hooks, which are often used for initial setup tasks.

When a `postStart` command fails with the annotation set:
- The container does not immediately crash or restart. It stays in the `ContainerCreating` phase.
- The command failure is trapped, and the container instead sleeps for the duration of the configured DevWorkspace `progressTimeout` (by default, 5 minutes).

This pause gives developers a window to connect to the container (e.g., using `kubectl exec`), inspect the file system, and review the logs in `/tmp/poststart-stderr.txt` and `/tmp/poststart-stdout.txt` to diagnose the cause of the `postStart` failure before the workspace scales down. This applies to both standard and timeout-wrapped `postStart` scripts.
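
The trapping behaviour described above can be sketched in Go roughly as follows. This is a minimal illustration modelled on the non-timeout fallback path in this change, not the operator's actual code: `withDebugTrap` is a hypothetical helper name, and the sample commands are made up. The regular expression and the trap text are copied from `poststart.go`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// trapErrRegex mirrors the pattern added in poststart.go: it detects a
// user-supplied `trap ... ERR` so that it is not overridden.
var trapErrRegex = regexp.MustCompile(`\btrap\b.*\bERR\b`)

// withDebugTrap is a hypothetical helper (not part of the operator). It
// prefixes the command lines with `set -e` and an ERR trap that sleeps for
// sleepSeconds, unless debugging is disabled or the user already traps ERR.
func withDebugTrap(commandLines []string, sleepSeconds int) string {
	if sleepSeconds <= 0 {
		// Debug mode is off: leave the script untouched.
		return strings.Join(commandLines, "\n")
	}
	for _, line := range commandLines {
		if trapErrRegex.MatchString(line) {
			// The user script installs its own ERR trap: leave it untouched.
			return strings.Join(commandLines, "\n")
		}
	}
	trap := fmt.Sprintf(
		`trap 'echo "[postStart] failure encountered, sleep for debugging"; sleep %d' ERR`,
		sleepSeconds,
	)
	prefixed := append([]string{"set -e", trap}, commandLines...)
	return strings.Join(prefixed, "\n")
}

func main() {
	// Sample commands are illustrative only.
	fmt.Println(withDebugTrap([]string{"cd /projects", "npm install"}, 300))
}
```

Note that `ERR` is a bash/ksh extension rather than a POSIX trap condition, so a sketch like this assumes the container's shell supports it.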

## Setting RuntimeClass for workspace pods
To run a DevWorkspace with a specific RuntimeClass, the attribute `controller.devfile.io/runtime-class` can be set on the DevWorkspace with the name of the RuntimeClass to be used. If the specified RuntimeClass does not exist, the workspace will fail to start. For example, to run a DevWorkspace using the https://github.com/kata-containers/kata-containers[kata containers] runtime in clusters where this is enabled, the DevWorkspace can be specified:
[source,yaml]
4 changes: 2 additions & 2 deletions pkg/library/container/container.go
@@ -45,7 +45,7 @@ import (
// rewritten as Volumes are added to PodAdditions, in order to support e.g. using one PVC to hold all volumes
//
// Note: Requires DevWorkspace to be flattened (i.e. the DevWorkspace contains no Parent or Components of type Plugin)
func GetKubeContainersFromDevfile(workspace *dw.DevWorkspaceTemplateSpec, securityContext *corev1.SecurityContext, pullPolicy string, defaultResources *corev1.ResourceRequirements, postStartTimeout string) (*v1alpha1.PodAdditions, error) {
func GetKubeContainersFromDevfile(workspace *dw.DevWorkspaceTemplateSpec, securityContext *corev1.SecurityContext, pullPolicy string, defaultResources *corev1.ResourceRequirements, postStartTimeout string, postStartDebugTrapSleepDuration string) (*v1alpha1.PodAdditions, error) {
if !flatten.DevWorkspaceIsFlattened(workspace, nil) {
return nil, fmt.Errorf("devfile is not flattened")
}
@@ -77,7 +77,7 @@ func GetKubeContainersFromDevfile(workspace *dw.DevWorkspaceTemplateSpec, securi
podAdditions.Containers = append(podAdditions.Containers, *k8sContainer)
}

if err := lifecycle.AddPostStartLifecycleHooks(workspace, podAdditions.Containers, postStartTimeout); err != nil {
if err := lifecycle.AddPostStartLifecycleHooks(workspace, podAdditions.Containers, postStartTimeout, postStartDebugTrapSleepDuration); err != nil {
return nil, err
}

2 changes: 1 addition & 1 deletion pkg/library/container/container_test.go
@@ -87,7 +87,7 @@ func TestGetKubeContainersFromDevfile(t *testing.T) {
t.Run(tt.Name, func(t *testing.T) {
// sanity check that file is read correctly.
assert.True(t, len(tt.Input.Components) > 0, "Input defines no components")
gotPodAdditions, err := GetKubeContainersFromDevfile(tt.Input, nil, testImagePullPolicy, defaultResources, "")
gotPodAdditions, err := GetKubeContainersFromDevfile(tt.Input, nil, testImagePullPolicy, defaultResources, "", "")
if tt.Output.ErrRegexp != nil && assert.Error(t, err) {
assert.Regexp(t, *tt.Output.ErrRegexp, err.Error(), "Error message should match")
} else {
56 changes: 48 additions & 8 deletions pkg/library/lifecycle/poststart.go
@@ -15,6 +15,7 @@ package lifecycle

import (
"fmt"
"regexp"
"strings"
"time"

@@ -41,7 +42,9 @@ const (
`
)

func AddPostStartLifecycleHooks(wksp *dw.DevWorkspaceTemplateSpec, containers []corev1.Container, postStartTimeout string) error {
var trapErrRegex = regexp.MustCompile(`\btrap\b.*\bERR\b`)

func AddPostStartLifecycleHooks(wksp *dw.DevWorkspaceTemplateSpec, containers []corev1.Container, postStartTimeout string, postStartDebugTrapSleepDuration string) error {
if wksp.Events == nil || len(wksp.Events.PostStart) == 0 {
return nil
}
@@ -69,7 +72,7 @@ func AddPostStartLifecycleHooks(wksp *dw.DevWorkspaceTemplateSpec, containers []
return fmt.Errorf("failed to process postStart event %s: %w", commands[0].Id, err)
}

postStartHandler, err := processCommandsForPostStart(commands, postStartTimeout)
postStartHandler, err := processCommandsForPostStart(commands, postStartTimeout, postStartDebugTrapSleepDuration)
if err != nil {
return fmt.Errorf("failed to process postStart event %s: %w", commands[0].Id, err)
}
@@ -85,10 +88,10 @@ func AddPostStartLifecycleHooks(wksp *dw.DevWorkspaceTemplateSpec, containers []

// processCommandsForPostStart processes a list of DevWorkspace commands
// and generates a corev1.LifecycleHandler for the PostStart lifecycle hook.
func processCommandsForPostStart(commands []dw.Command, postStartTimeout string) (*corev1.LifecycleHandler, error) {
func processCommandsForPostStart(commands []dw.Command, postStartTimeout string, postStartDebugTrapSleepDuration string) (*corev1.LifecycleHandler, error) {
if postStartTimeout == "" {
// use the fallback if no timeout propagated
return processCommandsWithoutTimeoutFallback(commands)
return processCommandsWithoutTimeoutFallback(postStartDebugTrapSleepDuration, commands)
}

originalUserScript, err := buildUserScript(commands)
@@ -101,7 +104,7 @@ func processCommandsForPostStart(commands []dw.Command, postStartTimeout string)
scriptToExecute := "set -e\n" + originalUserScript
escapedUserScriptForTimeoutWrapper := strings.ReplaceAll(scriptToExecute, "'", `'\''`)

fullScriptWithTimeout := generateScriptWithTimeout(escapedUserScriptForTimeoutWrapper, postStartTimeout)
fullScriptWithTimeout := generateScriptWithTimeout(postStartDebugTrapSleepDuration, escapedUserScriptForTimeoutWrapper, postStartTimeout)

finalScriptForHook := fmt.Sprintf(redirectOutputFmt, fullScriptWithTimeout)

@@ -128,8 +131,10 @@ func processCommandsForPostStart(commands []dw.Command, postStartTimeout string)
// - |
// cd <workingDir>
// <commandline>
func processCommandsWithoutTimeoutFallback(commands []dw.Command) (*corev1.LifecycleHandler, error) {
func processCommandsWithoutTimeoutFallback(postStartDebugTrapSleepDuration string, commands []dw.Command) (*corev1.LifecycleHandler, error) {
var dwCommands []string
postStartFailureDebugSleepSeconds := parsePostStartFailureDebugSleepDurationToSeconds(postStartDebugTrapSleepDuration)
hasErrTrapInUserScript := false
for _, command := range commands {
execCmd := command.Exec
if len(execCmd.Env) > 0 {
@@ -139,6 +144,21 @@ func processCommandsWithoutTimeoutFallback(commands []dw.Command) (*corev1.Lifec
dwCommands = append(dwCommands, fmt.Sprintf("cd %s", execCmd.WorkingDir))
}
dwCommands = append(dwCommands, execCmd.CommandLine)
if trapErrRegex.MatchString(execCmd.CommandLine) {
hasErrTrapInUserScript = true
}
}

if postStartFailureDebugSleepSeconds > 0 && !hasErrTrapInUserScript {
debugTrap := fmt.Sprintf(`
trap 'echo "[postStart] failure encountered, sleep for debugging"; sleep %d' ERR
`, postStartFailureDebugSleepSeconds)
debugTrapLine := strings.ReplaceAll(strings.TrimSpace(debugTrap), "\n", " ")

dwCommands = append([]string{
"set -e",
debugTrapLine,
}, dwCommands...)
}

joinedCommands := strings.Join(dwCommands, "\n")
@@ -187,7 +207,7 @@ func buildUserScript(commands []dw.Command) (string, error) {
// environment variable exports, and specific exit code handling.
// The killAfterDurationSeconds is hardcoded to 5s within this generated script.
// It conditionally prefixes the user script with the timeout command if available.
func generateScriptWithTimeout(escapedUserScript string, postStartTimeout string) string {
func generateScriptWithTimeout(postStartDebugTrapSleepDuration string, escapedUserScript string, postStartTimeout string) string {
// Convert `postStartTimeout` into the `timeout` format
var timeoutSeconds int64
if postStartTimeout != "" && postStartTimeout != "0" {
@@ -199,10 +219,12 @@ func generateScriptWithTimeout(escapedUserScript string, postStartTimeout string
timeoutSeconds = int64(duration.Seconds())
}
}
postStartFailureDebugSleepSeconds := parsePostStartFailureDebugSleepDurationToSeconds(postStartDebugTrapSleepDuration)

return fmt.Sprintf(`
export POSTSTART_TIMEOUT_DURATION="%d"
export POSTSTART_KILL_AFTER_DURATION="5"
export DEBUG_ENABLED="%t"

_TIMEOUT_COMMAND_PART=""
_WAS_TIMEOUT_USED="false" # Use strings "true" or "false" for shell boolean
@@ -219,6 +241,11 @@ fi
${_TIMEOUT_COMMAND_PART} /bin/sh -c '%s'
exit_code=$?

if [ "$DEBUG_ENABLED" = "true" ] && [ $exit_code -ne 0 ]; then
echo "[postStart] failure encountered, sleep for debugging" >&2
sleep %d
fi

# Check the exit code based on whether timeout was attempted
if [ "$_WAS_TIMEOUT_USED" = "true" ]; then
if [ $exit_code -eq 143 ]; then # 128 + 15 (SIGTERM)
@@ -239,5 +266,18 @@ else
fi

exit $exit_code
`, timeoutSeconds, escapedUserScript)
`, timeoutSeconds, postStartFailureDebugSleepSeconds > 0, escapedUserScript, postStartFailureDebugSleepSeconds)
}

func parsePostStartFailureDebugSleepDurationToSeconds(durationStr string) int {
if durationStr == "" {
return 0
}

d, err := time.ParseDuration(durationStr)
if err != nil {
return 0
}

return int(d.Seconds())
}
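
To see how the new duration helper treats different inputs, here is a small standalone sketch. The body of `parseSleepSeconds` follows `parsePostStartFailureDebugSleepDurationToSeconds` from this diff; the shorter name and the sample inputs are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// parseSleepSeconds follows the helper added in this change (renamed here
// for brevity): empty or unparsable input yields 0, which callers treat as
// "debug sleep disabled".
func parseSleepSeconds(durationStr string) int {
	if durationStr == "" {
		return 0
	}

	d, err := time.ParseDuration(durationStr)
	if err != nil {
		return 0
	}

	// Truncates toward zero, so sub-second durations become 0.
	return int(d.Seconds())
}

func main() {
	// Sample inputs are illustrative, not taken from the operator's config.
	for _, in := range []string{"", "5m", "90s", "500ms", "300", "not-a-duration"} {
		fmt.Printf("%q -> %d\n", in, parseSleepSeconds(in))
	}
}
```

Two consequences are worth keeping in mind if this sketch matches the real behaviour: a value without a unit (such as `"300"`) is rejected by `time.ParseDuration`, and a sub-second value truncates to zero. In both cases the result is 0, so the debug sleep is skipped without any error being reported.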