Skip to content

Add project clone retries functionality#1613

Open
dkwon17 wants to merge 2 commits intodevfile:mainfrom
dkwon17:retries
Open

Add project clone retries functionality#1613
dkwon17 wants to merge 2 commits intodevfile:mainfrom
dkwon17:retries

Conversation

@dkwon17
Copy link
Copy Markdown
Collaborator

@dkwon17 dkwon17 commented Apr 10, 2026

Assisted-by: Claude Code Opus 4.6

What does this PR do?

Adds project clone retries functionality.

For the time being, by default, there will be up to 3 retries (4 attempts in total) with an exponential backoff. In a later PR, it would be nice to add a field in the DWOC to configure the number of retry attempts.

What issues does this PR fix or reference?

https://redhat.atlassian.net/browse/CRW-9779

Is it tested? How?

After building the image, install DWO:

export PROJECT_CLONE_IMG=quay.io/dkwon17/project-clone:retries
podman build -t "$PROJECT_CLONE_IMG" -f ./project-clone/Dockerfile .
podman push $PROJECT_CLONE_IMG
export DOCKER=podman
make docker
make install 

I have tested two cases:

Verify that regular cloning is working as intended

Create the following DW and verify that project-clone was successful:

curl -sL https://raw.githubusercontent.com/devfile/devworkspace-operator/refs/heads/main/samples/git-clone-sample.yaml | oc apply -f -

Successful project-clone init container:

...
2026/04/09 23:09:41 Processing project devworkspace-operator
2026/04/09 23:09:41 Cloning project devworkspace-operator to /projects/project-clone-2875820073/devworkspace-operator
Cloning into '/projects/project-clone-2875820073/devworkspace-operator'...
2026/04/09 23:09:49 Cloned project devworkspace-operator to /projects/project-clone-2875820073/devworkspace-operator
...
Verify that retries are working as intended.

Create the following DW. Note that in this DW, the devworkspace-operator git project has typos in the remote urls, which will cause failures during git clone.

kind: DevWorkspace
apiVersion: workspace.devfile.io/v1alpha2
metadata:
  name: git-clone-sample-devworkspace
spec:
  started: true
  template:
    projects:
      - name: web-nodejs-sample
        git:
          remotes:
            origin: "https://github.com/che-samples/web-nodejs-sample.git"
      - name: devworkspace-operator
        git:
          checkoutFrom:
            remote: origin
            revision: 0.21.x
          remotes:
            origin: "https://github.com/typo/devworkspace-operator.git"
            amisevsk: "https://github.com/typo/devworkspace-operator.git"
    commands:
      - id: say-hello
        exec:
          component: che-code-runtime-description
          commandLine: echo "Hello from $(pwd)"
          workingDir: ${PROJECT_SOURCE}/app
  contributions:
    - name: che-code
      uri: https://eclipse-che.github.io/che-plugin-registry/main/v3/plugins/che-incubator/che-code/latest/devfile.yaml
      components:
        - name: che-code-runtime-description
          container:
            env:
              - name: CODE_HOST
                value: 0.0.0.0

Verify in the project-clone init container logs, that retries have been logged:

2026/04/09 23:59:47 Failed git clone for project devworkspace-operator (attempt 1/4): failed to git clone from https://github.com/ddevfile/devworkspace-operator.git: exit status 128
2026/04/09 23:59:47 Retrying git clone for project devworkspace-operator (attempt 2/4) after 1s
2026/04/09 23:59:48 Cloning project devworkspace-operator to /projects/project-clone-2688951296/devworkspace-operator
Cloning into '/projects/project-clone-2688951296/devworkspace-operator'...
Error: passphrase file is missing in the '/etc/ssh/' directory
error: unable to read askpass response from '/.ssh-askpass/ssh-askpass.sh'
fatal: could not read Username for 'https://github.com': No such device or address
2026/04/09 23:59:48 Failed git clone for project devworkspace-operator (attempt 2/4): failed to git clone from https://github.com/ddevfile/devworkspace-operator.git: exit status 128
2026/04/09 23:59:48 Retrying git clone for project devworkspace-operator (attempt 3/4) after 2s
2026/04/09 23:59:50 Cloning project devworkspace-operator to /projects/project-clone-2688951296/devworkspace-operator
Cloning into '/projects/project-clone-2688951296/devworkspace-operator'...
Error: passphrase file is missing in the '/etc/ssh/' directory
error: unable to read askpass response from '/.ssh-askpass/ssh-askpass.sh'
fatal: could not read Username for 'https://github.com': No such device or address
2026/04/09 23:59:50 Failed git clone for project devworkspace-operator (attempt 3/4): failed to git clone from https://github.com/ddevfile/devworkspace-operator.git: exit status 128
2026/04/09 23:59:50 Retrying git clone for project devworkspace-operator (attempt 4/4) after 4s

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
    • v8-devworkspace-operator-e2e: DevWorkspace e2e test
    • v8-che-happy-path: Happy path for verification integration with Che

Summary by CodeRabbit

  • New Features

    • Git clone now automatically retries failed attempts with exponential backoff between tries.
    • Number of retry attempts is configurable via PROJECT_CLONE_RETRIES (defaults and sensible limits applied).
    • Previous failed clone data is cleaned up before each retry.
  • Bug Fixes

    • Improved retry and failure logging for clearer diagnostics.

Signed-off-by: David Kwon <dakwon@redhat.com>
Assisted-by: Claude Code Opus 4.6
@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Apr 10, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dkwon17

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 10, 2026

📝 Walkthrough

Walkthrough

Adds retry logic with exponential backoff to the initial git clone flow and exposes retry configuration via a PROJECT_CLONE_RETRIES environment variable with validation and clamping.

Changes

Cohort / File(s) Summary
Git clone retry flow
project-clone/internal/git/setup.go
Replaces a single CloneProject attempt with a retry loop up to CloneRetries, logs intermediate failures, removes failed temp clone dirs between attempts, uses %w to wrap final error, and adds delayBeforeRetry with exponential backoff (uses time).
Retry configuration & init
project-clone/internal/global.go
Adds CloneRetries variable, PROJECT_CLONE_RETRIES env parsing, default (3) and max (10) bounds, base retry delay constant, validation/clamping of parsed values, and imports (strconv, time).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I dug a hole, then tried to clone,
When Git said "not today," I waited alone.
Backoff in hops, I clean what failed,
Retry, retry — the plan prevailed.
A happy hop, the repo's home. 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add project clone retries functionality' accurately and specifically describes the main change—implementing retry logic for project cloning with exponential backoff.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
project-clone/internal/global.go (1)

68-76: Consider adding an upper bound for CloneRetries.

The validation rejects negative values but allows arbitrarily large retry counts. With exponential backoff, high values like CloneRetries=30 would result in a delay of ~17 years before the final retry attempt (1s * 2^29).

Consider capping at a reasonable maximum (e.g., 10) to prevent configuration mistakes from causing excessively long waits.

Suggested change
+const maxCloneRetries = 10
+
 CloneRetries = defaultCloneRetries
 if val := os.Getenv(cloneRetriesEnvVar); val != "" {
     parsed, err := strconv.Atoi(val)
-    if err != nil || parsed < 0 {
-        log.Printf("Invalid value for %s: %q, using default (%d)", cloneRetriesEnvVar, val, defaultCloneRetries)
+    if err != nil || parsed < 0 || parsed > maxCloneRetries {
+        log.Printf("Invalid value for %s: %q (must be 0-%d), using default (%d)", cloneRetriesEnvVar, val, maxCloneRetries, defaultCloneRetries)
     } else {
         CloneRetries = parsed
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@project-clone/internal/global.go` around lines 68 - 76, The env parsing for
CloneRetries (using cloneRetriesEnvVar and defaultCloneRetries) only rejects
negatives but allows huge values; add an upper bound (e.g., maxCloneRetries =
10) and when parsed > maxCloneRetries log a warning and set CloneRetries =
maxCloneRetries, otherwise set CloneRetries = parsed; keep the existing
invalid-number fallback behavior and use the same log.Printf pattern to report
out-of-range values.
project-clone/internal/git/setup.go (1)

66-68: Use %w for proper error wrapping.

Using %s formats the error as a string but loses the error chain, preventing callers from using errors.Is() or errors.Unwrap() to inspect the underlying cause.

Suggested change
 if cloneErr != nil {
-    return fmt.Errorf("failed to clone project: %s", cloneErr)
+    return fmt.Errorf("failed to clone project: %w", cloneErr)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@project-clone/internal/git/setup.go` around lines 66 - 68, The error return
currently formats cloneErr with "%s" which discards the error chain; update the
return in the clone error path to wrap the original error using "%w" so callers
can use errors.Is/errors.Unwrap (i.e., change the return that references
cloneErr to use "%w" to wrap the error).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@project-clone/internal/git/setup.go`:
- Around line 66-68: The error return currently formats cloneErr with "%s" which
discards the error chain; update the return in the clone error path to wrap the
original error using "%w" so callers can use errors.Is/errors.Unwrap (i.e.,
change the return that references cloneErr to use "%w" to wrap the error).

In `@project-clone/internal/global.go`:
- Around line 68-76: The env parsing for CloneRetries (using cloneRetriesEnvVar
and defaultCloneRetries) only rejects negatives but allows huge values; add an
upper bound (e.g., maxCloneRetries = 10) and when parsed > maxCloneRetries log a
warning and set CloneRetries = maxCloneRetries, otherwise set CloneRetries =
parsed; keep the existing invalid-number fallback behavior and use the same
log.Printf pattern to report out-of-range values.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 62363f36-541d-466e-bc69-a145673dc401

📥 Commits

Reviewing files that changed from the base of the PR and between fcee0cd and f8e5d42.

📒 Files selected for processing (2)
  • project-clone/internal/git/setup.go
  • project-clone/internal/global.go

Signed-off-by: dkwon17 <dakwon@redhat.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@project-clone/internal/git/setup.go`:
- Around line 66-68: The clone error return currently exits without removing the
temporary clone directory (tmpClonePath), leaving stale data; ensure
tmpClonePath is cleaned on failure by calling os.RemoveAll(tmpClonePath) before
returning (or by registering a defer cleanup immediately after creating
tmpClonePath) in the function that performs the clone (refer to the tmpClonePath
variable and the clone error handling block that returns fmt.Errorf("failed to
clone project: %w", cloneErr)).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a72c18e9-e8fb-47ea-9cd8-f1a9f4fc692e

📥 Commits

Reviewing files that changed from the base of the PR and between f8e5d42 and d312f58.

📒 Files selected for processing (2)
  • project-clone/internal/git/setup.go
  • project-clone/internal/global.go

Comment on lines +66 to 68
if cloneErr != nil {
return fmt.Errorf("failed to clone project: %w", cloneErr)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Final clone-failure path skips temp directory cleanup.

After the last failed attempt, Line 67 returns immediately and can leave partial repo data in tmpClonePath. Over repeated init-container failures, stale temp directories may accumulate under the mounted projects volume.

Proposed fix
 	if cloneErr != nil {
+		if err := os.RemoveAll(tmpClonePath); err != nil {
+			log.Printf("Warning: cleanup after final clone failure failed: %s", err)
+		}
 		return fmt.Errorf("failed to clone project: %w", cloneErr)
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if cloneErr != nil {
return fmt.Errorf("failed to clone project: %w", cloneErr)
}
if cloneErr != nil {
if err := os.RemoveAll(tmpClonePath); err != nil {
log.Printf("Warning: cleanup after final clone failure failed: %s", err)
}
return fmt.Errorf("failed to clone project: %w", cloneErr)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@project-clone/internal/git/setup.go` around lines 66 - 68, The clone error
return currently exits without removing the temporary clone directory
(tmpClonePath), leaving stale data; ensure tmpClonePath is cleaned on failure by
calling os.RemoveAll(tmpClonePath) before returning (or by registering a defer
cleanup immediately after creating tmpClonePath) in the function that performs
the clone (refer to the tmpClonePath variable and the clone error handling block
that returns fmt.Errorf("failed to clone project: %w", cloneErr)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant