Skip to content

Adds retries to 404 requests#73

Merged
jscalev merged 1 commit intomasterfrom
dev/jcalev/RetryMetadata404
Jan 30, 2026
Merged

Adds retries to 404 requests#73
jscalev merged 1 commit intomasterfrom
dev/jcalev/RetryMetadata404

Conversation

@jscalev
Copy link
Copy Markdown
Contributor

@jscalev jscalev commented Jan 30, 2026

Due to a NodeService CCF caching issue, the ccf file may not be written at the time the metadata request arrives. To get around this, simply retry on 404 responses from the host.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the retry logic to treat HTTP 404 responses as transient (to mitigate a timing issue where metadata requests can arrive before the NodeService CCF cache file is written) and extends the retry test suite to cover the new behavior.

Changes:

  • Add http.StatusNotFound (404) to the set of retryable HTTP status codes.
  • Refactor the retry tests’ “healing server” into a configurable handler and add a test that validates retry-on-404 behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
internal/requesthelper/retry.go Expands transient HTTP status classification to include 404 so WithRetries will retry NotFound responses.
internal/requesthelper/retry_test.go Updates healing server test scaffolding and adds coverage ensuring 404s are retried until success.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +170 to +176
func TestWithRetries_retry404(t *testing.T) {
h := &healingServer{
HealAfter: 4,
FailCode: 404,
SuccessCode: 200,
}
srv := httptest.NewServer(h)
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this new test, the response body is closed via defer resp.Body.Close() before asserting err == nil / resp != nil (same pattern as the healing-server test). If the retry logic changes and the request doesn’t succeed, the test will panic with a nil-pointer dereference and mask the real failure. Prefer deferring the close only after confirming resp/resp.Body are non-nil (or guard the defer).

Copilot uses AI. Check for mistakes.
Comment on lines +151 to +155
h := &healingServer{
HealAfter: 4,
FailCode: 500,
SuccessCode: 200,
}
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use http.Status... constants instead of raw status-code integers (e.g., http.StatusInternalServerError, http.StatusOK) to avoid magic numbers and make intent clearer.

Copilot uses AI. Check for mistakes.
@mayankdaruka-msft
Copy link
Copy Markdown
Contributor

mayankdaruka-msft commented Jan 30, 2026

General comment: In addition to there being 404s due to the NS cache not holding the latest CCF file path (metadata regarding newly added VM Apps not present), there is also the scenario where a VM App is already present on the VM and is being upgraded. In this case, if the latest CCF is not known by HGAP, it will provide metadata for the older version to the extension. This will never surface as a 404 and the new version would never be installed.

This PR would address it - #58 - but I didn't merge it earlier because of the potential it has to surface larger numbers of failures, especially with RCVv2.

I think this is a step in the right direction, we can start retrying on 404s for scenarios where the older goal state did not contain VM Apps and the extension encounters a 404. Eventually, we will want to come up with a plan for flighting the change for the extension to include the app version in its request to HGAP.

@jscalev jscalev merged commit 79b35f3 into master Jan 30, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants