Enhance GitHub API error handling#7922
Conversation
jongio
left a comment
There was a problem hiding this comment.
Solid work - the structured error classification is a big improvement over the opaque could not find a valid branch message. The code is well-organized: the parsing logic in api_error.go is thorough, the branch resolver short-circuit is correct, and the error chain preservation via Unwrap() is properly maintained. A few suggestions below.
|
Does this resolve #6646? |
It should, yes |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR improves gh api failure handling in GitHub URL/branch resolution so real access failures (SAML SSO, rate limits, auth errors, server errors) are surfaced as actionable typed errors instead of being misreported as “could not find a valid branch”.
Changes:
- Introduces a typed
*github.ApiErrorwith classification (ApiErrorKind) parsed fromgh apistdout/stderr. - Updates branch resolution to short-circuit on access/server failures and adds a
/repos/{owner}/{repo}probe to emit*RepoNotAccessibleErrorafter exhausting 404-based branch candidates. - Adds code-based suggestions via
*internal.ErrorWithSuggestionfor recognized GitHub failures and expands test coverage for the new behaviors.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| cli/azd/resources/error_suggestions.yaml | Adds commentary clarifying that GitHub API suggestions are emitted in code rather than via YAML rules. |
| cli/azd/pkg/tools/github/github.go | Switches ApiCall failures to return typed *ApiError instead of a generic wrapped error. |
| cli/azd/pkg/tools/github/api_error.go | Adds ApiErrorKind, ApiError, parsing/classification helpers for gh api errors. |
| cli/azd/pkg/tools/github/api_error_test.go | Adds unit/integration-style tests for parsing/classification and ApiCall typed errors. |
| cli/azd/pkg/tools/github/github_methods_test.go | Updates tests to assert ApiCall returns typed *ApiError. |
| cli/azd/pkg/templates/gh_source.go | Refactors branch resolution to short-circuit on access failures and add repo accessibility probing + new error type. |
| cli/azd/pkg/templates/gh_source_test.go | Adds tests for short-circuit behavior and repo-not-accessible fallback. |
| cli/azd/pkg/templates/gh_errors.go | Adds code-based suggestion mapping for *github.ApiError and *RepoNotAccessibleError. |
…ctionable suggestions for SAML, rate-limit, and authorization errors
wbreza
left a comment
There was a problem hiding this comment.
Review — Enhance GitHub API error handling (#7922)
Nice work, @JeffreyCA. The structured ApiErrorKind classification and short-circuit logic are a real improvement — replacing the silent branch-walk swallow with actionable, user-facing messages is exactly the right approach. The typed error design is clean and the SAML/rate-limit/EMU coverage hits the most common pain points.
That said, the new error-handling layer carries meaningful risk without matching test coverage, and there are a few hardening gaps worth closing before merge. Findings below are grouped by priority and intentionally avoid re-flagging items already raised by @jongio (nil guard for KindUnknown, missing KindServerError suggestion, companion test for fallback path).
🔴 Critical
F1 — gh_errors.go has zero test coverage
File: cli/azd/pkg/templates/gh_errors.go (all 126 lines)
The user-facing suggestion layer — withGitHubSuggestion(), suggestionForApiError() (5 kinds → messages with docs links), and suggestionForRepoNotAccessible() — is the core value of this PR and has no tests.
Any regression here silently breaks the user experience for every classified error path. Table-driven tests are straightforward for this shape:
func TestSuggestionForApiError(t *testing.T) {
tests := []struct {
name string
kind github.ApiErrorKind
wantNil bool
wantText string // substring match
}{
{"SAML", github.KindSAMLBlocked, false, "SAML SSO"},
{"RateLimited", github.KindRateLimited, false, "rate limit"},
{"Unauthorized", github.KindUnauthorized, false, "gh auth status"},
{"Forbidden", github.KindForbidden, false, "token"},
{"NotFound", github.KindNotFound, true, ""},
// expand per Kind...
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
err := &github.ApiError{Kind: tt.kind, URL: "https://api.github.com/test"}
got := suggestionForApiError(err)
if tt.wantNil {
require.Nil(t, got)
} else {
require.NotNil(t, got)
require.Contains(t, got.Error(), tt.wantText)
}
})
}
}Also verify withGitHubSuggestion() dispatches correctly for both *ApiError and *RepoNotAccessibleError inputs, and returns the original error unchanged for unrecognized types.
🟠 High
F2 — Short-circuit behavior not verified in tests
File: cli/azd/pkg/templates/gh_source_test.go
The existing test asserts the error value on a classified failure, but doesn't confirm resolveBranchAndPathInner() actually exits the loop early. A SAML error on the first candidate should produce exactly 1 API call — not N (one per branch candidate).
Suggested approach: Use a call-counter mock (or verify the mock expectation count) to assert only a single ApiCall is made when the first branch probe returns KindSAMLBlocked or KindRateLimited.
F3 — RepoNotAccessibleError path never tested
File: cli/azd/pkg/templates/gh_source_test.go
Expanding on jongio's suggestion — the gap is broader than a single companion test. The entire checkRepoAccessible() → RepoNotAccessibleError integration is uncovered:
- All branch probes return 404 → repo probe returns 404 →
RepoNotAccessibleErrorsurfaced - All branch probes return 404 → repo probe returns 200 → falls through to original "no valid branch" error
- Repo probe itself returns a classified error (e.g., SAML) → error propagated correctly
F4 — 4 of 8 ApiErrorKind values untested in classification
File: cli/azd/pkg/tools/github/api_error_test.go
classifyKind() is tested for SAMLBlocked, RateLimited, NotFound, and Unknown, but Unauthorized (401), Forbidden (403 non-SAML), ServerError (5xx), and Other (misc 4xx) have no test cases. Since classification drives both short-circuiting and user-facing suggestions, each path needs coverage.
{"Unauthorized_401", 401, "Bad credentials", github.KindUnauthorized},
{"Forbidden_403_non_SAML", 403, "Resource not accessible", github.KindForbidden},
{"ServerError_500", 500, "Internal server error", github.KindServerError},
{"ServerError_502", 502, "Bad gateway", github.KindServerError},
{"Other_422", 422, "Validation failed", github.KindOther},
{"Other_409", 409, "Conflict", github.KindOther},F5 — parseApiError edge cases untested
File: cli/azd/pkg/tools/github/api_error_test.go
parseGitHubErrorBody handles untrusted input (gh CLI stdout) but has no edge-case tests:
- Malformed JSON (truncated, HTML error pages from proxies)
- Empty body (gh crashes before writing)
- Missing
messagefield (valid JSON, no expected keys) - Non-UTF-8 bytes
These are realistic failure modes when gh CLI misbehaves or is behind a corporate proxy.
F6 — Error chain broken in newGhTemplateSource()
File: cli/azd/pkg/templates/gh_source.go (~line 218)
ParseGitHubUrl errors are returned bare (return nil, err), losing the call-site context. When this surfaces to the user, there's no indication which operation failed. Wrap with context:
return nil, fmt.Errorf("failed to create template source for %q: %w", templateUrl, err)F7 — No size validation on untrusted JSON deserialization
File: cli/azd/pkg/tools/github/api_error.go (parseGitHubErrorBody)
json.Unmarshal is called directly on gh CLI stdout with no size guard. While gh is a local process, a misbehaving proxy or a compromised gh binary could produce arbitrarily large output. A defensive cap is cheap:
const maxErrorBodySize = 100_000 // 100 KB — generous for any error response
if len(stdout) > maxErrorBodySize {
return nil
}🟡 Medium
F8 — Inconsistent error return contract in branchExists()
File: cli/azd/pkg/templates/gh_source.go (~line 270)
branchExists() returns *ApiError for classified API failures but plain error for infrastructure failures (context canceled, gh not installed). Callers use errors.AsType[*ApiError] to branch — but the implicit contract ("nil ApiError means continue the walk") is fragile.
Suggested fix: Add a doc comment clarifying the contract and the caller's responsibility:
// branchExists checks whether a branch exists via the GitHub API.
// Returns *ApiError for classified API failures (callers should inspect Kind
// to decide whether to short-circuit). Returns a plain error for
// infrastructure failures (context canceled, gh not installed) which should
// always abort the walk.F11 — Duplicate auth check logic
File: cli/azd/pkg/templates/gh_source.go
ensureGitHubAuthenticated is called in ParseGitHubUrl (lines ~64 and ~110) and again in newGhTemplateSource. This means every template-source creation triggers 2–3 auth checks against gh CLI. Consider caching the auth status for the lifetime of the operation, or consolidating to a single check at the entry point.
F12 — KindOther not handled in suggestion generation
File: cli/azd/pkg/templates/gh_errors.go (~line 119)
Expanding on jongio's KindServerError observation — KindOther also falls through with no suggestion. Users hitting unclassified 4xx errors (422, 409, etc.) get the raw error with no guidance. A generic fallback would close this gap:
case github.KindOther:
return &internal.ErrorWithSuggestion{
Err: apiErr,
Suggestion: "An unexpected GitHub API error occurred. " +
"Check your token permissions with 'gh auth status' and retry.",
}F13 — Phrase matching and stderr regex edge cases undertested
File: cli/azd/pkg/tools/github/api_error_test.go
Only one SAML phrase variant is tested in classification. The httpStatusRe regex fallback (parsing HTTP status from stderr) has no edge-case tests. Consider parameterized tests for:
- All SAML phrase variants (
"Resource protected by organization SAML enforcement","saml_sso"in message, etc.) - Stderr with/without the expected
(HTTP NNN)pattern - Stderr with multiple HTTP codes (which one wins?)
🟢 Low
F9 — KindOther documentation imprecise
File: cli/azd/pkg/tools/github/api_error.go
The doc comment says KindOther covers "4xx codes" but the implementation catches any unclassified HTTP status that doesn't match the specific kinds — including codes outside the 4xx range that aren't 5xx. Update the comment to match reality:
// KindOther represents any classified HTTP error that doesn't match
// a more specific kind (e.g., unrecognized 4xx or edge-case status codes).F10 — Missing doc comment on RepoNotAccessibleError
File: cli/azd/pkg/templates/gh_source.go
RepoNotAccessibleError is an exported type with no Go doc comment. Per project conventions and golint, exported types should be documented:
// RepoNotAccessibleError indicates that a GitHub repository was not
// accessible to the authenticated user. This typically means the
// repository is private, belongs to an EMU organization, or does not exist.
type RepoNotAccessibleError struct { ... }Summary
| Priority | Count | Areas |
|---|---|---|
| 🔴 Critical | 1 | Testing (gh_errors.go untested) |
| 🟠 High | 6 | Testing (4), Error Handling (1), Security (1) |
| 🟡 Medium | 4 | API Design (1), Code Quality (2), Testing (1) |
| 🟢 Low | 2 | API Design (1), Code Quality (1) |
Recommendation: Request Changes. The structured error classification is solid design work, but the suggestion layer (gh_errors.go) and several classification paths ship without tests. Given this code directly shapes user-facing error messages across multiple failure modes, test coverage should match the ambition of the change before merge. The high-priority items (F2–F7) are all straightforward to address and would meaningfully de-risk the PR.
ac532c8 to
5a67833
Compare
- F1: Add gh_errors_test.go covering suggestionForApiError per-kind,
suggestionForRepoNotAccessible, and withGitHubSuggestion dispatch.
- F3: Add Test_ParseGitHubUrl_RepoAccessibleFallsThrough and
Test_ParseGitHubUrl_RepoProbeReturnsClassifiedError covering the two
remaining /repos/{slug} probe outcomes.
- F4: Add TestClassifyKind_HttpStatusToKind covering all status->Kind
mappings and TestClassifyKind_SAMLPhraseVariants for SAML detection.
- F5: Add malformed-JSON, HTML-body, and message-only fallback rows
to TestParseApiError_StatusFromJSONBody.
- F6: Wrap ParseGitHubUrl error in newGhTemplateSource with the
failing template URL so users see which operation failed.
- F9: Tighten KindOther doc comment to match classifyKind behavior.
- F13: Add TestParseApiError_StderrHttpStatusFallback covering the
'(HTTP NNN)' regex fallback edge cases (multiple markers, missing
marker, malformed patterns, case sensitivity).
- Use t.Context() in new gh_source_test.go subtests to match repo
convention.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Thanks for the thorough review @wbreza. Pushed Addressed
Already addressed in earlier rounds (heads-up since timing overlapped with your review)
Deferred / decided against (happy to revisit)
|
wbreza
left a comment
There was a problem hiding this comment.
Looks great — thorough fix commits that address the review feedback comprehensively. The new test coverage is especially strong. Approving. ✅
Resolution Summary (16 findings → 14 resolved)
| # | Severity | Finding | Status |
|---|---|---|---|
| F1 | 🔴 Critical | Unbounded body read in readApiError |
✅ Resolved |
| F2 | 🟠 High | Silent fallback masks root cause | ✅ Resolved |
| F3 | 🟠 High | Missing error wrapping in newGhTemplateSource |
✅ Resolved |
| F4 | 🟠 High | SAML detection too narrow | ✅ Resolved (6 variants tested) |
| F5 | 🟠 High | branchExists undocumented short-circuit |
✅ Resolved |
| F6 | 🟠 High | No test coverage for suggestion layer | ✅ Resolved (gh_errors_test.go created) |
| F7 | 🟠 High | No hard size cap on JSON parsing | |
| F8 | 🟡 Medium | ApiErrorKind incomplete test coverage |
✅ Resolved (all 8 kinds tested) |
| F9 | 🟡 Medium | Nil guard on KindUnknown.Error() |
✅ Resolved |
| F10 | 🟡 Medium | Missing KindServerError suggestion |
✅ Resolved |
| F11 | 🟡 Medium | Duplicate auth checks across layers | 📌 Open (non-blocking) |
| F12 | 🟢 Low | Missing doc comments | ✅ Resolved |
| F13 | 🟢 Low | Edge-case test gaps (malformed JSON, HTML) | ✅ Resolved |
| J1–J3 | — | jongio's findings (nil guard, KindServerError, fallback test) | ✅ All resolved |
Non-blocking notes
-
F7 (partial) — No hard
io.LimitReadercap was added, but theHasPrefixguard on content-type plus graceful JSON-error handling is a reasonable defense. Acceptable as-is; a size cap could be added later if large error bodies become an issue in practice. -
F11 (open) — The duplicate auth-error checks across
ghCli→ghSource→ suggestion layer are an architectural concern, not a correctness bug. Fine to address in a follow-up if the layering is refactored.
Note on copilot-pull-request-reviewer[bot] comments
The bot's ~10 comments suggesting errors.As(err, &target) instead of errors.AsType[T]() are false positives. errors.AsType is a generic helper added in Go 1.26, and golangci-lint passes cleanly. These can be dismissed.
jongio
left a comment
There was a problem hiding this comment.
Addresses my previous feedback. The nil guard on KindUnknown, the KindServerError suggestion, and the RepoNotAccessibleError fallback test are all in place. New test coverage in gh_errors_test.go is thorough - per-kind assertions plus the three integration tests for the branch-walk/repo-probe matrix cover the important paths.
Azure Dev CLI Install InstructionsInstall scriptsMacOS/Linux
bash: pwsh: WindowsPowerShell install MSI install Standalone Binary
MSI
Documentationlearn.microsoft.com documentationtitle: Azure Developer CLI reference
|
Fixes #6646
Problem
azd ai agent init(and other GitHub URL flows) frequently fails with a misleading message:The real cause is usually SAML SSO enforcement, rate limiting, an expired token, or a private/EMU repo invisible to the active
ghaccount. The branch-walker treated every non-2xx as "not a branch" and kept walking, swallowing the first real failure and surfacing nothing actionable.Changes
*github.ApiError(pkg/tools/github/api_error.go) with anApiErrorKindenum (SAMLBlocked,RateLimited,Unauthorized,Forbidden,NotFound,ServerError,Other,Unknown).parseApiErrorreads the JSON error envelope fromgh apistdout and falls back to the stderr(HTTP NNN)marker.Cli.ApiCallnow returns this typed value, with the original error preserved viaUnwrap()soerrors.Is/errors.AsTypestill work.pkg/templates/gh_source.go) on classified failures (SAMLBlocked/RateLimited/Unauthorized/Forbidden/ServerError) instead of walking every candidate. 404s and unparseable errors still continue the walk, preserving the "branch with slashes" behavior.RepoNotAccessibleError: after the branch walk exhausts on 404s, probe/repos/{owner}/{repo}. If that's also 404, surface a typed repo-not-accessible error (covers private repos and EMU repos invisible to the active account).pkg/templates/gh_errors.go) wrap typed errors as*internal.ErrorWithSuggestionfor each failure mode (SAML, rate-limit, 401, 403, 5xx, repo-not-accessible) with EMU /gh auth statusguidance and relevant doc links. Inline wrapping (rather thanerror_suggestions.yaml) keeps the suggestion text co-located with the classifier and survives boundaries (e.g., gRPC) where the YAML pipeline doesn't run.Behavior
Before:
failed to parse GitHub URL: could not find a valid branch ...for every failure mode.After (SAML):
gh api .../branches/main: SAML SSO enforcement blocked the request (HTTP 403)+ actionable suggestion linking the SSO authorization docs.After (rate limit):
gh api .../branches/main: GitHub API rate limit exceeded (HTTP 403)+ suggestion to authenticate or wait.After (private / EMU):
repository github.com/<user>_microsoft/<repo> is not accessible (HTTP 404). It may not exist, may be private, or your account may not have access+ suggestion to verify withgh auth statusand check enterprise account selection.After (5xx):
GitHub returned a server error (HTTP 5xx). This usually indicates a transient issue on GitHub's side ...+ link to https://www.githubstatus.com/.