fix(crm-agent): mcp_auth_proxy refresh based on token exp, not wall-clock #74
Merged
…lock

User caught a 401: the token expired at 04:18:55 UTC but the proxy was still serving it at 04:26:01 UTC.

Root cause: the refresh decision was based on "we last fetched 45 min ago" instead of "token's exp claim says <60s remaining".

The bug surfaces when `az account get-access-token` returns its OWN cached near-expiry token. Our 45-min countdown starts fresh from the moment we received it, but the token itself was already aged. The token can expire seconds after we receive it, and we keep serving it until our countdown elapses.

Fix:
- Decode the JWT `exp` claim each request.
- Refresh when `exp - now < 60s` (was: refresh when "fetched > 45 min ago").
- Catch the rarer case where `az` itself returns an already-near-expiry token (MSAL refresh wedged): fail loud with `az logout && az login` guidance instead of silently serving a bad token.

`time.time() < refresh_after` was a sin: clock-since-last-fetch is a proxy for token freshness, but `az` breaks the assumption when it serves a cached token directly.

Smoke: parser correctly extracts exp from a real Foundry-audience JWT (3857s remaining, matches `az account get-access-token --query expiresOn`).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
User caught a 401 in Claude Code through the proxy:
```
Your access token has expired (expired at 04:18:55 UTC, current time 04:26:01 UTC)
```
Token had expired ~7 min ago, but the proxy was still serving it.
Root cause
`mcp_auth_proxy.py` cached the token on first fetch and refreshed it once more than 45 min had passed since that fetch. Time since fetch is only a proxy for "token is fresh", and it breaks when `az account get-access-token` returns its own near-expiry cached token. Our 45-min countdown then starts on a token that has already aged 50+ min: we serve it for its remaining ~10 min of validity, then keep serving it expired for the next ~35 min because our countdown hasn't elapsed.
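To make the timeline concrete, here is a small sketch comparing the two policies on the scenario described above (a token with ~10 min left at fetch time). The function and constant names are hypothetical and do not come from `mcp_auth_proxy.py`; only the 45-min and 60-s thresholds are taken from this PR.

```python
FETCH_TTL_S = 45 * 60      # old policy: refresh 45 min after the fetch
MIN_REMAINING_S = 60       # new policy: refresh when under 60 s of validity remain


def old_policy_serves_cached(fetched_at: float, now: float) -> bool:
    """Old rule: keep serving until 45 min have passed since the fetch."""
    return now - fetched_at < FETCH_TTL_S


def new_policy_serves_cached(exp: float, now: float) -> bool:
    """New rule: keep serving only while the token has more than 60 s left."""
    return exp - now > MIN_REMAINING_S


# Scenario: az hands back a cached token that expires 10 min after we fetch it.
fetched_at = 0.0
exp = fetched_at + 10 * 60

for minute in (5, 20, 44):
    now = fetched_at + minute * 60
    print(
        f"t+{minute:>2} min: expired={now >= exp} "
        f"old_serves={old_policy_serves_cached(fetched_at, now)} "
        f"new_serves={new_policy_serves_cached(exp, now)}"
    )
```

Under these assumptions the old rule keeps returning the cached token at t+20 and t+44 min even though it expired at t+10 min, while the `exp`-based rule stops serving it once fewer than 60 s remain.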
Fix
Refresh decision now based on the token's actual `exp` claim, not on when we fetched:
```python
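# Serve the cached token only while its `exp` claim is more than 60 s away.
if _token_cache and _token_exp(_token_cache) - time.time() > 60:
    return _token_cache
# Otherwise fall through and re-fetch from az.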
```
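The body of `_token_exp` is not shown in this description. As a rough sketch of how such a helper could read the `exp` claim, assuming the access token is a standard three-part JWT: `decode_exp`, `is_fresh`, and `make_unsigned_jwt` below are hypothetical names, and the signature is deliberately not verified, since only the expiry time is needed to decide when to refresh.

```python
import base64
import json


def decode_exp(jwt: str) -> float:
    """Return the `exp` claim (seconds since the epoch) without verifying the JWT."""
    try:
        payload_b64 = jwt.split(".")[1]
        # base64url segments in a JWT omit padding; restore it before decoding.
        payload_b64 += "=" * (-len(payload_b64) % 4)
        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
        return float(claims["exp"])
    except (IndexError, KeyError, TypeError, ValueError) as err:
        raise ValueError("token is not a JWT with a numeric exp claim") from err


def is_fresh(jwt: str, now: float, min_remaining: float = 60.0) -> bool:
    """True while the token has more than `min_remaining` seconds of validity left."""
    return decode_exp(jwt) - now > min_remaining


def make_unsigned_jwt(claims: dict) -> str:
    """Build a throwaway, unsigned JWT-shaped string (for illustration only)."""
    def b64(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{b64({'alg': 'none'})}.{b64(claims)}.signature"
```

For example, `is_fresh(make_unsigned_jwt({"exp": 1000}), now=900)` is true, while the same token checked at `now=950` is not, because only 50 s remain.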
Plus a safeguard: if `az` itself returns a near-expiry token (MSAL refresh wedged), fail loud with `az logout && az login` guidance instead of silently serving a bad token.
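A sketch of how the cache check and the safeguard might fit together. This is not the code from `mcp_auth_proxy.py`: `TokenCache` and `StaleTokenError` are hypothetical names, and `fetch` is an injected callable standing in for the real `az account get-access-token` call, assumed to return the token together with its decoded `exp`.

```python
import time
from typing import Callable, Optional, Tuple

MIN_REMAINING_S = 60.0  # threshold taken from this PR


class StaleTokenError(RuntimeError):
    """Raised when a freshly fetched token is already expired or nearly so."""


class TokenCache:
    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        now: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch   # returns (token, exp as seconds since the epoch)
        self._now = now       # injectable clock, so the logic can be tested
        self._cached: Optional[Tuple[str, float]] = None

    def get(self) -> str:
        # Serve the cached token only while more than 60 s of validity remain.
        if self._cached and self._cached[1] - self._now() > MIN_REMAINING_S:
            return self._cached[0]

        token, exp = self._fetch()
        if exp - self._now() <= MIN_REMAINING_S:
            # The upstream cache handed back a stale token: fail loud, don't serve it.
            self._cached = None
            raise StaleTokenError(
                "az returned a token that is expired or about to expire; "
                "run `az logout && az login` and retry"
            )
        self._cached = (token, exp)
        return token
```

Injecting the clock and the fetcher keeps the refresh rule testable without calling `az` or waiting for a real token to age.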
Test plan
🤖 Generated with Claude Code