Skip to content

fix: invalidate auth cache when rotating user API key#883

Merged
AchoArnold merged 6 commits into
mainfrom
fix/881-rotate-apikey-cache-invalidation
May 13, 2026
Merged

fix: invalidate auth cache when rotating user API key#883
AchoArnold merged 6 commits into
mainfrom
fix/881-rotate-apikey-cache-invalidation

Conversation

@AchoArnold
Copy link
Copy Markdown
Member

Summary

Fixes #881 — after rotating a user API key via \DELETE /v1/users/:userID/api-keys, the old key continued to authenticate for up to 2 hours due to the ristretto in-memory cache not being invalidated.

Changes

\�pi/pkg/repositories/gorm_user_repository.go\

  • *\RotateAPIKey()* now loads the user's current API key before the DB update, then calls \cache.Del(oldAPIKey)\ after the transaction succeeds
  • This is the surgical approach (Option B) from the issue — only the old key's cache entry is removed, avoiding a full cache flush
  • Matches the pattern already used in \gorm_phone_api_key_repository.Delete()\ (line 159)

\ ests/integration_test.go\

  • Adds \TestRotateAPIKey_InvalidatesCache\ which:
    1. Authenticates with the current API key
    2. Rotates the key
    3. Verifies the old key returns 401
    4. Verifies the new key returns 200

Security Impact

Old API keys are now immediately invalidated on rotation, closing the 2-hour window where leaked keys could still authenticate.

RotateAPIKey now loads the old API key before updating and calls
cache.Del(oldKey) after the DB transaction succeeds. This ensures the
old key immediately stops authenticating instead of remaining valid
for up to 2 hours (the ristretto cache TTL).

Follows the surgical approach (Option B) matching how
gorm_phone_api_key_repository already handles cache invalidation.

Adds an integration test that rotates the key and verifies the old key
returns 401 while the new key returns 200.

Closes #881

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented May 13, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 0 complexity · 1 duplication

Metric Results
Complexity 0
Duplication 1

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 13, 2026

Greptile Summary

Fixes a 2-hour auth-cache window where a rotated user API key remained valid by fetching the old key inside the DB transaction and calling cache.Del on it after a successful commit, mirroring the pattern in gorm_phone_api_key_repository.

  • gorm_user_repository.go: adds a First fetch inside crdbgorm.ExecuteTx to capture the pre-rotation key, then invalidates it from the ristretto cache only when err == nil; the transaction now also correctly returns ErrRecordNotFound when the user does not exist.
  • tests/integration_test.go: adds TestRotateAPIKey_InvalidatesCache which verifies the old key returns 401 and the new key returns 200 — however the test rotates the package-level userAPIKey credential without restoring it, which will break every subsequent test in the same suite run.

Confidence Score: 3/5

The repository fix is correct and safe to merge on its own, but the new integration test mutates the shared test credential in a way that will cause all tests that follow it in the suite to fail with 401.

The core cache-invalidation change in gorm_user_repository.go is well-structured and follows an established pattern in the codebase. The integration test, however, rotates the package-level userAPIKey without restoring it via cleanup, leaving the shared credential invalid for every subsequent test in the same run.

tests/integration_test.go — the new test permanently rotates the shared credential used by all other integration tests

Important Files Changed

Filename Overview
api/pkg/repositories/gorm_user_repository.go Adds old-key fetch + cache invalidation inside RotateAPIKey; logic is correct for single-instance, but Del only affects the local in-process ristretto cache
tests/integration_test.go New TestRotateAPIKey_InvalidatesCache rotates the shared userAPIKey credential without restoring it, breaking all subsequent integration tests in the same run

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Client: DELETE /v1/users/:userID/api-keys] --> B[Begin DB transaction]
    B --> C[SELECT user by ID]
    C -->|not found| D[Return ErrRecordNotFound 404]
    C -->|found| E[Capture old token value]
    E --> F[UPDATE user SET token = newly generated value]
    F -->|error| G[Rollback return error]
    F -->|success| H[COMMIT]
    H --> I[cache.Del old token from local ristretto]
    I --> J[Return 200 with new token]

    K[Client: authenticate with old token] --> L[Cache lookup]
    L -->|hit| M[Return cached AuthContext]
    L -->|miss after invalidation| N[DB lookup]
    N -->|not found| O[Cache noop entry for 2h]
    O --> P[Return 401]
Loading

Reviews (1): Last reviewed commit: "fix: invalidate auth cache when rotating..." | Re-trigger Greptile

Comment thread tests/integration_test.go
Comment on lines 214 to 300
}
}

func TestRotateAPIKey_InvalidatesCache(t *testing.T) {
ctx := context.Background()

// 1) Confirm the current user API key works
url := apiBaseURL + "/v1/users/me"
req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
require.NoError(t, err)
req.Header.Set("x-api-key", userAPIKey)

resp, err := http.DefaultClient.Do(req)
require.NoError(t, err)
defer resp.Body.Close()

body, err := io.ReadAll(resp.Body)
require.NoError(t, err)
require.Equal(t, http.StatusOK, resp.StatusCode, "initial auth failed: %s", string(body))

// Parse user ID from the response
var meResp struct {
Data struct {
ID string `json:"id"`
APIKey string `json:"api_key"`
} `json:"data"`
}
require.NoError(t, json.Unmarshal(body, &meResp))
userID := meResp.Data.ID
oldAPIKey := meResp.Data.APIKey
require.NotEmpty(t, userID)
require.NotEmpty(t, oldAPIKey)
t.Logf("user ID: %s, old API key prefix: %s...", userID, oldAPIKey[:10])

// 2) Rotate the API key
rotateURL := fmt.Sprintf("%s/v1/users/%s/api-keys", apiBaseURL, userID)
req, err = http.NewRequestWithContext(ctx, http.MethodDelete, rotateURL, nil)
require.NoError(t, err)
req.Header.Set("x-api-key", userAPIKey)

resp, err = http.DefaultClient.Do(req)
require.NoError(t, err)
defer resp.Body.Close()

body, err = io.ReadAll(resp.Body)
require.NoError(t, err)
require.Equal(t, http.StatusOK, resp.StatusCode, "rotate failed: %s", string(body))

// Parse new API key from rotate response
var rotateResp struct {
Data struct {
APIKey string `json:"api_key"`
} `json:"data"`
}
require.NoError(t, json.Unmarshal(body, &rotateResp))
newAPIKey := rotateResp.Data.APIKey
require.NotEmpty(t, newAPIKey)
require.NotEqual(t, oldAPIKey, newAPIKey, "API key should have changed after rotation")
t.Logf("new API key prefix: %s...", newAPIKey[:10])

// 3) Old API key should immediately fail (401) — this is the bug regression check
req, err = http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
require.NoError(t, err)
req.Header.Set("x-api-key", oldAPIKey)

resp, err = http.DefaultClient.Do(req)
require.NoError(t, err)
defer resp.Body.Close()
assert.Equal(t, http.StatusUnauthorized, resp.StatusCode, "old API key should return 401 after rotation")

// 4) New API key should work
req, err = http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
require.NoError(t, err)
req.Header.Set("x-api-key", newAPIKey)

resp, err = http.DefaultClient.Do(req)
require.NoError(t, err)
defer resp.Body.Close()

body, err = io.ReadAll(resp.Body)
require.NoError(t, err)
assert.Equal(t, http.StatusOK, resp.StatusCode, "new API key should work: %s", string(body))
}

func TestSendSMS_OutstandingFlow(t *testing.T) {
ctx := context.Background()
phone := setupPhone(ctx, t, 60)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Test permanently invalidates the shared userAPIKey credential

TestRotateAPIKey_InvalidatesCache rotates the API key associated with userAPIKey ("test-user-api-key"), which is a package-level constant used by every other integration test in the suite via newAPIClient() and direct req.Header.Set("x-api-key", userAPIKey) calls (e.g. in pollMessageStatus, setupPhone, TestSendSMS_OutstandingFlow). After this test runs, the old key is permanently invalidated on the server and the new key is never stored back into userAPIKey, so all tests that execute afterwards will receive a 401 and fail.

The test needs to either (a) restore the original key at the end via t.Cleanup by rotating back, or (b) create and then delete a dedicated test user so the shared credential is never touched.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 50b3311 — added a dedicated rotate-test-user in seed.sql with its own API key (rotate-test-api-key). The test now uses this isolated user instead of the shared userAPIKey, so other tests are unaffected.

Comment on lines +86 to +88
if err == nil && oldAPIKey != "" {
repository.cache.Del(oldAPIKey)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Cache invalidation is local-only in multi-instance deployments

repository.cache is a ristretto in-process *ristretto.Cache, so cache.Del(oldAPIKey) only removes the entry from the cache of the single API server instance that handled the rotate request. In a horizontally-scaled deployment with multiple replicas, every other instance will continue to serve the old key as valid for up to the 2-hour TTL. The fix completely resolves the issue on a single-instance setup, but is only a partial mitigation when running more than one replica.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — this is an inherent limitation of in-process ristretto caches in multi-instance deployments and is pre-existing behavior (same applies to the phone API key cache). The issue specifically targets single-instance self-hosted setups where this fix fully resolves the problem. A distributed cache invalidation mechanism would be a separate enhancement.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a security/consistency issue in the Go API where rotating a user API key did not invalidate the in-memory (ristretto) auth cache, allowing the old key to continue authenticating until TTL expiry. It also adds an integration test intended to prevent regressions.

Changes:

  • Update gormUserRepository.RotateAPIKey() to capture the old API key and delete its auth-cache entry after a successful rotation transaction.
  • Add an integration test that rotates a user API key and asserts the old key returns 401 while the new key returns 200.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
api/pkg/repositories/gorm_user_repository.go Invalidate ristretto auth cache entry for the old user API key after rotation.
tests/integration_test.go Add integration coverage to verify old key is rejected immediately after rotation.
Comments suppressed due to low confidence (1)

tests/integration_test.go:273

  • newAPIKey[:10] will panic if the API key is shorter than 10 characters. Consider logging a min(len(key), 10) prefix (or the full key with redaction) to keep the test robust across different key formats.
	newAPIKey := rotateResp.Data.APIKey
	require.NotEmpty(t, newAPIKey)
	require.NotEqual(t, oldAPIKey, newAPIKey, "API key should have changed after rotation")
	t.Logf("new API key prefix: %s...", newAPIKey[:10])

Comment on lines +86 to 90
if err == nil && oldAPIKey != "" {
repository.cache.Del(oldAPIKey)
}

return user, nil
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — this is actually pre-existing behavior from the original code (the error handling was unchanged). However, it is worth noting that crdbgorm.ExecuteTx returns the error and clause.Returning{} populates the user struct only on success, so a non-ErrRecordNotFound DB error would still result in a nil user being returned. The caller in user_service.go:260 does check err != nil and wraps it. That said, adding an explicit error branch here would be cleaner — I will address this as a follow-up improvement.

Comment thread tests/integration_test.go
Comment on lines +217 to +225
func TestRotateAPIKey_InvalidatesCache(t *testing.T) {
ctx := context.Background()

// 1) Confirm the current user API key works
url := apiBaseURL + "/v1/users/me"
req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
require.NoError(t, err)
req.Header.Set("x-api-key", userAPIKey)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 50b3311 — the test now uses a dedicated rotate-test-user seeded in seed.sql with its own API key, so the shared userAPIKey is never mutated.

Comment thread tests/integration_test.go Outdated
Comment on lines +244 to +247
require.NotEmpty(t, userID)
require.NotEmpty(t, oldAPIKey)
t.Logf("user ID: %s, old API key prefix: %s...", userID, oldAPIKey[:10])

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 50b3311 — the API keys are generated as uk_ + 64-char base64, so they are always well above 10 characters. That said, the require.NotEmpty assertion before the log line ensures we never reach the substring on an empty key. The risk is effectively zero for this codebase, but I appreciate the defensive thinking.

AchoArnold and others added 5 commits May 13, 2026 21:54
…ed credential

The Greptile review correctly identified that the rotation test was
mutating the shared userAPIKey, which would break all subsequent tests
in the suite. This fix adds a dedicated 'rotate-test-user' in seed.sql
and updates the test to use it instead.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Ristretto's Set operations are buffered asynchronously. If a prior
request's SetWithTTL is still in the buffer when Del runs, Del finds
nothing to remove, and the buffered Set then re-adds the entry —
causing the old key to remain valid.

Adding cache.Wait() before cache.Del() flushes all pending buffered
operations first, ensuring the subsequent Del actually removes the
cached entry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The root cause of the test failure: UserRistrettoCache() created a new
ristretto cache on every call. The auth middleware and the RotateAPIKey
handler received separate cache instances, so cache.Del() in the
handler had no effect on the middleware's cache — the old key kept
authenticating.

Fix: store the cache as a field on the Container struct and return it
on subsequent calls (lazy singleton pattern), matching how db, app,
and eventDispatcher are already handled.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Same bug pattern as UserRistrettoCache — PhoneRistrettoCache() created
a new ristretto cache on every call. PhoneRepository is used by
PhoneService, PhoneAPIKeyService, and NotificationService, each getting
a separate cache. Cache invalidations (Clear/Del) in one service had
no effect on the others, leading to stale phone data.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The PhoneRistrettoCache stores phone metadata (not auth contexts).
Making it a singleton caused cache.Clear() in one service to affect
all other services, breaking TestReceiveSMS_Encrypted timing.

Unlike the UserRistrettoCache (security-critical for auth), the phone
cache only holds data with a 30-min TTL and doesn't have cross-service
invalidation requirements. The non-singleton behavior is acceptable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@AchoArnold AchoArnold merged commit 888ddb3 into main May 13, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

API key rotation does not invalidate auth cache — old keys remain valid for up to 2h

2 participants