fix(oauth2): silence lock acquisition errors on token refresh #39463

betodealmeida wants to merge 1 commit into apache:master
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## master #39463 +/- ##
=======================================
Coverage 64.49% 64.49%
=======================================
Files 2557 2557
Lines 133191 133191
Branches 30935 30935
=======================================
Hits 85897 85897
Misses 45804 45804
Partials 1490 1490
```python
factor=10,
base=2,
max_tries=5,
raise_on_giveup=False,
```
Suggestion: Setting `raise_on_giveup=False` swallows all `AcquireDistributedLockFailedException` cases after retries, including Redis/backend lock failures (not just benign lock contention). That makes real infrastructure errors look like "no token" and can incorrectly trigger fallback auth flows instead of surfacing an operational failure. [logic error]

Severity Level: Major ⚠️

- ❌ OAuth2 token refresh masks Redis/distributed-lock outages.
- ⚠️ Users see unnecessary OAuth2 re-auth flows.
- ⚠️ Operational lock failures no longer surface as errors.

Suggested change:

```diff
- raise_on_giveup=False,
+ raise_on_giveup=True,
```
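For context, the behavior under debate can be sketched with a simplified stand-in for `backoff.on_exception` (the real library's giveup semantics reduced to the two outcomes of `raise_on_giveup`; the decorator and function names below are illustrative, not Superset's actual code):

```python
import functools


class AcquireDistributedLockFailedException(Exception):
    """Stand-in for Superset's distributed-lock exception."""


def on_exception(exc_type, max_tries, raise_on_giveup=True):
    """Minimal sketch of backoff.on_exception's giveup handling:
    retry up to max_tries, then either re-raise or return None."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_tries):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == max_tries - 1:  # retries exhausted
                        if raise_on_giveup:
                            raise
                        return None  # swallowed: caller sees "no token"
        return wrapper
    return decorator


@on_exception(AcquireDistributedLockFailedException, max_tries=3, raise_on_giveup=False)
def get_token():
    # Simulates a lock backend that is down: every attempt fails.
    raise AcquireDistributedLockFailedException()


print(get_token())  # None
```

This is the reviewer's point: with `raise_on_giveup=False`, a hard backend outage produces the same `None` as benign lock contention, so the caller cannot tell them apart.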
Steps of Reproduction ✅
1. Configure an OAuth2-enabled database (e.g. Google Sheets) and ensure it has an expired
access token with a non-empty refresh token stored in `DatabaseUserOAuth2Tokens` so that
`get_oauth2_access_token()` takes the refresh path in `superset/utils/oauth2.py:88-124`.
2. Cause the distributed lock backend to fail (e.g. Redis unavailable), which per
`superset/distributed_lock/__init__.py:10-11` and
`tests/unit_tests/distributed_lock/distributed_lock_tests.py:22-32` makes
`DistributedLock(...)` raise `AcquireDistributedLockFailedException` both when the lock is
already held and when the Redis connection fails.
3. Trigger a code path that refreshes the OAuth2 token, such as requesting table names for
a GSheets database so that `GSheetsEngineSpec.get_table_names()` in
`superset/db_engine_specs/gsheets.py:271-283` calls `get_oauth2_access_token(...)`, which
then calls `refresh_oauth2_token()` and enters the `DistributedLock` context at
`superset/utils/oauth2.py:127-141`, causing `AcquireDistributedLockFailedException` due to
the failing lock backend.
4. Observe that because of the backoff decorator configuration at
`superset/utils/oauth2.py:79-87` with `raise_on_giveup=False`, after exhausting retries
the `AcquireDistributedLockFailedException` is swallowed and `get_oauth2_access_token()`
returns `None`; callers like `GSheetsEngineSpec.get_table_names()` (`gsheets.py:276-282`)
treat this as "no token" and invoke `database.start_oauth2_dance()` instead of surfacing
   an operational failure of the distributed lock backend.
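The caller-side effect described in step 4 can be sketched as a toy model (not Superset's actual code; the helper names and the "OAuth2 dance" string are simplified stand-ins):

```python
def get_oauth2_access_token(lock_backend_up: bool):
    # Toy model of the swallowed-exception path: with raise_on_giveup=False,
    # a lock-backend outage surfaces as None, the same value returned when
    # no token is stored at all.
    if not lock_backend_up:
        return None
    return "access-token"


def get_table_names(lock_backend_up: bool) -> str:
    token = get_oauth2_access_token(lock_backend_up)
    if token is None:
        # The caller cannot distinguish "no token" from "Redis down",
        # so it starts a re-auth flow either way.
        return "started OAuth2 dance"
    return f"listed tables with {token}"


print(get_table_names(lock_backend_up=False))  # started OAuth2 dance
```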
Code Review Agent Run #63dbd9

Actionable Suggestions - 1

- tests/unit_tests/utils/oauth2_tests.py - 1
  - Datetime created without timezone info · Line 363

Review Details

- Files reviewed - 2 · Commit Range: 36355c1..36355c1
  - superset/utils/oauth2.py
  - tests/unit_tests/utils/oauth2_tests.py
- Files skipped - 0
- Tools
  - Whispers (Secret Scanner) - ✔︎ Successful
  - Detect-secrets (Secret Scanner) - ✔︎ Successful
  - MyPy (Static Code Analysis) - ✔︎ Successful
  - Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide

Commands: type the following command in a pull request comment and save the comment.

- /review - Manually triggers a full AI review.
- /pause - Pauses automatic reviews on this pull request.
- /resume - Resumes automatic reviews.
- /resolve - Marks all Bito-posted review comments as resolved.
- /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands. You can customize the agent settings or contact your Bito workspace admin at evan@preset.io.
```python
db_engine_spec = mocker.MagicMock()
token = mocker.MagicMock()
token.access_token = "access-token"  # noqa: S105
token.access_token_expiration = datetime(2024, 1, 1)
```
Add tzinfo argument to datetime() call. Use datetime(2024, 1, 1, tzinfo=timezone.utc) or similar to specify timezone.
Code suggestion
Check the AI-generated fix before applying
```diff
-from datetime import datetime
+from datetime import datetime, timezone
 from typing import cast
@@ -360,7 +361,7 @@
 db = mocker.patch("superset.utils.oauth2.db")
 db_engine_spec = mocker.MagicMock()
 token = mocker.MagicMock()
 token.access_token = "access-token"  # noqa: S105
-token.access_token_expiration = datetime(2024, 1, 1)
+token.access_token_expiration = datetime(2024, 1, 1, tzinfo=timezone.utc)
```
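The concern behind this suggestion is the classic naive-vs-aware pitfall: comparing a naive `datetime` with a timezone-aware one raises `TypeError` at runtime, so test fixtures should pin a timezone. A minimal illustration:

```python
from datetime import datetime, timezone

naive = datetime(2024, 1, 1)                       # no tzinfo attached
aware = datetime(2024, 1, 1, tzinfo=timezone.utc)  # timezone-aware

# Mixing the two in a comparison fails at runtime:
try:
    naive < aware
except TypeError as exc:
    print(exc)  # can't compare offset-naive and offset-aware datetimes

# Two aware datetimes compare fine, which is why the suggested fix
# pins tzinfo=timezone.utc in the fixture.
assert aware < datetime(2024, 6, 1, tzinfo=timezone.utc)
```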
Code Review Run #63dbd9
When a dashboard has multiple charts sharing an OAuth2-enabled DB with an expired token, simultaneous requests race for the distributed lock. Only the first acquires it; the rest should silently return `None`. Added `raise_on_giveup=False` and `giveup_log_level=logging.DEBUG` to the `@backoff.on_exception` decorator on `get_oauth2_access_token` so that failed lock acquisition is no longer logged at ERROR level and does not propagate as an exception. Also adds a unit test covering this scenario.
36355c1 to 425ad47 · Compare
Code Review Agent Run #acfa63 · Actionable Suggestions - 0
Summary

When a dashboard contains multiple charts from the same OAuth2-enabled DB with an expired token, simultaneous requests all try to refresh the token at once. Only the first process acquires a distributed lock; the rest raise `AcquireDistributedLockFailedException`. This is expected behavior: the first request does the refresh, and subsequent ones should just retry and find a valid token. However, `backoff` was logging the exception at ERROR level and re-raising it after all retries, which was both noisy and incorrect. This PR adds:

- `raise_on_giveup=False` to silently return `None` instead of re-raising when all retries are exhausted
- `giveup_log_level=logging.DEBUG` to downgrade the giveup log from ERROR to DEBUG
- `test_get_oauth2_access_token_lock_not_acquired_no_error_log` covering this scenario

Test plan

```
pytest tests/unit_tests/utils/oauth2_tests.py::test_get_oauth2_access_token_lock_not_acquired_no_error_log -xvs
```

🤖 Generated with Claude Code
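The new test's core assertion (no ERROR-level log on benign lock contention) can be approximated with plain `logging`, without pytest. Everything here is a simplified stand-in for the actual Superset test, including the logger name and giveup message:

```python
import logging

records = []


class ListHandler(logging.Handler):
    """Collects records so we can assert on levels, like pytest's caplog."""
    def emit(self, record):
        records.append(record)


logger = logging.getLogger("backoff_demo")
logger.setLevel(logging.DEBUG)
logger.addHandler(ListHandler())


def give_up(giveup_log_level):
    # Stand-in for backoff's giveup logging: the PR downgrades this
    # from ERROR to DEBUG via giveup_log_level=logging.DEBUG.
    logger.log(giveup_log_level, "Giving up refresh_oauth2_token(...)")


give_up(logging.DEBUG)

error_records = [r for r in records if r.levelno >= logging.ERROR]
assert not error_records  # no noisy ERROR log for benign lock contention
```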