Refresh OverDrive collection token every 30 days (PP-3937)#3204
Refresh OverDrive collection token every 30 days (PP-3937)#3204dbernstein merged 4 commits intomainfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3204 +/- ##
=======================================
Coverage 93.30% 93.30%
=======================================
Files 497 497
Lines 46163 46166 +3
Branches 6326 6326
=======================================
+ Hits 43074 43077 +3
+ Misses 2004 2003 -1
- Partials 1085 1086 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
cf5fba4 to
89c45e9
Compare
jonathangreen
left a comment
There was a problem hiding this comment.
I approved this @dbernstein, but I think there is some interaction between the two layers of caching here that are not ideal. I added some comments with my thoughts for consideration.
I don't think we really need two layers of caching here. It makes it kind of hard to reason about what is going on. I think we might be better off just dropping the in-memory cache and caching via Representation.get.
| self._cached_collection_token = OverdriveToken( | ||
| token=token, | ||
| expires=utc_now() + self.LIBRARY_MAX_AGE, | ||
| ) |
There was a problem hiding this comment.
Because of the two layer caching here, if the self.get_library returns a cached representation near the end of its TTL, then we will cache it here for an additional LIBRARY_MAX_AGE. So a token could be used for 2 * LIBRARY_MAX_AGE.
| representation, cached = Representation.get( | ||
| self._db, | ||
| url, | ||
| self.get, | ||
| exception_handler=Representation.reraise_exception, | ||
| max_age=self.LIBRARY_MAX_AGE, | ||
| ) |
There was a problem hiding this comment.
Representation.get returns the old document on error, even if max_age has passed. So its possible that a failure will make the old document stick around. Combined with the second layer of caching, I think this means old tokens can hang around for some time.
|
@jonathangreen : thanks for the review. I'll drop the in-memory cacheing and we can always bring it (or something similar) back if necessary. |
|
@jonathangreen : in the current state of the app (ie what is currently deployed in production), calls to the OverDrive API is caching the token once it has been read and it doesn't refresh it for the life of the OverdriveAPI object. For celery this is not a problem since overdrive api instances are shortlived. For the flask layer calls, the OD API hangs around for the entire life of the process which is only restarted if a uwsgi process runs into trouble. So if we didn't restart every two weeks, we could hit a snag. That was the motivation for the second level cache: basically keep the existing caching functionality but also try to guarantee we'll recycle the token every X days. I just want to make sure you're clear that by removing the second layer, we would be losing an existing performance optimization for both the flask calls (not a big deal since there are so many fewer) and the celery import calls (multiple API calls within a single celery task). I agree with your assessment about being hard to reason about and the possible document failure you identified. Just want to make sure we're aligned. Another option is to have the second level cache work on a different frequency than the first level cache - ie just enough time for the celery task to enjoy the caching of the token (from the database) - ie couple of minutes. |
|
Thanks @dbernstein , we're aligned. I understand the tradeoff, without the in-memory layer, each I'm fine with dropping the in-memory cache to keep things simple, but I'd also be okay with keeping it at a shorter TTL as you suggested. My concern wasn't the in-memory cache itself, just the long lifetime causing the unexpected interactions I flagged. |
OverDrive is retiring legacy collection tokens and will require periodic refresh. Previously, get_library() and get_advantage_accounts() passed no max_age to Representation.get(), so cached library documents (and the collectionTokens within them) could persist in the representations table indefinitely. Add LIBRARY_MAX_AGE = timedelta(days=30) as a typed class constant and pass it to both Representation.get() calls so tokens are transparently re-fetched after 30 days — covering both main and Advantage child accounts. Add three tests: verify max_age is forwarded in get_library(), verify end-to-end staleness re-fetch behavior using the real DB, and verify max_age is forwarded in get_advantage_accounts().
The previous implementation cached _collection_token as a plain string for the lifetime of the OverdriveAPI instance. In the Flask path, CirculationManager.load_settings() keeps one OverdriveAPI per collection alive for the entire process lifetime, so the cached token was never refreshed — making the LIBRARY_MAX_AGE fix on Representation.get() ineffective for web requests. Replace the unbounded in-memory cache with a TTL-based cache using the existing OverdriveToken type (same pattern as _cached_client_oauth_token). The token is now stored alongside an expiry timestamp set to utc_now() + LIBRARY_MAX_AGE. On each access the TTL is checked first (zero DB/network I/O if fresh); once expired, get_library() is called again and the Representation layer handles the actual HTTP staleness check. Also update docstrings on collection_token, get_advantage_accounts, and the _cached_collection_token comment to explain the two-layer cache design. Add tests for cache hit, TTL expiry, and errorCode handling.
) The prior commit set the in-memory _cached_collection_token TTL to LIBRARY_MAX_AGE (30 days), which was effectively as unbounded as the original plain string cache. Add a separate COLLECTION_TOKEN_MAX_AGE = timedelta(minutes=5) constant and use it for the in-memory expiry. The DB-layer TTL (LIBRARY_MAX_AGE, 30 days) is unchanged. Long-lived Flask workers now re-check the DB every 5 minutes and pick up a rotated token within that window, while avoiding a DB hit on every individual request. Also align MockOverdriveAPI and test_script.py to use timedelta(hours=1) for the fake token expiry (consistent with the client OAuth token mock), and update test comments to reference COLLECTION_TOKEN_MAX_AGE.
b7a099d to
96b09f4
Compare
Description
Add a two-layer cache for OverDrive collection tokens so they are
automatically refreshed as OverDrive rotates them, without requiring a
process restart or configuration change.
Motivation and Context
OverDrive is retiring legacy collection tokens within the next 3–4 months
and moving to a model where tokens must be refreshed periodically.
collectionTokenis embedded in the library account response(
GET /v1/libraries/{id}) and is required for all collection-scoped APIcalls (search, product listings, availability, metadata).
Two problems existed in the prior implementation:
get_library()andget_advantage_accounts()called
Representation.get()with nomax_age, so the library document(and the
collectionTokenwithin it) could persist in therepresentationstable indefinitely._collection_tokenwas a plainstr | Noneset once and never cleared. Flask workers hold oneOverdriveAPIinstance per collection for the entire process lifetime(
CirculationManager.load_settings), so a rotated token would never bepicked up without a full process restart or an admin config change.
Solution: two-layer cache
_cached_collection_token)COLLECTION_TOKEN_MAX_AGErepresentationstable)LIBRARY_MAX_AGEHow Has This Been Tested?
test_collection_token— verifies the in-memory cache is populated onfirst access and reused on subsequent calls within
COLLECTION_TOKEN_MAX_AGE.test_collection_token_cache_expires— verifies that aging thein-memory cache past
COLLECTION_TOKEN_MAX_AGEcausesget_librarytobe called again and returns the new token.
test_collection_token_error— verifies that anerrorCodein thelibrary response raises
CannotLoadConfiguration.test_get_library_passes_max_age— verifiesLIBRARY_MAX_AGEisforwarded to
Representation.get()inget_library().test_get_library_refreshes_stale_token— end-to-end DB staleness test:ages the
Representationrecord pastLIBRARY_MAX_AGEand verifies thenext call re-fetches from the network.
test_get_advantage_accounts_passes_max_age— verifiesLIBRARY_MAX_AGEis also forwarded in the advantage accounts
Representation.get()call.Checklist