
Refresh OverDrive collection token every 30 days (PP-3937)#3204

Merged
dbernstein merged 4 commits into main from feature/PP-3937-refresh-overdrive-tokens-every-30-days
Apr 9, 2026

Conversation


@dbernstein dbernstein commented Apr 6, 2026

Description

Add a two-layer cache for OverDrive collection tokens so they are
automatically refreshed as OverDrive rotates them, without requiring a
process restart or configuration change.

Motivation and Context

OverDrive is retiring legacy collection tokens within the next 3–4 months
and moving to a model where tokens must be refreshed periodically.
collectionToken is embedded in the library account response
(GET /v1/libraries/{id}) and is required for all collection-scoped API
calls (search, product listings, availability, metadata).
Two problems existed in the prior implementation:

  1. DB cache had no TTL. get_library() and get_advantage_accounts()
    called Representation.get() with no max_age, so the library document
    (and the collectionToken within it) could persist in the
    representations table indefinitely.
  2. In-memory cache was unbounded. _collection_token was a plain
    str | None set once and never cleared. Flask workers hold one
    OverdriveAPI instance per collection for the entire process lifetime
    (CirculationManager.load_settings), so a rotated token would never be
    picked up without a full process restart or an admin config change.

Solution: two-layer cache

| Layer | Constant | TTL | Purpose |
| --- | --- | --- | --- |
| In-memory (_cached_collection_token) | COLLECTION_TOKEN_MAX_AGE | 5 minutes | Avoids a DB hit on every request |
| DB (representations table) | LIBRARY_MAX_AGE | 30 days | Avoids a network hit on every 5-minute miss |
A long-lived Flask worker re-checks the DB at most every 5 minutes, and the
DB layer re-fetches from OverDrive every 30 days; once the DB copy is
refreshed, a rotated token is picked up in memory within 5 minutes.
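A minimal sketch of the in-memory lookup path described above. `OverdriveToken` follows the token/expiry pattern named in the PR; `fetch_library` is a hypothetical stand-in for `get_library()`, which in the real code supplies the 30-day DB layer via `Representation.get(max_age=LIBRARY_MAX_AGE)`:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

COLLECTION_TOKEN_MAX_AGE = timedelta(minutes=5)  # in-memory TTL


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class OverdriveToken:
    token: str
    expires: datetime


class CollectionTokenCache:
    """Illustrative stand-in for the collection_token property."""

    def __init__(self, fetch_library: Callable[[], dict]):
        # fetch_library() stands in for get_library(); the real method
        # adds the 30-day DB layer via Representation.get().
        self._fetch_library = fetch_library
        self._cached: Optional[OverdriveToken] = None

    @property
    def collection_token(self) -> str:
        cached = self._cached
        if cached is not None and cached.expires > utc_now():
            return cached.token  # fresh: no DB or network I/O
        library = self._fetch_library()  # DB layer handles HTTP staleness
        token = library["collectionToken"]
        self._cached = OverdriveToken(
            token=token, expires=utc_now() + COLLECTION_TOKEN_MAX_AGE
        )
        return token
```

A Flask worker holding one such object per collection hits the DB at most once per five-minute window; treat this as a shape sketch only, since the real TTL arithmetic lives in the PR's diff.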

How Has This Been Tested?

  • test_collection_token — verifies the in-memory cache is populated on
    first access and reused on subsequent calls within COLLECTION_TOKEN_MAX_AGE.
  • test_collection_token_cache_expires — verifies that aging the
    in-memory cache past COLLECTION_TOKEN_MAX_AGE causes get_library to
    be called again and returns the new token.
  • test_collection_token_error — verifies that an errorCode in the
    library response raises CannotLoadConfiguration.
  • test_get_library_passes_max_age — verifies LIBRARY_MAX_AGE is
    forwarded to Representation.get() in get_library().
  • test_get_library_refreshes_stale_token — end-to-end DB staleness test:
    ages the Representation record past LIBRARY_MAX_AGE and verifies the
    next call re-fetches from the network.
  • test_get_advantage_accounts_passes_max_age — verifies LIBRARY_MAX_AGE
    is also forwarded in the advantage accounts Representation.get() call.
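The expiry tests above follow a common pattern: inject a controllable clock, age the cache past its TTL, and assert that the next access re-fetches. A self-contained sketch of that pattern (all names here are illustrative, not the project's actual test helpers):

```python
from datetime import datetime, timedelta, timezone

COLLECTION_TOKEN_MAX_AGE = timedelta(minutes=5)


class TokenCache:
    """Minimal TTL cache used only to demonstrate the test pattern."""

    def __init__(self, fetch, clock):
        self._fetch = fetch
        self._clock = clock
        self._token = None
        self._expires = None

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires:
            self._token = self._fetch()
            self._expires = now + COLLECTION_TOKEN_MAX_AGE
        return self._token


def test_cache_expires():
    now = [datetime(2026, 4, 6, tzinfo=timezone.utc)]
    fetched = []

    def fetch():
        fetched.append(1)
        return f"token-{len(fetched)}"

    cache = TokenCache(fetch, lambda: now[0])
    assert cache.get() == "token-1"

    # Within the TTL: the cached value is reused, no second fetch.
    now[0] += timedelta(minutes=4)
    assert cache.get() == "token-1"

    # Past the TTL: the next access re-fetches and sees the new token.
    now[0] += timedelta(minutes=2)
    assert cache.get() == "token-2"
    assert len(fetched) == 2
```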

Checklist

  • I have updated the documentation accordingly.
  • All new and existing tests passed.

@dbernstein dbernstein marked this pull request as ready for review April 6, 2026 22:03
@dbernstein dbernstein marked this pull request as draft April 6, 2026 22:05

codecov Bot commented Apr 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.30%. Comparing base (0312188) to head (96b09f4).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3204   +/-   ##
=======================================
  Coverage   93.30%   93.30%           
=======================================
  Files         497      497           
  Lines       46163    46166    +3     
  Branches     6326     6326           
=======================================
+ Hits        43074    43077    +3     
+ Misses       2004     2003    -1     
- Partials     1085     1086    +1     


@dbernstein dbernstein force-pushed the feature/PP-3937-refresh-overdrive-tokens-every-30-days branch from cf5fba4 to 89c45e9 on April 6, 2026 22:13
@dbernstein dbernstein marked this pull request as ready for review April 6, 2026 23:14
@dbernstein dbernstein requested a review from a team April 6, 2026 23:41
Member

@jonathangreen jonathangreen left a comment


I approved this @dbernstein, but I think there are some interactions between the two layers of caching here that are not ideal. I added some comments with my thoughts for consideration.

I don't think we really need two layers of caching here. It makes it kind of hard to reason about what is going on. I think we might be better off just dropping the in-memory cache and caching via Representation.get.

Comment on lines +424 to +427
```python
self._cached_collection_token = OverdriveToken(
    token=token,
    expires=utc_now() + self.LIBRARY_MAX_AGE,
)
```
Because of the two-layer caching here, if self.get_library returns a cached representation near the end of its TTL, we will cache it here for an additional LIBRARY_MAX_AGE. So a token could be used for 2 * LIBRARY_MAX_AGE.
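The concern can be checked with simple date arithmetic (assuming, as in the revision under review, that both layers used LIBRARY_MAX_AGE = 30 days):

```python
from datetime import datetime, timedelta, timezone

LIBRARY_MAX_AGE = timedelta(days=30)

fetched_from_network = datetime(2026, 1, 1, tzinfo=timezone.utc)
db_copy_expires = fetched_from_network + LIBRARY_MAX_AGE

# Worst case: the in-memory layer reads the DB copy just before it goes
# stale, then caches the same token for another full LIBRARY_MAX_AGE.
read_from_db = db_copy_expires - timedelta(minutes=1)
memory_copy_expires = read_from_db + LIBRARY_MAX_AGE

total_lifetime = memory_copy_expires - fetched_from_network
assert total_lifetime > timedelta(days=59)  # nearly 2 * LIBRARY_MAX_AGE
```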

Comment on lines 571 to 577
```python
representation, cached = Representation.get(
    self._db,
    url,
    self.get,
    exception_handler=Representation.reraise_exception,
    max_age=self.LIBRARY_MAX_AGE,
)
```

Representation.get returns the old document on error, even if max_age has passed. So it's possible that a failure will make the old document stick around. Combined with the second layer of caching, I think this means old tokens can hang around for some time.

@dbernstein

@jonathangreen: thanks for the review. I'll drop the in-memory caching and we can always bring it (or something similar) back if necessary.

@dbernstein

dbernstein commented Apr 9, 2026

@jonathangreen: in the current state of the app (i.e. what is currently deployed in production), the OverDrive API client caches the token once it has been read and doesn't refresh it for the life of the OverdriveAPI object. For Celery this is not a problem, since OverdriveAPI instances are short-lived. For the Flask-layer calls, the OverdriveAPI hangs around for the entire life of the process, which is only restarted if a uwsgi process runs into trouble. So if we didn't restart every two weeks, we could hit a snag. That was the motivation for the second-level cache: basically, keep the existing caching functionality but also try to guarantee we'll recycle the token every X days.

I just want to make sure you're clear that by removing the in-memory layer, we would be losing an existing performance optimization for both the Flask calls (not a big deal, since there are far fewer of them) and the Celery import calls (multiple API calls within a single Celery task). I agree with your assessment about this being hard to reason about and the possible document failure you identified. Just want to make sure we're aligned.

Another option is to have the in-memory cache work on a different frequency than the DB-level cache: just enough time for a Celery task to enjoy the caching of the token (from the database), say a couple of minutes.

@jonathangreen

Thanks @dbernstein, we're aligned. I understand the tradeoff: without the in-memory layer, each collection_token access during a Celery import will do a DB query via Representation.get().

I'm fine with dropping the in-memory cache to keep things simple, but I'd also be okay with keeping it at a shorter TTL as you suggested. My concern wasn't the in-memory cache itself, just the long lifetime causing the unexpected interactions I flagged.

OverDrive is retiring legacy collection tokens and will require periodic
refresh. Previously, get_library() and get_advantage_accounts() passed no
max_age to Representation.get(), so cached library documents (and the
collectionTokens within them) could persist in the representations table
indefinitely.

Add LIBRARY_MAX_AGE = timedelta(days=30) as a typed class constant and
pass it to both Representation.get() calls so tokens are transparently
re-fetched after 30 days — covering both main and Advantage child accounts.

Add three tests: verify max_age is forwarded in get_library(), verify
end-to-end staleness re-fetch behavior using the real DB, and verify
max_age is forwarded in get_advantage_accounts().
The previous implementation cached _collection_token as a plain string
for the lifetime of the OverdriveAPI instance. In the Flask path,
CirculationManager.load_settings() keeps one OverdriveAPI per collection
alive for the entire process lifetime, so the cached token was never
refreshed — making the LIBRARY_MAX_AGE fix on Representation.get()
ineffective for web requests.

Replace the unbounded in-memory cache with a TTL-based cache using the
existing OverdriveToken type (same pattern as _cached_client_oauth_token).
The token is now stored alongside an expiry timestamp set to
utc_now() + LIBRARY_MAX_AGE. On each access the TTL is checked first
(zero DB/network I/O if fresh); once expired, get_library() is called
again and the Representation layer handles the actual HTTP staleness check.
Also update docstrings on collection_token, get_advantage_accounts, and
the _cached_collection_token comment to explain the two-layer cache design.
Add tests for cache hit, TTL expiry, and errorCode handling.

The prior commit set the in-memory _cached_collection_token TTL to
LIBRARY_MAX_AGE (30 days), which was effectively as unbounded as the
original plain string cache.

Add a separate COLLECTION_TOKEN_MAX_AGE = timedelta(minutes=5) constant
and use it for the in-memory expiry. The DB-layer TTL (LIBRARY_MAX_AGE,
30 days) is unchanged. Long-lived Flask workers now re-check the DB every
5 minutes and pick up a rotated token within that window, while avoiding
a DB hit on every individual request.

Also align MockOverdriveAPI and test_script.py to use timedelta(hours=1)
for the fake token expiry (consistent with the client OAuth token mock),
and update test comments to reference COLLECTION_TOKEN_MAX_AGE.
@dbernstein dbernstein force-pushed the feature/PP-3937-refresh-overdrive-tokens-every-30-days branch from b7a099d to 96b09f4 on April 9, 2026 22:52
@dbernstein dbernstein enabled auto-merge (squash) April 9, 2026 22:53
@dbernstein dbernstein added the feature New feature label Apr 9, 2026
@dbernstein dbernstein merged commit d99f37e into main Apr 9, 2026
20 checks passed
@dbernstein dbernstein deleted the feature/PP-3937-refresh-overdrive-tokens-every-30-days branch April 9, 2026 23:03