
Refresh OverDrive collection token every 30 days (PP-3937)#3204

Merged
dbernstein merged 4 commits into main from feature/PP-3937-refresh-overdrive-tokens-every-30-days
Apr 9, 2026

Conversation


@dbernstein dbernstein commented Apr 6, 2026

Description

Add a two-layer cache for OverDrive collection tokens so they are
automatically refreshed as OverDrive rotates them, without requiring a
process restart or configuration change.

Motivation and Context

OverDrive is retiring legacy collection tokens within the next 3–4 months
and moving to a model where tokens must be refreshed periodically.
collectionToken is embedded in the library account response
(GET /v1/libraries/{id}) and is required for all collection-scoped API
calls (search, product listings, availability, metadata).
Two problems existed in the prior implementation:

  1. DB cache had no TTL. get_library() and get_advantage_accounts()
    called Representation.get() with no max_age, so the library document
    (and the collectionToken within it) could persist in the
    representations table indefinitely.
  2. In-memory cache was unbounded. _collection_token was a plain
    str | None set once and never cleared. Flask workers hold one
    OverdriveAPI instance per collection for the entire process lifetime
    (CirculationManager.load_settings), so a rotated token would never be
    picked up without a full process restart or an admin config change.

Solution: two-layer cache

| Layer | Constant | TTL | Purpose |
| --- | --- | --- | --- |
| In-memory (_cached_collection_token) | COLLECTION_TOKEN_MAX_AGE | 5 minutes | Avoids a DB hit on every request |
| DB (representations table) | LIBRARY_MAX_AGE | 30 days | Avoids a network hit on every 5-minute miss |
A long-lived Flask worker re-checks the DB at most every 5 minutes, and the
DB layer re-fetches from OverDrive every 30 days; once the DB copy is
refreshed, a rotated token is picked up in memory within 5 minutes.
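A minimal sketch of the in-memory lookup path described above. `OverdriveToken` follows the token/expiry pattern named in the PR; `fetch_library` is a hypothetical stand-in for `get_library()`, which in the real code supplies the 30-day DB layer via `Representation.get(max_age=LIBRARY_MAX_AGE)`:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

COLLECTION_TOKEN_MAX_AGE = timedelta(minutes=5)  # in-memory TTL


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class OverdriveToken:
    token: str
    expires: datetime


class CollectionTokenCache:
    """Illustrative stand-in for the collection_token property."""

    def __init__(self, fetch_library: Callable[[], dict]):
        # fetch_library() stands in for get_library(); the real method
        # adds the 30-day DB layer via Representation.get().
        self._fetch_library = fetch_library
        self._cached: Optional[OverdriveToken] = None

    @property
    def collection_token(self) -> str:
        cached = self._cached
        if cached is not None and cached.expires > utc_now():
            return cached.token  # fresh: no DB or network I/O
        library = self._fetch_library()  # DB layer handles HTTP staleness
        token = library["collectionToken"]
        self._cached = OverdriveToken(
            token=token, expires=utc_now() + COLLECTION_TOKEN_MAX_AGE
        )
        return token
```

A Flask worker holding one such object per collection hits the DB at most once per five-minute window; treat this as a shape sketch only, since the real TTL arithmetic lives in the PR's diff.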

How Has This Been Tested?

  • test_collection_token — verifies the in-memory cache is populated on
    first access and reused on subsequent calls within COLLECTION_TOKEN_MAX_AGE.
  • test_collection_token_cache_expires — verifies that aging the
    in-memory cache past COLLECTION_TOKEN_MAX_AGE causes get_library to
    be called again and returns the new token.
  • test_collection_token_error — verifies that an errorCode in the
    library response raises CannotLoadConfiguration.
  • test_get_library_passes_max_age — verifies LIBRARY_MAX_AGE is
    forwarded to Representation.get() in get_library().
  • test_get_library_refreshes_stale_token — end-to-end DB staleness test:
    ages the Representation record past LIBRARY_MAX_AGE and verifies the
    next call re-fetches from the network.
  • test_get_advantage_accounts_passes_max_age — verifies LIBRARY_MAX_AGE
    is also forwarded in the advantage accounts Representation.get() call.
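The expiry tests above follow a common pattern: inject a controllable clock, age the cache past its TTL, and assert that the next access re-fetches. A self-contained sketch of that pattern (all names here are illustrative, not the project's actual test helpers):

```python
from datetime import datetime, timedelta, timezone

COLLECTION_TOKEN_MAX_AGE = timedelta(minutes=5)


class TokenCache:
    """Minimal TTL cache used only to demonstrate the test pattern."""

    def __init__(self, fetch, clock):
        self._fetch = fetch
        self._clock = clock
        self._token = None
        self._expires = None

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires:
            self._token = self._fetch()
            self._expires = now + COLLECTION_TOKEN_MAX_AGE
        return self._token


def test_cache_expires():
    now = [datetime(2026, 4, 6, tzinfo=timezone.utc)]
    fetched = []

    def fetch():
        fetched.append(1)
        return f"token-{len(fetched)}"

    cache = TokenCache(fetch, lambda: now[0])
    assert cache.get() == "token-1"

    # Within the TTL: the cached value is reused, no second fetch.
    now[0] += timedelta(minutes=4)
    assert cache.get() == "token-1"

    # Past the TTL: the next access re-fetches and sees the new token.
    now[0] += timedelta(minutes=2)
    assert cache.get() == "token-2"
    assert len(fetched) == 2
```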

Checklist

  • I have updated the documentation accordingly.
  • All new and existing tests passed.

@dbernstein dbernstein marked this pull request as ready for review April 6, 2026 22:03
@dbernstein dbernstein marked this pull request as draft April 6, 2026 22:05

codecov Bot commented Apr 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.30%. Comparing base (0312188) to head (96b09f4).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3204   +/-   ##
=======================================
  Coverage   93.30%   93.30%           
=======================================
  Files         497      497           
  Lines       46163    46166    +3     
  Branches     6326     6326           
=======================================
+ Hits        43074    43077    +3     
+ Misses       2004     2003    -1     
- Partials     1085     1086    +1     


@dbernstein dbernstein force-pushed the feature/PP-3937-refresh-overdrive-tokens-every-30-days branch from cf5fba4 to 89c45e9 on April 6, 2026 22:13
@dbernstein dbernstein marked this pull request as ready for review April 6, 2026 23:14
@dbernstein dbernstein requested a review from a team April 6, 2026 23:41
Member

@jonathangreen jonathangreen left a comment


I approved this @dbernstein, but I think there are some interactions between the two layers of caching here that are not ideal. I added some comments with my thoughts for consideration.

I don't think we really need two layers of caching here. It makes it kind of hard to reason about what is going on. I think we might be better off just dropping the in-memory cache and caching via Representation.get.

Comment on lines +424 to +427
```python
self._cached_collection_token = OverdriveToken(
    token=token,
    expires=utc_now() + self.LIBRARY_MAX_AGE,
)
```
Because of the two-layer caching here, if self.get_library returns a cached representation near the end of its TTL, we will cache it here for an additional LIBRARY_MAX_AGE. So a token could be used for 2 * LIBRARY_MAX_AGE.
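The concern can be checked with simple date arithmetic (assuming, as in the revision under review, that both layers used LIBRARY_MAX_AGE = 30 days):

```python
from datetime import datetime, timedelta, timezone

LIBRARY_MAX_AGE = timedelta(days=30)

fetched_from_network = datetime(2026, 1, 1, tzinfo=timezone.utc)
db_copy_expires = fetched_from_network + LIBRARY_MAX_AGE

# Worst case: the in-memory layer reads the DB copy just before it goes
# stale, then caches the same token for another full LIBRARY_MAX_AGE.
read_from_db = db_copy_expires - timedelta(minutes=1)
memory_copy_expires = read_from_db + LIBRARY_MAX_AGE

total_lifetime = memory_copy_expires - fetched_from_network
assert total_lifetime > timedelta(days=59)  # nearly 2 * LIBRARY_MAX_AGE
```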

Comment on lines 571 to 577
```python
representation, cached = Representation.get(
    self._db,
    url,
    self.get,
    exception_handler=Representation.reraise_exception,
    max_age=self.LIBRARY_MAX_AGE,
)
```

Representation.get returns the old document on error, even if max_age has passed. So it's possible that a failure will make the old document stick around. Combined with the second layer of caching, I think this means old tokens can hang around for some time.

@dbernstein

@jonathangreen: thanks for the review. I'll drop the in-memory caching and we can always bring it (or something similar) back if necessary.

@dbernstein

dbernstein commented Apr 9, 2026

@jonathangreen: in the current state of the app (i.e. what is currently deployed in production), the OverDrive API client caches the token once it has been read and doesn't refresh it for the life of the OverdriveAPI object. For Celery this is not a problem, since OverdriveAPI instances are short-lived. For the Flask-layer calls, the OverdriveAPI hangs around for the entire life of the process, which is only restarted if a uwsgi process runs into trouble. So if we didn't restart every two weeks, we could hit a snag. That was the motivation for the second-level cache: basically, keep the existing caching functionality but also try to guarantee we'll recycle the token every X days.

I just want to make sure you're clear that by removing the in-memory layer, we would be losing an existing performance optimization for both the Flask calls (not a big deal, since there are far fewer of them) and the Celery import calls (multiple API calls within a single Celery task). I agree with your assessment about this being hard to reason about and the possible document failure you identified. Just want to make sure we're aligned.

Another option is to have the in-memory cache work on a different frequency than the DB-level cache: just enough time for a Celery task to enjoy the caching of the token (from the database), say a couple of minutes.

@jonathangreen

Thanks @dbernstein, we're aligned. I understand the tradeoff: without the in-memory layer, each collection_token access during a Celery import will do a DB query via Representation.get().

I'm fine with dropping the in-memory cache to keep things simple, but I'd also be okay with keeping it at a shorter TTL as you suggested. My concern wasn't the in-memory cache itself, just the long lifetime causing the unexpected interactions I flagged.

OverDrive is retiring legacy collection tokens and will require periodic
refresh. Previously, get_library() and get_advantage_accounts() passed no
max_age to Representation.get(), so cached library documents (and the
collectionTokens within them) could persist in the representations table
indefinitely.

Add LIBRARY_MAX_AGE = timedelta(days=30) as a typed class constant and
pass it to both Representation.get() calls so tokens are transparently
re-fetched after 30 days — covering both main and Advantage child accounts.

Add three tests: verify max_age is forwarded in get_library(), verify
end-to-end staleness re-fetch behavior using the real DB, and verify
max_age is forwarded in get_advantage_accounts().
The previous implementation cached _collection_token as a plain string
for the lifetime of the OverdriveAPI instance. In the Flask path,
CirculationManager.load_settings() keeps one OverdriveAPI per collection
alive for the entire process lifetime, so the cached token was never
refreshed — making the LIBRARY_MAX_AGE fix on Representation.get()
ineffective for web requests.

Replace the unbounded in-memory cache with a TTL-based cache using the
existing OverdriveToken type (same pattern as _cached_client_oauth_token).
The token is now stored alongside an expiry timestamp set to
utc_now() + LIBRARY_MAX_AGE. On each access the TTL is checked first
(zero DB/network I/O if fresh); once expired, get_library() is called
again and the Representation layer handles the actual HTTP staleness check.
Also update docstrings on collection_token, get_advantage_accounts, and
the _cached_collection_token comment to explain the two-layer cache design.
Add tests for cache hit, TTL expiry, and errorCode handling.

The prior commit set the in-memory _cached_collection_token TTL to
LIBRARY_MAX_AGE (30 days), which was effectively as unbounded as the
original plain string cache.

Add a separate COLLECTION_TOKEN_MAX_AGE = timedelta(minutes=5) constant
and use it for the in-memory expiry. The DB-layer TTL (LIBRARY_MAX_AGE,
30 days) is unchanged. Long-lived Flask workers now re-check the DB every
5 minutes and pick up a rotated token within that window, while avoiding
a DB hit on every individual request.

Also align MockOverdriveAPI and test_script.py to use timedelta(hours=1)
for the fake token expiry (consistent with the client OAuth token mock),
and update test comments to reference COLLECTION_TOKEN_MAX_AGE.
@dbernstein dbernstein force-pushed the feature/PP-3937-refresh-overdrive-tokens-every-30-days branch from b7a099d to 96b09f4 on April 9, 2026 22:52
@dbernstein dbernstein enabled auto-merge (squash) April 9, 2026 22:53
@dbernstein dbernstein added the feature New feature label Apr 9, 2026
@dbernstein dbernstein merged commit d99f37e into main Apr 9, 2026
20 checks passed
@dbernstein dbernstein deleted the feature/PP-3937-refresh-overdrive-tokens-every-30-days branch April 9, 2026 23:03