THREESCALE-14969: Rotate OIDC tokens for Zync by jlledom · Pull Request #4310 · 3scale/porta

jlledom · 2026-05-29T08:37:44Z

What this PR does / why we need it:

The issue description explains the situation, but to summarize: #4236 broke the integration with Zync. Porta calls the PUT /tenant endpoint on Zync and pushes an access token, which Zync later uses to pull data from Porta. After #4236, the access token is sent to Zync hashed, which is useless as an authentication credential, so Zync can no longer retrieve data from Porta.
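To illustrate why a hashed token cannot be used for authentication, here is a minimal sketch. It assumes the server keeps only a SHA-256 digest of each token and authenticates by hashing whatever the client presents; the real hashing scheme introduced in #4236 may differ, and TokenRegistry is a hypothetical name.

```ruby
require "digest"
require "securerandom"

# Hypothetical model: only the SHA-256 digest of each token is stored
# (an assumption; the actual scheme from #4236 may differ).
class TokenRegistry
  def initialize
    @digests = []
  end

  # Issues a token and returns the plaintext once; only the digest is kept.
  def issue
    plaintext = SecureRandom.hex(32)
    @digests << digest_of(plaintext)
    plaintext
  end

  # Authentication hashes the presented value and compares digests.
  def authenticate?(presented)
    @digests.include?(digest_of(presented))
  end

  def digest_of(value)
    Digest::SHA256.hexdigest(value)
  end
end

registry = TokenRegistry.new
plaintext = registry.issue
hashed = registry.digest_of(plaintext)

puts registry.authenticate?(plaintext) # prints true: the plaintext works
puts registry.authenticate?(hashed)    # prints false: the digest gets hashed again
```

Whoever receives only the digest, as Zync did after #4236, has nothing it can present that hashes to a stored digest.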

To solve that, this PR sends the plaintext token to Zync again and rotates it every hour, which mitigates the impact of a possible Zync DB leak containing plaintext tokens.

Rotation opens a small race window between the moment the old token is expired and the moment Zync receives the new one and can use it for subsequent requests. For that reason, old tokens are neither expired immediately nor deleted; instead, they are set to expire 1 day after being discarded.

This way, even in the worst case, when the Zync queue gets stuck for some reason and stops processing jobs while Zync still holds a discarded token in its DB, there is still one full day to recover.
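The expire-with-grace-period idea can be sketched in plain Ruby as follows. This is an in-memory model, not Porta's ActiveRecord code: SyncTokenStore, Token and GRACE_PERIOD are hypothetical names, and a token with expires_at of nil plays the role of the active token.

```ruby
require "securerandom"

GRACE_PERIOD = 24 * 60 * 60 # 1 day, in seconds

Token = Struct.new(:value, :expires_at) do
  # The active token has no expiry; a discarded token stays valid until expires_at.
  def valid_at?(now)
    expires_at.nil? || now < expires_at
  end
end

class SyncTokenStore
  attr_reader :tokens

  def initialize
    @tokens = []
  end

  # Expire (do not delete) every active token, then issue a fresh one.
  def rotate!(now)
    @tokens.each { |t| t.expires_at = now + GRACE_PERIOD if t.expires_at.nil? }
    token = Token.new(SecureRandom.hex(16), nil)
    @tokens << token
    token.value
  end

  def valid?(value, now)
    @tokens.any? { |t| t.value == value && t.valid_at?(now) }
  end
end

store = SyncTokenStore.new
t0 = Time.at(0)
old_value = store.rotate!(t0)
new_value = store.rotate!(t0 + 3600) # next hourly rotation

store.valid?(old_value, t0 + 3600 + 60)           # true: still inside the grace period
store.valid?(old_value, t0 + 3600 + GRACE_PERIOD) # false: grace period is over
store.valid?(new_value, t0 + 3600 + GRACE_PERIOD) # true: the active token has no expiry
```

A Zync job that was enqueued with the old token therefore keeps working for up to a day after rotation.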

To further mitigate the leaking problem, we plan to implement client token encryption in Zync.

To further mitigate race conditions when rotating, we also cache the token so that rotation happens at most once per hour, with protection against a cache stampede when many processes miss the cache at the same time. This is explained in more detail in the in-code comments below.

We also plan to implement retry logic for auth errors in Zync.

Finally, the PR updates the Janitor to purge expired tokens once per week.

Replaces #4304 and #4309

Which issue(s) this PR fixes

https://redhat.atlassian.net/browse/THREESCALE-14969

Verification steps

Tests should pass. Also, work normally with Porta + Zync and verify everything works as expected: creating domains, pushing apps to Keycloak, etc.

Previously, the OIDC sync token was reused indefinitely via find_or_create_by!. Zync stored the plaintext token in its DB without encryption, so a long-lived token is a security liability. This change rotates the token hourly: on each cache miss the active token is expired (expires_at set to 1 day from now) and a fresh one is created. The plaintext is cached for 1 hour so rotation does not happen on every Zync job. Row locks serialize concurrent cache misses to avoid a cache stampede. Expired tokens are kept for 1 day so Zync can finish any in-flight requests before they become invalid. A new worker (DeleteExpiredOIDCSyncTokensWorker) handles pruning expired OIDC tokens from the database when called.

Assisted-by: Claude Code
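The effect of the 1-hour cache on rotation frequency can be sketched with a tiny TTL cache and an injected clock. TtlCache is a hypothetical stand-in for Rails.cache.fetch with expires_in:, not its implementation; the block plays the role of the expire-then-create rotation.

```ruby
# Minimal TTL cache with an injected clock (stand-in for Rails.cache.fetch).
class TtlCache
  def initialize(clock)
    @clock = clock
    @entries = {}
  end

  def fetch(key, expires_in:)
    entry = @entries[key]
    return entry[:value] if entry && @clock.call < entry[:expires_at]

    value = yield # cache miss: run the (rotating) block
    @entries[key] = { value: value, expires_at: @clock.call + expires_in }
    value
  end
end

now = 0
cache = TtlCache.new(-> { now })
rotations = 0
rotate = -> { rotations += 1; "token-#{rotations}" }
key = "access_tokens/user:1/oidc"

first = cache.fetch(key, expires_in: 3600) { rotate.call }  # miss: rotates
now = 1800 # half an hour later
second = cache.fetch(key, expires_in: 3600) { rotate.call } # hit: same token
now = 3600 # one hour later
third = cache.fetch(key, expires_in: 3600) { rotate.call }  # expired: rotates again
```

Every Zync job in between reuses the cached plaintext, so the DB is only touched once per hour per user.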

Now that OIDC sync tokens are expired instead of deleted immediately, there is a need to periodically purge them from the database. The janitor runs weekly and is the right place for this housekeeping.

Assisted-by: Claude Code
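The purge rule can be sketched as a filter over an in-memory list. The internals of DeleteExpiredOIDCSyncTokensWorker are not shown in this PR, so purge_expired and OidcToken are hypothetical names; the assumed rule is "delete only tokens whose expires_at has already passed", which keeps the active token and tokens still inside their 1-day grace period.

```ruby
OidcToken = Struct.new(:id, :expires_at)

# Drop tokens whose expiry is in the past; keep active ones (expires_at nil)
# and discarded ones that are still inside their grace period.
def purge_expired(tokens, now)
  tokens.reject { |t| t.expires_at && t.expires_at <= now }
end

day = 24 * 60 * 60
now = 10 * day
tokens = [
  OidcToken.new(1, now - 3 * day), # expired days ago: purged
  OidcToken.new(2, now + 3600),    # discarded but still in grace period: kept
  OidcToken.new(3, nil)            # active token: kept
]

remaining = purge_expired(tokens, now)
remaining.map(&:id) # [2, 3]
```

Because the janitor only runs weekly, expired rows accumulate for up to a week, which is harmless since they no longer authenticate.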

codecov · 2026-05-29T09:07:26Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.87%. Comparing base (c708373) to head (1950368).

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4310      +/-   ##
==========================================
- Coverage   88.92%   88.87%   -0.06%     
==========================================
  Files        1752     1753       +1     
  Lines       44131    44146      +15     
  Branches      689      689              
==========================================
- Hits        39245    39235      -10     
- Misses       4870     4895      +25     
  Partials       16       16


jlledom · 2026-05-29T11:34:18Z

+  def self.refresh_oidc_sync
+    user_id = scope_attributes["owner_id"]
+    cache_key = "access_tokens/user:#{user_id}/oidc"
+
+    # Hot path: skip the transaction entirely on cache hit (zero DB queries).
+    cached = Rails.cache.read(cache_key)
+    return cached if cached
+
+    transaction do
+      # Lock existing OIDC tokens to serialize concurrent cache misses (e.g. full resync).
+      # Workers that arrive simultaneously will queue here. On cold start (no tokens yet)
+      # there is nothing to lock and stampede churn is harmless — all created tokens are valid.
+      lock.where(name: OIDC_SYNC_TOKEN).load
+
+      # Double-check inside the transaction: a concurrent worker may have populated the cache
+      # while we were waiting on the row lock above.
+      Rails.cache.fetch(cache_key, expires_in: 1.hour) do
+        # Expire (not delete) the current active token so Zync can keep using it for up to
+        # 1 day while it picks up the new one. The janitor cleans up expired tokens weekly.
+        where(name: OIDC_SYNC_TOKEN, expires_at: nil).update_all(expires_at: 1.day.from_now)
+        create!(name: OIDC_SYNC_TOKEN, scopes: %w[account_management], permission: 'ro').plaintext_value
+      end
+    end
  end


This method could be much simpler, like:

def self.refresh_oidc_sync
  user_id = scope_attributes["owner_id"]
  Rails.cache.fetch("access_tokens/user:#{user_id}/oidc", expires_in: 1.hour) do
    where(name: OIDC_SYNC_TOKEN, expires_at: nil).update_all(expires_at: 1.day.from_now)
    create!(name: OIDC_SYNC_TOKEN, scopes: %w[account_management], permission: 'ro').plaintext_value
  end
end

But that would cause a non-deterministic result on concurrent cache misses. For example, when we run the full resync task from #4307 while the cache key is missing, many concurrent processes could enter the block at the same time and each update-then-create a token, so the last one prevails and all previously created tokens end up expired. Claude explains:

Simple version (bare Rails.cache.fetch):

1. A, B, C all call Rails.cache.read: all miss.
2. A enters Rails.cache.fetch: cache miss, enters the block.
3. B enters Rails.cache.fetch: also a cache miss (A hasn't finished yet), enters the block.
4. C enters Rails.cache.fetch: also a cache miss, enters the block.
5. A: update_all(expires_at: 1.day.from_now) expires the active token. Then create! creates a new token.
6. B: update_all(expires_at: 1.day.from_now) expires A's freshly created token (it has expires_at: nil). Then create! creates another new token.
7. C: same; expires B's token and creates yet another.
8. Each process writes its own value to the cache; the last writer wins.

Result: 3 rotations, 2 wasted tokens (still valid for 1 day, so no broken auth, but unnecessary churn).
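The walkthrough above can be replayed deterministically in plain Ruby. Rows live in an array instead of the DB and the cache is a Hash; Row and rotate are hypothetical names, and the sequential replay assumes each process runs its whole block before the next one, exactly as in steps 5-7.

```ruby
Row = Struct.new(:id, :expires_at)

DAY = 86_400
rows = [Row.new(0, nil)] # the pre-existing active token
next_id = 1
now = 0
cache = {}

# Body of the simple version's fetch block: expire active rows, then create one.
rotate = lambda do
  rows.select { |r| r.expires_at.nil? }.each { |r| r.expires_at = now + DAY }
  row = Row.new(next_id, nil)
  next_id += 1
  rows << row
  row.id
end

processes = %i[a b c]
# Steps 1-4: every process checks the cache before anyone has written to it.
missed = processes.select { cache["oidc"].nil? }
# Steps 5-7: each process runs the block in turn.
results = missed.map { |p| [p, rotate.call] }.to_h
# Step 8: each process writes its own value; the last writer wins.
missed.each { |p| cache["oidc"] = results[p] }

active = rows.select { |r| r.expires_at.nil? }.map(&:id) # [3]
```

Three tokens are created but only the last one stays active and cached; tokens 1 and 2 are the wasted ones, still valid until their 1-day expiry.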

We could also end up with multiple non-expired tokens in this scenario:

1. A: update_all expires the old token and commits.
2. B: update_all finds no rows with expires_at: nil, so it is a no-op.
3. A: create! creates T1 (expires_at: nil).
4. C: update_all expires T1.
5. B: create! creates T2 (expires_at: nil).
6. C: create! creates T3 (expires_at: nil).

Result: T2 and T3 are both active (expires_at: nil).
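Replaying that interleaving step by step in plain Ruby shows how two active tokens survive. update_all and create here are hypothetical lambdas over an in-memory array, mirroring the two statements inside the simple version's block; each call is one numbered step above.

```ruby
TokenRow = Struct.new(:name, :expires_at)

DAY = 86_400
now = 0
rows = [TokenRow.new("old", nil)]

# Mirrors where(expires_at: nil).update_all(expires_at: 1.day.from_now).
update_all = -> { rows.select { |r| r.expires_at.nil? }.each { |r| r.expires_at = now + DAY } }
# Mirrors create!(...): a new active row.
create = ->(name) { rows << TokenRow.new(name, nil) }

update_all.call   # 1. A: expires the old token
update_all.call   # 2. B: no active rows, no-op
create.call("T1") # 3. A: creates T1
update_all.call   # 4. C: expires T1
create.call("T2") # 5. B: creates T2
create.call("T3") # 6. C: creates T3

active = rows.select { |r| r.expires_at.nil? }.map(&:name) # ["T2", "T3"]
```

Because update_all and create! are separate statements, any interleaving that puts two creates after the last update leaves more than one active token.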

The committed code prevents this by taking a DB lock at lock.where(name: OIDC_SYNC_TOKEN).load: processes B and C wait there until A's transaction finishes. Since the Rails.cache.fetch call is inside the transaction, A writes the cache before it commits and releases the lock, so B and C will get a cache hit when they call Rails.cache.fetch.

1. A, B, C all call Rails.cache.read: all miss.
2. A enters the transaction first and runs SELECT ... FOR UPDATE, locking the OIDC token row.
3. B and C enter their transactions and hit SELECT ... FOR UPDATE: blocked at the DB level, waiting for A's lock.
4. A runs Rails.cache.fetch: cache miss, so it executes the block (expires the old token, creates a new one, writes to the cache). The transaction commits and the lock is released.
5. B gets the lock and its SELECT ... FOR UPDATE completes. Then Rails.cache.fetch: cache hit (A populated it), so it returns the cached value. The transaction commits.
6. C: same as B; cache hit, no DB writes.

Result: 1 rotation, 0 wasted tokens.
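The locked version can be modelled in plain Ruby, assuming a Mutex as a stand-in for the row lock taken by lock.where(...).load and a Hash as a stand-in for Rails.cache. SyncTokenRefresher is a hypothetical name. Unlike the real row lock, a Mutex also serializes the cold-start case where there are no token rows to lock yet.

```ruby
class SyncTokenRefresher
  attr_reader :rotations

  def initialize
    @row_lock = Mutex.new # stands in for SELECT ... FOR UPDATE
    @cache = {}           # stands in for Rails.cache
    @rotations = 0
  end

  def refresh(key)
    # Hot path: no lock taken on a hit (relies on MRI's GVL for this demo).
    cached = @cache[key]
    return cached if cached

    @row_lock.synchronize do
      # Double-check: a thread that held the lock before us may have filled the cache.
      @cache[key] ||= begin
        @rotations += 1 # expire the active token and create a new one
        sleep 0.01      # widen the window so the other threads queue on the lock
        "token-#{@rotations}"
      end
    end
  end
end

refresher = SyncTokenRefresher.new
values = Array.new(3) { Thread.new { refresher.refresh("oidc") } }.map(&:value)
# values: ["token-1", "token-1", "token-1"]; refresher.rotations: 1
```

Whichever thread wins the lock rotates; the others either wait and then hit the cache inside the lock, or arrive later and hit it on the hot path.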

This is convenient but not really necessary since such race conditions will be rare and the result is just some wasted tokens that the Janitor will purge anyway. So if you want the simple version, I'm fine with it.

I also considered the :race_condition_ttl option for Rails.cache.fetch but discarded it, because it only prevents a cache stampede during the N seconds right after the entry expires; in any other scenario (e.g. a cold cache, or an entry that expired longer ago than that), all processes would enter the block anyway.
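The limitation can be illustrated with a simplified model of race_condition_ttl semantics (not Rails's code; ModelCache and read_decision are hypothetical names). The assumed behaviour is: the first reader of an entry that expired less than race_condition_ttl seconds ago extends it by that amount and recomputes, while later readers reuse the stale value; a missing entry, or one expired for longer, gives no protection. Readers below do not write back, modelling computations that have not finished yet.

```ruby
class ModelCache
  Entry = Struct.new(:value, :expires_at)

  def initialize
    @entries = {}
  end

  def write(key, value, expires_at:)
    @entries[key] = Entry.new(value, expires_at)
  end

  # Returns :use_cached or :compute for a reader arriving at time `now`.
  def read_decision(key, now, race_condition_ttl:)
    entry = @entries[key]
    return :compute if entry.nil? # cold cache or evicted key: everyone computes
    return :use_cached if now < entry.expires_at

    if now - entry.expires_at <= race_condition_ttl
      entry.expires_at = now + race_condition_ttl # extend the stale entry for others
    else
      @entries.delete(key) # expired too long ago: no protection
    end
    :compute
  end
end

cache = ModelCache.new
decide = -> { cache.read_decision("oidc", 100, race_condition_ttl: 10) }

cold = Array.new(3) { decide.call } # all three compute

cache.write("oidc", "token-1", expires_at: 95) # expired 5s ago, inside the window
just_expired = Array.new(3) { decide.call }    # only the first reader computes

cache.write("oidc", "token-1", expires_at: 50) # expired 50s ago, outside the window
long_expired = Array.new(3) { decide.call }    # all three compute
```

Only the just-expired case is protected, which is why the row lock was preferred over this option.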

The lock increases the DB query count. If we really need locking, then maybe it's better to use redlock like we do for billing.

Given the complexity, I wonder if we should avoid caching, expire the tokens, and clear them with the janitor every night.

akostadinov · 2026-05-29T13:47:16Z

+        # Expire (not delete) the current active token so Zync can keep using it for up to
+        # 1 day while it picks up the new one. The janitor cleans up expired tokens weekly.
+        where(name: OIDC_SYNC_TOKEN, expires_at: nil).update_all(expires_at: 1.day.from_now)


Doesn't Zync pick up the new one almost immediately? 1 day just to pick up the key looks excessive.

Also, didn't we plan to always provide expiring tokens to begin with?

jlledom added 2 commits May 29, 2026 10:15

feat(janitor): schedule expired OIDC token cleanup

1950368


jlledom self-assigned this May 29, 2026

jlledom requested review from akostadinov, madnialihussain and mayorova May 29, 2026 11:09

jlledom marked this pull request as ready for review May 29, 2026 11:09

This was referenced May 29, 2026

THREESCALE-14969: Allow authentication by zync token #4304

Closed

zync needs plaintext tokens #4309

Closed

jlledom commented May 29, 2026


akostadinov reviewed May 29, 2026


jlledom wants to merge 2 commits into master from THREESCALE-14969-zync-rotate-tokens
