
fix(CubeProxy): initialize random seed in worker phase #138

Merged: kinwin-ustc merged 1 commit into TencentCloud:master from novahe:fix/random-seed on May 9, 2026

Conversation

@novahe (Contributor) commented May 4, 2026

Summary

Seed the random number generator for each worker to ensure that cache TTL jitter (math.random) works correctly.

For example, in our cubeProxy logic, we use math.random to calculate the cache TTL jitter:

-- Pick a jittered cache TTL between the configured bounds;
-- timeout_min and timeout_max are nginx variables.
local function get_cache_timeout()
    return math.random(tonumber(ngx.var.timeout_min), tonumber(ngx.var.timeout_max))
end

(Reference: rewrite_phase.lua#L20-L22)
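
For context, the jittered value is then used as the expiration time when the response is written to the shared cache. Here is a minimal sketch of such a call site; the dict name and key derivation are illustrative placeholders, not the actual CubeProxy code:

-- Hypothetical call site; "cube_cache" stands in for the real shared dict,
-- assumed to be declared via lua_shared_dict in nginx.conf.
local cache = ngx.shared.cube_cache

local function cache_response(key, body)
    local ttl = get_cache_timeout()  -- each worker should draw a different jitter here
    local ok, err = cache:set(key, body, ttl)
    if not ok then
        ngx.log(ngx.ERR, "failed to cache response: ", err)
    end
end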

Without a unique seed per worker, all OpenResty workers initialize with the exact same default internal state. If multiple workers receive concurrent requests and calculate this timeout, they will all call math.random() and receive the exact same value. This causes all workers to set identical expiration times in the shared cache, leading to a synchronized stampede to the backend when that specific time is reached, completely defeating the purpose of adding jitter.

Here is a simple standalone demo (runnable with plain lua or luajit, no nginx required) illustrating how the lack of unique seeds synchronizes behavior across workers:

-- Simulate behavior in init_worker_phase.lua
-- Scenario: Workers starting simultaneously to handle requests

-- [Case A]: No fix (All Workers share the same seed, default is 1)
local function worker_without_fix(id)
    math.randomseed(1) -- OpenResty default behavior, all workers start with the same state
    local jitter = math.random(1, 100)
    print(string.format("Worker %d (No Fix) - Calculated Jitter: %d", id, jitter))
end

-- [Case B]: With fix (Each Worker has a unique seed)
local function worker_with_fix(id)
    -- Fix: Use timestamp + Worker ID to ensure unique seeds
    math.randomseed(os.time() + id) 
    local jitter = math.random(1, 100)
    print(string.format("Worker %d (Fixed)  - Calculated Jitter: %d", id, jitter))
end

print("--- Testing Synchronized Behavior ---")
for i = 1, 3 do worker_without_fix(i) end

print("\n--- Testing Distributed Behavior ---")
for i = 1, 3 do worker_with_fix(i) end

Output:

--- Testing Synchronized Behavior ---
Worker 1 (No Fix) - Calculated Jitter: 82
Worker 2 (No Fix) - Calculated Jitter: 82
Worker 3 (No Fix) - Calculated Jitter: 82  <-- All random values are identical!

--- Testing Distributed Behavior ---
Worker 1 (Fixed)  - Calculated Jitter: 45
Worker 2 (Fixed)  - Calculated Jitter: 12
Worker 3 (Fixed)  - Calculated Jitter: 98  <-- Truly distributed randomness

By uniquely seeding each worker based on its ngx.worker.id() and the current time, we ensure the random distribution works as intended across the entire proxy layer, keeping backend refreshes safely distributed.
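
For reference, the fix added in this PR is the single seeding line shown in the review thread below, placed in CubeProxy/lua/init_worker_phase.lua (presumably loaded via init_worker_by_lua_file; the wiring is not shown in this PR):

-- CubeProxy/lua/init_worker_phase.lua (as added in this PR)
-- Seed the random number generator for each worker to ensure
-- that cache TTL jitter (math.random) works correctly.
math.randomseed(ngx.now() * 1000 + ngx.worker.id())

One subtlety: ngx.now() returns nginx's cached timestamp, so workers forked in the same event-loop tick can observe an identical value; the ngx.worker.id() term (0, 1, 2, ...) is what guarantees distinct seeds in that case.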

Verification with Real-world Data (from dev environment)

To further validate this, I performed an end-to-end test in a live environment with multiple workers handling concurrent requests.
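
The log lines below presumably came from instrumentation along these lines (a hypothetical sketch; the actual log call in the dev branch is not shown in this PR):

-- Hypothetical logging added around the jitter calculation.
local timeout = get_cache_timeout()
ngx.log(ngx.NOTICE, "worker=", ngx.worker.id(), " timeout=", timeout)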

Case 1: Without Fix (Seeding commented out)

As expected, different workers generated identical timeout sequences, which would trigger a synchronized stampede:

2026/05/07 01:21:19 [notice] worker=6  timeout=618
2026/05/07 01:21:19 [notice] worker=9  timeout=618  <-- COLLISION!
2026/05/07 01:21:19 [notice] worker=6  timeout=516
2026/05/07 01:21:08 [notice] worker=0  timeout=516  <-- COLLISION!

Case 2: With Fix (Unique seeding applied)

Each worker now generates its own unique, distributed timeout value even when requests are handled simultaneously:

2026/05/07 01:19:31 [notice] worker=0  timeout=629
2026/05/07 01:19:31 [notice] worker=3  timeout=661
2026/05/07 01:19:31 [notice] worker=9  timeout=657
2026/05/07 01:19:31 [notice] worker=10 timeout=506

This confirms that the fix successfully decouples the workers' random state, ensuring that the cache jitter works as intended across the entire proxy layer.

ref: #135 (comment)

Comment thread on CubeProxy/lua/init_worker_phase.lua (Outdated), on lines +1 to +3:
-- Seed the random number generator for each worker to ensure
-- that cache TTL jitter (math.random) works correctly.
math.randomseed(ngx.now() * 1000 + ngx.worker.id())
Collaborator:

Please elaborate more on this change in the commit message. I still don't get the point of this change.

Contributor Author:

done

Collaborator:

> This is essential for features like cache TTL jitter to work as intended and avoid synchronized cache expiration stampedes.

The cache is shared by all workers. How does this change help with the issue described above?

Contributor Author:

I've updated the above.

In OpenResty, all worker processes inherit the same state from the
master process. Without explicitly seeding the random number generator
in the init_worker phase, each worker starts with the same default
seed. This results in math.random() producing the exact same sequence
of numbers across all workers.

Seeding with (ngx.now() * 1000 + ngx.worker.id()) ensures that each
worker has a unique, time-varying seed. This is essential for features
like cache TTL jitter to work as intended and avoid synchronized
cache expiration stampedes.

Signed-off-by: novahe <heqianfly@gmail.com>
@novahe force-pushed the fix/random-seed branch from ba71c20 to 0b5f058 on May 6, 2026 05:04
@novahe requested a review from chenhengqi on May 6, 2026 06:39
@chenhengqi (Collaborator):

The change itself looks good, but I still doubt the rationale.

What's the real issue in the following scenario?

Worker 1 (No Fix) - Calculated Jitter: 82 -> for sandbox A on node 1
Worker 2 (No Fix) - Calculated Jitter: 82 -> for sandbox B on node 2
Worker 3 (No Fix) - Calculated Jitter: 82 -> for sandbox C on node 3

@kinwin-ustc merged commit f746f8e into TencentCloud:master on May 9, 2026
2 of 8 checks passed