fix: store KeyIndex entries without exptime to allow safe flush_expired usage#13
fix: store KeyIndex entries without exptime to allow safe flush_expired usage#13sihyeonn wants to merge 1 commit into
Conversation
…ed usage When metrics are registered with an exptime (e.g. expire=300), the KeyIndex index entries (__key:N) were also stored with the same exptime via dict:add(). However, while metric value keys have their TTL refreshed on every incr/set call, the KeyIndex index entries are NOT refreshed because prometheus.lua's lookup_or_create() returns early from its local Lua cache on cache hit, skipping the key_index:add() call entirely. This means that after 'exptime' seconds, the index entry expires and gets removed by flush_expired() — even for metrics that are still actively being recorded. On the next metric_data() collection, sync_expired() detects the missing index entry and removes it from memory, causing the metric to silently disappear from Prometheus output. Fix: store KeyIndex entries with nil exptime (permanent). Since dict:add() only succeeds once per key (returns 'exists' error if key exists), setting nil here means the index entry is created permanently, while metric value keys still expire normally. flush_expired() will then only remove expired metric value entries, leaving the index intact.
📝 WalkthroughWalkthroughModified the Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
prometheus_keys.lua (1)
141-143: Minor inconsistency: markingexpire_keysfor permanent entries.This code marks
expire_keys[N] = truewhenexptime > 0, but the index entry is now created with no TTL (permanent). This means:
remove_expired_keys()will iterate over these keysdict:ttl()returns 0 for permanent keys- The condition
ttl >= 0is true, so keys are never removedWhile not incorrect (the code is defensive), it causes unnecessary iteration in
remove_expired_keys()over keys that can never expire. Additionally, there's an inconsistency withsync_range()at lines 73-79, which usesttl ~= 0to decide whether to markexpire_keys[i]—permanent keys will not be marked during sync.This is minor since the functional behavior is correct, but you could consider either:
- Not marking
expire_keysfor permanent index entries (remove lines 141-143)- Or accepting this as acceptable overhead for tracking metrics that should have expiration semantics
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prometheus_keys.lua` around lines 141 - 143, The loop that sets self.expire_keys[N] = true when exptime > 0 should avoid marking permanent index entries (exptime == 0) to prevent needless work in remove_expired_keys(); update the logic around exptime and the creation of the index entry (the exptime variable and assignment to self.expire_keys[N]) so it only marks expire_keys for entries with a nonzero TTL (consistent with sync_range which checks ttl ~= 0), and ensure remove_expired_keys, dict:ttl checks, and the index creation paths (N / i usage) remain consistent after this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@prometheus_keys.lua`:
- Around line 141-143: The loop that sets self.expire_keys[N] = true when
exptime > 0 should avoid marking permanent index entries (exptime == 0) to
prevent needless work in remove_expired_keys(); update the logic around exptime
and the creation of the index entry (the exptime variable and assignment to
self.expire_keys[N]) so it only marks expire_keys for entries with a nonzero TTL
(consistent with sync_range which checks ttl ~= 0), and ensure
remove_expired_keys, dict:ttl checks, and the index creation paths (N / i usage)
remain consistent after this change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b23dad04-3911-434d-9a05-a013654f03ca
📒 Files selected for processing (1)
prometheus_keys.lua
Problem
When metrics are registered with an
exptime(e.g.expire=300), the KeyIndexindex entries (
__key:N) are stored with the sameexptimeviadict:add().While metric value keys have their TTL refreshed on every
incr/setcall,the KeyIndex index entries are never refreshed because
prometheus.lua'slookup_or_create()returns early from its local Lua cache on cache hit,skipping the
key_index:add()call entirely:After
exptimeseconds, the index entry expires and gets removed byflush_expired()— even for metrics that are still actively being recorded.On the next
metric_data()collection,sync_expired()detects the missingindex entry and removes it from memory, causing the metric to silently
disappear from Prometheus output.
Note on existing TTL refresh code
The current code already attempts to refresh TTL for existing keys (lines 118-133):
However, this code is unreachable for active metrics because
lookup_or_create()'slocal cache prevents
key_index:add()from being called in the first place.Fix
Store KeyIndex entries with
nilexptime (permanent). Sincedict:add()onlysucceeds once per key (returns
'exists'error if the key already exists),changing
exptimetonilhere means the index entry is created permanently,while metric value keys still expire normally via
incr/set.flush_expired()will then only remove expired metric value entries, leavingthe index intact — making it safe to call
flush_expired()periodicallyto reclaim nginx slab memory from expired metrics.
Reproduction
exptime=300ngx.shared["prometheus-metrics"]:flush_expired()/metricsoutput despite still being activeSummary by CodeRabbit
Release Notes