Add EntropyAdaptivePress: per-layer adaptive eviction via attention entropy #206

Open
jagmarques wants to merge 1 commit into NVIDIA:main from jagmarques:add-entropy-adaptive-press
Conversation

@jagmarques

Extends ObservedAttentionPress. Layers with peaked attention (structured text) get evicted more aggressively. Layers with uniform attention (creative text) keep more tokens.

No dead code, no external benchmark claims. Just the press and 4 tests.

Tests:

  • Smoke test: press runs on unit_test_model_output_attention
  • Compression: cache size < input size
  • Ratio bounds: min_ratio/max_ratio respected
  • Zero compression: all tokens kept when ratio=0
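
The ratio-bounds and zero-compression properties above can be sketched independently of the test fixture. The `prune_cache` helper below is a hypothetical stand-in for the press's top-k eviction step, not the actual kvpress API:

```python
import torch


def prune_cache(keys: torch.Tensor, scores: torch.Tensor,
                compression_ratio: float) -> torch.Tensor:
    """Keep the top-scoring (1 - ratio) fraction of tokens.

    keys: (seq_len, dim); scores: (seq_len,). Hypothetical helper
    mirroring the eviction step a scorer-based press performs.
    """
    seq_len = keys.shape[0]
    n_kept = max(1, int(seq_len * (1.0 - compression_ratio)))
    # Select the highest-scoring positions, then restore original order
    idx = scores.topk(n_kept).indices.sort().values
    return keys[idx]


keys = torch.randn(10, 4)
scores = torch.rand(10)
# ratio = 0 keeps every token; ratio = 0.5 keeps half
assert prune_cache(keys, scores, 0.0).shape[0] == 10
assert prune_cache(keys, scores, 0.5).shape[0] == 5
```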

What changed:

  • kvpress/presses/entropy_adaptive_press.py (63 lines)
  • tests/presses/test_entropy_adaptive_press.py (65 lines)

Extends ObservedAttentionPress so it inherits the attention-based scoring. The entropy computation modulates scores per-layer: uniform attention layers get a score boost (keep more tokens), peaked attention layers don't (evict more).
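
As a rough illustration of the mechanism, the sketch below maps a layer's mean normalized attention entropy to a compression ratio clamped between `min_ratio` and `max_ratio`. The function name and linear interpolation are assumptions for illustration; the PR itself works by modulating the scores inherited from ObservedAttentionPress rather than computing a ratio directly:

```python
import torch


def layer_compression_ratio(attn: torch.Tensor, min_ratio: float,
                            max_ratio: float) -> float:
    """Map mean attention entropy to a per-layer compression ratio.

    attn: (batch, heads, q_len, k_len) attention weights (rows sum to 1).
    Uniform attention (high entropy) -> ratio near min_ratio (keep more);
    peaked attention (low entropy) -> ratio near max_ratio (evict more).
    Hypothetical sketch, not the PR's actual implementation.
    """
    eps = 1e-10
    # Shannon entropy of each attention row, shape (batch, heads, q_len)
    entropy = -(attn * (attn + eps).log()).sum(-1)
    # Normalize by the maximum possible entropy, log(k_len)
    max_entropy = torch.log(torch.tensor(float(attn.shape[-1])))
    norm = (entropy / max_entropy).mean().item()
    # High normalized entropy -> min_ratio; low -> max_ratio
    ratio = max_ratio - norm * (max_ratio - min_ratio)
    return float(min(max(ratio, min_ratio), max_ratio))
```

With this interpolation, a layer whose attention is perfectly uniform lands at `min_ratio`, and a layer whose attention is one-hot lands at `max_ratio`, matching the ratio-bounds behavior the tests describe.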

Signed-off-by: João André Gomes Marques <joaoagm90@gmail.com>

copy-pr-bot bot commented Apr 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

