tests(benchmarks): expire/persist coverage + group metadata #9380

Open
fcostaoliveira wants to merge 3 commits into master from bench/expire-persist-coverage

@fcostaoliveira (Contributor) commented Apr 30, 2026

Summary

Extend the expire/persist benchmark group so PR #9356 (doc-expiration fast path) — and any future change to the expire_cmd / persist_cmd notification branches — can be evaluated for both improvements and regressions on the existing on-the-fly EC2 infrastructure.

Adds 4 new specs (all oss-standalone only)

| Spec | Why it exists |
|---|---|
| search-persist-doc-1000-seconds.yml | PERSIST was previously unbenched; this exercises the PERSIST branch of OnKeySpaceNotification and the matching fast path. |
| search-expire-doc-multi-index-10-milliseconds.yml | Three FT indexes on the same prefix; amplifies the per-spec fan-out delta between full reindex and the metadata-only path. |
| search-expire-doc-50-50-10-milliseconds.yml | 50/50 write/read ratio so PEXPIRE-driven work dominates the workload, giving a clean signal well above the variance floor. |
| search-expire-doc-json-10-milliseconds.yml | JSON path coverage for Document_LoadSchemaFieldJson and the shared GetKeyExpirationTime helper. |

All four:

  • use a deterministic catch-all FT.SEARCH * NOCONTENT LIMIT 0 1 so query latency is tight and re-triggers on the same branch produce stable throughput,
  • run with DEBUG SET-ACTIVE-EXPIRE 0 so the dataset never evicts during the test (the fast path is exercised purely as a metadata update),
  • use -c 16 -t 4 (or -c 32 -t 4 for the 10K JSON dataset) to sustain ≥1000 QPS.

Adds metadata grouping (8 files)

Adds metadata.group: "expire-persist" and a per-spec use_case string to all eight expire/persist specs (the four new ones plus the four existing search-expire-{doc,numeric-field}-{10-milliseconds,1000-seconds}.yml). No behavioral changes to the existing specs — only metadata. The numeric-field specs serve as negative controls for the doc-expiration fast path (they exercise hexpire_cmd, which is not on the changed code path).

Why a separate PR from #9356

These specs are evaluation infrastructure — they need to land independently of the optimization being measured so baseline numbers can be produced from master before #9356 merges.

Test plan

  • redisbench-admin run-remote against this branch on oss-standalone for each new spec — confirm ≥1000 QPS and <5% CV across 3 datapoints.
  • redisbench-admin compare of the master baseline vs. the head of PR #9356 ([MOD-14930] refactor expire handling without full reindexing) on the four doc-level specs (existing + new); this should show an improvement on the doc-expiration fast path.
  • Same comparison on the two numeric-field specs — should stay flat (negative-control sanity check).
  • --enable-profilers PROFILE=1 on search-expire-doc-multi-index-10-milliseconds to verify the per-spec write-lock pattern in Indexes_UpdateMatchingDocExpiration is not contended.
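The acceptance criterion in the first test-plan item (≥1000 QPS and <5% CV across 3 datapoints) can be checked mechanically. The following is a minimal sketch, not part of redisbench-admin: the function names and the sample values are hypothetical, it assumes the QPS threshold applies to every datapoint, and it uses the sample standard deviation (the PR does not say which variant the tooling uses).

```python
import statistics


def coefficient_of_variation(samples):
    """Return the CV as a percentage: sample std dev / mean * 100."""
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("mean throughput is zero; CV is undefined")
    return statistics.stdev(samples) / mean * 100.0


def meets_acceptance(samples, min_qps=1000.0, max_cv_pct=5.0):
    """Check both test-plan thresholds on a list of ops/sec datapoints.

    Assumption: every datapoint must reach min_qps, not just the median.
    """
    return min(samples) >= min_qps and coefficient_of_variation(samples) < max_cv_pct


# Hypothetical ops/sec values from three runs of one spec (not measured data).
example_runs = [13290.0, 13105.0, 13410.0]
```

Calling `meets_acceptance(example_runs)` on three collected datapoints gives a single pass/fail answer per spec; a run set such as `[1000.0, 1500.0, 2000.0]` fails on the CV threshold even though each datapoint clears 1000 QPS.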

Note

Low Risk
Low risk: changes are confined to benchmark YAML specs (metadata and new workloads) with no production code impact.

Overview
Adds metadata.group: "expire-persist" and a descriptive use_case field to the existing expire benchmarks so they can be consistently grouped and understood in reporting.

Introduces four new oss-standalone benchmark specs to broaden expire/persist coverage: a PERSIST notification workload, a high write-ratio PEXPIRE workload, a multi-index fan-out PEXPIRE workload, and a JSON document PEXPIRE workload (all using deterministic FT.SEARCH patterns and disabled active expiration for stable signal).

Reviewed by Cursor Bugbot for commit 8d14d1a.

Add four new oss-standalone-only benchmark specs targeting the
doc-expiration fast path introduced in PR #9356:

- search-persist-doc-1000-seconds: covers the PERSIST keyspace-notification
  branch (previously unbenched).
- search-expire-doc-multi-index-10-milliseconds: three FT indexes on the
  same prefix to amplify the per-spec fan-out delta between full reindex
  and the new metadata-only path.
- search-expire-doc-50-50-10-milliseconds: high (50/50) write-ratio variant
  of the existing 5/95 PEXPIRE bench for clean signal above the m5/m7i
  variance floor.
- search-expire-doc-json-10-milliseconds: JSON variant covering the
  Document_LoadSchemaFieldJson side of the GetKeyExpirationTime helper.

All four use a deterministic catch-all FT.SEARCH query, disable active
expiration, and bump per-thread connection count so they sustain >1000 QPS
with low re-trigger variance.

Also tag the existing four expire specs (and the four new ones) with a
shared metadata.group: "expire-persist" plus a per-spec use_case string so
this evaluation group can be selected as a unit.
jit-ci Bot commented Apr 30, 2026

🛡️ Jit Security Scan Results

CRITICAL HIGH MEDIUM

✅ No security findings were detected in this PR


Security scan by Jit

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit e64afd9.

clientconfig:
  benchmark_type: "mixed"
  tool: memtier_benchmark
  arguments: "--test-time 180 -c 32 -t 4 --hide-histogram --key-prefix 'doc:single' --key-minimum 1 --key-maximum 10000 --command 'FT.SEARCH idx:single * NOCONTENT LIMIT 0 1' --command-ratio 95 --command 'PEXPIRE __key__ 10' --command-ratio 5"

JSON benchmark key-prefix missing trailing colon separator

High Severity

The --key-prefix 'doc:single' in memtier arguments generates keys like doc:single1, doc:single2, etc. However, the dataset loaded from the CSV almost certainly uses doc:single:N format (with a colon separator before the number), as evidenced by the test code in test_json_multi_numeric.py which consistently creates keys as doc:single:{N}. All other benchmarks in this PR use a colon-terminated prefix ('idx10:'), matching the idx10:N key format. The missing trailing colon means every PEXPIRE targets a non-existent key (returning 0), so no keyspace notification fires and the doc-expiration fast path is never exercised — completely defeating the benchmark's purpose.
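The mismatch described in this finding can be illustrated offline. This is a minimal sketch, assuming (as the finding states) that generated key names are the prefix immediately followed by the key number and that the dataset uses the `doc:single:N` layout; the helper names are hypothetical and nothing here talks to Redis or memtier_benchmark.

```python
def generated_keys(prefix, key_min, key_max):
    """Keys as described in the finding: the prefix immediately followed by the number."""
    return {f"{prefix}{n}" for n in range(key_min, key_max + 1)}


def hit_ratio(prefix, key_min, key_max, dataset_keys):
    """Fraction of generated keys that actually exist in the dataset."""
    keys = generated_keys(prefix, key_min, key_max)
    return len(keys & dataset_keys) / len(keys)


# Assumed dataset layout doc:single:N; the finding infers this from test code,
# it is not confirmed in this conversation.
dataset = {f"doc:single:{n}" for n in range(1, 10001)}
```

Under these assumptions, `hit_ratio("doc:single", 1, 10000, dataset)` is 0.0 while `hit_ratio("doc:single:", 1, 10000, dataset)` is 1.0, which is the difference between every PEXPIRE missing and every PEXPIRE hitting an existing key.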


codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.67%. Comparing base (fae9b71) to head (8d14d1a).
⚠️ Report is 46 commits behind head on master.

⚠️ Current head 8d14d1a differs from pull request most recent head 484bced

Please upload reports for the commit 484bced to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9380      +/-   ##
==========================================
+ Coverage   81.30%   81.67%   +0.36%     
==========================================
  Files         492      501       +9     
  Lines       66927    68114    +1187     
  Branches    23562    24625    +1063     
==========================================
+ Hits        54414    55630    +1216     
+ Misses      12274    12246      -28     
+ Partials      239      238       -1     
| Flag | Coverage Δ | |
|---|---|---|
| flow | 83.73% <ø> (+0.10%) | ⬆️ |
| unit | 50.90% <ø> (+0.78%) | ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.


@fcostaoliveira (Contributor, Author) commented Apr 30, 2026

Automated performance analysis summary

This comment was generated automatically because performance data is available.

Environment:

  • Triggering env: circleci

Architecture: x86_64 — branch-over-branch

Deployment: oss-standalone

In summary:

You can check a comparison in detail via the grafana link

Performance Improvements - Comparison between master and bench/expire-persist-coverage.

Time Period from a month ago. (environment used: oss-standalone)

| Test Case | Baseline master (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| search-aggregate-post-filter-simple.yml | 12666 +- 1.0% (20 datapoints) | 17820 | 40.7% | IMPROVEMENT |
| ftsb-10K-enwiki_abstract-hashes-fulltext-sortby | 1130 +- 3.1% (20 datapoints) | 1514 | 34.0% | IMPROVEMENT |
| ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query | 8167 +- 4.3% (20 datapoints) | 10272 | 25.8% | IMPROVEMENT |
| search-numeric-sortby-optimize | 424 +- 6.6% (20 datapoints) | 534 | 25.8% | IMPROVEMENT |
| ftsb-1M-nyc_taxis-hashes-load | 27683 +- 3.1% (20 datapoints) | 34723 | 25.4% | IMPROVEMENT |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query-non-sortable | 1209 +- 5.8% (20 datapoints) | 1479 | 22.3% | IMPROVEMENT |
| ftsb-10K-enwiki_abstract-hashes-term-prefix | 9461 +- 2.2% (20 datapoints) | 11221 | 18.6% | IMPROVEMENT |
| search-ftsb-1M-enwiki_abstract-hashes-gc | 118 +- 9.9% (20 datapoints) | 140 | 18.5% | waterline=9.9%. IMPROVEMENT |
| search-high-cardinality-negation-term-baseline | 305 +- 1.1% (20 datapoints) | 353 | 15.5% | IMPROVEMENT |
| search-ftsb-370K-docs-union-iterators-q4 | 34 +- 1.2% (20 datapoints) | 39 | 14.8% | IMPROVEMENT |
| search-numeric | 16244 +- 8.2% (20 datapoints) | 18013 | 10.9% | waterline=8.2%. IMPROVEMENT |
| vecsim-arxiv-titles-384-angular-filters-m16-ef-128-numeric-filter | 2625 +- 5.6% (20 datapoints) | 2875 | 9.5% | IMPROVEMENT |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query | 4350 +- 5.8% (20 datapoints) | 4727 | 8.7% | IMPROVEMENT |
| search-expire-numeric-field-10-milliseconds | 14 +- 1.8% (20 datapoints) | 15 | 8.3% | IMPROVEMENT |

Performance Regressions and Issues - Comparison between master and bench/expire-persist-coverage.

Time Period from a month ago. (environment used: oss-standalone)

| Test Case | Baseline master (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| search-ftsb-arxiv-titles-384-angular-filters-m16-ef-128-json-load | 5066 +- 5.0% (20 datapoints) | 3679.0 | -27.4% | REGRESSION |
| ftsb-10K-multivalue-numeric-json | 7090 +- 6.7% (20 datapoints) | 5227.0 | -26.3% | REGRESSION |
| search-ftsb-10K-enwiki_abstract-hashes-term-withsuffix-trie | 13769 +- 5.2% (20 datapoints) | 10246.0 | -25.6% | REGRESSION |
| search-numeric-sortby | 16677 +- 9.1% (20 datapoints) | 13293.0 | -20.3% | waterline=9.1%. REGRESSION |
| ftsb-1M-nyc_taxis-ftadd-load | 33137 +- 5.4% (20 datapoints) | 27066.0 | -18.3% | REGRESSION |
| hybrid-arxiv-titles-384-angular-linear-numeric-vector | 1577 +- 10.2% UNSTABLE (20 datapoints) | 1381.0 | -12.4% | UNSTABLE (baseline high variance); server: FT.HYBRID p50 increased 21.1% (baseline CV=15.8%); client: Latency increased 14.2% (baseline CV=10.2%); confidence=LOW (FT.HYBRID baseline CV=12.4%; FT.HYBRID p99 +4.2% (stable baseline, minor change); CV=coefficient of variation (data stability: <30% stable, 30-50% moderate, >50% unstable)) |
| search-ftsb-10K-enwiki_abstract-hashes-fulltext-aggregate-sortby-limit-0-100 | 5705 +- 4.2% (20 datapoints) | 5002.0 | -12.3% | REGRESSION |
| ftsb-10K-enwiki_pages-hashes-fulltext-mixed_simple-1word-query_write_1_to_read_20.yml | 1681 +- 4.5% (20 datapoints) | 1487.0 | -11.5% | REGRESSION |
| hybrid-arxiv-titles-384-angular-rrf-text-vector | 1590 +- 10.7% UNSTABLE (20 datapoints) | 1418.0 | -10.8% | UNSTABLE (baseline high variance); server: FT.HYBRID p50 increased 19.0% (baseline CV=15.8%); client: Latency increased 14.7% (baseline CV=10.8%); confidence=LOW (FT.HYBRID baseline CV=13.3%; FT.HYBRID p99 +5.3% (stable baseline, minor change); CV=coefficient of variation (data stability: <30% stable, 30-50% moderate, >50% unstable)) |
| ftsb-1K-enwiki_abstract-hashes-term-contains | 11258 +- 4.8% (20 datapoints) | 10150.0 | -9.8% | REGRESSION |
| search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-non-sortable | 180 +- 11.6% UNSTABLE (20 datapoints) | 174.0 | -3.6% | UNSTABLE (baseline high variance); server: FT.SEARCH p50 increased 28.5% (baseline CV=13.2%); client: OverallQuantiles.allCommands.q50 increased 26.6% (baseline CV=12.8%) |
| hybrid-arxiv-titles-384-angular-rrf-tag-range | 4.2 +- 12.5% UNSTABLE (20 datapoints) | 4.1 | -3.3% | UNSTABLE (baseline high variance); server: p50 latency stable; client: client latency stable; neither server nor client side confirms regression |
| search-high-cardinality-negation-term-comparison_union_all_other_terms | 211 +- 14.7% UNSTABLE (20 datapoints) | 207.0 | -1.9% | UNSTABLE (baseline high variance); server: FT.SEARCH p50 decreased 28.0% (baseline CV=1.1%); client: client latency stable; neither server nor client side confirms regression |
| search-numeric-sortby-desc-optimize | 524 +- 10.4% UNSTABLE (20 datapoints) | 520.0 | -0.7% | UNSTABLE (baseline high variance); server: FT.SEARCH p50 decreased 6.3% (baseline CV=7.0%); client: Latency decreased 5.4% (baseline CV=6.9%); neither server nor client side confirms regression |
| search-expire-doc-1000-seconds | 17 +- 10.4% UNSTABLE (20 datapoints) | 19.0 | 14.3% | UNSTABLE (baseline high variance); server: p50 latency stable; client: Latency decreased 20.2% (baseline CV=1.5%); neither server nor client side confirms regression |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query-non-sortable | 34 +- 24.7% UNSTABLE (20 datapoints) | 52.0 | 52.0% | UNSTABLE (baseline high variance); server: p50 latency stable; client: OverallQuantiles.allCommands.q50 decreased 6.1% (baseline CV=2.2%); neither server nor client side confirms regression |

Tests with No Significant Changes (23 tests)


| Test Case | Baseline master (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| ftsb-10K-enwiki_abstract-hashes-term-suffix | 13949 +- 5.6% (20 datapoints) | 13798.0 | -1.1% | No Change |
| ftsb-10K-enwiki_abstract-hashes-term-suffix-withsuffixtrie | 16311 +- 5.4% (20 datapoints) | 15188.0 | -6.9% | potential REGRESSION |
| ftsb-10K-enwiki_abstract-hashes-term-wildcard | 14028 +- 5.7% (20 datapoints) | 13060.0 | -6.9% | potential REGRESSION |
| ftsb-10K-enwiki_pages-hashes-load | 66491 +- 8.5% (20 datapoints) | 61457.0 | -7.6% | waterline=8.5%. potential REGRESSION |
| ftsb-10K-singlevalue-numeric-json | 3803 +- 4.6% (20 datapoints) | 3877.0 | 1.9% | No Change |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query | 9408 +- 1.7% (20 datapoints) | 9748.0 | 3.6% | potential IMPROVEMENT |
| ftsb-1M-enwiki_abstract-hashes-load | 23668 +- 7.1% (20 datapoints) | 23895.0 | 1.0% | No Change |
| hybrid-arxiv-titles-384-angular-linear-text-range | 3.9 +- 9.1% (20 datapoints) | 3.9 | -1.3% | waterline=9.1%. No Change |
| search-expire-doc-10-milliseconds | 17 +- 8.8% (20 datapoints) | 16.0 | -8.7% | waterline=8.8%. potential REGRESSION |
| search-expire-numeric-field-1000-seconds | 14 +- 2.4% (20 datapoints) | 14.0 | 3.5% | potential IMPROVEMENT |
| search-filtering-tag-numeric | 4097 +- 9.4% (20 datapoints) | 3785.0 | -7.6% | waterline=9.4%. potential REGRESSION |
| search-filtering-tag-numeric-filter-pipeline | 11269 +- 4.9% (20 datapoints) | 11222.0 | -0.4% | No Change |
| search-ftsb-10K-enwiki_abstract-hashes-fulltext-search-sortby-limit-0-100 | 5608 +- 4.2% (20 datapoints) | 5564.0 | -0.8% | No Change |
| search-ftsb-10K-enwiki_abstract-hashes-term-withoutsuffix-trie | 13718 +- 5.6% (20 datapoints) | 13152.0 | -4.1% | potential REGRESSION |
| search-ftsb-1700K-docs-union-iterators-q3 | 32 +- 0.5% (20 datapoints) | 35.0 | 7.5% | potential IMPROVEMENT |
| search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-one-indexed-field | 13202 +- 7.6% (20 datapoints) | 12499.0 | -5.3% | potential REGRESSION |
| search-ftsb-5200K-docs-union-iterators-q1 | 4.2 +- 6.7% (20 datapoints) | 3.9 | -7.4% | potential REGRESSION |
| search-ftsb-5500K-docs-union-iterators-q2 | 5.6 +- 6.0% (20 datapoints) | 5.8 | 2.3% | No Change |
| search-geo | 3237 +- 4.1% (20 datapoints) | 3225.0 | -0.4% | No Change |
| search-numeric-optimize | 8888 +- 9.3% (20 datapoints) | 8075.0 | -9.1% | waterline=9.3%. potential REGRESSION |
| search-numeric-sortby-desc | 12400 +- 4.3% (20 datapoints) | 13052.0 | 5.3% | potential IMPROVEMENT |
| vecsim-arxiv-titles-384-angular-filters-m16-ef-128-fulltext-filter | 7391 +- 2.6% (20 datapoints) | 7562.0 | 2.3% | No Change |
| vecsim-arxiv-titles-384-angular-filters-m16-ef-128-tag-filter | 15211 +- 6.2% (20 datapoints) | 14466.0 | -4.9% | potential REGRESSION |

Architecture: aarch64 — branch-over-branch

Deployment: oss-standalone

In summary:

  • Detected a total of 40 stable tests between versions.
  • Detected a total of 2 highly unstable benchmarks (2 baseline).
  • Detected a total of 3 improvements above the improvement water line.
  • Detected a total of 1 regression below the regression water line of 8.0%.
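The "% change" and water-line labels in these reports can be approximated with a few lines of arithmetic. This is a minimal sketch under stated assumptions, not the redisbench-admin implementation: the function names are hypothetical, and the rule that the effective water line is the larger of the 8.0% default and the baseline's std dev is inferred from the `waterline=...` notes in the tables. It does not model the UNSTABLE handling or the "potential" labels.

```python
def percent_change(baseline, comparison):
    """Relative change of comparison vs. baseline, in percent (higher is better)."""
    return (comparison - baseline) / baseline * 100.0


def classify(baseline, comparison, baseline_std_pct, default_waterline_pct=8.0):
    """Label a result against the water line.

    Assumption: effective water line = max(default, baseline std dev in percent),
    as suggested by notes such as 'waterline=9.9%'.
    """
    waterline = max(default_waterline_pct, baseline_std_pct)
    change = percent_change(baseline, comparison)
    if change > waterline:
        return "IMPROVEMENT"
    if change < -waterline:
        return "REGRESSION"
    return "within water line"
```

For example, `classify(118, 157, 9.9)` returns `"IMPROVEMENT"` and `classify(8888, 8075, 9.3)` returns `"within water line"`. Because the tables show rounded medians, percentages recomputed from the displayed values can differ slightly from the reported ones.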

You can check a comparison in detail via the grafana link

Performance Improvements - Comparison between master and bench/expire-persist-coverage.

Time Period from a month ago. (environment used: oss-standalone)

| Test Case | Baseline master (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| search-ftsb-1M-enwiki_abstract-hashes-gc | 118 +- 9.9% (20 datapoints) | 157 | 32.8% | waterline=9.9%. IMPROVEMENT |
| search-numeric-sortby-optimize | 424 +- 6.6% (20 datapoints) | 488 | 14.9% | IMPROVEMENT |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query-non-sortable | 1209 +- 5.8% (20 datapoints) | 1346 | 11.3% | IMPROVEMENT |

Performance Regressions and Issues - Comparison between master and bench/expire-persist-coverage.

Time Period from a month ago. (environment used: oss-standalone)

| Test Case | Baseline master (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| search-numeric-sortby-desc-optimize | 492 +- 7.0% (20 datapoints) | 443 | -9.9% | REGRESSION |
| search-filtering-tag-numeric | 3600 +- 10.6% UNSTABLE (20 datapoints) | 4161 | 15.6% | UNSTABLE (baseline high variance); server: FT.AGGREGATE p50 decreased 13.5% (baseline CV=13.5%); client: Latency decreased 13.5% (baseline CV=9.8%); neither server nor client side confirms regression |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query-non-sortable | 34 +- 24.7% UNSTABLE (20 datapoints) | 46 | 34.5% | UNSTABLE (baseline high variance); server: p50 latency stable; client: client latency stable; neither server nor client side confirms regression |

Tests with No Significant Changes (40 tests)


| Test Case | Baseline master (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| ftsb-10K-enwiki_abstract-hashes-fulltext-sortby | 1130 +- 3.1% (20 datapoints) | 1165.0 | 3.1% | potential IMPROVEMENT |
| ftsb-10K-enwiki_abstract-hashes-term-prefix | 9461 +- 2.2% (20 datapoints) | 9893.0 | 4.6% | potential IMPROVEMENT |
| ftsb-10K-enwiki_abstract-hashes-term-suffix | 10715 +- 1.2% (20 datapoints) | 10929.0 | 2.0% | No Change |
| ftsb-10K-enwiki_abstract-hashes-term-suffix-withsuffixtrie | 11541 +- 0.9% (20 datapoints) | 11549.0 | 0.1% | No Change |
| ftsb-10K-enwiki_abstract-hashes-term-wildcard | 10849 +- 1.2% (20 datapoints) | 11098.0 | 2.3% | No Change |
| ftsb-10K-enwiki_pages-hashes-fulltext-mixed_simple-1word-query_write_1_to_read_20.yml | 1681 +- 4.5% (20 datapoints) | 1677.0 | -0.2% | No Change |
| ftsb-10K-enwiki_pages-hashes-load | 60396 +- 6.5% (20 datapoints) | 61457.0 | 1.8% | No Change |
| ftsb-10K-multivalue-numeric-json | 5110 +- 1.7% (20 datapoints) | 5227.0 | 2.3% | No Change |
| ftsb-10K-singlevalue-numeric-json | 3015 +- 0.9% (20 datapoints) | 3014.0 | -0.1% | No Change |
| ftsb-1K-enwiki_abstract-hashes-term-contains | 9144 +- 1.7% (20 datapoints) | 9328.0 | 2.0% | No Change |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query | 4350 +- 5.8% (20 datapoints) | 4490.0 | 3.2% | potential IMPROVEMENT |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query | 9408 +- 1.7% (20 datapoints) | 9748.0 | 3.6% | potential IMPROVEMENT |
| ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query | 8167 +- 4.3% (20 datapoints) | 8332.0 | 2.0% | No Change |
| ftsb-1M-enwiki_abstract-hashes-load | 23668 +- 7.1% (20 datapoints) | 23895.0 | 1.0% | No Change |
| ftsb-1M-nyc_taxis-ftadd-load | 25940 +- 3.0% (20 datapoints) | 27066.0 | 4.3% | potential IMPROVEMENT |
| ftsb-1M-nyc_taxis-hashes-load | 27683 +- 3.1% (20 datapoints) | 28629.0 | 3.4% | potential IMPROVEMENT |
| search-aggregate-post-filter-simple.yml | 12666 +- 1.0% (20 datapoints) | 12723.0 | 0.5% | No Change |
| search-expire-doc-10-milliseconds | 14 +- 2.2% (20 datapoints) | 14.0 | -1.0% | No Change |
| search-expire-doc-1000-seconds | 14 +- 1.5% (20 datapoints) | 14.0 | -3.2% | potential REGRESSION |
| search-expire-numeric-field-10-milliseconds | 14 +- 1.8% (20 datapoints) | 14.0 | -0.3% | No Change |
| search-expire-numeric-field-1000-seconds | 14 +- 2.4% (20 datapoints) | 14.0 | 3.5% | potential IMPROVEMENT |
| search-filtering-tag-numeric-filter-pipeline | 9302 +- 0.9% (20 datapoints) | 9397.0 | 1.0% | No Change |
| search-ftsb-10K-enwiki_abstract-hashes-fulltext-aggregate-sortby-limit-0-100 | 4968 +- 2.9% (20 datapoints) | 5002.0 | 0.7% | No Change |
| search-ftsb-10K-enwiki_abstract-hashes-fulltext-search-sortby-limit-0-100 | 4897 +- 2.3% (20 datapoints) | 5026.0 | 2.6% | No Change |
| search-ftsb-10K-enwiki_abstract-hashes-term-withoutsuffix-trie | 10396 +- 1.2% (20 datapoints) | 10375.0 | -0.2% | No Change |
| search-ftsb-10K-enwiki_abstract-hashes-term-withsuffix-trie | 10351 +- 1.2% (20 datapoints) | 10246.0 | -1.0% | No Change |
| search-ftsb-1700K-docs-union-iterators-q3 | 32 +- 0.5% (20 datapoints) | 32.0 | 0.4% | No Change |
| search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-non-sortable | 167 +- 7.1% (20 datapoints) | 174.0 | 4.1% | potential IMPROVEMENT |
| search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-one-indexed-field | 10339 +- 1.6% (20 datapoints) | 10632.0 | 2.8% | No Change |
| search-ftsb-370K-docs-union-iterators-q4 | 34 +- 1.2% (20 datapoints) | 34.0 | -0.5% | No Change |
| search-ftsb-5200K-docs-union-iterators-q1 | 3.5 +- 1.8% (20 datapoints) | 3.5 | 1.3% | No Change |
| search-ftsb-5500K-docs-union-iterators-q2 | 5.3 +- 3.2% (20 datapoints) | 5.1 | -4.6% | potential REGRESSION |
| search-ftsb-arxiv-titles-384-angular-filters-m16-ef-128-json-load | 3552 +- 3.3% (20 datapoints) | 3679.0 | 3.6% | potential IMPROVEMENT |
| search-geo | 2649 +- 5.9% (20 datapoints) | 2661.0 | 0.4% | No Change |
| search-high-cardinality-negation-term-baseline | 305 +- 1.1% (20 datapoints) | 299.0 | -2.0% | No Change |
| search-high-cardinality-negation-term-comparison_union_all_other_terms | 158 +- 1.2% (20 datapoints) | 155.0 | -1.8% | No Change |
| search-numeric | 12380 +- 4.0% (20 datapoints) | 12370.0 | -0.1% | No Change |
| search-numeric-optimize | 7418 +- 1.2% (20 datapoints) | 7452.0 | 0.5% | No Change |
| search-numeric-sortby | 13153 +- 4.0% (20 datapoints) | 13293.0 | 1.1% | No Change |
| search-numeric-sortby-desc | 12400 +- 4.3% (20 datapoints) | 13052.0 | 5.3% | potential IMPROVEMENT |

Cross-arch delta on bench/expire-persist-coverage (x86_64 → aarch64)

Same commit (bench/expire-persist-coverage) compared across architectures. Positive deltas = aarch64 outperforms x86_64.

In summary:

  • Detected a total of 17 stable tests between versions.
  • Detected a total of 6 highly unstable benchmarks (6 baseline).
  • Latency analysis confirmed regressions in 1 of the unstable tests:
  • Detected a total of 1 improvement above the improvement water line.
  • Detected a total of 26 regressions below the regression water line of 8.0%.

You can check a comparison in detail via the grafana link

Performance Improvements - Comparison between bench/expire-persist-coverage and bench/expire-persist-coverage.

Time Period from a month ago. (environment used: oss-standalone)

| Test Case | Baseline bench/expire-persist-coverage (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| search-ftsb-1M-enwiki_abstract-hashes-gc | 140 +- 4.1% (4 datapoints) | 157 | 11.8% | IMPROVEMENT |

Performance Regressions and Issues - Comparison between bench/expire-persist-coverage and bench/expire-persist-coverage.

Time Period from a month ago. (environment used: oss-standalone)

| Test Case | Baseline bench/expire-persist-coverage (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| search-numeric | 17861 +- 7.9% (4 datapoints) | 12370.0 | -30.7% | REGRESSION |
| search-expire-doc-1000-seconds | 19 +- 12.4% UNSTABLE (3 datapoints) | 14.0 | -29.7% | UNSTABLE (baseline high variance); server: p50 latency stable; client: Latency increased 29.4% (baseline CV=8.3%); only client side confirms regression (server side stable) - insufficient evidence |
| search-aggregate-post-filter-simple.yml | 17928 +- 3.6% (4 datapoints) | 12723.0 | -29.0% | REGRESSION |
| search-expire-doc-json-10-milliseconds | 18368 +- 3.6% (4 datapoints) | 13290.0 | -27.6% | REGRESSION |
| ftsb-10K-enwiki_abstract-hashes-fulltext-sortby | 1606 +- 5.7% (4 datapoints) | 1165.0 | -27.5% | REGRESSION |
| ftsb-10K-enwiki_abstract-hashes-term-suffix-withsuffixtrie | 15873 +- 2.7% (4 datapoints) | 11549.0 | -27.2% | REGRESSION |
| search-high-cardinality-negation-term-comparison_union_all_other_terms | 207 +- 6.4% (4 datapoints) | 155.0 | -25.1% | REGRESSION |
| search-expire-numeric-field-10-milliseconds | 18 +- 10.6% UNSTABLE (3 datapoints) | 14.0 | -23.6% | UNSTABLE (baseline high variance); server: p50 latency stable; client: client latency stable; neither server nor client side confirms regression |
| search-numeric-sortby-desc-optimize | 571 +- 8.2% (3 datapoints) | 443.0 | -22.4% | waterline=8.2%. REGRESSION |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query | 5776 +- 11.1% UNSTABLE (4 datapoints) | 4490.0 | -22.3% | UNSTABLE (baseline high variance); server: FT.SEARCH p50 increased 19.3% (baseline CV=15.4%); client: OverallQuantiles.allCommands.q50 increased 28.0% (baseline CV=11.8%); confidence=HIGH (FT.SEARCH baseline CV=20.7%; FT.SEARCH p99 +22.6% (stable baseline); CV=coefficient of variation (data stability: <30% stable, 30-50% moderate, >50% unstable)) |
| search-ftsb-10K-enwiki_abstract-hashes-term-withoutsuffix-trie | 13361 +- 4.9% (4 datapoints) | 10375.0 | -22.3% | REGRESSION |
| ftsb-10K-enwiki_abstract-hashes-term-suffix | 14031 +- 3.4% (4 datapoints) | 10929.0 | -22.1% | REGRESSION |
| search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-one-indexed-field | 13535 +- 8.0% (4 datapoints) | 10632.0 | -21.5% | REGRESSION |
| ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query | 10476 +- 5.3% (4 datapoints) | 8332.0 | -20.5% | REGRESSION |
| ftsb-10K-singlevalue-numeric-json | 3742 +- 3.8% (4 datapoints) | 3014.0 | -19.5% | REGRESSION |
| ftsb-10K-enwiki_abstract-hashes-term-wildcard | 13730 +- 3.0% (4 datapoints) | 11098.0 | -19.2% | REGRESSION |
| ftsb-1M-nyc_taxis-hashes-load | 35412 +- 3.1% (4 datapoints) | 28629.0 | -19.2% | REGRESSION |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query-non-sortable | 56 +- 28.0% UNSTABLE (4 datapoints) | 46.0 | -18.1% | UNSTABLE (baseline high variance); server: p50 latency stable; client: client latency stable; neither server nor client side confirms regression |
| search-geo | 3232 +- 6.4% (4 datapoints) | 2661.0 | -17.7% | REGRESSION |
| search-filtering-tag-numeric-filter-pipeline | 11349 +- 4.2% (4 datapoints) | 9397.0 | -17.2% | REGRESSION |
| ftsb-1K-enwiki_abstract-hashes-term-contains | 10973 +- 6.1% (4 datapoints) | 9328.0 | -15.0% | REGRESSION |
| search-expire-doc-10-milliseconds | 16 +- 3.0% (3 datapoints) | 14.0 | -14.4% | REGRESSION |
| search-ftsb-5200K-docs-union-iterators-q1 | 4.1 +- 3.1% (4 datapoints) | 3.5 | -14.0% | REGRESSION |
| search-ftsb-1700K-docs-union-iterators-q3 | 37 +- 4.4% (4 datapoints) | 32.0 | -13.8% | REGRESSION |
| search-high-cardinality-negation-term-baseline | 347 +- 5.9% (4 datapoints) | 299.0 | -13.8% | REGRESSION |
| ftsb-10K-enwiki_abstract-hashes-term-prefix | 11387 +- 7.0% (4 datapoints) | 9893.0 | -13.1% | REGRESSION |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query-non-sortable | 1534 +- 14.0% UNSTABLE (4 datapoints) | 1346.0 | -12.3% | UNSTABLE (baseline high variance); server: p50 latency stable; client: OverallQuantiles.allCommands.q50 increased 16.8% (baseline CV=13.0%); only client side confirms regression (server side stable) - insufficient evidence |
| search-numeric-optimize | 8473 +- 6.8% (3 datapoints) | 7452.0 | -12.1% | REGRESSION |
| search-ftsb-5500K-docs-union-iterators-q2 | 5.8 +- 2.8% (4 datapoints) | 5.1 | -11.9% | REGRESSION |
| search-ftsb-370K-docs-union-iterators-q4 | 37 +- 5.9% (4 datapoints) | 34.0 | -10.5% | REGRESSION |
| search-ftsb-10K-enwiki_abstract-hashes-fulltext-search-sortby-limit-0-100 | 5610 +- 2.9% (4 datapoints) | 5026.0 | -10.4% | REGRESSION |
| search-filtering-tag-numeric | 3671 +- 11.2% UNSTABLE (4 datapoints) | 4161.0 | 13.3% | UNSTABLE (baseline high variance); server: FT.AGGREGATE p50 decreased 11.4% (baseline CV=15.6%); client: Latency decreased 11.9% (baseline CV=10.3%); neither server nor client side confirms regression |

Tests with No Significant Changes (17 tests)


| Test Case | Baseline bench/expire-persist-coverage (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| ftsb-10K-enwiki_pages-hashes-fulltext-mixed_simple-1word-query_write_1_to_read_20.yml | 1750 +- 9.0% (4 datapoints) | 1677 | -4.1% | waterline=9.0%. potential REGRESSION |
| ftsb-10K-enwiki_pages-hashes-load | 57930 +- 6.3% (4 datapoints) | 61457 | 6.1% | potential IMPROVEMENT |
| ftsb-10K-multivalue-numeric-json | 5065 +- 2.6% (4 datapoints) | 5227 | 3.2% | potential IMPROVEMENT |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query | 9566 +- 2.6% (4 datapoints) | 9748 | 1.9% | No Change |
| ftsb-1M-enwiki_abstract-hashes-load | 22821 +- 5.1% (4 datapoints) | 23895 | 4.7% | potential IMPROVEMENT |
| ftsb-1M-nyc_taxis-ftadd-load | 26382 +- 4.9% (4 datapoints) | 27066 | 2.6% | No Change |
| search-expire-doc-50-50-10-milliseconds | 50 +- 0.4% (3 datapoints) | 50 | 0.3% | No Change |
| search-expire-doc-multi-index-10-milliseconds | 32 +- 0.1% (4 datapoints) | 32 | -0.1% | No Change |
| search-expire-numeric-field-1000-seconds | 14 +- 1.9% (3 datapoints) | 14 | 1.8% | No Change |
| search-ftsb-10K-enwiki_abstract-hashes-fulltext-aggregate-sortby-limit-0-100 | 4934 +- 2.6% (4 datapoints) | 5002 | 1.4% | No Change |
| search-ftsb-10K-enwiki_abstract-hashes-term-withsuffix-trie | 10277 +- 0.6% (4 datapoints) | 10246 | -0.3% | No Change |
| search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-non-sortable | 165 +- 5.1% (4 datapoints) | 174 | 5.4% | potential IMPROVEMENT |
| search-ftsb-arxiv-titles-384-angular-filters-m16-ef-128-json-load | 3638 +- 1.4% (4 datapoints) | 3679 | 1.1% | No Change |
| search-numeric-sortby | 12679 +- 4.7% (4 datapoints) | 13293 | 4.8% | potential IMPROVEMENT |
| search-numeric-sortby-desc | 12631 +- 4.6% (4 datapoints) | 13052 | 3.3% | potential IMPROVEMENT |
| search-numeric-sortby-optimize | 466 +- 8.9% (4 datapoints) | 488 | 4.6% | waterline=8.9%. potential IMPROVEMENT |
| search-persist-doc-1000-seconds | 35 +- 0.4% (3 datapoints) | 35 | 0.1% | No Change |

fcostaoliveira added a commit that referenced this pull request May 7, 2026
Brings the four new oss-standalone specs onto this branch so master
baseline and this PR head exercise the same set, enabling direct
master-vs-#9356 comparisons on the doc-expiration fast path:

- search-persist-doc-1000-seconds: PERSIST notification branch (previously unbenched)
- search-expire-doc-multi-index-10-milliseconds: 3-index fan-out signal amplifier
- search-expire-doc-50-50-10-milliseconds: 50/50 write-ratio variant
- search-expire-doc-json-10-milliseconds: JSON branch (Document_LoadSchemaFieldJson)

No metadata changes to the four pre-existing expire/numeric-field specs.
paulorsousa and others added 2 commits May 7, 2026 18:45
The four expire/persist benchmark specs added in this PR push to
RedisTimeSeries via redisbench-admin's exporter, which forwards
metadata.use_case as a TS.CREATE label value verbatim. The server-side
LABELS parser rejects values containing parentheses, em-dashes,
forward slashes paired with comparators, and similar punctuation —
producing "TSDB: Couldn't parse LABELS" and aborting the headline
Ops/sec timeseries push for the affected tests. Server-side metrics
(commandstats / latencystats) push through a different path and made
it; the headline Ops/sec series did not, leaving redisbench-admin
compare with 0 comparison points for these specs.

Replace the offending characters with plain ASCII while preserving the
intent of each description. No code or workload change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Cherry-pick from joan-expire-not-fully-reindex covered the 4 specs that
also live on PR #9356; this commit applies the same character-class
sanitization to the 4 specs that only exist here (-10-milliseconds,
-1000-seconds, -numeric-field-10-milliseconds,
-numeric-field-1000-seconds). Same root cause: parens, em-dashes,
forward slashes and commas in metadata.use_case make the RTS LABELS
parser reject the TS.CREATE call and abort the headline Ops/sec push.

No code or workload change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
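
The label sanitization described in the two commit messages above can be sketched as a small helper. This is an illustration only: the function name and the replacement choices are hypothetical, the character set is limited to the characters the commit messages name (parentheses, em-dashes, forward slashes, commas), and the actual replacements applied in the commits are not shown in this conversation.

```python
import re

# Characters the commit messages name as rejected by the LABELS parser.
# The replacement strings are an assumption made for this sketch.
_REPLACEMENTS = {
    "\u2014": "-",  # em-dash
    "/": " ",
    ",": " ",
    "(": " ",
    ")": " ",
}


def sanitize_use_case(text):
    """Return a plain-ASCII version of a metadata.use_case string."""
    for bad, good in _REPLACEMENTS.items():
        text = text.replace(bad, good)
    # Drop any remaining non-ASCII characters, then collapse runs of whitespace.
    text = text.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"\s+", " ", text).strip()
```

Running such a helper over every `use_case` value before a spec is committed would catch the problem before the exporter forwards the string as a label value.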
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ fcostaoliveira
❌ paulorsousa

@paulorsousa paulorsousa force-pushed the bench/expire-persist-coverage branch from 8d14d1a to 484bced Compare May 8, 2026 10:20
sonarqubecloud Bot commented May 8, 2026
