tests(benchmarks): expire/persist coverage + group metadata #9380
fcostaoliveira wants to merge 3 commits into
Add four new oss-standalone-only benchmark specs targeting the doc-expiration fast path introduced in PR #9356:

- search-persist-doc-1000-seconds: covers the PERSIST keyspace-notification branch (previously unbenched).
- search-expire-doc-multi-index-10-milliseconds: three FT indexes on the same prefix to amplify the per-spec fan-out delta between full reindex and the new metadata-only path.
- search-expire-doc-50-50-10-milliseconds: high (50/50) write-ratio variant of the existing 5/95 PEXPIRE bench for clean signal above the m5/m7i variance floor.
- search-expire-doc-json-10-milliseconds: JSON variant covering the Document_LoadSchemaFieldJson side of the GetKeyExpirationTime helper.

All four use a deterministic catch-all FT.SEARCH query, disable active expiration, and bump per-thread connection count so they sustain >1000 QPS with low re-trigger variance.

Also tag the existing four expire specs (and the four new ones) with a shared metadata.group: "expire-persist" plus a per-spec use_case string so this evaluation group can be selected as a unit.
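A minimal sketch of how the shared group tag could be used to select these specs as a unit. It assumes each spec has already been parsed into a dictionary with a top-level `name` key; the `select_group` helper and the sample dictionaries are hypothetical, and only the `metadata.group` and `use_case` keys come from the commit message above.

```python
from typing import Iterable


def select_group(specs: Iterable[dict], group: str) -> list[str]:
    """Return the sorted names of specs whose metadata.group equals `group`."""
    selected = []
    for spec in specs:
        metadata = spec.get("metadata") or {}
        if metadata.get("group") == group:
            selected.append(spec["name"])
    return sorted(selected)


# Hypothetical, already-parsed specs (use_case values are illustrative only).
specs = [
    {"name": "search-persist-doc-1000-seconds",
     "metadata": {"group": "expire-persist", "use_case": "persist notification branch"}},
    {"name": "search-expire-doc-json-10-milliseconds",
     "metadata": {"group": "expire-persist", "use_case": "json doc expiration"}},
    {"name": "search-geo", "metadata": {}},
]

print(select_group(specs, "expire-persist"))
```

Specs without a `metadata` block, or with a different group, are simply skipped, so untagged benchmarks are unaffected by the selection.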
🛡️ Jit Security Scan Results
✅ No security findings were detected in this PR
Security scan by Jit
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit e64afd9.
clientconfig:
  benchmark_type: "mixed"
  tool: memtier_benchmark
  arguments: "--test-time 180 -c 32 -t 4 --hide-histogram --key-prefix 'doc:single' --key-minimum 1 --key-maximum 10000 --command 'FT.SEARCH idx:single * NOCONTENT LIMIT 0 1' --command-ratio 95 --command 'PEXPIRE __key__ 10' --command-ratio 5"
JSON benchmark key-prefix missing trailing colon separator
High Severity
The --key-prefix 'doc:single' in memtier arguments generates keys like doc:single1, doc:single2, etc. However, the dataset loaded from the CSV almost certainly uses doc:single:N format (with a colon separator before the number), as evidenced by the test code in test_json_multi_numeric.py which consistently creates keys as doc:single:{N}. All other benchmarks in this PR use a colon-terminated prefix ('idx10:'), matching the idx10:N key format. The missing trailing colon means every PEXPIRE targets a non-existent key (returning 0), so no keyspace notification fires and the doc-expiration fast path is never exercised — completely defeating the benchmark's purpose.
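The mismatch can be illustrated with a small sketch. It assumes, as the comment above describes, that the load generator builds each key by appending the key number directly to `--key-prefix`, and that the dataset keys follow the `doc:single:N` format; the `memtier_keys` helper is hypothetical and only models that concatenation.

```python
def memtier_keys(prefix: str, key_min: int, key_max: int) -> set[str]:
    """Keys the load generator would target: prefix immediately followed by the number."""
    return {f"{prefix}{n}" for n in range(key_min, key_max + 1)}


# Keys assumed to exist in the loaded dataset (doc:single:N format).
dataset = {f"doc:single:{n}" for n in range(1, 10001)}

without_colon = memtier_keys("doc:single", 1, 10000)
with_colon = memtier_keys("doc:single:", 1, 10000)

print(len(without_colon & dataset))  # keys that exist when the prefix is used as written
print(len(with_colon & dataset))     # keys that exist with a trailing colon
```

Under these assumptions the prefix as written matches no dataset key, while the colon-terminated prefix matches every key in the configured range.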
Codecov Report
✅ All modified and coverable lines are covered by tests. Please upload reports for the commit 484bced to get more accurate results.
Additional details and impacted files
@@ Coverage Diff @@
## master #9380 +/- ##
==========================================
+ Coverage 81.30% 81.67% +0.36%
==========================================
Files 492 501 +9
Lines 66927 68114 +1187
Branches 23562 24625 +1063
==========================================
+ Hits 54414 55630 +1216
+ Misses 12274 12246 -28
+ Partials 239 238 -1
Automated performance analysis summary
This comment was automatically generated given there is performance data available.
Environment:
Architecture:
| Test Case | Baseline master (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| search-aggregate-post-filter-simple.yml | 12666 +- 1.0% (20 datapoints) | 17820 | 40.7% | IMPROVEMENT |
| ftsb-10K-enwiki_abstract-hashes-fulltext-sortby | 1130 +- 3.1% (20 datapoints) | 1514 | 34.0% | IMPROVEMENT |
| ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query | 8167 +- 4.3% (20 datapoints) | 10272 | 25.8% | IMPROVEMENT |
| search-numeric-sortby-optimize | 424 +- 6.6% (20 datapoints) | 534 | 25.8% | IMPROVEMENT |
| ftsb-1M-nyc_taxis-hashes-load | 27683 +- 3.1% (20 datapoints) | 34723 | 25.4% | IMPROVEMENT |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query-non-sortable | 1209 +- 5.8% (20 datapoints) | 1479 | 22.3% | IMPROVEMENT |
| ftsb-10K-enwiki_abstract-hashes-term-prefix | 9461 +- 2.2% (20 datapoints) | 11221 | 18.6% | IMPROVEMENT |
| search-ftsb-1M-enwiki_abstract-hashes-gc | 118 +- 9.9% (20 datapoints) | 140 | 18.5% | waterline=9.9%. IMPROVEMENT |
| search-high-cardinality-negation-term-baseline | 305 +- 1.1% (20 datapoints) | 353 | 15.5% | IMPROVEMENT |
| search-ftsb-370K-docs-union-iterators-q4 | 34 +- 1.2% (20 datapoints) | 39 | 14.8% | IMPROVEMENT |
| search-numeric | 16244 +- 8.2% (20 datapoints) | 18013 | 10.9% | waterline=8.2%. IMPROVEMENT |
| vecsim-arxiv-titles-384-angular-filters-m16-ef-128-numeric-filter | 2625 +- 5.6% (20 datapoints) | 2875 | 9.5% | IMPROVEMENT |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query | 4350 +- 5.8% (20 datapoints) | 4727 | 8.7% | IMPROVEMENT |
| search-expire-numeric-field-10-milliseconds | 14 +- 1.8% (20 datapoints) | 15 | 8.3% | IMPROVEMENT |
Performance Regressions and Issues - Comparison between master and bench/expire-persist-coverage.
Time Period from a month ago. (environment used: oss-standalone)
| Test Case | Baseline master (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| search-ftsb-arxiv-titles-384-angular-filters-m16-ef-128-json-load | 5066 +- 5.0% (20 datapoints) | 3679.0 | -27.4% | REGRESSION |
| ftsb-10K-multivalue-numeric-json | 7090 +- 6.7% (20 datapoints) | 5227.0 | -26.3% | REGRESSION |
| search-ftsb-10K-enwiki_abstract-hashes-term-withsuffix-trie | 13769 +- 5.2% (20 datapoints) | 10246.0 | -25.6% | REGRESSION |
| search-numeric-sortby | 16677 +- 9.1% (20 datapoints) | 13293.0 | -20.3% | waterline=9.1%. REGRESSION |
| ftsb-1M-nyc_taxis-ftadd-load | 33137 +- 5.4% (20 datapoints) | 27066.0 | -18.3% | REGRESSION |
| hybrid-arxiv-titles-384-angular-linear-numeric-vector | 1577 +- 10.2% UNSTABLE (20 datapoints) | 1381.0 | -12.4% | UNSTABLE (baseline high variance); server: FT.HYBRID p50 increased 21.1% (baseline CV=15.8%); client: Latency increased 14.2% (baseline CV=10.2%); confidence=LOW (FT.HYBRID baseline CV=12.4%; FT.HYBRID p99 +4.2% (stable baseline, minor change); CV=coefficient of variation (data stability: <30% stable, 30-50% moderate, >50% unstable)) |
| search-ftsb-10K-enwiki_abstract-hashes-fulltext-aggregate-sortby-limit-0-100 | 5705 +- 4.2% (20 datapoints) | 5002.0 | -12.3% | REGRESSION |
| ftsb-10K-enwiki_pages-hashes-fulltext-mixed_simple-1word-query_write_1_to_read_20.yml | 1681 +- 4.5% (20 datapoints) | 1487.0 | -11.5% | REGRESSION |
| hybrid-arxiv-titles-384-angular-rrf-text-vector | 1590 +- 10.7% UNSTABLE (20 datapoints) | 1418.0 | -10.8% | UNSTABLE (baseline high variance); server: FT.HYBRID p50 increased 19.0% (baseline CV=15.8%); client: Latency increased 14.7% (baseline CV=10.8%); confidence=LOW (FT.HYBRID baseline CV=13.3%; FT.HYBRID p99 +5.3% (stable baseline, minor change); CV=coefficient of variation (data stability: <30% stable, 30-50% moderate, >50% unstable)) |
| ftsb-1K-enwiki_abstract-hashes-term-contains | 11258 +- 4.8% (20 datapoints) | 10150.0 | -9.8% | REGRESSION |
| search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-non-sortable | 180 +- 11.6% UNSTABLE (20 datapoints) | 174.0 | -3.6% | UNSTABLE (baseline high variance); server: FT.SEARCH p50 increased 28.5% (baseline CV=13.2%); client: OverallQuantiles.allCommands.q50 increased 26.6% (baseline CV=12.8%) |
| hybrid-arxiv-titles-384-angular-rrf-tag-range | 4.2 +- 12.5% UNSTABLE (20 datapoints) | 4.1 | -3.3% | UNSTABLE (baseline high variance); server: p50 latency stable; client: client latency stable; neither server nor client side confirms regression |
| search-high-cardinality-negation-term-comparison_union_all_other_terms | 211 +- 14.7% UNSTABLE (20 datapoints) | 207.0 | -1.9% | UNSTABLE (baseline high variance); server: FT.SEARCH p50 decreased 28.0% (baseline CV=1.1%); client: client latency stable; neither server nor client side confirms regression |
| search-numeric-sortby-desc-optimize | 524 +- 10.4% UNSTABLE (20 datapoints) | 520.0 | -0.7% | UNSTABLE (baseline high variance); server: FT.SEARCH p50 decreased 6.3% (baseline CV=7.0%); client: Latency decreased 5.4% (baseline CV=6.9%); neither server nor client side confirms regression |
| search-expire-doc-1000-seconds | 17 +- 10.4% UNSTABLE (20 datapoints) | 19.0 | 14.3% | UNSTABLE (baseline high variance); server: p50 latency stable; client: Latency decreased 20.2% (baseline CV=1.5%); neither server nor client side confirms regression |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query-non-sortable | 34 +- 24.7% UNSTABLE (20 datapoints) | 52.0 | 52.0% | UNSTABLE (baseline high variance); server: p50 latency stable; client: OverallQuantiles.allCommands.q50 decreased 6.1% (baseline CV=2.2%); neither server nor client side confirms regression |
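The notes in the table above classify data stability by coefficient of variation (CV). A minimal sketch of that classification follows, assuming CV is the sample standard deviation divided by the mean and using the thresholds quoted in the notes (<30% stable, 30-50% moderate, >50% unstable). The function names and sample datapoints are hypothetical, and the tool may compute CV differently.

```python
import statistics


def coefficient_of_variation(datapoints: list[float]) -> float:
    """CV in percent: sample standard deviation relative to the mean (assumed definition)."""
    mean = statistics.fmean(datapoints)
    return 100.0 * statistics.stdev(datapoints) / mean


def stability(cv_percent: float) -> str:
    """Map a CV percentage to the stability bands quoted in the table notes."""
    if cv_percent < 30.0:
        return "stable"
    if cv_percent <= 50.0:
        return "moderate"
    return "unstable"


# Illustrative datapoints, not taken from the benchmark runs.
samples = [100.0, 110.0, 90.0, 105.0, 95.0]
cv = coefficient_of_variation(samples)
print(f"CV={cv:.1f}% -> {stability(cv)}")
```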
Tests with No Significant Changes (23 tests)
| Test Case | Baseline master (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| ftsb-10K-enwiki_abstract-hashes-term-suffix | 13949 +- 5.6% (20 datapoints) | 13798.0 | -1.1% | No Change |
| ftsb-10K-enwiki_abstract-hashes-term-suffix-withsuffixtrie | 16311 +- 5.4% (20 datapoints) | 15188.0 | -6.9% | potential REGRESSION |
| ftsb-10K-enwiki_abstract-hashes-term-wildcard | 14028 +- 5.7% (20 datapoints) | 13060.0 | -6.9% | potential REGRESSION |
| ftsb-10K-enwiki_pages-hashes-load | 66491 +- 8.5% (20 datapoints) | 61457.0 | -7.6% | waterline=8.5%. potential REGRESSION |
| ftsb-10K-singlevalue-numeric-json | 3803 +- 4.6% (20 datapoints) | 3877.0 | 1.9% | No Change |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query | 9408 +- 1.7% (20 datapoints) | 9748.0 | 3.6% | potential IMPROVEMENT |
| ftsb-1M-enwiki_abstract-hashes-load | 23668 +- 7.1% (20 datapoints) | 23895.0 | 1.0% | No Change |
| hybrid-arxiv-titles-384-angular-linear-text-range | 3.9 +- 9.1% (20 datapoints) | 3.9 | -1.3% | waterline=9.1%. No Change |
| search-expire-doc-10-milliseconds | 17 +- 8.8% (20 datapoints) | 16.0 | -8.7% | waterline=8.8%. potential REGRESSION |
| search-expire-numeric-field-1000-seconds | 14 +- 2.4% (20 datapoints) | 14.0 | 3.5% | potential IMPROVEMENT |
| search-filtering-tag-numeric | 4097 +- 9.4% (20 datapoints) | 3785.0 | -7.6% | waterline=9.4%. potential REGRESSION |
| search-filtering-tag-numeric-filter-pipeline | 11269 +- 4.9% (20 datapoints) | 11222.0 | -0.4% | No Change |
| search-ftsb-10K-enwiki_abstract-hashes-fulltext-search-sortby-limit-0-100 | 5608 +- 4.2% (20 datapoints) | 5564.0 | -0.8% | No Change |
| search-ftsb-10K-enwiki_abstract-hashes-term-withoutsuffix-trie | 13718 +- 5.6% (20 datapoints) | 13152.0 | -4.1% | potential REGRESSION |
| search-ftsb-1700K-docs-union-iterators-q3 | 32 +- 0.5% (20 datapoints) | 35.0 | 7.5% | potential IMPROVEMENT |
| search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-one-indexed-field | 13202 +- 7.6% (20 datapoints) | 12499.0 | -5.3% | potential REGRESSION |
| search-ftsb-5200K-docs-union-iterators-q1 | 4.2 +- 6.7% (20 datapoints) | 3.9 | -7.4% | potential REGRESSION |
| search-ftsb-5500K-docs-union-iterators-q2 | 5.6 +- 6.0% (20 datapoints) | 5.8 | 2.3% | No Change |
| search-geo | 3237 +- 4.1% (20 datapoints) | 3225.0 | -0.4% | No Change |
| search-numeric-optimize | 8888 +- 9.3% (20 datapoints) | 8075.0 | -9.1% | waterline=9.3%. potential REGRESSION |
| search-numeric-sortby-desc | 12400 +- 4.3% (20 datapoints) | 13052.0 | 5.3% | potential IMPROVEMENT |
| vecsim-arxiv-titles-384-angular-filters-m16-ef-128-fulltext-filter | 7391 +- 2.6% (20 datapoints) | 7562.0 | 2.3% | No Change |
| vecsim-arxiv-titles-384-angular-filters-m16-ef-128-tag-filter | 15211 +- 6.2% (20 datapoints) | 14466.0 | -4.9% | potential REGRESSION |
Architecture: aarch64 — branch-over-branch
Deployment: oss-standalone
In summary:
- Detected a total of 40 stable tests between versions.
- Detected a total of 2 highly unstable benchmarks (2 baseline).
- Detected a total of 3 improvements above the improvement water line.
- Detected a total of 1 regression below the regression water line 8.0%.
You can check a comparison in detail via the grafana link
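The labels used in these tables can be reproduced approximately with the sketch below. It is inferred from the rows shown in this comment rather than taken from redisbench-admin: it assumes the effective water line is the larger of the 8.0% default and the baseline standard deviation (which matches the `waterline=9.3%` style notes), and that changes between an assumed 3.0% noise floor and the water line are reported as "potential". The function names and the 3.0% value are hypothetical.

```python
def pct_change(baseline: float, comparison: float) -> float:
    """Percent change for a higher-is-better metric."""
    return (comparison - baseline) / baseline * 100.0


def classify(baseline: float, baseline_std_pct: float, comparison: float,
             default_waterline: float = 8.0, noise_floor: float = 3.0) -> str:
    """Label a result relative to the water line (assumed rules, see lead-in)."""
    change = pct_change(baseline, comparison)
    waterline = max(default_waterline, baseline_std_pct)
    direction = "IMPROVEMENT" if change > 0 else "REGRESSION"
    if abs(change) >= waterline:
        return direction
    if abs(change) >= noise_floor:
        return f"potential {direction}"
    return "No Change"


# Rows taken from the tables in this comment.
print(round(pct_change(12666, 17820), 1), classify(12666, 1.0, 17820))
print(round(pct_change(5066, 3679), 1), classify(5066, 5.0, 3679))
print(classify(8888, 9.3, 8075))
```

Because the tables display rounded medians, recomputing the change from the displayed values does not match every row exactly.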
Performance Improvements - Comparison between master and bench/expire-persist-coverage.
Time Period from a month ago. (environment used: oss-standalone)
| Test Case | Baseline master (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| search-ftsb-1M-enwiki_abstract-hashes-gc | 118 +- 9.9% (20 datapoints) | 157 | 32.8% | waterline=9.9%. IMPROVEMENT |
| search-numeric-sortby-optimize | 424 +- 6.6% (20 datapoints) | 488 | 14.9% | IMPROVEMENT |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query-non-sortable | 1209 +- 5.8% (20 datapoints) | 1346 | 11.3% | IMPROVEMENT |
Performance Regressions and Issues - Comparison between master and bench/expire-persist-coverage.
Time Period from a month ago. (environment used: oss-standalone)
| Test Case | Baseline master (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| search-numeric-sortby-desc-optimize | 492 +- 7.0% (20 datapoints) | 443 | -9.9% | REGRESSION |
| search-filtering-tag-numeric | 3600 +- 10.6% UNSTABLE (20 datapoints) | 4161 | 15.6% | UNSTABLE (baseline high variance); server: FT.AGGREGATE p50 decreased 13.5% (baseline CV=13.5%); client: Latency decreased 13.5% (baseline CV=9.8%); neither server nor client side confirms regression |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query-non-sortable | 34 +- 24.7% UNSTABLE (20 datapoints) | 46 | 34.5% | UNSTABLE (baseline high variance); server: p50 latency stable; client: client latency stable; neither server nor client side confirms regression |
Tests with No Significant Changes (40 tests)
| Test Case | Baseline master (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| ftsb-10K-enwiki_abstract-hashes-fulltext-sortby | 1130 +- 3.1% (20 datapoints) | 1165.0 | 3.1% | potential IMPROVEMENT |
| ftsb-10K-enwiki_abstract-hashes-term-prefix | 9461 +- 2.2% (20 datapoints) | 9893.0 | 4.6% | potential IMPROVEMENT |
| ftsb-10K-enwiki_abstract-hashes-term-suffix | 10715 +- 1.2% (20 datapoints) | 10929.0 | 2.0% | No Change |
| ftsb-10K-enwiki_abstract-hashes-term-suffix-withsuffixtrie | 11541 +- 0.9% (20 datapoints) | 11549.0 | 0.1% | No Change |
| ftsb-10K-enwiki_abstract-hashes-term-wildcard | 10849 +- 1.2% (20 datapoints) | 11098.0 | 2.3% | No Change |
| ftsb-10K-enwiki_pages-hashes-fulltext-mixed_simple-1word-query_write_1_to_read_20.yml | 1681 +- 4.5% (20 datapoints) | 1677.0 | -0.2% | No Change |
| ftsb-10K-enwiki_pages-hashes-load | 60396 +- 6.5% (20 datapoints) | 61457.0 | 1.8% | No Change |
| ftsb-10K-multivalue-numeric-json | 5110 +- 1.7% (20 datapoints) | 5227.0 | 2.3% | No Change |
| ftsb-10K-singlevalue-numeric-json | 3015 +- 0.9% (20 datapoints) | 3014.0 | -0.1% | No Change |
| ftsb-1K-enwiki_abstract-hashes-term-contains | 9144 +- 1.7% (20 datapoints) | 9328.0 | 2.0% | No Change |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query | 4350 +- 5.8% (20 datapoints) | 4490.0 | 3.2% | potential IMPROVEMENT |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query | 9408 +- 1.7% (20 datapoints) | 9748.0 | 3.6% | potential IMPROVEMENT |
| ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query | 8167 +- 4.3% (20 datapoints) | 8332.0 | 2.0% | No Change |
| ftsb-1M-enwiki_abstract-hashes-load | 23668 +- 7.1% (20 datapoints) | 23895.0 | 1.0% | No Change |
| ftsb-1M-nyc_taxis-ftadd-load | 25940 +- 3.0% (20 datapoints) | 27066.0 | 4.3% | potential IMPROVEMENT |
| ftsb-1M-nyc_taxis-hashes-load | 27683 +- 3.1% (20 datapoints) | 28629.0 | 3.4% | potential IMPROVEMENT |
| search-aggregate-post-filter-simple.yml | 12666 +- 1.0% (20 datapoints) | 12723.0 | 0.5% | No Change |
| search-expire-doc-10-milliseconds | 14 +- 2.2% (20 datapoints) | 14.0 | -1.0% | No Change |
| search-expire-doc-1000-seconds | 14 +- 1.5% (20 datapoints) | 14.0 | -3.2% | potential REGRESSION |
| search-expire-numeric-field-10-milliseconds | 14 +- 1.8% (20 datapoints) | 14.0 | -0.3% | No Change |
| search-expire-numeric-field-1000-seconds | 14 +- 2.4% (20 datapoints) | 14.0 | 3.5% | potential IMPROVEMENT |
| search-filtering-tag-numeric-filter-pipeline | 9302 +- 0.9% (20 datapoints) | 9397.0 | 1.0% | No Change |
| search-ftsb-10K-enwiki_abstract-hashes-fulltext-aggregate-sortby-limit-0-100 | 4968 +- 2.9% (20 datapoints) | 5002.0 | 0.7% | No Change |
| search-ftsb-10K-enwiki_abstract-hashes-fulltext-search-sortby-limit-0-100 | 4897 +- 2.3% (20 datapoints) | 5026.0 | 2.6% | No Change |
| search-ftsb-10K-enwiki_abstract-hashes-term-withoutsuffix-trie | 10396 +- 1.2% (20 datapoints) | 10375.0 | -0.2% | No Change |
| search-ftsb-10K-enwiki_abstract-hashes-term-withsuffix-trie | 10351 +- 1.2% (20 datapoints) | 10246.0 | -1.0% | No Change |
| search-ftsb-1700K-docs-union-iterators-q3 | 32 +- 0.5% (20 datapoints) | 32.0 | 0.4% | No Change |
| search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-non-sortable | 167 +- 7.1% (20 datapoints) | 174.0 | 4.1% | potential IMPROVEMENT |
| search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-one-indexed-field | 10339 +- 1.6% (20 datapoints) | 10632.0 | 2.8% | No Change |
| search-ftsb-370K-docs-union-iterators-q4 | 34 +- 1.2% (20 datapoints) | 34.0 | -0.5% | No Change |
| search-ftsb-5200K-docs-union-iterators-q1 | 3.5 +- 1.8% (20 datapoints) | 3.5 | 1.3% | No Change |
| search-ftsb-5500K-docs-union-iterators-q2 | 5.3 +- 3.2% (20 datapoints) | 5.1 | -4.6% | potential REGRESSION |
| search-ftsb-arxiv-titles-384-angular-filters-m16-ef-128-json-load | 3552 +- 3.3% (20 datapoints) | 3679.0 | 3.6% | potential IMPROVEMENT |
| search-geo | 2649 +- 5.9% (20 datapoints) | 2661.0 | 0.4% | No Change |
| search-high-cardinality-negation-term-baseline | 305 +- 1.1% (20 datapoints) | 299.0 | -2.0% | No Change |
| search-high-cardinality-negation-term-comparison_union_all_other_terms | 158 +- 1.2% (20 datapoints) | 155.0 | -1.8% | No Change |
| search-numeric | 12380 +- 4.0% (20 datapoints) | 12370.0 | -0.1% | No Change |
| search-numeric-optimize | 7418 +- 1.2% (20 datapoints) | 7452.0 | 0.5% | No Change |
| search-numeric-sortby | 13153 +- 4.0% (20 datapoints) | 13293.0 | 1.1% | No Change |
| search-numeric-sortby-desc | 12400 +- 4.3% (20 datapoints) | 13052.0 | 5.3% | potential IMPROVEMENT |
Cross-arch delta on bench/expire-persist-coverage (x86_64 → aarch64)
Same commit (`bench/expire-persist-coverage`) compared across architectures. Positive deltas = `aarch64` outperforms `x86_64`.
In summary:
- Detected a total of 17 stable tests between versions.
- Detected a total of 6 highly unstable benchmarks (6 baseline).
- Latency analysis confirmed regressions in 1 of the unstable tests:
- ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query: FT.SEARCH +19.3% 🔴
- Detected a total of 1 improvements above the improvement water line.
- Detected a total of 26 regressions below the regression water line 8.0%.
You can check a comparison in detail via the grafana link
Performance Improvements - Comparison between bench/expire-persist-coverage and bench/expire-persist-coverage.
Time Period from a month ago. (environment used: oss-standalone)
| Test Case | Baseline bench/expire-persist-coverage (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| search-ftsb-1M-enwiki_abstract-hashes-gc | 140 +- 4.1% (4 datapoints) | 157 | 11.8% | IMPROVEMENT |
Performance Regressions and Issues - Comparison between bench/expire-persist-coverage and bench/expire-persist-coverage.
Time Period from a month ago. (environment used: oss-standalone)
| Test Case | Baseline bench/expire-persist-coverage (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| search-numeric | 17861 +- 7.9% (4 datapoints) | 12370.0 | -30.7% | REGRESSION |
| search-expire-doc-1000-seconds | 19 +- 12.4% UNSTABLE (3 datapoints) | 14.0 | -29.7% | UNSTABLE (baseline high variance); server: p50 latency stable; client: Latency increased 29.4% (baseline CV=8.3%); only client side confirms regression (server side stable) - insufficient evidence |
| search-aggregate-post-filter-simple.yml | 17928 +- 3.6% (4 datapoints) | 12723.0 | -29.0% | REGRESSION |
| search-expire-doc-json-10-milliseconds | 18368 +- 3.6% (4 datapoints) | 13290.0 | -27.6% | REGRESSION |
| ftsb-10K-enwiki_abstract-hashes-fulltext-sortby | 1606 +- 5.7% (4 datapoints) | 1165.0 | -27.5% | REGRESSION |
| ftsb-10K-enwiki_abstract-hashes-term-suffix-withsuffixtrie | 15873 +- 2.7% (4 datapoints) | 11549.0 | -27.2% | REGRESSION |
| search-high-cardinality-negation-term-comparison_union_all_other_terms | 207 +- 6.4% (4 datapoints) | 155.0 | -25.1% | REGRESSION |
| search-expire-numeric-field-10-milliseconds | 18 +- 10.6% UNSTABLE (3 datapoints) | 14.0 | -23.6% | UNSTABLE (baseline high variance); server: p50 latency stable; client: client latency stable; neither server nor client side confirms regression |
| search-numeric-sortby-desc-optimize | 571 +- 8.2% (3 datapoints) | 443.0 | -22.4% | waterline=8.2%. REGRESSION |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query | 5776 +- 11.1% UNSTABLE (4 datapoints) | 4490.0 | -22.3% | UNSTABLE (baseline high variance); server: FT.SEARCH p50 increased 19.3% (baseline CV=15.4%); client: OverallQuantiles.allCommands.q50 increased 28.0% (baseline CV=11.8%); confidence=HIGH (FT.SEARCH baseline CV=20.7%; FT.SEARCH p99 +22.6% (stable baseline); CV=coefficient of variation (data stability: <30% stable, 30-50% moderate, >50% unstable)) |
| search-ftsb-10K-enwiki_abstract-hashes-term-withoutsuffix-trie | 13361 +- 4.9% (4 datapoints) | 10375.0 | -22.3% | REGRESSION |
| ftsb-10K-enwiki_abstract-hashes-term-suffix | 14031 +- 3.4% (4 datapoints) | 10929.0 | -22.1% | REGRESSION |
| search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-one-indexed-field | 13535 +- 8.0% (4 datapoints) | 10632.0 | -21.5% | REGRESSION |
| ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query | 10476 +- 5.3% (4 datapoints) | 8332.0 | -20.5% | REGRESSION |
| ftsb-10K-singlevalue-numeric-json | 3742 +- 3.8% (4 datapoints) | 3014.0 | -19.5% | REGRESSION |
| ftsb-10K-enwiki_abstract-hashes-term-wildcard | 13730 +- 3.0% (4 datapoints) | 11098.0 | -19.2% | REGRESSION |
| ftsb-1M-nyc_taxis-hashes-load | 35412 +- 3.1% (4 datapoints) | 28629.0 | -19.2% | REGRESSION |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-intersection-query-non-sortable | 56 +- 28.0% UNSTABLE (4 datapoints) | 46.0 | -18.1% | UNSTABLE (baseline high variance); server: p50 latency stable; client: client latency stable; neither server nor client side confirms regression |
| search-geo | 3232 +- 6.4% (4 datapoints) | 2661.0 | -17.7% | REGRESSION |
| search-filtering-tag-numeric-filter-pipeline | 11349 +- 4.2% (4 datapoints) | 9397.0 | -17.2% | REGRESSION |
| ftsb-1K-enwiki_abstract-hashes-term-contains | 10973 +- 6.1% (4 datapoints) | 9328.0 | -15.0% | REGRESSION |
| search-expire-doc-10-milliseconds | 16 +- 3.0% (3 datapoints) | 14.0 | -14.4% | REGRESSION |
| search-ftsb-5200K-docs-union-iterators-q1 | 4.1 +- 3.1% (4 datapoints) | 3.5 | -14.0% | REGRESSION |
| search-ftsb-1700K-docs-union-iterators-q3 | 37 +- 4.4% (4 datapoints) | 32.0 | -13.8% | REGRESSION |
| search-high-cardinality-negation-term-baseline | 347 +- 5.9% (4 datapoints) | 299.0 | -13.8% | REGRESSION |
| ftsb-10K-enwiki_abstract-hashes-term-prefix | 11387 +- 7.0% (4 datapoints) | 9893.0 | -13.1% | REGRESSION |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query-non-sortable | 1534 +- 14.0% UNSTABLE (4 datapoints) | 1346.0 | -12.3% | UNSTABLE (baseline high variance); server: p50 latency stable; client: OverallQuantiles.allCommands.q50 increased 16.8% (baseline CV=13.0%); only client side confirms regression (server side stable) - insufficient evidence |
| search-numeric-optimize | 8473 +- 6.8% (3 datapoints) | 7452.0 | -12.1% | REGRESSION |
| search-ftsb-5500K-docs-union-iterators-q2 | 5.8 +- 2.8% (4 datapoints) | 5.1 | -11.9% | REGRESSION |
| search-ftsb-370K-docs-union-iterators-q4 | 37 +- 5.9% (4 datapoints) | 34.0 | -10.5% | REGRESSION |
| search-ftsb-10K-enwiki_abstract-hashes-fulltext-search-sortby-limit-0-100 | 5610 +- 2.9% (4 datapoints) | 5026.0 | -10.4% | REGRESSION |
| search-filtering-tag-numeric | 3671 +- 11.2% UNSTABLE (4 datapoints) | 4161.0 | 13.3% | UNSTABLE (baseline high variance); server: FT.AGGREGATE p50 decreased 11.4% (baseline CV=15.6%); client: Latency decreased 11.9% (baseline CV=10.3%); neither server nor client side confirms regression |
Tests with No Significant Changes (17 tests)
| Test Case | Baseline bench/expire-persist-coverage (median obs. +- std.dev) | Comparison bench/expire-persist-coverage (median obs. +- std.dev) | % change (higher-better) | Note |
|---|---|---|---|---|
| ftsb-10K-enwiki_pages-hashes-fulltext-mixed_simple-1word-query_write_1_to_read_20.yml | 1750 +- 9.0% (4 datapoints) | 1677 | -4.1% | waterline=9.0%. potential REGRESSION |
| ftsb-10K-enwiki_pages-hashes-load | 57930 +- 6.3% (4 datapoints) | 61457 | 6.1% | potential IMPROVEMENT |
| ftsb-10K-multivalue-numeric-json | 5065 +- 2.6% (4 datapoints) | 5227 | 3.2% | potential IMPROVEMENT |
| ftsb-1M-enwiki_abstract-hashes-fulltext-2word-union-query | 9566 +- 2.6% (4 datapoints) | 9748 | 1.9% | No Change |
| ftsb-1M-enwiki_abstract-hashes-load | 22821 +- 5.1% (4 datapoints) | 23895 | 4.7% | potential IMPROVEMENT |
| ftsb-1M-nyc_taxis-ftadd-load | 26382 +- 4.9% (4 datapoints) | 27066 | 2.6% | No Change |
| search-expire-doc-50-50-10-milliseconds | 50 +- 0.4% (3 datapoints) | 50 | 0.3% | No Change |
| search-expire-doc-multi-index-10-milliseconds | 32 +- 0.1% (4 datapoints) | 32 | -0.1% | No Change |
| search-expire-numeric-field-1000-seconds | 14 +- 1.9% (3 datapoints) | 14 | 1.8% | No Change |
| search-ftsb-10K-enwiki_abstract-hashes-fulltext-aggregate-sortby-limit-0-100 | 4934 +- 2.6% (4 datapoints) | 5002 | 1.4% | No Change |
| search-ftsb-10K-enwiki_abstract-hashes-term-withsuffix-trie | 10277 +- 0.6% (4 datapoints) | 10246 | -0.3% | No Change |
| search-ftsb-1M-enwiki_abstract-hashes-fulltext-simple-1word-query-non-sortable | 165 +- 5.1% (4 datapoints) | 174 | 5.4% | potential IMPROVEMENT |
| search-ftsb-arxiv-titles-384-angular-filters-m16-ef-128-json-load | 3638 +- 1.4% (4 datapoints) | 3679 | 1.1% | No Change |
| search-numeric-sortby | 12679 +- 4.7% (4 datapoints) | 13293 | 4.8% | potential IMPROVEMENT |
| search-numeric-sortby-desc | 12631 +- 4.6% (4 datapoints) | 13052 | 3.3% | potential IMPROVEMENT |
| search-numeric-sortby-optimize | 466 +- 8.9% (4 datapoints) | 488 | 4.6% | waterline=8.9%. potential IMPROVEMENT |
| search-persist-doc-1000-seconds | 35 +- 0.4% (3 datapoints) | 35 | 0.1% | No Change |
Brings the four new oss-standalone specs onto this branch so master baseline and this PR head exercise the same set, enabling direct master-vs-#9356 comparisons on the doc-expiration fast path:

- search-persist-doc-1000-seconds: PERSIST notification branch (previously unbenched)
- search-expire-doc-multi-index-10-milliseconds: 3-index fan-out signal amplifier
- search-expire-doc-50-50-10-milliseconds: 50/50 write-ratio variant
- search-expire-doc-json-10-milliseconds: JSON branch (Document_LoadSchemaFieldJson)

No metadata changes to the four pre-existing expire/numeric-field specs.
The four expire/persist benchmark specs added in this PR push to RedisTimeSeries via redisbench-admin's exporter, which forwards metadata.use_case as a TS.CREATE label value verbatim. The server-side LABELS parser rejects values containing parentheses, em-dashes, forward slashes paired with comparators, and similar punctuation, producing "TSDB: Couldn't parse LABELS" and aborting the headline Ops/sec timeseries push for the affected tests. Server-side metrics (commandstats / latencystats) push through a different path and made it; the headline Ops/sec series did not, leaving redisbench-admin compare with 0 comparison points for these specs.

Replace the offending characters with plain ASCII while preserving the intent of each description. No code or workload change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Cherry-pick from joan-expire-not-fully-reindex covered the 4 specs that also live on PR #9356; this commit applies the same character-class sanitization to the 4 specs that only exist here (-10-milliseconds, -1000-seconds, -numeric-field-10-milliseconds, -numeric-field-1000-seconds). Same root cause: parens, em-dashes, forward slashes and commas in metadata.use_case make the RTS LABELS parser reject the TS.CREATE call and abort the headline Ops/sec push. No code or workload change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
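A minimal sketch of the kind of sanitization these two commits describe, assuming it is enough to map the punctuation they name (parentheses, em-dashes, forward slashes, commas) to spaces and drop any other non-ASCII characters. The `sanitize_use_case` helper and the sample string are hypothetical; the exact character set the LABELS parser rejects is not established here, and the commits themselves edited the descriptions by hand.

```python
import re

# Characters the commit messages name as problematic in metadata.use_case.
# Mapping each one to a space is an assumption made for this sketch.
_REPLACEMENTS = {
    "\u2014": " ",  # em-dash
    "/": " ",
    ",": " ",
    "(": " ",
    ")": " ",
}


def sanitize_use_case(text: str) -> str:
    """Return a plain-ASCII version of `text` without the listed punctuation."""
    for bad, good in _REPLACEMENTS.items():
        text = text.replace(bad, good)
    # Drop any remaining non-ASCII characters, then collapse whitespace.
    text = text.encode("ascii", errors="ignore").decode("ascii")
    return re.sub(r"\s+", " ", text).strip()


# Illustrative input, not one of the actual use_case strings.
print(sanitize_use_case("PEXPIRE (50/50 write ratio) \u2014 doc fast path, JSON"))
```

A mechanical replacement like this loses some meaning (for example, "50/50" becomes "50 50"), which is one reason to rewrite each description manually while preserving its intent.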
8d14d1a to 484bced
Summary
Extend the expire/persist benchmark group so PR #9356 (doc-expiration fast path), and any future change to the `expire_cmd`/`persist_cmd` notification branches, can be evaluated for both improvements and regressions on the existing on-the-fly EC2 infrastructure.

Adds 4 new specs (all `oss-standalone` only)
- `search-persist-doc-1000-seconds.yml`: `PERSIST` was previously unbenched; this exercises the PERSIST branch of `OnKeySpaceNotification` and the matching fast path.
- `search-expire-doc-multi-index-10-milliseconds.yml`
- `search-expire-doc-50-50-10-milliseconds.yml`
- `search-expire-doc-json-10-milliseconds.yml`: `Document_LoadSchemaFieldJson` and the shared `GetKeyExpirationTime` helper.

All four:
- `FT.SEARCH * NOCONTENT LIMIT 0 1` so query latency is tight and re-triggers on the same branch produce stable throughput,
- `DEBUG SET-ACTIVE-EXPIRE 0` so the dataset never evicts during the test (the fast path is exercised purely as a metadata update),
- `-c 16 -t 4` (or `-c 32 -t 4` for the 10K JSON dataset) to sustain ≥1000 QPS.

Adds metadata grouping (8 files)
Adds `metadata.group: "expire-persist"` and a per-spec `use_case` string to all eight expire/persist specs (the four new ones plus the four existing `search-expire-{doc,numeric-field}-{10-milliseconds,1000-seconds}.yml`). No behavioral changes to the existing specs, only metadata. The numeric-field specs serve as negative controls for the doc-expiration fast path (they exercise `hexpire_cmd`, which is not on the changed code path).

Why a separate PR from #9356
These specs are evaluation infrastructure: they need to land independently of the optimization being measured so baseline numbers can be produced from `master` before #9356 merges.

Test plan
- `redisbench-admin run-remote` against this branch on `oss-standalone` for each new spec: confirm ≥1000 QPS and <5% CV across 3 datapoints.
- `redisbench-admin compare` master-baseline vs PR #9356 ([MOD-14930] refactor expire handling without full reindexing) head on the four doc-level specs (existing + new): should show improvement on the doc-expiration fast path.
- `--enable-profilers PROFILE=1` on `search-expire-doc-multi-index-10-milliseconds` to verify the per-spec write-lock pattern in `Indexes_UpdateMatchingDocExpiration` is not contended.

Note
Low Risk
Low risk: changes are confined to benchmark YAML specs (metadata and new workloads) with no production code impact.

Overview
Adds `metadata.group: "expire-persist"` and a descriptive `use_case` field to the existing expire benchmarks so they can be consistently grouped and understood in reporting.

Introduces four new `oss-standalone` benchmark specs to broaden expire/persist coverage: a `PERSIST` notification workload, a high write-ratio `PEXPIRE` workload, a multi-index fan-out `PEXPIRE` workload, and a JSON document `PEXPIRE` workload (all using deterministic `FT.SEARCH` patterns and disabled active expiration for stable signal).

Reviewed by Cursor Bugbot for commit 8d14d1a. Bugbot is set up for automated code reviews on this repo.