Fix starrocks and doris cheating at cold runs. #845
Both StarRocks and Doris run a long-lived BE daemon with a process-internal
`storage_page_cache` (default ~20% of RAM) that holds decoded column data
across queries. The benchmark's `run.sh` only does
`echo 3 > /proc/sys/vm/drop_caches`, which clears the OS page cache but
does NOT touch the BE's in-process memory. As a result, the "cold run"
(first of three tries) is served from the BE's warm in-memory cache and
underreports cold-run latency - a clear violation of benchmark rules
(README "Caching" section: cold runs require all database caches to be
cleared, not only the OS page cache).
This is effectively cheating: every system with internal in-memory caching
that does not clear it before the first run gets an unearned advantage on
the cold-run leaderboard. Both systems' existing results are already
tagged `lukewarm-cold-run`, but they are still displayed under the cold
metric on the website.
Fix: disable the relevant in-process caches in `be.conf` before starting
the BE, so that all reads must go through the OS page cache (which
`run.sh` does clear).
starrocks/benchmark.sh:

    disable_storage_page_cache = true
    datacache_enable = false   # covers unified Data Cache in v3.3+

doris/benchmark.sh:

    disable_storage_page_cache = true
    segment_cache_capacity = 0
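A minimal sketch of how a setup script might apply such settings to a `be.conf`-style file. The helper name `set_conf_key` and the scratch file are assumptions for illustration only; the actual `benchmark.sh` changes in this PR may simply append the lines.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): set "key = value" in a
# be.conf-style file, replacing any existing assignment of the same key.
set_conf_key() {
    conf_file=$1
    key=$2
    value=$3
    tmp_file=$(mktemp)
    # Keep every line that does not already assign this key.
    # grep exits non-zero when nothing is left, hence "|| true".
    grep -v -E "^[[:space:]]*${key}[[:space:]]*=" "$conf_file" > "$tmp_file" || true
    echo "${key} = ${value}" >> "$tmp_file"
    mv "$tmp_file" "$conf_file"
}

# Demo against a scratch file; a real script would point at the BE's be.conf.
BE_CONF=$(mktemp)
echo "disable_storage_page_cache = false" > "$BE_CONF"

# Keys taken from the Doris list above.
set_conf_key "$BE_CONF" disable_storage_page_cache true
set_conf_key "$BE_CONF" segment_cache_capacity 0

cat "$BE_CONF"
```

Replacing an existing assignment instead of blindly appending avoids relying on how the BE resolves duplicate keys, which is not something this PR documents.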
Existing results still carry the stale `lukewarm-cold-run` tag and need
to be re-collected on AWS hardware to reflect the corrected configuration.
DuckDB does not have this problem: its `run.sh` launches a fresh `duckdb`
CLI process per query, so the buffer pool is empty at the start of each
cold run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 1753902 added an inline comment after the line-continuation backslash:

    -H "timeout:1000" \ # see #740

In bash this is *not* a continuation: the backslash escapes the space (not the newline), the `#` then starts an end-of-line comment, and the unescaped newline terminates the curl command. Curl runs without its URL and fails:

    curl: (3) URL using bad/illegal format or missing URL

so the data never gets loaded into StarRocks.

Move the comment to its own line above the curl invocation. No similar pattern was found in any other benchmark.sh / run.sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
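The pitfall described in this commit message can be reproduced without curl. The sketch below uses a hypothetical helper, `remember_last_arg`, and a made-up URL. It assumes at least one blank separates the escaped space from the `#`, so that the `#` begins a new word and therefore a comment.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): remembers the last argument it received.
remember_last_arg() {
    last_arg=
    for arg in "$@"; do last_arg=$arg; done
}

# Broken form: "\ " is an escaped space, i.e. a one-character argument.
# The "#" after the next blank starts a comment, and the newline ends the command.
remember_last_arg -H "timeout:1000" \  # see #740
broken_last=$last_arg

# Fixed form: the comment sits on its own line above the command,
# and the backslash is the very last character before the newline.
remember_last_arg -H "timeout:1000" \
    "http://localhost:8030/hypothetical-endpoint"
fixed_last=$last_arg

printf 'broken: [%s]\nfixed:  [%s]\n' "$broken_last" "$fixed_last"
```

In the broken form the last argument is a lone space rather than the URL, which is consistent with curl rejecting it as a malformed URL.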
…ickHouse/ClickBench into fix-cold-run-cheating-starrocks-doris
|
@HappenLee Can you please clarify what information the segment cache and the storage page cache in Doris store and how they work (lifecycle)? I found some scattered bits of information in the Doris docs (e.g. https://doris.apache.org/docs/3.x/admin-manual/trouble-shooting/memory-management/memory-analysis/doris-cache-memory-analysis), but it is not really well documented. In particular, is there a way to clear these caches before each first cold query? If yes, then let's do so instead. Note that "doris-parquet/run.sh" (but not "doris/run.sh") does

This PR disables the caches globally, which also impacts hot runs (and that may be unfair).

@murphyatwork I have a similar question for StarRocks. However, in the case of the data cache, the docs say

so disabling the data cache globally seems fair. Can you please also explain what the block cache is doing (mentioned here)? It is not disabled by this PR. Should it be? Can it be cleared between queries otherwise?
|
@rschu1ze @alexey-milovidov Hello, I have two questions regarding this issue.

First, we consider the page cache mechanism to be reasonable. Its logic is similar to that of DuckDB's buffer pool: previously accessed disk files are pinned and cached in the queue. Users actually run it this way in real-world production environments, so why is it considered unreasonable? If it is, could I equally argue that DuckDB's results are unreasonable as well?

Second, if this is indeed unreasonable, shouldn't we state the rules clearly and check each database's results against them? For closed-source databases, how can we ensure fairness and verifiability under such rules?
|
Never mind, I reverted this PR. Sorry for the confusion.
|
@HappenLee, @rschu1ze, cold runs should be performed with no caches. Otherwise, the results are non-representative.