Open
Description
According to the benchmark rules:
> if it's a database with local on-disk storage, the first query should be run after dropping the page cache
The following local disk-based participants do not flush the OS page cache between query runs. This gives them an unfair advantage on repeated queries since data may be served from the OS cache rather than being read from disk.
The corresponding scripts should be fixed so that all participants run under the same conditions.
For reference, the correct way to flush the page cache is:
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
List
Note that the list may be incomplete.
- chdb-dataframe | Reads parquet locally via Python (see "chdb-dataframe: clear page cache between queries", #779)
- clickhouse-datalake | Uses clickhouse local, no OS cache flush
- clickhouse-datalake-partitioned | Uses clickhouse local, no OS cache flush
- duckdb-dataframe | Reads parquet locally via Python
- elasticsearch | Clears ES query cache only, not OS page cache
- hydra | PostgreSQL-based, no cache flush
- locustdb | Disk-based (RocksDB), no cache flush (benchmark broken)
- mongodb | Local installation, no cache flush
- pandas | Reads parquet locally via Python
- polars | Reads parquet locally via Python
- polars-dataframe | Reads parquet locally via Python
- tembo-olap | PostgreSQL-based, no cache flush
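For the shell-based entries above, the fix could look roughly like the sketch below: flush the page cache once before the first run of each query, then do the repeated runs. This is only an illustration, not the exact change any participant needs. `DROP_CACHES`, `TRIES`, `run_query`, and `run_benchmark` are hypothetical names, and `run_query` is a placeholder for whatever client command the participant's script actually invokes.

```shell
#!/bin/bash
# Sketch only: names below are illustrative, not taken from an existing script.

# Path is overridable so the logic can be tried without root; the default is
# the real kernel interface.
DROP_CACHES="${DROP_CACHES:-/proc/sys/vm/drop_caches}"
TRIES="${TRIES:-3}"

flush_page_cache() {
    # Write dirty pages to disk first, then ask the kernel to drop the page
    # cache, dentries and inodes (value 3).
    sync
    echo 3 | sudo tee "$DROP_CACHES" > /dev/null
}

run_query() {
    # Hypothetical placeholder: replace with the real client invocation.
    echo "query: $1"
}

run_benchmark() {
    local queries_file="$1" query i
    while IFS= read -r query; do
        # Cold start: flush once before the first run of each query.
        flush_page_cache
        for i in $(seq 1 "$TRIES"); do
            run_query "$query"
        done
    done < "$queries_file"
}

# Usage (assumes one query per line): run_benchmark queries.sql
```

The Python-based entries (pandas, polars, and the dataframe variants) would need the equivalent call made from their driver script before the first timed run of each query.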