alexey-milovidov commented Oct 26, 2025

Umbra, CedarDB, and Hyper show unrealistically fast results in cold runs because of excessive caching.

According to the rules:

It is okay if the system performs caching for source data (buffer pools and similar). If the cache or buffer pools can be flushed, they should be flushed before the first run of every query.

If the system contains a cache for intermediate data, that cache should be disabled if it is located near the end of the query execution pipeline, thus similar to a query result cache.

To ensure that the caches are flushed and that data is actually read from disk during the cold run, I added a restart of the Docker container before the cold run of each query.
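The restart-before-cold-run procedure can be sketched in Python as follows. This is a minimal sketch, not the actual benchmark script. The function names, the `reset` hook, and the assumption that each query is timed three times with the first try counted as the cold run are all illustrative.

```python
import subprocess
import time
from typing import Callable, List

TRIES = 3  # assumption: each query is timed three times; the first try is the cold run


def restart_container(name: str) -> None:
    # Restarting the container kills the database process, discarding its
    # buffer pool and any intermediate or result caches held in its memory.
    subprocess.run(["docker", "restart", name], check=True)


def drop_page_cache() -> None:
    # Linux only, requires sudo. The host kernel's page cache is shared with
    # containers and survives a container restart, so it is flushed separately.
    subprocess.run(["sync"], check=True)
    subprocess.run(
        ["sudo", "tee", "/proc/sys/vm/drop_caches"],
        input=b"3",
        stdout=subprocess.DEVNULL,
        check=True,
    )


def benchmark(
    queries: List[str],
    run_query: Callable[[str], None],
    reset: Callable[[], None],
    tries: int = TRIES,
) -> List[List[float]]:
    """Return wall-clock timings per query; timings[i][0] is the cold run of query i."""
    results: List[List[float]] = []
    for query in queries:
        reset()  # fresh state before the first (cold) run of every query
        timings: List[float] = []
        for _ in range(tries):
            start = time.perf_counter()
            run_query(query)
            timings.append(time.perf_counter() - start)
        results.append(timings)
    return results
```

A caller would pass a `reset` function that calls `restart_container` with the benchmarked system's container name (for example a hypothetical `"umbra"`), optionally followed by `drop_page_cache()`. The restart alone discards only the database's own in-process caches. Because the host page cache survives it, dropping that cache as well is what makes the cold run read from disk. The `run_query` callable is also assumed to wait until the restarted server accepts connections. Otherwise the first try would fail or would include server startup time.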

I also found that hyper-parquet did not flush the cache at all.

Note: the fix for this problem was suggested by puzpuzpuz in #656.

alexey-milovidov self-assigned this Oct 26, 2025
alexey-milovidov merged commit 6ed1ea8 into main Oct 26, 2025
rschu1ze mentioned this pull request Nov 13, 2025
rschu1ze changed the title from "Prevent the possibility of cheating by databases from Munich" to "Prevent lukewarm cold runs in databases from Munich" Nov 13, 2025