The Accuracy of Cardinality Estimators: Unraveling the Evaluation Result Conundrum

This repository contains the source code for our VLDB 2025 paper submission, "The Accuracy of Learned Versus Traditional Cardinality Estimators: Unraveling the Evaluation Result Conundrum".

To reproduce the evaluation results, follow these steps:

1. Install XGBoost. Details are available at https://xgboost.readthedocs.io/en/latest/tutorials/c_api_tutorial.html
2. Clone the provided code, for example into $HOME.
3. Build the binaries:
   cd $HOME/LearnedVSTraditionalCE/src/H2D/src/main
   make
4. Run the experiments as follows:

cd $HOME/LearnedVSTraditionalCE/src/H2D/src/main

For example, to run experiments on the dataset "longlat" in the set of datasets called "earthquake" using the EXGB cardinality estimator, run:

./main_queryset_estimates --sds earthquake --ds longlat --inDir /your/path/to/dataset --outDir your/path/for/outputfile --trainQDir /path/to/train/queries/dir --testQDir /path/to/test/queries/dir --exgb --kind -1

Note: For QTS-2, the kind argument (--kind) must be set to 2.
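When many datasets or estimators are evaluated, it can help to assemble the command line programmatically. The following is a minimal sketch, not part of the repository: the helper name `build_estimate_command` and its parameter names are hypothetical, and it uses only the flags that appear in the example invocation above. It assumes the working directory is the `main` directory containing the compiled binary.

```python
import shlex


def build_estimate_command(sds, ds, in_dir, out_dir, train_q_dir, test_q_dir,
                           estimator_flag="--exgb", kind=-1,
                           binary="./main_queryset_estimates"):
    """Return the argument list for one run of main_queryset_estimates.

    Hypothetical helper: only flags shown in the README example are used.
    kind defaults to -1 as in the example; the README says to use 2 for QTS-2.
    """
    return [
        binary,
        "--sds", sds,                 # name of the set of datasets, e.g. "earthquake"
        "--ds", ds,                   # dataset within that set, e.g. "longlat"
        "--inDir", str(in_dir),       # directory holding the dataset
        "--outDir", str(out_dir),     # directory for the generated .out file
        "--trainQDir", str(train_q_dir),
        "--testQDir", str(test_q_dir),
        estimator_flag,               # estimator selection, e.g. "--exgb"
        "--kind", str(kind),
    ]


if __name__ == "__main__":
    cmd = build_estimate_command(
        "earthquake", "longlat",
        "/your/path/to/dataset", "your/path/for/outputfile",
        "/path/to/train/queries/dir", "/path/to/test/queries/dir",
    )
    # Print the command for inspection; to execute it, pass cmd to
    # subprocess.run(cmd, check=True) from the directory with the binary.
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids shell-quoting problems when paths contain spaces.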

We evaluate each estimator using 1,000,000 test queries. The generated .out files contain detailed evaluation results for each dataset and estimator, presented as two-dimensional matrices. In these matrices, the rows represent the selectivity classes of the queries, while the columns correspond to q-error classes. The value in each cell indicates the frequency of observing a q-error in the corresponding q-error class for queries within the corresponding selectivity class.
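To illustrate how such a matrix can be read, the sketch below bins (true cardinality, estimated cardinality) pairs into selectivity and q-error classes. It is an illustration only, not the repository's implementation: the class boundaries, the function names, and the clamping of cardinalities to at least 1 are all assumptions. The q-error is taken in its usual form, the larger of estimate/truth and truth/estimate.

```python
from bisect import bisect_left

# Hypothetical class boundaries (upper bounds, inclusive); the boundaries
# actually used in the .out files may differ.
SELECTIVITY_BOUNDS = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
QERROR_BOUNDS = [2.0, 10.0, 100.0, 1000.0]  # one extra column catches larger errors


def q_error(true_card, est_card):
    """Return max(est/true, true/est), clamping both values to at least 1.

    The clamping is an assumption made here to avoid division by zero.
    """
    t = max(float(true_card), 1.0)
    e = max(float(est_card), 1.0)
    return max(t / e, e / t)


def class_index(value, bounds):
    """Index of the first bound >= value, or len(bounds) if value exceeds all."""
    return bisect_left(bounds, value)


def qerror_matrix(results, table_size):
    """Count queries per (selectivity class, q-error class) cell.

    results: iterable of (true_cardinality, estimated_cardinality) pairs.
    table_size: number of tuples in the dataset, used to derive selectivity.
    """
    n_rows = len(SELECTIVITY_BOUNDS)
    n_cols = len(QERROR_BOUNDS) + 1
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for true_card, est_card in results:
        selectivity = true_card / table_size
        row = min(class_index(selectivity, SELECTIVITY_BOUNDS), n_rows - 1)
        col = class_index(q_error(true_card, est_card), QERROR_BOUNDS)
        matrix[row][col] += 1
    return matrix


if __name__ == "__main__":
    sample = [(50, 50), (50, 400), (5000, 10)]  # made-up values for illustration
    for row in qerror_matrix(sample, table_size=10_000):
        print(row)
```

With this layout, an accurate estimator concentrates its counts in the leftmost columns of every row, while counts in the right-hand columns of a row show in which selectivity range the estimator breaks down.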
After producing the output files, you can produce the aggregated results:

cd $HOME/LearnedVSTraditionalCE/src/H2D/src/Compare

./main_prepare your/path/for/outputfile /path/to/aggregated/results

The folders datasets and testQueries contain some sample datasets and test queries. Our full repository of 18,020 datasets and the corresponding test queries is available at https://pi3.informatik.uni-mannheim.de/rashedi/VLDB_2025/

Note: The experiments on DeepDB (https://github.com/DataManagementLab/deepdb-public), MSCN (https://github.com/andreaskipf/learnedcardinalities), and Naru (https://github.com/naru-project/naru) were performed using their published code.
