# TiCard

This project demonstrates how to use machine learning models to improve the cardinality estimation of the TiDB database. It compares the predictions of TabPFN, Gradient Boosting Regressor (GBR), and Neurocard with TiDB's default optimizer estimates.
## Project Structure

```
.
├── docker-compose.yml      # Docker Compose file for TiDB
├── paper-db.md             # The plan for this project
├── pyproject.toml          # Project dependencies
├── README.md               # This file
├── scripts/
│   ├── download_tpch.sh    # Script to download TPC-H data
│   └── load_tpch.sh        # Script to load TPC-H data into TiDB
└── src/
    └── ticard/
        ├── __init__.py
        ├── config.py       # Configuration for database and model
        ├── dataset.py      # Data loading and preparation
        ├── features.py     # Feature extraction from query plans
        ├── main.py         # Main script to run the experiment
        └── model.py        # Model training and evaluation
```

## Installation

Install the dependencies:

```shell
uv sync
```

Note: To generate the query plans and execution plans yourself, you need to start TiDB, prepare the datasets, and load them. If you just want to run TiCard, the query plans we generated previously are already available in `query_plans`, so you can run TiCard without setting up a database.
## Start TiDB

```shell
docker-compose up -d
```

Wait a few minutes for TiDB to be ready. Alternatively, you can use TiDB Cloud.
## Load the Datasets

Download and load the TPC-H dataset:

```shell
cd scripts/tpch
chmod +x download_tpch.sh
./download_tpch.sh
chmod +x load.sh
./load.sh
cd -
```

Download and load the Join Order Benchmark (IMDB) dataset:

```shell
cd scripts
git clone --recurse-submodules https://github.com/Icemap/join-order-benchmark.git
cd join-order-benchmark/csv_files/
wget https://event.cwi.nl/da/job/imdb.tgz
tar -xvzf imdb.tgz
cd ..
./split_and_load_data.sh
cd ../..
```

## Usage

Execute the main script to run the entire pipeline: data extraction, feature engineering, model training, and evaluation.
```shell
# Run with all algorithms (TabPFN, GBR, Neurocard)
uv run python -m ticard.main

# Run with specific algorithms
uv run python -m ticard.main -a tabpfn
uv run python -m ticard.main -a gbr
uv run python -m ticard.main -a neurocard

# Run with multiple algorithms
uv run python -m ticard.main -a tabpfn -a gbr
```

The script will output a comparison of the cardinality estimation Q-Error for:
- TiDB's default optimizer estimates (baseline)
- TabPFN: Tabular Prior-data Fitted Network
- GBR: Gradient Boosting Regressor
- Neurocard: Deep autoregressive model (MADE) for cardinality estimation
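Q-Error is the standard metric for cardinality estimation: the multiplicative factor by which an estimate misses the true row count, in either direction (1.0 is perfect). A minimal sketch of how it is typically computed (the function name is illustrative, not the project's actual API):

```python
def q_error(estimated: float, actual: float) -> float:
    """Q-Error: multiplicative deviation of an estimate from the truth.

    1.0 means a perfect estimate; 10.0 means off by 10x, whether the
    estimate was too high or too low.
    """
    est = max(float(estimated), 1.0)  # clamp to avoid division by zero
    act = max(float(actual), 1.0)
    return max(est / act, act / est)

print(q_error(100, 1000))  # 10.0 (underestimate by 10x)
print(q_error(1000, 100))  # 10.0 (overestimate by 10x)
```

Because Q-Error is symmetric, over- and underestimates of the same factor are penalized equally, which is why it is preferred over relative error for comparing optimizers.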
## Models

- TabPFN: A transformer-based model pre-trained on synthetic tabular data
- GBR: Gradient Boosting Regressor with 200 estimators
- Neurocard: MADE (Masked Autoencoder for Distribution Estimation) architecture from the Neurocard paper
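MADE makes a plain autoencoder autoregressive by masking its weight matrices so that output *i* depends only on inputs 1..*i*-1, which is what lets Neurocard read factorized conditional probabilities off the network. A minimal NumPy sketch of the mask construction (an illustration of the idea, not Neurocard's actual code):

```python
import numpy as np

def made_masks(n_in: int, n_hidden: int, seed: int = 0):
    """Build binary masks for a one-hidden-layer MADE network.

    Every unit gets a "degree": inputs are numbered 1..n_in, hidden
    units get random degrees in [1, n_in - 1]. A hidden unit may see
    inputs with degree <= its own; an output with degree o may see
    hidden units with degree strictly below o. Composing the two rules
    means output o only ever sees inputs with degree < o.
    """
    rng = np.random.default_rng(seed)
    d_in = np.arange(1, n_in + 1)
    d_hid = rng.integers(1, n_in, size=n_hidden)  # degrees in [1, n_in - 1]
    d_out = np.arange(1, n_in + 1)
    m1 = (d_hid[:, None] >= d_in[None, :]).astype(float)  # (n_hidden, n_in)
    m2 = (d_out[:, None] > d_hid[None, :]).astype(float)  # (n_in, n_hidden)
    return m1, m2

m1, m2 = made_masks(5, 16)
path = m2 @ m1                     # path[o, d] > 0 iff output o can see input d
assert np.all(np.triu(path) == 0)  # output i never depends on inputs j >= i
```

The masks are multiplied elementwise into the layer weights, so a single forward pass yields all conditionals p(x_i | x_<i) at once; multiplying them gives the joint probability, from which a cardinality estimate follows.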