Artifact: Interpreting the Error of Differentially Private Median Queries through Randomization Intervals
This repository contains the official code, datasets, and pre-computed results for our paper. All core algorithm implementations, experiment pipelines, plotting scripts, and table generation functions are integrated into a single Jupyter Notebook for ease of use and reproducibility.
-
CI_clean.ipynb: The main notebook containing all implementations, evaluation logic, and visualization code. -
*.json: Pre-computed experimental results across different datasets and$\epsilon$ values. -
*.csv: The raw datasets used for the experiments.
Please ensure you have Python 3 installed along with the following standard scientific libraries. You can install them via pip:
pip install numpy pandas matplotlib tqdmPlease open CI_clean.ipynb and run the setup/function definitions at the top. Once the environment is loaded, scroll down to the very bottom of the notebook where the execution blocks are located. You can choose any of the following actions:
If you only want to view the final results (Mean and
- Action: Scroll to the "extract" code block at the bottom of the notebook and run it.
- Result: It will read the provided
*.jsonfiles and print neatly formatted metric tables directly matching the paper's findings.
If you want to reproduce the visual charts (e.g., Median Error vs. Epsilon) with shaded error bars.
- Action: Scroll to the "plot" code block at the bottom and run it.
- Result: It will parse the existing
.jsondata and generate the comparative performance plots shown in the paper.
If you wish to fully validate the code by re-running the algorithms from scratch.
-
Action: Scroll to the "Run experiment" section at the bottom, configure your desired datasets or parameters (e.g., the
$\epsilon$ list), and run the cell. -
Result: The code will execute both our method (
Our_EMCI) and the baselines, evaluating their correctness and confidence interval lengths, and save the outputs as new.jsonfiles.
⚠️ Note: Running all experiments from scratch for all datasets may take a significant amount of time. For quick verification, we highly recommend using Option 1 or 2 with the provided pre-computed.jsonfiles.
-
Base Implementation: Our implementation is built upon and extends the code provided by Sun et al. (2023). [cite_start]We have modified the domain handling and fixed specific logic errors as detailed in the paper.
-
Datasets:
- Banking Dataset: Sourced from the UCI Machine Learning Repository.
- Adult Dataset: Sourced from the UCI Machine Learning Repository.
- Airplane Dataset: This dataset was sourced from Kaggle (original URL currently unavailable). It contains aircraft specifications, including model, capacity, and price. We specifically evaluate the capacity attribute, which ranges from 4 to 396 with a true median of 162.
-
Documentation Note: This README was refined with the assistance of large language models solely for improving language clarity and readability. The technical content, experimental logic, and codebase are the original work of the authors.