Artifact: Interpreting the Error of Differentially Private Median Queries through Randomization Intervals

This repository contains the official code, datasets, and pre-computed results for our paper. All core algorithm implementations, experiment pipelines, plotting scripts, and table generation functions are integrated into a single Jupyter Notebook for ease of use and reproducibility.

📁 Repository Structure

CI_clean.ipynb: The main notebook containing all implementations, evaluation logic, and visualization code.
*.json: Pre-computed experimental results across different datasets and $\epsilon$ values.
*.csv: The raw datasets used for the experiments.

🛠️ Dependencies

Please ensure you have Python 3 installed along with the following standard scientific libraries. You can install them via pip:

pip install numpy pandas matplotlib tqdm
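Before opening the notebook, it can help to confirm that the four packages are visible to the Python interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and is not part of the repository.

```python
import importlib.util

# Packages named in the pip command above.
REQUIRED = ["numpy", "pandas", "matplotlib", "tqdm"]


def missing_packages(names):
    """Return the package names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported as missing, re-run the pip command in the same environment that Jupyter uses.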

🚀 How to Reproduce the Results

Please open CI_clean.ipynb and run the setup/function definitions at the top. Once the environment is loaded, scroll down to the very bottom of the notebook where the execution blocks are located. You can choose any of the following actions:

1. Extract Metrics & Generate Tables (Fastest)

If you only want to view the final results (mean $\pm$ standard deviation) presented in the paper's LaTeX tables, you do not need to re-run the experiments.

Action: Scroll to the "extract" code block at the bottom of the notebook and run it.
Result: It will read the provided *.json files and print neatly formatted metric tables directly matching the paper's findings.
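To illustrate what this kind of extraction involves, here is a rough sketch that turns per-trial values into "mean ± std" rows. The JSON layout (`{epsilon: {method: [values]}}`), the method name `baseline`, the numbers, and the use of the sample standard deviation are all assumptions made for illustration; the actual schema and the "extract" cell in the notebook may differ.

```python
import json
import statistics


def summarize(results):
    """Map each (epsilon, method) pair to (mean, sample std. dev.).

    Assumes a hypothetical layout: {epsilon: {method: [one value per trial]}}.
    """
    summary = {}
    for eps, by_method in results.items():
        for method, values in by_method.items():
            mean = statistics.mean(values)
            # stdev needs at least two points; fall back to 0.0 for one trial.
            std = statistics.stdev(values) if len(values) > 1 else 0.0
            summary[(eps, method)] = (mean, std)
    return summary


def format_rows(summary):
    """Render one 'mean ± std' line per (epsilon, method) pair."""
    return [
        f"eps={eps} {method}: {mean:.2f} ± {std:.2f}"
        for (eps, method), (mean, std) in sorted(summary.items())
    ]


# Made-up values, not taken from the provided result files.
demo = json.loads(
    '{"1.0": {"Our_EMCI": [2.0, 4.0, 6.0], "baseline": [5.0, 5.0, 5.0]}}'
)
for row in format_rows(summarize(demo)):
    print(row)
```

For a real file, replace the inline string with `json.load` on one of the provided `.json` files and adapt the loops to the layout you find there.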

2. Generate Plots

Use this option if you want to reproduce the charts (e.g., Median Error vs. Epsilon) with shaded error bars.

Action: Scroll to the "plot" code block at the bottom and run it.
Result: It will parse the existing .json data and generate the comparative performance plots shown in the paper.
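A plot of this kind can be sketched as follows: one line per method, with a shaded region spanning mean minus and plus one standard deviation. This is not the notebook's plotting code; the function names, the input layout, and the output path are hypothetical.

```python
def error_band(means, stds):
    """Return the lower and upper edges of a mean ± std band."""
    lower = [m - s for m, s in zip(means, stds)]
    upper = [m + s for m, s in zip(means, stds)]
    return lower, upper


def plot_error_vs_epsilon(epsilons, series, path="median_error.png"):
    """Plot one line per method with a shaded ± std band and save it.

    `series` is assumed to map a method label to (means, stds), where both
    lists are aligned with `epsilons`.
    """
    # Imported here so that error_band can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, (means, stds) in series.items():
        lower, upper = error_band(means, stds)
        ax.plot(epsilons, means, marker="o", label=label)
        ax.fill_between(epsilons, lower, upper, alpha=0.2)
    ax.set_xlabel("epsilon")
    ax.set_ylabel("median error")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

A call might look like `plot_error_vs_epsilon([0.5, 1.0], {"Our_EMCI": ([4.0, 2.0], [1.0, 0.5])})`, where the numbers are placeholders rather than results from the paper.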

3. Run Experiments from Scratch

Use this option if you wish to fully validate the code by re-running the algorithms from scratch.

Action: Scroll to the "Run experiment" section at the bottom, configure your desired datasets or parameters (e.g., the $\epsilon$ list), and run the cell.
Result: The code will execute both our method (Our_EMCI) and the baselines, evaluating their correctness and confidence interval lengths, and save the outputs as new .json files.

⚠️ Note: Running all experiments from scratch for all datasets may take a significant amount of time. For quick verification, we highly recommend using Option 1 or 2 with the provided pre-computed .json files.
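The two quantities reported by the experiments, correctness and interval length, can be illustrated with a small sketch. It assumes that "correctness" means the interval contains the true median; the notebook may define it differently. The function name and the intervals are made up, and 162 is the true median of the Airplane capacity attribute stated in the credits.

```python
def evaluate_intervals(intervals, true_median):
    """Return (coverage rate, mean length) for a list of (low, high) intervals.

    Coverage counts an interval as correct when low <= true_median <= high,
    which is an assumed definition for this sketch.
    """
    if not intervals:
        raise ValueError("need at least one interval")
    covered = sum(1 for low, high in intervals if low <= true_median <= high)
    mean_length = sum(high - low for low, high in intervals) / len(intervals)
    return covered / len(intervals), mean_length


# Made-up intervals for illustration only.
demo_intervals = [(150, 170), (160, 180), (165, 175), (140, 200)]
coverage, mean_length = evaluate_intervals(demo_intervals, 162)
print(f"coverage={coverage:.2f}, mean length={mean_length:.1f}")
```

Here three of the four intervals contain 162, so the coverage is 0.75 and the mean length is 27.5.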

📜 Credits & Acknowledgments

Base Implementation: Our implementation builds on and extends the code provided by Sun et al. (2023). We have modified the domain handling and fixed specific logic errors, as detailed in the paper.
Datasets:
- Banking Dataset: Sourced from the UCI Machine Learning Repository.
- Adult Dataset: Sourced from the UCI Machine Learning Repository.
- Airplane Dataset: This dataset was sourced from Kaggle (original URL currently unavailable). It contains aircraft specifications, including model, capacity, and price. We specifically evaluate the capacity attribute, which ranges from 4 to 396 with a true median of 162.
Documentation Note: This README was refined with the assistance of large language models solely for improving language clarity and readability. The technical content, experimental logic, and codebase are the original work of the authors.

