Code for the CSE3000 Research Project. This repository benchmarks seven tabular imputation methods across four missingness mechanisms and three missing rates on two DeepSense 6G scenarios, evaluated on reconstruction error, distributional fidelity, and downstream beam-prediction accuracy.
It includes a bootstrap script that builds a Python virtual environment with all the imputation libraries used in the project. Tested on Linux, macOS, and Windows via WSL (recommended).
The environment supports the full method shortlist:

- Mean: `scikit-learn` (`SimpleImputer`)
- kNN: `scikit-learn` (`KNNImputer`)
- MICE: `scikit-learn` (`IterativeImputer`)
- SoftImpute: `fancyimpute.SoftImpute` (low-rank, important for CSI data)
- HyperImpute: `hyperimpute` package
- GRAPE: bundled inside DiffPuter at `external/DiffPuter/baselines/GRAPE/`
- DiffPuter: included as a git submodule at `external/DiffPuter/`

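As a rough illustration of how the three scikit-learn entries in the list above are used, the sketch below fits each on a synthetic matrix and reports RMSE on the hidden entries. The toy data, the 20% completely-at-random mask, and the hyperparameters are assumptions made for this example only; they are not the project's datasets, amputation code, or tuned settings.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (required before importing IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer

# Hypothetical toy data: 200 rows, 4 correlated columns.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X_true = latent @ rng.normal(size=(1, 4)) + 0.1 * rng.normal(size=(200, 4))

# Hide roughly 20% of the entries completely at random (assumed rate).
mask = rng.random(X_true.shape) < 0.2
X_missing = X_true.copy()
X_missing[mask] = np.nan

# Illustrative settings, not the tuned values from results/tuned_params.json.
imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "knn": KNNImputer(n_neighbors=5),
    "mice": IterativeImputer(max_iter=10, random_state=0),
}

rmse = {}
for name, imputer in imputers.items():
    X_hat = imputer.fit_transform(X_missing)
    # Reconstruction error is measured only on the entries that were hidden.
    rmse[name] = float(np.sqrt(np.mean((X_hat[mask] - X_true[mask]) ** 2)))
    print(f"{name}: RMSE on masked entries = {rmse[name]:.3f}")
```

All three classes share the `fit_transform` interface, which is what makes it practical to wrap them behind a common imputer API.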
DiffPuter is a git submodule (not pip-installed) because it's a research codebase
rather than a packaged library. The parent repository records the exact submodule
commit (`.gitmodules` holds only the path and URL), so everyone builds against the same source.
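To confirm that the checked-out submodule matches the recorded commit, one option is to read the output of `git submodule status`. The sketch below is a minimal helper for that, assuming `git` is on the PATH and that submodule paths contain no spaces; the function and class names are hypothetical and not part of this repository.

```python
import subprocess
from typing import List, NamedTuple


class SubmoduleStatus(NamedTuple):
    state: str   # "ok", "uninitialized", "out_of_sync", "conflict", or "unknown"
    commit: str
    path: str


# First character of each `git submodule status` line.
_STATE_BY_PREFIX = {
    " ": "ok",             # checked-out commit matches the recorded one
    "-": "uninitialized",  # submodule has not been initialised
    "+": "out_of_sync",    # checked-out commit differs from the recorded one
    "U": "conflict",       # submodule has merge conflicts
}


def parse_submodule_status(line: str) -> SubmoduleStatus:
    """Parse one line of `git submodule status` output."""
    state = _STATE_BY_PREFIX.get(line[0], "unknown")
    commit, path = line[1:].split()[:2]
    return SubmoduleStatus(state, commit, path)


def submodule_statuses(repo_root: str = ".") -> List[SubmoduleStatus]:
    """Run `git submodule status` in repo_root and parse every line."""
    out = subprocess.run(
        ["git", "submodule", "status"],
        cwd=repo_root, capture_output=True, text=True, check=True,
    ).stdout
    return [parse_submodule_status(line) for line in out.splitlines() if line.strip()]
```

Calling `submodule_statuses()` from the repository root should return one entry for `external/DiffPuter`; any state other than `"ok"` suggests running `git submodule update --init` again.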

    src/common/          shared experiment code (amputation, metrics, datasets, tuning, ...)
    src/imputers/        imputer wrappers (imputers.py, grape_imputer.py, diffputer_imputer.py)
    notebooks/           tuning.ipynb, imputation.ipynb, downstream.ipynb
    results/csv/         result tables the paper is built from
    results/figures/     generated figures
    results/             cached imputations (.npy) and tuned_params.json
    scripts/             setup_imputation_env.sh
    requirements/        base.txt, pyg.txt, dev.txt
    external/DiffPuter   DiffPuter submodule (contains GRAPE under baselines/GRAPE/)
    data/                DeepSense 6G scenarios (not redistributed; see Dataset below)


- Python 3.10–3.12 (with `venv` and `pip`)
- git
- A C/C++ compiler (`build-essential` on Debian/Ubuntu); only needed if a prebuilt `torch-scatter` wheel isn't available for your torch+CUDA combo

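The prerequisites above can be checked before running the setup script. The following is a small sketch of such a check; the helper names are hypothetical, and the bootstrap script may already perform its own validation.

```python
import importlib.util
import shutil
import sys

SUPPORTED_MIN = (3, 10)
SUPPORTED_MAX = (3, 12)


def python_supported(version=None) -> bool:
    """True if the (major, minor) version is within the supported 3.10 to 3.12 range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def missing_tools(tools=("git",)) -> list:
    """Return the command-line tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_modules(modules=("venv", "pip")) -> list:
    """Return the Python modules that this interpreter cannot import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


print("Python version supported:", python_supported())
print("Missing tools:", missing_tools())
print("Missing modules:", missing_modules())
```

The compiler is left out of the check on purpose, since it is only needed when no prebuilt `torch-scatter` wheel matches the installed torch and CUDA versions.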
The DeepSense 6G V2I scenarios (Scenario 5 and Scenario 33) are not
redistributed in this repository (non-commercial academic licence). Obtain them
directly from https://deepsense6g.net and place them under data/ following the
structure expected by src/common/deepsense_data.py.
Clone with submodules so DiffPuter is populated:

    git clone --recurse-submodules https://github.com/KenChan000/ImputationFor6GDatasets.git
    cd ImputationFor6GDatasets

If you already cloned without `--recurse-submodules`:

    git submodule update --init

Then build the environment (the script anchors to the repo root, so it works from anywhere; it also initialises the submodule if needed):

    chmod +x scripts/setup_imputation_env.sh
    ./scripts/setup_imputation_env.sh

Activate the environment:

    # Linux / macOS / WSL
    source imputation_env/bin/activate

Either:

- Open the project in VS Code (run `code .` from inside WSL) and select the kernel "Imputation (6G Datasets)" in any notebook. The Python and Jupyter extensions handle activation for you, so no manual `source` is needed inside notebooks. Recommended.
- Or run `jupyter lab` and pick the same kernel.

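If it is unclear whether a notebook or shell is actually using the project environment, the interpreter can be inspected directly. This sketch relies on the standard `sys.prefix` versus `sys.base_prefix` comparison for virtual environments and assumes the environment directory is named `imputation_env`, as in the activation command above; the function name is hypothetical.

```python
import sys
from pathlib import Path


def is_project_env(prefix: str, base_prefix: str, expected: str = "imputation_env") -> bool:
    """True if prefix points at a virtual environment whose directory is named `expected`."""
    in_venv = prefix != base_prefix  # inside a venv, sys.prefix differs from sys.base_prefix
    return in_venv and Path(prefix).name == expected


# Check the interpreter that is running this code.
print("Interpreter prefix:", sys.prefix)
print("Using project environment:", is_project_env(sys.prefix, sys.base_prefix))
```

Running this in a notebook cell is a quick way to verify that the selected kernel points at the intended environment.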
Run the notebooks in order:

1. `notebooks/tuning.ipynb`: hyperparameter search (writes `results/tuned_params.json`)
2. `notebooks/imputation.ipynb`: imputation + reconstruction/fidelity metrics (writes `results/csv/*_imputation_results.csv`)
3. `notebooks/downstream.ipynb`: beam-prediction evaluation (writes `results/csv/*_downstream_*.csv`)

Cached imputations under results/ let the downstream step run without re-imputing.
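Because each notebook writes a known output, the state of the pipeline can be inferred from the contents of `results/`. The sketch below returns the first notebook whose output is missing. The output patterns are taken from the list above, while the helper itself is hypothetical and does not check whether existing outputs are stale.

```python
from pathlib import Path
from typing import Optional

# (notebook, glob pattern of its output relative to results/), in run order.
STEPS = [
    ("notebooks/tuning.ipynb", "tuned_params.json"),
    ("notebooks/imputation.ipynb", "csv/*_imputation_results.csv"),
    ("notebooks/downstream.ipynb", "csv/*_downstream_*.csv"),
]


def next_notebook(results_dir: str = "results") -> Optional[str]:
    """Return the first notebook whose output is missing, or None if all outputs exist."""
    root = Path(results_dir)
    for notebook, pattern in STEPS:
        if not any(root.glob(pattern)):
            return notebook
    return None
```

For example, `next_notebook()` called from the repository root returns `"notebooks/tuning.ipynb"` on a fresh clone with no `results/tuned_params.json`.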