HRTFformer: A Spatially-Aware Transformer for Individual HRTF Upsampling in Immersive Audio Rendering
git clone HRTFformer.git
cd HRTFformer
conda create -n hrtfformer python=3.10 -y
conda activate hrtfformer
pip install torch==2.7.0+cu126 torchvision==0.22.0+cu126 torchaudio==2.7.0+cu126 --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
# Optional: MATLAB Engine for Python (only if you use MATLAB-based evaluation)
# cd <MATLABROOT>/extern/engines/python
# python -m pip install .
configs/ Configuration objects for data, training, and model hyperparameters
data/ HRTF dataset loaders, transforms, and preprocessing utilities
evaluation/ Objective evaluation scripts for LSD, localization, ILD, and ITD
model/ HRTFformer model components
trainer/ Training, testing, losses, metrics, and model factory utilities
main.py Command-line entry point
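For readers unfamiliar with the LSD metric listed under evaluation/, the following is a minimal sketch of log-spectral distortion for a single pair of magnitude spectra, using the common root-mean-square definition over frequency bins. The function name, the `eps` floor, and the plain-list inputs are assumptions made for illustration; the repository's evaluation scripts may average over directions, ears, and subjects, or differ in other details.

```python
import math


def log_spectral_distortion(ref_mag, est_mag, eps=1e-12):
    """Return the LSD in dB between two equal-length magnitude spectra.

    Hypothetical helper, not taken from the repository. Magnitudes are
    floored at `eps` so that a zero bin does not produce log10(0).
    """
    if len(ref_mag) != len(est_mag) or len(ref_mag) == 0:
        raise ValueError("spectra must be non-empty and of equal length")
    squared_db_errors = [
        (20.0 * math.log10(max(r, eps) / max(e, eps))) ** 2
        for r, e in zip(ref_mag, est_mag)
    ]
    # Root of the mean squared dB error across frequency bins.
    return math.sqrt(sum(squared_db_errors) / len(squared_db_errors))
```

An estimate that matches the reference exactly gives 0 dB, and larger values indicate a larger spectral mismatch.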
The active model is created in trainer/utils.py through get_model(config):
AutoEncoder(Encoder, encoder_config, TransConvDecoder, decoder_config)
The encoder combines transformer blocks with downsampling layers. The decoder reconstructs high-resolution outputs with transformer-guided transposed-convolution blocks.
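The constructor call above suggests that the autoencoder receives the encoder and decoder classes together with their configs and instantiates them itself. The sketch below illustrates only that composition pattern in plain Python. The class bodies are toy stand-ins and are not the repository's code: the hypothetical `stride` field, the list-based downsampling, and the repeat-based upsampling replace the real transformer and transposed-convolution layers.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    stride: int = 2  # hypothetical field, for illustration only


class Encoder:
    """Stand-in encoder: keeps every `stride`-th sample (downsampling)."""

    def __init__(self, config):
        self.stride = config.stride

    def __call__(self, x):
        return x[:: self.stride]


class TransConvDecoder:
    """Stand-in decoder: repeats each sample `stride` times (upsampling)."""

    def __init__(self, config):
        self.stride = config.stride

    def __call__(self, z):
        return [value for value in z for _ in range(self.stride)]


class AutoEncoder:
    """Builds both halves from (class, config) pairs and chains them."""

    def __init__(self, encoder_cls, encoder_config, decoder_cls, decoder_config):
        self.encoder = encoder_cls(encoder_config)
        self.decoder = decoder_cls(decoder_config)

    def __call__(self, x):
        return self.decoder(self.encoder(x))


def get_model(config):
    # Assumed config layout: a dict with "encoder" and "decoder" entries.
    return AutoEncoder(Encoder, config["encoder"], TransConvDecoder, config["decoder"])


model = get_model({"encoder": ToyConfig(stride=2), "decoder": ToyConfig(stride=2)})
reconstructed = model([1, 2, 3, 4])
```

Passing classes rather than instances keeps the factory as the single place where a different encoder or decoder can be swapped in.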
Install the Python dependencies needed by your data loader and evaluation workflow. The main training stack uses:
- Python 3.10+
- PyTorch
- NumPy
- SciPy
- Matplotlib
- pandas
- einops
- sofar
- netCDF4
Optional evaluation scripts may also require MATLAB Engine for Python, AMT, and spatialaudiometrics.
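A quick way to confirm that the packages listed above are importable before training is sketched below. The mapping from package names to import names follows each project's usual module name (for example, PyTorch is imported as `torch`); the helper name `missing_packages` is hypothetical and not part of the repository.

```python
from importlib.util import find_spec

# Display name -> top-level module name used in `import` statements.
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Matplotlib": "matplotlib",
    "pandas": "pandas",
    "einops": "einops",
    "sofar": "sofar",
    "netCDF4": "netCDF4",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the sorted display names of packages that cannot be found."""
    return sorted(name for name, module in modules.items() if find_spec(module) is None)


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates a top-level module without importing it, so the check stays fast even when PyTorch is installed.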
Update paths and hyperparameters in configs/config.py before running. In particular, set the dataset directory, output directory, device, and HRTF loader for your machine.
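As an illustration of the four settings mentioned above, the sketch below groups them in a small dataclass with a basic sanity check. Every field name, default value, and the `validate` method are assumptions made for this example; the actual attribute names in configs/config.py may differ, so treat this as a checklist rather than a drop-in replacement.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ExampleConfig:
    """Hypothetical stand-in for the settings edited in configs/config.py."""

    dataset_dir: Path = Path("/path/to/SONICOM")  # placeholder path
    output_dir: Path = Path("./outputs")          # placeholder path
    device: str = "cpu"                           # or e.g. "cuda:0"
    hrtf_loader: str = "Sonicom"                  # assumed loader name

    def validate(self):
        """Return a list of human-readable problems (empty if none)."""
        problems = []
        if not self.dataset_dir.is_dir():
            problems.append(f"dataset directory not found: {self.dataset_dir}")
        if self.device != "cpu" and not self.device.startswith("cuda"):
            problems.append(f"unrecognized device string: {self.device}")
        return problems
```

Calling `ExampleConfig().validate()` before a long run reports a mistyped dataset path immediately instead of partway through preprocessing.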
The SONICOM HRTF dataset can be downloaded from here.
Preprocess data:
python main.py preprocess -r True -d Sonicom
Train:
python main.py train -r True -d Sonicom
Test and evaluate:
python main.py test -r True -d Sonicom
Training writes logs, plots, and checkpoints under the configured output paths. Testing writes reconstructed HRTFs and evaluation artifacts next to the selected checkpoint.
Large datasets, generated checkpoints, reconstructed HRTFs, SOFA files, and pickle artifacts are intentionally ignored by Git. Keep those files outside the repository or regenerate them from the configured data paths.
Parts of the code are borrowed from the following repositories:
This study was made possible by support from SONICOM, a project that has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No. 101017743.
If you find this code useful for your research, please consider citing the following paper:
@article{hu2025hrtfformer,
title={HRTFformer: A Spatially-Aware Transformer for Individual HRTF Upsampling in Immersive Audio Rendering},
author={Hu, Xuyi and Li, Jian and Zhang, Shaojie and Goetz, Stefan and Picinali, Lorenzo and Akan, Ozgur B and Hogg, Aidan OT},
journal={IEEE Transactions on Multimedia},
year={2026}
}
@inproceedings{hu2025machine,
title={A machine learning approach for denoising and upsampling HRTFs},
author={Hu, Xuyi and Li, Jian and Picinali, Lorenzo and Hogg, Aidan OT},
booktitle={2025 33rd European Signal Processing Conference (EUSIPCO)},
pages={201--205},
year={2025},
organization={IEEE}
}