Real-Time Spherical Microphone Renderer for binaural reproduction in Python

ReTiSAR

Implementation of the Real-Time Spherical Microphone Renderer for binaural reproduction in Python [1].

Table of Contents:
  1. Requirements
  2. Setup
  3. Quickstart
  4. Execution parameters
  5. Execution modes
  6. Remote Control
  7. Validation - Setup and Execution
  8. Benchmark - Setup and Execution
  9. References
  10. Changelog
  11. Contributing
  12. Credits
  13. License

Requirements

  • MacOS (on Windows, compatibility with the usual JACK binaries seems problematic, but this has not been investigated thoroughly so far)
  • JACK library (usual prebuilt binaries are the easiest solution, otherwise you will have to build from source)
  • Conda installation (miniconda is sufficient; provides an easy way to get Intel MKL or alternatively OpenBLAS optimized numpy versions which is highly recommended)
  • Python installation (recommended way to get Python is to use Conda as described in the setup section)
  • Installation of the required Python packages (recommended way is to use Conda as described in the setup section)
  • Optional: Download of publicly available measurement data for alternative execution modes (always check command line output or log files in case the rendering pipeline does not initialize successfully!)
  • Optional: Install an OSC client for real-time feedback and remote control options during runtime
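
Whether the command line tools named above are reachable can be confirmed with a small standard-library check. This is only a sketch: it assumes the executables are called `jackd` and `conda` (as in the commands used throughout this README), and it only tests whether they are found on the `PATH`, not whether their versions are suitable.

```python
import shutil


def find_prerequisites(tools=("jackd", "conda")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_prerequisites().items():
        print(f"{tool}: {path if path else 'NOT FOUND'}")
```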

Setup

  • Clone repository with command line or any other git client:
    git clone https://github.com/AppliedAcousticsChalmers/ReTiSAR.git
    • Alternative: Download and extract snapshot manually from provided URL (not recommended due to not being able to pull updates)
  • Navigate into repository (the directory containing setup.py):
    cd ReTiSAR/
  • Install required Python packages (using Conda is recommended):
    • Make sure that Conda is up to date:
      conda update conda
    • Create new Conda environment from the specified requirements (--force to overwrite potentially existing outdated environment):
      conda env create --file environment.yml --force
    • Activate created Conda environment:
      source activate ReTiSAR

Quickstart

  • Follow requirements and setup instructions
  • During first execution, some small amount of additional mandatory external measurement data will be downloaded automatically, see remark in execution modes (requires Internet connection)
  • Start JACK server with 48 kHz sampling rate:
    jackd -d coreaudio -r 48000
  • Run package with [default] parameters to hear a binaural rendering of a raw Eigenmike recording:
    python -m ReTiSAR
  • Option 1: Modify configuration by changing default parameters in config.py (prepared block comments for the specific execution modes below exist).
  • Option 2: Modify configuration by command line arguments, as in the following examples showing different execution parameters and modes (see --help).

JACK initialization -- In case you have never started the JACK audio server on your system or want to make sure it initializes with appropriate values, open the JackPilot application and set your system specific default settings.
At this point the only relevant JACK audio server setting is the sampling frequency, which has to match the sampling frequency of your rendered audio source file or stream (no resampling will be applied for that specific file).
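
Since no resampling is applied, it can help to check the sampling rate of a source file before starting the JACK server. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of ReTiSAR, and the `wave` module only reads uncompressed PCM files, so it may reject other WAV variants.

```python
import wave


def wav_sample_rate(path):
    """Return the sampling rate in Hz stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_jack_rate(path, jack_rate_hz=48000):
    """True if the file could be played back at jack_rate_hz without resampling."""
    return wav_sample_rate(path) == jack_rate_hz
```

For example, `matches_jack_rate("res/source/Drums_48.wav")` would tell whether that file fits a JACK server started with `-r 48000`.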

FFTW optimization -- In case the rendering takes very long to start (after the message "initializing FFTW DFT optimization ..."), you might want to endure this long computation time once (per rendering configuration) or lower your FFTW planner effort (see --help).

Rendering performance -- Follow these remarks to achieve continuous and artifact-free rendering:

  • Optional components like array pre-rendering, headphone equalization, noise generation, etc. reduce the computational load when they are not deployed.
  • Extended IR lengths (particularly for modes with array IR pre-rendering) will massively increase the computational load depending on the chosen block length (partitioned convolution).
  • Currently there is no partitioned convolution for the main binaural renderer with SH based processing, hence the FIR taps of applied HRIR, Modal Radial Filters and further compensations (e.g. Spherical Head Filter) need to cumulatively fit inside the chosen block length.
  • Higher block length means lower computational load in real-time rendering but also increased system latency, most relevant for modes with array live-stream rendering, but also all other modes in terms of a slightly "smeared" head-tracking experience (noticeable at 4096 samples).
  • Adjust output levels of all rendering components (default parameters chosen accordingly) to prevent signal clipping (indicated by warning messages during execution).
  • Check JACK system load (e.g. JackPilot or OSC_Remote_Demo.pd) to be below approx. 95% load, in order to prevent dropouts (note that the overall system load reported by the OS is not a good indicator).
  • Check JACK detected dropouts ("xruns" indicated during execution).
  • Most of all, use your ears! If something sounds strange, there is probably something going wrong... ;)
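
The trade-off between block length and latency, and the constraint that unpartitioned filters have to fit into one block, can be estimated with a few lines of arithmetic. This is a rough sketch, not code from ReTiSAR: the function names are made up, the block duration is only one contribution to the overall system latency, and the criterion actually applied by the renderer may differ from the simple length check shown here.

```python
def block_latency_ms(block_length, sampling_rate_hz):
    """Duration of one processing block in milliseconds."""
    return 1000.0 * block_length / sampling_rate_hz


def filters_fit_in_block(block_length, *filter_lengths):
    """Check that cascaded FIR filters fit into one block without partitioning.

    Convolving filters of lengths L1, L2, ... yields a combined filter of
    length L1 + L2 + ... - (number of filters - 1).
    """
    if not filter_lengths:
        return True
    combined = sum(filter_lengths) - (len(filter_lengths) - 1)
    return combined <= block_length
```

At 48 kHz, a block of 256 samples lasts about 5.3 ms, 1024 samples about 21.3 ms and 4096 samples about 85.3 ms.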

Always check the command line output or generated log files in case the rendering pipeline does not initialize successfully!

Execution parameters

The following parameters are all optional and can be combined with the execution modes named in the subsequent section:

  • Run with specific processing block size (choose value according to the individual rendering configuration and performance of your system)
    • Largest block size (best performance but noticeable input latency):
      python -m ReTiSAR -b=4096 [default]
    • Try smaller block sizes according to the specific rendering configuration and individual system performance:
      python -m ReTiSAR -b=1024
      python -m ReTiSAR -b=256
  • Run with specific processing word length
    • Single precision 32 bit (better performance):
      python -m ReTiSAR -SP=TRUE [default]
    • Double precision 64 bit (no configuration with an actual benefit is known):
      python -m ReTiSAR -SP=FALSE
  • Run with specific IR truncation cutoff level (applied to all IRs)
    • Cutoff -60 dB under peak (better performance and perceptually irrelevant in most cases):
      python -m ReTiSAR -irt=-60 [default]
    • No cutoff to render entire IR (tough performance requirements in case of rendering particularly reverberant array IRs):
      python -m ReTiSAR -irt=0 [applied in all scientific evaluations]
  • Run with specific head-tracking device (paths are system dependent!)
    • No tracking (head movement can be remote controlled):
      python -m ReTiSAR -tt=NONE [default]
    • Automatic rotation:
      python -m ReTiSAR -tt=AUTO_ROTATE
    • Tracker Razor AHRS:
      python -m ReTiSAR -tt=RAZOR_AHRS -t=/dev/tty.usbserial-AH03F9XC
    • Tracker Polhemus Patriot:
      python -m ReTiSAR -tt=POLHEMUS_PATRIOT -t=/dev/tty.UC-232AC
    • Tracker Polhemus Fastrak:
      python -m ReTiSAR -tt=POLHEMUS_FASTRACK -t=/dev/tty.UC-232AC
  • Run with specific HRTF dataset as MIRO [5] or SOFA [6] files
    • Neumann KU100 artificial head from [5] as SOFA:
      python -m ReTiSAR -hr=res/HRIR/KU100_THK/HRIR_L2702.sofa -hrt=HRIR_SOFA [default]
    • Neumann KU100 artificial head from [5] as MIRO:
      python -m ReTiSAR -hr=res/HRIR/KU100_THK/HRIR_L2702_struct.mat -hrt=HRIR_MIRO
    • FABIAN artificial head from [7] as SOFA:
      python -m ReTiSAR -hr=res/HRIR/FABIAN_TUB/FABIAN_HRIR_measured_HATO_0.sofa -hrt=HRIR_SOFA
  • Run with specific headphone equalization / compensation filters (arbitrary filter length). The compensation filter should match the utilized individual headphone (model)! Ideally, the filter was also measured on the same head (artificial or individual) as the utilized HRIR!
    • No individual headphone compensation:
      python -m ReTiSAR -hp=NONE [default]
    • Sennheiser HD600 headphone on GRAS KEMAR artificial head:
      python -m ReTiSAR -hp=res/HpIR/KEMAR_TUR/hpComp_HD600_1Filter.wav
  • Run with specific SH processing compensation techniques (relevant for rendering modes utilizing spherical harmonics)
    • Modal Radial Filters [always applied] with individual amplification soft-limiting in dB according to [2]:
      python -m ReTiSAR -arr=18 [default]
    • Spherical Head Filter according to [4]:
      python -m ReTiSAR -sht=SHF
    • Spherical Harmonics Tapering in combination with Spherical Head Filter according to [3]:
      python -m ReTiSAR -sht=SHT+SHF [default]
  • Run with specific emulated self-noise as additive component to each microphone array sensor (performance requirements increase according to channel count)
    • No noise (best performance):
      python -m ReTiSAR -gt=NONE [default]
    • White noise (also setting the initial output level and mute state of the rendering component):
      python -m ReTiSAR -gt=NOISE_WHITE -gl=-30 -gm=FALSE
    • Pink noise by IIR filtering (higher performance requirements):
      python -m ReTiSAR -gt=NOISE_IIR_PINK -gl=-30 -gm=FALSE
  • For further configuration parameters, see Option 1 and Option 2 in the Quickstart section above.
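
To illustrate what an IR truncation cutoff relative to the peak means, the sketch below finds the last sample that still exceeds the cutoff level and drops everything after it. It is an assumption about the general technique, not the implementation used in ReTiSAR (which might, for instance, evaluate all channels jointly); the function name and the pure-Python list handling are illustrative only.

```python
def truncation_index(impulse_response, cutoff_db=-60.0):
    """Return how many samples to keep so that everything after the last
    sample reaching `cutoff_db` relative to the absolute peak is dropped.

    A cutoff of 0 (or above) keeps the entire IR, similar to `-irt=0`.
    """
    if cutoff_db >= 0 or not impulse_response:
        return len(impulse_response)
    peak = max(abs(sample) for sample in impulse_response)
    if peak == 0:
        return len(impulse_response)
    # convert the relative level in dB into a linear amplitude threshold
    threshold = peak * 10.0 ** (cutoff_db / 20.0)
    for index in range(len(impulse_response) - 1, -1, -1):
        if abs(impulse_response[index]) >= threshold:
            return index + 1
    return len(impulse_response)
```

A shorter IR after truncation means fewer convolution partitions, which is why the -60 dB default is described as the better performing choice.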

Execution modes

This section lists all the conceptually different rendering modes of the pipeline. Most of the previously introduced execution parameters can be combined with the mode-specific parameters. In case no manual value is provided for a specific rendering parameter (as in the following examples), its respective default value will be used.

Most execution modes require additional external measurement data, which cannot be republished here. However, all provided examples are based on publicly available research data. Respective files are represented here by provided source reference files (see res/), containing a source URL and potentially further instructions. In case the respective resource data file is not yet available on your system, download instructions will be shown in the command line output and generated log files.

  • Run as array recording renderer

    • Eigenmike at Chalmers lab space with speaker moving horizontally around the array:
      python -m ReTiSAR -sh=4 -tt=NONE -s=res/record/EM32ch_lab_voice_around.wav -ar=res/ARIR/RT_calib_EM32ch_struct.mat -art=AS_MIRO -arl=0 -hr=res/HRIR/KU100_THK/HRIR_L2702.sofa -hrt=HRIR_SOFA [default]
    • Eigenmike at Chalmers lab space with speaker moving vertically in front of the array:
      python -m ReTiSAR -sh=4 -tt=NONE -s=res/record/EM32ch_lab_voice_updown.wav -ar=res/ARIR/RT_calib_EM32ch_struct.mat -art=AS_MIRO -arl=0 -hr=res/HRIR/KU100_THK/HRIR_L2702.sofa -hrt=HRIR_SOFA
    • HØSMA 7n at TH Cologne lecture hall (recording file not provided):
      python -m ReTiSAR -b=1024 -sh=7 -tt=NONE -s=res/record/HOS64_hall_lecture.wav -sp="[(90,0)]" -sl=9 -ar=res/ARIR/RT_calib_HOS64_struct.mat -art=AS_MIRO -arl=0 -hr=res/HRIR/KU100_THK/HRIR_L2702.sofa -hrt=HRIR_SOFA
  • Run as array live-stream renderer with minimum latency (e.g. Eigenmike with the respective channel calibration provided by manufacturer)

    • Eigenmike Chalmers EM32 (SN 28):
      python -m ReTiSAR -b=256 -sh=4 -tt=NONE -s=None -ar=res/ARIR/RT_calib_EM32ch_struct.mat -art=AS_MIRO -arl=0 -hr=res/HRIR/KU100_THK/HRIR_L2702.sofa -hrt=HRIR_SOFA
    • Eigenmike Facebook Reality Labs EM32 (SN ??):
      python -m ReTiSAR -b=256 -sh=4 -tt=NONE -s=None -ar=res/ARIR/RT_calib_EM32frl_struct.mat -art=AS_MIRO -arl=0 -hr=res/HRIR/KU100_THK/HRIR_L2702.sofa -hrt=HRIR_SOFA
    • TH Cologne HØSMA 7n:
      python -m ReTiSAR -b=1024 -sh=7 -tt=NONE -s=None -ar=res/ARIR/RT_calib_HOS64_struct.mat -art=AS_MIRO -arl=0 -hr=res/HRIR/KU100_THK/HRIR_L2702.sofa -hrt=HRIR_SOFA
    • Zylia ZM-1:
      python -m ReTiSAR -b=256 -sh=3 ... (grid calibration file pending)
  • Run as array IR renderer, e.g. Eigenmike

    • Simulated plane wave:
      python -m ReTiSAR -sh=4 -tt=AUTO_ROTATE -s=res/source/Drums_48.wav -ar=res/ARIR/DRIR_sim_EM32_PW_struct.mat -art=ARIR_MIRO -arl=-6 -hr=res/HRIR/KU100_THK/HRIR_L2702.sofa -hrt=HRIR_SOFA
    • Anechoic measurement:
      python -m ReTiSAR -sh=4 -tt=AUTO_ROTATE -s=res/source/Drums_48.wav -ar=res/ARIR/DRIR_anec_EM32ch_S_struct.mat -art=ARIR_MIRO -arl=0 -hr=res/HRIR/KU100_THK/HRIR_L2702.sofa -hrt=HRIR_SOFA
  • Run as array IR renderer, e.g. sequential VSA measurements from [8] at the maximum respective SH order

    • 50ch (sh5), SBS center:
      python -m ReTiSAR -sh=5 -tt=AUTO_ROTATE -s=res/source/Drums_48.wav -ar=res/ARIR/DRIR_SBS_VSA_50RS_PAC.sofa -art=ARIR_SOFA -arl=-12 -hr=res/HRIR/KU100_THK/HRIR_L2702.sofa -hrt=HRIR_SOFA
    • 86ch (sh7), LBS center:
      python -m ReTiSAR -sh=7 -tt=AUTO_ROTATE -s=res/source/Drums_48.wav -ar=res/ARIR/DRIR_LBS_VSA_86RS_PAC.sofa -art=ARIR_SOFA -arl=-12 -hr=res/HRIR/KU100_THK/HRIR_L2702.sofa -hrt=HRIR_SOFA
    • 110ch (sh8), CR1 left:
      python -m ReTiSAR -sh=8 -tt=AUTO_ROTATE -s=res/source/Drums_48.wav -sp="[(-37,0)]" -ar=res/ARIR/DRIR_CR1_VSA_110RS_L.sofa -art=ARIR_SOFA -arl=-12 -hr=res/HRIR/KU100_THK/HRIR_L2702.sofa -hrt=HRIR_SOFA
    • 1202ch (truncated sh12), CR1 left:
      python -m ReTiSAR -sh=12 -tt=AUTO_ROTATE -s=res/source/Drums_48.wav -sp="[(-37,0)]" -ar=res/ARIR/DRIR_CR1_VSA_1202RS_L.sofa -art=ARIR_SOFA -arl=-12 -hr=res/HRIR/KU100_THK/HRIR_L2702.sofa -hrt=HRIR_SOFA
  • Run as BRIR renderer (partitioned convolution in frequency domain) for any BRIR compatible to the SoundScape Renderer, e.g. pre-processed array IRs by [9]
    python -m ReTiSAR -tt=AUTO_ROTATE -s=res/source/Drums_48.wav -art=NONE -hr=res/HRIR/KU100_THK/BRIR_CR1_VSA_110RS_L_SSR_SFA_-37_SOFA_RFI.wav -hrt=BRIR_SSR -hrl=-12

  • Run as "binauralizer" for an arbitrary number of virtual sound sources via HRTF (partitioned convolution in frequency domain) for any HRIR compatible to the SoundScape Renderer
    python -m ReTiSAR -tt=AUTO_ROTATE -s=res/source/PinkMartini_Lilly_44.wav -sp="[(30, 0),(-30, 0)]" -art=NONE -hr=res/HRIR/FABIAN_TUB/hrirs_fabian.wav -hrt=HRIR_SSR (provide respective source file and source positions!)
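
The SH orders chosen in the examples above relate to the array channel counts: a decomposition up to order N yields (N+1)^2 coefficients, which cannot exceed the number of microphone signals. The sketch below checks this necessary (but not sufficient) condition. The channel counts of 32 and 64 for the Eigenmike and HØSMA 7n are assumptions inferred from the file names, and the order actually supported also depends on the sampling grid.

```python
def sh_coefficient_count(order):
    """Number of spherical harmonic coefficients up to and including `order`."""
    return (order + 1) ** 2


def order_is_feasible(order, channel_count):
    """Necessary (not sufficient) condition: no more coefficients than channels."""
    return sh_coefficient_count(order) <= channel_count


# (channel count, SH order) pairs as used in the examples above;
# 32 and 64 are assumed from the file names EM32 and HOS64
examples = [(32, 4), (50, 5), (64, 7), (86, 7), (110, 8), (1202, 12)]
for channels, order in examples:
    coefficients = sh_coefficient_count(order)
    print(channels, order, coefficients, order_is_feasible(order, channels))
```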

Remote Control

  • During runtime, certain parameters of the application can be remote controlled via Open Sound Control. Individual clients can be accessed by targeting them with specific OSC commands on port 5005 [default].
    Depending on the current configuration and rendering mode different commands are available, i.e. arbitrary combinations of the following targets and values:
    /generator/volume 0, /generator/volume -12 (set any client output volume in dBFS),
    /prerenderer/mute 1, /prerenderer/mute 0, /prerenderer/mute -1, /prerenderer/mute (set/toggle any client mute state),
    /hpeq/passthrough true, /hpeq/passthrough false, /hpeq/passthrough toggle (set/toggle any client passthrough state)
  • The target name is derived from the individual JACK client name for all commands, while the order of target client and command can be altered. Additional commands might be available:
    /renderer/crossfade, /crossfade/renderer (set/toggle crossfade state),
    /renderer/delay 350.0 (set additional input delay in ms),
    /renderer/order 0, /renderer/order 4 (set SH rendering order),
    /tracker/zero (calibrate tracker), /tracker/azimuth 45 (set tracker orientation),
    /player/stop, /player/play, /quit (quit all rendering components)
  • During runtime, individual JACK clients with their respective "target" name also report real-time feedback or analysis data on port 5006 [default] in the specified exemplary data format (number of values depends on output ports), i.e. arbitrary combinations of the name and parameters:
    /player/rms 0.0, /generator/peak 0.0 0.0 0.0 0.0 (current audio output metrics),
    /renderer/load 100 (current client load),
    /tracker/AzimElevTilt 0.0 0.0 0.0 (current head orientation),
    /load 100 (current JACK system load)
  • The package includes an example remote control client implemented for "vanilla" Pure Data (PD); see further instructions in OSC_Remote_Demo.pd.
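
For scripted control without a PD patch, OSC messages can also be sent from Python. The following is a minimal sketch that encodes OSC 1.0 messages by hand with the standard library, so it does not depend on any OSC package. The helper names are hypothetical, the host `127.0.0.1` is an assumption (renderer running on the same machine), and whether a given target expects an integer or a float value is not specified here, so it may need adjusting.

```python
import socket
import struct


def _pad(data):
    """Null-terminate and pad to a multiple of 4 bytes, as OSC 1.0 requires."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def encode_osc(address, *args):
    """Encode an OSC 1.0 message with int32, float32 and string arguments."""
    type_tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, bool):
            raise TypeError("bool arguments are not handled in this sketch")
        if isinstance(arg, int):
            type_tags += "i"
            payload += struct.pack(">i", arg)
        elif isinstance(arg, float):
            type_tags += "f"
            payload += struct.pack(">f", arg)
        elif isinstance(arg, str):
            type_tags += "s"
            payload += _pad(arg.encode("ascii"))
        else:
            raise TypeError(f"unsupported OSC argument type: {type(arg).__name__}")
    return _pad(address.encode("ascii")) + _pad(type_tags.encode("ascii")) + payload


def send_osc(address, *args, host="127.0.0.1", port=5005):
    """Send one OSC message via UDP (fire and forget, no delivery check)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_osc(address, *args), (host, port))
```

With the renderer running, `send_osc("/renderer/order", 4)` or `send_osc("/tracker/zero")` would then target the commands listed above on the default port.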

Validation - Setup and Execution

  • Download and build the required ecasound library for signal playback and capture with JACK support
    in the extracted source directory, run ./configure, make and sudo make install while having JACK installed
  • Optional: Install sendosc tool to be used for automation in shell scripts
    brew install yoggy/tap/sendosc
  • Remark: Make sure all subsequent rendering configurations are able to start up properly before recording starts (particularly FFTW optimization might take a long time, see above)
  • Validate impulse responses by comparing against a reference implementation, in this case the output of sound_field_analysis-py [9]
    • Execute recording script, consecutively starting the package and capturing impulse responses in different rendering configurations
      ./res/research/validation/record_ir.sh
      Remark: Both implementations compensate the source being at an incidence angle of -37 degrees in the measurement IR set
    • Run package in validation mode, executing a comparison of all previously captured IRs in res/research/validation/ against the provided reference IRs
      python -m ReTiSAR --VALIDATION_MODE=res/HRIR/KU100_THK/BRIR_CR1_VSA_110RS_L_SSR_SFA_-37_SOFA_RFI.wav
  • Validate signal-to-noise ratio by comparing input and output signals of the main binaural renderer for wanted target signals and emulated sensor self-noise respectively
    • Execute recording script, consecutively starting the package and capturing target-noise as well as self-noise input and output signals in different rendering configurations
      ./res/research/validation/record_snr.sh
    • Open (and run) MATLAB analysis script to execute an SNR comparison of the previously captured signals
      open ./res/research/validation/calculate_snr.m
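
The idea behind the SNR comparison can be sketched in a few lines of Python. This is not a port of calculate_snr.m, whose details are not described here; it merely assumes the textbook definition of SNR as the ratio of mean signal power to mean noise power, with signal and noise captured separately at the input and output of the renderer. All function names are illustrative.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(sample * sample for sample in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from separately captured signal and noise."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


def snr_change_db(signal_in, noise_in, signal_out, noise_out):
    """SNR difference introduced by a processing stage (positive = improvement)."""
    return snr_db(signal_out, noise_out) - snr_db(signal_in, noise_in)
```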

Benchmark - Setup and Execution

  • Install additionally required Python packages into Conda environment
    conda env update --file environment_dev.yml
  • Run the JACK server with arbitrary sampling rate via JackPilot, or open a new command line window ([CMD]+[T]) and run:
    jackd -d coreaudio
  • Run in benchmark mode, instantiating one rendering JACK client with as many convolver instances as possible (35-60 minutes)
    python -m ReTiSAR --BENCHMARK_MODE=PARALLEL_CONVOLVERS
  • Run in benchmark mode, instantiating as many rendering JACK clients as possible with one convolver instance (10-15 minutes)
    python -m ReTiSAR --BENCHMARK_MODE=PARALLEL_CLIENTS
  • Find generated results in the specified files at the end of the script.

References

[1] Helmholz, H., Andersson, C., and Ahrens, J. (2019). “Real-Time Implementation of Binaural Rendering of High-Order Spherical Microphone Array Signals,” Fortschritte der Akust. -- DAGA 2019, Deutsche Gesellschaft für Akustik, Rostock, Germany, 1462–1465.
[2] Bernschütz, B., Pörschmann, C., Spors, S., and Weinzierl, S. (2011). “Soft-Limiting der modalen Amplitudenverstärkung bei sphärischen Mikrofonarrays im Plane Wave Decomposition Verfahren,” Fortschritte der Akust. -- DAGA 2011, Deutsche Gesellschaft für Akustik, Düsseldorf, Germany, 661–662.
[3] Hold, C., Gamper, H., Pulkki, V., Raghuvanshi, N., and Tashev, I. J. (2019). “Improving Binaural Ambisonics Decoding by Spherical Harmonics Domain Tapering and Coloration Compensation,” Int. Conf. Acoust. Speech Signal Process., IEEE, Brighton, UK, 261–265. doi:10.1109/ICASSP.2019.8683751
[4] Ben-Hur, Z., Brinkmann, F., Sheaffer, J., Weinzierl, S., and Rafaely, B. (2017). “Spectral equalization in binaural signals represented by order-truncated spherical harmonics,” J. Acoust. Soc. Am., 141, 4087–4096. doi:10.1121/1.4983652
[5] Bernschütz, B. (2013). “A spherical far field HRIR/HRTF compilation of the Neumann KU 100,” Fortschritte der Akust. -- AIA/DAGA 2013, Deutsche Gesellschaft für Akustik, Meran, Italy, 592–595.
[6] Majdak, P., Iwaya, Y., Carpentier, T., Nicol, R., Parmentier, M., Roginska, A., Suzuki, Y., et al. (2013). “Spatially Oriented Format for Acoustics: A Data Exchange Format Representing Head-Related Transfer Functions,” AES Conv. 134, Audio Engineering Society, Rome, 262–272.
[7] Brinkmann, F., et al. (2017). “The FABIAN head-related transfer function data base,” Technische Universität Berlin, Berlin, Germany.
[8] Stade, P., Bernschütz, B., and Rühl, M. (2012). “A Spatial Audio Impulse Response Compilation Captured at the WDR Broadcast Studios,” 27th Tonmeistertagung -- VDT Int. Conv., Verband Deutscher Tonmeister e.V., Cologne, Germany, 551–567.
[9] Hohnerlein, C., and Ahrens, J. (2017). “Spherical Microphone Array Processing in Python with the sound_field_analysis-py Toolbox,” Fortschritte der Akust. -- DAGA 2017, Deutsche Gesellschaft für Akustik, Kiel, Germany, 1033–1036.

Changelog

  • v2020.2.14
    • Addition of TH Cologne HØSMA 7n array configuration
  • v2020.2.10
    • Addition of project community information (contributing, code of conduct, issue templates)
  • v2020.2.7
    • Extension of DataRetriever to automatically download data files
    • Addition of missing ignored project resources
  • v2020.2.2
    • Change of default rendering configuration to contained Eigenmike recording
    • Update of README structure (including Quickstart section)
  • v2020.1.30
    • First publication of code

Contributing

See CONTRIBUTING for full details.

Credits

Written by Hannes Helmholz.

Scientific supervision by Jens Ahrens.

Contributions by Carl Andersson and Tim Lübeck.

This work was funded by Facebook Reality Labs.

License

This software is licensed under a Non-Commercial Software License (see LICENSE for full details).

Copyright (c) 2018
Division of Applied Acoustics
Chalmers University of Technology
