[Domain & Simulation Scope] Introducing Augmentations to Simulation #34
Description
Hi all, thanks so much for getting this out! This is really cool work you guys are doing, and I would love to help out by introducing augmentations to the simulation, specifically on the user side.
Note: I had Claude help me with the formatting below for better readability. I went through it myself and edited the content to align with what I actually want to work on.
Proposed Solution
Introduce an audio augmentation layer that can be applied to the existing 50 clean scenarios post-synthesis (or at simulation time), producing noisy/reverberant variants without needing to re-record or re-generate the base utterances.
Background Noise
- Curate or source a noise corpus covering key categories: airport/terminal ambience, vehicle cabin (car, bus, train), street/urban, crowd/café, wind/outdoor, and PA/intercom bleed.
- Mix noise into clean utterances at configurable SNR levels (e.g. 20 dB, 15 dB, 10 dB, 5 dB) to represent mild through severe conditions. Realistically, should we just define a continuous SNR distribution to sample from?
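To make the SNR mixing concrete, here is a minimal sketch, assuming mono float waveforms held in NumPy arrays at a shared sample rate. The names `mix_at_snr` and `sample_snr_db` are hypothetical and not part of the existing codebase, and the uniform 5 to 20 dB range is just one possible choice of continuous distribution.

```python
import numpy as np


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Return clean + scaled noise so that the mix has the requested SNR in dB."""
    # Tile or trim the noise so it covers the whole utterance.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if clean_power == 0 or noise_power == 0:
        raise ValueError("clean and noise must both be non-silent")

    # SNR_dB = 10 * log10(P_clean / (g^2 * P_noise)), solved for the noise gain g.
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise


def sample_snr_db(rng: np.random.Generator, low: float = 5.0, high: float = 20.0) -> float:
    """Draw an SNR from a uniform range (an assumed distribution, not a decided one)."""
    return float(rng.uniform(low, high))
```

Power here is measured over the whole utterance, including silences. Measuring it over speech-active regions only would give a different effective SNR, so that choice is probably worth settling before results are compared.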
Reverberation
- Apply synthetic room impulse responses (RIRs) representing small rooms (RT60 ~0.2–0.4s), medium rooms (~0.5–0.8s), and large/reflective spaces (~1.0–1.5s).
- Use publicly available RIR datasets (e.g. OpenSLR RIRs, MIT IR Survey, EchoThief) or generate synthetic RIRs with a tool like pyroomacoustics.
- Optionally combine reverberation with noise to simulate realistic compound conditions.
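As a rough illustration of the reverberation step, the sketch below builds a crude synthetic RIR from exponentially decaying noise and convolves it with a clean utterance. This is a stand-in I am assuming for illustration, not a replacement for measured RIRs or a proper room simulator. The helper names are hypothetical, and the RT60 ranges are the ones listed above.

```python
import numpy as np

# RT60 ranges in seconds, taken from the room classes proposed above.
ROOM_RT60_RANGES = {"small": (0.2, 0.4), "medium": (0.5, 0.8), "large": (1.0, 1.5)}


def synthetic_rir(rt60_s: float, sample_rate: int, rng: np.random.Generator) -> np.ndarray:
    """Decaying-noise RIR whose amplitude envelope falls by 60 dB at rt60_s seconds."""
    n = int(rt60_s * sample_rate)
    t = np.arange(n) / sample_rate
    # Envelope 10 ** (-3 * t / rt60) is 1e-3 (i.e. -60 dB) at t = rt60.
    decay = np.exp(-3.0 * np.log(10.0) * t / rt60_s)
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct path
    return rir / np.max(np.abs(rir))


def apply_reverb(clean: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve with the RIR, trim to the input length, and match the input peak."""
    wet = np.convolve(clean, rir)[: len(clean)]
    peak = np.max(np.abs(wet))
    return wet if peak == 0 else wet * (np.max(np.abs(clean)) / peak)


def sample_rt60(room: str, rng: np.random.Generator) -> float:
    """Draw an RT60 uniformly from the range for the given room class."""
    low, high = ROOM_RT60_RANGES[room]
    return float(rng.uniform(low, high))
```

For compound conditions, one option is to reverberate the clean speech first and add noise to the reverberant output afterwards, so that the requested SNR describes the final signal.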
Integration
Each augmented scenario should log the exact augmentation parameters used so that results are reproducible. We can also fix the random seeds per scenario.
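One possible shape for the parameter log, sketched with the standard library only. The record fields, the category labels, and the `draw_params` helper are all assumptions for illustration, and the actual schema would need to match whatever the simulation already records.

```python
import json
import random
from dataclasses import asdict, dataclass

# Hypothetical labels for the noise categories listed in the proposal.
NOISE_CATEGORIES = ["airport", "vehicle", "street", "crowd", "wind", "pa_bleed"]


@dataclass(frozen=True)
class AugmentationParams:
    scenario_id: str
    seed: int
    noise_category: str
    snr_db: float
    rt60_s: float


def draw_params(scenario_id: str, seed: int) -> AugmentationParams:
    """Draw augmentation parameters deterministically from a per-scenario seed."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    return AugmentationParams(
        scenario_id=scenario_id,
        seed=seed,
        noise_category=rng.choice(NOISE_CATEGORIES),
        snr_db=round(rng.uniform(5.0, 20.0), 2),
        rt60_s=round(rng.uniform(0.2, 1.5), 2),
    )


def to_log_line(params: AugmentationParams) -> str:
    """Serialize the parameters as one JSON line with a stable key order."""
    return json.dumps(asdict(params), sort_keys=True)
```

Calling `draw_params` twice with the same scenario ID and seed yields identical records, so a stored log line should be enough to regenerate the same augmented variant.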
Out of Scope
Claude suggested this item. I think it could be introduced in the same PR if you would like, or handled in a separate PR.
Codec or telephony channel simulation (e.g. G.711 compression artifacts) — worth a follow-up issue.