Normalize WAV voice recordings so all speakers sound at the same volume level (-6 dBFS by default), regardless of their distance from the microphone.
Handles real-world scenarios: background noise, wind, echo, multiple speakers, and speakers moving during recording.
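The default target of -6 dB is presumably measured relative to digital full scale (dBFS), where 0 dB is the largest representable sample value. The arithmetic behind "bring every speaker to the target" is a gain in dB. Here is a minimal sketch, assuming dBFS levels and assuming that `max_gain_db` simply caps the boost given to quiet speech. The helper names are hypothetical and not part of voxlevel's API.

```python
import math

def db_to_amplitude(db: float) -> float:
    """Convert a level in dBFS to a linear amplitude (full scale = 1.0)."""
    return 10.0 ** (db / 20.0)

def amplitude_to_db(amplitude: float) -> float:
    """Convert a linear amplitude (full scale = 1.0) to dBFS."""
    return 20.0 * math.log10(amplitude)

def required_gain_db(measured_db: float, target_db: float = -6.0,
                     max_gain_db: float = 30.0) -> float:
    """Gain that moves a measured level to the target.

    Assumption: max_gain_db is a hard cap on the boost."""
    return min(target_db - measured_db, max_gain_db)

print(round(db_to_amplitude(-6.0), 3))          # 0.501: -6 dB is about half of full scale
gain = required_gain_db(-30.0)                  # distant speaker at -30 dB
print(gain, round(db_to_amplitude(gain), 2))    # 24.0 15.85
print(required_gain_db(-50.0))                  # 30.0: +44 dB needed, capped at +30 dB
```

A cap like this stops near-silent passages, which are mostly noise, from being amplified without limit.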
Install:

```bash
pip install voxlevel
```

Python usage:

```python
import voxlevel

# From a WAV file
voxlevel.normalize("input.wav", "output.wav")

# From a numpy array (audio_array holds mono samples loaded elsewhere)
import numpy as np
result = voxlevel.normalize(audio_array, sample_rate=16000)

# With custom parameters
voxlevel.normalize(
    "input.wav",
    "output.wav",
    target_db=-6.0,
    max_gain_db=30.0,
    rms_window_ms=400.0,
    smooth_window_ms=200.0,
)
```

Command-line usage:

```bash
# Single file
voxlevel input.wav -o output.wav

# Batch processing
voxlevel *.wav -o normalized/

# Custom target level
voxlevel input.wav -o output.wav --target-db -3.0
```

voxlevel uses a two-pass offline approach (not real-time compression):
- Preprocessing -- DC removal + 80 Hz high-pass filter to cut wind noise, microphone handling noise, and plosives
- Voice Activity Detection -- Silero-VAD (ONNX) identifies speech vs. silence segments
- Automatic Gain Control -- a sliding RMS envelope gives the gain each sample needs to reach the target level; gains are interpolated across silence gaps, then smoothed in both directions (forward and backward)
- Lookahead limiter -- a 5 ms lookahead lets the gain come down before a peak arrives, keeping peaks from exceeding the target with less transient distortion than brick-wall clipping
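The preprocessing step (DC removal plus an 80 Hz high-pass) can be sketched with SciPy. This is a minimal sketch, assuming float samples. The 4th-order Butterworth design and the zero-phase `sosfiltfilt` call are illustrative choices, not voxlevel's documented filter, and `preprocess` is a hypothetical name.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def preprocess(samples: np.ndarray, sample_rate: int,
               cutoff_hz: float = 80.0, order: int = 4) -> np.ndarray:
    """Remove the DC offset, then high-pass filter at cutoff_hz.

    Assumed choices (not voxlevel's documented ones): a 4th-order
    Butterworth filter run forward and backward (zero phase)."""
    x = samples.astype(np.float64)
    x -= x.mean()  # DC removal: subtract the average sample value
    sos = butter(order, cutoff_hz, btype="highpass", fs=sample_rate, output="sos")
    # Energy below the cutoff (wind rumble, handling thumps, plosive pops) is
    # attenuated, while most speech energy lies above it and passes through.
    return sosfiltfilt(sos, x)
```

Running the filter forward and backward doubles its attenuation and cancels its phase shift. Offline processing makes that possible because the whole file is available up front.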
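The lookahead limiter idea can be sketched in NumPy. At each sample, take the smallest gain needed anywhere in the next 5 ms, then ramp toward it with a moving average, so the gain has already fallen when a peak arrives instead of chopping it off. This is a minimal sketch, assuming float samples in [-1, 1] and a ceiling equal to the -6 dB target. `lookahead_limit` is a hypothetical name, and the moving-average ramp is one common design, not necessarily voxlevel's.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lookahead_limit(x: np.ndarray, sample_rate: int, ceiling_db: float = -6.0,
                    lookahead_ms: float = 5.0) -> np.ndarray:
    """Peak limiter with lookahead (illustrative sketch; x is float in [-1, 1])."""
    ceiling = 10.0 ** (ceiling_db / 20.0)
    n = max(1, int(round(sample_rate * lookahead_ms / 1000.0)))  # lookahead in samples
    # Gain each sample would need on its own to stay under the ceiling.
    need = np.minimum(1.0, ceiling / np.maximum(np.abs(x), 1e-12))
    # Lookahead: the smallest gain needed in [i, i + n - 1].
    padded = np.concatenate([need, np.ones(n - 1)])
    ahead_min = sliding_window_view(padded, n).min(axis=1)
    # Ramp: average ahead_min over [i - n + 1, i]. Every value averaged at a
    # peak is <= the gain that peak needs, so the peak lands under the ceiling.
    padded = np.concatenate([np.full(n - 1, ahead_min[0]), ahead_min])
    gain = np.convolve(padded, np.ones(n) / n, mode="valid")
    # The final clip only absorbs floating-point rounding.
    return np.clip(x * gain, -ceiling, ceiling)
```

Because the gain ramps down over the lookahead window rather than jumping, the waveform around a transient is scaled smoothly instead of being flattened.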
Because the whole file is analyzed before any gain is applied, the gain is correct from sample 0, without the lag or adaptation artifacts that real-time compressors exhibit.
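The AGC stage and the "correct from sample 0" claim can be illustrated with a two-pass sketch. Pass one measures a speech-only sliding RMS, converts it to a gain, interpolates across silence, and smooths forward and backward. Pass two applies the gain. Several assumptions apply here:

- The `speech` mask is a hypothetical per-sample boolean input from any VAD.
- The parameter names are borrowed from the Python usage example.
- `smooth_window_ms` is treated as the time constant of a one-pole filter.

`two_pass_agc` is hypothetical, not voxlevel's implementation.

```python
import numpy as np
from scipy.signal import lfilter

def _moving_sum(v: np.ndarray, width: int) -> np.ndarray:
    """Sum of v over a centred window of `width` samples, truncated at the edges."""
    c = np.concatenate([[0.0], np.cumsum(v, dtype=np.float64)])
    i = np.arange(len(v))
    lo = np.clip(i - width // 2, 0, len(v))
    hi = np.clip(i - width // 2 + width, 0, len(v))
    return c[hi] - c[lo]

def _one_pole(g: np.ndarray, a: float) -> np.ndarray:
    """y[n] = a*g[n] + (1-a)*y[n-1], started at y[-1] = g[0] (no start-up ramp)."""
    y, _ = lfilter([a], [1.0, a - 1.0], g, zi=[(1.0 - a) * g[0]])
    return y

def two_pass_agc(x, speech, sample_rate, target_db=-6.0, max_gain_db=30.0,
                 rms_window_ms=400.0, smooth_window_ms=200.0):
    """x: float samples in [-1, 1]; speech: boolean VAD mask (hypothetical input)."""
    idx = np.flatnonzero(speech)
    if idx.size == 0:
        return x.copy()  # assumption: no speech detected, leave audio unchanged
    # Pass 1a: sliding RMS from speech samples only, so silence inside the
    # window does not drag the measured level down.
    width = max(1, int(sample_rate * rms_window_ms / 1000.0))
    energy = _moving_sum(np.where(speech, x ** 2, 0.0), width)
    count = _moving_sum(speech.astype(np.float64), width)
    rms = np.sqrt(energy / np.maximum(count, 1.0))
    gain_db = np.minimum(target_db - 20.0 * np.log10(np.maximum(rms, 1e-9)),
                         max_gain_db)
    # Pass 1b: gains in silence are interpolated between neighbouring speech
    # gains. np.interp holds the edge values, so audio before the first word
    # already gets that word's gain: no attack lag at sample 0.
    gain_db = np.interp(np.arange(len(x)), idx, gain_db[idx])
    # Pass 1c: bidirectional smoothing, one-pole low-pass forward then backward.
    a = 1.0 - np.exp(-1000.0 / (sample_rate * smooth_window_ms))
    gain_db = _one_pole(_one_pole(gain_db, a)[::-1], a)[::-1]
    # Pass 2: apply the gain curve.
    return x * 10.0 ** (gain_db / 20.0)
```

A real-time compressor only sees the past, so it must ramp up after a quiet speaker starts. Here the gain for the first word is known before playback begins, which is the property the paragraph above describes.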
Requirements and limitations:
- 16-bit mono WAV at 8 kHz or 16 kHz
- Offline processing only (no streaming)
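To confirm that a file meets these format requirements before processing, Python's standard `wave` module is enough. The sketch below is a hypothetical pre-flight helper (`check_input_format` is not part of voxlevel) that raises `ValueError` for anything other than 16-bit mono audio at 8 or 16 kHz.

```python
import wave

SUPPORTED_RATES = (8000, 16000)

def check_input_format(path_or_file) -> None:
    """Raise ValueError unless the WAV is 16-bit mono at 8 kHz or 16 kHz."""
    with wave.open(path_or_file, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample; 2 means 16-bit
        rate = wav.getframerate()
    if channels != 1:
        raise ValueError(f"expected mono, got {channels} channels")
    if sample_width != 2:
        raise ValueError(f"expected 16-bit samples, got {8 * sample_width}-bit")
    if rate not in SUPPORTED_RATES:
        raise ValueError(f"expected 8000 or 16000 Hz, got {rate} Hz")

# Usage: check_input_format("input.wav") before calling voxlevel.
```

Stereo or 44.1 kHz recordings would need to be downmixed and resampled with another tool first.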
License: MIT