Real-time Angle of Arrival (AoA) sound localization using GCC-PHAT on an ESP32 with two I2S MEMS microphones.
A real-time system to detect the direction of a sound source using an ESP32-DevKit-V1 and two SPH0645 I2S MEMS microphones. This project implements the GCC-PHAT (Generalized Cross-Correlation with Phase Transform) algorithm to robustly calculate the Angle of Arrival (AoA) of a sound wave.
This was completed as a Final Year Project at Dokuz Eylül University, Department of Electrical & Electronics Engineering.
![Figure 4.1: Setup](httpsDEU_EEE_Report/Figure-4.1-Setup.png)
- Microcontroller: ESP32-DevKit-V1
- Sensors: 2x SPH0645 I2S MEMS Microphones
- Array: 10 cm Uniform Linear Array (ULA)
- Algorithm: GCC-PHAT (Generalized Cross-Correlation with Phase Transform)
- Signal Processing: Real-time FFT, IFFT, Hamming Window, DC Offset Removal
- Protocol: Shared I2S bus in stereo configuration
- Development: Arduino IDE
The two SPH0645 microphones are configured in a stereo setup on a single, shared I2S bus. This is achieved by setting the SEL pin on one microphone to GND (Left Channel) and the other to VDD (Right Channel).
- ESP32 Pin 26 (BCLK) -> Microphones BCLK
- ESP32 Pin 25 (WS/LRCLK) -> Microphones LRCLK
- ESP32 Pin 22 (DIN) -> Microphones DOUT
![Figure 4.2: Schematic](httpsDEU_EEE_Report/Figure-4.2-Schematic.png)
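Because both microphones drive the same data line, each I2S frame carries one word per channel. The following is a minimal sketch, not the project's firmware, of how such an interleaved buffer could be split on a host machine. It assumes little-endian signed 32-bit words in left-then-right order with the audio in the upper 24 bits; the real word order depends on the I2S driver configuration, and `split_stereo` is a hypothetical helper name.

```python
import struct


def split_stereo(raw: bytes):
    """Split an interleaved stereo buffer into left/right sample lists.

    Assumption: little-endian signed 32-bit words, ordered L, R, L, R, ...
    Each word is shifted right by 8 bits to keep the upper 24 bits.
    """
    count = len(raw) // 4                      # number of whole 32-bit words
    words = struct.unpack(f"<{count}i", raw[:count * 4])
    left = [w >> 8 for w in words[0::2]]       # even words: left channel
    right = [w >> 8 for w in words[1::2]]      # odd words: right channel
    return left, right
```

Python's `>>` on signed integers is an arithmetic shift, so negative samples keep their sign after the 8-bit shift.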
The system continuously processes audio in 1024-sample blocks to find the time delay between the two microphones and then calculates the angle.
- Audio Capture: The ESP32 reads 1024-sample blocks of stereo audio at a 64 kHz sampling rate from the shared I2S bus. The 32-bit I2S data is right-shifted by 8 bits to get the 24-bit audio data.
- Signal Preprocessing:
  - DC Offset Removal: The mean (average) of the signal block is calculated and subtracted to center the waveform around zero.
  - Hamming Window: A Hamming window is applied to the signal to reduce spectral leakage during the FFT.
- FFT (Fast Fourier Transform): Both audio signals (Left and Right) are transformed from the time domain to the frequency domain using FFT.
- GCC-PHAT Calculation: This is the core of the algorithm.
  - Bandpass Filtering: The code filters for the target 1 kHz test tone by only processing frequency bins between 950 Hz and 1050 Hz.
  - Aliasing Filter: Frequencies above the spatial aliasing limit (~1716 Hz for 10 cm spacing) are zeroed out to prevent errors.
  - Cross-Spectrum: The FFT of Mic 1 is multiplied by the complex conjugate of Mic 2's FFT: $R_{12}(f) = X_1(f) \cdot X_2^*(f)$.
  - PHAT Normalization: The "Phase Transform" is applied by dividing the cross-spectrum by its own magnitude. This removes all amplitude information, leaving only the phase difference between the signals. This is what makes the algorithm robust against noise and reverberation.
- Time Delay (TDoA) Estimation:
  - Inverse FFT (IFFT): The normalized cross-spectrum is transformed back into the time domain.
  - Peak Finding: The result is a correlation function. The index of its highest peak gives the Time Difference of Arrival ($\tau$), the delay in samples between the two microphones.
- Angle Calculation: The TDoA ($\tau$) is converted from samples to seconds, and the final Angle of Arrival ($\theta$) is calculated with basic trigonometry:

  $$\theta = \arcsin\left(\frac{\tau \cdot c}{d}\right)$$

  - $c$ = Speed of Sound (343.2 m/s)
  - $d$ = Microphone Distance (0.10 m)
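The steps above can be sketched end to end in plain Python. This is an illustrative host-side sketch under stated assumptions, not the ESP32 firmware: the function names (`fft`, `preprocess`, `estimate_angle`) are hypothetical, the FFT is a simple recursive radix-2 version, the peak search is limited to physically possible lags, and the sign convention (a negative lag means the second channel receives the sound later) is an assumption that depends on how the channels are wired.

```python
import cmath
import math

FS = 64_000.0               # sampling rate in Hz (from the text)
C = 343.2                   # speed of sound in m/s (from the text)
D = 0.10                    # microphone spacing in m (from the text)
F_LO, F_HI = 950.0, 1050.0  # pass band around the 1 kHz test tone


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.
    The inverse is left unscaled, which does not move the peak."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def preprocess(block):
    """Remove the DC offset, then apply a Hamming window."""
    n = len(block)
    mean = sum(block) / n
    return [(s - mean) * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, s in enumerate(block)]


def estimate_angle(mic1, mic2):
    """Return (lag in samples, tau in seconds, angle in degrees)."""
    x1 = fft(preprocess(mic1))
    x2 = fft(preprocess(mic2))
    n = len(x1)
    f_alias = C / (2 * D)                 # spatial aliasing limit, ~1716 Hz
    spec = [0j] * n
    for k in range(1, n // 2):
        f = k * FS / n
        if F_LO <= f <= F_HI and f <= f_alias:
            r = x1[k] * x2[k].conjugate()         # cross-spectrum
            mag = abs(r)
            if mag > 0:
                spec[k] = r / mag                 # PHAT: keep phase only
                spec[n - k] = spec[k].conjugate() # mirror for a real IFFT
    corr = fft(spec, inverse=True)
    max_lag = math.ceil(D / C * FS)       # largest physically possible lag
    lag = max(range(-max_lag, max_lag + 1),
              key=lambda m: corr[m % n].real)     # negative lags wrap around
    tau = lag / FS
    s = max(-1.0, min(1.0, tau * C / D))  # clamp before arcsin
    return lag, tau, math.degrees(math.asin(s))
```

With 1024 samples at 64 kHz the bins are 62.5 Hz apart, so the 950 Hz to 1050 Hz band keeps only bin 16 (exactly 1000 Hz). Feeding the sketch a synthetic 1 kHz tone whose second channel is delayed by 10 samples should give a lag of -10 samples, which corresponds to roughly -32.4°.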
The system was tested using a 1 kHz sine wave. The system accurately detects the angle, especially at close range.
| Test Angle | Serial Monitor Output |
|---|---|
| 0 Degrees | AoA = 0.0°, Δt = 0.000000 s |
| +90 Degrees | AoA = 90.0°, Δt = 0.000291 s |
| -90 Degrees | AoA = -90.0°, Δt = -0.000291 s |
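As a quick consistency check on the ±90° rows, the largest delay a 10 cm array can observe is the time sound needs to travel the full microphone spacing, using the $c$ and $d$ values given earlier:

$$\tau_{\max} = \frac{d}{c} = \frac{0.10\ \text{m}}{343.2\ \text{m/s}} \approx 2.91 \times 10^{-4}\ \text{s}$$

This matches the reported ±0.000291 s, for which $\tau \cdot c / d = \pm 1$ and therefore $\theta = \pm 90°$. At the 64 kHz sampling rate it corresponds to about 18.6 samples.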
![Figure 4.3: Results](httpsDEU_EEE_Report/Figure-4.3-Results.png)
The system is highly accurate at short distances (10-20 cm). As documented in the test report, accuracy degrades at 30 cm and beyond, most noticeably for oblique angles such as 45°, which is expected behavior for a 2-microphone array.
Error Table (from Report Table 4.4):
| Distance | Target: 0° | Target: -90° | Target: 90° | Target: -45° |
|---|---|---|---|---|
| 10 cm | 0% error | 0% error | 0% error | 0% error |
| 20 cm | 0% error | 0% error | 0% error | 0% error |
| 30 cm | 3.1% error | 16.78% error | 0% error | 27% error |
| >30 cm | No output | High Error | High Error | High Error |
- Add More Microphones: Implement a 4-microphone Uniform Circular Array (UCA) to enable 360° detection.
- Add Display: Use an LCD or OLED screen to display the angle visually instead of using the Serial Monitor.
- Improve Algorithm: Implement a wider-band GCC or a different algorithm (like MUSIC) to detect arbitrary sounds, not just a 1 kHz test tone.