Real-time Angle of Arrival (AoA) sound localization using GCC-PHAT on an ESP32 with two I2S MEMS microphones.

EkinUgurr/sound-localization-esp32

sound-localization-esp32

ESP32 Sound Source Localization (Angle of Arrival)

A real-time system to detect the direction of a sound source using an ESP32-DevKit-V1 and two SPH0645 I2S MEMS microphones. This project implements the GCC-PHAT (Generalized Cross-Correlation with Phase Transform) algorithm to robustly calculate the Angle of Arrival (AoA) of a sound wave.

This was completed as a Final Year Project for the Dokuz Eylül University, Department of Electrical & Electronics Engineering.

Figure 4.1: Setup (image placeholder)

🛠️ Technical Overview

  • Microcontroller: ESP32-DevKit-V1
  • Sensors: 2x SPH0645 I2S MEMS Microphones
  • Array: 10 cm Uniform Linear Array (ULA)
  • Algorithm: GCC-PHAT (Generalized Cross-Correlation with Phase Transform)
  • Signal Processing: Real-time FFT, IFFT, Hamming Window, DC Offset Removal
  • Protocol: Shared I2S bus in stereo configuration
  • Development: Arduino IDE

🔌 Hardware Setup & Schematic

The two SPH0645 microphones are configured in a stereo setup on a single, shared I2S bus. This is achieved by tying the SEL pin of one microphone to GND (Left Channel) and the other to VDD (Right Channel).

  • ESP32 Pin 26 (BCLK) -> Microphones BCLK
  • ESP32 Pin 25 (WS/LRCLK) -> Microphones LRCLK
  • ESP32 Pin 22 (DIN) -> Microphones DOUT
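
Because both microphones share one data line, the samples arrive interleaved, and the capture step of the pipeline shifts each 32-bit word right by 8 bits. The following is a minimal host-side sketch of that unpacking in Python, not the project's firmware. The function name `unpack_stereo` and the left-then-right word order are assumptions for illustration; the actual channel order in the ESP32 driver buffer may differ.

```python
def unpack_stereo(words):
    """Split interleaved 32-bit I2S words into left/right 24-bit signed samples.

    Assumes the words alternate left, right, left, right (hypothetical order).
    """
    def to_sample(word):
        word &= 0xFFFFFFFF
        if word & 0x80000000:   # reinterpret the raw word as signed 32-bit
            word -= 1 << 32
        return word >> 8        # arithmetic shift: drops 8 low bits, keeps the sign

    left = [to_sample(w) for w in words[0::2]]
    right = [to_sample(w) for w in words[1::2]]
    return left, right
```

Calling `unpack_stereo(buffer)` on one captured block would yield two equal-length lists, one per microphone, ready for the preprocessing stage.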

Figure 4.2: Schematic (image placeholder)


💡 How It Works: The GCC-PHAT Pipeline

The system continuously processes audio in 1024-sample blocks to find the time delay between the two microphones and then calculates the angle.

  1. Audio Capture: The ESP32 reads 1024-sample blocks of stereo audio at a 64 kHz sampling rate from the shared I2S bus. The 32-bit I2S data is right-shifted by 8 bits to get the 24-bit audio data.

  2. Signal Preprocessing:

    • DC Offset Removal: The mean (average) of the signal block is calculated and subtracted to center the waveform around zero.
    • Hamming Window: A Hamming window is applied to the signal to reduce spectral leakage during the FFT.
  3. FFT (Fast Fourier Transform): Both audio signals (Left and Right) are transformed from the time domain to the frequency domain using FFT.

  4. GCC-PHAT Calculation: This is the core of the algorithm.

    • Bandpass Filtering: The code filters for the target 1 kHz test tone by only processing frequency bins between 950 Hz and 1050 Hz.
    • Aliasing Filter: Frequencies above the spatial aliasing limit (~1716 Hz for 10 cm spacing) are zeroed out to prevent errors.
    • Cross-Spectrum: The FFT of Mic 1 is multiplied by the complex conjugate of Mic 2's FFT (R12 = X1(f) * conj(X2(f))).
    • PHAT Normalization: The "Phase Transform" is applied by dividing the cross-spectrum by its own magnitude. This removes all amplitude information, leaving only the phase difference between the signals. This is what makes the algorithm robust against noise and reverberation.
  5. Time Delay (TDoA) Estimation:

    • Inverse FFT (IFFT): The normalized cross-spectrum is transformed back into the time domain.
    • Peak Finding: The result is a correlation function. The index of its highest peak gives the Time Difference of Arrival ($\tau$), the delay in samples between the two microphones.
  6. Angle Calculation: The TDoA ($\tau$) is converted from samples to seconds, and the final Angle of Arrival ($\theta$) is calculated with basic trigonometry:

     $$\theta = \arcsin\left(\frac{\tau \cdot c}{d}\right)$$

    • $c$ = Speed of Sound (343.2 m/s)
    • $d$ = Microphone Distance (0.10 m)
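
The six steps above can be sketched end to end in plain Python. This is an illustrative host-side model under stated assumptions, not the firmware running on the ESP32. The function names (`fft`, `preprocess`, `gcc_phat_lag`, `angle_of_arrival`) are hypothetical, the recursive FFT stands in for whatever FFT library the project uses, and the sign convention (a negative lag means the right channel is delayed) follows from R12 = X1 * conj(X2) and may not match the sign printed by the original code.

```python
import cmath
import math

FS = 64_000   # sampling rate in Hz (from the README)
C = 343.2     # speed of sound in m/s (from the README)
D = 0.10      # microphone spacing in m (from the README)


def fft(x, inverse=False):
    """Recursive radix-2 FFT for power-of-two lengths (inverse is left unscaled)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def preprocess(block):
    """Subtract the block mean (DC offset), then apply a Hamming window."""
    n = len(block)
    mean = sum(block) / n
    return [(s - mean) * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, s in enumerate(block)]


def gcc_phat_lag(left, right, f_lo=950.0, f_hi=1050.0):
    """Return the lag (in samples) of the GCC-PHAT correlation peak."""
    n = len(left)
    x1 = fft(preprocess(left))
    x2 = fft(preprocess(right))
    r = [0j] * n
    for k in range(n):
        freq = min(k, n - k) * FS / n          # bin frequency, mirrored for negative bins
        if f_lo <= freq <= f_hi:               # band limit around the 1 kHz test tone
            cross = x1[k] * x2[k].conjugate()  # R12 = X1 * conj(X2)
            if abs(cross) > 0:
                r[k] = cross / abs(cross)      # PHAT: keep the phase only
    corr = [v.real for v in fft(r, inverse=True)]
    max_lag = int(D / C * FS)                  # largest physically possible lag
    # Search only the physically possible lags; negative lags wrap around.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: corr[lag % n])


def angle_of_arrival(lag):
    """Convert a lag in samples to an angle in degrees."""
    tau = lag / FS                             # samples -> seconds
    s = max(-1.0, min(1.0, tau * C / D))       # clamp so arcsin stays defined
    return math.degrees(math.asin(s))
```

For two 1024-sample blocks `left` and `right`, `angle_of_arrival(gcc_phat_lag(left, right))` gives the estimated angle. Restricting the peak search to lags within d/c matters here: with only the band around 1 kHz retained, the correlation is nearly periodic, so an unrestricted search could lock onto a peak one period away.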

📊 Results & Performance

The system was tested using a 1 kHz sine wave. The system accurately detects the angle, especially at close range.

| Test Angle | Serial Monitor Output |
|---|---|
| 0 Degrees | AoA = 0.0° Δt = 0.000000 s |
| +90 Degrees | AoA = 90.0° Δt = 0.000291 s |
| -90 Degrees | AoA = -90.0° Δt = -0.000291 s |
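
As a quick consistency check, the 0.000291 s reported at ±90° matches the largest delay the array geometry allows, which occurs when the sound travels along the microphone axis. Using the $c$ and $d$ values given earlier (the symbols $\tau_{\max}$, $f_s$ and $f_{\text{alias}}$ are introduced here for illustration, with $f_s$ = 64 kHz):

$$\tau_{\max} = \frac{d}{c} = \frac{0.10\ \text{m}}{343.2\ \text{m/s}} \approx 2.91 \times 10^{-4}\ \text{s}$$

$$\tau_{\max} \cdot f_s \approx 2.91 \times 10^{-4}\ \text{s} \times 64\,000\ \text{Hz} \approx 18.6\ \text{samples}$$

The same two constants give the spatial aliasing limit quoted in the pipeline description:

$$f_{\text{alias}} = \frac{c}{2d} = \frac{343.2\ \text{m/s}}{0.20\ \text{m}} = 1716\ \text{Hz}$$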

Figure 4.3: Results (image placeholder)

Performance & Error Analysis

In the reported tests the system showed 0% error at short distances (10 cm and 20 cm). As documented in the test report, accuracy degrades at 30 cm and beyond and is worst for oblique angles such as -45° (27% error at 30 cm), which is expected behavior for a 2-microphone array.

Error Table (from Report Table 4.4):

| Distance | Target: 0° | Target: -90° | Target: 90° | Target: -45° |
|---|---|---|---|---|
| 10 cm | 0% error | 0% error | 0% error | 0% error |
| 20 cm | 0% error | 0% error | 0% error | 0% error |
| 30 cm | 3.1% error | 16.78% error | 0% error | 27% error |
| >30 cm | No output | High Error | High Error | High Error |
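
One factor worth keeping in mind when reading these numbers is angular quantization. The sketch below lists the angles that integer sample lags can represent with the stated sampling rate and spacing. It assumes the peak index is used directly, without sub-sample interpolation, which the README does not state, and the function name `resolvable_angles` is hypothetical.

```python
import math

FS = 64_000   # sampling rate in Hz (from the README)
C = 343.2     # speed of sound in m/s (from the README)
D = 0.10      # microphone spacing in m (from the README)


def resolvable_angles(fs=FS, c=C, d=D):
    """Angles in degrees reachable with whole-sample lags (no interpolation assumed)."""
    max_lag = int(d / c * fs)   # largest whole-sample lag that fits within d / c
    return [math.degrees(math.asin(lag * c / (fs * d)))
            for lag in range(-max_lag, max_lag + 1)]
```

Under that assumption the step between neighbouring angles is about 3° near broadside and grows toward the ends of the array axis, because the arcsine steepens as its argument approaches ±1.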

🚀 Future Improvements

  • Add More Microphones: Implement a 4-microphone Uniform Circular Array (UCA) to enable 360° detection.
  • Add Display: Use an LCD or OLED screen to display the angle visually instead of using the Serial Monitor.
  • Improve Algorithm: Implement a wider-band GCC or a different algorithm (like MUSIC) to detect arbitrary sounds, not just a 1 kHz test tone.
