
**GCC-PHAT (Generalized Cross-Correlation with Phase Transform): basic principle**

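1. Assuming a two-channel setup, take the Fourier transform of the signal received at each of the two microphones.
2. In the frequency domain, compute the cross power spectrum of the two signals.
3. Apply the phase transform (PHAT), which normalizes the magnitude so that only the phase information remains.
4. Apply the inverse Fourier transform to obtain the cross-correlation.
5. Find the delay at which the cross-correlation reaches its maximum; this delay is the TDoA (time difference of arrival).
6. Given the TDoA, compute the DoA (direction of arrival) from the microphone array information, such as the distance between the microphones and the speed of sound.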
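
The six steps above can be written compactly as follows. The notation is an assumption introduced here for illustration: $x_1, x_2$ are the two microphone signals, $d$ is the microphone spacing, $c$ is the speed of sound, and $\theta$ is the DoA measured from the array axis under a far-field (plane-wave) model.

$$X_i(f) = \mathcal{F}\{x_i(t)\}, \quad i = 1, 2$$

$$G_{12}(f) = X_1(f)\, X_2^{*}(f)$$

$$R_{12}(\tau) = \mathcal{F}^{-1}\left\{ \frac{G_{12}(f)}{\left| G_{12}(f) \right|} \right\}$$

$$\hat{\tau} = \arg\max_{\tau} R_{12}(\tau)$$

$$\hat{\tau} = \frac{d \cos\theta}{c} \quad \Rightarrow \quad \theta = \arccos\left( \frac{c\, \hat{\tau}}{d} \right)$$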


In [160]:
import os
import numpy as np
import librosa

### Data preparation

In [161]:
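folder = "data/150r2_M"
audios = os.listdir(folder)  # available recordings (not used below)
# bottom, top

# sr=None keeps each file's native sampling rate (48 kHz according to the file names)
sig1, sr = librosa.load(os.path.join(folder, "hal_in_pure_24_4ch_48k_1.wav"), sr=None)
sig2, sr2 = librosa.load(os.path.join(folder, "hal_in_pure_24_4ch_48k_2.wav"), sr=None)
assert sr == sr2, "both channels must share the same sampling rate"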

### GCC-PHAT function implementation

In [163]:
def gcc_phat(sig1, sig2, sr):
    """Estimate the TDoA (in seconds) of sig1 relative to sig2 using GCC-PHAT."""
    # Zero-pad to the combined length so that the circular correlation
    # computed via the FFT matches the linear cross-correlation.
    n = sig1.size + sig2.size

    SIG1 = np.fft.rfft(sig1, n=n)
    SIG2 = np.fft.rfft(sig2, n=n)

    # Cross power spectrum, normalized by its magnitude (PHAT) so only the phase remains.
    # The small constant avoids division by zero in empty frequency bins.
    R = SIG1 * np.conj(SIG2)
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n=n)

    # The FFT-based result is circular: non-negative lags sit at the start of cc
    # and negative lags at the end. Reorder it so that zero lag is at index
    # max_shift and the lags run from -max_shift to +max_shift.
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))

    # The position of the correlation peak is the delay in samples;
    # dividing by the sampling rate converts it to seconds (TDoA).
    sample_delay = np.argmax(cc) - max_shift
    tdoa = sample_delay / float(sr)

    return tdoa

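# Far-field model: tdoa = cos(doa) * distance / sound_speed
tdoa = gcc_phat(sig1, sig2, sr)
distance = 0.16      # microphone spacing in meters
sound_speed = 343.0  # speed of sound in m/s
# Clip to [-1, 1] so that a noisy TDoA estimate cannot make arccos return NaN
doa = np.arccos(np.clip(tdoa * sound_speed / distance, -1.0, 1.0))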
print(f"TDoA: {tdoa:.4f}")
print(f"DoA: {doa:.4f}")
print(f"Azimuth: {np.degrees(doa)-90:.4f}")

TDoA: 0.0001
DoA: 1.3456
Azimuth: -12.9034
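
As a sanity check of the approach, the sketch below runs a GCC-PHAT estimate on a synthetic pair of signals with a known delay. It is not part of the original notebook: `gcc_phat_sketch`, its `max_tau` argument, the white-noise test signal, and the 5-sample delay are all assumptions chosen for illustration. `max_tau` restricts the peak search to delays that are physically possible for the given microphone spacing.

```python
import numpy as np


def gcc_phat_sketch(sig1, sig2, sr, max_tau=None):
    """Illustrative GCC-PHAT: delay of sig1 relative to sig2, in seconds."""
    n = sig1.size + sig2.size
    # PHAT-weighted cross power spectrum (epsilon guards against division by zero)
    R = np.fft.rfft(sig1, n=n) * np.conj(np.fft.rfft(sig2, n=n))
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n=n)

    max_shift = n // 2
    if max_tau is not None:
        # Hypothetical option: only search delays up to max_tau seconds
        max_shift = min(int(sr * max_tau), max_shift)

    # Reorder so that zero lag sits at index max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(cc) - max_shift) / float(sr)


# Synthetic test data: white noise and a copy delayed by a known number of samples
sr = 48000
true_delay = 5
rng = np.random.default_rng(0)
ref = rng.standard_normal(sr // 2)
delayed = np.concatenate((np.zeros(true_delay), ref[:-true_delay]))

distance = 0.16      # microphone spacing in meters (same value as above)
sound_speed = 343.0  # speed of sound in m/s
max_tau = distance / sound_speed  # largest delay the geometry allows

tdoa = gcc_phat_sketch(delayed, ref, sr, max_tau=max_tau)
est_delay = int(round(tdoa * sr))
doa = np.arccos(np.clip(tdoa * sound_speed / distance, -1.0, 1.0))

print(f"Estimated delay: {est_delay} samples (true delay: {true_delay})")
print(f"TDoA: {tdoa * 1e6:.1f} us")
print(f"Azimuth: {np.degrees(doa) - 90:.2f} deg")
```

With a 0.16 m spacing, the largest physically possible delay is 0.16 / 343 ≈ 0.47 ms, or about 22 samples at 48 kHz, so limiting the search range keeps spurious peaks at larger lags from being selected. A 5-sample delay corresponds to roughly 0.104 ms, which appears consistent with the TDoA and DoA values printed above.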
