In [None]:
# :: 4th December 2022 :: @2:12pm

Time-Feature Pipeline:
- 1) ADC first (sample and quantize)
- 2) Framing (bundle samples together into frames; consecutive frames overlap)

Frames
- Perceivable audio chunks
    - Issue: 1 sample @ 44.1 kHz = 0.0227 ms
    - Duration of 1 sample << time resolution of the ear (~10 ms)
- Frame size is usually a power of 2 samples.
    - The fast Fourier transform runs faster when the number of samples is a power of 2.
- Typical frame sizes range from 256 to 8192 samples.

Frame Duration:
$$
d_f = \frac{1}{s_r} \times K
$$
where s_r = sampling rate, K = frame size and d_f = duration of a frame

In [3]:
def frame_duration(sampling_rate, frame_size):
    return (1 / sampling_rate) * frame_size

print(f'Frame size 512 at sampling rate 44100 Hz = {frame_duration(44100, 512)} s, aka 11.6 milliseconds, which is just above the ~10 ms time resolution of the ear')

Frame size 512 at sampling rate 44100 Hz = 0.011609977324263039 s, aka 11.6 milliseconds, which is just above the ~10 ms time resolution of the ear


After framing, compute time-domain features on each frame.

Then aggregate the per-frame results (e.g. mean, median, or a GMM (Gaussian mixture model)) into a feature vector for the entire file.
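
A minimal sketch of the two steps above (per-frame feature, then aggregation). Assumptions not in the notes: NumPy is available, the feature is RMS energy (chosen only as an example), the test signal is a synthetic 440 Hz sine, and frames are non-overlapping to keep the example short.

```python
import numpy as np

sr = 44100          # sampling rate in Hz
frame_size = 512    # samples per frame

# Synthetic 1-second test signal (assumption: stands in for a real audio file).
t = np.arange(sr) / sr
signal = 0.5 * np.sin(2 * np.pi * 440 * t)

# Framing: cut the signal into non-overlapping frames, dropping the remainder.
n_frames = len(signal) // frame_size
frames = signal[: n_frames * frame_size].reshape(n_frames, frame_size)

# Time-domain feature computed on each frame: RMS energy (example feature).
rms = np.sqrt(np.mean(frames ** 2, axis=1))

# Aggregation: one small feature vector for the entire file.
feature_vector = np.array([rms.mean(), np.median(rms)])
print(frames.shape, feature_vector)
```

A GMM could be fitted to the per-frame values instead of taking the mean and median; that is left out here to keep the sketch dependency-free.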

Frequency-Domain Feature Pipeline:
- ADC -> Framing -> move from the time domain to the frequency domain (using the Fourier transform)

Spectral Leakage
- The processed signal does not contain an integer number of periods. This happens all the time: a frame covers a fixed amount of audio, and it is rarely the case that it holds a whole number of periods.
- aka the endpoints are discontinuous.
- Discontinuities appear as high-frequency components not present in the original signal (they are artifacts of how the Fourier transform handles the discontinuities).
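
Spectral leakage can be seen with a small experiment. This is a sketch, assuming NumPy and two synthetic sine frames; `leakage_ratio` is a hypothetical helper defined here, not a library function. One frame holds exactly 8 periods, the other 8.5 periods.

```python
import numpy as np

frame_size = 512
k = np.arange(frame_size)

# 8 full periods fit exactly in the frame; 8.5 periods do not.
whole = np.sin(2 * np.pi * 8 * k / frame_size)
partial = np.sin(2 * np.pi * 8.5 * k / frame_size)

def leakage_ratio(frame):
    """Fraction of spectral energy outside the strongest frequency bin."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    return 1 - power.max() / power.sum()

print(leakage_ratio(whole))    # close to 0: all energy sits in one bin
print(leakage_ratio(partial))  # much larger: energy spreads into other bins
```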

How to resolve this
- We use windowing.

Windowing
- We apply a windowing function to each frame before feeding it into the Fourier transform.
- Attenuates the samples at both ends of a frame toward 0.
- Effectively removes the endpoint information.
- Generates a signal that can be repeated periodically without jumps, which minimizes spectral leakage.

Hann window equation:
$$
w(k) = 0.5 \times \left(1 - \cos\left(\frac{2 \pi k}{K - 1}\right)\right), \quad k = 0, \dots, K - 1
$$
- The Hann window is a function of the sample index.
- Where k = sample index and K = frame size
- Creates a bell-shaped curve whose endpoints go to 0.


Applying the Hann window to the original signal:
$$
s_w(k) = s(k) \times w(k), \quad k = 0, \dots, K - 1
$$
- We multiply each sample of the original signal by the corresponding sample of the Hann window.
- The window smooths the endpoints of the frame -> no discontinuities.
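
A short sketch of building and applying the window, assuming NumPy and zero-based sample indices k = 0, ..., K - 1 (the convention `np.hanning` uses). `hann_window` is a hypothetical helper written for these notes; the test frame is a synthetic sine that does not hold a whole number of periods.

```python
import numpy as np

def hann_window(frame_size):
    """Hann window w(k) = 0.5 * (1 - cos(2*pi*k / (K - 1))) for k = 0..K-1."""
    k = np.arange(frame_size)
    return 0.5 * (1 - np.cos(2 * np.pi * k / (frame_size - 1)))

frame_size = 512
w = hann_window(frame_size)

# Frame with 8.5 periods: its last sample is clearly non-zero.
k = np.arange(frame_size)
frame = np.sin(2 * np.pi * 8.5 * k / frame_size)

# Apply the window sample by sample: s_w(k) = s(k) * w(k).
windowed = frame * w

print(frame[-1], windowed[-1])           # the windowed endpoint is pulled to ~0
print(np.allclose(w, np.hanning(frame_size)))  # matches NumPy's built-in window
```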

Issue:
- Once we glue the windowed frames back together, we lose signal near the frame boundaries because the endpoints of every frame are attenuated to 0.
- To fix this we use overlapping frames.

Overlapping frames:
- To account for the endpoint information lost to Hann windowing, we overlap frames.
- Hop length (hop size)
    - number of samples to shift by when taking a new frame.
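
A minimal sketch of framing with a hop length, assuming NumPy. `frame_signal` is a hypothetical helper, and it assumes the signal is at least one frame long; any trailing samples that do not fill a whole frame are dropped.

```python
import numpy as np

def frame_signal(signal, frame_size, hop_length):
    """Split a 1-D signal into frames that start every hop_length samples."""
    n_frames = 1 + (len(signal) - frame_size) // hop_length
    starts = hop_length * np.arange(n_frames)
    return np.stack([signal[s : s + frame_size] for s in starts])

# Toy signal so the overlap is easy to read off.
signal = np.arange(10)
frames = frame_signal(signal, frame_size=4, hop_length=2)
print(frames)
# [[0 1 2 3]
#  [2 3 4 5]
#  [4 5 6 7]
#  [6 7 8 9]]
```

With hop_length = frame_size / 2, every sample (apart from those at the very start and end of the signal) appears in two frames, so samples attenuated at the edge of one frame sit near the middle of a neighbouring frame.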

Total Frequency domain feature pipeline:
- ADC (sample/quantize) -> framing -> windowing -> Fourier transform -> feature computation -> aggregation statistics -> feature value / vector / matrix
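
The whole pipeline can be sketched end to end. Assumptions not in the notes: NumPy, a synthetic 1000 Hz sine standing in for the output of the ADC, the spectral centroid as the example feature, and the hypothetical helpers `frame_signal` and `spectral_centroid`.

```python
import numpy as np

def frame_signal(signal, frame_size, hop_length):
    """Framing: overlapping frames that start every hop_length samples."""
    n_frames = 1 + (len(signal) - frame_size) // hop_length
    return np.stack([signal[i * hop_length : i * hop_length + frame_size]
                     for i in range(n_frames)])

def spectral_centroid(frames, sampling_rate):
    """Window each frame, take the FFT, return the magnitude-weighted mean frequency."""
    windowed = frames * np.hanning(frames.shape[1])        # windowing
    magnitude = np.abs(np.fft.rfft(windowed, axis=1))      # Fourier transform
    freqs = np.fft.rfftfreq(frames.shape[1], d=1 / sampling_rate)
    return (magnitude * freqs).sum(axis=1) / magnitude.sum(axis=1)

sr = 44100
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 1000 * t)  # already sampled, i.e. after the ADC

frames = frame_signal(signal, frame_size=512, hop_length=256)
centroids = spectral_centroid(frames, sr)                  # feature computation

# Aggregation statistics -> feature vector for the whole signal.
feature_vector = np.array([centroids.mean(), np.median(centroids)])
print(feature_vector)  # both values should land near the 1000 Hz tone
```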