#### *Spectrograms*
A spectrogram is a visual representation of an audio signal: the x-axis represents time, the y-axis represents frequency, and the color intensity indicates the amplitude or power of the signal at each time and frequency.

Spectrograms are typically generated using a digital signal processing technique called the **Short-Time Fourier Transform** (STFT). In essence, the STFT outputs the Fourier coefficient for the $k^{\text{th}}$ frequency bin at the $m^{\text{th}}$ temporal frame. Each coefficient is a complex number that encodes both magnitude and phase.

#### *Short Time Fourier Transform Formula*
$$
S(m,k)=\sum_{n=0}^{N-1}x(n+mH)\cdot w(n)\cdot e^{-i2\pi n\frac{k}{N}}
$$
- $k=$ frequency bin index *(a proxy for frequency)*
- $m=$ frame index *(a proxy for time)*
- $N=$ frame size *(number of samples in each frame)*
- $H=$ hop size *(number of samples between the starts of consecutive frames)*
- $x(n+mH)=$ the $n^{\text{th}}$ sample of the current frame
	- $mH=$ starting sample of the current frame
- $w(n)=$ windowing function
- $e^{-i2\pi n\frac{k}{N}}=$ a pure tone whose frequency is given by $\dfrac{k}{N}$ *(in cycles per sample)*
	- the sum essentially projects the windowed frame onto this pure tone
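
To make the formula concrete, here is a minimal, unoptimized NumPy sketch that evaluates $S(m,k)$ term by term. The function name `naive_stft`, the toy 100 Hz sine, and the choice of a Hann window are assumptions made for illustration; in practice a library routine such as `librosa.stft` would be used instead.

```python
import numpy as np

def naive_stft(x, frame_size, hop_size):
    """Direct translation of S(m, k); returns an array of shape (frames, bins)."""
    n = np.arange(frame_size)
    # Window w(n): a symmetric Hann window (an assumption for this sketch)
    w = 0.5 * (1 - np.cos(2 * np.pi * n / (frame_size - 1)))
    num_frames = (len(x) - frame_size) // hop_size + 1
    num_bins = frame_size // 2 + 1
    k = np.arange(num_bins)
    # Pure tones e^{-i 2 pi n k / N}: one row per frequency bin k
    tones = np.exp(-2j * np.pi * np.outer(k, n) / frame_size)
    S = np.empty((num_frames, num_bins), dtype=complex)
    for m in range(num_frames):
        start = m * hop_size                      # mH
        frame = x[start:start + frame_size]       # x(n + mH)
        S[m] = tones @ (frame * w)                # sum over n
    return S

# Toy signal: one second of a 100 Hz sine sampled at 1000 Hz
sr = 1000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 100 * t)

S = naive_stft(x, frame_size=200, hop_size=50)
peak_bin = int(np.abs(S[0]).argmax())
print(S.shape)               # (17, 101)
print(peak_bin * sr / 200)   # 100.0
```

Each row of `S` matches `np.fft.rfft` applied to the same windowed frame, which is how the sum is normally computed.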
#### *Windowing*
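Windowing extracts a short segment (frame) of the original audio signal by multiplying the signal element-wise with a window function that is nonzero only over that segment. A window that tapers toward zero at its edges also reduces the spectral leakage caused by abrupt frame boundaries.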

$$
x_w(k)=x(k)\cdot w(k)
$$
- $x_w(k)=$ windowed signal
- $x(k)=$ original signal
- $w(k)=$ windowing function

Bell-shaped windowing functions are most commonly used, typically the **Hann window**.

$$
w(k)=0.5\cdot\left(1-\cos\left(\dfrac{2\pi k}{K-1}\right)\right),~k=0,\dots,K-1
$$
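
As a small illustration, the sketch below evaluates the Hann window with zero-based indexing ($k=0,\dots,K-1$, so that the window starts and ends at zero) and applies it to a frame. The helper name `hann` and the constant test frame are assumptions for this example only.

```python
import math

def hann(K):
    """Symmetric Hann window of length K, indexed k = 0, ..., K-1."""
    return [0.5 * (1 - math.cos(2 * math.pi * k / (K - 1))) for k in range(K)]

w = hann(9)

# Apply the window to a frame: x_w(k) = x(k) * w(k)
frame = [1.0] * 9                      # constant frame, purely illustrative
windowed = [x_k * w_k for x_k, w_k in zip(frame, w)]

# The window is zero at both ends and peaks at 1 in the middle
print([round(v, 3) for v in windowed])
```

NumPy's `np.hanning(K)` uses the same symmetric definition.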
#### *Windowing Hop Size*
The hop size is the number of samples between the starts of consecutive windows applied to the original signal, similar to the step size in a sliding-window algorithm. It can also be expressed as a ratio of the frame size; for example, a hop of half the frame size gives 50% overlap between neighbouring frames.
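
The relationship between hop size, frame size, and overlap can be sketched in a few lines. The values 2048 and 128 are used here only as example numbers.

```python
frame_size = 2048  # samples per frame
hop_size = 128     # samples between consecutive frame starts

hop_ratio = hop_size / frame_size          # hop as a fraction of the frame size
overlap = 1 - hop_ratio                    # fraction shared by neighbouring frames
starts = [m * hop_size for m in range(4)]  # first sample index of frames 0..3

print(hop_ratio)  # 0.0625
print(overlap)    # 0.9375
print(starts)     # [0, 128, 256, 384]
```

A smaller hop gives finer time resolution in the spectrogram at the cost of more frames to compute.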
#### *Frequency Bins*
In the context of the Fast Fourier Transform (FFT), _frequency bins_ are specific intervals in the frequency domain that correspond to ranges of frequencies captured by the FFT of a signal. Each bin holds the amplitude (and sometimes the phase) of a certain frequency range from the original time-domain signal.
$$
\text{\# of Frequency bins}=\frac{\text{frame size}}{2}+1
$$
**Note:**
- `np.fft.rfftfreq` returns the center frequencies of these bins (`np.fft.fftfreq` additionally returns the negative-frequency bins)
- The bins span frequencies from 0 up to half the sampling rate, i.e. the Nyquist frequency
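
A quick check of the bin count and range, sketched with NumPy. The frame size of 2048 and sample rate of 16000 Hz are example values.

```python
import numpy as np

sample_rate = 16000  # Hz
frame_size = 2048    # samples per frame

# Center frequency of each one-sided (non-negative) FFT bin
freqs = np.fft.rfftfreq(frame_size, d=1 / sample_rate)

print(len(freqs))             # 1025, i.e. frame_size / 2 + 1
print(freqs[0], freqs[-1])    # lowest bin (0 Hz) and highest bin (Nyquist)
print(freqs[1] - freqs[0])    # bin spacing, sample_rate / frame_size
```

The bin spacing `sample_rate / frame_size` is the frequency resolution: a larger frame size gives narrower bins.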
#### *Frames*
$$
\text{\# of Frames}=\left\lfloor\frac{\text{samples}-\text{frame size}}{\text{hop size}}\right\rfloor + 1
$$
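
As a worked example, the sketch below counts the frames that fit into a 2-second clip at 16 kHz without padding. The helper name `num_frames` is hypothetical. Note that libraries which pad the signal (for instance `librosa.stft` with its default `center=True`) will report more frames than this count.

```python
def num_frames(num_samples, frame_size, hop_size):
    """Number of complete frames that fit into the signal without padding."""
    if num_samples < frame_size:
        return 0
    return (num_samples - frame_size) // hop_size + 1

# 2 seconds at 16 kHz, frame size 2048, hop size 128
print(num_frames(32000, 2048, 128))  # 235
```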
#### *Audio Visualization*
The STFT shown above yields a matrix of complex Fourier coefficients. Taking the squared magnitude of each coefficient gives a real-valued matrix with the same dimensions, which can be plotted as a heat map, i.e. a spectrogram.
$$
Y(m,k)=|S(m,k)|^2
$$
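
The following sketch applies this step to a tiny hand-written matrix of complex coefficients and then converts the result to decibels, as is common before plotting. The matrix values are made up for illustration, and scaling relative to the maximum is one possible convention rather than the only one.

```python
import numpy as np

# Toy "STFT" matrix of complex coefficients (illustrative values only)
S = np.array([[3 + 4j, 0 + 1j],
              [0.5 + 0j, 2 - 2j]])

# Power spectrogram: Y(m, k) = |S(m, k)|^2, approximately [[25, 1], [0.25, 8]]
Y = np.abs(S) ** 2

# Power in decibels relative to the strongest entry (so the maximum is 0 dB)
Y_db = 10 * np.log10(Y / Y.max())

print(Y)
print(Y_db)
```

Because `Y` is already a power quantity, the decibel conversion uses a factor of 10; a factor of 20 would apply to the unsquared magnitude `np.abs(S)`.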


In [6]:
import os
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

# Numba (a librosa dependency) only supports certain NumPy versions, so check the installed one
print(np.__version__)

2.0.2


In [7]:
# Define source and destination paths
input_dir = '../data/preprocessed/not-gunshot'
output_dir = '../data/processed/not-gunshot'
file_prefix = 'not_gunshot_'

In [8]:
# Set sample rate
SAMPLE_RATE = 16000

# Set frame size (samples)
FRAME_SIZE = 2048

# Set hop size (samples)
HOP_SIZE = 128

# Sliding window segment length (2 seconds)
SEGMENT_LENGTH = SAMPLE_RATE * 2

# Segment hop distance (50% hop distance)
SEGMENT_HOP = SEGMENT_LENGTH // 2
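
The segment constants above are not used by the cells shown in this section. As a rough sketch of how a 2-second window with a 50% hop could split a longer recording into overlapping segments, assuming a hypothetical helper `segment_bounds` that is not part of the original notebook:

```python
SAMPLE_RATE = 16000
SEGMENT_LENGTH = SAMPLE_RATE * 2    # 2 seconds of samples
SEGMENT_HOP = SEGMENT_LENGTH // 2   # 50% hop between segment starts

def segment_bounds(num_samples):
    """Hypothetical helper: (start, end) sample indices of each full segment."""
    return [
        (start, start + SEGMENT_LENGTH)
        for start in range(0, num_samples - SEGMENT_LENGTH + 1, SEGMENT_HOP)
    ]

# A 5-second recording yields four overlapping 2-second segments
print(segment_bounds(5 * SAMPLE_RATE))
# [(0, 32000), (16000, 48000), (32000, 64000), (48000, 80000)]
```

Trailing samples that do not fill a whole segment are dropped in this sketch; padding them instead would be an equally reasonable choice.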

In [9]:
def spectrogram_generator(src = '', dst = '', file_num = 0):
    # Load audio samples, resampled to SAMPLE_RATE (the returned rate is discarded)
    data, _ = librosa.load(src, sr = SAMPLE_RATE)

    # Compute the Short-Time Fourier Transform (complex coefficients)
    stft = librosa.stft(data, n_fft = FRAME_SIZE, hop_length = HOP_SIZE)

    # Take the squared magnitude to get the power spectrogram
    power = np.abs(stft) ** 2

    # Convert power to decibels (power_to_db, since the values are already squared)
    stft_db = librosa.power_to_db(power)

    # Plot spectrogram and save as an image
    plt.figure(figsize=(10, 4))
    plt.axis('off')
    librosa.display.specshow(
        stft_db,
        sr = SAMPLE_RATE,
        hop_length = HOP_SIZE,
        x_axis = 'time',
        y_axis = 'log',
        cmap = 'magma'
    )

    # Create output file path
    output_path = os.path.join(dst, f"{file_prefix}{file_num}.png")
    plt.savefig(output_path, bbox_inches='tight', pad_inches=0, transparent=True)
    plt.close()

    print(f"Generated spectrogram for {src} and saved to {output_path}")

In [10]:
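# Process all .wav files in the input directory
# Make sure the output directory exists before listing it
os.makedirs(output_dir, exist_ok=True)

# Continue numbering after any files already present in the output directory
count = len(os.listdir(output_dir))
for file in os.listdir(input_dir):
    if file.endswith(".wav"):
        file_path = os.path.join(input_dir, file)
        spectrogram_generator(file_path, output_dir, file_num = count)
        count += 1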

Generated spectrogram for ../data/preprocessed/not-gunshot/not_gunshot_0_1.wav and saved to ../data/processed/not-gunshot/not_gunshot_0.png
Generated spectrogram for ../data/preprocessed/not-gunshot/not_gunshot_0_0.wav and saved to ../data/processed/not-gunshot/not_gunshot_1.png
Generated spectrogram for ../data/preprocessed/not-gunshot/not_gunshot_0_2.wav and saved to ../data/processed/not-gunshot/not_gunshot_2.png
