# Getting the Spectrogram of the Recorded Audio

In [1]:
!pip install pydub

Collecting pydub
  Downloading https://files.pythonhosted.org/packages/2f/73/bb9c093882d647437a9e6e87c7e6592d2df852f83ffac6f348b878979be0/pydub-0.23.0-py2.py3-none-any.whl
Installing collected packages: pydub
Successfully installed pydub-0.23.0


In [2]:
import os  # needed by load_raw_audio to list the raw_data folders

import matplotlib.pyplot as plt
from scipy.io import wavfile
from pydub import AudioSegment



## Spectrogram

### What is a Spectrogram?
A spectrogram is a visual representation of the spectrum of frequencies of a sound or other signal as it varies with time. When the data is represented in a 3D plot, it may be called a waterfall plot. The graph below is an example: the x-axis represents time and the y-axis represents frequency.

<img src="images/Spectrogram-19thC.png">
(Source: Wikipedia)

### Why do we need it?
Spectrograms of audio, as in our case, can be used to identify spoken words phonetically and to analyse how the frequency content changes when the trigger word is spoken in the presence of background noise. Our recorded signal (a .wav file) is a time-domain signal, while the spectrogram is a time-frequency graph. We convert the former into the latter with the Fourier transform: the signal is cut into short, overlapping windows, and the transform decomposes each window into its frequency components. A Digital Signal Processing course covers these transforms in depth, along with the inverse transforms that recover the time signal.

But if you haven't taken such a course, don't worry: Python libraries such as Matplotlib and SciPy handle the conversion for us!
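To make the time-to-frequency conversion less of a black box, here is a minimal sketch of a windowed Fourier transform written directly with NumPy. The helper name `stft_magnitude`, the Hann window, and the synthetic two-tone signal are assumptions made for illustration; they are not part of this notebook's pipeline, which relies on `plt.specgram` instead.

```python
import numpy as np

def stft_magnitude(signal, window_size, hop):
    """Magnitude of a short-time Fourier transform, shape (freq_bins, frames)."""
    window = np.hanning(window_size)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(signal) - window_size) // hop
    frames = np.stack([
        signal[i * hop : i * hop + window_size] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real-valued signal
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Synthetic signal (illustrative assumption): 1 second sampled at 8000 Hz,
# a 400 Hz tone in the first half and a 2000 Hz tone in the second half.
fs = 8000
t = np.arange(fs) / fs
signal = np.where(t < 0.5,
                  np.sin(2 * np.pi * 400 * t),
                  np.sin(2 * np.pi * 2000 * t))

spec = stft_magnitude(signal, window_size=200, hop=80)
freqs = np.fft.rfftfreq(200, d=1 / fs)  # frequency (Hz) of each row of spec

# The strongest frequency changes between an early frame and a late frame
early_peak = freqs[np.argmax(spec[:, 5])]
late_peak = freqs[np.argmax(spec[:, -5])]
print(spec.shape, early_peak, late_peak)
```

Each column of `spec` describes which frequencies are present during one short window of time, which is exactly the information a spectrogram plots.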



In [1]:
# Calculate and plot the spectrogram of a WAV audio file
def graph_spectrogram(wav_file):
    rate, data = get_wav_info(wav_file)
    nfft = 200      # Length of each window segment, in samples
    fs = 8000       # Sampling frequency passed to specgram (fixed; the file's own rate is not used)
    noverlap = 120  # Number of samples shared by consecutive windows
    # Mono data is 1-D; for stereo data of shape (samples, channels), use the first channel
    samples = data if data.ndim == 1 else data[:, 0]
    pxx, freqs, bins, im = plt.specgram(samples, NFFT=nfft, Fs=fs, noverlap=noverlap)
    return pxx
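The window length and overlap above determine the size of the returned array. The following is a back-of-the-envelope sketch, assuming a one-sided spectrum with no extra zero padding; `specgram_shape` is a hypothetical helper and the 10-second, 44100 Hz clip is an assumed example, not a property of the recordings used here.

```python
def specgram_shape(n_samples, nfft=200, noverlap=120):
    """Expected (frequency_bins, time_bins) of a one-sided spectrogram."""
    hop = nfft - noverlap                      # samples between consecutive window starts
    time_bins = (n_samples - noverlap) // hop  # number of complete windows that fit
    freq_bins = nfft // 2 + 1                  # non-negative frequencies of a real signal
    return freq_bins, time_bins

# Hypothetical clip: 10 seconds recorded at 44100 Hz
print(specgram_shape(10 * 44100))  # (101, 5511)
```

A larger `noverlap` gives more time steps for the same clip, while a larger `nfft` gives finer frequency resolution at the cost of coarser time resolution.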

In [2]:
# Load a wav file
def get_wav_info(wav_file):
    rate, data = wavfile.read(wav_file)
    return rate, data

In [3]:
# Used to standardize volume of audio clip
def match_target_amplitude(sound, target_dBFS):
    change_in_dBFS = target_dBFS - sound.dBFS
    return sound.apply_gain(change_in_dBFS)
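The arithmetic behind this function can be checked without any audio. The sketch below uses plain numbers in place of a pydub `AudioSegment`; the helper names and the example levels (-14 dBFS and -20 dBFS) are assumptions for illustration only.

```python
def gain_needed(current_dbfs, target_dbfs):
    """Gain in dB that moves a clip from its current level to the target level."""
    return target_dbfs - current_dbfs

def db_to_amplitude_ratio(gain_db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)

# Hypothetical clip measured at -14 dBFS that should be brought to -20 dBFS
gain = gain_needed(current_dbfs=-14.0, target_dbfs=-20.0)
ratio = db_to_amplitude_ratio(gain)
print(gain, round(ratio, 3))  # -6.0 0.501
```

A change of about -6 dB roughly halves the amplitude, so clips recorded at different loudness end up at a comparable level after normalization.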

In [4]:
# Load raw audio files for speech synthesis
def load_raw_audio():
    activates = []
    backgrounds = []
    negatives = []
    for filename in os.listdir("./raw_data/activates"):
        if filename.endswith("wav"):
            activate = AudioSegment.from_wav("./raw_data/activates/"+filename)
            activates.append(activate)
    for filename in os.listdir("./raw_data/backgrounds"):
        if filename.endswith("wav"):
            background = AudioSegment.from_wav("./raw_data/backgrounds/"+filename)
            backgrounds.append(background)
    for filename in os.listdir("./raw_data/negatives"):
        if filename.endswith("wav"):
            negative = AudioSegment.from_wav("./raw_data/negatives/"+filename)
            negatives.append(negative)
    return activates, negatives, backgrounds