# Creating a univariate time series out of underwater microphones

This notebook creates the `data/Whales.txt.gz` dataset, a univariate time series holding the magnitude (in decibels) of an underwater microphone signal in the frequency range between 360 and 370 Hz, where some whale vocalizations occur (see the notebook `Whale.ipynb` in this repository).

The original data is obtained from the [NOAA](https://sanctuaries.noaa.gov/science/monitoring/sound/) project using the following command (which does not work behind a firewall):

    gsutil -m cp "gs://noaa-passive-bioacoustic/pifsc/CrossSM/CSM02/audio/Cross_A_02_06*.d20.x.flac" .

The procedure is the following:

- The original signal (in the time domain) is divided into overlapping windows of 2048 samples each. The offset between consecutive window starts is 64 samples.
- For each window we compute the Fourier transform.
- The Fourier coefficients of the bins in the frequency range 360-370 Hz are summed and the magnitude of the sum is kept; the other bins are discarded.
- The resulting univariate time series is converted to decibels, using the maximum amplitude as the reference point.
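The steps above can be sketched with plain NumPy on a synthetic signal. This is only an illustration under stated assumptions: the 365 Hz test tone is made up, the windows are not padded (librosa centers and pads frames by default), and `np.hanning` stands in for librosa's default Hann window. As in the notebook's code, the complex coefficients of the selected bins are summed before taking the magnitude.

```python
import numpy as np

# Parameters taken from the notebook; the test tone below is made up.
sr = 10000          # sampling rate in Hz
n_fft = 2048        # window length in samples
hop_length = 64     # offset between window starts

# Hypothetical input: 2 seconds of a 365 Hz tone whose loudness ramps up.
t = np.arange(2 * sr) / sr
signal = np.linspace(0.01, 1.0, t.size) * np.sin(2 * np.pi * 365 * t)

# 1. Overlapping windows (no padding, unlike librosa's default centering).
starts = np.arange(0, signal.size - n_fft + 1, hop_length)
frames = np.stack([signal[s:s + n_fft] for s in starts]) * np.hanning(n_fft)

# 2. Fourier transform of each window.
spectrum = np.fft.rfft(frames, axis=1)

# 3. Keep only the bins between 360 and 370 Hz, sum them, take the magnitude.
freqs = np.fft.rfftfreq(n_fft, d=1 / sr)
focus = (freqs >= 360) & (freqs <= 370)
amplitude = np.abs(spectrum[:, focus].sum(axis=1))

# 4. Convert to decibels relative to the maximum amplitude.
dbs = 20 * np.log10(np.maximum(amplitude, 1e-10) / amplitude.max())
```

Each entry of `dbs` corresponds to one window, so the series has one value per 64 input samples.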

The resulting time series has 308941605 values, spanning almost 23 days (about 22.9) worth of data.
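As a quick sanity check of that duration, the arithmetic can be done by hand. The sketch below assumes the 64-sample offset described above and the 10000 Hz sampling rate that the notebook sets in its parameters; it ignores the half-window offset that `librosa.frames_to_time` adds, which is negligible at this scale.

```python
n_values = 308941605   # length of the series, from the text above
hop_length = 64        # samples between consecutive values
sr = 10000             # samples per second (as set in the notebook)

seconds = n_values * hop_length / sr
days = seconds / (60 * 60 * 24)
print(f"{days:.2f} days")
```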

In [13]:
import librosa
import librosa.display
import soundfile
import numpy as np
import matplotlib.pyplot as plt
import glob
import stumpy
from tqdm.notebook import tqdm
import gzip

In [2]:
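n_fft = 2048                      # window length in samples
win_length = n_fft
hop_length = win_length // 32     # 64 samples between window starts
sr = 10000                        # sampling rate (Hz) assumed for the recordings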

In [3]:
mpw = librosa.time_to_frames(1, n_fft=n_fft, hop_length=hop_length, sr=sr)

In [4]:
freqs = librosa.fft_frequencies(n_fft=n_fft, sr=sr)
freqs_focus = np.where((freqs >= 360) & (freqs <= 370))
freqs_focus, freqs[freqs_focus]

((array([74, 75]),), array([361.328125 , 366.2109375]))
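The two selected bins follow directly from the bin spacing `sr / n_fft`. The following sketch reproduces the selection without librosa, using only the parameter values set in the notebook:

```python
sr = 10000
n_fft = 2048

bin_width = sr / n_fft  # 4.8828125 Hz between consecutive FFT bins

# Bin k of a real FFT is centered at k * bin_width, for k = 0 .. n_fft // 2.
focus = [
    (k, k * bin_width)
    for k in range(n_fft // 2 + 1)
    if 360 <= k * bin_width <= 370
]
print(focus)  # [(74, 361.328125), (75, 366.2109375)]
```

Bin 73 (about 356.4 Hz) and bin 76 (about 371.1 Hz) fall just outside the range, so only two bins contribute to the series.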

In [5]:
def iter_all(pat):
    files = sorted(glob.glob(pat))
    for f in tqdm(files):
        yield soundfile.read(f)

In [6]:
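# For each file: STFT, keep the two bins in 360-370 Hz, sum them, take the magnitude.
amplitude = np.concatenate([
    np.abs(librosa.stft(signal, n_fft=n_fft, win_length=win_length, hop_length=hop_length)[freqs_focus].sum(axis=0))
    for signal, _ in iter_all('data/Cross_A_02_06*.flac')  # per-file sample rate is not used
])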

  0%|          | 0/884 [00:00<?, ?it/s]

In [10]:
dbs = librosa.amplitude_to_db(amplitude, ref=np.max)
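To make the conversion concrete, here is a minimal NumPy sketch of what this call is expected to compute. It assumes librosa's documented defaults `amin=1e-5` and `top_db=80.0`; the function name `to_db_relative_to_max` is hypothetical and not part of librosa.

```python
import numpy as np

def to_db_relative_to_max(amplitude, amin=1e-5, top_db=80.0):
    """Hypothetical stand-in for librosa.amplitude_to_db(x, ref=np.max)."""
    amplitude = np.abs(np.asarray(amplitude, dtype=float))
    ref = amplitude.max()
    # 20 * log10(amplitude / ref), with tiny amplitudes floored at amin.
    db = 20 * np.log10(np.maximum(amplitude, amin)) - 20 * np.log10(max(ref, amin))
    # Values more than top_db below the peak are clipped.
    return np.maximum(db, db.max() - top_db)

example = to_db_relative_to_max([1.0, 0.5, 0.1, 0.0])
print(example)
```

Under these assumptions the loudest value maps to 0 dB and every other value lies between -80 and 0 dB.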

In [15]:
with gzip.open('data/Whales.txt.gz', 'wb') as fp:
    np.savetxt(fp, dbs)
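Reading the file back is symmetric. The sketch below round-trips a small made-up array through a temporary file (the path and values are placeholders) and relies on `np.loadtxt` decompressing `.gz` files transparently. For the full dataset, parsing hundreds of millions of text values this way is likely to be slow and memory hungry.

```python
import gzip
import os
import tempfile

import numpy as np

# Placeholder data standing in for the real `dbs` array.
values = np.array([-12.5, -3.0, 0.0])

# Write it the same way as the notebook does, but to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "example.txt.gz")
with gzip.open(path, "wb") as fp:
    np.savetxt(fp, values)

# np.loadtxt recognizes the .gz extension and decompresses on the fly.
loaded = np.loadtxt(path)
print(loaded)
```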

In [20]:
librosa.frames_to_time(dbs.shape[0], hop_length=hop_length, n_fft=n_fft, sr=sr) / (60 * 60 * 24)

22.88456451851852