<a href="https://colab.research.google.com/github/bagustris/SERAB/blob/main/demo_serab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SERAB demo
* This demo notebook breaks down `clf_benchmark.py`, which evaluates a given pre-trained model on SERAB, to show the main steps behind the script.



In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
!git clone https://github.com/Neclow/SERAB.git

Cloning into 'SERAB'...
remote: Enumerating objects: 139, done.
remote: Counting objects: 100% (30/30), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 139 (delta 10), reused 17 (delta 5), pack-reused 109
Receiving objects: 100% (139/139), 95.35 MiB | 16.83 MiB/s, done.
Resolving deltas: 100% (48/48), done.


In [4]:
!cd SERAB/SERAB/ && curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/config.yaml && patch --ignore-whitespace < config.diff
!cd SERAB/SERAB/ && curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/train.py && patch < train.diff
!cd SERAB/SERAB/byol_a/ && curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/byol_a/augmentations.py && patch < augmentations.diff
!cd SERAB/SERAB/byol_a/ && curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/byol_a/common.py && patch < common.diff
!cd SERAB/SERAB/byol_a/ && curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/byol_a/dataset.py && patch < dataset.diff
!cd SERAB/SERAB/byol_a/ && curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/byol_a/models.py && mv models.py models/audio_ntt.py

patching file config.yaml
patching file train.py
patching file augmentations.py

## Install libraries

In [5]:
!apt-get install libsox-fmt-all libsox-dev sox > /dev/null
!pip install -q opensmile

!pip install -q pytorch-lightning==1.4.0 torchmetrics==0.6.0
!pip install -q torch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 torchtext==0.10.0
!pip install -q einops==0.4.0
!pip install -q pydub==0.25.1
!pip install -q tensorflow_datasets==4.3.0
!pip install -q librosa==0.8.1


## Load imports and config variables

In [6]:
import sys

SERAB_PATH = 'SERAB/SERAB/'
sys.path.append(SERAB_PATH)

In [7]:
import collections
import os

import librosa
import numpy as np
import pandas as pd
import tensorflow_datasets as tfds
import tensorflow_hub as hub
import torch

from pytorch_lightning.utilities.seed import seed_everything
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.metrics import recall_score
from torchaudio.transforms import MelSpectrogram

from byol_a.common import load_yaml_config
from byol_a.augmentations import PrecomputedNorm
from clf_benchmark import dat_from_split, get_sklearn_models
from settings import CLF_STATS_DICT, RANDOM_SEED, REQUIRED_SAMPLE_RATE
from utils import compute_norm_stats, generate_embeddings, load_model, save_results, speaker_normalization

In [8]:
seed_everything(RANDOM_SEED)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
cfg = load_yaml_config(f'{SERAB_PATH}/config.yaml') 
to_melspec = MelSpectrogram(
    sample_rate=cfg.sample_rate,
    n_fft=cfg.n_fft,
    win_length=cfg.win_length,
    hop_length=cfg.hop_length,
    n_mels=cfg.n_mels,
    f_min=cfg.f_min,
    f_max=cfg.f_max,
)
results = {}

INFO:pytorch_lightning.utilities.seed:Global seed set to 42
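The `MelSpectrogram` transform above turns each waveform into an `n_mels x n_frames` matrix. The number of frames follows from the STFT framing: with `center=True` (torchaudio's default), the signal is padded by `n_fft // 2` samples on each side, and a window of `n_fft` samples is taken every `hop_length` samples. The helper below is a sketch of that count. The values used with it (16 kHz audio, `n_fft=1024`, `hop_length=160`, 0.95 s crops, 64 mel bins) are assumptions based on BYOL-A's default `config.yaml`, so check them against `cfg` before relying on them.

```python
def num_frames(num_samples: int, hop_length: int, n_fft: int = 1024, center: bool = True) -> int:
    """Number of STFT frames produced for a clip of `num_samples` samples."""
    # center=True pads n_fft // 2 samples on both sides of the signal.
    padded = num_samples + 2 * (n_fft // 2) if center else num_samples
    # One frame per hop, as long as a full n_fft window still fits.
    return 1 + (padded - n_fft) // hop_length


print(num_frames(15200, 160))  # 96  (0.95 s at 16 kHz)
print(num_frames(16000, 160))  # 101 (1 s at 16 kHz)
```

Under those assumed settings, a 0.95 s crop gives a 64 x 96 log-mel patch, which would be consistent with the `64x96` tag in the checkpoint name loaded later.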


## Load data
* To test a model on a dataset other than CREMA-D, you need to download and build that dataset first.
    * An example with emoDB is shown below. Uncomment the lines marked `UNCOMMENT CODE HERE` to build the emoDB dataset.
        * You will need to create the subfolders `downloads/manual` under `tensorflow_datasets` and place the compressed emoDB archive there.
* For this demo, we use CREMA-D, which is available as a built-in TFDS dataset.
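The folder layout described above (`downloads/manual` under the TFDS data directory) can also be created from Python. Below is a minimal sketch with `pathlib`. The function name `make_tfds_manual_dir` is hypothetical, and the data directory is assumed to be the default `~/tensorflow_datasets`, which is `/root/tensorflow_datasets` in Colab and is the same folder the next cell creates with `mkdir -p`.

```python
from pathlib import Path


def make_tfds_manual_dir(data_dir: Path) -> Path:
    """Create <data_dir>/downloads/manual, like `mkdir -p`, and return its path."""
    manual_dir = Path(data_dir) / "downloads" / "manual"
    # parents=True creates missing intermediate folders;
    # exist_ok=True makes re-running the notebook harmless.
    manual_dir.mkdir(parents=True, exist_ok=True)
    return manual_dir
```

For example, `make_tfds_manual_dir(Path.home() / "tensorflow_datasets")` returns the folder where the compressed emoDB archive should be copied.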

In [9]:
# Create tensorflow_datasets folders
!mkdir -p ../root/tensorflow_datasets/downloads/manual

## UNCOMMENT CODE HERE 
## Copy compressed dataset files into downloads/manual
# !cp -n drive/MyDrive/SERAB/tensorflow_datasets/downloads/manual/*.zip ../root/tensorflow_datasets/downloads/manual/

## Copy the emoDB TFDS scripts and build the emoDB dataset
# !cp -r -n SERAB/tensorflow_datasets/emoDB ../root/tensorflow_datasets/emoDB
# !cd ../root/tensorflow_datasets/emoDB && tfds build emoDB

In [10]:
dataset_name = 'crema_d'

In [11]:
model_selection = 'predefined'

SingleSplit = collections.namedtuple('SingleSplit', ['audio', 'labels', 'speaker_id'])
Data = collections.namedtuple('Data', ['train', 'validation', 'test'])
all_data = Data(
    train=SingleSplit(*dat_from_split(dataset_name, 'train')),
    validation=SingleSplit(*dat_from_split(dataset_name, 'validation')),
    test=SingleSplit(*dat_from_split(dataset_name, 'test'))
)

orig_sr = tfds.builder(dataset_name).info.features['audio'].sample_rate
num_classes = len(np.unique(all_data.train.labels))

Downloading and preparing dataset 579.25 MiB (download: 579.25 MiB, generated: 1.65 GiB, total: 2.21 GiB) to /root/tensorflow_datasets/crema_d/1.0.0...

Dataset crema_d downloaded and prepared to /root/tensorflow_datasets/crema_d/1.0.0. Subsequent calls will reuse this data.
Finished train
Finished validation
Finished test
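`all_data` groups the three splits in nested named tuples, so fields are accessed as `all_data.train.labels`, `all_data.test.speaker_id`, and so on. Before training classifiers it can help to check how many clips each emotion has per split, since the UAR metric reported below is designed to be insensitive to class imbalance while accuracy is not. Here is a sketch using the same `SingleSplit` layout on hypothetical toy labels; `class_counts` and `count_classes` are illustrative names, not part of SERAB.

```python
import collections

# Same layout as the notebook's container for one split.
SingleSplit = collections.namedtuple('SingleSplit', ['audio', 'labels', 'speaker_id'])


def class_counts(split: SingleSplit) -> dict:
    """Map each label to the number of clips carrying it, sorted by label."""
    return dict(sorted(collections.Counter(split.labels).items()))


def count_classes(split: SingleSplit) -> int:
    """Same result as len(np.unique(split.labels)) for a 1-D label sequence."""
    return len(set(split.labels))


# Hypothetical toy split: 5 clips, 3 emotion labels, 2 speakers.
toy = SingleSplit(audio=[None] * 5, labels=[0, 1, 1, 2, 2], speaker_id=[7, 7, 8, 8, 8])
print(class_counts(toy))   # {0: 1, 1: 2, 2: 2}
print(count_classes(toy))  # 3
```

On the real data, `class_counts(all_data.train)` would give the per-emotion counts of the training split.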


In [12]:
# Load data statistics
try:
    stats = CLF_STATS_DICT[dataset_name]
except KeyError:
    print(f'Did not find mean/std stats for {dataset_name}.')
    stats = compute_norm_stats(dataset_name, all_data.train.audio, orig_sr, to_melspec)

    CLF_STATS_DICT[dataset_name] = stats

    print(CLF_STATS_DICT)
normalizer = PrecomputedNorm(stats)
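The statistics are computed from the training split only, so nothing from validation or test leaks into the normalization. As we understand BYOL-A's `PrecomputedNorm`, `stats` is a `(mean, std)` pair of scalars taken over all log-mel values, and each spectrogram is standardized as `(X - mean) / std`. The numpy sketch below reproduces that idea on hypothetical arrays. SERAB's `compute_norm_stats` may differ in details, for instance in which clips or crops it uses.

```python
import numpy as np


def compute_mean_std(log_mels):
    """Global mean and std over every bin of every training spectrogram.

    log_mels: list of arrays shaped (n_mels, n_frames); lengths may differ.
    """
    values = np.concatenate([m.ravel() for m in log_mels])
    return float(values.mean()), float(values.std())


def precomputed_norm(log_mel, stats):
    """Standardize one spectrogram with fixed, precomputed (mean, std)."""
    mean, std = stats
    return (log_mel - mean) / std


# Hypothetical "training" spectrograms with 2 mel bins and different lengths.
train_specs = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([[5.0], [6.0]])]
stats = compute_mean_std(train_specs)
print(stats[0])  # 3.5
```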

## Load the model
* For this demo, we use "BYOL-S", a BYOL-A model re-trained on the speech samples of AudioSet, here with 1024-dimensional embeddings (the `default1024` checkpoint loaded below).
    * More information on the models can be found in the preprint (https://arxiv.org/abs/2110.03414).

In [13]:
model_name = 'default'
ckpt_folder = f'{SERAB_PATH}/checkpoints/'

model, weight_file = load_model(model_name, cfg, device, ckpt_folder)

print(weight_file)

SERAB/SERAB/checkpoints/default1024_BYOLAs64x96-2107292000-e100-bs256-lr0003-rs42.pth


In [14]:
# Generate embeddings
embeddings = Data(
    train=generate_embeddings(
        model,
        model_name,
        all_data.train.audio,
        'train',
        orig_sr,
        to_melspec,
        normalizer,
        device
    ),
    validation=generate_embeddings(
        model,
        model_name,
        all_data.validation.audio,
        'validation',
        orig_sr,
        to_melspec,
        normalizer,
        device
    ),
    test=generate_embeddings(
        model,
        model_name,
        all_data.test.audio,
        'test',
        orig_sr,
        to_melspec,
        normalizer,
        device
    )
)

Generating embeddings for train: 100%|##########| 5144/5144 [00:17<00:00, 296.36it/s]


Finished train


Generating embeddings for validation: 100%|##########| 738/738 [00:02<00:00, 294.89it/s]


Finished validation


Generating embeddings for test: 100%|##########| 1556/1556 [00:05<00:00, 309.24it/s]

Finished test
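`generate_embeddings` maps every clip, whatever its duration, to one fixed-size vector (1024 values for this checkpoint, as the shapes printed in the next cell show). In BYOL-A's `AudioNTT2020` encoder, as far as we know, the frame-level features are collapsed over time by adding max pooling and mean pooling. The numpy sketch below shows only that pooling step on a hypothetical `(n_frames, dim)` feature matrix. The convolutional encoder, the resampling, and the normalization that run before it are not reproduced.

```python
import numpy as np


def pool_over_time(frame_features: np.ndarray) -> np.ndarray:
    """Collapse (n_frames, dim) frame features into a single (dim,) embedding.

    Max pooling keeps each dimension's strongest activation and mean pooling
    its average; their sum has the same size for clips of any length.
    """
    return frame_features.max(axis=0) + frame_features.mean(axis=0)


frames = np.array([[1.0, 0.0],
                   [3.0, 2.0]])   # 2 frames, 2 feature dimensions
print(pool_over_time(frames))      # [5. 3.]
```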





In [15]:
print(embeddings.train.mean(), embeddings.train.std())
print(embeddings.validation.mean(), embeddings.validation.std())
print(embeddings.test.mean(), embeddings.test.std())
print(embeddings.train.shape)
print(embeddings.validation.shape)
print(embeddings.test.shape)

# Load classifiers
log_list, estimator_list, param_list = get_sklearn_models()

4.344952 4.3818917
4.3764834 4.4149346
4.289814 4.375481
(5144, 1024)
(738, 1024)
(1556, 1024)


In [16]:
# Speaker normalization: standardize the embeddings of each speaker separately
# Global standardization (e.g. sklearn's StandardScaler) could be used instead and should not change the results much
normalized_train = speaker_normalization(embeddings.train, all_data.train.speaker_id)
normalized_validation = speaker_normalization(embeddings.validation, all_data.validation.speaker_id)
normalized_test = speaker_normalization(embeddings.test, all_data.test.speaker_id)

# Stack train and validation embeddings, labels, and speaker IDs (train rows first, then validation rows)
normalized_train = np.append(normalized_train, normalized_validation, axis=0)
labels_train = np.append(all_data.train.labels, all_data.validation.labels, axis=0)
speaker_id_train = np.append(all_data.train.speaker_id, all_data.validation.speaker_id, axis=0)
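Speaker normalization is meant to reduce per-speaker offsets in the embeddings by standardizing each dimension using only that speaker's clips. Below is a numpy sketch of that idea. The name `speaker_standardize` and the small `eps` guard are our own, and SERAB's `speaker_normalization` may implement it differently (for example with one `StandardScaler` per speaker).

```python
import numpy as np


def speaker_standardize(embeddings, speaker_ids, eps=1e-8):
    """Z-score every embedding dimension separately within each speaker."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    speaker_ids = np.asarray(speaker_ids)
    normalized = np.empty_like(embeddings)
    for speaker in np.unique(speaker_ids):
        rows = speaker_ids == speaker
        mean = embeddings[rows].mean(axis=0)
        std = embeddings[rows].std(axis=0)
        # eps avoids division by zero for constant dimensions or single-clip speakers.
        normalized[rows] = (embeddings[rows] - mean) / (std + eps)
    return normalized


# Two hypothetical speakers whose clips are interleaved.
emb = [[1.0, 10.0], [100.0, 0.0], [3.0, 30.0], [102.0, 0.0]]
spk = ['a', 'b', 'a', 'b']
# Expected values (rounded): [[-1, -1], [-1, 0], [1, 1], [1, 0]]
print(np.round(speaker_standardize(emb, spk), 3))
```

Although speaker `b` sits around 100 in the first dimension and speaker `a` around 2, both end up on the same scale after normalization.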

In [17]:
for i, (estimator_name, estimator, param_grid) in enumerate(zip(log_list, estimator_list, param_list)):
    print(f'Step {i+1}/{len(estimator_list)}: {estimator_name}...')
    if model_selection == 'predefined':
        split_indices = np.repeat([-1, 0], [embeddings.train.shape[0], embeddings.validation.shape[0]])
        split = PredefinedSplit(split_indices)
        clf = GridSearchCV(
            estimator,
            param_grid,
            cv=split,
            n_jobs=-1,
            refit=True,
            verbose=0
        )

    else:
        clf = estimator

    clf.fit(normalized_train, labels_train)
    test_acc = clf.score(normalized_test, all_data.test.labels)
    test_uar = recall_score(all_data.test.labels, clf.predict(normalized_test), average='macro')

    results[estimator_name] = {
        'test_acc': test_acc,
        'test_uar': test_uar
    }

    print('Done')

results_df = pd.DataFrame(results).apply(lambda x: round(x * 100, 1))

results_df = results_df[results_df.loc['test_acc'].idxmax()].to_frame()
print(results_df)
results_df = results_df.rename(columns={results_df.columns[0]: dataset_name})

Step 1/5: LDA...
Done
Step 2/5: LR...
Done
Step 3/5: QDA...




Done
Step 4/5: RF...
Done
Step 5/5: SVC...
Done
           SVC
test_acc  75.3
test_uar  75.4
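With `model_selection = 'predefined'`, `GridSearchCV` scores every hyperparameter combination on a single train/validation split. Rows marked `-1` in `split_indices` (the training embeddings) are always used for fitting, and rows marked `0` (the validation embeddings, stacked after them) are used for scoring. Because `refit=True`, the best configuration is then refitted on train plus validation before the test evaluation. The helper below mimics how scikit-learn's `PredefinedSplit` turns such an array into folds. It is an illustrative re-implementation, not the library's code.

```python
import numpy as np


def predefined_folds(test_fold):
    """Yield (train_idx, val_idx) pairs defined by a predefined fold array.

    Samples marked -1 are never used for validation; every other value k
    defines one fold whose validation set is the samples marked k.
    """
    test_fold = np.asarray(test_fold)
    for k in np.unique(test_fold[test_fold != -1]):
        yield np.flatnonzero(test_fold != k), np.flatnonzero(test_fold == k)


# Toy version of split_indices: 4 training rows followed by 2 validation rows.
split_indices = np.repeat([-1, 0], [4, 2])
for train_idx, val_idx in predefined_folds(split_indices):
    print(train_idx, val_idx)   # [0 1 2 3] [4 5]
```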


In [18]:
results_df

          crema_d
test_acc     75.3
test_uar     75.4
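`test_uar` is the unweighted average recall: the recall of each emotion class, averaged with equal weight per class, which is what `recall_score(..., average='macro')` computes. Unlike accuracy, it cannot be inflated by favoring the most frequent class. Below is a minimal pure-Python sketch for comparison; the function names are illustrative. It averages over the classes present in `y_true`, whereas scikit-learn also includes labels that appear only in the predictions.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of clips whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean per-class recall over the classes that occur in y_true."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# A classifier that always predicts the majority class 0.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0]
print(accuracy(y_true, y_pred))                   # 0.6666666666666666
print(unweighted_average_recall(y_true, y_pred))  # 0.5
```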


In [19]:
filename = os.path.splitext(os.path.basename(weight_file))[0] if weight_file else model_name
results_folder = f'{SERAB_PATH}/clf_results/'
save_results(filename + '.csv', results_df, results_folder)

File default1024_BYOLAs64x96-2107292000-e100-bs256-lr0003-rs42.csv does not exist yet. Creating a new results file.


In [20]:
pd.read_csv(os.path.join(results_folder, filename + '.csv'), index_col=0)

          crema_d
test_acc     75.3
test_uar     75.4


In [21]:
!python --version

Python 3.8.15


In [22]:
import tensorflow as tf
tf.__version__

'2.9.2'
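The notebook records the environment with `python --version`, `tf.__version__`, and `pip list`. A more compact option is to collect only the packages pinned in the install cell and store them next to the results. Below is a sketch using `importlib.metadata`, which is in the standard library from Python 3.8 (the version this runtime reports); `package_versions` is a hypothetical helper name.

```python
import platform
from importlib import metadata


def package_versions(names):
    """Return {distribution name: installed version, or None if missing}."""
    versions = {'python': platform.python_version()}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


pinned = ['torch', 'torchaudio', 'pytorch-lightning', 'torchmetrics',
          'tensorflow-datasets', 'librosa', 'einops', 'pydub']
print(package_versions(pinned))
```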

In [1]:
!pip list

Package                       Version
----------------------------- ----------------------
absl-py                       1.3.0
aeppl                         0.0.33
aesara                        2.7.9
aiohttp                       3.8.3
aiosignal                     1.3.1
alabaster                     0.7.12
albumentations                1.2.1
altair                        4.2.0
appdirs                       1.4.4
arviz                         0.12.1
astor                         0.8.1
astropy                       4.3.1
astunparse                    1.6.3
async-timeout                 4.0.2
asynctest                     0.13.0
atari-py                      0.2.9
atomicwrites                  1.4.1
attrs                         22.1.0
audioread                     3.0.0
autograd                      1.5
Babel                         2.11.0
backcall                      0.2.0
beautifulsoup4                4.6.3
bleach                        5.0.1
blis                          0.7.9
bokeh

In [2]:
!pwd

/content

