## Drum Kit Sounds: Audio Classification

Dataset Source: https://www.kaggle.com/datasets/anubhavchhabra/drum-kit-sound-samples

#### Install Missing Libraries

In [19]:
%pip install datasets transformers librosa evaluate

Note: you may need to restart the kernel to use updated packages.


#### Import Necessary Libraries

In [20]:
import os, sys, random
os.environ['TOKENIZERS_PARALLELISM']='false'

import numpy as np

import datasets
# datasets.Audio is not imported by name: IPython.display.Audio (imported below) would shadow it
from datasets import load_dataset, DatasetDict

import transformers
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification
from transformers import TrainingArguments, Trainer

import evaluate

from IPython.display import Audio, display

#### Display Library Versions

In [21]:
print("Python:".rjust(15), sys.version.split()[0])
print("NumPy:".rjust(15), np.__version__)
print("Datasets:".rjust(15), datasets.__version__)
print("Transformers:".rjust(15), transformers.__version__)
print("Evaluate:".rjust(15), evaluate.__version__)

        Python: 3.9.12
         NumPy: 1.23.3
      Datasets: 2.8.0
  Transformers: 4.25.1
      Evaluate: 0.2.2


#### Ingest Dataset

In [22]:
audio_data = load_dataset("/Users/briandunn/Desktop/Vit_Image_Datasets/Audio Data/Drum Kit Sound Samples/drums", 
                          name="en-US", 
                          split="train")

print(len(audio_data))

Using custom data configuration drums-5275f1687a56f823
Found cached dataset audiofolder (/Users/briandunn/.cache/huggingface/datasets/audiofolder/drums-5275f1687a56f823/0.0.0/6cbdd16f8688354c63b4e2a36e1585d05de285023ee6443ffd71c4182055c0fc)


160


#### Split Dataset into Training & Testing Datasets

In [23]:
# Split the dataset 80/20 into training and evaluation sets
audio_data_split = audio_data.train_test_split(test_size=0.20)

ds = DatasetDict({
    'train' : audio_data_split['train'],
    'eval' : audio_data_split['test']
})
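
The split above is random and unseeded, so the class balance of the 32 evaluation clips can change from run to run. The following is a minimal sketch of a seeded, stratified 80/20 split in plain Python. `stratified_split` and the toy label list are hypothetical names used only for illustration, and the even 40-clips-per-class balance is an assumption, not something the notebook reports. Recent versions of `datasets` expose `seed` and `stratify_by_column` arguments on `train_test_split` for the same purpose.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.20, seed=42):
    """Return (train_idx, test_idx) with every class split in the same proportion."""
    rng = random.Random(seed)

    # Group sample indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy stand-in for the 160 drum clips: 4 classes, 40 clips each (assumed balance)
toy_labels = [i % 4 for i in range(160)]
train_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(test_idx))
```

With a fixed seed the same clips land in the evaluation set on every run, which makes metric comparisons between runs more meaningful.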

#### Information about Training & Testing Datasets

In [24]:
print("Training Dataset")
print("Training Dataset Info: ", ds['train'])
print("First Sample in Training Dataset", ds['train'][0])
print("Last Sample in Training Dataset", ds['train'][-1])
print("Unique Values in Label/Class: ", ds['train'].unique("label"))

print("\n\nEvaluation Dataset")
print("Evaluation Dataset Info: ", ds['eval'])
print("First Sample in Evaluation Dataset", ds['eval'][0])
print("Last Sample in Evaluation Dataset", ds['eval'][-1])
print("Unique Values in Label/Class: ", ds['eval'].unique("label"))

Training Dataset
Training Dataset Info:  Dataset({
    features: ['audio', 'label'],
    num_rows: 128
})
First Sample in Training Dataset {'audio': {'path': '/Users/briandunn/Desktop/Vit_Image_Datasets/Audio Data/Drum Kit Sound Samples/drums/toms/Tom Sample 25.wav', 'array': array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 1.9073486e-06,
       1.6093254e-06, 1.4305115e-06], dtype=float32), 'sampling_rate': 44100}, 'label': 3}
Last Sample in Training Dataset {'audio': {'path': '/Users/briandunn/Desktop/Vit_Image_Datasets/Audio Data/Drum Kit Sound Samples/drums/snare/Snare Sample 15.wav', 'array': array([ 4.7063828e-04, -1.5287995e-03, -2.3927331e-02, ...,
       -1.9967556e-05, -1.8537045e-05, -1.6987324e-05], dtype=float32), 'sampling_rate': 44100}, 'label': 2}


Unique Values in Label/Class:  [3, 2, 1, 0]


Evaluation Dataset
Evaluation Dataset Info:  Dataset({
    features: ['audio', 'label'],
    num_rows: 32
})
First Sample in Evaluation Dataset {'audio': {'path': '/Users/briandunn/Desktop/Vit_Image_Datasets/Audio Data/Drum Kit Sound Samples/drums/toms/Tom Sample 33.wav', 'array': array([ 0.        ,  0.        ,  0.        , ..., -0.00013328,
       -0.00015301, -0.00014514], dtype=float32), 'sampling_rate': 44100}, 'label': 3}
Last Sample in Evaluation Dataset {'audio': {'path': '/Users/briandunn/Desktop/Vit_Image_Datasets/Audio Data/Drum Kit Sound Samples/drums/toms/Tom Sample 34.wav', 'array': array([ 0.000000e+00,  0.000000e+00,  0.000000e+00, ...,  2.014637e-05,
        1.013279e-06, -1.758337e-05], dtype=float32), 'sampling_rate': 44100}, 'label': 3}


Unique Values in Label/Class:  [3, 1, 0, 2]


#### Create Dictionaries to Convert Labels Between Strings & Integers

In [25]:
labels = ds["train"].features["label"].names

print(labels)

label2id, id2label = dict(), dict()

for i, label in enumerate(labels):
    label2id[label] = str(i)
    id2label[str(i)] = label

['kick', 'overheads', 'snare', 'toms']
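
The loop above stores ids as strings on both sides of the mapping, while the dataset's `label` column holds integers. A small sketch of the round trip, using the label list printed above, shows why a lookup needs a `str(...)` cast:

```python
labels = ['kick', 'overheads', 'snare', 'toms']

# Same mappings as the loop in the notebook, written as comprehensions
label2id = {label: str(i) for i, label in enumerate(labels)}
id2label = {str(i): label for i, label in enumerate(labels)}

# Dataset labels are ints, so cast to str before looking up the class name
example_label = 3
print(id2label[str(example_label)])

# Going the other way returns a string id; cast back to int if a number is needed
print(int(label2id["snare"]))
```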


#### Display Some Examples with Ability to Listen to Them

In [26]:
for _ in range(5):
    rand_idx = random.randint(0, len(ds["train"])-1)
    example = ds["train"][rand_idx]
    audio = example["audio"]
    
    print(f'Label: {id2label[str(example["label"])]}')
    print(f'Shape: {audio["array"].shape}, sampling rate: {audio["sampling_rate"]}')
    display(Audio(audio["array"], rate=audio["sampling_rate"]))
    print()

Label: toms
Shape: (88200,), sampling rate: 44100



Label: toms
Shape: (88200,), sampling rate: 44100



Label: toms
Shape: (88200,), sampling rate: 44100



Label: snare
Shape: (88200,), sampling rate: 44100



Label: kick
Shape: (88200,), sampling rate: 44100





#### Remember to Install git lfs & Enter HuggingFace Access Token

In [27]:
!git lfs install

# HuggingFace Access Token ...

Git LFS initialized.


#### Basic Values/Constants

In [28]:
MODEL_CKPT = "facebook/wav2vec2-base"
MODEL_NAME = MODEL_CKPT.split("/")[-1] + "-Drum_Kit_Sounds"

NUM_OF_EPOCHS = 12
LEARNING_RATE = 3e-5

BATCH_SIZE = 32
STRATEGY = "epoch"

#### Set Sample Rate

In [29]:
sampling_rate = ds["train"].features["audio"].sampling_rate
sampling_rate

#### Instantiate Instance of Feature Extractor

In [30]:
feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_CKPT)

loading configuration file preprocessor_config.json from cache at /Users/briandunn/.cache/huggingface/hub/models--facebook--wav2vec2-base/snapshots/0b5b8e868dd84f03fd87d01f9c4ff0f080fecfe8/preprocessor_config.json
loading configuration file config.json from cache at /Users/briandunn/.cache/huggingface/hub/models--facebook--wav2vec2-base/snapshots/0b5b8e868dd84f03fd87d01f9c4ff0f080fecfe8/config.json
Model config Wav2Vec2Config {
  "_name_or_path": "facebook/wav2vec2-base",
  "activation_dropout": 0.0,
  "adapter_kernel_size": 3,
  "adapter_stride": 2,
  "add_adapter": false,
  "apply_spec_augment": true,
  "architectures": [
    "Wav2Vec2ForPreTraining"
  ],
  "attention_dropout": 0.1,
  "bos_token_id": 1,
  "classifier_proj_size": 256,
  "codevector_dim": 256,
  "contrastive_logits_temperature": 0.1,
  "conv_bias": false,
  "conv_dim": [
    512,
    512,
    512,
    512,
    512,
    512,
    512
  ],
  "conv_kernel": [
    10,
    3,
    3,
    3,
    3,
    2,
    2
  ],
  "conv_st

#### Define function to Preprocess Data

In [31]:
def preprocess_function(examples):
    '''
    Prepare a batch for the model: run the feature extractor
    on the raw audio arrays and truncate each clip to at most
    max_duration seconds of samples at the extractor's rate.

    Note: the clips are not resampled here, so they should
    already be at feature_extractor.sampling_rate.
    '''
    max_duration = 1.0 # seconds
    audio_arrays = [x["array"] for x in examples["audio"]]
    inputs = feature_extractor(audio_arrays,
                               sampling_rate=feature_extractor.sampling_rate,
                               max_length=int(feature_extractor.sampling_rate * max_duration),
                               truncation=True)
    return inputs

encoded_audio = ds.map(preprocess_function, remove_columns="audio", batched=True)
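
The dataset printout earlier reports a sampling rate of 44,100 Hz, while `preprocess_function` sizes `max_length` from `feature_extractor.sampling_rate`. Assuming the extractor's rate is 16,000 Hz (the rate wav2vec2-base was pretrained on; the notebook never prints it, so treat this as an assumption), the arithmetic below sketches how much of each clip survives truncation with and without resampling. `clip_seconds` is a hypothetical helper.

```python
def clip_seconds(max_length_samples, actual_rate_hz):
    """Duration covered by `max_length_samples` at the clip's true sampling rate."""
    return max_length_samples / actual_rate_hz


model_rate = 16_000                   # assumed feature extractor rate
clip_rate = 44_100                    # rate reported for the decoded drum clips
max_length = int(model_rate * 1.0)    # same formula as preprocess_function

print(round(clip_seconds(max_length, clip_rate), 3))    # clips left at 44.1 kHz
print(round(clip_seconds(max_length, model_rate), 3))   # clips resampled to 16 kHz

# One way to resample before calling ds.map, using the datasets API:
# ds = ds.cast_column("audio", datasets.Audio(sampling_rate=16_000))
```

Without resampling, the model would see roughly the first third of a second of each clip, and at a pitch it was not pretrained on. Casting the column first would make `max_duration` mean what its name says.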

#### Define Metrics Evaluation Function 

In [32]:
# Load each metric once instead of on every evaluation pass
accuracy_metric = evaluate.load("accuracy")
f1_score_metric = evaluate.load("f1")
recall_metric = evaluate.load("recall")
precision_metric = evaluate.load("precision")

def compute_metrics(p):
    '''
    Compute accuracy plus weighted, micro, and macro averaged
    F1, recall, and precision for one evaluation pass.
    '''
    preds = np.argmax(p.predictions, axis=1)
    refs = p.label_ids

    results = {"accuracy" : accuracy_metric.compute(predictions=preds, references=refs)["accuracy"]}

    # (display name, metric object, key in the metric's result dict)
    averaged_metrics = [("F1", f1_score_metric, "f1"),
                        ("Recall", recall_metric, "recall"),
                        ("Precision", precision_metric, "precision")]

    for name, metric, key in averaged_metrics:
        for average in ("weighted", "micro", "macro"):
            score = metric.compute(predictions=preds, references=refs, average=average)[key]
            results[f"{average.capitalize()} {name}"] = score

    return results
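
The three averaging modes requested above weight the classes differently. The following plain-Python sketch (not the `evaluate` implementation; `f1_averages` and the toy labels are hypothetical) shows the computation for single-label predictions. It also illustrates why micro F1 always equals accuracy in this setting, which is the pattern visible in the training logs.

```python
from collections import Counter


def f1_averages(y_true, y_pred):
    """Return micro, macro, and weighted F1 for single-label predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    tp_total = fp_total = fn_total = 0

    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
        tp_total, fp_total, fn_total = tp_total + tp, fp_total + fp, fn_total + fn

    # Micro: pool the counts over all classes, then compute one F1
    micro = 2 * tp_total / (2 * tp_total + fp_total + fn_total)
    # Macro: unweighted mean of the per-class F1 scores
    macro = sum(per_class.values()) / len(classes)
    # Weighted: per-class F1 weighted by the number of true samples in each class
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    return {"micro": micro, "macro": macro, "weighted": weighted}


y_true = [0, 0, 0, 1, 1, 2, 3, 3]
y_pred = [0, 0, 1, 1, 1, 2, 3, 0]
scores = f1_averages(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(scores, accuracy)
```

Because every misclassified sample counts as one false positive and one false negative, the pooled micro score reduces to the fraction of correct predictions. Macro and weighted F1 are the more informative numbers when classes are unevenly represented in a 32-clip evaluation set.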

#### Instantiate Model

In [33]:
num_of_labels = len(id2label)

model = AutoModelForAudioClassification.from_pretrained(MODEL_CKPT, 
                                                        num_labels=num_of_labels, 
                                                        label2id=label2id,
                                                        id2label= id2label)

loading configuration file config.json from cache at /Users/briandunn/.cache/huggingface/hub/models--facebook--wav2vec2-base/snapshots/0b5b8e868dd84f03fd87d01f9c4ff0f080fecfe8/config.json
Model config Wav2Vec2Config {
  "_name_or_path": "facebook/wav2vec2-base",
  "activation_dropout": 0.0,
  "adapter_kernel_size": 3,
  "adapter_stride": 2,
  "add_adapter": false,
  "apply_spec_augment": true,
  "architectures": [
    "Wav2Vec2ForPreTraining"
  ],
  "attention_dropout": 0.1,
  "bos_token_id": 1,
  "classifier_proj_size": 256,
  "codevector_dim": 256,
  "contrastive_logits_temperature": 0.1,
  "conv_bias": false,
  "conv_dim": [
    512,
    512,
    512,
    512,
    512,
    512,
    512
  ],
  "conv_kernel": [
    10,
    3,
    3,
    3,
    3,
    2,
    2
  ],
  "conv_stride": [
    5,
    2,
    2,
    2,
    2,
    2,
    2
  ],
  "ctc_loss_reduction": "sum",
  "ctc_zero_infinity": false,
  "diversity_loss_weight": 0.1,
  "do_stable_layer_norm": false,
  "eos_token_id": 2,
  "fe

#### Define Training Arguments

In [34]:
args = TrainingArguments(
    output_dir=MODEL_NAME,
    evaluation_strategy=STRATEGY,
    num_train_epochs=NUM_OF_EPOCHS,
    save_strategy=STRATEGY,
    logging_strategy=STRATEGY,
    learning_rate=LEARNING_RATE,
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=BATCH_SIZE,
    warmup_ratio=0.10,
    load_best_model_at_end=True,
    logging_first_step=True,
    hub_private_repo=True,
    push_to_hub=True
)

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
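
With `warmup_ratio=0.10` and the `Trainer`'s default linear scheduler, the learning rate ramps up for the first few optimizer steps and then decays linearly to zero. The sketch below reproduces that arithmetic in plain Python for this run's settings (128 training clips, batch size 32, 12 epochs). `linear_schedule_lr` is a hypothetical helper written for illustration, not a `transformers` function.

```python
import math


def linear_schedule_lr(step, base_lr=3e-5, total_steps=48, warmup_ratio=0.10):
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)   # ceil(4.8) = 5 for this run
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0, total_steps - step) / (total_steps - warmup_steps)


steps_per_epoch = math.ceil(128 / 32)   # 4 optimizer steps per epoch
total_steps = steps_per_epoch * 12      # 48 optimizer steps over 12 epochs

for step in (1, 4, 8, 48):
    print(step, linear_schedule_lr(step, total_steps=total_steps))
```

The values for steps 1, 4, 8, and 48 line up with the `learning_rate` entries in the training log (6e-06, 2.4e-05, about 2.79e-05, and 0.0). With only 48 steps in total, the peak rate of 3e-5 is held for a very short time.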


#### Define Trainer

In [35]:
trainer = Trainer(
    model = model,
    args = args,
    train_dataset = encoded_audio["train"],
    eval_dataset = encoded_audio["eval"],
    tokenizer = feature_extractor,
    compute_metrics = compute_metrics
)

/Users/briandunn/Documents/nlpnn/Audio Projects/wav2vec2-base-Drum_Kit_Sounds is already a clone of https://huggingface.co/DunnBC22/wav2vec2-base-Drum_Kit_Sounds. Make sure you pull the latest changes with `repo.git_pull()`.


#### Train Model

In [36]:
trainer.train()

***** Running training *****
  Num examples = 128
  Num Epochs = 12
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 48
  Number of trainable parameters = 94569604




{'loss': 1.3588, 'learning_rate': 6e-06, 'epoch': 0.25}


***** Running Evaluation *****
  Num examples = 32
  Batch size = 32


{'loss': 1.3743, 'learning_rate': 2.4e-05, 'epoch': 1.0}


Saving model checkpoint to wav2vec2-base-Drum_Kit_Sounds/checkpoint-4
Configuration saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-4/config.json


{'eval_loss': 1.3632267713546753, 'eval_accuracy': 0.5625, 'eval_Weighted F1': 0.5801282051282051, 'eval_Micro F1': 0.5625, 'eval_Macro F1': 0.5677655677655677, 'eval_Weighted Recall': 0.5625, 'eval_Micro Recall': 0.5625, 'eval_Macro Recall': 0.5669642857142857, 'eval_Weighted Precision': 0.6785714285714286, 'eval_Micro Precision': 0.5625, 'eval_Macro Precision': 0.6428571428571429, 'eval_runtime': 16.2419, 'eval_samples_per_second': 1.97, 'eval_steps_per_second': 0.062, 'epoch': 1.0}


Model weights saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-4/pytorch_model.bin
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-4/preprocessor_config.json
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/preprocessor_config.json
***** Running Evaluation *****
  Num examples = 32
  Batch size = 32


{'loss': 1.3074, 'learning_rate': 2.7906976744186048e-05, 'epoch': 2.0}


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
Saving model checkpoint to wav2vec2-base-Drum_Kit_Sounds/checkpoint-8
Configuration saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-8/config.json


{'eval_loss': 1.3149489164352417, 'eval_accuracy': 0.34375, 'eval_Weighted F1': 0.25674019607843135, 'eval_Micro F1': 0.34375, 'eval_Macro F1': 0.2696078431372549, 'eval_Weighted Recall': 0.34375, 'eval_Micro Recall': 0.34375, 'eval_Macro Recall': 0.375, 'eval_Weighted Precision': 0.30671296296296297, 'eval_Micro Precision': 0.34375, 'eval_Macro Precision': 0.3148148148148148, 'eval_runtime': 15.9651, 'eval_samples_per_second': 2.004, 'eval_steps_per_second': 0.063, 'epoch': 2.0}


Model weights saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-8/pytorch_model.bin
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-8/preprocessor_config.json
***** Running Evaluation *****
  Num examples = 32
  Batch size = 32


{'loss': 1.2393, 'learning_rate': 2.5116279069767445e-05, 'epoch': 3.0}


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
Saving model checkpoint to wav2vec2-base-Drum_Kit_Sounds/checkpoint-12
Configuration saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-12/config.json


{'eval_loss': 1.3121272325515747, 'eval_accuracy': 0.21875, 'eval_Weighted F1': 0.07852564102564102, 'eval_Micro F1': 0.21875, 'eval_Macro F1': 0.08974358974358974, 'eval_Weighted Recall': 0.21875, 'eval_Micro Recall': 0.21875, 'eval_Macro Recall': 0.25, 'eval_Weighted Precision': 0.0478515625, 'eval_Micro Precision': 0.21875, 'eval_Macro Precision': 0.0546875, 'eval_runtime': 15.9761, 'eval_samples_per_second': 2.003, 'eval_steps_per_second': 0.063, 'epoch': 3.0}


Model weights saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-12/pytorch_model.bin
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-12/preprocessor_config.json
***** Running Evaluation *****
  Num examples = 32
  Batch size = 32


{'loss': 1.2317, 'learning_rate': 2.2325581395348837e-05, 'epoch': 4.0}


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
Saving model checkpoint to wav2vec2-base-Drum_Kit_Sounds/checkpoint-16
Configuration saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-16/config.json


{'eval_loss': 1.3111870288848877, 'eval_accuracy': 0.28125, 'eval_Weighted F1': 0.1799924924924925, 'eval_Micro F1': 0.28125, 'eval_Macro F1': 0.2057057057057057, 'eval_Weighted Recall': 0.28125, 'eval_Micro Recall': 0.28125, 'eval_Macro Recall': 0.3214285714285714, 'eval_Weighted Precision': 0.26979166666666665, 'eval_Micro Precision': 0.28125, 'eval_Macro Precision': 0.30833333333333335, 'eval_runtime': 14.7309, 'eval_samples_per_second': 2.172, 'eval_steps_per_second': 0.068, 'epoch': 4.0}


Model weights saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-16/pytorch_model.bin
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-16/preprocessor_config.json
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/preprocessor_config.json
***** Running Evaluation *****
  Num examples = 32
  Batch size = 32


{'loss': 1.2107, 'learning_rate': 1.9534883720930235e-05, 'epoch': 5.0}


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
Saving model checkpoint to wav2vec2-base-Drum_Kit_Sounds/checkpoint-20
Configuration saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-20/config.json


{'eval_loss': 1.2604163885116577, 'eval_accuracy': 0.4375, 'eval_Weighted F1': 0.30295698924731185, 'eval_Micro F1': 0.4375, 'eval_Macro F1': 0.34623655913978496, 'eval_Weighted Recall': 0.4375, 'eval_Micro Recall': 0.4375, 'eval_Macro Recall': 0.5, 'eval_Weighted Precision': 0.25520833333333337, 'eval_Micro Precision': 0.4375, 'eval_Macro Precision': 0.2916666666666667, 'eval_runtime': 15.949, 'eval_samples_per_second': 2.006, 'eval_steps_per_second': 0.063, 'epoch': 5.0}


Model weights saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-20/pytorch_model.bin
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-20/preprocessor_config.json
***** Running Evaluation *****
  Num examples = 32
  Batch size = 32


{'loss': 1.1663, 'learning_rate': 1.674418604651163e-05, 'epoch': 6.0}


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
Saving model checkpoint to wav2vec2-base-Drum_Kit_Sounds/checkpoint-24
Configuration saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-24/config.json


{'eval_loss': 1.2112082242965698, 'eval_accuracy': 0.46875, 'eval_Weighted F1': 0.3895833333333333, 'eval_Micro F1': 0.46875, 'eval_Macro F1': 0.430952380952381, 'eval_Weighted Recall': 0.46875, 'eval_Micro Recall': 0.46875, 'eval_Macro Recall': 0.5267857142857143, 'eval_Weighted Precision': 0.5040760869565217, 'eval_Micro Precision': 0.46875, 'eval_Macro Precision': 0.5403726708074534, 'eval_runtime': 15.9874, 'eval_samples_per_second': 2.002, 'eval_steps_per_second': 0.063, 'epoch': 6.0}


Model weights saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-24/pytorch_model.bin
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-24/preprocessor_config.json
***** Running Evaluation *****
  Num examples = 32
  Batch size = 32


{'loss': 1.1247, 'learning_rate': 1.3953488372093024e-05, 'epoch': 7.0}


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
Saving model checkpoint to wav2vec2-base-Drum_Kit_Sounds/checkpoint-28
Configuration saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-28/config.json


{'eval_loss': 1.174644947052002, 'eval_accuracy': 0.59375, 'eval_Weighted F1': 0.5142628205128206, 'eval_Micro F1': 0.59375, 'eval_Macro F1': 0.5602564102564103, 'eval_Weighted Recall': 0.59375, 'eval_Micro Recall': 0.59375, 'eval_Macro Recall': 0.65625, 'eval_Weighted Precision': 0.5219983552631579, 'eval_Micro Precision': 0.59375, 'eval_Macro Precision': 0.5608552631578947, 'eval_runtime': 14.6822, 'eval_samples_per_second': 2.18, 'eval_steps_per_second': 0.068, 'epoch': 7.0}


Model weights saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-28/pytorch_model.bin
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-28/preprocessor_config.json
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/preprocessor_config.json
***** Running Evaluation *****
  Num examples = 32
  Batch size = 32


{'loss': 1.0856, 'learning_rate': 1.1162790697674418e-05, 'epoch': 8.0}


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
Saving model checkpoint to wav2vec2-base-Drum_Kit_Sounds/checkpoint-32
Configuration saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-32/config.json


{'eval_loss': 1.1434305906295776, 'eval_accuracy': 0.59375, 'eval_Weighted F1': 0.5142628205128206, 'eval_Micro F1': 0.59375, 'eval_Macro F1': 0.5602564102564103, 'eval_Weighted Recall': 0.59375, 'eval_Micro Recall': 0.59375, 'eval_Macro Recall': 0.65625, 'eval_Weighted Precision': 0.5219983552631579, 'eval_Micro Precision': 0.59375, 'eval_Macro Precision': 0.5608552631578947, 'eval_runtime': 15.9557, 'eval_samples_per_second': 2.006, 'eval_steps_per_second': 0.063, 'epoch': 8.0}


Model weights saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-32/pytorch_model.bin
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-32/preprocessor_config.json
***** Running Evaluation *****
  Num examples = 32
  Batch size = 32


{'loss': 1.0601, 'learning_rate': 8.372093023255815e-06, 'epoch': 9.0}


Saving model checkpoint to wav2vec2-base-Drum_Kit_Sounds/checkpoint-36
Configuration saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-36/config.json


{'eval_loss': 1.141687273979187, 'eval_accuracy': 0.65625, 'eval_Weighted F1': 0.6028747294372294, 'eval_Micro F1': 0.65625, 'eval_Macro F1': 0.6389069264069265, 'eval_Weighted Recall': 0.65625, 'eval_Micro Recall': 0.65625, 'eval_Macro Recall': 0.7125, 'eval_Weighted Precision': 0.8439797794117647, 'eval_Micro Precision': 0.65625, 'eval_Macro Precision': 0.8216911764705882, 'eval_runtime': 16.6552, 'eval_samples_per_second': 1.921, 'eval_steps_per_second': 0.06, 'epoch': 9.0}


Model weights saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-36/pytorch_model.bin
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-36/preprocessor_config.json
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/preprocessor_config.json
***** Running Evaluation *****
  Num examples = 32
  Batch size = 32


{'loss': 1.0375, 'learning_rate': 5.581395348837209e-06, 'epoch': 10.0}


Saving model checkpoint to wav2vec2-base-Drum_Kit_Sounds/checkpoint-40
Configuration saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-40/config.json


{'eval_loss': 1.1227267980575562, 'eval_accuracy': 0.6875, 'eval_Weighted F1': 0.6581521739130435, 'eval_Micro F1': 0.6875, 'eval_Macro F1': 0.6831262939958592, 'eval_Weighted Recall': 0.6875, 'eval_Micro Recall': 0.6875, 'eval_Macro Recall': 0.7330357142857142, 'eval_Weighted Precision': 0.845703125, 'eval_Micro Precision': 0.6875, 'eval_Macro Precision': 0.8236607142857143, 'eval_runtime': 15.2713, 'eval_samples_per_second': 2.095, 'eval_steps_per_second': 0.065, 'epoch': 10.0}


Model weights saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-40/pytorch_model.bin
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-40/preprocessor_config.json
***** Running Evaluation *****
  Num examples = 32
  Batch size = 32


{'loss': 1.0168, 'learning_rate': 2.7906976744186046e-06, 'epoch': 11.0}


Saving model checkpoint to wav2vec2-base-Drum_Kit_Sounds/checkpoint-44
Configuration saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-44/config.json


{'eval_loss': 1.106468677520752, 'eval_accuracy': 0.78125, 'eval_Weighted F1': 0.7691964285714286, 'eval_Micro F1': 0.78125, 'eval_Macro F1': 0.7845238095238096, 'eval_Weighted Recall': 0.78125, 'eval_Micro Recall': 0.78125, 'eval_Macro Recall': 0.81875, 'eval_Weighted Precision': 0.8716947115384616, 'eval_Micro Precision': 0.78125, 'eval_Macro Precision': 0.8533653846153846, 'eval_runtime': 14.8587, 'eval_samples_per_second': 2.154, 'eval_steps_per_second': 0.067, 'epoch': 11.0}


Model weights saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-44/pytorch_model.bin
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-44/preprocessor_config.json
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/preprocessor_config.json
***** Running Evaluation *****
  Num examples = 32
  Batch size = 32


{'loss': 1.0093, 'learning_rate': 0.0, 'epoch': 12.0}


Saving model checkpoint to wav2vec2-base-Drum_Kit_Sounds/checkpoint-48
Configuration saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-48/config.json


{'eval_loss': 1.0886529684066772, 'eval_accuracy': 0.78125, 'eval_Weighted F1': 0.7691964285714286, 'eval_Micro F1': 0.78125, 'eval_Macro F1': 0.7845238095238096, 'eval_Weighted Recall': 0.78125, 'eval_Micro Recall': 0.78125, 'eval_Macro Recall': 0.81875, 'eval_Weighted Precision': 0.8716947115384616, 'eval_Micro Precision': 0.78125, 'eval_Macro Precision': 0.8533653846153846, 'eval_runtime': 16.2005, 'eval_samples_per_second': 1.975, 'eval_steps_per_second': 0.062, 'epoch': 12.0}


Model weights saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-48/pytorch_model.bin
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/checkpoint-48/preprocessor_config.json


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from wav2vec2-base-Drum_Kit_Sounds/checkpoint-48 (score: 1.0886529684066772).


{'train_runtime': 2710.1217, 'train_samples_per_second': 0.567, 'train_steps_per_second': 0.018, 'train_loss': 1.1549896597862244, 'epoch': 12.0}


TrainOutput(global_step=48, training_loss=1.1549896597862244, metrics={'train_runtime': 2710.1217, 'train_samples_per_second': 0.567, 'train_steps_per_second': 0.018, 'train_loss': 1.1549896597862244, 'epoch': 12.0})
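
The log line "Loading best model from ... checkpoint-48 (score: 1.0886...)" indicates that the best checkpoint was chosen by evaluation loss. The sketch below collects the per-epoch evaluation loss and accuracy from the log (rounded to four decimals) and compares selection by loss with selection by accuracy. The `history` list and variable names are illustrative only. To select on another metric, `TrainingArguments` has a `metric_for_best_model` argument.

```python
# (epoch, eval_loss, eval_accuracy) copied from the evaluation logs, rounded
history = [
    (1, 1.3632, 0.5625),
    (2, 1.3149, 0.34375),
    (3, 1.3121, 0.21875),
    (4, 1.3112, 0.28125),
    (5, 1.2604, 0.4375),
    (6, 1.2112, 0.46875),
    (7, 1.1746, 0.59375),
    (8, 1.1434, 0.59375),
    (9, 1.1417, 0.65625),
    (10, 1.1227, 0.6875),
    (11, 1.1065, 0.78125),
    (12, 1.0887, 0.78125),
]

# Lowest evaluation loss (what the run used to pick its best checkpoint)
best_by_loss = min(history, key=lambda row: row[1])

# Highest accuracy; max() keeps the earliest epoch when several epochs tie
best_by_accuracy = max(history, key=lambda row: row[2])

print("best by loss:    epoch", best_by_loss[0])
print("best by accuracy: epoch", best_by_accuracy[0])
```

Evaluation loss falls in every epoch and the lowest value is in the last one, which suggests the model had not finished converging after 12 epochs.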

#### Evaluate Model

In [37]:
trainer.evaluate()

***** Running Evaluation *****
  Num examples = 32
  Batch size = 32


{'eval_loss': 1.0886529684066772,
 'eval_accuracy': 0.78125,
 'eval_Weighted F1': 0.7691964285714286,
 'eval_Micro F1': 0.78125,
 'eval_Macro F1': 0.7845238095238096,
 'eval_Weighted Recall': 0.78125,
 'eval_Micro Recall': 0.78125,
 'eval_Macro Recall': 0.81875,
 'eval_Weighted Precision': 0.8716947115384616,
 'eval_Micro Precision': 0.78125,
 'eval_Macro Precision': 0.8533653846153846,
 'eval_runtime': 16.1801,
 'eval_samples_per_second': 1.978,
 'eval_steps_per_second': 0.062,
 'epoch': 12.0}
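
An accuracy of 0.78125 on 32 clips corresponds to 25 correct predictions, so the estimate carries a lot of sampling uncertainty. As a rough illustration, the sketch below computes an approximate 95% Wilson score interval. The choice of the Wilson interval and the `wilson_interval` helper are assumptions made for this example, not part of the notebook.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# 0.78125 accuracy on 32 evaluation clips = 25 correct predictions
low, high = wilson_interval(25, 32)
print(f"{low:.3f} to {high:.3f}")
```

The interval spans roughly 0.61 to 0.89, so a different random split could plausibly produce a noticeably different accuracy.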

#### Push Model to Hub (My Profile!!!)

In [38]:
trainer.push_to_hub()

Saving model checkpoint to wav2vec2-base-Drum_Kit_Sounds
Configuration saved in wav2vec2-base-Drum_Kit_Sounds/config.json
Model weights saved in wav2vec2-base-Drum_Kit_Sounds/pytorch_model.bin
Feature extractor saved in wav2vec2-base-Drum_Kit_Sounds/preprocessor_config.json
Several commits (2) will be pushed upstream.
The progress bars may be unreliable.


remote: Scanning LFS files for validity, may be slow...        
remote: LFS file scan complete.        
To https://huggingface.co/DunnBC22/wav2vec2-base-Drum_Kit_Sounds
   32a9dd7..c37fc6e  main -> main

Dropping the following result as it does not have all the necessary fields:
{'dataset': {'name': 'audiofolder', 'type': 'audiofolder', 'config': 'drums', 'split': 'train', 'args': 'drums'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.78125}]}
To https://huggingface.co/DunnBC22/wav2vec2-base-Drum_Kit_Sounds
   c37fc6e..60efa5c  main -> main



'https://huggingface.co/DunnBC22/wav2vec2-base-Drum_Kit_Sounds/commit/c37fc6e2112e1e7564f8a0ce66024e60a4ddff94'

### Notes & Other Takeaways
****
- I was not expecting this project to yield metrics this good (78.1% accuracy on the 32 evaluation samples). That said, the model is not yet accurate enough for me to take it further.
- With only 160 samples in the dataset, including more samples would likely improve this model.
****