## Is This Vinyl Scratched: Audio Classification

Dataset Source: https://www.kaggle.com/datasets/seandaly/detecting-scratch-noise-in-vinyl-playback

#### Install Missing Libraries

In [1]:
%pip install datasets transformers librosa evaluate tensorboard

Note: you may need to restart the kernel to use updated packages.


#### Import Necessary Libraries

In [2]:
import os, sys, random
os.environ['TOKENIZERS_PARALLELISM']='false'

import numpy as np
import pandas as pd

import datasets
from datasets import load_dataset, Audio, DatasetDict, ClassLabel

import transformers
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification
from transformers import TrainingArguments, Trainer

import evaluate

from IPython.display import Audio, display

#### Display Library Versions

In [3]:
print("Python:".rjust(15), sys.version[0:6])
print("NumPy:".rjust(15), np.__version__)
print("Pandas:".rjust(15), pd.__version__)
print("Datasets:".rjust(15), datasets.__version__)
print("Transformers:".rjust(15), transformers.__version__)
print("Evaluate:".rjust(15), evaluate.__version__)

        Python: 3.9.7 
         NumPy: 1.23.3
        Pandas: 1.4.4
      Datasets: 2.8.0
  Transformers: 4.26.0
      Evaluate: 0.2.2


#### Prepare Metadata File

In [4]:
image_file_names = os.listdir("/Users/leedunn/Desktop/Projects to Train/Audio Projects/scratched or not/")

metadata = pd.DataFrame(image_file_names, columns=["file_name"])

metadata['label'] = metadata['file_name'].apply(lambda x: x.split("sect")[-1].split(".wav")[0])
metadata['label'].filter(items=["0", "1"])

metadata.to_csv("/Users/leedunn/Desktop/Projects to Train/Audio Projects/scratched or not/metadata.csv", index=False)
metadata.head()

Unnamed: 0,file_name,label
0,358DSOTM sect0.wav,0
1,398GG sect1.wav,1
2,05DSOTM sect0.wav,0
3,451CR_RTH sect1.wav,1
4,323SW sect0.wav,0


#### Ingest & Preprocess Dataset

In [5]:
audio_data = load_dataset("/Users/leedunn/Desktop/Projects to Train/Audio Projects/scratched or not/", 
                          name="en-US", 
                          split="train")

audio_data = audio_data.filter(lambda example: example["label"] != None)
audio_data = audio_data.cast_column("label", ClassLabel(names=["0", "1"]))

print(len(audio_data))
print(audio_data)

Resolving data files:   0%|          | 0/3430 [00:00<?, ?it/s]

Using custom data configuration scratched or not-5a1b77f9dedd940e


Downloading and preparing dataset audiofolder/scratched or not to /Users/leedunn/.cache/huggingface/datasets/audiofolder/scratched or not-5a1b77f9dedd940e/0.0.0/6cbdd16f8688354c63b4e2a36e1585d05de285023ee6443ffd71c4182055c0fc...


Downloading data files:   0%|          | 0/3430 [00:00<?, ?it/s]

Downloading data files: 0it [00:00, ?it/s]

Extracting data files: 0it [00:00, ?it/s]

Generating train split: 0 examples [00:00, ? examples/s]

Dataset audiofolder downloaded and prepared to /Users/leedunn/.cache/huggingface/datasets/audiofolder/scratched or not-5a1b77f9dedd940e/0.0.0/6cbdd16f8688354c63b4e2a36e1585d05de285023ee6443ffd71c4182055c0fc. Subsequent calls will reuse this data.


  0%|          | 0/4 [00:00<?, ?ba/s]

Casting the dataset:   0%|          | 0/4 [00:00<?, ?ba/s]

3428
Dataset({
    features: ['audio', 'label'],
    num_rows: 3428
})


#### Split Dataset into Training & Testing Datasets

In [6]:
audio_data = audio_data.shuffle(seed=42)

audio_data_split = audio_data.train_test_split(test_size=0.20)

ds = DatasetDict({
    'train' : audio_data_split['train'],
    'eval' : audio_data_split['test']
})

#### Some Information About Training & Validation Datasets

In [7]:
print("Training Dataset")
print("Training Dataset Info: ", ds['train'])
print("First Sample in Training Dataset", ds['train'][0])
print("Last Sample in Training Dataset", ds['train'][-1])
print("Unique Values in Label/Class: ", ds['train'].unique("label"))

print("\n\nEvaluation Dataset")
print("Evaluation Dataset Info: ", ds['eval'])
print("First Sample in Evaluation Dataset", ds['eval'][0])
print("Last Sample in Evaluation Dataset", ds['eval'][-1])
print("Unique Values in Label/Class: ", ds['eval'].unique("label"))

Training Dataset
Training Dataset Info:  Dataset({
    features: ['audio', 'label'],
    num_rows: 2742
})
First Sample in Training Dataset {'audio': {'path': '/Users/leedunn/Desktop/Projects to Train/Audio Projects/scratched or not/09CR_RTH sect1.wav', 'array': array([ 0.00808716,  0.01249695,  0.00984192, ..., -0.00941467,
       -0.00822449, -0.0078125 ], dtype=float32), 'sampling_rate': 22050}, 'label': 1}
Last Sample in Training Dataset {'audio': {'path': '/Users/leedunn/Desktop/Projects to Train/Audio Projects/scratched or not/366Rev sect1.wav', 'array': array([0.00476074, 0.00543213, 0.00109863, ..., 0.05975342, 0.04443359,
       0.01904297], dtype=float32), 'sampling_rate': 22050}, 'label': 1}


Flattening the indices:   0%|          | 0/3 [00:00<?, ?ba/s]

Unique Values in Label/Class:  [1, 0]


Evaluation Dataset
Evaluation Dataset Info:  Dataset({
    features: ['audio', 'label'],
    num_rows: 686
})
First Sample in Evaluation Dataset {'audio': {'path': '/Users/leedunn/Desktop/Projects to Train/Audio Projects/scratched or not/464DSOTM sect1.wav', 'array': array([-0.00311279, -0.01051331, -0.02253723, ..., -0.01145935,
       -0.01435852, -0.0118866 ], dtype=float32), 'sampling_rate': 22050}, 'label': 1}
Last Sample in Evaluation Dataset {'audio': {'path': '/Users/leedunn/Desktop/Projects to Train/Audio Projects/scratched or not/233DSOTM sect0.wav', 'array': array([ 0.02046204,  0.01290894, -0.00392151, ..., -0.0690918 ,
       -0.06637573, -0.06207275], dtype=float32), 'sampling_rate': 22050}, 'label': 0}


Flattening the indices:   0%|          | 0/1 [00:00<?, ?ba/s]

Unique Values in Label/Class:  [1, 0]


#### Create Dictionaries to Convert Labels Between Strings & Integers

In [8]:
labels = sorted(set(audio_data["label"]))

print(labels)

label2id, id2label = dict(), dict()

for i, label in enumerate(labels):
    label2id[label] = str(i)
    id2label[str(i)] = label

[0, 1]


#### Display Some Examples with Ability to Listen to Them

In [9]:
for _ in range(5):
    rand_idx = random.randint(0, len(ds["train"])-1)
    example = ds["train"][rand_idx]
    audio = example["audio"]
    
    print(f'Label: {id2label[str(example["label"])]}')
    print(f'Shape: {audio["array"].shape}, sampling rate: {audio["sampling_rate"]}')
    display(Audio(audio["array"], rate=audio["sampling_rate"]))
    print()

Label: 0
Shape: (88200,), sampling rate: 22050



Label: 0
Shape: (88200,), sampling rate: 22050



Label: 0
Shape: (88200,), sampling rate: 22050



Label: 0
Shape: (88200,), sampling rate: 22050



Label: 1
Shape: (88200,), sampling rate: 22050





#### Remember to Install git lfs & Enter HuggingFace Access Token

In [10]:
!git lfs install

# HuggingFace Access Token ...

Git LFS initialized.


#### Basic Values/Constants

In [11]:
MODEL_CKPT = "facebook/wav2vec2-base"
MODEL_NAME = MODEL_CKPT.split("/")[-1] + "-is_vinyl_scratched_or_not"

NUM_OF_EPOCHS = 10
LEARNING_RATE = 3e-5

BATCH_SIZE = 32
STRATEGY = "epoch"

#### Set Sample Rate

In [12]:
sampling_rate = ds["train"].features["audio"].sampling_rate
sampling_rate

#### Instantiate Instance of Feature Extractor

In [13]:
feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_CKPT)



#### Define function to Preprocess Data

In [14]:
def preprocess_function(examples):
    '''
    This function prepares the dataset for the transformer
    by applying the feature extractor to it (among other 
    processes).
    '''
    max_duration = 5.5 # seconds
    audio_arrays = [x["array"] for x in examples["audio"]]
    inputs = feature_extractor(audio_arrays, 
                               sampling_rate=feature_extractor.sampling_rate, 
                               max_length=int(feature_extractor.sampling_rate * max_duration),
                               truncation=True)
    return inputs

encoded_audio = ds.map(preprocess_function, remove_columns="audio", batched=True)

  0%|          | 0/3 [00:00<?, ?ba/s]

  0%|          | 0/1 [00:00<?, ?ba/s]

#### Define Metrics Evaluation Function 

In [15]:
def compute_metrics(p):
    '''
    This function calculates & returns the following metrics:
    - accuracy
    - f1 score
    - recall
    - precision
    '''
    import evaluate
    
    accuracy_metric = evaluate.load("accuracy")
    accuracy = accuracy_metric.compute(predictions=np.argmax(p.predictions, axis=1), references=p.label_ids)['accuracy']
    
    ### ------------------- F1 scores -------------------
    
    f1_score_metric = evaluate.load("f1")
    f1_score = f1_score_metric.compute(predictions=np.argmax(p.predictions, axis=1), references=p.label_ids)["f1"]
    
    ### ------------------- recall -------------------
    
    recall_metric = evaluate.load("recall")
    recall = recall_metric.compute(predictions=np.argmax(p.predictions, axis=1), references=p.label_ids)["recall"]
    
    ### ------------------- precision -------------------
    
    precision_metric = evaluate.load("precision")
    precision = precision_metric.compute(predictions=np.argmax(p.predictions, axis=1), references=p.label_ids)["precision"]
    
    return {"accuracy" : accuracy, 
            "F1" : f1_score,
            "Recall" : recall,
            "Precision" : precision,
            }

#### Instantiate Model

In [16]:
num_of_labels = len(id2label)

model = AutoModelForAudioClassification.from_pretrained(MODEL_CKPT, 
                                                        num_labels=num_of_labels, 
                                                        label2id=label2id,
                                                        id2label= id2label)

Some weights of the model checkpoint at facebook/wav2vec2-base were not used when initializing Wav2Vec2ForSequenceClassification: ['project_q.bias', 'quantizer.weight_proj.bias', 'project_q.weight', 'project_hid.weight', 'quantizer.codevectors', 'project_hid.bias', 'quantizer.weight_proj.weight']
- This IS expected if you are initializing Wav2Vec2ForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForSequenceClassification were not initialized from the model checkpoint at facebook/wav2vec2-base and are newly initialized: ['projector.bias', 'projector.weight', 'classifier.

#### Define Training Arguments

In [17]:
args = TrainingArguments(
    output_dir=MODEL_NAME,
    evaluation_strategy=STRATEGY,
    num_train_epochs=NUM_OF_EPOCHS,
    save_strategy=STRATEGY,
    logging_strategy=STRATEGY,
    learning_rate=LEARNING_RATE,
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=BATCH_SIZE,
    warmup_ratio=0.10,
    gradient_accumulation_steps=4,
    load_best_model_at_end=True,
    metric_for_best_model="F1",
    greater_is_better=True,
    logging_first_step=True,
    report_to="tensorboard",
    hub_private_repo=True,
    push_to_hub=True
)

#### Define Trainer

In [18]:
trainer = Trainer(
    model = model,
    args = args,
    train_dataset = encoded_audio["train"],
    eval_dataset = encoded_audio["eval"],
    tokenizer = feature_extractor,
    compute_metrics = compute_metrics,
)

Cloning https://huggingface.co/DunnBC22/wav2vec2-base-is_vinyl_scratched_or_not into local empty directory.


#### Train Model

In [19]:
trainer.train()

***** Running training *****
  Num examples = 2742
  Num Epochs = 10
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 128
  Gradient Accumulation steps = 4
  Total optimization steps = 210
  Number of trainable parameters = 94569090


  0%|          | 0/210 [00:00<?, ?it/s]



{'loss': 0.6901, 'learning_rate': 1.4285714285714286e-06, 'epoch': 0.05}
{'loss': 0.6671, 'learning_rate': 3e-05, 'epoch': 0.98}


***** Running Evaluation *****
  Num examples = 686
  Batch size = 32


  0%|          | 0/22 [00:00<?, ?it/s]

Downloading builder script:   0%|          | 0.00/4.20k [00:00<?, ?B/s]

Downloading builder script:   0%|          | 0.00/6.77k [00:00<?, ?B/s]

Downloading builder script:   0%|          | 0.00/7.36k [00:00<?, ?B/s]

Downloading builder script:   0%|          | 0.00/7.55k [00:00<?, ?B/s]

  _warn_prf(average, modifier, msg_start, len(result))
Saving model checkpoint to wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-21
Configuration saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-21/config.json


{'eval_loss': 0.6235492825508118, 'eval_accuracy': 0.6559766763848397, 'eval_F1': 0.0, 'eval_Recall': 0.0, 'eval_Precision': 0.0, 'eval_runtime': 1064.9816, 'eval_samples_per_second': 0.644, 'eval_steps_per_second': 0.021, 'epoch': 0.98}


Model weights saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-21/pytorch_model.bin
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-21/preprocessor_config.json
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/preprocessor_config.json


{'loss': 0.4954, 'learning_rate': 2.6666666666666667e-05, 'epoch': 1.98}


***** Running Evaluation *****
  Num examples = 686
  Batch size = 32


  0%|          | 0/22 [00:00<?, ?it/s]

Saving model checkpoint to wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-42
Configuration saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-42/config.json


{'eval_loss': 0.28244873881340027, 'eval_accuracy': 0.9416909620991254, 'eval_F1': 0.9095022624434389, 'eval_Recall': 0.8516949152542372, 'eval_Precision': 0.9757281553398058, 'eval_runtime': 1042.1602, 'eval_samples_per_second': 0.658, 'eval_steps_per_second': 0.021, 'epoch': 1.98}


Model weights saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-42/pytorch_model.bin
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-42/preprocessor_config.json
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/preprocessor_config.json


{'loss': 0.2406, 'learning_rate': 2.3333333333333336e-05, 'epoch': 2.98}


***** Running Evaluation *****
  Num examples = 686
  Batch size = 32


  0%|          | 0/22 [00:00<?, ?it/s]

Saving model checkpoint to wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-63
Configuration saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-63/config.json


{'eval_loss': 0.17554502189159393, 'eval_accuracy': 0.956268221574344, 'eval_F1': 0.9336283185840708, 'eval_Recall': 0.8940677966101694, 'eval_Precision': 0.9768518518518519, 'eval_runtime': 1040.1013, 'eval_samples_per_second': 0.66, 'eval_steps_per_second': 0.021, 'epoch': 2.98}


Model weights saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-63/pytorch_model.bin
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-63/preprocessor_config.json
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/preprocessor_config.json


{'loss': 0.169, 'learning_rate': 1.9999999999999998e-05, 'epoch': 3.98}


***** Running Evaluation *****
  Num examples = 686
  Batch size = 32


  0%|          | 0/22 [00:00<?, ?it/s]

Saving model checkpoint to wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-84
Configuration saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-84/config.json


{'eval_loss': 0.1545204222202301, 'eval_accuracy': 0.9591836734693877, 'eval_F1': 0.9385964912280702, 'eval_Recall': 0.9067796610169492, 'eval_Precision': 0.9727272727272728, 'eval_runtime': 1051.4657, 'eval_samples_per_second': 0.652, 'eval_steps_per_second': 0.021, 'epoch': 3.98}


Model weights saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-84/pytorch_model.bin
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-84/preprocessor_config.json
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/preprocessor_config.json


{'loss': 0.1287, 'learning_rate': 1.6666666666666667e-05, 'epoch': 4.98}


***** Running Evaluation *****
  Num examples = 686
  Batch size = 32


  0%|          | 0/22 [00:00<?, ?it/s]

Saving model checkpoint to wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-105
Configuration saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-105/config.json


{'eval_loss': 0.1248895600438118, 'eval_accuracy': 0.9606413994169096, 'eval_F1': 0.9406593406593408, 'eval_Recall': 0.9067796610169492, 'eval_Precision': 0.9771689497716894, 'eval_runtime': 1050.486, 'eval_samples_per_second': 0.653, 'eval_steps_per_second': 0.021, 'epoch': 4.98}


Model weights saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-105/pytorch_model.bin
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-105/preprocessor_config.json
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/preprocessor_config.json


{'loss': 0.1102, 'learning_rate': 1.3333333333333333e-05, 'epoch': 5.98}


***** Running Evaluation *****
  Num examples = 686
  Batch size = 32


  0%|          | 0/22 [00:00<?, ?it/s]

Saving model checkpoint to wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-126
Configuration saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-126/config.json


{'eval_loss': 0.11587227135896683, 'eval_accuracy': 0.9723032069970845, 'eval_F1': 0.9594882729211086, 'eval_Recall': 0.9533898305084746, 'eval_Precision': 0.9656652360515021, 'eval_runtime': 1049.2012, 'eval_samples_per_second': 0.654, 'eval_steps_per_second': 0.021, 'epoch': 5.98}


Model weights saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-126/pytorch_model.bin
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-126/preprocessor_config.json
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/preprocessor_config.json


{'loss': 0.0923, 'learning_rate': 9.999999999999999e-06, 'epoch': 6.98}


***** Running Evaluation *****
  Num examples = 686
  Batch size = 32


  0%|          | 0/22 [00:00<?, ?it/s]

Saving model checkpoint to wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-147
Configuration saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-147/config.json


{'eval_loss': 0.10729651153087616, 'eval_accuracy': 0.9664723032069971, 'eval_F1': 0.9515789473684211, 'eval_Recall': 0.9576271186440678, 'eval_Precision': 0.9456066945606695, 'eval_runtime': 1059.4975, 'eval_samples_per_second': 0.647, 'eval_steps_per_second': 0.021, 'epoch': 6.98}


Model weights saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-147/pytorch_model.bin
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-147/preprocessor_config.json
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/preprocessor_config.json


{'loss': 0.0877, 'learning_rate': 6.666666666666667e-06, 'epoch': 7.98}


***** Running Evaluation *****
  Num examples = 686
  Batch size = 32


  0%|          | 0/22 [00:00<?, ?it/s]

Saving model checkpoint to wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-168
Configuration saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-168/config.json


{'eval_loss': 0.10392837971448898, 'eval_accuracy': 0.9752186588921283, 'eval_F1': 0.9637526652452025, 'eval_Recall': 0.9576271186440678, 'eval_Precision': 0.9699570815450643, 'eval_runtime': 1046.4548, 'eval_samples_per_second': 0.656, 'eval_steps_per_second': 0.021, 'epoch': 7.98}


Model weights saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-168/pytorch_model.bin
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-168/preprocessor_config.json
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/preprocessor_config.json
Adding files tracked by Git LFS: ['.DS_Store']. This may take a bit of time if the files are large.


{'loss': 0.0807, 'learning_rate': 3.3333333333333333e-06, 'epoch': 8.98}


***** Running Evaluation *****
  Num examples = 686
  Batch size = 32


  0%|          | 0/22 [00:00<?, ?it/s]

Saving model checkpoint to wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-189
Configuration saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-189/config.json


{'eval_loss': 0.1087886393070221, 'eval_accuracy': 0.967930029154519, 'eval_F1': 0.9535864978902954, 'eval_Recall': 0.9576271186440678, 'eval_Precision': 0.9495798319327731, 'eval_runtime': 1049.1807, 'eval_samples_per_second': 0.654, 'eval_steps_per_second': 0.021, 'epoch': 8.98}


Model weights saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-189/pytorch_model.bin
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-189/preprocessor_config.json
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/preprocessor_config.json


{'loss': 0.0744, 'learning_rate': 0.0, 'epoch': 9.98}


***** Running Evaluation *****
  Num examples = 686
  Batch size = 32


  0%|          | 0/22 [00:00<?, ?it/s]

Saving model checkpoint to wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-210
Configuration saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-210/config.json


{'eval_loss': 0.1040809378027916, 'eval_accuracy': 0.9752186588921283, 'eval_F1': 0.9637526652452025, 'eval_Recall': 0.9576271186440678, 'eval_Precision': 0.9699570815450643, 'eval_runtime': 1111.0181, 'eval_samples_per_second': 0.617, 'eval_steps_per_second': 0.02, 'epoch': 9.98}


Model weights saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-210/pytorch_model.bin
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-210/preprocessor_config.json
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/preprocessor_config.json
Several commits (2) will be pushed upstream.


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from wav2vec2-base-is_vinyl_scratched_or_not/checkpoint-168 (score: 0.9637526652452025).


{'train_runtime': 215824.6111, 'train_samples_per_second': 0.127, 'train_steps_per_second': 0.001, 'train_loss': 0.21470175442241488, 'epoch': 9.98}


TrainOutput(global_step=210, training_loss=0.21470175442241488, metrics={'train_runtime': 215824.6111, 'train_samples_per_second': 0.127, 'train_steps_per_second': 0.001, 'train_loss': 0.21470175442241488, 'epoch': 9.98})

#### Evaluate Model

In [20]:
trainer.evaluate()

***** Running Evaluation *****
  Num examples = 686
  Batch size = 32


  0%|          | 0/22 [00:00<?, ?it/s]

{'eval_loss': 0.10392837971448898,
 'eval_accuracy': 0.9752186588921283,
 'eval_F1': 0.9637526652452025,
 'eval_Recall': 0.9576271186440678,
 'eval_Precision': 0.9699570815450643,
 'eval_runtime': 1035.8474,
 'eval_samples_per_second': 0.662,
 'eval_steps_per_second': 0.021,
 'epoch': 9.98}

#### Push Model to Hub (My Profile!!!)

In [26]:
trainer.push_to_hub()

Saving model checkpoint to wav2vec2-base-is_vinyl_scratched_or_not
Configuration saved in wav2vec2-base-is_vinyl_scratched_or_not/config.json
Model weights saved in wav2vec2-base-is_vinyl_scratched_or_not/pytorch_model.bin
Feature extractor saved in wav2vec2-base-is_vinyl_scratched_or_not/preprocessor_config.json
Dropping the following result as it does not have all the necessary fields:
{'dataset': {'name': 'audiofolder', 'type': 'audiofolder', 'config': 'scratched or not', 'split': 'train', 'args': 'scratched or not'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.9752186588921283}, {'name': 'F1', 'type': 'f1', 'value': 0.9637526652452025}, {'name': 'Recall', 'type': 'recall', 'value': 0.9576271186440678}, {'name': 'Precision', 'type': 'precision', 'value': 0.9699570815450643}]}


### Notes & Other Takeaways From This Project
****
- For some reason, it did not push the results of epoch 8 or 9 to the hub. I will attempt to manually upload that information.
- Results:
    - Accuracy: 0.9752186588921283
    - F1: 0.9637526652452025
    - Recall: 0.9576271186440678
    - Precision: 0.9699570815450643
****