### REI505M Final project: Music genre classification starter pack

The following Dataset class operates on the GTZAN dataset.

* Most GTZAN files are 30 seconds long (30 × 22050 = 661500 samples), but some are slightly shorter (approx. 29.9 seconds). For this reason we truncate at 660000 samples below.
* It may be beneficial to work with smaller chunks than ~30 seconds.
* You may want to perform data augmentation in the `__getitem__` function.
* For now, `train_dataset` contains the entire dataset; you need to set aside some examples for the validation and test sets.
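
The sample counts and the idea of working with shorter chunks can be sketched in plain Python. This is only an illustration: `chunk_bounds` and the 3-second chunk length are assumptions made for the example, not part of the starter pack.

```python
SAMPLING_RATE = 22050      # GTZAN sampling rate in Hz
TRUNCATED_LENGTH = 660000  # samples kept per file, as stated above


def chunk_bounds(total_samples, chunk_seconds, sampling_rate=SAMPLING_RATE):
    """Return (start, end) sample indices of consecutive, non-overlapping chunks.

    A remainder shorter than one full chunk is dropped.
    (Hypothetical helper, for illustration only.)
    """
    chunk_len = int(chunk_seconds * sampling_rate)
    if chunk_len <= 0:
        raise ValueError("chunk_seconds must be positive")
    num_chunks = total_samples // chunk_len
    return [(i * chunk_len, (i + 1) * chunk_len) for i in range(num_chunks)]


# A full 30-second clip at 22050 Hz.
full_length = 30 * SAMPLING_RATE

# Split a truncated clip into 3-second chunks (the chunk length is an arbitrary choice).
bounds = chunk_bounds(TRUNCATED_LENGTH, chunk_seconds=3)
print(full_length, len(bounds), bounds[0], bounds[-1])
```

Each `(start, end)` pair can then be used to slice a loaded waveform, for example `waveform[start:end]`, so that one file yields several shorter training examples.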

In [1]:
import torch
from torch.utils.data import DataLoader
import torch.nn as nn

from src.Conv1D import Conv1D
from src.Config import Config
from src.AudioDataset import AudioDataset
from src.DataPreparation import get_partitioned_data
import src.Utils as Utils 

In [2]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("Training on:", device)

config = Config(#Path to folder with GTZAN files:
                audio_dir_path='../music/',
                # music/
                #  - rock/
                #       rock.00099.wav
                #       ...
                #  - reggae/
                #  ...
                #  - blues/
                #Choose how many genres we want to use:
                num_genres=10, # e.g. 2, 3, 5, 10
                duration_size=29, # length of the WAV files in seconds
                sampling_rate=22050,
                #Data Partition
                train_part_size=0.7,
                val_part_size=0.15,
                test_part_size=0.15,
                batch_size=8, 
                learning_rate=1e-3,
                epochs=7, 
                seed=42,
                device=device)

torch.manual_seed(config.seed) # Reproducible results

Training on: cuda


<torch._C.Generator at 0x1d5c2f88830>

In [3]:
#Load the files for num_genres genres and partition them into train/val/test
train_files, train_labels, val_files, val_labels, test_files, test_labels = get_partitioned_data(config)

#Create Datasets and Dataloaders
train_dataset = AudioDataset(audio_files=train_files, labels=train_labels, audio_path=config.audio_dir_path, 
                             sampling_rate=config.sampling_rate, 
                             duration=config.duration_size, #Duration *before* augmentation.
                             num_augments=2,
                             always_augment=[0, 1],
                             print_augments=True)
train_loader = DataLoader(train_dataset, batch_size=config.batch_size, shuffle=True)

val_dataset = AudioDataset(val_files, val_labels, config.audio_dir_path,
                           sampling_rate=config.sampling_rate, duration=config.duration_size)
val_loader = DataLoader(val_dataset, batch_size=config.batch_size, shuffle=False)

test_dataset = AudioDataset(test_files, test_labels, config.audio_dir_path,
                            sampling_rate=config.sampling_rate, duration=config.duration_size)
test_loader = DataLoader(test_dataset, batch_size=config.batch_size, shuffle=False)                            


tmp_features, tmp_labels = next(iter(train_loader))
print(f"Feature batch shape {tmp_features.size()} | Labels {tmp_labels.size()}")

Using 10 genres ['blues', 'classical', 'country', 'disco', 'hiphop', 'jazz', 'metal', 'pop', 'reggae', 'rock']
Total selected files 1000 | Training set 700 | Validation set 150 | Test set 150
Chosen augmentations [no_augment -> zero]
Feature batch shape torch.Size([8, 639450]) | Labels torch.Size([8])
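
The numbers printed above follow directly from the configuration. A small sanity check, using only values copied from the `Config` cell and the output, might look as follows. It assumes that the partition sizes are simple fractions of the 1000 selected files and that the loaders keep the last, smaller batch; neither is confirmed by the source of `get_partitioned_data`.

```python
import math

# Values copied from the Config cell and the output above.
sampling_rate = 22050   # Hz
duration_size = 29      # seconds kept per clip
batch_size = 8
total_files = 1000
parts = {"train": 0.7, "val": 0.15, "test": 0.15}

# Samples per clip: the second dimension of the feature batch.
samples_per_clip = duration_size * sampling_rate

# Files per partition; round() guards against floating-point error.
files_per_part = {name: round(frac * total_files) for name, frac in parts.items()}

# Batches per epoch, assuming the last, smaller batch is kept
# (drop_last=False, the PyTorch DataLoader default).
batches_per_part = {name: math.ceil(n / batch_size) for name, n in files_per_part.items()}

print(samples_per_clip)
print(files_per_part)
print(batches_per_part)
```

The first value matches the feature batch shape `[8, 639450]`, and the partition sizes match the 700 / 150 / 150 split reported above.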


In [4]:
#Train model
n_classes = config.num_genres
model = Conv1D(num_blocks=3,
               num_conv_layers_per_block=2,
               kernel_size=7,
               num_first_layer_kernels=32,
               conv_stride=2,
               pool_stride=2,
               dense_size=100,
               do_batch_norm=True,
               n_classes=n_classes,
               config=config
               ).to(config.device)

opt = torch.optim.Adam(model.parameters(), config.learning_rate)
crit = nn.CrossEntropyLoss()

Utils.train(train_dataset, train_loader, model, opt, lossfunc=crit, config=config, show_batch_time=False)

epoch 1 | train loss 2.3453 | train acc 0.0929 | time 5.14s | per batch 58.44ms
epoch 2 | train loss 2.3256 | train acc 0.0900 | time 4.72s | per batch 53.63ms
epoch 3 | train loss 2.3279 | train acc 0.0700 | time 4.71s | per batch 53.54ms
epoch 4 | train loss 2.3231 | train acc 0.0814 | time 4.69s | per batch 53.34ms
epoch 5 | train loss 2.3166 | train acc 0.0771 | time 4.68s | per batch 53.17ms
epoch 6 | train loss 2.3193 | train acc 0.0786 | time 4.69s | per batch 53.33ms
epoch 7 | train loss 2.3192 | train acc 0.0814 | time 4.68s | per batch 53.13ms
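
For reference, a classifier that assigns equal probability to each of the 10 genres has a cross-entropy loss of ln(10) ≈ 2.30 and an expected accuracy of 0.1. The training loss above stays close to that value and the accuracy stays close to 0.1, which suggests the model has not yet learned anything beyond chance in these seven epochs. A minimal sketch of the baseline calculation (variable names are illustrative):

```python
import math

num_genres = 10  # same value as config.num_genres above

# Expected accuracy when guessing uniformly at random.
chance_accuracy = 1 / num_genres

# Cross-entropy (natural log) of a uniform prediction over num_genres classes:
# -log(1 / num_genres) = log(num_genres).
chance_loss = math.log(num_genres)

print(f"chance accuracy {chance_accuracy:.4f} | chance loss {chance_loss:.4f}")
```

Comparing each epoch's loss against this baseline is a quick way to tell whether training is making progress at all.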


In [5]:
#Test model
Utils.test(test_dataset, test_loader, model, lossfunc=crit, config=config)

test loss 29899.9004 | test acc 0.0800 | time 0.34s | per batch 17.66ms
