### REI505M Final project: Music genre classification starter pack

The following Dataset class operates on the GTZAN dataset.

* Most GTZAN files are 30 seconds long (30 × 22050 = 661500 samples), but some are slightly shorter (approx. 29.9 seconds). For this reason we truncate at 660000 samples below.
* It may be beneficial to work with smaller chunks than ~30 seconds.
* You may want to perform data augmentation in the `__getitem__` method.
* For now, `train_dataset` contains the whole dataset; you need to set aside some examples for validation and test sets.
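
The chunking and augmentation suggestions above can be sketched without committing to a particular tensor library. The following is a minimal illustration, not the `AudioDataset` implementation: `ChunkedClips`, `chunks_per_clip`, the random-gain augmentation and its 0.8 to 1.2 range are all hypothetical choices, and plain Python lists stand in for audio tensors.

```python
import random

SAMPLING_RATE = 22050   # GTZAN sampling rate, as stated above
MAXLEN = 660000         # truncation length used in this notebook


def chunks_per_clip(chunk_seconds, maxlen=MAXLEN, sampling_rate=SAMPLING_RATE):
    """Number of whole, non-overlapping chunks that fit into one truncated clip."""
    chunk_len = int(chunk_seconds * sampling_rate)
    return maxlen // chunk_len


class ChunkedClips:
    """Hypothetical map-style dataset in which every item is one fixed-length chunk."""

    def __init__(self, waveforms, labels, chunk_len, maxlen=MAXLEN,
                 augment=False, seed=0):
        # Truncate every clip first so that all clips have at most maxlen samples.
        self.waveforms = [w[:maxlen] for w in waveforms]
        self.labels = labels
        self.chunk_len = chunk_len
        self.augment = augment
        self.rng = random.Random(seed)
        # One (clip index, start sample) entry per whole chunk.
        self.index = [
            (i, start)
            for i, w in enumerate(self.waveforms)
            for start in range(0, len(w) - chunk_len + 1, chunk_len)
        ]

    def __len__(self):
        return len(self.index)

    def __getitem__(self, idx):
        clip, start = self.index[idx]
        chunk = self.waveforms[clip][start:start + self.chunk_len]
        if self.augment:
            # Assumed augmentation: scale the chunk by a random gain.
            gain = self.rng.uniform(0.8, 1.2)
            chunk = [gain * x for x in chunk]
        return chunk, self.labels[clip]


print(chunks_per_clip(5))  # whole 5-second chunks per truncated clip
```

With 5-second chunks, five whole chunks (551250 samples) fit into a clip truncated at 660000 samples. If clips are chunked, split by file before chunking so that chunks of one recording do not end up in both the training and the validation set.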

In [6]:
import torch
from torch.utils.data import DataLoader
import torch.nn as nn

from src.Conv1D import Conv1D
from src.Config import Config
from src.AudioDataset import AudioDataset
from src.DataPreparation import get_partitioned_data
import src.Utils as Utils 

In [7]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("Training on:", device)

config = Config(#Path to folder with GTZAN files:
                audio_dir_path='../music/',
                # music/
                #  - rock/
                #       rock.00099.wav
                #       ...
                #  - reggae/
                #  ...
                #  - blues/
                #Choose how many genres we want to use:
                num_genres=2, # e.g. 2, 3, 5, 10
                #Data Partition
                train_part_size=0.7,
                val_part_size=0.15,
                test_part_size=0.15,
                batch_size=8, 
                learning_rate=1e-3,
                epochs=7, 
                seed=42,
                device=device)

torch.manual_seed(config.seed) # Reproducible results

Training on: cpu


<torch._C.Generator at 0x10f550c50>

In [8]:
#Load num_genres from data and partition them
train_files, train_labels, val_files, val_labels, test_files, test_labels = get_partitioned_data(config)

#Create Datasets and Dataloaders
train_dataset = AudioDataset(audio_files=train_files, labels=train_labels,
                             audio_path=config.audio_dir_path, 
                             maxlen=660000, sampling_rate=22050, duration=25)
train_loader = DataLoader(train_dataset, batch_size=config.batch_size, shuffle=True)

val_dataset = AudioDataset(val_files, val_labels, config.audio_dir_path,
                           maxlen=660000, sampling_rate=22050, duration=25)
val_loader = DataLoader(val_dataset, batch_size=config.batch_size, shuffle=False)

test_dataset = AudioDataset(test_files, test_labels, config.audio_dir_path,
                            maxlen=660000, sampling_rate=22050, duration=25)
test_loader = DataLoader(test_dataset, batch_size=config.batch_size, shuffle=False)                            

print("Sanity Check Train Loader:")
tmp_features, tmp_labels = next(iter(train_loader))
print(f"Feature batch shape: {tmp_features.size()}")
print(f"Labels batch shape: {tmp_labels.size()}")

print("Sanity Check Validation Loader:")
tmp_features, tmp_labels = next(iter(val_loader))
print(f"Feature batch shape: {tmp_features.size()}")
print(f"Labels batch shape: {tmp_labels.size()}")

print("Sanity Check Test Loader:")
tmp_features, tmp_labels = next(iter(test_loader))
print(f"Feature batch shape: {tmp_features.size()}")
print(f"Labels batch shape: {tmp_labels.size()}")

Using 2 genres: ['blues', 'classical']
Updated label map: {'blues': 0, 'classical': 1}
Total selected files: 200
Training set: 140
Validation set: 30
Test set: 30
Sanity Check Train Loader:
Feature batch shape: torch.Size([8, 551250])
Labels batch shape: torch.Size([8])
Sanity Check Validation Loader:
Feature batch shape: torch.Size([8, 551250])
Labels batch shape: torch.Size([8])
Sanity Check Test Loader:
Feature batch shape: torch.Size([8, 551250])
Labels batch shape: torch.Size([8])
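
The 140/30/30 counts above are what a 70/15/15 split of 200 files produces. The internals of `get_partitioned_data` are not shown here, so the following is only a sketch of one way to get such a partition: a per-genre (stratified) shuffle with a fixed seed. The function name `partition_by_genre` and the dictionary layout are assumptions made for illustration.

```python
import random


def partition_by_genre(files_by_genre, train_part=0.7, val_part=0.15, seed=42):
    """Shuffle each genre separately and cut it into train/val/test parts.

    Splitting per genre keeps the class balance the same in all three sets.
    Whatever remains after the train and val parts goes to the test set.
    """
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for genre in sorted(files_by_genre):
        files = sorted(files_by_genre[genre])  # fixed order before shuffling
        rng.shuffle(files)
        n_train = round(train_part * len(files))
        n_val = round(val_part * len(files))
        splits["train"] += [(f, genre) for f in files[:n_train]]
        splits["val"] += [(f, genre) for f in files[n_train:n_train + n_val]]
        splits["test"] += [(f, genre) for f in files[n_train + n_val:]]
    return splits


# Example with GTZAN-style file names, 100 files per genre.
example = {
    genre: [f"{genre}.{i:05d}.wav" for i in range(100)]
    for genre in ("blues", "classical")
}
parts = partition_by_genre(example)
print({name: len(items) for name, items in parts.items()})
```

Because the generator is seeded, repeated calls return the same partition, which keeps validation and test results comparable between runs.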


In [None]:
#Train model
n_classes = config.num_genres
model = Conv1D(num_blocks=1,
               num_conv_layers_per_block=2,
               kernel_size=7,
               num_first_layer_kernels=8,
               conv_stride=2,
               pool_stride=2,
               dense_size=100,
               do_batch_norm=True,
               n_classes=n_classes,
               config=config
               ).to(config.device)

opt = torch.optim.Adam(model.parameters(), config.learning_rate)
crit = nn.CrossEntropyLoss()

best_val_acc = 0.0
best_model_state = None

Utils.train(train_dataset, train_loader, val_dataset, val_loader, model, opt, lossfunc=crit, config=config, show_batch_time=True)

Epoch 1/7 -  Val Loss: 0.0839 |  Val Acc: 0.53%
epoch 1/7 | train loss 0.6939 | train acc 0.5214 | time 0.39s | per batch 97.07ms
Epoch 2/7 -  Val Loss: 0.0768 |  Val Acc: 0.53%
epoch 2/7 | train loss 0.6040 | train acc 0.6071 | time 0.59s | per batch 146.92ms
(14/18) batch time 718.87ms | cumulative 9380.50ms | average 670.04ms
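
Since the model consumes raw waveforms of 551250 samples, it helps to know how quickly the strided layers shrink the time axis. The sketch below applies the standard output-length formula for a 1D convolution with dilation 1. How `Conv1D` actually pads its layers and what pooling kernel it uses are not shown in this notebook, so zero padding and a pooling kernel equal to `pool_stride` are assumptions, and the helper names are hypothetical.

```python
def out_len(length, kernel_size, stride, padding=0):
    """Output length of a 1D convolution or pooling layer with dilation 1."""
    return (length + 2 * padding - kernel_size) // stride + 1


def block_out_len(length, num_conv_layers=2, kernel_size=7,
                  conv_stride=2, pool_stride=2):
    """Time steps left after one block: strided convolutions, then one pooling layer.

    Assumes no padding and a pooling kernel equal to the pooling stride.
    """
    for _ in range(num_conv_layers):
        length = out_len(length, kernel_size, conv_stride)
    return out_len(length, pool_stride, pool_stride)


steps_25s = block_out_len(551250)  # 25-second input, as in the loaders above
steps_full = block_out_len(660000)  # full truncated clip
print(steps_25s, steps_full)
```

Under these assumptions a 25-second input leaves 68904 time steps after the single block, so the dense layer's input size is that number times the channel count of the last convolution. Changing `duration` or the chunk length therefore changes the size of the first dense layer.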

In [None]:
#Test model
Utils.test(test_dataset, test_loader, model, lossfunc=crit, config=config)
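
What `Utils.test` reports is not shown here. If only overall accuracy is available, a confusion matrix is a cheap way to see whether the model is simply predicting one genre, which is worth checking when accuracy sits near 0.5 on two balanced classes. The helpers below are hypothetical and work on plain lists of integer labels.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[t][p] counts examples with true label t that were predicted as p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of examples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Toy labels using the label map above: 0 = blues, 1 = classical.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=2)
for row in cm:
    print(row)
print(accuracy(cm))
```

A matrix whose counts all fall in a single column means the model predicts the same genre for every input, regardless of what the accuracy figure suggests.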