To create the data for this classifier:

1. Download the Musicnet library
2. Put the folder titled "musicnet" in the music-translation folder
3. Create a folder titled "music_classification_data" in the "musicnet" folder
4. Create two folders in "music_classification_data" titled "test" and "train"
5. In the "test" and "train" folders, put folders with the wav files in each such that the folders are titled with the labels for the wav files (e.g. "Beethoven_Accompanied_Violin")

Import libraries:

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import Dataset
import librosa
import numpy as np

from pathlib import Path

Check current device:

In [2]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cpu


Puts data and labels into 2d arrays, splits into train and test:

Tutorial on pathlib libary: https://realpython.com/python-pathlib/

In [20]:
dataset = Path.cwd().parent.joinpath("musicnet", "music_classification_data")

train = dataset.joinpath("train")
test = dataset.joinpath("test")

train_labels = [p.stem for p in train.iterdir()]
test_labels = [p.stem for p in test.iterdir()]

train_labels.remove(".DS_Store")
test_labels.remove(".DS_Store")

print("train labels:", train_labels, "\n")
print("test labels:", test_labels, "\n")

train_wav = []
test_wav = []

for label in train_labels:
    train_wav.append([wav for wav in train.joinpath(label).iterdir() if wav.name != ".DS_Store"])
    
for label in test_labels:
    test_wav.append([wav for wav in test.joinpath(label).iterdir() if wav.name != ".DS_Store"])
    
print(len(train_wav), len(train_wav[0]))
print(len(test_wav), len(test_wav[0]))

train labels: ['Beethoven_Accompanied_Violin', 'Bach_Solo_Piano', 'Bach_Solo_Cello', 'Beethoven_Solo_Piano', 'Beethoven_String_Quartet', 'Cambini_Wind_Quintet'] 

test labels: ['Beethoven_Accompanied_Violin', 'Bach_Solo_Piano', 'Bach_Solo_Cello', 'Beethoven_Solo_Piano', 'Beethoven_String_Quartet', 'Cambini_Wind_Quintet'] 

6 16
6 6


Puts data and labels into 1d arrays:

In [21]:
train_y = []
test_y = []

unprocessed_train_x = []
unprocessed_test_x = []

for i in range(len(train_labels)):
    for j in range(len(train_wav[i])):
        train_y.append(train_labels[i])
        
for i in range(len(test_labels)):
    for j in range(len(test_wav[i])):
        test_y.append(test_labels[i])
        
for arr in train_wav:
    unprocessed_train_x.extend(arr)
    
for arr in test_wav:
    unprocessed_test_x.extend(arr)
        
print(len(train_y), len(unprocessed_train_x))
print(len(test_y), len(unprocessed_test_x))

153 153
50 50


Processed data, uses librosa to turn wav file into a tensor. Takes first 160,000 samples (~4s), and samples every 5 to get processed audio tensor.

In [23]:
train_x = []
test_x = []

train_progress_counter = 0
test_progress_counter = 0

print("Processing train wav files:")

for path in unprocessed_train_x:
    data, rate = librosa.load(path, sr=16000)
    assert rate == 16000
    sample_tensor = torch.tensor(data).float()
    short_tensor = sample_tensor[:160000]
    downsampled_tensor = short_tensor[::5]
    print(sample_tensor.size())
    train_progress_counter += 1
    print(train_progress_counter, "/", len(unprocessed_train_x), "processed")
    train_x.append(downsampled_tensor.size())
    
print("Processing test wav files:")
    
for path in unprocessed_test_x:
    data, rate = librosa.load(path, sr=16000)
    assert rate == 16000
    sample_tensor = torch.tensor(data).float()
    short_tensor = sample_tensor[:160000]
    downsampled_tensor = short_tensor[::5]
    print(sample_tensor.size())
    test_progress_counter += 1
    print(test_progress_counter, "/", len(unprocessed_test_x), "processed")
    test_x.append(downsampled_tensor.size())

Processing test wav files:
torch.Size([3216614])
1 / 50 processed
torch.Size([5441829])
2 / 50 processed
torch.Size([8287295])
3 / 50 processed
torch.Size([1994920])
4 / 50 processed


KeyboardInterrupt: 