- Related YouTube series: PyTorch for Audio + Music Processing
- Code provided by the author: GitHub Repo
Files / Directories Info (listed in the order explained in the tutorial above):
Directories:
- `MNIST`: A dataset, auto-downloaded in `train.py`.
- `UrbanSound8K`: A dataset, downloaded from the URBANSOUND8K DATASET website.
Files:
- `train_feed_forward_network.py`: Contains a class `FeedForwardNet` and the functions `download_mnist_datasets`, `train_one_epoch`, and `train`, which download the MNIST dataset and train a `FeedForwardNet` model on it.
- `feedforwardnet.pth`: Model saved from `train.py`.
- `predict_feed_forward_network.py`: Contains a function `predict` for validating the model `feedforwardnet.pth`.
- `urban_sound_dataset.py`: Contains a class `UrbanSoundDataset` for loading the `.wav` files in the UrbanSound8K dataset and getting the waveform signal, sample rate, and mel-spectrogram of each audio clip. Several steps are performed in the method `__getitem__`:
  - Load the `.wav` audio file and get its waveform signal and sample rate.
  - Resample the signal if the original sample rate does not equal the target sample rate.
  - Mix down multiple channels to mono.
  - If the number of samples is more than expected, cut the signal.
  - If the number of samples is less than expected, right-pad the signal.
  - Apply the transform (here `mel_spectrogram`) to the signal.
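The mix-down, cut, and pad steps above can be sketched with plain torch tensor operations (a minimal illustration; the target length `NUM_SAMPLES` and the tensor shapes are assumptions, and in the real dataset class `torchaudio.transforms.Resample` and `torchaudio.transforms.MelSpectrogram` would handle the resampling and transform steps):

```python
import torch
import torch.nn.functional as F

NUM_SAMPLES = 22050  # assumed target length (1 second at 22050 Hz)

def preprocess(signal: torch.Tensor) -> torch.Tensor:
    """Mix down, cut, and right-pad a waveform of shape (channels, samples)."""
    # Mix down multiple channels to mono.
    if signal.shape[0] > 1:
        signal = torch.mean(signal, dim=0, keepdim=True)
    # Cut if the signal has more samples than expected.
    if signal.shape[1] > NUM_SAMPLES:
        signal = signal[:, :NUM_SAMPLES]
    # Right-pad if the signal has fewer samples than expected.
    if signal.shape[1] < NUM_SAMPLES:
        missing = NUM_SAMPLES - signal.shape[1]
        signal = F.pad(signal, (0, missing))  # pad zeros on the right only
    return signal

# A stereo signal that is too short: becomes mono, padded to NUM_SAMPLES.
out = preprocess(torch.rand(2, 16000))
print(out.shape)  # torch.Size([1, 22050])
```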
- `cnn.py`: A simple CNN model.
- `train_cnn.py`: Uses the model in `cnn.py` to train on the UrbanSound8K dataset.
- `predict_cnn.py`: Makes predictions with the model, in the same way as `predict_feed_forward_network.py`.
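The notes call `cnn.py` "a simple CNN model" without showing it; a sketch of the kind of architecture the tutorial builds (four Conv2d/ReLU/MaxPool blocks feeding a linear classifier over UrbanSound8K's 10 classes) might look like this — the layer sizes and the 1×64×44 mel-spectrogram input shape are assumptions:

```python
import torch
from torch import nn

class CNNNetwork(nn.Module):
    """A simple CNN classifier for mel-spectrogram inputs (assumed 1x64x44)."""
    def __init__(self):
        super().__init__()
        def block(in_ch, out_ch):
            # One conv block: convolution, non-linearity, downsampling.
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=2),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2),
            )
        self.conv = nn.Sequential(block(1, 16), block(16, 32),
                                  block(32, 64), block(64, 128))
        self.flatten = nn.Flatten()
        # 128 channels x 5 x 4 spatial size after four blocks on a 64x44 input.
        self.linear = nn.Linear(128 * 5 * 4, 10)  # 10 UrbanSound8K classes

    def forward(self, x):
        x = self.conv(x)
        x = self.flatten(x)
        return self.linear(x)  # raw logits; apply softmax for probabilities

# One fake mel-spectrogram: batch of 1, 1 channel, 64 mel bands, 44 frames.
logits = CNNNetwork()(torch.rand(1, 1, 64, 44))
print(logits.shape)  # torch.Size([1, 10])
```

The `Linear` input size is tied to the assumed input shape; with a different number of mel bands or frames, the flattened size (and hence the layer) must change accordingly.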
- Got a `RuntimeError: No audio I/O backend is available.` message while running `torchaudio.load(audio_sample_path)` in `urban_sound_dataset.py`:

  ```shell
  # try one of the commands below
  pip install SoundFile
  # or
  pip install sox
  ```
- Got the error message below when plotting a mel-spectrogram using matplotlib:

  ```
  manager_pyplot_show = vars(manager_class).get("pyplot_show")
  TypeError: vars() argument must have __dict__ attribute
  ```

  Solution (from Stack Overflow):

  ```python
  import matplotlib as mpl
  mpl.use('TkAgg')  # Add this line before importing pyplot
  ```