<a href="https://colab.research.google.com/github/allispaul/audiobot/blob/main/EDA/Audiobots_Spectrogram_Creation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The Erdős Institute Fall Boot Camp - Team Audiobots

We're using data from [this dataset](https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification) to try and classify one thousand 30s samples of audio into one of 10 genres:

*   blues
*   classical
*   country
*   disco
*   hiphop
*   jazz
*   metal
*   pop
*   reggae
*   rock

For use with Vision Transformers and Convolutional Neural Networks, I'm going to create a consistent set of spectrogram images we can load, so we don't have to generate them on the fly.

This code does the following:

*   Downloads the data from HuggingFace
*   Splits the data into a CONSISTENT training and validation set. No need to create a test set until we get to the larger datasets.
*   Creates mel-spectrograms AND log-spectrograms for each song, and saves them in 10 folders (one per genre)

Notably, I'm NOT going to resample the songs. They'll all be processed at their native sampling rate (which I thought varied, but JK, they're all the same). This ensures we'll have the highest possible quality spectrogram at each resolution, instead of encoding... well, encoding errors, lol
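
A quick way to confirm that every clip really shares one sampling rate is sketched below. It assumes each audio entry is a dict with a `sampling_rate` key, as the entries used later in this notebook are; the helper name `distinct_sampling_rates` and the toy entries are made up for illustration.

```python
def distinct_sampling_rates(audio_entries):
    """Return the sorted list of distinct sampling rates found in the entries."""
    return sorted({entry["sampling_rate"] for entry in audio_entries})

# Toy stand-ins for decoded audio entries (hypothetical data, not real GTZAN clips).
toy_entries = [
    {"path": "blues/blues.00000.wav", "sampling_rate": 22050},
    {"path": "rock/rock.00001.wav", "sampling_rate": 22050},
]

rates = distinct_sampling_rates(toy_entries)
print(rates)
```

On the real data, passing the dataset's audio column to this helper should give a single value if no resampling is needed.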


In [1]:
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
import librosa
import librosa.display

#from IPython.display import Audio

!pip install datasets
from datasets import load_dataset, Audio

#!pip install git+https://github.com/huggingface/transformers


import os.path




Note: This links your Google Drive to Colab. Useful if the data is stored in Google Drive.

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Alternatively, you can download the data from scratch from Hugging Face.

In [3]:
gtzan = load_dataset("marsyas/gtzan", split='train')
gtzan

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.


Dataset({
    features: ['file', 'audio', 'genre'],
    num_rows: 999
})

In [4]:
gtzan = gtzan.train_test_split(seed=42, test_size=0.1, stratify_by_column = 'genre')
gtzan

DatasetDict({
    train: Dataset({
        features: ['file', 'audio', 'genre'],
        num_rows: 899
    })
    test: Dataset({
        features: ['file', 'audio', 'genre'],
        num_rows: 100
    })
})
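
Since the split is stratified by genre, each genre should appear in the same proportion in both parts. A minimal sketch for checking that is below; `genre_counts` and the toy labels are hypothetical, and the real `genre` column may hold integer class ids rather than strings.

```python
from collections import Counter

def genre_counts(labels):
    """Count how many examples carry each genre label."""
    return dict(Counter(labels))

# Hypothetical labels standing in for the genre column of each split.
toy_train = ["blues"] * 9 + ["jazz"] * 9
toy_test = ["blues"] * 1 + ["jazz"] * 1

train_counts = genre_counts(toy_train)
test_counts = genre_counts(toy_test)
print(train_counts, test_counts)
```

On the real split, something like `genre_counts(gtzan["train"]["genre"])` should show roughly 90 songs per genre, and the test split roughly 10.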

Import the 30s version of the song that is too short and use it as a replacement.
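
To locate a clip that is too short, one option is to compare each clip's duration against a threshold. The sketch below assumes entries shaped like the Hugging Face audio dicts (`path`, `array`, `sampling_rate`); the helper `find_short_clips`, the 30 s threshold, and the toy entries are all assumptions for illustration.

```python
def find_short_clips(audio_entries, min_seconds=30.0):
    """Return (path, duration_in_seconds) for every clip shorter than min_seconds."""
    short = []
    for entry in audio_entries:
        duration = len(entry["array"]) / entry["sampling_rate"]
        if duration < min_seconds:
            short.append((entry["path"], duration))
    return short

# Hypothetical entries: one full 30 s clip and one 10 s clip at 22050 Hz.
toy_entries = [
    {"path": "jazz/jazz.00001.wav", "array": [0.0] * (30 * 22050), "sampling_rate": 22050},
    {"path": "jazz/jazz.00002.wav", "array": [0.0] * (10 * 22050), "sampling_rate": 22050},
]

short_clips = find_short_clips(toy_entries)
print(short_clips)
```

Real clips may fall a few samples under 30 s, so a slightly lower `min_seconds` may be more useful in practice.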


In [51]:
# FFT settings used for the mel spectrograms
n_fft = 446        # FFT window size in samples (8192 is used for the log spectrograms below)
hop_length = 2966  # number of audio samples between STFT columns (alternative: 512)

#See my rant about settings in Slack for details


In [5]:
parent_dir = "/content/drive/MyDrive/Colab Notebooks/Erdos Institute Boot Camp/Audiobots/Data/GTZAN/mel/train"

!pwd

/content


In [6]:
def create_spectrograms(train, mel, n_fft, hop_length):
    """Save one greyscale spectrogram PNG per song, grouped into genre folders."""
    # Pick the split to process.
    my_set = "train" if train else "test"
    # `mel` only selects the y-axis type passed to specshow;
    # the STFT is computed the same way either way.
    y_axis = "mel" if mel else "log"

    parent_dir = "/content/drive/MyDrive/Colab Notebooks/Erdos Institute Boot Camp/Audiobots/Data/GTZAN/"+y_axis+"/"+my_set

    for song in gtzan[my_set]["audio"]:
        # The genre is the parent folder of the .wav file; the PNG reuses the file name.
        genre = song['path'].split('/')[-2]
        name = song['path'].split('/')[-1].replace(".wav", ".png")
        print(name)

        # Create the genre folder (and any missing parents) if it is not there yet.
        path = os.path.join(parent_dir, genre)
        os.makedirs(path, exist_ok=True)

        sr = song['sampling_rate']  # should be 22050
        # Magnitude of the short-time Fourier transform
        D = np.abs(librosa.stft(song['array'], n_fft=n_fft, hop_length=hop_length))

        # Convert the amplitude spectrogram to a decibel-scaled spectrogram.
        DB = librosa.amplitude_to_db(D, ref=np.max)

        # Draw the spectrogram on a borderless 2.24in x 2.24in figure (224x224 px at dpi=100).
        fig, axes = plt.subplots(figsize=(224/100, 224/100))
        fig.subplots_adjust(top=1.0, bottom=0, right=1.0, left=0, hspace=0, wspace=0)
        librosa.display.specshow(DB, sr=sr, hop_length=hop_length, n_fft=n_fft, y_axis=y_axis, cmap='Greys_r', ax=axes)
        axes.axis('off')

        # Save by full path instead of relying on the current working directory.
        fig.savefig(os.path.join(path, name), dpi=100, format='png')
        plt.close(fig)
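
For intuition about the decibel step in the function above, here is a rough NumPy sketch of what scaling relative to the maximum does. It is not librosa's implementation; the name `amplitude_to_db_sketch` is hypothetical, and the `amin=1e-5` floor and `top_db=80` clipping are my understanding of librosa's defaults, so treat them as assumptions.

```python
import numpy as np

def amplitude_to_db_sketch(magnitude, amin=1e-5, top_db=80.0):
    """Approximate decibel scaling relative to the largest magnitude in the input."""
    magnitude = np.abs(np.asarray(magnitude, dtype=float))
    ref = magnitude.max()
    # 20 * log10(magnitude / ref), with a small floor to avoid log(0).
    db = 20.0 * np.log10(np.maximum(amin, magnitude)) - 20.0 * np.log10(np.maximum(amin, ref))
    # Clip everything that is more than top_db below the peak.
    return np.maximum(db, db.max() - top_db)

# Toy magnitudes: the peak maps to 0 dB and each factor of 10 drops 20 dB.
toy = np.array([[1.0, 0.1], [0.01, 0.0]])
toy_db = amplitude_to_db_sketch(toy)
print(toy_db)
```

The loudest bin lands at 0 dB and silence is clipped at -80 dB, which gives every image the same greyscale range regardless of how loud the song is.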


In [53]:
DB.shape

(224, 224)
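
The (224, 224) shape is consistent with `n_fft = 446` and `hop_length = 2966`. A small sketch of the arithmetic follows, assuming a centered STFT (librosa's default) and a clip of exactly 30 s at 22050 Hz; the helper `stft_shape` is hypothetical.

```python
def stft_shape(n_samples, n_fft, hop_length):
    """Expected (frequency bins, frames) for a centered STFT of a real signal."""
    n_bins = 1 + n_fft // 2
    n_frames = 1 + n_samples // hop_length
    return n_bins, n_frames

# Assumed clip length: 30 s at 22050 Hz (real clips may differ by a few samples).
n_samples = 30 * 22050
shape = stft_shape(n_samples, n_fft=446, hop_length=2966)
print(shape)
```

With `n_fft = 8192`, as used for the log spectrograms, the same formula gives 4097 frequency bins. The saved PNG is still 224 by 224 pixels, because that size is fixed by the figure size and dpi.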

In [73]:
create_spectrograms(train = True, mel = True, n_fft = 446, hop_length = 2966)

/content/drive/MyDrive/Colab Notebooks/Erdos Institute Boot Camp/Audiobots/Data/GTZAN/mel/train


In [None]:
create_spectrograms(train = False, mel = True, n_fft = 446, hop_length = 2966)

In [7]:
create_spectrograms(train = True, mel = False, n_fft = 8192, hop_length = 2966)

country.00020.png
metal.00086.png
country.00064.png
reggae.00064.png
classical.00093.png
disco.00032.png
classical.00067.png
jazz.00050.png
country.00039.png
rock.00098.png
hiphop.00098.png
disco.00060.png
pop.00022.png
hiphop.00019.png
disco.00017.png
country.00089.png
metal.00034.png
reggae.00052.png
country.00005.png
blues.00084.png
rock.00044.png
rock.00000.png
metal.00016.png
metal.00063.png
classical.00059.png
country.00027.png
blues.00090.png
country.00022.png
pop.00013.png
country.00034.png
country.00063.png
country.00054.png
metal.00059.png
hiphop.00016.png
rock.00031.png
disco.00058.png
country.00033.png
blues.00020.png
metal.00022.png
classical.00091.png
reggae.00031.png
hiphop.00024.png
blues.00014.png
country.00038.png
metal.00094.png
disco.00031.png
hiphop.00089.png
rock.00010.png
classical.00062.png
country.00036.png
hiphop.00052.png
hiphop.00059.png
rock.00087.png
jazz.00069.png
rock.00046.png
country.00084.png
country.00021.png
pop.00046.png
metal.00073.png
disco.00090

In [8]:
create_spectrograms(train = False, mel = False, n_fft = 8192, hop_length = 2966)

pop.00071.png
country.00085.png
blues.00054.png
reggae.00035.png
classical.00061.png
classical.00013.png
rock.00096.png
pop.00097.png
metal.00050.png
hiphop.00097.png
metal.00026.png
pop.00098.png
reggae.00043.png
jazz.00053.png
jazz.00010.png
country.00078.png
metal.00014.png
pop.00057.png
blues.00006.png
disco.00052.png
classical.00087.png
reggae.00002.png
jazz.00032.png
pop.00018.png
blues.00097.png
rock.00064.png
rock.00074.png
classical.00065.png
pop.00074.png
disco.00003.png
disco.00073.png
metal.00042.png
disco.00016.png
metal.00008.png
hiphop.00041.png
classical.00092.png
rock.00017.png
reggae.00018.png
metal.00062.png
blues.00041.png
country.00058.png
rock.00048.png
metal.00021.png
jazz.00084.png
metal.00070.png
rock.00047.png
blues.00049.png
disco.00088.png
country.00092.png
blues.00047.png
country.00030.png
blues.00019.png
hiphop.00058.png
jazz.00047.png
reggae.00034.png
country.00031.png
jazz.00016.png
jazz.00021.png
country.00047.png
classical.00099.png
classical.00019.png