# Final project of deep learning:

This notebook documents the pre-processing stage of our project. Using the Free Music Archive (FMA) dataset, we convert audio tracks into spectrograms with the Librosa library and build a structured dataset from them. The notebook covers environment setup, data preprocessing, and preparation of the data for model training and evaluation. It provides explanations and code for converting audio to spectrograms, organizing the data, and inspecting its structure and contents.

## Imports and installs:

Before running the code, create a local conda environment and activate it. The provided environment.yml file lists all the required dependencies. Create the environment with:
- conda env create --file environment.yml

and then activate it:
- conda activate xnap-example

In [1]:
#!pip install librosa numpy matplotlib pandas torch
import os
import librosa
import librosa.display  # needed for librosa.display.specshow in CreateSpectrograms
import math
import json
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
from pathlib import Path
import ast
import zipfile
import shutil

In [2]:
!pip install --upgrade pip setuptools wheel
#!pip install numpy==1.12.1  # workaround resampy's bogus setup.py
!pip install -r requirements.txt

ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'

In [3]:
!git clone https://github.com/mdeff/fma.git

Cloning into 'fma'...
remote: Enumerating objects: 823, done.
remote: Counting objects: 100% (822/822), done.
remote: Compressing objects: 100% (287/287), done.
remote: Total 823 (delta 532), reused 621 (delta 527), pack-reused 1
Receiving objects: 100% (823/823), 4.08 MiB | 27.11 MiB/s, done.
Resolving deltas: 100% (532/532), done.


In [4]:
cd fma

/home/xnmaster/TestMachine/deep-learning-project-2024-ai_nndl_g9/fma


In [None]:
!sudo apt-get install -y p7zip-full

In [6]:
# Change directory to 'fma'
%cd fma

# Download the datasets
!curl -O https://os.unil.cloud.switch.ch/fma/fma_small.zip 
!curl -O https://os.unil.cloud.switch.ch/fma/fma_metadata.zip

# Verify the downloaded files
!echo "ade154f733639d52e35e32f5593efe5be76c6d70  fma_small.zip"    | sha1sum -c -
!echo "f0df49ffe5f2a6008d7dc83c6915b31835dfe733  fma_metadata.zip" | sha1sum -c -

[Errno 2] No such file or directory: 'fma'
/home/xnmaster/TestMachine/deep-learning-project-2024-ai_nndl_g9/fma
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  341M  100  341M    0     0   120M      0  0:00:02  0:00:02 --:--:--  120M
fma_metadata.zip: OK


In [7]:
# Unzip the downloaded files without prompts
!7z x -aoa fma_metadata.zip


7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C.UTF-8,Utf16=on,HugeFiles=on,64 bits,6 CPUs Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (406F1),ASM,AES-NI)

Scanning the drive for archives:
1 file, 358412441 bytes (342 MiB)

Extracting archive: fma_metadata.zip
--
Path = fma_metadata.zip
Type = zip
Physical Size = 358412441


In [8]:
!7z x -aoa fma_small.zip 

## Functions to extract and work with the data.

In [8]:
def get_genre_from_genre_id(df, id):
    track_with_id = df[df['genre_id'] == id]
    genre = track_with_id['title'].iloc[0]  # Make sure to extract the first element
    return genre

def get_genre_from_track_id(df, id, df2):
    '''
    Return the top genre and the titles of all genres of a track.

    df: tracks dataframe
    id: track id
    df2: genres dataframe
    '''
    track_with_id = df[df['track_id'] == id]
    top_genre = track_with_id['genre_top'].iloc[0]  # Extract the first element
    genres_all = track_with_id['genres_all'].iloc[0]  # Extract the first element
    genres_all = ast.literal_eval(genres_all)  # Convert the string representation of a list to a list
    genres_all = [get_genre_from_genre_id(df2, genre_id) for genre_id in genres_all]
    return top_genre, genres_all
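
As a minimal sketch of the lookup that `get_genre_from_track_id` performs, the snippet below parses a `genres_all` string with `ast.literal_eval` and maps each id to a title. The `GENRE_TITLES` dictionary and its ids are made-up stand-ins for `genres.csv`, and `titles_from_genres_all` is a hypothetical helper that is not part of the notebook.

```python
import ast

# Hypothetical stand-in for the genre_id -> title pairs stored in genres.csv.
GENRE_TITLES = {1: "Genre A", 2: "Genre B", 3: "Genre C"}


def titles_from_genres_all(raw, titles=GENRE_TITLES):
    """Turn a string such as '[1, 3]' into the matching list of titles."""
    genre_ids = ast.literal_eval(raw)  # safely parse the list literal
    return [titles[genre_id] for genre_id in genre_ids]


print(titles_from_genres_all("[1, 3]"))  # ['Genre A', 'Genre C']
```

A dictionary lookup like this avoids filtering the whole genres dataframe once per genre id, which may matter when the function is called for thousands of tracks.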

In [9]:
## Strip the leading zeros from a zero-padded track id (e.g. "00002" -> 2) so that it matches the integer 'track_id' column
def remove_leading_zeros(number):
    return int(str(number))

# Example usage
number_with_zeros_str = "00002"
number_without_zeros = remove_leading_zeros(number_with_zeros_str)
print(number_without_zeros)  # Output: 2

2
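
The reverse direction can be sketched as well. The snippet below rebuilds an audio path from an integer id, assuming the layout suggested by paths such as `fma_small/099/099134.mp3` that appear later in this notebook: a six-digit zero-padded file name inside a folder named after its first three digits. `track_id_to_audio_path` is a hypothetical helper name.

```python
from pathlib import Path


def track_id_to_audio_path(audio_dir, track_id):
    """Rebuild the path of a track from its integer id (hypothetical helper)."""
    padded = f"{track_id:06d}"  # e.g. 2 -> '000002'
    # Assumed layout: <audio_dir>/<first three digits>/<six digits>.mp3
    return Path(audio_dir) / padded[:3] / f"{padded}.mp3"


print(track_id_to_audio_path("fma_small", 99134).as_posix())  # fma_small/099/099134.mp3
```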


In [11]:
def CreateSpectrograms(load_path, save_path):
    """
    Recursively load every audio file under load_path and compute its mel spectrogram with librosa,
    using an FFT window of 2048 samples and a hop length of 512. Each spectrogram is saved in
    save_path as a PNG file named after the track id.

    Parameters:
    - load_path: Path where the audio files to load are found.
    - save_path: Path where the spectrograms are saved.
    """

    load_path = Path(load_path)
    save_path = Path(save_path)

    # Recursive function to process all files in the directory and subdirectories
    def process_directory(directory):
        for file in directory.iterdir():
            if file.is_dir():
                process_directory(file)  # Recursively process subdirectory
            else:
                id_track = str(file.stem)  # Use the stem (filename without suffix) as id
                try:
                    waveform, sample_rate = librosa.load(file, mono=True)  # Load in mono
                    spec = librosa.feature.melspectrogram(y=waveform, sr=sample_rate, n_fft=2048, hop_length=512)

                    # Plot and save the spectrogram as .png without axis
                    plt.figure(figsize=(10, 4))
                    librosa.display.specshow(librosa.power_to_db(spec, ref=np.max), y_axis=None, x_axis=None)
                    plt.axis('off')
                    plt.savefig(save_path / f"{id_track}.png", bbox_inches='tight', pad_inches=0, transparent=True)
                    plt.close()
                except Exception as e:
                    print(f"Error processing {file}: {e}")
                    plt.close('all')  # release any figure left open by the failed iteration

    process_directory(load_path)
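
To get a feel for the size of the arrays produced above, the following sketch estimates the shape of a mel spectrogram from the clip duration. It assumes librosa's defaults as used in `CreateSpectrograms`: audio resampled to 22050 Hz by `librosa.load`, 128 mel bands, and centered frames, so that the frame count is `1 + n_samples // hop_length`. `mel_spectrogram_shape` is a hypothetical helper that only does the arithmetic.

```python
def mel_spectrogram_shape(duration_s, sample_rate=22050, hop_length=512, n_mels=128):
    """Estimate (n_mels, n_frames) for a clip, assuming centered frames."""
    n_samples = int(duration_s * sample_rate)
    n_frames = 1 + n_samples // hop_length
    return n_mels, n_frames


# A 30-second clip at 22050 Hz has 661500 samples.
print(mel_spectrogram_shape(30.0))  # (128, 1292)
```

Note that the saved PNG is a rendered figure, so its pixel dimensions depend on `figsize` and the DPI used by matplotlib, not directly on this array shape.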

In [38]:
def get_data(load_path, df, df2):
    """
    Collect the track ID and genre labels of every spectrogram found under load_path.

    Parameters:
    - load_path: Path where the audio files to load are found.
    - df: DataFrame containing track information.
    - df2: DataFrame containing genre information.
    """
    load_path = Path(load_path)
    data = []

    # Recursive function to process all files in the directory and subdirectories
    def process_directory(directory):
        for file in directory.iterdir():
            if file.is_dir():
                process_directory(file)  # Recursively process subdirectory
            elif file.suffix == '.png':  # Only process .png spectrogram files
                # Extract the ID from the file path using the stem attribute
                id = file.stem
                id = remove_leading_zeros(id)
                genre, genres_all = get_genre_from_track_id(df, id, df2)
                genre_top = df[df['track_id'] == id]['genre_top'].values[0]
                data.append({'ID': id, 'Genre': genre, 'Genres_all': genres_all, 'Genre_top': genre_top})

    process_directory(load_path)

    return pd.DataFrame(data)  # Return DataFrame containing ID, Genre, Genres_all, and Genre_top columns 

In [None]:
# Load the CSV files into DataFrames
features = pd.read_csv('fma_metadata/features.csv')
echonest = pd.read_csv('fma_metadata/echonest.csv')
artists = pd.read_csv('fma_metadata/raw_artists.csv')
albums = pd.read_csv('fma_metadata/raw_albums.csv')
genres = pd.read_csv('fma_metadata/genres.csv')
tracks = pd.read_csv('fma_metadata/tracks.csv')

In [12]:
# Correct the header values
# Row 0 holds the real column names; its first cell is empty because the
# 'track_id' label sits in row 1.
new_header = tracks.iloc[0].copy()
new_header.iloc[0] = 'track_id'

tracks = tracks[2:].copy()  # drop the two leftover header rows

tracks.columns = new_header

# track_id was read as a mix of ints and strings; convert everything to int
tracks['track_id'] = tracks['track_id'].astype(int)
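
An alternative that avoids the manual header repair is to let pandas read both header rows directly. The sketch below assumes that `tracks.csv` has two header rows followed by a row that only names the index, which matches the layout described in the comments above. The toy CSV text is made up for illustration and is not taken from the dataset.

```python
import io

import pandas as pd

# Toy text mimicking the assumed layout: two header rows, then the index name.
csv_text = (
    ",album,track,track\n"
    ",title,genre_top,genres_all\n"
    "track_id,,,\n"
    "2,Album A,Hip-Hop,[21]\n"
    "5,Album A,Hip-Hop,[21]\n"
)

# header=[0, 1] builds two-level column names; index_col=0 uses the ids as index.
toy = pd.read_csv(io.StringIO(csv_text), index_col=0, header=[0, 1])

print(toy.loc[2, ("track", "genre_top")])
```

With this approach the duplicated column names (`comments`, `title`, and so on) stay distinguishable because each one is qualified by its top-level group.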

In [34]:
tracks.columns

Index(['track_id', 'comments', 'date_created', 'date_released', 'engineer',
       'favorites', 'id', 'information', 'listens', 'producer', 'tags',
       'title', 'tracks', 'type', 'active_year_begin', 'active_year_end',
       'associated_labels', 'bio', 'comments', 'date_created', 'favorites',
       'id', 'latitude', 'location', 'longitude', 'members', 'name',
       'related_projects', 'tags', 'website', 'wikipedia_page', 'split',
       'subset', 'bit_rate', 'comments', 'composer', 'date_created',
       'date_recorded', 'duration', 'favorites', 'genre_top', 'genres',
       'genres_all', 'information', 'interest', 'language_code', 'license',
       'listens', 'lyricist', 'number', 'publisher', 'tags', 'title'],
      dtype='object', name=0)

In [37]:
genres.columns

Index(['genre_id', '#tracks', 'parent', 'title', 'top_level'], dtype='object')

The 'tracks' dataframe holds the key details for each track, including 'date_released', 'producer', 'duration', 'genre_top', 'genres', and 'genres_all'.

Since the goal of the project is to classify .mp3 files by genre, the main column of interest is 'genre_top', which gives the main genre of each track. The 'genres' and 'genres_all' columns provide additional genre information by listing every genre associated with a track. 'genre_top' is the label used for model training. We considered using 'genres_all' to assess model performance, since several genres per track give a broader view of what counts as a correct prediction. However, 'genres_all' contains more than the 8 unique genres found in 'genre_top', which complicates an evaluation based on it.
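
To illustrate the evaluation idea that was considered, here is a minimal sketch comparing a strict accuracy (the prediction must equal 'genre_top') with a relaxed one (the prediction only has to appear in 'genres_all'). The function name and the toy labels are hypothetical, and this is not the evaluation used in the project.

```python
def strict_and_relaxed_accuracy(predictions, top_genres, all_genres):
    """Return (strict, relaxed) accuracy for parallel lists of labels.

    strict: the prediction equals genre_top.
    relaxed: the prediction appears anywhere in genres_all.
    """
    n = len(predictions)
    strict = sum(p == t for p, t in zip(predictions, top_genres)) / n
    relaxed = sum(p in g for p, g in zip(predictions, all_genres)) / n
    return strict, relaxed


# Made-up example with four tracks.
preds = ["Rock", "Pop", "Folk", "Rock"]
tops = ["Rock", "Electronic", "Folk", "Hip-Hop"]
alls = [["Rock"], ["Electronic", "Pop"], ["Folk"], ["Hip-Hop", "Rap"]]
print(strict_and_relaxed_accuracy(preds, tops, alls))  # (0.5, 0.75)
```

The relaxed score can only credit a prediction when one of the 8 training classes appears in 'genres_all', so sub-genres outside those classes never count as hits.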

In [17]:
# These calls were used to delete files that are corrupted or fail to load.

# They are commented out because CreateSpectrograms already catches the errors and skips such files.

#os.remove("/home/xnmaster/TestMachine/deep-learning-project-2024-ai_nndl_g9/fma/fma_small/099/099134.mp3")
#os.remove("/home/xnmaster/TestMachine/deep-learning-project-2024-ai_nndl_g9/fma/fma_small/108/108925.mp3")
#os.remove("/home/xnmaster/TestMachine/deep-learning-project-2024-ai_nndl_g9/fma/fma_small/133/133297.mp3")

#os.remove("/home/xnmaster/TestMachine/deep-learning-project-2024-ai_nndl_g9/fma/fma_small/098/098569.mp3")
#os.remove("/home/xnmaster/TestMachine/deep-learning-project-2024-ai_nndl_g9/fma/fma_small/098/098565.mp3")
#os.remove("/home/xnmaster/TestMachine/deep-learning-project-2024-ai_nndl_g9/fma/fma_small/098/098567.mp3")

#os.remove("/home/xnmaster/TestMachine/deep-learning-project-2024-ai_nndl_g9/fma/fma_small/checksums")

## Create spectrograms:

In [18]:
# Define the save path
save_path = 'spectrograms'

# Create the directory if it does not exist
os.makedirs(save_path, exist_ok=True)

# Call the CreateSpectrograms function
CreateSpectrograms('fma_small', save_path)

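### Zip into 17 different archives

This is done because GitHub does not accept large files. The function below starts a new zip archive whenever the files added to the current one would exceed 100 MiB.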

In [17]:
def zip_and_delete_directory(dir_path, max_zip_size=100*1024*1024):
    # Check if directory exists
    if os.path.exists(dir_path):
        # If it exists, create zip archives
        base_name = os.path.basename(dir_path)
        files = os.listdir(dir_path)
        zip_file_number = 1
        zip_file = zipfile.ZipFile(os.path.join(os.path.dirname(dir_path), f"{base_name}_{zip_file_number}.zip"), 'w', zipfile.ZIP_DEFLATED)
        total_size = 0

        for file in files:
            file_path = os.path.join(dir_path, file)
            total_size += os.path.getsize(file_path)

            if total_size > max_zip_size:
                zip_file.close()
                zip_file_number += 1
                zip_file = zipfile.ZipFile(os.path.join(os.path.dirname(dir_path), f"{base_name}_{zip_file_number}.zip"), 'w', zipfile.ZIP_DEFLATED)
                total_size = os.path.getsize(file_path)

            zip_file.write(file_path, arcname=file)

        zip_file.close()

        # Delete the original directory
        shutil.rmtree(dir_path)

zip_and_delete_directory('/home/xnmaster/deep-learning-project-2024-ai_nndl_g9/spectrograms')
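
The splitting rule used above can be sketched on plain numbers. The snippet groups consecutive file sizes greedily and starts a new batch whenever adding the next file would exceed the limit. `batch_by_size` is a hypothetical helper, and the sizes are made up.

```python
def batch_by_size(sizes, max_batch_size):
    """Greedily group consecutive sizes so each batch stays within the limit."""
    batches, current, total = [], [], 0
    for size in sizes:
        # Close the current batch if this item would push it over the limit.
        if current and total + size > max_batch_size:
            batches.append(current)
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        batches.append(current)
    return batches


print(batch_by_size([40, 50, 30, 80, 10], max_batch_size=100))
# [[40, 50], [30], [80, 10]]
```

As in the function above, the limit applies to the file sizes on disk before compression, so the resulting archives can be smaller than the limit.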

### Unzip for each use:

In [3]:
def unzip_files(zip_filepaths, dest_dir):
    # Check if destination directory exists
    if not os.path.exists(dest_dir):
        # If not, create the directory
        os.makedirs(dest_dir)

    for zip_filepath in zip_filepaths:
        # Check if zip file exists
        if not os.path.exists(zip_filepath):
            print(f"Zip file {zip_filepath} does not exist. Skipping.")
            continue

        with zipfile.ZipFile(zip_filepath, 'r') as zip_ref:
            zip_ref.extractall(dest_dir)

# List of zip files to unzip
zip_filepaths = [f'/home/xnmaster/TestMachine/deep-learning-project-2024-ai_nndl_g9/spectrograms_{i}.zip' for i in range(1, 18)]  ###.azureml

unzip_files(zip_filepaths, '/home/xnmaster/TestMachine/deep-learning-project-2024-ai_nndl_g9/spectrograms')

In [39]:
########### GET THE DATA: track id and its corresponding genre(s)
data = get_data('/home/xnmaster/TestMachine/deep-learning-project-2024-ai_nndl_g9/spectrograms', tracks, genres)

In [40]:
data

| | ID | Genre | Genres_all | Genre_top |
|---|---|---|---|---|
| 0 | 146019 | Instrumental | [Soundtrack, Instrumental] | Instrumental |
| 1 | 24430 | Hip-Hop | [Hip-Hop] | Hip-Hop |
| 2 | 27612 | International | [International, Europe, Flamenco, Spanish, Latin] | International |
| 3 | 115925 | Folk | [Psych-Folk, Folk, Freak-Folk] | Folk |
| 4 | 55233 | International | [International, Reggae - Dub] | International |
| ... | ... | ... | ... | ... |
| 7513 | 115698 | Electronic | [Techno, House, Electronic] | Electronic |
| 7514 | 62748 | Pop | [Pop, Synth Pop] | Pop |
| 7515 | 126670 | Rock | [Indie-Rock, Rock] | Rock |
| 7516 | 97841 | Experimental | [Improv, Experimental] | Experimental |


In [41]:
are_columns_equal = data['Genre'].equals(data['Genre_top'])
print(are_columns_equal)

True


In [16]:
data = data.sort_values('ID')
data = data.reset_index(drop=True)

In [17]:
data

| | ID | Genre | Genres_all |
|---|---|---|---|
| 0 | 2 | Hip-Hop | [Hip-Hop] |
| 1 | 5 | Hip-Hop | [Hip-Hop] |
| 2 | 10 | Pop | [Pop] |
| 3 | 140 | Folk | [Folk] |
| 4 | 141 | Folk | [Folk] |
| ... | ... | ... | ... |
| 7989 | 154308 | Hip-Hop | [Hip-Hop Beats, Rap, Hip-Hop] |
| 7990 | 154309 | Hip-Hop | [Hip-Hop Beats, Rap, Hip-Hop] |
| 7991 | 154413 | Pop | [Pop, Experimental Pop] |
| 7992 | 154414 | Pop | [Pop, Experimental Pop] |


In [30]:
unique_genres = data['Genre'].unique()
unique_genres

array(['Hip-Hop', 'Pop', 'Folk', 'Experimental', 'Rock', 'International',
       'Electronic', 'Instrumental'], dtype=object)

In [31]:
len(unique_genres)

8
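
Since a classifier is usually trained on integer class indices rather than strings, the eight genre names can be mapped to indices. The sketch below is one possible way to do this and is not necessarily the encoding used later in the project; the variable names are hypothetical.

```python
# The eight genres listed in the output above.
GENRES = ["Hip-Hop", "Pop", "Folk", "Experimental", "Rock",
          "International", "Electronic", "Instrumental"]

# Sort first so the mapping does not depend on the order in which genres were seen.
genre_to_index = {genre: i for i, genre in enumerate(sorted(GENRES))}
index_to_genre = {i: genre for genre, i in genre_to_index.items()}

print(genre_to_index["Electronic"])  # 0
print(index_to_genre[7])             # Rock
```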

## Create dataframe and CSV to work with when training:

In [26]:
RANDOM_SEED = 42
from sklearn.model_selection import GridSearchCV, train_test_split
import os
from PIL import Image
import numpy as np

spectrogram_dir = '/home/xnmaster/TestMachine/deep-learning-project-2024-ai_nndl_g9/spectrograms' 
spectrogram_files = os.listdir(spectrogram_dir)

'''
spectrograms = []
for file in spectrogram_files:
    if file.endswith('.png'):
        image = Image.open(os.path.join(spectrogram_dir, file))
        image_array = np.array(image)
        spectrograms.append(image_array)
'''
y = data['Genre'].values


In [28]:
print(len([f'/home/xnmaster/TestMachine/deep-learning-project-2024-ai_nndl_g9/spectrograms/{spectrogram_files[i]}' for i in range(len(spectrogram_files))]))
print(len(y))
print(len(data['Genres_all'].values))


7518
7518
7518


Some of the audio files could not be converted into spectrograms because they are corrupted or empty. This is why the dataset has fewer entries (7,518 here) than there are audio tracks.
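
One way to find out which tracks were dropped is to compare file stems between the audio folder and the spectrogram folder. The sketch below does this with a set difference and demonstrates it on empty placeholder files in a temporary directory. `missing_spectrogram_ids` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def missing_spectrogram_ids(audio_dir, spectrogram_dir):
    """Return the sorted stems of .mp3 files that have no matching .png."""
    audio_ids = {p.stem for p in Path(audio_dir).rglob("*.mp3")}
    image_ids = {p.stem for p in Path(spectrogram_dir).rglob("*.png")}
    return sorted(audio_ids - image_ids)


# Toy demonstration with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    audio, images = Path(tmp, "audio", "000"), Path(tmp, "images")
    audio.mkdir(parents=True)
    images.mkdir()
    for name in ("000002.mp3", "000005.mp3"):
        (audio / name).touch()
    (images / "000002.png").touch()
    demo_missing = missing_spectrogram_ids(Path(tmp, "audio"), images)
    print(demo_missing)  # ['000005']
```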

In [29]:
# Build each image path from the track ID so that paths and labels stay aligned
# (os.listdir returns files in arbitrary order). File names are zero-padded to six digits.
df = pd.DataFrame({
    'image_paths': [f'/home/xnmaster/TestMachine/deep-learning-project-2024-ai_nndl_g9/spectrograms/{track_id:06d}.png' for track_id in data['ID']],
    'labels': y,
    'other_labels': data['Genres_all'].values
})

df.to_csv('/home/xnmaster/TestMachine/deep-learning-project-2024-ai_nndl_g9/MY_DATA.csv', index=False)
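
When this CSV is read back, the 'other_labels' column comes back as text, because each list was written as its string representation. The sketch below shows how the lists can be restored with `ast.literal_eval`. The CSV text is a made-up sample in the same three-column layout.

```python
import ast
import csv
import io

# Toy CSV text in the same three-column layout; the rows are made up.
csv_text = (
    "image_paths,labels,other_labels\n"
    "spectrograms/000002.png,Hip-Hop,['Hip-Hop']\n"
    "spectrograms/062748.png,Pop,\"['Pop', 'Synth Pop']\"\n"
)

rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    # The list was serialised as text, so parse it back into a Python list.
    row["other_labels"] = ast.literal_eval(row["other_labels"])

print(rows[1]["other_labels"])  # ['Pop', 'Synth Pop']
```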