
**Classifying Music Genre and Composers Using CNNs and LSTM Models**

I start by extracting the archive and creating one directory per composer, so that only the relevant composers' files are loaded for further processing. I only keep four composers: Chopin, Bach, Beethoven, and Mozart.

In [None]:
import zipfile
import os
import shutil

zip_path = r'C:\Users\manov\Downloads\archive (3).zip'
extract_dir = r'C:\Users\manov\Downloads\extracted_composers'

# Create the extraction directory if it does not already exist
os.makedirs(extract_dir, exist_ok=True)

with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_dir)

# List of composers I'm interested in
composers_of_interest = ['Bach', 'Beethoven', 'Chopin', 'Mozart']

# Create directories for the selected composers
for composer in composers_of_interest:
    composer_dir = os.path.join(extract_dir, composer)
    if not os.path.exists(composer_dir):
        os.makedirs(composer_dir)

# Move the MIDI files of the composers of interest to their respective directories
for root, dirs, files in os.walk(extract_dir):
    for file in files:
        if file.endswith(('.mid', '.midi')):
            file_path = os.path.join(root, file)
            # Check if the file belongs to one of the composers of interest
            for composer in composers_of_interest:
                if composer in root or composer in file:
                    composer_dir = os.path.join(extract_dir, composer)
                    destination_path = os.path.join(composer_dir, file)
                    # Move the file unless one with the same name is already there
                    # (this also covers files that are already in place)
                    if not os.path.exists(destination_path):
                        shutil.move(file_path, composer_dir)
                    # Stop at the first matching composer either way
                    break

print("MIDI files have been organized by composer.")

MIDI files have been organized by composer.
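
The matching rule used above is a plain, case-sensitive substring test on the folder path and the file name. As a minimal sketch of that rule in isolation (the function name `match_composer` and the example paths are hypothetical, chosen only for illustration):

```python
import os

COMPOSERS = ['Bach', 'Beethoven', 'Chopin', 'Mozart']

def match_composer(root, file_name, composers=COMPOSERS):
    """Return the first composer whose name appears in the folder path
    or in the file name, or None if no composer matches."""
    for composer in composers:
        if composer in root or composer in file_name:
            return composer
    return None

# Hypothetical paths, for illustration only
print(match_composer(os.path.join('data', 'Chopin'), 'nocturne.mid'))      # Chopin
print(match_composer(os.path.join('data', 'misc'), 'Mozart_k545.mid'))     # Mozart
print(match_composer(os.path.join('data', 'Handel'), 'water_music.mid'))   # None
```

Because the test is a substring match, a file is assigned to the first composer in the list whose name appears anywhere in its path, so the order of the list matters when more than one name is present.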


**DATA PREPROCESSING**

The next step is to preprocess the data. I convert each MIDI file into an array of notes (start, end, pitch) that my deep learning models can consume, and I augment the data by pitch shifting each piece by up to two semitones in either direction.

In [None]:
import pretty_midi
import pretty_midi.utilities
import numpy as np
import os
from sklearn.preprocessing import LabelEncoder

# Define the path to the directories for each composer
base_dir = r'C:\Users\manov\Downloads\extracted_composers'
composers = ['Bach', 'Beethoven', 'Chopin', 'Mozart']

def load_midi_files(base_dir, composers):
    midi_files = []
    labels = []
    for composer in composers:
        composer_dir = os.path.join(base_dir, composer)
        for file_name in os.listdir(composer_dir):
            if file_name.endswith('.mid') or file_name.endswith('.midi'):
                midi_files.append(os.path.join(composer_dir, file_name))
                labels.append(composer)
    return midi_files, labels

def midi_to_notes(midi_file):
    try:
        midi_data = pretty_midi.PrettyMIDI(midi_file)
        notes = []
        for instrument in midi_data.instruments:
            if not instrument.is_drum:
                for note in instrument.notes:
                    notes.append([note.start, note.end, note.pitch])
        return np.array(notes)
    except Exception as e:
        print(f"Error processing {midi_file}: {e}")
        return None

def augment_data(notes, shift_range=2):
    augmented_data = []
    for shift in range(-shift_range, shift_range + 1):
        if shift == 0:
            continue
        shifted_notes = notes.copy()
        shifted_notes[:, 2] += shift
        augmented_data.append(shifted_notes)
    return augmented_data

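# Load and preprocess MIDI files
# (file_composers holds one composer name per file; using a separate name
# avoids overwriting the `composers` list defined above)
midi_files, file_composers = load_midi_files(base_dir, composers)

preprocessed_data = []
labels = []

for midi_file, composer in zip(midi_files, file_composers):
    notes = midi_to_notes(midi_file)
    # Skip files that failed to parse or that contain no notes
    if notes is not None and notes.size > 0:
        augmented_notes = augment_data(notes)
        preprocessed_data.extend([notes] + augmented_notes)
        labels.extend([composer] * (len(augmented_notes) + 1))

# Stack every note from every piece into one (total_notes, 3) array and save it.
# Note: this drops the piece boundaries, so the rows no longer line up
# one-to-one with `labels`, which holds one entry per piece.
preprocessed_data = np.concatenate(preprocessed_data)
np.save('preprocessed_data.npy', preprocessed_data)
np.save('labels.npy', labels)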




Error processing C:\Users\manov\Downloads\extracted_composers\Beethoven\Anhang 14-3.mid: Could not decode key with 3 flats and mode 255
Error processing C:\Users\manov\Downloads\extracted_composers\Mozart\K281 Piano Sonata n03 3mov.mid: Could not decode key with 2 flats and mode 2


To explain the output: the code skipped the two problematic MIDI files and printed the error messages above. Both failures come from key signatures that pretty_midi could not decode. The warning from pretty_midi about events on non-zero tracks is not critical, but it indicates that some MIDI files do not conform to the expected format.
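
The preprocessing cell stacks all notes into one array, while the labels are kept per piece. As a hedged sketch of one alternative, assuming the goal is to keep one note array per piece so that each array pairs with its label, the pieces can be stored in a NumPy object array. The toy notes and the helper name `pitch_shift` below are made up for illustration and are not part of the notebook:

```python
import numpy as np

def pitch_shift(notes, shift):
    """Return a copy of an (n, 3) [start, end, pitch] array with the
    pitch column moved by `shift` semitones."""
    shifted = notes.copy()
    shifted[:, 2] += shift
    return shifted

# Two toy "pieces" with made-up notes: [start, end, pitch]
piece_a = np.array([[0.0, 0.5, 60.0], [0.5, 1.0, 64.0]])
piece_b = np.array([[0.0, 1.0, 67.0]])

pieces, labels = [], []
for notes, composer in [(piece_a, 'Bach'), (piece_b, 'Mozart')]:
    # Shift 0 is the original piece; the other four are augmented copies
    for shift in (-2, -1, 0, 1, 2):
        pieces.append(pitch_shift(notes, shift))
        labels.append(composer)

# An object array keeps one entry per piece, so piece_array[i] pairs with labels[i]
piece_array = np.empty(len(pieces), dtype=object)
for i, piece in enumerate(pieces):
    piece_array[i] = piece

print(len(piece_array), len(labels))  # 10 10
print(piece_array[0][:, 2])           # pitches of piece_a shifted down two semitones
```

An array like this can be written with `np.save` and read back with `np.load(..., allow_pickle=True)`, at the cost of relying on pickling for the ragged per-piece arrays.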

Some Summary Statistics for the data

In [None]:
import numpy as np
import pandas as pd

# Load the extracted features and labels
features = np.load('features.npy', allow_pickle=True)
labels = np.load('labels.npy')

# Concatenate the nested feature arrays into a single (n_notes, 3) array for easier analysis.
flattened_features = np.concatenate(features)

# Convert the flattened feature array into a DataFrame for easier manipulation and summary statistics calculation.
df_features = pd.DataFrame(flattened_features, columns=['start', 'duration', 'pitch'])

# Use describe() to generate summary statistics: count, mean, standard deviation, min, max, and quartiles.
print("Summary Statistics for Features:")
print(df_features.describe())


Summary Statistics for Features:
              start      duration         pitch
count  2.446408e+07  2.446408e+07  2.446408e+07
mean   3.068068e+02  3.209468e-01  6.403889e+01
std    3.571839e+02  4.817936e-01  1.230178e+01
min    0.000000e+00  4.166667e-04  7.000000e+00
25%    8.597463e+01  1.153845e-01  5.600000e+01
50%    2.105478e+02  1.898735e-01  6.500000e+01
75%    4.153311e+02  3.513514e-01  7.300000e+01
max    5.164605e+03  9.863928e+01  1.090000e+02


Start Times: Start is the onset time of a note within its piece, so the wide range (from 0 to about 5,165 seconds) and the high standard deviation (357 seconds) mainly reflect that the pieces vary a lot in length.
Durations: Most notes are short: the mean is 0.32 seconds, the median is 0.19 seconds, and 75% of notes last 0.35 seconds or less. A few notes are far longer (up to 98.64 seconds), which might be sustained notes.
Pitches: The pitches are centered around 64 (mean 64.04, median 65), with the middle half of the notes falling between 56 and 73. This range is typical for classical music, which often centers around the middle of the piano keyboard. These statistics include the pitch-shifted copies created during augmentation.
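
To make the pitch numbers easier to read, here is a small sketch that converts MIDI note numbers to note names. It assumes the common convention in which MIDI note 60 is C4 (middle C); the helper `midi_to_name` is an illustrative name, not something defined in the notebook:

```python
NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def midi_to_name(pitch):
    """Convert a MIDI note number to a name such as 'C4', assuming 60 = C4."""
    pitch = int(round(pitch))
    octave = pitch // 12 - 1
    return f"{NOTE_NAMES[pitch % 12]}{octave}"

# Values taken from the pitch column of the summary table above
for label, value in [('min', 7), ('25%', 56), ('50%', 65), ('75%', 73), ('max', 109)]:
    print(label, value, midi_to_name(value))
# min 7 G-1
# 25% 56 G#3
# 50% 65 F4
# 75% 73 C#5
# max 109 C#8
```

A standard 88-key piano spans MIDI notes 21 (A0) to 108 (C8), so the minimum and maximum in the table lie outside that range. They may come from non-piano instruments in the files or from the pitch-shifted copies.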

In [None]:
import numpy as np
import pandas as pd

# Load the extracted features and labels
features = np.load('features.npy', allow_pickle=True)
labels = np.load('labels.npy')

# Print the number of samples and shape of each feature array
print("Number of samples (features):", len(features))
print("Shape of each feature array:", features[0].shape if len(features) > 0 else "N/A")

# Print the number of labels
print("Number of samples (labels):", len(labels))

# Calculate memory usage for features and labels
feature_size = sum([f.nbytes for f in features]) if len(features) > 0 else 0
label_size = labels.nbytes
total_size = feature_size + label_size

# Print memory usage
print("Memory usage (features): {:.2f} MB".format(feature_size / (1024 * 1024)))
print("Memory usage (labels): {:.2f} MB".format(label_size / (1024 * 1024)))
print("Total memory usage: {:.2f} MB".format(total_size / (1024 * 1024)))

# Print the shape of the flattened features array for an overview
flattened_features = np.concatenate(features)
print("Shape of flattened features array:", flattened_features.shape)

# Convert the features and labels to a DataFrame for an overview
df_features = pd.DataFrame(flattened_features, columns=['start', 'duration', 'pitch'])
df_labels = pd.Series(labels, name='label')

# Print basic information about the DataFrame
# (info() prints its report directly and returns None, so it is not wrapped in print())
print("\nFeatures DataFrame info:")
df_features.info(memory_usage='deep')
print("\nLabels DataFrame info:")
print(df_labels.describe())


Number of samples (features): 24464075
Shape of each feature array: (1, 3)
Number of samples (labels): 7665
Memory usage (features): 559.94 MB
Memory usage (labels): 0.26 MB
Total memory usage: 560.20 MB
Shape of flattened features array: (24464075, 3)

Features DataFrame info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24464075 entries, 0 to 24464074
Data columns (total 3 columns):
 #   Column    Dtype  
---  ------    -----  
 0   start     float64
 1   duration  float64
 2   pitch     float64
dtypes: float64(3)
memory usage: 559.9 MB

Labels DataFrame info:
count     7665
unique       4
top       Bach
freq      4650
Name: label, dtype: object


The dataset consists of 24,464,075 feature rows but only 7,665 labels. Each feature entry has shape (1, 3): one note described by start, duration, and pitch. The counts differ because the preprocessing step concatenated the notes of every piece into a single array, while the labels hold one entry per piece (each file plus its four pitch-shifted copies). Features and labels are therefore not aligned row for row, and the notes would need to be regrouped by piece before they can be paired with labels for training. The features take approximately 559.94 MB and the labels around 0.26 MB, for a combined 560.20 MB. The flattened features array has shape (24,464,075, 3), confirming 24,464,075 individual note records with three attributes each.

The features are stored as float64, and the DataFrame's reported memory usage matches the calculated figure. The label Series has 7,665 entries covering four unique composers. Bach is the most frequent, with 4,650 entries (about 61% of the labels), so the classes are imbalanced.
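
The figures above can be checked with a little arithmetic. The sketch below only reuses numbers printed in the output, plus the fact that the augmentation step produces four shifted copies per file (shifts of -2, -1, +1, +2):

```python
# Numbers copied from the printed output above
n_notes = 24_464_075
n_columns = 3
bytes_per_float64 = 8

# Memory for the features: rows x columns x 8 bytes, converted to MB
feature_mb = n_notes * n_columns * bytes_per_float64 / (1024 * 1024)
print(f"Features: {feature_mb:.2f} MB")  # Features: 559.94 MB

# Share of labels that belong to Bach
n_labels = 7_665
bach_count = 4_650
print(f"Bach share of labels: {bach_count / n_labels:.1%}")  # Bach share of labels: 60.7%

# Each successfully parsed file contributes the original plus four shifted copies
copies_per_file = 5
print("Files behind the labels:", n_labels // copies_per_file)  # Files behind the labels: 1533
```

The label count divides evenly by five, which is consistent with one label per file copy rather than one label per note.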

**FEATURE EXTRACTION**

Now I'll proceed with the feature extraction step. For each preprocessed note I keep the start time and the pitch, and I replace the end time with a duration (end minus start). These features are used to train the models.

In [None]:
# The extract_features function calculates the start time, duration, and pitch for each note.
def extract_features(notes):
    features = []
    if len(notes.shape) == 1:
        start, end, pitch = notes
        duration = end - start
        features.append([start, duration, pitch])
    else:
        for note in notes:
            start, end, pitch = note
            duration = end - start
            features.append([start, duration, pitch])
    return np.array(features)

# Load the preprocessed data and labels
preprocessed_data = np.load('preprocessed_data.npy', allow_pickle=True)
labels = np.load('labels.npy')

# Apply the extract_features function to each set of notes in the preprocessed data.
features = [extract_features(notes) for notes in preprocessed_data]

# Save the extracted features for further use in model building
np.save('features.npy', features)

print("Feature extraction completed.")

Feature extraction completed.
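
Since the preprocessed data is a single array with one row per note, the same start, duration, and pitch columns can also be computed for all rows at once instead of note by note. This is a minimal sketch of that idea, not the notebook's method; `extract_features_vectorized` is a hypothetical name, and unlike the list comprehension above it returns an (n, 3) array rather than n separate (1, 3) arrays:

```python
import numpy as np

def extract_features_vectorized(notes):
    """Turn [start, end, pitch] rows into [start, duration, pitch] rows.

    Accepts a single note (shape (3,)) or many notes (shape (n, 3)).
    """
    notes = np.atleast_2d(np.asarray(notes, dtype=float))
    start = notes[:, 0]
    duration = notes[:, 1] - notes[:, 0]
    pitch = notes[:, 2]
    return np.column_stack([start, duration, pitch])

# Made-up notes for illustration: [start, end, pitch]
toy_notes = np.array([[0.0, 0.5, 60.0],
                      [0.5, 1.25, 64.0]])
toy_features = extract_features_vectorized(toy_notes)
print(toy_features)
```

On an array with tens of millions of rows, column arithmetic like this avoids a Python-level loop over every note.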
