# Music Composer Classification Using CNN and LSTM

## Introduction
  
  This project aims to classify musical pieces by composer (Bach, Beethoven, Chopin, or Mozart) using deep learning techniques. The core approach extracts features from MIDI files and uses CNN and LSTM models to predict the composer.
  

  ---------------------

## Project Goal
Use deep learning to accurately identify the composer of a given piece of music.

## Objective
Develop a model that predicts which of four composers (Bach, Beethoven, Chopin, or Mozart) wrote a given piece, leveraging CNN and LSTM architectures.


## External Libraries and Frameworks

| Library/Tool            | Purpose                                            | Reference / Installation Source                                                                         |
| ----------------------- | -------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| **TensorFlow**          | Deep learning framework                            | `pip install tensorflow` <br> [https://www.tensorflow.org/](https://www.tensorflow.org/)                |
| **Librosa**             | Audio processing and feature extraction            | `pip install librosa` <br> [https://librosa.org/](https://librosa.org/)                                 |
| **SoundFile**           | Audio file reading                                 | `pip install soundfile` <br> [https://pysoundfile.readthedocs.io/](https://pysoundfile.readthedocs.io/) |
| **Tqdm**                | Progress bars                                      | `pip install tqdm` <br> [https://tqdm.github.io/](https://tqdm.github.io/)                              |
| **NumPy**               | Array handling for features and labels             | `pip install numpy` <br> [https://numpy.org/](https://numpy.org/)                                       |
| **scikit-learn**        | Train/validation/test splitting                    | `pip install scikit-learn` <br> [https://scikit-learn.org/](https://scikit-learn.org/)                  |
| **Timidity**            | MIDI to audio conversion                           | `apt-get install -y timidity` (Linux)                                                                   |
| **Python Standard Lib** | File operations, JSON handling, subprocess control | [https://docs.python.org/3/standard-library/](https://docs.python.org/3/standard-library/)              |


--------------------------


## Methodology and Code Explanation

1. **Data Preparation and Pre-processing**


*   Unzipping and organizing the dataset: The dataset is unzipped into a working directory in the Colab environment so that the raw data is accessible for processing.

*   Cleaning composer folders:
The function `clean_composer_folder` moves all files from nested subfolders into a single composer directory, removes non-MIDI files, and deletes empty folders. This keeps the data consistent and removes irrelevant files before feature extraction.


2. **Feature Extraction**


*   Rendering MIDI to audio:
The function `render_midi_to_audio` converts MIDI files into audio (WAV) format using timidity, then loads the audio for processing. This turns the raw MIDI files into the numeric audio signal that the deep learning models require as input.

*   Extracting MFCC features:
The `save_mfcc` function renders each MIDI file to an audio signal, pads or truncates it to a fixed length (15 seconds), splits it into equal segments, and extracts Mel-frequency cepstral coefficients (MFCCs) from each segment using librosa. It saves the features and labels in a JSON file.
MFCCs are widely used features representing the audio’s frequency spectrum, making them suitable for music classification.


3. **Model Development**



*   Loading data: The `load_data` function loads the JSON file containing MFCCs and labels into NumPy arrays for training.

*   Building the CNN model:
The `build_cnn_model` function builds a CNN from TensorFlow/Keras layers (Conv2D, MaxPooling2D, Flatten, Dense, Dropout). The model input shape corresponds to the MFCC feature dimensions. CNNs are effective for spatial feature extraction from spectrogram-like inputs such as MFCCs.

* Training: The model is fitted on the training split by minimizing sparse categorical cross-entropy, and accuracy is monitored on the validation split after each epoch.

4. **Evaluation and Optimization**
The trained model is evaluated on the held-out test split. The notebook reports test accuracy; precision and recall can also be computed from the test predictions.

Hyperparameter tuning and data augmentation may be applied to improve the model further.

----------------------------------


## Load Dataset

In [None]:
!pip install kagglehub

In [None]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("blanderbuss/midi-classic-music")

print("Path to dataset files:", path)

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
import os
import zipfile

# Path to your zip file
zip_path = "/content/drive/MyDrive/Aritificial Intelligence/midi-classic-music.zip"   # change this to your file path
extract_dir = "//content/drive/MyDrive/Aritificial Intelligence/midi-classic-music_unzipped"  # where you want to extract

# Make sure output directory exists
os.makedirs(extract_dir, exist_ok=True)

# Unzip
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_dir)

print(f"Extracted to {extract_dir}")

Extracted to //content/drive/MyDrive/Aritificial Intelligence/midi-classic-music_unzipped


In [4]:
# Get data folder to reflect selected composers only
import os
import shutil

# Original dataset
root_dir = "/content/drive/MyDrive/Aritificial Intelligence/midi-classic-music_unzipped/midiclassics"

# New filtered dataset folder
filtered_dir = "/content/drive/MyDrive/Aritificial Intelligence/midclassics_filtered"

# Composers to keep
keep_composers = {"Bach", "Beethoven", "Chopin", "Mozart"}

# Ensure the filtered dataset folder exists
os.makedirs(filtered_dir, exist_ok=True)

# Loop through composers in root_dir
for composer in keep_composers:
    src_folder = os.path.join(root_dir, composer)
    dest_folder = os.path.join(filtered_dir, composer)

    if os.path.exists(src_folder):
        # dirs_exist_ok=True (Python 3.8+) lets this cell be re-run safely
        shutil.copytree(src_folder, dest_folder, dirs_exist_ok=True)
        print(f"{composer} Copied successfully")
    else:
        print(f"Warning: {composer} folder not found in {root_dir}")

# Delete the original dataset folder after copying
shutil.rmtree('/content/drive/MyDrive/Aritificial Intelligence/midi-classic-music_unzipped')

print("Filtering complete. Original dataset removed.")



Bach Copied successfully
Chopin Copied successfully
Mozart Copied successfully
Beethoven Copied successfully
Filtering complete. Original dataset removed.


## Data Preparation

In [None]:
from typing import Set

def clean_composer_folder(composer_dir: str) -> None:
    """
    Clean and flatten a composer's directory.

    This function moves all files from subfolders into `composer_dir`,
    removes non-MIDI files, and deletes any empty subfolders.

    Args:
        composer_dir (str): Path to the composer's folder.
    """
    # Move files up to composer_dir
    for root, _, files in os.walk(composer_dir, topdown=False):
        if root == composer_dir:
            continue

        for file_name in files:
            src_file = os.path.join(root, file_name)
            dst_file = os.path.join(composer_dir, file_name)

            # Rename to avoid overwrite
            if os.path.exists(dst_file):
                base, ext = os.path.splitext(file_name)
                counter = 1
                while os.path.exists(
                    os.path.join(composer_dir, f"{base}_{counter}{ext}")
                ):
                    counter += 1
                dst_file = os.path.join(
                    composer_dir, f"{base}_{counter}{ext}"
                )

            shutil.move(src_file, dst_file)

    # Remove empty subdirectories
    for root, _, _ in os.walk(composer_dir, topdown=False):
        if root == composer_dir:
            continue
        if not os.listdir(root):
            os.rmdir(root)

    # Delete non-MIDI files in composer_dir (keep both .mid and .midi,
    # matching the extensions accepted during feature extraction)
    for file_name in os.listdir(composer_dir):
        file_path = os.path.join(composer_dir, file_name)
        is_midi = file_name.lower().endswith((".mid", ".midi"))
        if os.path.isfile(file_path) and not is_midi:
            os.remove(file_path)
            print(f"Deleted non-MIDI file: {file_path}")



root_dataset_dir = "/content/drive/MyDrive/Aritificial Intelligence/midclassics_filtered"
composers = {"Bach", "Beethoven", "Chopin", "Mozart"}

for composer in composers:
    composer_path = os.path.join(root_dataset_dir, composer)
    if os.path.exists(composer_path):
        print(f"\nProcessing {composer} folder...")
        clean_composer_folder(composer_path)
    else:
        print(f"Folder for {composer} not found at {composer_path}")
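
After cleaning, it can help to confirm how many MIDI files remain per composer, since a large imbalance would affect training. The helper below is a minimal sketch and not part of the original notebook: `count_midi_files` and `MIDI_EXTENSIONS` are hypothetical names, and the demo runs on a throwaway temporary folder rather than the real dataset.

```python
import os
import tempfile
from typing import Dict, Iterable

MIDI_EXTENSIONS = (".mid", ".midi")  # assumed set of MIDI file extensions


def count_midi_files(dataset_dir: str, composers: Iterable[str]) -> Dict[str, int]:
    """Count MIDI files located directly inside each composer folder."""
    counts: Dict[str, int] = {}
    for composer in sorted(composers):
        composer_path = os.path.join(dataset_dir, composer)
        if not os.path.isdir(composer_path):
            counts[composer] = 0  # missing folder counts as zero files
            continue
        counts[composer] = sum(
            1
            for name in os.listdir(composer_path)
            if os.path.isfile(os.path.join(composer_path, name))
            and name.lower().endswith(MIDI_EXTENSIONS)
        )
    return counts


# Demo on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp_dir:
    os.makedirs(os.path.join(tmp_dir, "Bach"))
    for name in ("a.mid", "b.MID", "notes.txt"):
        open(os.path.join(tmp_dir, "Bach", name), "w").close()
    demo_counts = count_midi_files(tmp_dir, {"Bach", "Chopin"})

print(demo_counts)
```

In the notebook, the equivalent call would be `count_midi_files(root_dataset_dir, composers)`.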


In [None]:
!pip install pretty_midi librosa numpy tqdm
!apt-get install -y fluidsynth timidity
!pip install soundfile
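
Because MIDI rendering shells out to the `timidity` executable, a quick check that it is on the `PATH` can save a long run of silently skipped files. This is a small sketch under that assumption; `renderer_available` is a hypothetical helper name, not something defined in the notebook.

```python
import shutil


def renderer_available(executable: str = "timidity") -> bool:
    """Return True if `executable` can be found on the system PATH."""
    return shutil.which(executable) is not None


if not renderer_available():
    print("timidity not found on PATH; MIDI rendering will fail until it is installed.")
```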

## Feature Extraction

In [8]:
import os
import math
import json
import subprocess
import tempfile
from typing import Tuple, Optional, Dict, Any

import librosa
import numpy as np
import soundfile as sf  # For reading audio
from tqdm import tqdm


JSON_PATH: str = (
    "/content/drive/MyDrive/Aritificial Intelligence/composer_mfcc_data.json"
)
SAMPLE_RATE: int = 22050
DURATION: int = 15
SAMPLES_PER_TRACK: int = SAMPLE_RATE * DURATION


def render_midi_to_audio(
    midi_path: str, sample_rate: int = SAMPLE_RATE
) -> Tuple[Optional[np.ndarray], Optional[int]]:
    """
    Render a MIDI file to audio and return the waveform and sample rate.

    Args:
        midi_path (str): Path to the MIDI file.
        sample_rate (int): Desired sample rate for the output audio.

    Returns:
        Tuple[Optional[np.ndarray], Optional[int]]:
            - mono audio waveform as a 1-D NumPy array, or None if rendering failed
            - sample rate, or None if rendering failed
    """
    try:
        with tempfile.NamedTemporaryFile(suffix=".wav") as tmpwav:
            # -Ow selects WAV output; -o sets the output file
            cmd = ["timidity", midi_path, "-Ow", "-o", tmpwav.name]
            subprocess.run(
                cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE
            )
            audio, sr = sf.read(tmpwav.name)

            # sf.read returns shape (frames, channels) for multi-channel audio;
            # downmix to mono so that time is the only axis
            if audio.ndim > 1:
                audio = audio.mean(axis=1)

            if sr != sample_rate:
                audio = librosa.resample(audio, orig_sr=sr, target_sr=sample_rate)
                sr = sample_rate

            return audio, sr

    except Exception as e:
        print(f"Rendering MIDI failed for {midi_path} with error: {e}")
        return None, None


def save_mfcc(
    dataset_path: str,
    json_path: str,
    n_mfcc: int = 13,
    n_fft: int = 2048,
    hop_length: int = 512,
    num_segments: int = 5
) -> None:
    """
    Extract MFCC features from a dataset of MIDI files and save them to a JSON file.

    Args:
        dataset_path (str): Path to dataset with subfolders for each composer.
        json_path (str): Path to save the output JSON file.
        n_mfcc (int): Number of MFCCs to extract per segment.
        n_fft (int): FFT window size.
        hop_length (int): Number of samples between successive frames.
        num_segments (int): Number of equal segments to split each track into.
    """
    data: Dict[str, Any] = {
        "mapping": [],
        "mfcc": [],
        "labels": []
    }

    num_samples_per_segment = int(SAMPLES_PER_TRACK / num_segments)
    expected_num_mfcc_per_segment = math.ceil(num_samples_per_segment / hop_length)

    # Loop through all composer folders
    for i, (dirpath, _, filenames) in enumerate(os.walk(dataset_path)):
        if dirpath != dataset_path:
            semantic_label = os.path.basename(dirpath)
            data["mapping"].append(semantic_label)
            print(f"\nProcessing {semantic_label}")

            for file_name in tqdm(filenames, desc=f"{semantic_label} files"):
                file_path = os.path.join(dirpath, file_name)

                if not file_name.lower().endswith((".mid", ".midi")):
                    continue

                signal, sr = render_midi_to_audio(file_path, sample_rate=SAMPLE_RATE)
                if signal is None:
                    continue

                # Ensure fixed length
                if len(signal) < SAMPLES_PER_TRACK:
                    signal = librosa.util.fix_length(signal, size=SAMPLES_PER_TRACK)
                else:
                    signal = signal[:SAMPLES_PER_TRACK]

                # Process segments
                for s in range(num_segments):
                    start_sample = num_samples_per_segment * s
                    finish_sample = start_sample + num_samples_per_segment

                    if finish_sample > len(signal):
                        continue

                    mfcc = librosa.feature.mfcc(
                        y=signal[start_sample:finish_sample],
                        sr=sr,
                        n_mfcc=n_mfcc,
                        n_fft=n_fft,
                        hop_length=hop_length
                    ).T

                    if len(mfcc) == expected_num_mfcc_per_segment:
                        data["mfcc"].append(mfcc.tolist())
                        data["labels"].append(i - 1)
                        print(f"{file_path}, segment: {s + 1}")

    with open(json_path, "w") as fp:
        json.dump(data, fp, indent=4)


In [None]:
save_mfcc(root_dataset_dir, JSON_PATH, num_segments=20)
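
The segment arithmetic inside `save_mfcc` determines the shape of every feature matrix, so it is worth working through for `num_segments=20`. The sketch below repeats that calculation with the constants from the cell above (22050 Hz, 15 seconds, hop length 512); `frames_per_segment` is a hypothetical helper introduced only for this illustration. The frame count agrees with librosa's default centered framing, which yields `1 + n // hop_length` frames for `n` samples.

```python
import math

SAMPLE_RATE = 22050  # Hz, as in the notebook
DURATION = 15        # seconds kept per track
HOP_LENGTH = 512     # samples between successive MFCC frames


def frames_per_segment(num_segments: int) -> tuple:
    """Return (samples per segment, expected MFCC frames per segment)."""
    samples_per_track = SAMPLE_RATE * DURATION
    samples_per_segment = int(samples_per_track / num_segments)
    expected_frames = math.ceil(samples_per_segment / HOP_LENGTH)
    return samples_per_segment, expected_frames


samples, frames = frames_per_segment(20)
print(f"{samples} samples per segment, {frames} frames per segment")
```

With 13 MFCCs per frame, each segment is therefore stored as a 33 x 13 matrix, and the default `num_segments=5` would instead give 130 frames per segment.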

In [None]:
# Load JSON data
import json
import numpy as np


def load_data(data_path):
    """Load MFCC features, labels, and the label-to-composer mapping from JSON."""
    with open(data_path, "r") as fp:
        data = json.load(fp)

    X = np.array(data["mfcc"])
    y = np.array(data["labels"])
    return X, y, data["mapping"]


X, y, mapping = load_data(JSON_PATH)
print(f"Data loaded: X.shape={X.shape}, y.shape={y.shape}, classes={mapping}")
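
Before training, a few consistency checks on the loaded dictionary can catch a truncated or mismatched JSON file early. The following is a sketch, assuming the `mapping` / `mfcc` / `labels` layout written by `save_mfcc`; `check_dataset` is a hypothetical helper and the tiny dictionary is made-up data for illustration.

```python
from typing import Any, Dict


def check_dataset(data: Dict[str, Any]) -> Dict[str, int]:
    """Validate the feature dictionary and return basic counts."""
    mfcc, labels, mapping = data["mfcc"], data["labels"], data["mapping"]
    if len(mfcc) != len(labels):
        raise ValueError("Number of MFCC segments and labels differ")
    if not all(0 <= label < len(mapping) for label in labels):
        raise ValueError("A label falls outside the composer mapping")
    # Every segment should have the same (frames, coefficients) shape.
    shapes = {(len(segment), len(segment[0])) for segment in mfcc}
    if len(shapes) > 1:
        raise ValueError(f"Inconsistent segment shapes: {sorted(shapes)}")
    return {"segments": len(mfcc), "classes": len(mapping)}


# Made-up example: two segments of 2 frames x 3 coefficients each.
toy_data = {
    "mapping": ["Bach", "Chopin"],
    "mfcc": [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]],
    "labels": [0, 1],
}
toy_summary = check_dataset(toy_data)
print(toy_summary)
```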

In [None]:
from sklearn.model_selection import train_test_split
import tensorflow as tf

# Train/validation/test split (80/10/10), stratified by composer label
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42)

# Add channel dimension for CNN
X_train = X_train[..., np.newaxis]
X_val = X_val[..., np.newaxis]
X_test = X_test[..., np.newaxis]
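
Since the splits are stratified, each one should show roughly the same share of segments per composer. A small helper like the one below can be used to confirm that; it is a sketch, `class_distribution` is a hypothetical name, and the labels and composer order in the demo are illustrative only (the real order comes from `mapping`).

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def class_distribution(labels: Iterable[int], mapping: Sequence[str]) -> Dict[str, float]:
    """Return the fraction of samples belonging to each class name."""
    counts = Counter(int(label) for label in labels)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in mapping}
    return {name: counts.get(i, 0) / total for i, name in enumerate(mapping)}


# Illustrative labels, not taken from the dataset.
toy_labels = [0, 0, 1, 1, 2, 3, 3, 3]
toy_mapping = ["Bach", "Beethoven", "Chopin", "Mozart"]
toy_distribution = class_distribution(toy_labels, toy_mapping)
print(toy_distribution)
```

In the notebook this would be called as `class_distribution(y_train, mapping)`, and likewise for `y_val` and `y_test`.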

## Build CNN Model

In [1]:
!pip install tensorflow



In [None]:
from typing import Tuple

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Input,
    Conv2D,
    MaxPooling2D,
    Flatten,
    Dense,
    Dropout
)
from tensorflow.keras.callbacks import History


def build_cnn_model(input_shape: Tuple[int, int, int], num_classes: int) -> tf.keras.Model:
    """
    Build and compile a CNN model for classification.

    Args:
        input_shape (Tuple[int, int, int]): Shape of the input data (height, width, channels).
        num_classes (int): Number of output classes.

    Returns:
        tf.keras.Model: Compiled CNN model.
    """
    model = Sequential([
        Input(shape=input_shape),
        Conv2D(32, (3, 3), activation="relu"),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation="relu"),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(128, activation="relu"),
        Dropout(0.3),
        Dense(num_classes, activation="softmax"),
    ])

    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Build the model, print its summary, then train it
model = build_cnn_model(X_train.shape[1:], num_classes=len(mapping))
model.summary()

history: History = model.fit(
    X_train,
    y_train,
    validation_data=(X_val, y_val),
    epochs=30,
    batch_size=32,
)
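
To see how the feature map shrinks through this architecture, the shapes can be traced by hand. The sketch below assumes an input of 33 frames x 13 MFCCs x 1 channel (the shape produced with `num_segments=20`), 3x3 convolutions with Keras' default `valid` padding, and 2x2 max pooling; `trace_cnn_shapes` is a hypothetical helper written only for this illustration.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]


def trace_cnn_shapes(height: int, width: int) -> Tuple[List[Tuple[str, Shape]], int]:
    """Trace (height, width, channels) through two conv/pool blocks."""
    shapes: List[Tuple[str, Shape]] = [("input", (height, width, 1))]
    channels = 1
    for filters in (32, 64):
        # 3x3 convolution with 'valid' padding removes 2 rows and 2 columns.
        height, width, channels = height - 2, width - 2, filters
        shapes.append((f"conv2d_{filters}", (height, width, channels)))
        # 2x2 max pooling halves each spatial dimension, rounding down.
        height, width = height // 2, width // 2
        shapes.append((f"max_pool_{filters}", (height, width, channels)))
    return shapes, height * width * channels


layer_shapes, flat_units = trace_cnn_shapes(33, 13)
for name, shape in layer_shapes:
    print(f"{name:12s} {shape}")
print(f"flatten      {flat_units}")
```

Under these assumptions the MFCC axis is already reduced to width 1 after the second pooling layer, so a third unpadded 3x3 convolution would not fit; using `padding="same"` or more MFCC coefficients would be needed to go deeper.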


## Training & Evaluation

In [None]:
# Evaluate model
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
print(f"\nTest accuracy: {test_acc:.4f}")
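
Accuracy alone can hide weak performance on individual composers, so per-class precision and recall are a useful complement. The following is a minimal sketch in plain Python; `per_class_precision_recall` is a hypothetical helper, and the toy labels are invented for illustration and are not results from this notebook. `sklearn.metrics.classification_report` offers the same information with less code.

```python
from typing import Dict, Sequence


def per_class_precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int], class_names: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Compute precision and recall for each class from integer labels."""
    report: Dict[str, Dict[str, float]] = {}
    pairs = list(zip(y_true, y_pred))
    for class_id, name in enumerate(class_names):
        tp = sum(1 for t, p in pairs if t == class_id and p == class_id)
        fp = sum(1 for t, p in pairs if t != class_id and p == class_id)
        fn = sum(1 for t, p in pairs if t == class_id and p != class_id)
        # Define the score as 0.0 when the denominator is empty.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        report[name] = {"precision": precision, "recall": recall}
    return report


# Toy labels for illustration only.
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
toy_report = per_class_precision_recall(toy_true, toy_pred, ["A", "B", "C"])
for name, scores in toy_report.items():
    print(f"{name}: precision={scores['precision']:.2f}, recall={scores['recall']:.2f}")
```

In the notebook, predictions could be obtained with `y_pred = model.predict(X_test).argmax(axis=1)` and passed together with `y_test` and `mapping`.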