# Music Composer Identification using Deep Learning

The primary objective of this project is to develop a deep learning model that can predict the composer of a given musical score accurately. The project aims to accomplish this objective by using two deep learning techniques: Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN).

## Project Team & Responsibilities:

* **Dom:** Data Collection, Data Preprocessing (MIDI conversion, segmentation, augmentation), Feature Extraction (Piano Rolls for CNN, Sequential Features for LSTM).
* **Santosh:** CNN Model Building, Training, Evaluation, Optimization.
* **Jim:** LSTM Model Building, Training, Evaluation, Optimization.

## Project Roadmap & Status:

Here's a breakdown of our project phases and current status:

1.  **Initial Setup & Data Download (COMPLETED by Jim):**
    * Basic imports are set up.
    * The `blanderbuss/midi-classic-music` dataset has been downloaded from Kaggle.
    * *Status:* Ready for data processing.

2.  **Data Preprocessing & Feature Extraction (COMPLETED by Dom):**
    * **Goal:** Convert raw MIDI files into numerical features (Piano Rolls for CNNs, Sequential Features for LSTMs) and augment dataset.
    * **Responsible:** Dom.
    * *Current Status:* Completed / Needs implementation of the sections below.

3.  **Model Building (NEXT STEP for Team):**
    * **Goal:** Design CNN and LSTM model architectures.
    * **Responsible:** Santosh (CNN), Jim (LSTM).
    * *Dependencies:* Requires processed data from Phase 2.

4.  **Model Training & Evaluation (AFTER Model Building):**
    * **Goal:** Train the models and evaluate their performance using metrics like accuracy, precision, and recall.
    * **Responsible:** Santosh (CNN), Jim (LSTM).
    * *Dependencies:* Requires built models from Phase 3.

5.  **Model Optimization (Post Training):**
    * **Goal:** Fine-tune model hyperparameters to improve performance.
    * **Responsible:** Santosh (CNN), Jim (LSTM) & Dom (Feature Engineering).
    * *Dependencies:* Requires initial model training.

In [17]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import pandas as pd

Data Collection
The dataset contains the midi files of compositions from well-known classical composers like Bach, Beethoven, Chopin, and Mozart. The dataset has been labeled with the name of the composer for each score. Predictions are performed for only the below composers:

1-Bach

2-Beethoven

3-Chopin

4-Mozart

In [18]:
#%pip install kagglehub

import kagglehub

# Download latest version
path = kagglehub.dataset_download("blanderbuss/midi-classic-music")

print("Path to dataset files:", path)

Path to dataset files: C:\Users\jim.mccarthy\.cache\kagglehub\datasets\blanderbuss\midi-classic-music\versions\1


In [19]:
import os

# List all files in the dataset path
for root, dirs, files in os.walk(path):
    for file in files:
        #print(os.path.join(root, file))
        # Check if the file is a MIDI file and contains 'bach' in its name.
        # There are other composers that need to be processed too.
        if (file.endswith('.mid') or file.endswith('.midi')) and 'bach' in file.lower():
            print(f"Found MIDI file: {file}")
            # Add file to Bach dataset processing logic here


Found MIDI file: C.P.E.Bach Solfeggieto.mid
Found MIDI file: Liszt Bach Prelude Transcription.mid
Found MIDI file: Piano version of Bachs two part inventions No...mid
Found MIDI file: Piano version of Bachs two part inventions No.1.mid
Found MIDI file: Piano version of Bachs two part inventions No.10.mid
Found MIDI file: Piano version of Bachs two part inventions No.11.mid
Found MIDI file: Piano version of Bachs two part inventions No.12.mid
Found MIDI file: Piano version of Bachs two part inventions No.13.mid
Found MIDI file: Piano version of Bachs two part inventions No.14.mid
Found MIDI file: Piano version of Bachs two part inventions No.15.mid
Found MIDI file: Piano version of Bachs two part inventions No.2.mid
Found MIDI file: Piano version of Bachs two part inventions No.3.mid
Found MIDI file: Piano version of Bachs two part inventions No.4.mid
Found MIDI file: Piano version of Bachs two part inventions No.5.mid
Found MIDI file: Piano version of Bachs two part inventions No.6.mid

Convert MIDI file to something useful for LSTM and CNN.

In [20]:
# I will place these here so they run after Kaggle download, as I encountered conflicts with the initial setup when adding above.
#!pip install music21
#!pip install pretty_midi
#!pip install --upgrade numpy # Ensure I have a recent numpy version

In [21]:
# Imports
import os
import glob
import music21
import pretty_midi
import numpy as np # Already imported, but good to have here for clarity for my feature engineering
import pickle
import collections

Data Pre-processing: Convert the musical scores into a format suitable for deep learning models. This involves converting the musical scores into MIDI files and applying data augmentation techniques.


In [22]:
# Data Preprocessing and Feature Extraction
KAGGLE_DOWNLOAD_PATH = "/root/.cache/kagglehub/datasets/blanderbuss/midi-classic-music/versions/1"
MIDI_DIR = KAGGLE_DOWNLOAD_PATH

OUTPUT_DIR = "/content/processed_data"
SEGMENT_DURATION_SECONDS = 5
SAMPLES_PER_SECOND = 100

PITCH_LOW = 21
PITCH_HIGH = 108
NUM_PITCHES = PITCH_HIGH - PITCH_LOW + 1

AUGMENT_TRANSPOSITION_STEPS = [-3, -2, -1, 1, 2, 3]
AUGMENT_TEMPO_SCALES = [0.9, 1.1]

# Defines composers
COMPOSERS = ["Bach", "Beethoven", "Chopin", "Mozart"]

# Creates output directory
os.makedirs(OUTPUT_DIR, exist_ok=True)
print(f"MIDI data will be processed from: {MIDI_DIR}")
print(f"Processed data will be saved to: {OUTPUT_DIR}")

MIDI data will be processed from: /root/.cache/kagglehub/datasets/blanderbuss/midi-classic-music/versions/1
Processed data will be saved to: /content/processed_data


###Feature Extraction : Extracts features from the MIDI files, such as notes, chords, and tempo, using music analysis tools.

Here, the preprocessed MIDI segments are converted into numerical representations. I've generated different formats for the CNN and LSTM models to leverage the strengths of each.

* **For CNNs: The Piano Roll**
    * **Purpose:** CNNs excel at recognizing visual patterns. A piano roll converts music into a 2D image (pitch vs. time), allowing the CNN to "see" and learn characteristic melodic shapes, harmonic voicings, and rhythmic patterns that define a composer's style.
    * **Details:** The piano roll captures note activity (velocity) across a defined pitch range (MIDI 21-108) over time, sampled at 100 samples per second. All outputs are normalized to [0,1] and padded/truncated to a consistent shape.
* **For LSTMs: Sequential Features (Chroma & Note Density)**
    * **Purpose:** LSTMs are great tools for understanding temporal sequences. These features describe the harmonic content and musical activity at each point in time, allowing the LSTM to learn how a composer's musical ideas evolve.
    * **Details:** Each time step in the sequence contains a 12-element Pitch Class Profile (Chroma) representing harmonic presence (e.g., C, C#, D) and a single value for overall note density/volume. These are also sampled at 100 samples per second and normalized.

In [31]:
# Feature Extraction - midi_to_sequential_features (for LSTMs)
# This function extracts time-series features like Pitch Class Profiles and note density from a MIDI segment for LSTMs

def midi_to_sequential_features(midi_data_segment: pretty_midi.PrettyMIDI, duration: float,
                                samples_per_second: int, pitch_low: int, pitch_high: int) -> np.ndarray:
    if not midi_data_segment.instruments:
        return None

    num_target_time_steps = int(duration * samples_per_second)
    num_features_per_timestep = 12 + 1 # Chroma + Note Density
    sequential_features = np.zeros((num_target_time_steps, num_features_per_timestep), dtype=np.float32)

    chroma_features = midi_data_segment.get_chroma(fs=samples_per_second).T
    print("Original chroma shape:", chroma_features.shape)  # should be (12, T)
    if chroma_features.shape[0] < num_target_time_steps:
        padding_needed = num_target_time_steps - chroma_features.shape[0]
        chroma_features = np.pad(chroma_features, ((0, padding_needed), (0, 0)), mode='constant')
    elif chroma_features.shape[0] > num_target_time_steps:
        chroma_features = chroma_features[:num_target_time_steps, :]
    
    note_density = np.zeros(num_target_time_steps, dtype=np.float32)
    for instrument in midi_data_segment.instruments:
        for note in instrument.notes:
            start_idx = int(note.start * samples_per_second)
            end_idx = int(note.end * samples_per_second)
            start_idx = max(0, min(start_idx, num_target_time_steps - 1))
            end_idx = max(0, min(end_idx, num_target_time_steps - 1))
            if end_idx >= start_idx:
                note_density[start_idx:end_idx] += note.velocity

    max_density = np.max(note_density)
    if max_density > 0:
        note_density /= max_density

    sequential_features[:, :12] = chroma_features
    sequential_features[:, 12] = note_density

    return sequential_features

In [47]:
from typing import Optional

# Feature Extraction - midi_to_piano_roll (for CNNs)
# This function converts a MIDI segment into a 2D image-like "piano roll" for CNNs.
def is_piano(instrument: pretty_midi.Instrument) -> bool:
    # Check program number (0-7 are all piano-related in General MIDI)
    return not instrument.is_drum and 0 <= instrument.program <= 7

def midi_to_piano_roll(midi_data_segment: pretty_midi.PrettyMIDI, duration: float,
                        samples_per_second: int, pitch_low: int, pitch_high: int) -> Optional[np.ndarray]:
    if not midi_data_segment.instruments:
        return None
    piano = None # Default instrument of acoustic piano, will be updated if a piano instrument is found
    for instrument in midi_data_segment.instruments:
        if is_piano(instrument):
            piano = instrument
    if (piano is None):
        print("No piano instrument found in MIDI segment.")
        return None
    piano_roll = piano.get_piano_roll(fs=samples_per_second, low=pitch_low, high=pitch_high)
    piano_roll = piano_roll / 127.0

    num_target_time_steps = int(duration * samples_per_second)
    num_pitches = pitch_high - pitch_low  # Should be 88
    current_time_steps = piano_roll.shape[1]

    if current_time_steps < num_target_time_steps:
        padding = np.zeros((num_pitches, num_target_time_steps - current_time_steps), dtype=np.float32)
        piano_roll = np.hstack([piano_roll, padding])
    elif current_time_steps > num_target_time_steps:
        piano_roll = piano_roll[:, :num_target_time_steps]

    #if current_time_steps < num_target_time_steps:
    #    padding_needed = num_target_time_steps - current_time_steps
    #    piano_roll = np.pad(piano_roll, ((0, 0), (0, padding_needed)), mode='constant')
    #elif current_time_steps > num_target_time_steps:
    #    piano_roll = piano_roll[:, :num_target_time_steps]

    return piano_roll.reshape(num_pitches, num_target_time_steps, 1)

In [25]:
# Utility Function - create_pretty_midi_segment
# This function extracts a specific time segment from a larger MIDI file.

def create_pretty_midi_segment(full_midi_data: pretty_midi.PrettyMIDI, start_time: float, end_time: float) -> pretty_midi.PrettyMIDI:
    segment_pm = pretty_midi.PrettyMIDI()
    for instrument in full_midi_data.instruments:
        new_instrument = pretty_midi.Instrument(program=instrument.program, is_drum=instrument.is_drum, name=instrument.name)
        for note in instrument.notes:
            if note.end > start_time and note.start < end_time:
                new_note = pretty_midi.Note(
                    velocity=note.velocity,
                    pitch=note.pitch,
                    start=max(0.0, note.start - start_time),
                    end=min(end_time - start_time, note.end - start_time)
                )
                if new_note.end > new_note.start:
                    new_instrument.notes.append(new_note)
        if new_instrument.notes:
            segment_pm.instruments.append(new_instrument)
    return segment_pm

In [26]:
# Utility Function - apply_augmentation
# This function modifies a MIDI segment by transposing its pitch or scaling its tempo.

def apply_augmentation(midi_data_segment: pretty_midi.PrettyMIDI, augmentation_type: str, value) -> pretty_midi.PrettyMIDI:
    augmented_midi = pretty_midi.PrettyMIDI()
    for instrument in midi_data_segment.instruments:
        new_instrument = pretty_midi.Instrument(program=instrument.program, is_drum=instrument.is_drum, name=instrument.name)
        for note in instrument.notes:
            new_note = pretty_midi.Note(note.velocity, note.pitch, note.start, note.end)
            new_instrument.notes.append(new_note)
        augmented_midi.instruments.append(new_instrument)

    if augmentation_type == 'transpose':
        for instrument in augmented_midi.instruments:
            for note in instrument.notes:
                note.pitch = int(max(0, min(127, note.pitch + value)))
    elif augmentation_type == 'tempo_scale':
        for instrument in augmented_midi.instruments:
            for note in instrument.notes:
                note.start *= value
                note.end *= value
    else:
        raise ValueError(f"Unknown augmentation type: {augmentation_type}")
    return augmented_midi

In [27]:
def extract_segments_from_midi(midi_path, segment_duration=5.0, samples_per_second=100):
    try:
        full_midi = pretty_midi.PrettyMIDI(midi_path)
    except Exception as e:
        print(f"Error loading {midi_path}: {e}")
        return []

    total_duration = full_midi.get_end_time()
    segments = []

    for start_time in np.arange(0, total_duration, segment_duration):
        end_time = min(start_time + segment_duration, total_duration)

        segment = pretty_midi.PrettyMIDI()
        for instrument in full_midi.instruments:
            new_instrument = pretty_midi.Instrument(program=instrument.program, is_drum=instrument.is_drum)
            for note in instrument.notes:
                if start_time <= note.start < end_time:
                    new_note = pretty_midi.Note(
                        velocity=note.velocity,
                        pitch=note.pitch,
                        start=note.start - start_time,
                        end=min(note.end, end_time) - start_time
                    )
                    new_instrument.notes.append(new_note)
            if new_instrument.notes:
                segment.instruments.append(new_instrument)

        # Only append segments with valid instruments
        if segment.instruments:
            segments.append(segment)

    return segments



### Data Processing Loop & Output Conclusion

This section orchestrates the loading of MIDI files, segmenting them, applying all augmentations, extracting features, and finally saving the processed data.

* **Process:** Iterates through each composer's MIDI files, segments them, applies both transposition and tempo scaling for each segment, and then generates both CNN and LSTM features.
* **Output Data:** The processed features and corresponding labels are saved as `.pkl` files in the `/content/processed_data/` directory.

---

#### **The data is ready for model training!**

* **For CNN Model (Santosh):**
    * Load `features_cnn.pkl`.
    * Expected input shape: `(num_segments, 88, 500, 1)` - (total samples, pitches, time steps, channels).
* **For LSTM Model (Jim):**
    * Load `features_lstm.pkl`.
    * Expected input shape: `(num_segments, 500, 13)` - (total samples, time steps, features per time step).
* **Labels:**
    * Load `labels.pkl` (numerical labels corresponding to composers).
    * Load `composer_to_label.pkl` and `label_to_composer.pkl` to map between numerical labels and composer names.

You can/should convert these NumPy arrays to PyTorch tensors for your models (e.g., `torch.tensor(data, dtype=torch.float32)` for features, `torch.tensor(labels, dtype=torch.long)` for labels).


In [49]:
# Define label mappings
composer_to_label = {composer: i for i, composer in enumerate(COMPOSERS)}
label_to_composer = {i: composer for composer, i in composer_to_label.items()}

features_cnn = []
features_lstm = []
labels = []

# Iterate through each composer
for composer in COMPOSERS:
    composer_dir = os.path.join(MIDI_DIR)
    print(f"Processing composer: {composer}")

    for root, dirs, files in os.walk(path):
        for file in files:
            print(os.path.join(root, file))
            # Check if the file is a MIDI file and contains 'bach' in its name.
            # There are other composers that need to be processed too.
            if (file.endswith('.mid') or file.endswith('.midi')) and composer.lower() in file.lower():
                midi_path = os.path.join(root, file)
                print("Reading file: ", file)

                try:
                    segments = extract_segments_from_midi(midi_path, SEGMENT_DURATION_SECONDS, SAMPLES_PER_SECOND)
                except Exception as e:
                    print(f"Skipping {file}: {e}")
                    continue

                for segment in segments:
                    all_augmented = [segment]

                    for step in AUGMENT_TRANSPOSITION_STEPS:
                        all_augmented.append(apply_augmentation(segment, 'transpose', step))
                    for scale in AUGMENT_TEMPO_SCALES:
                        all_augmented.append(apply_augmentation(segment, 'tempo_scale', scale))

                    for augmented_segment in all_augmented:
                        # CNN Features
                        piano_roll = midi_to_piano_roll(augmented_segment, duration=SEGMENT_DURATION_SECONDS,
                                                        samples_per_second=SAMPLES_PER_SECOND,
                                                        pitch_low=PITCH_LOW, pitch_high=PITCH_HIGH)
                        if piano_roll is not None:
                            features_cnn.append(piano_roll)

                        # LSTM Features
                        sequential = midi_to_sequential_features(augmented_segment, duration=SEGMENT_DURATION_SECONDS,
                                                                 samples_per_second=SAMPLES_PER_SECOND,
                                                                 pitch_low=PITCH_LOW, pitch_high=PITCH_HIGH)
                        if sequential is not None:
                            features_lstm.append(sequential)

                        # Append label only if both features were generated
                        if piano_roll is not None and sequential is not None:
                            labels.append(composer_to_label[composer])

print("Finished processing all composers.")

# Convert to NumPy arrays
features_cnn = np.array(features_cnn, dtype=np.float32)
features_lstm = np.array(features_lstm, dtype=np.float32)
labels = np.array(labels, dtype=np.int64)

# Save to disk
with open(os.path.join(OUTPUT_DIR, 'features_cnn.pkl'), 'wb') as f:
    pickle.dump(features_cnn, f)

with open(os.path.join(OUTPUT_DIR, 'features_lstm.pkl'), 'wb') as f:
    pickle.dump(features_lstm, f)

with open(os.path.join(OUTPUT_DIR, 'labels.pkl'), 'wb') as f:
    pickle.dump(labels, f)

with open(os.path.join(OUTPUT_DIR, 'composer_to_label.pkl'), 'wb') as f:
    pickle.dump(composer_to_label, f)

with open(os.path.join(OUTPUT_DIR, 'label_to_composer.pkl'), 'wb') as f:
    pickle.dump(label_to_composer, f)

print(f"Saved {len(labels)} labeled examples for training.")


Processing composer: Bach
C:\Users\jim.mccarthy\.cache\kagglehub\datasets\blanderbuss\midi-classic-music\versions\1\midiclassics.zip
C:\Users\jim.mccarthy\.cache\kagglehub\datasets\blanderbuss\midi-classic-music\versions\1\Rothchild Symphony Rmw12 2mov.mid
C:\Users\jim.mccarthy\.cache\kagglehub\datasets\blanderbuss\midi-classic-music\versions\1\Rothchlid Symphony Rmw12 3mov.mid
C:\Users\jim.mccarthy\.cache\kagglehub\datasets\blanderbuss\midi-classic-music\versions\1\Sibelius Kuolema Vals op44.mid
C:\Users\jim.mccarthy\.cache\kagglehub\datasets\blanderbuss\midi-classic-music\versions\1\Tchaicovsky Waltz of the Flowers.MID
C:\Users\jim.mccarthy\.cache\kagglehub\datasets\blanderbuss\midi-classic-music\versions\1\Tchaikovsky Lake Of The Swans Act 1 1mov.mid
C:\Users\jim.mccarthy\.cache\kagglehub\datasets\blanderbuss\midi-classic-music\versions\1\Tchaikovsky Lake Of The Swans Act 1 2mov.mid
C:\Users\jim.mccarthy\.cache\kagglehub\datasets\blanderbuss\midi-classic-music\versions\1\Tchaikovsky

TypeError: Instrument.get_piano_roll() got an unexpected keyword argument 'low'

CNN Input: (batch_size, 1, 88, 500) → channel-first PyTorch format (grayscale piano roll)
CNN Output per segment: (batch_size, time_steps=some_N, features_per_step=some_M)


In [None]:
class ComposerCNN(nn.Module):
    def __init__(self, num_pitches, num_time_steps):
        super(ComposerCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=(3, 3), padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=(3, 3), padding=1)
        self.fc1 = nn.Linear(64 * num_pitches * (num_time_steps // 4), 128)
        self.fc2 = nn.Linear(128, len(COMPOSERS))

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.max_pool2d(x, (2, 2))
        x = torch.relu(self.conv2(x))
        x = torch.max_pool2d(x, (2, 2))
        x = x.view(x.size(0), -1)  # Flatten
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

In [None]:
model_cnn = ComposerCNN(NUM_PITCHES, int(SEGMENT_DURATION_SECONDS * SAMPLES_PER_SECOND))


Model Building: Develop a deep learning model using LSTM and CNN architectures to classify the musical scores according to the composer.


LSTM Input Shape: (batch_size, time_steps, features_per_step) → same as (batch_size, seq_len, input_size)

In [None]:
# Define label mappings
composer_to_label = {composer: i for i, composer in enumerate(COMPOSERS)}
label_to_composer = {i: composer for composer, i in composer_to_label.items()}

features_cnn = []
features_lstm = []
labels = []

# Iterate through each composer
for composer in COMPOSERS:
    composer_dir = os.path.join(MIDI_DIR)
    print(f"Processing composer: {composer}")

    for root, dirs, files in os.walk(path):
        for file in files:
            print(os.path.join(root, file))
            # Check if the file is a MIDI file and contains 'bach' in its name.
            # There are other composers that need to be processed too.
            if (file.endswith('.mid') or file.endswith('.midi')) and composer.lower() in file.lower():
                print("Reading file: ", file)
                midi_path = os.path.join(root, file)

                midi_path = os.path.join(root, file)

                try:
                    segments = extract_segments_from_midi(midi_path, SEGMENT_DURATION_SECONDS, SAMPLES_PER_SECOND)
                except Exception as e:
                    print(f"Skipping {file}: {e}")
                    continue

                for segment in segments:
                    all_augmented = [segment]

                    for step in AUGMENT_TRANSPOSITION_STEPS:
                        all_augmented.append(apply_augmentation(segment, 'transpose', step))
                    for scale in AUGMENT_TEMPO_SCALES:
                        all_augmented.append(apply_augmentation(segment, 'tempo_scale', scale))

                    for augmented_segment in all_augmented:
                        # CNN Features
                        piano_roll = midi_to_piano_roll(augmented_segment, duration=SEGMENT_DURATION_SECONDS,
                                                        samples_per_second=SAMPLES_PER_SECOND,
                                                        pitch_low=PITCH_LOW, pitch_high=PITCH_HIGH)
                        if piano_roll is not None:
                            features_cnn.append(piano_roll)

                        # LSTM Features
                        sequential = midi_to_sequential_features(augmented_segment, duration=SEGMENT_DURATION_SECONDS,
                                                                 samples_per_second=SAMPLES_PER_SECOND,
                                                                 pitch_low=PITCH_LOW, pitch_high=PITCH_HIGH)
                        if sequential is not None:
                            features_lstm.append(sequential)

                        # Append label only if both features were generated
                        if piano_roll is not None and sequential is not None:
                            labels.append(composer_to_label[composer])

print("Finished processing all composers.")

# Convert to NumPy arrays
features_cnn = np.array(features_cnn, dtype=np.float32)
features_lstm = np.array(features_lstm, dtype=np.float32)
labels = np.array(labels, dtype=np.int64)

# Save to disk
with open(os.path.join(OUTPUT_DIR, 'features_cnn.pkl'), 'wb') as f:
    pickle.dump(features_cnn, f)

with open(os.path.join(OUTPUT_DIR, 'features_lstm.pkl'), 'wb') as f:
    pickle.dump(features_lstm, f)

with open(os.path.join(OUTPUT_DIR, 'labels.pkl'), 'wb') as f:
    pickle.dump(labels, f)

with open(os.path.join(OUTPUT_DIR, 'composer_to_label.pkl'), 'wb') as f:
    pickle.dump(composer_to_label, f)

with open(os.path.join(OUTPUT_DIR, 'label_to_composer.pkl'), 'wb') as f:
    pickle.dump(label_to_composer, f)

print(f"Saved {len(labels)} labeled examples for training.")


Processing composer: Bach
C:\Users\jim.mccarthy\.cache\kagglehub\datasets\blanderbuss\midi-classic-music\versions\1\midiclassics.zip
C:\Users\jim.mccarthy\.cache\kagglehub\datasets\blanderbuss\midi-classic-music\versions\1\Rothchild Symphony Rmw12 2mov.mid
C:\Users\jim.mccarthy\.cache\kagglehub\datasets\blanderbuss\midi-classic-music\versions\1\Rothchlid Symphony Rmw12 3mov.mid
C:\Users\jim.mccarthy\.cache\kagglehub\datasets\blanderbuss\midi-classic-music\versions\1\Sibelius Kuolema Vals op44.mid
C:\Users\jim.mccarthy\.cache\kagglehub\datasets\blanderbuss\midi-classic-music\versions\1\Tchaicovsky Waltz of the Flowers.MID
C:\Users\jim.mccarthy\.cache\kagglehub\datasets\blanderbuss\midi-classic-music\versions\1\Tchaikovsky Lake Of The Swans Act 1 1mov.mid
C:\Users\jim.mccarthy\.cache\kagglehub\datasets\blanderbuss\midi-classic-music\versions\1\Tchaikovsky Lake Of The Swans Act 1 2mov.mid
C:\Users\jim.mccarthy\.cache\kagglehub\datasets\blanderbuss\midi-classic-music\versions\1\Tchaikovsky

TypeError: PrettyMIDI.get_piano_roll() got an unexpected keyword argument 'low'

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset, random_split
import pickle
import numpy as np

# Hyperparameters
input_size = 13         # 12 chroma + 1 note density
hidden_size = 128       # Can be tuned
num_layers = 2          # Can be tuned
num_classes = 10        # Update based on your label count
batch_size = 64
num_epochs = 30
learning_rate = 0.001

# ------------------------------
# Load Preprocessed Data
# ------------------------------
with open('/content/processed_data/features_lstm.pkl', 'rb') as f:
    X = pickle.load(f)
with open('/content/processed_data/labels.pkl', 'rb') as f:
    y = pickle.load(f)

# Convert to PyTorch tensors
X_tensor = torch.tensor(X, dtype=torch.float32)        # Shape: (N, 500, 13)
y_tensor = torch.tensor(y, dtype=torch.long)           # Shape: (N,)

# Dataset and DataLoader
dataset = TensorDataset(X_tensor, y_tensor)
train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_ds, val_ds = random_split(dataset, [train_size, val_size])
train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=batch_size)

# ------------------------------
# Define the LSTM Model
# ------------------------------
class ComposerLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(ComposerLSTM, self).__init__()
        self.lstm = nn.LSTM(input_size=input_size,
                            hidden_size=hidden_size,
                            num_layers=num_layers,
                            batch_first=True,
                            dropout=0.3,
                            bidirectional=False)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch_size, seq_len, input_size)
        lstm_out, _ = self.lstm(x)  # output: (batch_size, seq_len, hidden_size)
        out = lstm_out[:, -1, :]    # Take last time step
        out = self.fc(out)
        return out

# Initialize model, loss, optimizer
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ComposerLSTM(input_size, hidden_size, num_layers, num_classes).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)



Model Training: Train the deep learning model using the pre-processed and feature-extracted data.


In [None]:
# ------------------------------
# Training Loop
# ------------------------------
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for X_batch, y_batch in train_loader:
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)

        optimizer.zero_grad()
        outputs = model(X_batch)
        loss = criterion(outputs, y_batch)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Validation
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for X_val, y_val in val_loader:
            X_val, y_val = X_val.to(device), y_val.to(device)
            outputs = model(X_val)
            _, predicted = torch.max(outputs.data, 1)
            total += y_val.size(0)
            correct += (predicted == y_val).sum().item()

    val_accuracy = 100 * correct / total
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}, Validation Accuracy: {val_accuracy:.2f}%")

# Save the model
torch.save(model.state_dict(), "composer_lstm_model.pth")

Model Evaluation: Evaluate the performance of the deep learning model using accuracy, precision, and recall metrics.


Model Optimization: Optimize the deep learning model by fine-tuning hyperparameters.