[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MusicalInformatics/miws2024/blob/main/alignment/Symbolic_Music_Alignment.ipynb)

In [55]:
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    # Install python packages
    # Issues on Colab with newer versions of MIDO
    ! pip install mido==1.2.10
    ! pip install partitura
    ! pip install fastdtw

    # To be able to access helper modules in the repo for this tutorial
    # (not necessary if the jupyter notebook is run locally instead of google colab)
    !git clone https://github.com/neosatrapahereje/music_alignment_tutorial
    !git clone --recurse-submodules https://github.com/MusicalInformatics/miws2024.git
    import sys
    sys.path.insert(0, "./miws2024/alignment/")

# Symbolic Music Alignment

Automatic Music Alignment refers to the task of linking or matching two musical signals of the same musical work. This can be, e.g., matching *different performances* of the same piece, or matching the performance of a piece with its musical score.

<img src="figures/alignment_figure.gif" alt="alignment_figure" width="600"/>

The following figure shows a common music alignment pipeline:

<img src="figures/alignment_pipeline.png" alt="alignment_pipeline" width="600"/>

In this notebook we are going to explore these components in more detail.

In [56]:
# Let's start by importing some stuff
import os 
# import glob
import warnings

import numpy as np
import matplotlib.pyplot as plt
import partitura as pt

from alignment import fast_dynamic_time_warping, greedy_note_alignment

from typing import List

warnings.filterwarnings("ignore")
%config InlineBackend.figure_format ='retina'

if IN_COLAB:
    V4X22_DATASET_DIR = "./miws2024/vienna4x22"
else:
    # Path to the Vienna 4x22 dataset
    # Get the absolute path of the current notebook
    notebook_path = os.path.abspath(".")
    V4X22_DATASET_DIR = os.path.join(os.path.dirname(notebook_path), "vienna4x22")

MUSICXML_DIR = os.path.join(V4X22_DATASET_DIR, "musicxml")
MIDI_DIR = os.path.join(V4X22_DATASET_DIR, "midi")
MATCH_DIR = os.path.join(V4X22_DATASET_DIR, "match")



## 1. Music Representation

Since this is a tutorial on symbolic music processing, we will focus on symbolic music representations, i.e., those that can be stored in formats such as MIDI, MusicXML or MEI, and that can be generated by editors like MuseScore, Finale, etc.

### 1.1 Audio vs. Symbolic Alignment

* In **Audio-based alignment**, the alignment itself typically refers to a mapping of *timestamps* (in absolute time, in seconds) in one audio recording of a musical work to the corresponding *timestamps* in another recording. (In audio recordings, identifying individual notes is not a trivial task.)


* In **Symbolic-based alignment**, we can have two types of alignment:
    * **Time-wise alignments**: similar to audio-based alignment, we can map timestamps (in symbolic time units like musical beats or MIDI ticks) from one version of the work to another (e.g., a MIDI performance to a score in MusicXML/MEI/Humdrum format).

    * **Note-wise alignment**: We can map individual symbolic music elements (most commonly notes) from one version to another. This is very useful for modeling expressive performance.


### 1.2 Types of music alignment

We can categorize musical alignment along two main dimensions: (representation) modality and time.

#### 1.2.1 Representation modality


* **Audio-to-audio alignment**: Alignment of two (audio) recordings. This is probably the most studied type of alignment in the MIR literature.

* **Symbolic-to-audio alignment**: Alignment of symbolically encoded musical events with timestamps in an audio recording.

* **Symbolic-to-symbolic alignment**: Alignment of symbolically encoded musical events in two recordings/documents of the same work.

* **Image-to-audio alignment**: Alignment of spatial positions in images of sheet music with timestamps in an audio recording.

* **Lyrics-to-audio alignment**: Alignment of lyrics (text) with timestamps in an audio recording.

#### 1.2.2 Time

* **Offline**: Alignment of two *recordings/documents* (i.e., audio recordings, MIDI performances, MusicXML scores, etc.). These recordings/documents can be in any of the modalities described above; the important thing is that both are fully available beforehand, i.e., the music is not occurring in real time.

* **Online**: Alignment of a live (i.e., real-time) performance to the music encoded in a target document (e.g., a pre-annotated audio recording, a symbolic score, etc.). The problem of real-time online alignment is known in the MIR literature as **score following**, and can be useful in live interactive settings, such as automatic accompaniment systems.

In this tutorial we are going to focus on the case of offline alignment.

### 1.3 Representing Alignments

#### 1.3.1 The Match file format

* Format that extends a MIDI human performance with note-, beat-, and downbeat-level alignments to a corresponding musical score. 

* Enables advanced analyses of the performance that are relevant for various tasks, such as

    * expressive performance modeling, 
    * score following, 
    * music transcription,
    * performer classification, etc. 

* The match file includes a set of score-related descriptors that makes it usable also as a bare-bones score representation. 

<img src="figures/matchfile_schema.png" alt="matchfile_schema" width="600"/>

The documentation of the matchfile format can be found [here](https://cpjku.github.io/matchfile/).

#### 1.3.2 Loading Alignments

An important use case of partitura is to handle symbolic alignment information.

**Note that partitura itself does not contain methods for alignment.**

Partitura supports two formats for encoding score-to-performance alignments:

* Our match file format
    * Datasets including match files: Vienna4x22, ASAP, Batik (soon!)
* The format introduced by [Nakamura et al. (2017).](https://eita-nakamura.github.io/articles/EN_etal_ErrorDetectionAndRealignment_ISMIR2017.pdf)

Let's load an alignment!

We have two common use cases:

* We have both the match file and the symbolic score file (e.g., MusicXML or MEI)
* We have only the match file (only works for our format!)

##### 1.3.2.1 Loading an alignment if we only have a match file

A useful property of match files is that they include information about the **score and the performance**. Therefore, it is possible to create both a `Part` and a `PerformedPart` directly from a match file.

* Match files contain all information included in performances in MIDI files, i.e., a MIDI file could be reconstructed from a match file.

* Match files include all information about pitch spelling, score position and duration of the notes in the score, as well as time and key signature information, and can encode some note-level markings, like accents. Nevertheless, it is important to note that the score information included in a match file is not necessarily complete. For example, match files do not generally include dynamics or tempo markings.

In [57]:
# Let's start by importing some stuff
import warnings

warnings.filterwarnings("ignore")
import glob
import numpy as np
import matplotlib.pyplot as plt
import partitura as pt

In [58]:
# path to the match
match_fn = os.path.join(MATCH_DIR, "Chopin_op10_no3_p01.match")

# loading a match file
performance, alignment, score = pt.load_match(match_fn, create_score=True)

##### 1.3.2.2 Loading an alignment if we have both score and match files

In many cases, however, we have access to both the score and match files. Using the original score file has a few advantages:

* It ensures that the score information is correct. Generating a `Part` from a match file involves inferring information for non-note elements (e.g., start and end time of the measures, voice information, clefs, staves, etc.).
* If we want to load several performances of the same piece, we can load the score only once!

This should be the preferred way to get alignment information!

In [59]:
# path to the match
match_fn = os.path.join(MATCH_DIR, "Chopin_op10_no3_p01.match")

# Path to the MusicXML file
score_fn = os.path.join(MUSICXML_DIR, "Chopin_op10_no3.musicxml")

# Load the score into a `Score` object
score = pt.load_musicxml(score_fn)

# loading a match file
performance, alignment = pt.load_match(match_fn)

Score-to-performance alignments are represented by lists of dictionaries, which contain the following keys:

* `label`

    * `'match'`: there is a performed note corresponding to a score note
    * `'insertion'`: the performed note does not correspond to any note in the score
    * `'deletion'`: there is no performed note corresponding to a note in the score
    * `'ornament'`: the performed note corresponds to the performance of an ornament (e.g., a trill). These notes are matched to the main note in the score. Not all alignments (in the datasets that we have) include ornaments! When they do not, ornamental notes are simply treated as insertions.
* `score_id`: id of the note in the score (in the `Part` object) (only relevant for matches, deletions and ornaments)
* `performance_id`: id of the note in the performance (in the `PerformedPart`) (only relevant for matches, insertions and ornaments)

In [None]:
alignment[:10]
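
Since an alignment is just a list of dictionaries, it can be summarized with plain Python. The following is a minimal sketch on a hand-made toy alignment; the ids (`n1`, `p0`, ...) and the helper names `count_labels` and `matched_pairs` are invented for illustration and are not part of partitura. The same functions could be applied to the `alignment` list loaded above.

```python
from collections import Counter

# Toy alignment in the list-of-dictionaries format described above.
# All ids are made up for illustration.
toy_alignment = [
    {"label": "match", "score_id": "n1", "performance_id": "p0"},
    {"label": "match", "score_id": "n2", "performance_id": "p1"},
    {"label": "deletion", "score_id": "n3"},
    {"label": "insertion", "performance_id": "p2"},
]


def count_labels(alignment):
    """Count how often each alignment label occurs."""
    return Counter(entry["label"] for entry in alignment)


def matched_pairs(alignment):
    """Return (score_id, performance_id) pairs for all matched notes."""
    return [
        (entry["score_id"], entry["performance_id"])
        for entry in alignment
        if entry["label"] == "match"
    ]


label_counts = count_labels(toy_alignment)
pairs = matched_pairs(toy_alignment)
print(label_counts)
print(pairs)
```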

#### 1.3.3 Getting information from the alignments

Partitura includes a few methods for getting information from the alignments.

Let's start by getting the subset of score notes that have a corresponding performed note.

In [61]:
# note array of the score
snote_array = score.note_array()
# note array of the performance
pnote_array = performance.note_array()
# indices of the notes that have been matched
matched_note_idxs = pt.musicanalysis.performance_codec.get_matched_notes(
    spart_note_array=snote_array,
    ppart_note_array=pnote_array,
    alignment=alignment,
)

# note array of the matched score notes
matched_snote_array = snote_array[matched_note_idxs[:, 0]]
# note array of the matched performed notes
matched_pnote_array = pnote_array[matched_note_idxs[:, 1]]
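
To make the indexing above more concrete, here is a small self-contained sketch with hand-made structured arrays standing in for the note arrays. It assumes, as the indexing above suggests, that the matched indices form an array with one row per matched pair (first column: index in the score note array, second column: index in the performance note array). The toy values and the `toy_` names are invented for illustration only.

```python
import numpy as np

# Toy stand-ins for score and performance note arrays (values are invented)
toy_snote_array = np.array(
    [("n1", 0.0, 60), ("n2", 1.0, 64), ("n3", 2.0, 67)],
    dtype=[("id", "U8"), ("onset_beat", "f4"), ("pitch", "i4")],
)
toy_pnote_array = np.array(
    [("p0", 0.52, 60), ("p1", 1.10, 67), ("p2", 1.35, 72)],
    dtype=[("id", "U8"), ("onset_sec", "f4"), ("pitch", "i4")],
)

# One row per matched pair: (index in score array, index in performance array)
toy_matched_idxs = np.array([[0, 0], [2, 1]])

toy_matched_snotes = toy_snote_array[toy_matched_idxs[:, 0]]
toy_matched_pnotes = toy_pnote_array[toy_matched_idxs[:, 1]]

# Matched notes are now row-aligned, so fields can be compared element-wise
same_pitch = toy_matched_snotes["pitch"] == toy_matched_pnotes["pitch"]
print(same_pitch)
```

Because both matched arrays have the same length and ordering, quantities such as onset deviations can be computed directly from corresponding rows.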

## 2. Feature Representations

To make musical data comparable for alignment algorithms, the first step is to extract features that capture relevant aspects while suppressing irrelevant details.

Let's make a quick mathematical parenthesis. For algorithmic purposes, it is convenient to represent the music captured by whichever music representation we are working with as *sequences of features*.

Let us consider two sequences $\mathbf{X} = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ and $\mathbf{Y} = \{\mathbf{y}_1, \dots, \mathbf{y}_M\}$ for which we want to find an alignment.

* These sequences could be discrete signals, feature sequences, sequences of characters, etc.

* The elements $\mathbf{x}_n$, $\mathbf{y}_m$ belong to the same **feature space** $\mathcal{F}$. For the purposes of this tutorial, let us consider these elements as $K$-dimensional real vectors, i.e., $\mathbf{x}, \mathbf{y} \in \mathbb{R}^K$, although they can be other kinds of objects (e.g., characters in an alphabet).

An important aspect of this feature space is that it allows us to use *quantitative measures* of how similar the elements of sequence $\mathbf{X}$ are to the elements in sequence $\mathbf{Y}$. We will come back to this point in a moment.
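
To give a first idea of such a measure, the sketch below computes pairwise Manhattan (cityblock) distances between two toy feature sequences using NumPy broadcasting. The function name `pairwise_cityblock` and the toy data are assumptions made for illustration; other distances (e.g., Euclidean or cosine) could be used in the same way.

```python
import numpy as np


def pairwise_cityblock(X, Y):
    """Pairwise L1 (Manhattan) distances between the rows of X and Y.

    X has shape (N, K), Y has shape (M, K); the result has shape (N, M).
    """
    # Broadcasting: (N, 1, K) - (1, M, K) -> (N, M, K)
    return np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=-1)


# Two toy sequences of 2-dimensional feature vectors
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
Y_toy = np.array([[0.0, 0.0], [1.0, 1.0]])

D_toy = pairwise_cityblock(X_toy, Y_toy)
print(D_toy)
```

Entry `D_toy[n, m]` is small when element `n` of the first sequence is similar to element `m` of the second one.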

In this tutorial we are going to focus on two common feature representations:

1. Piano Rolls
2. Pitch Class Distributions

### 2.1 Piano Rolls

A piano roll is a 2D representation of (MIDI) pitch and time. We can extract piano rolls from symbolic music files with Partitura!

In [62]:
from helper import generate_example_sequences, plot_alignment
from alignment import greedy_note_alignment

from typing import List

%config InlineBackend.figure_format ='retina'

In [63]:
# Let's load a score and a performance of the score

# Path to the MusicXML file
score_fn = os.path.join(MUSICXML_DIR, "Chopin_op10_no3.musicxml")
performance_fn = os.path.join(MIDI_DIR, "Chopin_op10_no3_p01.mid")

score = pt.load_score(score_fn)
performance = pt.load_performance(performance_fn)

In [64]:
# Compute piano roll
use_piano_range = False
score_pr = pt.utils.music.compute_pianoroll(
    note_info=score,
    # time_unit="auto"
    # time_div="auto",
    # onset_only=False,
    # note_separation=False,
    # remove_silence=True,
    piano_range=use_piano_range,
    # return_idxs=False,
)

performance_pr = pt.utils.music.compute_pianoroll(
    note_info=performance,
    # time_unit="auto"
    # time_div="auto",
    # onset_only=False,
    # note_separation=False,
    # remove_silence=True,
    piano_range=use_piano_range,
    # return_idxs=False,
)

Let's have a look at the output of these functions:

In [None]:
score_pr

By default, piano rolls computed with partitura are stored in scipy's sparse matrices, since most of the elements are 0.

The first dimension of the array is MIDI pitch (128 values, or the 88 piano keys if `piano_range=True`) and the second dimension corresponds to discrete time steps defined by the `time_div` and `time_unit` arguments of the `compute_pianoroll` function.
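
To illustrate this layout, the following sketch builds a small dense piano roll by hand from a list of notes. It is not partitura's implementation: the helper `toy_pianoroll`, its rounding rules and the dense output are simplifying assumptions, and `time_div` is interpreted here as the number of frames per beat.

```python
import numpy as np


def toy_pianoroll(notes, time_div=4, n_pitches=128):
    """Build a dense binary piano roll of shape (n_pitches, n_frames).

    `notes` is a list of (onset, duration, midi_pitch) tuples, with onset
    and duration in beats; `time_div` is the number of frames per beat.
    """
    end = max(onset + duration for onset, duration, _ in notes)
    n_frames = int(np.ceil(end * time_div))
    roll = np.zeros((n_pitches, n_frames), dtype=np.int8)
    for onset, duration, pitch in notes:
        start = int(round(onset * time_div))
        # Every note occupies at least one frame
        stop = max(start + 1, int(round((onset + duration) * time_div)))
        roll[pitch, start:stop] = 1
    return roll


# C major arpeggio: C4, E4, G4, one beat each
toy_notes = [(0.0, 1.0, 60), (1.0, 1.0, 64), (2.0, 1.0, 67)]
toy_roll = toy_pianoroll(toy_notes, time_div=4)
print(toy_roll.shape)
```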

Let's visualize the piano rolls!

In [None]:
%matplotlib inline

fig, axes = plt.subplots(2, figsize=(10, 7))
axes[0].imshow(
    score_pr.todense(),
    aspect="auto",
    origin="lower",
    cmap="gray",
    interpolation="nearest",
)
axes[1].imshow(
    performance_pr.todense(),
    aspect="auto",
    origin="lower",
    cmap="gray",
    interpolation="nearest",
)
y_label = "Piano key" if use_piano_range else "MIDI pitch"
axes[0].set_ylabel(y_label)
axes[1].set_ylabel(y_label)
axes[0].set_title("Score")
axes[1].set_title("Performance")
axes[1].set_xlabel("Time")
plt.show()

For more information, see the documentation of [`compute_pianoroll`](https://partitura.readthedocs.io/en/latest/modules/partitura.utils.html#partitura.utils.compute_pianoroll).

### 2.2 Pitch Class Distributions

These features are the symbolic equivalent to *chroma* features in audio. This representation is basically a piano roll that has been folded into a single octave.
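
The folding operation can be sketched in a few lines of NumPy. This is only an illustration of the idea and not partitura's implementation; in particular, the per-frame normalization used here is one plausible choice and not necessarily what `normalize=True` does in `compute_pitch_class_pianoroll`.

```python
import numpy as np


def fold_to_pitch_classes(pianoroll, normalize=True):
    """Fold a (128, T) piano roll into a (12, T) pitch class representation."""
    n_pitches, n_frames = pianoroll.shape
    pc_roll = np.zeros((12, n_frames), dtype=float)
    for pitch in range(n_pitches):
        # MIDI pitch p belongs to pitch class p % 12 (0 = C, 1 = C#, ...)
        pc_roll[pitch % 12] += pianoroll[pitch]
    if normalize:
        # Scale each non-silent frame so that its entries sum to 1
        totals = pc_roll.sum(axis=0, keepdims=True)
        pc_roll = np.divide(
            pc_roll, totals, out=np.zeros_like(pc_roll), where=totals > 0
        )
    return pc_roll


# Toy piano roll with 3 frames: C4 + C5, then E4, then silence
toy_roll = np.zeros((128, 3))
toy_roll[60, 0] = 1  # C4
toy_roll[72, 0] = 1  # C5 (same pitch class as C4)
toy_roll[64, 1] = 1  # E4

toy_pc_roll = fold_to_pitch_classes(toy_roll)
print(toy_pc_roll[:, 0])
```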

In [67]:
score_pc_pr = pt.utils.music.compute_pitch_class_pianoroll(
    score,
    normalize=True,
    time_unit="beat",
    time_div=4,
)

Let's plot this feature and compare it to a piano roll of the same score!

In [None]:
score_pr = pt.utils.music.compute_pianoroll(
    note_info=score,
    time_unit="beat",
    time_div=4,
    piano_range=False,
)

fig, axes = plt.subplots(2, figsize=(10, 5), sharex=True)

axes[0].imshow(
    score_pc_pr,
    aspect="auto",
    origin="lower",
    cmap="gray",
    interpolation="nearest",
)
axes[0].set_title("Pitch Class Distribution")
axes[0].set_ylabel("Pitch classes")
axes[1].imshow(
    score_pr.todense(),
    aspect="auto",
    origin="lower",
    cmap="gray",
    interpolation="nearest",
)
axes[1].set_title("Piano roll")
axes[1].set_ylabel("MIDI pitch")

plt.show()

## 3. Alignment Methods

We move now to methods for computing the alignment between features from one version of a piece of music to another. Common methods are dynamic programming approaches like dynamic time warping (DTW) and probabilistic approaches like hidden Markov models.

### 3.1 Alignments with Dynamic Time Warping

DTW is a dynamic programming algorithm to find the **optimal** alignment between two time-dependent sequences. In a nutshell, the DTW algorithm finds the alignment between two sequences in three steps:

1. Compute the pairwise distance between elements in sequence $\mathbf{X}$ and $\mathbf{Y}$.
2. Compute the accumulated cost matrix.
3. Find the best alignment by backtracking.

We will explore these steps with a simple example. 
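
The three steps can be sketched as a plain (quadratic-time) DTW in NumPy. This is a minimal illustration under simple assumptions (Manhattan distance, the three standard step directions, no step weights), and it is not the `fast_dynamic_time_warping` helper used later in this notebook; the function name `dtw` and the toy sequences are invented for this example.

```python
import numpy as np


def dtw(X, Y):
    """Vanilla DTW between sequences X (N, K) and Y (M, K).

    Returns the warping path as an array of (n, m) index pairs and the
    total alignment cost.
    """
    N, M = len(X), len(Y)
    # Step 1: pairwise (Manhattan) distances
    C = np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=-1)

    # Step 2: accumulated cost matrix (padded with infinity)
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            D[n, m] = C[n - 1, m - 1] + min(
                D[n - 1, m - 1], D[n - 1, m], D[n, m - 1]
            )

    # Step 3: backtracking from the end to the start
    n, m = N, M
    path = [(n - 1, m - 1)]
    while (n, m) != (1, 1):
        steps = [(n - 1, m - 1), (n - 1, m), (n, m - 1)]
        n, m = min(steps, key=lambda s: D[s])
        path.append((n - 1, m - 1))
    return np.array(path[::-1]), D[N, M]


# The second sequence repeats its first element
X_toy = np.array([[0.0], [1.0], [2.0]])
Y_toy = np.array([[0.0], [0.0], [1.0], [2.0]])

toy_path, toy_cost = dtw(X_toy, Y_toy)
print(toy_path.tolist(), toy_cost)
```

In the toy example, the first element of `X_toy` is aligned to the two repeated elements at the start of `Y_toy`, so the sequences can be aligned at zero cost.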

## 4. Music Alignment with DTW

1. **Feature Computation**: Compute features from the score and the performance (in this case, piano rolls).

2. **Time-wise Alignment**: Compute the alignment between the sequences of features using DTW. This produces a time-wise alignment that tells us which time in the performance (in seconds) corresponds to which time in the score (in musical beats).

3. **Note-level Alignment**: To get a note-level alignment, we match notes in the performance to notes in the score considering the time-wise alignment, and matching notes greedily by pitch.

Dynamic Time Warping and related sequence alignment algorithms return a path between two sequences or time series. Note alignment of two polyphonic parts is categorically different from a time series alignment. To get to a note alignment, we need to figure out what notes are played at a specific time in the piano roll. Sometimes this information might be imprecise so we need to relax the search for notes at some piano roll time to find all relevant notes.
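
The idea of going from a time-wise warping path to a note-wise alignment can be sketched as follows. This is a simplified illustration and not the implementation of the `greedy_note_alignment` helper used below: the function `toy_greedy_note_alignment`, the tuple layout of the notes, and the `tolerance` parameter (which relaxes the search window, in performance frames) are all assumptions made for this example.

```python
import numpy as np


def toy_greedy_note_alignment(warping_path, score_notes, perf_notes, tolerance=1):
    """Greedily match notes by pitch, guided by a time-wise warping path.

    `score_notes` and `perf_notes` are lists of (note_id, onset_frame, pitch).
    """
    warping_path = np.asarray(warping_path)
    alignment = []
    used = set()
    for s_id, s_frame, s_pitch in score_notes:
        # Performance frames that the path aligns with this score frame
        p_frames = warping_path[warping_path[:, 0] == s_frame, 1]
        candidates = []
        if p_frames.size > 0:
            lo, hi = p_frames.min() - tolerance, p_frames.max() + tolerance
            center = p_frames.mean()
            # Unused performed notes with the same pitch inside the window
            candidates = [
                (abs(p_frame - center), p_id)
                for p_id, p_frame, p_pitch in perf_notes
                if p_id not in used and p_pitch == s_pitch and lo <= p_frame <= hi
            ]
        if candidates:
            _, p_id = min(candidates)  # closest candidate wins
            used.add(p_id)
            alignment.append(
                {"label": "match", "score_id": s_id, "performance_id": p_id}
            )
        else:
            alignment.append({"label": "deletion", "score_id": s_id})
    # Performed notes that were never matched are insertions
    for p_id, _, _ in perf_notes:
        if p_id not in used:
            alignment.append({"label": "insertion", "performance_id": p_id})
    return alignment


toy_warping_path = [(0, 0), (0, 1), (1, 2), (2, 3), (2, 4)]
toy_score_notes = [("n1", 0, 60), ("n2", 1, 64), ("n3", 2, 67)]
toy_perf_notes = [("p0", 1, 60), ("p1", 2, 64), ("p2", 3, 72)]

toy_result = toy_greedy_note_alignment(
    toy_warping_path, toy_score_notes, toy_perf_notes
)
for entry in toy_result:
    print(entry)
```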

In [69]:
from alignment import fast_dynamic_time_warping

score_pr, sidx = pt.utils.music.compute_pianoroll(
    note_info=score,
    time_unit="beat",
    time_div=8,
    return_idxs=True,
    piano_range=True,
    binary=True,
    note_separation=True,
)

performance_pr, pidx = pt.utils.music.compute_pianoroll(
    note_info=performance,
    time_unit="sec",
    time_div=10,
    return_idxs=True,
    piano_range=True,
    binary=True,
    note_separation=True,
)

reference_features = score_pr.toarray().T
performance_features = performance_pr.toarray().T

In [None]:
# pitch_index, onset, offset, midi_pitch
sidx[:5]

In [None]:
# idx correspond to notes in note_array
snote_array = score.note_array()
print(snote_array[:5])

# Check that the pitch in the note array corresponds to
# the fourth column of the indices returned by compute_pianoroll
assert all(snote_array["pitch"] == sidx[:, 3])

In [72]:
# Dynamic time warping
dtw_alignment = fast_dynamic_time_warping(
    X=reference_features,
    Y=performance_features,
    metric="cityblock",
)

In [73]:
note_alignment = greedy_note_alignment(
    warping_path=dtw_alignment,
    idx1=sidx,
    note_array1=score.note_array(),
    idx2=pidx,
    note_array2=performance.note_array(),
)

In [None]:
note_alignment[:5]

### 4.1 Comparing alignments

Let's compare different alignment methods.

In [75]:
# This file contains the ground truth alignment
gt_alignment_fn = os.path.join(MATCH_DIR, "Chopin_op10_no3_p01.match")

# Load the alignment and the performance
performance, gt_alignment = pt.load_match(
    gt_alignment_fn, pedal_threshold=127, first_note_at_zero=True
)
pnote_array = performance.note_array()

# Load the score
score_fn = os.path.join(MUSICXML_DIR, "Chopin_op10_no3.musicxml")
score = pt.load_score(score_fn)
snote_array = score.note_array()

In [76]:
# Compute the features
score_pcr, sidx = pt.utils.music.compute_pitch_class_pianoroll(
    note_info=score,
    time_unit="beat",
    time_div=8,
    return_idxs=True,
    binary=True,
    note_separation=True,
)

performance_pcr, pidx = pt.utils.music.compute_pitch_class_pianoroll(
    note_info=performance,
    time_unit="sec",
    time_div=8,
    return_idxs=True,
    binary=True,
    note_separation=True,
)

reference_features = score_pcr.T
performance_features = performance_pcr.T

In [77]:
# DTW
dtw_pcr_warping_path = fast_dynamic_time_warping(
    X=reference_features,
    Y=performance_features,
    metric="cityblock",
)

dtw_pcr_alignment = greedy_note_alignment(
    warping_path=dtw_pcr_warping_path,
    idx1=sidx,
    note_array1=snote_array,
    idx2=pidx,
    note_array2=pnote_array,
)

In [78]:
# Compute the features
score_pr, sidx = pt.utils.music.compute_pianoroll(
    note_info=score,
    time_unit="beat",
    time_div=8,
    return_idxs=True,
    piano_range=True,
    binary=True,
    note_separation=True,
)

performance_pr, pidx = pt.utils.music.compute_pianoroll(
    note_info=performance,
    time_unit="sec",
    time_div=8,
    return_idxs=True,
    piano_range=True,
    binary=True,
    note_separation=True,
)

reference_features = score_pr.toarray().T
performance_features = performance_pr.toarray().T

# DTW
dtw_pr_warping_path = fast_dynamic_time_warping(
    X=reference_features,
    Y=performance_features,
    metric="cityblock",
)

dtw_pr_alignment = greedy_note_alignment(
    warping_path=dtw_pr_warping_path,
    idx1=sidx,
    note_array1=snote_array,
    idx2=pidx,
    note_array2=pnote_array,
)

In [79]:
# invent a linear alignment for testing
from helper import dummy_linear_alignment

# Dummy linear alignment
linear_warping_path = dummy_linear_alignment(
    X=reference_features,
    Y=performance_features,
)

linear_alignment = greedy_note_alignment(
    warping_path=linear_warping_path,
    idx1=sidx,
    note_array1=snote_array,
    idx2=pidx,
    note_array2=pnote_array,
)
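
A linear warping path simply stretches one sequence uniformly onto the other and is a useful naive baseline. The sketch below shows one way such a path could be built; it is an assumption about the general idea and not necessarily how the `dummy_linear_alignment` helper is implemented. The function name `toy_linear_warping_path` is invented for this example.

```python
import numpy as np


def toy_linear_warping_path(n_x, n_y):
    """Linear warping path between sequences of length n_x and n_y.

    Returns an array of (x_index, y_index) pairs that starts at (0, 0),
    ends at (n_x - 1, n_y - 1), and never moves backwards.
    """
    n_steps = max(n_x, n_y)
    x_idx = np.round(np.linspace(0, n_x - 1, n_steps)).astype(int)
    y_idx = np.round(np.linspace(0, n_y - 1, n_steps)).astype(int)
    return np.column_stack((x_idx, y_idx))


toy_path = toy_linear_warping_path(3, 5)
print(toy_path)
```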

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].plot(
    linear_warping_path[:, 0],
    linear_warping_path[:, 1],
    label="linear",
)
axes[0].plot(
    dtw_pr_warping_path[:, 0],
    dtw_pr_warping_path[:, 1],
    label="DTW (piano roll)",
)
axes[0].plot(
    dtw_pcr_warping_path[:, 0],
    dtw_pcr_warping_path[:, 1],
    label="DTW (pitch class)",
)
axes[1].plot(
    linear_warping_path[:, 0],
    linear_warping_path[:, 1],
    label="linear",
)
axes[1].plot(
    dtw_pr_warping_path[:, 0],
    dtw_pr_warping_path[:, 1],
    label="DTW (piano roll)",
)
axes[1].plot(
    dtw_pcr_warping_path[:, 0],
    dtw_pcr_warping_path[:, 1],
    label="DTW (pitch class)",
)
axes[0].set_xlabel("Index in score")
axes[1].set_xlabel("Index in score")
axes[0].set_ylabel("Index in performance")
axes[1].set_xlim((200, 300))
axes[1].set_ylim((450, 550))
plt.legend()
plt.show()

To inspect an alignment, we can use [**Parangonada**](https://sildater.github.io/parangonada/), a tool to compare alignments developed at our institute!

In [81]:
from partitura import save_parangonada_csv

# Export files to Parangonada
outdir = "parangonada_files"
os.makedirs(outdir, exist_ok=True)
save_parangonada_csv(
    alignment=dtw_pr_alignment,
    performance_data=performance,
    score_data=score,
    zalign=linear_alignment,
    outdir=outdir,
)

In [None]:
from helper import evaluate_alignment_notewise

print("Method\tF-score\tPrecision\tRecall")

methods = [
    (linear_alignment, "linear"),
    (dtw_pr_alignment, "DTW (piano roll)"),
    (dtw_pcr_alignment, "DTW (pitch class)"),
]

for align, method in methods:
    precision, recall, fscore = evaluate_alignment_notewise(
        prediction=align,
        ground_truth=gt_alignment,
    )
    print(f"{method}\t{fscore:.4f}\t{precision:.4f}\t{recall:.4f}")
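
To clarify what a note-wise evaluation measures, here is a minimal sketch that computes precision, recall and F-score over matched note pairs. It is an assumption about the general idea and may differ from the `evaluate_alignment_notewise` helper (which could, for example, also take insertions and deletions into account); the function name and toy alignments are invented for illustration.

```python
def toy_evaluate_notewise(prediction, ground_truth):
    """Precision, recall and F-score over matched (score_id, performance_id) pairs."""

    def match_pairs(alignment):
        return {
            (a["score_id"], a["performance_id"])
            for a in alignment
            if a["label"] == "match"
        }

    pred, truth = match_pairs(prediction), match_pairs(ground_truth)
    n_correct = len(pred & truth)
    precision = n_correct / len(pred) if pred else 0.0
    recall = n_correct / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    fscore = 2 * precision * recall / (precision + recall)
    return precision, recall, fscore


toy_ground_truth = [
    {"label": "match", "score_id": "n1", "performance_id": "p0"},
    {"label": "match", "score_id": "n2", "performance_id": "p1"},
    {"label": "match", "score_id": "n3", "performance_id": "p2"},
]
toy_prediction = [
    {"label": "match", "score_id": "n1", "performance_id": "p0"},
    {"label": "match", "score_id": "n2", "performance_id": "p2"},  # wrong pair
    {"label": "deletion", "score_id": "n3"},
    {"label": "insertion", "performance_id": "p1"},
]

toy_precision, toy_recall, toy_fscore = toy_evaluate_notewise(
    toy_prediction, toy_ground_truth
)
print(f"{toy_fscore:.4f}\t{toy_precision:.4f}\t{toy_recall:.4f}")
```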

## 5. Alignment Applications

In this example, we are going to compare tempo curves of different performances of the same piece. Partitura includes a utility function called `get_time_maps_from_alignment`, which creates functions (instances of SciPy's [interp1d](https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html)) that map score time to performance time (and the other way around).

In [None]:
from helper import plot_tempo_curves

# Select the piece whose performances we want to compare
piece = "Mozart_K331_1st-mov"
# piece = "Schubert_D783_no15"
# piece = "Chopin_op38"
# piece = "Chopin_op10_no3"

plot_tempo_curves(piece, V4X22_DATASET_DIR)
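
The idea behind such time maps and tempo curves can be sketched with a few matched onsets. The example below uses `np.interp` as a simple stand-in for the interpolators mentioned above, and all onset values are invented; it is meant as an illustration of the principle, not of how `plot_tempo_curves` is implemented.

```python
import numpy as np

# Toy matched onsets: score time in beats, performance time in seconds
score_onsets = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
perf_onsets = np.array([0.0, 0.5, 1.0, 1.75, 2.5])


def score_to_performance_time(score_time):
    """Piecewise-linear map from score time (beats) to performance time (s)."""
    return np.interp(score_time, score_onsets, perf_onsets)


# Local tempo in beats per minute between consecutive onsets
beat_period = np.diff(perf_onsets) / np.diff(score_onsets)  # seconds per beat
tempo_bpm = 60.0 / beat_period

print(score_to_performance_time(2.5))
print(tempo_bpm)
```

In this toy performance, the tempo drops from 120 to 80 beats per minute after the second beat, which is the kind of information a tempo curve visualizes.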