# Aligning Orchestral Music

Below we show how to use this library to align some of the audio used in the paper. For this notebook to work, the audio requirements must be installed (<code>pip install -r requirements_audio.txt</code> from the root of the repository). We also assume that you have already downloaded the "short" orchestral pieces in the benchmark using the script in <code>experiments/orchestral.py</code>. We do not provide the pieces here for copyright reasons.


### Audio Example 1: Vivaldi's Spring
The first step is to load the audio.

In [1]:
import linmdtw
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
import warnings
warnings.filterwarnings("ignore")
import IPython.display as ipd

sr = 44100
x0_0, sr = linmdtw.load_audio("../experiments/OrchestralPieces/Short/0_0.mp3", sr)
x0_1, sr = linmdtw.load_audio("../experiments/OrchestralPieces/Short/0_1.mp3", sr)

Next, we compute the "MFCC mod" features for each audio clip, as described in [1].

[1] Gadermaier, Thassilo, and Gerhard Widmer. "A Study of Annotation and Alignment Accuracy for Performance Comparison in Complex Orchestral Music." arXiv preprint arXiv:1910.07394 (2019).

In [2]:
hop_length = 512  # samples between successive feature frames (~11.6 ms at 44.1 kHz)
X0_0 = linmdtw.get_mfcc_mod(x0_0, sr, hop_length)
X0_1 = linmdtw.get_mfcc_mod(x0_1, sr, hop_length)
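
The internals of <code>get_mfcc_mod</code> are not shown in this notebook. As a rough illustration only, the sketch below assumes that an "MFCC mod"-style feature starts from an MFCC matrix, discards the lowest-order coefficients, and normalizes every frame to zero mean and unit length, so that frames are compared by shape rather than by overall level. The function name <code>mfcc_mod_postprocess</code> and the default <code>drop=20</code> are hypothetical, and the random matrix stands in for real MFCCs shaped (coefficients, frames).

```python
import numpy as np

def mfcc_mod_postprocess(mfcc, drop=20):
    """Hypothetical sketch: turn an (n_mfcc, n_frames) MFCC matrix into
    per-frame feature rows of shape (n_frames, n_mfcc - drop)."""
    # Discard the lowest-order coefficients (the hypothetical `drop` parameter)
    X = mfcc[drop:, :].T
    # Center each frame (row) at zero mean
    X = X - X.mean(axis=1, keepdims=True)
    # Scale each frame to unit Euclidean length; guard against silent frames
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)

# Usage with random stand-in data (real input would come from an MFCC routine)
rng = np.random.default_rng(0)
fake_mfcc = rng.standard_normal((120, 50))
X = mfcc_mod_postprocess(fake_mfcc)
print(X.shape)  # (50, 100)
```

With zero-mean, unit-length frames, the squared Euclidean distance between two frames equals 2 minus twice their dot product, so a DTW cost built on such features behaves like a correlation measure.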

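Now we can extract a warping path between the two feature sequences using the library's main alignment function, <code>linmdtw.linmdtw</code>. Passing a <code>metadata</code> dictionary makes it print its progress as it runs.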

In [3]:
import time
metadata = {'totalCells':0, 'M':X0_0.shape[0], 'N':X0_1.shape[0], 
            'timeStart':time.time(), 'perc':10}
# do_gpu=True uses the GPU implementation; pass do_gpu=False to run on the CPU
path0 = linmdtw.linmdtw(X0_0, X0_1, do_gpu=True, metadata=metadata)

Parallel Alignment 10.0% Elapsed time: 2.61
Parallel Alignment 20.0% Elapsed time: 6.56
Parallel Alignment 30.0% Elapsed time: 10.7
Parallel Alignment 40.0% Elapsed time: 13.9
Parallel Alignment 50.0% Elapsed time: 17.9
Parallel Alignment 60.0% Elapsed time: 20.9
Parallel Alignment 70.0% Elapsed time: 24.7
Parallel Alignment 80.0% Elapsed time: 27.8
Parallel Alignment 90.0% Elapsed time: 30.3
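
<code>linmdtw.linmdtw</code> computes the warping path with a parallelizable, linear-memory algorithm that is not reproduced here. To show what a warping path is, the sketch below runs the textbook quadratic-memory DTW (a different, simpler algorithm than the library's) on two tiny one-dimensional sequences: it fills an accumulated-cost matrix and then backtracks to a list of (i, j) pairs, each matching frame i of the first sequence with frame j of the second. We assume <code>path0</code> has the same meaning, an array of frame-index pairs; the name <code>dtw_path</code> is our own.

```python
import numpy as np

def dtw_path(X, Y):
    """Textbook O(M*N)-memory DTW between the rows of X (M x d) and Y (N x d).
    Returns the optimal warping path as an array of (i, j) frame-index pairs."""
    M, N = X.shape[0], Y.shape[0]
    # Pairwise Euclidean distances between all frames
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    # Accumulated cost with an infinite border row and column
    D = np.full((M + 1, N + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            D[i, j] = C[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the bottom-right corner to the start
    i, j = M, N
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: D[s])
        path.append((i - 1, j - 1))
    return np.array(path[::-1])

X = np.array([[0.0], [1.0], [2.0]])
Y = np.array([[0.0], [0.0], [1.0], [2.0]])
print(dtw_path(X, Y).tolist())  # [[0, 0], [0, 1], [1, 2], [2, 3]]
```

Here the first frame of X is matched to both of the first two frames of Y, which is how a warping path absorbs a note that one performance holds longer than the other.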


Before applying the computed warping path, let's compare the first 40 seconds of the two unaligned clips side by side, with the first clip in the left ear and the second in the right ear. The left performance is faster but starts later, so the two are briefly in sync before the left one pulls ahead for the rest of the excerpt.

In [4]:
xunsync0 = np.zeros((sr*40, 2))
xunsync0[:, 0] = x0_0[0:sr*40]
xunsync0[:, 1] = x0_1[0:sr*40]
linmdtw.save_audio(xunsync0, sr, "unsync0")
ipd.Audio("unsync0.mp3")
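
The cell above assumes both clips are at least 40 seconds long; if either slice is shorter, NumPy raises a broadcasting error on assignment. A minimal sketch of a more forgiving version, which zero-pads a short clip, is below. The helper name <code>side_by_side</code> is hypothetical, and its result could be passed to <code>linmdtw.save_audio</code> in the same way as <code>xunsync0</code>.

```python
import numpy as np

def side_by_side(left, right, sr, seconds):
    """Hypothetical helper: put the first `seconds` of two mono clips in the
    left and right channels of one stereo array, zero-padding a short clip."""
    n = int(round(sr * seconds))
    out = np.zeros((n, 2))
    for channel, clip in enumerate((left, right)):
        k = min(n, len(clip))
        out[:k, channel] = clip[:k]
    return out

# Usage with synthetic clips: a 1 s tone on the left, a 0.5 s tone on the right
sr = 8000
t = np.arange(sr) / sr
left = np.sin(2 * np.pi * 440 * t)
right = np.sin(2 * np.pi * 440 * t[: sr // 2])
stereo = side_by_side(left, right, sr, seconds=1.0)
print(stereo.shape)  # (8000, 2)
```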

Now let's apply the computed warping path and hear how well the alignment worked. This library provides a wrapper around the pyrubberband library, which we use to stretch the first clip (x0_0) so that it follows the timing of the second (x0_1) according to the warping path. The method <code>stretch_audio</code> returns a stereo audio stream with the stretched version of x0_0 in the left ear and the original x0_1 in the right ear. Let's save the first 30 seconds of this to disk and listen to it.

In [5]:
xsync0 = linmdtw.stretch_audio(x0_0, x0_1, sr, path0, hop_length)
linmdtw.save_audio(xsync0[0:sr*30, ::], sr, "sync0")
ipd.Audio("sync0.mp3")

Stretching...
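
The warping path is indexed in feature frames, while stretching works on audio samples. A simple way to bridge the two, which we assume is roughly what <code>stretch_audio</code> does internally, is to multiply each (i, j) pair by <code>hop_length</code>, drop pairs past the end of either clip, and finish with the pair of clip lengths. According to its documentation, pyrubberband's <code>timemap_stretch</code> accepts a time map of (source sample, target sample) pairs whose last entry gives the source and target lengths. The helper name <code>path_to_time_map</code> is hypothetical.

```python
import numpy as np

def path_to_time_map(path, hop_length, len1, len2):
    """Hypothetical helper: convert a DTW path of (frame_i, frame_j) pairs into
    (sample_in_clip1, sample_in_clip2) pairs, ending at the two clip lengths."""
    samples = np.asarray(path, dtype=int) * hop_length
    # Drop pairs that would fall past the end of either clip
    keep = (samples[:, 0] < len1) & (samples[:, 1] < len2)
    samples = samples[keep]
    # Anchor the final pair at the clip lengths
    return np.vstack([samples, [len1, len2]])

# Usage on a toy 4-step path with a hop of 512 samples
path = [(0, 0), (0, 1), (1, 2), (2, 3)]
tm = path_to_time_map(path, hop_length=512, len1=1600, len2=2100)
print(tm.tolist())  # [[0, 0], [0, 512], [512, 1024], [1024, 1536], [1600, 2100]]
```

Dividing any of these sample positions by <code>sr</code> gives the corresponding time in seconds.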


### Audio Example 2: Schubert's Unfinished Symphony
We now show one more example, using a clip from Schubert's Unfinished Symphony (short clip index 5 in the paper corpus). After aligning, we listen to a 45-second excerpt of the synchronized result, from 0:45 to 1:30.

In [6]:
## Step 1: Load in audio
sr = 44100
x5_0, sr = linmdtw.load_audio("../experiments/OrchestralPieces/Short/5_0.mp3", sr)
x5_1, sr = linmdtw.load_audio("../experiments/OrchestralPieces/Short/5_1.mp3", sr)
## Step 2: Compute Features
hop_length = 512
X5_0 = linmdtw.get_mfcc_mod(x5_0, sr, hop_length)
X5_1 = linmdtw.get_mfcc_mod(x5_1, sr, hop_length)

## Step 3: Run DTW in verbose mode
metadata = {'totalCells':0, 'M':X5_0.shape[0], 'N':X5_1.shape[0], 
            'timeStart':time.time(), 'perc':10}
path5 = linmdtw.linmdtw(X5_0, X5_1, do_gpu=True, metadata=metadata)

Parallel Alignment 10.0% Elapsed time: 51.5
Parallel Alignment 20.0% Elapsed time: 105
Parallel Alignment 30.0% Elapsed time: 161
Parallel Alignment 40.0% Elapsed time: 215
Parallel Alignment 50.0% Elapsed time: 270
Parallel Alignment 60.0% Elapsed time: 325
Parallel Alignment 70.0% Elapsed time: 374
Parallel Alignment 80.0% Elapsed time: 424
Parallel Alignment 90.0% Elapsed time: 477
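
This alignment takes roughly 16 times longer than the first one to reach 90% (about 477 s versus 30 s). A plausible explanation is that DTW evaluates on the order of M × N cells, where M and N are the frame counts of the two clips, so the work grows quadratically with clip length. With <code>hop_length = 512</code> at 44.1 kHz there are about 86 feature frames per second. The helper <code>dtw_cells</code> below is a hypothetical back-of-the-envelope estimate, and the durations in the usage loop are made-up inputs, not the lengths of the corpus clips.

```python
def dtw_cells(duration1_s, duration2_s, sr=44100, hop_length=512):
    """Estimate the number of cost-matrix cells DTW evaluates for two clips,
    given their durations in seconds (frames ~= duration * sr / hop_length)."""
    m = int(duration1_s * sr / hop_length)
    n = int(duration2_s * sr / hop_length)
    return m, n, m * n

# Hypothetical durations: doubling both clips roughly quadruples the work
for d in (60, 120):
    m, n, cells = dtw_cells(d, d)
    print(f"{d} s clips: {m} x {n} frames -> {cells:,} cells")
```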


In [7]:
## Step 4: Synchronize audio and play the results
xsync5 = linmdtw.stretch_audio(x5_0, x5_1, sr, path5, hop_length)
linmdtw.save_audio(xsync5[sr*45:sr*90, :], sr, "sync5")  # seconds 45 to 90
ipd.Audio("sync5.mp3")

Stretching...
