Before you turn this exercise in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says "YOUR ANSWER HERE" or `YOUR CODE HERE` and remove the `raise NotImplementedError()` lines. Please add your name and student ID below:

In [None]:
NAME = ""
STUDENT_ID = ""

---

# Intelligent Audio and Music Analysis Exercise 1

The goal of this exercise is to learn the basics of onset detection, beat
tracking and tempo estimation.

After completing this exercise you should have learned some music information
retrieval (MIR) basics and deepened your knowledge of these topics.

All data needed for this exercise is in the `data` directory. 
The folder contains audio files as well as annotations
(simple text files, one annotation per line) for `onsets`, `beats`, and `tempo`.
Not all audio files have all kinds of annotations, thus depending on the task
only a subset of all files can be used for evaluation.

For development of the algorithms, you can use any software packages as long
as you code the steps by yourself (exceptions are indicated).

Note: steps marked as optional do not need to be implemented to achieve all points,
but they can compensate for points missed elsewhere in this exercise.

Grading will be based on the solution and not on the achieved performance.
Max. 100 points are achievable.

The notebook structure for tasks 1 to 3 is rather strict and split into sub-tasks to
provide some guidance. Tasks 4 to 6 are more flexible, but it is recommended to define
the functions similar to those of tasks 1 to 3.

Recommended software packages:

- madmom (https://github.com/CPJKU/madmom)
- librosa (https://github.com/librosa/librosa)
- mir_eval (https://github.com/craffel/mir_eval)

You are free to add code and textual cells as you need them.
However, `CONSTANTS` should not be altered.
You may add visualisations, tables, etc. to enhance your assignment.

### Chocolate challenge

There will again be a chocolate challenge comprising prizes for the following sub-challenges:

1. best performing tempo estimation on a hidden test set,
2. best performing beat tracking on a hidden test set,
3. nicest visualisation.

In order to participate in the challenge, please make sure that the `chocolate_challenge()`
function writes the detected tempo and beats of the supplied test file in `data/test` to `.txt`
files.

Good luck!


In [None]:
import os
import pickle

import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

import madmom
import librosa
import mir_eval

Define default parameters:

In [None]:
FPS = 100

Define audio files and function to match them to annotation files.

In [None]:
from madmom.utils import search_files, match_file

AUDIO_FILES = search_files('data/train', '.wav')

def find_audio_files(ann_files, audio_files, ann_suffix=None, audio_suffix='.wav'):
    """
    Find matching audio files.
    
    Parameters
    ----------
    ann_files : list
        List with annotation file names.
    audio_files : list
        List with audio file names to be matched
    ann_suffix : str, optional
        Suffix of the annotation files. If 'None'
        the suffix is inferred from the annotation
        files.
    audio_suffix : str, optional
        Suffix of the audio files.
    
    Returns
    -------
    matched_files : list
        List of matched audio file (names).
    matched_indices : list
        List of matching indices in `audio_files`.
        
    """
    matched_files = []
    matched_indices = []
    for i, ann_file in enumerate(ann_files):
        if ann_suffix is None:
            ann_suffix = os.path.splitext(ann_file)[1]
        matches = match_file(ann_file, audio_files,
                             ann_suffix, audio_suffix)
        if len(matches) == 1:
            matched_files.append(matches[0])
            matched_indices.append(i)
        else:
            continue
    return matched_files, matched_indices



---
# Audio pre-processing
---

## Task 1: audio pre-processing (10 points)


Step 1: read in the audio signal (all audio files: `.wav` format, 44.1kHz, 16bit, mono)

Step 2: split the signal into overlapping frames of 2048 samples each, at a frame rate of 100 fps

Step 3: for each frame compute the STFT

Step 4: discard phase information and keep only the magnitudes
  
Step 5: filter the magnitudes with a Mel filterbank (40 bands)

Step 6: apply logarithmic scaling (adding a constant for numerical stability)

You are allowed to use the functionality of any audio framework to load the audio files and compute the discrete Fourier transform.
However, all remaining steps should be coded by yourself and recognisable as such.
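
The six steps above can be sketched as follows. This is only one possible reading, not a reference solution: the helper names, the Hann window, the filterbank range (`fmin=30`, `fmax=17000` Hz), the choice of `log10(1 + x)` and the non-centred framing are all assumptions made for illustration, and a synthetic sine replaces the audio loading of step 1.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def frame_signal(signal, frame_size, hop_size):
    """Split a 1-D signal into overlapping frames (zero-padded at the end)."""
    num_frames = int(np.ceil(len(signal) / hop_size))
    padded = np.concatenate([signal, np.zeros(frame_size)])
    # row n holds the samples n * hop_size ... n * hop_size + frame_size - 1
    idx = np.arange(frame_size)[None, :] + hop_size * np.arange(num_frames)[:, None]
    return padded[idx]

def mel_filterbank(num_bands, num_bins, sample_rate, fmin=30.0, fmax=17000.0):
    """Triangular filters with centre frequencies equally spaced on the Mel scale."""
    # num_bands + 2 edge frequencies: each filter spans three neighbouring edges
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), num_bands + 2))
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, num_bins)
    fb = np.zeros((num_bins, num_bands))
    for b in range(num_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[:, b] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(signal, sample_rate=44100, frame_size=2048, fps=100, num_bands=40):
    hop_size = int(round(sample_rate / fps))                 # 441 samples at 100 fps
    frames = frame_signal(signal, frame_size, hop_size) * np.hanning(frame_size)
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))         # steps 3 and 4: STFT, drop phase
    fb = mel_filterbank(num_bands, magnitudes.shape[1], sample_rate)
    return np.log10(1.0 + magnitudes @ fb)                   # steps 5 and 6

# one second of a 440 Hz sine as a stand-in for a loaded audio file
sr = 44100
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr)
print(spec.shape)
```

The result has one row per frame and one column per Mel band; adding 1 before taking the logarithm keeps all values non-negative and avoids `log(0)`.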


In [None]:
def pre_process(filename, frame_size=2048, frame_rate=FPS, num_bands=40, **kwargs):
    """
    Pre-process the audio signal.

    Parameters
    ----------
    filename : str
        File to be processed.
    frame_size : int
        Size of the frames.
    frame_rate : float
        Frame rate used for the STFT.
    num_bands : int
        Number of frequency bands for the Mel filterbank.
    kwargs : dict, optional
        Additional keyword arguments.

    Returns
    -------
    spectrogram : numpy array
        Spectrogram.

    """
    # YOUR CODE HERE
    raise NotImplementedError()
    return spectrogram

Pre-compute the spectrograms for all audio files with onset annotations.

In [None]:
# list for collecting pre-processed spectrograms
# Note: it is not necessary to use this list but recommended in order to
#       avoid recomputation of the same features over and over again.
#       *_AUDIO_IDX can be used to access the precomputed spectrograms by
#       index.
SPECTROGRAMS = []

for audio_file in AUDIO_FILES:
    spec = pre_process(audio_file)
    SPECTROGRAMS.append(spec)

---
# Onset detection
---

In [None]:
# you are not required to use these predefined constants, but it is recommended
ONSET_ANNOTATION_FILES = search_files('data/train', '.onsets')
ONSET_AUDIO_FILES, ONSET_AUDIO_IDX = find_audio_files(ONSET_ANNOTATION_FILES, AUDIO_FILES)
ONSET_AUDIO = [SPECTROGRAMS[i] for i in ONSET_AUDIO_IDX]
ONSET_ANNOTATIONS = [madmom.io.load_onsets(f) for f in ONSET_ANNOTATION_FILES]

assert len(ONSET_ANNOTATION_FILES) == 321
assert len(ONSET_AUDIO_FILES) == 321
assert len(ONSET_AUDIO) == 321
assert len(ONSET_ANNOTATIONS) == 321

## Task 2: signal processing-based onset detection (20 + 5 points)

For onset detection, the spectral flux should be used.

### Task 2a: define onset detection function (5 points)

Step 1: compute the temporal difference  

Step 2: keep only the positive differences

Step 3: sum or average these differences, to obtain the onset detection function (ODF)
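
A minimal sketch of these three steps, assuming the spectrogram is laid out as (frames, bands). The function name `spectral_flux` is made up for this illustration, and repeating the first frame so that the ODF keeps the same length as the spectrogram is a design choice, not a requirement.

```python
import numpy as np

def spectral_flux(spectrogram):
    """Sum of positive first-order differences along time; input is (frames, bands)."""
    # step 1: difference to the previous frame (first frame is compared to itself)
    diff = np.diff(spectrogram, axis=0, prepend=spectrogram[:1])
    # steps 2 and 3: keep positive differences only, then sum over the bands
    return np.maximum(diff, 0.0).sum(axis=1)

spec = np.array([[0.0, 0.0],
                 [1.0, 0.5],
                 [0.5, 0.5],
                 [0.5, 2.0]])
odf = spectral_flux(spec)
print(odf)
```

In this toy input, energy rises in frames 1 and 3, so only those frames get a non-zero ODF value; the decrease in frame 2 is discarded.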

In [None]:
def onset_detection_function(spectrogram):
    """
    Compute an onset detection function.

    Parameters
    ----------
    spectrogram : numpy array
        Spectrogram

    Returns
    -------
    odf : numpy array
        Onset detection function.

    """
    # YOUR CODE HERE
    raise NotImplementedError()
    return odf

### Task 2b: detect onsets from onset detection function (6 points)

To detect the onsets in the ODF, the following procedure should be applied:

Step 1: (optional) subtract a moving average from the ODF

Step 2: discard all ODF values below a certain threshold 

Step 3: select local maxima as onset positions

Step 4: (optional) discard onsets too close together (recommended value: within 30ms)
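
The procedure above could look roughly like the sketch below. The name `pick_peaks` and the parameters `avg_frames` and `min_dist` are hypothetical, and keeping the first onset of each close group (rather than, say, the strongest) is just one possible rule.

```python
import numpy as np

def pick_peaks(odf, threshold, frame_rate=100, avg_frames=0, min_dist=0.03):
    odf = np.asarray(odf, dtype=float)
    # step 1 (optional): subtract a centred moving average
    if avg_frames > 1:
        kernel = np.ones(avg_frames) / avg_frames
        odf = odf - np.convolve(odf, kernel, mode='same')
    # steps 2 and 3: local maxima that are at or above the threshold
    padded = np.pad(odf, 1, mode='constant', constant_values=-np.inf)
    is_peak = (odf > padded[:-2]) & (odf >= padded[2:]) & (odf >= threshold)
    times = np.flatnonzero(is_peak) / frame_rate
    # step 4 (optional): keep only the first onset of each close group
    kept = []
    for t in times:
        if not kept or t - kept[-1] > min_dist:
            kept.append(t)
    return np.array(kept)

odf = [0.0, 0.2, 1.0, 0.3, 0.9, 0.1, 0.0, 0.05, 0.8, 0.0]
onsets = pick_peaks(odf, threshold=0.5)
print(onsets)
```

Frames 2, 4 and 8 are local maxima above the threshold; the peak at frame 4 is only 20 ms after the one at frame 2 and is therefore dropped.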


In [None]:
def detect_onsets(odf, threshold, frame_rate=FPS, **kwargs):
    """
    Detect the onsets in the onset detection function (ODF).

    Parameters
    ----------
    odf : numpy array
        Onset detection function.
    threshold : float
        Threshold for peak picking
    frame_rate : float
        Frame rate of the onset detection function.
    kwargs : dict, optional
        Additional keyword arguments.

    Returns
    -------
    onsets : numpy array
        Detected onsets (in seconds).

    """
    # YOUR CODE HERE
    raise NotImplementedError()
    return onsets

### Task 2c: predict onsets on dataset (4 points)

Run the complete onset detection pipeline on all audio files of the dataset.

Step 1: Pre-process the audio.

Step 2: Compute the ODF.

Step 3: Detect the onsets. Set the threshold such that the F-measure is maximised on the dataset (see also task 2d).

In [None]:
# list for collecting the onset detections
onset_detections = []

# YOUR CODE HERE
raise NotImplementedError()

### Task 2d: evaluate detected onsets against the ground truth (5 points)

Evaluate onset detection performance with `precision`, `recall`, and `fmeasure`.
Either use the `madmom.evaluation.onsets` module or the `mir_eval` package.
Compute the average over all files with corresponding onset annotations.
As an evaluation window, ±25ms should be used.
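
To make the metrics concrete, here is a simplified, library-free sketch of how detections can be matched to annotations within a tolerance window. It is a stand-in for illustration only: the function names are invented here, and the packages named above may use different matching rules and different conventions for empty inputs.

```python
def match_events(detections, annotations, window=0.025):
    """Count detections that match an annotation within +/- window seconds.

    Both sequences are walked in sorted order; each annotation can be
    matched by at most one detection.
    """
    detections, annotations = sorted(detections), sorted(annotations)
    tp, d, a = 0, 0, 0
    while d < len(detections) and a < len(annotations):
        diff = detections[d] - annotations[a]
        if abs(diff) <= window:
            tp += 1
            d += 1
            a += 1
        elif diff < 0:
            d += 1      # detection too early: false positive
        else:
            a += 1      # annotation missed: false negative
    return tp

def precision_recall_fmeasure(detections, annotations, window=0.025):
    tp = match_events(detections, annotations, window)
    precision = tp / len(detections) if len(detections) else 0.0
    recall = tp / len(annotations) if len(annotations) else 0.0
    denom = precision + recall
    fmeasure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, fmeasure

p, r, f = precision_recall_fmeasure([0.10, 0.52, 0.90, 1.30], [0.11, 0.50, 1.00])
print('P=%.3f R=%.3f F=%.3f' % (p, r, f))
```

Two of the four detections fall within 25 ms of an annotation, so precision is 2/4, recall is 2/3 and the F-measure is their harmonic mean.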

In [None]:
def evaluate_onsets(onsets, annotations):
    """
    Evaluate detected onsets against ground truth annotations.
    
    Parameters
    ----------
    onsets : list
        List with onset detections for all files.
    annotations : list
        List with corresponding ground truth annotations.

    Returns
    -------
    precision : float
        Averaged precision.
    recall : float
        Averaged recall.
    fmeasure : float
        Averaged f-measure.
    
    """
    # YOUR CODE HERE
    raise NotImplementedError()
    return precision, recall, fmeasure

# YOUR CODE HERE
raise NotImplementedError()
    
# evaluate against ground truth
p, r, f = evaluate_onsets(onset_detections, ONSET_ANNOTATIONS)

print('Signal processing-based onset detection\nPrecision: %.3f\nRecall:    %.3f\nF-measure: %.3f' % (p, r, f))

### Task 2e: (optional) optimise parameters (5 points)

Optimise the parameters of task 1 and 2 to get the best performance on the dataset.

Parameters to be optimised: frame size (e.g. 1024, 2048, 4096), number of filter bands
(e.g. 20, 40, 80), different logarithmic scaling parameters (e.g. natural logarithm or
base 10; adding a constant) and the detection threshold.
Replace the default arguments/values in the functions with the optimised parameters.

The values in parentheses are suggested variations; experiment as you like.
Please be aware that the parameters very likely influence each other.
A coarse optimisation is enough. The main goal of this step is to understand
the impact of these variations rather than to gain another 0.01% in performance.
      

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Put your observations/findings about task 2e in textual form below:

YOUR ANSWER HERE

---
## Task 3: machine learning-based onset detection (20 points)

A simple machine learning approach should be investigated. The question to 
be answered is: can a simple neural network improve the onset detection
performance compared to the standard spectral flux approach above?

In order to answer this question, the hand-crafted ODF computation should be
replaced by a multilayer perceptron (MLP).

### Task 3a: define a training function (10 points)

Step 1: Use `sklearn` to create an `MLPRegressor` with given parameters.

Step 2: Use the same features as in the audio pre-processing section (task 1)
        as inputs (or if task 2e was done: use the optimised parameters).

Step 3: As targets, use the annotated onset positions of the dataset and
        assign each target frame a value of 1.

Step 4: Concatenate all audio frames and target frames to be used for training.

Step 5: Fit the model with the given data.

Step 6: Save the model to the given file name. Use Python's `pickle` module.
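
Steps 3 and 4 can be illustrated with the sketch below, which uses random arrays in place of real spectrograms. The helper `onset_targets` is hypothetical; rounding each annotation to the nearest frame and silently dropping annotations beyond the last frame are assumptions, not requirements of the task.

```python
import numpy as np

def onset_targets(onsets, num_frames, frame_rate=100):
    """Return a vector with 1 at every annotated onset frame, 0 elsewhere."""
    targets = np.zeros(num_frames, dtype=float)
    frames = np.round(np.asarray(onsets) * frame_rate).astype(int)
    frames = frames[(frames >= 0) & (frames < num_frames)]
    targets[frames] = 1.0
    return targets

# two toy "files": (spectrogram, onset annotations in seconds)
rng = np.random.default_rng(0)
files = [(rng.random((50, 40)), [0.10, 0.25]),
         (rng.random((30, 40)), [0.05, 0.40])]   # 0.40 s lies beyond the 30 frames

# step 4: concatenate all frames and all targets
x = np.vstack([spec for spec, _ in files])
y = np.hstack([onset_targets(ons, len(spec)) for spec, ons in files])
print(x.shape, y.shape, int(y.sum()))
```

After concatenation, `x` has one row per frame across all files and `y` has one target per row, which is the layout `MLPRegressor.fit` expects.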

In [None]:
def train(audio, annotations, diffs=False, early_stopping=False,
          verbose=True, model='model.pkl', **kwargs):
    """
    Train an MLP on the data.

    Parameters
    ----------
    audio : list
        List of audio files or precomputed spectrograms.
    annotations : list of numpy arrays
        List with corresponding onset annotations.
    diffs : bool, optional
        Include diffs as input features (task 3e).
    early_stopping : bool, optional
        Use early stopping to prevent overfitting (step 8).
    verbose : bool, optional
        Be verbose during training.
    model : str, optional
        Save the fitted model to given file name.
    kwargs : dict, optional
        Additional keyword arguments.
        
    Returns
    -------
    mlp : MLPRegressor
        Trained MLP.

    """
    from sklearn.neural_network import MLPRegressor
    # define MLP
    mlp = MLPRegressor(hidden_layer_sizes=(50, 50), tol=1e-4, max_iter=100,
                       early_stopping=early_stopping, verbose=verbose)
    if verbose:
        print(mlp)
        
    # prepare input features and targets
    x = []
    y = []
    
    # YOUR CODE HERE
    raise NotImplementedError()
    
    # reshape x and y
    # Note: depending on your data pre-processing these lines might
    #       need to be adjusted accordingly
    x = np.vstack(x)
    y = np.hstack(y)
    
    # train model
    if verbose:
        print('training model:', model)
    mlp.fit(x.squeeze(), y.squeeze())
    
    # save model and return it
    with open(model, 'wb') as f:
        pickle.dump(mlp, f)
    return mlp
    

### Task 3b: train the model (2 points)

Train the model on the dataset and save as `model.pkl`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

### Task 3c: evaluate performance on the dataset (3 points)

Step 1: Predict onset activations for the dataset.

Step 2: Adjust the threshold parameter to yield the best F-measure on the dataset
        (use the `detect_onsets()` function defined in task 2b).

Step 3: Evaluate performance on the dataset.

In [None]:
mlp_onset_detections = []

# YOUR CODE HERE
raise NotImplementedError()

# evaluate against ground truth
p, r, f = evaluate_onsets(mlp_onset_detections, ONSET_ANNOTATIONS)

print('MLP onset detection\nPrecision: %.3f\nRecall:    %.3f\nF-measure: %.3f' % (p, r, f))


### Task 3d: describe your findings (5 points)

Describe your findings/observations in textual form below:

YOUR ANSWER HERE

### Task 3e: add temporal differences as additional features (2 points)

Train a new model with first order temporal differences (as in spectral flux) as
additional features (stacked to the magnitudes) and save as `model_diff.pkl`.

Note: modify the `train()` function of task 3a to be able to be called with `diffs=True`.
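
Stacking the differences to the magnitudes might look like the sketch below. The name `add_diffs` is made up for illustration, and keeping only the positive differences (as spectral flux does) is an assumption; signed differences would be an equally valid reading.

```python
import numpy as np

def add_diffs(spectrogram):
    """Append positive first-order temporal differences to each frame."""
    diff = np.diff(spectrogram, axis=0, prepend=spectrogram[:1])
    return np.hstack([spectrogram, np.maximum(diff, 0.0)])

spec = np.arange(12, dtype=float).reshape(4, 3)   # 4 frames, 3 bands
features = add_diffs(spec)
print(features.shape)
```

The number of frames stays the same while the feature dimension doubles, so the targets built for task 3a can be reused unchanged.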

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

### Task 3f: evaluate model (3 points)

Compare the performance of this model with that of the models from task 2 and task 3b.
Again, use a suitable threshold that maximises the performance on the dataset.


In [None]:
mlp_diff_detections = []

# YOUR CODE HERE
raise NotImplementedError()

# evaluate against ground truth
p, r, f = evaluate_onsets(mlp_diff_detections, ONSET_ANNOTATIONS)

print('MLP onset detection with temporal diffs\nPrecision: %.3f\nRecall:    %.3f\nF-measure: %.3f' % (p, r, f))

### Task 3g: describe your findings (5 points)

Describe your findings/observations in textual form below:

YOUR ANSWER HERE

---
# Tempo estimation
---

In [None]:
# you are not required to use these predefined constants, but it is recommended
TEMPO_ANNOTATION_FILES = search_files('data/train', '.bpm')
TEMPO_AUDIO_FILES, TEMPO_AUDIO_IDX = find_audio_files(TEMPO_ANNOTATION_FILES, AUDIO_FILES)
TEMPO_AUDIO = [SPECTROGRAMS[i] for i in TEMPO_AUDIO_IDX]
TEMPO_ANNOTATIONS = [madmom.io.load_tempo(f)[0, 0] for f in TEMPO_ANNOTATION_FILES]

assert len(TEMPO_ANNOTATION_FILES) == 107
assert len(TEMPO_AUDIO_FILES) == 107
assert len(TEMPO_AUDIO) == 107
assert len(TEMPO_ANNOTATIONS) == 107

## Task 4: detect tempo of ODF (25 + 5 points)

To detect the tempo/periodicity of the ODF, the following procedure should be
applied:

Step 1: Compute the auto-correlation function (ACF) of the ODF.

Step 2: Select an appropriate peak of the ACF as the main periodicity.

Step 3: Compute the tempo (in bpm, beats per minute).

Step 4: Evaluate the mean tempo estimation performance (e.g. with `madmom.evaluation.tempo` module)
        on the dataset. Use `Accuracy 1` (with 4% tolerance) and `Accuracy 2` (allowing 4% tolerance,
        including double and half tempo variants) as metrics.

Step 5: (optional) Optimise the parameters to get the best performance on the dataset.
        Parameters to be optimised: lag range for ACF computation (lower bound: 40-80 bpm,
        upper bound: 140-220 bpm), peak selection mechanism (e.g. clustering of peaks).
        Replace the default arguments/values in the function definition with the optimised parameters.
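
Steps 1 to 3 can be sketched as below, assuming the simplest possible peak selection (the largest ACF value inside the allowed lag range). The name `acf_tempo` is hypothetical, and the unnormalised ACF used here slightly favours shorter lags, which a real solution may want to compensate for.

```python
import numpy as np

def acf_tempo(odf, min_bpm=60, max_bpm=180, frame_rate=100):
    odf = np.asarray(odf, dtype=float)
    odf = odf - odf.mean()
    # lags (in frames) that correspond to the allowed tempo range
    min_lag = int(np.ceil(60.0 * frame_rate / max_bpm))
    max_lag = int(np.floor(60.0 * frame_rate / min_bpm))
    lags = np.arange(min_lag, max_lag + 1)
    # step 1: auto-correlation, evaluated only at the lags of interest
    acf = np.array([np.dot(odf[:-lag], odf[lag:]) for lag in lags])
    # step 2: take the highest ACF value as the main periodicity
    best_lag = lags[np.argmax(acf)]
    # step 3: convert the lag in frames to beats per minute
    return 60.0 * frame_rate / best_lag

# synthetic ODF: one impulse every 50 frames, i.e. 120 bpm at 100 fps
odf = np.zeros(1000)
odf[::50] = 1.0
tempo = acf_tempo(odf)
print(tempo)
```

A lag of 50 frames at 100 fps is a period of 0.5 s, which corresponds to 60 / 0.5 = 120 bpm. The lag of 100 frames (60 bpm) also correlates strongly, which is why half and double tempo errors are common and why `Accuracy 2` tolerates them.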


In [None]:
def detect_tempo(odf, min_bpm=60, max_bpm=180, frame_rate=FPS, **kwargs):
    """
    Detect the tempo of the onset detection function (ODF).

    Parameters
    ----------
    odf : numpy array
        Onset detection function.
    min_bpm : float
        Minimum tempo, given in beats per minute (BPM).
    max_bpm : float
        Maximum tempo, given in beats per minute (BPM).
    frame_rate : float
        Frame rate of the onset detection function.
    kwargs : dict, optional
        Additional keyword arguments.

    Returns
    -------
    tempo : float
        Detected tempo (in BPM).

    """
    # YOUR CODE HERE
    raise NotImplementedError()
    return tempo


def evaluate_tempo(tempi, annotations):
    """
    Evaluate detected tempi against ground truth annotations.
    
    Parameters
    ----------
    tempi : list
        List with tempo detections for all files.
    annotations : list
        List with corresponding ground truth annotations.

    Returns
    -------
    accuracy_1 : float
        Averaged accuracy 1.
    accuracy_2 : float
        Averaged accuracy 2.
    
    """
    # YOUR CODE HERE
    raise NotImplementedError()
    return accuracy_1, accuracy_2


# YOUR CODE HERE
raise NotImplementedError()

Summarise your observations/findings in textual form below:

YOUR ANSWER HERE

---
# Beat tracking
---

In [None]:
# you are not required to use these predefined constants, but it is recommended
BEAT_ANNOTATION_FILES = search_files('data/train', '.beats')
BEAT_AUDIO_FILES, BEAT_AUDIO_IDX = find_audio_files(BEAT_ANNOTATION_FILES, AUDIO_FILES)
BEAT_AUDIO = [SPECTROGRAMS[i] for i in BEAT_AUDIO_IDX]
BEAT_ANNOTATIONS = [madmom.io.load_beats(f) for f in BEAT_ANNOTATION_FILES]

assert len(BEAT_ANNOTATION_FILES) == 177
assert len(BEAT_AUDIO_FILES) == 177
assert len(BEAT_AUDIO) == 177
assert len(BEAT_ANNOTATIONS) == 177

## Task 5: track the beats based on ODF and periodicity (25 + 5 points)

To detect the beats in the ODF, the following procedure should be applied:

Step 1: Determine the best possible offset for beat tracking given the tempo
        or periodicity determined in task 4 and select the first beat.

Step 2: Determine consecutive beats based on the tempo; allow ±10% tempo 
        deviation between consecutive beats.

Step 3: Continue until all beats are tracked.

Step 4: Evaluate beat tracking performance (e.g. with `madmom.evaluation.beats` module)
        on the dataset. Use `CMLt` and `AMLt` as evaluation metrics.

Step 5: (optional) optimise the parameters to get the best performance on the dataset.
        Parameters to be optimised: allowed deviation of the tempo, length of the audio
        used to determine the tempo.
        Replace the default arguments/values in the functions with the optimised parameters.
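
One way to read steps 1 to 3 is sketched below. The function name `track_beats`, the grid-sum scoring of offsets and the fallback to the expected position when the search window contains no ODF energy are all assumptions for illustration; the tempo is passed in directly instead of being detected.

```python
import numpy as np

def track_beats(odf, tempo, frame_rate=100, tolerance=0.1):
    odf = np.asarray(odf, dtype=float)
    period = 60.0 * frame_rate / tempo              # beat period in frames
    # step 1: score every candidate offset by summing the ODF on its beat grid
    offsets = np.arange(int(round(period)))
    scores = [odf[np.arange(o, len(odf), period).astype(int)].sum() for o in offsets]
    position = int(offsets[np.argmax(scores)])
    beats = [position]
    lo = int(round((1 - tolerance) * period))
    hi = int(round((1 + tolerance) * period))
    # steps 2 and 3: look for the next beat within +/- tolerance of the period
    while True:
        expected = position + int(round(period))
        if expected >= len(odf):
            break
        start = position + lo
        window = odf[start:min(position + hi + 1, len(odf))]
        if window.max() > 0:
            position = start + int(np.argmax(window))
        else:
            position = expected                     # no evidence: keep the tempo
        beats.append(position)
    return np.array(beats) / frame_rate

# synthetic ODF: 120 bpm at 100 fps with the first beat at frame 20
odf = np.zeros(500)
odf[20::50] = 1.0
beats = track_beats(odf, tempo=120.0)
print(beats)
```

With a period of 50 frames, each search window covers the frames 45 to 55 after the previous beat, which is the ±10% deviation named in step 2.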

In [None]:
def detect_beats(odf, min_bpm=60, max_bpm=180, frame_rate=FPS, **kwargs):
    """
    Detect the beats in an onset detection function (ODF).

    Parameters
    ----------
    odf : numpy array
        Onset detection function.
    min_bpm : float
        Minimum tempo, given in beats per minute (BPM).
    max_bpm : float
        Maximum tempo, given in beats per minute (BPM).
    frame_rate : float
        Frame rate of the onset detection function.
    kwargs : dict, optional
        Additional keyword arguments.

    Returns
    -------
    beats : numpy array
        Detected beats (in seconds).

    """
    # determine tempo from within this function in order to be used
    # with a single input (the ODF)
    tempo = detect_tempo(odf, min_bpm, max_bpm, frame_rate)
    # YOUR CODE HERE
    raise NotImplementedError()
    return beats


def evaluate_beats(beats, annotations):
    """
    Evaluate detected beats against ground truth annotations.
    
    Parameters
    ----------
    beats : list
        List with beat detections for all files.
    annotations : list
        List with corresponding ground truth annotations.

    Returns
    -------
    cmlt : float
        Averaged CMLt.
    amlt : float
        Averaged AMLt.
    
    """
    # YOUR CODE HERE
    raise NotImplementedError()
    return cmlt, amlt

# YOUR CODE HERE
raise NotImplementedError()

Summarise your observations/findings in textual form below:

YOUR ANSWER HERE

---
## Task 6: (optional) visualise the results (10 points)

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# Chocolate challenge (no points, only chocolate and glory)

Put all needed functions defined above in place to be able to detect the tempo and beats in the given audio files.

To qualify for the chocolate challenge, please check that running the function below produces
two (hopefully non-empty) detection files.
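
Writing the two files could be done along the lines of the sketch below. The helper `write_detections`, the file suffixes and the number formats are hypothetical, since the exact naming convention is not specified here; a temporary directory stands in for the real output location.

```python
import os
import tempfile
import numpy as np

def write_detections(basename, tempo, beats, out_dir):
    """Write one tempo value and one beat time per line to text files."""
    tempo_file = os.path.join(out_dir, basename + '.bpm.txt')      # assumed suffix
    beats_file = os.path.join(out_dir, basename + '.beats.txt')    # assumed suffix
    np.savetxt(tempo_file, np.atleast_1d(tempo), fmt='%.2f')
    np.savetxt(beats_file, np.atleast_1d(beats), fmt='%.3f')
    return tempo_file, beats_file

out_dir = tempfile.mkdtemp()
tempo_file, beats_file = write_detections('example', 120.0, [0.5, 1.0, 1.5], out_dir)
print(open(beats_file).read())
```

Reading the files back with `np.loadtxt` is a quick way to confirm that both contain the expected values before submitting.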

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
print("Well done!")