# Defining basis functions

In this notebook we will define a basis function that we will later use to train a performance model. A basis function takes a score as input, and returns an array where each row corresponds to a note in the score, and each column corresponds to some descriptor defined by the basis function. There may be just a single descriptor per basis function, or several.
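
As a minimal sketch of this contract, the toy function below computes two descriptors per note. The `ToyNote` class and its `pitch` and `duration` fields are hypothetical stand-ins for a real score, used only to show the shape of the output.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for a score note; not part of partitura.
ToyNote = namedtuple("ToyNote", ["pitch", "duration"])


def toy_basis(notes):
    """Return an (N, 2) array with one row per note, plus the column names."""
    names = ["pitch", "duration"]
    basis = np.array([[n.pitch, n.duration] for n in notes], dtype=float)
    return basis, names


toy_notes = [ToyNote(60, 1.0), ToyNote(64, 0.5), ToyNote(67, 0.5)]
toy_values, toy_names = toy_basis(toy_notes)
print(toy_values.shape)  # (3, 2): three notes, two descriptors
```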

We will define a very simple basis function that has a single descriptor, namely the MIDI pitch of the notes. 

To test our basis function we need a score, so let's start by calling `init_dataset()` from the `helper` module. This ensures we have a local copy of the vienna4x22 corpus.

In [None]:
%matplotlib notebook
from helper import init_dataset, data

init_dataset() # download the corpus if necessary; set some variables

After the data has been downloaded, `init_dataset()` sets a couple of global variables that make it easier to access the data. For now we just want a MusicXML file from the corpus so we can define a basis function and test it on the file. `data.SCORE_PERFORMANCE_PAIRS` holds a list of MusicXML/Match filename pairs, so let's grab the first pair:

In [None]:
xml_fn, match_fn = data.SCORE_PERFORMANCE_PAIRS[0]
print(xml_fn)

It's Chopin's Etude Opus 10 number 3. We also got a match file for some performance of the piece (stored in `match_fn`) but we don't need it for the basis function.

Let's load the score into Python using the [partitura](https://github.com/OFAI/partitura) package:

In [None]:
import partitura

part = partitura.load_musicxml(xml_fn)
print(part)

The `Part` object contains the musical elements that are defined in the MusicXML file, such as notes, measures, performance directions, and slurs. You can read more in the [online documentation](https://partitura.readthedocs.io/en/latest/index.html). For now let's start with some basics.

Through the attribute [notes](https://partitura.readthedocs.io/en/latest/modules/partitura.score.html#partitura.score.Part.notes) you get a list of all the notes in the piece. Let's count the number of notes:

In [None]:
len(part.notes)

There are 498 notes in this part. However, some notes in the score are [tied](https://en.wikipedia.org/wiki/Tie_(music)). That means that they are encoded as separate notes in the score, but they should sound as a single note. In the context of expression modeling, we want to treat tied notes as a single note. The attribute [notes_tied](https://partitura.readthedocs.io/en/latest/modules/partitura.score.html#partitura.score.Part.notes_tied) does just that:

In [None]:
len(part.notes_tied)

Now there are 486 notes, so 12 of the 498 notes in the score were merged into a preceding tied note.
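
The effect of merging ties can be sketched on toy data. The `(midi_pitch, tied_to_previous)` tuples below are a hypothetical encoding, not partitura's representation: a note that continues a tie is not counted as a new sounding note.

```python
# Hypothetical encoding: (midi_pitch, tied_to_previous).
# A note with tied_to_previous=True continues the preceding note of the
# same pitch and does not start a new sounding note.
toy_score_notes = [
    (60, False),
    (60, True),   # tied continuation of the first note
    (62, False),
    (64, False),
    (64, True),   # tied continuation of the previous note
]

n_written = len(toy_score_notes)
n_sounding = sum(1 for _, tied in toy_score_notes if not tied)

print(n_written, n_sounding)  # 5 written notes, 3 sounding notes
```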

We are now ready to define our MIDI pitch basis function. All we need to do is define a function, say `midi_pitch_basis`, that takes the part object and returns two things: an `N` x 1 numpy array, where `N` equals `len(part.notes_tied)`, and a list with the column names of the array:

In [None]:
%%writefile mybasis.py

import numpy as np

def midi_pitch_basis(part):
    # the list of descriptors
    names = ['pitch']
    # the N midi pitches
    basis = np.array([n.midi_pitch for n in part.notes_tied])
    # we need an N x 1 array, so we reshape
    basis = basis.reshape((-1, 1))
    # finally we normalize so the values are between 0 and 1
    basis = basis/127.
    return basis, names


The `%%writefile mybasis.py` magic line saves your newly defined function to the file `mybasis.py`. That way you can reuse the basis function in the next notebook, where you train a predictive model.
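
In plain Python terms, `%%writefile` without options behaves like opening a file in write mode (create or overwrite), while the `-a` option behaves like append mode. The sketch below illustrates the difference on a throwaway file; the file name `demo_basis.py` and the functions `first` and `second` are made up for the illustration.

```python
import os
import tempfile

# Use a throwaway file so the real mybasis.py is not touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_basis.py")

    # Like `%%writefile demo_basis.py`: create or overwrite the file.
    with open(demo_path, "w") as f:
        f.write("def first():\n    return 1\n")

    # Like `%%writefile -a demo_basis.py`: append to the existing file.
    with open(demo_path, "a") as f:
        f.write("\ndef second():\n    return 2\n")

    with open(demo_path) as f:
        demo_source = f.read()

print(demo_source.count("def "))  # 2: both definitions are in the file
```

One consequence is that re-running a cell that uses `-a` appends the same definition a second time. The file still works, because the later definition replaces the earlier one, and re-running the cell without `-a` starts the file from scratch.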

To test the function we first have to run the contents of `mybasis.py` (executing the previous cell only wrote the contents to the file), using `%run mybasis.py`:

In [None]:
%run mybasis.py

basis, names = midi_pitch_basis(part)

We use the function `plot_basis` to visually inspect the values of the basis function over the course of the piece:

In [None]:
from helper import plot_basis

plot_basis(basis, names)
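
Besides visual inspection, the output of a basis function can be checked programmatically. The helper `check_basis_output` below is a hypothetical sketch, not part of `helper` or partitura; it only verifies the contract described earlier (one row per note, one name per column) and is demonstrated on toy values rather than on the real score.

```python
import numpy as np


def check_basis_output(basis, names, n_notes):
    """Raise AssertionError if (basis, names) breaks the expected contract."""
    assert isinstance(basis, np.ndarray), "basis must be a numpy array"
    assert basis.ndim == 2, "basis must be two-dimensional"
    assert basis.shape[0] == n_notes, "one row per note expected"
    assert basis.shape[1] == len(names), "one name per column expected"
    assert np.all(np.isfinite(basis)), "basis must not contain NaN or inf"


# Toy input in place of a real score: three MIDI pitches scaled by 127.
toy_basis_values = np.array([60, 64, 67]).reshape((-1, 1)) / 127.0
check_basis_output(toy_basis_values, ["pitch"], n_notes=3)
print(toy_basis_values.min() >= 0.0 and toy_basis_values.max() <= 1.0)  # True
```

In the notebook, the equivalent call would be `check_basis_output(basis, names, len(part.notes_tied))`.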

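Let's define another basis function that encodes the number of notes starting simultaneously. This time we use `%%writefile -a` to append the function to `mybasis.py` rather than overwrite the file: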

In [None]:
%%writefile -a mybasis.py

import numpy as np
import partitura

def n_sim_notes_basis(part):
    # the list of descriptors
    names = ['simultaneous_notes']
    # the notes of the part, with tied notes merged
    notes = part.notes_tied
    # for each note, count the notes that start at the same time point
    basis = np.array([len(n.start.starting_objects[partitura.score.Note])
                      for n in notes])
    # we need an N x 1 array, so we reshape
    basis = basis.reshape((-1, 1))

    return basis, names


In [None]:
%run mybasis.py

basis, names = n_sim_notes_basis(part)

plot_basis(basis, names)
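
The counting logic can also be sketched without partitura. The onset times below are made-up values in beats; the sketch counts, for each note, how many notes share its onset, which is the quantity `n_sim_notes_basis` is meant to encode.

```python
from collections import Counter

import numpy as np

# Hypothetical onset times (in beats), one per note.
toy_onsets = [0.0, 0.0, 0.0, 1.0, 2.0, 2.0]

# Number of notes starting at each distinct onset time.
notes_per_onset = Counter(toy_onsets)

# One value per note: how many notes (itself included) start at its onset.
toy_sim = np.array([notes_per_onset[t] for t in toy_onsets]).reshape((-1, 1))

print(toy_sim.ravel().tolist())  # [3, 3, 3, 1, 2, 2]
```

Note that, unlike the pitch descriptor, these counts are left unscaled, as in `n_sim_notes_basis`.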