## Common Features in MIR

For MIR tasks we need to extract specific information from scores or performances. 
Two of the most common representations are **note arrays** and **piano rolls**. Note that there is some overlap in how these terms are used in the literature.

Partitura provides convenience methods to extract these common features in a few lines!

In [None]:
import partitura
import numpy as np
import matplotlib.pyplot as plt

### Note Arrays

A **note array** is a 2D array in which each row represents a note in the score/performance and each column represents an attribute of the note.

In partitura, note arrays are [structured numpy arrays](https://numpy.org/devdocs/user/basics.rec.html), which are ndarrays in which each "column" (field) has a name and its own datatype. 
This allows us to hold information that can be represented as integers (MIDI pitch/velocity), floating point numbers (e.g., onset time) or strings (e.g., note ids). 
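
To make the idea of named "columns" concrete, here is a minimal sketch that uses only numpy. The field names mirror those used later in this tutorial, but the values are made up for illustration and do not come from a real score.

```python
import numpy as np

# A toy note array with hypothetical values (not taken from a real score)
notes = np.array(
    [(0.0, 1.0, 60, "n0"),
     (1.0, 0.5, 64, "n1"),
     (1.5, 0.5, 67, "n2")],
    dtype=[("onset_beat", "f4"),
           ("duration_beat", "f4"),
           ("pitch", "i4"),
           ("id", "U8")],
)

# Access a whole "column" by its name
pitches = notes["pitch"]

# Access a single note (row) and one of its fields
first_id = notes[0]["id"]

# Boolean masks work as with regular arrays
high_notes = notes[notes["pitch"] > 62]

# Select several fields at once
onsets_and_pitches = notes[["onset_beat", "pitch"]]
```

Each field keeps its own datatype, so integer pitches, floating point onsets and string ids can live in the same array.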

In this tutorial we are going to cover three main cases:

* Getting a note array from `Part` and `PerformedPart` objects
* Extra information and alternative ways to generate a note array
* Creating a custom note array from scratch from a `Part` object


#### 1. Getting a note array from `Part` and `PerformedPart` objects

##### Getting a note array from `Part` objects

In [None]:
# Note array from a score

# Path to the MusicXML file
score_fn = './data/musicxml/Chopin_op10_no3.musicxml'

# Load the score into a `Part` object
score_part = partitura.load_musicxml(score_fn)

# Get note array.
score_note_array = score_part.note_array

It is that easy!

By default, Partitura includes some of the most common note-level information in the note array:

In [None]:
print(score_note_array.dtype.names)

* `onset_beat` is the onset time of the note in beats (as indicated by the time signature).
* `duration_beat` is the duration of the note in beats.
* `onset_quarter` is the onset time of the note in quarters (independent of the time signature).
* `duration_quarter` is the duration of the note in quarters.
* `onset_div` is the onset of the note in *divs*, an integer time unit whose resolution is generally chosen so that note positions and durations can be represented losslessly.
* `duration_div` is the duration of the note in divs.
* `pitch` is the MIDI pitch (MIDI note number) of the note.
* `voice` is the voice of the note (in polyphonic music, where there can be multiple notes at the same time).
* `id` is the note id (as it appears in MusicXML or MEI formats).
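
The relation between the three time units listed above can be sketched with a little arithmetic. This is only an illustration under stated assumptions: the values of `divs_per_quarter` and `time_sig_denominator` are hypothetical, the piece has a single time signature and starts at time 0, and a beat is taken to be the note value given by the time signature denominator.

```python
import numpy as np

# Hypothetical values for illustration
divs_per_quarter = 4       # resolution: 4 divs per quarter note
time_sig_denominator = 8   # e.g., a 6/8 time signature

# Onsets in divs are integers
onset_div = np.array([0, 2, 4, 6, 12])

# Quarters do not depend on the time signature
onset_quarter = onset_div / divs_per_quarter

# Assumption: one beat is the note value given by the denominator,
# so with an eighth-note beat there are two beats per quarter
beats_per_quarter = time_sig_denominator / 4
onset_beat = onset_quarter * beats_per_quarter
```

In a real score, pickup measures and time signature changes make this mapping piecewise, which is why it is convenient to read the precomputed fields from the note array.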

In [None]:
# Let's look at the first notes in this note array
print(score_note_array[:5])

##### Getting a note array from a `PerformedPart`

In a similar way, we can obtain a note array from a MIDI file in a few lines:

In [None]:
# Note array from a performance

# Path to the MIDI file
performance_fn = './data/midi/Chopin_op10_no3_p01.mid'

# Loading the file to a PerformedPart
performance_part = partitura.load_performance_midi(performance_fn)

# Get note array!
performance_note_array = performance_part.note_array

Since performances contain information that is not included in scores, the default fields in the note array are a little bit different:

In [None]:
performance_note_array.dtype.names

* `onset_sec` is the onset time of the note in seconds
* `duration_sec` is the duration of the note in seconds
* `pitch` is the MIDI pitch
* `velocity` is the MIDI velocity
* `track` is the track number in the MIDI file
* `channel` is the channel in the MIDI file
* `id` is the ID of the note (for MIDI files, generated automatically according to onset time)

In [None]:
print(performance_note_array[:5])

We can also create a `PerformedPart` directly from a note array:

In [None]:
note_array = np.array(
    [(60, 0, 2, 40),
     (65, 0, 1, 15),
     (67, 0, 1, 72),
     (69, 1, 1, 90),
     (66, 2, 1, 80)],
    dtype=[("pitch", "i4"),
           ("onset_sec", "f4"),
           ("duration_sec", "f4"),
           ("velocity", "i4"),
          ]
)

# Note array to `PerformedPart`
performed_part = partitura.performance.PerformedPart.from_note_array(note_array)

We can then export the `PerformedPart` to a MIDI file!

In [None]:
# export as MIDI file
partitura.save_performance_midi(performed_part, "example.mid")

### Piano Rolls


#### Extracting a piano roll

In [None]:
# TODO: change the example
# Path to the MusicXML file
score_fn = './data/musicxml/Chopin_op10_no3.musicxml'

# Load the score into a `Part` object
score_part = partitura.load_musicxml(score_fn)
pianoroll = partitura.utils.compute_pianoroll(score_part)

The `compute_pianoroll` function has a few arguments to customize the resulting piano roll:

In [None]:
piano_range = True
time_unit = 'beat'
time_div = 10
pianoroll = partitura.utils.compute_pianoroll(
    note_info=score_part, # a `Part`, `PerformedPart` or a note array
    time_unit=time_unit, # e.g., 'beat', 'quarter' or 'div'
    time_div=time_div, # Number of cells per time unit
    piano_range=piano_range # Use range of the piano (88 keys)
)

An important thing to remember is that in piano rolls generated by `compute_pianoroll`, rows (the vertical axis) represent the pitch dimension and the columns (horizontal) the time dimension. 
This results in a more intuitive way of plotting the piano roll. 
For other applications the transposed version of this piano roll might be more useful (i.e., rows representing time steps and columns representing pitch information).
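
The layout described above can be illustrated with a small numpy-only sketch. This is not Partitura's implementation: the notes and the `time_div` value are hypothetical, and the code only shows how a pitch-by-time matrix is filled in and then transposed.

```python
import numpy as np

# Toy notes as (pitch, onset, duration) in beats; hypothetical values
notes = [(60, 0.0, 1.0), (64, 0.5, 0.5), (67, 1.0, 1.0)]
time_div = 2  # number of cells per beat

# Number of time steps needed to hold the last note offset
n_steps = int(round(max(on + dur for _, on, dur in notes) * time_div))

# Rows are MIDI pitches (0-127), columns are time steps
roll = np.zeros((128, n_steps), dtype=np.int8)

for pitch, onset, duration in notes:
    start = int(round(onset * time_div))
    end = int(round((onset + duration) * time_div))
    roll[pitch, start:end] = 1

# Transposed version: rows are time steps, columns are pitches
roll_time_major = roll.T
```

The pitch-major version is convenient for plotting with `imshow`, while the time-major version is often the layout expected by sequence models.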

Since piano rolls can result in very large matrices where most of the elements are 0, the output of `compute_pianoroll` is a [scipy sparse matrix](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html). To convert it to a regular numpy array, we can call its `toarray` method:

In [None]:
pianoroll = pianoroll.toarray()

Let's plot the piano roll!

In [None]:
plt.imshow(pianoroll, origin="lower", cmap='gray', interpolation='nearest', aspect='auto')
plt.xlabel(f'Time ({time_unit}s/{time_div})')
plt.ylabel('Piano key' if piano_range else 'MIDI pitch')
plt.show()

In some cases, we want to know the "coordinates" of each of the notes in the piano roll. The `compute_pianoroll` function includes an option, `return_idxs`, to return these indices (pitch, start and end of each note) together with the piano roll:

In [None]:
pianoroll, note_indices = partitura.utils.compute_pianoroll(score_part, return_idxs=True)

# MIDI pitch, start, end
print(note_indices[:5])


#### Generating a Note Array from a piano roll

Partitura also includes a function to generate a note array from a piano roll, which can then be used to generate a MIDI file. 
This is useful, e.g., for music generation tasks.

In [None]:
pianoroll = partitura.utils.compute_pianoroll(score_part, time_unit='div', time_div=1)

new_note_array = partitura.utils.pianoroll_to_notearray(pianoroll, time_unit='div', time_div=1)

# Generate MIDI
ppart = partitura.performance.PerformedPart.from_note_array(new_note_array)

partitura.save_performance_midi(ppart, "newmidi.mid")
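
To give an intuition of what converting a piano roll back into notes involves, here is a minimal numpy-only sketch. It is not the implementation of `pianoroll_to_notearray`: the helper `pianoroll_to_notes` is a hypothetical name, and it simply treats every run of consecutive non-zero cells in a pitch row as one note.

```python
import numpy as np

def pianoroll_to_notes(roll, time_div=1):
    """Return (pitch, onset, duration) tuples from a pitch-by-time piano roll."""
    notes = []
    for pitch, row in enumerate(roll):
        active = (row > 0).astype(np.int8)
        # Pad with zeros so that notes touching the borders are detected
        changes = np.diff(np.concatenate(([0], active, [0])))
        starts = np.flatnonzero(changes == 1)
        ends = np.flatnonzero(changes == -1)
        for start, end in zip(starts, ends):
            onset = float(start / time_div)
            duration = float((end - start) / time_div)
            notes.append((pitch, onset, duration))
    # Sort by onset time, then by pitch
    return sorted(notes, key=lambda n: (n[1], n[0]))

# Toy piano roll with three notes and 2 cells per time unit
roll = np.zeros((128, 4), dtype=np.int8)
roll[60, 0:2] = 1
roll[64, 1:2] = 1
roll[67, 2:4] = 1

recovered = pianoroll_to_notes(roll, time_div=2)
```

A limitation of this simple approach is that two consecutive notes of the same pitch with no gap between them are merged into a single longer note, since a binary piano roll does not mark where one note ends and the next one begins.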

## Handling Alignment Information (Matchfiles)



In [None]:
match_fn = './data/match/Chopin_op10_no3_p01.match'

ppart, alignment = partitura.load_match(match_fn)

In [None]:
print(alignment[:5])